The italian dictionary makes 20 million entries cache files, it slows down Spell Fu #40

Open
opened 2023-03-08 18:28:21 +00:00 by grese · 4 comments

Hi, creating the cache for the Italian dictionary takes about 4 minutes on my system and produces a 378 MB, 22-million-line `words_spell-fu-ispell-words-it.txt` file and a 178 MB `words_spell-fu-ispell-words-it.el.data` file. After that, Emacs hangs for a few seconds every time I enable Spell Fu.

I'm on Fedora 37, my Aspell is v0.60.8, and the Italian dictionary is `aspell-it-2.2` from the distro's repositories. I don't believe the problem is specific to my copy of the Italian dictionary: I've never modified it apart from adding a few words. Apparently it's more complex than other dictionaries, see https://superuser.com/questions/137957/how-to-convert-aspell-dictionary-to-simple-list-of-words/1636272#1636272

That said, I'm a fan of your packages! :-)
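
A likely reason a flat word list gets this large (an assumption here, not something confirmed in this thread) is that a dictionary stored as stems plus affix rules has to be expanded into every inflected form. A minimal Python sketch of that multiplication, using made-up stems and suffixes rather than real Aspell data:

```python
from itertools import product


def expand(stems, suffixes, prefixes=("",)):
    """Return every prefix + stem + suffix combination as a flat word list."""
    return [p + s + x for p, s, x in product(prefixes, stems, suffixes)]


# Hypothetical toy data, not taken from the real aspell-it dictionary.
stems = ["parl", "cant", "am"]
suffixes = ["o", "i", "a", "iamo", "ate", "ano"]

words = expand(stems, suffixes)
print(len(stems), "stems ->", len(words), "words")  # 3 stems -> 18 words
```

Every extra suffix or prefix multiplies the list again, so a language with rich inflection can turn a modest stem list into millions of entries.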

grese changed title from The italian dictionary makes a 22 million entries cache and it slows down Spell Fu to The italian dictionary makes 20 million entries cache files, it slows down Spell Fu 2023-03-08 18:48:59 +00:00

Seems like `pt_BR` has the same issue. spell-fu then holds a 600+ MiB variable for it, which adds up for every buffer with spell-fu mode on (checked with memory-report). So Emacs memory allocation quickly grows to multiple GiB, leading to various issues, including Emacs blocking (likely in the garbage collector), crashing, or being OOM-killed.


Same issue here with the Italian dictionary. Has anyone found a way to work around it?
Thank you.


@rainbow @grese I recommend trying Jinx instead https://github.com/minad/jinx

> spell-fu: Spell-fu however incurs high memory overhead on account of its dictionary in a hash table. For languages with compound words and inflected word forms, this memory overhead magnifies. By accessing the Enchant API directly, Jinx avoids this overhead. Jinx also benefits from Enchant’s advanced spell-checker algorithms (affixation, compound words, etc.).
>
> ---https://github.com/minad/jinx#alternative-spell-checking-packages
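
To give a rough feel for the hash-table overhead the quote describes, here is a minimal Python sketch comparing the size of a word list as plain text with the size of the same words held in an in-memory hash set. It measures CPython's `set`, not Emacs's hash tables, and the word list is synthetic, so the numbers only illustrate the effect and are not comparable to the figures reported in this thread.

```python
import sys


def estimate_set_bytes(words):
    """Approximate bytes used by a hash set of strings in CPython."""
    table = set(words)
    container = sys.getsizeof(table)                 # the hash table itself
    payload = sum(sys.getsizeof(w) for w in table)   # each string object
    return container + payload


# Synthetic stand-in for a dictionary word list (hypothetical data).
words = [f"word{i}" for i in range(100_000)]

# Roughly what a newline-separated text file of the same words would occupy.
raw = sum(len(w.encode("utf-8")) + 1 for w in words)
total = estimate_set_bytes(words)

print(f"raw text: {raw} bytes, in-memory set: {total} bytes")
```

The in-memory figure is larger because every entry carries per-object and per-slot bookkeeping on top of its characters, and that cost scales with the number of entries.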


Thank you @yuuyin, I'll give Jinx a try.

Reference: ideasman42/emacs-spell-fu#40