Spelling and Grammar Checker #148

Open
opened 1 year ago by n · 8 comments
n commented 1 year ago
Collaborator

An automated linter to check spelling and grammar on incoming PRs would be nice.

My proposal is to use [TeXtidote](https://github.com/sylvainhalle/textidote) (built on the excellent [LanguageTool](https://github.com/languagetool-org/languagetool) library) to run these checks on incoming markdown files.

I'm not sure how to automate and integrate it. Maybe use webhooks to trigger a shell script, which then posts the TeXtidote result as a comment on the PR?
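That wiring could look roughly like the sketch below. The repo path, PR number, and token variable are placeholders, not values from this issue; the endpoint used is the standard Gitea/Forgejo issue-comment API that Codeberg exposes.

```shell
#!/bin/sh
# Sketch of the webhook idea: run TeXtidote and post the result as a PR
# comment. Everything concrete here (repo, PR index, token) is a placeholder.

# Turn arbitrary text on stdin into a {"body": "..."} JSON payload,
# escaping backslashes, quotes, and newlines by hand to avoid a jq dependency.
make_payload() {
    body=$(sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' | awk '{printf "%s\\n", $0}')
    printf '{"body": "%s"}' "$body"
}

# Post a comment on pull request $2 of repo $1 via the Gitea/Forgejo API.
# Assumes $CODEBERG_TOKEN holds an API token with write access.
post_comment() {
    curl -sS -X POST \
         -H "Authorization: token $CODEBERG_TOKEN" \
         -H "Content-Type: application/json" \
         -d @- \
         "https://codeberg.org/api/v1/repos/$1/issues/$2/comments"
}

# Example wiring (not executed here; owner/repo and 123 are placeholders):
#   textidote --output html *.md | make_payload | post_comment owner/repo 123
```

The payload helper is separate from the API call so it can be tested locally without network access.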

n added the
Status: Needs feedback
Kind: Enhancement
labels 1 year ago
n commented 1 year ago
Poster
Collaborator

Here's an example of an implementation with GitHub Actions:
https://github.com/daniel-vera-g/typo/blob/master/.github/workflows/language.yml
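As a rough idea of the shape such a workflow takes, a minimal Actions job might look like the following sketch. It assumes a `textidote` executable is already available on the runner's PATH (e.g. the release jar behind a wrapper script); the action versions are illustrative, not taken from the linked workflow.

```yaml
# Sketch only: assumes `textidote` is installed on the runner.
name: spellcheck
on: [pull_request]
jobs:
  textidote:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run TeXtidote on markdown files
        run: textidote --output html $(git ls-files '*.md') > check.html
      - uses: actions/upload-artifact@v4
        with:
          name: textidote-report
          path: check.html
```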

n commented 1 year ago
Poster
Collaborator

Another option would be to use textlint.

https://textlint.github.io/

Owner

I created an aspell-based spellchecker with Woodpecker for the internal monthly letter drafts. It simply checks the files modified in a commit, prints the unknown words, and exits if there are more than 5 errors per file.

It's plain stupid and, IMHO, a waste of energy. Most of the time is spent installing Git and aspell in the CI; the actual check then takes one second.

But in theory this would be possible, also with the projects you mentioned, which should deliver higher quality (aspell was merely a proof of concept).

We could compare the commits against the main branch, detect all changed files in a PR, and then spellcheck them. I might start working on it as soon as I find a reasonable way to do this without the whole overhead, e.g. with a prepared container that doesn't download all the tools each time. Or maybe start with a local tool and put it in CI later.
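The check described above could be sketched like this. The 5-word threshold is taken from the comment; the branch name and everything else are assumptions, and the word counting is kept in its own function so it works without aspell installed.

```shell
#!/bin/sh
# Sketch: list files changed against main, run aspell on each, and fail
# when a file has more than 5 unknown words (threshold from the comment).

MAX_ERRORS=5

# Read words on stdin, print the number of unique ones, and fail
# when that number exceeds MAX_ERRORS.
over_limit() {
    count=$(sort -u | wc -l | tr -d ' ')
    echo "$count"
    [ "$count" -le "$MAX_ERRORS" ]
}

# Files changed relative to main, deleted files excluded.
changed_files() {
    git diff origin/main...HEAD --name-only --diff-filter=d
}

# Spellcheck every changed file; exit non-zero if any file is over the limit.
spellcheck_changed() {
    status=0
    for f in $(changed_files); do
        n=$(aspell list < "$f" | over_limit) || status=1
        echo "$f: $n unknown words"
    done
    return $status
}
```

`aspell list` prints one unknown word per line for the file on stdin, so the pipeline reduces to counting unique lines per file.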

Poster
Collaborator

I made this quick and dirty script to get all markdown files modified relative to main (except for deleted ones) and run TeXtidote on them.

```bash
#!/bin/bash

# Collect files changed relative to main (deleted files excluded),
# keep those under content/ or ending in .md, and run TeXtidote on them,
# writing an HTML report.
FILES=$(git diff origin/master...HEAD --name-only --diff-filter=d | grep "content\|.md" | tr '\n' ' ')
textidote --output html $FILES > check.html
```
Collaborator

I opened #219 as a draft to implement spell checking. I do not think it is usable in its current form.

I tried TeXtidote and [Spellchecker-CLI](https://github.com/tbroadley/spellchecker-cli), but I was not satisfied with the results. Many of my example mistakes (like simple typos) were not detected at all.

I looked around for alternatives and found [hunspell](https://hunspell.github.io/), which is used by a bunch of huge projects.

Unfortunately, its output is text only. I guess one would have to parse the more detailed output one gets when calling hunspell with `-a` and render a nice view to make it somewhat useful.
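For illustration, parsing the `-a` (ispell pipe) output could start from a small awk filter like the one below. The input lines in the example are mocked, not from a real hunspell run; in that format, `&` marks a misspelling with suggestions and `#` one without.

```shell
#!/bin/sh
# Sketch: turn hunspell's "-a" pipe output into readable lines.
# "& word n offset: sug1, sug2, ..." is a misspelling with suggestions,
# "# word offset" is one without any; other lines are ignored.
parse_pipe() {
    awk '
        /^&/ {
            split($0, parts, ": ")      # "& helo 3 0" / "hello, halo, ..."
            sub(/^& /, "", parts[1])    # "helo 3 0"
            split(parts[1], head, " ")  # head[1] = misspelled word
            print head[1] ": " parts[2]
        }
        /^#/ { print $2 ": (no suggestions)" }
    '
}

# Example (mocked -a output):
#   printf '& helo 3 0: hello, halo, help\n' | parse_pipe
```

A fuller version would also track line numbers and offsets to build the "nice view" mentioned above.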

The result is a pipeline which works in theory and detected all my sample mistakes (not included in the pull request).

I had to add some glue for the pipeline to fail if one or more mistakes were detected.

As a result of the check, the line containing the mistake is simply written into the output file. As I said, one would have to somehow interpret the output of `-a` to make it more useful.

Additionally, the container needs to install git and hunspell every time. I agree that this would be a waste of resources. However, I guess it would be simple to create a container image in a registry of choice, reducing the effort to just launching the container and running the actual tests.
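A prebuilt image for that purpose could be as small as the following sketch; the base image and the package names are assumptions, not taken from the draft pipeline.

```dockerfile
# Sketch: CI image with git and hunspell preinstalled, so the pipeline
# no longer downloads the tools on every run.
FROM alpine:3.19
RUN apk add --no-cache git hunspell hunspell-en
```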

As I said, I do not think it is usable as of now. The text output is too simple. I do not think that every user wanting to improve the documentation could handle the output of the pipeline correctly; I guess it would lead to frustration. Simply looking over the documentation from time to time seems to be the better approach.

Collaborator

I just found https://github.com/languagetool-org/languagetool (LGPL-2) which might be worth a look.

Owner

I use LanguageTool as a browser add-on and it's nice, but as mentioned in comment 1, TeXtidote is built on top of it.

Collaborator

Hm, I didn't have that in mind anymore when I added the comment. Thanks for the hint.

However, my editor uses LanguageTool and it provides a lot of good tips and hints.
My experiments yesterday did not lead to these. Strange. It is definitely worth another look.

Reference: Codeberg/Documentation#148