Cspell: sanitize suggested words for dictionary

Migrated issue

Reported by: grimreaper

Related to !303 (merged)

Problem/Motivation

Currently, the Cspell job generates an artifact containing a txt file that lists the words to add to a project's dictionary, ".cspell-project-words.txt".

However, the word list in this txt file is taken straight from the Cspell output, so it is:
- unsorted
- cased as found in the code: capitalized, lowercase, uppercase, or anything in between
- duplicated

Proposed resolution

Add a step that sanitizes this output.

In my project skeleton, I have a small script that does this sanitization: remove duplicates, lowercase, sort

https://gitlab.com/florenttorregrosa-drupal/docker-drupal-project/-/blob/10.x/scripts/quality/spellcheck/clean-dictionaries.sh?ref_type=heads

cat "${DICTIONARY_FILE_PATH}" | tr '[:upper:]' '[:lower:]' | LC_ALL=C sort -u -o "${DICTIONARY_FILE_PATH}"

Ideally, if a project already has a ".cspell-project-words.txt" file, the job would also provide a file that merges the existing words with the new ones for this project dictionary.
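
A minimal sketch of what such a merge could look like, assuming the job has both the existing dictionary and the newly suggested words available as plain text files with one word per line. The function name merge_dictionaries and its three file arguments are hypothetical and not part of the current templates. Dropping blank lines is also an assumption, not something the script above does.

```shell
#!/bin/sh
# Hypothetical helper: merge an existing project dictionary with newly
# suggested words, then lowercase, sort and deduplicate the result.
merge_dictionaries() {
  existing="$1"   # e.g. .cspell-project-words.txt (may not exist yet)
  suggested="$2"  # words reported by the Cspell job (assumed file)
  merged="$3"     # output file, may be the same path as "$existing"

  # Treat a missing project dictionary as an empty one.
  [ -f "$existing" ] || existing=/dev/null

  # Write to a temporary file first so that the output path can safely
  # be one of the input paths.
  cat "$existing" "$suggested" \
    | tr '[:upper:]' '[:lower:]' \
    | grep -v '^[[:space:]]*$' \
    | LC_ALL=C sort -u > "$merged.tmp" \
    && mv "$merged.tmp" "$merged"
}
```

It could be called as `merge_dictionaries .cspell-project-words.txt suggested-words.txt merged-words.txt`, where the last two file names are placeholders. Writing to a separate merged file keeps the committed dictionary untouched, so the project can review the artifact before copying it over.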

Remaining tasks

- Discuss whether maintainers want this addition: YES
- Provide MR
- Goals:
-- Provide a sorted, lowercase, unique list of words as an artifact.
-- Merge the newly reported words with a possible list of existing words (via file and/or variable?)
