Cspell: sanitize suggested words for dictionary
Problem/Motivation
Currently the Cspell job generates an artifact containing a txt file with the words to add to a project's ".cspell-project-words.txt" dictionary.
But in this txt file, the words are listed exactly as they appear in the Cspell output:
- unsorted
- in whatever case they have in the code: capitalized, lowercase, uppercase, or anything in between
- duplicated
Proposed resolution
Add a step that sanitizes this output.
In my project skeleton, I have a small script that does this sanitization: remove duplicates, lowercase, sort.
cat "${DICTIONARY_FILE_PATH}" | tr '[:upper:]' '[:lower:]' | LC_ALL=C sort -u -o "${DICTIONARY_FILE_PATH}"
Ideally, if a project already has a ".cspell-project-words.txt" file, the job would also provide a file that merges the existing words with the new ones for this project dictionary.
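A minimal sketch of what that merge could look like, not the job's actual implementation. The variable names `PROJECT_DICTIONARY`, `SUGGESTED_WORDS_FILE` and `MERGED_WORDS_FILE`, their default paths, and the `merge_words` function are all hypothetical and only serve the illustration. It assumes one word per line and that lowercasing the existing dictionary is acceptable.

```shell
#!/bin/sh
# Hypothetical names and paths, for illustration only.
PROJECT_DICTIONARY="${PROJECT_DICTIONARY:-.cspell-project-words.txt}"
SUGGESTED_WORDS_FILE="${SUGGESTED_WORDS_FILE:-suggested-words.txt}"
MERGED_WORDS_FILE="${MERGED_WORDS_FILE:-merged-words.txt}"

merge_words() {
  # Include the existing project dictionary only when the project has one.
  if [ -f "$PROJECT_DICTIONARY" ]; then
    set -- "$PROJECT_DICTIONARY" "$SUGGESTED_WORDS_FILE"
  else
    set -- "$SUGGESTED_WORDS_FILE"
  fi
  # awk 'NF' drops blank lines and ends every line with a newline, so a
  # file without a trailing newline does not glue two words together.
  # Then apply the same sanitization: lowercase, sort, de-duplicate.
  awk 'NF' "$@" \
    | tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sort -u > "$MERGED_WORDS_FILE"
}
```

Calling `merge_words` after the Cspell step would leave the merged list in `MERGED_WORDS_FILE`, which could then be exposed as an additional artifact.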
Remaining tasks
- Discuss whether maintainers want this addition: YES
- Provide an MR
- Goals:
-- A sorted, lowercase, unique list of words provided as an artifact.
-- Merge the newly reported words with any existing list of words (via a file and/or a variable?)
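Whether existing words should come from a variable is still an open question. If they did, one possible approach is sketched below. The `words_from_variable` function and the idea of a comma- or whitespace-separated value are assumptions made for illustration; nothing like this exists in the job today.

```shell
#!/bin/sh
# Hypothetical helper: turn a comma- or whitespace-separated string of
# words into a sanitized list (one word per line, lowercase, sorted, unique).
words_from_variable() {
  printf '%s\n' "$1" \
    | tr ', \t' '\n\n\n' \
    | awk 'NF' \
    | tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sort -u
}
```

The output could be appended to the list of reported words before the final sort, so that words from a file and from a variable end up in the same sanitized artifact.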