Word sketch
The thesaurus can only work if word sketches exist in the corpus. The corpus has to be tagged in Sketch Engine or use the same tagset as Sketch Engine. If the corpus is tagged with a different tagset, a custom word sketch grammar has to be used.
The thesaurus will also work with universal sketch grammars, subject to all their limitations. See word sketch.
Tags and lemmas
A %[tag|tagged]% and %[lemma|lemmatized]% corpus is required for a full-fledged thesaurus. Thesauri generated from untagged and non-lemmatized corpora with universal word sketches will be of lower quality. They can still be very useful, especially for less-resourced languages where tagging and lemmatization are not realistic.
Corpus size
The quality of the thesaurus depends entirely on rich word sketches. A rich word sketch is one with a large number of collocations in all grammatical relations. A rich word sketch must exist not only for the search word but also for all other words with the same part of speech, so that they can be compared. This requirement can only be met if a word has a high %[frequency]% in the corpus, ideally thousands of occurrences or more. Consequently, a very large corpus is needed so that even less frequent words can produce rich word sketches. The use of our multi-billion-word corpora is recommended for any serious thesaurus work.
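To illustrate why sparse word sketches hurt thesaurus quality, here is a minimal toy sketch in Python. It is not Sketch Engine's actual scoring: it assumes that a word sketch can be reduced to a set of (grammatical relation, collocate) pairs and uses a plain Jaccard overlap as a stand-in similarity measure. The names `Sketch`, `similarity` and `thesaurus`, as well as the sample data, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a word sketch is simplified to a set of
# (grammatical relation, collocate) pairs observed with a word.
Sketch = Set[Tuple[str, str]]


def similarity(a: Sketch, b: Sketch) -> float:
    """Jaccard overlap of two sketches (a stand-in, not Sketch Engine's measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def thesaurus(word: str, sketches: Dict[str, Sketch], top: int = 5) -> List[Tuple[str, float]]:
    """Rank every other word by how much its sketch overlaps with the search word's."""
    target = sketches[word]
    scored = [
        (other, similarity(target, sketch))
        for other, sketch in sketches.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Invented sample data: three reasonably frequent nouns and one rare noun.
sketches: Dict[str, Sketch] = {
    "tea": {("object_of", "drink"), ("object_of", "pour"),
            ("modifier", "hot"), ("modifier", "green")},
    "coffee": {("object_of", "drink"), ("object_of", "pour"),
               ("modifier", "hot"), ("modifier", "black")},
    "car": {("object_of", "drive"), ("object_of", "park"),
            ("modifier", "fast")},
    # Rare word: only one collocation was observed, so its sketch is poor.
    "cocoa": {("modifier", "hot")},
}

for other, score in thesaurus("tea", sketches):
    print(f"{other}\t{score:.2f}")
```

In this toy data, "coffee" ranks well above "cocoa" as a neighbour of "tea", although both are hot drinks. The single observed collocation of "cocoa" gives the comparison almost nothing to work with, which is the effect a larger corpus is meant to avoid.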