Corpus Somali WaC [2016] – statistics and info

Somali web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, deduplicated.

Counts
Tokens: 79,741,231
Words: 71,871,585
Sentences: 2,643,336
Paragraphs: 1,937,758
Documents: 385,338

General info
Corpus description: Document
Language: Somali
Encoding: UTF-8
Compiled: 06/02/2017 13:53:07
Word sketch grammar: Definition

Lexicon sizes
word: 1,399,350
tag: 13
lc: 1,159,063

Structures and attributes