Amharic WaC [2013 + 2015 + 2016]
Amharic web corpus, crawled by SpiderLing in August 2013, October 2015, and January 2016. Encoded in UTF-8, cleaned, and deduplicated. Tagged by TreeTagger trained on the Amharic WIC corpus.
Counts |
Tokens | 20287250 |
Words | 17320000 |
Sentences | 1208926 |
Paragraphs | 341327 |
Documents | 33542 |
General info |
Corpus description |
Document |
Language | Amharic |
Encoding | UTF-8 |
Compiled | 05/05/2017 20:44:44 |
Tagset |
Description |
Word sketch grammar |
Definition |
Lexicon sizes |
word | |
tag | |
sera | |