HaBiT System
Corpus search
Example searches:
Amharic WIC:
sentences with "እንዲሁም"
Amharic WaC (2013, 2015, 2016):
wordlist
Norwegian Web (2015, Bokmål):
collocations of "bruke"
Oromo spoken (Text laboratory, Oslo):
sentences with "tokko"
Oromo WaC (2016):
the most frequent bigrams
Somali WaC (2016):
wordlist
Tigrinya WaC (2016):
sentences with "እግዚአብሔር"
Czech-Norwegian Parallel Corpus:
parallel search for "Praha"
Ethiopian web corpora for download
WebBootCaT, a search engine querying and corpus building tool.
SpiderLing, software for obtaining texts from the web.
Chared, a tool for detecting the character encoding of a text in a known language.
Onion, a tool for removing duplicate parts from large collections of texts.
Unitok, a universal text tokeniser with specific settings for many languages.
jusText Habit module
Geez → SERA
Tagger modules:
Amharic
Oromo
Somali
Tigrinya