Corpus Czech Web (czTenTen16) [2016, 2015] – statistics and info

Czech web corpus crawled by SpiderLing in November and December 2015 and from October to December 2016. The text is encoded in UTF-8, cleaned, and deduplicated.

Counts
Tokens: 9,307,649,368
Words: 7,795,495,171
Sentences: 591,133,106
Paragraphs: 185,928,405
Documents: 32,739,566

General info
Corpus description: Document
Language: Czech
Encoding: UTF-8
Compiled: 06/01/2017 12:31:26
Tagset: Description
Word sketch grammar: Definition

Lexicon sizes
word: 32,359,252
tag: 4,979
lemma: 31,265,018
lc: 27,874,979
lemma_lc: 27,268,907

Tags legend (tagset)
noun: k1.*
adjective: k2.*
pronoun: k3.*
numeral: k4.*
verb: k5.*
adverb: k6.*
preposition: k7.*
conjunction: k8.*
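
The legend patterns can be read as regular expressions over the tag attribute, in the way a CQL query such as [tag="k1.*"] would use them. The sketch below is a minimal illustration in Python, assuming each pattern must match the whole tag. The function name part_of_speech and the sample tag strings are hypothetical and are not taken from the corpus.

```python
import re

# Legend copied from the listing above: part of speech -> pattern over the tag attribute.
TAG_LEGEND = {
    "noun": "k1.*",
    "adjective": "k2.*",
    "pronoun": "k3.*",
    "numeral": "k4.*",
    "verb": "k5.*",
    "adverb": "k6.*",
    "preposition": "k7.*",
    "conjunction": "k8.*",
}


def part_of_speech(tag):
    """Return the legend label whose pattern matches the whole tag, or None."""
    for label, pattern in TAG_LEGEND.items():
        if re.fullmatch(pattern, tag):
            return label
    return None


# Sample tag strings are assumptions made for illustration only.
for sample in ["k1gMnSc1", "k5eAaImIp3nS", "kIx."]:
    print(sample, "->", part_of_speech(sample))
```

A tag that falls outside the eight listed classes returns None here, which keeps the lookup from silently assigning a wrong label.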

Structures and attributes
