Corpora at NLP Centre
- ske.fi.muni.cz provides access to corpora at NLP Centre
- Corpus Architect: a tool for building your own corpora, available after registration
- corpora developed or under development at NLP Centre:
| Corpus (language) | Size (million tokens) |
| Russian | 20,162 |
| French | 12,369 |
| Japanese | 11,113 |
| American Spanish | 8,719 |
| Arabic | 6,637 |
| Czech | 5,818 |
| Turkish | 4,125 |
| czes (Czech) | 465 |
| Kazakh | 139 |
| Azerbaijani | 115 |
| Tajik | 52 |
| Uzbek | 25 |
| Kyrgyz | 24 |
| Turkmen | 2 |
| DESAM (Czech) | 1 |
- software for corpus processing:
- corpus related projects at NLP Centre:
If necessary, contact corpadm@aurora.fi.muni.cz