Estonian gap tests

Alexander Tkachenko
The Estonian gap tests corpus is a collection of sentences in which one word is marked as a "gap" and accompanied by a list of candidate words. The corpus can be used as a benchmark for evaluating language models. It covers both frequent and infrequent gap words and includes candidate lists generated in different ways. The sentences originate from the Estonian Reference Corpus. The corpus has been tokenized using the Estnltk toolkit. An archive contains sentence files...
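As an illustration of the benchmark use described above, the following is a minimal sketch of how a language model could be scored on gap tests: each candidate is substituted into the gap, the model scores the resulting sentence, and the highest-scoring candidate is compared with the original word. Everything here is an assumption made for illustration: the `GapTest` structure, the `fill_gap` and `accuracy` helpers, and the toy scorer are hypothetical, and the example sentence is made up rather than taken from the corpus. The actual file format of the archive is not described in this record, so loading is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class GapTest:
    # Hypothetical in-memory form; the archive's real file layout is not assumed here.
    tokens: List[str]      # tokenized sentence, with the original word still in place
    gap_index: int         # position of the gap word within tokens
    candidates: List[str]  # candidate words, expected to include the original word

    @property
    def answer(self) -> str:
        return self.tokens[self.gap_index]


# A scorer maps a token sequence to a number; higher means more plausible.
Scorer = Callable[[Sequence[str]], float]


def fill_gap(test: GapTest, scorer: Scorer) -> str:
    """Return the candidate whose substitution yields the highest sentence score."""
    def score(candidate: str) -> float:
        filled = (test.tokens[:test.gap_index]
                  + [candidate]
                  + test.tokens[test.gap_index + 1:])
        return scorer(filled)
    # On ties, max() keeps the first candidate in list order.
    return max(test.candidates, key=score)


def accuracy(tests: Sequence[GapTest], scorer: Scorer) -> float:
    """Fraction of gap tests where the top-scoring candidate is the original word."""
    if not tests:
        return 0.0
    correct = sum(fill_gap(t, scorer) == t.answer for t in tests)
    return correct / len(tests)


# Toy stand-in for a language model: counts adjacent token pairs it has "seen".
KNOWN_BIGRAMS = {("lähen", "kooli")}


def toy_scorer(tokens: Sequence[str]) -> float:
    return float(sum((a, b) in KNOWN_BIGRAMS for a, b in zip(tokens, tokens[1:])))


# Made-up example, not drawn from the corpus.
demo_tests = [GapTest(["Ma", "lähen", "kooli", "."], 2, ["koer", "kooli", "sinine"])]
print(accuracy(demo_tests, toy_scorer))
```

In practice `toy_scorer` would be replaced by the log-probability a real language model assigns to the filled sentence; the rest of the sketch does not depend on how that score is computed.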