Problems of constructing text corpora of the minority peoples of Russia: the case of the Open corpus of Veps and Karelian languages

Krizhanovsky, Andrew A.
Russian Foundation for Basic Research, № 18-012-00117

Construction and development of the Open corpus of Veps and Karelian languages.

Development of the Vepsian corpus will continue. A corpus of the dialects of the Karelian language (Karelian Proper, Livvi, and Ludic) and electronic dictionaries of these dialects will be created. Extralinguistic data (information about the author, text, informant, place of recording, etc.) will be added to the texts in the corpus.

The results of the project (a corpus and dictionaries of the Vepsian and Karelian languages) will meet international standards, and no comparable electronic linguistic resources currently exist anywhere in the world. The availability of this corpus will help to preserve and popularize the Vepsian and Karelian languages, which are under threat of extinction. The corpus will also serve as a basis for studying the interaction of the Russian language and its dialects with the Baltic-Finnic languages of the peoples of Karelia.

The word "open" in the corpus name (Open corpus of Veps and Karelian languages) reflects an important feature of this project, which is the openness and accessibility of the results, namely:

1. The source code of the developed software will be distributed under an open license (free software).

2. The data of the corpus and dictionaries will be publicly available under an open license.

3. Users can search the dictionaries in the "Dictionary" section of the website and the texts of the corpus in the "Corpus" section; editors can edit existing entries and add new ones to the dictionaries and the corpus.

4. The results of the scientific research will be made publicly available in the form of publications.
Last modified: June 19, 2018