ANALYSIS OF DIGITIZATION PROCESS OF OLD ROMANIAN CYRILLIC TEXTS
Abstract
The paper discusses the recognition of Romanian texts of the 17th–20th centuries printed in the Cyrillic script and their conversion to the modern Latin script. The technology and tool pack developed for this purpose include historical alphabets, sets of recognition patterns, and spelling dictionaries in the corresponding orthographies for ABBYY FineReader. Virtual keyboards, fonts, transliteration utilities, and a user manual were also developed. Together, these resources permit successful recognition of old Romanian texts printed in the Cyrillic script, and transliteration to the Latin script gives modern readers barrier-free access to the historical documents.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.