
Nauchnyi dialog


Corpus Analysis of Artificial Intelligence Terminology in Russian: Insights from Almanac “Artificial Intelligence” Using AntConc

https://doi.org/10.24224/2227-1295-2025-14-7-133-160

Abstract

This study lies at the intersection of corpus linguistics and terminology studies. It traces the evolution of corpus linguistics from early text collections to the large national and specialized corpora of the 21st century, and emphasizes the role of contemporary technologies such as machine learning and natural language processing in opening new avenues for analyzing large data sets. The article addresses methodological aspects of studying terminological units in the field of artificial intelligence (AI) on the basis of modern analytical compilations. The aim is to identify patterns in the formation of compound designations, as well as the orthographic and stylistic norms governing the use of AI terms in Russian. To this end, frequency analysis and content analysis were carried out in AntConc, yielding 100 core terms together with the collocations built from them. The findings indicate that AI terminology in Russian is actively evolving, with Anglicisms and hybrid forms predominating. The stylistic features of the texts, which reflect their technical context and target audience, are also discussed. The article concludes that norms for the use of AI terms need to be established as these terms become integrated into the Russian language.
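The two corpus operations named in the abstract, a frequency list and collocate extraction around a node word, can be illustrated with a minimal Python sketch. This is not the authors' procedure and not AntConc's implementation: the tokenizer, the function names (`tokenize`, `frequency_list`, `collocates`), the window size, and the two-sentence sample text are all assumptions made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens (Cyrillic or Latin letters,
    optionally joined by internal hyphens). A simplifying assumption; real
    corpus tools expose configurable token definitions."""
    return re.findall(r"[a-zа-яё]+(?:-[a-zа-яё]+)*", text.lower())


def frequency_list(tokens, top_n=100):
    """Return the top_n most frequent tokens as (token, count) pairs."""
    return Counter(tokens).most_common(top_n)


def collocates(tokens, node, span=2):
    """Count tokens occurring within `span` positions to the left or right
    of every occurrence of `node` (raw co-occurrence counts, no
    association statistic)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts


# Hypothetical sample text, invented for this sketch (not taken from the almanac).
sample = (
    "Нейронная сеть обучается на данных. "
    "Глубокая нейронная сеть требует больших данных."
)
tokens = tokenize(sample)

print(frequency_list(tokens, top_n=3))
print(collocates(tokens, "сеть", span=1).most_common())
```

In practice one would likely lemmatize Russian tokens before counting, since inflected forms of the same term are otherwise counted separately, and rank collocates with an association measure instead of raw counts.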

About the Authors

O. V. Shadrina
Moscow Institute of Physics and Technology (National Research University)
Russian Federation

Olesya V. Shadrina, Senior Lecturer, Department of Foreign Languages

Moscow



O. V. Marunevich
Moscow Institute of Physics and Technology (National Research University)
Russian Federation

Oksana V. Marunevich, PhD in Philology, Associate Professor

Moscow





For citations:


Shadrina O.V., Marunevich O.V. Corpus Analysis of Artificial Intelligence Terminology in Russian: Insights from Almanac “Artificial Intelligence” Using AntConc. Nauchnyi dialog. 2025;14(7):133-160. (In Russ.) https://doi.org/10.24224/2227-1295-2025-14-7-133-160



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2225-756X (Print)
ISSN 2227-1295 (Online)