
Defense of the dissertation of Kalman Gulzhamal for the degree of Doctor of Philosophy (PhD) in the educational program «8D06103 - Information systems»
L.N. Gumilyov Eurasian National University will host the defense of the dissertation of Kalman Gulzhamal for the degree of Doctor of Philosophy (PhD) on the topic «Development of models and methods for resolving references in natural language texts based on domain knowledge» in the educational program «8D06103 - Information systems».
The dissertation was carried out at the Department of Information Systems of L.N. Gumilyov Eurasian National University.
The language of the defense is Kazakh.
Official reviewers:
Mansurova Madina - Candidate of Physical and Mathematical Sciences, Associate Professor, Head of the Department of Artificial Intelligence and Big Data at Al-Farabi Kazakh National University (Almaty, Kazakhstan).
Oralbekova Dina - Doctor of Philosophy (PhD), Researcher at the Institute of Information and Communication Technologies (Almaty, Kazakhstan).
Temporary members of the Dissertation Committee:
Barakhnin Vladimir - Doctor of Technical Sciences, Professor, Leading Researcher at the Federal Research Center for Information and Computing Technologies (FITZ IVT) (Novosibirsk, Russia).
Kaibassova Dinara - Doctor of Philosophy (PhD), Acting Associate Professor of the Department of Computer Engineering of Astana IT University (Astana, Kazakhstan).
Tukeyev Ualsher - Doctor of Technical Sciences, Professor at the Department of Information Systems, Faculty of Mechanics and Mathematics, Al-Farabi Kazakh National University (Almaty, Kazakhstan).
Academic Advisors:
Sambetbaeva Madina - Doctor of Philosophy (PhD), Associate Professor, Acting Professor of the Department of Information Systems of the L.N. Gumilyov Eurasian National University (Astana, Kazakhstan).
Sidorova Elena - Candidate of Physical and Mathematical Sciences, Senior Researcher at the A.P. Ershov Institute of Computer Science SB RAS (Novosibirsk, Russia).
The defense will take place on April 24, 2024, at 11:00 AM at the Dissertation Council for the field of training «8D061 - Information and communication technologies», specialty «8D06103 - Information systems», of L.N. Gumilyov Eurasian National University. The defense meeting is planned to be held online.
Link: https://clck.ru/39TZvf
Address: Astana, Satpayev Str., 2, educational and administrative building, room № 302.
Abstract (English):
Abstract of the dissertation by Kalman Gulzhamal «Development of models and methods for resolving references in natural language texts based on domain knowledge», submitted for the degree of Doctor of Philosophy (PhD) in the specialty «8D06103 - Information systems».
Relevance of the research topic. The relevance of developing models and methods for resolving references in natural language texts based on domain knowledge is due to the following factors.
The growth of the volume of text data: With the advent of the Internet and digital technologies, the amount of text data available for processing has increased significantly. This creates the need for new reference resolution methods and models that can process large volumes of text and extract useful information from them.
The need for precise reference resolution: Reference resolution is important in fields such as automatic text analysis, information retrieval, and machine translation. Missing or erroneous reference resolution can lead to misunderstanding of texts and to incorrect conclusions. Another factor contributing to the relevance of this topic is the lack of research on text analysis and the identification of referential relations in the Kazakh language.
The complexity of the reference resolution task: Reference resolution is a difficult task in natural language processing, especially when a text contains ambiguous expressions or uses the same expression repeatedly. Developing effective reference resolution methods and models based on domain knowledge is a pressing task that can lead to more accurate and efficient text analysis.
The need to take the subject area into account: To resolve references effectively, it is necessary to take into account knowledge about the subject area in which the expressions and the objects they refer to are used.
This requires the development of models and methods that can use domain knowledge to resolve references.
The purpose of the dissertation research and scientific results. The purpose is the development of new models and methods for resolving references in natural language texts based on domain knowledge. To achieve this goal, the following tasks were set:
1. A study of existing methods and models of reference resolution in natural language texts based on domain knowledge.
2. A study of the theoretical foundations and practical aspects of domain knowledge and its application in natural language processing tasks.
3. Development of new methods and models of reference resolution based on knowledge about the language and the subject area.
4. Development of a prototype architecture for a reference resolution system based on knowledge of the language and the subject area.
5. Development of a methodology for preparing data and datasets in three languages for an experimental comparative study.
6. Experimental studies and evaluation of the quality of the developed models and methods on a set of test data.
The following are submitted for defense: the reference model for the Kazakh language, the reference resolution method, the data preparation method, and the software implementing the reference resolution method.
Scientific novelty: For the first time, the referential structure of a popular science text in the Kazakh language has been studied comprehensively, from the general pragmatic settings determined by the subject area and genre of the text to the referential interpretation of individual linguistic means.
1. A reference model for the Kazakh language has been developed, describing coreferential and anaphoric relations in the Kazakh language.
2. A method for resolving references in texts in Kazakh, Russian, and English based on morphological, syntactic, and subject knowledge is proposed.
The object of the study is the ways of expressing reference in natural language texts, using the Kazakh, Russian, and English languages as examples. The subject of the research is methods and algorithms for resolving references in texts in the Kazakh language.
Methodology and methods of research. The dissertation used the methods and methodology of ontological engineering, methods of corpus research of texts, methods of lexicographic research, and information technology methods used in natural language text processing, namely:
- descriptive and analytical method - for the selection and presentation of the provisions of scientific literature related to the object and subject of the research;
- semantic classification of vocabulary - to determine the structure of the "reference world" of a popular science article;
- referential analysis of vocabulary denoting persons and groups/sets of persons - to determine the types of correlation between the objects of a popular science article and the participants in a pragmatic situation;
- semantic and syntactic analysis - to determine the referential properties of fragments of articles;
- contextual analysis - to determine the scope and content of the referents of linguistic expressions denoting persons and their sets;
- computer modeling - for modeling the reference structure and the processes of resolving referential conflicts;
- quantitative analysis of the distribution of samples - to calculate and obtain the quantitative conclusions of the study.
Personal contribution of the author. The contribution of the author in obtaining the results of the research work is significant. Based on the materials of the dissertation, articles were published in journals indexed in the Scopus and Web of Science databases and in journals recommended by the Committee for Quality Assurance in Science and Higher Education (KOKSNVO) of the Ministry of Science and Higher Education of the Republic of Kazakhstan.
All the methods and programs proposed in the dissertation were developed and implemented by the author personally or with the author's direct participation.
The practical significance of the study.
Information retrieval: The developed models and methods can be used to improve the quality of information retrieval in large volumes of text data. They help determine more accurately which objects in the texts relate to the search queries and reduce the number of incorrect results.
Automatic text analysis: The developed models and methods can be used to improve the quality of automatic text analysis, for example, in the field of medicine, where the accuracy and effectiveness of text analysis are important for the diagnosis and treatment of diseases.
Machine translation: The developed models and methods can be used to improve the quality of machine translation, especially in cases where texts contain expressions that may have different meanings depending on the subject area.
Approbation of the results of the dissertation. The main results of the dissertation were reported and discussed at foreign, international, and republican scientific and practical conferences and were published as follows.
2 publications in international journals with a non-zero impact factor included in the Scopus database:
1. Yerzhan Zhumabay, Gulzhamal Kalman, Madina Sambetbayeva, Aigerim Yerimbetova, Assem Ayapbergenova, Almagul Bizhanova. Building a model for resolving referential relations in a multilingual system. Eastern-European Journal of Enterprise Technologies, Vol. 2 No. 2 (116), 2022, pp. 27-35. ISSN: 1729-3774. https://doi.org/10.15587/1729-4061.2022.255786
2. Gulzhamal Kalman, Yerzhan Zhumabay, Elmira Nurgalieva, Assel Kuanysheva, Musatay Esmaganbet. Algorithm for solving pronominal anaphora in the Kazakh language. Journal of Theoretical and Applied Information Technology, 31 March 2023, Vol. 101, No. 6. ISSN: 1992-8645, E-ISSN: 1817-3195.
2 articles in peer-reviewed publications recommended by the National Scientific and Scientific Council of the Republic of Kazakhstan:
1. Kalman G., Esmaganbet M.G., Zhamankarin M.M., Gabdulina A.I., Pleskachev D.V. Resolving coreference using the clustering method (in Kazakh). // News of the National Academy of Sciences of the Republic of Kazakhstan. Physico-Mathematical Series. Almaty. 2023. (1), pp. 121-135.
2. Kalman G., Sambetbaeva M.A., Aktaeva D., Ilyubaev A. An anaphora resolution model based on machine learning methods (in Kazakh). // News of the National Academy of Sciences of the Republic of Kazakhstan. Physico-Mathematical Series. Almaty. 2022. (4), pp. 56-67.
5 papers in the proceedings of international conferences abroad and in the Republic of Kazakhstan:
1. Gulzhamal Kalman, Madina Sambetbayeva, Yerzhan Zhumabay. Algorithm for solving pronominal anaphora in the Kazakh language // 7th International Conference on Digital Technologies in Education, Science and Industry (DTESI), Almaty, Kazakhstan, October 20-21, 2022, pp. 133-140.
2. Kalman G., Dauletbek A. Creating a genre model of a scientific publication. International Scientific-Practical Conference «Advances in Science and Technology», Research and Publishing Center «Actualnots.RF», Moscow, Russia, January 31, 2023, p. 107.
3. G. Kalman. A multi-agent system for natural language. Collection of materials of the international scientific conference "Modern science: new approaches and relevant researches", December 24-25, 2020.
4. E.S. Zhumabai, G. Kalman, M.A. Sambetbaeva. Development of a model for resolving referential relations in a multilingual system. "Satbaev studies-2022. Trends of modern scientific research": proceedings of the international scientific and practical conference, April 12, 2022. Volume II.
5. G. Kalman, E.S. Zhumabay, M.A. Sambetbaeva. Algorithm for solving referential relation. "Satbaev readings-2022. Trends of modern scientific research": proceedings of the international scientific and practical conference, April 12, 2022. Volume II.
Volume and structure of the dissertation.
The dissertation is written in Kazakh and consists of an introduction, three interrelated sections divided into subsections, a conclusion, and a list of references. The work comprises 90 pages, 20 figures, and 8 tables. The list of references contains 122 items.
The introduction presents the scientific apparatus of the research: it justifies the relevance of the topic and the degree of its development in theory and practice; defines the purpose, object, subject, and tasks of the research; sets out the scientific novelty and the theoretical and practical significance of the work; defines the research methods; and states the provisions submitted for defense, the personal contribution of the author, the list of publications, and the approbation of the results.
The first part presents a theoretical study aimed at developing new methods for resolving referential relations. The concept of reference and its types are examined, and the types of pronominal anaphora in English, Russian, and Kazakh are compared using methods of comparative analysis. Current models and methods of reference resolution are reviewed, and the concept of reference in the Kazakh language, its context, and its prospects are discussed.
In the second part, taking into account the peculiarities of referential relations in the Kazakh language, a model of reference resolution for the Kazakh language is developed, and a general description of the reference resolution method based on morphological, syntactic, and subject knowledge is presented.
The third part reports on the implementation, in the Visual C++ software environment, of the domain-knowledge-based model of reference resolution in the Kazakh language and of the reference resolution method based on morphological, syntactic, and subject knowledge. It also describes the architecture of the prototype reference resolution system and its components, the methodology for preparing data and datasets, the experimental studies, the evaluation of the effectiveness of the developed model and method, and the quantitative results of the work.
In the conclusion, the research results are summarized, and the provisions submitted for defense are shown to support the main conclusions. Practical research materials are presented in the appendix.
