EKSTRAKCIJA INFORMACIJA VOĐENA ONTOLOGIJAMA (MODEL ZA SRPSKI JEZIK) [Ontology-Driven Information Extraction (A Model for the Serbian Language)]

dc.contributor.advisor	Vitas, Duško
dc.contributor.author	Vujičić Stanković, Staša
dc.date.accessioned	2017-01-17T16:49:47Z
dc.date.available	2017-01-17T16:49:47Z
dc.date.issued	2016
dc.identifier.uri	http://hdl.handle.net/123456789/4410
dc.description.abstract	The main goal of this doctoral thesis is to investigate the techniques and models applied in information extraction and to provide computational support for processing natural language texts from the culinary and gastronomy domain. Information extraction is a subfield of computational linguistics that uses natural language processing techniques to find relevant information, determine its meaning, and establish relations among the extracted items. Special attention is given to ontology-based information extraction, which consists of recognizing instances of ontology concepts in unstructured or semi-structured natural language texts, reasoning over the identified instances using the rules defined in the ontology, and using the recognized instances to instantiate the appropriate ontology concepts. The main result of the thesis is a new model for ontology-based information extraction. Besides solving information extraction tasks, the model covers both the upgrading of existing lexical resources and ontologies and the creation of new ones. Its application resulted in a system for extracting information in the culinary domain, but the model can be used in other fields as well. In addition, a food ontology was developed, the Serbian WordNet was extended with 1,404 synsets from the culinary domain, and the electronic dictionary of Serbian was enlarged with 1,248 entries. The model is significant because the new and enriched linguistic resources can be reused in other natural language processing systems. The opening chapter of the thesis explains the need for a computational model for processing a large linguistic corpus from the culinary and gastronomy domain through a methodologically precise and sound approach that integrates information about the domain. 
The chapter also presents a formalization of the basic research subject, text in electronic form. It then describes the approximations of natural languages introduced so that modern information technologies can process natural language texts, and it emphasizes the need to characterize the language of a text in terms of its corresponding corpus and sublanguage. The first chapter goes on to define the task of information extraction and the models for computational processing of unstructured or semi-structured texts, with which a computer interprets the meaning that the author (not necessarily a human) intended when writing the text. The chapter also describes the methods used in information extraction, namely rule-based methods and machine learning methods, and lists their advantages and shortcomings, along with the reasons why this thesis uses techniques based on linguistic knowledge. The introductory chapter closes with a discussion of ontologies, WordNet, and the significance of using WordNet as an ontology. The second chapter presents the linguistic resources and tools used in this thesis. It describes the morphological dictionaries and local grammars used for information extraction from texts written in Serbian, followed by a review of information extraction systems. The chapter ends with a description of the stages of processing Serbian texts during information extraction in the software systems Unitex and GATE. The main result of the thesis is presented in the third chapter. 
It is a model that solves the problem of information extraction by integrating linguistic resources and tools. The model comprises the creation of a text corpus, the definition of information extraction tasks, the construction and application of finite-state models for information extraction, the iterative enlargement of electronic morphological dictionaries, the enrichment and enhancement of WordNet, and the creation of new ontologies. Each of these steps is described in detail. Although the model was initially conceived as a solution for processing Serbian, it can equally be applied to texts written in other languages, provided that suitable language resources are developed. The fourth chapter describes the implementation of these steps in a system for information extraction from culinary texts written in Serbian. It then describes how the lexical resources were developed together and complemented one another through the steps of creating the domain corpus, identifying the culinary lexicon, expanding and upgrading WordNet and the electronic morphological dictionaries, and developing domain ontologies: the food ontology, the ontology of approximate measures, and the ontology of ingredients that can substitute for one another in the culinary domain. The information extraction system served as the basis for an advanced search system which, based on a corpus of culinary texts, generates all possible answers to user queries. Within this system, a specific method is implemented for creating links between different recipes. It is used when a user reading a recipe notices that the preparation description contains a part that has already appeared in another recipe, but with an additional or different explanation. 
Another contribution of this thesis is the application of the developed ontologies to the tasks of converting approximate measures into standard measures and establishing similarity among recipes. The similarity of recipes is defined as the similarity of the texts that describe how a dish is prepared according to a given recipe. The last chapter contains the final conclusions and directions for future research.	en_US
dc.description.provenance	Submitted by Slavisha Milisavljevic (slavisha) on 2017-01-17T16:49:47Z No. of bitstreams: 1 teza_Stasa.pdf: 10384201 bytes, checksum: 08f35c4bcd04023aa59c856190dcb54d (MD5)	en
dc.description.provenance	Made available in DSpace on 2017-01-17T16:49:47Z (GMT). No. of bitstreams: 1 teza_Stasa.pdf: 10384201 bytes, checksum: 08f35c4bcd04023aa59c856190dcb54d (MD5) Previous issue date: 2016	en
dc.language.iso	sr	en_US
dc.publisher	Beograd	en_US
dc.title	EKSTRAKCIJA INFORMACIJA VOĐENA ONTOLOGIJAMA (MODEL ZA SRPSKI JEZIK)	en_US
mf.author.birth-date	1982
mf.author.birth-place	Beograd	en_US
mf.author.birth-country	Serbia	en_US
mf.author.residence-state	Serbia	en_US
mf.author.citizenship	Serbian	en_US
mf.author.nationality	Serbian	en_US
mf.subject.area	Computer science	en_US
mf.subject.keywords	Information Extraction, Natural Language Processing, Ontologies, Culinary Domain, WordNet	en_US
mf.subject.subarea	Text Processing	en_US
mf.contributor.committee	Pavlović-Lažetić, Gordana
mf.contributor.committee	Pajić, Vesna
mf.contributor.committee	Milutinović, Veljko
mf.university.faculty	Faculty of Mathematics	en_US
mf.document.references	103	en_US
mf.document.pages	213	en_US
mf.document.location	Beograd	en_US
mf.document.genealogy-project	No	en_US
mf.university	University of Belgrade	en_US

Files in this item

File	Size	Format
teza_Stasa.pdf	10.38Mb	PDF

This item appears in the following collection(s):

Computer Science
