MULTIMEDIJALNE BAZE PODATAKA U UPRAVLJANJU NEMATERIJALNIM KULTURNIM NASLEDEM

Title:	MULTIMEDIJALNE BAZE PODATAKA U UPRAVLJANJU NEMATERIJALNIM KULTURNIM NASLEDEM (Multimedia Databases in the Management of Intangible Cultural Heritage)
Author:	Tanasijević, Ivana
Abstract:	The motivation for this doctoral dissertation is a multimedia collection that is the result of many years of field research conducted by researchers from the Institute for Balkan Studies of the Serbian Academy of Sciences and Arts. The collection consists of recorded interviews, various recorded customs, associated textual descriptions (protocols) and numerous other documents.
The subject of this dissertation is the study of possibilities and the development of new methods that could serve as a starting point in solving the problem of managing the intangible cultural heritage of the Balkans. The subtasks that emerge in this endeavor are the design and implementation of a multimedia database of intangible cultural heritage that would meet the needs of different types of users, automatic semantic annotation of protocols using natural language processing methods, as a basis for semi-automatic annotation of the multimedia collection, and successful search by metadata that comply with the CIDOC CRM standard, the study of additional search possibilities of this collection in order to gain new knowledge, as well as the development of selected methods.
The main problem with the available methods is that the infrastructure for natural language processing, organization and management in the field of cultural heritage in the Balkans, and especially for the Serbian language, is still not developed enough to be used effectively to solve the proposed problem. There is thus a strong need to develop methods to reach an appropriate solution.
For the semi-automatic annotation of multimedia materials, automatic semantic annotation of the protocols associated with the materials was used.
It was carried out by methods of information extraction, recognition of named entities and topic extraction, using rule-based techniques with the help of additional resources such as electronic dictionaries, thesauri and vocabularies from a specific domain.
To classify textual protocols by topic, research was conducted on methods that can be used to classify texts in the Serbian language, and a method was proposed that is adapted to the specific domain being processed (intangible cultural heritage), to the specific problem being solved (classification of protocols by topic) and to Serbian as a morphologically rich language.
To work with spatial data, a spatial model has been developed that is suitable for displaying results on a map, as well as for creating spatial queries through an interactive graphical display of a map of locations.
The results of experiments conducted on the developed methods show that a rule-based approach, combined with additional language resources and a reasonable amount of effort, gives very good results for the task of information extraction.
An F measure of 0.87 was reached for the extraction of named entities, while an F measure of 0.90 was reached for the extraction of topics, which is in the range of measures from published research on similar problems and domains.
The results of the text classification indicate that the selected statistical machine learning methods in their basic form, although generally successful, give a poor F measure of 0.44 when applied to the protocols, while a significant improvement is achieved with the use of semantic techniques, in which case an F measure of 0.88 is reached.
Some of the results presented in this dissertation are contained in the papers [266], [265], [94], [264], [267], which have been published or accepted for publication.
The conclusion drawn from the research is that solving the given problem requires engaging experts from several fields; that the needs of different groups of users are complex, which complicates the task of organizing and managing the multimedia collection; that the domain of cultural heritage is very rich in semantics; that context plays a major role in the tasks of information extraction and text classification; and finally that, for these tasks, the developed rule-based methods of natural language processing as well as statistical techniques of machine learning prove to be successful.
URI:	http://hdl.handle.net/123456789/5093
Date:	2020
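
Note on the reported F measures: the abstract does not give the evaluation procedure, so the following is only a minimal sketch of how a dictionary-driven, rule-based extractor might be scored against manually annotated entities. The gazetteer, the sample sentence, the gold annotations and the function names `extract_entities` and `f_measure` are all hypothetical and are not taken from the dissertation.

```python
import re


def extract_entities(text, gazetteer):
    """Toy rule-based extractor: whole-word, case-insensitive dictionary lookup.

    `gazetteer` maps a surface form to an entity type. This stands in for the
    electronic dictionaries and thesauri mentioned in the abstract; it is an
    illustrative assumption, not the author's method.
    """
    found = set()
    for term, label in gazetteer.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add((term, label))
    return found


def f_measure(predicted, gold, beta=1.0):
    """F measure (F1 when beta == 1) over sets of predicted and gold items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical data, for illustration only.
gazetteer = {"Belgrade": "PLACE", "Slava": "CUSTOM", "Timok": "PLACE"}
text = "The interview about Slava was recorded near Belgrade."
gold = {("Slava", "CUSTOM"), ("Belgrade", "PLACE"), ("Vlach", "ETHNONYM")}

predicted = extract_entities(text, gazetteer)
score = f_measure(predicted, gold)
print(f"F measure: {score:.2f}")
```

In this toy case both extracted entities are correct (precision 1.0) but one gold entity is missed (recall 2/3), so the harmonic mean is 0.80. A lookup this simple ignores inflection, which matters for a morphologically rich language such as Serbian.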