# Browsing Computer Science by Title

Now showing items 1-20 of 28

Vujošević Janičić, Milena (Beograd, 2013)
Abstract: LAV is a system for statically verifying program assertions and locating bugs such as buffer overflows, pointer errors, and division by zero. LAV is primarily aimed at analyzing programs written in the programming language C. Since LAV uses the popular LLVM intermediate code representation, it can also analyze programs written in other procedural languages, and the proposed approach can be used with any similar low-level intermediate code representation. The system combines symbolic execution, SAT encoding of the program's control flow, and elements of bounded model checking. LAV represents the program's meaning using first-order logic (FOL) formulas and generates the final verification conditions as FOL formulas. Each block of code (a block has no internal branching and no loops) is represented by a FOL formula obtained through symbolic execution. Symbolic execution, however, is not performed across blocks; instead, relationships between blocks are modeled by propositional variables encoding transitions between them. LAV constructs the formula encoding a block's semantics once per block, then combines these formulas with propositional formulas encoding the transitions between blocks. The resulting compound FOL formulas describe correctness and incorrectness conditions of individual instructions. These formulas are checked by an SMT solver supporting a suitable combination of theories; the theories that can be used for modeling correctness conditions are linear arithmetic, bit-vectors, uninterpreted functions, and arrays. Based on the results obtained from the solver, the analyzed command may be given the status safe (the command never leads to an error), flawed (the command always leads to an error), unsafe (the command may lead to an error), or unreachable (the command is never executed).
If a command cannot be proved safe, LAV translates a potential counterexample from the solver into a program trace that exhibits the error, and extracts the values of relevant program variables along this trace. The proposed system is implemented in the programming language C++ as a publicly available, open-source tool named LAV. LAV has support for several SMT solvers (Boolector, MathSAT, Yices, and Z3). Experimental evaluation on a corpus of C programs designed to demonstrate areas of strength and weakness of different verification techniques suggests that LAV is competitive with related tools. Experimental results also show a significant advantage of the proposed system over symbolic execution applied to programs containing a large number of possible execution paths: the proposed approach can determine the status of commands in programs that are beyond the scope of symbolic execution tools. LAV was successfully applied in an educational context, where it was used for finding bugs in programs written by students in an introductory programming course. This application showed that such programs contain a large number of bugs that a verification tool can efficiently find. Experimental results on a corpus of students' programs showed that LAV can find bugs that commonly used automated testing techniques cannot. It is also shown that LAV can improve the evaluation of students' assignments: (i) by providing useful and helpful feedback to students, which is important in the learning process, and (ii) by improving the automated grading process, which is especially important to teachers.
URI: http://hdl.handle.net/123456789/4231
## Files in this item: 1

phdVujosevicJanicicMilena.pdf (1.748Mb)
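The safe / flawed / unsafe / unreachable classification described in the abstract above can be illustrated with a small sketch. This is not LAV's implementation: where LAV checks FOL verification conditions with an SMT solver, the toy below simply enumerates a small set of reachable states, and the example command and all names are invented for illustration.

```python
# Toy sketch (not LAV itself): classifying a command as safe / flawed /
# unsafe / unreachable by checking its error condition against the set
# of program states reachable at that point. LAV does this symbolically
# with an SMT solver; here the state space is small enough to enumerate.
def classify(reachable_states, error_condition):
    if not reachable_states:
        return 'unreachable'
    erring = [s for s in reachable_states if error_condition(s)]
    if not erring:
        return 'safe'       # the command never leads to an error
    if len(erring) == len(reachable_states):
        return 'flawed'     # the command always leads to an error
    return 'unsafe'         # some states lead to an error

# invented command: y = 100 / (x - 10); error condition: divisor is zero
div_by_zero = lambda x: x - 10 == 0
print(classify(range(0, 101), div_by_zero))   # -> unsafe (x = 10 errs)
print(classify(range(11, 101), div_by_zero))  # -> safe
print(classify([10], div_by_zero))            # -> flawed
print(classify([], div_by_zero))              # -> unreachable
```

A real analysis would derive the reachable-state description from the block formulas and path conditions rather than enumerate it.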
Marinković, Vesna (Beograd, 2015)
Abstract: Geometry construction problems using ruler and compass are among the oldest and most challenging problems in elementary mathematics. A solution of a construction problem is not an illustration but a procedure that, using the given construction primitives, provides a "recipe" for constructing the sought object. The main difficulty in solving construction problems, both for a human and for a computer, is the combinatorial explosion that occurs during the solving process, as well as the need to prove the solutions correct. In this dissertation a method for automated solving of one class of construction problems, so-called location problems, is proposed: problems in which the task is to construct a triangle given the locations of three characteristic points. This method successfully solved most of the solvable problems from Wernick's and Connelly's lists. Each problem is first checked for symmetry with an already solved problem, and then its status is determined: the problem can be found redundant, locus dependent, solvable, or not solvable using existing knowledge. A description of the construction is generated automatically, both in natural language and in the GCLC language, accompanied by a corresponding illustration, a correctness proof of the generated construction, and the list of conditions under which the construction is correct. The proposed method is implemented within the tool ArgoTriCS. For proving the generated constructions correct, ArgoTriCS uses the newly developed prover ArgoCLP, the automated theorem provers integrated within the tools GCLC and OpenGeoProver, as well as the interactive theorem prover Isabelle. It is demonstrated that the obtained proofs can be machine verifiable. This dissertation also describes the system ArgoCLP (developed in collaboration with Sana Stojanović), which automatically proves theorems in coherent logic.
This prover has been successfully applied to different axiomatic systems. It automatically generates proofs in natural-language form, as well as machine-verifiable proofs whose correctness can be checked using the interactive theorem prover Isabelle. An important part of this system is a module for simplification of the generated proofs, which yields shorter and more readable proofs.
URI: http://hdl.handle.net/123456789/4406
## Files in this item: 1

tezaVesnaMarinkovic.pdf (2.233Mb)
Alshafah, Samira (Beograd, 2018)
Abstract: Proteins with intrinsically disordered regions are involved in a large number of key cell processes, including signaling, transcription, and chromatin remodeling. On the other hand, such proteins have been observed in people suffering from neurological and cardiovascular diseases, as well as various malignancies. Experimentally determining disordered regions in proteins is a very expensive and long-term process. As a consequence, various computer programs for predicting the position of disordered regions in proteins have been developed and are constantly being improved. In this thesis a new method for determining amino-acid sequences that characterize ordered/disordered regions is presented. The material used in the research includes 4076 viruses with more than 190000 proteins. The proposed method is based on defining a correspondence between characteristics of n-grams (including both repeats and palindromic sequences) and their belonging to ordered/disordered protein regions. Positions of ordered/disordered regions are predicted using three different predictors. The features of the repetitive strings used in the research include mole fractions, fractional differences, and z-values. In addition, the data mining techniques of association rules and classification were applied to both repeats and palindromes. The results obtained by all techniques show a high level of agreement for short lengths (less than 6), and the level of agreement grows to its maximum as the length of the sequences increases. The high reliability of the results obtained by the data mining techniques shows that there are n-grams, both repeating sequences and palindromes, that uniquely characterize the disordered/ordered regions of proteins. The obtained results were verified by comparison with results based on n-grams from the DisProt database, which contains the positions of experimentally verified disordered regions of proteins.
The results can be used both for fast localization of disordered/ordered regions in proteins and for further improving existing programs for their prediction.
URI: http://hdl.handle.net/123456789/4746
## Files in this item: 1

ThesisSamira_Alshafah.pdf (3.106Mb)
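The n-gram features described in the preceding abstract (repeats, palindromes, mole fractions) can be sketched in a few lines; the toy amino-acid sequence and the choice n = 3 are invented for illustration.

```python
# Hedged sketch of the abstract's core notion: extract n-grams from a
# protein sequence, flag those that are palindromes, and compute mole
# fractions (an n-gram's count over the total number of n-grams).
from collections import Counter

def ngrams(seq, n):
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def is_palindrome(s):
    return s == s[::-1]

protein = 'MKDLLAALLDKDLL'          # toy amino-acid sequence
counts = Counter(ngrams(protein, 3))
palindromes = [g for g in counts if is_palindrome(g)]

total = sum(counts.values())
mole_fractions = {g: c / total for g, c in counts.items()}

print(counts['DLL'])         # -> 2 ('DLL' is a repeat)
print('DKD' in palindromes)  # -> True ('DKD' reads the same reversed)
```

An actual predictor would compare such statistics between regions annotated as ordered and disordered rather than over a single sequence.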
Elfaghine, Halima (Beograd, 2016)
Abstract: The subject of this thesis belongs to the area of quality control, the practical use of statistics in monitoring and improving a production process. In 1930 Walter Shewhart started studying quality control, based on control charts and statistical principles. Later, after World War II, W. Edwards Deming took this discipline to Japan, where it flourished. The design and the performance of control charts are the most important problems in this area; the characteristics of control charts are studied in this thesis in order to compare existing and newly suggested control charts. The thesis is divided into four chapters. The first chapter is introductory and contains motivation and basic definitions related to the subject. Throughout the study it is assumed that the data are normally distributed and that the in-control process data are stationary and uncorrelated. Shewhart control charts and the corresponding limits are constructed so as to meet the given specifications for the quality characteristic under investigation. Quality control applied to a production process always incurs costs; the important parameters connected to the cost of quality control are the width of the control limits k, the sample size n, and the interval between samples h. In Chapter 2 a new loss function is given, connected to the production process and to the X-bar quality control chart. Using a Matlab optimization program, the values k̂, n̂, and ĥ that minimize the loss function for given costs are found. For given values of the cost, a non-linear regression model is built using the package SigmaPlot, and the obtained values are compared to those obtained by numerical optimization. In Chapter 3, the time series model Yi = λXi + (1 − λ)Yi−1 is investigated, where 0 < λ ≤ 1 is a constant and the Xi are N(μ, σ²) distributed.
Exponentially Weighted Moving Average (EWMA) control charts for this model are presented, and type I and type II errors are calculated for the case when i is large. For different sample sizes, a new comparison between the optimal designs of the X-bar and EWMA control charts for a normally distributed quality characteristic is given, comparing the corresponding cost-loss functions, power functions, and average run lengths. The process of calibration is one of the methods of statistical process control, introduced to improve the quality of products and reduce production costs. In Chapter 4, two new models of non-symmetrical loss function are introduced. Here the loss function is connected to a single product under control (not to the whole sample). Using our program, written in the statistical software R, the value minimizing the expected loss for the Shewhart X-bar control chart is found. This value is used as the new central target value of the quality characteristic; that is, the production process is calibrated with this new value. The thesis ends with Conclusions, where the results are summarized and some open problems for future investigation are listed.
URI: http://hdl.handle.net/123456789/4340
## Files in this item: 1

Disertacija_Halima_Elfaghihe.pdf (1.104Mb)
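The EWMA recursion studied in Chapter 3 of the abstract above, Yi = λXi + (1 − λ)Yi−1, can be sketched as a minimal chart computation. The asymptotic control limits mu0 ± L·σ·sqrt(λ/(2 − λ)) are the standard large-i form; the data and parameter values below are invented.

```python
# A minimal sketch of an EWMA control chart:
#   y_i = lam * x_i + (1 - lam) * y_{i-1}
# with asymptotic limits mu0 +/- L * sigma * sqrt(lam / (2 - lam)).
import math

def ewma_chart(xs, mu0, sigma, lam=0.2, L=3.0):
    half_width = L * sigma * math.sqrt(lam / (2 - lam))
    ucl, lcl = mu0 + half_width, mu0 - half_width
    y = mu0                      # the chart starts at the target value
    out_of_control = []
    for i, x in enumerate(xs):
        y = lam * x + (1 - lam) * y
        if not (lcl <= y <= ucl):
            out_of_control.append(i)
    return out_of_control

# a sustained shift after sample 5 is eventually flagged
data = [0.1, -0.2, 0.0, 0.1, -0.1, 1.2, 1.1, 1.3, 1.0, 1.2]
print(ewma_chart(data, mu0=0.0, sigma=0.5))  # -> [7, 8, 9]
```

Note the detection lag: the shift begins at sample 5 but the smoothed statistic crosses the limit only at sample 7, which is exactly the memory effect that distinguishes EWMA charts from Shewhart charts.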
Korać, Vanja (Beograd, 2014)
Abstract: Digital forensics is a multidisciplinary science combining different disciplines (computer science, law, criminology) and facing numerous challenges: the massive generation of digital data (Big Data), virtualization of the client and server sides (cloud computing), lack of alignment among standardization bodies, and a general shortage of standards and experts across all of these disciplines. Since digital forensics applies to all digital devices, the narrower scientific field includes numerous applications, such as computer forensics, mobile device forensics, forensics of modern automobile systems, sensor networks, and so on. This thesis analyzes and applies the narrower field of computer forensics. It describes digital forensics of computer systems based on the Windows and Linux platforms, focusing on specific points in an implemented proactive digital forensics system that can indicate forensically relevant events critical to system security. Numerous methodologies, technologies, and techniques of high-tech crime investigation are described, and the process of data collection and "live" digital forensic analysis is discussed in detail. A short review is given of the characteristics and typically required functionalities of forensic software tools for initial response and for recovery of data and magnetic disk partitions. The most important digital forensic toolkits and their basic functionalities are also described. The thesis further highlights the key elements requiring special attention during digital forensic analysis in a virtual environment, and explains the most important segments of the virtual environment itself and how they can serve as significant tools in the digital forensic analysis process. The last part of the thesis focuses on vulnerabilities of the Windows and Linux platforms, with demonstrated methods of malicious system compromise.
General vulnerabilities are described, as well as specific vulnerabilities relating only to the Windows or only to the Linux platform, together with the most common methods of malicious exploitation. Vulnerabilities of computer systems and networks can relate to programs, hardware, configuration, and people. Setting aside people, the most significant and at the same time most critical factor in information security, program vulnerabilities are typically exploited for direct online attacks or for attacks by malicious programs. Discovering and removing vulnerabilities of system programs is one of the main goals of digital forensics. Besides collecting forensically relevant digital data and building solid digital evidence of a computer incident or crime for the judicial system, the goal of digital forensic analysis is to permanently remove the exploited vulnerabilities so that incidents or unlawful activities of that kind never recur; in this sense the contribution of this thesis is very significant. A practical examination of service vulnerabilities on the Windows and Linux platforms covered 80 operating systems: 51 Windows and 29 Linux. The results stem from two years of research, as the systems were examined in 2011 and 2013. By scanning and presenting the vulnerabilities of default installations of Windows and Linux systems, vulnerabilities that could potentially be exploited by security threats (malicious programs or attackers) and thereby endanger computer systems and information are discovered preventively, and their proactive removal realizes preventive protection. Establishing a proactive forensics system provides real-time logging of forensically relevant events, i.e., traces of attack attempts, which substantially facilitates forensic investigation in case of an incident or unlawful activity.
URI: http://hdl.handle.net/123456789/3869
## Files in this item: 1

doktorat_Vanja_Korac.pdf (9.093Mb)
Vujičić Stanković, Staša (Beograd, 2016)
Abstract: The basic goal of this doctoral thesis is research into the different techniques and models applied in information extraction, and the provision of informatic support for processing natural-language texts from the culinary and gastronomy domain. Information extraction is a subfield of computational linguistics comprising techniques for natural language processing that find relevant information, define its meaning, and establish relations between pieces of it. Special attention is given to ontology-based information extraction, which consists of the following: recognition of instances of ontology concepts in unstructured or semi-structured texts written in natural language, reasoning over the identified instances based on the rules defined in the ontology, and use of the recognized instances for instantiating the proper ontology concepts. The main result of the thesis is a new model for ontology-based information extraction. Besides solving information extraction tasks, the new model includes the upgrading of existing lexical resources and ontologies as well as the creation of new ones. Its application resulted in the development of a system for extraction of information related to the culinary domain, but the model can be used in other fields as well. In addition, a food ontology has been developed, the Serbian WordNet has been extended with another 1,404 synsets from the culinary domain, and the electronic dictionary of Serbian has been enlarged with 1,248 entries. The significance of the model lies in the fact that the new and enriched linguistic resources can be used in other systems for natural language processing. The opening chapter of the thesis elaborates the need to provide an informatic model for processing a huge linguistic corpus related to the culinary and gastronomy domain, through a methodologically precise and solid approach integrating pieces of information on the domain.
It also presents the formalization of the basic research subject, text in electronic form. The chapter then describes the approximations of natural languages introduced to enable modern information technologies to process texts written in natural languages, and emphasizes the need to characterize the language of a text by a corresponding corpus and sublanguage. Further on in the first chapter, the task of information extraction is defined, together with the models for informatic processing of unstructured or semi-structured texts that a computer uses to interpret the meaning the author (not necessarily a human) intended while writing the text. Additionally, this chapter describes the methods used in the information extraction field, those based on rules and those based on machine learning, lists their advantages and shortcomings, and explains why this thesis uses techniques based on linguistic knowledge. The introductory chapter concludes with special attention to ontologies, to WordNet, and to the significance of using WordNet as an ontology. The second chapter presents the linguistic resources and tools exploited in this thesis. It describes the morphological dictionaries and local grammars used for solving the problem of information extraction from texts written in Serbian, followed by a review of information extraction systems. At the end of the second chapter, the stages of processing Serbian texts during information extraction in the software systems Unitex and GATE are described. The main result of the thesis is presented in the third chapter.
It is a model for solving the problem of information extraction by integrating linguistic resources and tools, which includes the creation of a text corpus, the definition of information extraction tasks, the construction and application of finite-state models for information extraction, the iterative enlargement of electronic morphological dictionaries, the enrichment and enhancement of WordNet, and the creation of new ontologies. Each of these steps is described thoroughly. Even though the model was first conceived as a solution for processing Serbian, it can equally be applied to texts written in other languages, given the development of suitable language resources. The implementation of these steps is described in the fourth chapter, through a system for information extraction from culinary texts written in Serbian. This is followed by a description of the interdependent development and mutual complementing of lexical resources, through the steps of creating a domain corpus, identifying the culinary lexicon, expanding and upgrading WordNet and the electronic morphological dictionaries, and developing domain ontologies: the food ontology, the approximate-measure ontology, and the ontology of ingredients that can be used as mutual replacements in the culinary domain. The information extraction system has served for creating an advanced search system which, based on a corpus of culinary texts, generates all possible answers to users' inquiries. Within this system, a specific method is implemented for creating links between different recipes. It is used when the user, reviewing the text of a recipe, notices that its preparation description contains a part that already appeared in another recipe, but with an additional or different explanation.
Another contribution of this thesis is the application of the developed ontologies to tasks that convert approximate measures into standard measures and to establishing similarities among recipes, where the similarity of recipes is defined as the similarity of the texts describing the preparation of a dish according to a specific recipe. The last chapter contains final conclusions and directions for future research.
URI: http://hdl.handle.net/123456789/4410
## Files in this item: 1

teza_Stasa.pdf (10.38Mb)
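The ontology-based extraction pipeline described above can be caricatured in miniature: recognize instances of ontology concepts in text via a lexicon, then answer a query through the ontology's replacement relation. The lexicon, concept names, and recipe below are toy stand-ins for the thesis's Serbian resources.

```python
# Invented miniature of an ontology-based extraction pipeline; none of
# these names come from the thesis.
lexicon = {                       # surface form -> ontology concept
    'butter': 'DairyFat', 'margarine': 'DairyFat',
    'flour': 'Cereal', 'milk': 'Dairy',
}
replacements = {'DairyFat': ['butter', 'margarine']}  # mutual substitutes

def extract(text):
    """Map each recognized token to its ontology concept."""
    tokens = text.lower().replace(',', ' ').split()
    return {t: lexicon[t] for t in tokens if t in lexicon}

recipe = 'Mix flour, milk and butter'
found = extract(recipe)
print(found)                          # concepts instantiated from text
print(replacements[found['butter']])  # -> ['butter', 'margarine']
```

A real system would use morphological dictionaries and local grammars instead of exact token matching, since Serbian is highly inflected.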
Stojanović, Sana (Beograd, 2016)
Abstract: The advance of geometry over the centuries can be observed through the development of different axiomatic systems that describe it. The use of axiomatic systems begins with Euclid and continues with Hilbert and Tarski, but it does not end there: even today, new axiomatic systems for Euclidean geometry are being developed. Avigad's axiomatic system goes back to the beginnings and precisely describes the basic derivations presented in Euclid's "Elements". Writing an axiomatic system in a format suitable for computer theorem proving is a challenge in itself: imprecise formulations of axioms that appear in books get exposed only when they need to be written in a format suitable for computers. The formalization of different axiomatic systems, and computer-assisted proofs within the theories they describe, is the main motif of this thesis. Programs for theorem proving have existed since the eighties, and today they represent a collection of very powerful tools. This thesis presents a system for automated and formal theorem proving that uses the power of resolution theorem provers and a coherent prover, as well as interactive theorem provers for verifying the generated proofs. The coherent prover ArgoCLP is one of the contributions of the thesis. Additionally, the thesis develops a dialect of coherent logic based on natural deduction that enables a simple transformation of generated proofs into proofs written in the languages of the interactive provers Isabelle and Coq, as well as in the natural languages English and Serbian. The theorem proving system is applied to three axiomatic systems of Euclidean geometry, illustrating its applicability both to proving large mathematical theories and to verifying informal proofs from mathematical textbooks.
URI: http://hdl.handle.net/123456789/4416
## Files in this item: 1

SanaStojanovic.pdf (1.885Mb)
Simić, Danijela (Beograd, 2017)
Abstract: This thesis presents an interactive formalization of various models of geometry and algebraic methods for automated proving of geometry theorems. We present our work on formalizing analytic (Cartesian) plane geometry within the proof assistant Isabelle/HOL. We give several equivalent definitions of the Cartesian plane and show that it models synthetic plane geometries (using both Tarski's and Hilbert's axiom systems). We also discuss several techniques used to simplify and automate the proofs. As one of our aims is to advocate the use of proof assistants in mathematical education, our exposition tries to remain simple and close to standard textbook definitions, while staying completely formal and mechanically verifiable. This formalization develops the necessary infrastructure for implementing decision procedures based on analytic geometry within proof assistants. Furthermore, we investigate complex numbers. The deep connections between complex numbers and geometry have been well known and carefully studied for centuries. The fundamental objects investigated are the complex plane (usually extended by a single point at infinity), its objects (points, lines, and circles), and the groups of transformations that act on them (e.g., inversions and Möbius transformations). In this thesis we treat the geometry of complex numbers formally and present a fully mechanically verified development within the theorem prover Isabelle/HOL. We discuss different approaches to the formalization and the major advantages of the more algebraically oriented approach. Apart from applications in formalizing mathematics and in education, this work serves as a basis for formally investigating various non-Euclidean geometries and their intimate connections. We also present a formalization of part of Tarski's axiom system within the Poincaré disk model in Isabelle/HOL. Further on, we analyze connections between geometry and polynomials and the uses of these connections.
In Euclidean geometry, objects and relations between them can be expressed as polynomials. Moreover, any geometry construction can be expressed by a set of polynomials, and geometry statements can then be proved by algebraic methods (e.g., the Gröbner bases method or Wu's method) over that set of polynomials. We describe an implementation of an algorithm in Isabelle/HOL that accepts a term representation of a geometry construction and returns its corresponding set of polynomials. Our further work will be to use the Gröbner bases method within the Isabelle system on the generated polynomials, in order to prove the correctness of the given construction. Furthermore, we investigate how spatial geometry constructions can be presented using polynomials. We investigate two different approaches to deriving those polynomials and compare the efficiency of algebraic provers depending on the approach used. We present a fully automated system for transforming geometry constructions into sets of polynomials. Our further work will be to relate these geometry provers with dynamic geometry software and thus make them easier for students to use.
URI: http://hdl.handle.net/123456789/4499
## Files in this item: 1

06062017danijela_doktorat.pdf (1.158Mb)
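The idea of expressing a construction and a statement as polynomials, described in the abstract above, can be sketched without an algebraic prover. A real prover would check ideal membership via Gröbner bases or Wu's method; the toy below merely evaluates the statement polynomial on random instances of a hypothetical midpoint construction, all names being ours.

```python
# Hedged toy version: a construction is turned into polynomial relations
# on coordinates, and a statement holds if its polynomial vanishes
# whenever the construction's polynomials do. Here we only sample.
import random

def midpoint_construction(ax, ay, bx, by):
    # construction polynomials for "M is the midpoint of AB":
    # 2*mx - ax - bx = 0 and 2*my - ay - by = 0 define M
    mx, my = (ax + bx) / 2, (ay + by) / 2
    return mx, my

def statement_poly(ax, ay, bx, by, mx, my):
    # statement "MA and MB have equal length", as a polynomial:
    # (mx-ax)^2 + (my-ay)^2 - (mx-bx)^2 - (my-by)^2
    return (mx - ax)**2 + (my - ay)**2 - (mx - bx)**2 - (my - by)**2

random.seed(0)
for _ in range(100):
    ax, ay, bx, by = (random.uniform(-5, 5) for _ in range(4))
    mx, my = midpoint_construction(ax, ay, bx, by)
    assert abs(statement_poly(ax, ay, bx, by, mx, my)) < 1e-9
print('statement vanishes on all sampled instances')
```

Random sampling can only refute a statement, not prove it; that gap is precisely what Gröbner bases close, by showing the statement polynomial lies in the ideal generated by the construction polynomials.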
Čukić, Ivan (Beograd, 2018)
Abstract: There is a large class of problems that require software systems with asynchronously executed components. For example, distributed computations have distributed nodes that process data asynchronously to one another, service-oriented architectures need to process separate requests asynchronously, and multi-core and heterogeneous systems need multiple separate tasks running concurrently to best utilize the hardware. Even ordinary GUI applications need asynchronous components: the user interface needs to be responsive at all times, which means that no matter what state the program is in, it needs to process and react to the input events coming from the user. The necessity of concurrency and asynchronous execution brings the added complexity of inversion of control (IoC) into the system, either through message passing or through event processing. IoC makes code difficult to develop and reason about, increases component coupling, and inhibits clean functional or object-oriented software design. In this dissertation, a method for solving the problems that IoC introduces is presented. It shows how to model both synchronous and different types of asynchronous tasks with the continuation monad. The continuation monad serves as a primitive for building more complex control flow structures that mimic those of the host programming language. It also allows building control structures specialized for parallelism and transactional execution, and for simulating functional programming idioms with asynchronous tasks, through a generalization of the continuation monad that allows asynchronous tasks to generate results one at a time.
This allows writing programming systems with asynchronously executed components as seemingly synchronous imperative or functional code, while leaving it to the compiler to do the heavy lifting and convert the written code into an asynchronously executed set of tasks. Another benefit of the presented method is that it allows easier automatic handling of data lifetime without the need for garbage collection. The method has been successfully applied and tested in several Free/Libre Open Source Software and proprietary real-world software projects used by hundreds of millions of people around the world. This dissertation describes an example of a secure project management system based on a similar system implemented as part of the KDE Plasma project. It also covers the important parts of the implementation of the AsynQt library, which extends the Qt library and its concurrency primitive, the QFuture class, with functional reactive programming patterns based on the proposed method.
URI: http://hdl.handle.net/123456789/4738
## Files in this item: 1

ivan_cukic_phd.pdf (1.328Mb)
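The continuation-monad modelling described above can be miniaturized in a few lines. The names below are ours, not the AsynQt/Qt API, and the dissertation's setting is C++; a real implementation hands the continuation to an event loop rather than invoking it immediately.

```python
# A compact sketch of the core abstraction: modelling a task by "what to
# do with its result" (the continuation monad).
class Cont:
    def __init__(self, run):
        self.run = run            # run: (callback) -> None

    @staticmethod
    def unit(value):              # a task that immediately yields value
        return Cont(lambda k: k(value))

    def then(self, f):            # monadic bind: chain a dependent task
        return Cont(lambda k: self.run(lambda v: f(v).run(k)))

def fetch(url):                   # stand-in for an asynchronous request
    return Cont.unit(f'contents of {url}')

def count_words(text):
    return Cont.unit(len(text.split()))

# reads like synchronous code, but each step is a continuation
result = []
fetch('http://example.org').then(count_words).run(result.append)
print(result[0])   # -> 3
```

The point of the dissertation's method is that the chained `then` steps need not complete immediately: the same composition works when `fetch` stores the callback and fires it later, which is what removes the inversion of control from user code.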
Stojanović, Boban (Kragujevac, Serbia, 2007)
Abstract: Muscles are organs whose primary function is to produce force and motion. There are three types of muscles: skeletal (striated), smooth, and cardiac. Skeletal muscles are attached to bones and can move them voluntarily. Many daily activities extend over long periods of time, during which muscle performance can decline (reduction of maximal force, contraction speed, movement control, etc.). Although numerous mathematical models of muscles have been developed, only a few take muscle fatigue into account, and most of the existing fatigue models consider fatigue under specific conditions only. Motivated by the fact that the existing muscle fatigue models are very limited under arbitrary conditions of activation and loading, we present a new model that includes muscle fatigue. The proposed model is based on Hill's phenomenological model consisting of contractile, serial, and parallel elastic elements, but uses a fatigue curve under maximal activation and a recovery curve as input parameters in order to predict muscle response under arbitrary loading conditions. Furthermore, an extension of Hill's model is introduced to take into account different fiber types, since the various types of muscle fibers can have very different physiological and mechanical properties, significantly affecting their resistance to fatigue. The developed models are incorporated into the finite element software PAK and verified by comparing the calculated results with experimental measurements and data from the literature. Computer modeling of the human biceps and triceps muscles, as well as the frog gastrocnemius muscle, shows that the models can predict the behavior of real muscles with satisfactory precision. Besides application to single muscles, the proposed models can be used for computer simulations of complex musculoskeletal systems.
In order to provide efficient modeling of muscles and musculoskeletal systems, software for automatic muscle generation from medical images has been developed, as well as a module for result post-processing using various types of graphs. The proposed models and the developed software can be used as a very powerful tool in designing medical and sports equipment, planning training and analyzing exercises. Computer simulations based on the muscle mechanical models can prevent work injuries and significantly reduce costs for individuals and society. URI: http://hdl.handle.net/123456789/1843 ## Files in this item: 1

Boban Stojanovic - Doktorska disertacija.pdf ( 12.75Mb ) -
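The fatigue-extended Hill model described in this entry can be illustrated with a minimal sketch: a fatigue factor scales the active force of the contractile element, with fatigue and recovery curves as inputs. The exponential curves, time constants and function names below are invented placeholders (the thesis takes measured curves as input, and the actual implementation lives in the PAK finite element software):

```python
import math

def hill_force(activation, fatigue, f_max=1000.0):
    """Active force (N) of a Hill-type contractile element, scaled by a
    fatigue factor in [0, 1] (1 = fully rested). Illustrative only."""
    return f_max * activation * fatigue

def fatigue_factor(t, tau_fat=30.0):
    """Assumed exponential fatigue curve under sustained maximal
    activation; t is the time (s) under activation."""
    return math.exp(-t / tau_fat)

def recovery_factor(fatigue, t_rest, tau_rec=60.0):
    """Assumed exponential recovery toward the rested state (1.0)
    after t_rest seconds of rest, starting from a fatigue level."""
    return 1.0 - (1.0 - fatigue) * math.exp(-t_rest / tau_rec)
```

A force estimate after 30 s of maximal activation would then be `hill_force(1.0, fatigue_factor(30.0))`, with `recovery_factor` restoring the factor toward 1.0 during rest.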
Ivanović, Miloš (Kragujevac, Srbija , 2010)[more][less]
URI: http://hdl.handle.net/123456789/1838 ## Files in this item: 1

mivanovic-disertacija-pun-tekst.pdf ( 11.57Mb ) -
Mladenović, Miljana (Beograd , 2016)[more][less]
Abstract: The beginning of the new millennium was marked by the huge development of social networks, internet technologies in the cloud and applications of artificial intelligence tools on the web. The extremely rapid growth in the number of articles on the Internet (blogs, e-commerce websites, forums, discussion groups, systems for transmission of short messages, social networks and portals for publishing news) has increased the need for methods of rapid, comprehensive and accurate analysis of text. The remarkable development of language technologies has therefore enabled their application in document classification, document clustering, information retrieval, word sense disambiguation, text extraction, machine translation, computer speech recognition, natural language generation, sentiment analysis, etc. In computational linguistics, several different names are in use for the area concerning the processing of emotions in text: sentiment classification, opinion mining, sentiment analysis, sentiment extraction. According to the nature and the methods used, sentiment analysis of text belongs to the field of computational linguistics that deals with text classification. In the process of analysing emotions we generally speak of three kinds of text classification:
• identification of subjectivity (opinion classification or subjectivity identification), used to divide texts into those that carry emotional content and those that have only factual content;
• sentiment classification (polarity identification) of texts that carry emotional content into those with positive and those with negative emotional content;
• determining the strength or intensity of emotional polarity (strength of orientation).
In terms of the level at which the analysis of feelings is carried out, there are three methodologies: analysis at the document level, at the sentence level and at the level of attributes.
Standardized methods of text classification usually use machine learning methods or rule-based techniques. Sentiment analysis, as a specific type of document classification, also uses these methods. This doctoral thesis, whose main task is the analysis of emotions in text, presents research related to the sentiment classification of texts in the Serbian language, using a probabilistic machine learning method, multinomial logistic regression, i.e. the maximum entropy method. The aim of this research is to create the first comprehensive, flexible, modular system for sentiment analysis of Serbian texts, with the help of digital resources such as semantic networks, specialized lexicons and domain ontologies. The research is divided into two phases. The first phase concerns the development of methods and tools for detecting the sentiment polarity of the literal meaning of a text. In this part of the work, a new method of reducing the feature vector space for sentiment classification is proposed, implemented and evaluated. The proposed reduction method is applied in the maximum entropy classification model and relies on the lexical-semantic network WordNet and a specialized sentiment lexicon. It consists of two successive processes. The first process expands the feature vector space with the inflectional forms of features. The study has shown that using stemming in sentiment analysis, as a standard method of reducing the feature vector space in text classification, can lead to incomplete or incorrect sentiment-polarity feature labelling, and that this problem can be avoided by introducing inflectional feature forms. The thesis shows that the feature vector space, enlarged by the introduction of inflectional forms, can be successfully reduced using the second proposed procedure - semantic mapping of all predictors with the same sentiment polarity into a small number of semantic classes.
In this way, the feature vector space is reduced compared to the initial one, while retaining semantic precision. The second phase of the dissertation describes the design and implementation of formal ontologies of rhetorical figures in the Serbian language - a domain ontology and a task ontology. The use of the task ontology in generating features representing figurative speech is presented. The research aim of the second phase is to recognize figurative speech in order to improve the existing set of predictors generated in the first phase. The results of this phase show that some classes of figures of speech can be recognized automatically. In the course of working on this dissertation, a software tool SAFOS (Sentiment Analysis Framework for Serbian), an integrated system for sentiment classification of text in the Serbian language, has been developed, implemented and statistically evaluated. Results of the research within the scope of this thesis are presented in the papers (Mladenović & Mitrović, 2013; Mladenović & Mitrović, 2014; Mladenović, Mitrović & Krstev, 2014; Mladenović, Mitrović, Krstev & Vitas, 2015; Mladenović, Mitrović & Krstev, 2016). The dissertation consists of seven chapters with the following structure. Chapter 1 introduces and defines the methods, resources and concepts used in the first phase of the research: text classification, sentiment classification, machine learning, supervised machine learning, probabilistic supervised machine learning, and language models. At the end of the introductory section, the tasks and objectives of the research are defined. Chapter 2 presents a mathematical model of text classification methods and sentiment classification methods. A mathematical model of probabilistic classification and its application in regression models are presented.
At the end of the chapter it is shown that the maximum entropy method, as one of the regression models, has been successfully applied to natural language processing tasks. Chapter 3 presents the lexical resources of the Serbian language and the methods and tools for their processing. Chapter 4 gives a comprehensive survey of the currently available types and methods of sentiment classification. It reviews current work and research in the sentiment classification of texts, and presents a comparative overview of research in sentiment classification using the maximum entropy method. Chapter 5 discusses the contribution of this thesis to methods of feature space reduction for maximum entropy classification. First, an existing feature space reduction method is analysed. A new feature space reduction method which improves sentiment classification is proposed, and a mathematical model containing the proposed method is defined. The learning and testing sets and the lexical-semantic resources used in the proposed method are introduced. Chapter 5 also describes the construction and evaluation of the sentiment classification system SAFOS, which applies and evaluates the proposed method of feature vector space reduction. The parameters and functions of SAFOS are defined, and the measures used to evaluate the system are discussed - precision, recall, F1-measure and accuracy. A description of the method for assessing the statistical significance of the system is given, along with the implementation of the statistical test in SAFOS. The chapter provides an overview of the performed experiments, the results and the evaluation of the system. Chapter 6 deals with methods of recognizing figurative speech which can improve sentiment classification. The notion of a domain ontology is introduced, along with the role of rhetorical figures and a domain ontology of rhetorical figures.
The importance of figurative speech in sentiment classification has been explored. A description of the construction and structure of the first domain ontology of rhetorical figures in the Serbian language, RetFig.owl, is given, as well as of the corresponding task ontology that contains rules for the identification of some classes of rhetorical figures. At the end of this chapter, an overview of the performed experiments, results and evaluation of the SAFOS system plugin that improved the recognition of figurative speech is given. The final chapter of this study deals with the achievements, problems and shortcomings of the SAFOS system. The conclusion of this thesis points to the great technological, social, educational and scientific importance of sentiment analysis and the recognition of figurative speech, and suggests directions for further development of the SAFOS system. URI: http://hdl.handle.net/123456789/4422 ## Files in this item: 1

Mladenovic_Miljana.pdf ( 13.60Mb ) -
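The two-step reduction described in this entry (expand features with inflectional forms, then collapse same-polarity features into a small number of semantic classes) can be sketched roughly as follows. The tiny English lexicon, the class names and the function name are invented stand-ins for the Serbian WordNet and sentiment lexicon that SAFOS actually uses:

```python
INFLECTIONS = {            # inflectional forms per lemma (assumed data)
    "good": ["good", "better", "best"],
    "bad": ["bad", "worse", "worst"],
    "love": ["love", "loves", "loved", "loving"],
}
SEMANTIC_CLASS = {         # lemma -> (semantic class, polarity), assumed
    "good": ("POSITIVE_EVAL", "+"),
    "love": ("POSITIVE_EVAL", "+"),
    "bad": ("NEGATIVE_EVAL", "-"),
}

def reduce_features(tokens):
    """Map every inflectional form back to its lemma's semantic class,
    so that many surface features collapse into a few predictors while
    the sentiment polarity of each form stays correct (the problem
    stemming can get wrong)."""
    form_to_lemma = {form: lemma
                     for lemma, forms in INFLECTIONS.items()
                     for form in forms}
    reduced = []
    for tok in tokens:
        lemma = form_to_lemma.get(tok.lower())
        if lemma and lemma in SEMANTIC_CLASS:
            reduced.append(SEMANTIC_CLASS[lemma][0])
        else:
            reduced.append(tok.lower())     # leave unknown tokens as-is
    return reduced
```

In a full pipeline the reduced tokens would then be fed to the maximum entropy classifier as features, so that "loved" and "best" contribute to the same positive predictor.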
Marovac, Ulfeta (Beograd , 2015)[more][less]
Abstract: Proteins are significant biological macromolecules of polymeric nature (polypeptides), which consist of amino acids and are basic structural units of each cell. They are built from 20+3 amino acids and, as a consequence, they are represented in biological databases as sequences over 23 different characters. Proteins can be classified by their primary structure, secondary structure, function, etc. One possible classification of proteins by function is based on their membership in a certain cluster of orthologous groups (COG). This classification rests on a prior comparison of proteins by similarity of their primary structures, which is most often a result of homology, i.e. their common (evolutionary) origin. The COG database is obtained by comparison of the known and predicted proteins encoded in the completely sequenced prokaryotic (archaeal and bacterial) genomes and their classification by orthology. The proteins are classified into 25 categories, which can be arranged into three basic functional groups (proteins responsible for: (1) information storage and processing; (2) cellular processes and signaling; and (3) metabolism), or into a group of poorly characterized proteins. Classification of proteins by membership in a COG category (or KOG - euKaryote Orthologous Groups - for eukaryotic organisms) is significant for a better understanding of biological processes and various pathological conditions in humans and other organisms. The dissertation proposes a model for the classification of proteins into COG categories based on amino acid n-grams (sequences of length n). The data set contains protein sequences of genomes from 8 different taxonomic classes [TKL97] of bacteria (Aquificales, Bacteroidia, Chlamydiales, Chlorobia, Chloroflexia, Cytophagia, Deinococci, Prochlorales), which are known to have been classified by COG categories.
A new method is presented, based on generalized systems of Boolean equations, for separating the n-grams characteristic of the proteins of each COG category. The presented method significantly reduces the number of processed n-grams in comparison to previously used methods of n-gram analysis, so less memory and less time are needed for protein processing. Previously known methods for the classification of proteins into functional categories compared each new protein (whose function had to be determined) with the set of all proteins already classified by function, in order to determine the group containing the proteins most similar to the one being classified. The advantage of the new method over these is that it avoids sequence-to-sequence comparison and instead searches a protein for those patterns (n-grams, up to length 10) which are characteristic of the corresponding COG category. The selected patterns are assigned to the corresponding COG category and describe sequences of a certain length which have previously appeared only in that COG category, and not in the proteins of other COG categories. On the basis of the proposed method, a predictor for determining the COG category of a new protein is implemented, with the minimal precision of the prediction as one of the predictor's arguments. During the test phase the constructed predictor showed excellent results, with a maximal precision of 99% reached for some proteins. Owing to its properties and relatively simple construction, the proposed method can be applied in similar domains where the solution of a problem is based on n-gram sequence analysis. URI: http://hdl.handle.net/123456789/4308 ## Files in this item: 1

phdUlfetaMarovac.pdf ( 7.954Mb ) -
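The core selection step described in this entry - keeping only those n-grams that appear in exactly one category's sequences - can be made concrete without the Boolean-equation machinery of the thesis. The function names and toy category labels are illustrative, not the thesis's implementation:

```python
from itertools import chain

def ngrams(seq, n):
    """All contiguous n-grams of a sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}

def characteristic_ngrams(classes, n):
    """For each class (e.g. a COG category, mapping to a list of
    protein sequences), keep only the n-grams that occur in that
    class and in no other class."""
    per_class = {c: set(chain.from_iterable(ngrams(s, n) for s in seqs))
                 for c, seqs in classes.items()}
    result = {}
    for c, grams in per_class.items():
        others = set().union(*(g for k, g in per_class.items() if k != c))
        result[c] = grams - others
    return result
```

A predictor in the spirit of the thesis would then scan a new protein for these characteristic patterns and assign the category whose patterns it matches.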
Zeković, Ana (Beograd , 2015)[more][less]
Abstract: The main focus of the thesis is the construction of new methods for computing various types of knot distances - the distance between knots realized by crossing changes (Gordian distance) and the distance realized by crossing smoothings (smoothing distance). Different ways of presenting knots are introduced, with emphasis on the mirror-curve model: the purpose of the model is presented, along with the coding of knots using it, a method for identifying the knots presented by the model, and the derivation of all knots that can be placed in nets of dimensions p×q (p ≤ 4, q ≤ 4). Various knot notations are described in detail, with a focus on Conway's notation and its topological properties. Existing algorithms are based on the algebra of continued fractions, closely related to the presentation of rational knots, which results in the absence of a large number of non-rational knots from the existing Gordian distance tables. The subject of the thesis is the implementation of methods for determining new distances equal to 1. The methods are based on non-minimal presentations of rational and non-rational knots, algorithms built on geometrical properties of Conway's notation, and weighted graph search. The results are organized into Gordian distance tables for knots with up to 9 crossings, which are enclosed with the thesis. In order to extend the tables to knots with a larger number of crossings, a method for extending the results to knot families is suggested. Using the relation between Gordian numbers and smoothing numbers, a new method for determining the smoothing number is presented, with results given as lists for knots with at most 11 crossings. In conjunction with the Conway notation concept and this method, algorithms for the smoothing distance are derived.
The new results are organized in knot tables up to 9 crossings, combined with previous results, and enclosed with the thesis. Crossing changes and smoothings in a knot diagram can be used to model topoisomerase and recombinase actions on DNA chains, and a method for studying the changes introduced by these enzymes is presented. A main contribution of the thesis is the consistent use of Conway's notation for all relevant results and methods, which led to a method for deriving new knots in Conway's notation by extending C-links. In the absence of an adequate pattern for the existing knot tables in DT-notation, a structure based on topological knot concepts is used. A method for knot classification based on Conway's notation is proposed, and tables of all knots with 13 crossings and alternating knots with 14 crossings have been generated and enclosed. The thesis also considers the Bernhard-Jablan conjecture on determining the unknotting number from minimal knot diagrams, which is crucial in the computation of various knot distances. It addresses one of the main problems in knot theory and contains a new method of knot minimization, based on the interplay of local and global minimization. New terms are defined, such as the maximum and the mixed unknotting number. Knots whose minimum crossing number does not change after a single crossing change are analyzed; three classes of such knots are recognized, named Kauffman knots, Zekovic knots and Taniyama knots. The most interesting conclusion about Zekovic knots is that all derived Perko knots (for n ≤ 13 crossings) are in fact Zekovic knots. Defining this class of knots makes it possible to state new properties of the well-known Perko knots. URI: http://hdl.handle.net/123456789/4255 ## Files in this item: 1

phdZekovicAna.pdf ( 5.246Mb ) -
Dražić, Zorica (Beograd , 2014)[more][less]
Abstract: The Variable neighborhood search method proved to be very successful for solving discrete and continuous optimization problems. The basic idea is a systematic change of neighborhood structures in search for a better solution. For the optimization of functions of several variables, methods are used that obtain a local minimum starting from a given initial point. When a continuous function has many local minima, finding the global minimum is usually not an easy task, since the obtained local minima in most cases are not optimal. In typical implementations with bounded neighborhoods of various diameters, it is not possible to reach all points of the solution space from an arbitrary point. Consequently, the strategy of using a finite number of neighborhoods is suitable for problems whose solutions belong to some known bounded subset of R^n. In order to overcome this limitation, a new variant of the method is proposed: the Gaussian Variable neighborhood search method. Instead of defining a sequence of different neighborhoods from which a random point will be chosen, all neighborhoods coincide with the whole solution space, but carry different probability distributions of Gaussian type. With this approach, any point, however distant, is theoretically reachable from an arbitrary point, although with smaller probability. In the basic version of the Variable neighborhood search method one must define in advance the neighborhood structure system, the number and size of the neighborhoods, as well as the type of random distribution used for obtaining a random point from them. The Gaussian Variable neighborhood search method has fewer parameters, since all neighborhoods are theoretically the same (equal to the solution space), and it uses only one distribution family - the multivariate Gaussian distribution with variable dispersion.
The File transfer scheduling problem (FTSP) is an optimization problem widely applicable to many areas such as Wide Area computer Networks (WAN), Local Area Networks (LAN), telecommunications, multiprocessor scheduling in MIMD machines, task assignments in companies, etc. As it belongs to the class of NP-hard problems, heuristic methods are usually used for solving problems of this kind. The problem is to minimize the overall time needed to transfer all files to their destinations for a given collection of files of various sizes in a computer network, i.e. to find the file transfer schedule of minimal length. In order to obtain exact solutions of the FTS problem, integer linear programming formulations are proposed and their correctness is proved. In this way optimal solutions can be found for small and medium size test instances. For large test instances, the Variable neighborhood search method is proposed, using the "permutation" representation and typical neighborhood structures. Moreover, the same method is used for obtaining upper bounds on the solutions, which are used in the proposed integer linear programming formulations. For obtaining better solutions in a small neighborhood of the current solution, three different local search procedures are implemented: 2-swap, 2-swap adjacent and variable neighborhood descent. In order to apply continuous optimization methods to the FTSP, a weighted solution representation is developed. Such a representation enables the use of continuous optimization methods that do not require differentiability of the objective function. Since the Gaussian Variable neighborhood search method proved successful on continuous optimization problems, it was applied to the FTSP. The previously described local search procedures can also be used with the weighted solution representation. Using the proposed methods, optimal solutions for all small and medium size test instances are found.
For large size instances, which are beyond the reach of exact methods, metaheuristic methods obtained good solutions in reasonable time. URI: http://hdl.handle.net/123456789/4246 ## Files in this item: 1

phdDrazic_Zorica.pdf ( 4.739Mb ) -
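The Gaussian VNS idea described in this entry can be sketched in a few lines: instead of nested bounded neighborhoods, each "neighborhood" k is the whole space sampled with a Gaussian of dispersion sigma_k around the incumbent, widening after failures and resetting after successes. This is a minimal illustration with made-up parameter values, not the thesis's implementation (which pairs the shaking step with a proper local search):

```python
import random

def gauss_vns(f, x0, sigmas=(0.1, 1.0, 10.0), iters=200, seed=0):
    """Minimal Gaussian-VNS sketch for continuous minimization.
    Every neighborhood is the whole space, sampled with a Gaussian of
    dispersion sigmas[k] around the incumbent; improving samples are
    accepted and reset k to the tightest dispersion."""
    rng = random.Random(seed)
    best, fbest = list(x0), f(x0)
    for _ in range(iters):
        k = 0
        while k < len(sigmas):
            cand = [xi + rng.gauss(0.0, sigmas[k]) for xi in best]
            fc = f(cand)
            if fc < fbest:
                best, fbest = cand, fc
                k = 0            # success: restart from tightest Gaussian
            else:
                k += 1           # failure: widen the Gaussian
    return best, fbest
```

Note how even the tightest Gaussian can, in principle, reach any point of the space, which is exactly the property the abstract contrasts with bounded neighborhoods.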
Vučković, Bojan (Beograd , 2017)[more][less]
Abstract: We present original results from the following fields of discrete mathematics: chromatic graph theory, extremal set theory and Boolean matrix theory. From chromatic graph theory we investigate edge and total colorings satisfying the condition that neighboring vertices of a graph are distinguished by the multiset, set or sum induced by the given coloring. A multiset neighbor-distinguishing edge coloring of a graph is an assignment of colors to edges such that, for every edge uv of the graph, the multiset of colors on the edges incident with the vertex u differs from the multiset on the edges incident with the vertex v. The previous best result concerning the minimum number of colors required for such a coloring of an arbitrary graph states that four colors are sufficient. The author's contribution is a proof that such a coloring is always possible with only three colors, which is in the general case the optimal number of colors. We construct a graph for which we subsequently prove that different numbers of colors are required to obtain a multiset neighbor-distinguishing coloring and a neighbor-distinguishing coloring by sum; as far as we know, this is the first example of such a graph. A few results concerning the neighbor expanded sum distinguishing coloring are given. The main contribution is a proof that for an arbitrary graph there exists a total coloring with colors from the set {1, 2, 3} such that every two adjacent vertices have different sums over their adjacent vertices and incident edges. Also, for certain classes of graphs it is proved that such a coloring exists using only the colors from the set {1, 2}. A neighbor-distinguishing edge coloring of a graph G requires that every two adjacent edges receive different colors, while the sets of colors on the edges incident with the vertices u and v differ for every edge uv of G.
The author presents an edge coloring procedure for an arbitrary graph without isolated edges where a smaller number of colors is used compared to all known results. For the adjacent vertex distinguishing total coloring of a graph G, the condition is that every two adjacent or incident elements of V(G) ∪ E(G) receive different colors, while for every edge uv of G the set composed of the colors assigned to the edges incident with u, together with the color of u, differs from the corresponding set for v. The author improves the upper bound on the minimum number of colors needed for such a coloring, relative to the maximum degree of the graph. Frankl's conjecture from extremal set theory states that for every family closed under union there exists an element contained in at least half of the sets of the family. We give a proof that Frankl's conjecture holds for every family over 12 elements, while previously this was known to be true only for families over 11 or fewer elements. Our proof is based on an efficient algorithm that exhausts all the possibilities, using results, obtained in a number of consecutive steps, about subfamilies that an eventual counterexample cannot contain. A family of sets G is an FC-family if for every family F containing G there exists an element of ∪G that appears in at least half of the sets of F. A NonFC-family is every family that is not FC. The author's contribution is the complete classification of all families over 6 or fewer elements into FC and NonFC-families. From Boolean matrix theory we present results concerning the row space cardinality. Boolean matrices are matrices whose components are all from the set {0, 1}, while the row space of a Boolean matrix is the set of vectors that can be obtained by disjunction of rows of the matrix.
We present the set of all values a from the interval [2n2 + 2n3; 2n2] such that there exists a matrix of dimension n × n whose row space cardinality equals a. For the least positive integer a_n for which there exists no n × n matrix with row space cardinality equal to a_n, the author gives a lower bound that improves on the previously known results. All proofs of the main results in the dissertation are constructive. Some of them require the use of computers to check a great number of possibilities. For the other proofs this was not necessary, though algorithms following the steps of the proofs can be implemented to obtain a graph coloring or a matrix with the desired properties. URI: http://hdl.handle.net/123456789/4661 ## Files in this item: 1

Disertacija_-_Bojan_Vuckovic.pdf ( 1.143Mb ) -
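The row space notion from this entry is easy to make concrete: it is the closure of the matrix's rows under componentwise OR (disjunction). The sketch below computes its cardinality by a simple closure pass; whether the all-zero vector (the empty OR) is counted varies between authors, so including it is an assumption here:

```python
def row_space_size(rows):
    """Cardinality of the row space of a Boolean matrix, given as a
    list of 0/1 rows: the number of distinct vectors obtainable as
    ORs of subsets of rows (the zero vector, from the empty subset,
    is included by the convention assumed here)."""
    space = {tuple([0] * len(rows[0]))}            # empty OR
    for row in rows:
        # after processing row i, `space` holds all ORs of subsets
        # of the first i rows
        space |= {tuple(a | b for a, b in zip(v, row)) for v in space}
    return len(space)
```

For the n × n identity matrix this yields 2^n, the maximum possible; the thesis studies which smaller cardinalities are and are not attainable.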
Filipović, Vladimir (Beograd , 2006)[more][less]
URI: http://hdl.handle.net/123456789/2478 ## Files in this item: 1

phdFilipovicVladimir.pdf ( 2.145Mb ) -
Radojčić, Nina (Beograd , 2018)[more][less]
Abstract: In this dissertation, three NP-hard optimization problems are studied and various computational intelligence methods are considered for solving them, with a special emphasis on the possibilities of applying fuzzy logic in order to improve the performance of the proposed methods. In addition, it is shown how fuzzy logic can be incorporated into a model to make it more adequate for real-world applications. The first problem considered is the Risk-Constrained Cash-in-Transit Vehicle Routing Problem (RCTVRP), a special case of the vehicle routing problem (VRP). As in the classical VRP, the aim is to determine collection routes from one depot to a number of customers so as to minimize the overall travel distance (or cost). Additionally, in the RCTVRP a safety aspect is introduced through risk constraints on the routes: the RCTVRP concerns the issue of security during the transportation of cash or valuable goods (e.g. in the cash-in-transit industry). The other two problems studied in this dissertation belong to the class of location problems: the Load Balancing Problem (LOBA) and the Max-Min Diversity Problem (MMDP). The goal of the LOBA problem is to locate a fixed number of facilities such that the difference between the maximum and minimum number of customers served by a facility is minimized. The LOBA model is useful in cases where customers naturally choose the closest facility. The MMDP consists of selecting a subset of a fixed number of elements from a given set in such a way that the diversity among the selected elements is maximized. This problem also arises in real-world situations in a variety of fields, particularly the social and biological sciences. In order to solve the RCTVRP, a fuzzy GRASP (Greedy Randomized Adaptive Search Procedure) is hybridized with the Path Relinking (PR) methodology.
A carefully adjusted fuzzy modification incorporated into the proposed GRASP for the RCTVRP improved its performance. Moreover, in this dissertation a new PR structure is implemented which can also be used for other vehicle routing problems. To improve the algorithm's time complexity, a new data structure for the RCTVRP is incorporated. The proposed fuzzy GRASP with PR hybrid shows better computational performance than its non-fuzzy version. Furthermore, computational results on publicly available data sets indicate that the proposed algorithm outperforms all existing methods from the literature for solving the RCTVRP. For solving the LOBA problem two efficient hybrid metaheuristic methods are proposed: a combination of reduced and standard variable neighborhood search methods (RVNS-VNS) and a hybridization of an evolutionary algorithm and the VNS approach (EA-VNS). The proposed hybrid methods are first benchmarked against all other methods on existing test instances for the LOBA problem with up to 100 customers and potential suppliers. In order to test the effectiveness of the proposed methods, we modify several large-scale instances from the literature with up to 402 customers and potential suppliers. Exhaustive computational experiments show that the proposed hybrid methods quickly reach all known optimal solutions while providing solutions on large-scale problem instances in short CPU times. Regarding solution quality and running times, we conclude that the proposed EA-VNS approach outperforms the other considered methods for solving the LOBA problem. An EA approach is also proposed for solving the MMDP. Computational experiments on a smaller benchmark data set showed that the classic EA quickly reached all optimal solutions obtained previously by an exact solver. However, some of the larger MMDP instances were challenging for the classic EA.
Although researchers have established commonly used parameter settings for EAs that perform well on most problems, it is still challenging to choose adequate parameter values for a particular algorithm. One approach to overcome this is changing the parameter values during the algorithm run. As part of this dissertation, this problem was addressed by extending the evolutionary algorithm with a fuzzy rule formulated from EA experts' knowledge and experience; the implemented fuzzy rule changes the mutation parameter during the algorithm run. The results on the tested instances indicate that the proposed fuzzy approach is more suitable for solving the MMDP than the classic EA. For all three problems, on the smaller instances that CPLEX was able to solve, the optimal solutions obtained were used for comparison with the proposed methods, and all of the proposed methods reached these optimal solutions. Moreover, in this dissertation it has been shown that fuzzy logic is a successful tool in modeling the RCTVRP. In this problem the risk constraints are set by a risk threshold T on each route, so routes with risk larger than T are forbidden. However, in this dissertation the aim is to take into account the probability of being robbed along each route, instead of merely allowing solutions whose routes satisfy the risk constraints. A new fuzzy model for the RCTVRP is developed which takes into account the risk index of each route; solutions with lower risk indexes on their routes are considered superior. To achieve this, fuzzy numbers are used in the improved model. Moreover, two mixed integer programming formulations of the new fuzzy model are developed and presented in this dissertation. The introduced fuzzy model is compared with the model from the literature on an adequate example, and the advantages of the newly proposed fuzzy RCTVRP are demonstrated.
Computational experiments are performed, and the comparison of the two models shows that the newly presented approach leads to safer routes. URI: http://hdl.handle.net/123456789/4737 ## Files in this item: 1

tezaNinaRadojicic.pdf ( 1.665Mb ) -
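The abstract above describes a fuzzy rule that adapts the mutation parameter during the EA run but does not give its exact form. A minimal sketch of one such rule, assuming a single "IF diversity is low THEN mutation is high" rule with a triangular membership function; the rate bounds and the diversity measure are illustrative assumptions, not the dissertation's actual rule:

```python
def fuzzy_mutation_rate(diversity, low_rate=0.01, high_rate=0.2):
    """Illustrative single fuzzy rule: IF diversity is low THEN mutation is high.

    `diversity` is a normalized population-diversity measure in [0, 1].
    The triangular membership and the rate bounds are assumptions made
    for this sketch, not the dissertation's actual rule.
    """
    # Membership of "diversity is low": 1 at diversity 0, fading to 0 at 0.5
    mu_low = max(0.0, 1.0 - 2.0 * diversity)
    # Defuzzify by blending between the base rate and the boosted rate
    return low_rate + mu_low * (high_rate - low_rate)
```

Plugged into an EA loop, the rate would be recomputed each generation from the current population, so a stagnating run automatically receives stronger mutation.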
Kartelj, Aleksandar (Beograd, 2014)
Abstract: This work investigates the potential of improving the classification process through solving three classification-related problems: feature selection, feature weighting and parameter selection. All three problems are challenging and currently in the focus of scientific research in the field of machine learning. Each problem is solved by using a population-based metaheuristic called the electromagnetism-like method. This method is used for combinatorial and global optimization. It is inspired by the laws of attraction and repulsion among charged particles. Each particle is represented by a vector of real values. The solution of the problem of interest is then obtained by mapping these real-valued vectors to the feasible solution domain. Particles representing better solutions achieve a higher level of charge, which consequently produces a greater impact on other particles. The search process is performed by iterating the particle movement induced by the charges. Two key aspects are managed in implementing the methods: 1) the classification quality obtained after applying the optimization method and 2) the efficiency of the proposed methods in terms of time and space resources. All methods are equipped with problem-specific local search procedures which tend to increase the solution quality. The benefit of applying feature selection to the classification process is twofold. Firstly, the elimination of unnecessary features decreases the data set noise, which degrades the quality of the classification model. Secondly, the problem dimension is decreased and thus the efficiency is increased. The feature selection problem is solved very efficiently by the proposed method. The classification quality is in the majority of cases (instances) improved relative to the methods from the literature. For some of the instances, computational times are up to several hundred times smaller than those of the competing methods.
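The charge mechanism described above can be sketched concretely. In the standard electromagnetism-like formulation, each particle's charge is derived from its objective value so that the best particle gets charge 1 and worse particles exponentially less; the sketch below assumes a minimization problem and the usual exponential charge formula:

```python
import math

def em_charges(objective_values):
    """Charges for the electromagnetism-like method (minimization).

    Uses the standard exponential charge: the best particle receives
    charge 1.0, and particles with worse objective values receive
    exponentially smaller charges.
    """
    n = len(objective_values)
    best = min(objective_values)
    spread = sum(v - best for v in objective_values)
    if spread == 0.0:  # all particles equally good: uniform charges
        return [1.0] * n
    return [math.exp(-n * (v - best) / spread) for v in objective_values]
```

These charges then scale the pairwise attraction toward better particles and repulsion from worse ones that drives the iterated particle movement.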
Feature weighting and parameter selection share a similar underlying solution representation based on vectors of real values. Since the representation of charged particles is based on the same underlying domain, the transition from the particle to the solution domain is smooth. The quality of the method for feature weighting is demonstrated through the nearest-neighbors classification model. The method is tested on different collections of instances and then compared to several methods from the literature. In the majority of cases, the proposed method outperformed the comparison methods. Parameter selection has a great impact on classification quality. The proposed method for parameter selection is applied to the support vector machine, which has a complex parametric structure in terms of the number of parameters and the size of their domains. By using a heuristic initialization procedure, the detection of high-quality regions of parameter combinations is accelerated. Exhaustive tests are performed on instances varying in dimension and feature structure: homogeneous and heterogeneous. Single kernel learning is adopted for homogeneous instances, and multiple kernel learning for heterogeneous ones. The comparison with methods from the literature showed the superiority of the proposed method for single and multiple kernel learning based on the radial basis function; the method proved competitive in the other cases. All proposed methods improved the classification quality. Owing to the way the problems are solved, all three methods can be generalized and applied to a wide class of classification models and classification problems. URI: http://hdl.handle.net/123456789/4234 ## Files in this item: 1

phdAleksandarKartelj.pdf ( 2.121Mb ) -
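The parameter-selection task from the abstract above amounts to searching the (C, gamma) space of an RBF-kernel SVM. The dissertation's EM-based search with heuristic initialization is not reproduced here; as a stand-in, the sketch below shows a generic random log-scale search, where `evaluate` is a hypothetical caller-supplied callback (e.g. returning cross-validation accuracy) and the sampling ranges are assumptions:

```python
import random

def search_svm_params(evaluate, n_trials=50, seed=0):
    """Random log-scale search over (C, gamma) for an RBF-kernel SVM.

    `evaluate(C, gamma)` is a hypothetical callback returning a score to
    maximize; this generic search is only a stand-in for the
    dissertation's electromagnetism-like parameter selection.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        C = 10.0 ** rng.uniform(-2, 3)      # C sampled log-uniformly from [1e-2, 1e3]
        gamma = 10.0 ** rng.uniform(-4, 1)  # gamma sampled log-uniformly from [1e-4, 10]
        score = evaluate(C, gamma)
        if score > best_score:
            best_params, best_score = (C, gamma), score
    return best_params, best_score
```

Sampling on a log scale reflects the common observation that SVM performance varies with the order of magnitude of C and gamma rather than their raw values.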
Perović, Vladimir (Beograd, 2013)
Abstract: Although long-range intermolecular interactions (interactions acting at distances >5Å) play an important role in recognition and targeting between molecules in biological systems, no appropriate software package allows the use of this important property in the investigation of biologically active molecules. The multifunctional EIIP/ISM software, which is based on physical parameters determining long-range molecular properties, was developed in this thesis. This novel and unique platform allows (i) investigation of protein-protein and protein-small molecule interactions, (ii) analysis of the structure/function relationship of proteins, (iii) assessment of the biological effects of mutations in proteins, (iv) monitoring of the functional evolution of proteins, (v) “de novo” design of molecules with desired biological function and (vi) selection of candidate therapeutic molecules. Results of applying the EIIP/ISM platform to diverse problems (e.g. the evolution of influenza A viruses, assessment of the biological effects of mutations in the LPL protein, a risk factor for cardiovascular diseases, identification of therapeutic targets for HIV and influenza viruses, and virtual screening of molecular libraries for candidate antibiotics and anti-HIV drugs), presented in this thesis, confirm the applicability of this platform to a broad spectrum of problems in molecular biology, biomedicine and pharmacology. URI: http://hdl.handle.net/123456789/4230 ## Files in this item: 1

phdPerovic_Vladimir.pdf ( 11.95Mb )
