Primena pravila pridruživanja i metoda podržavajućih vektora za predviđanje T - ćelijskih epitopa

Title:	Primena pravila pridruživanja i metoda podržavajućih vektora za predviđanje T - ćelijskih epitopa
Author:	Jandrlić, Davorka
Abstract:	Application of association rule and support vector machine technique for T cell epitope prediction. Data mining is an interdisciplinary subfield of computer science that draws on several scientific disciplines, including database systems, statistics, machine learning, and artificial intelligence. The main task of data mining is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown, nontrivial, and interesting patterns. Rapid development in immunology, genomics, proteomics, molecular biology, and related fields has caused a large increase in biological data. Drawing conclusions from these data requires sophisticated computational analysis; without automatic methods for extracting information, investigating and analyzing them is almost impossible. Currently, one of the most active problems in immunoinformatics is T-cell epitope identification. Identifying T-cell epitopes, especially dominant epitopes that are widely represented in the population, is of immense relevance to vaccine development and to detecting immunological patterns characteristic of autoimmune diseases. Epitope-based vaccines are of great importance in combating infectious and chronic diseases and various types of cancer. Experimental methods for identifying T-cell epitopes are expensive and time-consuming, and they are not applicable to large-scale research (in particular, not to choosing an optimal group of epitopes for vaccines that cover the whole population, or for personalized vaccines). Computational and mathematical models for T-cell epitope prediction based on MHC-peptide binding are crucial for enabling the systematic investigation and identification of T-cell epitopes in large datasets and for complementing expensive and time-consuming experimentation [16].
T-cells (T-lymphocytes) recognize protein antigens only when they are degraded to peptide fragments and complexed with Major Histocompatibility Complex (MHC) molecules on the surface of antigen-presenting cells [1]. The binding of these peptides (potential epitopes) to MHC molecules and their presentation to T-cells is a crucial (and the most selective) step in both cellular and humoral adaptive immunity. Numerous methodologies currently exist for identifying these epitopes. The methods discussed in this PhD thesis are based exclusively on the binding of peptide sequences to MHC molecules. The thesis describes existing methodologies for T-cell epitope prediction, their shortcomings, and some of the available databases of experimentally determined linear T-cell epitopes. New models for T-cell epitope prediction are developed using data mining techniques, and an extensive analysis is carried out of whether disorder and hydropathy prediction methods can help in understanding epitope processing and presentation. Accurate computational prediction of T-cell epitopes, which is the aim of this thesis, can greatly expedite epitope screening by reducing cost and experimental effort. The thesis deals with predictive data mining tasks (classification and regression) and descriptive data mining tasks (clustering, association rules, and sequence analysis). The newly developed models, which are the main contribution of the dissertation, are comparable in performance to the best existing methods and, in some cases, better. The developed models are based on the support vector machine technique for classification and regression problems. A new approach to extracting the most important physicochemical properties that influence the classification of MHC-binding ligands is also presented. For that purpose, new clustering-based classification models are developed, based on the k-means clustering technique.
The second part of the thesis concerns establishing rules and associations for T-cell epitopes that belong to different protein structures. The task of this part of the research was to find out whether disorder and hydropathy prediction methods could help in understanding epitope processing and presentation. It presents the results of applying an association rule technique, together with a thorough analysis of a large protein dataset in which T-cell epitopes, protein structure, and hydropathy were determined computationally using publicly available tools. During the research for this thesis, a new extensible open-source software system was developed that supports bioinformatics research and has wide application in predicting various protein characteristics. Parts of this thesis are described in the works [71][82][45][42][43][44][72][73], which have been published or submitted for publication in several journals. The dissertation is organized as follows. Section 1 introduces the problem of identifying T-cell epitopes, the importance of mathematical and computational methods in this area, the importance of T-cell epitopes to the immune system, and the basics of how the immune system functions. Section 2 describes in detail the data mining techniques used in the thesis to develop the new models. Section 3 gives an overview of existing methods for predicting T-cell epitopes and explains how the existing models and methods work. It points out the shortcomings of existing methods, which motivated the development of new models for T-cell epitope prediction, and describes some of the publicly available databases of experimentally determined MHC-binding peptides and T-cell epitopes. Section 4 presents the newly developed models for epitope prediction.
The developed models include three new encoding schemes that represent peptide sequences as vectors, a form better suited as input to models based on data mining techniques. Section 5 reports the results of the new classification and regression models, which are compared with each other and with existing methods for T-cell epitope prediction. Section 6 presents the research results on the relationship between T-cell epitopes and ordered and disordered regions in proteins. This chapter gives summary results, which are shown in more detail in the published works [71][82][45][44]. Section 7 concludes the dissertation with a discussion of the potential significance of the results obtained and some directions for future work.
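As a rough illustration of what representing a peptide sequence as a vector can look like, the sketch below uses plain sparse (one-hot) encoding over the 20 standard amino acids. This is a generic, widely used scheme chosen here as an assumption; the abstract does not specify the thesis's three encoding schemes, and the function name `one_hot_encode` and the example peptide are illustrative only.

```python
# Sketch only: generic sparse (one-hot) peptide encoding.
# Assumption: this is NOT one of the three schemes developed in the thesis,
# which the abstract does not describe.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide: str) -> list[int]:
    """Return a flat 0/1 vector of length 20 * len(peptide)."""
    vector: list[int] = []
    for residue in peptide.upper():
        if residue not in INDEX:
            raise ValueError(f"unknown residue: {residue!r}")
        row = [0] * len(AMINO_ACIDS)
        row[INDEX[residue]] = 1  # mark the position of this residue
        vector.extend(row)
    return vector


if __name__ == "__main__":
    encoded = one_hot_encode("SIINFEKL")  # example 8-residue peptide
    print(len(encoded), sum(encoded))
```

A fixed-length vector of this kind could then serve as input to a classifier or regressor such as a support vector machine, provided that all peptides in the dataset have the same length.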
URI:	http://hdl.handle.net/123456789/4457
Date:	2016