Data mining on protein sequences: n-gram analysis of ordered and disordered protein regions


dc.contributor.advisor	Mitić, Nenad
dc.contributor.author	Alshafah, Samira
dc.date.accessioned	2018-12-13T16:41:54Z
dc.date.available	2018-12-13T16:41:54Z
dc.date.issued	2018
dc.identifier.uri	http://hdl.handle.net/123456789/4746
dc.description.abstract	Proteins with intrinsically disordered regions are involved in a large number of key cell processes, including signaling, transcription, and chromatin remodeling. On the other hand, such proteins have been observed in people suffering from neurological and cardiovascular diseases, as well as various malignancies. Experimentally determining disordered regions in proteins is a very expensive and long-term process. As a consequence, various computer programs for predicting the positions of disordered regions in proteins have been developed and are constantly being improved. In this thesis, a new method for determining amino acid sequences that characterize ordered/disordered regions is presented. The material used in the research includes 4076 viruses with more than 190000 proteins. The proposed method is based on defining a correspondence between the characteristics of n-grams (including both repeats and palindromic sequences) and their belonging to ordered/disordered protein regions. The positions of ordered/disordered regions are predicted using three different predictors. The features of the repetitive strings used in the research include mole fractions, fractional differences, and z-values. In addition, the data mining techniques of association rules and classification were applied to both repeats and palindromes. The results obtained by all techniques show a high level of agreement for short sequences (length less than 6), and the level of agreement rises to its maximum as the sequence length increases. The high reliability of the results obtained by the data mining techniques shows that there are n-grams, both repeating sequences and palindromes, that uniquely characterize the disordered/ordered regions of proteins. The obtained results were verified by comparison with results based on n-grams from the DisProt database, which contains the positions of experimentally verified disordered regions of proteins. The results can be used both for the fast localization of disordered/ordered regions in proteins and for further improving existing programs for their prediction.	en_US
dc.description.provenance	Submitted by Slavisha Milisavljevic (slavisha) on 2018-12-13T16:41:54Z No. of bitstreams: 1 ThesisSamira_Alshafah.pdf: 3106746 bytes, checksum: 1b8ab175aa8f27e8329b10d92a26ee16 (MD5)	en
dc.description.provenance	Made available in DSpace on 2018-12-13T16:41:54Z (GMT). No. of bitstreams: 1 ThesisSamira_Alshafah.pdf: 3106746 bytes, checksum: 1b8ab175aa8f27e8329b10d92a26ee16 (MD5) Previous issue date: 2018	en
dc.language.iso	en	en_US
dc.publisher	Beograd	en_US
dc.title	Data mining on protein sequences: n-gram analysis of ordered and disordered protein regions	en_US
mf.author.birth-date	1978-12-29
mf.author.birth-place	Zawia	en_US
mf.author.birth-country	Libya	en_US
mf.author.residence-state	Libya	en_US
mf.author.citizenship	Libya	en_US
mf.author.nationality	Libya	en_US
mf.subject.area	Computer Science	en_US
mf.subject.keywords	n-gram, data mining, ordered/disordered regions, association rules, proteins	en_US
mf.subject.subarea	Data Mining	en_US
mf.contributor.committee	Malkov, Saša
mf.contributor.committee	Beljanski, Miloš
mf.university.faculty	Mathematics faculty	en_US
mf.document.pages	111	en_US
mf.document.location	Beograd	en_US
mf.document.genealogy-project	No	en_US
mf.university	Belgrade	en_US
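
Illustrative note on the abstract: the core idea of relating n-grams (including palindromes) to ordered or disordered regions can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method used in the thesis. The per-residue label string (`O` for ordered, `D` for disordered), the function names, and the toy sequence are all hypothetical. The relative frequency computed here is only a simple stand-in, since the record does not define mole fractions, fractional differences, or z-values.

```python
from collections import Counter


def is_palindrome(s):
    """True if the string reads the same in both directions."""
    return s == s[::-1]


def count_by_region(seq, labels, n):
    """Count overlapping n-grams that lie entirely inside one region type.

    seq    -- amino acid sequence (one letter per residue)
    labels -- string of the same length; 'O' = ordered, 'D' = disordered
              (hypothetical encoding, e.g. taken from a predictor's output)
    """
    if len(seq) != len(labels):
        raise ValueError("seq and labels must have the same length")
    counts = {"O": Counter(), "D": Counter()}
    for i in range(len(seq) - n + 1):
        window = labels[i:i + n]
        # Skip n-grams that straddle a region boundary or carry unknown labels.
        if window[0] in counts and window == window[0] * n:
            counts[window[0]][seq[i:i + n]] += 1
    return counts


def relative_frequencies(counter):
    """Share of each n-gram among all n-grams counted in that region type."""
    total = sum(counter.values())
    return {gram: c / total for gram, c in counter.items()} if total else {}


# Toy example (made-up sequence and labels, for illustration only).
seq = "MKKAKKSSGSS"
labels = "OOOOOODDDDD"
counts = count_by_region(seq, labels, 3)
palindromes = {region: sorted(g for g in c if is_palindrome(g))
               for region, c in counts.items()}
print(counts)
print(palindromes)
```

Comparing the two relative-frequency tables for the same n-gram is one simple way to ask whether that n-gram is more typical of ordered or of disordered regions.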

Files in this item

Files	Size	Format	View
ThesisSamira_Alshafah.pdf	3.106Mb	PDF	View/Open

This item appears in the following Collection(s)

Computer Science
