Abstract:
|
Proteins are linear biological polymers composed of amino acids; their
structure and function are determined by the number and order of those amino
acids. Protein structure has three levels: primary, secondary, and
tertiary (three-dimensional, 3D). Since the experimental determination of
protein 3D structure is expensive and time-consuming, it is important to develop
predictors that infer properties of the 3D structure, such as the 3D structure of
the protein backbone, from the amino acid sequence (primary structure). The 3D structure
of the backbone can be described using prototypes of local protein structure, i.e.,
prototypes of protein fragments that are a few amino acids long. A set of N local
structure prototypes, each L amino acids long, forms a library of local protein
structures, also called a structural alphabet. The subject of this dissertation is the development
of models for the prediction of structural alphabet prototypes for a given amino
acid sequence using different data mining approaches. Protein Blocks (PBs), one of
the best-known structural alphabets, was used in one part of the doctoral
research. The PB structural alphabet consists of 16 prototypes that are defined using
fragments of 5 consecutive amino acids. The amino acid sequence is combined
with structural properties of the protein that can be determined from the amino
acid sequence (occurrence of repeats in the amino acid sequence) and with the outputs of
predictors of protein structural properties (backbone angles, secondary structures,
occurrence of disordered regions, accessible surface area of amino acids) as
input to the prediction model of structural alphabet prototypes. Besides the
development of models for predicting the prototypes of an existing structural alphabet,
the feasibility of developing new structural alphabets is investigated
by applying the TwoStep clustering algorithm and constructing models for the
prediction of prototypes of the new structural alphabets. Several structural
alphabets, which differ in the length and number of prototypes, have
been constructed and analyzed. Fragments of a large number of proteins with
experimentally determined structures were used to construct the new structural
alphabets. |