Computer Science
Recent Submissions
-
Protić, Danijela (Beograd , 2023)[more][less]
Abstract: Anomaly detection is the recognition of suspicious computer network behavior by comparing unknown network traffic to a statistical model of normal network behavior. Binary classifiers based on supervised machine learning are good candidates for normality detection. This thesis presents five standard binary classifiers: the k-nearest neighbors, weighted k-nearest neighbors, decision trees, support vector machines and feedforward neural network. The main problem with supervised learning is that it takes a lot of data to train high-precision classifiers. To reduce the training time with minimal degradation of the accuracy of the models, a two-phase pre-processing step is performed. In the first phase, numeric attributes are selected to reduce the dataset. The second phase is a novel normalization method based on hyperbolic the tangent function and the damping strategy of the Levenberg-Marquardt algorithm. The Kyoto 2006+ dataset, the only publicly available data set of real-world network traffic intended solely for anomaly detection research in computer networks, was used to demonstrate the positive impact of such pre-processing on classifier training time and accuracy. Of all the selected classifiers, the feedforward neural network has the highest processing speed, while the weighted k-nearest neighbor model proved to be the most accurate. The assumption is that when the classifiers work concurrently, they should detect either an anomaly or normal network traffic, which occasionally is not the case, resulting in different decision about the anomaly, i.e. a conflict arises. The conflicting decision detector performs a logical exclusive OR (XOR) operation on the outputs of the classifiers. If both classifiers simultaneously detected an anomaly or recognized traffic as normal, their decision was no conflict had occurred. Otherwise a conflict is detected. The number of conflicts detected provides an opportunity for additional detection of changes in computer network behavior. URI: http://hdl.handle.net/123456789/5599 Files in this item: 1
Danijela Protic - Doktorska Disertacija.pdf ( 3.143Mb ) -
Radosavljević, Jovan (Beograd , 2023)[more][less]
Abstract: Graph G = (V,E) is an ordered pair of set of nodes V and branches E. Order graph G is the number of nodes |V |, and its size is the number of branches |E|. Knots u, v ∈ V are adjacent if there is a branch uv ∈ E between them. Distance dist(u, v) nodes u and v G is the length of the shortest path from u to v. The diameter of the graph G is the largest distance dist(u, v) let two nodes in, v. They are discussed in the dissertation graphs of diameter 2. Intuitively, the notion that graphs are dia- meters 2 simple structures; however, they are known to be asymptotically close all graphs of diameter 2. That is why a narrower class is interesting — class D2C of critical graphs of diameter 2, i.e. graphs where the removal of any branches leads to an increase in diameter. In addition, a narrower class of pri- mitive D2C (PD2C) graphs, i.e. D2C graphs that do not have two nodes with the same set of neighbors. In the introductory chapter 2, the basic concepts, algorithms and dings used in the dissertation. They are presented in the following chapters original results regarding diameter graphs 2. Chapter 3 describes the procedure for obtaining a list of D2C graphs of order up to 13. With built-in parallelization, the creation of a list of D2C graphs of order up to 13 it lasted a month. This was a step forward, because previously there was a spi- around all graphs of diameter 2 lines up to 10. The obtained results were used for testing several known hypotheses about graphs of diameter 2. In chapter 4 it is shown that for every m ⩾ 3 a D2C graph containing cli- a ku of size m must have at least 2m nodes. At the same time, with accuracy up to isomorphism, there is exactly one graph of size 2m that contains a clique of characters m. Chapter 5 discusses PD2C graphs with the smallest number of branches. From list of all PD2C graphs of order n ⩽ 13 are selected PD2C graphs of size at most 2n − 4. Only three of the isolated graphs are of size 2n − 5, which is in accordance with the statement of the Erdes-Renji theorem about the lower bound for the size graphs of diameter 2 that do not contain a node adjacent to all other nodes (that limit is 2n − 5). PD2C graphs of size 2n − 4 rows up to 13 sorted are in three groups: • The first group belongs to the Z family, defined in the dissertation, which for each n ⩾ 6 contains exactly one PD2C graph of order n of size 2n − 4. • The second group consists of seven Hamiltonian PD2C graphs of order at most 9 of size 2n−4. In the dissertation it was proved that there is no such Hamil- tone graph of order greater than 11, i.e. that the seven graphs found are the only ones Hamiltonian PD2C graphs of size 2n − 4. • The third group consists of a unique graph that does not belong to any of the first two groups. Based on these results, the hypothesis was formulated that all PD2C graphs re- that n ⩾ 10 and sizes 2n − 4 belong to the family Z. Keywords: graphs, critical graphs of diameter 2, primitive graph- You Scientific field: Computing and informatics Narrower scientific field: Graph theory UDC number: 004.415.5(519.1 URI: http://hdl.handle.net/123456789/5594 Files in this item: 1
disertacijaJovanRadosavljevic.pdf ( 746.0Kb ) -
Carić, Marko (Beograd , 2023)[more][less]
Abstract: This dissertation discusses the problem of calculating the number of equivalent classes of a Boolean function. The difficulty of determining the number of equivalence classes increases sharply with the number of variables sn. The motivation for choosing this topic lies in the fact that concrete numbers have been known so far only for relatively small values of n, although the problem itself has long been theoretically solved. Let G be the permutation group of the set Bn={0,1}n. Effect of Gonscalar group, Bn→B1, that is, vector invertible Boolean functions, Bn→Bn. Two scalar Boolean functions f(k)ig(k), defined by Bn, are considered equivalent. ∈G forever ∈Bnf(k )=g(σ(k)) holds. Two vector invertible Boolean functions f(k) and g(k) are considered equivalent in relation to the group G, i.e. ))).The equivalence relation∼decomposes the sets of all Boolean functions into equivalence classes.The equivalence of Boolean functions confers significant applications in the logical synthesis of combinatorial circuits and cryptography, especially in connection with the design of S-boxes. Let Un(G) and Vn(G) denote the number of scalar equivalence classes, ie. vector invertible Boolean functions of n variables with respect to the group G. The numbers Un(G) and Vn(G) can be calculated relatively simply if the cycle index from the group G to the G group is known. Induced by the group Snpermutations of coordinate elements step = (k1,k2,...,kn)∈Bn, •group Gn, induced by permutations and additions of coordinates, •group GLnlinear invertible transformations of elements of vector space Bn, and •group AGLof invertible transformations of elements Bn. If the permutation σ∈Ghasikcycles of lengthk1, its cycle structure is(σ)=(i1,i2,...).The cyclic index of the groupGisgeneratrix ZG(f1,f2,...)=1 |G|σ∈Gk1 fik k structure cycles σene structures of cycles σfraconstructionsGsides ∗pressuresGpermutations of the red group are known, but the cycle indexes itself, i.e. numbers Un(G) and Vn(G), are practically calculated only for relatively small values, for example n10. The dissertation presents original results in the field of enumeration of equivalence classes of Boolean functions in relation to these four groups of soft transformations. A similar expression is derived for all four groups of soft transformations for the cycle index in the form of the sumover partition softhennumbern. Based on the precyclical index much more. An overview of known results for relatively small and new results in the thesis for larger ones are shown in the following table: Number\GSnGnGLnAGLn Un(G)11→3310→328→3110→31 Vn(G)6→307→276→266, participates in direct effect, by special effect 26 latingthenumberofequivalenceclassesthatdoesnotuseacycleindex is shown and described in the third paper from the introductory chapter. These second parts of the dissertation refer to monotone Boolean functions — scalar Boolean functions that satisfy the condition of monotonicity (from which follows f(k)f(i)). Letrn, i.e. monotoneBooleanfunctionsofnvariables. The difficulty of calculating the number rn increases rapidly with n, so that until recently the last term of the sequence to be calculated was r7. The procedure described in the dissertation is based on Frobenius' theorem, on the basis of which the number r8 was determined. The dissertation consists of the first introductory chapter and the following three chapters. In the second chapter, theoretical concepts related to the material from chapters 3 and 4 are introduced, and they relate to discrete mathematics, combinatorics and the index of the cycle considered by the soft cycle of the 3rd form for describing the 3rd cycle. indices for the four considered groups of permutations, as well as the numbers Un(G) and Vn(G) of the equivalent class of the Boolean function with respect to these groups. First, the common improvements for all four groups are discussed, and then the specific accelerations related to individual groups are published by group. ter. In Chapter 4, the problem of finding the number of equivalence classes of a monotone Boolean function is solved. First, a general expression for calculating a number is given based on the Frobenius theorem of the form of the sum (by the soft number partition of the number) of the number from the fixed part of the graph corresponding to the point corresponding to the point. In different partitions, different ways of calculating the number of non-fixed points for n8 are shown. The procedure based on which the number r8 was calculated, which also represents the original contribution of this dissertation, is shown in the first paper from the list from pp. 1 to 3. practically at the same time the obtained result described in the dissertation. URI: http://hdl.handle.net/123456789/5571 Files in this item: 1
MarkoCaricDisertacija.pdf ( 1.467Mb ) -
Jelović, Ana (Beograd , 2022)[more][less]
Abstract: n the first part of this dissertation different repeat types are defined as well as repeats that satisfy motif masks. A method for precise repeat finding in input sequences of arbitrary length has been described. As the input sequences can be very long, the number of found repeats can also be large. For that reason it is important that the method also includes filtering found repeats based on the expected number of their occurrences. The method was first applied to protein sequences in which experimentally confirmed T-cell epitopes from the IEDB database were registered. Association rules were applied to the found repeats in order to construct a model that would enable the prediction of the positions of T-cell epitopes in protein sequences. In this way, it would indicate to researchers a region in the protein sequence where an epitope can be expected with high confidence. In the case of T-cell epitopes, a large number of rules with high confidence was found. These rules can be considered as reliable predictors of the position of T-cell epitopes within the protein sequences. Based on the results found, association rules were formed that characterize the epitopes and the repeats associated with them in more detail. As a large number of results were found, only their part is presented in this disser- tation. On the basis of the strings that determine the repeat, a motif mask that the repeat needs to satisfy was searched for. The entire procedure was applied to both direct non-complementary repeats and indirect non-complementary repeats. With similar results, the entire procedure was applied to B-cell epitopes on data from the IEDB database. Data on experimentally confirmed short linear motifs were taken from the ELM database. In protein sequences where short linear motifs were registered, repeats were searched for and association rules were applied to them. The rules with high confidence have been singled out in particular. On the basis of the results found, motif masks that repeats with high confidence would satisfy were searched for. URI: http://hdl.handle.net/123456789/5442 Files in this item: 1
Ana_Jelovic_tekst_doktorata.pdf ( 6.127Mb ) -
Jovanović, Jasmina (Beograd , 2022)[more][less]
Abstract: The analysis of biological sequence similarity between different species is significant in identifying functional, structural or evolutionary relationships among the species. Biological sequence similarity and analysis of newly discovered nucleotide and amino acid sequences are demanding tasks in bioinformatics. As biological data is growing exponentially, new and innovative algorithms are needed to be constantly developed to get faster and more effective data processing. The challenge in sequence similarity analysis algorithms is that sequence does not always have obvious features and the dimension of sequence features may be very high for applying regular feature selection methods on sequences. It is important to have a simple and effective algorithm for determining biological sequence relationships. This thesis proposes two new methods for sequence transformation in feature vectors that takes into consideration statistically significant repetitive parts of analyzed sequences, as well as includes different approaches for determination of nucleotide sequence similarity and sequence classification for predicting taxonomy groups of biological sequence data. The first method is based on information theory and fact that both position and frequency of repeated sequences are not expected to occur with the identical presence in a random sequence of the same length. The second method includes building signatures of biological sequences and profiles of taxonomic classes based on repetitive parts of sequences and distances between these repeats. Proposed methods have been validated on multiple data sets and compared with results obtained using different well known and accepted methods in this field like BLAST, Clustal Omega and methods based on k-mers. Resulted precision for proposed methods is close to values provided for existing methods for the majority of tested data-sets, and time performance depends strictly to used infrastructure and sequence type. Methods provide results that are comparable with other commonly used methods focused on resolving the same problem, taking into consideration statistically significant repetitive parts of sequences with different characteristics. URI: http://hdl.handle.net/123456789/5440 Files in this item: 1
JasminaJovanovic.pdf ( 3.984Mb )