Abstract:
|
This work investigates the potential of improving the classification process
by solving three classification-related problems: feature selection, feature
weighting and parameter selection. All three problems are challenging and are currently
a focus of scientific research in the field of machine learning. Each problem
is solved using a population-based metaheuristic called the electromagnetism-like
method. This method is used for combinatorial and global optimization. It is
inspired by laws of attraction and repulsion among charged particles. Each particle
is represented by a vector of real values. The solution of the problem of interest is
then obtained by mapping these real-valued vectors to the feasible solution domain.
Particles representing better solutions achieve a higher level of charge, which consequently
produces a greater impact on other particles. The search process is performed
by iterating the particle movement induced by the charges. In implementing the
methods, two key aspects are addressed: 1) the classification quality obtained after
applying the optimization method and 2) the efficiency of the proposed methods
in terms of time and space resources. All methods are equipped with
problem-specific local search procedures, which tend to increase the solution quality.
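The charge and movement mechanism described above can be sketched as follows. This is a minimal illustration that assumes the standard formulation of the electromagnetism-like method from the literature for a minimization problem, not necessarily the variant developed in this work. The function names (charges, total_force, move) and the box bounds lower and upper are hypothetical, and the local search step is omitted.

```python
import math
import random

def charges(values, dim):
    """Charge per particle: a lower (better) objective value gives a higher charge."""
    best = min(values)
    denom = sum(v - best for v in values)
    if denom == 0.0:
        return [1.0] * len(values)  # all particles are equally good
    return [math.exp(-dim * (v - best) / denom) for v in values]

def total_force(i, points, values, q):
    """Sum of forces on particle i: better particles attract it, worse ones repel it."""
    dim = len(points[i])
    force = [0.0] * dim
    for j, xj in enumerate(points):
        if j == i:
            continue
        diff = [a - b for a, b in zip(xj, points[i])]
        dist2 = sum(d * d for d in diff)
        if dist2 == 0.0:
            continue  # coincident particles exert no force
        scale = q[i] * q[j] / dist2
        sign = 1.0 if values[j] < values[i] else -1.0
        for k in range(dim):
            force[k] += sign * scale * diff[k]
    return force

def move(points, values, lower, upper, rng=None):
    """One movement iteration; the best particle is kept in place."""
    rng = rng or random.Random()
    q = charges(values, len(points[0]))
    best = min(range(len(points)), key=values.__getitem__)
    moved = []
    for i, x in enumerate(points):
        force = total_force(i, points, values, q)
        norm = math.sqrt(sum(c * c for c in force))
        if i == best or norm == 0.0:
            moved.append(list(x))
            continue
        step = rng.random()  # random step length in [0, 1)
        new_x = []
        for k, xk in enumerate(x):
            direction = force[k] / norm
            # Scale by the remaining room toward the bound so the particle stays feasible.
            room = (upper[k] - xk) if direction > 0 else (xk - lower[k])
            new_x.append(xk + step * direction * room)
        moved.append(new_x)
    return moved
```

In this sketch each particle stays inside the box [lower, upper] because the step is scaled by the distance to the bound in the direction of movement.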
The benefit of applying feature selection to the classification process is twofold.
First, the elimination of unnecessary features decreases the noise in the data set, which
would otherwise degrade the quality of the classification model. Second, the problem dimension
is decreased, and thus the efficiency is increased. The feature selection problem is solved very
efficiently by the proposed method. The classification quality is improved in the majority
of cases (instances) relative to the methods from the literature. For some of
the instances, computational times are up to several hundred times shorter than
those of the competing methods.
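As an illustration of how a real-valued particle might be mapped to a feature subset, the following sketch assumes a simple threshold rule: a feature is kept when its coordinate exceeds 0.5. The abstract does not state the actual mapping, so the rule, the threshold value and the function names are assumptions made only for this example.

```python
def select_features(particle, threshold=0.5):
    """Return the indices of features whose particle coordinate exceeds the threshold (assumed rule)."""
    return [k for k, value in enumerate(particle) if value > threshold]

def project(rows, selected):
    """Keep only the selected feature columns of each data row."""
    return [[row[k] for k in selected] for row in rows]
```

With such a mapping, the classifier is trained only on the projected rows, which is where the reduction in problem dimension comes from.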
The feature weighting and parameter selection problems share a similar underlying solution
representation, based on vectors of real values. Since the representation
of charged particles is based on the same underlying domain, the transition from
the particle to the solution domain behaves smoothly. The quality of the method for
feature weighting is demonstrated through the nearest neighbors classification model.
The method is tested on different collections of instances and then
compared with several methods from the literature. In the majority
of cases, the proposed method outperformed the comparison methods.
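To show how feature weights enter a nearest neighbors classifier, the following sketch uses a weighted Euclidean distance and a single nearest neighbor. It is an assumption-based illustration: the distance form, the choice of one neighbor and the function names are not taken from this work, and the weight vector stands in for a decoded particle.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance: each squared difference is scaled by its feature weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def predict_1nn(train_x, train_y, query, weights):
    """Label of the training point closest to the query under the weighted distance."""
    nearest = min(
        range(len(train_x)),
        key=lambda i: weighted_distance(train_x[i], query, weights),
    )
    return train_y[nearest]
```

A weight of zero removes a feature from the distance entirely, so feature selection can be seen as a special case of feature weighting.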
Parameter selection has a great impact on classification
quality. The proposed method for parameter selection is applied to the support
vector machine, which has a complex parametric structure in terms of the number of parameters
and the size of their domains. By using a heuristic initialization
procedure, the detection of high-quality regions for parameter combinations is accelerated.
Extensive tests are performed on instances that vary in
dimension and feature structure: homogeneous and heterogeneous. Single kernel
learning is adopted for homogeneous, and multiple kernel learning for heterogeneous
instances. The comparison with methods from the literature showed the superiority of the
proposed method when single and multiple kernel learning based on the radial basis
function is considered. The method proves to be competitive in the other cases.
All proposed methods improved the classification quality. Because of the way
the problems are solved, all three methods can be generalized and applied to a
wide class of classification models and/or classification problems. |