Multi-SOM: an Algorithm for High-Dimensional,
Small Size Datasets
Shen Lu, Richard S. Segall
Because experiments in bioinformatics are time-consuming,
biological datasets are often small but high-dimensional.
According to probability theory, discovering knowledge from
a set of data requires a sufficient number of samples;
otherwise, the error bounds become too large to be useful.
For the SOM (Self-Organizing Map) algorithm, the initial map
is based on the training data. To avoid the bias caused by
insufficient training data, this paper presents an algorithm
called Multi-SOM. Multi-SOM builds a number of small
self-organizing maps instead of one large map.
Bayesian decision theory is used to make the final decision
among similar neurons on different maps. In this way, the
initial weight vectors are more likely to be truly random,
the map size becomes less of a concern, and errors tend to
average out. In our experiments on microarray datasets,
which are highly data-intensive and composed of gene-related
information, the precision of Multi-SOM is 10.58% greater
than that of SOM, and its recall is 11.07% greater. Thus,
the Multi-SOM algorithm is practical.
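
The idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the map topology, the training schedule, or the exact Bayesian decision rule. The 1-D maps, the decaying learning rate and neighbourhood width, the Laplace-smoothed class posterior per neuron, and the rule of summing log-posteriors across maps are all assumptions made here, as are the function names and the toy data.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bmu(weights, x):
    """Index of the best-matching unit (closest weight vector) for x."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))


def train_small_som(data, labels, classes, n_units, epochs, rng):
    """Train one small 1-D SOM; return its weights and P(class | unit)."""
    # Initial weights are drawn from the training data, as in the abstract.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = 0.05 + 0.5 * frac                  # assumed decaying learning rate
        sigma = max(0.3, n_units / 2 * frac)    # assumed shrinking neighbourhood
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            win = bmu(weights, x)
            for u in range(n_units):
                # Gaussian neighbourhood on the 1-D grid of units.
                h = math.exp(-((u - win) ** 2) / (2 * sigma ** 2))
                w = weights[u]
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    # Laplace-smoothed class counts per unit give a posterior estimate.
    counts = [{c: 1.0 for c in classes} for _ in range(n_units)]
    for x, y in zip(data, labels):
        counts[bmu(weights, x)][y] += 1.0
    posteriors = [
        {c: cnt[c] / sum(cnt.values()) for c in classes} for cnt in counts
    ]
    return weights, posteriors


def train_multi_som(data, labels, n_maps=5, n_units=4, epochs=50, seed=0):
    """Train several small maps, each with its own random initialisation."""
    classes = sorted(set(labels))
    maps = [
        train_small_som(data, labels, classes, n_units, epochs,
                        random.Random(seed + m))
        for m in range(n_maps)
    ]
    return classes, maps


def predict(model, x):
    """Combine the matching neurons of all maps; pick the most probable class."""
    classes, maps = model
    score = {c: 0.0 for c in classes}
    for weights, posteriors in maps:
        p = posteriors[bmu(weights, x)]
        for c in classes:
            score[c] += math.log(p[c])   # assumed rule: sum of log-posteriors
    return max(classes, key=lambda c: score[c])


def make_toy_data(rng, n_per_class=8, dim=20):
    """Hypothetical small, high-dimensional dataset with two classes."""
    data, labels = [], []
    for label, centre in (("a", 0.0), ("b", 5.0)):
        for _ in range(n_per_class):
            data.append([rng.gauss(centre, 0.5) for _ in range(dim)])
            labels.append(label)
    return data, labels
```

A call such as `predict(train_multi_som(data, labels), x)` classifies a new sample `x`. Because each small map starts from a different random draw of the training data, an unlucky initialisation on one map is outweighed by the others when the per-map posteriors are combined, which is the averaging effect the abstract refers to.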