This paper aims to improve the accuracy of the non-negative matrix
factorization (NMF) approach to word learning and recognition of
spoken utterances. We propose and compare three coding methods to
alleviate the quantization errors introduced by the vector
quantization (VQ) of speech spectra: multi-codebooks, soft VQ and
adaptive VQ. We evaluate them on the task of spotting a vocabulary of
50 keywords in continuous speech. The error rate of the multi-codebook
method decreased as the number of codebooks increased, but accuracy
leveled off at around 5 to 10 codebooks. Soft VQ and adaptive VQ
offered a better trade-off between memory requirements and accuracy.
The best of the proposed methods reduces the error rate from the 1.9%
obtained with a single codebook to 1.2%. The coding methods and the
model framework may also prove useful for applications such as topic
discovery/detection and mining of sequential patterns.
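
To make the difference between hard VQ and soft VQ concrete, the following minimal sketch codes a single spectral frame against a toy codebook. It is an illustration under stated assumptions, not the formulation used in the paper: the function names, the choice of the k nearest codewords, the exponential weighting and the parameters k and beta are all hypothetical.

```python
import math


def sq_dist(x, c):
    """Squared Euclidean distance between a frame and a codeword."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def hard_vq(frame, codebook):
    """Hard VQ: one-hot activation of the single nearest codeword."""
    dists = [sq_dist(frame, c) for c in codebook]
    best = min(range(len(codebook)), key=dists.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(codebook))]


def soft_vq(frame, codebook, k=2, beta=1.0):
    """Soft VQ (assumed form): spread unit mass over the k nearest codewords.

    Weights decay exponentially with squared distance (an assumption made
    for this sketch) and are normalized to sum to 1.
    """
    dists = [sq_dist(frame, c) for c in codebook]
    nearest = sorted(range(len(codebook)), key=dists.__getitem__)[:k]
    d_min = dists[nearest[0]]  # subtract the minimum for numerical stability
    weights = {i: math.exp(-beta * (dists[i] - d_min)) for i in nearest}
    total = sum(weights.values())
    return [weights.get(i, 0.0) / total for i in range(len(codebook))]


# Toy 2-D codebook and a frame lying between the first two codewords.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
frame = [0.4, 0.1]

hard = hard_vq(frame, codebook)
soft = soft_vq(frame, codebook, k=2, beta=1.0)
print(hard)
print([round(w, 3) for w in soft])
```

With hard VQ the frame is assigned entirely to its nearest codeword, so the information that it lies almost as close to a second codeword is discarded. The soft coding keeps that information as fractional activations, which is the kind of quantization error the proposed coding methods are meant to reduce.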