Peer Reviewed Journal via three different mandatory reviewing processes, since 2006, and, from September 2020, a fourth mandatory peer-editing has been added.
Prosody plays an important role in speech communication
between humans. Although several computer-assisted
language learning (CALL) systems with an utterance evaluation
function have been developed, the accuracy of their
prosody evaluation remains poor.
In the present paper, we develop new methods for
evaluating the rhythm and intonation of English sentences
uttered by Japanese learners. The novel features of our
study are as follows: (1) new prosodic features are added
to traditional features, and (2) word importance factors
are introduced in the calculation of the intonation score. The
word importance factor is automatically estimated using
the ordinary least squares method and is optimized based
on word clusters generated by a decision tree.
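
As a rough illustration of how importance factors could be estimated by ordinary least squares, the sketch below fits one weight per word cluster so that a weighted sum of per-cluster intonation features approximates the scores given by native speakers. The feature layout, the function names (`solve_linear`, `fit_importance`), and the toy numbers are all assumptions made for this example; the abstract does not specify the actual features or the clustering output.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without modifying the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_importance(features, scores):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    features: one row per utterance, one column per (hypothetical) word cluster.
    scores:   one native-speaker score per utterance.
    Returns one importance weight per cluster.
    """
    k = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(features, scores))
           for i in range(k)]
    return solve_linear(xtx, xty)


# Toy data (invented): 4 utterances, 2 word clusters.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
toy_scores = [0.5, -2.0, -1.5, -1.0]
weights = fit_importance(toy_features, toy_scores)
```

In this sketch each column stands for one word cluster, so words in the same cluster share a weight. Sharing weights in this way keeps the number of parameters small relative to the number of scored utterances.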
Experiments conducted herein reveal that the correlation coefficient
(a value of ±1.0 denotes the strongest correlation) between the
rhythm scores given by native speakers and those given by the
system was -0.55. In contrast, a conventional feature (pause insertion
error rate) gave a correlation coefficient of only -0.11. The
correlation coefficient between the intonation scores given
by native speakers and those given by the system was only -0.29.
However, the word importance factor with decision tree clustering
improved the correlation coefficient to 0.45.
In addition, we propose a method of integrating the
rhythm score with the intonation score, which improved
the correlation coefficient from 0.45 to 0.48 for evaluating
intonation.
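
The correlation coefficients reported above can be read with the help of a minimal sketch, assuming they are Pearson coefficients (the abstract does not name the variant). The function name `pearson` and the sample values are illustrative only. The sign shows the direction of the relationship, which is why both +1.0 and -1.0 count as the strongest correlation: a system score that falls as the human rating rises yields a negative coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (the covariance numerator).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the summed squared deviations.
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


# Invented example: human ratings versus two hypothetical system scores.
human = [1.0, 2.0, 3.0, 4.0]
system_same_direction = [2.0, 4.0, 6.0, 8.0]
system_opposite_direction = [8.0, 6.0, 4.0, 2.0]
r_pos = pearson(human, system_same_direction)
r_neg = pearson(human, system_opposite_direction)
```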