Sunday, 7 January 2018

Article "On the impact of children's emotional speech on acoustic and language models" (4th)

Clarisa Livia
16611022


These results show how difficult it is to recognize children's utterances automatically, especially in the case of spontaneous and affective speech. The evaluations show that affect clearly influences the recognition of children's utterances. Emphatic and Angry speech are recognized best, even better than Neutral speech, although the baseline ASR system is trained on Neutral speech only. Angry speech is clearly articulated, and its acoustic realization is very similar to that of Neutral speech. This does not hold for Motherese (M) speech, which results in a high error rate. ASR performance can be improved by adapting the acoustic and linguistic models. For Motherese speech, adaptation of the acoustic model is hampered by the high variability and the dominance of a single speaker. However, the results can still be improved by adapting the linguistic model.

While ASR performance on speech in a specific emotional state can be improved by adapting to that particular state, performance on speech produced in the other states generally declines. An emotion classification module could therefore be used to select the word recognizer dynamically, depending on the emotional state observed. ASR performance is influenced by many factors. Notably, the four subsets of the test set differ in the vocabulary size actually used in each emotional state. Since this is spontaneous speech, this factor cannot be controlled, and it remains unclear how ASR performance is affected by the differing vocabularies. Perhaps more acoustically similar words occur in the Neutral state than in the two states Emphatic and Angry. Furthermore, the acoustic realization of Motherese (M) in the training data seems to differ too much from that in the test data, so the acoustic model cannot be adapted successfully.
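
The idea of choosing a recognizer according to the detected emotional state could be sketched roughly as follows. This is only an illustration, not the method used in the article: the names `make_stub_recognizer`, `toy_classifier`, and `recognize_with_selection` are hypothetical, the recognizers are stubs that return a label instead of a transcript, and the threshold rule in the classifier is an arbitrary placeholder. Only the state names Neutral, Emphatic, and Angry are taken from the text.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical types: both functions take a sequence of feature values.
Recognizer = Callable[[Sequence[float]], str]
Classifier = Callable[[Sequence[float]], str]


def make_stub_recognizer(state: str) -> Recognizer:
    """Return a stand-in for a recognizer adapted to one emotional state."""
    def recognize(features: Sequence[float]) -> str:
        # A real recognizer would decode words; the stub only reports which
        # state-specific recognizer was used and how many frames it received.
        return f"[{state} recognizer] {len(features)} frames"
    return recognize


def toy_classifier(features: Sequence[float]) -> str:
    """Placeholder emotion classifier (assumption: arbitrary mean threshold)."""
    mean = sum(features) / len(features) if features else 0.0
    return "Angry" if mean > 0.5 else "Neutral"


def recognize_with_selection(
    features: Sequence[float],
    classify: Classifier,
    recognizers: Dict[str, Recognizer],
    fallback: str = "Neutral",
) -> Tuple[str, str]:
    """Classify the emotional state first, then run the matching recognizer.

    If no recognizer exists for the detected state, the fallback one is used.
    """
    state = classify(features)
    recognizer = recognizers.get(state, recognizers[fallback])
    return state, recognizer(features)


# One stub recognizer per state named in the text.
recognizers = {s: make_stub_recognizer(s) for s in ("Neutral", "Emphatic", "Angry")}

if __name__ == "__main__":
    print(recognize_with_selection([0.9, 0.8], toy_classifier, recognizers))
```

Falling back to the Neutral recognizer is a design choice made for this sketch only, on the assumption that a baseline trained on Neutral speech is the safest default when the classifier returns an unknown state.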


