International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 177 - Issue 16
Published: Nov 2019
Authors: Francisco Carlos M. Souza, Alinne C. Correa Souza, Carolina Y. V. Watanabe, Patricia Pupin Mandrá, Alessandra Alaniz Macedo
DOI: 10.5120/ijca2019919393
Francisco Carlos M. Souza, Alinne C. Correa Souza, Carolina Y. V. Watanabe, Patricia Pupin Mandrá, Alessandra Alaniz Macedo. An Analysis of Visual Speech Features for Recognition of Non-articulatory Sounds using Machine Learning. International Journal of Computer Applications. 177, 16 (Nov 2019), 1-9. DOI=10.5120/ijca2019919393
@article{ 10.5120/ijca2019919393,
author = { Francisco Carlos M. Souza and Alinne C. Correa Souza and Carolina Y. V. Watanabe and Patricia Pupin Mandrá and Alessandra Alaniz Macedo },
title = { An Analysis of Visual Speech Features for Recognition of Non-articulatory Sounds using Machine Learning },
journal = { International Journal of Computer Applications },
year = { 2019 },
volume = { 177 },
number = { 16 },
pages = { 1--9 },
doi = { 10.5120/ijca2019919393 },
publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2019
%A Francisco Carlos M. Souza
%A Alinne C. Correa Souza
%A Carolina Y. V. Watanabe
%A Patricia Pupin Mandrá
%A Alessandra Alaniz Macedo
%T An Analysis of Visual Speech Features for Recognition of Non-articulatory Sounds using Machine Learning
%J International Journal of Computer Applications
%V 177
%N 16
%P 1-9
%R 10.5120/ijca2019919393
%I Foundation of Computer Science (FCS), NY, USA
People with articulation and phonological disorders need exercises to produce speech sounds. Essentially, these exercises start with the production of non-articulatory sounds in clinics or homes, where a huge variety of environmental sounds exists, i.e., in noisy locations. Speech recognition systems treat environmental sounds as background noise, which can lead to unsatisfactory speech recognition. This study aims to assess a system that supports the aggregation of visual features with audio features during the recognition of non-articulatory sounds in noisy environments. Mel-Frequency Cepstrum Coefficients and the Laplace transform were used to extract audio features, a Convolutional Neural Network to extract video features, a Support Vector Machine to recognize audio, and Long Short-Term Memory networks for video recognition. Experimental results regarding the accuracy, recall, and precision of the system on a set of 585 sounds are reported. Overall, the results indicate that video information can complement audio recognition and assist non-articulatory sound recognition.
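As a point of reference for the audio pipeline the abstract describes, the sketch below shows a minimal Mel-Frequency Cepstrum Coefficient (MFCC) extraction in NumPy/SciPy. This is not the paper's implementation; the frame length, hop size, FFT size, and filter counts are illustrative assumptions, and the paper's Laplace-transform features and SVM/LSTM classifiers are not covered here.

```python
import numpy as np
from scipy.fftpack import dct

# Mel scale conversions (standard formulas, not specific to the paper)
def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26, n_ceps=13):
    """Compute MFCCs for a 1-D signal; all parameter defaults are assumptions."""
    # Frame the signal and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 Hz .. Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        for k in range(bins[m - 1], bins[m]):          # rising slope
            fbank[m - 1, k] = (k - bins[m - 1]) / max(bins[m] - bins[m - 1], 1)
        for k in range(bins[m], bins[m + 1]):          # falling slope
            fbank[m - 1, k] = (bins[m + 1] - k) / max(bins[m + 1] - bins[m], 1)
    # Log mel energies, then DCT to decorrelate into cepstral coefficients
    mel_energy = np.maximum(power @ fbank.T, 1e-10)
    return dct(np.log(mel_energy), type=2, axis=1, norm="ortho")[:, :n_ceps]
```

With the default parameters, one second of 16 kHz audio yields a (98, 13) matrix of coefficients, one row per 25 ms frame; such frame-level features are what a classifier like an SVM would consume.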