International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 177 - Issue 16
Published: Nov 2019
Authors: Francisco Carlos M. Souza, Alinne C. Correa Souza, Carolina Y. V. Watanabe, Patricia Pupin Mandrá, Alessandra Alaniz Macedo
Francisco Carlos M. Souza, Alinne C. Correa Souza, Carolina Y. V. Watanabe, Patricia Pupin Mandrá, Alessandra Alaniz Macedo. An Analysis of Visual Speech Features for Recognition of Non-articulatory Sounds using Machine Learning. International Journal of Computer Applications. 177, 16 (Nov 2019), 1-9. DOI=10.5120/ijca2019919393
@article{ 10.5120/ijca2019919393, author = { Francisco Carlos M. Souza, Alinne C. Correa Souza, Carolina Y. V. Watanabe, Patricia Pupin Mandrá, Alessandra Alaniz Macedo }, title = { An Analysis of Visual Speech Features for Recognition of Non-articulatory Sounds using Machine Learning }, journal = { International Journal of Computer Applications }, year = { 2019 }, volume = { 177 }, number = { 16 }, pages = { 1-9 }, doi = { 10.5120/ijca2019919393 }, publisher = { Foundation of Computer Science (FCS), NY, USA } }
%0 Journal Article %D 2019 %A Francisco Carlos M. Souza %A Alinne C. Correa Souza %A Carolina Y. V. Watanabe %A Patricia Pupin Mandrá %A Alessandra Alaniz Macedo %T An Analysis of Visual Speech Features for Recognition of Non-articulatory Sounds using Machine Learning %J International Journal of Computer Applications %V 177 %N 16 %P 1-9 %R 10.5120/ijca2019919393 %I Foundation of Computer Science (FCS), NY, USA
People with articulation and phonological disorders need exercises to produce speech sounds. Typically, such exercises start with the production of non-articulatory sounds in clinics or homes, environments that contain a wide variety of ambient sounds, i.e., noisy locations. Speech recognition systems treat these environmental sounds as background noise, which can lead to unsatisfactory speech recognition. This study assesses a system that aggregates visual features with audio features during the recognition of non-articulatory sounds in noisy environments. The Mel-Frequency Cepstrum Coefficients method and the Laplace transform were used to extract audio features, a Convolutional Neural Network to extract video features, a Support Vector Machine to recognize audio, and Long Short-Term Memory networks for video recognition. Experimental results report the accuracy, recall, and precision of the system on a set of 585 sounds. Overall, the results indicate that video information can complement audio recognition and assist non-articulatory sound recognition.
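The first step of the audio pipeline, Mel-Frequency Cepstrum Coefficient (MFCC) extraction, can be sketched in plain NumPy. This is a minimal illustrative sketch of the standard MFCC recipe (framing, power spectrum, Mel filterbank, log, DCT); the function name and default parameters here are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_coeffs=13):
    """Compute MFCCs for a mono signal: frame -> power spectrum
    -> Mel filterbank -> log -> type-II DCT."""
    frame_len, hop = n_fft, n_fft // 2
    # Frame the signal with 50% overlap (tail samples are dropped)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filters spaced evenly on the Mel scale
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filterbank energies, then DCT to decorrelate; keep n_coeffs
    log_energy = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs),
                                  (2 * n + 1) / (2.0 * n_mels)))
    return log_energy @ dct.T
```

The resulting matrix (one row of coefficients per frame) is the kind of audio feature vector that could then be fed to a classifier such as an SVM, as in the paper's pipeline.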