KEYWORDS: Speech recognition, Signal to noise ratio, Linear filtering, Electronic filtering, Gaussian filters, Interference (communication), Probability theory, Neural networks
Despite dramatic recent advances in speech recognition technology, speech recognizers still perform much worse than humans. The gap between humans and machines is most dramatic when variable amounts and types of filtering and noise are present during testing. For example, humans readily understand speech that is low-pass filtered below 3 kHz or high-pass filtered above 1 kHz, whereas machines trained with wide-band speech degrade dramatically under these conditions. An approach is presented that compensates for variable, unknown sharp filtering and noise. It uses mel-filter-bank magnitudes as input features, estimates the signal-to-noise ratio (SNR) in each filter, and applies missing feature theory to dynamically modify the probability computations performed by Gaussian mixture or radial basis function (RBF) neural network classifiers embedded within hidden Markov model (HMM) recognizers. The approach was successfully demonstrated on a talker-independent digit recognition task, where recognition accuracy across many conditions rose from below 50% to above 95%. These promising results suggest future work on estimating SNRs dynamically and on exploring the dynamics of human adaptation to channel and noise variability.
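The abstract does not spell out the exact probability computation, so the following is only a minimal sketch of one common form of missing feature theory (marginalization), under stated assumptions: each frame is a vector of mel-filter-bank magnitudes, a per-filter noise magnitude estimate is already available, a filter counts as reliable when its estimated SNR exceeds a threshold, and the classifier is a diagonal-covariance Gaussian mixture. The names `filter_snr_mask` and `marginal_gmm_loglik` and the 0 dB threshold are hypothetical choices for illustration.

```python
import numpy as np


def filter_snr_mask(noisy_mag, noise_mag, snr_threshold_db=0.0):
    """Flag mel-filter channels as reliable when estimated local SNR exceeds a threshold.

    noisy_mag, noise_mag: arrays of shape (n_frames, n_filters) holding filter-bank
    magnitudes of the noisy speech and an (assumed given) noise estimate.
    The threshold value is a hypothetical choice, not taken from the paper.
    """
    noisy_pow = noisy_mag ** 2
    noise_pow = noise_mag ** 2
    # Power-subtraction estimate of clean-speech power, floored to stay positive.
    speech_pow = np.maximum(noisy_pow - noise_pow, 1e-10)
    snr_db = 10.0 * np.log10(speech_pow / np.maximum(noise_pow, 1e-10))
    return snr_db > snr_threshold_db


def marginal_gmm_loglik(x, mask, weights, means, variances):
    """Log-likelihood of one frame under a diagonal-covariance GMM,
    integrating out the dimensions flagged unreliable by `mask`.

    x, mask: shape (d,); weights: (k,); means, variances: (k, d).
    """
    xr = x[mask]
    mu = means[:, mask]
    var = variances[:, mask]
    # With diagonal covariances, marginalizing a Gaussian over missing
    # dimensions simply drops those dimensions from the per-component density.
    log_comp = -0.5 * np.sum(np.log(2 * np.pi * var) + (xr - mu) ** 2 / var, axis=1)
    log_weighted = np.log(weights) + log_comp
    # Log-sum-exp over mixture components for numerical stability.
    m = np.max(log_weighted)
    return m + np.log(np.sum(np.exp(log_weighted - m)))
```

Because masked filters are simply left out of each component's density, a low-pass-filtered frame whose upper channels are flagged unreliable can still be scored against models trained on wide-band speech. The RBF classifiers mentioned in the abstract would need an analogous adjustment that this sketch does not cover.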
Experiments demonstrate that sigmoid multilayer perceptron (MLP) networks provide slightly better risk prediction than conventional logistic regression and Bayesian models when used to predict the risk of death from a database of 41,385 patients who underwent coronary artery bypass operations in 1993. MLP networks with no hidden layers (single-layer MLPs), one hidden layer (two-layer MLPs), and two hidden layers (three-layer MLPs) were trained using stochastic gradient descent with early stopping. All prediction techniques used the same input features and were evaluated by training on 20,698 patients and testing on a separate 20,687 patients. Receiver operating characteristic (ROC) curve areas for predicting mortality were roughly 75% for all classifiers. Risk stratification, or the accuracy of posterior probability prediction, was slightly better with three-layer MLP networks, which did not inflate risk for high-risk patients. Simple approaches were developed to calculate effective odds ratios for MLP networks and to generate confidence intervals for MLP risk predictions using an auxiliary "confidence MLP." The confidence MLP is trained to reproduce confidence intervals that were generated during training from the outputs of 50 MLP networks trained on different bootstrap samples.
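The bootstrap step behind the confidence intervals can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: a logistic model trained by batch gradient descent stands in for each MLP, intervals are taken as simple percentiles of the ensemble's predicted risks, and the "confidence MLP" is replaced by a hypothetical least-squares linear model (`fit_confidence_model`) that learns to map inputs to the interval bounds. All function names and hyperparameters here are illustrative.

```python
import numpy as np


def train_logistic(X, y, lr=0.1, epochs=200):
    """Stand-in for one MLP (assumption): logistic model fit by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y  # gradient of cross-entropy w.r.t. the logit
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def bootstrap_intervals(X, y, X_query, n_models=50, alpha=0.05, seed=0):
    """Fit n_models on bootstrap resamples; return the ensemble mean risk and a
    percentile interval of predicted risk for each query row."""
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty((n_models, len(X_query)))
    for m in range(n_models):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        w, b = train_logistic(X[idx], y[idx])
        preds[m] = 1.0 / (1.0 + np.exp(-(X_query @ w + b)))
    lo, hi = np.percentile(preds, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return preds.mean(axis=0), lo, hi


def fit_confidence_model(X_query, lo, hi):
    """Hypothetical stand-in for the 'confidence MLP': a least-squares linear map
    from inputs to the bootstrap interval bounds (lo, hi)."""
    A = np.hstack([X_query, np.ones((len(X_query), 1))])
    coef, *_ = np.linalg.lstsq(A, np.column_stack([lo, hi]), rcond=None)
    # Returns an (n, 2) array of predicted [lower, upper] bounds for new inputs.
    return lambda Xn: np.hstack([Xn, np.ones((len(Xn), 1))]) @ coef
```

Replacing the linear stand-in with a small MLP would follow the abstract more closely. Either way, the costly ensemble of bootstrap models is needed only at training time; afterwards a single auxiliary model produces an interval for each new patient.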
Conference Committee Involvement (3)
Applications of Neural Networks and Machine Learning in Image Processing X
18 January 2006 | San Jose, California, United States
Applications of Neural Networks and Machine Learning in Image Processing IX
19 January 2005 | San Jose, California, United States
Applications of Artificial Neural Networks in Image Processing VIII