Dynamic Automatic Noisy Speech Recognition System (DANSR)

  • In this thesis we studied and investigated a very common but a long existing noise problem and we provided a solution to this problem. The task is to deal with different types of noise that occur simultaneously and which we call hybrid. Although there are individual solutions for specific types one cannot simply combine them because each solution affects the whole speech. We developed an automatic speech recognition system DANSR ( Dynamic Automatic Noisy Speech Recognition System) for hybrid noisy environmental noise. For this we had to study all of speech starting from the production of sounds until their recognition. Central elements are the feature vectors on which pay much attention. As an additional effect we worked on the production of quantities for psychoacoustic speech elements. The thesis has four parts: 1) The first part we give an introduction. The chapter 2 and 3 give an overview over speech generation and recognition when machines are used. Also noise is considered. 2) In the second part we describe our general system for speech recognition in a noisy environment. This is contained in the chapters 4-10. In chapter 4 we deal with data preparation. Chapter 5 is concerned with very strong noise and its modeling using Poisson distribution. In the chapters 5-8 we deal with parameter based modeling. Chapter 7 is concerned with autoregressive methods in relation to the vocal tract. In the chapters 8 and 9 we discuss linear prediction and its parameters. Chapter 9 is also concerned with quadratic errors, the decomposition into sub-bands and the use of Kalman filters for non-stationary colored noise in chapter 10. There one finds classical approaches as long we have used and modified them. This includes covariance mehods, the method of Burg and others. 3) The third part deals firstly with psychoacoustic questions. We look at quantitative magnitudes that describe them. This has serious consequences for the perception models. For hearing we use different scales and filters. In the center of the chapters 12 and 13 one finds the features and their extraction. The fearures are the only elements that contain information for further use. We consider here Cepstrum features and Mel frequency cepstral coefficients(MFCC), shift invariant local trigonometric transformed (SILTT), linear predictive coefficients (LPC), linear predictive cepstral coefficients (LPCC), perceptual linear predictive (PLP) cepstral coefficients. In chapter 13 we present our extraction methods in DANSR and how they use window techniques And discrete cosine transform (DCT-IV) as well as their inverses. 4) The fourth part considers classification and the ultimate speech recognition. Here we use the hidden Markov model (HMM) for describing the speech process and the Gaussian mixture model (GMM) for the acoustic modelling. For the recognition we use forward algorithm, the Viterbi search and the Baum-Welch algorithm. We also draw the connection to dynamic time warping (DTW). In the rest we show experimental results and conclusions.

Download full text files

Export metadata

Metadaten
Author:Sheuli Paul
URN:urn:nbn:de:hbz:386-kluedo-37877
Advisor:Michael Richter, Steven Liu
Document Type:Doctoral Thesis
Language of publication:English
Date of Publication (online):2014/05/05
Year of first Publication:2014
Publishing Institution:Technische Universität Kaiserslautern
Granting Institution:Technische Universität Kaiserslautern
Acceptance Date of the Thesis:2014/02/26
Date of the Publication (Server):2014/05/07
Tag:Feature extraction; Noise control; Speech recognition
Page Number:XVII, 268
Faculties / Organisational entities:Kaiserslautern - Fachbereich Elektrotechnik und Informationstechnik
DDC-Cassification:6 Technik, Medizin, angewandte Wissenschaften / 600 Technik
Licence (German):Standard gemäß KLUEDO-Leitlinien vom 10.09.2012