Continuous Speech Recognition


Speech is the primary mode of communication between human beings and the most natural and efficient way for them to exchange information. Speech recognition is the conversion of an acoustic waveform to text. Speech can be of the isolated, connected or continuous type. The goal of this work is to recognize continuous speech using Mel Frequency Cepstrum Coefficients (MFCC) to extract features from the speech signal, Hidden Markov Models (HMM) for pattern recognition and a Viterbi decoder to decode the speech signal. Continuous speech files from the standard TIMIT database are used for the work. The recognition success rate is calculated for the entire database, which provides separate training and testing files, and we also …

Speech comes so naturally to us that we do not realize how complex a phenomenon it is. When humans speak, air passes from the lungs through the mouth and nasal cavity, and this air stream is restricted and shaped by the position of the tongue, teeth and lips. This produces contractions and expansions of the air: an acoustic wave, a sound. The sounds so formed are usually called phonemes. Phonemes are combined to form words [1]. Speech recognition means transforming human speech into text or into a command for the computer. Continuous speech recognizers allow users to speak almost naturally while the computer determines the content. Continuous speech includes a great deal of co-articulation, where adjacent words run together without pauses or any other apparent division between them. Continuous speech recognition is difficult because recognizers must use special methods to determine utterance boundaries. As the vocabulary grows larger, confusability between different word sequences grows …

Proposed System Block Diagram

The first stage of any recognizer development work is data preparation. MFCC features are extracted from the training and testing speech files. HMMs are built from the training files only, one for each phoneme, using the MFCC features and the transcription data (text information about the content of each speech file), a step called word modelling. Each HMM has 3 to 5 states, and each state is represented by a Gaussian Mixture Model (GMM) with 8 mixture components; for better accuracy, the models are trained n times. During the testing stage, the Viterbi search algorithm finds the state sequence that best matches the observation sequence of the test data, and the recognized text of the speech file is displayed at the command prompt. The overall recognition performance is calculated from the word substitution, deletion and insertion errors found during recognition, and the error counts are displayed after recognition [3, 4]. The sections below describe the methodology in detail: the feature extraction technique (MFCC), the pattern recognition technique (building Hidden Markov Models), the decoding method (the Viterbi decoder), the complete HTK process, the results obtained, the conclusion and the references used for the …
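
The error counting described above can be illustrated with a minimal Python sketch. It aligns a reference transcription with a recognized word sequence by unit-cost edit distance and reports the substitution, deletion and insertion counts, then derives a word accuracy as (N - S - D - I) / N. The function names `count_word_errors` and `word_accuracy`, the unit costs and the example hypothesis are assumptions made for illustration; they are not taken from this work, and a scoring tool may weight the alignment differently.

```python
def count_word_errors(reference, hypothesis):
    """Align two word lists by unit-cost edit distance.

    Returns (substitutions, deletions, insertions) of one minimum-cost alignment.
    """
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (t, s, d, k)          # words match, no error
            else:
                diagonal = (t + 1, s + 1, d, k)  # substitution
            t, s, d, k = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, k)      # reference word missing
            t, s, d, k = cost[i][j - 1]
            insertion = (t + 1, s, d, k + 1)     # extra hypothesis word
            cost[i][j] = min(diagonal, deletion, insertion, key=lambda c: c[0])
    _, subs, dels, ins = cost[n][m]
    return subs, dels, ins


def word_accuracy(n_ref, subs, dels, ins):
    """Word accuracy in percent: (N - S - D - I) / N * 100."""
    return 100.0 * (n_ref - subs - dels - ins) / n_ref


# Illustrative example: the hypothesis is made up, not a result from this work.
ref = "she had your dark suit in greasy wash water all year".split()
hyp = "she had dark suit in a greasy wash water all here".split()
subs, dels, ins = count_word_errors(ref, hyp)
print(subs, dels, ins)  # 1 1 1
print(round(word_accuracy(len(ref), subs, dels, ins), 2))
```

Counting insertions against the score means that a recognizer which outputs many extra words is penalized even when every reference word is found, which is why the accuracy can fall below the plain percentage of correctly recognized words.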