Why Is Speech Recognition Difficult

8.1 Noise

Speech is uttered in an environment full of sound: people talking, a speaker playing in the background, the ticking of a clock, the hum of an air conditioner's compressor, and so on. All of these are referred to as noise, i.e. unwanted sound signals in the background. In speech recognition, we have to filter out and cancel the effect of this noise on the main speech signal so that the result is derived only from the correct information.

8.2 Poor Recognition

Speech recognition software may fail to recognize certain proper nouns, such as brand names and surnames, until they are properly added to the word grammar library. Some people with unusual accents may not be able to make use of Voice …
In developing this project, we in no way suggest that therapy clinics are harmful or unnecessary for people with speech difficulties. Rather, the project encourages the concept of speech therapy, from which many people, even in the twenty-first century, shy away. In its current state, the system requires that the patient seek the help of an actual therapist for an initial evaluation, pinpointing the vulnerable areas and the problems that need to be addressed. IUT does not suggest that the initial evaluation can be done by the system. The system only provides a solution to the everyday hassle of having to attend speech therapy sessions. These sessions can be reduced once the patient has grasped the basic concepts and word pronunciations, and the remaining practice can be done using this software. For parents whose kids are speech impaired as well as mentally challenged, this system provides an excellent opportunity for the child to learn, since such children are not comfortable training in the presence of others without their parents by their side. Such children become extremely agitated and misbehave; hence the entire concept of progressing in such training sessions are …
Regarding future enhancements, the accuracy of the system can be increased by training it on a larger dictionary of words. Furthermore, the monophone level can be enhanced to the triphone level, which gives more accurate recognition and takes a larger number of parameters into consideration when comparing the user's speech with the pre-recorded samples on which the system has been
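
To illustrate what moving from the monophone level to the triphone level means, the following is a minimal sketch, not the project's actual implementation. It assumes the common `left-center+right` labelling convention used by toolkits such as HTK; the function names `monophones_to_triphones` and `max_full_triphones` and the phone symbols are hypothetical and chosen only for this example. Each phone is relabelled with its left and right neighbours, which is why a triphone system has many more models, and therefore many more parameters, than a monophone one.

```python
from typing import List


def monophones_to_triphones(phones: List[str]) -> List[str]:
    """Relabel a word's phone sequence with left/right context.

    Uses the (assumed) 'left-center+right' notation. Phones at the word
    boundary only get the context that exists inside the word.
    """
    labels = []
    for i, center in enumerate(phones):
        label = center
        if i > 0:
            # Prepend the left neighbour, e.g. "k-ae"
            label = f"{phones[i - 1]}-{label}"
        if i < len(phones) - 1:
            # Append the right neighbour, e.g. "k-ae+t"
            label = f"{label}+{phones[i + 1]}"
        labels.append(label)
    return labels


def max_full_triphones(n_phones: int) -> int:
    """Upper bound on distinct full triphones for a phone set of size n.

    Left, center, and right context can each be any of the n phones.
    """
    return n_phones ** 3


if __name__ == "__main__":
    # Illustrative phone sequence for the word "cat"
    print(monophones_to_triphones(["k", "ae", "t"]))
    # A phone set of 40 units has 40 monophones but up to 64000 triphones
    print(max_full_triphones(40))
```

In practice, most of the possible triphones never occur in the training data, so systems built this way typically share parameters between similar contexts rather than training every model separately.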