AI-Powered Sign Language Translation for Enhanced Communication
INDIVIDUAL ASSIGNMENT
RESEARCH METHODS FOR COMPUTING AND TECHNOLOGY

HAND OUT DATE: 9 November 2023
HAND IN DATE: 8 January 2024
INTAKE CODE: APU2F2305SE
MODULE CODE: CT098-3-2-RMCT
LECTURER: HEMALATHA A/P RAMALINGAM
STUDENT: WONG JOE KIT
TP NUMBER: TP068538
Title: AI Sign Language Translation Application to Overcome PWD Communication Issues

Abstract: The integration of artificial intelligence (AI) into education has become prominent. In an educational context, AI systems can perform assigned tasks with few errors, using deep learning and machine learning techniques for continuous self-improvement. AI also presents an opportunity to revolutionize communication: it can play a pivotal role in supporting Persons With Disabilities (PWD) by facilitating seamless interaction through sign languages such as American Sign Language (ASL) and Malaysian Sign Language (MSL). In 2023 there were over 40,000 registered hearing-impaired Malaysians, and the true number is likely much larger, since many have not yet registered. As the number of PWD grows, the importance of translation applications grows with it. For these reasons, a translation application matters: it can overcome communication barriers and improve the communication experience.

Key Terms: Malaysian Sign Language (MSL), Artificial Intelligence Technology, Deep Learning Techniques, Machine Learning Techniques, Sign Language Translation System.

Introduction: In the 21st century, technology, techniques, and networks improve day by day. In Malaysia, roughly 8 out of 10 people own a mobile phone, whether Android or iOS, so most applications today are built to support Android and iOS devices. Meanwhile, artificial intelligence (AI) technology has grown rapidly and become a major trend: well-known applications and websites such as ChatGPT have appeared, and AI is now applied across categories such as AI drawing, AI video creation, and AI music.
Although AI technology has brought many benefits across these categories and holds huge future potential, the technology is still maturing and several challenges remain unsolved: AI output may not meet user expectations, its logic can be flawed, and its quality is not yet stable. At the same time, the number of Persons With Disabilities (PWD) has increased rapidly in Malaysia. A Person With Disability is defined as a person with a long-term impairment (Muhammad Nasrul, Umijah, Aiman Marzuqi, 2022). Some PWD are unable to
talk or hear from birth, while others lose these abilities through acquired factors such as family circumstances, work, or accidents. Because of their impairment, they must learn sign language to communicate, as sign languages reduce the communication gap between deaf and hearing people, facilitating normal contact (Gerges H. Samaan et al., 2022). However, not everyone knows how to communicate with PWD in sign language: sign language is a complete language in its own right, and not everyone has time to study another language. Furthermore, Malaysian Sign Language varies between states, with each state having its own dialect, which is another reason it is hard for a hearing person to learn and use sign language skillfully in conversation. To overcome this problem, translation applications have appeared that translate between text and sign language. The translation system described in this research paper aims to solve these communication issues and improve the current communication environment. The proposed application, HandTalker, uses MediaPipe technology together with a Long Short-Term Memory (LSTM) network to provide sign language translation. MediaPipe is an open-source framework that is free for everyone to use; LSTM is a type of recurrent neural network (RNN) designed to overcome the limitations of traditional RNNs. The application uses camera vision-based sensing to capture the user's hand motion and produce a 3D model with hand key points. The LSTM then models the sequences of captured hand-motion frames; its cells control the flow of information when translating sign language. With HandTalker, users and developers cooperate to improve the communication environment between PWD and the hearing public.
Problem Statement: With the improvement of artificial intelligence (AI) technology, its integration into the realm of education has become increasingly pronounced. AI systems can be characterized as rational agents that autonomously respond to inputs and perform tasks guided by their underlying models and functions (Jonny Holmstrom, 2022). Through deep learning and machine learning techniques, AI exhibits a capacity for continuous self-improvement that keeps it current and relevant. Moreover, AI plays a pivotal role in raising education levels and serves as a core technology for helping Persons With Disabilities (PWD) communicate.
In addition, improvements in camera vision-based sensor techniques and the convenience of modern devices mean that hand motions and images can be captured easily. With the help of AI, PWD can communicate with others in their own sign language, such as American Sign Language (ASL) or Malaysian Sign Language (MSL), without communication barriers. Despite the huge advantages of AI, challenges persist. Malaysian Sign Language differs between states even though its variants are all based on American Sign Language, and these differences and dialects increase the difficulty of translation. Challenges also include similar gestures for certain alphabet letters, which can be addressed through dataset expansion and gesture alterations (Sundar B, Bagyammal T, 2022). To address these challenges, this proposal establishes an application with an AI that can understand and translate sign language by recognizing gestures and converting them into natural language text for communication (Sundar B, Bagyammal T, 2022). This technological solution lets people who lack proficiency in sign language communicate easily with PWD and avoid misunderstandings.

Research Aim: This research proposal aims to use AI-driven technology to translate Malaysian Sign Language (MSL) into natural language text, providing smooth communication between Persons With Disabilities (PWD) and others. It will also enhance the communication experience and improve the current communication environment for PWD. Furthermore, the AI-driven technology will combine deep learning and machine learning techniques to further enhance the user's experience of the translation process and make the translation system more complete.

Research Questions:
1. How can AI-driven technology help with translation work?
2. Why are deep learning and machine learning techniques suitable for AI-driven translation work?
3. How can AI translation help PWD communicate with people who lack sign language knowledge?

Research Objectives:
1. To evaluate the efficacy of AI-driven technology in translation work.
2. To investigate the role of deep learning and machine learning techniques in translation work.
3. To examine the impact of AI translation on the communication environment for PWD.

Research Significance: The purpose and significance of developing the HandTalker artificial intelligence (AI) based translation application are as follows.

Improve the communication environment: Language translation applications help people solve communication problems, whether between two spoken languages or between sign language and spoken language. As languages evolve daily and the number of languages in use grows, the importance of translation applications rises; a good one solves communication issues and directly improves the environment. The same holds for Persons With Disabilities (PWD): most people cannot use sign language proficiently, which makes it difficult for PWD to communicate with others. Even for those close to PWD, such as family and friends, the time cost of learning sign language is high. For these reasons, a sign language translation application is important: it lets a person who does not know sign language communicate fluently with a PWD.

Increase job opportunities: The employment rate of PWD is very low, as few companies will hire employees who cannot talk or hear, seeing higher costs and fewer benefits. Moreover, most jobs today are unsuitable for PWD because they require frequent communication, unless the work can be done with little interaction or individually, such as ride-hailing driving. For these reasons, it is hard for PWD to find jobs they are interested in or really want.
Furthermore, many PWD are talented but did not receive a proper education as children or teenagers. With the help of language translation applications, PWD can find jobs more easily, communicate with others, and access a proper education.

Reduce the risk of accidents: When PWD face an accident, they may be unable to communicate with the rescue team, as not every rescuer knows sign language. Misunderstanding is also one of the causes of accidents; in
other words, avoiding misunderstanding may help avoid accidents. With the help of a sign language translation app, the rate of misunderstanding can be reduced, and with it the risk of accidents. Accidents can never be avoided completely; the best one can do is always be prepared for them.

Proposed System Overview: The MSL translation system proposed in this research uses camera vision-based sensors, as they are much cheaper than other sensors. The technique selected to detect and identify hand motion captured through the device camera is MediaPipe, a framework for processing real-time series data such as video, audio, or motion. MediaPipe is supported on desktop, Android, and iOS, and works with several programming languages such as Python, C++, and JavaScript, which makes it more convenient than alternatives such as OpenCV. Furthermore, MediaPipe is open source and free for everyone to use. The main component selected is MediaPipe Hands, which works together with a machine learning pipeline to capture hand motion. The pipeline locates the hand in the full image using the palm detection model, one of the models MediaPipe provides: given an entire image, it outputs an oriented hand bounding box with highly accurate 3D hand key points (Sundar B, Bagyammal T, 2022). Figure 1 shows all the hand key points from the model that the MediaPipe hand detector can detect and track.
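As a minimal sketch of the keypoint pre-processing described above, each MediaPipe-style hand detection (21 landmarks, each with x, y, z coordinates) can be flattened into a fixed-length vector before being fed to a recognizer. The function names here are illustrative, not part of HandTalker itself:

```python
# Illustrative sketch: flattening MediaPipe-style hand landmarks into a
# fixed-length feature vector. MediaPipe Hands reports 21 landmarks per
# hand, each with (x, y, z) coordinates, so one hand becomes a 63-value
# vector; a frame with no detected hand is padded with zeros.
NUM_LANDMARKS = 21

def flatten_landmarks(landmarks):
    """landmarks: list of 21 (x, y, z) tuples, or None if no hand detected."""
    if landmarks is None:
        return [0.0] * (NUM_LANDMARKS * 3)
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    return [coord for point in landmarks for coord in point]

def build_sequence(frames):
    """A sign is a short sequence of frames; stacking per-frame vectors
    gives the (frames x 63) array a recurrent model can consume."""
    return [flatten_landmarks(f) for f in frames]
```

Padding missing detections keeps every frame the same length, which is what allows a whole gesture to be stored as one rectangular array.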
In the translation system, the AI provides the translation function by identifying the user's hand motion, using MediaPipe to capture the 3D key points provided by the palm detection model.

Figure 1: Hand landmarks representation used in MediaPipe (Sundar B, Bagyammal T, 2022).

The other technique selected is Long Short-Term Memory (LSTM), chosen so that the system can recognize sign language. LSTM is a type of recurrent neural network (RNN) architecture designed to overcome the
limitations of traditional RNNs in capturing and learning long-term dependencies in data. The LSTM was introduced by Sepp Hochreiter and Jürgen Schmidhuber in 1997. Another reason to select this technique is that a basic sign language alphabet involves several hand motions rather than a single one, so the network must handle longer sequences than a plain RNN can. LSTM differs from its predecessors by including a memory cell for efficient information storage (Andre Jallen S Ong et al., 2022). Each LSTM cell has three gates, named the forget gate, input gate, and output gate. The forget gate decides which information in the cell state should be discarded or forgotten; the input gate determines how much new information should be stored in the cell state; and the output gate decides what output to produce based on the current cell state. Together, these gates let the LSTM retain or discard information over long sequences, making it more effective. Before the translation process starts, gestures for all 26 alphabet letters are collected one after another, pre-processed into arrays, and stored for future use. The collected data is then passed to the LSTM model, giving it 26 classes, one per letter. The LSTM is trained until its accuracy is close to 1 and its loss is close to 0, and the resulting model is stored in the database for future use. The model can then track live hand motion: with the LSTM, the AI identifies the user's input hand motion and compares it against the model saved in the database. In addition, every alphabet letter or word is recorded and stored in several versions, such as male, female, and child hands.
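The three-gate mechanism described above can be illustrated with a single LSTM time step written out in NumPy. This is a conceptual sketch with random placeholder weights, not HandTalker's trained model; a real recognizer would learn these weights from the 26 alphabet classes:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# One LSTM time step, showing the forget, input, and output gates.
def lstm_step(x, h_prev, c_prev, W, b):
    """x: input vector (e.g. a 63-value hand-keypoint frame)
    h_prev, c_prev: previous hidden and cell states, shape (hidden,)
    W: weights of shape (4*hidden, input+hidden); b: bias (4*hidden,)"""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:hidden])            # forget gate: what to discard from c_prev
    i = sigmoid(z[hidden:2*hidden])     # input gate: how much new info to store
    o = sigmoid(z[2*hidden:3*hidden])   # output gate: what to emit
    g = np.tanh(z[3*hidden:4*hidden])   # candidate cell content
    c = f * c_prev + i * g              # new cell state (memory)
    h = o * np.tanh(c)                  # new hidden state
    return h, c
```

Running this step once per frame of a gesture sequence is what lets the cell state carry information across the several hand motions that make up one sign.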
Once all the data is recorded and stored safely, the rate of errors decreases. When a user inputs a new hand-motion image through the "Dictionary" function, the system uses MediaPipe to produce a hand model with 3D key points; the gesture is then given its own class and stored in the database for future use. The LSTM training is designed to bring accuracy up to the target standard. If a user inputs the same text or the same hand motion again, the system can retrieve the result directly from the database and display it. This process repeats until the user finishes the translation session. The system then continues improving through user feedback and the information users provide through the Dictionary function.
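The repeat-lookup behaviour described above, where a previously translated input is served straight from storage instead of being recognized again, is essentially caching. A minimal sketch, with a hypothetical backing translate function standing in for the real MediaPipe/LSTM path:

```python
# Sketch of serving repeated translations from storage. The translate_fn
# passed in is a placeholder for the expensive recognition pipeline.
class CachedTranslator:
    def __init__(self, translate_fn):
        self._translate_fn = translate_fn
        self._cache = {}   # input text -> stored translation
        self.misses = 0    # how many times the pipeline actually ran

    def translate(self, text):
        if text not in self._cache:
            self.misses += 1  # first sight: run the expensive pipeline
            self._cache[text] = self._translate_fn(text)
        return self._cache[text]  # repeats come straight from storage
```

This is also why the proposal stores user-added Dictionary entries in the database: every confirmed translation becomes a cheap lookup for all later sessions.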
Flow Chart 1: The translation system.

As shown in Flow Chart 1, the flowchart represents a normal session of someone using the system. There are two user types, users and guests; the difference is that users can add words and meanings to the dictionary while guests cannot, and the other functions, such as Translate, are the same. The "Dictionary" function is open only to registered users because this makes it easier to verify accuracy and manage newly added entries. During the translation process, users can select the language they want to translate, and they may also use images to represent the hand motion captured through the camera vision-based sensors.

Flow Chart 2: User functions.
First, the system checks the username and password the user has entered. If they are valid, the user logs in to the system; if not, the user has two choices: log in as a guest or re-enter the username and password. After a successful login, a main page is displayed where the user can select the "Translate" or "Dictionary" function. If the user selects "Dictionary", they may input extra text meanings or specific signs. If the user selects "Translate", they choose a language and then start the translation process.

Flow Chart 3: Translate function.

As shown in Flow Chart 3, a translate window is displayed when the user selects the "Translate" function. The window offers two options, "Sign Language to Text" and "Text to Sign Language", which let users select the direction they
want to translate. If the user selects "Sign Language to Text", they input sign language through the device camera; the system detects the signs displayed by the user and translates them into text. If the user selects "Text to Sign Language", they input text; the system identifies and compares the entered text with the text saved in the database, then uses a visual hand to display the sign language. If the entered text or sign is not in the system database, the system displays an error message and allows the user to provide a meaning for it. After the user supplies the meaning, the translation process ends and the user returns to the translate window.

Flow Chart 4: Guest login.

As shown in Flow Chart 4, this is the process when someone chooses to log in as a guest. Most of the
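The "Text to Sign Language" lookup and its error path could be sketched as follows. The SignDictionary class, its method names, and its contents are hypothetical, named here only for illustration:

```python
# Sketch of the "Text to Sign Language" step: compare the entered text
# against stored entries, and either return the stored sign sequence or
# report an error so the user can supply a meaning via the Dictionary
# function (the error path in the flowchart).
class SignDictionary:
    def __init__(self):
        self._entries = {}  # word -> sign representation (e.g. keypoint frames)

    def add(self, word, sign):
        """Dictionary function: registered users add a word and its sign."""
        self._entries[word.lower()] = sign

    def translate(self, text):
        key = text.lower()
        if key in self._entries:
            return {"status": "ok", "sign": self._entries[key]}
        # Unknown input: mirror the flowchart's error message and prompt.
        return {"status": "error",
                "message": f"'{text}' not found; please provide a meaning."}
```

Lower-casing the key means "Hello" and "hello" resolve to the same stored sign, which matches the compare-against-database behaviour the flowchart describes.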
functions of the translation system are the same, except that guests cannot use the "Dictionary" function. Furthermore, if a guest enters an invalid text or sign during translation, the system displays an error message, ends the translation process, and returns to the translate window, together with a message suggesting the guest register as a user to access the full set of functions.

Flow Chart 5: End of the translation system.

After completing the steps above, the user can choose to continue translating or log out of the system. If the user continues, they return to the translate window to carry on the translation process. If the user logs out, the current and past translation sessions are kept as history in the system for future use.

Conclusion: In a nutshell, this study has highlighted how a sign language translation application benefits PWD. It can enhance users' communication experience and improve the communication environment between PWD and the hearing public. That said, a translation application is only one way to help PWD communicate: it is a tool that lets a person hear what a PWD wants to say. For a fully complete communication experience with PWD, skilled sign language remains a must.

(3014 words)

References

Sundar B, & Bagyammal T. (2022). American Sign Language Recognition for Alphabets Using MediaPipe and LSTM. Procedia Computer Science, 215, 642–651.
https://doi.org/10.1016/j.procs.2022.12.066

Muhammad Nasril bin Mohamed Noor, Umijah binti Madzen, & Aiman Marzuqi bin Mohd Azzahari. (2022). A Study on Impacts of Unemployment Among Persons with Disabilities During the COVID-19 Pandemic. International Journal for Studies on Children, Women, Elderly and Disabled, Vol. 17 (ISSN 0128-309X).

CodeBlue. (2023, October 19). Why Hasn't The Government Fulfilled Promise Of 1% Civil Service Jobs For Disabled Persons After 35 Years — Disability Advocates. CodeBlue. https://codeblue.galencentre.org/2023/10/19/why-hasnt-the-government-fulfilled-promise-of-1-civil-service-jobs-for-disabled-persons-after-35-years-disability-advocates/

Gerges H. Samaan, Abanoub R. Wadie, Abanoub K. Attia, Abanoub M. Assad, Andrew E. Kamel, Salwa O. Slim, Mohamed S. Abdallah, & Young-Im Cho. (2022). MediaPipe's Landmarks with RNN for Dynamic Sign Language Recognition. Electronics, Vol. 11(19), p. 3228. https://doi.org/10.3390/electronics11193228

Jonny Holmstrom. (2022). From AI to digital transformation: The AI readiness framework. Business Horizons, Vol. 65(3), p. 329–339. https://doi.org/10.1016/j.bushor.2021.03.006

Myron Darrel Layug Montefalcon, Jay Rhald Padilla, & Ramon Rodriguez. (2023). Filipino Sign Language Recognition Using Long Short-Term Memory and Residual Network Architecture. Proceedings of Seventh International Congress on Information and Communication Technology, p. 489–497. https://doi.org/10.1007/978-981-19-2397-5_45

Andre Jallen S Ong, Melvin Cabatuan, Janos Lance L Tiberio, & John Anthony Jose. (2022). TENCON 2022 - 2022 IEEE Region 10 Conference. https://doi.org/10.1109/TENCON55691.2022.9977857