Image and Video for Hearing Impaired People
EURASIP Journal on Image and Video Processing volume 2007, Article number: 045641 (2008)
We present an overview of image- and video-processing-based methods that support communication for hearing-impaired people. Two directions of communication must be considered: from a hearing person to a hearing-impaired person, and vice versa. First, we describe sign language (SL) and cued speech (CS), two different languages used by the deaf community. Second, we present existing tools that use SL and CS video processing and recognition for automatic communication from deaf people to hearing people. Third, we present existing tools for the reverse direction, from hearing people to deaf people, which involve SL and CS video synthesis.
Caplier, A., Stillittano, S., Aran, O. et al. Image and Video for Hearing Impaired People. J Image Video Proc 2007, 045641 (2008). https://doi.org/10.1155/2007/45641