Transforming 3D Coloured Pixels into Musical Instrument Notes for Vision Substitution Applications

Bologna, Guido; Deville, Benoît; Pun, Thierry; Vinckenbosch, Michel

doi:10.1155/2007/76204

Research Article
Open access
Published: 22 August 2007

EURASIP Journal on Image and Video Processing, volume 2007, Article number: 076204 (2007)

Abstract

The goal of the See ColOr project is to provide a noninvasive mobility aid for blind users that uses the auditory pathway to represent frontal image scenes in real time. We present and discuss two image processing methods that were tested in this work: image simplification by means of segmentation, and guidance of the focus of attention through the computation of visual saliency. A mean shift segmentation technique gave the best results, but because of real-time constraints we simply implemented an image quantisation method based on the HSL colour system. More specifically, we have developed two prototypes that transform HSL coloured pixels into spatialised classical instrument sounds lasting 300 ms. Hue is sonified by the timbre of a musical instrument, saturation by one of four possible notes, and luminosity by a bass when it is rather dark and by a singing voice when it is relatively bright. The first prototype is devoted to static images on a computer screen, while the second is built on a stereoscopic camera that estimates depth by triangulation. In the audio encoding, distance to objects is quantised into four duration levels. Six participants, with their eyes covered by a dark cloth, were trained to associate colours with musical instruments and were then asked to identify, in several pictures, objects with specific shapes and colours. To simplify the experimental protocol, we used a tactile tablet in place of the camera. Overall, colour was helpful for the interpretation of image scenes. Moreover, preliminary results with the second prototype, which involved the recognition of coloured balloons, were very encouraging. In the future, image processing techniques such as saliency could accelerate the interpretation of sonified image scenes.
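The encoding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives only the structure of the mapping (hue to timbre, saturation to one of four notes, luminosity to bass or singing voice, distance to one of four duration levels). The number of hue bins, the placeholder timbre names, the luminosity boundary of 0.5, the maximum distance of 4 m, and the names `SoundCode`, `quantise` and `sonify_pixel` are all assumptions made for illustration.

```python
import colorsys
from dataclasses import dataclass

# All constants below are illustrative assumptions, not values from the paper.
HUE_TIMBRES = ["timbre_0", "timbre_1", "timbre_2",
               "timbre_3", "timbre_4", "timbre_5"]  # placeholder instrument names
SATURATION_LEVELS = 4   # the abstract states four possible notes
DEPTH_LEVELS = 4        # the abstract states four duration levels
LUMINOSITY_SPLIT = 0.5  # assumed boundary between "dark" and "bright"
MAX_DEPTH_M = 4.0       # assumed maximum distance of interest, in metres


@dataclass(frozen=True)
class SoundCode:
    timbre: str            # selected by hue
    note_index: int        # selected by saturation, 0..3
    luminosity_voice: str  # "bass" or "singing voice"
    duration_level: int    # selected by distance, 0..3


def quantise(value: float, levels: int) -> int:
    """Map a value in [0, 1] to an integer bin in [0, levels - 1]."""
    clipped = min(max(value, 0.0), 1.0)
    return min(int(clipped * levels), levels - 1)


def sonify_pixel(r: int, g: int, b: int, depth_m: float) -> SoundCode:
    """Quantise one 8-bit RGB pixel with a depth estimate into a sound descriptor."""
    # colorsys returns (hue, lightness, saturation), each in [0, 1].
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    return SoundCode(
        timbre=HUE_TIMBRES[quantise(h, len(HUE_TIMBRES))],
        note_index=quantise(s, SATURATION_LEVELS),
        luminosity_voice="bass" if l < LUMINOSITY_SPLIT else "singing voice",
        duration_level=quantise(depth_m / MAX_DEPTH_M, DEPTH_LEVELS),
    )


if __name__ == "__main__":
    # A saturated red pixel seen at one metre.
    print(sonify_pixel(255, 0, 0, 1.0))
```

A uniform quantiser is used for every channel only to keep the sketch short; the bin edges of an actual system would have to be chosen perceptually, and the descriptor would still need to be rendered as a spatialised sound.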

Author information

Authors and Affiliations

University of Applied Science, Rue de la prairie 4, Geneva, 1202, Switzerland
Guido Bologna & Michel Vinckenbosch
Computer Science Center, University of Geneva, Rue Général Dufour 24, Geneva, 1211, Switzerland
Benoît Deville & Thierry Pun

Corresponding author

Correspondence to Guido Bologna.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Bologna, G., Deville, B., Pun, T. et al. Transforming 3D Coloured Pixels into Musical Instrument Notes for Vision Substitution Applications. J Image Video Proc 2007, 076204 (2007). https://doi.org/10.1155/2007/76204

Received: 15 January 2007
Accepted: 23 May 2007
Published: 22 August 2007
DOI: https://doi.org/10.1155/2007/76204
