Image and Video Indexing Using Networks of Operators

Ayache, Stéphane; Quénot, Georges; Gensel, Jérôme

doi:10.1155/2007/56928

Research Article
Open access
Published: 21 November 2007

Stéphane Ayache¹, Georges Quénot¹ & Jérôme Gensel²

EURASIP Journal on Image and Video Processing volume 2007, Article number: 056928 (2007)

Abstract

This article presents a framework for the design of concept detection systems for image and video indexing. The framework integrates all data and processing types in a homogeneous way. The semantic gap is crossed in a number of steps, each producing a small increase in the abstraction level of the handled data. All the data inside the semantic gap, and on both sides of it, are seen as a homogeneous type called numcept, and all the processing modules between the various numcepts are seen as a homogeneous type called operator. Concepts are extracted from the raw signal using networks of operators operating on numcepts. These networks can be represented as data-flow graphs, and the introduced homogenizations allow fusing elements regardless of their nature: low-level descriptors can be fused with intermediate or final concepts. This framework has been used to build a variety of indexing networks for images and videos and to evaluate many aspects of them. Using the annotated corpora and protocols of the 2003 to 2006 TRECVID evaluation campaigns, the benefits brought by individual features, by several modalities, by various fusion strategies, and by topological and conceptual contexts were measured. The framework proved efficient for the design and evaluation of a series of network architectures while factorizing the training effort for common sub-networks.
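
The idea of a network of operators acting on numcepts can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Operator` class, the `evaluate` function, the toy operators, and the modelling of a numcept as a plain list of floats are all assumptions made for illustration. The sketch shows a small data-flow graph in which a low-level descriptor is fused with an intermediate concept, and in which a sub-network shared by two concepts is computed only once.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Assumption: a numcept is modelled here as a plain vector of floats.
Numcept = List[float]


@dataclass
class Operator:
    """Hypothetical graph node: maps input numcepts to one output numcept."""
    name: str
    func: Callable[..., Numcept]
    inputs: Sequence["Operator"] = ()
    calls: int = 0  # number of evaluations, to show that shared nodes run once


def evaluate(node: Operator, signal: Numcept, cache: Dict[str, Numcept]) -> Numcept:
    """Evaluate a node on the raw signal, reusing cached sub-network outputs."""
    if node.name not in cache:
        # Leaf operators (no inputs) read the raw signal directly.
        args = [evaluate(src, signal, cache) for src in node.inputs] or [signal]
        node.calls += 1
        cache[node.name] = node.func(*args)
    return cache[node.name]


# Toy operators (illustrative only).
def mean_op(x: Numcept) -> Numcept:
    return [sum(x) / len(x)]


def range_op(x: Numcept) -> Numcept:
    return [max(x) - min(x)]


def concat(*xs: Numcept) -> Numcept:
    """Fusion by concatenation; works on any numcepts, whatever their level."""
    return [v for x in xs for v in x]


def threshold(t: float) -> Callable[[Numcept], Numcept]:
    """Stand-in for a trained classifier: 1.0 if the mean input exceeds t."""
    return lambda x: [1.0 if sum(x) / len(x) > t else 0.0]


# Low-level descriptors computed from the raw signal.
mean = Operator("mean", mean_op)
rng = Operator("range", range_op)
# Intermediate concept built on one descriptor.
bright = Operator("bright", threshold(0.4), [mean])
# A low-level descriptor fused with an intermediate concept.
fused = Operator("fused", concat, [rng, bright])
concept_a = Operator("concept_a", threshold(0.5), [fused])
# A second output that shares the "mean" and "range" sub-networks.
concept_b = Operator("concept_b", concat, [mean, rng])

signal = [0.25, 0.5, 1.0, 0.25]
cache: Dict[str, Numcept] = {}
result_a = evaluate(concept_a, signal, cache)
result_b = evaluate(concept_b, signal, cache)
print(result_a, result_b, mean.calls, rng.calls)
```

Because every node consumes and produces the same type, `concat` can fuse a descriptor with a concept without special cases, and the shared cache is one simple way to avoid recomputing common sub-networks.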

[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28]

References

1. Iyengar G, Nock HJ, Neti C, Franz M: Semantic indexing of multimedia using audio, text and visual cues. Proceedings of IEEE International Conference on Multimedia and Expo (ICME '02), August 2002, Lausanne, Switzerland
2. Iyengar G, Nock HJ: Discriminative model fusion for semantic concept detection and annotation in video. Proceedings of the 11th ACM International Conference on Multimedia (MULTIMEDIA '03), November 2003, Berkeley, Calif, USA 255-258.
3. Hauptmann A, Baron RV, Chen M-Y, et al.: Informedia at TRECVID 2003: analyzing and searching broadcast news video. Proceedings of the TREC Video Retrieval Evaluation (TRECVID '03), November 2003, Gaithersburg, Md, USA 15.
4. Naphade MR, Smith JR: On the detection of semantic concepts at TRECVID. Proceedings of the 12th ACM International Conference on Multimedia (MULTIMEDIA '04), 2004, New York, NY, USA 660-667.
5. Naphade MR: On supervision and statistical learning for semantic multimedia analysis. Journal of Visual Communication and Image Representation 2004,15(3):348-369. 10.1016/j.jvcir.2004.04.010
6. Chua T-S, Neo S-Y, Zheng Y, et al.: TRECVID 2006 by NUS-I2R. Proceedings of the TREC Video Retrieval Evaluation (TRECVID '06), November 2006, Gaithersburg, Md, USA
7. Ayache S, Quénot G, Satoh S: Context-based conceptual image indexing. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), May 2006, Toulouse, France 2: 421-424.
8. Snoek CGM, Worring M, Hauptmann AG: Learning rich semantics from news video archives by style analysis. ACM Transactions on Multimedia Computing, Communications and Applications 2006,2(2):91-108. 10.1145/1142020.1142021
9. Snoek CGM, Worring M, Geusebroek J-M, Koelma DC, Seinstra FJ, Smeulders AWM: The semantic pathfinder: using an authoring metaphor for generic multimedia indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence 2006,28(10):1678-1689. 10.1109/TPAMI.2006.212
10. Wolpert DH: Stacked generalization. Neural Networks 1992,5(2):241-259. 10.1016/S0893-6080(05)80023-1
11. Backus J: Can programming be liberated from the von Neumann style? A functional style and its algebra of programs. Communications of the ACM 1978,21(8):613-641. 10.1145/359576.359579
12. Zavidovique B, Sérot J, Quénot GM: Massively parallel dataflow computer dedicated to real time image processing. Integrated Computer-Aided Engineering 1997,4(1):9-29.
13. Kumar S, Hebert M: Discriminative random fields: a discriminative framework for contextual interaction in classification. Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV '03), October 2003, Nice, France 2: 1150-1157.
14. Naphade MR, Kristjansson T, Frey B, Huang TS: Probabilistic multimedia objects (multijects): a novel approach to video indexing and retrieval in multimedia systems. Proceedings of International Conference on Image Processing (ICIP '98), October 1998, Chicago, Ill, USA 3: 536-540.
15. Ayache S, Quénot G, Gensel J, Satoh S: Using topic concepts for semantic video shots classification. Proceedings of 5th International Conference on Image and Video Retrieval (CIVR '06), July 2006, Tempe, Ariz, USA, Lecture Notes in Computer Science 4071: 300-309.
16. Snoek CGM, Worring M, Smeulders AWM: Early versus late fusion in semantic video analysis. Proceedings of the 13th Annual ACM International Conference on Multimedia (MULTIMEDIA '05), November 2005, Singapore 399-402.
17. Ayache S, Quénot G, Gensel J: CLIPS-LSR experiments at TRECVID 2006. Proceedings of the TREC Video Retrieval Evaluation (TRECVID '06), November 2006, Gaithersburg, Md, USA
18. Cortes C, Vapnik V: Support-vector networks. Machine Learning 1995,20(3):273-297.
19. Over P, Ianeva T, Kraaij W, Smeaton AF: TRECVID 2005 - an overview. Proceedings of the TREC Video Retrieval Evaluation (TRECVID '05), November 2005, Gaithersburg, Md, USA
20. Naphade M, Smith JR, Tesic J, et al.: Large-scale concept ontology for multimedia. IEEE Multimedia 2006,13(3):86-91. 10.1109/MMUL.2006.63
21. Ayache S, Quénot G, Gensel J: Classifier fusion for SVM-based multimedia semantic indexing. Proceedings of 29th European Conference on Information Retrieval Research (ECIR '07), April 2007, Rome, Italy, Lecture Notes in Computer Science 4425:
22. Snoek CGM, Worring M, Geusebroek J-M, Koelma DC, Seinstra FJ: The mediamill TRECVID 2004 semantic video search engine. Proceedings of the TREC Video Retrieval Evaluation (TRECVID '04), November 2004, Gaithersburg, Md, USA
23. Chang CC, Lin CJ: LIBSVM: a library for support vector machines. 2001, http://www.csie.ntu.edu.tw/~cjlin/libsvm/
24. Quénot GM: Computation of optical flow using dynamic programming. IAPR Workshop on Machine Vision Applications, November 1996, Tokyo, Japan 249-252.
25. Lin C-Y, Tseng BL, Smith JR: Video collaborative annotation forum: establishing groundtruth labels on large multimedia datasets. Proceedings of the TREC Video Retrieval Evaluation (TRECVID '03), November 2003, Gaithersburg, Md, USA
26. Lewis DD, Yang Y, Rose TG, Li F: RCV1: a new benchmark collection for text categorization research. The Journal of Machine Learning Research 2004, 5: 361-397.
27. Lanckriet GRG, Deng M, Cristianini N, Jordan MI, Noble WS: Kernel-based data fusion and its application to protein function prediction in yeast. Proceedings of the Pacific Symposium on Biocomputing (PSB '04), January 2004, Big Island of Hawaii, Hawaii, USA 300-311.
28. Gosselin PH, Cord M: A comparison of active classification methods for content-based image retrieval. Proceedings of the 1st International Workshop on Computer Vision Meets Databases (CVDB '04), June 2004, Paris, France 51-58.

Author information

Authors and Affiliations

Multimedia Information Retrieval (MRIM) Group of LIG, Laboratoire d'Informatique de Grenoble, 385 rue de la Bibliothèque, Grenoble, Cedex 9, 38041, France
Stéphane Ayache & Georges Quénot
Spatio-Temporal Information, Adaptability, Multimédia and Knowledge Représentation (STEAMER) Group of LIG, Laboratoire d'Informatique de Grenoble, 385 rue de la Bibliothèque, Grenoble, Cedex 9, 38041, France
Jérôme Gensel

Corresponding author

Correspondence to Stéphane Ayache.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Ayache, S., Quénot, G. & Gensel, J. Image and Video Indexing Using Networks of Operators. J Image Video Proc 2007, 056928 (2007). https://doi.org/10.1155/2007/56928

Received: 28 November 2006
Revised: 09 July 2007
Accepted: 16 September 2007
Published: 21 November 2007
DOI: https://doi.org/10.1155/2007/56928
