  • Research Article
  • Open Access

Image and Video Indexing Using Networks of Operators

EURASIP Journal on Image and Video Processing 2007, 2007:056928

  • Received: 28 November 2006
  • Accepted: 16 September 2007
  • Published:


This article presents a framework for the design of concept detection systems for image and video indexing. The framework integrates all data and processing types in a homogeneous way. The semantic gap is crossed in a number of steps, each producing a small increase in the abstraction level of the handled data. All the data inside the semantic gap, and on both sides of it, are seen as a single homogeneous type called numcept, and all the processing modules between the various numcepts are seen as a single homogeneous type called operator. Concepts are extracted from the raw signal using networks of operators operating on numcepts. These networks can be represented as data-flow graphs, and the introduced homogenizations allow elements to be fused regardless of their nature: low-level descriptors can be fused with intermediate or final concepts. The framework has been used to build a variety of indexing networks for images and videos and to evaluate many aspects of them. Using the annotated corpora and protocols of the 2003 to 2006 TRECVID evaluation campaigns, the benefits brought by individual features, by several modalities, by various fusion strategies, and by topologic and conceptual contexts were measured. The framework proved efficient for the design and evaluation of a series of network architectures while factorizing the training effort for common sub-networks.
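The idea of a network of operators acting on numcepts can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Numcept`, `Operator`, `Network`, `concat` and `mean_score`, the node names, and the toy arithmetic are all assumptions made for illustration, and a numcept is simply modelled as a list of floats. The sketch shows a data-flow graph in which a low-level descriptor is fused with an intermediate concept, and in which a shared sub-network is evaluated only once.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

# Assumption: a numcept is modelled as a plain vector of floats.
Numcept = List[float]


@dataclass
class Operator:
    """Hypothetical graph node: maps input numcepts to one output numcept."""
    name: str
    func: Callable[[Sequence[Numcept]], Numcept]
    inputs: List[str] = field(default_factory=list)  # names of upstream nodes


class Network:
    """Hypothetical data-flow graph of operators, evaluated on demand."""

    def __init__(self) -> None:
        self.operators: Dict[str, Operator] = {}
        self.calls: Dict[str, int] = {}  # how often each operator actually ran

    def add(self, name: str, func, inputs: Sequence[str] = ()) -> None:
        self.operators[name] = Operator(name, func, list(inputs))

    def evaluate(self, target: str, sources: Dict[str, Numcept],
                 cache: Optional[Dict[str, Numcept]] = None) -> Numcept:
        cache = {} if cache is None else cache
        if target in sources:            # raw signal: nothing to compute
            return sources[target]
        if target not in cache:          # shared sub-networks run only once
            op = self.operators[target]
            args = [self.evaluate(i, sources, cache) for i in op.inputs]
            self.calls[target] = self.calls.get(target, 0) + 1
            cache[target] = op.func(args)
        return cache[target]


def concat(args: Sequence[Numcept]) -> Numcept:
    """Fusion by concatenation of all input numcepts."""
    return [x for a in args for x in a]


def mean_score(args: Sequence[Numcept]) -> Numcept:
    """Toy stand-in for a trained classifier: mean of all input values."""
    flat = concat(args)
    return [sum(flat) / len(flat)]


net = Network()
# Two toy low-level descriptors computed from the raw signal.
net.add("level", lambda a: [sum(a[0]) / len(a[0])], ["signal"])
net.add("spread", lambda a: [max(a[0]) - min(a[0])], ["signal"])
# Fusion of the two descriptors into one numcept.
net.add("descriptors", concat, ["level", "spread"])
# Intermediate concept computed from the fused descriptors.
net.add("intermediate", mean_score, ["descriptors"])
# Final concept: fuses low-level descriptors with the intermediate concept.
net.add("final", mean_score, ["descriptors", "intermediate"])

result = net.evaluate("final", {"signal": [0.0, 0.5, 1.0, 0.5]})
print(result, net.calls)
```

Because every node consumes and produces the same type, the `final` operator can take the descriptor vector and the intermediate concept score as inputs without special handling, and the cache ensures that the shared `descriptors` sub-network is computed once even though two downstream operators depend on it.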


  • Processing Type
  • Abstraction Level
  • Fusion Strategy
  • Homogeneous Type
  • Training Effort


Authors’ Affiliations

Multimedia Information Retrieval (MRIM) Group of LIG, Laboratoire d'Informatique de Grenoble, 385 rue de la Bibliothèque, Grenoble, Cedex 9, 38041, France
Spatio-Temporal Information, Adaptability, Multimédia and Knowledge Représentation (STEAMER) Group of LIG, Laboratoire d'Informatique de Grenoble, 385 rue de la Bibliothèque, Grenoble, Cedex 9, 38041, France




© Stéphane Ayache et al. 2007

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.