
Image and Video Indexing Using Networks of Operators


This article presents a framework for the design of concept detection systems for image and video indexing. The framework integrates all data and processing types in a homogeneous way. The semantic gap is crossed in a number of steps, each producing a small increase in the abstraction level of the handled data. All the data inside the semantic gap, and on both sides of it, are seen as a homogeneous type called numcept, and all the processing modules between the various numcepts are seen as a homogeneous type called operator. Concepts are extracted from the raw signal using networks of operators operating on numcepts. These networks can be represented as data-flow graphs, and the introduced homogenizations allow elements to be fused regardless of their nature: low-level descriptors can be fused with intermediate or final concepts. This framework has been used to build a variety of indexing networks for images and videos and to evaluate many aspects of them. Using the annotated corpora and protocols of the 2003 to 2006 TRECVID evaluation campaigns, we measured the benefit brought by individual features, by the use of several modalities, by various fusion strategies, and by topologic and conceptual contexts. The framework proved efficient for the design and evaluation of a series of network architectures while factorizing the training effort for common sub-networks.
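The idea of a network of operators acting on numcepts can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: a numcept is modeled as a plain list of floats, the `Operator` and `Network` classes and the toy operators (`mean_intensity`, `bright`, `concat`, `average`) are hypothetical names, and the thresholds are arbitrary. The sketch only shows how a data-flow graph lets a low-level descriptor be fused with an intermediate concept, and how a shared sub-network is computed once.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Assumption: a numcept is modeled as a plain vector of numbers, whatever
# its abstraction level (raw signal, descriptor, intermediate or final concept).
Numcept = List[float]


@dataclass(frozen=True)
class Operator:
    """A processing module mapping input numcepts to one output numcept."""
    name: str
    inputs: Tuple[str, ...]
    fn: Callable[..., Numcept]


class Network:
    """A data-flow graph whose nodes are operators and whose edges are numcepts."""

    def __init__(self) -> None:
        self.operators: Dict[str, Operator] = {}

    def add(self, name: str, inputs: Sequence[str],
            fn: Callable[..., Numcept]) -> None:
        self.operators[name] = Operator(name, tuple(inputs), fn)

    def evaluate(self, target: str, sources: Dict[str, Numcept],
                 cache: Optional[Dict[str, Numcept]] = None) -> Numcept:
        # The cache holds every numcept computed so far, so a sub-network
        # shared by several consumers is evaluated only once.
        if cache is None:
            cache = dict(sources)
        if target not in cache:
            op = self.operators[target]
            args = [self.evaluate(i, sources, cache) for i in op.inputs]
            cache[target] = op.fn(*args)
        return cache[target]


def build_demo_network():
    """Build a toy network: signal -> descriptor -> concept -> fusion -> concept."""
    calls = {"mean_intensity": 0}  # counts evaluations of the shared descriptor

    def mean_intensity(pixels: Numcept) -> Numcept:
        calls["mean_intensity"] += 1
        return [sum(pixels) / len(pixels)]           # low-level descriptor

    def bright(desc: Numcept) -> Numcept:
        return [1.0 if desc[0] > 0.5 else 0.0]       # intermediate concept

    def concat(*parts: Numcept) -> Numcept:
        return [x for part in parts for x in part]   # fusion by concatenation

    def average(fused: Numcept) -> Numcept:
        return [sum(fused) / len(fused)]             # stand-in final classifier

    net = Network()
    net.add("intensity", ["pixels"], mean_intensity)
    net.add("bright", ["intensity"], bright)
    # The descriptor and the intermediate concept are fused on equal footing.
    net.add("fused", ["intensity", "bright"], concat)
    net.add("outdoor", ["fused"], average)
    return net, calls


if __name__ == "__main__":
    net, calls = build_demo_network()
    score = net.evaluate("outdoor", {"pixels": [0.2, 0.8, 0.9, 0.7]})
    print(score, calls)
```

In this sketch the `intensity` descriptor feeds both the `bright` concept and the `fused` node, yet it is evaluated a single time per input, which is a toy analogue of factorizing effort across common sub-networks. A real system would replace the toy functions with trained classifiers.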




Author information


Corresponding author

Correspondence to Stéphane Ayache.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Ayache, S., Quénot, G. & Gensel, J. Image and Video Indexing Using Networks of Operators. J Image Video Proc 2007, 056928 (2007).



  • Processing Type
  • Abstraction Level
  • Fusion Strategy
  • Homogeneous Type
  • Training Effort