Research Article
Image and Video Indexing Using Networks of Operators
EURASIP Journal on Image and Video Processing, volume 2007, Article number: 056928 (2007)
This article presents a framework for the design of concept detection systems for image and video indexing. The framework integrates all data and processing types in a homogeneous way. The semantic gap is crossed in a number of steps, each producing a small increase in the abstraction level of the handled data. All data inside the semantic gap, as well as on both sides of it, are treated as a single homogeneous type called a numcept, and all processing modules between numcepts are treated as a single homogeneous type called an operator. Concepts are extracted from the raw signal by networks of operators acting on numcepts. These networks can be represented as data-flow graphs, and the introduced homogenization allows elements to be fused regardless of their nature: low-level descriptors can be fused with intermediate or final concepts. The framework has been used to build a variety of indexing networks for images and videos and to evaluate many aspects of them. Using the annotated corpora and protocols of the 2003 to 2006 TRECVID evaluation campaigns, the benefit of individual features, of multiple modalities, of various fusion strategies, and of topological and conceptual contexts was measured. The framework proved efficient for designing and evaluating a series of network architectures while factorizing the training effort for common sub-networks.
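To make the numcept/operator idea concrete, here is a minimal sketch of a network of operators evaluated as a data-flow graph. It rests on our own assumptions, not on the article's implementation: numcepts are modelled as Python lists of floats, operators as plain functions with named inputs, and evaluation order comes from `graphlib.TopologicalSorter` (Python 3.9+). All names (`Numcept`, `Operator`, `Network`, `mean_intensity`, `intensity_range`, `fused_scorer`) are hypothetical, and the fixed linear scorer only stands in for a trained classifier.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Hypothetical: a numcept is just a list of floats, whether it holds raw
# signal, a low-level descriptor, or concept scores. Sharing one type is
# what lets any operator consume any other operator's output.
Numcept = List[float]


@dataclass
class Operator:
    """Hypothetical operator: maps named input numcepts to one output numcept."""
    name: str
    inputs: List[str]
    fn: Callable[..., Numcept]


@dataclass
class Network:
    """A data-flow graph of operators, evaluated in topological order."""
    operators: Dict[str, Operator] = field(default_factory=dict)

    def add(self, op: Operator) -> None:
        self.operators[op.name] = op

    def run(self, sources: Dict[str, Numcept]) -> Dict[str, Numcept]:
        values = dict(sources)
        # Each operator's predecessors are the numcepts it reads.
        graph = {name: set(op.inputs) for name, op in self.operators.items()}
        for name in TopologicalSorter(graph).static_order():
            if name in values:
                continue  # a source numcept supplied by the caller
            op = self.operators[name]
            values[name] = op.fn(*(values[i] for i in op.inputs))
        return values


def mean_intensity(pixels: Numcept) -> Numcept:
    """Toy low-level descriptor: average pixel value."""
    return [sum(pixels) / len(pixels)]


def intensity_range(pixels: Numcept) -> Numcept:
    """Toy low-level descriptor: max minus min pixel value."""
    return [max(pixels) - min(pixels)]


def fused_scorer(weights: List[float], bias: float) -> Callable[..., Numcept]:
    """Concatenate any number of numcepts, then apply a fixed linear score
    (a stand-in for a trained classifier)."""
    def score(*xs: Numcept) -> Numcept:
        joined = [v for x in xs for v in x]
        return [sum(w * v for w, v in zip(weights, joined)) + bias]
    return score


net = Network()
net.add(Operator("color", ["pixels"], mean_intensity))
net.add(Operator("range", ["pixels"], intensity_range))
# Intermediate concept from two fused low-level descriptors.
net.add(Operator("sky", ["color", "range"], fused_scorer([1.0, 0.5], 0.0)))
# Higher-level concept fusing an intermediate concept with a descriptor.
net.add(Operator("outdoor", ["sky", "color"], fused_scorer([1.0, 1.0], -0.5)))

values = net.run({"pixels": [0.25, 0.5, 0.75]})
print(values["sky"], values["outdoor"])
```

Because every node produces the same type, the `outdoor` operator can take the intermediate concept `sky` and the low-level descriptor `color` as inputs side by side, which mirrors the kind of cross-level fusion described above.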