Li, Ying, 1972 September 18-

Video content analysis using multimodal information : for movie content extraction, indexing, and representation / Ying Li, C.-C. Jay Kuo. - xxi, 194 pages : illustrations ; 25 cm

Includes bibliographical references (pages 179-192) and index.

1. Introduction -- 1.1. Audiovisual Content Analysis -- 1.2. Video Indexing, Browsing and Abstraction -- 1.3. MPEG-7 Standard -- 1.4. Roadmap of The Book -- 2. Background and Previous Work -- 2.1. Visual Content Analysis -- 2.2. Audio Content Analysis -- 2.3. Speaker Identification -- 2.4. Video Abstraction -- 2.5. Video Indexing and Retrieval -- 3. Video Content Pre-Processing -- 3.1. Shot Detection in Raw Data Domain -- 3.2. Shot Detection in Compressed Domain -- 3.3. Audio Feature Analysis -- 3.4. Commercial Break Detection -- 3.5. Experimental Results -- 4. Content-Based Movie Scene and Event Extraction -- 4.1. Movie Scene Extraction -- 4.2. Movie Event Extraction -- 4.3. Experimental Results -- 5. Speaker Identification for Movies -- 5.1. Supervised Speaker Identification for Movie Dialogs -- 5.2. Adaptive Speaker Identification -- 5.3. Experimental Results -- 6. Scene-Based Movie Summarization -- 6.1. An Overview of the Proposed System -- 6.2. Hierarchical Keyframe Extraction -- 6.3. Scalable Movie Summarization and Navigation -- 6.4. Experimental Results -- 7. Event-Based Movie Skimming -- 7.1. Introduction -- 7.2. An Overview of the Proposed System -- 7.3. Extended Event Set Construction -- 7.4. Extended Event Feature Extraction -- 7.5. Video Skim Generation -- 7.6. More Thoughts on the Video Skim -- 7.7. Experimental Results -- 8. Conclusion and Future Work -- 8.1. Conclusion -- 8.2. Future Work.

"Video Content Analysis Using Multimodal Information For Movie Content Extraction, Indexing and Representation is on content-based multimedia analysis, indexing, representation and applications with a focus on feature films. Presented are the state-of-art techniques in video content analysis domain, as well as many novel ideas and algorithms for movie content analysis based on the use of multimodal information." "The authors employ multiple media cues such as audio, visual and face information to bridge the gap between low-level audiovisual features and high-level video semantics. Based on sophisticated audio and visual content processing such as video segmentation and audio classification, the original video is re-represented in the form of a set of semantic video scenes or events, where an event is further classified as a 2-speaker dialog, a multiple-speaker dialog, or a hybrid event. Moreover, desired speakers are simultaneously identified from the video stream based on either a supervised or an adaptive speaker identification scheme. All this information is then integrated together to build the video's TOC (table of content) as well as the index table. Finally, a video abstraction system, which can generate either a scene-based summary or an event-based skim, is presented by exploiting the knowledge of both video semantics and video production rules." "This monograph will be of great interest to research scientists and graduate level students working in the area of content-based multimedia analysis, indexing, representation and applications as well as its related fields."--BOOK JACKET.

1402074905 9781402074905

2003052695


Information storage and retrieval systems
Optical storage devices
Multimedia systems

TA1635 / .L53 2003

006.7