Wednesday, January 11, 2006

Automatic Video Annotation

Right now QT/search works only with text, but it is worth exploring whether it can also work with multimedia. To learn how QT/search could be used for searching video, I collected the following information. (This is a follow-up to the "automatic image annotation" article.)

http://news.com.com/Yahoo+tests+video+search+engine/2100-1025_3-5492779.html
Yahoo tests video search engine | CNET News.com
o Yahoo has unveiled a video search engine to serve the growing appetite for multimedia entertainment online.
o The proposed system builds on a standard for syndicating content to other Web sites by allowing publishers to add text, or metatags, to their media files. That way, the RSS feeds can be sent to Yahoo for indexing in the search engine. Eventually, Yahoo said, the system could be used to let people aggregate video feeds on a personalized Web page, for example.
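The syndication mechanism described above is easy to illustrate. Below is a minimal Python sketch that builds a feed item carrying text metadata for a video file, assuming the public Media RSS namespace (http://search.yahoo.com/mrss/); whether Yahoo's indexer consumed exactly these elements at the time is my assumption, not something the article states:

    # Build an RSS item that attaches text metadata to a video file,
    # in the spirit of the Media RSS extension described above.
    import xml.etree.ElementTree as ET

    MEDIA_NS = "http://search.yahoo.com/mrss/"   # Media RSS namespace
    ET.register_namespace("media", MEDIA_NS)

    def video_feed_item(title, page_url, video_url, description, keywords):
        item = ET.Element("item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = page_url
        content = ET.SubElement(item, "{%s}content" % MEDIA_NS,
                                url=video_url, type="video/mpeg")
        ET.SubElement(content, "{%s}description" % MEDIA_NS).text = description
        ET.SubElement(content, "{%s}keywords" % MEDIA_NS).text = ", ".join(keywords)
        return item

    item = video_feed_item(
        "Touchdown highlight", "http://example.com/clip.html",
        "http://example.com/clip.mpg",
        "Fourth-quarter touchdown pass.", ["football", "touchdown"])
    print(ET.tostring(item, encoding="unicode"))

Once such items are published in a feed, the search engine only has to index the textual fields alongside the media URL.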

http://www.ysearchblog.com/archives/000060.html
Yahoo! Search Blog: Yahoo! Video Search Beta
Nice, but how come an "early beta" of Yahoo Video produces exactly the same results as the established Altavista Video Search? Apart from the obvious Yahoo-Overture-Altavista ownership chain, I mean. Did you merge the Altavista engine with Yahoo's or vice versa?

html">http://www.w3.org/People/howcome/p/telektronikk-4-93/Davis_M.html
Media streams: an iconic visual language for video annotation
The central problem in the creation of robust and extensible systems for manipulating video information lies in representing and visualizing video content. Currently, content providers possess large archives of film and video for which they lack sufficient tools for search and retrieval. For the types of applications that will be developed in the near future (interactive television, personalized news, video on demand, etc.) these archives will remain a largely untapped resource, unless we are able to access their contents. Without a way of accessing video information in terms of its content, a thousand hours of video is less useful than one. With one hour of video, its content can be stored in human memory, but as we move up in orders of magnitude, we need to find ways of creating machine-readable and human-usable representations of video content. It is not simply a matter of cataloguing reels or tapes, but of representing and manipulating the content of video at multiple levels of granularity and with greater descriptive richness. This paper attempts to address that challenge.
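The paper's argument is about representation more than algorithms, but the shape of such a representation is easy to sketch: descriptors attached to time intervals at several granularities, rather than to whole reels or tapes. The field names below are illustrative assumptions, not Media Streams' actual iconic language:

    # Toy multi-granularity annotation structure: concepts anchored to
    # time intervals ("shot", "scene", ...) so content can be queried.
    from dataclasses import dataclass, field

    @dataclass
    class Annotation:
        start: float        # seconds from the start of the video
        end: float
        level: str          # e.g. "frame", "shot", "scene"
        descriptors: list   # linguistic concepts, e.g. ["crowd", "cheering"]

    @dataclass
    class VideoDocument:
        source: str
        annotations: list = field(default_factory=list)

        def query(self, concept):
            """Return every annotated interval mentioning the concept."""
            return [(a.start, a.end, a.level)
                    for a in self.annotations if concept in a.descriptors]

    doc = VideoDocument("game_footage.mpg")
    doc.annotations.append(Annotation(12.0, 18.5, "shot", ["kickoff", "crowd"]))
    doc.annotations.append(Annotation(0.0, 95.0, "scene", ["first quarter"]))
    print(doc.query("crowd"))   # -> [(12.0, 18.5, 'shot')]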

http://portal.acm.org/citation.cfm?id=1101826.1101844
Semi-automatic video annotation based on active learning with multiple complementary predictors
In this paper, we propose a novel semi-automatic annotation scheme for video semantic classification. It is well known that the large gap between high-level semantics and low-level features is difficult to bridge with fully automatic content analysis mechanisms. To narrow this gap, relevance feedback has been introduced in a number of studies, especially those addressing the problem of image retrieval. At the same time, active learning has been suggested to accelerate the convergence of the learning process by labeling the most informative samples.
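The core loop the abstract describes, several complementary predictors voting and the human labeling the samples they disagree on most, is essentially query-by-committee. A generic sketch (not the paper's exact scheme; the two scikit-learn classifiers below stand in for the "multiple complementary predictors"):

    # Rank unlabeled samples by committee disagreement (vote entropy);
    # the top-ranked ones are the "most informative" to hand-label next.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    def most_informative(committee, X_unlabeled, k=5):
        votes = np.stack([clf.predict(X_unlabeled) for clf in committee])
        disagreement = []
        for col in votes.T:                       # one column per sample
            _, counts = np.unique(col, return_counts=True)
            p = counts / counts.sum()
            disagreement.append(-(p * np.log(p + 1e-12)).sum())
        return np.argsort(disagreement)[::-1][:k]

    # Toy features standing in for low-level video descriptors.
    rng = np.random.default_rng(0)
    X_lab, y_lab = rng.normal(size=(40, 8)), rng.integers(0, 2, 40)
    X_unl = rng.normal(size=(200, 8))

    committee = [LogisticRegression(max_iter=200),
                 DecisionTreeClassifier(max_depth=4)]
    for clf in committee:
        clf.fit(X_lab, y_lab)
    print(most_informative(committee, X_unl))     # indices to label next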

http://portal.acm.org/citation.cfm?id=1101149.1101235
Automatic video annotation using ontologies extended with visual information
Classifying video elements according to some pre-defined ontology of the video content domain is a typical way to perform video annotation. Ontologies are defined by establishing relationships between linguistic terms that specify domain concepts at different abstraction levels. However, although linguistic terms are appropriate to distinguish event and object categories, they are inadequate when they must describe specific patterns of events or video entities. Instead, in these cases, pattern specifications can be better expressed through visual prototypes that capture the essence of the event or entity. Therefore, pictorially enriched ontologies, which include both visual and linguistic concepts, can be useful to support video annotation up to the level of detail of pattern specification.
Annotation is performed by associating occurrences of events, or entities, with higher-level concepts by checking their proximity to visual concepts that are hierarchically linked to higher-level semantics.
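The annotation step itself reduces to a nearest-prototype test. A toy sketch, with made-up concepts and feature vectors (the real system's features and distance measure are not specified in the abstract):

    # Label a clip with the nearest visual prototype's concept, then
    # climb the linguistic hierarchy to the higher-level concept.
    import numpy as np

    PARENT = {"free kick": "set play", "corner kick": "set play",
              "counterattack": "open play"}       # linguistic hierarchy

    PROTOTYPES = {                                # visual prototypes
        "free kick": np.array([0.9, 0.1, 0.0]),
        "corner kick": np.array([0.1, 0.9, 0.1]),
        "counterattack": np.array([0.2, 0.1, 0.9]),
    }

    def annotate(feature_vec):
        concept = min(PROTOTYPES,
                      key=lambda c: np.linalg.norm(PROTOTYPES[c] - feature_vec))
        return concept, PARENT[concept]

    print(annotate(np.array([0.15, 0.85, 0.05])))  # -> ('corner kick', 'set play')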

http://vismod.media.mit.edu/vismod/demos/football/annotation.htm
Computers Watching Football - Video Annotation
The NFL has recently converted to using all digital media so that the video can be accessed and viewed directly from a computer.
Video annotation is the task of generating such descriptions. It differs from conventional computer vision image understanding in that one is primarily interested in what is happening in a scene, as opposed to what is in the scene. The goal is to describe the behavior or action that takes place in a manner relevant to the domain.
In the "football domain," we would like to build a computer system that will annotate video automatically, or provide a semi-automatic process for the VAC.
Video annotation is a problem that will become much more important in the next few years as video databases begin to grow and methods must be developed for automatic database summary, analysis, and retrieval. Other annotation problems being studied in the Vision and Modeling Group of the MIT Media Lab include dance steps and human gesture.
We have chosen to study the automatic annotation of football plays for four reasons: (1) football has a known descriptive language, (2) football has a rich set of domain rules and domain expectations, (3) football annotation is a real-world problem, and (4) it's fun.
An automatic football annotation system must have some input data upon which to make a preliminary play hypothesis. In the football annotation problem, we are using player trajectories. In the first stage of our annotation project, we have implemented a computer vision football-player tracker that uses contextual knowledge to track football players as they move around a field.
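How player trajectories turn into a play hypothesis can be sketched very simply: compare the tracked paths against idealized per-play templates and rank plays by distance. The templates and the distance measure below are stand-ins, not the Media Lab system's actual model:

    # Rank play hypotheses by mean point-wise distance between one
    # tracked trajectory and each play's idealized template path.
    import numpy as np

    PLAY_TEMPLATES = {                 # (x, y) path sampled at 5 instants
        "slant":    np.array([[0, 0], [2, 0], [4, 1], [6, 3], [8, 5]]),
        "go route": np.array([[0, 0], [2, 0], [4, 0], [6, 0], [8, 0]]),
    }

    def rank_plays(observed):
        scores = {name: np.linalg.norm(observed - tmpl, axis=1).mean()
                  for name, tmpl in PLAY_TEMPLATES.items()}
        return sorted(scores.items(), key=lambda kv: kv[1])

    tracked = np.array([[0, 0], [2.1, 0.2], [3.9, 0.8], [6.2, 2.7], [8.0, 5.1]])
    print(rank_plays(tracked))         # lowest-distance hypothesis first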

http://domino.watson.ibm.com/library/CyberDig.nsf/a3807c5b4823c53f85256561006324be/b4d1f3b9a6c594ea852565b6006bf7b7?OpenDocument
IBM Research | Technical Paper Search | Automatic Text Extraction From Video For Content-Based Annotation and ...
Efficient content-based retrieval of image and video databases is an important emerging application, due to the rapid proliferation of image and digital video data on the Internet and corporate intranets and the exponential growth of video content in general. Text, either embedded or superimposed within video frames, is very useful for describing the semantic content of the frames, as it enables both keyword and free-text based search, automatic video logging, and video cataloging. Extracting text directly from video data becomes especially important when closed captioning or speech recognition is not available to generate textual transcripts of the audio, or when video footage that completely lacks audio needs to be automatically annotated and searched based on frame content. Towards building a video query system, we have developed a scheme for automatically extracting text from digital images and videos for content annotation and retrieval.
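One classic heuristic behind such text extraction is that superimposed captions produce dense horizontal edge responses, so high-edge-density regions become candidate text blocks for an OCR stage. A rough sketch of that idea (thresholds and block size are my assumptions, not IBM's published parameters):

    # Find blocks of a grayscale frame with enough vertical-stroke edges
    # to plausibly contain superimposed text; each block would go to OCR.
    import numpy as np

    def candidate_text_blocks(gray, block=16, density_thresh=0.15):
        edges = np.abs(np.diff(gray.astype(float), axis=1)) > 40
        h, w = edges.shape
        blocks = []
        for r in range(0, h - block, block):
            for c in range(0, w - block, block):
                if edges[r:r + block, c:c + block].mean() > density_thresh:
                    blocks.append((r, c))
        return blocks

    frame = np.zeros((120, 160), dtype=np.uint8)
    frame[100:112, 20:140] = (np.arange(120) % 2) * 255  # fake caption stripes
    print(candidate_text_blocks(frame))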
