A Framework for Video Interaction with Web Browsers

by Pablo Cesar, Dick Bulterman and Jack Jansen

In order to make multimedia a first-class citizen on the Web, there is a need for major efforts across the community. European projects such as Passepartout (ITEA) and SPICE (IST IP) show that there is a need for a standardized mechanism to provide rich interaction for continuous media content. CWI is helping to build a framework that adds a temporal dimension to existing a-temporal Web browsers.

Web 2.0 is not so much a technological revolution as an evolution in the attitude of end-users towards the Web. What started as a global library is becoming a social meeting place in which users can share views and content. Where the initial focus was on a repository of static documents, the future focus will be on the provision of dynamic document services, such as the one shown in Figure 1.

Figure 1: Screenshot of the e-tourism scenario.

Some of the future scenarios that motivate our work are:
Persistent segmentation: for example, by allowing the user to explicitly pause a presentation and then restart it at some later point, possibly days or weeks later.

In order to realize the services-oriented vision with video, an interaction model needs to be defined that transcends the traditional control set of start, stop and pause. The content within the video element will need to be triggered from external, peer-level content, as in the e-commercial scenario. That content in turn needs to trigger related content within the context of a higher-level embedding, as in the e-tourism scenario.

The scenarios show that there is a clear need for richer temporal semantics when integrating a conventional (X)HTML browser interface with multimedia documents. To this end, we wrap videos with an external data model in order to extend content-related (not content-based) interaction. The data model, rather than the video encoding, is the focal point for sharing, mashing and reusing individual objects.

Following the lead of XForms, our data model is defined as a small XML document. This data model is language-independent and can be shared between different XML-based documents such as (X)HTML, SMIL, or SVG. In addition, the framework provides support for defining and manipulating the values of variables in the data model. It also provides the mechanism by which variables are evaluated at runtime and by which their values are saved for the next time the media document is played.

By exporting the data model to the outside world, it becomes possible for the media document to affect other contexts, e.g. the (X)HTML presentation. At the same time, external engines can affect the media presentation. So, unlike embedded video players, in our scenarios the video plays an active role in the Web page. At the moment, the framework is implemented in the Ambulant open-source SMIL player.
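The idea of a small, shared XML data model with persistent state can be illustrated with a minimal Python sketch. It is not the smilState syntax or the Ambulant implementation: the element names (`resumeAt`, `selectedCity`), the `DataModel` class and its methods are all hypothetical and chosen only to show how a player and a Web page might read, change and save the same variables.

```python
import xml.etree.ElementTree as ET

# Hypothetical data model; element names are illustrative only,
# not taken from smilState or the Ambulant player.
INITIAL_MODEL = """<data>
  <resumeAt>0</resumeAt>
  <selectedCity>none</selectedCity>
</data>"""


class DataModel:
    """A tiny shared state store backed by an XML document."""

    def __init__(self, xml_text: str = INITIAL_MODEL) -> None:
        self.root = ET.fromstring(xml_text)
        # Callbacks invoked as listener(name, value) after every change,
        # standing in for the page or the player reacting to the model.
        self.listeners = []

    def get(self, name: str) -> str:
        node = self.root.find(name)
        if node is None:
            raise KeyError(name)
        return node.text or ""

    def set(self, name: str, value: object) -> None:
        node = self.root.find(name)
        if node is None:
            raise KeyError(name)
        node.text = str(value)
        for listener in self.listeners:
            listener(name, node.text)

    def serialize(self) -> str:
        """Return the model as XML text so it can be stored between sessions."""
        return ET.tostring(self.root, encoding="unicode")


# Usage: the page and the player both write to the same model.
model = DataModel()
seen = []
model.listeners.append(lambda name, value: seen.append((name, value)))

model.set("selectedCity", "Amsterdam")  # e.g. set from the (X)HTML page
model.set("resumeAt", 93.5)             # e.g. set by the player on pause

saved = model.serialize()               # persisted when the session ends
restored = DataModel(saved)             # reloaded days or weeks later
```

In this sketch the serialized XML stands in for the saved state that lets a paused presentation be resumed later, and the listener list stands in for external contexts reacting to changes in the model.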
The work sketched in this article has been submitted to the W3C's SYMM working group under the name of smilState. It is expected to be integrated in the SMIL 3.0 release in early 2008. We are also actively participating in the W3C Backplane work to use the results from this and other Web groups to integrate a broadly consistent framework for sharing the data model across XML-based languages.

This work has been funded by the Dutch Bsik BRICKS project, the ITEA Project Passepartout and the FP6 IST project SPICE. Development of the open-source Ambulant Player and CWI's participation in the SMIL standardization effort have been funded by the NLnet foundation.










