Having identified the biggest obstacles to creating captioned media on campus, we proposed a pilot project that sought to remove, or at least lessen, the impact of these impediments. These issues were determined to be:
Solution: An appliance that performs media conversion "automagically". Programs that perform this conversion already exist and are routinely employed by websites such as YouTube and Yahoo! Video. The Stanford Captioning program currently uses FFmpeg to provide this codec conversion.
Solution: An automated, managed service that draws on multiple transcription providers. Because price is tied to delivery time, transcribing a one-hour lecture can cost as little as $75.00, well within reach of even limited budgets. Stanford Captioning's system supports multiple service providers, each of which has agreed to contract pricing for these services. Accommodating several providers both lets the system scale and keeps pricing competitive through market forces.
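To illustrate how a cost/turnaround choice might map onto a provider, here is a minimal sketch that picks the cheapest quote meeting a deadline. The provider names, the `TranscriptionQuote` fields, and every price other than the $75.00 one-hour figure are hypothetical; the actual selection logic used by the system is not described in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionQuote:
    """One provider's price for one turnaround tier (hypothetical fields)."""
    provider: str
    turnaround_hours: int
    price_per_audio_hour: float


def choose_quote(quotes, audio_hours, max_turnaround_hours):
    """Return (quote, total_cost) for the cheapest quote that meets the
    deadline, or None if no provider can deliver in time."""
    eligible = [q for q in quotes if q.turnaround_hours <= max_turnaround_hours]
    if not eligible:
        return None
    best = min(eligible, key=lambda q: q.price_per_audio_hour)
    return best, round(best.price_per_audio_hour * audio_hours, 2)


# Hypothetical quotes: only the $75.00 per one-hour figure comes from the text.
quotes = [
    TranscriptionQuote("provider-a", 24, 150.00),
    TranscriptionQuote("provider-a", 120, 75.00),
    TranscriptionQuote("provider-b", 48, 110.00),
]

print(choose_quote(quotes, audio_hours=1.0, max_turnaround_hours=72))
```

Adding another provider is just another entry in the list, which is where the competitive-pricing effect comes from: a slower deadline widens the eligible set and can only lower the cost.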
Solution: An appliance that takes the plain-text transcript, performs a heuristic analysis of the audio stream to insert caption time points into the transcript automatically, and then exports the modified file in the appropriate caption file format. Working with DocSoft and their AV Appliance, we now have a system that performs this function.
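DocSoft's audio analysis is proprietary and not described here, so the sketch below is only a deliberately naive stand-in that shows what "inserting caption points" produces. It ignores the audio entirely and assumes a constant speaking rate, splitting the transcript into chunks of at most `max_words` words (a hypothetical parameter) and timing each chunk by its position in the text.

```python
def naive_caption_points(transcript, duration_s, max_words=8):
    """Split a transcript into caption chunks and assign (start, end, text)
    tuples, assuming words are spoken at a constant rate over the recording.
    A real aligner would instead locate each word in the audio stream."""
    words = transcript.split()
    if not words:
        return []
    seconds_per_word = duration_s / len(words)
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = i * seconds_per_word
        end = (i + len(chunk)) * seconds_per_word
        cues.append((round(start, 3), round(end, 3), " ".join(chunk)))
    return cues


text = "one two three four five six seven eight nine ten"
for cue in naive_caption_points(text, duration_s=5.0, max_words=4):
    print(cue)
```

With ten words over five seconds this yields three cues starting at 0.0, 2.0, and 4.0 seconds. Pauses and changes in speaking pace make such a proportional guess drift, which is why analysing the audio itself matters.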
Working with the DocSoft AV appliance (a hardware and software solution designed to mine the spoken content of digital audio and video files), we engaged the manufacturer to customize the appliance's existing workflow in two ways: adding the ability to outsource and manage human transcription of media assets, ensuring accurate and reliable transcripts at an affordable price; and providing a programmatic means of converting media from one file format to another. A custom web-based user/administrator interface will also be created for managing the service.
With this appliance in place, content creators will be able to log into the system and upload media assets in most of the formats currently being produced. The appliance will convert (transcode) this media either to Flash Video (.flv) or to H.264 video in a .mov or .m4v container, formats suitable for use in the JW FLV Media Player (a Flash-based player currently in widespread use across campus).
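Since the conversion step relies on FFmpeg, a minimal sketch of the command lines involved might look like the following. It only builds argument lists (and prints them), so it runs without FFmpeg installed. The codec choices, the `_web`/`_audio` output naming, and the 64k MP3 bitrate are assumptions for illustration; the appliance's actual settings are not given here. The MP3 helper corresponds to the audio sent for transcription, and the `flv`/`libmp3lame` encoders require an FFmpeg build that includes them.

```python
import shlex
import subprocess
from pathlib import Path


def web_video_command(src, fmt="m4v"):
    """Build an ffmpeg argument list that converts `src` to a web format.

    fmt="flv": Sorenson Spark video (ffmpeg's "flv" encoder) + MP3 audio.
    fmt="m4v": H.264 video (libx264) + AAC audio in an MPEG-4 container.
    """
    if fmt == "flv":
        codecs = ["-c:v", "flv", "-c:a", "libmp3lame", "-ar", "44100"]
    elif fmt == "m4v":
        codecs = ["-c:v", "libx264", "-c:a", "aac"]
    else:
        raise ValueError(f"unsupported web format: {fmt}")
    src = Path(src)
    out = src.with_name(f"{src.stem}_web.{fmt}")  # never overwrite the input
    return ["ffmpeg", "-y", "-i", str(src), *codecs, str(out)]


def mp3_audio_command(src, bitrate="64k"):
    """Build an ffmpeg argument list that extracts MP3 audio for transcription."""
    src = Path(src)
    out = src.with_name(f"{src.stem}_audio.mp3")
    # -vn drops the video stream; libmp3lame encodes the remaining audio as MP3.
    return ["ffmpeg", "-y", "-i", str(src), "-vn",
            "-c:a", "libmp3lame", "-b:a", bitrate, str(out)]


def run(cmd):
    """Execute a built command; requires an ffmpeg binary on PATH."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in (web_video_command("lecture01.mov", "flv"),
                web_video_command("lecture01.mov"),
                mp3_audio_command("lecture01.mov")):
        print(shlex.join(cmd))  # print rather than run, so ffmpeg is optional
```

Building argument lists instead of shell strings avoids quoting problems with uploaded file names; passing a list to `run` executes the conversion.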
These converted files will be returned to the content owner promptly (currently envisioned as same day), allowing the media to be posted on the web in a short delivery cycle. These initial web media files will not yet be captioned, but the media player will be ready to accept caption files as soon as they are complete, so captions can be integrated seamlessly.
While the video is being converted to a web delivery format, the media is also automatically converted to MP3 audio and forwarded to a pre-selected transcription company, according to the cost/delivery-time option chosen by the content owner. Once transcribed, the text files are loaded back into the appliance, where they are automatically converted to an XML-based, time-stamped file that can then be exported in a number of formats, including the .srt (SubRip) format used by the JW FLV Player. Once the .srt file is produced (an almost entirely automatic process), the time-stamped transcript can be integrated into the website alongside the video, resulting in a fully closed-captioned solution. In addition, because the time-stamped file starts out as an XML document, re-purposing the text for web delivery (XML to XHTML) is a relatively trivial undertaking, yet a powerful way of making the video's spoken content indexable and searchable.
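The appliance's XML schema is not documented in this report, so the sketch below assumes a hypothetical one (`<caption start=".." end="..">text</caption>` elements, times in seconds) and shows both exports described above: SubRip (.srt) cues, and a simple XHTML transcript fragment whose `data-start` attributes are likewise an illustrative choice.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical time-stamped transcript; the real appliance schema may differ.
SAMPLE = """<transcript>
  <caption start="0.0" end="2.5">Welcome to the lecture.</caption>
  <caption start="2.5" end="61.25">Today we cover sampling &amp; quantization.</caption>
</transcript>"""


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def read_cues(xml_text):
    """Parse <caption> elements into (start, end, text) tuples."""
    root = ET.fromstring(xml_text)
    return [(float(c.get("start")), float(c.get("end")), (c.text or "").strip())
            for c in root.iter("caption")]


def to_srt(cues):
    """Render cues as SubRip: index, time range, text, blank line."""
    blocks = [f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
              for i, (start, end, text) in enumerate(cues, start=1)]
    return "\n".join(blocks)


def to_xhtml(cues):
    """Render cues as a well-formed XHTML fragment for indexing and search."""
    items = "\n".join(f'  <p data-start="{start:.3f}">{html.escape(text)}</p>'
                      for start, _end, text in cues)
    return f'<div class="transcript">\n{items}\n</div>'


cues = read_cues(SAMPLE)
print(to_srt(cues))
print(to_xhtml(cues))
```

Both outputs come from the same parsed cues, which reflects the point above: once the timing lives in XML, each additional output format is a small rendering function.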
The result combines automatic codec conversion; a standardized, cross-platform media player based on Flash; searchable text transcripts that allow full indexing of each media asset as well as precise search within the media itself; and a flexible, affordable transcription service that delivers accurate, dependable results consistently.
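"Precision search within the media" follows from keeping timestamps next to the text: a search hit can return a playback offset rather than just a document. A minimal sketch, assuming cues are held as hypothetical `(start, end, text)` tuples with times in seconds:

```python
import re


def find_in_transcript(cues, term):
    """Return start times (seconds) of cues that contain `term` as a whole
    word, ignoring case, so a player could seek straight to each match."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [start for start, _end, text in cues if pattern.search(text)]


# Hypothetical cues for illustration.
cues = [
    (0.0, 4.0, "Welcome to Introduction to Signals."),
    (4.0, 9.5, "A signal is sampled at regular intervals."),
    (9.5, 15.0, "Sampling too slowly causes aliasing."),
]

print(find_in_transcript(cues, "sampling"))
print(find_in_transcript(cues, "signal"))
```

Whole-word matching is a design choice here: "signal" matches the second cue but not "Signals" in the first. A real search index would more likely add stemming.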
The project was initiated during the Fall quarter of 2008, when the initial functional specifications were determined with DocSoft. Between January and July 2009, multiple iterations of the system were deployed and tested, culminating in the launch of the service at a Tech Briefing on July 31, 2009. As part of the test phase, a number of legacy videos that receive heavy traffic on Stanford's YouTube channel were captioned and re-released. Plans to continue retrofitting older, uncaptioned videos are currently under discussion.
The next phase of the project will see us working with the Office of Accessible Education (OAE). We plan to identify two related courses that currently use web video for part or all of their curriculum and construct a trial evaluation in which one course provides captioned videos to all students while the other does not. At the end of the quarter, data on comprehension and retention, along with other user feedback, will be collected and analyzed. Our current hypothesis is that, because captions present the same information through an additional modality (text alongside speech), captioned videos will deliver greater pedagogical value to all students: yet one more reason to encourage captioned media at Stanford.