TubeCaption.com

Entries categorized as ‘Development’

Smarter Workspace-Text Import

July 5, 2008 · 3 Comments

After implementing the “Import Text” from Workspace feature, I tried eating my own dog food by captioning lots of songs with the tool.  It turns out that while the Import Text function is extremely useful (it cuts down the typing significantly!), it could be a lot better.  The idea of the “Import Text” from Workspace feature is that for every line selected in the text box, a new segment is created and dumped into the timeline track, saving me the typing.  However, the duration of each segment was a fixed constant, 1 second.  I still had to adjust the start time and the duration of every caption in another play-through to make sure the captions were synchronized correctly.

To improve the feature, I played with a few ideas for how the text importer could estimate the duration of a segment and use that value instead of the default 1 second.  I thought of phonetic analysis, got a few PDF papers from Google, and skimmed through them.  Right from the start, I knew that I wouldn’t need very high accuracy for the estimation.  More importantly, I had a very limited time budget (an hour or two) to implement the feature; digging through complex phonetic algorithms would easily take a few days, and would probably be overkill for Captionizer.

I thought some more, but not too hard, because I had been working on so many captions for the past few days that the music kept playing in my head.  Then suddenly, I realized that I could just count the vowels in a word and, from there, estimate how long the word takes to say.  Even though I’m not a linguistics expert, the algorithm can still come up with a reasonable guesstimate of how long a sentence takes to say!

Determining how long a sentence would be spoken

Let’s take an example.   We have this sentence (read it yourself)

Read This Sentence Aloud

Counting the vowel groups (2 vowels next to each other can be grouped together as one phonetic sound), we will have

[ ea, i, e, e, e, a, ou ]

7 vowel groups.  With normal phonetic stress and regular gaps between the words when spoken, I assume that it would take an average person 0.3 seconds to pronounce a vowel (or vowel group).  So the above sentence would take 7 x 0.3 = 2.1 seconds at average speed.

There are of course exceptions, such as words with multiple vowel groups that are still pronounced as one sound, such as “there”, “where”, “these”.  However, the overall spoken time of a sentence averages out because the sentence contains other words as well.

The implementation is a mere 3 lines in JavaScript:

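var wordSpokenLength = 0.3; // seconds per vowel group
// strip() is Prototype's String#strip (trims whitespace); split on runs of vowels
var vowels = text.strip().split( /[AEIOU]{1,}/i );
// the number of pieces approximates the number of vowel groups
var totalDuration = vowels.length * wordSpokenLength;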

(totalDuration is the suggested value for how long the segment should be)

The crux of the algorithm is splitting the whole sentence into groups.  The estimated duration is calculated by multiplying the number of groups by the average spoken length of a group.  Instead of using a regular string delimiter for the splitting, I used a small regex matching a run of one or more standard Latin vowels: A, E, I, O, U.  For example, “aloud” should be split into 2 pronounceable groups:  “a” and “ou”.  If you execute the split in the JavaScript console, split returns the 3 pieces around the vowel groups, ["", "l", "d"], instead of the 2 groups themselves, so the count is one higher than the number of vowel groups, but that’s okay for an estimate.
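
As a rough sketch of the same estimate, the vowel groups can also be counted directly with match instead of counting the pieces left over by split.  The helper names countVowelGroups and estimateDuration are made up for this illustration and are not Captionizer's actual code; the 0.3-second figure is the average assumed above.

```javascript
// Illustrative sketch only; these helper names are hypothetical, not Captionizer's code.
var SECONDS_PER_VOWEL_GROUP = 0.3; // the average assumed in the post

// Count runs of one or more standard Latin vowels (A, E, I, O, U).
function countVowelGroups(text) {
  var groups = text.match(/[aeiou]+/gi); // null when the text has no vowel at all
  return groups ? groups.length : 0;
}

// Suggested segment duration, in seconds, for one line of caption text.
function estimateDuration(text, secondsPerGroup) {
  var rate = secondsPerGroup === undefined ? SECONDS_PER_VOWEL_GROUP : secondsPerGroup;
  return countVowelGroups(text) * rate;
}

var line = "Read This Sentence Aloud";
console.log(countVowelGroups(line));            // number of vowel groups
console.log(estimateDuration(line).toFixed(1)); // estimated seconds
```

For the example sentence this counts the 7 groups listed earlier, whereas the split-based count comes out one higher.  Either way the result is only an estimate, so the difference rarely matters in practice.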

This method worked surprisingly well when I tested it with a few songs.  I chose Rihanna’s new song, Take a Bow, because it is a slower-paced song.  For the most part, the suggested duration was in good agreement with the singing, although Rihanna stretches her voice at the end of a line, so I would still need to stretch out some imported caption segments.

I tested again with Boys Like Girls – Thunder, a more popular rock song, and Kardinal Offishall/Akon – Dangerous, a hip-hop/rap song.  0.3 seconds per vowel group seems to work well in all of those cases.  However, with “Dangerous”, the rapping part is a bit faster than the suggested values, but this is an exception because rappers go for speed!  I will probably need to lower the average value to 0.1 second to get a better estimate.

Since the vowels being used are specific to English, the algorithm won’t work as well with other languages that have different or more vowels, such as my beloved Vietnamese.  In Vietnamese, we have vowels such as â, ă, ê, ơ, ô, ư, which also carry different tone marks, as in ầ, ả, à, á, ề, ế, ờ, etc.  The current pattern does not match these letters, so it misses vowel groups and the estimated duration will be shorter than it should be.  To overcome this, the splitter pattern needs to be updated to include all those different vowels so that we get better split results.  And for non-Latin scripts such as Chinese, Korean, Japanese, Arabic, Mayan :) etc., my method also fails miserably.
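
One possible way to handle accented Latin vowels, sketched here as an assumption rather than what Captionizer does, is to strip the accents before counting instead of listing every accented vowel in the pattern.  This relies on Unicode normalization (String.prototype.normalize, available in ES2015 and later runtimes), and countVowelGroupsAccented is a hypothetical helper name.  It still treats only A, E, I, O, U as vowels and does nothing for non-Latin scripts.

```javascript
// Illustrative sketch; countVowelGroupsAccented is a hypothetical helper.
// Assumes a runtime that supports String.prototype.normalize (ES2015 or later).
function countVowelGroupsAccented(text) {
  var plain = text
    .normalize("NFD")                 // e.g. "à" becomes "a" + a combining grave accent
    .replace(/[\u0300-\u036f]/g, ""); // drop the combining marks, keeping base letters
  var groups = plain.match(/[aeiou]+/gi);
  return groups ? groups.length : 0;
}

var city = "H\u00e0 N\u1ed9i"; // "Hà Nội"
console.log((city.match(/[aeiou]+/gi) || []).length); // plain A/E/I/O/U pattern
console.log(countVowelGroupsAccented(city));          // after stripping the marks
```

With the plain pattern only the final “i” is found, while the normalized version also sees the “a” and groups “oi” together.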

Another idea I’m considering is that the importer could learn the pace of the current captions by keeping some kind of statistic on the average spoken length.  “Dangerous” has a higher word density per caption, so the average should be 0.2 instead of 0.3, while the Rihanna song has fewer words per caption and longer durations, so the value should be 0.35 instead of 0.3.  However, this would be a far more complicated implementation.  And right now, from my hands-on experiments, 0.3 seconds seems to do the job.

I am quite pleased with this tweak and how it will help make the captioning process easier and faster.  As we develop TubeCaption further, we have collected more feedback from friends and family, and we have also learned more about the process.  It has been a long development road since the first prototype of Captionizer.  We have added more and more features, tools, shortcuts, etc., with one single goal:  to make captioning less tedious and more fun.  We hope you enjoy it as well!

Categories: Development

New Feature: Import Workspace Text

July 4, 2008 · 2 Comments

New Feature For Captionizer

After observing how friends and family used Captionizer, our awesome caption editor, I added a new feature that cuts down a lot of the typing, which is the most tedious and error-prone step.

I made a quick screenshot guide for this new feature in 3 easy steps:

Once you have the transcript or the raw caption text, it is a matter of importing it from the workspace into the timeline.  Each line in the workspace will be imported as a caption segment in the timeline, thus saving you the typing.

New Video Request Page

Javier has also been working on the Video Request page.  It is still a bit rough with the design and the flow, but it is a start for us to collect requests so that TubeCaption users can work on them.

Enjoy!

Categories: Announcements · Development

Tiny is the new Big

June 6, 2008 · Leave a Comment

Maybe for some things, bigger means better.  But not for TubeCaption.  I spent the entire day today working with different JavaScript minifiers, obfuscators, and asset management tools to further trim down the size of the application.  It’s been an extremely productive day here for both Javier and me.

To start the optimization process, I gave JSMin a shot.  It gave satisfactory results in terms of minification, but the packed files were not compressed and obfuscated as much as I wanted.  Since I wanted to reduce the file size further and also protect the application from external manipulation, obfuscation is a critical requirement.

I then found out about PackR, a Rails plugin implementing Dean Edwards’s Packer script.  Dean’s original JavaScript implementation of the packer and obfuscator is wickedly cool, but it would probably take me a full day to read through his code and see what he’s doing; it is THAT convoluted.  I was very impressed with the results from its Ruby sibling.  PackR spat out the smallest files of all when run with “base62” and “shrink variables” both checked.  The files were also almost unreadable due to the minification and obfuscation.  Sadly, however, my code stopped working altogether.  There’s no way I can read a file containing one single line of 8000000000 characters of convoluted JS calls and figure out what went wrong.  So I thought my original plan of optimizing the scripts was bankrupt…

I then remembered YUI Compressor and decided to give it a shot.  YUI Compressor is a Java app, distributed as a JAR file.  Java? NOOOOOOOO.  Anyhow, I gave in.  Installing Java on the server was simple enough through yum.  I already had Java installed locally, so I just fired up the JAR file (I’m using yuicompressor-2.3.4.jar) to see how it rolls.  YUI Compressor works decently, with reasonable minification.  The obfuscation is weak since it only does function parameter replacement (unlike Dean’s Packer, which goes deep into the code and performs crazy regex kung fu to obfuscate).  Nonetheless, my code did work, after numerous failed attempts to pack the correct files in the correct order.  I’m really happy with how things turned out.

The results?

Prototype (1.6.0.1) + Scriptaculous (1.8.1) total 169KB after being combined and packed.  After the packed file is gzipped, it’s only 47KB!

The editor’s code, spread across around 10 different files with a total size > 60KB, is now reduced to ONE single minified file that weighs 13KB after gzipping.

Miscellaneous JavaScript for the Watch Video page:  packed down to 15KB, gzipped down to …… 3KB!

What does this mean for TubeCaption users?  The page loading time will improve and the editor will load much faster than before.

The thing I love the most about this is that the entire process of merging all the JS files, obfuscating them, and producing the end result is totally automated.  In development mode, everything is nicely separated.  I still work on the different source files in my project.  Once I push the deploy button, a Rails plug-in, which I customized to work with YUI Compressor, takes care of the entire packaging process.  In the view, the JavaScript helper automatically picks up the correct packed JavaScript files if the current environment is production, or spits out links to the individual source files in development mode!

So why don’t you fire up the editor, start captioning and make some money today?

Categories: Announcements · Development