The Media Source Extension <> JavaScript API: The foundation of streaming on the web
How did innovation and standardization turn streaming video into what it is today? Find out in this article, which combines concepts with a hands-on tutorial.
The streaming universe is booming.
Since 2013, influential companies like Netflix, Amazon and Apple have poured millions of dollars into creating new TV shows, movies and other video content. Over the past decade, this ecosystem has drawn in many other industries, including tech. Because distributing video is challenging and costly, the tech industry has been tasked with making it faster, cheaper and easier to stream content. As a result, the way we stream video on the web is vastly different from what it was a decade ago.
Back in the day, browsers relied on external plugins to play videos, such as Adobe Flash Player (1996) and Microsoft Silverlight (2007).
They were among the first solutions for playing video on the web, but they soon ran into trouble: attackers exploited the many security vulnerabilities in their code, and the growing demand for video content exposed other technical limitations.
In January 2013, the World Wide Web Consortium (W3C) published the first draft of a new standard to address these challenges: Media Source Extensions (MSE).
MSE was designed as an extension of the HTML5 media elements. It specifies how JavaScript can build the media streams played by the HTML5 <video> and <audio> elements, and which byte stream formats browsers can accept for them.
In September 2013, YouTube was one of the first major video platforms to use MSE.
The MSE advantage
Handling video data on the web is challenging.
Have you ever tried to share a big video file over the internet? Few free and reliable solutions exist, because large files are costly to store and inconvenient to download.
Likewise, requesting a whole movie of more than 1.5GB in a web browser before playing it is far from efficient.
That's where byte streams come in. MSE and HTTP form a great team when it comes to downloading part of a file. Several streaming standards have emerged, such as Microsoft Smooth Streaming, DASH and HLS, that combine these two technologies to transport video content on the web much more efficiently than before. They let the player request a small chunk of video data at a time, usually a few seconds long (two seconds in the examples in this article).
This technique is very efficient because you don't have to download the entire video, but only what you really consume; there’s no point in downloading the complete video data when you can only watch one part of it at a time.
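To make the "download only what you consume" idea concrete, here is a minimal sketch that works out which segments a player needs for the next few seconds of playback. It assumes every segment has the same fixed duration; the function name segmentsToFetch and its parameters are hypothetical, chosen for illustration.

```javascript
// Minimal sketch: which segment indices does the player need to cover the
// next `bufferAhead` seconds of playback? Assumes fixed-duration segments.
function segmentsToFetch(currentTime, bufferAhead, segmentDuration, totalDuration) {
  // Segment containing the current playback position.
  const first = Math.floor(currentTime / segmentDuration);
  // Stop at the end of the look-ahead window, or at the end of the video.
  const end = Math.min(currentTime + bufferAhead, totalDuration);
  const last = Math.ceil(end / segmentDuration) - 1;
  const indices = [];
  for (let i = first; i <= last; i++) {
    indices.push(i);
  }
  return indices;
}

// A 2-hour movie (7200 s) in 2-second segments, 10 minutes into playback,
// with 10 seconds of look-ahead:
console.log(segmentsToFetch(600, 10, 2, 7200).join(", ")); // 300, 301, 302, 303, 304
```

With a 10-second look-ahead, the player needs only five of the 3,600 segments that make up the two-hour movie.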
Here is a small diagram explaining the process that most streaming companies use to transport video data to your TV, phone or computer.
How it works
Based on the diagram above, we can see that the server stores the various pieces that the client needs to play a video.
On one hand, we have the segments, which are small chunks of a video (often two seconds long). Concatenating these chunks in order gives us the whole video.
On the other hand, we have the manifest. This is our guide: it tells us what data is available and where to find it on the server. For example, it lists the qualities in which the video is available and the audio and subtitle languages we can choose from.
The manifest file can follow different standards, the most common being DASH (an XML file with the .mpd extension) and HLS (a text playlist with the .m3u8 extension).
You can see here what form the manifest could take depending on the standard it adopts.
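As an illustration of what an HLS manifest contains, here is a minimal sketch that lists the renditions declared in an HLS master playlist. The playlist content, the URIs and the function name parseHlsMasterPlaylist are hypothetical. The parser only reads the BANDWIDTH and RESOLUTION attributes of #EXT-X-STREAM-INF tags; a real player should rely on a complete parser, and a DASH manifest (XML) would need a different one.

```javascript
// Minimal sketch: list the variants of an HLS master playlist.
// Only BANDWIDTH and RESOLUTION are read; other attributes are ignored.
function parseHlsMasterPlaylist(text) {
  const lines = text.split(/\r?\n/).map((line) => line.trim());
  const variants = [];
  for (let i = 0; i < lines.length; i++) {
    if (!lines[i].startsWith("#EXT-X-STREAM-INF:")) continue;
    const attrs = lines[i].slice("#EXT-X-STREAM-INF:".length);
    // (?:^|,) prevents AVERAGE-BANDWIDTH from being matched instead.
    const bandwidthMatch = /(?:^|,)BANDWIDTH=(\d+)/.exec(attrs);
    const resolutionMatch = /RESOLUTION=(\d+x\d+)/.exec(attrs);
    // The variant playlist URI is the next line that is neither blank nor a tag.
    let uri = null;
    for (let j = i + 1; j < lines.length; j++) {
      if (lines[j] !== "" && !lines[j].startsWith("#")) {
        uri = lines[j];
        break;
      }
    }
    variants.push({
      bandwidth: bandwidthMatch ? Number(bandwidthMatch[1]) : null,
      resolution: resolutionMatch ? resolutionMatch[1] : null,
      uri,
    });
  }
  return variants;
}

// Hypothetical master playlist with two renditions.
const manifest = [
  "#EXTM3U",
  '#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,CODECS="avc1.4d401e,mp4a.40.2"',
  "360p/index.m3u8",
  '#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"',
  "1080p/index.m3u8",
].join("\n");

for (const variant of parseHlsMasterPlaylist(manifest)) {
  console.log(variant.resolution, variant.bandwidth, variant.uri);
}
// 640x360 800000 360p/index.m3u8
// 1920x1080 5000000 1080p/index.m3u8
```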
It's then the role of the application to decide what to download, depending on the user's choice and internet conditions. For example, downloading 4K video segments may not be the best choice if the user's internet access is limited, as 4K video segments are the heaviest.
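This decision is usually called adaptive bitrate (ABR) selection. Below is a minimal, throughput-based sketch: it picks the highest-bitrate rendition that fits within a safety margin of the measured bandwidth. The rendition list, the bitrates and the 0.8 safety factor are illustrative assumptions, and chooseRendition is a hypothetical name; production players use more elaborate heuristics (buffer level, bandwidth history, and so on).

```javascript
// Minimal sketch of throughput-based rendition selection.
// `bandwidth` values are in bits per second.
function chooseRendition(renditions, measuredBitsPerSecond, safetyFactor = 0.8) {
  const budget = measuredBitsPerSecond * safetyFactor;
  // Sort a copy from lightest to heaviest so the input array is left untouched.
  const sorted = [...renditions].sort((a, b) => a.bandwidth - b.bandwidth);
  let choice = sorted[0]; // fall back to the lightest rendition
  for (const rendition of sorted) {
    if (rendition.bandwidth <= budget) {
      choice = rendition;
    }
  }
  return choice;
}

// Hypothetical renditions; real bitrates depend on the content and encoder.
const renditions = [
  { name: "4K", bandwidth: 16000000 },
  { name: "1080p", bandwidth: 5000000 },
  { name: "720p", bandwidth: 2800000 },
  { name: "360p", bandwidth: 800000 },
];

console.log(chooseRendition(renditions, 8000000).name); // 1080p
console.log(chooseRendition(renditions, 500000).name); // 360p
```

On an 8 Mbit/s connection, the 4K rendition is skipped even though the user might prefer it, because its segments would download more slowly than they play.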
The video player layer is responsible for requesting video segments from the server in the form of an ArrayBuffer, a JavaScript object used to represent raw binary data.
Manipulating binary data on the front end is still fairly uncommon, but this is where MSE comes into play.
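To get a feel for working with an ArrayBuffer, here is a minimal sketch that walks the top-level structure of a segment with a DataView. It assumes the segments are fragmented MP4 (ISO BMFF), a format commonly used with MSE, in which data is organized into "boxes" that each start with a 4-byte big-endian size and a 4-character type. The function name readBoxHeaders and the tiny hand-built buffer (mimicking the start of an initialization segment) are hypothetical.

```javascript
// Minimal sketch: list the top-level boxes of an MP4 segment held in an ArrayBuffer.
function readBoxHeaders(buffer) {
  const view = new DataView(buffer);
  const boxes = [];
  let offset = 0;
  // Each box header is 8 bytes: a 32-bit size followed by a 4-character type.
  while (offset + 8 <= buffer.byteLength) {
    const size = view.getUint32(offset); // DataView reads big-endian by default
    const type = String.fromCharCode(
      view.getUint8(offset + 4),
      view.getUint8(offset + 5),
      view.getUint8(offset + 6),
      view.getUint8(offset + 7)
    );
    boxes.push({ type, size });
    // Sizes below 8 (0 means "to end of file", 1 means a 64-bit size follows)
    // are not handled in this sketch.
    if (size < 8) break;
    offset += size;
  }
  return boxes;
}

// Hand-built bytes standing in for a downloaded segment.
const bytes = new Uint8Array([
  0, 0, 0, 16, 0x66, 0x74, 0x79, 0x70, // size 16, type "ftyp"
  0x69, 0x73, 0x6f, 0x6d, 0, 0, 0, 0, // major brand "isom", minor version 0
  0, 0, 0, 8, 0x6d, 0x6f, 0x6f, 0x76, // size 8, type "moov" (empty here)
]);

console.log(readBoxHeaders(bytes.buffer).map((box) => box.type).join(", ")); // ftyp, moov
```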
The MSE API
The <video> element
The magic happens in the HTML5 <video> element. Many developers know that we can set its src attribute to the URL of a media file hosted on a server.
But few know that we can also pass a URL that points directly to an object living in memory.
We achieve that thanks to the URL.createObjectURL() API.
The resulting URL uses the blob: scheme, followed by the page's origin and a unique identifier.
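Here is a minimal sketch of URL.createObjectURL() with a Blob, which works in modern browsers and also in recent versions of Node.js. In the browser, the same method also accepts a MediaSource object, which is what the next steps rely on. The blob content is arbitrary.

```javascript
// Minimal sketch: create a URL that refers to an object held in memory.
const blob = new Blob(["Hello from memory"], { type: "text/plain" });
const objectUrl = URL.createObjectURL(blob);
console.log(objectUrl.startsWith("blob:")); // true

// The URL keeps the object alive until it is revoked, so release it when done.
URL.revokeObjectURL(objectUrl);
```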
Media Source
Now that we know we can create a URL directly linked to an object, we will use the MediaSource API to create a MediaSource object connected to the <video> element.
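A minimal sketch of this step, assuming the page contains a <video> element with the hypothetical id "player". The helper name attachMediaSource is also hypothetical.

```javascript
// Minimal sketch: connect a new, empty MediaSource to a <video> element.
function attachMediaSource(video) {
  const mediaSource = new MediaSource();
  // The blob: URL points at the in-memory MediaSource object.
  video.src = URL.createObjectURL(mediaSource);
  return mediaSource;
}

// Browser usage, assuming <video id="player"></video> exists in the page.
if (typeof document !== "undefined" && typeof MediaSource !== "undefined") {
  const video = document.getElementById("player");
  if (video) {
    const mediaSource = attachMediaSource(video);
    // Still "closed" here: the browser opens the source asynchronously.
    console.log(mediaSource.readyState);
  }
}
```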
So far, everything works, but nothing will play yet: the MediaSource object is empty. We need to add video data.
This is where the SourceBuffer object comes in.
SourceBuffer
The SourceBuffer object acts like an actual buffer. It's an object used to store media data and could be represented like so:
We will store the two-second video segments we discussed earlier in this buffer.
A few methods live on the MediaSource instance. One of them is addSourceBuffer(), the method that creates a SourceBuffer object and adds it to the MediaSource.
The addSourceBuffer() method takes a single parameter, a MIME type string, which tells the SourceBuffer what kind of data we will insert into the buffer.
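A minimal sketch of creating the SourceBuffer, assuming fragmented MP4 segments encoded with H.264. The codec string avc1.42E01E (H.264 Constrained Baseline profile, level 3.0) is an illustrative choice that must match your actual segments, and the helper name createVideoSourceBuffer is hypothetical.

```javascript
// Minimal sketch: create a SourceBuffer for H.264 video in fragmented MP4.
// The codecs parameter must describe the segments you will actually append.
const mimeCodec = 'video/mp4; codecs="avc1.42E01E"';

function createVideoSourceBuffer(mediaSource, type = mimeCodec) {
  // addSourceBuffer() throws unless the MediaSource is "open".
  if (mediaSource.readyState !== "open") {
    throw new Error(`MediaSource must be "open", not "${mediaSource.readyState}"`);
  }
  // It also throws (NotSupportedError) if the browser can't handle `type`.
  return mediaSource.addSourceBuffer(type);
}
```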
In the video world, we compress video to reduce its size before transporting it over the internet, much as we zip a big file to make it easier to transfer, and the player decompresses it for playback.
Multiple compression algorithms exist, but the best known in the industry are H.264 (Advanced Video Coding, AVC) and its successor, H.265 (High Efficiency Video Coding, HEVC). For the same quality, H.265 needs roughly half the bitrate of H.264. Decoding usually happens in hardware on the device you use to play the video (PC, smartphone, console, etc.). Because H.265 is relatively recent, many devices, especially older ones, can't decode it in hardware.
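The MSE API provides a way to check whether the codec you want to use is available on the current browser and device: the static MediaSource.isTypeSupported() method, which takes a MIME type string and returns a boolean.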
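A minimal sketch of codec selection with MediaSource.isTypeSupported(): try H.265 first and fall back to H.264. The two codec strings are common examples chosen for illustration, not values from this article, and pickSupportedType is a hypothetical helper. The support check is passed in as a parameter (defaulting to the real browser method) so it can be swapped out in tests.

```javascript
// Minimal sketch: pick the first MIME type/codec string that MSE supports here.
const candidates = [
  'video/mp4; codecs="hvc1.1.6.L93.B0"', // H.265/HEVC, Main profile
  'video/mp4; codecs="avc1.640028"', // H.264/AVC, High profile, level 4.0
];

function pickSupportedType(types, isSupported = (type) => MediaSource.isTypeSupported(type)) {
  // find() returns undefined when nothing matches; normalize that to null.
  return types.find((type) => isSupported(type)) ?? null;
}

// Browser usage: logs the H.265 string if supported, else the H.264 one (or null).
if (typeof MediaSource !== "undefined") {
  console.log(pickSupportedType(candidates));
}
```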
Choosing the right MIME type string is essential, because a SourceBuffer is created for one specific type: most of the time, you can't push data encoded with a different codec into the same SourceBuffer. Before appending data encoded with a different codec, you must first call the changeType() method on the SourceBuffer.
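A minimal sketch of switching codecs on an existing SourceBuffer. changeType() is missing in some older browsers, so the hypothetical switchCodec helper checks for it first; it also refuses to run while an append is still in progress, since changeType() throws in that case.

```javascript
// Minimal sketch: switch an existing SourceBuffer to a new MIME type/codec
// before appending segments encoded with that codec.
function switchCodec(sourceBuffer, newType) {
  if (typeof sourceBuffer.changeType !== "function") {
    throw new Error("SourceBuffer.changeType() is not supported by this browser");
  }
  // changeType() throws while an append is still being processed.
  if (sourceBuffer.updating) {
    throw new Error("Wait for the updateend event before changing the codec");
  }
  sourceBuffer.changeType(newType);
}

// Browser usage, assuming `sourceBuffer` was created for H.264 segments:
// switchCodec(sourceBuffer, 'video/mp4; codecs="hvc1.1.6.L93.B0"');
```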
A codec (short for coder-decoder) is the program that encodes and decodes a data stream.
Finally, once we have our SourceBuffer instance, we can start adding actual video segments to the buffer we just created, thanks to the appendBuffer(buf) method of the SourceBuffer instance.
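appendBuffer() is asynchronous: the segment has only been added once the SourceBuffer fires its updateend event. Here is a minimal sketch that wraps one append in a Promise; appendSegment is a hypothetical helper name.

```javascript
// Minimal sketch: append one segment and resolve once it has been processed.
function appendSegment(sourceBuffer, segment) {
  return new Promise((resolve, reject) => {
    const cleanup = () => {
      sourceBuffer.removeEventListener("updateend", onUpdateEnd);
      sourceBuffer.removeEventListener("error", onError);
    };
    const onUpdateEnd = () => {
      cleanup();
      resolve();
    };
    const onError = () => {
      // On failure the browser fires "error" before "updateend".
      cleanup();
      reject(new Error("The SourceBuffer failed to append the segment"));
    };
    sourceBuffer.addEventListener("updateend", onUpdateEnd);
    sourceBuffer.addEventListener("error", onError);
    try {
      // Throws synchronously, e.g. if another append is still in progress.
      sourceBuffer.appendBuffer(segment);
    } catch (err) {
      cleanup();
      reject(err);
    }
  });
}
```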
Once the <video> element has enough data buffered to start playing the video, its readyState property will be greater than or equal to 3 (HAVE_FUTURE_DATA). We can then call its play() method.
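A minimal sketch of this check. When readyState is still below 3, it waits for the canplay event, which the browser fires once the element reaches HAVE_FUTURE_DATA. The helper name playWhenReady is hypothetical. Note that play() returns a Promise, which can be rejected, for example when the browser's autoplay policy blocks playback.

```javascript
// Same value as HTMLMediaElement.HAVE_FUTURE_DATA.
const HAVE_FUTURE_DATA = 3;

// Minimal sketch: call play() as soon as enough data is buffered.
function playWhenReady(video) {
  if (video.readyState >= HAVE_FUTURE_DATA) {
    return video.play();
  }
  return new Promise((resolve, reject) => {
    video.addEventListener(
      "canplay",
      () => {
        // play() may still reject, e.g. under autoplay restrictions.
        video.play().then(resolve, reject);
      },
      { once: true }
    );
  });
}
```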
Events
Setting up these APIs is event-driven: most steps can only proceed once the browser has fired an event signaling that the object is ready.
For instance, when we create a MediaSource instance, we can't use it right away: its readyState property is "closed" until it has been attached to a media element.
We need to wait for the MediaSource to reach the "open" state, which the browser signals by firing a sourceopen event on it.
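A minimal sketch that attaches a MediaSource and resolves once sourceopen has fired. The helper name openMediaSource and the "player" element id in the usage comment are hypothetical. Revoking the blob: URL once the source is open is a common clean-up step.

```javascript
// Minimal sketch: attach a MediaSource and wait until it is "open".
function openMediaSource(video) {
  return new Promise((resolve) => {
    const mediaSource = new MediaSource();
    const url = URL.createObjectURL(mediaSource);
    mediaSource.addEventListener(
      "sourceopen",
      () => {
        // The URL is no longer needed once the source is attached.
        URL.revokeObjectURL(url);
        resolve(mediaSource); // readyState is now "open"
      },
      { once: true }
    );
    video.src = url;
  });
}

// Browser usage:
// openMediaSource(document.getElementById("player"))
//   .then((mediaSource) => console.log(mediaSource.readyState)); // "open"
```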
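The SourceBuffer object also fires several events: updatestart, update, updateend, error and abort. The most important is updateend, because appendBuffer() is asynchronous and a new append can't start until the previous one has finished.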
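Because calling appendBuffer() while the SourceBuffer's updating property is true throws an InvalidStateError, players commonly queue incoming segments and append the next one on updateend. A minimal sketch of that pattern; SegmentQueue is a hypothetical name.

```javascript
// Minimal sketch: append queued segments one at a time, driven by "updateend".
class SegmentQueue {
  constructor(sourceBuffer) {
    this.sourceBuffer = sourceBuffer;
    this.pending = [];
    // Each finished append triggers the next one.
    sourceBuffer.addEventListener("updateend", () => this.flush());
  }

  push(segment) {
    this.pending.push(segment);
    this.flush();
  }

  flush() {
    // Do nothing while an append is in progress or when the queue is empty.
    if (this.sourceBuffer.updating || this.pending.length === 0) return;
    this.sourceBuffer.appendBuffer(this.pending.shift());
  }
}
```

Segments can then be pushed as soon as they arrive from the network, in any rhythm, without ever calling appendBuffer() on a busy buffer.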
Appending real data
We now know how to append data, but how do we retrieve the data that we will send to the SourceBuffer we just created?
We will use the responseType property of the XMLHttpRequest API, which tells the browser what kind of data we want back.
We set responseType to "arraybuffer", so that the response is an ArrayBuffer: a binary JavaScript representation containing the raw bytes of the two-second video segment.
Once we have received that data from the server, we can insert it into the SourceBuffer object we created.
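A minimal sketch of this request, wrapped in a Promise. The helper name loadSegment and the segment URL in the usage comment are hypothetical; fetch() with response.arrayBuffer() would be a modern alternative to XMLHttpRequest.

```javascript
// Minimal sketch: download one segment as an ArrayBuffer with XMLHttpRequest.
function loadSegment(url) {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open("GET", url);
    xhr.responseType = "arraybuffer"; // xhr.response will be an ArrayBuffer
    xhr.onload = () => {
      if (xhr.status >= 200 && xhr.status < 300) {
        resolve(xhr.response);
      } else {
        reject(new Error(`HTTP ${xhr.status} for ${url}`));
      }
    };
    xhr.onerror = () => reject(new Error(`Network error for ${url}`));
    xhr.send();
  });
}

// Browser usage, assuming `sourceBuffer` was created with addSourceBuffer():
// loadSegment("https://example.com/video/segment-0.m4s")
//   .then((data) => sourceBuffer.appendBuffer(data));
```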
Conclusion
Video has been part of the internet since the first live-streamed concert in 1993. Since then, it has gained more and more traction, reaching incredible milestones and allowing internet companies to build entire businesses around it. For instance, according to 2018 research from bandwidth management company Sandvine, Netflix accounted for 15% of worldwide downstream internet traffic. Then there's the giant of video on the internet, YouTube, where people watch more than 1 billion hours of content every day.
These success stories make it clear that handling video on the internet has required technical innovation. Many of these players came together to create new distribution technologies that make consuming video on the web faster and less frustrating.
As a result, MSE has gained support across more and more browsers and platforms. This article is an introduction to how prominent players handle video on the web. However, video is more complex than it looks, and many other factors deserve a closer look in further articles.