Sep 6, 2022

The Media Source Extension <> JavaScript API: The foundation of streaming on the web

How did innovation and standardization make streaming video content into what it is today? Read about it in this combination concept and tutorial article.

Diagram of how the media source extension works.

The streaming universe is booming.

Since its inception back in 2013, influential companies like Netflix, Amazon and Apple have poured millions of dollars into creating new TV shows, movies and other video content. Over the past decade, this ecosystem has reshaped many business areas, including tech. Because distributing video is challenging and costly, the tech industry has been tasked with making streaming faster, cheaper and easier. As a result, the way we stream video on the web is vastly different from what it was a decade ago.

Back in the day, we relied on external plugins in browsers to play videos, like Microsoft Silverlight (2007) and Adobe Flash Player (1996).

They were the first solutions for playing videos on the web, but they quickly ran into trouble: attackers exploited numerous security flaws in their code, and the growing demand for video content exposed other technical limitations.

In January 2013, the World Wide Web Consortium (W3C) wrote a new standard to address these challenges: Media Source Extensions (MSE).

MSE was intended to be incorporated into the HTML5 standard. It specifies the byte streams and video/audio codecs that web browsers support through the HTML5 <video> and <audio> tags.

In September 2013, YouTube was one of the first video pioneers to use MSE.

The MSE advantage

Handling video data on the web is challenging.

Have you ever tried to share a big video file over the internet? Not many free and reliable solutions exist, as large files are generally costly to store and inconvenient to download.

As you know, requesting a whole movie bigger than 1.5 GB in a web browser is not exactly efficient.

That's where byte streams come in. MSE and HTTP form a great team when it comes to downloading part of a file. Indeed, a few streaming standards have emerged, like Microsoft Smooth Streaming, DASH and HLS, that leverage these two technologies to make transporting video content on the web much more efficient than it used to be. These standards let the client request a small chunk of video data, usually two seconds at a time.

This technique is very efficient because you don't have to download the entire video, but only what you really consume; there’s no point in downloading the complete video data when you can only watch one part of it at a time.
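
The "download only part of a file" idea maps onto the HTTP Range header. As a minimal sketch (the helper name byteRange and the URL in the usage comment are illustrative assumptions, not part of any streaming standard), a client could describe one slice of a file like this:

```javascript
// Hypothetical helper: builds the value of an HTTP Range header
// for `length` bytes starting at byte offset `start`.
// Byte ranges are inclusive on both ends, hence the "- 1".
function byteRange(start, length) {
  if (!Number.isInteger(start) || !Number.isInteger(length) || start < 0 || length <= 0) {
    throw new RangeError('start must be >= 0 and length must be > 0');
  }
  return `bytes=${start}-${start + length - 1}`;
}

// Usage sketch (the URL is a placeholder):
// fetch('https://example.com/movie.mp4', {
//   headers: { Range: byteRange(0, 1024) },
// }).then((response) => response.arrayBuffer());
```

A server that supports range requests answers with a 206 Partial Content response containing only the requested bytes.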

Here is a small diagram explaining the process that most streaming companies use to transport video data to your TV, phone or computer.

A diagram of how video players interact with servers.

How it works

Based on the diagram above, we can see that the server stores the various pieces that the client needs to play a video.

On one hand, we have the segments, which are small chunks of a video (usually two seconds long). If we merge those two-second chunks, we have the whole video.

On the other hand, we have the manifest. This is our guide: it tells us where to find the video data on the server and what data is available, such as the qualities the video is offered in or the audio and subtitle languages we can choose.

The manifest file can follow several standards, the most common of which are DASH and HLS.

You can see here what form the manifest could take depending on the standard it adopts.

It's then the role of the application to decide what to download, depending on the user's choice and internet conditions. For example, downloading 4K video segments may not be the best choice if the user's internet access is limited, as 4K video segments are the heaviest.
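
To make that decision concrete, here is a minimal sketch of how an application might pick a quality level. The rendition names, the bitrates and the 0.8 safety factor are made-up assumptions for illustration; real players use more elaborate heuristics.

```javascript
// Hypothetical list of renditions advertised by a manifest.
// The bitrates (bits per second) are illustrative numbers only.
const renditions = [
  { name: '480p', bitrate: 1500000 },
  { name: '1080p', bitrate: 6000000 },
  { name: '4K', bitrate: 20000000 },
];

// Picks the highest-bitrate rendition that fits within a fraction
// (safetyFactor) of the measured bandwidth. Falls back to the lowest
// bitrate when nothing fits.
function pickRendition(list, bandwidthBps, safetyFactor = 0.8) {
  const sorted = [...list].sort((a, b) => a.bitrate - b.bitrate);
  const budget = bandwidthBps * safetyFactor;
  let choice = sorted[0];
  for (const rendition of sorted) {
    if (rendition.bitrate <= budget) choice = rendition;
  }
  return choice;
}
```

With a measured bandwidth of 10 Mbit/s, the budget is 8 Mbit/s, so this sketch would settle on the 1080p rendition rather than 4K.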

The video player layer is responsible for requesting video segments from the server in the form of an ArrayBuffer, a JavaScript type used to represent raw binary data.

Nowadays, manipulating binary data on the front end is not very common, but this is where MSE comes into play.

The MSE API

The <video> element

A flowchart of downloading a video element.

The magic happens in the HTML5 <video> element. Many developers know that its src attribute usually points to the URL of a media file.

But few know that we can pass a URL that points directly to an object living in memory.

We achieve that thanks to the API:

URL.createObjectURL(...)

The URL will look like what we have below:

const video = document.querySelector('video');
video.src = URL.createObjectURL(...);
//src="blob:"

Media Source

Now that we know that we can create a URL that is directly linked to an object, we will leverage the MediaSource API to create a MediaSource object connected to the <video> element.

const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

So far, everything works fine, but nothing will happen as the MediaSource object isn’t filled with anything. It’s empty. We need to add video data.

This is when the SourceBuffer object is used.

SourceBuffer

The SourceBuffer object acts like an actual buffer. It's a small object used to store data and could be represented like so:

A long rectangle subdivided into smaller rectangles.

We will store the two-second video segments we discussed earlier in this buffer.

A few methods live in the MediaSource instance. One of them is addSourceBuffer. This is the method that will create and add a SourceBuffer object to the MediaSource.

const mimeCodec = 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"';
const bufferVideo = mediaSource.addSourceBuffer(mimeCodec);

The addSourceBuffer method takes a single parameter that tells the SourceBuffer what kind of data we will insert in the buffer.

In the video world, we compress video to reduce its size before transporting it over the internet and decompress it on arrival, much as we would zip a big file to make it easier to transport.

Many compression algorithms exist, but the best known in the industry are H.264 (Advanced Video Coding) and its successor, H.265 (High Efficiency Video Coding). H.265 compresses roughly twice as well as H.264 for the same quality. Decoding usually happens in the hardware of the device you use to play the video (PC, smartphone, console…). As H.265 is relatively new to the market, many devices do not support this codec.

The MSE API provides a way to know if the codec you want to use is available, given your current hardware configuration.

MediaSource.isTypeSupported(mimeCodec) // true or false

Here, mimeCodec is essential, because most of the time, you can’t push a different codec type on the same SourceBuffer. Before adding new data from a different codec, you must first call the changeType method on the SourceBuffer.

A codec is the name we give to the program that encodes and decodes a data stream.
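
Because codec support varies from device to device, a player typically checks a few candidates before creating its SourceBuffer. The sketch below is one possible way to do that, not a prescribed pattern: firstSupportedType and defaultIsSupported are hypothetical helper names, and the H.265 codec string is only an illustrative example.

```javascript
// Candidate MIME/codec strings, ordered by preference.
// The H.265 string is an illustrative example; the H.264 one is the
// string used elsewhere in this article.
const candidates = [
  'video/mp4; codecs="hvc1.1.6.L93.B0"',
  'video/mp4; codecs="avc1.42E01E, mp4a.40.2"',
];

// Default predicate: MediaSource.isTypeSupported when running in a
// browser. Outside a browser, nothing is reported as supported.
function defaultIsSupported(type) {
  return typeof MediaSource !== 'undefined' && MediaSource.isTypeSupported(type);
}

// Returns the first type the predicate accepts, or null if none is.
function firstSupportedType(types, isSupported = defaultIsSupported) {
  return types.find((type) => isSupported(type)) || null;
}
```

In a browser, the result of firstSupportedType(candidates) could then be passed to addSourceBuffer, falling back to H.264 on devices without H.265 decoding.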

Finally, once we have our SourceBuffer instance, we can start adding actual video segments to the buffer we just created, thanks to the appendBuffer(buf) method on the SourceBuffer instance.

const mimeCodec = 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"';
const bufferVideo = mediaSource.addSourceBuffer(mimeCodec);
// buf is an ArrayBuffer containing one video segment
bufferVideo.appendBuffer(buf);

Once the video element has enough data in the buffer to start playing, its readyState property will be greater than or equal to 3 (HAVE_FUTURE_DATA). We can then call the play method on the video element.

Events

Setting up these APIs is event-driven. This means that most methods should only be called once an event has signaled that the object is ready to proceed.

For instance, when we create a MediaSource instance, we can’t instantaneously use it, as it will be in the closed state:

const mediaSource = new MediaSource();
console.log(mediaSource.readyState); // closed

We need to wait for the MediaSource to be in the open state.

mediaSource.addEventListener('sourceopen', () => {
  // The mediaSource is open and ready to receive a sourceBuffer 
});
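
If the setup code may run either before or after the MediaSource opens, a small wrapper can cover both cases. This is only a sketch: whenSourceOpen is a hypothetical helper name, not part of the MSE API.

```javascript
// Hypothetical helper: runs `callback` once the MediaSource is open.
// If it is already open, the callback runs right away. Otherwise we wait
// for the 'sourceopen' event, which fires after the MediaSource has been
// attached to a media element.
function whenSourceOpen(mediaSource, callback) {
  if (mediaSource.readyState === 'open') {
    callback(mediaSource);
    return;
  }
  mediaSource.addEventListener('sourceopen', () => callback(mediaSource), { once: true });
}

// Usage sketch in a browser:
// whenSourceOpen(mediaSource, (source) => {
//   const bufferVideo = source.addSourceBuffer(mimeCodec);
// });
```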

The SourceBuffer object also produces many events.
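
One of those events, updateend, matters as soon as we append more than one segment: calling appendBuffer while the SourceBuffer is still updating throws an InvalidStateError. As a minimal sketch, assuming the buffer is idle when we start, segments can be queued and appended one by one. The helper name appendInOrder is hypothetical.

```javascript
// Hypothetical helper: appends segments one at a time, waiting for the
// SourceBuffer's 'updateend' event before appending the next one.
// Assumes sourceBuffer.updating is false when the function is called.
function appendInOrder(sourceBuffer, segments, onDone = () => {}) {
  const queue = [...segments];

  function appendNext() {
    if (queue.length === 0) {
      // Nothing left to append: stop listening and report completion.
      sourceBuffer.removeEventListener('updateend', appendNext);
      onDone();
      return;
    }
    sourceBuffer.appendBuffer(queue.shift());
  }

  sourceBuffer.addEventListener('updateend', appendNext);
  appendNext();
}
```

Each call to appendBuffer triggers an updateend event when the browser has finished processing the data, which in turn pulls the next segment from the queue.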

Appending real data

We now know how to append data, but how can we retrieve the data that we will send to the SourceBuffer we just created?

We will use the responseType property from the XMLHttpRequest API that tells the browser what kind of data we want back.

const xhr = new XMLHttpRequest();
xhr.open('GET', url);
// ask the browser to return the response as raw binary data
xhr.responseType = 'arraybuffer';
xhr.onload = () => {
  // append the video segment to the buffer
  sourceBuffer.appendBuffer(xhr.response);
};
xhr.send();

We set the responseType to arraybuffer so that the response arrives as an ArrayBuffer, the raw binary representation of the two-second video segment.

Once we request that data from the server, we can insert it inside the SourceBuffer object we created.

Conclusion

Since the first live-streamed concert in 1993, video has been a part of the internet. Since then, video has gained more and more traction, reaching incredible milestones and allowing internet companies to create entire businesses around it. For instance, in 2018, according to research from bandwidth management company Sandvine, Netflix accounted for 15% of worldwide internet traffic. There’s also the giant of video on the internet, YouTube, where more than 1 billion hours of content are watched worldwide every day.

From these success stories, it’s clear that the technical handling of video on the internet has required innovation. Many of these players came together to create new distribution technologies that make it faster and less frustrating for users to consume video on the web.

As a result, MSE has become widely supported across browsers and platforms. This article serves as an introduction to how prominent players handle video on the web. However, video is more complex than we think, and many other factors remain to be covered in further articles.

Written by

Paul Rosset
