
YouTube gets real-time video segmentation: Here’s how this technology works

The new segmentation technology will allow creators to replace and modify the background, increasing videos’ production value without specialised equipment.

Google has introduced real-time, on-device mobile video segmentation to the YouTube app by integrating the technology into YouTube Stories, a new lightweight video format designed specifically for YouTube creators and currently in beta.

The new segmentation technology will allow creators to replace and modify the background, effortlessly increasing videos’ production value without specialised equipment, Google’s research blog noted.

“Video segmentation is a widely used technique that enables movie directors and video content creators to separate the foreground of a scene from the background and treat them as two different visual layers. By modifying or replacing the background, creators can convey a particular mood, transport themselves to a fun location or enhance the impact of the message. However, this operation has traditionally been performed as a time-consuming manual process or requires a studio environment with a green screen for real-time background removal. In order to enable users to create this effect live in the viewfinder, we designed a new technique that is suitable for mobile phones,” the blog read.

The new technology has been developed using machine learning to solve a semantic segmentation task with convolutional neural networks. To provide high-quality data for the machine learning pipeline, the developers annotated thousands of images capturing a wide spectrum of foreground poses and background settings. Annotations consisted of pixel-accurate locations of foreground elements such as hair, glasses, neck, skin, and lips, plus a general background label, achieving a cross-validation result of 98 percent Intersection-Over-Union (IOU) against human annotator quality.
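The IOU metric mentioned above measures how well a predicted mask overlaps a reference mask: the area where both agree on foreground, divided by the area either marks as foreground. A minimal sketch of the computation (the masks here are made-up toy data, not Google's):

```python
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-Union between two binary masks."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(intersection) / union if union else 1.0

# Hypothetical 4x4 foreground masks (1 = foreground pixel)
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
truth = np.array([[1, 1, 1, 0],
                  [1, 1, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])

print(iou(pred, truth))  # 4 overlapping pixels / 5 in the union = 0.8
```

An IOU of 1.0 would mean the two annotators (or the model and the ground truth) agree on every pixel, so 98 percent indicates near-human consistency.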

The segmentation task itself was to compute a binary mask separating foreground from background for every input frame (three channels, RGB) of the video. To achieve temporal consistency, the mask computed for the previous frame was then passed as a prior by concatenating it as a fourth channel to the current RGB input frame, the developers said in the blog.
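The four-channel input described above can be sketched in a few lines of numpy. The frame size and data below are illustrative assumptions, not details from Google's pipeline:

```python
import numpy as np

H, W = 6, 8  # hypothetical frame dimensions

# Current RGB frame: three channels of pixel data
rgb_frame = np.random.rand(H, W, 3).astype(np.float32)

# Mask predicted for the previous frame (all background
# here, as it would be for the very first frame)
prev_mask = np.zeros((H, W, 1), dtype=np.float32)

# Concatenate the prior mask as a fourth channel, giving the
# network a hint of where the foreground was a moment ago
net_input = np.concatenate([rgb_frame, prev_mask], axis=-1)

print(net_input.shape)  # (6, 8, 4)
```

Feeding the previous mask back in this way lets the network keep its predictions stable from frame to frame instead of segmenting each frame from scratch.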

The original frame (left) is split into its three colour channels and concatenated with the previous mask (middle). Google feeds this input to its neural network to predict the mask for the current frame in real time (right). (Google)

Google noted that the technology will first be tested with this initial set of effects in a limited rollout of YouTube Stories, and will be rolled out across all versions in the near future.


“As we improve and expand our segmentation technology to more labels, we plan to integrate it into Google’s broader Augmented Reality services,” the blog noted.
