OMDIA_logo-wht
|
Sponsored By:
comcast-technology-solutions-vrt-white-with-transparent-background-png-white
Global Report

How AI is transforming
TV and video

Author:

Eden Zoller, Chief Analyst

The far-reaching role of AI

Artificial Intelligence (AI) describes advanced technologies and algorithmic modelling that can enable a growing range of high value applications and use cases. Most AI systems are based on machine learning (ML) and deep learning (DL) models, with the latter being a more sophisticated approach based on neural networks. ML and DL models “learn” from the data they process, in a supervised or unsupervised manner, and become more efficient with greater exposure to data. ML and DL apply these learnings to make predictions, classify data, recognize objects or images, and comprehend speech or text. AI encompasses a range of technologies with important areas including visual AI (e.g. objecting, image and facial recognition) and voice AI (e.g. voice recognition, natural language processing and understanding). But those just scratch the surface of the utility that intelligent, learning systems can deliver back to media and advertising businesses.

“AI [is enhancing] how consumers engage with TV and video by making recommendations more contextually relevant . . . “

AI is having a profound impact on TV and video, touching every domain (as shown in Figure 1) and bringing multiple benefits. For example, in the data domain, AI is changing the stakes in analytics with ML models that can interrogate data in real time to reveal more granular, actionable insights. AI is enabling the the automation of media assets and production workflows, while having a transformative effect on video content metadata tagging that enables multiple use cases. AI is supporting content compliance, for example, with ML algorithms can be trained to identify and tag content that is subject to restrictions. At the same time, AI is helping to improve TV and video production, for example advanced dubbing techniques and automated clip generation. When it comes to delivery, AI-powered network resource planning, orchestration and fault detection can improve network efficiency and optimize performance. AI is also enhancing how consumers engage with TV and video, by making recommendations more contextually relevant through providing new types of user interfaces, (e.g. voice and gesture control) and with Augmented Reality (AR) applications that can make content more interactive and immersive. Alongside all this, AI is helping to improve monetization opportunities, for example by enabling more relevant, contextual ad placements and by driving advances in shoppable TV.

Source: Omdia
Data management and analytics

AI-powered data transformation

The use of analytics in TV and video is of course not new, but what is changing the stakes is data management and analytics solutions augmented by AI, as outlined below.

Taking the pain out of data management

TV and video data is high-volume, fragmented, and of variable quality, all of which make data processing difficult and time consuming. AI automation tools are highly efficient at managing large data sets at scale, at speed, and in real time – critical in the case of live broadcast.

Improved data quality control

ML algorithms can be trained to maintain the quality and integrity of both incoming and outgoing data sets. For example, ML algorithms can be trained to detect anomalies and noise in data sets; to identify and repair missing data.

Augmented analytics works smarter and faster

ML algorithms are dramatically improving what analytics can do, and as a consequence also improving the insights and outcomes that can be achieved – insights which can be used to support multiple use cases and improve business decisions. For example:

Smart metadata tagging is at the heart of everything
Content metadata tagging involves analyzing video content in order to recognize, classify, and label elements of that content. This process creates metadata information that makes content data searchable and actionable. Videos contain massive amounts of content data, as highlighted in Table 1, which is by no means a finite list.

Source: Omdia
AI-powered content data tagging solutions are transformative
The volume, variety and granularity of video content makes manual metadata tagging slow, inaccurate and/or only high level. In contrast, AI automated content tagging solutions can operate at speed, scale, and produce very granular results. ML algorithms can be trained to identify and automatically tag different elements of video content, using a range of AI technology including facial recognition, audio/speech recognition and sentiment/emotion recognition. Video content metadata tagging can enable and enhance multiple use cases, of which key examples are summarized in Table 2.
Source: Omdia
Production
Automated subtitles & dubbing

Powerful services increasing accessibility and monetization
Subtitles and dubbing also have a strong monetization aspect by making local content available to international markets. A viewer might not understand the original language, or have hearing difficulties (subtitles) or visual challenges (dubbing). It is not surprising then that these services are invaluable to consumers, as shown by Omdia’s Devices, Media and Usage survey results seen in Figure 2. Subtitles and dubbing also have a strong monetization aspect, makes local content available to international markets.

Subtitles/captioning

In the case of subtitles, AI-automated captioning leverages automatic speech recognition (ASR). ML algorithms are trained on a range of speech and other sound inputs (e.g. laughter, breaking glass, an explosion) to recognize patterns and meaning, and to automatically produce text outputs (transcripts, the captions/subtitles). In the case of TV, film and live broadcast, ASR models must be trained to recognize millions of words in many different languages in multiple contexts.

Language dubbing

Dubbing describes the process where original language dialogue is translated into a new language with synchronized lip/mouth movements. Traditional approaches to dubbing (hiring local actors to learn the role and voice the translated script) is an expensive, lengthy process that typically produces imperfect results and at worst, can spoil the viewing experience. AI solutions are driving significant improvements, with cutting edge dubbing solutions drawing on deep generative modeling techniques. This can involve using an actor’s original voice as an input into a generative model that creates a synthetic voice that can form words in a new language but based on the same authentic pitch and timbre as the original actor’s voice. An alternative approach is based on visual rather than verbal dubbing, where deep generative modelling creates a synthetic version of the actor’s face speaking the target language, with lip and facial movements that correspond with the exact formations/expressions used by a native speaker.

Automated Clips and Trailers

Fine-tuning the process for faster results
Clip highlights are a staple of the viewing experience. AI can be used to automatically generate clips from a wide variety of content sources: sporting events, a rolling news story, reality TV, etc. ML models essentially act like a production editor, and must be trained to recognize, classify and compile clips based on defined criteria and elements in a specific context. The various factors that can potentially comprise a desired clip can be scored and ranked, which means the automated clips should contain optimum elements. Hundreds of clips can be generated in this way and the results be presented in a dashboard to a human editor that has the final say.

Well-executed trailers are a powerful promotional tool, giving consumers a first taste of what is available and acting as a hook to pull them into the full cut of the content. Creating trailers using traditional production methods involves human editors having to scrutinize hours of film footage and manually selecting potential sequences for a trailer. This is where intelligent, learning workflows can help, via ML models that can be trained to identify the most appealing, winning elements for a trailer. Algorithms are trained on trailers for similar TV programs or films, along with associated data inputs such as user feedback and reviews. Algorithms must learn to identify characters, objects, scenery and music in the film; to recognize when sequences are designed to be exciting, frightening, romantic, funny and so on.

Augmented Reality to Enhance On-Set Storytelling

Augmented Reality (AR) is the layering of 2D or 3D graphics onto real world environments and objects, augmenting the latter with contextual information and entertainment. AR is already used in broadcast TV, with AR graphics placed as overlays on a real or virtual sets alongside human presenters/actors that interact with the graphics. For example, the graphic overlays used by a news anchor to relay complex data in easy to a digest format (e.g. charts/graphs summarizing economic data).

The AR graphics software/engine must be able to recognise and understand the real world so that the virtual content and physical environment can react with each other in exactly the right way in real time (e.g. correct scale, pose, location awareness). This is where computer vision and ML play a central role. AR software draws on ML algorithms that have been trained to recognize objects in the world around the user from the content of the camera feed. The recognized object triggers the AR software to render the relevant graphics overlay, and it must do so with precision and in real time.

Delivery
AI Network Optimization to Enhance the Quality of Experience

Consumers want high quality video services and will switch if they don’t get it. In this context it is not surprising that when it comes to the application of AI in TV and video, consumers place a high priority on AI that can improve viewing quality, such as bandwidth optimization and image upscaling, as shown in Figure 3.

AI analytics and automation can help core networks to be more agile, responsive, and performance-optimized. Two important ways AI does this is by:

Smarter Multi-CDN architecture
AI can also boost the performance of Content Delivery Networks (CDNs) by intelligently automating content routing in real time to the optimal edge server, which is more efficient than rules-based delivery and so further minimizes latency for data transfer. The same approach can be used for multi-CDNs – i.e. algorithms that determine the best CDN to use. And just as with core networks, ML models can be trained to analyse traffic flows and capacity, identify anomalies and take appropriate action.

User Engagement
Content discovery

Surfacing the right video content is harder, and more important than ever
Consumers need a shorter, straighter path to the content that resonates with them. Video libraries are expanding at a breakneck pace, magnified by the rise of super aggregation models from the likes of Amazon, Sky (Sky Q) and Roku. Unified content discovery is a real challenge in the context of super aggregation, as diverse content assets from different providers typically means disparate metadata tagging systems that need to be normalized.

Improving the understanding of search intent

We have already looked at how AI is transforming video content metadata tagging and noted that this can benefit search by making results more accurate. Another important way that AI is enhancing search is through advances in Natural Language Processing (NLP) techniques that better understand the intent behind a search query, and by improving the accuracy of complex queries.

Voice search has potential but is complex

Asking for information, recommendations and posing queries with spoken words can be easier and faster than navigating through an electronic program guide or menu system – particularly for people with visual challenges. Voice search could also be used to find specific sections of a video/TV program based on a spoken description. Taken together, this puts the voice interface in a good position to act as a unified tool for content discovery.

The central role of recommendations

Large content libraries are positive in that consumers have more choice, but can be negative if the amount of content overwhelms people to the point of option “paralysis”, where people cannot decide what to watch, get frustrated and end up watching nothing. In this context, the role of contextually relevant, personalized recommendations is more important than ever before. However, effective recommendations of this kind are hard to execute well, evidenced by the fact that recommendations often miss the mark with consumers. The level of personalization provided by recommendation engines depends on the breadth and quality of the available of data. Amazon, for example, is extremely data rich, not only because of the size of Amazon Video/Prime Video in terms of users and content, but because of the additional data insights it can bring into the mix from users interacting with other Amazon assets (Amazon marketplace, Alexa voice assistant, Amazon’s audio books, music, games etc.). The data available must also be accurately and consistently labelled, which is where video content metadata tagging solutions play a critical role. Moreover, the ML models that underpin recommendation engines need to be increasingly sophisticated, able to scale and of course, well trained. Looking at Amazon again by way of example, it uses cutting edge approaches such as graph neural networks (GNNs) that allow it to connect information about its content and customers from a variety of sources and to process data sets at scale.

Personalized Content

AI to drive segmented trailers
Standard movie trailers are typically based on a couple of compilations that highlight different aspects of the movie, for example a trailer that homes in on the lead actor or plays up a film’s high-octane credentials. These compilations are high-level and not created to align with, and appeal to different audience segments. We have already seen how AI can be used to automate the generation of clips and trailers, but AI can take this a step further by enabling the production of personalized short form content assets. The basic idea is to use AI for advanced customer segmentation modelling, with automated labelling of the segment attributes. In parallel with this, the original video content assets are automatically tagged, with algorithms trained to identify and extract content assets that best match the preferences and needs of a defined user segment(s).

Towards user-driven adaptive video content
A powerful form of personalization are formats that allow viewers to tailor the plot narrative and ending to their own tastes. In December 2018 Netflix took this approach with Black Mirror: Bandersnatch (Bandersnatch). The dystopian program allowed viewers to select between different actions for the protagonist at different points in the story, with their choice sending the narrative along different branching paths. Formats like Bandersnatch are a novel, interactive experience and are designed to encourage repeat viewing as fans explore the alternative storylines.

AI as an aid to adaptive program design and implementation

The narrative alternatives in Bandersnatch were not auto-generated by AI, but rather teams of experienced scriptwriters and producers. But where AI can help with these types of complex, non-linear, interactive formats is by helping to manage content assets and automate workflows. For example, through metadata tagging that is used to locate quickly and precisely where to trigger a decision tree function in the story, and to automatically select and load the corresponding video content. Algorithms can be trained to track and “remember” the viewers choices and the personalized story lines, which in turn provides a rich seam of data for future recommendations.

Immersion

AR TV companion applications
We have looked at how augmented reality (AR) overlays can be used to enhance virtual sets and broadcast programming. Alongside these, AR can be used to enhance viewer interactivity with the means of doing so controlled by consumers via AR companion applications, typically on smartphones. AR applications are becoming popular with consumers and the volumes of AR app downloads is steadily growing, as shown in Figure 4.

Non-synchronized AR TV applications

Applications of this kind are usually positioned as promotional tools for a film or TV program where the app is not synched with the show as viewers play it on the main screen. This is by far the most common approach and is also used for other use cases besides TV, including retail.

Synchronized AR TV applications

Interactive AR graphics can be layered over the TV screen or displayed alongside it, with the AR content synched with the corresponding video/TV content as it is played. AR TV applications of this kind must be able to recognize the TV screen within the camera feed, recognise content and/or time stamps in the video/TV program being played, and to synchronise AR overlays with specified moments in the content. This all relies on computer vision algorithms that have been trained to detect video frames and objects within frames, which in turn relies on robust video content metadata tagging. There is a long list of potential use cases for synchronized AR TV applications that the most prevalent being to provide additional information relating to a program (e.g. actors in a film, footballers in a match, reality TV stars). But it can also include synchronized entertainment and fun features – animations, music, AR stickers, mini-quizzes, polls.

Monetization
AI helps sharpen business decisions

AI cannot conjure the perfect business model, but advanced analytics can help by providing insights that can make business decisions better informed. For example, ML algorithms can be used to fine-tune existing segmentation models or surface attributes that reveal completely new clusters, with models dynamically updated based on changing market trends and customer behavior. This enables multiple business scenarios, including segmented, targeted video bundles in terms of the content, streaming quality (SD, HD, 4K), additional services (e.g. music, games) and pricing (e.g. ad supported, subscriptions, tiered pricing).

Propensity modeling is improved thanks to ML algorithms that can more accurately predict the probability of customer intent and work to a more granular level. The ability to understand customer intent confers huge competitive advantages as it can provide insights into the type of content and features that will appeal most to customers and also indicate their propensity to churn. Accurate insights into churn trajectories are particularly useful for streaming video platforms, since churn rates are high and unpredictable.

Informing content investment
A recent development is the use of AI for film and TV investment decisions, but in our view solutions of this kind should be viewed as assistive tools only. ML predictive modelling can be applied to data related to a proposed film or TV program to surface insights that indicate how successful it might be, and therefore if it will provide a decent return on content investments. Relevant data points include script analysis, popularity of the main actors, audience viewing stats, the success of similar films or TV programs. AI solutions of this kind can interrogate and model multiple data inputs at speeds that human teams cannot match. This could be useful if executives need to make investment decisions in a hurry, for example at film festivals. It would also allow executives to change different parameters of a film/TV program to see how this affects performance.

However, the ability of AI solutions to accurately predict a film or TV program’s level of success has limitations. Models rely on historical data, which means predictions are based on what has pleased audiences in the past rather than what could ignite them going forward. Solutions can also struggle to capture culture differences and can predict outcomes that are obvious, for example that having a hot A list actor in a lead role will boost engagement.

Advertising

AI is enhancing advertising on multiple fronts
Linear TV is an established and trusted advertising channel, while online video platforms are seeing rapid growth in advertising revenues, as shown in Figure 5. Ad-funded streaming services are going to be increasingly important as video streaming matures and content owners look to expand viewership outside of subscription models. But TV and video advertising is becoming more complex and challenging – a situation that will only magnify going forward. Issues include data deprecation as well as audience and device fragmentation, all of which make attribution and measurement harder. The good news is that AI can help, by making TV advertising more effective and engaging, with key means including immersion, advanced analytics, automation, and metadata tagging.

AI supports interactive advertising experiences

Millennials and younger generations in particular are drawn to interactive, immersive digital content, and will gravitate to digital advertising that has similar qualities. TV advertising may not be renowned for interactivity, but this can be helped by AR. AR graphics can be used to provide users with additional information about products featured in a video in ways that are highly interactive, be it in the form of a companion AR TV app or an online AR shopping experience associated with a program. Bravo TV, owned by NBCUniversal, did exactly this with its Virtual Bravo Bazaar AR shopping mall (online and mobile app) that gave fans an immersive way to virtually explore and buy products featured in popular Bravo programs like Below Deck, Southern Charm and Family Karma (among others).

Creative support

AI can assist advertising creative outputs in a similar way that it can be used to support the production of TV and film trailers. Algorithms can be trained on existing advertising campaigns, copy, and other inputs in order to identify winning combinations and recommend iterations that can support a new ad campaign. AI can also help make advertising more accessible with auto subtitling and advanced dubbing, just as it can for film and TV content formats.

Advanced analytics
Automation
Metadata tagging
e-commerce

AI is shaping next-generation shoppable TV
Shoppable TV/video describes experiences that enable viewers to purchase products featured in TV programs and videos. The basic premise is not new, as shopping channels like QVC have been around for years, while certain broadcasters like Comcast-owned NBCUniversal and Sky are using QR codes to drive shoppable advertising. But the nature of TV shopping is changing, with deeper TV e-commerce integrations and a more seamless experience driven by a convergence of forces including increased uptake smart TVs (see Figure 6) and AI solutions. An important development is the use of automated metadata tagging to identify and label products that appear in a video. Viewers can be notified when a featured product is available for purchase, enabling them to find out more about the product and, if they like it, purchase the item. This is the approach behind a shoppable TV experience UK broadcaster ITV introduced in 2021 for its popular reality TV show Love Island.

AI is Improving the Efficiency of Payments – and the User Experience
The ultimate goal of a shopping journey is one that ends with a purchase that leads to more purchases. This takes a delightful end-to-end experience, and a check-out characterized by payments that are fast, seamless and secure. AI can enhance these attributes and by doing so enable a better payments experience, with the main impacts via automation and enhanced security. For example:
The role of voice shopping: still a way to go

A voice interface can be used to make payments directly or via an AI voice assistant controlled by the user. In Omdia’s 2021 Consumer AI survey the majority of respondents used their AI assistants for shopping activity to some degree, as shown in Figure 7. However, there are hurdles to clear before voice shopping takes becomes mainstream and a basic issue is that many people are not comfortable using their voice to make payments, as cited by 48% of respondents in the survey. Lack of trust is another deterrent: 32% of respondents didn’t trust the security payments made via a voice assistant, 29% worry that the assistant will not understand them and buy the wrong things.

Recommendations
Content providers
Video and TV service providers
Advertisers