Friday, September 6, 2024
How Generative AI on the Edge Transforms the Videoconferencing Experience

Until recently, meetings and conferences primarily involved people meeting in person. Over the past few years, however, virtual meetings have become at least as common as face-to-face interactions, if not more so. Although virtual meetings tend to be more practical, they still can’t fully replicate the social aspect of in-person interaction, and technology is trying to bridge that gap.

At the same time, the rapid growth of use cases based on generative artificial intelligence (GenAI) hasn’t skipped the domain of virtual meetings. New GenAI-powered features are becoming more common and can make virtual meetings more engaging and productive, bringing them closer to real-life experiences.

But for these advances to make an impact at scale, the features must be available in real time, with minimal latency, and at an affordable cost. This means that at least some of the new functionality must run at the connected endpoints. Some solution providers are already integrating AI into video conferencing platforms and personal computers to address issues like virtual enhancements, real-time optimization, and automated meeting management.

The Impact of Generative AI on Video Conferencing

GenAI has the power to transform the video, audio, and text experience of a virtual meeting. Imagine a hybrid meeting with both boardroom and remote participants. Instead of sending a static wide shot of the boardroom participants to the remote team, intelligent video processing can dynamically zoom in on speakers, mimicking the nuanced experience of in-person interactions. With neural radiance field (NeRF) or similar technologies, an engaging view of the remote participants’ side can be generated, with the viewing angle changing dynamically at each endpoint for a more immersive experience. AI can also produce a harmonious and consistent gallery view that displays all participants in a uniform size, posture, and style. If there’s a whiteboard in the boardroom, AI can detect it automatically, then recognize the written notes and convert them into an editable format. Each participant can then get a personal copy for note taking and on-the-fly comments.
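To make the speaker-framing idea concrete, here is a minimal sketch of how an endpoint might turn a detected speaker position into a crop window. It assumes some upstream detector already supplies a bounding box for the active speaker; the `Box` type, the `speaker_crop` function, and the `margin` parameter are hypothetical names used for illustration, not part of any product mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (hypothetical helper type)."""
    x: int
    y: int
    w: int
    h: int


def speaker_crop(frame_w: int, frame_h: int, speaker: Box, margin: float = 0.5) -> Box:
    """Return a crop around `speaker` that keeps the frame's aspect ratio.

    `margin` is the fraction of the speaker box added as padding on each side.
    """
    aspect = frame_w / frame_h
    # Pad the detected box so the speaker is not framed edge to edge.
    crop_w = speaker.w * (1 + 2 * margin)
    crop_h = speaker.h * (1 + 2 * margin)
    # Grow the shorter dimension so the crop matches the frame's aspect ratio.
    if crop_w / crop_h < aspect:
        crop_w = crop_h * aspect
    else:
        crop_h = crop_w / aspect
    # Never crop larger than the frame itself.
    scale = min(1.0, frame_w / crop_w, frame_h / crop_h)
    crop_w, crop_h = crop_w * scale, crop_h * scale
    # Centre on the speaker, then shift the crop back inside the frame.
    cx = speaker.x + speaker.w / 2
    cy = speaker.y + speaker.h / 2
    x = min(max(cx - crop_w / 2, 0), frame_w - crop_w)
    y = min(max(cy - crop_h / 2, 0), frame_h - crop_h)
    # Truncate to whole pixels, which keeps the crop inside the frame.
    return Box(int(x), int(y), int(crop_w), int(crop_h))
```

A real system would also smooth the crop over time so the view does not jump between frames; that step is omitted from this sketch.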

On the audio and text front, GenAI can act as a personal assistant that each participant uses to maximize their productivity. This assistant can convert audio to text, summarize the meeting, record action items as they are assigned to their respective owners, and even suggest relevant responses on the fly. For multilingual teams, such an assistant can reduce language barriers by delivering instantaneous audio translation.

However, for all its potential, GenAI as it exists today is limited by the technology that enables it. Existing cloud-based services alone are not enough to make AI-based videoconferencing features like the ones described above useful, effective, and available by default.

The Power and Potential of GenAI at the Edge

To enable the applications outlined above, video conferencing systems should be able to perform GenAI processing at the endpoints themselves (either on the personal computer or on the conferencing gateway device) without needing to reach back to the cloud for processing.

One of the key elements of conferencing systems is their ability to scale. When it comes to scalability, it is critical to identify the cases in which centralized processing is relevant and the ones that require edge processing.

There are three main cases in which processing at a central point is advantageous:

  1. Information sharing – when the same information needs to be shared by all participants. For example, a shared whiteboard with no personal comments per participant.

  2. Resource sharing – when the function involves processing that is common to all endpoints, such as searching a shared database. In such cases, the shared processing can be performed once and reused for many or all endpoints.

  3. Time sharing – when the functionality requires only light processing that a central machine can handle easily at a fraction of its capacity, like an alert when a participant enters the room or unmutes their microphone. The central machine can then serve all endpoints, each in a different time slot, without noticeable impact.
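The three cases above can be read as a simple placement rule: a function is a candidate for central processing if it fits at least one of them, and otherwise belongs at the edge. The sketch below is one hedged way to express that rule. The `Feature` type, its field names, and the `placement` function are hypothetical and only mirror the criteria in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A conferencing function described by the three criteria in the list."""
    name: str
    same_output_for_all: bool    # information sharing
    shared_computation: bool     # resource sharing
    light_and_occasional: bool   # time sharing


def placement(feature: Feature) -> str:
    """Return "central" if any of the three cases applies, else "edge"."""
    if (feature.same_output_for_all
            or feature.shared_computation
            or feature.light_and_occasional):
        return "central"
    return "edge"


# Examples taken from the list, plus one per-participant function.
whiteboard = Feature("shared whiteboard", True, False, False)
db_search = Feature("shared database search", False, True, False)
join_alert = Feature("participant joined alert", False, False, True)
personal_notes = Feature("personal whiteboard notes", False, False, False)
```

Under this rule, `placement(whiteboard)`, `placement(db_search)`, and `placement(join_alert)` all return `"central"`, while `placement(personal_notes)` returns `"edge"` because its output differs per participant and none of the three cases applies.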

Most of the capabilities described earlier do not fall into any of these three cases. Therefore, to build scalable video conferencing systems that make these functions available to all participants, the AI capabilities must be distributed downstream, with each node equipped with adequate AI compute capacity.

This will result in several benefits, including:

  1. Cost – The expense of monthly subscriptions to cloud-based generative AI tools can be overwhelming. With multiple tools catering to various user needs such as search, chat, and image or video creation, costs can quickly pile up to hundreds of dollars per user per month, straining budgets. By migrating generative AI to the users’ personal computers or to the conferencing device, users become owners of the tools without the need for monthly subscriptions or long-term commitments, which is a more financially viable solution.
  2. Connectivity – Virtual conferences are often affected by a shortage of bandwidth, especially when participants have limited internet connectivity while traveling or in remote locations. Edge-based generative AI can crop out irrelevant information locally so that only relevant data is transmitted, which helps keep meetings uninterrupted and productive.
  3. Latency – In virtual conferences, instant results are central to smooth interactions, whether it’s real-time translation, video adjustment, or content creation. Leveraging generative AI on edge devices reduces latency, ensuring a fluent discussion and seamless user experience without delays.
  4. Sustainability – The environmental impact of cloud-based AI processing should not be underestimated: it consumes significant energy and generates significant emissions. Researchers at Carnegie Mellon University and Hugging Face measured the carbon footprint of different machine learning tasks, and their research shows that tasks that generate new content, such as text generation, image captioning, summarization, and image generation, are the most energy intensive. The findings also show that the most energy-intensive models, such as Stability AI’s Stable Diffusion XL, produce nearly 1,600 grams of CO2 per 1,000 generated images, roughly the environmental impact of driving four miles in a gas-powered car. Edge devices offer a more sustainable option for generative AI: they consume less power, need less cooling, and have a smaller carbon footprint, presenting a more eco-friendly approach to AI conferencing.
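Two of the benefits above lend themselves to back-of-the-envelope estimates. The sketch below is a rough illustration only: the function names are hypothetical, any prices passed in are placeholders chosen by the reader, and raw pixel count is only a proxy for bandwidth, since real video codecs compress content unevenly.

```python
import math


def break_even_months(device_cost: float, monthly_subscription: float) -> int:
    """Months of subscription fees needed to match a one-time device cost.

    Ignores electricity, maintenance, and financing, so treat it as a lower bound.
    """
    if monthly_subscription <= 0:
        raise ValueError("monthly_subscription must be positive")
    return math.ceil(device_cost / monthly_subscription)


def pixel_fraction(crop_w: int, crop_h: int, frame_w: int, frame_h: int) -> float:
    """Fraction of the original frame's pixels that remain after cropping."""
    if frame_w <= 0 or frame_h <= 0:
        raise ValueError("frame dimensions must be positive")
    return (crop_w * crop_h) / (frame_w * frame_h)
```

For instance, with placeholder figures of a 500-dollar device and a 200-dollar monthly subscription, `break_even_months(500, 200)` gives 3 months, and cropping a 1920x1080 frame to 960x540 leaves `pixel_fraction(960, 540, 1920, 1080)` equal to 0.25 of the original pixels.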

Incorporating AI Processing Capabilities into Devices

Creating a videoconferencing system that processes AI directly on edge devices requires closed-loop systems capable of managing tasks typically handled in the cloud. By processing AI on endpoint devices, such as laptops, conference room devices, and cameras, meetings can run smoothly and cost-effectively, while also ensuring the security of AI-generated content like auto-summaries or dynamic presentations.

Hailo provides purpose-designed AI processors that handle AI models like those described above, creating energy-efficient and cost-effective solutions for a variety of edge devices. The company is currently working with conferencing manufacturers to integrate AI processors into their hardware.

In the near future, AV integrators and designers will have access to videoconferencing systems that are ready for the GenAI era, offering the advantages of GenAI in tandem with the performance, security, and reliability advantages of edge processing. This design promises to elevate collaboration to new heights, delivering the optimal combination of capabilities for enhanced teamwork.

Avi Baum is Chief Technology Officer and Co-Founder of Hailo, an AI-focused, Israel-based chipmaker that has developed a specialized AI processor for enabling data center-class performance on edge devices. Baum has over 17 years of experience in system engineering, signal processing, algorithms, and telecommunications and has focused on wireless communication technologies for the past 10 years.
