Navigating the Future of Edge AI: Observability and Intelligent Data Selection

Below is a short summary on the challenges in building Edge AI applications.

TL;DR

Why Edge AI is generating so much data

Edge AI's data challenge stems from the immense volume of data generated by devices such as security cameras and IoT sensors, which makes traditional cloud-based processing impractical for computer vision applications. Companies are therefore increasingly moving toward edge solutions, despite the inherent complexities.

The practical challenges of Edge AI

Building and managing AI on the edge presents three main difficulties: managing the vast and varied data to ensure models reflect real-world conditions; handling complex deployments and updates on diverse hardware across large fleets of devices while maintaining data privacy; and continuously adapting models to new environments and data changes to combat data drift.

How data curation helps

Data curation offers a solution by optimizing datasets through techniques like active learning and intelligent sampling. Run on the edge, it can monitor live data streams in real time and identify and select the valuable data.

The outcome of edge observability and data selection

Implementing edge observability and data selection leads to several benefits, including reduced compute, labeling, transfer, and storage costs, improved model performance through higher quality training data, and the early detection and correction of data and model drift.

Lightly's solution

Lightly provides a data curation solution powered by active learning, offering both observability and intelligent data selection capabilities directly on edge devices to enhance AI.

The Lightly team recently had the pleasure of attending the 2023 AI Hardware & Edge AI Summit, where we took part in groundbreaking discussions and shared insights with pioneers in the field of Edge AI. Amid all the innovation and challenges, one statement from Pushpak Pujari of Verkada cut through the noise:

“AI Observability on the edge is still unsolved.”

Back to the Cloud? Not for Everyone

While Large Language Models (LLMs) are witnessing a significant shift back to the cloud, computer vision, a predominant application for running AI on the edge, treads a different path. The sheer volume of data involved is too large for cloud-based inference to be practical. Consequently, a growing number of companies are moving toward edge-based solutions and tackling the inherent challenges of building Edge AI applications.

The Practical Challenges

Building Edge AI is hard. It’s not just about deploying models; it’s about managing them in the wild. Pushpak laid out the three big headaches during his presentation:

Data
Deployment
Drift

Data: It's About Reality

For models to work on the edge, they need to be trained on datasets that mirror the real world; otherwise they risk unexpected breakdowns in production.

The data should represent a variety of conditions and scenarios, and bias should be minimized. Datasets need to cover diverse sites, angles, objects, scenes, lighting, and weather conditions to reduce the risk of failure due to unanticipated variables. Data bias goes beyond mere representation: it also includes data redundancy, which can hurt model performance. Organizations must either build internal data curation tools or adopt external ones to mitigate these issues.
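One way to reduce redundancy is to keep only samples that are far apart in an embedding space. The following is a minimal sketch, assuming each frame has already been mapped to an embedding vector by some model. The names `select_diverse` and `frames` are hypothetical, and greedy farthest-point selection is just one common diversity heuristic, not necessarily what any particular curation tool uses.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse(embeddings: List[Sequence[float]], k: int) -> List[int]:
    """Greedy farthest-point selection: indices of up to k mutually distant samples."""
    if not embeddings or k <= 0:
        return []
    selected = [0]  # start from the first sample (an arbitrary choice)
    # Distance from every sample to its nearest already-selected sample.
    min_dist = [euclidean(e, embeddings[0]) for e in embeddings]
    while len(selected) < min(k, len(embeddings)):
        # Pick the sample farthest from everything selected so far.
        idx = max(range(len(embeddings)), key=lambda i: min_dist[i])
        if min_dist[idx] == 0.0:
            break  # only exact duplicates of selected samples remain
        selected.append(idx)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], euclidean(e, embeddings[idx]))
    return selected


# Toy embeddings (made up): three near-duplicates around the origin
# and two clearly different scenes.
frames = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.01), (5.0, 5.0), (0.0, 9.0)]
picked = select_diverse(frames, k=3)
print(picked)
```

With a budget of three, the sketch keeps one of the near-duplicates plus the two distinct scenes, which is the behavior a redundancy filter is meant to have.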

Deployment: It's a Balancing Act

Deployment is critical and tricky. It needs to work on different hardware and requires regular updates. Managing a large number of devices is already challenging, and AI adds another layer of complexity to it.

Ideally, updates are delivered over-the-air (OTA). Managing extensive fleets of devices is already a complex task, although tools like Mender or Balena help. Introducing AI complicates the situation further, especially for MLOps and data pipeline management. Data must flow bidirectionally while maintaining stringent data privacy standards.

Data Drift: Keep Up or Fall Behind

Data drift is a constant challenge for companies deploying on the edge. Deployed models need to adapt to new environments and situations, requiring the integration of new data into the training sets to maintain their efficiency and accuracy.

Each deployment in a new location requires adding data from that environment to the training set so that the model stays resilient and generalizes well. This requirement introduces challenges related to data transfer volumes, labeling costs, and increased compute costs due to expanding training datasets. However, these tasks are crucial for maintaining model performance and for keeping data or model drift from degrading it in production.
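Drift between the data a model was trained on and the data it now sees can be quantified before any labels exist. The sketch below is an illustration under stated assumptions: it uses a single scalar feature per frame (mean brightness, a hypothetical choice) and the Population Stability Index (PSI) as the drift score. The bin edges, the sample values, and the 0.2 alert threshold are all assumptions to be tuned, not values from the talk.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values per bin; bins are [edges[i], edges[i+1]), last bin closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = max(sum(counts), 1)
    return [c / total for c in counts]


def psi(reference: Sequence[float], live: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between a reference window and a live window."""
    p = histogram(reference, edges)
    q = histogram(live, edges)
    # eps avoids log(0) when a bin is empty in one of the windows.
    return sum((qi - pi) * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))


# Hypothetical mean-brightness values: daytime training data vs. night-time live data.
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
reference = [0.6, 0.7, 0.65, 0.8, 0.75, 0.7]
live = [0.1, 0.2, 0.15, 0.3, 0.25, 0.2]

score = psi(reference, live, edges)
drifted = score > 0.2  # rule-of-thumb threshold, an assumption
print(drifted)
```

Because only a small histogram has to leave the device, a check like this is cheap to run on the edge and can decide when it is worth transferring and labeling new data.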

How Can Data Curation Help?

Data curation using techniques like over- or undersampling and active learning can optimize datasets, improve model performance, and reduce training set sizes. A data curation tool on the edge can monitor live data streams and select valuable data in real-time.
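To make the active learning idea concrete, here is a minimal sketch of uncertainty sampling: frames where the model's predicted class probabilities are closest to uniform are sent for labeling first. The frame names, probabilities, and the `most_uncertain` helper are hypothetical, and entropy is only one of several uncertainty measures that could be used here.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a probability vector; higher means less confident."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(predictions: Dict[str, Sequence[float]], budget: int) -> List[str]:
    """Return the `budget` sample names with the highest prediction entropy."""
    ranked = sorted(predictions, key=lambda name: entropy(predictions[name]), reverse=True)
    return ranked[:budget]


# Hypothetical softmax outputs of a 3-class model on four unlabeled frames.
predictions = {
    "frame_001": [0.98, 0.01, 0.01],  # confident
    "frame_002": [0.40, 0.35, 0.25],  # unsure
    "frame_003": [0.70, 0.20, 0.10],
    "frame_004": [0.34, 0.33, 0.33],  # almost uniform
}
to_label = most_uncertain(predictions, budget=2)
print(to_label)
```

Labeling only the selected frames is what keeps both the labeling budget and the training set small.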

Data curation plays a significant role in mitigating all the above-mentioned challenges. It enhances dataset quality and enables the identification and selection of valuable data, which improves model generalization and robustness. A data curation solution running on the edge can continuously monitor live data streams for anomalies or out-of-distribution data.
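Continuous monitoring of a live stream can be sketched with constant memory per device. The example below assumes each frame yields a scalar score, for instance the distance of its embedding to a reference centroid (a hypothetical choice), and flags frames whose score deviates strongly from the running baseline. The `StreamMonitor` class, the z-score threshold, and the warm-up length are illustrative assumptions.

```python
import math
from typing import List


class StreamMonitor:
    """Flags scores that deviate strongly from a running mean (Welford's algorithm)."""

    def __init__(self, z_threshold: float = 3.0, warmup: int = 5) -> None:
        self.z_threshold = z_threshold
        self.warmup = warmup  # number of samples to observe before flagging
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def observe(self, score: float) -> bool:
        """Return True if the score looks out-of-distribution."""
        flagged = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(score - self.mean) / std > self.z_threshold:
                flagged = True
        if not flagged:
            # Only in-distribution scores update the baseline, so outliers
            # do not contaminate it. A sustained shift keeps being flagged.
            self.count += 1
            delta = score - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (score - self.mean)
        return flagged


# Hypothetical per-frame scores; the seventh frame is an obvious outlier.
scores = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 4.0, 1.02]
monitor = StreamMonitor()
flagged_frames: List[int] = [i for i, s in enumerate(scores) if monitor.observe(s)]
print(flagged_frames)
```

Flagged frames are natural candidates to upload for inspection and labeling, while the rest never need to leave the device.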

The Outcome?

Implementing data selection and observability on the edge can lead to:

Reduced compute costs due to optimized training set sizes
Lower labeling costs due to efficient data selection
Minimized transfer and storage costs, because data is selected on the edge before it is sent
Better model performance due to high-quality training data
Early detection and rectification of data drift and model drift through edge observability and selection

Lightly’s Solution

Lightly offers a data curation solution powered by active learning, providing observability and data selection on the edge to enhance AI. If you want to optimize your AI, connect with us.

Matthias Heller

Co-founder, Lightly.ai

Thanks to Laura Schweiger, Malte Ebner, and Igor Susmelj for reviewing drafts of this blog.
