A Guide for Active Learning in Computer Vision

Learn how active learning can be used to build a data flywheel in which only the data that actually matters gets labeled and used for training.

Before jumping right into the steps to select data using active learning, we will have a look at what active learning actually is.

What is Active Learning?

Active learning is a research field in machine learning (ML) that aims to reduce the cost and time of building new machine learning solutions by intelligently choosing which data to process next in your pipeline. When developing new AI solutions and working with unstructured data such as images, audio or text, we often need the data to be annotated by humans before we can use it to train our models. This data annotation process is very time-consuming and expensive. It’s typically one of the biggest bottlenecks for modern ML teams.

With active learning you can create a feedback loop where you iterate between annotation, training and selection. Using good selection algorithms you can reduce the amount of data that is required to train a model to reach a desired accuracy.
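
The loop can be sketched in a few lines. This is a minimal illustration on synthetic 2D data, not a production implementation: the nearest-centroid "model", the margin-based selection rule and all function names are assumptions chosen to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pool: two Gaussian blobs; labels stay hidden until "annotated".
pool_x = np.vstack([rng.normal(-1.0, 1.0, (200, 2)), rng.normal(1.0, 1.0, (200, 2))])
pool_y = np.array([0] * 200 + [1] * 200)

def train(x, y):
    """'Training' here is just computing one centroid per class."""
    return np.stack([x[y == c].mean(axis=0) for c in (0, 1)])

def uncertainty(centroids, x):
    """A small margin between the two centroid distances means high uncertainty."""
    d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return -np.abs(d[:, 0] - d[:, 1])

def active_learning_loop(n_rounds=5, batch_size=10):
    # Seed set: a few labeled samples from each class.
    labeled = [0, 1, 2, 200, 201, 202]
    for _ in range(n_rounds):
        centroids = train(pool_x[labeled], pool_y[labeled])       # training
        unlabeled = np.setdiff1d(np.arange(len(pool_x)), labeled)
        scores = uncertainty(centroids, pool_x[unlabeled])        # selection
        picked = unlabeled[np.argsort(scores)[-batch_size:]]
        labeled.extend(picked.tolist())                           # annotation
    return labeled

labeled = active_learning_loop()
print(len(labeled))  # 6 seed samples + 5 rounds * 10 selections
```

Swapping `uncertainty` for a different scoring function changes the selection strategy without touching the rest of the loop.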

In our journey at Lightly, we talked to over 200 ML teams in the computer vision field. Most don’t use sophisticated active learning strategies yet and rely on random selection. Selecting data randomly has the advantage that it does not change the distribution of the data. However, this only helps if the input data already matches the distribution you actually care about.

Different Active Learning Approaches

When doing active learning, we typically use the predictions of models. Whenever your model makes a prediction, you also get the associated probability of the prediction. Since models are inherently bad at knowing their own limits, researchers use additional tricks to overcome this limitation. We could, for example, consider not just a single model but a group of models (an ensemble). This gives us more information about the actual model uncertainty. If the models all agree on a prediction, the uncertainty is low. If they disagree, the uncertainty is high. But training and running multiple models is very expensive. Papers like “Training Data Subset Search with Ensemble Active Learning, 2020” use between 4 and 8 different models for ensemble methods.
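
One common way to turn ensemble agreement into a number is vote entropy: each model casts a hard vote, and the entropy of the vote distribution measures disagreement. The sketch below assumes every ensemble member outputs class probabilities for the same samples; the function name is illustrative, and this is one possible disagreement measure, not necessarily the one used in the cited paper.

```python
import numpy as np

def vote_entropy(ensemble_probs):
    """ensemble_probs: array of shape (n_models, n_samples, n_classes).

    Returns one disagreement score per sample: the entropy of the
    distribution of hard votes cast by the ensemble members.
    """
    n_models, n_samples, n_classes = ensemble_probs.shape
    votes = ensemble_probs.argmax(axis=2)                  # (n_models, n_samples)
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)], axis=1)
    vote_dist = counts / n_models                          # (n_samples, n_classes)
    # Treat 0 * log(0) as 0 by replacing zero entries with 1 before the log.
    safe = np.where(vote_dist > 0, vote_dist, 1.0)
    return -(vote_dist * np.log(safe)).sum(axis=1)

# Three models, two samples, two classes.
probs = np.array([
    [[0.9, 0.1], [0.6, 0.4]],
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.7, 0.3], [0.4, 0.6]],
])
scores = vote_entropy(probs)
```

All three models agree on the first sample, so its score is zero; they split 1:2 on the second sample, which therefore gets the higher score.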

Plot from the “Training Data Subset Search with Ensemble Active Learning, 2020” paper showing how their different methods compare against the random baseline on ImageNet.

We can be more efficient by using Monte Carlo dropout, where we add dropout between the last layers of our model. By keeping dropout active at inference time, a single model can produce multiple different predictions for the same input, similar to a model ensemble. However, this has the downside that we need to change the model architecture and add dropout layers.
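
A framework-free sketch of the idea follows. It assumes a tiny two-layer network with random weights standing in for a trained model; in practice you would keep the dropout layers of your deep learning framework active at inference time. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weights stand in for a trained two-layer classifier (assumption).
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 3))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward_with_dropout(x, p_drop=0.5):
    """One stochastic forward pass: dropout stays active at inference."""
    h = np.maximum(x @ W1, 0.0)                  # ReLU hidden layer
    mask = rng.random(h.shape) >= p_drop         # keep units with prob. 1 - p_drop
    h = h * mask / (1.0 - p_drop)                # inverted dropout scaling
    return softmax(h @ W2)

def mc_dropout_predict(x, n_passes=20):
    """Average several stochastic passes and compute an entropy per sample."""
    probs = np.stack([forward_with_dropout(x) for _ in range(n_passes)])
    mean_probs = probs.mean(axis=0)              # (n_samples, n_classes)
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)
    return mean_probs, entropy

x = rng.normal(size=(5, 8))
mean_probs, entropy = mc_dropout_predict(x)
```

The entropy of the averaged prediction is only one possible uncertainty score; the variance across passes is another common choice.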

Using Embeddings in Active Learning

Illustration from “Scalable Active Learning for Object Detection, 2020”

Recently, papers also started using embeddings. With embeddings, we can get a feeling for how similar the different samples are. In computer vision, we could for example use embeddings to check for similar images or even similar objects. We can then use a distance metric such as Euclidean distance or cosine similarity in the embedding space and combine this with the uncertainty of the prediction.
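
As a rough sketch of how distance and uncertainty could be combined, the greedy procedure below repeatedly picks the sample with the highest product of uncertainty and cosine distance to the closest already-selected sample. The scoring rule and the function name are assumptions made for illustration; other combinations are equally plausible.

```python
import numpy as np

def select_diverse_uncertain(embeddings, uncertainty, n_select):
    """Greedy selection that trades off uncertainty against diversity.

    Each step picks the sample with the highest
    uncertainty * (cosine distance to the closest already-selected sample).
    """
    # L2-normalize so that a dot product equals cosine similarity.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    selected = [int(np.argmax(uncertainty))]       # start with the most uncertain
    min_dist = 1.0 - emb @ emb[selected[0]]        # cosine distance to selection
    while len(selected) < n_select:
        score = uncertainty * min_dist             # selected samples score (close to) 0
        nxt = int(np.argmax(score))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, 1.0 - emb @ emb[nxt])
    return selected

embeddings = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [-1.0, 0.0]])
uncertainty = np.array([0.9, 0.8, 0.5, 0.4])
picked = select_diverse_uncertain(embeddings, uncertainty, n_select=2)
```

In this toy example the second sample is skipped even though it has the second-highest uncertainty, because it is a near-duplicate of the first pick.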

However, using embeddings and predictions from the very same model has the drawback that both rely on the same features learned by the model. Typically, the embeddings are the output of the model one or a few layers before the predictions. To overcome this limitation, we started using embeddings from models other than what we call the “task” model. The task model is the actual model you would like to improve using active learning.

Our own benchmarks and experience working with dozens of companies across autonomous driving, satellite imagery, robotics and video analytics suggest that models trained with self-supervised learning produce the most robust embeddings. Recent models such as CLIP and SEER both use self-supervised learning. We already summarized in another blog post that these self-supervised learning models are more robust and fair.

What can I expect when using Active Learning?

First, be aware that active learning is a tool and, like most other tools you use, you will have to fine-tune some parameters to get maximum value out of it. After extensive research and trying to replicate many papers from recent active learning research, we observed that these basic rules seem to hold for what we consider to be “good” training data:

Choose diverse data — having diverse data (diverse images, diverse objects) is the single most important factor
Balance your dataset — Make sure the data is balanced across your modalities (weather, gender, time of the day)
Don’t worry too much about model architecture — Based on our own experiments, it looks like good data for a large ViT model is also helpful for a small ResNet

The first two points suggest that we should aim at getting diverse data from all modalities and in equal amounts. The third point is nice to know. It means that we can select training data with a model today, and we can still reuse the same data in a year when we train a completely new model. Please note that these are just observations. If implemented correctly, active learning can improve model accuracy significantly.
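
To make "in equal amounts" concrete, here is a small sketch of balanced selection. It assumes each sample carries a single metadata tag (for example the weather) and a precomputed score; the round-robin strategy and the function name are illustrative assumptions.

```python
from collections import defaultdict

def balanced_select(groups, scores, n_select):
    """Pick n_select samples, cycling over metadata groups so that each
    group contributes (almost) equally; within a group, highest score first."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    for members in by_group.values():
        members.sort(key=lambda i: scores[i], reverse=True)

    selected = []
    while len(selected) < n_select and any(by_group.values()):
        for members in by_group.values():
            if members and len(selected) < n_select:
                selected.append(members.pop(0))
    return selected

groups = ["sunny", "sunny", "sunny", "sunny", "rain", "rain", "night"]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
picked = balanced_select(groups, scores, n_select=5)
```

Although the four "sunny" samples have the highest scores, only two of them are picked; the rest of the budget goes to the "rain" and "night" groups.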

We evaluated the performance of combining active learning, diversity and balanced selection on the task of detecting problems in salmon filets. The goal was to improve model accuracy for the “Hematoma” class, as this is the most crucial class.

Example image of a Salmon filet with object detections for the different classes. Read our full case study here.

The company started with 20,000 images and had the budget to select 1,000 new images, once using their existing method, random sampling, and once using a more sophisticated approach. Using a combination of diversity, prediction uncertainty and class balancing as part of their active learning strategy, the company was able to improve the F1 score for that crucial class by almost 100% compared to random selection. The overall F1 score (“General”) increased by 10% compared to randomly selecting images.

Active Learning case study for object detection Lythium (Lightly 2022)

Use Active Learning in your next Computer Vision Project

You have two options here. Either you implement your own framework based on papers and GitHub repositories, or you use existing active learning solutions like LightlyOne.

Implement Active Learning Algorithms from Scratch

Let’s start with implementing active learning yourself. In its simplest form, we can just focus on the predictions of our model. We could create predictions for a new unlabeled dataset, compute a score for each sample and sort all samples by that score.
The advantage of this approach is that it requires little work. We can use a simple entropy scorer (we look at the prediction entropy). To compute the entropy of a discrete random variable, we can use the following formula:
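
For a discrete random variable $X$ with possible outcomes $x_1, \dots, x_n$ and probabilities $p(x_i)$, the standard (Shannon) definition reads as follows, with the convention $0 \log 0 = 0$. In the classification setting, $p(x_i)$ is the predicted probability of class $i$.

```latex
H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i)
```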

Entropy calculation formula. (source: Wikipedia)

In Python code, this would translate to the following snippet:
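
A minimal sketch of such a snippet, assuming the model outputs a matrix of class probabilities with one row per sample; the function name `prediction_entropy` and the example values are illustrative.

```python
import numpy as np

def prediction_entropy(probs, eps=1e-12):
    """Entropy of each row of class probabilities (natural log).

    probs: array of shape (n_samples, n_classes), rows sum to 1.
    """
    probs = np.asarray(probs, dtype=float)
    # eps avoids log(0) for classes with zero predicted probability.
    return -np.sum(probs * np.log(probs + eps), axis=1)

probs = np.array([
    [0.98, 0.01, 0.01],   # confident prediction -> low entropy
    [0.34, 0.33, 0.33],   # unsure prediction    -> high entropy
    [0.70, 0.20, 0.10],
])
scores = prediction_entropy(probs)

# Rank samples for labeling: most uncertain first.
ranking = np.argsort(-scores)
```

Here `ranking` puts the near-uniform second row first and the confident first row last, which is the order in which the samples would be sent to labeling.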

Example code to compute the entropy of a prediction using NumPy.

Now we have a single score. How about doing the same for object detection? We can compute the entropy for each prediction and then aggregate the metrics per image, since we are interested in ranking the images for labeling.
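
One way this aggregation could look is sketched below. It assumes each image comes with a list of per-box class probability vectors; taking the maximum entropy over all boxes is a design choice (mean or sum work as well), and the file names and function names are hypothetical.

```python
import numpy as np

def box_entropy(class_probs, eps=1e-12):
    """Entropy of the class distribution of a single detected box."""
    p = np.asarray(class_probs, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

def image_score(detections, aggregate=max):
    """Aggregate per-box entropies into one score per image.

    detections: list of class-probability vectors, one per predicted box.
    Images without any detection get a score of 0.0 (a design choice).
    """
    if not detections:
        return 0.0
    return aggregate([box_entropy(p) for p in detections])

predictions = {
    "img_a.jpg": [[0.9, 0.1], [0.95, 0.05]],
    "img_b.jpg": [[0.5, 0.5]],
    "img_c.jpg": [],
}
# Rank images for labeling: most uncertain first.
ranked = sorted(predictions, key=lambda name: image_score(predictions[name]), reverse=True)
```

Using `max` favors images with at least one very uncertain object, while passing `aggregate=np.mean` would favor images that are uncertain on average.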

We can then also create a scorer that uses embeddings to consider image diversity, and another one that considers our desired distribution of the metadata. As we go further, we discover a few things:

we have to write many scorers for every new input or task we want to solve
we need a way to easily switch between scorers to evaluate and keep track of which ones work best
we need a scalable solution as we might have much more unlabeled data at hand
we need strategies in order to combine different scorers
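
As an illustration of the last point, one simple strategy is a weighted sum of normalized scores. The sketch below assumes each scorer has already produced one value per sample; the scorer names, the weights and the min-max normalization are assumptions, not a recommendation.

```python
import numpy as np

def normalize(scores):
    """Min-max scale scores to [0, 1] so that different scorers are comparable."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    return np.zeros_like(scores) if span == 0 else (scores - scores.min()) / span

def combine_scorers(score_dict, weights):
    """Weighted average of normalized scores; both dicts are keyed by scorer name."""
    total = sum(weights[name] * normalize(s) for name, s in score_dict.items())
    return total / sum(weights[name] for name in score_dict)

scores = {
    "uncertainty": [0.2, 1.4, 0.8],      # e.g. prediction entropy
    "diversity":   [10.0, 2.0, 6.0],     # e.g. distance to already labeled data
}
combined = combine_scorers(scores, weights={"uncertainty": 2.0, "diversity": 1.0})
best = int(np.argmax(combined))
```

Keeping the scorers in a dictionary also makes it easy to switch individual scorers on or off and to compare the resulting selections.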

Use Active Learning Solutions

Doing active learning yourself from scratch becomes its own engineering project. We also want to make sure that the algorithms work, and if new papers with even more promising methods appear, we want to include and benchmark them.

Finally, we also want metrics in order to evaluate the selected data before spending a ton of money on the data labeling process.

Instead of building your own active learning solution, you could use a platform like LightlyOne. The platform can help you process large amounts of unlabeled data with sophisticated data selection algorithms and without sharing your data. LightlyOne is used by leading machine learning teams that want to build an automated data flywheel that allows them to scale operations without having to build their own tools or grow their operations.

And here’s the best thing. You can even try it out for free!

Igor Susmelj,
Co-Founder Lightly
