LightlyTrain x DINOv2: Smarter Self-Supervised Pretraining, Faster

We’ve integrated DINOv2 into LightlyTrain — you can now pretrain ViT models directly on your own data using DINOv2, and with our new DistillationV2 your models train faster and reach higher accuracy.

Get Started with Lightly

Talk to Lightly’s computer vision team about your use case.
Book a Demo


It's been just over a month since we launched LightlyTrain, and the response has been very encouraging. We've seen over 10,000 downloads in the first four weeks and received interest from various companies looking to train their own foundation models using our self-supervised learning framework. LightlyTrain is designed for pretraining models on your own data, without the need for annotations. Based on initial feedback and our development roadmap, we've been making steady improvements. If you haven't explored the framework yet, check out the LightlyTrain documentation and demo, or book a meeting with our team to learn more.

Pro tip

For more information, check out our Lightly Documentation!

DistillationV2: Faster Convergence and Improved Performance

We've developed DistillationV2, an updated version of our distillation module that uses DINOv2 as the teacher model for distillation.

Key improvements include:

  • Adjusted Loss: The loss has been simplified to directly enforce similarity between teacher and student representations, removing the need for a pseudo-classification task.
  • Supervisory Signal Granularity: By adjusting the training loss, supervision is applied at the feature level rather than the image level, without incurring additional computational cost.
  • Faster Convergence: The finer-grained supervisory signal leads to significantly faster training. On RT-DETR, DistillationV2 matches the performance of DistillationV1 in just one-third of the training time.
  • Performance Gains: DistillationV2 achieves up to +2 mAP over DistillationV1 when trained for the same duration, and final results at convergence are also better.
  • Fewer Hyperparameters: The new loss term introduces no method-specific hyperparameters.
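To make the adjusted loss concrete, here is a minimal numpy sketch — not LightlyTrain's implementation — of a feature-level distillation objective that directly enforces similarity between teacher and student representations via cosine similarity, one term per patch rather than one per image (the function and variable names are ours):

```python
import numpy as np

def feature_distillation_loss(student_feats, teacher_feats, eps=1e-8):
    """Mean (1 - cosine similarity) between student and teacher features.

    Both inputs have shape (num_patches, dim): one feature vector per
    image patch, so supervision is applied at the feature level rather
    than once per image. No pseudo-classification head, no extra
    hyperparameters beyond the features themselves.
    """
    s = student_feats / (np.linalg.norm(student_feats, axis=1, keepdims=True) + eps)
    t = teacher_feats / (np.linalg.norm(teacher_feats, axis=1, keepdims=True) + eps)
    cos_sim = np.sum(s * t, axis=1)       # per-patch similarity in [-1, 1]
    return float(np.mean(1.0 - cos_sim))  # 0 when representations align

# Identical features give (near-)zero loss; unrelated ones do not.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 32))
print(feature_distillation_loss(feats, feats))
```

Because every patch contributes its own loss term, the gradient signal is denser than with a single image-level target, which is one intuition for the faster convergence reported above.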

  • Robustness: The new method performs reliably across a wide range of batch sizes (128–2048), making it accessible even to customers with limited compute resources.

Figure 1: Object detection performance of RT-DETR models pretrained with LightlyTrain using distillation v1 and v2. Models trained with distillation v2 converge much faster and achieve higher accuracy than v1.

Figure 2: Object detection performance (mAP 50–95) of YOLO models pretrained with LightlyTrain using distillation v1 and v2. Models trained with distillation v2 achieve higher accuracy for all architectures larger than the small model.

DINOv2: Pretrain Your Own Foundation Models

Support for DINOv2 was a frequently requested feature, and we're pleased to announce its integration.

Highlights:

  • Native DINOv2 Pretraining: You can use LightlyTrain to pretrain Vision Transformer (ViT) models with DINOv2 directly on your private datasets.
  • For Large-Scale Projects: This is ideal for companies with access to large datasets and compute resources who wish to develop their own foundation models internally.

This integration aims to make it easier for teams to train custom DINOv2 models tailored to their specific data.
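For background on what such a pretraining run does under the hood (standard DINOv2 behavior, not a description of LightlyTrain internals): DINOv2 trains a student network against a teacher whose weights are an exponential moving average (EMA) of the student's. A minimal numpy sketch of that update rule, with illustrative values:

```python
import numpy as np

def ema_update(teacher_w, student_w, momentum=0.996):
    """Move teacher weights a small step toward the student's.

    teacher <- momentum * teacher + (1 - momentum) * student.
    The teacher receives no gradients; it only tracks the student,
    providing a slowly changing, stable target for self-distillation.
    """
    return momentum * teacher_w + (1.0 - momentum) * student_w

teacher = np.zeros(4)   # toy "weights"
student = np.ones(4)
for _ in range(1000):   # repeated updates pull the teacher along
    teacher = ema_update(teacher, student)
print(teacher)          # close to the student's weights after many steps
```

The momentum value 0.996 here is a common choice in self-distillation recipes; in practice it is typically scheduled toward 1.0 over training rather than held fixed.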


What’s Next: Upcoming Improvements and Community

The initial traction with over 10,000 downloads and the interest from companies aiming to build foundation models is a strong motivator for us. Our immediate next steps include:

  • Improving performance for images containing small objects
  • Further improving DINOv2 performance
  • Continuing to improve the overall usability and performance of the framework based on user feedback

LightlyTrain is an open-source framework designed to be simple to use for self-supervised learning on real-world datasets, so feel free to give it a go, no strings attached.

You can check out the project on GitHub. We welcome stars on the repo and any feedback you might have. If you're looking to integrate LightlyTrain into your training stack, please feel free to reach out.

