LightlyTrain x DINOv2: Smarter Self-Supervised Pretraining, Faster
TL;DR
LightlyTrain has been live for just over a month, and we've already shipped major updates to help teams pretrain foundation models more efficiently:
DINOv2 Support: You can now pretrain Vision Transformers (ViTs) on your own data using DINOv2 directly within LightlyTrain.
DistillationV2 Module: Faster convergence, fewer hyperparameters, and up to +2 mAP improvement over the previous method.
Performance Benchmarks: The new distillation method trains significantly faster and posts strong results on RT-DETR and YOLO models.
10,000+ downloads and growing interest from teams building internal foundation models.
Next up: Better small object handling, DINOv2 refinements, and general performance improvements.
Check out the docs or GitHub to get started.
It's been just over a month since we launched LightlyTrain, and the response has been very encouraging.
We've seen over 10,000 downloads in the first four weeks and received interest from various companies looking to train their own foundation models using our self-supervised learning framework. LightlyTrain is designed for pretraining models on your own data, without the need for annotations.
Based on initial feedback and our development roadmap, we’ve been making steady improvements.
DistillationV2: Faster Convergence and Improved Performance
We've developed DistillationV2, an updated version of our distillation module that uses DINOv2 models as the teacher.
Key improvements include:
Adjusted Loss: The loss has been simplified to directly enforce similarity between teacher and student representations, removing the need for a pseudo-classification task (a minimal sketch follows after the figures below).
Supervisory Signal Granularity: By adjusting the training loss, supervision is applied at the feature level rather than the image level, without incurring additional computational cost.
Faster Convergence: The finer-grained supervisory signal leads to significantly faster training. On RT-DETR, DistillationV2 matches the performance of the first distillation method in just one-third of the training time.
Performance Gains: DistillationV2 achieves up to +2 mAP improvement over DistillationV1 when trained for the same duration. Final results at convergence are also better.
Fewer Hyperparameters: With the new loss term, the method introduces no hyperparameters of its own.
Robustness: The new method performs reliably across a wide range of batch sizes (128–2048), making it accessible even to customers with limited compute resources.
Figure 1: Object detection performance of RT-DETR models pretrained with LightlyTrain using DistillationV1 and DistillationV2. Models trained with DistillationV2 converge much faster and achieve higher accuracy than v1.
Figure 2: Object detection performance (mAP50-95) of YOLO models pretrained with LightlyTrain using DistillationV1 and DistillationV2. Models trained with DistillationV2 achieve higher accuracy for all architectures larger than the small model.
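To make the loss change concrete, here is a minimal sketch of what a feature-level distillation loss can look like. This illustrates the idea rather than LightlyTrain's exact implementation; the function name and tensor shapes are ours, chosen for the example.

```python
import torch
import torch.nn.functional as F


def feature_distillation_loss(
    student_feats: torch.Tensor,  # (batch, num_patches, dim)
    teacher_feats: torch.Tensor,  # (batch, num_patches, dim), from a frozen DINOv2 teacher
) -> torch.Tensor:
    """Pull each student patch feature toward the matching teacher feature.

    Supervision acts per feature (patch) rather than per image, and the
    loss itself introduces no extra hyperparameters.
    """
    student = F.normalize(student_feats, dim=-1)
    teacher = F.normalize(teacher_feats.detach(), dim=-1)
    # Negative cosine similarity, averaged over all patches in the batch.
    return -(student * teacher).sum(dim=-1).mean()
```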
DINOv2: Pretrain Your Own Foundation Models
Support for DINOv2 was a frequently requested feature, and we're pleased to announce its integration.
Highlights:
Native DINOv2 Pretraining: You can use LightlyTrain to pretrain Vision Transformer (ViT) models with DINOv2 directly on your private datasets.
For Large-Scale Projects: This is ideal for companies with access to large datasets and compute resources that want to develop their own foundation models in-house.
This integration aims to make it easier for teams to train custom DINOv2 models tailored to their specific data.
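As a rough sketch, pretraining a ViT with DINOv2 through LightlyTrain looks something like the following. The model identifier and method name shown here are assumptions for illustration; check the documentation for the exact names your version supports.

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/dinov2_pretrain",   # output directory for checkpoints and logs
        data="path/to/your/images",  # folder of unlabeled images, no annotations needed
        model="dinov2/vitb14",       # assumed identifier for a ViT-B/14 backbone
        method="dinov2",             # assumed method name for DINOv2 pretraining
    )
```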
What’s Next: Upcoming Improvements and Community
The early traction (over 10,000 downloads) and the interest from companies aiming to build foundation models are strong motivators for us. Our immediate next steps include:
Improving performance for images containing small objects
Further improving DINOv2 performance
Continuing to improve the overall usability and performance of the framework based on user feedback
LightlyTrain is an open-source framework designed to be simple to use for self-supervised learning on real-world datasets, so give it a go, no strings attached.
You can check out the project on GitHub. We welcome stars on the repo and any feedback you might have. If you're looking to integrate LightlyTrain into your training stack, please feel free to reach out.
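For reference, a minimal quick start could look like the sketch below. We assume the package installs from PyPI as lightly-train and that method="distillation" selects the DistillationV2 module described above; adjust the model string to any supported backbone.

```python
# pip install lightly-train
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/quick_start",         # where checkpoints and logs are written
        data="path/to/your/images",    # folder of unlabeled images
        model="torchvision/resnet50",  # example backbone; swap in your own model
        method="distillation",         # distillation from a DINOv2 teacher
    )
```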
Get Started with Lightly
Talk to Lightly’s computer vision team about your use case.