
Customer Success Stories

From Millions of Road Images to High-Value Training Data: How Greenwood Engineering Uses LightlyTrain

Lightly helped Greenwood Engineering curate millions of road-surface images by training a custom DINOv2 model with LightlyTrain, enabling more effective selection of high-value samples for labeling and model training.

Vijay Gill Hansted

Machine Learning Engineer

Overview

Industry

Manufacturing

Location

Brøndby, Denmark

Employees

<50


Products

LightlyTrain

Results

2M+

Curated Images

Use Case

Data curation

About

Greenwood Engineering, based in Denmark, develops advanced measurement systems used across road, rail, and airport infrastructure. Their equipment continuously records high-resolution road-surface images, generating extremely large datasets that capture the texture, condition, and marking patterns of road networks at scale.

As the volume of captured data grew into the millions, Greenwood began exploring machine learning approaches to classify surface types, detect patterns, and support automated quality assessment.

Problem

While collecting data wasn’t an issue, making it useful was the real challenge.

Greenwood Engineering trains models for detecting and measuring road surface defects, such as cracks and potholes, as well as measuring lane marking quality. But labeling millions of images was infeasible, and manually selecting which samples to label was slow, repetitive, and prone to redundancy:

Many images captured nearly identical road segments.
Important edge cases were buried in the dataset.
Labeling at scale was costly, and manual filtering didn’t scale.

With the full corpus out of reach for labeling, manual sampling also lacked consistency.

Testimonials

We collect millions of road surface images, but since surface imagery is highly spatially correlated, labelling every sample is redundant, and finding sets of diverse data was a challenge.

Vijay Gill Hansted

Machine Learning Engineer

Solution

To efficiently curate their dataset, Greenwood used LightlyTrain to train their own DINOv2 model on unlabeled road surface images. The resulting model captures different road surface conditions much better than an off-the-shelf model. This made data curation the most effective lever for improving model performance, and LightlyTrain enabled that shift.

Using LightlyTrain and the custom DINOv2 model, the team generated embeddings for their entire dataset. These embeddings gave them a scalable way to explore the data, run similarity search, remove redundancy across millions of road-surface images, and extract valuable samples for labeling.
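The similarity search and redundancy removal described above can be illustrated with a minimal sketch. It assumes the embeddings are already available as plain lists of floats keyed by image name. The toy vectors, the helper names (`cosine_similarity`, `nearest_neighbors`, `drop_near_duplicates`), and the 0.98 threshold are all illustrative assumptions, not LightlyTrain's API or Greenwood's actual pipeline.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, embeddings, k):
    """Return the names of the k embeddings most similar to the query vector."""
    ranked = sorted(
        embeddings,
        key=lambda name: cosine_similarity(query, embeddings[name]),
        reverse=True,
    )
    return ranked[:k]

def drop_near_duplicates(embeddings, threshold=0.98):
    """Greedy filter: keep an image only if its similarity to every
    already-kept image stays below the (assumed) threshold."""
    kept = []
    for name, vec in embeddings.items():
        if all(cosine_similarity(vec, embeddings[other]) < threshold for other in kept):
            kept.append(name)
    return kept

# Hypothetical 3-dimensional embeddings; real model embeddings are much larger.
embeddings = {
    "asphalt_001": [1.0, 0.0, 0.0],
    "asphalt_002": [0.99, 0.05, 0.0],  # nearly identical to asphalt_001
    "crack_001": [0.0, 1.0, 0.0],
    "marking_001": [0.0, 0.0, 1.0],
    "crack_002": [0.3, 0.9, 0.1],      # related to crack_001 but distinct
}

# Similarity search: which images look most like crack_001?
neighbors = nearest_neighbors(embeddings["crack_001"], embeddings, k=2)

# Redundancy removal: asphalt_002 is filtered out as a near-duplicate.
unique_images = drop_near_duplicates(embeddings)
```

This brute-force version compares every pair, which is only practical for small sets. At the scale of millions of images, an approximate nearest-neighbor index would typically replace the inner loops.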

Why Curation Was Essential

With the dataset at its current size, the team quickly reached the point where finding relevant samples for labeling meant manually inspecting hundreds or thousands of images.

What they needed instead was a better understanding of which images were actually informative. LightlyTrain helped provide that structure, and three observations made the need clear:

Training improvements from additional labeled data were modest.
Adding labels without addressing redundancy led to diminishing returns.
The dataset needed to be organized before annotation could have impact.

Results

LightlyTrain helped Greenwood organize their large road-surface dataset into a meaningful embedding space. This gave the team a much clearer understanding of where redundancy existed, which textures and markings were visually similar, and which samples were diverse enough to prioritize for labeling.

With this visibility, annotation could focus on examples that were most likely to improve downstream models. Working in the embedding space, the team could:

Identify clusters of similar road textures and markings
Run k-nearest-neighbor (kNN) queries to find visually related samples
Detect redundant images across long road stretches
Build a more balanced and representative labeled subset
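One common way to build a balanced, representative subset from embeddings is farthest-point sampling, sketched below. This is an illustrative technique chosen for the example, not a description of how Greenwood or Lightly select samples. The 2-dimensional toy embeddings, the `select_diverse` helper, and the labeling budget are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_diverse(embeddings, budget, seed_name):
    """Farthest-point sampling: starting from a seed image, repeatedly add
    the image that is farthest from everything selected so far."""
    selected = [seed_name]
    # Distance from each image to its closest already-selected image.
    min_dist = {
        name: euclidean(vec, embeddings[seed_name])
        for name, vec in embeddings.items()
    }
    while len(selected) < min(budget, len(embeddings)):
        candidates = [name for name in embeddings if name not in selected]
        pick = max(candidates, key=min_dist.get)
        selected.append(pick)
        # Update closest-selected distances now that `pick` is in the subset.
        for name, vec in embeddings.items():
            min_dist[name] = min(min_dist[name], euclidean(vec, embeddings[pick]))
    return selected

# Hypothetical embeddings forming three visual groups of unequal size.
embeddings = {
    "smooth_a": [0.0, 0.0],
    "smooth_b": [0.1, 0.0],
    "smooth_c": [0.0, 0.1],
    "cracked_a": [5.0, 0.0],
    "cracked_b": [5.1, 0.1],
    "marking_a": [0.0, 5.0],
}

# With a budget of 3, one image from each group is chosen for labeling.
to_label = select_diverse(embeddings, budget=3, seed_name="smooth_a")
```

Because each pick maximizes the distance to the current selection, over-represented groups (here, the smooth surfaces) contribute only one image before rarer groups are covered.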

