Data Curation for

Improve your machine learning models by using the right data

Audi, Curbflow, ARM, Frontify, SBB, Nvidia

How it works

Lightly tells companies which subset of their data to label to have the biggest impact on model accuracy.

Find data redundancy and bias

Find and remove redundancy and bias introduced by the data collection process to reduce overfitting and improve ML model generalization.
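
To make the idea of redundancy concrete, here is a minimal sketch of near-duplicate removal on embedding vectors. It is not Lightly's implementation: the function names, the toy embeddings, and the 0.95 similarity threshold are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_near_duplicates(embeddings, threshold=0.95):
    """Return the indices of the samples to keep.

    A sample is kept only if its similarity to every already-kept
    sample is below `threshold` (a hypothetical cut-off).
    """
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine_similarity(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy embeddings (made up): samples 0 and 1 point in almost the same direction.
embeddings = [
    [1.0, 0.0],
    [0.99, 0.01],
    [0.0, 1.0],
    [-1.0, 0.0],
]
kept_indices = drop_near_duplicates(embeddings)
print(kept_indices)
```

In this toy example the second sample is nearly identical to the first, so it is the only one dropped. The pairwise loop is quadratic in the number of samples, so a real pipeline would likely use an approximate nearest-neighbour index instead.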

10x more efficient

Save money on data-related costs by removing redundancies.

Increased accuracy

Reduce overfitting and improve generalization by diversifying your dataset.

Manage everything in one place

Understand your data within minutes after collection and before any data labeling.
We combine self-supervised learning with active learning to accelerate your data preparation pipeline.

Data Selection

Most companies only use between 0.1% and 10% of their data for machine learning. Use our state-of-the-art methods to select the most relevant samples. Let Lightly handle the selection of the data for you while you focus on the training process.
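
One common way to pick a small but diverse subset under a fixed labeling budget is greedy farthest-point sampling on embeddings. The sketch below illustrates that general technique only. It is not presented as the method Lightly uses, and `select_diverse`, `budget`, and the toy points are hypothetical names and values.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse(embeddings, budget, start=0):
    """Greedy farthest-point selection of up to `budget` sample indices."""
    if not embeddings or budget <= 0:
        return []
    selected = [start]
    # Distance from every sample to its nearest selected sample.
    min_dist = [euclidean(e, embeddings[start]) for e in embeddings]
    while len(selected) < min(budget, len(embeddings)):
        candidates = [i for i in range(len(embeddings)) if i not in selected]
        # Pick the unselected sample farthest from everything chosen so far.
        next_idx = max(candidates, key=lambda i: min_dist[i])
        selected.append(next_idx)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], euclidean(e, embeddings[next_idx]))
    return selected


# Toy 2-D embeddings (made up): two tight pairs and one isolated point.
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0], [0.0, 4.0]]
subset = select_diverse(points, budget=3)
print(subset)
```

With a budget of three, the selection takes one point from each tight pair plus the isolated point, so the near-duplicates are never chosen together.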

Smart Data Pool

Keep track of the data your team is working on. Our algorithms help you add only relevant data to the existing pool. We store only non-sensitive meta-information on our servers, so you don't have to worry about transfer costs or privacy issues.

Data Analytics

Use our deep data analytics framework to analyze your raw datasets. Get insights about the distribution, diversity, and other key metrics. Find dataset bias before training and evaluating your model.
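
As a small illustration of what a distribution check can look like before any labeling, the sketch below summarizes one metadata field and scores how balanced it is with normalized Shannon entropy. The field name `capture_conditions`, its values, and the choice of metric are assumptions for this example, not a description of Lightly's analytics.

```python
import math
from collections import Counter


def distribution(values):
    """Share of each distinct value in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def normalized_entropy(shares):
    """Shannon entropy divided by its maximum; 1.0 means perfectly balanced."""
    probs = [p for p in shares.values() if p > 0]
    if len(probs) <= 1:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(probs))


# Hypothetical metadata: the conditions under which each image was captured.
capture_conditions = ["day"] * 8 + ["night"] * 1 + ["rain"] * 1
shares = distribution(capture_conditions)
balance = normalized_entropy(shares)
print(shares)
print(f"balance score: {balance:.2f}")
```

A score well below 1.0, as in this toy case where daytime images dominate, suggests a collection bias worth correcting before training.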

What customers say

"I was amazed once we received the results of Lightly. The results showed us how we can work more efficiently by selecting the right data"

Alejandro Garcia, CEO AI Retailer Systems

"After training a model on the filtered data suggested by Lightly, I saw a dramatic increase in performance on our key metrics (...)"

Angelo Stekardis, Computer Vision Lead

"Lightly helped us understand more about our own data gathering process. We saw that a lot of data being collected was not meaningful"

Nasib Naimi, Co-Founder DroGone

Where to use it

  • Python PIP Package (CLI)
    • < 100,000 samples
    • Train models using self-supervised learning
    • Option to only upload non-sensitive metadata
  • Webapp
    • < 100,000 samples
    • Interactive Analytics
    • 2048-bit SSL encryption
  • On-Premise (Docker)
    • Used by Fortune 500 companies to process millions of samples
    • Neither your raw data nor metadata leave your server
    • Analytics reports

Speeding up AI across industries

Autonomous Vehicles

Make your vehicle autonomous for the street, sea, or air.

Industries:

Shipping, Logistics, Airline, Defense & Military

Visual Inspection

Detect defects in infrastructure or manufactured products, or find infected plants.

Industries:

Railways & Roads, Infrastructure, Manufacturing, Agriculture, Surveillance & Security

Medical Imaging

Find abnormalities in medical images such as X-rays, MRIs, microscope images, and other medical scans.

Industries:

Health/Life Science, Biotechnology, and Digital Diagnostics/Pathology

Space Data

Improve space products and achieve better results

Industries:

Satellite Imaging, Visual Inspection for Space Components, Autonomous Systems

As seen on

Startupticker
Swisspreneur

Improve your data
Today is the day to get the most out of your data. Share our mission with the world and unleash your data's true potential.
Contact us