Introducing LightlyStudio: The Unified Data Platform for Multimodal ML

LightlyStudio unifies curation, labeling, and embeddings in one fast, developer-friendly platform built for modern computer vision and multimodal ML. It’s the next evolution of LightlyOne, rebuilt from the ground up.

Get Started with Lightly

Talk to Lightly’s computer vision team about your use case.
Book a Demo

Product: LightlyStudio
Category: Update
Reading time: 4 mins

From Data Curation to a Unified Data Platform: LightlyOne becomes LightlyStudio

LightlyOne started with a simple mission: make it easy for teams to curate better datasets for computer vision.

It became the trusted tool for automating data selection, exploring millions of images, and building smarter training sets.

Figure 1: LightlyOne

Over time, our customers began to ask for more. They wanted to label directly inside the same tool, to review and improve annotations, and to work with more than just images. Machine learning projects now span images, text, and 3D data. The way engineers build models has changed, and we needed to evolve with it. That is why we rebuilt our platform from the ground up. 

LightlyOne has grown into LightlyStudio, a unified data platform that brings together curation, labeling, and embeddings in one fast product.

LightlyStudio builds on everything our users loved about LightlyOne, but removes the boundaries between steps.

Before: LightlyOne | After: LightlyStudio
Curate large image datasets using Python | Curation and labeling in one place
Visualize embeddings and metadata | Native embedding support allowing for easy search of data
Active Learning and data selection algorithms | Instant quality review with Label QA tools
One dataset can contain data from only one bucket or local folder | Datasets can consist of a wild mix of different cloud storage providers and local data
Uses Docker for processing and Python SDK for scheduling | Everything in a single Python SDK
No option for custom extensions | Custom extensions possible through our plug-in system, planned to launch later this year
Pro tip

LightlyStudio is open-source. We also make it easy to migrate your data from Encord, Voxel51, Ultralytics, V7Labs, Roboflow, or other ML tools. Contact us to learn more.

What's New in LightlyStudio: An Overview

LightlyStudio is more than just an update; it's a complete rethinking of the data workflow.

1. Automate Data Curation and Management

Automatically surface the most valuable samples for training and fine-tuning. Use advanced filters, embeddings, and metadata to create meaningful subsets in seconds.

Figure 2: Data curation in LightlyStudio.
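
To give a feel for what "surfacing valuable samples" can mean in practice, here is a minimal sketch of one common idea: picking a diverse subset by greedily choosing samples that are far apart in embedding space. This is generic farthest-point sampling written in plain Python. It is an illustrative assumption, not LightlyStudio's actual selection algorithm, and the function names and toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse(embeddings, k):
    """Greedily pick up to k indices, each far from the ones already chosen.

    Generic farthest-point sampling (hypothetical helper, not a
    LightlyStudio API).
    """
    if not embeddings or k <= 0:
        return []
    selected = [0]  # start from the first sample (arbitrary choice)
    # Distance from every sample to its nearest selected sample so far.
    min_dist = [euclidean(e, embeddings[0]) for e in embeddings]
    while len(selected) < min(k, len(embeddings)):
        candidates = [i for i in range(len(embeddings)) if i not in selected]
        # Pick the candidate farthest from everything selected so far.
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], euclidean(e, embeddings[nxt]))
    return selected


# Toy 2-D "embeddings": two near-duplicates plus two distinct samples.
toy_embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 4.0]]
picked = select_diverse(toy_embeddings, k=3)
print(picked)
```

With this toy data, the near-duplicate at index 1 is left out in favor of the two samples that add new information, which is the intuition behind diversity-based curation.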

2. Label and QA

Label and QA your data using built-in annotation tools for images and videos. Manage tasks, review annotations, and ensure data quality without leaving the platform.

Figure 3: Labeling and QA process in LightlyStudio.

3. Native Multimodality and Embeddings

Embeddings are now built-in. Index, search, and filter across images, text, and point clouds in one unified workspace. Vector search is part of the core, not an add-on.

Figure 4: Embeddings in LightlyStudio.
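
For readers new to vector search, the core idea can be sketched in a few lines: every sample is represented by an embedding vector, and a search ranks samples by how similar their vectors are to the query vector. The sketch below uses cosine similarity in plain Python. The file names, the three-dimensional vectors, and the `search` helper are hypothetical; this is not how LightlyStudio implements search internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def search(index, query, top_k=2):
    """Return the top_k sample names ranked by similarity to the query."""
    ranked = sorted(
        index,
        key=lambda name: cosine_similarity(index[name], query),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical 3-D embeddings; real ones would come from an embedding model.
toy_index = {
    "cat_photo.jpg": [0.9, 0.1, 0.0],
    "dog_photo.jpg": [0.7, 0.6, 0.1],
    "lidar_scan.pcd": [0.0, 0.1, 0.9],
}
results = search(toy_index, query=[1.0, 0.0, 0.0], top_k=2)
print(results)
```

Because every modality ends up as a vector, the same ranking step works whether the indexed items are images, text, or point clouds, provided their embeddings live in a shared space.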

4. Performance by Design

Experience near-instant interactions even on massive datasets. We achieved this with a new lightweight backend powered by DuckDB, performance-critical components rewritten in Rust, and a highly optimized frontend using Svelte.

5. Superior Developer Experience 

We put developers first. A clean, Python-first SDK with fully typed schemas (Pydantic) makes programmatic data work a breeze. It's pip-installable and designed for easy integration into your existing ML pipelines.

6. Open and Extensible 

LightlyStudio is open-source (Apache 2.0). You can use it locally, extend it with plugins, or integrate it deeply into your infrastructure. A hosted, collaborative cloud version will be available soon.

Get Started with LightlyStudio in 60 Seconds

Getting started with LightlyStudio is simple. It runs on Python 3.8+ on Windows, macOS, or Linux.

You don’t even need a GPU to use it.

  1. Installation:
pip install lightly-studio
  2. Connect and Curate: Use our powerful and intuitive SDK to connect to your data and start building better datasets.
import lightly_studio as ls
from lightly_studio.core.dataset_query import AND, OR, NOT, OrderByField, SampleField

# Index your dataset from a local folder
dataset = ls.Dataset.create()
dataset.add_samples_from_yolo(
    data_yaml="path/to/your/dataset/data.yaml",
)

# Programmatically find interesting samples.
# Example: Find small images (width < 500 px) that have not been reviewed.
query = dataset.query().match(
    AND(
        SampleField.width < 500,
        NOT(SampleField.tags.contains("reviewed")),
    )
)

# Tag this subset for easy filtering in the UI.
query.add_tag("needs-review")

# Launch the local web UI on http://localhost:8001
ls.start_gui()
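
If the query syntax is new to you, it may help to see the same condition in plain Python. The sketch below applies "width below 500 and not tagged reviewed" to a small in-memory list and tags the matches. The `Sample` class and the toy data are hypothetical stand-ins for illustration only; they are not part of the LightlyStudio SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Hypothetical in-memory stand-in for a dataset sample."""
    file_name: str
    width: int
    tags: set = field(default_factory=set)


def needs_review(sample):
    """Same condition as the query: width < 500 AND NOT tagged 'reviewed'."""
    return sample.width < 500 and "reviewed" not in sample.tags


samples = [
    Sample("a.jpg", width=320),
    Sample("b.jpg", width=320, tags={"reviewed"}),
    Sample("c.jpg", width=1920),
]

# Tag every matching sample, mirroring query.add_tag("needs-review").
for sample in samples:
    if needs_review(sample):
        sample.tags.add("needs-review")

tagged = [s.file_name for s in samples if "needs-review" in s.tags]
print(tagged)
```

Only `a.jpg` is tagged here: `b.jpg` is small but already reviewed, and `c.jpg` is too wide to match.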


See it Live at ICCV 2025 in Hawaii!

We’ll be launching LightlyStudio at ICCV 2025.

Visit our booth for a hands-on demo, deep-dive walkthroughs, and quick migration clinics. Bring a dataset, and we'll show you how LightlyStudio can transform your workflow in minutes.

A Note for LightlyOne Users

No one gets left behind. LightlyOne will remain available for at least the next 12 months while we support customers who need more time to migrate. LightlyStudio is a breaking change in architecture, but for most teams, the migration is straightforward. 

Our team is ready to assist with migration plans and tooling to make the transition seamless.

Try LightlyStudio Now!

We are incredibly proud to share this milestone with you. LightlyStudio is the product of deep technical work and invaluable customer feedback, and it’s just the beginning.

Support the Project on GitHub

LightlyStudio is open source and built together with our community. If you believe in open, developer-first tools for better machine learning data, show your support by leaving us a star on GitHub 🌟.

Your stars help us grow awareness, attract contributors, and keep investing in open innovation for ML data workflows.

Join us on this journey as we build the next generation of data tooling for ML.


