Introducing LightlyStudio: The Unified Data Platform for Multimodal ML
LightlyStudio unifies curation, labeling, and embeddings in one fast, developer-friendly platform built for modern computer vision and multimodal ML. It’s the next evolution of LightlyOne, rebuilt from the ground up.
From Data Curation to a Unified Data Platform: LightlyOne becomes LightlyStudio
LightlyOne started with a simple mission: make it easy for teams to curate better datasets for computer vision.
It became the trusted tool for automating data selection, exploring millions of images, and building smarter training sets.
Figure 1: LightlyOne
Over time, our customers began to ask for more. They wanted to label directly inside the same tool, to review and improve annotations, and to work with more than just images. Machine learning projects now span images, text, and 3D data. The way engineers build models has changed, and we needed to evolve with it. That is why we rebuilt our platform from the ground up.
LightlyOne has grown into LightlyStudio, a unified data platform that brings curation, labeling, and embeddings together in one fast product.
LightlyStudio builds on everything our users loved about LightlyOne, but removes the boundaries between steps.
Before: LightlyOne | After: LightlyStudio
Curate large image datasets using Python | Curation and labeling in one place
Visualize embeddings and metadata | Native embedding support allowing for easy search of data
Active Learning and data selection algorithms | Instant quality review with Label QA tools
One dataset can contain data from only one bucket or local folder | Datasets can mix data from different cloud storage providers and local folders
Uses Docker for processing and Python SDK for scheduling | Everything in a single Python SDK
No option for custom extensions | Custom extensions possible through our plug-in system, planned to launch later this year
Pro tip
LightlyStudio is open-source. We also make it easy to migrate your data from Encord, Voxel51, Ultralytics, V7Labs, Roboflow, or other ML tools. Contact us to learn more.
What's New in LightlyStudio: An Overview
LightlyStudio is more than just an update; it's a complete rethinking of the data workflow.
1. Automate Data Curation and Management
Automatically surface the most valuable samples for training and fine-tuning. Use advanced filters, embeddings, and metadata to create meaningful subsets in seconds.
Figure 2: Data curation in LightlyStudio.
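To make "surfacing the most valuable samples" a little more concrete, here is a minimal sketch of one common diversity-based selection strategy: greedy farthest-point sampling over embeddings. It is a plain-Python illustration, not LightlyStudio's actual selection algorithm. The function names `euclidean` and `select_diverse` and the toy 2D embeddings are assumptions made up for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse(embeddings, k):
    """Greedy farthest-point sampling: pick up to k mutually distant samples.

    Hypothetical helper for illustration only. Returns indices into `embeddings`.
    """
    if k <= 0 or not embeddings:
        return []
    selected = [0]  # arbitrary seed: start from the first sample
    # Distance from every sample to its nearest already-selected sample.
    min_dist = [euclidean(e, embeddings[0]) for e in embeddings]
    while len(selected) < min(k, len(embeddings)):
        idx = max(range(len(embeddings)), key=lambda i: min_dist[i])
        if min_dist[idx] == 0.0:
            break  # only exact duplicates of selected samples remain
        selected.append(idx)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], euclidean(e, embeddings[idx]))
    return selected


# Toy embeddings: two pairs of near-duplicates plus one outlier.
embeddings = [
    (0.0, 0.0),
    (0.1, 0.0),
    (5.0, 5.0),
    (5.1, 5.0),
    (0.0, 9.0),
]
picked = select_diverse(embeddings, k=3)
print(picked)
```

On this toy data the selection skips the near-duplicates and returns one sample from each of the three clusters. Real embeddings have hundreds of dimensions and come from a model, but the idea is the same.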
2. Label and QA
Label and QA your data using built-in annotation tools for images and videos. Manage tasks, review annotations, and ensure data quality without leaving the platform.
Figure 3: Labeling and QA process in LightlyStudio.
3. Native Multimodality and Embeddings
Embeddings are now built-in. Index, search, and filter across images, text, and point clouds in one unified workspace. Vector search is part of the core, not an add-on.
Figure 4: Embeddings in LightlyStudio.
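For readers new to vector search, the sketch below shows the basic idea: items of any modality are mapped to vectors, and a query vector is ranked against them by cosine similarity. This is a brute-force, plain-Python illustration under assumed data, not how LightlyStudio implements its index. The file names, the 3-dimensional vectors, and the helpers `cosine_similarity` and `search` are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(index, query, top_k=2):
    """Return the names of the top_k items most similar to the query vector."""
    scored = [(cosine_similarity(vec, query), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hypothetical index mixing an image, a text file, and a point cloud.
index = {
    "street_scene.jpg": (0.9, 0.1, 0.0),
    "road_caption.txt": (0.8, 0.2, 0.1),
    "lidar_sweep.pcd": (0.0, 0.1, 0.9),
}
results = search(index, query=(1.0, 0.0, 0.0), top_k=2)
print(results)
```

Because all three items live in the same vector space, a single query can return an image and a text file side by side. Production systems typically replace the brute-force loop with an approximate nearest-neighbor index.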
4. Performance by Design
Experience near-instant interactions even on massive datasets. We achieved this with a new lightweight backend powered by DuckDB, performance-critical components rewritten in Rust, and a highly optimized frontend using Svelte.
5. Superior Developer Experience
We put developers first. A clean, Python-first SDK with fully typed schemas (Pydantic) makes programmatic data work a breeze. It's pip-installable and designed for easy integration into your existing ML pipelines.
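If typed schemas are new to you, the sketch below shows what they buy you: records with declared field types that are validated when they are created, so bad data fails early instead of deep inside a pipeline. To stay dependency-free, it uses the standard library's `dataclasses` as a stand-in for Pydantic. The `ImageSample` class and its fields are hypothetical and are not part of the LightlyStudio SDK.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImageSample:
    """Hypothetical typed record for one image; not a LightlyStudio class."""

    file_name: str
    width: int
    height: int
    tags: Tuple[str, ...] = ()

    def __post_init__(self):
        # Validate on construction, roughly what a schema library does for you.
        if not self.file_name:
            raise ValueError("file_name must not be empty")
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")


# A valid record: fields are typed and discoverable by editors and type checkers.
sample = ImageSample(
    file_name="img_0001.jpg", width=480, height=360, tags=("needs-review",)
)

# An invalid record is rejected immediately.
try:
    ImageSample(file_name="broken.jpg", width=0, height=360)
    rejected = False
except ValueError:
    rejected = True

print(sample, rejected)
```

Pydantic adds type coercion, richer error messages, and JSON serialization on top of this pattern.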
6. Open and Extensible
LightlyStudio is open source under the Apache 2.0 license. You can use it locally, extend it with plugins, or integrate it deeply into your infrastructure. A hosted, collaborative cloud version will be available soon.
Get Started with LightlyStudio in 60 Seconds
Getting started with LightlyStudio is simple. It runs on Python 3.8+ on Windows, macOS, or Linux.
You don’t even need a GPU to use it.
Installation:
pip install lightly-studio
Connect and Curate: Use our powerful and intuitive SDK to connect to your data and start building better datasets.
import lightly_studio as ls
from lightly_studio.core.dataset_query import AND, OR, NOT, OrderByField, SampleField

# Index your dataset from a local folder
dataset = ls.Dataset.create()
dataset.add_samples_from_yolo(
    data_yaml="path/to/your/dataset/data.yaml",
)

# Programmatically find interesting samples.
# Example: Find small images (width < 500 px) that have not been reviewed.
query = dataset.query().match(
    AND(
        SampleField.width < 500,
        NOT(SampleField.tags.contains("reviewed")),
    )
)

# Tag this subset for easy filtering in the UI.
query.add_tag("needs-review")

# Launch the local web UI on http://localhost:8001
ls.start_gui()
Visit our booth for a hands-on demo, deep-dive walkthroughs, and quick migration clinics. Bring a dataset, and we'll show you how LightlyStudio can transform your workflow in minutes.
A Note for LightlyOne Users
No one gets left behind. LightlyOne will remain available for at least the next 12 months while we support customers who need more time to migrate. LightlyStudio is a breaking change in architecture, but for most teams, the migration is straightforward.
Our team is ready to assist with migration plans and tooling to make the transition seamless.
Try LightlyStudio Now!
We are incredibly proud to share this milestone with you. LightlyStudio is the product of deep technical work and invaluable customer feedback, and it’s just the beginning. Here are links for further information:
LightlyStudio is open source and built together with our community. If you believe in open, developer-first tools for better machine learning data, show your support by leaving us a star on GitHub 🌟.
Your stars help us grow awareness, attract contributors, and keep investing in open innovation for ML data workflows.
Join us on this journey as we build the next generation of data tooling for ML.