Build data flywheels and active learning loops

Powerful Components for Seamless Integration

Lightly Worker

A Docker container that runs on your GPU machine and does all the processing.

Python SDK

Integrate with other frameworks and create selection jobs using scripts.

Lightly Platform

Powerful API and UI that give instant access to the selected data.

Example Python script for diversity sampling

from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Create the Lightly client to connect to the Lightly Platform.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset(
    dataset_name="dataset-name",
    dataset_type=DatasetType.IMAGES,
)

# Connect the dataset to your S3 bucket.
# The INPUT datasource holds the raw data to select from.
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region="eu-central-1",
    access_key="S3-ACCESS-KEY",
    secret_access_key="S3-SECRET-ACCESS-KEY",
    purpose=DatasourcePurpose.INPUT,
)
# The LIGHTLY datasource is where the Lightly Worker writes its outputs.
client.set_s3_config(
    resource_path="s3://bucket/lightly/",
    region="eu-central-1",
    access_key="S3-ACCESS-KEY",
    secret_access_key="S3-SECRET-ACCESS-KEY",
    purpose=DatasourcePurpose.LIGHTLY,
)

# Schedule a Lightly Worker run that selects 50 samples that are
# as diverse as possible based on their embeddings.
client.schedule_compute_worker_run(
    worker_config={
        # Train the embedding model on your own data before selection.
        "enable_training": True,
    },
    selection_config={
        "n_samples": 50,
        "strategies": [
            {
                "input": {"type": "EMBEDDINGS"},
                "strategy": {"type": "DIVERSITY"},
            }
        ],
    },
)
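The DIVERSITY strategy in the script above picks samples that are spread out in embedding space. The page does not describe Lightly's exact algorithm, so the following is only a sketch of one common way to do this, greedy farthest-point (k-center) selection, in plain Python. `select_diverse` and the toy embeddings are hypothetical and are not part of the Lightly SDK.

```python
import math
from typing import List, Sequence


def select_diverse(embeddings: List[Sequence[float]], n_samples: int) -> List[int]:
    """Greedy farthest-point (k-center) selection over embedding vectors.

    Hypothetical helper for illustration; not the Lightly implementation.
    Starts from the first sample, then repeatedly adds the sample whose
    distance to its nearest already-selected sample is largest.
    """
    if not embeddings or n_samples <= 0:
        return []
    n_samples = min(n_samples, len(embeddings))
    selected = [0]
    # min_dist[i]: distance from sample i to its closest selected sample.
    min_dist = [math.dist(e, embeddings[0]) for e in embeddings]
    while len(selected) < n_samples:
        chosen = set(selected)
        # Pick the not-yet-selected sample farthest from the current selection.
        next_idx = max(
            (i for i in range(len(embeddings)) if i not in chosen),
            key=lambda i: min_dist[i],
        )
        selected.append(next_idx)
        # Update nearest-selected distances with the newly added sample.
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[next_idx]))
    return selected


# Toy 2-D "embeddings": three points form a tight cluster, two are far away.
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.2], [10.0, 0.0]]
print(select_diverse(points, 3))  # [0, 4, 2]
```

Only one point from the tight cluster is picked before the outliers, which is the intended effect. This naive version needs roughly n × k distance computations, so at the scale of millions of samples a vectorized or approximate implementation would be needed.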

Automatically select data that matters

Embeddings

Selection based on image similarity and diversity

Metadata

Selection based on metadata such as location, weather and more

Predictions

Selection based on predictions and probabilities
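To illustrate how metadata and prediction-based selection can work together, here is a minimal sketch that first filters samples by a metadata field and then ranks the remaining ones by the entropy of the model's predicted class probabilities (uncertainty sampling, a standard active-learning heuristic). The sample dictionary layout, the `weather` key, and `select_uncertain` are hypothetical; this is not how Lightly implements these strategies internally.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_uncertain(
    samples: List[Dict],
    n_samples: int,
    metadata_key: str,
    metadata_value: str,
) -> List[str]:
    """Hypothetical helper: metadata filter, then most uncertain predictions first."""
    # Metadata step: keep only samples whose metadata field matches.
    matching = [
        s for s in samples if s.get("metadata", {}).get(metadata_key) == metadata_value
    ]
    # Prediction step: highest entropy means the model is least certain.
    ranked = sorted(matching, key=lambda s: entropy(s["probabilities"]), reverse=True)
    return [s["filename"] for s in ranked[:n_samples]]


# Hypothetical samples with metadata and per-class model probabilities.
samples = [
    {"filename": "a.jpg", "metadata": {"weather": "rain"}, "probabilities": [0.98, 0.01, 0.01]},
    {"filename": "b.jpg", "metadata": {"weather": "rain"}, "probabilities": [0.34, 0.33, 0.33]},
    {"filename": "c.jpg", "metadata": {"weather": "sun"}, "probabilities": [0.40, 0.30, 0.30]},
    {"filename": "d.jpg", "metadata": {"weather": "rain"}, "probabilities": [0.60, 0.30, 0.10]},
]
print(select_uncertain(samples, 2, "weather", "rain"))  # ['b.jpg', 'd.jpg']
```

`c.jpg` is excluded by the metadata filter even though its prediction is uncertain, and `a.jpg` is ranked last because the model is already confident about it.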

Why should I use Lightly?

Save the cost of building and scaling in-house solutions. Use a suite of powerful, well-evaluated selection algorithms. Shorten deployment cycles and cut labeling costs by finding the most relevant training data.

Feedback cycle

Create a feedback cycle from your production data back to your training data. Use Lightly to select data that covers new use cases and counteracts data drift in existing ones.
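One step of such a feedback cycle can be sketched as follows, assuming you already have embeddings for your training set and for a new batch of production data. The helper flags production samples that are far from everything in the current training set (a simple nearest-neighbor novelty check); these are candidates for labeling when new use cases or drift appear. `select_novel` and the distance threshold are hypothetical, and this is not Lightly's own drift-handling method.

```python
import math
from typing import List, Sequence


def select_novel(
    production: List[Sequence[float]],
    training: List[Sequence[float]],
    threshold: float,
) -> List[int]:
    """Hypothetical helper: indices of production samples whose nearest
    training embedding is farther than `threshold`, most novel first."""
    if not training:
        # Nothing to compare against yet, so every production sample is new.
        return list(range(len(production)))
    # Distance from each production sample to its nearest training sample.
    scored = [
        (min(math.dist(x, t) for t in training), i) for i, x in enumerate(production)
    ]
    return [i for d, i in sorted(scored, reverse=True) if d > threshold]


training = [[0.0, 0.0], [1.0, 0.0]]
production = [[0.1, 0.0], [5.0, 0.0], [0.0, 3.0]]
print(select_novel(production, training, threshold=2.0))  # [1, 2]
```

The threshold is data-dependent; one option is to tune it from the distribution of nearest-neighbor distances within the training set itself.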

Easy to use

All you need to use Lightly is your data in cloud storage and a machine with a GPU. With our Python SDK, you can integrate Lightly into your existing ML stack within a few hours.

Automate your data pipeline

Lightly automates your data pipeline, processing tens of millions of samples daily without manual intervention. Simplify your data curation and selection process while ensuring high-quality training data for your models.

Explore Our Unmatched Features

Discover the unique features that will lift your machine learning pipeline to the next level.

Supported Data Types

Images

Sequences

Videos

Selection (Active Learning)

Embeddings

Metadata

Predictions

Features

Integrations

Dashboard

Automation

and more ...

Data Privacy

Data such as images is streamed directly from your connected cloud storage to the Lightly Worker or the Lightly UI. Because both components run on the client side, your data never leaves your environment.

Additional Security

For enterprise customers, we additionally offer SSO, 2FA, and SLAs. Contact us to learn more about how Lightly complies with SOC 2, HIPAA, and GDPR.

Improve your data

Today is the day to get the most out of your data. Share our mission with the world: unleash your data's true potential.