Product Overview

Easy data preparation based on your needs

We have the right solution for every amount of data. Use our on-premise
or webapp solution together with our
PIP package to analyze and filter your first dataset within minutes.

You can try out our limited free version with no payment required!

Lightly also offers an easy-to-use interface. The following lines show how the package can be used to train a model with self-supervision and create embeddings with only three lines of code:

from lightly import train_embedding_model, embed_images

# first the model is trained for 10 epochs
checkpoint = train_embedding_model(input_dir=
           trainer={'max_epochs': 10})

# embedding 'cats' using the trained model
embeddings, labels, filenames = embed_images(input_dir='./my

# Inspecting the shape of the embeddings
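
The inspection step named in the last comment can be illustrated with a small, self-contained sketch. It does not call lightly: the values below are hand-written stand-ins for the three results returned above, and `embedding_shape` is a hypothetical helper introduced only for this example. It checks that embeddings, labels and filenames line up and reports the number of samples and the embedding dimension.

```python
from typing import Sequence, Tuple


def embedding_shape(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    filenames: Sequence[str],
) -> Tuple[int, int]:
    """Return (number of samples, embedding dimension) after basic sanity checks."""
    # One label and one filename are expected per embedding.
    if not (len(embeddings) == len(labels) == len(filenames)):
        raise ValueError("embeddings, labels and filenames must have the same length")
    if not embeddings:
        return (0, 0)
    dim = len(embeddings[0])
    # Every embedding must have the same dimension.
    if any(len(row) != dim for row in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    return (len(embeddings), dim)


# Stand-in values for illustration, NOT real output of embed_images.
toy_embeddings = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
toy_labels = [0, 0, 0]
toy_filenames = ["cat_0.jpg", "cat_1.jpg", "cat_2.jpg"]

print(embedding_shape(toy_embeddings, toy_labels, toy_filenames))  # prints (3, 4)
```

If the embeddings are held in an array type that exposes a `shape` attribute, reading that attribute gives the same information directly.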

The Lightly framework provides a command-line interface (CLI) to train self-supervised models and create embeddings without having to write a single line of code. The same CLI can upload and download datasets:

# upload only the dataset
lightly-upload input_dir=cat token=your_token

# the dataset can be uploaded together with the embedding
lightly-upload input_dir=cat embedding=your_embedding.csv \
               token=your_token dataset_id=your_dataset_id

# download the dataset
# download a list of files
lightly-download tag_name=my_tag_name \
                 dataset_id=your_dataset_id token=your_token

# copy files in a tag to a new folder
lightly-download tag_name=my_tag_name \
                 dataset_id=your_dataset_id token=your_token \
                 input_dir=cat output_dir=cat_curate
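
To show what "copy files in a tag to a new folder" amounts to, here is a minimal Python sketch. It assumes you already have the list of filenames that belong to a tag (for example, from the downloaded file list). The function `copy_tagged_files` is hypothetical and is not part of the lightly package; it only illustrates the idea with the standard library.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def copy_tagged_files(
    filenames: Iterable[str], input_dir: str, output_dir: str
) -> List[str]:
    """Copy the listed files from input_dir to output_dir, keeping relative paths.

    Returns the names that could not be found in input_dir.
    """
    src_root = Path(input_dir)
    dst_root = Path(output_dir)
    missing: List[str] = []
    for name in filenames:
        src = src_root / name
        if not src.is_file():
            # Record the gap instead of failing halfway through the copy.
            missing.append(name)
            continue
        dst = dst_root / name
        # Create sub-folders in the output directory as needed.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    return missing
```

Calling `copy_tagged_files(names, "cat", "cat_curate")` would mirror the folder names used in the command above. Returning the missing names rather than raising lets you see every mismatch between the tag and the local folder in one pass.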

The following example uses the Docker container to analyze and filter the well-known ImageNet dataset. The sample report can be replicated with this command:

docker run --gpus all --rm -it \
                  -v /datasets/imagenet/train/:/home/input_dir:ro \
                  -v /datasets/docker_imagenet_500k:/home/output_dir \
                  --ipc="host" \
                  lightly/sampling:latest \
                  token=MYAWESOMETOKEN \
                  lightly.collate.input_size=64 \
                  lightly.loader.batch_size=256 \
                  lightly.loader.num_workers=8 \
                  lightly.trainer.max_epochs=0 \
                  stopping_condition.n_samples=500000 \
                  remove_exact_duplicates=True
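
The `remove_exact_duplicates=True` option suggests that byte-identical files are dropped before sampling. How the container does this internally is not described here. The sketch below shows one common way to find exact duplicates, by hashing file contents with SHA-256. The function names are hypothetical and the code is independent of the Docker image.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: str) -> Dict[str, List[str]]:
    """Map the first file seen for each content hash to its byte-identical copies."""
    root_path = Path(root)
    first_seen: Dict[str, str] = {}
    duplicates: Dict[str, List[str]] = {}
    # Sort so that the file kept for each group is deterministic.
    for path in sorted(root_path.rglob("*")):
        if not path.is_file():
            continue
        digest = file_digest(path)
        rel = path.relative_to(root_path).as_posix()
        if digest in first_seen:
            duplicates.setdefault(first_seen[digest], []).append(rel)
        else:
            first_seen[digest] = rel
    return duplicates
```

Hashing only catches files that are identical byte for byte. Re-encoded or resized copies of the same image have different hashes and would need a similarity measure, such as distance between embeddings.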

Our Technology

We're passionate engineers who want to make deep learning more efficient. We use self-supervised representation learning to understand raw data, so our solution can be applied before any data annotation step. The learned representations can be used to analyze and visualize your datasets and to select a core set of samples for further steps in the data preparation pipeline. Our active-learning library is powered by the same algorithms and supports iterative active-learning loops.
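
As an illustration of selecting a core set from learned representations, the sketch below uses greedy farthest-point selection, a well-known generic technique. It is swapped in for illustration only and should not be read as the algorithm our product runs. The function name `select_coreset` and the toy data are assumptions made for this example.

```python
import math
from typing import List, Sequence


def select_coreset(embeddings: Sequence[Sequence[float]], n_samples: int) -> List[int]:
    """Greedy farthest-point selection: return indices of up to n_samples diverse points."""
    if n_samples <= 0 or not embeddings:
        return []
    n_samples = min(n_samples, len(embeddings))
    selected = [0]  # start from the first sample (an arbitrary choice)
    chosen = {0}
    # Distance from each point to its nearest already-selected point.
    min_dist = [math.dist(e, embeddings[0]) for e in embeddings]
    while len(selected) < n_samples:
        # Pick the unselected point farthest from everything selected so far.
        candidates = [i for i in range(len(embeddings)) if i not in chosen]
        next_idx = max(candidates, key=lambda i: min_dist[i])
        selected.append(next_idx)
        chosen.add(next_idx)
        # Update nearest-selected distances with the new point.
        for i, e in enumerate(embeddings):
            d = math.dist(e, embeddings[next_idx])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy 2-D embeddings: two near-duplicates at the origin and two distant points.
toy = [[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [5.0, 0.0]]
print(select_coreset(toy, 3))  # prints [0, 2, 3]
```

In the toy data, the near-duplicate at index 1 is skipped because it adds little that index 0 does not already cover. This is the behavior you want when reducing redundancy before annotation.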

Our Platform

Preparing and organizing data for machine learning has never been easier. With our platform, anyone can become a data preparation engineer. Visual feedback helps you understand which samples are in your datasets and which have been removed. Keep track of different dataset versions using tags. Collaborate with your team on data cleaning and share the final datasets with the ML engineers who train and evaluate models.

Read our whitepaper to learn more about our solution.
Improve your data
Today is the day to get the most out of your data. Share our mission with the world — unleash your data's true potential.