Note: We run only the training data through our data selection solution; the test set stays the same. We do not recommend doing this in practice, since the training and test sets should have a similar distribution to properly evaluate an ML model. Additionally, these datasets went through a manual cleaning procedure to balance them. On customer data we see a much stronger impact: typically, training on the 50% of the training data selected by Lightly gives the same test accuracy as training on the full training dataset.
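The protocol described in the note can be sketched as follows. This is a minimal illustration, not Lightly's selection method: the function `select_training_subset`, the sample ids, and the use of random sampling as a stand-in for the selection step are all assumptions made for this example. The point is only that the training ids are filtered while the test ids are left untouched.

```python
import random


def select_training_subset(train_ids, fraction, seed=0):
    """Return a sorted subset holding `fraction` of the training ids.

    Random sampling is only a placeholder for the real selection step.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    n_keep = max(1, round(len(train_ids) * fraction))
    return sorted(rng.sample(list(train_ids), n_keep))


# Hypothetical sample ids; a real benchmark would use image filenames.
train_ids = list(range(1000))
test_ids = list(range(1000, 1200))  # the test set is never filtered

# Keep 50% of the training data; evaluate on the unchanged test set.
subset_ids = select_training_subset(train_ids, fraction=0.5)
print(len(subset_ids), len(test_ids))
```

A model would then be trained once on `subset_ids` and once on `train_ids`, with both runs evaluated on the same `test_ids`.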


KITTI is a well-known object detection dataset for autonomous driving.


CamVid, from 2007, is one of the first image segmentation datasets for autonomous driving and contains a little over 700 images.


CIFAR-10 is a well-known image classification dataset consisting of 10 classes.


Cityscapes was released in 2016 and is commonly used for benchmarking segmentation models in autonomous driving. It consists of 5,000 images.
