Learn more about how the Lightly subsampling method compares with random subsampling on well-known academic datasets.
Note: We apply our data selection solution only to the training data; the test set stays the same. We do not recommend doing this in practice, since the train and test sets should have a similar distribution to properly evaluate an ML model. Additionally, these datasets went through a manual cleaning procedure to balance them. On customer data, we see much stronger effects: typically, a model trained on 50% of the training data selected by Lightly reaches the same test accuracy as one trained on the full training dataset.