Customer Success Stories

How Harvard Medical School Researchers Use Lightly to Train a 3D CT Foundation Model

Lightly helped Harvard Medical School advance their 3D CT segmentation research by delivering a clean, extensible SSL workflow that improved representation quality and unified experiment setups across the lab.

Suraj Pai
Research Associate
Overview

Industry
Healthcare
Location
Boston, USA
Employees
<30

Get Started with Lightly

Talk to Lightly’s computer vision team about your use case.
Book a Demo
Products
LightlyTrain
Results
3D SSL Training Pipeline with DINOv2
Use Case
SSL for 3D medical imaging

About

The AI in Medicine Program at Mass General Brigham works on self-supervised and foundation models for medical imaging, with a focus on radiotherapy planning and 3D segmentation tasks.

Their work spans MRI, CT, and other volumetric modalities - domains where labeled data is scarce, 3D pipelines are complex, and robust pretrained models are still underdeveloped. 

Problem

Medical imaging pipelines at the lab are highly heterogeneous: models must support full 3D volumes, non-standard intensity distributions, variable voxel spacing, and task-specific augmentation strategies. 

Unlike 2D vision, there is no widely adopted pretrained backbone for 3D CT segmentation that works consistently across datasets.

This creates several technical constraints:

  • Existing 3D pretrained models tend to be dataset-specific rather than broadly generalizable.
  • Most architectures require full model fine-tuning, which is slow and expensive.
  • Public SSL repositories weren’t designed for 3D or medical pipelines, making adaptation difficult.
  • Augmentations from natural images (e.g., color jitter) are not meaningful for CT.

The team wanted to move beyond task-specific tuning and instead train a DINO-based CT foundation model that could serve multiple oncology use cases, ideally requiring only light downstream adaptation.
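"Light downstream adaptation" typically means freezing the pretrained backbone and training only a small task head on top of its features. The following is a minimal, hypothetical PyTorch sketch of that pattern; the encoder, feature dimension, and class count are stand-ins, not the lab's actual model:

```python
import torch
import torch.nn as nn

class LightSegAdapter(nn.Module):
    """Hypothetical sketch: freeze a pretrained 3D encoder and train only
    a lightweight 1x1x1-conv segmentation head on top of its feature map."""

    def __init__(self, encoder: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False          # backbone stays frozen
        self.head = nn.Conv3d(feat_dim, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                # no gradients through the backbone
            feats = self.encoder(x)          # (B, feat_dim, D', H', W')
        logits = self.head(feats)            # (B, num_classes, D', H', W')
        # upsample logits back to the input resolution
        return nn.functional.interpolate(
            logits, size=x.shape[2:], mode="trilinear", align_corners=False
        )

# toy usage with a stand-in encoder (not a real CT foundation model)
encoder = nn.Sequential(nn.Conv3d(1, 8, 3, stride=2, padding=1), nn.GELU())
model = LightSegAdapter(encoder, feat_dim=8, num_classes=3)
out = model(torch.randn(2, 1, 16, 16, 16))
print(out.shape)  # torch.Size([2, 3, 16, 16, 16])
```

Because only the head carries gradients, adapting the shared backbone to a new oncology task stays cheap compared with full fine-tuning.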

With five researchers leading SSL efforts inside a 25-person lab, they needed an implementation that was clean, modular, and easy for several PhD students to use consistently.

To keep experimentation consistent, they standardized on:

  • MONAI for medical-imaging data handling
  • PyTorch Lightning for workflow orchestration
  • LightlySSL for the DINOv2 implementation
  • An internal config system (“sparkwheel”) for experiment management

Testimonials

"It took far less work than expected to plug DINO into our SSL system - the LightlySSL code was clean and easy to adapt"

Suraj Pai

Research Associate

A Scalable and Efficient SSL Workflow Using Lightly

The team first experimented with Meta’s DINO repositories, but the code complexity and frequent upstream changes made collaboration difficult. LightlySSL was the natural choice: the group had already used Lightly in earlier self-supervised projects for about four years, so the team could adopt it with little friction.

Choosing LightlySSL meant a clear and well-structured DINOv2 workflow that worked seamlessly with MONAI and PyTorch Lightning. The implementation was straightforward to extend to 3D and provided a reproducible setup that fit cleanly into their existing pipeline. It also reduced coordination overhead across PhD students, who could now work from a shared, consistent configuration.
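At the core of any DINO-style workflow is a student-teacher pair in which the teacher is not trained by gradient descent but updated as an exponential moving average (EMA) of the student's weights. The sketch below shows that update in plain PyTorch; it is illustrative only, and Lightly's actual implementation may differ in detail:

```python
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def update_momentum(student: nn.Module, teacher: nn.Module, m: float) -> None:
    """EMA update at the heart of DINO: teacher <- m * teacher + (1 - m) * student."""
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(m).add_(p_s, alpha=1.0 - m)

# toy demonstration with identical architectures
student = nn.Linear(4, 4)
teacher = copy.deepcopy(student)

# pretend one optimizer step moved every student weight by +1
with torch.no_grad():
    for p in student.parameters():
        p.add_(1.0)

update_momentum(student, teacher, m=0.99)
# the teacher moved 1% of the way toward the student
diff = (student.weight - teacher.weight).abs().mean()
print(f"{diff.item():.4f}")  # 0.9900
```

Keeping the momentum `m` close to 1 makes the teacher a slowly evolving average of past students, which stabilizes the self-distillation targets.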

Adapting DINOv2 for CT

To make DINO suitable for volumetric CT, the team introduced CT-appropriate augmentations and extended the pipeline to support 3D inputs. Using Lightly, they were able to:

  • Replace color jitter with histogram-shift augmentations appropriate for CT
  • Add 3D affine transformations to capture anatomical invariance
  • Integrate SimCLR-style medical augmentations validated in earlier projects
  • Support volumetric patch extraction aligned with CT geometry
  • Incorporate the workflow into their internal experiment-management system

These modifications enabled the model to capture finer radiological structure than their previous masked autoencoder (MAE) based setup.
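As a rough illustration of the first and fourth bullet points above, a histogram-shift intensity warp and random 3D patch extraction might be sketched in NumPy as follows. This is a simplified stand-in for the MONAI transforms the team actually uses; all names and parameters here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_histogram_shift(vol: np.ndarray, num_control_points: int = 5) -> np.ndarray:
    """Intensity-only augmentation for CT: randomly warp the intensity curve
    via monotone control points (a replacement for color jitter, which is
    meaningless for single-channel Hounsfield values)."""
    lo, hi = vol.min(), vol.max()
    ref = np.linspace(lo, hi, num_control_points)
    # jitter interior control points while keeping the mapping monotone
    jitter = rng.uniform(-0.05, 0.05, num_control_points) * (hi - lo)
    jitter[0] = jitter[-1] = 0.0
    target = np.sort(ref + jitter)
    return np.interp(vol, ref, target)

def rand_patch_3d(vol: np.ndarray, size: tuple) -> np.ndarray:
    """Spatial augmentation: crop a random 3D patch aligned with CT geometry."""
    starts = [rng.integers(0, s - k + 1) for s, k in zip(vol.shape, size)]
    sl = tuple(slice(st, st + k) for st, k in zip(starts, size))
    return vol[sl]

# toy CT-like volume in Hounsfield units
ct = rng.normal(0.0, 300.0, size=(32, 32, 32)).clip(-1000, 1000)
view = rand_patch_3d(rand_histogram_shift(ct), size=(16, 16, 16))
print(view.shape)  # (16, 16, 16)
```

Composing several such intensity and spatial transforms yields the multiple "views" of a volume that DINO-style self-distillation requires.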

Results

With LightlySSL, the team built a DINOv2-based CT foundation model that demonstrated promising early results for segmentation and served as a stable baseline across several ongoing studies.

The new workflow enabled them to:

  • Integrate DINOv2 into a 3D SSL pipeline previously limited to MAE
  • Improve feature-level representations through volumetric augmentations
  • Establish a reproducible training setup shared across PhD researchers
  • Produce DINOv2 baselines requested by peer reviewers with minimal overhead
  • Support downstream projects such as vision-language models and SAM-style segmentation approaches

The foundation model now underpins multiple research directions in the lab and accelerates internal experimentation by reducing engineering complexity. Check out the project on GitHub.

