Medical imaging pipelines at the lab are highly heterogeneous: models must support full 3D volumes, non-standard intensity distributions, variable voxel spacing, and task-specific augmentation strategies.
Unlike 2D vision, there is no widely adopted pretrained backbone for 3D CT segmentation that works consistently across datasets.
This creates several technical constraints:
- Existing 3D pretrained models tend to be dataset-specific rather than broadly generalizable.
- Most architectures require full model fine-tuning, which is slow and expensive.
- Public SSL repositories weren’t designed for 3D or medical pipelines, making adaptation difficult.
- Augmentations from natural images (e.g., color jitter) are not meaningful for CT.
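The last point can be made concrete: CT intensities are calibrated Hounsfield units, so meaningful augmentations are geometric (crops, flips) and physics-aware (HU windowing, noise in HU), not color jitter. Below is a minimal NumPy-only sketch of that idea; the function names, window bounds, and noise level are illustrative, not the lab's actual pipeline (which used MONAI).

```python
import numpy as np

def window_hu(volume, low=-1000.0, high=400.0):
    """Clip to a Hounsfield-unit window and rescale to [0, 1].
    CT intensities are physically calibrated, so windowing plays the
    role that color/contrast jitter plays for natural images."""
    vol = np.clip(volume, low, high)
    return (vol - low) / (high - low)

def random_crop_3d(volume, size, rng):
    """Random spatial crop from a (D, H, W) volume."""
    d, h, w = [rng.integers(0, s - c + 1) for s, c in zip(volume.shape, size)]
    return volume[d:d + size[0], h:h + size[1], w:w + size[2]]

def augment_ct(volume, crop=(32, 64, 64), noise_hu=20.0, seed=None):
    """Geometric transforms and HU-scale noise, then windowing."""
    rng = np.random.default_rng(seed)
    vol = random_crop_3d(volume, crop, rng)
    for axis in range(3):          # random flip along each spatial axis
        if rng.random() < 0.5:
            vol = np.flip(vol, axis=axis)
    vol = vol + rng.normal(0.0, noise_hu, vol.shape)  # noise in HU units
    return window_hu(vol)

vol = np.random.default_rng(0).uniform(-1000, 1000, (64, 128, 128))
out = augment_ct(vol, seed=1)
```

MONAI ships equivalent, battle-tested transforms; the sketch only shows why the augmentation vocabulary differs from 2D vision.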
The team wanted to move beyond task-specific tuning and instead train a DINO-based CT foundation model that could serve multiple oncology use cases, ideally requiring only light downstream adaptation.
With five researchers leading SSL efforts inside a 25-person lab, they needed an implementation that was clean, modular, and easy for several PhD students to use consistently.
To keep experimentation consistent, they standardized on MONAI for medical-imaging data handling, PyTorch Lightning for workflow orchestration, Lightly SSL for the DINOv2 implementation, and an internal config system (“sparkwheel”) for experiment management.
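At the core of the DINO recipe this stack implements is a momentum (EMA) teacher: the teacher network is never trained by the optimizer but is instead a slow-moving average of the student. A minimal torch-only sketch of that update, with illustrative names and momentum value (Lightly provides its own utilities for this):

```python
import copy
import torch

def ema_update(student: torch.nn.Module, teacher: torch.nn.Module, m: float) -> None:
    """teacher <- m * teacher + (1 - m) * student, parameter by parameter."""
    with torch.no_grad():
        for ps, pt in zip(student.parameters(), teacher.parameters()):
            pt.mul_(m).add_(ps, alpha=1.0 - m)

# toy backbone standing in for a 3D encoder
student = torch.nn.Linear(4, 2)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher is updated only via EMA

# one EMA step after a (hypothetical) student optimizer step
ema_update(student, teacher, m=0.996)
```

In a Lightning module this update would typically run once per training step, after the optimizer updates the student; the high momentum keeps the teacher's targets stable across batches.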