Domain adaptation helps models trained on one dataset perform well on another with a different distribution. By learning domain-invariant features through reweighting, feature alignment, or fine-tuning, models generalize better across domains.
Here is what you need to know about domain adaptation.
What is domain adaptation?
Domain adaptation is a technique (a specific form of transfer learning) that lets a model trained on a source domain perform well on a target domain with a different data distribution.Â
The goal is to learn domain-invariant features, allowing inputs from both domains to be handled similarly. This enables accurate predictions even when the target domain has little or no labeled
How is domain adaptation related to transfer learning?Â
Domain adaptation is a subset of transfer learning. Transfer learning spans moving knowledge across tasks or domains. In contrast, domain adaptation focuses on the same task with different data distributions, usually with limited or no target labels.Â
All domain adaptation is a form of transfer learning. However, not all transfer learning is domain adaptation.
‍
What techniques are used for domain adaptation?Â
There are several domain adaptation techniques used to learn domain-invariant features and reduce the divergence between source and target distributions.Â
Major approaches include:
Generative approaches: Translate or augment data across domains (e.g., GAN-based style transfer) to reduce the gap.
‍
We face challenges in machine learning (ML) when data changes between training and real-world use. Models expect the data they see in the real world to be similar to the data they were trained on.Â
When there's a mismatch, a problem known as domain shift occurs, and the model's performance degrades.Â
Domain adaptation provides a solution. It involves adjusting a model in one setting so it can adapt and perform well in another.
In this guide, we will walk you through:
Domain adaptation demands high-quality data. Lightly provides tools to help teams bridge domain gaps and improve model performance:
Domain adaptation is a machine learning technique to improve a model’s performance on a target domain by using knowledge from a related source domain.
The source domain is the dataset on which the model is originally trained with labeled data. The target domain is the new dataset where we want to apply the model.
Domain adaptation assumes that the tasks or categories are the same, such as both domains involving the classification of the same object categories. However, the data distributions differ, such as in pixels or features.Â
For example, a source domain of well-lit outdoor photos and a target domain of dim indoor photos share the same labels like “car” or “person” but have different lighting distributions.
Let's define core concepts in more detail for a better understanding of the domain adaptation:
Modern deep learning models excel when training and test data come from the same distribution, but in real-world deployment, they often do not. A model trained on one domain can degrade badly if the deployment domain’s data shifts even slightly.Â
For instance, a photo classifier trained on bright daylight images may mislabel the same scenes taken at dusk or under different cameras.Â
This domain gap means that without adaptation, each new domain would force us to collect and annotate new data or retrain the model, which is costly.
Domain adaptation is necessary to address the practical challenges posed by this gap. Key reasons include:Â Â Â
Pro tip: Looking for high-quality labeled data or domain-specific curation? Check out Lightly’s Training Data Services or our list of 5 Best Data Annotation Companies in 2025 [Services & Pricing Comparison].
Domain adaptation is closely related to transfer learning and domain generalization, but each has a distinct focus in machine learning contexts.Â
Transfer learning reuses knowledge from one domain or task to improve performance in another. This can involve various tasks, such as using ImageNet-trained weights for medical image classification.Â
Domain adaptation, on the other hand, assumes the task and label space are the same and focuses on handling differences in data distribution.Â
Domain generalization trains a model that performs well on unknown, unseen domains without requiring any further adaptation. That typically involves training on multiple source domains so that the model learns invariant features that generalize across domains.Â
In contrast, domain adaptation assumes you have access to (possibly unlabeled) data from the target domain during training, and that you adapt the model to that domain.
The table below summarizes the key distinctions between these three fields.
‍
Domain adaptation problems vary depending on the available data and the differences between the source and target domains. Correctly identifying the type of problem you are facing is the first step in choosing the right solution.
Domain adaptation problems are often classified based on the availability of labeled data in the target domain.
Supervised adaptation is ideal if the target domain has fully labeled data. But it is rarely seen in practice due to the high cost of labeling.Â
If plenty of target labels are available, standard supervised learning techniques can often solve the problem. These include approaches like fine-tuning a pre-trained model from the source domain using the target data.
Supervised domain adaptation can be achieved through various methods. One such method is detailed in a paper that introduces a classification and contrastive semantic alignment (CCSA) loss.Â
In this method, a deep learning model maps input images into a shared embedding space and then performs classification. Within this embedding space, the CCSA loss is used to:
This joint optimization helps align domain features while preserving clear class separation.
SSDA is a go-to option for problems where the target domain has only a small amount of labeled data but a larger pool of unlabeled data.
It combines supervised and unsupervised learning. This allows you to train a model directly on the few labeled target samples. At the same time, it uses a large set of unlabeled target data alongside the labeled source data.Â
Practically, the model aligns feature distributions between the source and target domains by minimizing unsupervised alignment losses. These may include adversarial losses, entropy minimization, or discrepancy-based objectives.
These losses encourage the model to produce similar feature representations for source and target images, even when the target samples lack labels.
Here, the feature extractor (G) learns shared representations from labeled source data, a few labeled target examples, and a large pool of unlabeled target samples. It projects all inputs into a common feature space, where the classifier (C) predicts class probabilities.
Training alternates between supervised and unsupervised steps, using classification loss on labeled examples and alignment or entropy-based loss on unlabeled ones.Â
This process ensures that the feature extractor produces domain-invariant embeddings and allows the model to perform reliably in the target domain even with minimal labeled data.
In WSDA, the target domain contains labels, but they are weak or imprecise. For example, in a segmentation task, you might have image-level tags such as "this image contains a car", instead of precise pixel-level masks.Â
Weak labels are easier to collect than full annotations and can still provide valuable supervisory signals or guidance to constrain the adaptation process.
Pro Tip: Check our list of 12 Best Data Annotation Tools for Computer Vision (Free & Paid).
Consider the task of 3D hand pose estimation under Hand-Object Interaction (HOI) settings. In such cases, obtaining full 3D annotations is difficult due to occlusions and the high cost of labeling.
This paper introduces a novel Weakly Supervised Domain Adaptation method to adapt a model trained on hand-only data (with 3D labels) to the HOI domain (without 3D labels).Â
The approach uses 2D object segmentation masks from the target domain as weak supervision. These masks guide the model in distinguishing between hands and objects during the adaptation process.
The framework combines a GAN-based image-to-image translation model with a 3D mesh reconstruction model to transform HOI images into a “hand-only” appearance. The adaptation aligns the domains by teaching the network to predict accurate 3D hand poses even in occluded, object-interaction scenarios.
Although the HOI data lacks full 3D labels, the weak 2D masks provide sufficient structural cues to effectively bridge the gap between the hand-only and HOI domains.
Unsupervised domain adaptation is another good option for problems where the target domain has no labels.Â
UDA methods rely entirely on aligning the source and target distributions using only unlabeled target samples. Techniques include aligning features, adversarial training, and reconstruction, all without target labels.
The source and target domains have the same feature space and input structure. For example, adapting between two RGB image datasets, or between two text corpora of the same language.
In this case, we can directly compare features (pixels, vectors) between domains. Although the distributions of these features differ, the features themselves are directly comparable.
Heterogeneous domain adaptation deals with cases where the source and target domains have different feature spaces or even varying modalities.
For example, the source could be text data and the target images (or vice versa), or one domain could be RGB images and the other multispectral images.Â
Since inputs differ, adaptation requires learning a common latent space where data from both domains can be meaningfully compared and aligned.
Here, we have two distinct domains, the source domain (French sentences) and the target domain (Spanish sentences). Each domain has its own features and distributions. Directly aligning them isn’t possible because they exist in different feature spaces.
To overcome this, the model projects both domains into a common representation space using a latent embedding or neural mapping to capture shared semantics.Â
Through domain adaptation, the system minimizes the gap between the representations. This makes the features from both domains more homogeneous and comparable.
The adapted representations from both domains (shown as triangles in the bottom section of the image) align closely in the same feature space by the end of the process.Â
This alignment enables the model to transfer knowledge from labeled data in the source domain to improve predictions in the target domain, even though their original data types differ.
The relationship between the sets of classes in the source and target domains defines another critical set of categories. Applying a method that assumes the wrong label space relationship can result in significant performance drops.
The source and target domains have identical label sets (they cover the same classes). The goal is to adapt a classifier to shared categories when only the domain (data distribution) changes, with all classes present in both source and target.
In partial domain adaptation, the target label space is a subset of the source label space. For example, a source dataset contains 10 classes, but the target dataset includes only 5 of them.Â
When adaptation is performed using this method, source-only classes are detected and disregarded. It prevents irrelevant source categories from confusing the model during adaptation to a smaller target label set.
Open-set domain adaptation (OSDA) is used when the target domain contains some classes that were not present in the source domain.Â
In simpler terms, the model encounters new, unknown categories during testing that it never saw during training.
The model must correctly classify the known classes shared with the source while also identifying and rejecting instances of the new, unknown classes.Â
It is crucial for building safe and reliable systems that can recognize when they encounter something they were not trained on.
In the diagram, on the left, you can see the labeled source domain, where all training samples belong to known classes. The unlabeled target domain, however, includes both known and unknown classes (highlighted in red).
A feature extractor, a pretrained CNN, transforms both source and target images into numerical feature representations. The system then passes these features through a decoder and a softmax classifier to predict class probabilities.
A special gradient reversal mechanism helps the model separate known and unknown features. It encourages the model to minimize the classification loss for known classes while maximizing uncertainty for unknown ones.
As a result, the model learns to classify familiar samples correctly and flag unfamiliar samples as belonging to an additional “unknown” class.
Universal domain adaptation is a general case that covers both partial and open-set situations. Neither the source nor target label sets are assumed to be subsets of the other.
Both the source and target domains can have their own private classes, as well as some common classes.Â
Universal domain adaptation methods aim to detect the classes shared by the source and target domains and adapt to them. They also reject samples from each domain’s private classes as “unknown.”
The number of source or target domains involved can also define adaptation problems.
The classic setup involves a single source and a single target domain. However, in many cases, we have multiple labeled source domains available for training.
A multi-source domain adaptation algorithm can use all diverse source data jointly to build a more powerful model for the target domain.Â
Mixing all source data can be counterproductive if the sources differ significantly. However, specialized techniques are necessary to effectively weigh and combine knowledge from each source.
Multi-target domain adaptation (MTDA) allows a single model to function across multiple, distinct target domains simultaneously. It differs from single-target adaptation, which only aligns a source domain with a single target.Â
MTDA learned a generalized representation that bridges multiple domain shifts by disentangling domain-specific features or dynamically adjusting model parameters. This paper introduces a framework, Collaborative Consistency Learning (CCL), to address this challenge.Â
The main idea behind CCL is collaboration among multiple expert models. Each expert model is trained on one specific target domain, allowing it to specialize in that domain’s unique features and patterns.Â
However, instead of treating these experts independently, CCL encourages them to collaborate by sharing information and aligning their predictions through consistency constraints.
During training:
The magnitude of the domain gap can influence the adaptation strategy.
Sometimes, a large domain gap can be bridged by introducing intermediate domains.Â
The model is adapted progressively along a path of domains. It moves from the source to an intermediate domain, and finally to the target domain. At each step, it solves a series of easier adaptation problems with smaller domain gaps.
For example, to adapt a model from highly stylized art images to medical scans, one might first go through an intermediate domain in two steps instead of jumping directly.Â
Common Domain Adaptation Techniques and Approaches
Many techniques have been developed to address the domain shift problem.Â
These methods typically seek features that are consistent across domains, focusing on task-relevant information while ignoring domain differences.
Below are some of the most common and effective approaches.
These methods adjust or reweight source samples to make their distribution more closely match the target distribution.Â
One common strategy is to assign importance weights to each source instance. This is often done by comparing the feature distributions of the source and target datasets. Once the weights are calculated, the model is trained on the reweighted source data.Â
The goal is to give more weight to source examples that resemble the target and reduce the influence of those that don't.
Instance reweighting is simple and interpretable, but it requires some overlap between source and target data distributions to work well. If the overlap is small or the feature mapping is very complex, reweighting alone might fail.
A large class of domain adaptation techniques explicitly aligns the statistical properties of the source and target data distributions within a shared feature space.Â
The model is encouraged to produce similar feature representations for inputs from both domains by minimizing a chosen distance metric between the distributions.Â
Common choices for the statistical metric include:
Inspired by Generative Adversarial Networks (GANs), adversarial DA methods insert a domain classifier into the network and train it adversarially with the feature extractor.Â
The feature extractor is trained to perform well on the main task, like classification using source labels, but also to fool the domain classifier.Â
It's forced to learn representations that are common to both domains by trying to generate features that the discriminator cannot reliably classify by domain.Â
While highly effective, this adversarial training process can be unstable and requires careful balancing of the learning rates and loss weights to ensure stability.Â
Another idea to align domains is to use an autoencoder or reconstruction objective as an auxiliary task on one or both domains.Â
The model's feature extractor acts as the encoder, and a separate decoder network is added. The model is then trained to perform two tasks simultaneously:
This reconstruction loss encourages the feature extractor to preserve low-level information from the input. It helps create transferable features that are less prone to overfitting on source-specific artifacts.  Â
Instead of aligning features, one can adapt at the pixel level by transforming source images to look like target images. This approach uses image-to-image translation generative adversarial networks, such as CycleGAN and StarGAN.Â
For instance, CycleGAN-based Domain Adaptation (also known as CyCADA) establishes two distinct mappings. One from the source to the target style and another from the target back to the source style.Â
This process incorporates a cycle-consistency constraint to preserve the original content.Â
After this training, every source image is “translated” into the target domain style. Next, the original source labels are used to train the task model on these adapted images.Â
This approach directly addresses low-level appearance gaps (lighting, color, texture) by altering the input images. It can help computer vision tasks where the styles differ, like synthetic-to-real driving scenes.
However, GANs are known to be difficult to train. The translation process can distort object shapes and adds the additional complexity of generating images as part of the pipeline.
Ensemble methods use multiple models or ensembles to improve adaptation. If you have multiple source domains, train a separate model for each source domain, then average their predictions on the target domain. This can sometimes capture complementary knowledge and reduce variance.Â
Other approaches use self-ensembling or mean teacher models, where one model’s weights are an exponential moving average of another’s. Consistency losses encourage the teacher and student to agree on target samples.
Ensemble methods can improve robustness, but at the cost of more computation and more complex inference. Combining outputs from different models can also be a non-trivial task.
When a few labeled examples are available in the target domain, semi-supervised domain adaptation methods can incorporate those labels directly.Â
They combine the standard supervised loss on these few labeled target samples with an unsupervised domain-adaptation loss (e.g., MMD or an adversarial loss) on the remaining unlabeled target data.Â
It enables the model to use the valuable, albeit limited, information from the target labels to guide the adaptation process more effectively.
Pro tip: Make sure to also check out our Engineer's Guide to Self-Supervised Learning.Â
Self-training and pseudo-labeling are iterative learning strategies used in unsupervised domain adaptation to improve model performance on unlabeled target data.
The idea is to let the model teach itself by gradually generating and refining its own labels for new data.
The process works as follows:
The primary risk of this approach is that if the initial model is inaccurate, it can generate incorrect pseudo-labels, thereby reinforcing errors. Â
In the SFDA case, you do not have access to the source data at the time of adaptation. Instead, you only have a pretrained source model and unlabeled target data, and SFDA methods attempt to adapt the model without any source samples.Â
For example, one approach involves keeping the source classifier fixed while updating batch normalization statistics on the target data or employing self-supervised losses on the target.
This is challenging since the model must adapt blindly to where the source data lies.Â
However, it is increasingly important in practice when sharing raw source data is not allowed. Such methods trade off some performance for practicality.
The table below summarizes these approaches with their key ideas, whether they require labeled target data, and their pros and cons:
‍
Domain adaptation is essential for deploying machine learning models across various real-world applications, where data variability is common. Â
Here are some notable applications:
In vision tasks, it is common to have easily generated synthetic data, such as simulation- or graphics-generated data, with perfect labels. However, the model must ultimately work on real images.
For instance, autonomous-driving datasets from simulators can be abundant and well-annotated. However, real-world driving images differ in texture and lighting.
Domain adaptation, especially pixel-level GAN methods, is often used to make synthetic images look real. It also aligns features so that models trained on synthetic data perform well on real camera images. This synthetic-to-real adaptation saves immense annotation effort.
Object detectors trained on one camera setup or environment may perform poorly when deployed in another (e.g., different angle, resolution, lighting). In video surveillance, cameras vary in quality and viewpoint.Â
Domain adaptation can address this by aligning feature representations or translating images to match the new conditions.
For example, if a car detector is trained on clear sunny data, adaptation techniques (like feature alignment or image translation) can improve performance on foggy night footage. Similarly, surveillance systems often adapt across different locations (indoor vs outdoor or different cities).
Pro tip: Working with YOLO models? Read YOLO Object Detection Explained: Models, Tools, Use Cases.
Medical image analysis is well-known for domain shifts. MRI or CT scans from different hospitals, scanners, or patient populations can have varying characteristics (contrast, noise, resolution).Â
A segmentation model trained on data from Hospital A might fail on Hospital B’s scans. Domain adaptation allows the use of a model across sites.
For example, it's possible to adapt a tumor-detection model trained on MRIs from one device to another without needing to collect a completely new labeled dataset.Â
This is extremely valuable because expert labeling in medicine is costly. Domain adaptation can reduce the need for new annotations by transferring knowledge across different medical imaging domains.
In robotics, it is often much safer and cheaper to train policies in simulation before deploying on a real robot. Yet, the real environment differs from the simulator.Â
Domain adaptation helps transfer learned policies from simulation to the real world. This includes adapting vision-based manipulation policies to work on a physical robot’s camera feed.
Techniques include domain randomization during training and feature layer alignment to make the learned policy robust to the sim-to-real gap. This sim-to-real adaptation is crucial for using reinforcement learning in physical systems.
Outside of vision, domain adaptation is used in natural language processing.Â
A typical example is adapting a sentiment classifier trained on movie reviews (long, formal text) to work on social media posts like tweets (short, slangy text). The language distribution shifts considerably, but the task (sentiment polarity) is similar.Â
Domain adaptation techniques can help a sentiment model generalize from reviews to tweets. These include aligning feature representations of words or using adversarial language models. This avoids the need to label thousands of tweets when a labeled review dataset already exists.
The core challenge of domain adaptation is a data problem. A domain gap arises from differences in data distributions, and the most effective solutions are often data-centric.Â
Overcoming the domain shift problem requires a deep understanding of both your source and target datasets. This is where a robust data curation platform like Lightly AI becomes essential.
LightlyTrain enables self-supervised pretraining on your own unlabeled images. You can learn rich domain-specific feature representations before any adaptation by pretraining a model on target-domain images or mixed source and target.
After pretraining, you fine-tune the model on whatever labeled target data you have. The result is a model that already understands your domain’s style and thus requires less adaptation.Â
This can dramatically reduce the amount of labeled data needed, since the feature extractor has already been tuned on the target domain distribution.
LightlyOne helps you select the most valuable data points to label from your pool, using active learning strategies. It can identify which target-domain samples would most improve model performance if labeled.Â
For example, after an initial adaptation pass, there may be regions where the model is uncertain about the target data. LightlyOne can highlight similar unlabeled target samples for annotation.Â
It ensures that your labeled set covers the target distribution efficiently by focusing labeling effort on the most impactful target examples.Â
Domain adaptation makes machine learning practical and effective in real-world applications. It addresses the unavoidable domain shift problem, where a model trained in one context fails in another.Â
Engineers can bridge the gap between source and target domains by understanding the different types of adaptation problems and the array of available techniques.
The most successful strategies are data-centric, focusing on managing data distributions at the core of the challenge.
Get exclusive insights, tips, and updates from the Lightly.ai team.
See benchmarks comparing real-world pretraining strategies inside. No fluff.