A Divide-and-Conquer Strategy for Cross-Domain Few-Shot Learning

Wang, Bingxin; Yu, Dehong

doi:10.3390/electronics14030418

Open AccessArticle

A Divide-and-Conquer Strategy for Cross-Domain Few-Shot Learning

by

Bingxin Wang

^*

and

Dehong Yu

School of Mechanical Engineering, Xi’an Jiaotong University, No. 28 Xianning West Road, Xi’an 710049, China

^*

Author to whom correspondence should be addressed.

Electronics 2025, 14(3), 418; https://doi.org/10.3390/electronics14030418

Submission received: 23 December 2024 / Revised: 16 January 2025 / Accepted: 20 January 2025 / Published: 21 January 2025

Download

Browse Figures

Versions Notes

Abstract

:

Cross-Domain Few-Shot Learning (CD-FSL) aims to empower machines with the capability to rapidly acquire new concepts across domains using an extremely limited number of training samples from the target domain. This ability hinges on the model’s capacity to extract and transfer generalizable knowledge from a source training set. Studies have indicated that the similarity between source and target-data distributions, as well as the difficulty of target tasks, determine the classification performance of the model. However, the current lack of quantitative metrics hampers researchers’ ability to devise appropriate learning strategies, leading to a fragmented understanding of the field. To address this issue, we propose quantitative metrics of domain distance and target difficulty, which allow us to categorize target tasks into three regions on a two-dimensional plane: near-domain tasks, far-domain low-difficulty tasks, and far-domain high-difficulty tasks. For datasets in different regions, we propose a Divide-and-Conquer Strategy (DCS) to tackle few-shot classification across various target datasets. Empirical results across 15 target datasets demonstrate the compatibility and effectiveness of our approach, improving the model performance. We conclude that the proposed metrics are reliable and the Divide-and-Conquer Strategy is effective, offering valuable insights and serving as a reference for future research on CD-FSL.

Keywords:

cross-domain few-shot learning; domain metric; divide-and-conquer strategy; whitened PCA

1. Introduction

The rapid progress of deep learning in computer vision, natural-language processing, and various other fields is primarily attributed to advancements in hardware computing capabilities and the evolution of big-data technology. Presently, cloud-based services [1,2,3], which harness high computing power and big data, represent the primary form of artificial intelligence. Nevertheless, in the era of the Internet of Everything, smart terminal applications, which must adapt to complex and dynamic real-world environments [4,5], often rely on limited power and a scarcity of training samples [6]. Maintaining high performance in Few-Shot Learning scenarios is a crucial aspect of deep-learning technology on its path toward Artificial General Intelligence (AGI).

Classical Few-Shot Learning (FSL) models involve pretraining on a base set and subsequent evaluation on a novel set. Conventionally, these two sets are subsets derived from the same dataset, with non-overlapping categories [7,8,9,10,11]. This results in similar image styles, minimal domain distances, and comparable classification difficulties. In practical scenarios, however, the data distribution of downstream tasks and the difficulty of classification can be highly unpredictable. As a result, models may encounter significant performance degradation when confronted with cross-domain data [11,12]. Studies [11,13] have highlighted that the performance of FSL methods [7,8,14] is often inferior to traditional batch-trained models in cross-domain scenarios. Therefore, investigating Cross-Domain FSL (CD-FSL) not only enhances our understanding of the fundamentals of deep model generalization but also contributes to advancing the practical applications of Few-Shot Learning [15].

Whether a model can generalize to target data and effectively solve target tasks is determined by the distance between the target domain and the source domain [16,17]. A smaller distance indicates a better representation of features for the target task, resulting in good classification performance. Conversely, a larger distance leads to poorer feature representation, making it challenging to directly apply the model to target tasks. This phenomenon arises because models are designed to maximize the discrimination between source categories. Consequently, these models are highly sensitive to the features of classes within the source domain [18], which we term “source class features”. This sensitivity causes the feature vectors to exhibit significant magnitudes in source class-feature directions and minimal magnitudes in others when extracting target domain features. This results in a non-uniform feature space, leading to performance degradation in cross-domain scenarios.

Furthermore, the difficulty of target tasks plays a crucial role in determining the model’s performance [16,17]. If the categories in the target dataset are linearly separable, an optimal decision surface exists. The degree of linear separability in the target data establishes the upper limit for the model’s performance in FSL tasks. When the target data have poor linear separability, incorporating more training data is necessary to globally finetune the model to the target task.

Therefore, the success of FSL is primarily constrained by the domain distance from the source dataset and the difficulty of the target dataset. By measuring and partitioning target datasets based on these two dimensions and employing tailored strategies for different tasks in a divide-and-conquer manner, the model’s ability can be enhanced. In the study of domain distance metrics, Xu et al. [12] introduced a definition based on images and their marginal distributions, laying the foundation for subsequent research. Zhang et al. [19] proposed two measurement methods: one based on the support set and another based on the mean vector of the dataset. However, these methods lacked comprehensive information on data distribution. The metric proposed by Oh et al. [17] utilized class labels that did not conform to the definitions put forward by Xu et al. [12]. Therefore, current research lacks reliable quantitative metrics to perform a Divide-and-Conquer Strategy. In this paper, we introduce novel quantitative metrics to assess domain distance and task difficulty. Domain distance is evaluated by computing the difference in covariance matrices between target and source data. Task difficulty is determined by fitting the target data with a linear classifier and calculating the global linear separability among different categories. Figure 1 illustrates the domain distance and difficulty of 15 target datasets on the Meta-Dataset [20] and BSCD-FSL [16] benchmarks, using the base set of mini-ImageNet [7] as the source dataset. The horizontal axis represents domain distance, while the vertical axis indicates difficulty. The detailed methodology for these metrics is outlined in Section 3.

Based on the distribution of target datasets depicted in Figure 1, we categorize them into three groups and devise strategies, respectively:

Near-Domain Tasks: For tasks in the near domain, the target-data distribution closely resembles the source distribution. The class features learned from the source domain remain crucial in the target tasks. In such cases, the classical FSL algorithm can be directly applied, as it effectively leverages these shared features.
Far-Domain Low-Difficulty Tasks: In far-domain tasks, the class features learned from the source domain are significantly attenuated for the target tasks. It becomes essential to mine information embedded in non-class features. To address this challenge, we propose a whitened-PCA method aimed at reconstructing an isotropic feature space. This approach normalizes the magnitude of all principal components in the high-dimensional space, balancing the contribution of each component. This method enhances performance for all far-domain tasks without any additional cost, as it neither introduces new data nor requires additional parameters.
Far-Domain High-Difficulty Tasks: For tasks in the far domain with high difficulty, the model no longer achieves linear separability with respect to the target dataset. To adapt the model to these complex tasks, we employ global finetuning techniques. This approach allows the model to learn task-specific features and adapt its decision boundaries to better fit the target-data distribution.

The proposed Divide-and-Conquer Strategy (DCS) outlined in this paper is visualized in Figure 2. This approach involves dividing various target tasks based on domain distance and task difficulty and formulating tailored feature optimization strategies to enhance the model’s performance in FSL. Our primary contributions are three-fold:

(1) We introduce a quantitative metric for assessing domain distance and task difficulty within target datasets. This metric facilitates the categorization of target tasks into three distinct groups: near-domain tasks, far-domain low-difficulty tasks, and far-domain high-difficulty tasks.

(2) We present the DCS framework, which is tailored to address the challenges associated with different target task categories. For near-domain tasks, we employ classical FSL algorithms. For far-domain tasks, we introduce a whitened-PCA method to enhance the feature representation of data from distant domains. Furthermore, for far-domain tasks with high difficulty, we leverage a limited amount of labeled data to globally finetune the model, thereby improving its alignment with the target tasks.

(3) Our experimental results, conducted on 15 target datasets, demonstrate the compatibility and efficacy of DCS when combined with classical FSL algorithms. The comprehensive experimental outcomes demonstrate the improved performance, offering a novel approach for future research in the field of few-shot learning.

The remainder of this paper is organized as follows. Section 2 briefly reviews the related works on CD-FSL, feature post-processing, and domain metrics, highlighting our innovations and contributions. Section 3 presents the problem definition and introduces methods for measuring domain distance and difficulty; it also delineates detailed learning strategies tailored for near-domain, far-domain low-difficulty, and far-domain high-difficulty tasks based on these metrics. Section 4 conducts measurements of domain distance and difficulty across 15 target datasets, comprehensively evaluates the effectiveness of the DCS through experiments, and provides an analysis and discussion of the experimental results. Finally, Section 5 draws the main conclusions.

2. Related Works

In this paper, we propose metrics for domain distance and task difficulty, enabling the categorization of target datasets from different domains. Based on the categorization results, we have designed a Divide-and-Conquer Strategy for Few-Shot Learning to address tasks with varying domain distances and difficulties. Additionally, for far-domain tasks, our proposed whitened PCA serves as a feature post-processing method, constructing an isotropic feature space without increasing training data or computational costs. The topics covered in this work include CD-FSL, feature post-processing, and domain metrics, which will be introduced separately below.

FSL & CD-FSL: Few-shot learning (FSL) aims to rapidly acquire the ability to recognize novel categories with minimal training samples, focusing on the recognition problem across different classes within the same domain. It primarily encompasses three approaches: optimization-based methods [21,22,23], metric-based methods [8,9,10,24,25,26], and transfer learning-based methods [11,18,27,28]. Among these, optimization-based and metric-based methods introduce meta-training or episode-training to maintain a consistent paradigm between training and evaluation. Despite the remarkable progress and achievements made by FSL, it often encounters a decline in effectiveness when confronted with cross-domain tasks [12]. This challenge has spurred the emergence of Cross-Domain FSL (CD-FSL) methodologies, encompassing three primary approaches: instance-guided approaches, parameter-based approaches, and feature post-processing approaches. Instance-guided approaches [17,29,30,31] aim to guide the model towards acquiring cross-domain generalization abilities by incorporating a diverse array of target domain samples. These methods typically necessitate extensive training with auxiliary unlabeled samples from the target domain. Parameter-based approaches [13,32,33] focus on optimizing the model’s parameters to reduce the hypothesis space, thereby minimizing the number of training samples required. Feature post-processing approaches [19,34,35,36,37,38,39] endeavor to derive a feature mapping function capable of transferring features from the source domain to the target domain. This facilitates rapid adaptation of features to the task, resulting in enhanced feature representation. The far-domain strategy proposed in this paper falls under the category of feature post-processing methods.

Feature post-processing approaches: Shallow features often possess stronger migration capabilities compared to deep features. As a result, the fusion of features extracted from various layers is typically advantageous in boosting the model’s generalization ability while preserving high-level semantic information. CHEF [35] achieved this by unifying multiple abstraction layers of a neural network into a cohesive feature representation. Zou et al. [38] took a different approach, learning distinctive information unique to each sample by merging intermediate layers of features. Du et al. [34] designed a hierarchical prototype network, integrating information from each layer into the final prototype features. Furthermore, techniques that involve reweighting the feature vector output from the network can enhance feature representation in the target domain. MemREIN [36], for instance, proposed a method that combines memorization, restitution, and instance regularization to mitigate cross-domain incompatibilities of features. Li et al. [37] introduced a nonlinear subspace and hyperbolic tangent transformation technique to minimize task-irrelevant features while preserving migratory dissimilarity features. Song et al. [18] designed a class-feature subspace-mapping method based on the centroid vectors of base classes, which effectively enhanced the model’s performance on near-domain data. CIM [39] offers a straightforward feature transformation function, aiming to compress feature components with large amplitudes and expand those with smaller amplitudes, facilitating global adaptation to all target domains. The far-domain strategy presented in this paper aligns with the feature reweighting approach. However, our method distinguishes itself by utilizing Principal Component Analysis (PCA) in the source domain to derive weighting coefficients for different directions. This allows us to normalize the magnitude of each component, thereby uncovering cross-domain information implicitly embedded within the features.

Domain Metric: Guo et al. [16] introduced perspective distortion, semantic contents, and color depth as criteria for distinguishing domain differences. These criteria, however, heavily rely on subjective researcher perceptions and lack objective, quantitative measurements. Zhang et al. [19] proposed two methods for measuring domain distance: Wasserstein Distance for Measuring Domain Shift (WDMDS) and Maximum Mean Discrepancy for Measuring Domain Shift (MMDMDS). WDMDS is computationally intensive and limited to small sample sizes, making it unsuitable for evaluating entire datasets. MMDMDS assesses differences between the mean features of two domains, overlooking distributional information within each dataset. Oh et al. [17] evaluated target datasets based on domain similarity and FSL difficulty. They employed the Earth mover distance to measure distributional disparities between source and target domain prototypes. This approach, however, fails to capture marginal distributions due to its reliance on class labels. Furthermore, its definition of difficulty is solely based on the target dataset, overlooking the importance of the source domain perspective.

3. Main Approaches

3.1. Problem Definition

Before delving into the specifics of the Divide-and-Conquer Strategy, it is essential to first establish the task construction and symbolic representation of CD-FSL. Following the conventions in FSL literature [8,9,10,18,24,39,40], the model undergoes pretraining on a large-scale source dataset

D_{s} = \{(x_{i}, y_{i}) | y_{i} \in C_{s}\}

to accumulate prior knowledge. Once this pretraining is complete, the model’s performance is evaluated through a series of FSL tasks sampled from the target dataset

D_{t} = \{(x_{i}, y_{i}) | y_{i} \in C_{t}\}

.

The FSL task construction involves randomly selecting N classes from

D_{t}

, with each class containing K labeled training samples and q test samples. This forms a support set

S = {\{x_{i}^{(s)}, y_{i}^{(s)}\}}_{i = 1}^{N K}

and a query set

Q = {\{x_{i}^{(q)}, y_{i}^{(q)}\}}_{i = 1}^{N q}

, which constitutes an N-way K-shot task. It is important to note that the source dataset

D_{s}

and the target dataset

D_{t}

have non-overlapping classes and exhibit distinct data distributions.

3.2. Metric Approaches

The challenge in CD-FSL stems from the diversity of the target-data distribution, which involves varying degrees of domain distances and task difficulties. Historically, researchers’ assessments of domain distance and task difficulty have been largely influenced by personal subjective perceptions of the data, lacking objective and quantitative metrics [19]. Most FSL methods demonstrate effectiveness primarily on near-domain or partially far-domain data. Consequently, there is a pressing need to develop a metric that can objectively quantify the domain distance and task difficulty of the target dataset relative to the source data. Such a metric would enable a more precise understanding of the distributional differences between the source and target data, thereby informing the design of more effective CD-FSL strategies.

According to the definition provided by Xu et al. [12], the domain of a dataset is solely determined by the image samples and their marginal distributions, independent of the class labels. Conversely, task difficulty is dictated by the class labels and their conditional distributions, which correspond to the degree of linear separability of the samples in the feature space. This definition effectively decouples the target dataset into two dimensions: domain distance and task difficulty.

3.2.1. Domain Distance

The discrepancy in data distribution between the target and source domains is a crucial factor that can significantly impact model performance, commonly referred to as domain distance. A smaller domain distance indicates a higher similarity in data distributions, indicating the relevance of source class features in the target domain. Conversely, a larger domain distance signifies greater variability in data distributions, diminishing the utility of source class features and potentially causing them to become interference. Quantifying this difference in data distribution is instrumental in assessing the significance of source class features in the target domain, facilitating feature optimization.

The objective of the domain distance metric is to capture the direction and shape of the distributions of target datasets, thereby facilitating feature transformation within the Divide-and-Conquer Strategy. This PCA-based transformation is highly sensitive to the directions of data distributions in the feature space. Identifying the disparities in distribution directions between the target and source datasets is crucial for selecting an appropriate feature mapping strategy. Since the covariance matrix serves as an effective representation of the direction and shape of data distributions, we define domain distance by calculating the difference between the covariance matrices of the data from the two domains, which is computed as follows:

δ_{d o m a i n} = \frac{{∥cov (D_{s}) - cov (D_{t})∥}_{F}}{\sqrt{d}}

(1)

where

cov (D_{s}) \in R^{d \times d}

and

cov (D_{t}) \in R^{d \times d}

are the covariance matrices of

D_{s}

and

D_{t}

. d is the dimension of the feature space.

{∥\cdot∥}_{F}

is the Frobenius paradigm of the matrix. The numerator characterizes the difference between two distributions. The denominator, with

\sqrt{d}

as a scaling factor, achieves normalization across spatial dimensions. In this formula, only the distribution of image features is considered, regardless of their class labels.

It is important to highlight that the proposed metric reflects the perspective of the source domain

D_{s}

for a given model. It cannot directly capture the distance between any two arbitrary target domains. For instance, even if two separate target domains,

D_{t 1}

and

D_{t 2}

, exhibit the same distance to

D_{s}

, it does not imply that

D_{t 1}

and

D_{t 2}

are proximate to each other; in fact, their actual distance can be significant. Furthermore, variations in network architectures can lead to disparities in generalization abilities, ultimately resulting in diverse metric outcomes.

3.2.2. Difficulty

Different datasets often have varying classification criteria and granularity, leading to diverse challenges for models when applied to the target domains. In few-shot scenarios, finetuning the model can often result in significant overfitting. Hence, it is essential for the model to possess a fundamental capability for linear separability regarding target tasks. The level of linear separability exhibited by the model on the target dataset dictates the upper limit of the few-shot classification ability. A lower degree of linear separability corresponds to increased difficulty. To quantify the difficulty of

D_{t}

, we append a learnable linear layer to the end of the parameter-frozen feature extractor and use it to fit the entire

D_{t}

via gradient descent. Subsequently, we perform inference on all samples in

D_{t}

to obtain the classification accuracy

s (D_{t})

. The task difficulty of

D_{t}

is then defined by the following formula:

δ_{d i f f i c u l t y} = \frac{- \log_{2} (s (D_{t}))}{\log_{2} (N_{y \in D_{t}})}

(2)

where

s (\cdot)

is the operator to calculate the classification accuracy.

N_{y \in D_{t}}

denotes the number of classes in

D_{t}

. The denominator represents the total information content determined by the number of classes, while the numerator signifies the loss of information content resulting from misclassification. The ratio between these two quantities reflects the proportion of information loss incurred by the model in recognizing target data. Evidently, a higher ratio of information loss indicates greater difficulty.

The difficulty of a target task is influenced by the model’s capability for linear division of target data. In cases of low-difficulty tasks, linear optimization techniques can often refine the feature representation and bolster the efficacy of few-shot classification. Conversely, for high-difficulty tasks, linear methods fall short in addressing the problem fundamentally, which is attributed to the model’s inadequate representation of target data. Global finetuning serves to augment the model’s nonlinear representation abilities and enhance its suitability for target tasks.

The backbone and source data play a crucial role in determining the

D_{t}

metric values (as detailed in Section 4.4). To ensure meaningful results, it is crucial to use the same source data and backbone for training, metric, and few-shot classification when implementing our DCS. Typically, the initial step involves selecting appropriate source data and pretraining the network on it. Following this,

D_{t}

is evaluated in terms of domain distance and task difficulty using the pretrained model. Finally, the strategy outlined below is implemented based on the results obtained from the metrics.

3.3. Near-Domain Strategy

The distribution of near-domain data closely resembles that of the source data, with source class features exerting a positive influence on the target data. Therefore, traditional FSL algorithms, such as ProtoNet [8], DeepBDC [9], and FRN [10], inherently serve as excellent models for implementing the near-domain strategy. However, these methods either employ episodic training or introduce multiple iterations of self-distillation, rendering the training process complex and time-consuming. In this paper, we introduce a simple class-feature subspace (CFS-space) [18] mapping technique for batch-trained models (e.g., RFS), serving as an efficient implementation of the near-domain strategy.

Models that are pretrained on the source dataset exhibit an inherent sensitivity towards source class features. This sensitivity manifests in the directional nature of the source data distribution within the feature space. When Principal Component Analysis (PCA) is applied to this distribution, a notable trend emerges: the variance ratio decreases at a rapid pace as the principal component order increases. For instance, within the base set of mini-Imagenet, the first 61 principal components account for 95% of the variance ratio, which closely corresponds to the number of base classes. This observation implies that the source data are primarily distributed within a low-dimensional subspace. By mapping near-domain data into this subspace, it becomes possible to eliminate non-class principal components.

Suppose the feature matrix of the source features is denoted as

Z_{s} \in R^{d \times m}

, and that of the target features is denoted as

Z_{t} \in R^{d \times n}

. Here, m and n represent the number of feature vectors in the corresponding domains, respectively, while d denotes the dimension of the feature space, which satisfies the condition

d < \min (m, n)

. The Principal Component Analysis (PCA) of the source data can be computed using the following formulas:

Z_{s}^{'} = \frac{Z_{s} - \bar{Z_{s}}}{\sqrt{m - 1}}

(3)

U, Σ, V = SVD (Z_{s}^{'})

(4)

where

\bar{Z_{s}}

is the average vector of

Z_{s}

.

SVD (\cdot)

is the operation of singular decomposition.

U \in R^{d \times d}

,

Σ \in R^{d \times m}

and

V \in R^{m \times m}

denote the singular decomposition of

Z_{s}^{'}

. The diagonal elements of

Σ

represent the singular values of

Z_{s}^{'}

, which correspond to the standard deviations of the principal components. Subsequently, we select the initial k rows of U, denoted as

U_{k} \in R^{k \times d}

, and employ the following formulas to execute the feature mapping of both the source and target features:

G_{s} = U_{k} (Z_{s} - \bar{Z_{s}})

(5)

G_{t} = U_{k} (Z_{t} - \bar{Z_{s}})

(6)

where

G_{s} \in R^{k \times m_{s}}

and

G_{t} \in R^{k \times m_{t}}

represent the projections of the source and target samples in the CFS-space, respectively.

m_{s}

and

m_{t}

denote the number of source and target samples, respectively.

For the mapped features

G_{t}

in multi-shot tasks, we adopt a simple Logistic Regression (LR) classifier to adapt to the support set and directly apply it for predicting the query samples. However, for 1-shot tasks, the representativeness of features can be further enhanced by incorporating appearance-similar samples from the source dataset and performing feature fusion on the prototypes [18,27]. This is attributed to the fact that a single training sample is prone to be interfered with by intra-class variations and has a high likelihood of deviating from the true class center. Mapping features to CFS-space not only mitigates randomness in the direction of non-class features but also, in conjunction with feature fusion, facilitates feature approximation toward the class center. Specifically, we search for appearance-similar samples in the source domain in order to calibrate the prototypes. To this end, for the n-th class in the given task, we search for the p nearest neighbors corresponding to the prototype

{\bar{v}}_{n}

within

G_{s}

. Subsequently, we merge these features using the formula below:

{\tilde{v}}_{n} = λ \cdot {\bar{v}}_{n} + (1 - λ) \frac{1}{p} \sum_{j = 1}^{p} g_{j}

(7)

where

{\bar{v}}_{n}

is the prototype of class n.

g_{j}

is one of the nearest neighbors of

{\bar{v}}_{n}

.

{\tilde{v}}_{n}

is the calibrated prototype, which facilitates the model to fit and acquire a more optimal decision boundary.

Our near-domain strategy can be directly applied to pretrained models downloaded from the community, facilitating researchers and engineers to implement few-shot classification with zero training.

3.4. Far-Domain Strategy

The sensitivity of the model to source domain class features is primarily characterized by the principal components of the source data. Generally, the first k components account for a significant portion of the variance ratio. As the component order increases, the variance ratio decreases rapidly. The fact that the first k principal components effectively represent the class attribution of the source data leads the model to exhibit a preference toward certain feature directions. When extracting features, the model automatically amplifies the magnitude in these preferred directions while compressing it in others, thereby increasing the inter-class variance of the source data. However, for far-domain datasets, the data distributions differ significantly. By treating directions differently, the model not only fails to extract valuable information but also introduces source domain bias into target tasks. Therefore, optimizing the representation of target data necessitates mitigating the impact of source domain bias.

In Section 3.3, Equations (3) and (4) present the method for performing PCA using samples from the source domain. For far-domain tasks, to eliminate the feature bias of the source domain, the principal components are normalized by dividing them by their corresponding standard deviations, thereby ensuring that the feature space becomes isotropic. To achieve this, we truncate the first d rows of

Σ

to construct a diagonal square matrix

\hat{Σ} \in R^{d \times d}

. Subsequently, we apply the following transformation to the target domain matrix:

G_{t} = U {\hat{Σ}}^{- 1} (Z_{t} - \bar{Z_{s}})

(8)

This approach referred to as whitened PCA, effectively reduces the model’s sensitivity to source class features. As a result, the previously suppressed principal components are amplified back, leading to an enhanced representation of far-domain data.

In the computation of whitened PCA, the most time-consuming step is Equation (4), which has a computational complexity of

O (d^{3})

.

Σ

represents a diagonal matrix. Thus, the complexity of its inversion operation is negligible. Equation (8) involves two matrix multiplications, with computational complexities of

O (d^{3})

and

O (d^{2} n)

, respectively. Given that

n ≫ d

(i.e., the number of samples significantly exceeds the dimensionality), the overall computational complexity of whitened PCA is primarily determined by

O (d^{2} n)

.

3.5. Finetuning Strategy for Difficult Tasks

Certain target datasets, such as ChestX [41], present significant challenges. The poor linear separability on this dataset greatly limits the few-shot classification performance of the model. This can be attributed to two primary factors: (1) the lack of distinctive features in the source data that adequately represent the target classes; (2) the absence of mutual information between the source and target classes [42], leading the model to disregard these features entirely. To address this, finetuning the model with a limited amount of training data can enhance its nonlinear representation capabilities. Although there is a risk of overfitting, this is not a primary concern due to the inherently low classification accuracies of the model on such challenging tasks. In fact, finetuning is often beneficial when dealing with difficult tasks [43].

In summary, this section comprehensively elucidates the methodology and details of the Divide-and-Conquer Strategy. Figure 3 presents a flowchart of this strategy, serving as a guide for implementation.

4. Experiment

4.1. Implementation Details

Network: ResNet-12 is the most widely utilized network in the field of Few-Shot Learning. To achieve compatibility with a diverse range of models, we have selected this network as the backbone. In our experiments, we removed the classification header from ResNet-12 and retained only the feature extraction portion. The input size was consistently scaled to 84 × 84 × 3, resulting in a final 640-dimensional feature vector.

Training: To provide a concise and consistent comparison of the effects of DCS, we re-implemented five models: ProtoNet [8], Baseline [11], RFS [28], DeepBDC [9], and FRN [10]. They are classical high-performance FSL models that are frequently selected for comparison in the literature. Among them, ProtoNet and FRN are representative metric-based approaches, whereas Baseline, RFS, and DeepBDC are representative methods grounded in transfer learning. We utilized the repository proposed by Xie et al. [9] as the training source code. The Baseline, RFS, and DeepBDC were based on the standard batch training paradigm, employing the Cross-Entropy (CE) loss. An SGD optimizer with a momentum of 0.9 and a weight decay of 0.0001 was used. The initial learning rate was set to 0.05 and decayed by a factor of 10 at epochs 100 and 150, respectively. A total of 170 epochs were trained. Additionally, both RFS and DeepBDC performed three iterations of self-distillation after pretraining [9,28]. For ProtoNet and FRN, we utilized the meta-training paradigm with the same training settings as mentioned above. The base set of mini-ImageNet was used as the pretraining set, and standard data augmentations such as random cropping, color jitter, and random horizontal flipping were applied.

Evaluation: 5-way 1-shot, 5-shot and 20-shot were chosen as representative tasks. The top-1 classification accuracy was calculated over 600 rounds of randomized tasks.

4.2. Datasets

Meta-Dataset [20] and BSCD-FSL [16] are commonly used benchmarks in CD-FSL research. Meta-Dataset encompasses data from 10 diverse domains, including ImageNet [44], Omniglot [45], Aircraft [46], CUB [47], Textures [48], Quick Draw [49], Fungi [50], VGG Flower [51], Traffic Signs [52], and MSCOCO [53]. BSCD-FSL, on the other hand, spans five datasets: ImageNet [44], Crop Disease [54], ISIC [55], EuroSAT [56], and ChestX [41].

To facilitate a comparison with other FSL algorithms, we substituted ImageNet with mini-ImageNet. Specifically, we utilized the base set (miniIN-B) as the source dataset and employed both the validation set (miniIN-V) and the novel set (miniIN-N) as the target dataset. This approach allows us to assess both classical FSL and CD-FSL performance in a unified framework.

A brief introduction to these datasets is provided below, with representative images from each dataset extracted and presented in Figure 4.

mini-ImageNet [7] is a subset of ILSVRC-12, comprising a total of 100 classes with 600 labeled samples per class and an image size of 84 × 84 × 3. As per the division suggested by Ravi S et al., the dataset is segmented into 64 base classes (miniIN-B), 16 validation classes (miniIN-V), and 20 novel classes (miniIN-N). For model pretraining, miniIN-B serves as the source dataset, while miniIN-V and miniIN-N are employed as near-domain target datasets to assess FSL performance.

Omniglot [45] is a dataset that comprises handwritten characters from 50 distinct alphabets, totaling 1623 unique characters. Each character is represented by 20 handwriting samples.

Aircraft [46] is a dataset that includes a diverse collection of aircraft images. It comprises 102 distinct classes or variants, with each class containing 100 samples. For this paper, the original images were utilized for feature extraction without any cropping of the target region based on the bounding box information provided in the dataset.

CUB [47] is a fine-grained dataset specializing in bird categories. It encompasses 200 distinct bird species and includes a total of 11,788 image samples. In our study, we utilized the original images directly for feature extraction, without resorting to cropping the target region based on the provided bounding box information.

Textures [48] is a dataset that comprises various texture patterns. It includes 47 distinct categories and a total of 5640 samples.

Quick Draw [49] is a dataset that consists of black-and-white sketches created by players of the “Quick, Draw!” game. This dataset encompasses 345 different categories and boasts a total of 50 million samples.

Fungi [50] is a comprehensive dataset that comprises a wide array of mushroom images. It includes 1394 distinct mushroom categories and boasts a total of 89,761 image samples, making it a rich resource for fungal research and classification.

VGG Flower [51] comprises 8189 images of flowers distributed across 102 classes.

Traffic Signs [52] is a dataset that consists of 50,000 samples of German road signs, which are divided into 43 distinct categories.

MSCOCO (Microsoft Common Objects in Context) [53] is a large-scale dataset that comprises 1.5 million target instances extracted from Flickr. It includes 80 diverse categories, with each instance labeled by a bounding box. In this paper, we selected the Val2017 subset as our target dataset. Notably, we extracted features directly from the original images without cropping the target region based on the provided bounding box information.

Crop Disease [54] is a dataset that comprises leaf images of diseased plants. It includes 38 distinct categories of crop diseases and boasts a total of 108,610 samples, making it a valuable resource for research in plant pathology and crop health management.

EuroSAT [56] is a dataset that comprises remote sensing images captured by satellites. It includes 10 distinct categories and boasts a total of 27,000 image samples, making it a valuable resource for various applications in the field of remote sensing and Earth observation.

ISIC (International Skin Imaging Collaboration) [55] is a large-scale dermatologic image classification dataset published by the eponymous collaboration. It comprises 33,126 images of both benign and malignant skin lesions sourced from 2056 patients. These images are categorized into seven dermatologic categories, providing a comprehensive resource for research and development in the field of dermatology.

ChestX [41] is a medical dataset that comprises human lung X-ray images. It is a multi-category dataset that includes seven distinct disease categories. For Few-Shot Learning, we have collected the single-category images from this dataset, which constitute the target dataset for our study.

Among the above datasets, mini-ImageNet stands out as the most extensively employed benchmark in conventional FSL research, encompassing a diverse range of common categories. Utilizing mini-B as the base set enables the model to acquire rich prior knowledge. The remaining datasets broadly cover specialized domains such as characters, textures, stick figures, remote sensing, and medicine, encompassing both coarse-grained and fine-grained classification images. The image types involve a variety, including high-resolution and low-resolution images, grayscale images, and color images. This diversity in target domains and richness in data types allow for an effective evaluation of cross-domain strategies under various conditions.

4.3. Metric Results

Using the domain distance and difficulty formula provided in Section 3.2, we extracted features for the source dataset and the 15 target datasets using the RFS model. Table 1 shows the statistical results of the domain distance and difficulty. As expected, miniIN-B has a domain distance of zero from itself and exhibits the lowest difficulty level. Both miniIN-V and miniIN-N, which are derived from the same dataset, have relatively small domain distances from

D_{s}

. The domain distances for the other datasets gradually increase as the discrepancy in data distribution widens.

Based on the metric results, the target datasets are arranged in ascending order of domain distance as follows: miniIN-N < miniIN-V < MSCOCO < Textures <Fungi < CUB < VGG Flower < Crop Disease < Traffic Signs < EuroSAT < Aircraft < Omniglot < ISIC < QuickDraw < ChestX. We use the midpoint of the distance as the dividing line between near-domain and far-domain datasets, and the calculation formula is as follows:

\begin{matrix} d i v_{d o m a i n} & = \frac{δ_{d o m a i n}^{(min)} + δ_{d o m a i n}^{(max)}}{2} \\ = \frac{0.5613 + 0.9937}{2} = 0.7775 \end{matrix}

where

δ_{d o m a i n}^{(min)}

represents the least domain distance to miniIN-B, and

δ_{d o m a i n}^{(max)}

represents the largest domain distance to miniIN-B. This calculation only involves target datasets except for miniIN-B. Using 0.7775 as the dividing line, datasets with distances less than 0.7775 are categorized as near-domain datasets, while those with distances greater than 0.7775 are considered to be far-domain datasets. Consequently, for tasks involving miniIN-N, miniIN-V, and MSCOCO, the near-domain strategy outlined in Section 3.3 is applied. For all other tasks, the far-domain strategy detailed in Section 3.4 is employed.

The target datasets are ranked in ascending order of difficulty as follows: Crop Disease < VGG Flower < EuroSAT < Traffic Signs < miniIN-V < miniIN-N < Omniglot < Textures < Aircraft < ISIC < CUB < QuickDraw < MSCOCO < Fungi < ChestX. We employ the midpoint of the distance as the dividing line between low-difficulty and high-difficulty datasets, which is calculated as follows:

\begin{matrix} d i v_{d i f f i c u l t y} & = \frac{δ_{d i f f i c u l t y}^{(min)} + δ_{d i f f i c u l t y}^{(max)}}{2} \\ = \frac{0.0187 + 0.6170}{2} = 0.3179 \end{matrix}

where

δ_{d i f f i c u l t y}^{(min)}

is the minimum value of difficulty,

δ_{d i f f i c u l t y}^{(max)}

is the maximum value of difficulty. This calculation involves datasets except for miniIN-B. Using 0.3179 as the dividing line, datasets with difficulty less than 0.3179 are categorized as low-difficulty datasets, while those with values greater than 0.3179 are considered to be high-difficulty datasets. Notably, only ChestX stands out as a far-domain dataset with high difficulty. To enhance the model’s adaptability to the target tasks, we will finetune it using the strategy outlined in Section 3.5.

Figure 1 illustrates the distribution of the target datasets, with domain distance plotted on the horizontal axis and difficulty on the vertical axis. These findings are based on miniIN-B with the RFS model. It is important to note that altering the source dataset or model would yield different outcomes. The distribution results obtained using various source datasets and networks can be found in Section 4.4.

4.4. Influence of Source Data and Backbone

The metrics for domain distance and difficulty, outlined in Section 3.2, rely on the source data and the network. The source data provide an empirical perspective of view, while the network serves as a metric tool. These metrics are interpreted from the viewpoint of the pretrained model, implying that variations in the source data or modifications to the network structure would yield distinct metric results.

To validate the above perspective, we altered the neural network to Conv-4 and ResNet-50 while keeping the source dataset as miniIN-B constant. Following the pretraining procedure outlined in Section 4.1, we re-assessed the domain distance and difficulty for the 15 target datasets. The results are presented in Figure 5 and Figure 6. Notably, due to disparities in the dimensions of feature vectors extracted by the two networks, there are slight variations in the metric values. The relative positions of most target datasets remain largely unchanged, reflecting the inherent relationship of the target domains relative to the source domain. However, a few datasets exhibited shifts in their positions, indicating that networks with distinct architectures possess varying degrees of generalization capabilities.

If the source dataset is substituted with CUB while maintaining the neural network as ResNet-12, the resulting distribution of domain distance and difficulty for the target datasets is illustrated in Figure 7. It is evident that with CUB serving as the empirical origin, the distribution of the target datasets undergoes significant alterations. Datasets that were originally considered far-domain relative to miniIN-B, such as Fungi and Aircraft, transition to being classified as near-domain. Conversely, VGG Flower, which was moderately distant from miniIN-B, now exhibits the greatest distance from CUB. Additionally, the level of difficulty associated with the target datasets undergoes some degree of change.

In conclusion, both the model and the source data play a pivotal role in determining the metric results. When implementing DCS, it is imperative to ensure consistency in the model used for domain metrics and few-shot classification. Generally speaking, the source data should be carefully chosen, and model pretraining should be carried out using this dataset. Subsequently, the domain distance and difficulty should be measured for the target datasets using this pretrained model. Finally, DCS can be effectively applied based on these metric results.

4.5. Effectiveness of DCS

To assess the efficacy of DCS, we evaluated the performance of both the near-domain and far-domain strategies on the target datasets using RFS. The results are summarized in Table 2. The black values represent classification accuracies, while the values to their right indicate changes relative to Simple LR, with red signifying an increase and green a decrease. The datasets are organized in order of increasing domain distance.

The experimental results demonstrate that the near-domain strategy enhances performance on near-domain tasks but detracts from performance on far-domain tasks. Conversely, the far-domain strategy shows significant improvement on far-domain tasks but falls short on near-domain tasks. The complementary nature of these two strategies across different domains underscores the varying requirements for feature representation in different task domains. Specifically, source class features are instrumental in aiding near-domain data to effectively convey class information, whereas whitened-PCA features are more apt for representing far-domain data.

Although the far-domain strategy consistently enhances the model’s performance on far-domain tasks, the classification accuracy remains significantly low for high-difficulty tasks, namely ChestX. This dataset encompasses X-ray medical images of human lungs depicting a variety of lung diseases, which exhibit substantial disparities compared to the distribution of conventional images. For non-medical experts, discerning differences among these images is typically arduous, posing significant classification challenges. Models trained on miniIN-B exhibit exceedingly low linear separability when applied to ChestX. As evident from the results in Table 1, even when utilizing the entire ChestX dataset, the global classification accuracy (across 7 classes) is merely 30.1%. Consequently, to achieve further improvements on ChestX, it is imperative to finetune the model in order to enhance its nonlinear representation capabilities.

For 5-way K-shot tasks on ChestX, we experimentally conducted research on global finetuning to enhance the model’s representation capabilities. K was set from 1 to 200 to validate the trend of our method as K changes. For comparison, with the parameters of the backbone frozen, we evaluated the few-shot classification accuracy of RFS [28], Baseline [11], ProtoNet [8], FRN [10], and DeepBDC [9] in the same way. The experimental results are shown in Figure 8. It can be seen that global finetuning can significantly improve the model’s ability to adapt to difficult tasks, making its accuracies on ChestX surpass the second-best model (DeepBDC) by more than 40%. Furthermore, as K increases, the advantages of global finetuning become significant. However, other models show obvious performance saturation as K increases. Evidently, for far-domain high-difficulty FSL tasks, global finetuning consistently leads to improved performance without concerns of overfitting due to the model’s inadequate nonlinear representation.

4.6. Impact of Different Principal Components

The model, due to its sensitivity towards source class features, tends to prioritize the extraction of the source domain principal components while suppressing non-principal components, disregarding the target-data distribution. This trait poses no significant impediment when dealing with near-domain data. However, in the context of far-domain data, the principal components that are crucial for representing the target classes are assigned minimal significance, leading to a notable lack of saliency. By applying PCA on the source domain and executing feature mapping on the target domain, we can establish a correlation between classification accuracy and the principal components number n. Figure 9 illustrates the experimental findings. The solid lines represent results obtained from the far-domain datasets, and the dashed lines correspond to the near-domain datasets.

Figure 9a presents the outcomes of conventional PCA. Classification accuracy improves with an increase in n until it reaches a saturation point at approximately

n = 60

. This saturation occurs because as n grows larger, the variance ratio of the component diminishes, leading to a reduced impact on classification accuracy. When all principal components are normalized, as outlined in Section 3.4, the resulting whitened-PCA outcomes are depicted in Figure 9b. Notably, the near-domain performance, denoted by dashed lines, initially rises and then declines as n increases. Conversely, the far-domain dataset demonstrates consistent improvement with n and does not exhibit saturation as observed in Figure 9a.

These results illustrate the varying significance of principal components for different target domains. The foremost components predominantly represent source class features and hold a dominant influence on the classification of near-domain data. Conversely, the subsequent components act as interference for near-domain data but serve valuable information for far-domain data, aiding in decision-making.

4.7. Comprehensive Experiment

To thoroughly assess the compatibility of DCS with conventional FSL algorithms, we integrate DCS into RFS [28], DeepBDC [9], and FRN [10] and compare them with both classical FSL methods [8,9,10,11,28] and CD-FSL algorithms [39]. The results for 5-way 1-shot, 5-shot, and 20-shot tasks are listed in Table 3, Table 4 and Table 5. Due to the limited number of samples per class in Aircraft and Omniglot, these two datasets are excluded from Table 5. Methods appended with the suffix ++ signify the results achieved by integrating DCS into classical FSL algorithms. Specifically, for RFS, we employed CFS-space mapping and feature fusion as the near-domain strategy and whitened PCA as the far-domain strategy, resulting in improved RFS++. As for DeepBDC and FRN, which are already excellent near-domain algorithms, we maintained the original algorithms for near-domain tasks while applying whitened PCA for far-domain tasks, yielding DeepBDC++ and FRN++.

The experimental results demonstrate the compatibility of DCS with classical FSL models. Compared to RFS, RFS++ exhibits improved performance in both near-domain and far-domain tasks. DeepBDC++ and FRN++, while maintaining the advantages of the original algorithms in near-domain tasks, significantly enhance their performance in far-domain tasks. Compared to ProtoNet, Baseline, and CIM, our strategy demonstrates superior performance across all datasets. Furthermore, although RFS++ slightly lags behind DeepBDC++ and FRN++ in terms of classification accuracy, it adopts the batch training paradigm, resulting in higher training efficiency. DeepBDC++ and FRN++ are comparable in performance in both near-domain tasks and far-domain tasks. Despite minor differences in the results of the three models, they all demonstrate broad adaptability to various domains overall.

In addition, it seems that the generalization ability of FSL models is positively correlated with the number of near-domain datasets it has. As evident in Table 3, Table 4 and Table 5, RFS has 3 near-domain datasets, DeepBDC has 5~6, and FRN has 8. And in most cases, FRN exhibits stronger cross-domain generalization capabilities.

4.8. Discussion

Rationality of Results: Guo et al. [16] provided a ranking of domain distances for the four datasets in the BD-CDFSL benchmark based on three criteria: perspective distortion, semantic content, and color depth. The order of proximity they obtained is: Crop Disease < EuroSAT < ISIC < ChestX, which aligns with subjective human perception (as shown in Figure 4). In Section 4.3, we present experimental results that are consistent with the BD-CDFSL benchmark, highlighting the superiority of our proposed metric. There are also some studies on metric domain distances. Oh et al. [17] reported metric results indicating that the domain distance of Crop Disease is greater than that of EuroSAT. Zhang et al. [19] proposed the WDMDS and MMDMDS metrics, which suggest that the domain distance of Crop Disease is greater than both EuroSAT and ISIC. These results are contradictory to human empirical cognition.

Practicability: To compute the domain distance

δ_{d o m a i n}

and

δ_{d i f f i c u l t y}

using Equations (1) and (2), it is theoretically necessary to employ all images and labels from

D_{t}

. However, this requirement often poses practical challenges, thereby somewhat limiting the direct application of our method in real-world scenarios. Nevertheless, as datasets continue to proliferate, the community can leverage Equations (1) and (2) to accumulate metrics and intuitive insights regarding domain distance and difficulty over time. With these accumulated insights, researchers can estimate the approximate range of the target domain based on support samples, thereby facilitating the selection of an appropriate Few-Shot Learning strategy. Furthermore, the difficulty metric can be both dataset-dependent and task-dependent. The difficulty of a specific few-shot task provides a more refined characterization. However, directly applying Equation (2) to the support set is infeasible since the results would be severely distorted due to overfitting. If we predict the distribution of the target data and sample a sufficient number of high-quality virtual samples from this distribution, it would become possible to utilize Equation (2) to quantify the difficulty of a specific task. This approach may constitute one feasible path for assessing task-dependent difficulty in the future.

Strategy Switching: The experimental findings reveal that various domain tasks exhibit complementary preferences for features. Specifically, near-domain tasks tend to utilize source class features, whereas far-domain tasks show a stronger preference for whitened-PCA features. Conceivably, there should be intermediate tasks that fall between the near and far domains in terms of distance. For such tasks, a hard-switching strategy like DCS might not be optimized. We aim to further explore a soft switching strategy in the future. This could involve determining an optimal feature transformation for any given domain task based on continuous statistical measures.

5. Conclusions

Classical Few-Shot Learning (FSL) algorithms often encounter limited generalization and performance degradation when addressing Cross-Domain FSL (CD-FSL) challenges. Meanwhile, specifically designed CD-FSL algorithms tend to underperform in near-domain tasks. To address this, we introduce a quantitative metric for assessing domain distance and task difficulty. We measure domain distance through the difference of covariance matrices between the target and source datasets, and task difficulty based on the linear separability of the model on the target data. By considering the distribution of target datasets across these two dimensions, we categorize target tasks into near-domain, far-domain low-difficulty, and far-domain high-difficulty tasks. Based on this categorization, we specifically design a Divide-and-Conquer Strategy. For near-domain tasks, we propose a CFS-space mapping and feature fusion approach as a complement to classical FSL models. For far-domain low-difficulty tasks, we introduce the whitened-PCA method to optimize feature representation. As for far-domain high-difficulty tasks, we employ a global finetuning method to enhance the model’s nonlinear representation capabilities. Experimental results on 15 target datasets demonstrate the compatibility and effectiveness of DCS with classical FSL models, improving classification accuracies on the target tasks. We thoroughly discuss the strengths and limitations of the proposed metrics and DCS while also highlighting potential avenues for future research that could extend the application of this strategy to other modalities of data.

Author Contributions

Formal analysis, B.W.; Methodology, B.W.; Project administration, D.Y.; Supervision, D.Y.; Writing—original draft, B.W.; Writing—review and editing, D.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant No. 51375368.

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author/s.

Conflicts of Interest

The paper is original in its contents and is not under consideration for publication in any other journals or proceedings. On behalf of all authors, the corresponding author states that there are no competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. The funders had no role in the design of the study, in the collection, analyses, or interpretation of data, in the writing of the manuscript, or in the decision to publish the results.

References

Kalyan, K.S. A survey of GPT-3 family large language models including ChatGPT and GPT-4. Nat. Lang. Process. J. 2023, 6, 100048. [Google Scholar] [CrossRef]
Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 4015–4026. [Google Scholar]
Wang, X.; Yuan, H.; Zhang, S.; Chen, D.; Wang, J.; Zhang, Y.; Shen, Y.; Zhao, D.; Zhou, J. Videocomposer: Compositional video synthesis with motion controllability. Adv. Neural Inf. Process. Syst. 2024, 36, 7594–7611. [Google Scholar]
Premaratne, P.; Kadhim, I.J.; Blacklidge, R.; Lee, M. Comprehensive review on vehicle Detection, classification and counting on highways. Neurocomputing 2023, 556, 126627. [Google Scholar] [CrossRef]
Wang, B.; Yu, D. Combining BioTRIZ and Multi-Factor Coupling for Bionic Mechatronic System Design. Appl. Sci. 2024, 14, 6021. [Google Scholar] [CrossRef]
Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a few examples: A survey on few-shot learning. ACM Comput. Surv. (CSUR) 2020, 53, 1–34. [Google Scholar] [CrossRef]
Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching networks for one shot learning. Adv. Neural Inf. Process. Syst. 2016, 29. Available online: https://proceedings.neurips.cc/paper_files/paper/2016/file/90e1357833654983612fb05e3ec9148c-Paper.pdf (accessed on 1 January 2025).
Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. Adv. Neural Inf. Process. Syst. 2017, 30. Available online: https://proceedings.neurips.cc/paper_files/paper/2017/file/cb8da6767461f2812ae4290eac7cbc42-Paper.pdf (accessed on 1 January 2025).
Xie, J.; Long, F.; Lv, J.; Wang, Q.; Li, P. Joint distribution matters: Deep brownian distance covariance for few-shot classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7972–7981. [Google Scholar]
Wertheimer, D.; Tang, L.; Hariharan, B. Few-shot classification with feature map reconstruction networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8012–8021. [Google Scholar]
Chen, W.Y.; Liu, Y.C.; Kira, Z.; Wang, Y.C.F.; Huang, J.B. A Closer Look at Few-shot Classification. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
Xu, H.; Zhi, S.; Sun, S.; Patel, V.M.; Liu, L. Deep Learning for Cross-Domain Few-Shot Visual Recognition: A Survey. arXiv 2023, arXiv:2303.08557. [Google Scholar]
Tseng, H.Y.; Lee, H.Y.; Huang, J.B.; Yang, M.H. Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 30 April 2020. [Google Scholar]
Li, X.; Li, Y.; Zheng, Y.; Zhu, R.; Ma, Z.; Xue, J.H.; Cao, J. ReNAP: Relation network with adaptiveprototypical learning for few-shot classification. Neurocomputing 2023, 520, 356–364. [Google Scholar] [CrossRef]
Ye, C.; Wang, Q.; Dong, L. Single-Step Support Set Mining for Realistic Few-Shot Image Classification. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–8. [Google Scholar] [CrossRef]
Guo, Y.; Codella, N.C.; Karlinsky, L.; Codella, J.V.; Smith, J.R.; Saenko, K.; Rosing, T.; Feris, R. A broader study of cross-domain few-shot learning. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXVII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 124–141. [Google Scholar]
Oh, J.; Kim, S.; Ho, N.; Kim, J.H.; Song, H.; Yun, S.Y. Understanding cross-domain few-shot learning based on domain similarity and few-shot difficulty. Adv. Neural Inf. Process. Syst. 2022, 35, 2622–2636. [Google Scholar]
Song, B.; Zhu, H.; Wang, B.; Bi, Y. Class feature Sub-space for few-shot classification. Appl. Intell. 2024, 54, 9177–9194. [Google Scholar] [CrossRef]
Zhang, Q.; Jiang, Y.; Wen, Z. TACDFSL: Task Adaptive Cross Domain Few-Shot Learning. Symmetry 2022, 14, 1097. [Google Scholar] [CrossRef]
Triantafillou, E.; Zhu, T.; Dumoulin, V.; Lamblin, P.; Evci, U.; Xu, K.; Goroshin, R.; Gelada, C.; Swersky, K.; Manzagol, P.A.; et al. Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
Ye, H.J.; Ming, L.; Zhan, D.C.; Chao, W.L. Few-shot learning with a strong teacher. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 46, 1425–1440. [Google Scholar] [CrossRef] [PubMed]
Shao, Y.; Wu, W.; You, X.; Gao, C.; Sang, N. Improving the generalization of MAML in few-shot classification via bi-level constraint. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 3284–3295. [Google Scholar] [CrossRef]
Zhu, X.; Li, S. MGML: Momentum group meta-learning for few-shot image classification. Neurocomputing 2022, 514, 351–361. [Google Scholar] [CrossRef]
Song, B.; Zhu, H.; Bi, Y. A conditioned feature reconstruction network for few-shot classification. Appl. Intell. 2024, 54, 6592–6605. [Google Scholar] [CrossRef]
Bi, Y.; Zhu, H.; Shi, J.; Song, B. R2Net: Relative relation network with intra-class local augmentation for few-shot learning. Signal Image Video Process. 2024, 18, 5061–5071. [Google Scholar] [CrossRef]
Jiang, M.; Fan, J.; He, J.; Du, W.; Wang, Y.; Li, F. Contrastive prototype network with prototype augmentation for few-shot classification. Inf. Sci. 2025, 686, 121372. [Google Scholar] [CrossRef]
Yang, S.; Liu, L.; Xu, M. Free Lunch for Few-shot Learning: Distribution Calibration. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 4 May 2021. [Google Scholar]
Tian, Y.; Wang, Y.; Krishnan, D.; Tenenbaum, J.B.; Isola, P. Rethinking few-shot image classification: A good embedding is all you need? In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XIV 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 266–282. [Google Scholar]
Xu, H.; Liu, L.; Zhi, S.; Fu, S.; Su, Z.; Cheng, M.M.; Liu, Y. Enhancing Information Maximization with Distance-Aware Contrastive Learning for Source-Free Cross-Domain Few-Shot Learning. IEEE Trans. Image Process. 2024, 33, 2058–2073. [Google Scholar] [CrossRef]
Lee, W.Y.; Wang, J.Y.; Wang, Y.C.F. Domain-agnostic meta-learning for cross-domain few-shot classification. In Proceedings of the ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1715–1719. [Google Scholar]
Xu, H.; Zhi, S.; Liu, L. Cross-domain few-shot classification via inter-source stylization. In Proceedings of the 2023 IEEE International Conference on Image Processing (ICIP), Kuala Lumpur, Malaysia, 8–11 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 565–569. [Google Scholar]
Chen, Y.; Zheng, Y.; Xu, Z.; Tang, T.; Tang, Z.; Chen, J.; Liu, Y. Cross-domain few-shot classification based on lightweight Res2Net and flexible GNN. Knowl.-Based Syst. 2022, 247, 108623. [Google Scholar] [CrossRef]
Yalan, L.; Jijie, W. Cross-Domain Few-Shot Classification through Diversified Feature Transformation Layers. In Proceedings of the 2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, 28–30 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 549–555. [Google Scholar]
Du, Y.; Zhen, X.; Shao, L.; Snoek, C.G. Hierarchical Variational Memory for Few-shot Learning Across Domains. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 4 May 2021. [Google Scholar]
Adler, T.; Brandstetter, J.; Widrich, M.; Mayr, A.; Kreil, D.; Kopp, M.; Klambauer, G.; Hochreiter, S. Cross-domain few-shot learning by representation fusion. arXiv 2020, arXiv:2010.06498. [Google Scholar]
Xu, Y.; Wang, L.; Wang, Y.; Qin, C.; Zhang, Y.; Fu, Y. Memrein: Rein the domain shift for cross-domain few-shot learning. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, Vienna, Austria, 23–29 July 2022; pp. 3636–3642. [Google Scholar] [CrossRef]
Li, P.; Gong, S.; Wang, C.; Fu, Y. Ranking distance calibration for cross-domain few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9099–9108. [Google Scholar]
Zou, Y.; Zhang, S.; Yu, J.; Tian, Y.; Moura, J.M. Revisiting mid-level patterns for cross-domain few-shot recognition. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, China, 20–24 October 2021; pp. 741–749. [Google Scholar]
Luo, X.; Xu, J.; Xu, Z. Channel importance matters in few-shot image classification. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 25–27 July 2022; pp. 14542–14559. [Google Scholar]
Bi, Y.; Zhu, H.; Shi, J.; Song, B. TsCANet: Three-stream contrastive adaptive network for cross-domain few-shot learning. J. Supercomput. 2025, 81, 139. [Google Scholar] [CrossRef]
Wang, X.; Peng, Y.; Lu, L.; Lu, Z.; Bagheri, M.; Summers, R.M. Chestx-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2097–2106. [Google Scholar]
Tishby, N.; Zaslavsky, N. Deep learning and the information bottleneck principle. In Proceedings of the 2015 IEEE Information Theory Workshop (ITW), Jerusalem, Israel, 26 April–1 May 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1–5. [Google Scholar]
Luo, X.; Wu, H.; Zhang, J.; Gao, L.; Xu, J.; Song, J. A closer look at few-shot classification again. In Proceedings of the International Conference on Machine Learning, PMLR, Seattle, WA, USA, 30 November–1 December 2023; pp. 23103–23123. [Google Scholar]
Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
Lake, B.M.; Salakhutdinov, R.; Tenenbaum, J.B. Human-level concept learning through probabilistic program induction. Science 2015, 350, 1332–1338. [Google Scholar] [CrossRef]
Maji, S.; Rahtu, E.; Kannala, J.; Blaschko, M.; Vedaldi, A. Fine-grained visual classification of aircraft. arXiv 2013, arXiv:1306.5151. [Google Scholar]
Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech-Ucsd Birds-200-2011 Dataset; California Institute of Technology: Pasadena, CA, USA, 2011. [Google Scholar]
Cimpoi, M.; Maji, S.; Kokkinos, I.; Mohamed, S.; Vedaldi, A. Describing textures in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition, Columbus, OH, USA, 23–28 June 2014; pp. 3606–3613. [Google Scholar]
Guo, K.; WoMa, J.; Xu, E. Quick, Draw! Doodle Recognition; Stanford University: Stanford, CA, USA, 2018; Available online: https://cs229.stanford.edu/proj2018/report/98.pdf (accessed on 20 December 2024).
Schroeder, B.; Cui, Y. Fgvcx Fungi Classification Challenge. 2018. Available online: https://github.com/visipedia/fgvcx_fungi_comp (accessed on 20 December 2024).
Nilsback, M.E.; Zisserman, A. Automated flower classification over a large number of classes. In Proceedings of the 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, Bhubaneswar, India, 16–19 December 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 722–729. [Google Scholar]
Houben, S.; Stallkamp, J.; Salmen, J.; Schlipsing, M.; Igel, C. Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark. In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, TX, USA, 4–9 August 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 1–8. [Google Scholar]
Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755. [Google Scholar]
Mohanty, S.P.; Hughes, D.P.; Salathé, M. Using deep learning for image-based plant disease detection. Front. Plant Sci. 2016, 7, 1419. [Google Scholar] [CrossRef]
Codella, N.; Rotemberg, V.; Tschandl, P.; Celebi, M.E.; Dusza, S.; Gutman, D.; Helba, B.; Kalloo, A.; Liopyris, K.; Marchetti, M.; et al. Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic). arXiv 2019, arXiv:1902.03368. [Google Scholar]
Helber, P.; Bischke, B.; Dengel, A.; Borth, D. Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2217–2226. [Google Scholar] [CrossRef]

Figure 1. Distribution of domain distance and difficulty for 15 target datasets. The horizontal axis represents domain distance, determined by the distribution difference between the target dataset and the source dataset. The vertical axis represents the task difficulty, determined by the degree of linear separability of the target data. MiniIN-B was used as the source dataset, and RFS with ResNet-12 was used as the feature extractor. Based on the metric results, the target dataset can be categorized into three types: near-domain, far-domain low-difficulty, and far-domain high-difficulty.

Figure 2. Divide-and-Conquer Strategy for Cross-Domain Few-Shot Learning. According to the quantitative metrics of domain distance and task difficulty, CD-FSL tasks can be divided into near-domain tasks and far-domain tasks. Different types of tasks necessitate the application of different strategies. For near-domain tasks, classical models such as ProtoNet, DeepBDC, and FRN can be directly applied, or our proposed CFS-space mapping approach can be employed. For far-domain low-difficulty tasks, feature optimization is achieved via whitened PCA. In the case of far-domain high-difficulty tasks, global finetuning is employed.

Figure 3. Flow chart of the Divide-and-Conquer Strategy for CD-FSL.

Figure 4. Illustrative examples of datasets used in this paper, excerpted from Meta-Dataset and BSCD-FSL benchmark.

Figure 5. Distribution of domain distance and difficulty for 15 target datasets. The horizontal axis represents domain distance, determined by the distribution difference between the target dataset and the source dataset. The vertical axis represents the task difficulty, determined by the degree of linear separability of the target data. MiniIN-B was used as the source dataset, and RFS with Conv-4 was used as the feature extractor.

Figure 6. Distribution of domain distance and difficulty for 15 target datasets. The horizontal axis represents domain distance, determined by the distribution difference between the target dataset and the source dataset. The vertical axis represents the task difficulty, determined by the degree of linear separability of the target data. MiniIN-B was used as the source dataset, and RFS with ResNet-50 was used as the feature extractor.

Figure 7. Distribution of domain distance and difficulty for 15 target datasets. The horizontal axis represents domain distance, determined by the distribution difference between the target dataset and the source dataset. The vertical axis represents the task difficulty, determined by the degree of linear separability of the target data. CUB was used as the source dataset, and RFS with ResNet-12 was used as the feature extractor.

Figure 8. 5-way K-shot performance of Global Finetune, RFS, Baseline, ProtoNet, FRN, and DeepBDC on ChestX. The horizontal axis represents the specified number of support samples, while the vertical axis indicates the corresponding classification accuracy.

Figure 9. Results of conventional PCA and whitened PCA in FSL tasks. The solid lines represent far-domain datasets, while the dashed lines represent near-domain datasets. In Figure (a), conventional PCA is used to reduce the dimensionality of the feature space, retaining the first n principal components. As n increases, the accuracies of all datasets quickly saturate. In Figure (b), whitened PCA is employed to normalize the amplitude of each principal component before retaining the first n principal components for dimensionality reduction. For near-domain tasks, the results initially increase with n but then decrease, indicating that higher-order components act as interference. For far-domain tasks, the results increase with n without saturation, suggesting that higher-order components represent useful features.

Table 1. Domain distance and difficulty values for 16 datasets (including

D_{s}

and

D_{t}

).

N_{y \in D_{t}}

represents the number of classes in the dataset.

s (D_{t})

denotes the linear fitting score of the target dataset.

δ_{d o m a i n}

signifies the domain distance between the target dataset and the source dataset.

δ_{d i f f i c u l t y}

indicates the difficulty of the target dataset. The target datasets are ordered from top to bottom according to the domain distance. Specifically, miniIN-B serves as the source dataset, positioned at a domain distance of 0, presenting the lowest level of difficulty. In contrast, ChestX exhibits the farthest domain distance, posing the highest degree of challenge.

Table 1. Domain distance and difficulty values for 16 datasets (including

D_{s}

and

D_{t}

).

N_{y \in D_{t}}

represents the number of classes in the dataset.

s (D_{t})

denotes the linear fitting score of the target dataset.

δ_{d o m a i n}

signifies the domain distance between the target dataset and the source dataset.

δ_{d i f f i c u l t y}

indicates the difficulty of the target dataset. The target datasets are ordered from top to bottom according to the domain distance. Specifically, miniIN-B serves as the source dataset, positioned at a domain distance of 0, presenting the lowest level of difficulty. In contrast, ChestX exhibits the farthest domain distance, posing the highest degree of challenge.

$D_{t}$	$N_{y \in D_{t}}$	$s (D_{t})$	$δ_{domain}$	$δ_{difficulty}$
miniIN-B	64	0.999	0.0000	0.0002
miniIN-N	20	0.872	0.5613	0.0457
miniIN-V	16	0.893	0.6059	0.0408
MSCOCO	77	0.391	0.6901	0.2161
Textures	47	0.749	0.7929	0.0750
Fungi	1394	0.144	0.8636	0.2676
CUB	200	0.484	0.8755	0.1369
VGG Flower	102	0.905	0.8799	0.0215
Crop Disease	38	0.934	0.9577	0.0187
Traffic Signs	43	0.866	0.9593	0.0382
EuroSAT	10	0.932	0.9633	0.0305
Aircraft	100	0.612	0.9676	0.1066
Omniglot	964	0.685	0.9735	0.0550
ISIC	7	0.787	0.9810	0.1230
QuickDraw	345	0.350	0.9878	0.1796
ChestX	7	0.301	0.9937	0.6170

Table 2. Effectiveness of DCS. All reported results stem from the same pretrained model. The LR classifier serves as a baseline for Few-Shot Learning performance. Separate validations are conducted for the near-domain and far-domain strategies across different target domains. The black values represent classification accuracies, while the values adjacent to them indicate changes relative to Simple LR. Growth is represented by red, while reduction is denoted by green. Omniglot and Aircraft do not have enough samples per class to construct 20-shot tasks, resulting in null values for those scenarios.

Datasets	Simple LR			Near-Domain Strategy			Far-Domain Strategy
Datasets	1-Shot	5-Shot	20-Shot	1-Shot	5-Shot	20-Shot	1-Shot	5-Shot	20-Shot
miniIN-N	63.91	80.88	86.76	${65.61}_{+ 1.70}$	${81.12}_{+ 0.24}$	${87.23}_{+ 0.47}$	${46.99}_{- 16.92}$	${70.66}_{- 10.22}$	${84.73}_{- 2.03}$
miniIN-V	67.85	82.81	87.36	${70.35}_{+ 2.50}$	${84.11}_{+ 1.30}$	${87.23}_{+ 1.36}$	${46.20}_{- 21.65}$	${68.83}_{- 13.98}$	${83.76}_{- 3.60}$
MSCOCO	46.21	59.12	67.58	${46.39}_{+ 0.18}$	${61.11}_{+ 1.99}$	${69.02}_{+ 1.44}$	${36.68}_{- 9.53}$	${51.92}_{- 7.20}$	${66.50}_{- 1.08}$
Textures	48.51	66.88	76.20	${47.39}_{- 1.12}$	${67.15}_{+ 0.27}$	${76.49}_{+ 0.29}$	${48.71}_{+ 0.20}$	${67.31}_{+ 0.43}$	${78.62}_{+ 2.42}$
Fungi	40.91	57.82	68.41	${39.29}_{- 1.62}$	${56.95}_{- 0.87}$	${67.90}_{- 0.51}$	${41.51}_{+ 0.60}$	${59.76}_{+ 1.94}$	${72.44}_{+ 4.03}$
CUB	45.57	63.66	74.08	${43.99}_{- 1.58}$	${63.68}_{+ 0.02}$	${73.68}_{- 0.40}$	${45.63}_{+ 0.06}$	${64.32}_{+ 0.66}$	${77.37}_{+ 3.29}$
VGG Flower	67.70	86.74	93.39	${64.55}_{- 3.15}$	${85.67}_{- 1.07}$	${91.86}_{- 1.53}$	${73.97}_{+ 6.27}$	${90.60}_{+ 3.86}$	${95.86}_{+ 2.47}$
Crop Disease	68.24	87.90	94.48	${61.68}_{- 6.56}$	${84.77}_{- 3.13}$	${91.81}_{- 2.67}$	${73.12}_{+ 4.88}$	${91.10}_{+ 3.20}$	${96.10}_{+ 1.62}$
Traffic signs	51.90	76.10	89.82	${51.01}_{- 0.89}$	${73.08}_{- 3.02}$	${85.50}_{- 4.32}$	${58.55}_{+ 6.65}$	${79.48}_{+ 3.38}$	${91.89}_{+ 2.07}$
EuroSAT	60.72	79.70	87.36	${55.10}_{- 5.62}$	${78.08}_{- 1.62}$	${85.96}_{- 1.40}$	${65.65}_{+ 4.93}$	${83.00}_{+ 3.30}$	${89.30}_{+ 1.94}$
Aircraft	27.95	42.43	-	${27.60}_{- 0.35}$	${38.12}_{- 4.31}$	-	${29.75}_{+ 1.80}$	${44.13}_{+ 1.70}$	-
Omniglot	80.01	94.67	-	${76.56}_{- 3.45}$	${92.51}_{- 2.16}$	-	${85.81}_{+ 5.80}$	${96.65}_{+ 1.98}$	-
ISIC	30.83	42.39	53.68	${30.18}_{- 0.65}$	${42.51}_{+ 0.12}$	${51.55}_{- 2.13}$	${31.42}_{+ 0.59}$	${43.88}_{+ 1.49}$	${55.91}_{+ 2.23}$
Quick Draw	53.93	74.31	84.16	${47.51}_{- 6.42}$	${71.12}_{- 3.19}$	${79.37}_{- 4.79}$	${58.18}_{+ 4.25}$	${78.18}_{+ 3.87}$	${86.73}_{+ 2.57}$
ChestX	22.81	25.45	28.32	${21.84}_{- 0.97}$	${25.08}_{- 0.37}$	${28.63}_{+ 0.31}$	${23.21}_{+ 0.40}$	${26.63}_{+ 1.18}$	${31.19}_{+ 2.87}$

Table 3. Comparison with other methods on 5-way 1-shot tasks. The highest accuracies are highlighted in bold, while the second-highest accuracies are underlined. Additionally, the value marked with an asterisk (*) in the last row represents accuracy obtained through finetuning.

Dataset	ProtoNet [8]	Baseline [11]	CIM [39]	RFS [28]	DeepBDC [9]	FRN [10]	RFS++	DeepBDC++	FRN++
miniIN-N	60.02	64.35	61.65	63.91	65.94	65.01	65.61	65.94	65.01
miniIN-V	63.46	68.55	64.02	67.85	67.07	69.03	70.35	67.07	69.03
MSCOCO	43.34	47.18	45.40	46.21	48.25	47.17	46.39	48.25	47.17
Textures	45.03	50.71	50.73	48.51	54.40	51.63	48.71	54.40	51.63
Fungi	38.12	40.33	40.61	40.91	43.86	43.30	41.51	43.86	43.30
CUB	43.47	45.66	44.96	45.57	52.74	52.96	45.63	52.74	52.96
VGG Flower	65.66	67.13	70.07	67.70	74.36	74.15	73.97	77.28	74.15
Crop Disease	67.50	66.58	70.78	68.24	72.10	72.26	73.12	73.75	72.26
Traffic signs	53.60	51.82	54.54	51.90	62.44	62.82	58.55	64.53	66.23
EuroSAT	61.78	61.79	63.75	60.72	64.55	64.34	65.65	65.40	66.89
Aircraft	28.73	27.66	28.15	27.95	30.12	31.04	29.75	31.69	32.82
Omniglot	80.96	76.49	81.50	80.01	81.20	83.96	85.81	84.42	89.61
ISIC	30.79	31.31	31.97	30.83	34.22	33.43	31.42	33.88	33.53
Quick Draw	53.64	50.92	56.28	53.93	56.42	57.48	58.18	58.75	60.56
ChestX	22.55	22.19	22.64	22.81	22.86	22.95	23.21 23.19 *	22.54	23.17

Table 4. Comparison with other methods on 5-way 5-shot tasks. Methods appended with the suffix ++ signify the outcomes achieved by integrating DCS with classical FSL algorithms. Values positioned above the splitting line indicate the utilization of the near-domain strategy, while values below the line represent the implementation of the far-domain strategy. The highest accuracies are highlighted in bold, and the second-highest accuracies are underlined. The value marked with * in the final row represents the result obtained through finetuning.

Dataset	ProtoNet [8]	Baseline [11]	CIM [39]	RFS [28]	DeepBDC [9]	FRN [10]	RFS++	DeepBDC++	FRN++
miniIN-N	76.38	77.40	80.84	80.88	83.59	82.30	81.12	83.59	82.30
miniIN-V	79.97	80.92	82.28	82.81	84.46	83.98	84.11	84.46	83.98
MSCOCO	58.62	58.17	59.48	59.12	63.38	61.07	61.11	63.38	61.07
Textures	65.46	64.70	68.20	66.88	71.84	70.52	67.31	71.84	70.52
Fungi	55.36	50.68	58.89	57.82	62.64	63.17	59.76	62.64	63.17
CUB	59.88	56.85	64.26	63.66	73.39	72.75	64.32	73.39	72.75
VGG Flower	86.07	79.80	87.81	86.74	91.16	92.39	90.60	93.57	92.39
Crop Disease	86.27	78.19	89.37	87.90	89.50	91.60	91.10	91.86	91.60
Traffic signs	72.84	65.54	77.56	76.10	81.35	83.70	79.48	83.32	86.90
EuroSAT	79.53	74.71	81.19	79.70	82.70	80.83	83.00	83.61	82.71
Aircraft	36.80	32.31	42.14	42.43	42.50	49.07	44.13	46.77	52.42
Omniglot	94.85	86.10	95.39	94.67	94.42	96.48	96.65	95.97	98.02
ISIC	41.10	38.88	44.38	42.39	46.85	46.34	43.88	46.88	46.89
Quick Draw	73.67	62.02	76.46	74.31	75.23	80.57	78.18	78.77	81.12
ChestX	26.01	23.94	25.67	25.45	25.97	25.89	26.63 26.69 *	26.57	26.22

Table 5. Comparison with other methods on 5-way 20-shot tasks. The highest accuracies are highlighted in bold, while the second-highest accuracies are underlined. Additionally, the value marked with an asterisk (*) in the last row represents accuracy obtained through finetuning.

Dataset	ProtoNet [8]	Baseline [11]	CIM [39]	RFS [28]	DeepBDC [9]	FRN [10]	RFS++	DeepBDC++	FRN++
miniIN-N	84.96	81.30	87.29	86.76	89.72	88.87	87.23	89.72	88.87
miniIN-V	86.32	84.31	87.95	87.36	89.45	89.55	87.23	89.45	89.55
MSCOCO	67.10	62.69	69.13	67.58	73.04	71.74	69.02	73.04	71.74
Textures	74.58	70.41	77.77	76.20	80.55	80.68	78.62	80.55	80.68
Fungi	65.13	55.46	69.98	68.41	73.99	74.70	72.44	73.99	74.70
CUB	70.43	62.02	75.69	74.08	82.58	82.32	77.37	84.61	82.32
VGG Flower	90.96	83.89	93.98	93.39	95.77	96.15	95.86	96.93	96.15
Crop Disease	92.18	80.66	95.10	94.48	95.05	95.33	96.10	96.45	95.33
Traffic signs	81.49	71.32	90.23	89.82	91.40	93.12	91.89	93.27	93.70
EuroSAT	85.14	77.65	88.05	87.36	89.47	82.67	89.30	90.11	83.24
ISIC	47.50	42.96	55.40	53.68	56.11	54.90	55.91	56.93	55.74
Quick Draw	78.85	64.92	85.39	84.16	83.66	82.79	86.73	86.26	83.65
ChestX	29.78	24.62	29.12	28.32	30.59	29.62	31.19 32.60 *	31.36	29.94

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, B.; Yu, D. A Divide-and-Conquer Strategy for Cross-Domain Few-Shot Learning. Electronics 2025, 14, 418. https://doi.org/10.3390/electronics14030418

AMA Style

Wang B, Yu D. A Divide-and-Conquer Strategy for Cross-Domain Few-Shot Learning. Electronics. 2025; 14(3):418. https://doi.org/10.3390/electronics14030418

Chicago/Turabian Style

Wang, Bingxin, and Dehong Yu. 2025. "A Divide-and-Conquer Strategy for Cross-Domain Few-Shot Learning" Electronics 14, no. 3: 418. https://doi.org/10.3390/electronics14030418

APA Style

Wang, B., & Yu, D. (2025). A Divide-and-Conquer Strategy for Cross-Domain Few-Shot Learning. Electronics, 14(3), 418. https://doi.org/10.3390/electronics14030418

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Divide-and-Conquer Strategy for Cross-Domain Few-Shot Learning

Abstract

1. Introduction

2. Related Works

3. Main Approaches

3.1. Problem Definition

3.2. Metric Approaches

3.2.1. Domain Distance

3.2.2. Difficulty

3.3. Near-Domain Strategy

3.4. Far-Domain Strategy

3.5. Finetuning Strategy for Difficult Tasks

4. Experiment

4.1. Implementation Details

4.2. Datasets

4.3. Metric Results

4.4. Influence of Source Data and Backbone

4.5. Effectiveness of DCS

4.6. Impact of Different Principal Components

4.7. Comprehensive Experiment

4.8. Discussion

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI