Article

Few-Shot Classification Based on Sparse Dictionary Meta-Learning

School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650500, China
*
Author to whom correspondence should be addressed.
Mathematics 2024, 12(19), 2992; https://doi.org/10.3390/math12192992
Submission received: 30 August 2024 / Revised: 19 September 2024 / Accepted: 23 September 2024 / Published: 26 September 2024

Abstract

In the field of Meta-Learning, traditional methods for addressing few-shot learning problems often rely on leveraging prior knowledge for rapid adaptation. However, when faced with insufficient data, meta-learning models frequently encounter challenges such as overfitting and limited feature extraction capabilities. To overcome these challenges, an innovative meta-learning approach based on Sparse Dictionary and Consistency Learning (SDCL) is proposed. The distinctive feature of SDCL is the integration of sparse representation and consistency regularization, designed to acquire both broadly applicable general knowledge and task-specific meta-knowledge. Through sparse dictionary learning, SDCL constructs compact and efficient models, enabling the accurate transfer of knowledge from the source domain to the target domain, thereby enhancing the effectiveness of knowledge transfer. Simultaneously, consistency regularization generates synthetic data similar to existing samples, expanding the training dataset and alleviating data scarcity issues. The core advantage of SDCL lies in its ability to preserve key features while ensuring stronger generalization and robustness. Experimental results demonstrate that the proposed meta-learning algorithm significantly improves model performance under limited training data conditions, particularly excelling in complex cross-domain tasks. On average, the algorithm improves accuracy by 3%.

1. Introduction

In Meta-Learning, few-shot learning is crucial as models must quickly adapt to new tasks with minimal data. Traditional methods accelerate adaptation by learning shared representations or initializing model parameters, but their reliance on limited data often leads to overfitting and insufficient feature extraction, affecting performance on new tasks. While deep learning excels on large-scale datasets, it requires extensive labeled data and struggles with domain shifts, limiting adaptability to new domains [1]. Meta-learning [2,3] addresses these issues by introducing a new paradigm that allows models to rapidly acquire meta-knowledge from a few examples through network optimization, enabling fast task adaptation and improved few-shot learning. It also supports generalization beyond the training distribution via cross-domain knowledge transfer. Few-shot classification (FSC) [4,5] focuses on classifying new categories with limited labeled samples. Current optimization-based FSC methods enhance this by fine-tuning the meta-learning model on new tasks with limited data [6]. However, domain differences between source and target data often hinder performance, motivating the study of Cross-Domain Few-shot Classification (CD-FSC) [7,8,9,10], which trains models to extract features from one or more source domains and tests them on unseen domains. Figure 1 illustrates the distinction between FSC and CD-FSC.
Many few-shot learning methods have been proposed to improve the performance of models in few-shot image classification. Finn et al. [11] proposed a meta-learning-based algorithm that enables models to quickly adapt to new tasks using a small amount of training data. Vinyals et al. [12] combined metric learning and attention-based LSTM to improve classification performance on query images. Chen et al. [7] achieved unexpectedly strong performance with a simple approach: training a feature extractor on base category data and then using the labeled samples of a new category to train a task-specific classifier. Additionally, Li et al. [14] designed conditional Wasserstein Generative Adversarial Networks to address the issue of insufficient data. Most methods focus on optimizing the training process and learning metrics while ignoring the ground-truth distribution of samples in the feature space. Yang et al. [15] proposed a distribution calibration (DC) strategy that transfers similar statistical information to the few-shot categories by measuring the strength of the match between the base categories and the samples in the support set. This method is similar to sparse representation in that it fully utilizes sample distribution information, but it is computationally expensive.
To address this challenge, we draw on Yao et al. [16], creating new tasks via linear interpolation between randomly selected pairs of meta-training tasks. This increases task distribution density and significantly reduces computational costs while maintaining effectiveness. Linear interpolation assumes a linear structure for the task space; in reality, however, the task space is globally nonlinear and only locally linear. Since smooth manifolds [17,18,19,20] are locally Euclidean at each point (as shown in Figure 2), we subdivide the data manifold into localized parts, treating each part as a collection of linear bases and thus forming a redundant basis [21,22,23]. These redundant bases efficiently represent the entire data manifold in lower dimensions, and sparse representation then uses a small subset of these bases for compact data representation and processing. However, clustering-based subdivision increases the computational burden, so adaptive subdivision is proposed instead. Notably, dictionary learning [24,25,26] seeks a set of basis vectors that efficiently represent samples by minimizing the representation error, which aligns with the concept of adaptive subdivision. Thus, we apply dictionary learning instead of traditional subdivision methods.
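As a toy illustration of the task-interpolation step described above, the following sketch mixes the features and one-hot labels of two sampled tasks with a random weight. It is not the construction used in [16]; the function name, tensor shapes, and the Beta-distributed mixing coefficient are illustrative assumptions.

```python
import torch

def interpolate_tasks(feats_a, labels_a, feats_b, labels_b, alpha):
    """Linearly mix the features and (soft) labels of two sampled tasks."""
    mixed_feats = alpha * feats_a + (1.0 - alpha) * feats_b
    mixed_labels = alpha * labels_a + (1.0 - alpha) * labels_b
    return mixed_feats, mixed_labels

# Example: interpolate two 5-way 1-shot tasks with 64-dimensional features.
feats_a, feats_b = torch.randn(5, 64), torch.randn(5, 64)
labels_a, labels_b = torch.eye(5), torch.eye(5)            # one-hot labels
alpha = torch.distributions.Beta(2.0, 2.0).sample()        # random mixing coefficient
new_feats, new_labels = interpolate_tasks(feats_a, labels_a, feats_b, labels_b, alpha)
```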
An effective source-to-target domain adaptation model was developed using dictionary learning, achieving efficient knowledge transfer. However, traditional cross-domain adaptation methods often struggle under limited data conditions, as models are prone to overfitting or fail to capture sufficient data features, leading to reduced generalization performance. To address this issue, consistency regularization [27,28,29] was introduced, which constrains model behavior during training and generates new data similar to existing data using the consistency assumption, thus enriching the training set. Combined with meta-learning, this approach enhances the training set through data augmentation techniques, mitigating data scarcity and improving generalization. Sparse dictionary-based meta-learning for few-shot classification is designed to tackle the challenge of classifying new categories with limited labeled data, helping models quickly learn and adapt to new tasks in data-scarce settings—a critical requirement for real-world applications.
When facing challenges such as data scarcity and domain shifts, the key goal of this approach is to effectively leverage the limited data to extract both domain-general and domain-specific meta-knowledge, enhancing the model’s generalization and adaptability. In summary, a novel method based on Sparse Dictionary and Consistency Learning (SDCL) is proposed. By learning sparse representations and incorporating consistency regularization, this method optimizes model performance to effectively address few-shot learning challenges across diverse domains and tasks. Unlike traditional single meta-models, this approach learns a set of more adaptable basis vectors, enabling a more accurate representation of subtle differences between domains and tasks. This improves adaptability to complex task environments and maximizes the utility of limited training data. Compared to conventional methods, SDCL integrates sparse representations and consistency regularization, significantly improving the model’s learning ability and generalization under constrained data conditions. Our method not only demonstrates superior classification accuracy and domain adaptation in experiments but also ensures practical feasibility and efficiency by effectively managing computational costs.
The main contributions of the proposed method are summarized as follows:
  • A meta-learning framework named SDCL, based on sparse representation and consistency regularization, is proposed to address the meta-learning problem across different domains and task types under limited data.
  • By utilizing dictionary learning, an effective transfer model from the source domain to the target domain is constructed, facilitating the efficient transfer of knowledge. This approach captures both generic and domain-specific meta-knowledge, thus aiding in addressing the challenges posed by limited sample sizes and cross-domain learning.
  • The introduction of the consistency regularization technique involves generating new data similar to existing ones based on the consistency assumption, thereby expanding the training set. This process enhances the model’s generalization capability and alleviates the risk of overfitting.
The remainder of this paper is organized as follows. Section 2 provides an overview of related work on few-shot learning, sparse dictionary learning, and consistency regularization. Section 3 presents the proposed Sparse Dictionary and Consistency Learning (SDCL) framework in detail, including the key techniques and methodologies employed. Section 4 describes the experimental setup and results, demonstrating the effectiveness of the proposed approach. Finally, Section 5 concludes the paper and discusses potential future research directions.

2. Related Work

2.1. Few-Shot Classification

Few-shot classification [30] aims to train a classifier capable of recognizing new categories using only a limited number of labeled samples, as illustrated in Figure 1a. To tackle this challenge, few-shot classification methods typically employ a task-based learning mechanism [12], where a large number of tasks are constructed from a relevant dataset to simulate scenarios with a small number of training samples. These methods are generally classified into two broad categories: optimization-based and metric-based approaches.
Optimization-based methods, such as Model-Agnostic Meta-Learning (MAML) [11] and Almost No Inner Loop (ANIL) [31], focus on efficiently tuning network parameters using gradient descent. This involves iteratively refining the initialization, update directions, and learning rates of the network, allowing for quick adaptation to new tasks. While flexible and adaptive, the iterative nature of these gradient-based updates can introduce computational overhead, particularly when applied to large-scale datasets or complex tasks. On the other hand, metric-based approaches, such as Prototypical Networks (ProtoNet) [32] and Matching Networks (MatchNet) [12], adopt a different strategy. These methods rely on learning a metric space where classification is performed by measuring the similarity or distance between the query image and representative samples of each class. Instead of directly optimizing network parameters, the focus is on ensuring that samples from the same category are closer together in the learned metric space, while those from different categories are farther apart. Although metric-based methods are computationally efficient, their performance may degrade when the learned metric space fails to adequately capture class variability.
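For concreteness, the metric-based idea can be sketched in a few lines of PyTorch: class prototypes are mean embeddings of the support samples, and a query image is assigned to the nearest prototype. This is only an illustrative sketch of the ProtoNet-style strategy, not the implementation of any cited method.

```python
import torch

def prototype_logits(support_emb, support_labels, query_emb, n_way):
    """support_emb: (n_support, d); support_labels: (n_support,); query_emb: (n_query, d)."""
    prototypes = torch.stack([support_emb[support_labels == c].mean(dim=0)
                              for c in range(n_way)])        # one mean embedding per class
    return -torch.cdist(query_emb, prototypes) ** 2          # nearer prototype -> larger logit

# 5-way 5-shot episode with 64-dimensional embeddings and 15 query images.
support = torch.randn(25, 64)
labels = torch.arange(5).repeat_interleave(5)
query = torch.randn(15, 64)
predictions = prototype_logits(support, labels, query, n_way=5).argmax(dim=1)
```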
The proposed method builds on the strengths of both optimization-based and metric-based approaches. While Prototypical Networks and MAML each have distinct advantages in terms of computational efficiency and task adaptability, the proposed approach introduces a more robust feature extraction mechanism. By leveraging sparse dictionary and consistency learning (SDCL), it ensures more consistent generalization across tasks, even in the presence of limited labeled samples. Compared to Prototypical Networks, the proposed method addresses the limitations of fixed metric spaces by enabling more dynamic feature representation. Similarly, in comparison to MAML, it reduces the computational burden associated with iterative gradient updates by focusing on efficient feature alignment and representation within the meta-learning process.

2.2. Sparse Dictionary Learning

Sparse dictionary learning involves acquiring a dictionary that efficiently processes high-dimensional signals by converting them into linear combinations of low-dimensional sparse components. Throughout the learning process, the system dynamically adjusts this dictionary to achieve optimal sparse representations tailored to specific tasks. Li et al. [33] proposed an adaptive distributional sparse coding method, which adjusts the weights of sparse coding coefficients to enhance cross-domain feature representation and classification effectiveness. He et al. [34] introduced a multi-source domain adaptation method combining dictionary learning and low-rank representation to facilitate effective cross-domain transfer learning with limited samples. Wang et al. [35] developed a graph-based weighted dictionary learning approach, constructing cross-domain data graphs to learn shared feature representations, which particularly benefits few-shot classification. Guo et al. [36] presented a knowledge distillation method using sparse dictionary learning, transferring knowledge from source-domain teacher models to target-domain student models for effective domain adaptation. Cao et al. [37] proposed adaptive structural sparse regularization, combining sparse dictionary learning and structural sparse regularization to adaptively enhance feature representation and classification in cross-domain scenarios. Finally, Zhang et al. [38] introduced a knowledge transfer-based method for few-shot image recognition, leveraging cross-domain shared features and knowledge transfer to achieve efficient recognition of few-shot images.
In recent years, the latest trends [39,40] in sparse dictionary learning have focused on optimizing dictionary atom selection to balance underfitting and overfitting, improving the prediction of spatiotemporal data, especially in geotechnical monitoring and related fields. However, sparse dictionary learning still faces challenges in handling large-scale, high-dimensional data, particularly in terms of computational complexity and real-time processing, with model interpretability being a significant concern. Future research directions are expected to focus on developing more scalable and interpretable models, potentially integrating deep learning techniques to enhance feature extraction. Additionally, reducing reliance on labeled data and promoting unsupervised and self-supervised learning will be critical for advancing cross-domain and few-shot learning applications.

2.3. Consistency Regularization

Consistency regularization aims to ensure the stability of the model with respect to small input perturbations by introducing additional constraints that force the model to be consistent across different views or different samples of the same category. Specifically, consistency regularization uses the consistency assumption to generate new samples by applying perturbations or transformations to the original training data and computes a consistency loss between the predicted outputs of the original and perturbed samples. Incorporating this loss into the total loss optimization allows the model to learn to produce consistent outputs for similar inputs, improving generalization performance and robustness while effectively expanding the dataset and reducing overfitting; the process is shown in Figure 3. Zhang et al. [41] enhanced the model's generalization ability by introducing a consistency regularization term that forces the model to be consistent for different samples of the same category. Similarly, Zhu et al. [42] improved the robustness and generalization ability of the model by introducing a consistency loss function to ensure that the model remains consistent with fewer samples. Cao et al. [43] proposed a meta-consistency learning method that improves performance when the number of tasks is insufficient by introducing a consistency loss in the meta-learning process. In addition, Chen et al. [44] proposed a task-consistent contrastive learning method, which improves the model's generalization ability by introducing task consistency constraints in contrastive learning. Sohn et al. [45] proposed a simplified semi-supervised learning method that improves model performance by combining consistency regularization with confidence thresholds. Finally, Yang et al. [46] proposed a task-aware data augmentation method that improves performance in low-sample settings by taking the features of the task into account.
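The common recipe behind these methods, as also summarized in Figure 3, can be sketched as follows; the function and argument names are illustrative rather than taken from any cited implementation.

```python
import torch.nn.functional as F

def consistency_step(model, augment, x, y, weight=1.0):
    """One training step combining a supervised loss with a consistency term."""
    x1, x2 = augment(x), augment(x)            # two stochastic augmentations of the same batch
    y1, y2 = model(x1), model(x2)              # predictions under dropout / noise
    consistency_loss = F.mse_loss(y1, y2)      # squared difference between the two outputs
    task_loss = F.cross_entropy(y1, y)         # supervised cross-entropy on one view
    return task_loss + weight * consistency_loss
```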

3. Method

In meta-learning, each task $T_i$ is assumed to be independently and identically distributed (i.i.d.), extracted from the task distribution $P(\mathcal{T})$ related to the dataset $D_i$. The input to meta-learning is a sequence of tasks $\{T_i\}_{i=1}^{n}$ following this distribution, where $n$ represents the number of meta-training tasks, and task construction follows the N-way K-shot approach. Each task comprises a support set $D_i^s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ and a query set $D_i^q = \{(x_j^q, y_j^q)\}_{j=1}^{n_q}$, with $n_s$ and $n_q$ denoting the number of samples in the support and query sets, respectively. In essence, each task consists of data for training (support set) and data for evaluating the model (query set).
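A minimal sketch of the N-way K-shot task construction described above is given below; the data layout (a list of image-label pairs) and the function name are our own illustrative choices.

```python
import random
from collections import defaultdict

def sample_task(dataset, n_way=5, k_shot=1, n_query=15):
    """dataset: iterable of (image, label) pairs. Returns the support set D_i^s and query set D_i^q."""
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)
    classes = random.sample(sorted(by_class), n_way)              # choose N classes
    support, query = [], []
    for new_label, c in enumerate(classes):
        samples = random.sample(by_class[c], k_shot + n_query)    # K support + query examples per class
        support += [(x, new_label) for x in samples[:k_shot]]
        query += [(x, new_label) for x in samples[k_shot:]]
    return support, query
```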
The proposed sparse dictionary consistency learning scheme consists of four main components for efficient encoding of task features and model learning: (1) a task encoder, which generates task features by embedding vectors; (2) consistency regularization, which constrains the task encoder through consistency regularization; (3) a sparse dictionary, used to represent inter-task relationships and feature sharing and (4) task-specific modulation, which utilizes task-specific knowledge to tune the globally shared initial parameter θ 0 . Figure 4 illustrates the overall framework of the proposed sparse dictionary consistency learning approach.

3.1. Task Encoder

When handling classification or regression tasks, a prototype-based relationship graph [32] is employed to represent the connections between samples and to mitigate the impact of anomalous samples. In this graph [47], each vertex corresponds to the prototype of a distinct class, while edges and weights are determined by the similarity between prototypes. Because the method does not rely directly on the original samples, it handles anomalous samples effectively, particularly when the number of training samples is limited. In summary, the use of prototype-based graphs allows for more robust handling of anomalous samples, thereby enhancing the accuracy of classification or regression tasks. Specifically, for the prototype-based classification problem, the prototype of class $k_i$ is defined as:
$$\Psi(k_i) = \frac{1}{N}\sum_{j=1}^{N}\varphi(x_j)$$
where $N$ represents the number of images in each class, and the projection function $\varphi(x)$ is used to map the image into the embedding space.
In the prototype definition [32], for the i-th task, the set of prototypes is $C_i = [\Psi(k_t)]_{t=1}^{N}$, where $N$ represents the number of categories in the N-way classification. To obtain the task representation $q_i$ for the i-th task, the Mean Pooling operator is employed to aggregate the prototypes. This operation is expressed as:
$$q_i = \mathrm{MeanPool}\big(\mathrm{RNN}_{enc}([\Psi(k_t)]_{t=1}^{N})\big) = \frac{1}{N}\sum_{t=1}^{N}\mathrm{RNN}_{enc}\big(\Psi(k_t)\big)$$
Using a recurrent neural network ($\mathrm{RNN}_{enc}$), the prototypes $C_i$ are encoded into dense representations, completing the encoding process. Subsequently, the recurrent decoder ($\mathrm{RNN}_{dec}$) is employed to reconstruct the encoded representations back to the original prototypes $C_i$. For each task i, the encoder loss $L_{q_i}$ quantifies the loss incurred during the encoding of the prototypes $C_i$. The task encoder loss $L_q$ aggregates the losses of all tasks, serving as a measure of overall loss. It is expressed as follows:
$$L_{q_i} = \big\| C_i - \mathrm{RNN}_{dec}\big(\mathrm{RNN}_{enc}(C_i)\big) \big\|_2^2$$
$$L_q = \sum_{i=1}^{k} L_{q_i}$$
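The task encoder can be sketched in PyTorch as follows. The paper does not specify the recurrent cell type or layer sizes, so the GRU layers and the 128-dimensional prototypes below are assumptions; only the overall structure (encode, mean-pool, decode, reconstruct) follows the equations above.

```python
import torch
import torch.nn as nn

class TaskEncoder(nn.Module):
    """Encodes a task's class prototypes into a task embedding q_i and reconstructs them."""
    def __init__(self, dim=128, hidden=128):
        super().__init__()
        self.rnn_enc = nn.GRU(dim, hidden, batch_first=True)
        self.rnn_dec = nn.GRU(hidden, dim, batch_first=True)

    def forward(self, prototypes):                  # prototypes C_i: (1, N, dim)
        encoded, _ = self.rnn_enc(prototypes)       # RNN_enc(C_i): (1, N, hidden)
        q_i = encoded.mean(dim=1)                   # mean-pooled task representation q_i
        recon, _ = self.rnn_dec(encoded)            # RNN_dec(RNN_enc(C_i)): (1, N, dim)
        loss_q_i = ((prototypes - recon) ** 2).sum()  # encoder loss for this task
        return q_i, recon, loss_q_i

prototypes = torch.randn(1, 5, 128)                 # N = 5 class prototypes, 128-d embeddings
q_i, recon, loss_q_i = TaskEncoder()(prototypes)
```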

3.2. Consistency Regularization

Consistency regularization [48] employs the consistency assumption to generate new samples by applying perturbations or transformations to the original training data. It calculates the consistency loss between the predicted outputs of the original and perturbed samples. Incorporating this loss into the total loss optimization process enables the model to learn to produce consistent outputs for similar inputs, thereby improving generalization performance and robustness. This approach effectively expands the dataset and reduces overfitting. By constraining the behavior of the task encoder on training data through consistency regularization, prototypes of tasks can be utilized for consistency constraints to ensure that prototypes within the same task are represented as consistently as possible under the task encoder. The objective of consistency regularization is to minimize the reconstruction error between the task encoding q i and its corresponding set of prototypes C i .
The formula for consistency regularization can be expressed as:
$$L_{consistency} = \sum_{i=1}^{N} \big\| q_i - \mathrm{RNN}_{dec}\big(\mathrm{RNN}_{enc}(C_i)\big) \big\|_2^2$$
By minimizing the consistency loss $L_{consistency}$, the task encoder can be made to produce more consistent encoded representations of the training data, thus improving the generalization ability of the model.
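A possible implementation of this consistency term, reusing the TaskEncoder sketch from Section 3.1, is shown below. Since $q_i$ is a single pooled vector while the decoder returns one vector per prototype, the decoded prototypes are mean-pooled here so that the two quantities are comparable; this pooling step is our assumption, not a detail stated in the paper.

```python
def consistency_loss(task_encoder, prototypes, q_i):
    """Reconstruction-style consistency between the task encoding and the decoded prototypes."""
    _, recon, _ = task_encoder(prototypes)           # RNN_dec(RNN_enc(C_i)), shape (1, N, dim)
    return ((q_i - recon.mean(dim=1)) ** 2).sum()    # compare q_i with the pooled reconstruction
```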

3.3. Sparse Representation of Task Features

The set of redundant bases obtained from the data-driven decomposition is referred to as the redundant dictionary, denoted as $D = [\varphi_1, \varphi_2, \ldots, \varphi_m]$, where each $\varphi_i$ represents an atom. The total number $m$ of atoms exceeds the dimensionality of each atom. For each redundant basis atom $\varphi_i$, there exists a corresponding coefficient vector, and together these form the coefficient matrix $S = [s_1, s_2, \ldots, s_m]$. These coefficients form part of the sparse representation model. In other words, the sparse representation model can be expressed as:
$$I(x_i^s, y_i^s) = \sum_{i=1}^{m} \varphi_i s_i$$
In this equation, $I(x_i^s, y_i^s)$ is the signal to be represented, obtained from the support set $D_i^s$, and $s_i$ is the coefficient vector corresponding to the redundant basis $\varphi_i$. For a given input signal $I$, the goal of sparse coding is to find the most suitable sparse representation $z$ such that the input signal can be approximately reconstructed by a linear combination of a few basis vectors in the dictionary, as shown in Equation (7):
$$\min_{z} \; \frac{1}{2}\big\| I(x_i^s, y_i^s) - Dz \big\|_2^2 + \lambda \|z\|_1$$
where $\|\cdot\|_2$ denotes the $L_2$ norm, $\|\cdot\|_1$ denotes the $L_1$ norm, and $\lambda$ is the weight of the sparsity penalty term.
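The sparse-coding problem in Equation (7) is a standard Lasso-type objective; a minimal iterative soft-thresholding (ISTA) solver is sketched below for illustration. It is not the optimization procedure used by the authors, and the dictionary size and penalty weight are arbitrary.

```python
import torch

def ista(D, signal, lam=0.1, n_iter=200):
    """Solve min_z 0.5 * ||signal - D @ z||_2^2 + lam * ||z||_1 by iterative soft-thresholding."""
    step = 1.0 / (torch.linalg.matrix_norm(D, ord=2) ** 2)     # 1 / Lipschitz constant of the gradient
    z = torch.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - signal)                          # gradient of the quadratic term
        z = z - step * grad
        z = torch.sign(z) * torch.clamp(z.abs() - step * lam, min=0.0)   # soft-thresholding
    return z

D = torch.randn(64, 256)          # redundant dictionary: 256 atoms of dimension 64 (m > d)
z = ista(D, torch.randn(64))      # sparse code for one support-set signal
```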
The aforementioned procedure yields an efficient dictionary, which is then utilized to sparsely represent new tasks. In this study, the K-sparse autoencoder [49] was selected to generate sparse representations due to its advantages in controlling sparsity and enabling efficient feature extraction. This method enhances both the interpretability and learning efficiency of the model by enforcing sparsity constraints. By promoting sparsity in the hidden layers, the model can capture the core features of tasks more effectively, reducing the risk of overfitting and improving generalization performance. The K-sparse autoencoder significantly improves sparsity control through the Top-K operation. In the hidden layers of the autoencoder, the Top-K operation retains the K neurons with the highest activation values while setting the others to zero. This approach is similar to the ReLU activation function with thresholds but further enforces the sparsity constraint, effectively minimizing unnecessary noise and redundant information. Sparse representations not only provide a more compact description of the data but also improve the model’s generalization ability and computational efficiency.
The construction of the K-sparse autoencoder involves two main modules: an encoder and a decoder. As shown in Figure 5, in the encoding stage the original task representation $q_i$ is processed through a series of hidden layers with 256, 512, 1024, and 128 neurons, respectively. To regulate the initialization parameters, the number of neurons in the last hidden layer is kept equal to the dimensionality of the input task representation $q_i$, i.e., 128 hidden units. The last hidden layer $p_i$ then yields a sparse representation $t_i$ by retaining only the K maximal units and setting the remaining units to 0, which in turn enhances the task representation. In the decoding phase, a symmetric network with fewer hidden units reconstructs the output $w_i$. The whole learning process is described as follows.
First, the original task representation $q_i$ is encoded to obtain the hidden representation $p_i$:
$$p_i = \sigma(A^{\mathsf{T}} q_i) + b$$
where $A^{\mathsf{T}}$ is the weight matrix, $b$ is the bias vector, and $\sigma$ is an element-wise non-linear activation function. Next, the K largest units in the hidden representation $p_i$ are retained, and the rest are set to 0 to obtain the sparse representation $t_i$:
$$t_i[j] = \begin{cases} p_i[j], & \text{if } j \in \operatorname{argmax}(p_i, K) \\ 0, & \text{otherwise} \end{cases}$$
where $\operatorname{argmax}(p_i, K)$ denotes the set of indices corresponding to the $K$ largest elements in $p_i$.
Finally, the output $w_i$ is computed from the sparse representation $t_i$, and the loss function $L_{sparse}$ involving the sparse representation is defined as:
$$w_i = \sigma(A t_i) + b$$
$$L_{sparse} = \|q_i - w_i\|_2^2 + \mu \|t_i\|_1$$
where $A$ is the decoder weight matrix (the transpose of the encoder weight matrix $A^{\mathsf{T}}$), $b$ is the bias vector, and $\mu$ is a hyperparameter controlling the sparsity. The objective of this function is to minimize the reconstruction error while encouraging sparsity in the representation $t_i$.
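A compact PyTorch sketch of this K-sparse autoencoder is given below. The encoder layer sizes (256, 512, 1024, 128) follow the text; the small symmetric decoder and the specific hyperparameter values are assumptions about details the paper leaves open.

```python
import torch
import torch.nn as nn

class KSparseAutoencoder(nn.Module):
    """Encoder layers of 256, 512, 1024, and 128 units as described above; the small
    decoder (64 hidden units) reflects the 'fewer hidden units' remark and is an assumption."""
    def __init__(self, dim=128, k=50):
        super().__init__()
        self.k = k
        self.encoder = nn.Sequential(
            nn.Linear(dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, dim))
        self.decoder = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))

    def top_k(self, p):                                 # keep the K largest activations, zero the rest
        values, indices = p.topk(self.k, dim=-1)
        return torch.zeros_like(p).scatter(-1, indices, values)

    def forward(self, q):                               # q_i: (batch, dim)
        p = self.encoder(q)                             # hidden representation p_i
        t = self.top_k(p)                               # sparse representation t_i
        w = self.decoder(t)                             # reconstruction w_i
        return t, w

q = torch.randn(4, 128)
t, w = KSparseAutoencoder(dim=128, k=50)(q)
loss_sparse = ((q - w) ** 2).sum() + 0.01 * t.abs().sum()   # L_sparse with an illustrative mu = 0.01
```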

3.4. Task-Specific Modulation

After obtaining the task representations $q_i$ and $t_i$, a modulation function is applied to tailor the task-specific information to a globally shared initial parameter $\theta_0$. The purpose of task modulation is to identify a meta-learner capable of generalizing effectively and performing new tasks quickly and accurately, even when faced with limited training data. Given the augmented task representation $t_i$, the task representation $q_i$ is used to adjust the globally shared initial parameter $\theta_0$. Simultaneously, the structure of the task encoder and augmented task is explored to learn a specific and sparse initialization strategy. This method enhances the model's adaptability and efficiency, enabling it to better address various new tasks. The process is represented as follows:
$$\theta_0^i = \xi\big(H_F(q_i \oplus t_i) + m_F\big) \circ \theta_0$$
where $H_F$ and $m_F$ are the parameters of the fully connected layer acting on $q_i$ and $t_i$, respectively. For each task $T_i$, the modulated initialization $\theta_0^i$ is used to find the optimal classifier for the specific task, that is, the parameters $\theta_i$, which are obtained through the gradient descent process. In the classical MAML method, the meta-learning optimization process begins from the optimal initialization parameter $\theta_0$. By combining the task encoder loss $L_q$ and the sparse representation loss $L_{sparse}$, the overall objective function is defined as:
$$\min_{\Phi} L_{all} = \min_{\Phi} \big( L + \omega_1 L_q + \omega_2 L_{sparse} \big)$$
where $\omega_1$ and $\omega_2$ balance the importance of the three loss terms, and $\Phi$ denotes all learnable parameters.
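The modulation step and the combined objective can be sketched as follows, assuming that the function $\xi$ is a sigmoid gate, that the two task representations are concatenated before the fully connected layer, and that the modulation is an element-wise product over the flattened initialization; all of these are our assumptions about unspecified details.

```python
import torch
import torch.nn as nn

class TaskModulation(nn.Module):
    """Maps the concatenated task representations to a gate that scales theta_0 element-wise."""
    def __init__(self, task_dim=256, param_dim=1000):
        super().__init__()
        self.fc = nn.Linear(task_dim, param_dim)      # plays the role of H_F (weights) and m_F (bias)

    def forward(self, q_i, t_i, theta_0):
        gate = torch.sigmoid(self.fc(torch.cat([q_i, t_i], dim=-1)))
        return gate * theta_0                         # task-specific initialization theta_0^i

theta_0 = torch.randn(1000)                           # flattened globally shared initialization
theta_0_i = TaskModulation()(torch.randn(128), torch.randn(128), theta_0)

# Combined objective: task loss plus weighted encoder and sparsity terms
omega_1, omega_2 = 0.01, 40.0                         # best values reported in Section 4.2.1
# loss_all = loss_task + omega_1 * loss_q + omega_2 * loss_sparse
```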

4. Experiment

The effectiveness of the proposed SDCL model was validated through the evaluation of a series of few-shot classification tasks, including five-way one-shot and five-way five-shot settings. The experimental setup, datasets, and baseline methodology are first detailed, followed by an in-depth analysis and validation of the model’s hyperparameter settings. An ablation study was then conducted to assess the impact of each module on overall model performance.

4.1. Experimental Settings

4.1.1. Datasets and Evaluation Metric

Experiments were conducted on four widely used few-shot learning benchmarks: the plain-multi datasets [13], comprising the four fine-grained image datasets CUB-Birds, Textures, Aircraft, and Fungi; the remote-sensing datasets UC Merced Land Use [50], AID [51], and NWPU-RESISC45 [52]; and the Cars-196 [53] and CIFAR-100 [54] datasets. These datasets were sampled from real-world scenarios and exhibit significant domain differences. The performance of different methods was evaluated using top-1 accuracy in both five-way one-shot and five-way five-shot settings. Details of the datasets are provided in Table 1.

4.1.2. Baseline Methods

To validate the effectiveness of SDCL, two categories of meta-learning methods are adopted as benchmarks. The first category includes gradient-based meta-learning methods, such as MAML [11], Multi-MAML [55], Meta-SGD [56], and BMG [57], as well as task-specific methods, such as HSML [13], ARML [58], and MUSML [59]. The second category consists of metric-based meta-learning methods, including ProtoNet [32] and Matching Networks [12]. In accordance with the traditional setup, nonparametric baselines are utilized exclusively for few-shot classification.

4.1.3. Implementation Details

Scenario tasks are constructed using a standard N-way, K-shot setup, where N classes are randomly sampled from a dataset with K examples per class. Compared to previous benchmarks that build tasks from a single dataset, these benchmarks are more heterogeneous and better aligned with real-world few-shot image categorization, as tasks are sampled across multiple datasets. Images were cropped to 84 × 84 pixels, and the batch size was set to 4. As per the experimental requirements, three independent runs were performed, and average results with 95% confidence intervals were reported. All models were implemented in PyTorch. Optimization was carried out using Adam with an initial learning rate of 0.1 , training until the learning rate decayed to 0.001 or the number of training rounds reached 100,000. The backbone network used was the standard Conv4 architecture.
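For reference, a configuration matching the reported settings might look as follows; the Conv4 block layout and the step-decay schedule used to bring the learning rate from 0.1 down to 0.001 are assumptions, since the paper does not give these details.

```python
import torch
import torch.nn as nn

def conv4(hidden=64):
    """Standard Conv4 backbone for 84x84 inputs: four conv-BN-ReLU-maxpool blocks."""
    def block(c_in, c_out):
        return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                             nn.BatchNorm2d(c_out), nn.ReLU(), nn.MaxPool2d(2))
    return nn.Sequential(block(3, hidden), block(hidden, hidden),
                         block(hidden, hidden), block(hidden, hidden), nn.Flatten())

model = conv4()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)                             # initial learning rate 0.1
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20000, gamma=0.5)   # decay schedule (assumed)
features = model(torch.randn(4, 3, 84, 84))                                          # meta-batch of 4 images
```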

4.2. Parameter Analysis

4.2.1. Weights $\omega_1$ and $\omega_2$

To balance the importance of the task encoder and task representation, experiments were conducted on the plain-multi dataset to investigate the effect of different values of $\omega_1$ and $\omega_2$ in five-way five-shot classification. The experimental results are shown in Table 2. The average training accuracy peaks at $\omega_1 = 0.01$ (64.21%), which is 5.64% higher than at $\omega_1 = 10$ (58.57%). The maximum average training accuracy is also achieved with $\omega_2 = 40$ (64.21%), which is approximately 3% higher than with $\omega_2 = 30$ and about 2% higher than with other $\omega_2$ values. These results indicate that $\omega_1$ and $\omega_2$ not only affect the training performance of the model but also significantly impact the final generalization ability. Thus, achieving few-shot generalization requires finding a balance between the task encoder and task components. By adjusting $\omega_1$ and $\omega_2$, effective integration of information from these two components can be achieved.

4.2.2. Task Representation Sparsity K

To explore the optimal representation of task specificity, a five-way one-shot setting was adopted on the plain-multi datasets. Figure 6 shows the testing accuracy under different sparsity levels, with $K$ values of 10, 30, 50, and 80. The experimental results indicate that appropriate sparsity improves task performance. Specifically, a $K$ value of 50 yields the best performance across the four sub-datasets of plain-multi.

4.3. Experimental Result

4.3.1. Performance Results of Plain-Multi Datasets

Experimental comparisons were conducted in two scenarios: five-way one-shot and five-way five-shot, with results compared to other meta-learning methods on plain-multi datasets using ProtoNet and MAML. The results, as shown in Table 3, indicate that the proposed method performs well in both cases, suggesting that SDCL can efficiently represent task features and enhance generalization across different tasks. The effectiveness of the method is thus validated. In the one-shot setting, with only one sample per category, the sparse coding phase allows the model to learn feature representations of each task, leading to improved generalization in the subsequent meta-learning phase. The model also demonstrates stronger classification ability in the meta-learning phase, further confirming its effectiveness in cross-task learning.

4.3.2. Performance Results of Remote-Sensing Datasets

The model was evaluated using several standard remote-sensing datasets, which capture a broader spectral range compared to natural RGB images. These datasets feature multiple channels corresponding to different spectral bands and information, testing the model’s ability to extract and fuse features from multimodal data. As shown in Table 4, the proposed method outperforms other baseline approaches in few-shot classification, enhancing accuracy and robustness. The rich information present in remote-sensing data benefits from sparse dictionaries, which improve decomposition and representation by capturing complexity through sparse coding of each space. The approach effectively addresses the challenges of wider spectra and multi-channel inputs, achieving superior performance in remote-sensing tasks by capturing nuances and intrinsic correlations in the data. The learned dictionary allows for effective representation of the diversity in remote-sensing targets and scenes, enabling quick adaptation to new few-shot tasks. Additionally, SDCL enhances the performance of metric-based methods (e.g., ProtoNet) for processing remote-sensing images, demonstrating its compatibility as a general framework.

4.3.3. Performance Results of Mixed Datasets

The Cars-196 and CIFAR-100 datasets, originating from different domains, form a mixed dataset that facilitates the evaluation of model performance in cross-domain scenarios. The model is trained separately on each domain dataset and then evaluated on the mixed dataset. For the Cars-196 dataset, which includes images from 196 automobile categories, four baseline few-shot classification methods, MatchingNet [12], ProtoNet [32], RelationNet [60], and GNNNet [47], were used for comparison. In contrast, the CIFAR-100 dataset, which contains 60,000 images categorized into 100 classes, is divided into 60 training classes, 20 validation classes, and 20 testing classes. The algorithms MAML [11], MAML+ALFA [61], Meta-AdaM [6], ProtoNet [32], e3bm [62], and Baseline [7] were selected for comparison. The experimental setup included both five-way one-shot and five-way five-shot scenarios. According to the results shown in Table 5, the proposed method outperforms the baseline methods. This improvement is primarily due to the application of sparse dictionaries, whose sparse coding effectively captures the complex structure and features of the data, thereby enhancing the model's generalization ability and performance in few-shot learning tasks.

4.4. Ablation Experiments

A comprehensive ablation study is conducted to explore the contributions of three key modules in the proposed architecture: sparse dictionary learning, top-K selection, and consistency regularization. Each module is removed individually, and changes in performance are analyzed to quantify their importance. For a thorough evaluation, ablation experiments were performed on four different datasets using MAML as the underlying meta-learner in five-way one-shot and five-shot settings. The results are presented in Table 6. Additionally, experiments were conducted with and without consistency regularization on three subsets of the remote sensing dataset for five-way one-shot tasks with varying numbers of tasks. These experiments assess the impact of consistency regularization on model performance and its potential role in cross-domain learning. The experimental results are summarized and presented in Figure 7 for further analysis and discussion. It is evident from the figure that consistency regularization (CR) has a positive impact on model performance across all datasets. The experimental results further demonstrate that CR significantly improves model accuracy in cross-domain learning tasks, especially when dealing with a smaller number of tasks, by effectively stabilizing and enhancing the model’s learning performance. Overall, the performance of CR across various datasets highlights its critical role in improving model accuracy.
Removing sparse dictionary learning leads to significant performance degradation across all datasets, and this module addresses the challenges of limited sample availability and context scarcity. Sparse dictionary learning enables us to learn more robust and discriminative meta-knowledge from limited training samples. It improves the sparsity of the learned representation and makes the encoding of input data more efficient and compact. As a result, it helps to better generalize and transfer learned meta-knowledge to unseen tasks. Removing the top-K selection resulted in a significant performance degradation. This module helps to identify the most informative and discriminating features for each task. By reducing noise and irrelevant information in the learned representation, this feature selection process enhances the model’s ability to generalize to new tasks. As a result, it can acquire task-specific knowledge more efficiently and improve its performance on unseen tasks. Removing consistency regularization similarly leads to performance degradation. This module enriches the training set by utilizing the consistency assumption to generate new data that are similar to existing data. By increasing the diversity and richness of the data, the model can better learn the intrinsic representation of the data which, in turn, improves its ability to generalize to unseen tasks. Thus, consistency regularization not only increases the size of the training set but also improves the generalization performance of the model and mitigates the risk of overfitting.
By analyzing the performance changes after the removal of individual modules, we can gain more insight into their specific contributions and their impact on the overall meta-learning performance. The results confirm the importance of these modules in improving the performance and efficiency of the model in meta-learning tasks.

5. Conclusions

This paper introduces a novel meta-learning framework, referred to as SDCL, designed to address meta-learning challenges across different domains and task types, particularly in scenarios with limited data. The framework is based on sparse representation and consistency regularization methods. First, an effective transfer model from the source domain to the target domain is constructed through dictionary learning, enabling successful knowledge transfer. This approach effectively captures both generic and domain-specific meta-knowledge, addressing the issues of scarce samples and cross-domain learning. Unlike traditional single meta-models, the proposed method aims to learn a set of more adaptable basis vectors from the data, allowing for a more precise representation of the subtle differences between various domains and tasks. Secondly, a consistency regularization technique is introduced to constrain the model's behavior on the training data by applying consistency constraints during the model training process. Based on the consistency assumption, new data similar to the existing data are generated, thereby enriching the training set. By combining consistency regularization with meta-learning, the framework cleverly expands the training sample size through data augmentation techniques. This approach not only mitigates the problems associated with insufficient data volume but also enhances the model's generalization ability, effectively overcoming challenges posed by limited samples.
However, while the SDCL framework demonstrates strengths in sparse representation and consistency regularization, it also has limitations. The effectiveness of sparse representation is highly dependent on the selection of hyperparameters, particularly the number of basis vectors, which can be computationally burdensome in high-dimensional or complex domains. Additionally, although consistency regularization helps in data augmentation, its ability to handle outliers or noisy data remains limited, potentially affecting the model’s robustness and performance. Future research could focus on developing adaptive hyperparameter tuning methods to reduce manual intervention and incorporating more robust outlier detection and noise handling mechanisms to improve the model’s generalization capabilities. Moreover, exploring the integration of transfer learning techniques within the framework could enhance its ability to handle complex tasks with significant domain differences, thereby improving cross-domain knowledge transfer.

Author Contributions

Conceptualization, Z.J.; Methodology, Y.T.; Software, Y.W.; Resources, Y.T.; Writing—original draft, Y.W.; Writing—review & editing, Z.J.; Visualization, Z.J. and Y.W.; Supervision, Y.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China 61866040.

Data Availability Statement

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Marcus, G. Deep learning: A critical appraisal. arXiv 2018, arXiv:1801.00631. [Google Scholar]
  2. Kunzel, B.; Sekhon, J.S.; Bickel, P.J. Meta learners for estimating heterogeneous treatment effects using machine learning. Proc. Natl. Acad. Sci. USA 2019, 116, 4156–4165. [Google Scholar] [CrossRef] [PubMed]
  3. Thrun, S.; Pratt, L. Learning to learn: Introduction and overview. In Learning to Learn; Springer Nature: Boston, MA, USA, 1998. [Google Scholar]
  4. Fei-Fei, L.; Fergus, R.; Perona, P. One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 594–611. [Google Scholar] [CrossRef] [PubMed]
  5. Lake, B.M.; Salakhutdinov, R.; Tenenbaum, J.B. Human-level concept learning through probabilistic program induction. Science 2015, 350, 1332–1338. [Google Scholar] [CrossRef] [PubMed]
  6. Sun, S.; Gao, H. Meta-AdaM: A Meta-Learned Adaptive Optimizer with Momentum for Few-Shot Learning. In Proceedings of the Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Volume 37. [Google Scholar]
  7. Chen, Z.; Liu, L.; Kira, Z. A closer look at few-shot classification. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019; pp. 1–17. [Google Scholar]
  8. Tseng, H.; Lee, C.; Huang, J.; Yang, M. Cross-domain few-shot classification via learned feature-wise transformation. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020; pp. 1–18. [Google Scholar]
  9. Gonzalez-Garcia, A.; Van De Weijer, J.; Bengio, Y. Image-to-image translation for cross-domain disentanglement. arXiv 2018, arXiv:1805.09730. [Google Scholar]
  10. Fu, Y.; Jiang, Y.G. Meta-FDMixup: Cross-domain few-shot learning guided by labeled target data. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual, 20–24 October 2021; pp. 5326–5334. [Google Scholar]
  11. Finn, C.; Abbeel, P.; Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 1126–1135. [Google Scholar]
  12. Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching Networks for One Shot Learning. Adv. Neural Inf. Process. Syst. 2016, 29, 3630–3638. [Google Scholar]
  13. Yao, H.; Wei, Y.; Huang, J.; Li, Z. Hierarchically structured meta-learning. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 7045–7054. [Google Scholar]
  14. Li, Z.; Zhang, J.; Fu, Y. Adversarial Feature Hallucination Networks for Few-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 13467–13476. [Google Scholar]
  15. Yang, J.; Wu, S.; Liu, L.; Xu, M. Bridging the Gap between Few-Shot and Many-Shot Learning via Distribution Calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 12, 9830–9843. [Google Scholar] [CrossRef]
  16. Yao, H.; Zhang, L.; Finn, C. Meta-learning with fewer tasks through task interpolation. arXiv 2021, arXiv:2106.02695. [Google Scholar]
  17. Belkin, M.; Niyogi, P.; Sindhwani, V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 2006, 7, 2399–2434. [Google Scholar]
  18. Coifman, R.R.; Lafon, S. Diffusion maps. Appl. Comput. Harmon. Anal. 2006, 21, 5–30. [Google Scholar] [CrossRef]
  19. Elhamifar, E.; Vidal, R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2765–2781. [Google Scholar] [CrossRef] [PubMed]
  20. Zhou, Z.H. Mendeley: A free reference manager and academic social network. J. Med Libr. Assoc. 2011, 99, 237. [Google Scholar]
  21. Aharon, M.; Elad, M.; Bruckstein, A. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 2006, 54, 4311–4322. [Google Scholar] [CrossRef]
  22. Wright, J.; Yang, A.Y.; Ganesh, A.; Sastry, S.S.; Ma, Y. Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 210–227. [Google Scholar] [CrossRef]
  23. Elad, M.; Aharon, M. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. 2006, 15, 3736–3745. [Google Scholar] [CrossRef] [PubMed]
  24. Wang, J.; Zhang, L.; Li, C.; Zhang, D. Deep Dictionary Learning for Single Image Super-Resolution. IEEE Trans. Image Process. 2023, 32, 102–116. [Google Scholar]
  25. Chen, X.; Liu, Y.; Zhao, Q.; Liu, Z.; Liu, X.; Wang, Y. Dictionary Learning based Autoencoder for Classification. Pattern Recognit. Lett. 2022, 55, 92–99. [Google Scholar]
  26. Shao, J.; Liu, Z.; Li, Q. Self-paced dictionary learning with graph regularization for feature selection. Pattern Recognit. 2021, 113, 107–117. [Google Scholar]
  27. Liu, Y.; Wang, H.; Zhang, Q.; Chen, L. Consistent Regularization for Unsupervised Domain Adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 522–535. [Google Scholar]
  28. Zhang, L.; Li, X.; Wang, J.; Zhang, D. Consistent Feature Learning for Unsupervised Domain Adaptation. Pattern Recognit. 2022, 126, 108471. [Google Scholar] [CrossRef]
  29. Li, Q.; Zhao, W.; Wang, Z.; Zhang, L. Consistency Regularized Learning for Semi-supervised Object Detection. IEEE Trans. Image Process. 2021, 30, 1856–1870. [Google Scholar]
  30. Zhang, H.; Xu, J.; Jiang, S.; He, Z. Simple Semantic-Aided Few-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 28588–28597. [Google Scholar]
  31. Raghu, A.; Raghu, M.; Bengio, S.; Vinyals, O. Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML. In Proceedings of the Eighth International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
  32. Snell, J.; Swersky, K.; Zemel, R. Prototypical Networks for Few-shot Learning. Adv. Neural Inf. Process. Syst. 2017, 30, 4080–4090. [Google Scholar]
  33. Li, Y.; Jiang, H.; Li, C. Cross-domain Few-shot Learning via Adaptive Distributional Sparse Coding. IEEE Trans. Image Process. 2022, 36, 4451–4465. [Google Scholar]
  34. He, Z.; Liu, Y.; Liu, X.; Zhang, H. Multi-source Domain Adaptation for Few-shot Learning via Dictionary Learning and Low-rank Representation. IEEE Trans. Image Process. 2021, 30, 3595–3608. [Google Scholar]
  35. Wang, X.; Peng, X.; Lai, M.; Zhang, L. Cross-Domain Few-Shot Learning via Graph-Based Weighted Dictionary Learning. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 1125–1137. [Google Scholar]
  36. Guo, X.; Zhang, J.; Zhang, X.; Xiong, H. Cross-Domain Few-Shot Learning via Knowledge Distillation. IEEE Trans. Image Process. 2021, 30, 5493–5505. [Google Scholar]
  37. Cao, H.; Cao, Y.; Zuo, W.; Zhang, L. Adaptive Structural Sparsity Regularization for Domain Adaptive Few-shot Learning. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 4014–4027. [Google Scholar]
  38. Zhang, M.; Liu, J.; Tao, D.; Li, X. Few-shot Image Recognition via Knowledge Transfer. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 1953–1969. [Google Scholar]
  39. Tian, H.; Wang, Y. Optimal Selection of Dictionary Atoms for Sparse Dictionary Learning of Time-Varying Monitoring Data. Comput. Geotech. 2024, 165, 105953. [Google Scholar] [CrossRef]
  40. Liu, Y.; Yang, S.; Li, W. Cross-Domain Sparse Dictionary Learning for Remote Sensing Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62. [Google Scholar]
  41. Zhang, Y.; Yang, Y.; Zhang, J.; Huang, K.; Tan, T. Consistency Regularization for Few-shot Classification with Insufficient Tasks. Neurocomputing 2022, 357, 46–57. [Google Scholar] [CrossRef]
  42. Zhu, C.; Han, J.; Yang, W.; Cheng, G.; Liu, Y. Consistency Regularization for Few-shot Learning with Limited Data. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 46–54. [Google Scholar]
  43. Cao, Y.; Long, M.; Wang, J. Consistency Regularization for Cross-Domain Few-Shot Classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7202–7211. [Google Scholar]
  44. Chen, Y.; Zhang, J.; Zhang, Y.; Xie, C.; Sun, T. Task-consistent Contrastive Learning for Few-shot Classification. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 436–441. [Google Scholar]
  45. Sohn, K.; Berthelot, D.; Li, C.-L.; Zhang, Z.; Carlini, N.; Cubuk, E.D.; Kurakin, A.; Zhang, H.; Raffel, C. FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence. arXiv 2020, arXiv:2001.07685. [Google Scholar]
  46. Yang, Y.; Liu, M.; Yang, Z.; Huang, K. Task-aware Data Augmentation for Few-shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7795–7804. [Google Scholar] [CrossRef]
  47. Garcia, V.; Bruna, J. Few-shot learning with graph neural networks. arXiv 2017, arXiv:1711.04043. [Google Scholar]
  48. Miyato, T.; Maeda, S.; Koyama, M.; Ishii, S. Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 1979–1993. [Google Scholar] [CrossRef]
  49. Makhzani, A.; Frey, B. K-sparse autoencoders. arXiv 2013, arXiv:1312.5663. [Google Scholar]
  50. Yang, Y.; Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; pp. 270–279. [Google Scholar]
  51. Xia, G.S.; Hu, J.; Hu, F.; Shi, B.; Bai, X.; Zhong, Y.; Zhang, L.; Lu, X. AID: A benchmark dataset for performance evaluation of aerial scene classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3965–3981. [Google Scholar] [CrossRef]
  52. Cheng, G.; Han, J.; Lu, X. Remote sensing image scene classification: Benchmark and state of the art. Remote Sens. 2017, 105, 1865–1883. [Google Scholar] [CrossRef]
  53. Krause, J.; Stark, M.; Deng, J.; Fei-Fei, L. 3D object representations for fine-grained categorization. In IEEE International Conference on Computer Vision Workshops; IEEE: Piscataway, NJ, USA, 2013; pp. 554–561. [Google Scholar]
  54. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report TR-2009; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
  55. Vuorio, R.; Sun, S.-H.; Hu, H.; Lim, J.J. Multimodal model-agnostic meta-learning via task-aware modulation. In Proceedings of the 7th International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019; Volume 32, pp. 1–12. [Google Scholar]
  56. Li, Z.; Zhou, F.; Chen, F.; Li, H. Meta-sgd: Learning to learn quickly for few shot learning. arXiv 2017, arXiv:1707.09835. [Google Scholar]
  57. Flennerhag, S.; Schroecker, Y.; Zahavy, T.; van Hasselt, H.; Silver, D.; Singh, S. Bootstrapped meta-learning. arXiv 2018, arXiv:1803.02999. [Google Scholar]
  58. Yao, H.; Wu, X.; Tao, Z.; Li, Y.; Ding, B.; Li, R.; Li, Z. Automated relational meta-learning. arXiv 2020, arXiv:2001.00745. [Google Scholar]
  59. Zhou, P.; Zou, Y.; Yuan, X.-T. Task similarity aware meta learning: Theory-inspired improvement on maml. Uncertain. Artif. Intell. 2021, 161, 23–33. [Google Scholar]
  60. Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.S.; Hospedales, T.M. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 1199–1208. [Google Scholar]
  61. Baik, S.; Choi, M.; Choi, J.; Kim, H.; Lee, K.M. Meta-learning with adaptive hyperparameters. Int. Adv. Neural Inf. Process. Syst. 2020, 33, 20755–20765. [Google Scholar]
  62. Liu, Y.; Schiele, B.; Sun, Q. An ensemble of epoch-wise empirical bayes for few-shot learning. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Volume 16, pp. 404–421. [Google Scholar]
Figure 1. Comparison of few-shot learning and cross-domain few-shot learning.
Figure 2. We subdivide the 3D data manifold into multiple localized parts and form redundant bases by considering each part as a collection of linear bases. We then utilize a partial subset of these redundant bases to achieve a sparse representation of the data.
Figure 3. Flow of the Consistency Regularization algorithm. The algorithm enhances the model's robustness and generalization ability through consistency regularization. First, the input data x is stochastically augmented into two different versions x̃₁ and x̃₂, which are passed through a network with dropout to produce the corresponding predictions ỹ₁ and ỹ₂. Next, the squared difference SquaredDifference(ỹ₁, ỹ₂) between the two outputs is computed to measure the model's consistency under the perturbed inputs, while the cross-entropy loss CrossEntropy(ỹ₁, y) is calculated to evaluate classification accuracy. Finally, the consistency loss and the task loss are combined through a weighted sum into the total loss L, which is used to optimize the model's parameters, thereby balancing task performance and consistency.
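The loss computation sketched in Figure 3 can be summarized in a few lines of code. The snippet below is a minimal PyTorch-style illustration only: `model`, `augment`, and the weight `omega` are hypothetical placeholders for the paper's network, augmentation pipeline, and loss weighting, and taking the squared difference over softmax probabilities (rather than raw logits) is our assumption.

```python
import torch
import torch.nn.functional as F

def sdcl_consistency_step(model, x, y, augment, omega=1.0):
    """Sketch of the weighted task + consistency loss from Figure 3.

    model:   any classifier with dropout active (train mode)
    augment: a stochastic data-augmentation function (placeholder)
    omega:   assumed weight balancing consistency against the task loss
    """
    x1, x2 = augment(x), augment(x)            # two random augmentations of x
    y1, y2 = model(x1), model(x2)              # two forward passes, dropout on

    task_loss = F.cross_entropy(y1, y)         # classification (task) term
    consistency = F.mse_loss(F.softmax(y1, dim=1),
                             F.softmax(y2, dim=1))  # squared-difference term

    return task_loss + omega * consistency     # weighted total loss L
```

Because both forward passes share the same parameters, dropout and augmentation are the only sources of disagreement, so the consistency term directly penalizes sensitivity to these perturbations.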
Figure 4. Framework of our proposed sparse dictionary consistency learning (SDCL).
Figure 5. The dictionary created by the K-sparse autoencoder includes both encoding and decoding components, along with a top-k mechanism used in the activation step to generate a sparse representation.
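As a concrete reading of Figure 5, the sketch below implements a generic K-sparse autoencoder in PyTorch: a linear encoder, a top-k step that zeroes all but the k largest activations of each sample, and a linear decoder that reconstructs the input from the resulting sparse code. The layer sizes and the value of k are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class KSparseAutoencoder(nn.Module):
    """Generic K-sparse autoencoder: encode, keep top-k activations, decode."""

    def __init__(self, in_dim=512, dict_size=1024, k=32):
        super().__init__()
        self.encoder = nn.Linear(in_dim, dict_size)  # encoding (dictionary) weights
        self.decoder = nn.Linear(dict_size, in_dim)  # decoding (dictionary) weights
        self.k = k

    def top_k(self, z):
        # Keep only the k largest activations per sample; zero out the rest.
        values, indices = torch.topk(z, self.k, dim=1)
        return torch.zeros_like(z).scatter(1, indices, values)

    def forward(self, x):
        z = torch.relu(self.encoder(x))   # dense code over the redundant bases
        z_sparse = self.top_k(z)          # sparse representation
        return self.decoder(z_sparse), z_sparse
```

Since gradients flow only through the retained activations, training tends to make each dictionary atom specialize on a small, localized region of the feature space.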
Figure 6. Five-way one-shot performance under different sparse representation tasks.
Figure 7. The figure demonstrates the variation in model accuracy across three datasets (AID, NP-45, UC-21) with and without the application of Consistency Regularization (CR) as the number of tasks increases.
Table 1. Details of the dataset.

Dataset | Type | Description
CUB-Birds | Fine-grained | A dataset of bird species with images from 200 different bird categories.
Textures | Fine-grained | A collection of texture images, often used for texture classification.
Aircraft | Fine-grained | A dataset containing images of different aircraft models for fine-grained classification.
Fungi | Fine-grained | Images of various species of fungi for fine-grained classification tasks.
UC Merced Land Use | Remote-sensing | Satellite images covering 21 land-use classes for remote sensing tasks.
AID | Remote-sensing | Aerial image dataset with 30 classes to aid in land-use classification.
NWPU-RESISC45 | Remote-sensing | A large-scale remote-sensing image dataset covering 45 scene classes.
Cars-196 | Fine-grained | A dataset containing 196 car models for fine-grained classification.
CIFAR-100 | General-purpose | A general image dataset containing 100 object categories, widely used for image classification tasks.
Table 2. The training accuracy of five-way five-shot on the plain-multi datasets for ω1 and ω2.

ω1 | Bird | Texture | Aircraft | Fungi | Average
0 | 71.03% | 47.30% | 67.80% | 53.30% | 59.86%
0.01 | 74.25% | 49.28% | 74.54% | 58.77% | 64.21%
0.1 | 72.43% | 46.71% | 72.83% | 56.25% | 62.06%
1 | 70.02% | 43.95% | 70.45% | 53.89% | 59.58%
10 | 69.25% | 44.38% | 69.27% | 51.38% | 58.57%

ω2 | Bird | Texture | Aircraft | Fungi | Average
1 | 71.84% | 47.96% | 70.93% | 55.86% | 61.65%
10 | 72.49% | 48.28% | 71.29% | 56.92% | 62.25%
20 | 72.31% | 46.55% | 72.83% | 57.35% | 62.26%
30 | 70.39% | 45.91% | 71.34% | 56.37% | 61.00%
40 | 74.25% | 49.28% | 74.54% | 58.77% | 64.21%
50 | 72.63% | 46.16% | 72.91% | 56.58% | 62.07%

Bold indicates the highest value in each column.
Table 3. Five-way one-shot and five-way five-shot classification accuracy (%) on Plain-multi datasets with 95% confidence intervals. The values in bold have intersecting confidence intervals with the most accurate method.

Setting | Backbone | Method | Bird | Texture | Aircraft | Fungi | Average
One-shot | ProtoNet | MatchingNet [12] | 48.25 ± 0.91% | 29.71 ± 1.02% | 43.86 ± 1.71% | 36.21 ± 1.26% | 39.50 ± 1.22%
One-shot | ProtoNet | ProtoNet [32] | 50.65 ± 1.21% | 31.04 ± 0.92% | 51.23 ± 1.06% | 39.44 ± 0.86% | 43.09 ± 1.01%
One-shot | ProtoNet | SDCL (ours) | 56.72 ± 0.93% | 32.48 ± 1.38% | 52.37 ± 0.75% | 42.46 ± 1.21% | 46.01 ± 1.07%
One-shot | MAML | MAML [11] | 53.02 ± 1.14% | 32.43 ± 1.07% | 51.22 ± 1.18% | 41.96 ± 1.26% | 44.65 ± 1.26%
One-shot | MAML | Multi-MAML [55] | 54.43 ± 0.72% | 34.02 ± 0.59% | 51.61 ± 0.92% | 42.86 ± 1.19% | 43.57 ± 0.88%
One-shot | MAML | MetaSGD [56] | 50.86 ± 1.15% | 31.52 ± 1.01% | 50.40 ± 0.70% | 40.88 ± 0.74% | 43.41 ± 0.92%
One-shot | MAML | BMG [57] | 54.46 ± 1.16% | 33.37 ± 0.98% | 52.26 ± 1.07% | 43.13 ± 1.15% | 45.81 ± 1.06%
One-shot | MAML | HSML [13] | 55.31 ± 0.94% | 32.12 ± 1.29% | 49.73 ± 0.67% | 41.18 ± 1.02% | 44.58 ± 1.01%
One-shot | MAML | ARML [58] | 57.81 ± 1.29% | 34.89 ± 1.31% | 55.57 ± 1.11% | 44.52 ± 1.32% | 48.19 ± 1.25%
One-shot | MAML | MUSML [59] | 57.36 ± 0.58% | 36.14 ± 0.81% | 53.01 ± 0.78% | 42.81 ± 0.73% | 47.33 ± 0.72%
One-shot | MAML | SDCL (ours) | 64.18 ± 0.78% | 37.25 ± 1.08% | 61.32 ± 1.19% | 45.39 ± 0.81% | 52.04 ± 0.97%
Five-shot | ProtoNet | MatchingNet [12] | 65.84 ± 0.72% | 38.98 ± 1.65% | 62.63 ± 0.63% | 50.22 ± 1.01% | 54.41 ± 1.01%
Five-shot | ProtoNet | ProtoNet [32] | 66.73 ± 0.57% | 39.62 ± 0.64% | 62.86 ± 0.89% | 50.54 ± 0.79% | 54.93 ± 0.72%
Five-shot | ProtoNet | SDCL (ours) | 71.35 ± 0.93% | 44.27 ± 0.97% | 68.37 ± 0.92% | 51.18 ± 0.87% | 58.79 ± 0.92%
Five-shot | MAML | MAML [11] | 69.83 ± 0.77% | 44.56 ± 0.58% | 65.23 ± 0.66% | 53.65 ± 0.89% | 58.31 ± 0.72%
Five-shot | MAML | Multi-MAML [55] | 66.99 ± 0.60% | 42.51 ± 0.62% | 64.82 ± 1.01% | 50.30 ± 0.34% | 56.15 ± 0.64%
Five-shot | MAML | MetaSGD [56] | 67.93 ± 0.78% | 45.22 ± 0.82% | 67.16 ± 0.70% | 53.15 ± 0.73% | 58.36 ± 0.75%
Five-shot | MAML | BMG [57] | 69.59 ± 0.73% | 45.72 ± 0.68% | 66.47 ± 0.92% | 52.50 ± 0.87% | 58.57 ± 0.80%
Five-shot | MAML | HSML [13] | 72.05 ± 0.84% | 46.53 ± 0.76% | 72.02 ± 0.84% | 56.02 ± 0.70% | 61.65 ± 0.78%
Five-shot | MAML | ARML [58] | 72.33 ± 0.67% | 48.22 ± 0.77% | 72.43 ± 0.73% | 56.48 ± 0.82% | 62.36 ± 0.74%
Five-shot | MAML | MUSML [59] | 74.35 ± 1.25% | 50.62 ± 1.03% | 75.37 ± 1.28% | 56.70 ± 1.43% | 64.26 ± 1.24%
Five-shot | MAML | SDCL (ours) | 74.25 ± 1.18% | 49.58 ± 0.80% | 76.47 ± 1.19% | 58.77 ± 0.81% | 64.77 ± 1.01%
Table 4. Five-way one-shot and five-way five-shot classification accuracy (%) on Remote-sensing datasets with 95% confidence intervals. The values in bold have intersecting confidence intervals with the most accurate method.

Setting | Backbone | Method | UC-21 | AID | NP-45 | Average
One-shot | ProtoNet | MatchingNet [12] | 47.11 ± 2.28% | 52.42 ± 1.63% | 42.84 ± 1.96% | 47.45 ± 1.95%
One-shot | ProtoNet | ProtoNet [32] | 54.06 ± 1.24% | 50.25 ± 1.21% | 48.09 ± 0.78% | 50.80 ± 1.07%
One-shot | ProtoNet | SDCL (ours) | 56.26 ± 1.04% | 53.85 ± 1.18% | 56.37 ± 0.75% | 55.49 ± 0.99%
One-shot | MAML | MAML [11] | 56.21 ± 1.08% | 57.47 ± 1.46% | 51.82 ± 1.34% | 55.16 ± 1.29%
One-shot | MAML | Multi-MAML [55] | 58.93 ± 0.68% | 58.87 ± 1.13% | 52.27 ± 1.04% | 56.69 ± 0.91%
One-shot | MAML | MetaSGD [56] | 54.12 ± 1.16% | 52.78 ± 1.07% | 49.41 ± 1.62% | 52.10 ± 1.28%
One-shot | MAML | BMG [57] | 57.19 ± 1.51% | 55.18 ± 1.52% | 53.51 ± 1.24% | 55.29 ± 1.42%
One-shot | MAML | HSML [13] | 63.82 ± 1.76% | 72.09 ± 1.06% | 52.72 ± 1.16% | 62.87 ± 1.29%
One-shot | MAML | ARML [58] | 65.13 ± 0.84% | 71.53 ± 1.46% | 54.53 ± 0.91% | 63.73 ± 0.90%
One-shot | MAML | MUSML [59] | 66.82 ± 1.64% | 57.47 ± 1.14% | 56.07 ± 0.81% | 63.44 ± 1.16%
One-shot | MAML | SDCL (ours) | 69.58 ± 0.78% | 73.10 ± 1.08% | 56.26 ± 1.19% | 66.31 ± 1.02%
Five-shot | ProtoNet | MatchingNet [12] | 56.56 ± 1.87% | 55.56 ± 1.09% | 57.24 ± 1.79% | 56.45 ± 1.58%
Five-shot | ProtoNet | ProtoNet [32] | 63.65 ± 0.45% | 65.26 ± 1.36% | 67.90 ± 0.72% | 65.60 ± 0.84%
Five-shot | ProtoNet | SDCL (ours) | 68.35 ± 0.93% | 66.27 ± 0.59% | 69.37 ± 0.92% | 68.01 ± 0.81%
Five-shot | MAML | MAML [11] | 73.27 ± 1.03% | 74.89 ± 1.29% | 71.49 ± 0.84% | 73.21 ± 1.05%
Five-shot | MAML | Multi-MAML [55] | 73.27 ± 1.03% | 74.89 ± 1.29% | 71.49 ± 0.84% | 73.21 ± 1.05%
Five-shot | MAML | MetaSGD [56] | 65.45 ± 1.10% | 63.41 ± 0.83% | 60.62 ± 0.88% | 63.16 ± 0.93%
Five-shot | MAML | BMG [57] | 68.21 ± 0.61% | 65.79 ± 1.21% | 65.61 ± 0.68% | 66.53 ± 0.83%
Five-shot | MAML | HSML [13] | 82.72 ± 0.88% | 81.09 ± 0.77% | 71.92 ± 0.75% | 78.57 ± 0.80%
Five-shot | MAML | ARML [58] | 78.67 ± 0.63% | 80.91 ± 0.83% | 70.81 ± 0.69% | 76.79 ± 0.71%
Five-shot | MAML | MUSML [59] | 83.12 ± 0.91% | 81.43 ± 0.70% | 72.73 ± 1.12% | 79.09 ± 0.91%
Five-shot | MAML | SDCL (ours) | 83.66 ± 1.18% | 81.72 ± 0.80% | 73.74 ± 1.19% | 79.71 ± 1.06%
Table 5. Comparison results on the Cars-196 and CIFAR-100 datasets, reported as accuracy (%) with standard deviation.

Dataset | Method | One-Shot | Five-Shot
Cars-196 | MatchingNet [12] | 28.96 ± 0.45% | 39.87 ± 0.51%
Cars-196 | ProtoNet [32] | 31.18 ± 0.48% | 42.03 ± 0.55%
Cars-196 | RelationNet [60] | 29.53 ± 0.45% | 40.64 ± 0.54%
Cars-196 | GNNNet [47] | 32.95 ± 0.56% | 48.91 ± 0.67%
Cars-196 | SDCL (ours) | 44.06 ± 0.58% | 63.52 ± 0.59%
CIFAR-100 | MAML [30] | 38.20 ± 0.48% | 49.94 ± 0.49%
CIFAR-100 | ProtoNet [32] | 35.56 ± 0.77% | 51.12 ± 0.77%
CIFAR-100 | MAML+ALFA [61] | 39.77 ± 0.48% | 53.39 ± 0.49%
CIFAR-100 | E3BM [62] | 39.90 ± 0.49% | 52.60 ± 0.49%
CIFAR-100 | Baseline [7] | 39.72 ± 0.68% | 56.04 ± 0.76%
CIFAR-100 | Meta-AdaM [6] | 41.11 ± 0.49% | 56.32 ± 0.49%
CIFAR-100 | SDCL (ours) | 42.12 ± 0.91% | 57.49 ± 0.69%
The bold font denotes the highest values achieved by the proposed method under various experimental settings.
Table 6. Results of ablation experiments with five-way one-shot/five-shot for four datasets, where SD is the sparse dictionary and CR is consistency regularization.

SD | Top-K | CR | Plain-Multi (One-Shot / Five-Shot) | Remote-Sensing (One-Shot / Five-Shot) | Cars-196 (One-Shot / Five-Shot) | CIFAR-100 (One-Shot / Five-Shot)
 |  |  | 49.13% / 60.84% | 63.04% / 75.34% | 42.64% / 60.32% | 39.43% / 55.24%
 |  |  | 50.38% / 62.46% | 65.41% / 77.35% | 43.16% / 61.96% | 40.74% / 55.97%
 |  |  | 50.97% / 63.19% | 64.07% / 76.57% | 43.04% / 62.74% | 41.56% / 56.35%
√ | √ | √ | 52.04% / 64.77% | 66.31% / 79.71% | 44.06% / 63.52% | 42.12% / 57.49%

The check mark (√) indicates that the corresponding condition is applied in the experiment.