FedGAT-DCNN: Advanced Credit Card Fraud Detection Using Federated Learning, Graph Attention Networks, and Dilated Convolutions

Li, Mengqiu; Walsh, John

doi:10.3390/electronics13163169


by Mengqiu Li * and John Walsh

International College, Krirk University, Bangkok 10220, Thailand

* Author to whom correspondence should be addressed.

Electronics 2024, 13(16), 3169; https://doi.org/10.3390/electronics13163169

Submission received: 15 July 2024 / Revised: 6 August 2024 / Accepted: 8 August 2024 / Published: 11 August 2024

(This article belongs to the Special Issue Advances in AI Engineering: Exploring Machine Learning Applications)


Abstract

Credit card fraud detection is a critical issue for financial institutions due to significant financial losses and the erosion of customer trust. Fraud not only impacts the bottom line but also undermines the confidence customers place in financial services, leading to long-term reputational damage. Traditional machine learning methods struggle to improve detection accuracy with limited data, adapt to new fraud techniques, and detect complex fraud patterns. To address these challenges, we present FedGAT-DCNN, a model integrating a Graph Attention Network (GAT) and dilated convolutions within a federated learning framework. FedGAT-DCNN employs federated learning, allowing financial institutions to collaboratively train models using local datasets, enhancing accuracy and robustness while maintaining data privacy. Incorporating a GAT enables continuous model updates across institutions, quickly adapting to new fraud patterns. Dilated convolutions extend the model’s receptive field without extra computational overhead, improving detection of subtle and complex fraudulent activities. Experiments on the 2018CN and 2023EU datasets show that FedGAT-DCNN outperforms traditional models and other federated learning methods, achieving a ROC-AUC of 0.9712 on the 2018CN dataset and 0.9992 on the 2023EU dataset. These results highlight FedGAT-DCNN’s robustness, accuracy, and applicability in real-world fraud detection scenarios.

Keywords: credit card fraud detection; privacy preserving; federated learning; graph attention networks; dilated convolutions

1. Introduction

In the modern financial sector, credit card fraud poses a significant challenge to both consumers and financial institutions. This illegal activity ranges from the use of unauthorized cards to sophisticated cyber schemes. Recent reports from the Federal Trade Commission indicate a notable increase in such incidents, leading to annual losses of billions of dollars [1]. The globalization of financial markets further complicates this issue, as cross-border transactions provide more opportunities for fraud and make tracking and prosecuting these crimes more difficult [2,3]. Addressing fraud at this international level requires coordinated efforts and advanced detection systems. Consequently, there is an urgent need for innovative strategies, as the banking industry is increasingly turning to artificial intelligence and machine learning to improve fraud detection capabilities [4,5].

In the domain of credit card fraud detection, traditional machine learning methods such as Support Vector Machines (SVM), Random Forests, and k-Nearest Neighbors (k-NN) were extensively utilized due to their simplicity and interpretability [6,7,8]. However, these approaches often rely on extensive manual feature engineering, constraining their adaptability to the continuously evolving fraud patterns and the emergence of novel fraudulent techniques. With advancements in technology, deep learning models incorporating state-of-the-art (SOTA) techniques, such as Convolutional Neural Networks (CNNs) [9,10], Recurrent Neural Networks (RNNs) [11,12], Variational Autoencoders (VAEs) [13], and Generative Adversarial Networks (GANs), are now gaining prominence. These advanced models enhance the detection of subtle fraudulent activities through their capability to perform automatic feature extraction and recognize complex patterns.

Despite significant theoretical advancements in deep learning technology, its practical deployment still encounters several critical challenges. A prominent issue is the dependence of deep learning models on extensive labeled datasets, which poses significant difficulties for small to medium-sized financial institutions with limited data resources (Challenge 1). These institutions often lack the capacity to collect sufficient data to cover all potential fraud patterns, thereby constraining the models’ performance and robustness in real-world scenarios. Additionally, as credit card fraud techniques rapidly evolve, existing anti-fraud models quickly become outdated. For smaller financial institutions with limited resources for updates, their pool of credit card fraud samples may not reflect the latest fraudulent activities, making it challenging for models to effectively predict and prevent emerging fraud patterns (Challenge 2). Therefore, there is an urgent need to develop models that can continuously learn and adapt to new fraudulent behaviors. These models need not only to be able to handle diverse fraud strategies but also to be capable of running updates and iterations in real-time while ensuring the protection of user privacy. Another significant challenge is the detection of subtle and complex fraud patterns in large transaction networks, particularly when data are limited. Traditional models often struggle to identify these intricate patterns due to their limited feature extraction capabilities and the small amount of available data (Challenge 3). Hence, innovative approaches that enhance feature extraction and effectively leverage limited data are essential to improve fraud detection performance.

This research addresses the critical challenges in credit card fraud detection systems by integrating personalized federated learning and advanced graph models. Our proposed framework, FedGAT-DCNN, aims to enhance detection capabilities while strictly adhering to privacy and regulatory standards, thus catering to the dynamic and complex landscape of credit card fraud. To address Challenge 1, FedGAT-DCNN employs a federated learning architecture that enables collaborative model training across institutions. Each institution independently trains customized models using its local data, which helps improve model performance even when local data are insufficient. This approach mitigates the dependency on large, centralized datasets and enhances data privacy by keeping sensitive information local.

Our framework addresses Challenge 2 by incorporating a Graph Attention Network (GAT) within the federated learning setup. This allows institutions to collaboratively update their models with the latest data without sharing sensitive information. GATs are particularly adept at quickly integrating updated data, identifying complex patterns within transaction networks, and thereby improving the ability to detect new fraud patterns while maintaining data privacy.

To tackle Challenge 3, we integrate dilated convolutional networks within the GAT framework. Dilated convolutions extend the receptive field without adding computational overhead, enabling the model to capture more valuable transaction patterns and detect subtle fraud activities more effectively. This combination allows for the efficient and accurate detection of intricate fraud patterns, even when data are limited.

Small financial institutions in a developing country often struggle with limited access to comprehensive transaction data, making traditional fraud detection models less effective. By implementing FedGAT-DCNN, these institutions can leverage collaborative training with larger banks via federated learning without compromising customer data privacy. Through this setup, even with sparse local data, the institutions continuously receive model updates informed by a wider network of transaction patterns. This ongoing refinement enhances the model’s responsiveness to emerging fraud techniques, demonstrating a significant improvement in detection rates within just a few months of implementation.

Our contributions are significant and multifaceted:

Improvement in Fraud Detection Accuracy with Limited Data: FedGAT-DCNN employs federated learning, allowing institutions to collaboratively train models using their local datasets. This approach enhances model accuracy and robustness, even with limited local data, and ensures sensitive information remains private.
Dynamic Adaptation to Evolving Fraud Techniques: By incorporating a GAT into the federated learning system, our framework enables continuous and collaborative model updates across institutions. This integration allows models to quickly adapt to new fraud patterns while maintaining data privacy.
Effective Detection of Intricate Fraud Patterns: The inclusion of dilated convolutional networks within the GAT framework extends the model’s receptive field without additional computational overhead. This combination enhances the model’s ability to detect subtle and complex fraudulent activities, even with limited data.

2. Related Work

In the financial industry, the complexity and challenges of credit card fraud detection continue to drive technological innovation. As big data and machine learning technologies evolve, the limitations of traditional machine learning methods in processing complex financial data have become increasingly apparent. Recently, the introduction of advanced techniques, such as deep learning, GATs, and federated learning, has brought fresh solutions and perspectives to this field. These methods not only excel in enhancing the accuracy and efficiency of fraud detection but also offer unique advantages in protecting user privacy and handling distributed datasets. This section reviews the development of these technologies and their application in credit card fraud detection.

2.1. Machine Learning and Deep Learning Methods in Credit Card Fraud Detection

In the realm of credit card fraud detection, traditional machine learning methods have been widely utilized for years. Techniques such as Decision Trees, SVMs [14,15], Logistic Regression [16,17], and ensemble methods like Random Forests [18,19] and Gradient Boosting Machines [20,21] are extensively employed to identify potential fraudulent activities. These algorithms analyze historical transaction data to detect fraud patterns, offering benefits such as relatively low computational complexity and high interpretability. However, their performance can be hindered when faced with complex, high-dimensional, and non-linear data.

The emergence of big data technologies and increased computational power has led to significant advancements in deep learning methods for credit card fraud detection. Deep Neural Networks, including CNNs [22,23] and RNNs [11,24], have shown remarkable improvements in detection performance due to their robust feature extraction capabilities. Additionally, Long Short-Term Memory networks (LSTMs) [25] are particularly effective for analyzing sequential transaction data, making them well-suited for fraud detection tasks. These deep learning approaches can autonomously uncover complex patterns in data, thereby enhancing detection accuracy. However, they often require substantial amounts of data and computational resources, and their “black box” nature can pose challenges in terms of interpretability.

2.2. Advancements in Graph Neural Networks for Fraud Detection

Graph Neural Networks (GNNs) have recently emerged as a significant development in the field of deep learning, characterized by their ability to directly learn from graph-structured data. This capability is particularly crucial in credit card fraud detection, as GNNs can capture complex patterns and relationships within transaction networks. Among the various GNN architectures, Graph Convolutional Networks (GCNs), GATs, and Graph Isomorphism Networks (GINs) stand out for their effectiveness in identifying fraudulent behaviors in intricate transaction networks.

GCNs draw inspiration from the concept of convolution used in traditional image processing and adapt it for graph data, enabling the model to learn node representations by aggregating features from neighboring nodes. This approach is highly effective in identifying local node patterns that may indicate fraudulent activities [26,27]. GATs introduce an attention mechanism, allowing the model to differentially weigh the importance of neighboring nodes. This selective focus on specific parts of the graph is particularly beneficial in fraud detection, where not all transactions or connections are equally relevant to identifying fraudulent behavior [28,29].

2.3. Federated Learning for Credit Card Fraud Detection

Federated learning is a cutting-edge distributed machine learning paradigm that enables multiple participants to collaborate on training models while preserving data privacy [30]. This is particularly crucial in sensitive areas like credit card fraud detection, where financial institutions are often unable or unwilling to share customer data due to strict privacy regulations and data security concerns. Through federated learning, models can be trained on local data at various institutions without exchanging the data itself, only the model updates. This approach not only enhances the accuracy of fraud detection by leveraging diverse data sources but also robustly protects user privacy.

Advanced algorithms such as Federated Averaging (FedAvg), Federated Optimization, Secure Aggregation, and Federated Proximal (FedProx) have been developed to address the inherent challenges of federated learning, such as low communication efficiency and uneven data distribution among different institutions. For instance, FedAvg [31,32] is a widely used algorithm where local models are independently trained on each participant’s data, and their parameters are then averaged to update the global model, achieving a balance between the customization of the local models and the coherence of the global model. Federated Optimization techniques [33] focus on optimizing the federated learning process itself, reducing communication overhead, and improving convergence rates. Secure Aggregation protocols [34,35] ensure that the aggregation of model updates from various participants is performed securely, further enhancing privacy protection by making individual updates indiscernible. FedProx [36] is designed to handle system heterogeneity by introducing a proximal term to the objective function, which stabilizes training in the presence of diverse local datasets.

2.4. Comparative Analysis with State-of-the-Art Methods

Credit card fraud detection has evolved significantly with advancements in machine learning technologies. Traditional methods such as Support Vector Machines (SVM) and Random Forests are well-regarded for their robustness but often require extensive feature engineering and struggle with non-linear data. Furthermore, these approaches typically necessitate centralized data storage, raising significant privacy concerns.

Graph Neural Networks (GNNs), particularly Graph Attention Networks (GATs), have been adapted to better understand the complex relationships within transaction data networks. While GNNs offer improved pattern recognition over traditional algorithms, they usually require large amounts of data, which can be impractical in settings with strict data privacy regulations.

Federated learning offers a solution to the privacy issues by allowing models to be trained collaboratively without centralizing the data. However, this approach can suffer from issues related to data heterogeneity, where the diverse data distribution can lead to decreased model accuracy.

Our FedGAT-DCNN model merges the strengths of GNNs with the privacy-preserving characteristics of federated learning. By integrating dilated convolutions into the GAT framework, FedGAT-DCNN extends the model’s receptive field, enhancing its ability to detect subtle and complex fraudulent activities without additional computational costs. This integration not only maintains high accuracy in heterogeneous data environments but also ensures compliance with privacy standards, making it uniquely effective for real-world applications. Thus, FedGAT-DCNN outperforms traditional models and standard federated learning approaches by providing a more adaptable, efficient, and privacy-aware solution in the landscape of credit card fraud detection.

3. Methodology

In this study, we introduce an innovative approach, FedGAT-DCNN, to credit card fraud detection by leveraging graph-based neural networks and federated learning. Our method addresses the challenges of skewed data distributions and privacy concerns in financial applications. We begin by constructing a transaction similarity graph, which provides a structured representation of the data, enhancing detection capabilities beyond traditional methods. This graph structure allows the application of GAT to capture complex data patterns. To further enhance model performance, we integrate a dilated convolutional network within the proposed framework. Inspired by signal processing, this approach extends the receptive field in the GAT without the computational overhead typically associated with increased network depth or breadth. Finally, we embed our graph-based detection model within a federated learning framework using the FedProx algorithm. Federated learning is crucial as it enables multiple financial institutions to collaboratively train the model without sharing sensitive customer data. This collaboration addresses data privacy issues and enhances the model’s generalization capability by learning from a diverse set of transaction data across various institutions.

3.1. Transaction Similarity Graph

The transaction similarity graph is the foundation of our proposed method, offering a structured representation of credit card transactions that captures the inherent relationships between different records. Unlike traditional methods such as SVM and Multi-Layer Perceptrons (MLP), which treat each transaction independently, our graph-based approach considers the similarity between transactions, thus leveraging the relational information for improved fraud detection.

3.1.1. Graph Construction

To construct the transaction similarity graph, each transaction is represented as a node, and edges between nodes indicate the similarity between the corresponding transactions. The similarity is measured using cosine similarity, a common metric for assessing the similarity between two high-dimensional vectors. For each transaction, we identify the top-k most similar transactions based on their feature vectors. This creates a graph where each node is connected to its k most similar neighbors. An edge is established between nodes i and j if j is among the top-k most similar transactions to i. This results in an adjacency matrix $A$, where $A_{i j} = 1$ if there is an edge between nodes i and j, and $A_{i j} = 0$ otherwise.
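The graph construction described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `build_topk_similarity_graph` is hypothetical, self-similarity is excluded by assumption, and the dense similarity matrix is practical only for modest numbers of transactions.

```python
import numpy as np

def build_topk_similarity_graph(features: np.ndarray, k: int) -> np.ndarray:
    """Return A with A[i, j] = 1 if j is among the k rows most cosine-similar to row i."""
    n = features.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of transactions")
    # Normalise rows so that a dot product equals the cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    # Assumption: a transaction is never counted as its own neighbour.
    np.fill_diagonal(sim, -np.inf)
    # Column indices of the k largest similarities in each row (unordered).
    topk = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=np.int8)
    adj[np.repeat(np.arange(n), k), topk.ravel()] = 1
    return adj

# Toy usage: two pairs of near-duplicate transactions.
toy_features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
toy_adj = build_topk_similarity_graph(toy_features, k=1)
```

Note that the resulting matrix is directed: j being among the top-k neighbors of i does not imply the reverse, so $A$ is not necessarily symmetric.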

3.1.2. Advantages over Traditional Methods

The primary advantage of the transaction similarity graph over traditional methods lies in its ability to capture the relational structure of the data. Traditional methods like SVM and MLP operate on the assumption that transactions are independent and identically distributed (i.i.d.). This assumption often fails in real-world scenarios, especially in fraud detection, where fraudulent transactions can be related to each other in complex ways. By representing transactions as a graph, our approach inherently models these relationships, allowing the detection algorithm to consider the context of each transaction. Furthermore, the graph structure facilitates the application of advanced GNN techniques, which can effectively aggregate information from a transaction’s neighbors to improve classification performance. This relational aggregation is not possible with traditional i.i.d.-based methods.

3.1.3. Implementation Details

The dataset is partitioned into multiple groups to simulate the data distribution among different clients in a federated learning setup. Each group is processed separately, ensuring that the model can effectively learn from non-uniform and limited data samples, as is common for small to medium-sized financial institutions.

The process of constructing the graph involves computing the cosine similarity for each transaction and selecting the top-k similar transactions to form edges. This process is computationally efficient and can be parallelized to handle large datasets.

3.2. Dilated Convolutional Network

To address the limitations of existing GATs in handling large graphs, we introduce a dilated convolutional network within the graph framework. This innovation is inspired by the concept of dilated convolutions from the field of signal processing, which allows us to extend the receptive field of each node without significantly increasing computational complexity. Figure 1 shows the framework of the dilated convolution proposed in this work.

3.2.1. Concept and Advantages

In the context of GATs, dilated convolutions enable the model to aggregate information from nodes that are multiple hops away. This is particularly advantageous in fraud detection, where fraudulent transactions may not be highly similar across multiple dimensions but can be linked through certain shared feature dimensions. Traditional convolutional methods are limited in their ability to capture such long-range dependencies, as they typically focus on immediate neighbors. Dilated convolutions, however, expand the receptive field without increasing the number of parameters or the computational load significantly. This allows the model to consider a broader context and integrate information from distant nodes, which is crucial for identifying subtle and dispersed fraud patterns that might otherwise go undetected. For example, a fraudulent transaction may have a few direct similarities with other transactions, but when viewed through the lens of shared intermediate nodes, a more complex pattern of fraud activity can emerge. By using dilated convolutions, our model can effectively capture these long-range dependencies, enhancing its ability to detect complex fraud patterns that span across the transaction network. This approach not only improves the accuracy of fraud detection but also adds robustness to the model, making it more resilient to various types of fraudulent behaviors that exploit the limitations of traditional detection methods. Furthermore, the integration of dilated convolutions with a GAT allows for a more nuanced understanding of transaction relationships, providing deeper insights into the underlying mechanisms of fraud.

3.2.2. Computing Dilated Convolutional Embedding

In our methodology, each GNN layer comprises a dilated convolution layer followed by a GAT layer. The dilated convolution layer computes embeddings that effectively capture neighborhood information within the graph structure, thereby enhancing the model’s ability to detect credit card fraud.

For the c-th dilated convolution channel and each node v in the graph, a random subset of neighbors, denoted by $N_{v}^{c}$, is selected. The features of these neighbors are processed to incorporate multi-hop information efficiently. The procedure for each dilated kernel i involves the following steps. First, the feature vectors of the selected neighbors are concatenated and the resulting matrix is transposed:

$$X_{N_{v}^{c}} = \mathrm{Concatenate}\left(x_{j},\ \forall j \in N_{v}^{c}\right)^{T}.$$

Next, the dilated convolution transformation is applied using the current dilated convolution kernel $W_{dil, i}$:

$$H_{temp, i} = W_{dil, i} \cdot X_{N_{v}^{c}}.$$

The result is then transposed back to match the original feature orientation, ensuring that each transformed feature vector aligns with its corresponding neighbor:

$$H_{N_{v}^{c}, i} = H_{temp, i}^{T}.$$

Then, the results from all dilated kernels are averaged to obtain the transformed features of the c-th dilated convolution channel for the neighbors:

$$H_{N_{v}^{c}} = \frac{1}{c} \sum_{i = 1}^{c} H_{N_{v}^{c}, i}. \tag{1}$$

After that, we apply the output linear transformation $W_{out}$ to obtain the final dilated convolutional embeddings:

$$h_{v} = W_{out} \cdot \mathrm{Concatenate}\left(h_{v}^{1}, h_{v}^{2}, \dots, h_{v}^{c}\right), \tag{2}$$

where $h_{v}^{c}$ denotes the transformed feature of node v in dilated convolution channel c, and $h_{v}$ is the aggregation of the dilated embeddings from all dilated convolution channels.

To incorporate the final dilated convolutional embeddings into the node’s feature representation, we combine them with the original node features using a weighted mixing approach controlled by the parameter $α$:

$$h_{v}^{mix} = α \cdot h_{v} + (1 - α) \cdot x_{v}, \tag{3}$$

where $h_{v}^{mix}$ denotes the node features for node v after mixing multi-hop neighbor information.

Finally, the mixed features $h_{v}^{mix} \in H^{mix}$ are fed into the current GAT layer to compute the node embeddings for the layer l:

$$h_{v}^{l} = \mathrm{GAT}\left(H^{mix}, \mathrm{edge\_index}\right), \tag{4}$$

where $H^{mix}$ denotes the mixed node feature matrix.

This methodology effectively utilizes dilated convolutions within a graph neural network framework to capture extended neighborhood interactions, enhancing the model’s performance in detecting complex and subtle fraud patterns in transaction data.
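To make the shapes in Equations (1)–(3) concrete, the following NumPy sketch walks through the computation for a single node. It is a rough illustration under several assumptions that the text does not specify: the kernels $W_{dil, i}$ are taken to be square matrices acting on the feature dimension, the per-channel vector $h_{v}^{c}$ is obtained by mean-pooling the transformed neighbor features, and $W_{out}$ maps back to the input dimension so that the mix in Equation (3) is well defined. The function names are hypothetical.

```python
import numpy as np

def channel_embedding(x, neighbor_ids, kernels, rng, num_samples):
    """Per-channel vector for one node, averaged over that channel's kernels.

    x: (N, F) node features; neighbor_ids: candidate neighbors of the node;
    kernels: list of (F, F) matrices playing the role of W_dil,i.
    """
    # Random subset of neighbors, N_v^c.
    m = min(num_samples, len(neighbor_ids))
    sampled = rng.choice(neighbor_ids, size=m, replace=False)
    x_nbr = x[sampled].T                                    # concatenate and transpose: (F, m)
    per_kernel = [(w_dil @ x_nbr).T for w_dil in kernels]   # transform, transpose back: (m, F)
    h_nbr = np.mean(per_kernel, axis=0)                     # Eq. (1): average over kernels
    # ASSUMPTION (not stated in the text): mean-pool over neighbors to get h_v^c.
    return h_nbr.mean(axis=0)                               # (F,)

def mixed_node_feature(x, v, neighbor_ids, channel_kernels, w_out, alpha, rng, num_samples=2):
    """Eq. (2) followed by Eq. (3) for a single node v."""
    h_channels = [
        channel_embedding(x, neighbor_ids, kernels, rng, num_samples)
        for kernels in channel_kernels
    ]
    h_v = w_out @ np.concatenate(h_channels)                # Eq. (2)
    return alpha * h_v + (1.0 - alpha) * x[v]               # Eq. (3)

# Toy usage: two channels, two identity kernels each, identical neighbors.
x_toy = np.array([[1.0, 2.0], [3.0, 4.0], [3.0, 4.0]])
eye = np.eye(2)
w_out_toy = 0.5 * np.hstack([eye, eye])
h_mix_toy = mixed_node_feature(
    x_toy, 0, [1, 2], [[eye, eye], [eye, eye]], w_out_toy,
    alpha=0.5, rng=np.random.default_rng(0),
)
```

Setting `alpha` to 0 returns the original node features unchanged, which is a convenient sanity check when wiring such a layer in front of a GAT layer.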

3.3. Graph Attention Network

The GAT is a powerful extension of GNNs that introduces an attention mechanism to dynamically weigh the importance of a node’s neighbors. This capability allows the model to focus on the most relevant parts of the graph, enhancing its ability to learn meaningful representations for complex tasks such as fraud detection.

3.3.1. Concept and Advantages

The key innovation of the GAT lies in its attention mechanism, which assigns different weights to different neighbors based on their importance. Unlike traditional GNNs that treat all neighbors equally, the GAT learns to prioritize the information from more relevant neighbors. This selective aggregation helps the model to better capture the underlying patterns and relationships in the data, improving both accuracy and robustness.

3.3.2. GAT Layer

In a GAT layer, each node computes attention coefficients for its neighbors. These coefficients determine the contribution of each neighbor’s features to the node’s updated representation. The attention mechanism is typically implemented using a shared attention function, which computes the coefficients based on the features of the nodes involved.

Formally, let $h_{i}^{(l)}$ and $h_{j}^{(l)}$ be the feature vectors of nodes i and j at layer l, respectively. The attention coefficient $e_{i j}^{(l)}$ between nodes i and j is computed as:

$$e_{i j}^{(l)} = \mathrm{LeakyReLU}\left(a^{T} \left[W h_{i}^{(l)} \,\Vert\, W h_{j}^{(l)}\right]\right), \tag{5}$$

where $W$ is a weight matrix, $a$ is the attention vector, and $[\cdot \,\Vert\, \cdot]$ denotes concatenation. The LeakyReLU activation function introduces non-linearity and helps to stabilize training.

The attention coefficients are then normalized using the softmax function:

$$α_{i j}^{(l)} = \frac{\exp\left(e_{i j}^{(l)}\right)}{\sum_{k \in N_{i}} \exp\left(e_{i k}^{(l)}\right)}, \tag{6}$$

where $N_{i}$ denotes the set of neighbors of node i. The normalized attention coefficients $α_{i j}^{(l)}$ are used to compute the weighted sum of the neighbors’ features, which forms the updated representation of node i:

$$h_{i}^{(l + 1)} = σ\left(\sum_{j \in N_{i}} α_{i j}^{(l)} W h_{j}^{(l)}\right), \tag{7}$$

where $σ$ is a non-linear activation function, such as ReLU.

3.3.3. Multi-Head Attention

To enhance the stability and expressiveness of the model, the GAT employs multi-head attention, where multiple attention mechanisms are applied in parallel. The outputs of these attention heads are concatenated or averaged to form the final representation:

$$h_{i}^{(l + 1)} = \Big\Vert_{k = 1}^{K}\, σ\left(\sum_{j \in N_{i}} α_{i j}^{(l, k)} W_{k} h_{j}^{(l)}\right), \tag{8}$$

where K is the number of attention heads, and $\Vert$ denotes concatenation.
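As a rough illustration of Equations (5)–(8), the sketch below implements a dense multi-head attention layer in NumPy. It is not the implementation used in the paper; the function names are hypothetical, features are stored as rows (so the matrix `w` here corresponds to $W^{T}$), a LeakyReLU slope of 0.2 and ReLU for $σ$ are assumed, and every node is assumed to have at least one neighbor, as is the case in a top-k similarity graph.

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def gat_head(h, adj, w, a):
    """One attention head. h: (N, F), adj: (N, N) 0/1 mask, w: (F, F_out), a: (2 * F_out,)."""
    wh = h @ w                                    # row i holds W h_i
    f_out = wh.shape[1]
    # a^T [W h_i || W h_j] splits into a term for node i plus a term for node j.
    scores = (wh @ a[:f_out])[:, None] + (wh @ a[f_out:])[None, :]
    e = leaky_relu(scores)                        # Eq. (5)
    e = np.where(adj > 0, e, -np.inf)             # keep only neighbors j in N_i
    e = e - e.max(axis=1, keepdims=True)          # shift for numerical stability
    att = np.exp(e)
    att = att / att.sum(axis=1, keepdims=True)    # Eq. (6): softmax over neighbors
    return np.maximum(att @ wh, 0.0)              # Eq. (7) with ReLU as sigma

def gat_multi_head(h, adj, weights, attn_vectors):
    """Eq. (8): concatenate the outputs of K independent heads."""
    heads = [gat_head(h, adj, w, a) for w, a in zip(weights, attn_vectors)]
    return np.concatenate(heads, axis=1)

# Toy usage: with a zero attention vector every neighbor gets equal weight.
h_toy = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
adj_toy = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
out_toy = gat_multi_head(h_toy, adj_toy, [np.eye(2), np.eye(2)], [np.zeros(4), np.zeros(4)])
```

With K heads of output width `F_out`, the concatenated representation has width `K * F_out`; averaging the heads instead keeps the width at `F_out`.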

3.3.4. Benefits in Fraud Detection

The attention mechanism in the GAT allows the model to dynamically focus on the most relevant transactions for each node, making it particularly effective for fraud detection. By weighing the importance of each neighboring transaction, the model can better distinguish between normal and fraudulent patterns. This capability is crucial for accurately identifying fraudulent transactions that may be subtly connected to others in the transaction graph.

Furthermore, the ability to handle multi-hop relationships and aggregate information from a broader context enhances the model’s robustness against various types of fraud. The use of multi-head attention also improves the stability and generalization of the model, making it more reliable in real-world applications.

3.4. Integration with FedProx

To leverage the benefits of federated learning, we integrate our graph-based fraud detection model with the FedProx framework. FedProx is designed to handle the heterogeneity and non-IID (non-independent and identically distributed) nature of data typically encountered in federated learning scenarios. This integration enables multiple financial institutions to collaboratively train a robust fraud detection model without sharing sensitive customer data, thereby preserving privacy and enhancing security.

3.4.1. FedProx Algorithm

FedProx is an extension of the standard federated averaging algorithm (FedAvg), specifically designed to address the challenges posed by heterogeneous data distributions. It introduces a proximal term in the local objective function to limit the impact of local updates, thereby stabilizing the training process and improving convergence.

The local optimization problem in FedProx for client k is defined as:

$$\min_{w}\ f_{k}(w) + \frac{μ}{2} {\Vert w - w_{t} \Vert}^{2}, \tag{9}$$

where $f_{k}(w)$ is the local loss function for client k, $w$ represents the model parameters, $w_{t}$ denotes the global model parameters at iteration t, and $μ$ is the proximal term coefficient. The proximal term $\frac{μ}{2} {\Vert w - w_{t} \Vert}^{2}$ penalizes large deviations from the global model, ensuring more consistent updates across clients.
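The effect of the proximal term in Equation (9) can be seen in a minimal sketch that solves the local problem by plain gradient descent. The function name, the learning rate, the step count, and the toy quadratic loss are all illustrative assumptions rather than details from the paper.

```python
def fedprox_local_update(w_global, grad_fn, mu, lr=0.1, steps=200):
    """Approximately minimise f_k(w) + (mu / 2) * ||w - w_t||^2 by gradient descent.

    w_global: list of floats (w_t); grad_fn: returns the gradient of f_k at w.
    """
    w = list(w_global)
    for _ in range(steps):
        grad = grad_fn(w)
        # The gradient of the proximal term is mu * (w - w_t).
        w = [
            wi - lr * (gi + mu * (wi - wti))
            for wi, gi, wti in zip(w, grad, w_global)
        ]
    return w

# Toy local loss f_k(w) = 0.5 * (w - 4)^2, whose unconstrained minimiser is w = 4.
def toy_grad(w):
    return [w[0] - 4.0]

w_free = fedprox_local_update([0.0], toy_grad, mu=0.0)   # no proximal pull
w_prox = fedprox_local_update([0.0], toy_grad, mu=1.0)   # pulled halfway back to w_t
```

For this quadratic the penalized minimizer is $(4 + μ w_{t}) / (1 + μ)$, so larger values of $μ$ keep the local solution closer to the global parameters.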

The global model update at the server is performed by aggregating the local updates from all clients:

$$w_{t + 1} = w_{t} - η \sum_{k = 1}^{K} \frac{n_{k}}{n} \nabla f_{k}(w_{t}), \tag{10}$$

where $η$ is the learning rate, K is the number of clients, $n_{k}$ is the number of samples at client k, and n is the total number of samples across all clients.
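A literal reading of Equation (10) is a sample-weighted average of client gradients followed by one server step. The sketch below follows that reading with plain Python lists; the function name is hypothetical. Implementations of FedProx commonly aggregate the locally trained parameters rather than raw gradients, so this should be read as an illustration of the formula as written, not of any particular library.

```python
def server_update(w_t, client_grads, client_sizes, lr):
    """Compute w_{t+1} = w_t - lr * sum_k (n_k / n) * grad f_k(w_t)."""
    n = sum(client_sizes)
    aggregated = [0.0] * len(w_t)
    for grad, n_k in zip(client_grads, client_sizes):
        for idx, g in enumerate(grad):
            # Each client contributes in proportion to its sample count.
            aggregated[idx] += (n_k / n) * g
    return [w - lr * g for w, g in zip(w_t, aggregated)]

# Toy usage: the client with three times as many samples dominates the step.
w_next = server_update(
    w_t=[1.0, 2.0],
    client_grads=[[1.0, 0.0], [3.0, 4.0]],
    client_sizes=[1, 3],
    lr=0.5,
)
```

Weighting by $n_{k} / n$ means an institution with few transactions still participates, but its gradient moves the global model less than that of a larger institution.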

3.4.2. Integration with Graph-Based Model

Integrating our graph-based model, which combines dilated convolutional networks and the GAT, into the FedProx framework involves the following steps:

(i): Local Model Training: Each client trains the graph-based model on its local transaction similarity graph using the dilated convolutional and GAT layers to extract features and perform fraud detection.
(ii): Proximal Term Adjustment: During local training, the proximal term in the FedProx algorithm ensures that the updates are consistent with the global model parameters, preventing large deviations that could destabilize the training process.
(iii): Model Aggregation: The local model updates from all clients are sent to the central server, where they are aggregated to update the global model parameters.
(iv): Iterative Refinement: The global model is redistributed to all clients, and the process is repeated iteratively until the model converges.

3.4.3. Benefits for Fraud Detection

The integration of our advanced graph-based model with the FedProx framework provides several benefits for credit card fraud detection:

Enhanced Privacy: By keeping transaction data local to each institution, the framework ensures that sensitive information is not exposed, addressing privacy concerns.
Robustness to Data Heterogeneity: The proximal term in FedProx stabilizes the training process despite the presence of heterogeneous data distributions across clients (institutions).
Improved Model Performance: The collaborative training approach leverages diverse transaction data from multiple institutions, leading to a more generalized and accurate fraud detection model.
Scalability: The federated learning setup allows the model to scale to a large number of clients, each contributing to the overall improvement of the global model.

In summary, the integration of our graph-based fraud detection model with the FedProx framework provides a scalable, privacy-preserving, and robust solution for credit card fraud detection. This combination leverages the strengths of advanced graph neural network techniques and the collaborative power of federated learning to deliver superior performance in real-world scenarios.

3.5. Time Complexity Analysis

In this section, we thoroughly analyze the time complexity of the proposed method to assess its scalability on large datasets. The model comprises two main components: the dilated convolutional network and the GAT layer.

The dilated convolutional network module is responsible for computing dilated convolutional embeddings by aggregating information from the neighborhood of each node in the graph. The detailed time complexity analysis for this component is as follows:

Neighbor Feature Extraction: For each node v, the features of its d neighbors are extracted. This operation involves accessing the feature matrix and has a complexity of $O (N \cdot d \cdot F)$ , where N is the number of nodes, d is the node degree, and F is the feature dimension.
Dilated Convolutional Transformations: The neighbors’ features for each node are transformed through c dilated channels. For each channel, the node features are permuted and passed through a linear transformation, resulting in a complexity of $O (N \cdot c \cdot d \cdot F^{2})$ . The factor $F^{2}$ arises from the matrix multiplication involved in the linear transformation.
Aggregation and Final Transformation: After applying dilated convolutions, the results are concatenated and transformed by an additional linear layer, contributing an additional $O (N \cdot F^{2})$ complexity.
Overall Complexity for the Dilated Convolution Layer: The overall complexity for the dilated convolution layer is:

$O (N \cdot d \cdot F + N \cdot c \cdot d \cdot F^{2} + N \cdot F^{2})$

This simplifies to $O (N \cdot c \cdot d \cdot F^{2})$ as the matrix multiplication and transformation are the dominant operations.
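As a rough worked example under assumed values (these numbers are illustrative and not taken from the experiments), let N = 1000, d = 3, c = 2, and F = 30. The three terms then evaluate to:

$N \cdot d \cdot F = 1000 \cdot 3 \cdot 30 = 90{,}000$

$N \cdot c \cdot d \cdot F^{2} = 1000 \cdot 2 \cdot 3 \cdot 900 = 5{,}400{,}000$

$N \cdot F^{2} = 1000 \cdot 900 = 900{,}000$

Under these assumptions the dilated transformation term accounts for roughly 85% of the 6,390,000 total (5.4/6.39 ≈ 0.845), which is consistent with treating it as the dominant cost.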

The GAT layer processes the mixed features from the dilated convolutional network and applies attention mechanisms to dynamically weight the contributions of neighboring nodes. The specific time complexity analysis for the GAT layer is as follows:

Attention Coefficient Calculation: For each edge, attention coefficients are computed using the node features, resulting in a complexity of $O (E \cdot F \cdot n_{heads})$ , where E is the number of edges and $n_{heads}$ is the number of attention heads.
Feature Aggregation: Node features are aggregated based on the attention coefficients, leading to a complexity of $O (N \cdot F \cdot n_{heads})$ .
Overall Complexity for the GAT Layer: The overall complexity for the GAT layer is:

$O (E \cdot F \cdot n_{heads} + N \cdot F \cdot n_{heads})$

The complexity is primarily dominated by the attention coefficient calculation, due to the potentially large number of edges in the graph.

The total time complexity of the model is determined by combining the complexities of the dilated convolutional network modules and the GAT layers. Considering the dominant terms from each component, the complexity can be expressed as:

$O (N \cdot c \cdot d \cdot F^{2} + E \cdot F \cdot n_{heads})$

The model’s complexity depends on the number of nodes N, node degree d, feature dimension F, number of dilated channels c, number of edges E, and number of attention heads $n_{heads}$. Optimizing these parameters is crucial for balancing model accuracy and computational efficiency, especially in large-scale graph data scenarios like credit card fraud detection.
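The sketch below evaluates the two dominant terms of the total complexity for a hypothetical configuration, which can help when judging how a parameter change shifts the cost. The function name `dominant_cost_terms` and all numeric values are assumptions for illustration; in particular, setting E = N · d assumes a directed top-k neighbor graph, which the analysis above does not state explicitly. The counts are unitless operation estimates, not measured runtimes.

```python
from typing import Tuple


def dominant_cost_terms(
    n_nodes: int,
    degree: int,
    channels: int,
    feat_dim: int,
    n_edges: int,
    n_heads: int,
) -> Tuple[int, int]:
    """Return (dilated-convolution term, GAT attention term) of the total complexity."""
    conv_term = n_nodes * channels * degree * feat_dim ** 2  # N * c * d * F^2
    gat_term = n_edges * feat_dim * n_heads                  # E * F * n_heads
    return conv_term, gat_term


N, d, c, heads = 10_000, 3, 2, 2
E = N * d  # assumption: each node keeps d directed edges to its neighbors

base = dominant_cost_terms(N, d, c, feat_dim=30, n_edges=E, n_heads=heads)
wide = dominant_cost_terms(N, d, c, feat_dim=60, n_edges=E, n_heads=heads)
print(base, wide)
```

Doubling the feature dimension quadruples the convolution term but only doubles the attention term, so F is the parameter to watch first when the model is scaled up.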

4. Experiment

4.1. Experimental Settings

4.1.1. Benchmark Models

In this experiment, we evaluate and compare several benchmark models for credit card fraud detection that are commonly utilized in the financial industry, as detailed in Table 1. These models represent a mix of traditional machine learning techniques and innovative federated learning approaches, reflecting the diverse methodologies adopted within the industry. By assessing their performance against FedGAT-DCNN, we aim to demonstrate the potential advantages and effectiveness of FedGAT-DCNN in addressing credit card fraud detection challenges.

4.1.2. Training Device and Parameter Configuration

In this research, we use PyTorch version 2.2.0 and Python 3.12 as our computing framework. Our simulations are performed on a computer configured with an Intel i5 processor clocked at 3.7 GHz, 64.0 GB of RAM, and an NVIDIA RTX 4070 GPU with 12.0 GB of memory.

The optimal configurations for maximum model performance are detailed in Table 2. This table provides a detailed examination of the essential parameters employed during the training of the FedGAT-DCNN model. The number of communication rounds is fixed at 300. Each client performs local training across 10 epochs, with a learning rate established at 0.01. The Adam optimizer is utilized, augmented by a weight decay of 1 × 10⁻⁴ to reduce overfitting risks. Additionally, the model employs binary cross-entropy as its loss function, which aids in accurately assessing prediction errors and enhances the overall accuracy of the model.
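For readers who want to reproduce the setup, the stated hyperparameters can be collected in a small configuration object. The sketch below is an assumption about how such a configuration might be organized (the class name `TrainConfig` and its field names are hypothetical). It also includes a plain-Python binary cross-entropy to show what the loss measures; an actual implementation would more likely rely on PyTorch's built-in loss.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TrainConfig:
    """Training settings as reported in the text (field names are illustrative)."""
    communication_rounds: int = 300
    local_epochs: int = 10
    learning_rate: float = 0.01
    optimizer: str = "adam"
    weight_decay: float = 1e-4
    loss: str = "binary_cross_entropy"


def binary_cross_entropy(
    y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-12
) -> float:
    """Mean of -(y * log(p) + (1 - y) * log(1 - p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


config = TrainConfig()
loss_confident = binary_cross_entropy([1, 0], [0.9, 0.1])  # mostly correct predictions
loss_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])      # confidently wrong predictions
print(config, loss_confident, loss_wrong)
```

Confidently wrong predictions are penalized far more heavily than mostly correct ones, which is why this loss is a common choice for binary fraud labels.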

4.2. Dataset Overview and Data Segmentation Strategy

To facilitate our exploration of FedGAT-DCNN, we leverage two comprehensive datasets that provide extensive and varied transaction data. These datasets have been carefully selected to offer both breadth and depth of insight into the patterns and characteristics of fraudulent activities, allowing us to apply and test advanced analytical techniques in realistic settings. Figure 2 shows the category distribution of the datasets used in this work. In the following, we detail the specific attributes and configurations of each dataset used in our study.

1. 2018 4th ‘HaoDai Cup’ China Risk Management Control and Capability Challenge Dataset (2018CN): This dataset comprises credit card transaction records from September 2013, spanning two days, with a total of 284,785 transactions, including 483 fraudulent cases. Because fraudulent transactions account for only about 0.17% of the total, this severe imbalance poses challenges to fraud detection models. All features except “Time” and “Amount” have been transformed into numerical values through Principal Component Analysis (PCA) to maintain the confidentiality of the original data. The V1 to V28 features represent the principal components derived from PCA. The “Time” feature records the seconds elapsed since the first transaction in the dataset, while “Amount” indicates the transaction value, which is pivotal for cost-sensitive learning algorithms. The “Class” variable specifies the transaction type, where 1 denotes fraud and 0 indicates non-fraud.
2. European Cardholders’ Credit Card Transaction Records in 2023 (2023EU): This dataset aggregates anonymized transaction data from European cardholders in 2023, featuring over 568,630 entries. Each entry is a distinct transaction with an identifier and 28 anonymized attributes potentially describing aspects like transaction timing and location. The dataset also details transaction amounts and categorizes each transaction as fraudulent (1) or legitimate (0). To facilitate more effective model training and research, the dataset has been adjusted to balance the counts of fraudulent and non-fraudulent transactions, eliminating biases commonly found in unbalanced datasets.

Each dataset is randomly divided into multiple non-overlapping subsets according to the predetermined number of financial institutions, with each subset assigned to exactly one institution. This method ensures that the data held by any two financial institutions are disjoint, simulating the complete isolation of customer data across different institutions in reality. This careful division allows us to more accurately mimic real-world scenarios, thereby enhancing the credibility of our experimental results.

Once the data are split into multiple non-overlapping subsets, we further partition the subsets within each federated learning client. The data owned by each client are independently shuffled to ensure that the data partitioning of each financial institution is unique and representative of the wider dataset. Following standard experimental protocols, we divide each client’s dataset into a training set, validation set, and test set in a 7:2:1 ratio.
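The two-stage partitioning described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `partition_among_clients` and `split_7_2_1`, the round-robin slicing, and the seeds are hypothetical, and integer indices stand in for transaction records.

```python
import random
from typing import List, Sequence, Tuple


def partition_among_clients(
    records: Sequence[int], num_clients: int, seed: int = 0
) -> List[List[int]]:
    """Randomly divide records into disjoint subsets, one per client."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    # Round-robin slicing gives non-overlapping subsets of near-equal size.
    return [shuffled[i::num_clients] for i in range(num_clients)]


def split_7_2_1(
    client_records: Sequence[int], seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle one client's records and split them 7:2:1 into train/val/test."""
    rng = random.Random(seed)
    data = list(client_records)
    rng.shuffle(data)
    n_train = len(data) * 7 // 10
    n_val = len(data) * 2 // 10
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]


records = list(range(1000))  # stand-in transaction IDs
clients = partition_among_clients(records, num_clients=4)
splits = [split_7_2_1(subset, seed=i) for i, subset in enumerate(clients)]
print([tuple(len(part) for part in s) for s in splits])
```

Each client shuffles with its own seed, so the train, validation, and test partitions differ across institutions while remaining disjoint from every other client's data.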

After partitioning the datasets for all federated learning clients, we construct a transaction similarity graph for each client. For any transaction record in the dataset, we calculate its similarity to the remaining transaction records and select the top-k most similar records as neighbors. This approach transforms the fraudulent transaction detection task into a semi-supervised node classification task on the graph.
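A brute-force version of the top-k neighbor selection might look like the sketch below. The similarity measure is not specified in this section, so cosine similarity is used purely as an assumption; the function names and the four toy feature vectors are likewise hypothetical. The pairwise loop costs O(N²) comparisons and is meant only to illustrate the idea on small inputs.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat all-zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def build_topk_graph(features: Sequence[Sequence[float]], k: int) -> Dict[int, List[int]]:
    """Map each transaction index to the indices of its k most similar transactions."""
    neighbors: Dict[int, List[int]] = {}
    for i, fi in enumerate(features):
        scored = [
            (cosine_similarity(fi, fj), j)
            for j, fj in enumerate(features)
            if j != i  # no self-loops
        ]
        # Highest similarity first; ties are broken by the lower index.
        scored.sort(key=lambda s: (-s[0], s[1]))
        neighbors[i] = [j for _, j in scored[:k]]
    return neighbors


# Four toy transactions forming two obvious clusters.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = build_topk_graph(features, k=1)
print(graph)
```

The value of k plays the role of the node degree examined in the ablation study, so the same routine can produce graphs with degrees 3, 5, or 7.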

This balanced approach is crucial for enabling models developed by various financial institutions to generalize effectively, ultimately leading to more effective and reliable fraud detection results. Our comprehensive strategy ensures that our federated learning framework achieves high performance while adhering to stringent security and privacy standards.

4.3. Comparison with Benchmark Models

Experimental analysis on the 2018CN and 2023EU datasets demonstrates that FedGAT-DCNN exhibits a significant advantage in ROC-AUC while maintaining high accuracy (as shown in Table 3).

Specifically, on the 2018CN dataset, FedGAT-DCNN achieves an accuracy of 0.9851, which is not the highest. However, considering that 2018CN is an extremely imbalanced dataset, its ROC-AUC of 0.9712 is far superior to all other methods. This indicates that despite the severe imbalance in the data, FedGAT-DCNN can effectively identify fraudulent transactions, showcasing its robust capability in handling imbalanced data.
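The reason accuracy alone is a weak indicator on such data can be seen with a small synthetic example. The labels and scores below are made up for illustration and are not drawn from 2018CN; the `roc_auc` helper is a simple pairwise implementation (the probability that a fraudulent sample is scored above a legitimate one), written here only to keep the sketch dependency-free.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (fraud, legitimate) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Synthetic imbalanced labels: 998 legitimate transactions and 2 fraudulent ones.
y_true = [0] * 998 + [1] * 2

# A trivial model that scores every transaction as legitimate.
trivial_scores = [0.0] * 1000
trivial_pred = [0] * 1000

trivial_accuracy = accuracy(y_true, trivial_pred)
trivial_auc = roc_auc(y_true, trivial_scores)
print(trivial_accuracy, trivial_auc)
```

The trivial model reaches 99.8% accuracy while its ROC-AUC stays at the chance level of 0.5, which is why ROC-AUC is the more informative metric when fraud cases are rare.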

On the 2023EU dataset, FedGAT-DCNN also performs exceptionally well, with an accuracy of 0.9889 and an ROC-AUC of 0.9992. This demonstrates the method’s robustness and superiority in a more balanced dataset environment. Although other methods also perform well on this dataset, FedGAT-DCNN remains significantly ahead in the crucial ROC-AUC metric.

Since FedGAT-DCNN is developed based on FedProx, we paid special attention to the performance of FedProx. On the 2018CN dataset, FedProx achieves an ROC-AUC of only 0.8100, highlighting its limitations in handling imbalanced data. Likewise, on the 2023EU dataset, FedProx achieves an accuracy of 0.5005 and an ROC-AUC of 0.5583, both significantly lower than those of FedGAT-DCNN.

The superior performance of FedGAT-DCNN can be attributed to its incorporation of a Graph Attention Network (GAT) and dilated convolutions in the local model. The GAT dynamically focuses on the most relevant transaction nodes, allowing the model to better differentiate between normal and fraudulent patterns. Dilated convolutions expand the receptive field, enabling the model to capture broader contextual information and enhance detection capabilities.

In summary, FedGAT-DCNN not only demonstrates outstanding performance on the highly imbalanced 2018CN dataset but also maintains high levels of accuracy and ROC-AUC on the more balanced 2023EU dataset. This indicates that FedGAT-DCNN, developed based on FedProx, possesses strong adaptability and generalization capabilities across different data distribution conditions, making it particularly suitable for real-world credit card fraud detection scenarios.

4.4. Ablation Study

To further investigate the performance and characteristics of FedGAT-DCNN, we designed and conducted targeted ablation experiments on the 2018CN dataset. We selected this dataset because its extreme class imbalance closely mirrors real-world credit card fraud detection, so the study evaluates the robustness and effectiveness of the proposed method under conditions that reflect practical challenges. The results provide insights into how the different components of FedGAT-DCNN contribute to its overall performance, particularly in highly imbalanced data environments.

4.4.1. Ablation Study on Different Number of Clients

This study examines how the FedGAT-DCNN model’s performance is influenced by the number of clients participating in the federated learning process. By varying the client count and measuring the model’s accuracy and generalization ability, we can gain a deeper understanding of the scalability and robustness of FedGAT-DCNN in different deployment scenarios.

The ablation study with four clients, as illustrated by the confusion matrix in Figure 3, provides clear insights into the performance of FedGAT-DCNN before and after applying federated learning. The black-framed matrices represent the performance before federated learning, while the red-framed matrices show the performance after federated learning.

From the confusion matrices, it is evident that there is a significant improvement in the false positive rate (i.e., normal transactions predicted as fraudulent) after federated learning. Before federated learning, the confusion matrices for each client show a considerable number of false positives. After applying federated learning, the false positive rate decreases substantially, indicating that FedGAT-DCNN becomes more precise in identifying fraudulent transactions without incorrectly classifying legitimate transactions as fraudulent.

Additionally, the stability of credit card fraud detection is maintained, as indicated by the number of true positives (correctly identified fraudulent transactions) and true negatives (correctly identified normal transactions). This stability demonstrates that while FedGAT-DCNN reduces false positives, it does not compromise on the ability to accurately detect fraudulent transactions.
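The quantities discussed here follow directly from the four cells of a confusion matrix. The sketch below uses invented counts (not the values in Figure 3) to show a case where false positives fall while fraud detection stays unchanged; the function name `confusion_metrics` is hypothetical.

```python
from typing import Dict


def confusion_metrics(tn: int, fp: int, fn: int, tp: int) -> Dict[str, float]:
    """Derive rates from confusion-matrix counts (fraud is the positive class)."""
    return {
        "false_positive_rate": fp / (fp + tn),  # legitimate transactions flagged as fraud
        "recall": tp / (tp + fn),               # fraudulent transactions that were caught
        "precision": tp / (tp + fp),            # flagged transactions that were truly fraud
    }


# Invented counts for one client before and after federated training.
before = confusion_metrics(tn=900, fp=100, fn=2, tp=8)
after = confusion_metrics(tn=990, fp=10, fn=2, tp=8)
print(before)
print(after)
```

In this made-up case the false positive rate drops from 0.10 to 0.01 and precision rises, while recall stays at 0.8. This is the pattern the text describes: fewer false alarms without losing true fraud detections.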

The reduction in false positives highlights that FedGAT-DCNN, when integrated with federated learning, becomes more precise in distinguishing between fraudulent and normal transactions. This precision is critical in real-world applications where minimizing false alarms can save significant resources and improve user trust. The consistent performance in detecting true fraud cases shows that the model maintains its effectiveness in identifying actual fraudulent transactions even after federated learning. This stability is crucial for ensuring that the model remains reliable and trustworthy in practical deployments.

The comparison between the performance before and after federated learning underscores the benefits of federated learning in enhancing the model’s performance. By allowing multiple clients to collaboratively train the model, federated learning helps in leveraging diverse data distributions, which contributes to the improved overall accuracy and robustness of FedGAT-DCNN.

We also evaluated FedGAT-DCNN with 8, 12, and 16 clients (as shown in Figure 4, Figure 5 and Figure 6) and found performance similar to the four-client case. This consistency across different client numbers indicates that FedGAT-DCNN maintains its effectiveness and robustness regardless of the number of participating clients.

Insights from these evaluations demonstrate that FedGAT-DCNN is highly adaptable and reliable across various configurations of federated learning environments. The ability to maintain high performance and low false positive rates across different numbers of clients suggests that FedGAT-DCNN can be effectively scaled and applied in diverse real-world settings, making it a versatile tool for credit card fraud detection.

4.4.2. Ablation Study on Different Node Degrees in the Transaction Similarity Graph

The structure of the transaction similarity graph plays a pivotal role in the effectiveness of the FedGAT-DCNN model. This study explores how changes in the mean node degree, reflecting varying levels of connectivity between nodes, affect the model’s ability to identify patterns and detect fraudulent transactions. By examining the model’s performance with different node degrees (3, 5, and 7), we can gain insights into how the graph structure influences the model’s capabilities.

The results, as shown in Table 4, highlight the sensitivity of FedGAT-DCNN to the graph structure. For the 2018CN dataset, the model achieves the highest ROC-AUC of 0.9712 with a node degree of 3, indicating that a lower degree of connectivity is more beneficial in highly imbalanced data scenarios. As the node degree increases to 5 and 7, the ROC-AUC decreases to 0.9701 and 0.9565, respectively. This suggests that overly connected nodes might introduce noise, potentially confusing the model and diminishing its ability to accurately identify fraudulent transactions.

For the 2023EU dataset, the model’s ROC-AUC remains remarkably high across all node degrees, with values of 0.9988, 0.9992, and 0.9991 for degrees 3, 5, and 7, respectively. This consistency indicates that in more balanced data scenarios, the model is robust to changes in node connectivity, maintaining high accuracy and detection capability regardless of the graph structure.

The performance of FedGAT-DCNN is sensitive to the structure of the transaction similarity graph. Lower node degrees tend to be more effective in highly imbalanced datasets, as they likely reduce noise and enhance the clarity of fraudulent patterns. The model demonstrates robustness to varying node degrees in balanced datasets like 2023EU, maintaining high ROC-AUC values across different degrees. However, in imbalanced datasets like 2018CN, the model’s performance is optimal at a lower node degree, suggesting that careful tuning of graph connectivity is crucial for such data. These findings inform strategies for graph preprocessing. For imbalanced datasets, it is advisable to limit node connectivity to avoid overcomplicating the graph structure. In contrast, for balanced datasets, the model is flexible to different levels of node connectivity.

4.4.3. Ablation Study on Different Number of Attention Heads

Attention mechanisms can significantly influence the model’s ability to focus on critical transaction features. This study examines the impact of varying the number of attention heads on the FedGAT-DCNN model’s performance. By comparing the results across different configurations, we can understand the role of attention heads in feature extraction and pattern recognition, ultimately identifying an optimal configuration that balances model complexity with predictive accuracy.

The analysis reveals that the number of attention heads plays a crucial role in the model’s performance. As shown in Figure 7, ROC-AUC and accuracy vary clearly with the number of attention heads. Specifically, the model achieves the highest ROC-AUC when the number of attention heads is set to 2 or 8, indicating that these configurations allow the model to effectively focus on the most relevant features without being overwhelmed by complexity. In contrast, a single attention head results in the lowest ROC-AUC, suggesting that insufficient attention capacity can limit the model’s ability to capture critical patterns in the data.

Furthermore, the model maintains a relatively stable accuracy across all configurations, with a slight dip when the number of attention heads increases to 4 and 12. This suggests that while accuracy is less sensitive to the number of attention heads, the ability to precisely identify fraudulent transactions (as indicated by ROC-AUC) is more affected by this parameter. The balance between the number of attention heads and model performance highlights the importance of tuning this parameter to achieve optimal results.

In summary, the study demonstrates that the best-performing numbers of attention heads for the FedGAT-DCNN model are 2 and 8, where the model achieves the best balance between complexity and performance. This insight is critical for enhancing the model’s capability in real-world credit card fraud detection scenarios, as it ensures the model can effectively leverage attention mechanisms to improve predictive accuracy without unnecessary computational overhead.

4.4.4. Ablation Study on Different Number of Dilated Channels and Dilated Message Weights

To evaluate the impact of the number of dilated channels and dilated message weights on the performance of the FedGAT-DCNN model, we conducted a series of experiments. These experiments provide valuable insights into how these parameters affect feature extraction and pattern recognition, helping to determine the optimal configurations.

As illustrated in Figure 8, the number of dilated channels significantly influences the model’s ROC-AUC and accuracy. The model achieves its highest ROC-AUC and accuracy when the number of dilated channels is set to 2, indicating that this configuration effectively expands the model’s receptive field, allowing it to capture broader contextual information. However, when the number of dilated channels increases to 3, the ROC-AUC drops significantly, suggesting that too many dilated channels may introduce noise and reduce the model’s performance. Further analysis reveals that while accuracy remains relatively stable across different configurations, it peaks when the number of dilated channels is 2. This suggests that an optimal number of dilated channels strikes a balance between expanding the receptive field and avoiding excessive complexity, thereby enhancing the overall performance of the FedGAT-DCNN model.

Similarly, as shown in Figure 9, the dilated message weights also have a significant effect on the model’s ROC-AUC and accuracy. The model achieves the highest ROC-AUC and accuracy when the dilated message weight is set to 0.4, indicating that this weight configuration allows the model to optimally utilize dilated messages for feature extraction. However, when the dilated message weight increases to 0.6 and 0.8, the ROC-AUC and accuracy drop significantly, suggesting that too high a dilated message weight may introduce noise and affect the model’s performance. Further analysis shows that although the impact of dilated message weight on accuracy is less pronounced, the model performs best in terms of both accuracy and ROC-AUC when the weight is 0.4. This indicates that an optimal dilated message weight can balance effective feature extraction and avoid excessive complexity, thus enhancing the overall performance of the FedGAT-DCNN model.

In summary, the study demonstrates that the FedGAT-DCNN model performs best with two dilated channels and a dilated message weight of 0.4. These findings are crucial for optimizing model configuration and improving the accuracy and robustness of credit card fraud detection in practical applications.

5. Conclusions

In this research, we have explored the effectiveness of the FedGAT-DCNN model for detecting credit card fraud, yielding substantial insights and outcomes through comprehensive experimental validation and ablation studies. The key conclusions drawn from our research are:

Robust Detection Performance: FedGAT-DCNN significantly enhances fraud detection capabilities, particularly in scenarios with highly imbalanced data. It achieves an ROC-AUC of 0.9712 on the 2018CN dataset and 0.9992 on the 2023EU dataset, demonstrating robust performance across varied data distributions.
Advanced Feature Integration: The integration of a Graph Attention Network (GAT) and dilated convolutions within the FedGAT-DCNN framework allows for dynamic and efficient adaptation to emerging fraud patterns, enhancing the model’s ability to capture complex transaction patterns and contextual information.
Privacy and Collaboration: Utilizing federated learning, FedGAT-DCNN enables multiple financial institutions to collaboratively train the model while preserving data privacy. This collaboration not only enhances the detection accuracy by leveraging diverse data sources but also ensures compliance with stringent data protection regulations.
Future Research Directions: Going forward, we intend to refine the FedGAT-DCNN model further by exploring additional graph-based techniques and expanding its applicability to other types of financial fraud. This future work aims to provide valuable guidelines for deploying effective fraud detection systems in real-world financial environments.

Author Contributions

Conceptualization, M.L.; Methodology, M.L.; Software, M.L.; Validation, M.L.; Formal analysis, M.L.; Resources, M.L.; Data curation, M.L.; Writing—original draft, M.L.; Writing—review & editing, M.L.; Visualization, M.L.; Supervision, J.W.; Project administration, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Byrd, D.; Polychroniadou, A. Differentially private secure multi-party computation for federated learning in financial applications. In Proceedings of the First ACM International Conference on AI in Finance, New York, NY, USA, 15–16 October 2020; pp. 1–9. [Google Scholar]
Kanamori, S.; Abe, T.; Ito, T.; Emura, K.; Wang, L.; Yamamoto, S.; Le, T.P.; Abe, K.; Kim, S.; Nojima, R.; et al. Privacy-preserving federated learning for detecting fraudulent financial transactions in japanese banks. J. Inf. Process. 2022, 30, 789–795. [Google Scholar] [CrossRef]
Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450. [Google Scholar]

Figure 1. Dilated convolution computing framework. The edge lengths in the graph reflect the similarity between transaction records. In the Random Select Neighbors phase, we sort each node's neighbors by similarity and extract the neighbors at fixed ranks to perform feature mixing.
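
The neighbor selection described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that "fixed-ranked" means taking every `dilation`-th neighbor from the similarity-sorted list, and that feature mixing is a weighted average of the node's features and the mean of the selected neighbors. The function names, the `weight` parameter, and the toy data are all hypothetical.

```python
from typing import List, Sequence


def dilated_neighbors(similarities: Sequence[float], dilation: int, k: int) -> List[int]:
    """Return indices of up to k neighbors taken at every `dilation`-th rank
    after sorting neighbors from most to least similar."""
    ranked = sorted(range(len(similarities)),
                    key=lambda i: similarities[i], reverse=True)
    return ranked[::dilation][:k]


def mix_features(node: Sequence[float],
                 neighbor_feats: Sequence[Sequence[float]],
                 weight: float) -> List[float]:
    """Blend a node's features with the mean of the selected neighbors.
    `weight` is the share given to the neighbor message (an assumption)."""
    if not neighbor_feats:
        return list(node)
    dim = len(node)
    mean = [sum(f[j] for f in neighbor_feats) / len(neighbor_feats)
            for j in range(dim)]
    return [(1 - weight) * node[j] + weight * mean[j] for j in range(dim)]


# Toy data: similarity of six candidate neighbors to one transaction.
sims = [0.9, 0.2, 0.7, 0.5, 0.8, 0.1]
feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0], [0.0, 4.0], [3.0, 3.0]]

picked = dilated_neighbors(sims, dilation=2, k=3)   # ranks 0, 2, 4
mixed = mix_features([3.0, 5.0], [feats[i] for i in picked], weight=0.5)
print(picked, mixed)
```

With `dilation=1` the selection reduces to the ordinary top-k nearest neighbors; larger dilation values skip ranks and so reach less similar records without increasing the number of neighbors aggregated.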

Figure 2. Comparative class distribution in the 2018CN and 2023EU datasets.

Figure 3. Confusion matrix for federated learning with 4 clients. The black-framed matrix represents the performance before federated learning, and the red-framed matrix represents the performance after federated learning.

Figure 4. Confusion matrix for federated learning with 8 clients. The black-framed matrix represents the performance before federated learning, and the red-framed matrix represents the performance after federated learning.

Figure 5. Confusion matrix for federated learning with 12 clients. The black-framed matrix represents the performance before federated learning, and the red-framed matrix represents the performance after federated learning.

Figure 6. Confusion matrix for federated learning with 16 clients. The black-framed matrix represents the performance before federated learning, and the red-framed matrix represents the performance after federated learning.

Figure 7. Accuracy for different numbers of attention heads.

Figure 8. Accuracy for different numbers of dilated channels.

Figure 9. Accuracy for different dilated message weights.

Table 1. Benchmark models for traditional and federated learning-based credit card fraud detection.

Benchmark Model	Description
Logistic Regression	A statistical model well-suited for binary classification tasks. It estimates the probability of an event’s occurrence, making it ideal for linearly separable data.
K-Nearest Neighbors (KNN)	This instance-based learning algorithm classifies a new data point based on the majority class among its k nearest neighbors, effective in datasets with discernible clusters of similar data points.
Histogram-based Gradient Boosting Classifier (HGBC)	A robust ensemble method that leverages decision trees, built in a sequential error-correcting process to enhance prediction accuracy, particularly effective for extensive datasets.
Support Vector Machine (SVM)	An algorithm that identifies the optimal separating hyperplane in high-dimensional space, making it particularly effective when the number of features exceeds the number of samples.
Random Forest Classifier	This technique builds multiple decision trees and integrates their outcomes either by averaging or majority vote, reducing overfitting and improving model accuracy.
AdaBoost Classifier	Utilizes a sequence of weak classifiers to build a strong classifier by focusing on the misclassified instances by previous models, dynamically adapting to the peculiarities of the data.
Multi-layer Perceptron Classifier (MLP)	A feedforward neural network with one or more hidden layers, capable of capturing complex non-linear relationships in data, suitable for intricate classification tasks.
FedProx	A federated learning approach that adds a proximal term to each client's local objective to handle data and system heterogeneity, stabilizing training over disparate and partial datasets.
Personalized Federated Learning (Personalised FL)	This technique adapts federated learning models to individual users or devices by allowing local deviations from the global model, optimizing performance for unique data characteristics.
FedAMP	Enhances federated learning by using an adaptive parameter to aggregate updates effectively, improving convergence and performance in non-IID data scenarios.
FedFomo	A federated learning strategy that uses a regret-based mechanism to prioritize client updates, optimizing the learning process by evaluating potential against actual updates.
Adaptive Personalized Federated Learning (APFL)	A federated learning model that combines local and global updates to tailor personalization, improving performance through individualized adjustments.
pFedMe	Employs a Moreau envelope-based regularization in personalized federated learning to optimize personal models for each client, enhancing convergence and personalization across diverse data.
Agnostic Personalized Private Learning (APPLE)	Integrates differential privacy with personalized model learning, allowing for privacy-preserving adaptations to diverse client data distributions.
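
To make the FedProx entry in Table 1 concrete, the sketch below shows the proximal term from the original FedProx formulation (Li et al., "Federated optimization in heterogeneous networks"), in which each client minimizes its local loss plus (mu/2)·||w − w_global||². The function names and the numbers are illustrative assumptions, and the code is not taken from the FedGAT-DCNN implementation.

```python
from typing import List, Sequence


def proximal_penalty(w: Sequence[float], w_global: Sequence[float], mu: float) -> float:
    """(mu / 2) * ||w - w_global||^2, the term added to the local loss."""
    return 0.5 * mu * sum((a - b) ** 2 for a, b in zip(w, w_global))


def fedprox_local_step(w: Sequence[float], w_global: Sequence[float],
                       grad: Sequence[float], lr: float, mu: float) -> List[float]:
    """One gradient step on local_loss(w) + proximal_penalty(w).
    `grad` is the gradient of the local task loss at w; the penalty
    contributes mu * (w - w_global) to the total gradient."""
    return [wi - lr * (gi + mu * (wi - wgi))
            for wi, wgi, gi in zip(w, w_global, grad)]


# Toy values (hypothetical): a client whose weights drifted from the global model.
w_local = [1.0, -2.0]
w_global = [0.0, 0.0]
task_grad = [0.5, 0.5]

penalty = proximal_penalty(w_local, w_global, mu=1.0)
stepped = fedprox_local_step(w_local, w_global, task_grad, lr=0.1, mu=1.0)
plain_sgd = fedprox_local_step(w_local, w_global, task_grad, lr=0.1, mu=0.0)
print(penalty, stepped, plain_sgd)
```

Setting `mu=0` recovers an ordinary local gradient step; a positive `mu` pulls each coordinate back toward the global model, which is what limits client drift on heterogeneous data.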

Table 2. Parameter configuration.

Parameter	Value
seed	42
communication rounds	300
local epochs	10
dropout	0.3
batch size	64
node degree	3
learning rate	0.01
alpha	0.4
no. of attention heads	8
no. of dilated channels	2
optimizer	Adam
weight decay	1 × 10⁻⁴
loss function	BinaryCrossEntropyLoss
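
For reproduction purposes, the settings in Table 2 could be collected in a single configuration object, as in the sketch below. The class and field names are hypothetical; only the values come from Table 2. The meaning of `alpha` is left as defined in the methodology and is not interpreted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FedGATDCNNConfig:
    """Values copied from Table 2; field names are illustrative only."""
    seed: int = 42
    communication_rounds: int = 300
    local_epochs: int = 10
    dropout: float = 0.3
    batch_size: int = 64
    node_degree: int = 3
    learning_rate: float = 0.01
    alpha: float = 0.4                 # as defined in the methodology
    num_attention_heads: int = 8
    num_dilated_channels: int = 2
    optimizer: str = "Adam"
    weight_decay: float = 1e-4
    loss_function: str = "BinaryCrossEntropyLoss"


cfg = FedGATDCNNConfig()

# Derived quantity: local epochs each client runs over the whole training run.
total_local_epochs = cfg.communication_rounds * cfg.local_epochs
print(cfg, total_local_epochs)
```

A frozen dataclass keeps the configuration immutable during training, so every client and the server are guaranteed to read the same values.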

Table 3. Comparison with state-of-the-art benchmark models.

Model	2018CN Accuracy	2018CN ROC-AUC	2023EU Accuracy	2023EU ROC-AUC
Logistic Regression	0.9984	0.7698	0.9652	0.9648
KNN	0.9988	0.8674	0.9898	0.9897
HGBC	0.9985	0.7635	0.9813	0.9812
SVM	0.9993	0.8398	0.9614	0.9618
Random Forest	0.9982	0.6745	0.9318	0.9316
AdaBoost Classifier	0.9992	0.7719	0.9565	0.9567
MLP	0.9992	0.7841	0.9976	0.9979
FedProx	0.9989	0.8100	0.5005	0.5583
Personalised FL	0.9976	0.5000	0.9395	0.9596
FedAMP	0.9980	0.5927	0.5001	0.5000
FedFomo	0.9996	0.8581	0.9270	0.9826
APFL	0.9983	0.7030	0.9028	0.9622
pFedMe	0.9977	0.8509	0.5003	0.9291
APPLE	0.9989	0.7389	0.9495	0.9832
FedGAT-DCNN	0.9851	0.9712	0.9889	0.9992
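
Several rows in Table 3 pair near-perfect accuracy with a ROC-AUC close to 0.5 (for example, Personalised FL on 2018CN). One plausible reading, assuming fraud makes up only a very small share of that dataset, is that such models label almost every transaction as legitimate. The sketch below illustrates the effect on made-up data (1000 transactions, 2 of them fraudulent); the numbers are not drawn from either dataset.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative,
    counting ties as one half (equivalent to the area under the ROC curve)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical, heavily imbalanced labels: 2 frauds among 1000 transactions.
y_true = [1] * 2 + [0] * 998

# Model A gives every transaction the same score and predicts "legitimate".
const_scores = [0.0] * 1000
const_pred = [0] * 1000
acc_const = accuracy(y_true, const_pred)
auc_const = roc_auc(y_true, const_scores)

# Model B ranks frauds above legitimate transactions but never crosses
# the 0.5 decision threshold, so its hard predictions are identical.
rank_scores = [0.4] * 2 + [0.1] * 998
rank_pred = [int(s >= 0.5) for s in rank_scores]
acc_rank = accuracy(y_true, rank_pred)
auc_rank = roc_auc(y_true, rank_scores)

print(acc_const, auc_const, acc_rank, auc_rank)
```

Both toy models reach the same accuracy, yet only the second separates the classes, which is why ROC-AUC is the more informative column when comparing methods on imbalanced data.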

Table 4. Performance under different node degrees.

Dataset	Degree = 3 Accuracy	Degree = 3 ROC-AUC	Degree = 5 Accuracy	Degree = 5 ROC-AUC	Degree = 7 Accuracy	Degree = 7 ROC-AUC
2018CN	0.9851	0.9712	0.9794	0.9701	0.9770	0.9565
2023EU	0.9828	0.9988	0.9889	0.9992	0.9882	0.9991

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
