1. Introduction
Blood is one of the most important biological fluids in organisms and carries biological and genetic information that other bodily fluids lack. It is widely used in fields such as biopharmaceuticals, species detection, and forensic investigation. Because of the distinctive nature of blood products, many countries have enacted regulations to combat illegal activities such as the smuggling of blood products. Consequently, the rapid identification of illicit blood products during import and export has become an urgent problem. H. Inoue et al. [1] differentiated blood samples from primates and non-primates using High-Performance Liquid Chromatography (HPLC). Although effective, this approach requires complex preprocessing of blood samples and tight control of the experimental environment. Espinoza et al. [2] classified blood samples from birds, reptiles, and mammals using Mass Spectrometry (MS), and Dalton et al. [3] applied DNA analysis to identify blood samples from wild animals, which plays an important role in combating poaching. However, both approaches place high demands on sample quality and require specialized knowledge during detection. None of these three methods can therefore provide quick, non-destructive inspection of samples during import and export. Achieving rapid, non-destructive detection on large sample datasets remains a pressing research problem.
Raman spectroscopy records the spectral lines produced by molecular vibrations in a sample under optical excitation. Because different substances have different molecular structures, their Raman spectra exhibit unique characteristics. Its convenience, speed, and non-destructiveness have led to widespread use in medicine [4,5], chemistry [6,7,8], and biology [9,10]. Peak information is particularly important in spectral data because it carries key details about the molecular structure and chemical composition of the sample. In the early 1970s, Goheen et al. [11] investigated the impact of peripheral proteins of artificial red blood cell membranes on Raman spectra. However, the equipment and detection methods of that time were relatively primitive and the detection process was cumbersome, which limited practical use. Research on Raman spectroscopy of blood deepened gradually, but it was not until the 21st century that the technique was widely applied to blood data analysis. In 2008, Saade et al. [12] combined near-infrared Raman spectroscopy with Principal Component Analysis (PCA) and the Mahalanobis distance to identify Hepatitis C virus in human serum, classifying 24 blood samples. This experiment supported the feasibility of machine learning for Raman-based blood detection. However, the study applied no preprocessing such as denoising to the blood dataset, which may leave its results susceptible to noise and other factors. In addition, overreliance on feature extraction methods such as PCA introduces the risk of human intervention and may discard fine-grained feature information in the spectra. In 2018, Kyle C. Doty et al. [13] used Partial Least Squares Discriminant Analysis (PLS-DA) to distinguish the blood of 17 different organisms, including humans, which assists forensic investigations at crime scenes and paved the way for further research on differentiating blood from various organisms. Subsequently, Wang et al. [14] applied the Support Vector Machine (SVM) method to the blood spectral data of four avian species, offering a new approach to analyzing the presence of food additives in blood. However, traditional machine learning algorithms often fall short of expectations on large sample datasets. A convenient, rapid, and precise Raman spectroscopy detection method is therefore needed.
With the rapid development of artificial intelligence, deep learning has been applied across many disciplines, including Raman spectroscopy classification. Current mainstream spectral classification methods mainly use one-dimensional feedforward neural networks for Raman spectral feature extraction and classification. Building on this foundation, Dong et al. [15] designed a one-dimensional convolutional neural network to distinguish human from animal blood, classifying human, dog, and rabbit blood with an accuracy of 96.33%. The authors emphasized the importance of data denoising and baseline correction and introduced dedicated modules for these steps. Their results indicate that convolutional neural networks outperform traditional machine learning methods such as Support Vector Machines (SVM) and Partial Least Squares Discriminant Analysis (PLS-DA). Huang et al. [16] designed a hierarchical convolutional neural network that achieved an average accuracy of 97% in a blind test involving 20 different animal species. Chen et al. [17] combined a convolutional neural network with the Stochastic Gradient Descent (SGD) optimizer to differentiate 19 types of blood, reaching a recognition accuracy of 98.79% and laying a foundation for research in trace blood analysis. Notably, none of these three studies relied on manual extraction of feature information, which underscores the advantage of combining deep learning with Raman spectroscopy: the approach outperformed traditional machine learning methods without hand-crafted features. The integration of deep learning with Raman spectroscopy is therefore a viable approach to rapid, non-destructive detection on large-sample data.
Convolutional operations play a central role in deep learning models. However, peak widths vary considerably across Raman spectral datasets, so a conventional convolutional kernel of a single size cannot adequately capture information from peaks of different widths. Making convolutional kernels compatible with peaks of different widths is therefore one of the primary challenges in spectral classification, and it motivates a multi-scale convolutional kernel strategy. To address this issue, Ding et al. [18] designed a multi-scale convolutional neural network with a cascaded hierarchical structure that uses three convolutional kernels of different sizes to extract finer feature information from the input spectra, achieving a classification accuracy of 96.77%. The study supports the use of multi-scale convolutional kernels for feature extraction from Raman spectra. Similarly, Deng et al. [19] proposed an adaptive-scale deep learning model for bacterial Raman spectral classification, which achieved accuracies of 86.7% and 98% on the 30-isolate and eight-empirical-treatment classification tasks, respectively. Its performance confirms the advantage of multi-scale models over both single-kernel deep learning models and traditional machine learning algorithms. However, multi-scale models inevitably increase the number of model parameters, which complicates deployment on certain portable devices. In addition, Raman spectra, as a type of remote sensing data with coherent information features, may lose some feature coherence during conventional convolutional operations. Preserving feature coherence while extracting feature information is therefore a problem that needs to be addressed. To tackle these issues, we propose a model named RepDwNet, which integrates local information without increasing model parameters. Experimental results show that RepDwNet achieves balanced accuracies of 97.17% and 97.31% on the transmissive and reflective blood Raman spectral datasets, respectively.
3. Results
3.1. Model Training
The model was implemented in Python, with core modules built on PyTorch and scikit-learn. Because it is a classification model, the loss function and optimizer used to tune its weights must be chosen carefully. We optimized the model parameters with the Adam optimizer, using a learning rate of 0.001 and exponential decay rates of $\beta_1 = 0.9$ for the first-moment estimate and $\beta_2 = 0.999$ for the second-moment estimate. We also applied an exponential decay strategy to the learning rate, with a decay factor of 0.95.
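The optimizer settings can be made concrete with a small standard-library sketch. It assumes the decay factor is applied once per epoch (the text does not state the decay interval), and the function names `lr_at_epoch` and `adam_step` are illustrative rather than taken from the original implementation. In PyTorch, the corresponding building blocks would typically be `torch.optim.Adam` and `torch.optim.lr_scheduler.ExponentialLR`.

```python
# Hyperparameters stated in the text.
BASE_LR = 0.001
BETAS = (0.9, 0.999)  # decay rates for the first- and second-moment estimates
GAMMA = 0.95          # exponential learning-rate decay factor


def lr_at_epoch(epoch, base_lr=BASE_LR, gamma=GAMMA):
    """Learning rate after `epoch` decay steps (assumption: one step per epoch)."""
    return base_lr * gamma ** epoch


def adam_step(param, grad, m, v, t, lr, betas=BETAS, eps=1e-8):
    """One Adam update for a single scalar parameter; `t` counts steps from 1."""
    b1, b2 = betas
    m = b1 * m + (1 - b1) * grad         # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad * grad  # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)            # bias correction for the zero initialisation
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v
```

Under the per-epoch assumption, the learning rate falls to roughly 0.6 of its initial value after 10 epochs (0.95 raised to the 10th power).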
Because the task is multi-class, we used one-hot encoded labels together with the cross-entropy loss function to quantify the disparity between the model’s predictions and the actual labels. This combination is a standard choice for tasks with multiple classification categories.
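As a minimal illustration of how one-hot labels and cross-entropy interact, the following sketch computes the loss for a single sample from raw model outputs (logits). The helper names are hypothetical, and applying a softmax before the logarithm is an assumption that mirrors common practice; the original training code is not shown in the text.

```python
import math


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def softmax(logits):
    """Convert logits to probabilities, subtracting the maximum for stability."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and the one-hot target."""
    probs = softmax(logits)
    target = one_hot(label, len(logits))
    # Only the true-class term is non-zero because the target is one-hot.
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)
```

With a one-hot target the loss reduces to the negative log-probability assigned to the true class, so confident correct predictions give a loss near zero.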
The experiments were run on a server equipped with an NVIDIA GeForce RTX 4090 GPU (purchased in Shanghai, China). A batch size of 128 was used for the input data, and the model was trained for 80 iterations. An early stopping strategy was also applied: training ceased if the model’s accuracy failed to improve for 10 consecutive epochs. The model weights that performed best during training were saved for subsequent evaluation. The changes in loss and accuracy during training are illustrated in Figure 8.
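The early stopping rule can be sketched as follows. The sketch assumes that the 80 training iterations are epochs and that the monitored quantity is a per-epoch accuracy; `accuracy_per_epoch` is a hypothetical callable standing in for one epoch of training plus evaluation, since the text does not specify which data split is monitored.

```python
def train_with_early_stopping(accuracy_per_epoch, max_epochs=80, patience=10):
    """Return (best_epoch, best_accuracy, epochs_run).

    `accuracy_per_epoch(epoch)` is a placeholder for training one epoch and
    returning the monitored accuracy.
    """
    best_acc = float("-inf")
    best_epoch = -1
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch in range(max_epochs):
        acc = accuracy_per_epoch(epoch)
        epochs_run += 1
        if acc > best_acc:
            # A real training loop would save the model weights here.
            best_acc, best_epoch = acc, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_acc, epochs_run
```

For example, an accuracy curve that stops improving after epoch 5 ends training after 16 epochs, and the weights from epoch 5 are the ones that would be kept.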
3.2. Model Evaluation
Accuracy is one of the most intuitive metrics for assessing model performance. On imbalanced datasets, however, conventional accuracy is dominated by the majority class, which biases the reported performance. To evaluate the model more objectively, this study uses balanced accuracy, which addresses class imbalance by averaging the sensitivity (recall) of each class with equal weight. Balanced accuracy is conceptually close to macro-recall, and in multi-class tasks their formulas are identical. To distinguish between the two, this paper uses the adjusted balanced accuracy, whose formula is presented in Equation (7). The adjusted balanced accuracy corrects for chance, so that random performance scores 0 and perfect performance scores 1.
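A minimal sketch of the chance-corrected metric follows. It assumes the adjustment rescales balanced accuracy by the chance level 1/K for K classes, which reproduces the stated behaviour (0 for random performance, 1 for perfect performance) and corresponds to scikit-learn's `balanced_accuracy_score` with `adjusted=True`. The function names are illustrative, and at least two classes are assumed.

```python
def per_class_recall(y_true, y_pred):
    """Recall of each class that occurs in y_true, in sorted class order."""
    recalls = []
    for c in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / support)
    return recalls


def balanced_accuracy(y_true, y_pred, adjusted=False):
    """Mean per-class recall, optionally rescaled so that chance level maps to 0."""
    recalls = per_class_recall(y_true, y_pred)
    score = sum(recalls) / len(recalls)
    if adjusted:
        chance = 1.0 / len(recalls)  # expected score of a random classifier
        score = (score - chance) / (1.0 - chance)
    return score
```

With three classes and per-class recalls of 0.75, 1.0, and 0.5, the balanced accuracy is 0.75 and the adjusted value is (0.75 - 1/3) / (2/3) = 0.625.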
In addition, because the dataset is imbalanced, this study uses precision, recall, and F1-Score to assess model performance comprehensively. Precision is the proportion of correctly predicted positive instances among all instances predicted as positive. Recall is the proportion of actual positive instances that the model detects, so a higher value indicates a stronger ability to find true positives. F1-Score combines precision and recall into a single balanced measure. Because some species have relatively few samples, their sample size in the test set is also limited. We therefore compute these metrics with macro-averaging so that every class contributes equally to the evaluation. The formulas for the performance metrics are provided in Equations (8)–(10).
In the equations, $TP$, $FP$, and $FN$ denote true positives, false positives, and false negatives, respectively.
We computed these performance metrics with functions provided by the scikit-learn (sklearn) library.
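The macro-averaged metrics can be sketched directly from the true-positive, false-positive, and false-negative counts. This sketch assumes that the macro F1-Score is the mean of the per-class F1 values (one common convention), under which the macro F1 can fall below both macro precision and macro recall. The function name `macro_scores` and the zero-division handling are illustrative choices.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) over all observed classes."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    # Every class gets equal weight, regardless of how many samples it has.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

Because each class is weighted equally, a species with only a handful of test samples influences the result as much as a well-represented one.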
3.3. Model Performance
To improve the reliability of the results and reduce the effect of randomness in training, this study used 10-fold cross-validation while keeping the model parameter settings fixed. For each cross-validation iteration, we recorded the model’s performance metrics on the test dataset, including accuracy, precision, recall, and F1 score. The evaluation results on the two datasets are presented in Figure 9. On the transmissive and reflective Raman spectroscopy datasets, the model achieved balanced accuracies of 97.17% and 97.31%, respectively. It also remained stable while maintaining high precision (97.80% and 97.70%) and recall (97.27% and 97.40%). These values indicate that the model both predicts positive samples accurately and identifies a large proportion of the positive instances. The F1 scores of 97.09% and 97.45% on the transmissive and reflective datasets, respectively, further indicate a good balance between recall and precision.
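The fold construction behind 10-fold cross-validation can be sketched with the standard library alone. The split below is a plain shuffled partition; whether the original experiments used stratified folds is not stated, so this is an assumption, and `k_fold_indices` is a hypothetical helper name.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

Each sample appears in exactly one test fold, so averaging the per-fold metrics uses every sample for testing once.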
The confusion matrix gives an intuitive view of a model’s classification performance on each category. Figure 10 presents the model’s classification predictions on the two test sets; the vertical axis represents the actual categories of the samples, and the horizontal axis represents the predicted categories. On the transmissive blood dataset, the model performs relatively poorly on Syrmaticus reevesii compared with other species. We attribute this mainly to the higher requirements of this species for sample quantity and quality: despite some data augmentation, the model still struggles to learn sufficiently effective feature information. For the other species, the model performs very well. More detailed confusion matrix data are available in Tables S3 and S4 in the Supplementary Information. To further assess the sensitivity and specificity of the model’s classification results, we use Receiver Operating Characteristic (ROC) curves, presented in Figure 11. The curves show that the RepDwNet model classifies well on the small-sample dataset after data augmentation. Taken together, these results indicate that the classification performance of RepDWNet on the two blood datasets is reliable.
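For reference, a confusion matrix with the same orientation as described above (rows for actual categories, columns for predicted categories) can be built as in the following sketch. It assumes class labels are integers from 0 to `num_classes - 1`; the function names are illustrative.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions; row index = actual class, column index = predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def recall_from_matrix(matrix):
    """Per-class recall: diagonal entry divided by the row total."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]
```

A class with weak performance shows up as a row whose off-diagonal entries are large relative to its diagonal entry.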
3.4. Comparison with Other Classification Methods
To assess the performance of RepDwNet in Raman spectroscopy classification, we compared it with other network classification models proposed in the last two years: a one-dimensional VGG Raman spectroscopy classification network proposed by Sang [27], a model combining LSTM with convolutional neural networks proposed by Bratchenko [28], a one-dimensional AlexNet Raman spectroscopy classification network designed by Zhang [29], a pure multi-head attention network adopted by Liu et al. [30], and an adaptive multi-scale convolutional neural network designed by Deng et al. [19]. All models were evaluated on the same experimental datasets with 10-fold cross-validation, and each model was fine-tuned to achieve its best classification performance.
According to the experimental results in Table 4, RepDWNet extracts blood spectral features more effectively than the models designed by Song, Zhang, and Liu. On the transmissive and reflective blood datasets, RepDWNet achieves balanced accuracies of 97.17% and 97.31%, respectively, which indicates that it predicts the majority of cases correctly even on imbalanced datasets. The model also compares favorably in parameter size and inference speed. Compared with the model designed by Bratchenko et al. [28], RepDWNet has a slightly lower inference speed but achieves relatively high classification performance with a smaller memory footprint. Compared with the network designed by Deng et al. [19], RepDWNet does not achieve a significant improvement in classification performance; however, as mentioned earlier, that network does not address the parameter growth caused by multi-scale models. RepDWNet accepts a slight decline in classification performance in exchange for a more lightweight model with faster inference. Therefore, compared with other existing Raman spectroscopy classification networks, RepDWNet offers distinct advantages: it maintains high classification performance while reducing the model’s parameter size and improving inference speed.
In addition, this study evaluated machine learning algorithms commonly used for Raman spectroscopy classification. As shown by the comparative results in Table 5, RepDWNet clearly outperforms the traditional Partial Least Squares (PLS) algorithm. We also compared RepDWNet with the combination of Principal Component Analysis (PCA) and a Support Vector Machine (SVM). However, PCA+SVM overfit noticeably: it performed well on the training and validation sets but not on the test set. We attribute this to the possibility that PCA+SVM learned inappropriate information while acquiring blood spectral feature information, and we therefore do not present its experimental results in the table. Overall, RepDWNet shows a clear advantage in extracting feature information from blood Raman spectra.
4. Conclusions
In this study, we introduce RepDWNet, a lightweight Raman spectroscopy classification network. The model combines convolutional kernels of multiple scales while keeping a small parameter size and fast inference. To preserve the feature coherence of Raman spectra, we incorporate residual connections that facilitate information transfer between layers. To make the model more suitable for portable devices, we apply reparameterization techniques, which simplify the model and speed up inference. To better capture the intrinsic features of Raman spectra, we apply data augmentation to two imbalanced datasets. Experimental results show that data augmentation combined with RepDWNet yields high balanced accuracy on both the transmissive and reflective blood datasets, reaching 97.17% and 97.31%, respectively. Ablation experiments suggest that, for Raman spectroscopy classification, larger convolutional kernels may outperform smaller ones, a direction we aim to validate thoroughly in future investigations. Finally, the Raman spectroscopy augmentation techniques and RepDWNet proposed in this paper can be extended to other spectral classification domains.
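To illustrate why reparameterization can keep multi-scale kernels cheap at inference time, the sketch below merges two parallel one-dimensional convolution branches into a single kernel. The branch layout is an assumption made for illustration only (two summed branches with odd kernel sizes, zero padding, no normalization layers), not the documented structure of RepDWNet, and the helper names are hypothetical.

```python
def conv1d_same(signal, kernel):
    """Cross-correlate `signal` with an odd-length kernel, zero-padded to keep length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def merge_kernels(large, small):
    """Fold a small odd-length kernel into a larger one by centre-aligned zero padding."""
    offset = (len(large) - len(small)) // 2
    merged = list(large)
    for j, w in enumerate(small):
        merged[offset + j] += w
    return merged


# Two-branch output (training-time view) versus single merged kernel (inference view).
signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]
k5 = [0.1, -0.2, 0.3, 0.4, -0.5]
k3 = [1.0, 0.5, -1.0]
two_branch = [a + b for a, b in zip(conv1d_same(signal, k5), conv1d_same(signal, k3))]
single = conv1d_same(signal, merge_kernels(k5, k3))
```

Because convolution is linear, `two_branch` and `single` agree up to floating-point rounding, so the extra branch adds no parameters or computation once the kernels are merged.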