1. Introduction
Automated visual and acoustic monitoring methods for birds can provide information about the presence and the number of bird species [1] or individuals [2] in certain areas, but analyzing the physiological condition of individual birds allows us to understand potential causes of negative population trends. For example, measuring the physiological stress of birds can serve as a valuable early warning indicator for conservation efforts. The physiological condition and the stress of birds can be determined in several ways, e.g., by assessing the body weight or the fat and muscle scores in migratory birds [3,4]. Other frequently used methods are investigating parasite loads, measuring heart rates, and measuring the levels of circulating stress hormones, such as corticosterone [5,6,7,8,9]. Depending on the research questions studied, these methods can be a good choice for assessing long-term stress or the investment in immunity.
The method investigated in this article comprises analyzing blood smears and counting blood cells [10]. Not only are white blood cells, i.e., leukocytes, an important part of the immune system of vertebrates such as mammals or birds, but the composition of leukocytes is also known to change in response to elevated stress hormones (glucocorticoids) and can, therefore, be used to assess stress levels [10]. In particular, the ratio of heterophils to lymphocytes (H/L ratio) is considered a well-established stress index for assessing long-term stress in birds [10,11]. Since the H/L ratio changes only 30 to 60 min after the onset of an acute stress event, it is possible to measure stress without mirroring the influence of the capture event [12]. It is also possible to calculate the leukocyte concentration (leukocytes per 10,000 erythrocytes) or the concentration of specific leukocyte cell types to gain an understanding of the current health status of a bird and the investment in immunity [13,14,15].
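Both indices are simple functions of the raw cell counts. As an illustration (not part of the original workflow), they can be computed as follows:

```python
def hl_ratio(heterophils: int, lymphocytes: int) -> float:
    """Heterophil-to-lymphocyte (H/L) ratio, a long-term stress index."""
    if lymphocytes == 0:
        raise ValueError("lymphocyte count must be positive")
    return heterophils / lymphocytes

def leukocyte_concentration(leukocytes: int, erythrocytes: int) -> float:
    """Leukocyte concentration, i.e., leukocytes per 10,000 erythrocytes."""
    if erythrocytes == 0:
        raise ValueError("erythrocyte count must be positive")
    return 10_000 * leukocytes / erythrocytes
```

For example, a smear section with 45 heterophils and 90 lymphocytes yields an H/L ratio of 0.5.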
Leukocyte counts are quite cost-effective since they do not require complex laboratory techniques. However, evaluation under the microscope often requires manual interpretation by human experts, is time-consuming, and can only assess small portions of the entire smear. Typically, leukocytes are counted until 100 leukocytes are reached [16]. Consequently, the counted values and subsequent ratios are not always reproducible, and the result depends on the section counted. Furthermore, the method is prone to observer errors. Therefore, there is an urgent need for automated methods to perform leukocyte counts in avian blood smears.
Bird and human blood cell analyses have some aspects in common. The counted leukocytes are similar: lymphocytes, eosinophils, basophils, and monocytes can be found in mammalian as well as avian blood. However, there are some significant differences that make the automated counting of avian blood cells more difficult [17]. The neutrophils in human blood are equivalent to heterophils in birds. One of the main differences between bird and human blood, however, is the presence of nuclei in bird erythrocytes (i.e., red blood cells) and thrombocytes, whereas mammalian erythrocytes and thrombocytes lack a nucleus [17]. The presence of a nucleus in erythrocytes makes the cell identification process more complicated since lysed and ruptured erythrocytes can be mistaken for other cell types. Lastly, during ornithological field studies, bird blood samples are usually not taken in a sterile environment, leading to dirt contaminating the smears. Such contaminants and stain remnants can further lead to confusion. Because of these differences from human blood and the associated challenges, it is necessary to develop dedicated solutions for the automated analysis of bird blood samples instead of relying on existing machine learning approaches for human blood.
A solid understanding of the different leukocyte types is necessary when analyzing avian blood samples since some are quite similar to each other.
Figure 1 shows examples of each blood cell type as well as two challenging anomalies, i.e., stain remnants and lysed cells (Figure 1d) as well as ruptured cells (Figure 1h). Lymphocytes appear small, round, and blue in blood smears and are the most common leukocytes in passerine birds. Their nuclei usually take up more than 90% of the cell (see Figure 1b). Heterophils can be identified by their lobed cell nuclei and rod-shaped granules in the cytoplasm, as shown in Figure 1e. In birds, heterophils and lymphocytes make up approximately 80% of the leukocytes [18]. Eosinophils are similar to heterophils but have round granules (see Figure 1c). Basophils can be recognized by their purple-staining granules, as shown in Figure 1f, but they are rarely found. Monocytes are larger cells that can be confused with lymphocytes, but their nucleus often has a kidney-shaped appearance and takes up only up to 75% of the cell (see Figure 1g) [19]. Additionally, it is important to be aware of possible variations in the morphology and staining characteristics of these cell types between different avian species, which may affect their identification and interpretation.
Avian blood counts are still mostly obtained manually. However, there are several approaches for more systematic, automated ways of counting avian blood cells. For instance, Meechart et al. (2020) [20] developed a simple computer vision algorithm based on Otsu’s thresholding method [21] to automatically segment and count erythrocytes in chicken blood samples. Beaufrère et al. (2013) [22] used image cytometry, i.e., the analysis of blood in microscopy images, in combination with the open-source software CellProfiler [23,24] to classify each cell using handcrafted features as well as machine learning algorithms. However, they stated that their results were not satisfactory.
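Otsu’s method chooses the grayscale threshold that maximizes the between-class variance of the intensity histogram, separating cells from the background. A minimal NumPy re-implementation for 8-bit grayscale images (an illustrative sketch, not the code used by Meechart et al.) looks like this:

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold for a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    probs = hist / hist.sum()
    omega = np.cumsum(probs)                      # class-0 probability
    mu = np.cumsum(probs * np.arange(256))        # class-0 cumulative mean
    mu_t = mu[-1]                                 # global mean intensity
    # Between-class variance for every candidate threshold; invalid
    # divisions (empty classes) are mapped to zero below.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))
```

Thresholding with the returned value (`mask = gray > t`) yields a binary foreground mask whose connected components can then be counted as cell candidates.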
Another way of automating avian blood counts is the use of hardware devices for blood analysis. For example, the Abbott Cell-Dyn 3500 hematology analyzer [25] (Abbott, Abbott Park, IL, USA) was used in studies analyzing chicken blood samples [26,27]. The Cell-Dyn 3500 works on whole blood samples and relies on flow cytometry, i.e., the analysis of a stream of cells by a laser beam and electric impedance measurements. The device was standardized for poultry blood.
The CellaVision® DC-1 analyzer [28] (CellaVision AB, Lund, Sweden) scans blood smears and pre-classifies erythrocytes as well as leukocytes. In combination with the proprietary CellaVision® VET software [29], the device can be used to analyze animal blood, including bird blood. However, the pre-classification results still need to be verified by a human expert. The device has a limited capacity of a single slide and is able to process roughly 10 slides per hour, according to the manufacturer [28]. This throughput does not appear to reduce turnaround times in (human) blood analysis [30]. Yet, in a distributed laboratory network, the device could indeed contribute to reduced turnaround times [31].
In the last decade, deep learning models, in particular convolutional neural networks (CNNs), have become the state of the art in many computer vision tasks, such as image classification, object detection, and semantic segmentation. These deep neural networks are highly suitable for image processing since they can learn complex image features directly from the image data in an end-to-end manner. Apart from their success in natural image processing, they have also contributed to biological and medical imaging tasks, e.g., in cell detection and segmentation [32,33], blood sample diagnostics [34,35], histopathological sample diagnostics [36], such as breast cancer detection [37], and magnetic resonance imaging (MRI) analysis [38].
However, only a few deep learning approaches are available for avian blood cell analysis. For instance, Govind et al. (2018) [39] presented a system for automatically detecting and classifying avian erythrocytes in whole slide images. Initially, they extract optimal areas from the whole slide images for analyzing erythrocytes. In the first step, regions are chosen from low-resolution windows using a quadratic discriminant analysis classifier. These optimal areas are then refined at higher resolution using an algorithm based on binary object sizes, which identifies overlapping cells that need to be split. The actual separation is conducted in a multi-step handcrafted algorithm. Intensity- and texture-based features are used to distinguish between erythrocytes and leukocytes, but the latter are not actually detected. In the final step, all detected erythrocytes, i.e., solitary cells and cells separated from clumps, are classified. This is the only part of the approach that relies on deep learning. Each detected cell is cropped and fed to a GoogLeNet deep neural network [40]. The resulting model can classify the detected erythrocytes as mammalian, reptilian, or avian. Furthermore, the model can categorize erythrocytes into one of thirteen species, only one of which is a bird species.
Kittichai et al. (2021) [41] used different CNN models to detect infections of an avian malaria parasite (Plasmodium gallinaceum) in domestic chickens. Initially, a YOLOv3 [42] deep learning model was used to detect erythrocytes in thin blood smear images. Then, four CNN architectures were employed for the classification of the detected cells to characterize the different avian malaria blood stages.
However, to the best of our knowledge, there is no hardware-independent and publicly available approach for the automated segmentation and classification of avian blood cells, i.e., erythrocytes as well as leukocytes.
In this article, we present a novel deep learning approach for the automated analysis of avian blood smears. It is based on two deep neural networks that automatically quantify avian red and white blood cells in whole slide images, i.e., digital images produced by scanning microscopic glass slides [43]. The first neural network model determines image regions that are suitable for counting blood cells. The second neural network model performs instance segmentation to detect blood cells in the determined image regions. For both models, we investigate different neural network architectures and different backbone networks for feature extraction in cell instance segmentation. We provide an open-source software tool to automate and speed up blood cell counts in avian blood smears. We make the annotated dataset used in our work publicly available, along with the trained neural network models and source code [44]. In this way, we enable ornithologists and other interested researchers to build on our work.
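Conceptually, the two models form a simple pipeline: tiles of the whole slide image are first filtered by the region classifier, and only the accepted tiles are passed to the instance segmentation model. The sketch below illustrates this control flow; `is_countable` and `segment_cells` are hypothetical stand-ins for the two trained networks, not the actual interfaces of our software.

```python
def count_blood_cells(tiles, is_countable, segment_cells):
    """Two-stage pipeline sketch: filter countable tiles, then
    segment and tally the cell instances found in them."""
    counts: dict[str, int] = {}
    for tile in tiles:
        if not is_countable(tile):      # stage 1: tile selection model
            continue
        for cell_type in segment_cells(tile):  # stage 2: instance segmentation
            counts[cell_type] = counts.get(cell_type, 0) + 1
    return counts
```

The per-type totals returned by such a pipeline are exactly what is needed to compute downstream indices like the H/L ratio.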
3. Results
3.1. Tile Selection
The models were evaluated on a held-out dataset consisting of 298 positive and 346 negative examples.
As Table 2 shows, both models performed very well, with accuracies and F1 scores above 96%. The smaller version, i.e., EfficientNet-B0, performed better, with an accuracy of 97.5% and an F1 score of 97.3%.
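Both reported metrics follow directly from the binary confusion matrix of the tile classifier. For reference, a self-contained sketch (1 = countable tile):

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 score for binary labels (1 = countable tile)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1
```

Unlike plain accuracy, the F1 score balances precision and recall, which matters here because the positive and negative tiles are not perfectly balanced (298 vs. 346 examples).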
3.2. Detection and Segmentation
First, we performed the training with no augmentation at all. The results for the detection task are summarized in Table 3. Adding the default data augmentation of CondInst, i.e., random horizontal flipping, improved the results by roughly 1.9% in terms of mAP. The application of further data augmentation, namely, random vertical flipping, random brightness, random contrast, and random saturation, again improved the score by roughly 2.8%. If we instead applied horizontal flipping and elastic deformations, the models still achieved an mAP of 87.7%. Combining all data augmentation methods in one model resulted in the best model, achieving 98.9% AP for erythrocytes, 90.2% for lymphocytes, 87.3% for eosinophils, and 86.3% for heterophils and, hence, an mAP score of 90.7%. Overall, combining all data augmentation methods resulted in an improvement of roughly 5.2% in terms of mAP.
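The listed augmentations are all simple, label-preserving image transforms. The NumPy sketch below conveys the idea for an HxWx3 float image in [0, 1]; it is illustrative only (the jitter ranges are made up, and the actual training used the CondInst pipeline, where flips are also applied to masks and boxes):

```python
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly flip and jitter an HxWx3 float image in [0, 1]."""
    if rng.random() < 0.5:                        # random horizontal flip
        img = img[:, ::-1]
    if rng.random() < 0.5:                        # random vertical flip
        img = img[::-1, :]
    img = img * rng.uniform(0.8, 1.2)             # random brightness
    mean = img.mean()
    img = (img - mean) * rng.uniform(0.8, 1.2) + mean   # random contrast
    gray = img.mean(axis=2, keepdims=True)
    img = gray + (img - gray) * rng.uniform(0.8, 1.2)   # random saturation
    return np.clip(img, 0.0, 1.0)
```

Such augmentations artificially enlarge the training set and make the model more robust to variations in staining and scanning conditions.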
The results of the instance segmentation shown in Table 4 are similar to those obtained for the corresponding bounding box detections. Each additional data augmentation step increased the mAP scores, and the best result was achieved by applying all random data augmentation techniques. However, the best model was not as dominant as in the detection task and could not outperform the other approaches in every category.
We observe that erythrocytes were consistently recognized almost perfectly, with 98.9% and 99.0% AP for detection and segmentation, respectively. Thus, the model learned not to confuse thrombocytes or immature erythrocytes with erythrocytes. However, there was also an obvious drop in performance for all leukocyte subclasses. On the one hand, erythrocytes were easiest to identify because of their characteristic nucleus; on the other hand, they were by far the most frequent cell type in avian blood samples, i.e., roughly 98% of all instances in our dataset. Therefore, the model could learn better features from this large set of samples. Among the leukocytes, this trend was evident as well. The most frequent leukocyte class, i.e., lymphocytes, still achieved 90.0% in terms of AP, while eosinophils and heterophils achieved 87.3% and 85.2%, respectively. Thus, there appears to be a correlation between the number of training samples and performance, although it is not statistically significant.
To analyze the relation between precision and recall in more detail, we plotted the precision–recall curve of our best model in Figure 7. The curve for the erythrocyte class (blue line) is almost perfect, as expected, with an AP and, hence, an area under the curve of 99%. The curves for the other classes start descending sooner, but by choosing the best threshold, in our case 0.5, a good balance between precision and recall could be achieved.
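The reported AP values correspond to the area under such a precision–recall curve. For reference, the standard computation over confidence-ranked detections is sketched below (simplified: it assumes each detection is already matched as a true or false positive, omitting the IoU matching step that precedes this in detection metrics):

```python
def average_precision(scores, labels):
    """AP over detections ranked by confidence; labels mark true positives."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:           # true positive at this rank
            tp += 1
            ap += tp / rank     # precision at each recall step
    return ap / total_pos if total_pos else 0.0
```

The mAP is then simply the mean of these per-class AP values.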
CondInst with a smaller backbone, namely ResNet-50, performed very well but could not compete with the model based on ResNet-101. In comparison, the performance deteriorated by roughly 2.5% in terms of mAP. However, the anchor-based Mask R-CNN approach using a ResNet-101 backbone showed a clear drop in performance of roughly 6.8% compared to the anchor-free CondInst approach using the identical backbone.
We did not have enough samples of basophils and monocytes for a comprehensive evaluation of their respective classes, but these samples could be aggregated into their superclass, leukocytes. We trained a binary CondInst model that classifies avian blood cells into erythrocytes and leukocytes. As Table 5 and Table 6 show, our model performed very well on this task, achieving more than 93% and 98.8% in terms of AP for leukocytes and erythrocytes, respectively. As before, the larger backbone, i.e., ResNet-101, pushed the model to a better performance on leukocytes.
The AP score for the aggregated leukocyte class was higher than that of any individual subclass. Presumably, the multi-class model confused cell instances of the different subclasses.
3.3. Inference Runtimes
The inference runtimes for samples of different sizes are shown in Table 7.
We included the largest whole slide image (i.e., sample 8_036) consisting of more than 19 billion pixels as well as the smallest sample (5_055) with only roughly 2.5 billion pixels. However, in addition to the size of the image, the fraction of actual countable tiles played a crucial role in the processing times. For the largest file (8_036) containing 97,200 tiles with a countable tiles fraction of roughly one-fourth, our approach took roughly 25 min. Yet, another sample (1_023) with only 91,059 tiles, but more than half of them classified as countable, took roughly 52 min. Processing the three smaller samples took less than 15 min each. In general, none of the selected images needed more than one hour to determine the cell counts in the corresponding blood smear. Depending on the mentioned factors, processing took mostly less than a tenth of a second for one countable tile, including tile selection, segmentation, and identification, as well as counting of the respective cell instances. In contrast, our human expert took an average of roughly two minutes to annotate a tile with labels and segmentation masks in our semi-automated setting.
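The per-tile figure follows directly from the reported totals. For the largest sample, roughly one-fourth of 97,200 tiles were countable and processing took roughly 25 min, which is consistent with the sub-0.1 s per countable tile stated above:

```python
def seconds_per_countable_tile(total_tiles: int,
                               countable_fraction: float,
                               runtime_min: float) -> float:
    """Average processing time per countable tile, from reported totals."""
    countable_tiles = total_tiles * countable_fraction
    return runtime_min * 60.0 / countable_tiles

# Sample 8_036: 97,200 tiles, roughly one-fourth countable, roughly 25 min total.
t = seconds_per_countable_tile(97_200, 0.25, 25)
```

This back-of-the-envelope check also explains why sample 1_023, despite having fewer tiles, took longer overall: its countable fraction, and hence the number of tiles actually processed by the segmentation model, was about twice as high.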
4. Discussion
Our novel approach offers a proficient assessment of avian blood scans, which speeds up the workflow of blood cell counting significantly compared to the traditional method of visually counting under the microscope. Compared to existing hardware devices for automated blood analysis [25,28], which are usually quite expensive, our approach is freely available. Hence, we enable researchers who do not have access to such devices used in veterinary laboratories to utilize an automated cell-counting method. The CellaVision® DC-1 analyzer has been evaluated for mammalian, reptilian, and avian blood by comparing its pre-classification to the final results after review by veterinarians [61]. The agreement was very good for neutrophils, heterophils, and lymphocytes (each > 90%) and good for monocytes (81%). However, eosinophils and basophils needed massive re-classification by human experts. Interestingly, while we agree that achieving good performance for basophils is a challenge, our model appears to be more reliable for eosinophils. However, we could not evaluate our model on monocytes, which were recognized in a satisfactory way by the CellaVision® DC-1. Moreover, our approach can be more efficient than hardware-based approaches. The DC-1 analyzer [28] processes given slides sequentially, achieving a throughput of no more than roughly 10 slides an hour. Our approach allows users to scan slides with various methods, e.g., with microscope cameras or high-throughput scanners, like the Leica Aperio AT2 scanner [47] with a capacity of 400 slides, as used in our study. The Leica Aperio AT2 scanner can digitize a large number of slides in a very time-efficient manner. Our approach can be arbitrarily scaled by processing several slide images in parallel and is only limited by the available hardware resources. Furthermore, our approach can handle low-quality blood smears because it has been trained under such conditions, while the CellaVision® DC-1 analyzer is primarily designed for usage in veterinary laboratories. Moreover, because of its proprietary design, it is not possible to use custom training data to adapt the classification approach. Hence, regarding large numbers of avian blood samples collected in ornithological field studies, our approach opens new possibilities for bird-related research.
While our approach shows that it is feasible to automatically count not only red but also white avian blood cells with open-source software, it still has some limitations. Because of the low number of samples in our training set, our neural network model is not yet able to reliably recognize basophils or monocytes. Furthermore, the model is trained on a limited number of bird species. Because of potential variations in staining intensity, coloration, and cell morphology, it may be a challenge to detect cells of other bird species as reliably as for the given species [62]. In particular, eosinophils may differ considerably between bird species.
However, these issues indicate several areas for future work. The model performance can be further improved by extending the dataset in general and particularly for the rare classes, i.e., basophils and monocytes. Instead of indiscriminately annotating more images that barely contain any of these cells, this can be done using an active learning approach that reliably retrieves unlabeled images containing these types of avian white blood cells. Moreover, it is a promising direction to generate additional training samples with generative deep learning approaches, like GANs [63] or image generation models based on latent diffusion [64]. Furthermore, our approach can be extended to recognize and count blood parasites (e.g., Haemosporida and Trypanosoma). Another interesting aspect is investigating and improving the generalization ability of our neural network model in cross-domain scenarios. This can include different techniques for creating blood smears as well as different bird species. We plan to include further bird species, e.g., penguins, in our model.
Several studies have indicated that extreme ecological conditions can significantly increase hematocrit levels in birds. For example, a female great tit from the northernmost populations in Northern Finland showed a hematocrit level of 0.83 [65]. Such levels make the blood viscous and lead to densely packed cells in the blood smear image, which can be challenging for automated counting approaches. Since we trained our model to count only areas matching human quality standards, we only counted tiles from the monolayer. Hence, a high hematocrit level may lead to significantly more rejected tiles. However, our approach is adaptable to new annotated data sources. Thus, providing our models with manually labeled images with high hematocrit levels in future training iterations will improve their ability to process and count cells under such rare conditions. In general, our approach is based on open-source software. Therefore, the models can easily be adapted to other datasets or extended to recognize further cell types.
So far, our approach aims to automate the tedious task of manually counting avian blood cells. Furthermore, it eliminates inter-observer errors. However, it still counts cells only in the monolayer. Future work may expand the countable areas, as achieved by handcrafted feature algorithms [39]. For a deep learning approach like ours, this can be achieved by training the model with data involving lower-quality areas. By learning useful features from the annotated samples, the resulting models may be capable of achieving superhuman performance.
Our deep learning model opens up new opportunities in ornithology and ecology for documenting and evaluating the stress levels and health conditions of bird populations and communities efficiently and can, therefore, be used as an early warning indicator to detect physiological changes within populations or communities even before a population declines. With this fast, reliable, and automated approach, even old collection samples may retrospectively be incorporated into modern ornithological research. Our approach is currently used in practice for research on the relative stress load of forest birds by automatically determining H/L ratios.