1. Introduction
Emergency medicine commonly uses ultrasound (US) imaging for its portability and real-time capabilities to diagnose various injury types. A typical example is the detection of foreign bodies lodged in soft tissues following high-energy penetrating trauma such as gunshots or blast explosions [1,2]. Retained foreign bodies, if undetected and unmonitored, can lead to pain, infection, or vessel and nerve laceration. However, the majority of cases can be handled conservatively [3,4], and, in the context of large-scale warfare, casualties can remain in combat operations. It is not the presence of the shrapnel itself, so much as its threat to nearby neurovascular structures, that determines the severity and urgency of the injury.
Identification of the presence of these foreign bodies, such as fragments of shrapnel, requires advanced training for the end user. Clear image acquisition typically requires knowledge of various injury types and anatomical landmarks, patient positioning, and hours of image analysis experience to properly identify shrapnel within the field of view. Image interpretation can therefore be complicated and localization of the shrapnel with respect to other vital anatomical landmarks is essential to effectively and correctly triaging patients.
Machine learning algorithms to aid in US image interpretation have been developed for a variety of applications, including detection of tumors or nodules [5,6] and classification of lung pathologies seen in COVID-19 [7]. Previous work from our lab developed a classifier model to successfully detect the presence of shrapnel in phantom tissue, as well as in swine tissue [8,9], and compared its performance to conventional models trained on ImageNet datasets [10]. While the conventional classification models and our model (ShrapML) produced high performance metrics, identifying the precise location of shrapnel within individual US images remained difficult.
Object detection models can be deployed to detect instances of visual objects in digital images (either static or video), classify each object, and produce a bounding box around it. These deep learning detection algorithms typically fall into one of two classes: dual-stage (R-CNN, Faster R-CNN [11,12,13]) and single-stage (YOLO [14,15,16,17], SSD [18,19]). While dual-stage methods typically achieve higher detection accuracy, they are slower and more computationally intensive. The end goal for a shrapnel detection model is eventual integration into an ultrasound instrument for real-time detection. For this reason, a single-stage detector (YOLOv3 [16]) was selected and applied in this work.
Here, we developed an object detection model for detecting shrapnel in phantom tissue alongside typical neurovascular landmarks (vein, artery, and nerve fiber). To highlight the use case for object detection models in ultrasound imaging applications, we also developed a triage metric that quantifies shrapnel proximity to neurovascular landmarks, since the distance between shrapnel and neurovascular structures is a critical triage criterion and a key capability for eventual integration into real-time ultrasound hardware.
2. Materials and Methods
2.1. Fabrication of a Tissue Phantom Mold
A previously developed tissue phantom design, based on the dimensions of an adult human male thigh, was modified for these experiments [9]. The mold was composed of three distinct compartments: bone, a muscle layer, and a subcutaneous fat layer. The bone had a diameter of 26 mm, the muscle layer 94 mm, and the fat layer 120 mm. The 3D models were designed using computer-aided design software (Autodesk Inventor, San Rafael, CA, USA) and fabricated using a fused deposition modeling 3D printer (Raise3D, Irvine, CA, USA) with polylactic acid filament (Raise3D, Irvine, CA, USA). The muscle layer mold was two pieces which snapped together and contained a recess for the nylon bone to slot into the center. The mold was fitted with a lid and held closed with vise-grips and sealing tape (McMaster-Carr, Elmhurst, IL, USA). Similarly, the outer fat layer mold was designed with a recess for the bone and components that snap together.
2.2. Construction of Gelatin Tissue Phantom
The inner layer (muscle) was 10% (w/v) gelatin (Fisher Scientific, Fair Lawn, NJ, USA) dissolved in a 2:1 (v/v) solution of water and evaporated milk (Costco, Seattle, WA, USA) with 0.25% (w/v) flour (H-E-B, San Antonio, TX, USA). The water and milk solution was heated to approximately 45 °C to increase gelatin solubility. To mimic the heterogeneous properties seen in ultrasound views of muscle, agarose (Fisher Scientific, Fair Lawn, NJ, USA) components were also incorporated. Two subsets of 2% agarose solutions were produced, one with 0.80% flour and the second with 0.10% flour, to add brighter and darker echogenic components to the muscle layer, respectively.
The tissue phantom was created sequentially: the inner (muscle) layer solidified first around the bone, then was removed from its mold and placed within the outer (fat) layer mold. This allowed the outer layer to solidify around the inner layer and bone. The outer (fat) layer used the same gelatin solution with 0.10% (w/v) flour. Solidification occurred at 4 °C for approximately 1 h per layer. Afterwards, the completed tissue phantom was removed from the mold and used for shrapnel insertion and ultrasound imaging.
2.3. Incorporation of Vascular Vessels and Nerves
A modified phantom design was also used for this study which incorporated vein, artery, and nerve fiber features within the tissue phantom. This was performed after the base phantom was constructed. First, the base phantom was divided into quarters, and, depending on bone positioning, the individual quarters were labeled as a right or left leg. Of the quarters simulating a right leg, one contained shallow vein, artery, and nerve features near the interface between the fat and muscle layers, while the second contained slightly deeper features placed fully within the muscle layer; the same methodology was applied to the quarters simulating the left leg. The artery channels were created using a circular 8 mm biopsy punch (McMaster-Carr, Elmhurst, IL, USA), and the vein channels with an oval-shaped biopsy punch with a minor axis of 6.5 mm. The oval-shaped punch was produced by mechanically compressing an 8 mm circular biopsy punch (McMaster-Carr, Elmhurst, IL, USA). Next, the nerve cavity was created lateral to the artery with the same biopsy punch as the vein and was filled with 15% gelatin made in a 2:1 (v/v) water to evaporated milk solution with 0.5% (w/v) flour. The ultrasound probe was aligned transversely for views of both right leg and left leg orientations. Once the nerves solidified, the phantom was ready for shrapnel insertion and ultrasound imaging.
2.4. Ultrasound Imaging with Shrapnel
All imaging was performed under water with the HF50 probe (Fujifilm, Bothell, WA, USA) on the Sonosite Edge ultrasound system (Fujifilm, Bothell, WA, USA). For both phantom designs, baseline images were obtained as frames from 15 s ultrasound video clips. For shrapnel fragments, we used a 2.5 mm diameter brass rod (McMaster-Carr, Elmhurst, IL, USA) cut into fragments of varying length, ranging from 2 mm to 10 mm. Shrapnel was inserted using surgical forceps at varying depths. For the modified, neurovascular phantom, shrapnel was placed at similar depths, but care was taken to avoid the shrapnel contacting the vein, artery, or nerve. Ultrasound imaging of shrapnel-positive phantoms followed the same process as baseline imaging. Only out-of-plane images were used for the modified phantom so that neurovascular features could be identified; both in-plane and out-of-plane images were used for the base phantoms.
2.5. Image Processing and Bounding Boxes
Frames of all video clips were extracted using an implementation of FFmpeg with a Ruby script, yielding 90 individual frames per video. Duplicate frames were removed, and all images were processed with MATLAB's Image Processing Toolbox (MathWorks, Natick, MA, USA), in which a function was written to crop images to remove ultrasound settings from view and then resize them to 512 × 512 × 3. MATLAB was also used for the manual addition of bounding boxes to all images for all objects. Individual rectangular boxes were drawn enclosing the smallest area around the shrapnel, vein, artery, or nerve (n = 6734 images for the base phantom; n = 10,777 images for the modified neurovascular phantom).
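As an illustration of this preprocessing step, a minimal MATLAB sketch is shown below; the crop rectangle, folder names, and file pattern are hypothetical placeholders, since the exact crop window depends on the ultrasound export layout.

```matlab
% Minimal sketch of the crop-and-resize preprocessing (assumed values).
% cropRect is a hypothetical [x y width height] window that removes the
% on-screen ultrasound settings; adjust to the actual export layout.
srcFiles = dir(fullfile('frames', '*.png'));
cropRect = [100 50 700 700];                 % placeholder crop window
for k = 1:numel(srcFiles)
    img = imread(fullfile('frames', srcFiles(k).name));
    img = imcrop(img, cropRect);             % remove settings from view
    img = imresize(img, [512 512]);          % match the 512 x 512 network input
    if size(img, 3) == 1
        img = repmat(img, [1 1 3]);          % expand grayscale to 3 channels
    end
    imwrite(img, fullfile('processed', srcFiles(k).name));
end
```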
2.6. ShrapOD Architecture
The object detection model, ShrapOD, used a SqueezeNet neural network backbone [20] modified to include YOLOv3 object detection heads [16], as shown in Figure 1. This network architecture was built from MATLAB-provided object detection code [21]. The SqueezeNet feature extraction network was modified to use an image input layer of 512 × 512 × 3, followed by a convolution block containing a convolution layer with ReLU activation and a max pooling layer. This is followed by four Fire blocks before the network splits to integrate the YOLOv3 object detection heads. Fire modules, per the SqueezeNet architecture [20], comprise a single convolutional squeeze layer (1 × 1, ReLU activation) followed by expanding layers consisting of (1 × 1) and (3 × 3) convolutional layers in parallel to increase the depth and width for higher detection accuracy. These parallel layers are concatenated before the next layer in the network architecture to reduce the number of model parameters. Five additional Fire blocks are used on the YOLOv3 class output layer pathway, followed by a convolutional layer with batch normalization and ReLU activation (left pathway, Figure 1).
An additional output layer was used for bounding box predictions, in which the network was fused after the Fire block 9 concatenation step with Fire block 8, with an additional convolutional block for feature resizing. The model contained a final concatenation layer and convolution block to align the predicted bounding box coordinates to the output image (right pathway, Figure 1). The YOLOv3 network also used optimized anchor boxes to help the network predict boxes more accurately [21].
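As a rough sketch of how such a detector can be assembled in MATLAB, the snippet below attaches YOLOv3 detection heads to a SqueezeNet backbone with the yolov3ObjectDetector object (Computer Vision Toolbox); the anchor boxes and detection source layers are illustrative assumptions, not the exact ShrapOD configuration.

```matlab
% Hedged sketch: YOLOv3 heads on a SqueezeNet backbone, loosely following
% MATLAB's object detection example [21]. Anchors and source layers are
% illustrative, not the exact ShrapOD values.
lg = layerGraph(squeezenet);                               % pretrained backbone
lg = removeLayers(lg, {'prob', 'ClassificationLayer_predictions'});
lg = replaceLayer(lg, 'data', ...
    imageInputLayer([512 512 3], 'Name', 'data', 'Normalization', 'none'));
baseNet = dlnetwork(lg);                                   % strip classifier head

classes = ["shrapnel" "vein" "artery" "nerve"];
anchors = {[90 80; 60 50]; [30 25; 15 12]};                % placeholder anchor boxes
% Attach two detection heads at late Fire-module concatenations.
detector = yolov3ObjectDetector(baseNet, classes, anchors, ...
    'DetectionNetworkSource', {'fire9-concat', 'fire8-concat'});
```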
2.7. ShrapOD Training Overview
Model training was performed using MATLAB R2022b with the Deep Learning and Machine Learning Toolboxes for the base phantom and then repeated for the modified, neurovascular phantom. For the base phantom use case, only images containing shrapnel were used. For the neurovascular phantom, images were taken from datasets with and without shrapnel. Images were cropped to remove ultrasound file information, sized to 512 × 512 × 3, and then the datasets were split into 75% training, 10% validation, and 15% testing quantities. Augmentation of the training datasets included random X/Y axis reflection, ±20% scaling, and ±360° rotation (Figure 2). These augmentation steps were written into a function that also applied them to the bounding box data. Validation and testing set images were not augmented. Training was performed using a stochastic gradient descent with momentum (SGDM) solver, 23 anchors, 125 epochs, L2 regularization of 0.0005, a penalty threshold of less than 0.5 Intersection over Union (IoU), a validation frequency of 79 iterations, and an image batch size of 16 images. The learning rate started at 0.001 and, after a warmup period of 1000 iterations, followed a scheduled slowdown. Training parameters were adapted from MATLAB object detection example code [21]. All training was performed using the CPU on an HP workstation (Hewlett-Packard, Palo Alto, CA, USA) running Windows 10 Pro (Microsoft, Redmond, WA, USA) with an Intel Xeon W-2123 (3.6 GHz, 4 core; Santa Clara, CA, USA) processor and 64 GB RAM.
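To illustrate this warmup-then-slowdown schedule, the sketch below follows the piecewise pattern used in the MATLAB example code [21]; the breakpoints and decay factors shown are assumptions, not necessarily the exact ShrapOD values.

```matlab
% Hedged sketch of a warmup-plus-piecewise-decay learning rate schedule,
% patterned on MATLAB's YOLOv3 example [21]. Breakpoints and decay factors
% are assumptions.
function lr = scheduleLearningRate(iteration, totalIterations)
    baseLR = 0.001;                                % initial learning rate
    warmup = 1000;                                 % warmup period (iterations)
    if iteration <= warmup
        lr = baseLR * (iteration / warmup)^4;      % ramp up during warmup
    elseif iteration < 0.6 * totalIterations
        lr = baseLR;                               % hold at base rate
    elseif iteration < 0.9 * totalIterations
        lr = baseLR * 0.1;                         % assumed 10x slowdown
    else
        lr = baseLR * 0.01;                        % assumed final slowdown
    end
end
```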
2.8. Evaluating ShrapOD Performance
After ShrapOD model training, blind test (15%) images were used to measure model performance. For the ShrapOD model trained on the original phantom image sets (shrapnel-only object class), 1010 images were used for testing, while 1617 images were used for the multi-object model trained on the neurovascular phantom image sets. Predictions were compared to ground truth images to generate precision-recall curves using the evaluateDetectionPrecision function in MATLAB. The area under the precision-recall curve was computed to determine average precision (AP) [22,23]. For Intersection over Union (IoU), the bboxOverlapRatio function (MATLAB) was used for all test images. While calculating IoU scores, true positive (TP) counts were identified as having a prediction and ground truth with an IoU score greater than or equal to 0.50. False positive (FP) and false negative (FN) counts were based on this same 0.50 IoU criterion: an FN when no prediction exceeded the threshold for a present ground truth, and an FP when a prediction had no matching ground truth. Additionally, false positives were counted when multiple predictions were detected for a single ground truth. Precision, recall, and F1 scores were then calculated with this IoU gating of 0.50. Mean IoU (mIoU) scores were calculated for each object class, and, for the multi-object model, mean AP (mAP) and an average mIoU across all object classes were determined.
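A condensed sketch of this evaluation flow is shown below; detResults, testData, predBoxes, and gtBoxes are assumed variables holding the test-set predictions and ground truth, and the FP counting for duplicate detections is simplified relative to the full bookkeeping described above.

```matlab
% Hedged sketch of the evaluation flow: AP from evaluateDetectionPrecision,
% then TP/FP/FN counts gated at IoU >= 0.50 with bboxOverlapRatio.
% detResults, testData, predBoxes, and gtBoxes are assumed to exist.
ap = evaluateDetectionPrecision(detResults, testData);   % area under P-R curve

tp = 0; fp = 0; fn = 0;
for k = 1:numel(gtBoxes)                                 % one cell per test image
    iou = bboxOverlapRatio(predBoxes{k}, gtBoxes{k});    % predictions x truths IoU
    matchedPred  = max(iou, [], 2) >= 0.50;              % predictions hitting a truth
    matchedTruth = max(iou, [], 1) >= 0.50;              % truths hit by a prediction
    tp = tp + nnz(matchedTruth);                         % each matched truth is a TP
    fp = fp + nnz(matchedPred) - nnz(matchedTruth) ...   % duplicates on one truth
            + nnz(~matchedPred);                         % plus unmatched predictions
    fn = fn + nnz(~matchedTruth);                        % truths with no prediction
end
precision = tp / (tp + fp);
recall    = tp / (tp + fn);
f1 = 2 * precision * recall / (precision + recall);
```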
2.9. Triage Metric Measurement
As a medical use case for object detection predictions, a triage metric score was developed that tracks the smallest distance between the shrapnel prediction and the vein, artery, or nerve predictions (Figure 3). After test predictions were acquired, images were filtered to select only those in which shrapnel, vein, artery, and nerve predictions were all present. For images with multiple predictions in a category, only the highest-confidence prediction window was used. Midpoints for each prediction window were calculated by adding half the width and half the height to the $x_1$ and $y_1$ coordinates, respectively (Figure 3A). Next, distances between the shrapnel and the vein, artery, and nerve were calculated using Equation (1):

$$d = \sqrt{(x_s - x_f)^2 + (y_s - y_f)^2} \qquad (1)$$

In Equation (1), $x_s$ and $y_s$ refer to the midpoint coordinates of the shrapnel prediction window, and $x_f$ and $y_f$ refer to the midpoint coordinates of the vein, artery, or nerve prediction window (Figure 3B). Three distances were calculated, in pixel units, and the minimum was selected as the triage metric measurement for each ultrasound image. Results were compiled across the filtered test images for further analysis.
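A minimal MATLAB sketch of this computation, assuming [x y width height] prediction boxes as returned by the detector (variable names are illustrative):

```matlab
% Hedged sketch of the triage metric: box midpoints per Figure 3A, then the
% minimum shrapnel-to-feature distance per Equation (1). Variable names are
% illustrative; each box is [x1 y1 width height].
mid = @(box) [box(1) + box(3)/2, box(2) + box(4)/2];   % prediction window midpoint

s = mid(shrapnelBox);                    % highest-confidence shrapnel prediction
features = {veinBox, arteryBox, nerveBox};
d = zeros(1, numel(features));
for k = 1:numel(features)
    f = mid(features{k});
    d(k) = sqrt((s(1) - f(1))^2 + (s(2) - f(2))^2);    % Equation (1), in pixels
end
triageDistance = min(d);                 % minimum distance is the triage metric
```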
4. Discussion
Ultrasound imaging is frequently used in emergency and battlefield medicine thanks to its high portability and real-time nature. Interpreting images and using that information to prioritize care in resource-limited situations remains a challenge that AI can potentially mitigate. Simple image classification algorithms have been developed to provide binary decisions from an ultrasound image, but these do not provide enough granularity in use cases where the proximity of a tumor or shrapnel to other anatomical structures may be critical for deciding the next steps of medical care. Here, we have highlighted how an object detection framework for shrapnel and multi-class tracking in ultrasound imaging can assist in emergency or military medicine triage, potentially avoiding unnecessary evacuation of the majority of shrapnel casualties from the battlefield.
First, we showed that the YOLO object detection framework could be successfully trained to track shrapnel alone in a base phantom design that we previously used for developing image classification algorithms [8,10]. The tissue phantom allows for collection of shrapnel images at different locations, sizes, and orientations across multiple phantoms created specifically for this purpose, introducing subject variability into the dataset [9]. This is an ideal starting point for deep learning model development and is compatible with real-time deployment of the object detection model. Overall, the ShrapOD model successfully detected the approximate shrapnel location with a true positive rate of 87% and a mIoU of 0.65. With larger, more diverse image sets, these scores can likely be improved. The performance error was split evenly between false positives (6%) and false negatives (7%), suggesting the model missed shrapnel in the phantom about as frequently as it misidentified complexities in the tissue phantom as shrapnel. However, further training and improvement of the object detection model was not pursued in this work, as this was an initial proof-of-concept use case.
Instead, we added physiological features to the tissue phantom by introducing a vein, artery, and nerve fiber bundle to mimic key neurovascular complexities. The original phantom mimics fat, muscle, and bone tissue in a thigh, so these neurovascular components are logical additions to more closely mimic the anatomical composition of human tissue. Furthermore, this anatomical complexity provided a challenging, multi-class scenario for the object detection model. Training on these image sets resulted in stronger model performance for all four classes, with a mAP of 0.94. mIoU for the vein, artery, and nerve surpassed 0.7, while shrapnel was similar to the single-class model at 0.68. Shrapnel is intentionally presented more irregularly in the phantom than the other features, due to different imaging angles, which likely contributes to its lower mIoU score. As in real tissue, the nerve fiber bundle is closer in echogenic properties to the tissue bulk than the artery and vein are and, as anticipated, had a lower mIoU score (0.71) compared to the vein (0.75) and artery (0.78). While ShrapOD successfully tracked multiple classes, it will likely require additional transfer learning to predict successfully in animal or human images. Supplementing phantom image sets with animal images was previously successful for training shrapnel classification models. However, the neurovascular features in the phantom are more spaced out than often found in real tissue, due to limitations in the phantom design, which may impact transfer learning for this model.
For medical applications, especially in emergency situations, simply identifying where shrapnel sits in tissue still leaves a subjective decision for the medical provider: remove the shrapnel or not. As this decision often relies on proximity to key vital structures, mostly neurovascular structures, the triage metric introduced in this work provides a practical use case for object detection. The metric was defined as the minimal distance from the center of the shrapnel prediction to the centers of the neurovascular feature predictions. This operation is simple enough to pair with real-time object detection deployment, allowing proximities to key features to be tracked automatically. Further, while not evaluated in the current study, calibrating this distance to a reference length would allow real-life units of measurement instead of arbitrary pixel values. The metric could be further automated by setting a distance threshold that alerts the end user when crossed. However, this would require additional clinical knowledge to determine a "critical threshold" of shrapnel distance from key structures that indicates increased risk and may warrant surgical intervention.
One limitation of this triage use case is that it relies solely on accurate tracking of up to four objects by the model; an incorrect prediction for any object will produce an incorrect triage distance, so fully automating the metric will require a more robust training image set and rules for identifying poor predictions by the ShrapOD model. Another limitation is that the triage metric is based on 2D ultrasound images, so without 3D ultrasound information, objects may lie out of plane from key neurovascular features not currently being tracked. In previous studies, 3D ultrasound volumes have been used with deep learning algorithms to track objects more accurately [24,25]. However, this metric is not meant to replace additional medical evaluation, only to highlight at-risk subjects, so 2D tracking is likely sufficient. Lastly, all predictions are currently based on midpoints, which for larger vein diameters or more irregularly shaped shrapnel may misrepresent feature proximity. This could be improved by taking measurements from each corner of the bounding boxes, or image segmentation architectures may prove more applicable for improving this metric [26].
Next steps for this work will take four paths. First, customization on the hardware side to allow object detection and triage measurement to be performed in real time. Second, the ShrapOD model must undergo transfer learning for use with ex vivo or live animal tissue for further evaluation, including pulsatile flow in the artery, as it may impact object detection. Additional tissue and object tracking, such as proximity to joints, will also be critical as this work is translated into swine and optimized for use in humans. Third, ultrasound image interpretation is only one half of the challenge when using ultrasound in military medicine; the other half is proper image acquisition by non-expert users. This work will investigate further AI and robotic platforms to automate the ultrasound process and make the technology more suitable for frontline application where resources and personnel are limited. Lastly, object detection algorithms can be valuable for other ultrasound applications, so work will be conducted to develop AI models for other emergency medicine applications, such as eFAST (extended focused assessment with sonography in trauma) exams [27,28] for identifying pneumothorax [29], hemothorax, and abdominal hemorrhage.