1. Introduction
Degenerative pathologies of the spine, such as osteoporotic fractures of vertebral bodies, spinal canal stenosis, spondylolisthesis, and other degenerative deformations of vertebrae, are a common healthcare risk in the elderly population [1]. Early detection of degenerative diseases may enable preventative measures that reduce the risk of chronic severe back pain and disability [2]. In recent years, deep learning has emerged as a powerful tool for many medical applications, and advances in deep-learning-based computer-aided diagnosis have paved the way for more timely interventions and improved patient outcomes. In this study, we specifically focus on the intersection of deep learning and the diagnosis of vertebral body fractures.
So far, most deep learning-based approaches for the classification of vertebral body fractures are trained on intensity images in an end-to-end manner. Even after the VerSe dataset and benchmark for segmentation and localization of vertebrae was introduced in 2019 and 2020 [3,4,5], most published methods rely only on the localization of vertebrae instead of leveraging the available segmentation masks [6,7,8]. The amount of available ground truth segmentation masks of vertebrae was further increased by the TotalSegmentator dataset [9,10]. To date, only a few recently proposed methods [11,12] employ the information provided by spinal segmentation masks in diagnostic downstream tasks.
We identified three relevant challenges that are not sufficiently addressed by existing methods and have so far prevented a wider clinical adoption of vertebral fracture detection. (a) Image-based classifiers are prone to deterioration under domain shifts; i.e., they adapt poorly to variations in image intensity, scanner acquisition settings, and other shifts in the data distribution. Furthermore, 3D convolutional neural networks (CNNs) trained in a fully supervised setting tend to overfit and require a large amount of labeled data of sufficient quality. (b) Surface-based geometric learning models [13,14] have so far been less expressive than 3D CNNs and do not yet achieve the required accuracy with limited labeled data. (c) Shape encoder–decoder architectures [11,12] may help to overcome label scarcity by shifting a larger proportion of the training to an unsupervised setting, for which even healthy spines can be employed, but they may still fail to learn representative latent space representations due to an over-pronounced weighting of the decoder (which is later discarded for the classification task).
We suggest making the following improvements: (a) Instead of training image-based classifiers, we propose to leverage information from a preceding segmentation model and directly operate on shape information of vertebrae (surface contour) for the classification task. This allows trained deep learning models to be independent of shifts in image intensities and moves the demand for labeled data away from the classification task. (b) Leveraging the vast amount of available segmentation masks, unsupervised learning can help geometric learning models overcome problems related to limited labeled data. (c) To ensure a more representative latent space representation, we introduce a novel decoder architecture that ensures that most learned parameters are located in the encoder model and thus relevant for the classification task.
Contributions: (1) We believe that the effectiveness of recently proposed auto-encoder (AE) models is limited due to the sub-optimal design of encoder–decoder components. Therefore, we perform an in-depth analysis of the effectiveness of various AE architectures for the diagnosis of vertebral body fractures. (2) For this purpose, we develop a modular encoder–decoder framework that allows arbitrary combinations of point- and image-based encoder and decoder architectures. (3) To address the problem of over-pronounced weights in the decoder, we designed a novel point-based shape decoder model that employs a differentiable sampling step to bridge the gap between point- and voxel-based solutions. (4) By performing extensive experiments to analyze various combinations of encoder and decoder architectures, we verified that the detection of vertebral fractures, which by definition is a geometric problem, is better solved using shape-based methods compared to image-intensity-based ones. (5) The results of our experiments furthermore demonstrate the particular advantages of employing our novel point-based shape decoder compared to other models.
Outline: In the subsequent Section 2, we provide a comprehensive review of the related literature, emphasizing the existing disparity between image- and point-based analyses. Following this, our shape-based classification framework (Section 3.1), incorporating the proposed point-based shape decoder (Section 3.2), is detailed in the methods Section 3. The experiments and results Section 4 elaborates on our setup, encompassing data particulars and implementation details (Section 4.1). Afterwards, the results of our experiments are presented in Sections 4.2 and 4.3, followed by a discussion of the outcomes in Section 5 and a concise summary of our findings in Section 6.
3. Methods
To investigate the potential of shape features for detecting vertebral fractures, we explore several encoder–decoder architectures and use their latent space vectors for classification. In particular, we compare combinations of encoders and decoders trained to reconstruct shape features of vertebrae based on the surface of their respective segmentation masks. The training of our AE architecture comparison pipeline is split into two stages. In the first stage, we train our AE models in an unsupervised manner on a large-scale dataset, without classification labels and without a particularly high prevalence of spinal disease, to generate meaningful shape features in the latent space. In the second stage, we employ the shape features generated by the encoder (freezing the encoder layers) and train an MLP for the detection of fractures based on these shape features on a smaller labeled dataset.
The problem setup can be defined as follows: Based on a multi-label segmentation mask of the vertebrae computed on a CT scan, extract patches $V \in \mathbb{R}^{D \times H \times W}$ of the segmentation mask contour, which is computed using an edge filter. For geometric learning methods, we extract a point cloud representation $P \in \mathbb{R}^{N \times 3}$ of $N$ 3D points by applying a threshold on the voxel grid. Firstly, train an AE model comprising a shape encoder $f$ and a shape decoder $g$ on an unsupervised reconstruction task by minimizing the smooth L1 loss or the Chamfer distance between the input ($V$ or $P$) and the prediction ($\hat{V}$ or $\hat{P}$). Secondly, using the latent space vector $z$ computed by the shape encoder $f$ as input, train an MLP classifier $h$ to predict the fracture status (healthy or fractured) by minimizing the cross-entropy loss between the target class $y$ and the predicted class $\hat{y}$.
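As a concrete illustration of the two reconstruction objectives, the following PyTorch sketch shows how a voxel-wise smooth L1 loss and a symmetric Chamfer distance could be computed; the function names and the use of torch.cdist are our own illustrative choices rather than the reference implementation.

```python
import torch
import torch.nn.functional as F

def voxel_reconstruction_loss(v_pred, v_true):
    # Smooth L1 loss between predicted and input contour patches of shape (B, 1, D, H, W).
    return F.smooth_l1_loss(v_pred, v_true)

def chamfer_distance(p_pred, p_true):
    # Symmetric Chamfer distance between two point clouds of shape (B, N, 3).
    # torch.cdist yields all pairwise Euclidean distances per batch element.
    d = torch.cdist(p_pred, p_true)                 # (B, N_pred, N_true)
    pred_to_true = d.min(dim=2).values.mean()       # nearest input point for each prediction
    true_to_pred = d.min(dim=1).values.mean()       # nearest prediction for each input point
    return pred_to_true + true_to_pred
```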
As encoders, we employ three approaches: a simple convolutional encoder $f_{\mathrm{conv}}$, a point encoder $f_{\mathrm{point}}$ based on the PointNet architecture [13], and a graph encoder $f_{\mathrm{graph}}$ based on the DGCNN [17]. As decoders, we also employ three approaches: a convolutional decoder $g_{\mathrm{conv}}$, our point decoder $g_{\mathrm{point}}$, and a novel point-based shape decoder $g_{\mathrm{pbShape}}$. The resulting framework thus comprises nine combinations of encoder–decoder architectures. The general idea of the employed image- or point-based vertebrae auto-encoder (AE) framework is depicted in Figure 1 (Section A). Our newly proposed decoder architecture is visualized in Figure 2 (Section B).
3.1. Architectural Building Blocks
■Convolutional encoder: Given a 3D surface patch $V \in \mathbb{R}^{D \times H \times W}$ of depth $D$, height $H$, and width $W$, extracted from an automatic multi-label 3D CT segmentation [9,10] that is separated into individual binary vertebra contours, we aim to encode the vertebral shape into a low-dimensional embedding $z \in \mathbb{R}^{d}$ with $d \ll D \cdot H \cdot W$ using a fully convolutional encoder network $f_{\mathrm{conv}}$ parameterized by trainable weights $\theta_{f}$.
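A minimal sketch of such a fully convolutional 3D encoder is given below; the layer count, channel widths, and latent dimension ($d=64$) are illustrative assumptions and not necessarily the configuration used in our experiments.

```python
import torch
import torch.nn as nn

class ConvEncoder3D(nn.Module):
    """Maps a binary contour patch (B, 1, D, H, W) to a latent vector z of shape (B, d)."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, stride=2, padding=1), nn.InstanceNorm3d(16), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1), nn.InstanceNorm3d(32), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1), nn.InstanceNorm3d(64), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool3d(1)        # global pooling keeps the encoder fully convolutional
        self.project = nn.Conv3d(64, latent_dim, kernel_size=1)

    def forward(self, v):
        h = self.features(v)
        z = self.project(self.pool(h))             # (B, latent_dim, 1, 1, 1)
        return z.flatten(1)                        # (B, latent_dim)
```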
■Point encoder: Based on the 3D surface patches $V$, we extract a 3D point cloud $P \in \mathbb{R}^{N \times 3}$, where $N$ corresponds to the number of key points in the point cloud, which are sampled from the voxel representation by applying a threshold to the values in the voxel grid. Using a PointNet [13] model $f_{\mathrm{point}}$ with weights $\theta_{f}$, we generate a low-dimensional embedding $z \in \mathbb{R}^{d}$ with $d \ll 3N$.
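The following simplified PointNet-style encoder illustrates the core idea of a shared per-point MLP followed by a symmetric max-pooling operation; the input and feature transform networks of the full PointNet [13] are omitted, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Simplified PointNet-style encoder: shared per-point MLP followed by max pooling."""
    def __init__(self, latent_dim=64):
        super().__init__()
        # 1D convolutions with kernel size 1 act as an MLP shared across all N points.
        self.shared_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, latent_dim, 1),
        )

    def forward(self, p):
        # p: (B, N, 3) -> (B, 3, N) for Conv1d, then a symmetric max over all points.
        h = self.shared_mlp(p.transpose(1, 2))     # (B, latent_dim, N)
        return h.max(dim=2).values                 # (B, latent_dim), invariant to point order
```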
■Graph encoder: As for the point encoder, the graph encoder utilizes an extracted 3D surface point cloud $P$ for each individual vertebra. We employ a DGCNN [17] $f_{\mathrm{graph}}$ with parameter $k$ for the k-nearest-neighbor (kNN) graphs and weights $\theta_{f}$ to compute an embedding $z \in \mathbb{R}^{d}$ with $d \ll 3N$.
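A compact sketch of a single EdgeConv layer on a dynamically built kNN graph, in the spirit of the DGCNN, is shown below; a full DGCNN stacks several such layers and recomputes the graph in feature space, which is omitted here for brevity.

```python
import torch
import torch.nn as nn

class EdgeConvEncoder(nn.Module):
    """Single-EdgeConv sketch of a DGCNN-style encoder operating on a kNN graph."""
    def __init__(self, k=16, latent_dim=64):
        super().__init__()
        self.k = k
        # MLP on edge features [x_i, x_j - x_i], shared over all edges.
        self.edge_mlp = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, latent_dim))

    def forward(self, p):                                    # p: (B, N, 3)
        d = torch.cdist(p, p)                                # pairwise distances (B, N, N)
        idx = d.topk(self.k, largest=False).indices          # k nearest neighbours (B, N, k)
        neighbours = torch.gather(
            p.unsqueeze(1).expand(-1, p.size(1), -1, -1),    # (B, N, N, 3)
            2, idx.unsqueeze(-1).expand(-1, -1, -1, 3))      # (B, N, k, 3)
        center = p.unsqueeze(2).expand_as(neighbours)
        edge_feat = torch.cat([center, neighbours - center], dim=-1)   # (B, N, k, 6)
        h = self.edge_mlp(edge_feat).max(dim=2).values       # max over neighbours -> (B, N, latent_dim)
        return h.max(dim=1).values                           # global max pooling -> (B, latent_dim)
```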
■Convolutional decoder: After generating the embedding $z$, a convolutional decoder $g_{\mathrm{conv}}$ with weights $\theta_{g}$ is used to map $z$ back into $\hat{V} \in \mathbb{R}^{D \times H \times W}$. During training, $\hat{V}$ is used to minimize a smooth L1 loss function to reconstruct $V$. The utilized convolutional decoder model is based on PixelShuffle layers.
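Since PyTorch's built-in PixelShuffle operates on 2D feature maps only, a 3D variant has to rearrange channels into the three spatial dimensions explicitly. The following voxel-shuffle sketch illustrates this upsampling principle; the surrounding layer layout and sizes are our own assumptions.

```python
import torch
import torch.nn as nn

def voxel_shuffle(x, r):
    """3D analogue of PixelShuffle: (B, C*r^3, D, H, W) -> (B, C, D*r, H*r, W*r)."""
    b, c, d, h, w = x.shape
    c_out = c // (r ** 3)
    x = x.view(b, c_out, r, r, r, d, h, w)
    x = x.permute(0, 1, 5, 2, 6, 3, 7, 4)          # interleave the upsampling factors with D, H, W
    return x.reshape(b, c_out, d * r, h * r, w * r)

class ConvDecoder3D(nn.Module):
    """Maps z (B, d) back to a volumetric patch via two 2x voxel-shuffle upsampling stages."""
    def __init__(self, latent_dim=64, base_size=8):
        super().__init__()
        self.base_size = base_size
        self.expand = nn.Linear(latent_dim, 16 * base_size ** 3)
        self.up1 = nn.Conv3d(16, 16 * 8, kernel_size=3, padding=1)   # shuffled to 16 ch at 2x size
        self.up2 = nn.Conv3d(16, 8 * 8, kernel_size=3, padding=1)    # shuffled to 8 ch at 2x size
        self.out = nn.Conv3d(8, 1, kernel_size=3, padding=1)

    def forward(self, z):
        x = self.expand(z).view(z.size(0), 16, self.base_size, self.base_size, self.base_size)
        x = torch.relu(voxel_shuffle(self.up1(x), 2))
        x = torch.relu(voxel_shuffle(self.up2(x), 2))
        return self.out(x)                          # (B, 1, 4*base_size, 4*base_size, 4*base_size)
```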
■Point decoder: The aim of the point decoder $g_{\mathrm{point}}$ is to map $z$ back to a 3D point cloud representation $\hat{P} \in \mathbb{R}^{N \times 3}$, where $N$ corresponds to the number of points in $P$. Similar to the PixelShuffle layers, this is achieved by subsequently transferring features from the network channels into the spatial (point) dimension $N$. In the last step, the channel dimension is set to 3 to predict $\hat{P}$. During training, we minimize a loss function based on the Chamfer distance between $P$ and $\hat{P}$.
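A possible realization of this channel-to-point transfer is sketched below; the number of seed positions, the channel widths, and the default point count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PointDecoder(nn.Module):
    """Maps z (B, d) to a point cloud (B, N, 3) by moving channel groups into the point dimension."""
    def __init__(self, latent_dim=64, num_points=2048):
        super().__init__()
        groups = num_points // 128                            # start with 128 "seed" positions
        self.seed = nn.Linear(latent_dim, 128 * 32)           # reshaped to (B, 32, 128)
        self.expand = nn.Conv1d(32, 3 * groups, kernel_size=1)

    def forward(self, z):
        x = torch.relu(self.seed(z).view(z.size(0), 32, 128))  # (B, C=32, 128 seeds)
        x = self.expand(x)                                      # (B, 3*groups, 128)
        # Move channel groups into the point dimension: (B, 3, groups*128) -> (B, N, 3).
        return x.view(z.size(0), 3, -1).transpose(1, 2)
```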
■Point-based shape decoder (pbShape decoder): Our point-based shape decoder $g_{\mathrm{pbShape}}$ can be described as a combination of the point decoder and the convolutional decoder. In a first step, $z$ is mapped to an embedding in the shape of $k$ key points using a 3-layer MLP. In a second step, a differentiable extrapolation is performed to map these key points back into a volumetric representation $\hat{V}$. This allows us to then minimize a smooth L1 loss function to reconstruct $V$ from $\hat{V}$. In contrast to the convolutional decoder and the point decoder, our point-based shape decoder relies on a considerably smaller number of trainable parameters. A more detailed description of our proposed method is provided in the next section.
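One way to realize the differentiable extrapolation step, sketched here under our own simplifying assumptions (and related in spirit to the differentiable volumetric rasterisation of [21]), is to splat each predicted key point trilinearly onto its eight neighbouring voxels:

```python
import torch
import torch.nn as nn

def splat_points_to_volume(points, grid_size):
    """Differentiable extrapolation: scatter points in [0, 1]^3 onto a (D, H, W) grid by
    distributing each point's mass over its eight neighbouring voxels (trilinear splatting)."""
    b, n, _ = points.shape
    d = grid_size
    vol = points.new_zeros(b, d, d, d)
    coords = points.clamp(0, 1) * (d - 1)
    lower = coords.floor()
    frac = coords - lower
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                corner = (lower + points.new_tensor([dx, dy, dz])).clamp(0, d - 1).long()
                w = ((1 - frac[..., 0]) if dx == 0 else frac[..., 0]) * \
                    ((1 - frac[..., 1]) if dy == 0 else frac[..., 1]) * \
                    ((1 - frac[..., 2]) if dz == 0 else frac[..., 2])
                flat = (corner[..., 0] * d + corner[..., 1]) * d + corner[..., 2]   # (B, N)
                vol.view(b, -1).scatter_add_(1, flat, w)      # gradients flow through the weights w
    return vol.unsqueeze(1)                                   # (B, 1, D, H, W)

class PbShapeDecoder(nn.Module):
    """Sketch: a 3-layer MLP predicts k key points, which are splatted back onto a voxel grid."""
    def __init__(self, latent_dim=64, num_keypoints=512, grid_size=32):
        super().__init__()
        self.grid_size = grid_size
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, num_keypoints * 3),
        )

    def forward(self, z):
        keypoints = torch.sigmoid(self.mlp(z)).view(z.size(0), -1, 3)   # normalized coordinates
        return splat_points_to_volume(keypoints, self.grid_size)
```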
MLP: After training of the encoder–decoder models has been performed in an unsupervised manner on a large dataset, a small MLP $h$ is trained on a smaller dataset with fracture labels to predict a binary fracture status $\hat{y}$ from the embedding $z$. For this step, the parameters of the previously trained encoder are fixed, and only the weights of the MLP are optimized using a cross-entropy loss function $\mathcal{L}_{\mathrm{CE}}$.
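The second training stage could then look as follows; the encoder class, hidden sizes, and optimizer settings are placeholders reused from the sketches above.

```python
import torch
import torch.nn as nn

# Stage 2 sketch: the pretrained encoder is frozen and only the MLP head is trained.
encoder = ConvEncoder3D(latent_dim=64)           # any of the pretrained shape encoders
encoder.requires_grad_(False).eval()

classifier = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 2),                            # two classes: healthy vs. fractured
)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def training_step(patch, label):
    with torch.no_grad():                        # encoder weights stay fixed
        z = encoder(patch)
    logits = classifier(z)
    loss = criterion(logits, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```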
Data augmentation: To improve the generalizability of our models, we introduce affine augmentation on our input data. For encoder models that take a volumetric patch as input, the augmentation is performed before the forward pass during training on each newly drawn batch. For our geometric models, we also apply the affine augmentation on the volumetric patch after drawing the batch and sample a random set of key points from the augmented volumetric patch using a threshold.
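A sketch of this augmentation and re-sampling step is given below; the noise magnitude, threshold, and number of sampled key points are illustrative values.

```python
import torch
import torch.nn.functional as F

def augment_patch_and_sample_points(patch, num_points=2048, noise=0.1):
    """Apply a random affine transform to a volumetric patch (B, 1, D, H, W) and
    re-sample a point cloud from the augmented contour by thresholding."""
    b = patch.size(0)
    # Random affine matrix: identity plus small noise on the rotation/scaling/shearing part.
    theta = torch.eye(3, 4).unsqueeze(0).repeat(b, 1, 1)
    theta[:, :, :3] += noise * torch.randn(b, 3, 3)
    grid = F.affine_grid(theta, patch.shape, align_corners=False)
    augmented = F.grid_sample(patch, grid, align_corners=False)

    # Threshold the augmented contour and draw a fixed-size random subset of voxel coordinates
    # (assumes at least one surface voxel survives the augmentation).
    point_clouds = []
    for vol in augmented[:, 0]:                          # (D, H, W) per batch element
        coords = torch.nonzero(vol > 0.5).float()        # surface voxel coordinates (M, 3)
        choice = torch.randint(0, coords.size(0), (num_points,))
        point_clouds.append(coords[choice])
    return augmented, torch.stack(point_clouds)          # (B, 1, D, H, W), (B, num_points, 3)
```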
We provide details about the capacity and computational complexity of our AE models in
Table 1. Our proposed pbShape decoder requires 600k fewer computations and has only 62k trainable parameters, compared to 155k for its convolutional counterpart. During AE training, the encoders and decoders are trained jointly, whereas during training of the MLP classifier only the MLP parameters are adapted. At classification test time, the encoder and MLP parameters are used without further adaptation.
Author Contributions
Conceptualization, H.H., A.B. and M.P.H.; data curation, H.H. and M.P.H.; formal analysis, H.H., A.B. and M.P.H.; funding acquisition, M.P.H.; investigation, H.H., A.B. and M.P.H.; methodology, H.H., A.B. and M.P.H.; project administration, M.P.H.; resources, H.H., A.B. and M.P.H.; software, H.H., A.B. and M.P.H.; supervision, M.P.H.; validation, H.H., A.B. and M.P.H.; visualization, H.H.; writing—original draft, H.H.; writing—review & editing, H.H., A.B. and M.P.H. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the German Federal Ministry of Education and Research grant number 01EC1908D.
Informed Consent Statement
Not applicable.
Data Availability Statement
Conflicts of Interest
Author Alexander Bigalke was employed by the company Dräger, Drägerwerk AG & Co. KGaA. The remaining authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
References
1. Ballane, G.; Cauley, J.; Luckey, M.; El-Hajj Fuleihan, G. Worldwide prevalence and incidence of osteoporotic vertebral fractures. Osteoporos. Int. 2017, 28, 1531–1542.
2. Papaioannou, A.; Watts, N.B.; Kendler, D.L.; Yuen, C.K.; Adachi, J.D.; Ferko, N. Diagnosis and management of vertebral fractures in elderly adults. Am. J. Med. 2002, 113, 220–228.
3. Liebl, H.; Schinz, D.; Sekuboyina, A.; Malagutti, L.; Löffler, M.T.; Bayat, A.; El Husseini, M.; Tetteh, G.; Grau, K.; Niederreiter, E.; et al. A computed tomography vertebral segmentation dataset with anatomical variations and multi-vendor scanner data. Sci. Data 2021, 8, 284.
4. Löffler, M.T.; Sekuboyina, A.; Jacob, A.; Grau, A.L.; Scharr, A.; El Husseini, M.; Kallweit, M.; Zimmer, C.; Baum, T.; Kirschke, J.S. A vertebral segmentation dataset with fracture grading. Radiol. Artif. Intell. 2020, 2, e190138.
5. Sekuboyina, A.; Husseini, M.E.; Bayat, A.; Löffler, M.; Liebl, H.; Li, H.; Tetteh, G.; Kukačka, J.; Payer, C.; Štern, D.; et al. VerSe: A vertebrae labelling and segmentation benchmark for multi-detector CT images. Med. Image Anal. 2021, 73, 102166.
6. Nicolaes, J.; Raeymaeckers, S.; Robben, D.; Wilms, G.; Vandermeulen, D.; Libanati, C.; Debois, M. Detection of vertebral fractures in CT using 3D convolutional neural networks. In Proceedings of the Computational Methods and Clinical Applications for Spine Imaging: 6th International Workshop and Challenge, CSI 2019, Shenzhen, China, 17 October 2019; Springer: Cham, Switzerland, 2020; pp. 3–14.
7. Yilmaz, E.B.; Buerger, C.; Fricke, T.; Sagar, M.M.R.; Peña, J.; Lorenz, C.; Glüer, C.C.; Meyer, C. Automated deep learning-based detection of osteoporotic fractures in CT images. In Proceedings of the Machine Learning in Medical Imaging: 12th International Workshop, MLMI 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, 27 September 2021; Springer: Cham, Switzerland, 2021; pp. 376–385.
8. Zakharov, A.; Pisov, M.; Bukharaev, A.; Petraikin, A.; Morozov, S.; Gombolevskiy, V.; Belyaev, M. Interpretable vertebral fracture quantification via anchor-free landmarks localization. Med. Image Anal. 2023, 83, 102646.
9. Wasserthal, J.; Breit, H.C.; Meyer, M.T.; Pradella, M.; Hinck, D.; Sauter, A.W.; Heye, T.; Boll, D.T.; Cyriac, J.; Yang, S.; et al. TotalSegmentator: Robust Segmentation of 104 Anatomic Structures in CT Images. Radiol. Artif. Intell. 2023, 5, e230024.
10. Isensee, F.; Jaeger, P.F.; Kohl, S.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 2021, 18, 203–211.
11. Husseini, M.; Sekuboyina, A.; Bayat, A.; Menze, B.H.; Loeffler, M.; Kirschke, J.S. Conditioned variational auto-encoder for detecting osteoporotic vertebral fractures. In Proceedings of the Computational Methods and Clinical Applications for Spine Imaging: 6th International Workshop and Challenge, CSI 2019, Shenzhen, China, 17 October 2019; Springer: Cham, Switzerland, 2020; pp. 29–38.
12. Sekuboyina, A.; Rempfler, M.; Valentinitsch, A.; Loeffler, M.; Kirschke, J.S.; Menze, B.H. Probabilistic point cloud reconstructions for vertebral shape analysis. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, 13–17 October 2019; Proceedings, Part VI; Springer: Cham, Switzerland, 2019; pp. 375–383.
13. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
14. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. 2019, 38, 1–12.
15. Genant, H.K.; Wu, C.Y.; Van Kuijk, C.; Nevitt, M.C. Vertebral fracture assessment using a semiquantitative technique. J. Bone Miner. Res. 1993, 8, 1137–1148.
16. Husseini, M.; Sekuboyina, A.; Loeffler, M.; Navarro, F.; Menze, B.H.; Kirschke, J.S. Grading loss: A fracture grade-based metric loss for vertebral fracture detection. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, 4–8 October 2020; Proceedings, Part VI; Springer: Cham, Switzerland, 2020; pp. 733–742.
17. Zhang, M.; Cui, Z.; Neumann, M.; Chen, Y. An end-to-end deep learning architecture for graph classification. Proc. AAAI Conf. Artif. Intell. 2018, 32, 4438–4445.
18. Huo, L.; Cai, B.; Liang, P.; Sun, Z.; Xiong, C.; Niu, C.; Song, B.; Cheng, E. Joint spinal centerline extraction and curvature estimation with row-wise classification and curve graph network. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, 27 September–1 October 2021; Proceedings, Part V; Springer: Cham, Switzerland, 2021; pp. 377–386.
19. Bürgin, V.; Prevost, R.; Stollenga, M.F. Robust vertebra identification using simultaneous node and edge predicting Graph Neural Networks. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention: 26th International Conference, Vancouver, BC, Canada, 8–12 October 2023; Proceedings, Part IX; Springer: Cham, Switzerland, 2023; pp. 483–493.
20. Jaderberg, M.; Simonyan, K.; Zisserman, A.; Kavukcuoglu, K. Spatial transformer networks. Adv. Neural Inf. Process. Syst. 2015. Available online: https://proceedings.neurips.cc/paper_files/paper/2015/hash/33ceb07bf4eeb3da587e268d663aba1a-Abstract.html (accessed on 14 February 2024).
21. Heinrich, M.P.; Bigalke, A.; Großbröhmer, C.; Hansen, L. Chasing Clouds: Differentiable Volumetric Rasterisation of Point Clouds as a Highly Efficient and Accurate Loss for Large-Scale Deformable 3D Registration. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 8026–8036.