Harmonization for Parkinson’s Disease Multi-Dataset T1 MRI Morphometry Classification
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
In this paper, the authors used machine learning methods for harmonized multi-dataset classification of Parkinson's Disease from T1 MRI morphometry. The manuscript has the following issues.
1. An important factor affecting the accuracy of machine learning is the accuracy of T1 MRI diagnosis. Have these samples been previously diagnosed by multiple doctors? Is the result consistent? Is the progression of the disease consistent?
2. In the early stages of PD, the changes in brain regions are very small. Can suitable cases be found as machine learning cases? How were these early cases diagnosed? How can this model be used for early diagnosis of PD? How accurate is it?
Author Response
We thank reviewer 1 for their effort in discussing our work. Our paper, “Harmonization for Parkinson’s Disease Multi-Dataset T1 MRI Morphometry Classification”, provides an example application of data harmonization in a machine learning pipeline, as well as associated pitfalls, specifically for detecting Parkinson’s disease in morphometric data. We addressed and updated our paper according to the reviewer’s comments:
Comment 1: An important factor affecting the accuracy of machine learning is the accuracy of T1 MRI diagnosis. Have these samples been previously diagnosed by multiple doctors? Is the result consistent? Is the progression of the disease consistent?
Response 1: Diagnosis of Parkinson’s disease is usually based on clinical symptoms assessed by trained neurologists, response to dopamine treatment, and ruling out similar diseases using an obtained T1 MRI. The gold standard for confirming Parkinson’s disease is a DaTscan showing dopamine deficiency in the basal ganglia, though only the PPMI study used this. We acknowledge that the lack of unified diagnostic criteria between datasets could also be a cause of the batch effects seen during our analysis.
Other external public datasets were diagnosed according to their noted documentation.
- According to PPMI documentation, “Patients must have at least two of the following: resting tremor, bradykinesia, rigidity (must have either resting tremor or bradykinesia); OR either asymmetric resting tremor or asymmetric bradykinesia” with further confirmation based on a DaTscan.
- According to documentation provided in Badea 2017, the Neurocon cohort was diagnosed based on the EFNS/MDS-ES criteria for Parkinson’s disease diagnosis.
- In documentation provided in Badea 2017, the Tao Wu dataset is described as having early to moderate stages of the disease according to the Hoehn-Yahr staging, though exact information on diagnosis is not given.
- Cases from within our Institute were diagnosed by Movement Disorders Specialists based on the UK Parkinson’s Disease Society Brain Bank Clinical Diagnosis Criteria (Hughes et al., 2001).
We note that progression of disease remains highly heterogeneous as well. An analysis of progression would have required longitudinal data; this is a cross-sectional study.
In the revised version, we included information in the discussion about issues of misdiagnosis and different diagnoses criteria as contributing to batch effects we see impacting our classifier.
The inclusion criteria from each cohort described in response to this comment are now included in supplementary materials.
Comment 2: In the early stages of PD, the changes in brain regions are very small. Can suitable cases be found as machine learning cases? How were these early cases diagnosed? How can this model be used for early diagnosis of PD? How accurate is it?
Response 2: Early cases were determined based on years after diagnosis, or on either UPDRS rating or Hoehn-Yahr staging, as appropriate. We note that the PPMI, Neurocon, and Tao Wu subjects were early-stage PD subjects, while subjects internal to NIH were early- to mid-stage subjects. The model was not effective for early diagnosis of PD across all combined datasets. We can train and overfit single-scanner classifiers that predict well, but the lack of generalization remains an issue, suggesting that early-stage differences found in PD are likely to have very small effect sizes across multiple cohorts.
This model cannot be used for diagnosis of PD, and its accuracy remains well below what clinical use requires. We note instead that metrics can be inflated, either by purposefully picking the single-scanner dataset that produces the best AUC, or by incorrectly configuring a harmonization step (our NeuroComBat model) so that it removes the neurobiological group effect as if it were a site effect.
Our goal is to highlight the difficulties, with the methodology available today, of obtaining a diagnosis based on T1 MRI in the early stages of Parkinson’s disease. It is a call for caution on how to harmonize datasets, and to be aware of the pitfalls and requirements before clinical use.
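The generalization gap described above can be probed with a leave-one-scanner-out evaluation, in which each scanner serves once as an unseen test site. Below is a minimal, stdlib-only sketch; the field names and scanner labels are illustrative, not taken from the paper:

```python
def leave_one_scanner_out(samples):
    """Yield (held_out_scanner, train_idx, test_idx) splits so that each
    scanner serves exactly once as an unseen test site."""
    scanners = sorted({s["scanner"] for s in samples})
    for held in scanners:
        train = [i for i, s in enumerate(samples) if s["scanner"] != held]
        test = [i for i, s in enumerate(samples) if s["scanner"] == held]
        yield held, train, test

# Illustrative cohort: three scanners, one PD and one HV subject each.
samples = [{"scanner": sc, "group": g}
           for sc in ("PPMI-1", "Neurocon", "TaoWu")
           for g in ("PD", "HV")]
splits = list(leave_one_scanner_out(samples))
# A classifier that only performs well within-scanner will collapse
# on the held-out site, exposing the lack of cross-cohort generalization.
```

In practice the same splitting logic is available as `LeaveOneGroupOut` in scikit-learn; the sketch simply makes the grouping explicit.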
Reviewer 2 Report
Comments and Suggestions for Authors
neurosci 3208452 - Recommendation: Accept in present form.
The work is scholarly, well written, well thought out and important clinically.
The authors have harmonized meta data from different MRI scanners and protocols. A significantly large number of patients are included in their studies. More than adequate controls are clear to this reviewer.
The authors have tackled the problem of elucidating neurodegeneration at least in terms of one of the three deadliest dementias, Parkinson's disease.
Classification procedures are discussed. ComBat is compared with Jacobian and so on.
The Supplementary Data Section is impressive.
The work deserves a high recommendation to publish in NeuroSci.
Minor details should not override this carefully studied contribution from these authors.
Thanks for inviting me to review this fine manuscript as it is most timely.
Comments on the Quality of English Language
NA
Author Response
We thank reviewer 2 for their effort in discussing our work. Our paper, “Harmonization for Parkinson’s Disease Multi-Dataset T1 MRI Morphometry Classification”, provides an example application of data harmonization in a machine learning pipeline, as well as associated pitfalls, specifically for detecting Parkinson’s disease in morphometric data. We are grateful for their acceptance of the work.
Reviewer 3 Report
Comments and Suggestions for Authors
I read with great interest the paper of Saqib et al. I would recommend several improvements before considering publication:
- Only scanners with at least 10 healthy volunteers were included, and this could lead to the exclusion of relevant data. A discussion on this decision would be beneficial.
- What was the scanning protocol, and what were the image characteristics (MRI parameters) for the sequences used? Was it similar across the different MRI machines? Does this impact the results?
- Were other potentially relevant imaging features, such as cortical thickness, volume, or functional connectivity, considered?
- Were other harmonization methods considered and tested in comparison for efficiency?
- It would be beneficial to elaborate on why the particular classifiers were chosen over others.
- The discussions in the dedicated section are well driven; however, I would consider talking about alternative methods to mitigate the limitations. It would also be beneficial to dive deeper into the neurobiological interpretations and broaden the comparisons to other literature studies (especially regarding deep learning).
Respectfully submitted,
Author Response
We thank reviewer 3 for their effort in discussing our work. Our paper, “Harmonization for Parkinson’s Disease Multi-Dataset T1 MRI Morphometry Classification”, provides an example application of data harmonization in a machine learning pipeline, as well as associated pitfalls, specifically for detecting Parkinson’s disease in morphometric data. We addressed and updated our paper according to the reviewer’s comments:
Comment 1: - Only scanners with at least 10 healthy volunteers were included, and this could lead to the exclusion of relevant data. A discussion on this decision would be beneficial.
Response 1: We note this as a trade-off of the specific harmonization technique we used. The PPMI dataset, to which this restriction applied, was imbalanced, with far more PD subjects than HV. We found in initial experiments that NeuroComBat model parameters were unstable for scanners with only a few scans. We also note that, as a practical matter, creating useful cross-validation splits that balance between scanners requires sufficient PD and HV subjects per scanner.
We have included information on how our method is limited to scanners with sufficient representation in the training set, and how other methods in the field could represent future directions of inquiry.
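The inclusion rule discussed above amounts to a simple per-scanner count filter. A minimal sketch is below; the dictionary field names and scanner labels are illustrative, not the study's actual data schema:

```python
from collections import Counter

def filter_scanners(subjects, min_hv=10):
    """Keep only subjects scanned on machines with at least `min_hv`
    healthy volunteers (HV); other scanners are dropped entirely."""
    hv_counts = Counter(s["scanner"] for s in subjects if s["group"] == "HV")
    eligible = {sc for sc, n in hv_counts.items() if n >= min_hv}
    return [s for s in subjects if s["scanner"] in eligible]

# Illustrative imbalanced cohort: scanner B has too few HV subjects.
cohort = (
    [{"scanner": "A", "group": "HV"}] * 12
    + [{"scanner": "A", "group": "PD"}] * 20
    + [{"scanner": "B", "group": "HV"}] * 3
    + [{"scanner": "B", "group": "PD"}] * 15
)
kept = filter_scanners(cohort)
# Scanner B is excluded (only 3 HV), so its 15 PD subjects are lost too,
# illustrating the trade-off between harmonization stability and data loss.
```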
Comment 2: - What was the scanning protocol, and what were the image characteristics (MRI parameters) for the sequences used? Was it similar across the different MRI machines? Does this impact the results?
Response 2: Sequence information is now provided in the supplemental information. PPMI scanners reportedly use harmonized MPRAGE parameters. The other cohorts use their own parameters, which fall within the range harmonized for PPMI.
Comment 3: - Were other potentially relevant imaging features such as cortical thickness, volume or functional connectivity considered?
Response 3: Our analysis includes imaging features such as cortical thickness and volume, which are derived from the T1 images and which we refer to as morphometric measures. We did not include functional connectivity in this analysis. Structural data is commonly collected for clinical purposes and thus may be a more readily available tool.
The morphometric measures included the Jacobian determinant as well as FreeSurfer-derived measures, such as cortical thickness and volume for specific parcels of a subject’s brain. As described in our supplemental materials, we normalized specific volume features by total brain volume. Of note, an initial analysis showed that total brain volume had a significant group effect, consistent with other literature, so we included that value during training.
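The volume normalization described above can be sketched as dividing each volumetric feature by total brain volume while leaving thickness-like features untouched. The key names below are illustrative placeholders, not FreeSurfer's actual output labels:

```python
def normalize_volumes(features, tbv_key="TotalBrainVol"):
    """Express each volumetric feature (suffix '_vol' here, by convention)
    as a fraction of total brain volume. Total brain volume itself is kept
    as a raw feature, since it carried a significant group effect."""
    tbv = features[tbv_key]
    out = {}
    for name, value in features.items():
        if name.endswith("_vol"):
            out[name] = value / tbv
        else:
            out[name] = value
    return out

# One illustrative subject row (units arbitrary).
row = {"TotalBrainVol": 1500.0, "putamen_vol": 9.0, "caudate_vol": 6.0,
       "precentral_thickness": 2.4}
norm = normalize_volumes(row)
# Volumes become fractions of TotalBrainVol; thickness is unchanged.
```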
Comment 4: - Were other harmonization methods considered and tested in comparison for efficiency?
Response 4: We primarily considered NeuroComBat, as it is the primary algorithm for harmonization of morphometric data in the field. Our other analyses with differing harmonization methods showed similar results and were excluded for brevity.
We note in our discussion that further investigation of other ComBat variants could be warranted in the future.
Comment 5: - It would be beneficial to elaborate on why the particular classifiers were chosen over others
Response 5: In the revised version, we included in the discussion relevant information on the choice of classifiers. Our dataset size remains relatively small, which restricted us to simpler algorithms such as logistic regression, random forest, and support vector machines. XGBoost, a popular gradient-boosted tree algorithm, was the most advanced model we attempted.
We include information about why we use “simpler” classifiers, as our limited dataset size prevented us from considering deep learning approaches.
Comment 6: - The discussions in the dedicated section are well driven; however, I would consider talking about alternative methods to mitigate the limitations. It would also be beneficial to dive deeper into the neurobiological interpretations and broaden the comparisons to other literature studies (especially regarding deep learning).
Response 6: We thank the reviewer for this suggestion. We accordingly revised the discussion.
- Bega D, Kuo PH, Chalkidou A, et al. Clinical utility of DaTscan in patients with suspected Parkinsonian syndrome: a systematic review and meta-analysis. npj Parkinsons Dis. 2021;7(1):1-8. doi:10.1038/s41531-021-00185-8
Round 2
Reviewer 1 Report
Comments and Suggestions for Authorsnone