Open Access Article

Peer-Review Record

Measurement Equivalence and Feasibility of the Electronic and Paper Versions of the POSAS, EQ-5D, and DLQI: A Randomized Crossover Trial

Eur. Burn J. 2024, 5(4), 321-334; https://doi.org/10.3390/ebj5040030

by Jill Meirte^1,2,*, Nick Hellemans^3, Ulrike Van Daele^1,2, Koen Maertens^2,4, Lenie Denteneer^1,5, Mieke Anthonissen^2 and Peter Moortgat^2

Reviewer 1: Anonymous

Reviewer 2: Paul Kind

Submission received: 28 July 2024 / Revised: 7 September 2024 / Accepted: 25 September 2024 / Published: 11 October 2024

(This article belongs to the Special Issue Person-Centered and Family-Centered Care Following Burn Injuries)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Thank you for the opportunity to review the manuscript, Measurement equivalence and feasibility of the electronic and paper versions of the POSAS, EQ-5D and DLQI: a randomized crossover trial. This is a well-written article describing a nicely designed study that will have high practical quality for the readership of this journal. A few clarifications are requested to enhance the impact:

1. In the introduction, the authors state that most PROMs were created and validated on paper, and that moving them to electronic administration could alter their clinical properties, so equivalent psychometric properties cannot be assumed. This seems to be the idea that the article is written to address, and yet I wondered about this theoretical idea. In the conclusion (p 5, lines 340-347) the authors review literature showing that PROMs administered on paper typically yield results equivalent to electronic administration. The reader is left wondering if the current study was truly needed. I recommend the authors add some citations to the introduction to set up the idea that PROMs may not always yield the same results electronically as on paper, to better demonstrate the gap in the literature that this study is addressing.

Method

2. In the method section 2.4, please include more about the psychometric properties of each of the measures using a parallel structure for each measure. For example, each metric should include the name, construct it measures, number of items, examples of response set, valence (e.g., high means worse/better), some statements about reliability and validation sample. In particular, I would examine the test-retest reliability of each measure and explore if readministering within 15 minutes would be expected to impact any responses.

Results

1. Was there any other medical data available about the patients in this sample, for example, the sizes of their scars or time since injury? It’s okay if not, but if such data exist and could be examined, I wonder whether they would show any variation with the DLQI question on treatment.

Minor:

1. There is a confusing line on page 2, lines 149-152, wherein the authors state that all of the research procedure was conducted electronically; however, I believe this is untrue, as part of the study included pen-and-paper administration, so please review this line.

2. The term VAS should be defined the first time it is used.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Measurement equivalence and feasibility of the electronic and paper versions of the POSAS, EQ-5D and DLQI: a randomized crossover trial

This topic is increasingly relevant given the growing interest in quantifying health outcomes. This paper represents the convergence of 3 important elements: firstly the concept of PRO itself, secondly the characteristics of (selected) measurement systems and lastly, the determination of equivalence. Consideration of technical/practical feasibility is, in many ways, dependent on these priors.

Let’s accept the opening statement, patient reported outcome measures (PROMs) are crucial within person-centered care, but in so doing let us also recall that a PRO measure is defined as “any report of the status of a patient's health condition that comes directly from the patient without interpretation of the patient's response by a clinician or anyone else”. The term PRO is itself problematic since we typically measure health status at a point in time and outcomes are identified by repeating that measurement at a later stage.

The principal feature of a PRO is that measurements come DIRECTLY from the patient. What we see on the page/screen by way of patient response to a single item fits this definition perfectly. What we as researchers/clinicians do next is often to “convert” that response into a more amenable format. Hence a 4-point categorical item with text response levels is assigned a corresponding numerical “value” 1 to 4. This violates the central PRO property: the score is no longer the patient’s response. The implication of adopting such a scoring system is to assert that “Very much” [4] has twice the weight of “A little” [2]. This issue extends to assigning equal value to all items that share such a scoring system, for example, that itchy, sore, painful skin has the same value as being embarrassed or self-conscious. All this leads to inappropriate reporting of mean/SD values for what are effectively ordinal responses (DLQI).

This is a general challenge to all PRO metrics, whether they are paper-based or electronic. You may feel that this is to labour the point, but it has major relevance in this present study. Equivalence is tested here by comparing summary scores (based predominantly on non-patient value systems). The central consideration is surely much simpler. Given 2 modes of administration, do patients give identical responses when completing a health status questionnaire? Do they tick “very much” for a single DLQI item on both measures? If we rely on numeric “values”, then it is easy to see that an “error” on one item may be cancelled out by a reverse “error” on another, yielding identical summary scores.
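The cancellation described above can be illustrated with a minimal sketch. The responses below are hypothetical, not study data, and the 1 to 4 coding simply follows the numeric assignment described in this comment; the function names are illustrative assumptions.

```python
# Hypothetical responses of ONE respondent to five 4-level items,
# coded 1-4 as described in the comment above (not study data).
paper = [2, 3, 1, 4, 2]
electronic = [3, 2, 1, 4, 2]


def sum_score(responses):
    """Summary score: plain sum of the numeric item codes."""
    return sum(responses)


def item_disagreements(a, b):
    """Indices of items where the two modes of administration differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Identical summary scores ...
print(sum_score(paper), sum_score(electronic))  # 12 12
# ... although two of the five items were answered differently.
print(item_disagreements(paper, electronic))    # [0, 1]
```

In this sketch a comparison of summary scores would report perfect agreement, while an item-level comparison shows that the respondent changed two answers.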

Although POSAS uses a 10-point rating scale, the same limitation applies to its summary score. Variation in gains/losses is obscured through item aggregation. Furthermore, correlation analysis focuses on group-level performance. Equivalence testing needs to start with a within-respondent examination at item level.

Given the centrality of equivalence in this study, it seems remarkable that this is summarised (page 7) in a single sentence “Mean scores of the paper and electronic PROMS were computed and there were no significant differences found.”

POSAS and DLQI systems include several single item questions. Table 2 exemplifies the problem in relying on summary scores. The Treatment item in DLQI records mean scores of 0.56 and 0.69 (paper/electronic) with the latter being 23% greater. This difference is more or less compensated for by the scores for Daily Activities. The DLQI sum scores differ by less than 1%.
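For reference, the 23% figure follows directly from the two Treatment item means quoted above. The short check below uses only those two numbers; the helper name is an illustrative assumption.

```python
def relative_difference(reference, other):
    """Difference of `other` from `reference`, as a fraction of `reference`."""
    return (other - reference) / reference


# Mean Treatment item scores quoted from Table 2 (paper, electronic).
paper_mean = 0.56
electronic_mean = 0.69

percent = relative_difference(paper_mean, electronic_mean) * 100
print(f"{percent:.1f}%")  # 23.2%
```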

It is the performance on individual items that matters when determining equivalence and the reader is entitled to see this. What would be really useful would be to show (say) contingency tables (paper vs electronic) which will graphically demonstrate the nature of differences (within patient) between modes. This would suffice for DLQI at least. A scatterplot of ratings for the 6 POSAS items could be similarly informative.
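A paper-versus-electronic contingency table for a single item could be built along the following lines. This is a minimal sketch under stated assumptions: the paired responses are hypothetical, the response labels are assumed for illustration, and the function names are not taken from the manuscript.

```python
from collections import Counter

# Assumed response labels for one 4-level item (illustrative only).
LEVELS = ["Not at all", "A little", "A lot", "Very much"]


def contingency_table(paper, electronic, levels=LEVELS):
    """Rows = paper response, columns = electronic response (same patient)."""
    counts = Counter(zip(paper, electronic))
    return [[counts[(p, e)] for e in levels] for p in levels]


def exact_agreement(table):
    """Share of patients giving the identical response in both modes."""
    total = sum(sum(row) for row in table)
    diagonal = sum(table[i][i] for i in range(len(table)))
    return diagonal / total


# Hypothetical paired responses of six patients to one item (not study data).
paper = ["Not at all", "A little", "A little", "A lot", "Very much", "Not at all"]
electronic = ["Not at all", "A little", "A lot", "A lot", "Very much", "A little"]

table = contingency_table(paper, electronic)
for level, row in zip(LEVELS, table):
    print(f"{level:<11}", row)
print(f"exact agreement: {exact_agreement(table):.2f}")
```

Off-diagonal cells show which patients changed their answer between modes and in which direction, which is the within-patient information a comparison of mean scores cannot provide.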

Analysis and reporting of EQ-5D data similarly needs more careful attention. EQ-5D consists of 5 items, each with 5 levels of response. It is essential that each dimension/item is reported separately (as is the case with DLQI and POSAS).

It is essential too, that the EQ-VAS is fully reported. Given the 15-minute washout between tests, this could be quite useful as a sort of internal consistency check. The EQ-5D prompt to report a value for their health status TODAY ought to evoke a stable response. If this response is not identical, then this could perhaps indicate a less “stable” patient? Equally important to test (maybe using a simple sign test) is whether EQ-VAS differs systematically with mode of administration (MOA). In addition, it would also be useful to know if any differences emerge when paper EQ-VAS is first/second.
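The simple sign test mentioned above could be sketched as follows. The paired EQ-VAS values are hypothetical, the function name is an illustrative assumption, and the two-sided exact binomial p-value is one common way to compute the test, not necessarily the one the authors would choose.

```python
from math import comb


def sign_test(paper, electronic):
    """Two-sided exact sign test on paired scores.

    Ties (identical scores in both modes) are dropped, as is usual
    for the sign test. Returns (n_positive, n_non_tied, p_value).
    """
    diffs = [e - p for p, e in zip(paper, electronic) if e != p]
    n = len(diffs)
    if n == 0:
        return 0, 0, 1.0
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return positives, n, min(1.0, 2 * tail)


# Hypothetical EQ-VAS scores (0-100) of eight patients (not study data).
paper = [70, 80, 65, 90, 75, 60, 85, 50]
electronic = [72, 80, 60, 92, 78, 65, 85, 55]

print(sign_test(paper, electronic))  # (5, 6, 0.21875)
```

The same function could be applied separately to patients who completed the paper EQ-VAS first and those who completed it second, to look for an order effect.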

One final note on EQ-5D, the index score format based on population social preference weights is designed for use in economic evaluation rather than clinical applications. It is, without doubt, NOT a patient-reported index. Its use here needs to be so qualified. The source of the scoring algorithm and its intended use should be included in the text.

Whilst this reviewer recognises the widespread use of ISPOR guidance on equivalence testing, such guidance ought not to act as a limiting factor or as a replacement for a commonsense approach to analysis.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Thank you for the opportunity to review the revised manuscript, "Measurement equivalence and feasibility of the electronic and paper versions of the POSAS, EQ-5D and DLQI: a randomized crossover trial". The authors were very responsive to feedback and they have made excellent revisions to the manuscript. Their revisions have resulted in a product that is now ready for publication.
