1. Introduction
Articular cartilage (AC) is the specialised tissue covering the ends of the long bones and providing a low-friction, lubricated and wear-resistant surface for articulation of the synovial joints during locomotion [
1,
2,
3]. Cartilage lesions are common in the human knee. Although the exact incidence is unknown, several studies report the presence of such lesions in 60–66% of knee arthroscopies of patients presenting with knee symptoms that require investigation (including unexplained knee pain and dysfunction) [
4,
5,
6,
7] and estimate a 12% incidence in the population as a whole [
8]. Articular cartilage is characteristically avascular and aneural which, in combination with the low metabolic turnover of cartilage extracellular matrix components by the resident chondrocytes, results in a poor intrinsic capacity for self-repair [
2,
3,
9]. Therefore, without intervention, AC lesions often fail to heal and can predispose the patient to further, progressive cartilage loss and eventually to secondary osteoarthritis (OA) [
10].
The need to prevent the progression from cartilage lesion towards secondary OA has led to the development of numerous techniques that aim to repair, replace or regenerate lesioned AC [
11,
12,
13]. These techniques range from palliative approaches (debridement, lavage) and intrinsic reparative strategies (marrow stimulation through abrasion, subchondral drilling and microfracture) to whole-tissue transplantation (osteochondral auto- and allografting) and tissue engineering strategies (Autologous Chondrocyte Implantation, Matrix-induced Autologous Chondrocyte Implantation and Autologous Matrix-Induced Chondrogenesis) [
11,
12,
14,
15,
16]. There has also been an associated development of new scoring systems to assess the outcome of cartilage repair techniques. A multitude of histological [
17], arthroscopic [
18,
19] and imaging-based scoring systems [
20,
21,
22] are in use to assess the structural quality of repair cartilage, each containing a number of different parameters.
Thus, the published literature displays a wide range of opinions regarding the most important parameters and outcome measures in the structural assessment of cartilage repair. There is also a range of opinions on baseline factors that may influence the repair, including demographic factors such as gender [
23] and age [
24,
25,
26,
27,
28], and defect factors such as number [
23,
29], size [
23,
26,
30] and location [
26,
28,
29], with little agreement on which is the most important. We hypothesised that the combined expertise of a panel of experts could identify where consensus exists and where it is lacking on parameters that could be used to assess or predict cartilage repair.
The Delphi technique is a method for acquiring group knowledge by turning individual opinions into a group consensus. The technique was primarily developed in the 1950s by Norman Dalkey and Olaf Helmer and was first published in 1963, following the declassification of some of the military projects for which it was developed [
31,
32,
33]. The Delphi technique aims to collate existing beliefs and ideas surrounding a specific topic, deduce which of these are the most important and determine the consensus among a group of relevant people on an issue where previously there was little agreement [
32]. The Delphi technique is based on the theory that the opinion of a group is more valid than that of an individual, in other words that ‘two heads are better than one’ [
32,
34,
35]. This is implemented through a series of iterative questionnaire rounds, between which there is a statistical analysis and controlled feedback of results [
36]. Our Delphi study sought to compile items that were deemed to be important in the field of cartilage repair and, subsequently, to determine the levels of consensus on these items amongst a panel of experts.
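The round structure described above (questionnaire, statistical summary, controlled feedback) can be illustrated with a minimal Python sketch. It is not the analysis used in this study: the 5-point agreement scale, the 75% threshold, the item names and the function names `consensus_level` and `summarise_round` are all assumptions chosen purely for illustration.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: responses are on a 5-point scale (1 = strongly disagree,
# 5 = strongly agree) and a score of 4 or 5 counts as agreement.
AGREE_MIN = 4
# ASSUMPTION: hypothetical consensus threshold, not the study's value.
THRESHOLD = 75.0


def consensus_level(scores: List[int]) -> float:
    """Percentage of panellists whose score counts as agreement."""
    if not scores:
        raise ValueError("an item needs at least one response")
    agree = sum(1 for s in scores if s >= AGREE_MIN)
    return 100.0 * agree / len(scores)


def summarise_round(
    responses: Dict[str, List[int]], threshold: float = THRESHOLD
) -> Dict[str, Tuple[float, bool]]:
    """Per-item summary (level, reached threshold) to feed back to the panel."""
    summary = {}
    for item, scores in responses.items():
        level = consensus_level(scores)
        summary[item] = (level, level >= threshold)
    return summary


# Hypothetical responses from four panellists on two items.
round_two = {"item A": [5, 4, 4, 3], "item B": [1, 2, 5, 4]}
feedback = summarise_round(round_two)
```

In a further round, items that have not reached the threshold would be re-circulated to the panel together with this kind of summary, so that panellists can reconsider their ratings in light of the group response.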
4. Discussion
The present Delphi study utilised a panel of experts to compile items deemed to be important, mainly in assessing or predicting the outcome of cartilage repair, for which 46 single-statement items and 4 ranked series were put forward. Subsequently, the same panel was used to determine the level of consensus on support for these items, of which 30 single statements and 1 ranked series reached threshold consensus levels.
The items collected in the idea-generating focus group and, therefore, the content of the subsequent questionnaires, varied widely in subtopic and scope within the cartilage repair field. This is not wholly surprising, as the study was designed to be broad, allowing panel members to raise and discuss, with minimal restrictions, issues from their own research that they considered to be important. We attempted to collate the 30 single-statement items that were supported by the panel as being important factors in cartilage repair into three useful groups (
Table 4). While the novelty of the items that reached consensus resided in their curation and collation in these groups, it was of interest to appraise some of the individual items to understand their utility as a collection.
An increase in collagen type II, aggrecan and lubricin expression was agreed by the panel to be an important marker in determining the quality of repair cartilage in human and animal studies. Abundant collagen type II and proteoglycans such as aggrecan have long been considered markers of repair cartilage quality and longevity, making consensus on these items unsurprising [
43,
44,
45]. Lubricin is less established as a marker of cartilage quality but in a recent paper lubricin was found in the superficial zone of 84% of biopsies taken from repair cartilage following ACI [
43]. Lubricin is known to reduce friction [
43], and prevent abnormal cell adhesion and overgrowth [
46,
47] at the cartilage surface; its presence in repair cartilage may therefore indicate a resemblance to native articular cartilage and, by extension, a successful repair.
The panel further agreed that all tissues in the joint should be considered when assessing the quality of cartilage repair in human or animal studies. These statements reflect the view of the knee as an organ in which the constituents work together to maintain function and dysfunction affects multiple tissues [
48]. The panel’s statement suggests that the knee should also be regarded as an organ when determining the success of cartilage repair. The panel also agreed on the statement ‘the repair has an effect on the other surrounding cells’, which conveys a similar message that the success of cartilage repair should not be judged solely on the quality of the repair cartilage as this is not the only tissue affected by the repair. These two statements, combined with the agreed statement that non-invasive measures would reduce time and costs, suggest developing imaging-based cartilage repair scoring systems that are able to consider and assess all of the joint tissues. Such a scoring system could combine a whole-joint MRI scoring system such as the Whole-Organ Magnetic Resonance Imaging Score (WORMS) or the Magnetic Resonance Imaging Osteoarthritis Knee Score (MOAKS) with a repair-specific system such as the 3-Dimensional Magnetic Resonance Observation of Cartilage Repair Tissue (3D-MOCART) score [
20,
21,
49]. While the MOAKS score is able to assess synovitis semi-quantitatively, Maksymowych and colleagues recently described a new scoring system, the OMERACT Knee Inflammation MRI Scoring System (KIMRISS), which is able to quantify synovitis-effusion more reliably [
50]. The inclusion of a quantitative soft-tissue inflammation score such as this would further improve the whole-joint assessment.
The panel agreed that the bone–cartilage interface is another important marker of the structural quality of repair cartilage. It has been reported that cartilage repair strategies that lead to the formation of fibrocartilage often demonstrate little regeneration of the tidemark and calcified cartilage and, therefore, develop a less stable tissue–bone interface [
51]. The regeneration of the osteochondral interface and, therefore, the integration of the repair cartilage to the bone is necessary for a stable repair. The calcified cartilage layer contributes not only to mechanical functionality and stability, but also to cartilage–bone homeostasis [
52]. Thus, the quality of the cartilage–bone interface could be indicative of the quality of the repair as a whole.
The panel also agreed on a number of baseline factors that could influence the success of cartilage repair. While the precise nature of the influence of these factors may not be clear, the panel did agree they were important. These factors included the disease status of the patient, as patients with chronic symptoms and related inflammation tend to have an increased failure rate of cartilage repair techniques or do not benefit at all [
53]. A consensus was also reached on the influence of environmental factors, such as age, gender and BMI, on the success of cartilage repair. In elderly patients, for example, the chondrogenic potency of bone marrow-derived mesenchymal stem cells is lower than in younger patients, which could reduce the chance of success of marrow stimulation techniques such as microfracture [
54,
55]. The size and depth of the cartilage lesion were also agreed upon by the panel as important factors that may influence the outcome of cartilage repair surgery. Not only do the size and depth of the lesion often determine which repair technique is employed, but most procedures also have a maximum size recommendation, beyond which success rates for that particular technique worsen [
56].
Only one of the four ranked series, ‘Tissue Type’, reached consensus in this Delphi study (
Table 6), providing a hierarchy of joint tissues based on their importance in the cartilage repair process. This agreed hierarchy is particularly useful given the panel’s view that cartilage degeneration and repair should be regarded as processes that affect and involve all tissues in the joint, rather than the articular cartilage alone [
48,
57]. One of the difficulties in appraising cartilage repair techniques, and finding ways to improve them, is determining the contribution of the other tissues of the joint to the cartilage repair process. This hierarchy can, therefore, help to prioritise the other knee joint tissues for future research into their role in affecting the structural quality of repair cartilage.
A number of the items put forward by the Delphi panel in the idea-generating focus group did not reach the threshold consensus in subsequent rounds, suggesting dissent amongst the panel and, by extension, within the field. Consensus increased between rounds two and three for six of the 16 single-statement items that did not reach consensus and for the three remaining ranked series. In theory, a Delphi process can have an unlimited number of rounds, and further rounds might have led to further convergence on these nine items. However, the rate of participant attrition suggests that further rounds were unlikely to provide sufficient returns to be viable. The lack of consensus more likely reflects a corresponding lack of knowledge around the statements, which can, therefore, serve as a list of potential research topics within the cartilage repair field. The other ten single statements showed either no change or a decrease in consensus between rounds two and three, suggesting that a difference in opinion on these topic areas remains.
Of the items that did not reach consensus, the statements ‘hyaline cartilage is necessary in the repair’ and ‘functional fibrocartilage is sufficient in the repair’ are of particular interest. These statements represent two of the major opposing arguments in the field of cartilage repair. The ultimate aim of any cartilage repair technique is to (re)generate a tissue that is as close as possible to the native hyaline articular cartilage in order to achieve the best possible biomechanical properties and longevity of the repair. Fibrocartilage is considered biomechanically inferior to hyaline cartilage and, therefore, provides a more temporary repair that only slows the progression from cartilage lesion to OA [
7,
58]. The more the repair tissue resembles hyaline cartilage, the better the repair quality is considered [
43]. However, the fact that neither of these statements reached consensus indicates that this idea is not as ingrained as expected.
To our knowledge, no guideline criteria have been published for the selection of ‘experts’ to form a Delphi panel and expertise itself is hard to define. In the case of this Delphi study, we used the criteria put forward by Adler and Ziglio (1996) to define expertise in the Delphi context: ‘Knowledge and experience with the issue under investigation’, ‘capacity and willingness to participate’, ‘sufficient time to participate’ and ‘effective communication skills’ [
59]. A proportion of those that were invited to take part declined to do so (as demonstrated in
Table 1) and no attempt was made to persuade them otherwise, as voluntary participation ensured that the entire Delphi panel met these requirements. Throughout this Delphi process, the panel was composed entirely of individuals working in the United Kingdom. Although the study was, therefore, limited in its geographical scope, the results are potentially applicable internationally.
An additional limitation of the present study was highlighted by appraising the composition of the panel. In both rounds one and three the vast majority of the panel members were research scientists, indicating that this group was over-represented throughout. Over-representation of one particular group in the panel is a commonly reported limitation of the Delphi technique [
60,
61,
62,
63,
64]. Other studies have employed purposive sampling in the selection of their Delphi panels in an attempt to manufacture a balance between backgrounds [
63,
65,
66]. However, due to high levels of participant attrition, also commonly reported in Delphi studies, certain groups are more likely to complete the process and, therefore, are commonly over-represented in the final round, even in studies with purposive sampling [
60,
63]. In the case of our study, the over-representation of research scientists, particularly in round 3, did not diminish the impact of the findings. Rather, the fact that a number of items still did not reach consensus, even when appraised by a largely homogeneous group (in terms of job title), further highlights the dissent within the field and the need for studies such as this, and for further basic research, to improve clarity and convergence.
Previously published Delphi studies have varied widely in panel size, from five participants to around 400 [
36,
42,
67,
68]. A larger panel size increases the variety of expertise but ultimately is likely to lead to diminishing responses [
69]. The first-round panel size of 28 participants in this Delphi study allowed for the inclusion of a series of comments and opinions from a range of experts, without making the subsequent questionnaires overly time-consuming. A recent systematic review, which aimed to evaluate previously published Delphi studies and produce guidance for future studies, examined 80 studies [
68]. Of these, 76 reported the number of individuals invited to participate. The median number invited was 17 (IQR 11–31), suggesting that our number was relatively large.
As demonstrated in
Table 1, there was some degree of participant attrition over the course of this Delphi process, with the panel size decreasing for each subsequent round. However, this smaller size did not impede the ability of the final round panel to reach consensus, as a further 14 single-statement items reached consensus in this round.
The decrease in panel size came with an associated decrease in the response rate, presented in
Table 1 as a percentage of the previous round respondents. The lowest response rate was observed in round 3, likely because this questionnaire was distributed electronically rather than in person as in the previous rounds. Once again, the lack of overarching methodological guidelines for the Delphi process made it difficult to appraise the response rates resulting from this study. It is widely accepted that a 100% response rate is very rare in Delphi studies, particularly those which are at least partly carried out remotely, such as ours [
32,
37]. The previously mentioned systematic review reported that of the 80 Delphi studies that were interrogated, the median round one response rate was 90% (IQR 80–100%) and the median final round response rate was 88% (IQR 69–96%) [
68]. However, only 31 of the 80 studies (39%) reported their response rates, so these numbers could possibly suffer from publication bias [
68]. A handbook recommends that a response rate of 70% should be maintained for each round but also acknowledges this is difficult to obtain [
32]. This recommended response rate was obtained in the second round (85.7%) but not in the third round (62.5%), again likely because the third-round questionnaire was distributed electronically. A higher response rate is more easily obtained if all Delphi rounds are carried out face-to-face [
32,
37], which was not possible in this case.
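As a worked check of the response-rate arithmetic above, the following Python sketch computes each round's response rate as a percentage of the previous round's respondents and compares it with the recommended 70%. The first-round panel size of 28 is stated in the text; the later panel sizes of 24 and 15 are assumptions inferred from the reported percentages (85.7% and 62.5%) and are not taken from Table 1. The function name `response_rate` is illustrative.

```python
RECOMMENDED_RATE = 70.0  # per-round response rate recommended in [32]


def response_rate(respondents: int, previous_respondents: int) -> float:
    """Respondents as a percentage of the previous round, to one decimal."""
    if previous_respondents <= 0:
        raise ValueError("previous round must have at least one respondent")
    return round(100.0 * respondents / previous_respondents, 1)


# Round 1 size (28) is stated in the text.
# ASSUMPTION: 24 and 15 are inferred from the reported percentages.
panel_sizes = [28, 24, 15]

# Pair each round with the one before it: (28, 24) and (24, 15).
rates = [
    response_rate(current, previous)
    for previous, current in zip(panel_sizes, panel_sizes[1:])
]
meets_recommendation = [rate >= RECOMMENDED_RATE for rate in rates]

for round_number, rate in enumerate(rates, start=2):
    print(f"Round {round_number}: {rate}%")
```

Under these assumed panel sizes, the second round exceeds the recommended rate and the third round falls below it, matching the percentages reported above.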