1. Introduction
Radiofrequency (RF) catheter ablation is an established treatment option for atrial fibrillation (AF) [1]. Ablation success depends strongly on the location and size of the lesions created by the ablation catheter, which must form a continuous and durable lesion set, i.e., an ablation line between the pulmonary veins (PVs) and the left atrium. The main reason for AF recurrence after ablation is electrical reconnection between the PVs and the left atrium [2], whereas prolonged RF application durations carry the risk of complications such as steam pops, pericardial effusion, or atrioesophageal fistula [3]. The goal of creating durable lesions while mitigating complications therefore underscores the importance of precise lesion size prediction.
The Ablation Index (AI) and Lesion Size Index (LSI) are widely accepted, easy-to-use lesion metrics for estimating RF lesion size and help guide the ablation procedure. At the physician's discretion, they are available in their corresponding 3D mapping systems: CARTO© 3 (Biosense Webster, Irvine, CA, USA) and EnSite™ (Abbott, Abbott Park, IL, USA). The feasibility of both AI and LSI has been shown in previous studies [4,5,6]. Both metrics depend on their input parameters, such as contact force (CF), delivered power (P), radiofrequency current (I), and application duration (t). Local impedance (LI) is another parameter that guides catheter ablation and does not depend on further input parameters. LI measurement was implemented in the RHYTHMIA HDx™ platform (Boston Scientific, Marlborough, MA, USA) with DIRECTSENSE™ technology. The LOCALIZE trial showed that LI dynamics predict acute and chronic PV segment conduction block, even without CF monitoring [7]. Next-generation ablation catheters (IntellaNav Stablepoint™ with DIRECTSENSE™ technology) integrate both LI and CF measurement, but the current RHYTHMIA HDx™ platform provides neither AI nor LSI to guide the ablation procedure.
Currently, data directly comparing lesion metrics, such as AI-, LSI-, and LI-drop-guided ablation, are lacking. Most studies have evaluated lesion metrics individually regarding their acute efficacy and long-term success. One study, for example, reported shorter procedure times and fewer PV gaps for an LSI-guided approach than for an LI-drop-guided approach [8]. However, a direct comparison of RF application durations and of the factors influencing lesion creation is still needed.
Our study aims to provide insights into RF application durations and their influencing factors by comparing index-guided (AI and LSI) and LI-guided approaches. To this end, we predicted AI and LSI metrics using a machine learning approach and demonstrated the feasibility of applying them in a clinical setting.
2. Methods
While CF and LI monitoring (Stablepoint™, RHYTHMIA HDx™ with DIRECTSENSE™ technology, Boston Scientific, Marlborough, MA, USA) are available simultaneously during ablation, AI and LSI formulas contain proprietary coefficients, which are not disclosed (see Equations (1) and (2)).
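Equation (1) shows how the Ablation Index (AI) is calculated, and Equation (2) shows how the Lesion Size Index (LSI) is calculated.

Ablation Index (AI, arbitrary units):

$$\mathrm{AI} = \left( k \int_{0}^{t} \mathrm{CF}^{a}(\tau)\, P^{b}(\tau)\, d\tau \right)^{c} + d \qquad (1)$$

where CF = contact force; P = power; t = application duration; and k, a, b, c, and d represent constants (proprietary data).

Lesion Size Index (LSI, arbitrary units):

$$\mathrm{LSI}(t) = f\left( F(t),\ I(t),\ t;\ c_{1}, \ldots, c_{n} \right) \qquad (2)$$

where F = 6 s sliding window of average contact force; I = 6 s sliding window of average radiofrequency current; t = time; and c_1, …, c_n represent constants (proprietary data).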
We used a two-step approach to estimate AI and LSI for lesions created with the RHYTHMIA HDx™ system, as these indices are not provided by the system. In the first step, treating lesion index prediction as a regression problem, we trained two machine learning models for AI and LSI prediction using collected ablation data. In the second step, we retrospectively analyzed atrial ablation data from patients undergoing LI-drop-guided ablation, using our pretrained models to predict the AI and LSI lesion metrics.
Data preparation, model generation, and statistics were conducted in Python using NumPy [9], pandas [10], scikit-learn [11], SciPy [12], statsmodels [13], pyGAM [14], and miceforest. Matplotlib [15] and Seaborn [16] were used for visualization. For the software library versions used, see Table A1.
2.1. Patient Population
Patient data, both for model generation and for data analysis, were anonymized at the time of export from the respective ablation system, with baseline patient characteristics attached. For model generation, we selected patients undergoing radiofrequency PVI for paroxysmal or persistent AF using either AI-guided (CARTO© 3 system, Biosense Webster, Irvine, CA, USA) or LSI-guided (EnSite X™ system, Abbott, Abbott Park, IL, USA) approaches. For data analysis, data were exported from the RHYTHMIA HDx™ system with DIRECTSENSE™ technology.
All procedures were carried out in accordance with current guidelines [1]. SmartTouch® and TactiCath™ ablation catheters were used for index-guided ablation with the CARTO© 3 system (Biosense Webster, Irvine, CA, USA) and the EnSite™ system (Abbott, Abbott Park, IL, USA), respectively. For the LI-drop-guided procedures, energy was delivered using an IntellaNav Stablepoint™ ablation catheter (RHYTHMIA HDx™ system, Boston Scientific, Marlborough, MA, USA).
2.2. Model Generation
We gathered two training datasets by exporting ablation data from their respective mapping systems for ten patients who underwent ablation for paroxysmal or persistent atrial fibrillation. This resulted in a total of 553 (AI) and 22 (LSI) lesions. Because we aimed to predict the lesion indices at every time point t during the ablation process, we further separated the independent and dependent variables at each time point during lesion creation. This yielded a total of 52,968 (AI) and 4253 (LSI) data points for training. During data preparation, we also calculated the force–time integral (FTI) at every time point t during lesion creation using Simpson's rule [17]. For the parameter space of both models, refer to Table 1 and Table 2.
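The sketch below illustrates the force–time integral step on synthetic data. It computes a cumulative FTI at every sample of a uniformly sampled contact-force trace.

- It assumes a fixed sampling interval, for example the 997 Hz time base described in Section 2.3.
- The function name `cumulative_fti` is hypothetical.
- Odd-indexed samples are finished with a single trapezoid for the last interval. This is our own choice, not a documented detail of the study.

Recent SciPy releases also provide `scipy.integrate.cumulative_simpson` for the same purpose.

```python
import numpy as np


def cumulative_fti(cf, dt):
    """Cumulative force-time integral (g*s) at every sample of a uniformly sampled CF trace.

    Even-indexed samples use composite Simpson's rule over all preceding panels;
    odd-indexed samples add one trapezoid for the final interval (our assumption).
    """
    cf = np.asarray(cf, dtype=float)
    n = cf.size
    fti = np.zeros(n)
    if n < 2:
        return fti
    m = (n - 1) // 2  # number of complete Simpson panels spanning samples [2k, 2k + 2]
    if m > 0:
        panels = dt / 3.0 * (cf[0:2 * m:2] + 4.0 * cf[1:2 * m:2] + cf[2:2 * m + 1:2])
        fti[2:2 * m + 1:2] = np.cumsum(panels)
    odd = np.arange(1, n, 2)
    fti[odd] = fti[odd - 1] + 0.5 * dt * (cf[odd - 1] + cf[odd])
    return fti


# Example: 5 s of synthetic contact force sampled at 997 Hz
fs = 997.0
t = np.arange(0, 5 * fs) / fs
cf = 10.0 + 2.0 * t  # grams, linearly increasing (illustrative only)
fti = cumulative_fti(cf, 1.0 / fs)
print(f"FTI at end of trace: {fti[-1]:.2f} g*s")
```

For a linear force profile, both Simpson's rule and the trapezoid are exact, so the result matches the analytic integral 10t + t².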
We used the Shapiro–Wilk and Anderson–Darling tests to assess normality. For both training datasets, these tests indicated a non-Gaussian distribution. Additionally, variance inflation factors (VIFs) and correlation coefficients suggested multicollinearity, which can affect model stability. To address these challenges, we compared a Random Forest Regressor [18,19] and a Gradient Boosting Regressor [19] with three linear models: Ridge, Lasso, and ElasticNet [19]. When training the LSI model, we applied a quantile transformation to handle the non-Gaussian distribution and skewness of the LSI training data.
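The normality and multicollinearity checks can be sketched as follows, on synthetic data only.

- The `vif` helper is a hypothetical, self-contained implementation of VIF_j = 1/(1 − R²_j).
- statsmodels offers `variance_inflation_factor` in `statsmodels.stats.outliers_influence`. That function expects the design matrix to already contain a constant column.
- The feature names and distributions are illustrative assumptions, not the study's data.

```python
import numpy as np
from scipy import stats


def vif(X):
    """Variance inflation factor per column: VIF_j = 1 / (1 - R^2_j), where R^2_j comes
    from an OLS regression of column j on all other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


rng = np.random.default_rng(0)
power = rng.uniform(30, 50, 500)          # W, independent of the others
cf = rng.gamma(2.0, 6.0, 500)             # g, right-skewed (illustrative)
fti = cf * rng.uniform(5, 25, 500)        # driven by CF, hence collinear with it
X = np.column_stack([power, cf, fti])

w_stat, w_p = stats.shapiro(cf)           # Shapiro-Wilk
ad = stats.anderson(cf, dist="norm")      # Anderson-Darling; critical_values[2] is the 5% level
print(f"Shapiro-Wilk p = {w_p:.2g}; A^2 = {ad.statistic:.2f} "
      f"(5% critical value {ad.critical_values[2]:.2f})")
print("VIF (power, CF, FTI):", np.round(vif(X), 2))
```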
Furthermore, we performed hyperparameter tuning [20] for all models using GridSearchCV with 10-fold cross-validation. We selected the hyperparameters to optimize based on preliminary experiments, covering both simple and more complex models. For the optimized hyperparameters and their value ranges, refer to Table A2.
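A minimal sketch of the tuning step with scikit-learn pipelines follows. Each candidate is wrapped in a `Pipeline`, optionally preceded by a `QuantileTransformer`, and tuned with `GridSearchCV` over 10 shuffled folds.

The following are assumptions for illustration:

- the grids;
- the scoring choice (MAE);
- applying the quantile transformation to the features rather than the target.

The data are synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import QuantileTransformer

# Hypothetical, deliberately small grids (the study's actual ranges are listed in Table A2).
CANDIDATES = {
    "random_forest": (RandomForestRegressor(random_state=0),
                      {"model__n_estimators": [50, 100], "model__max_depth": [None, 8]}),
    "gradient_boosting": (GradientBoostingRegressor(random_state=0),
                          {"model__learning_rate": [0.05, 0.1], "model__max_depth": [2, 3]}),
    "ridge": (Ridge(), {"model__alpha": [0.1, 1.0, 10.0]}),
    "lasso": (Lasso(max_iter=10_000), {"model__alpha": [0.01, 0.1, 1.0]}),
    "elasticnet": (ElasticNet(max_iter=10_000),
                   {"model__alpha": [0.01, 0.1], "model__l1_ratio": [0.2, 0.5, 0.8]}),
}


def tune_all(X, y, use_quantile=False, n_splits=10):
    """Grid-search every candidate with shuffled k-fold CV; returns fitted GridSearchCV objects."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    searches = {}
    for name, (estimator, grid) in CANDIDATES.items():
        steps = [("model", estimator)]
        if use_quantile:
            # Assumed feature-side transform to a normal distribution (features vs. target unspecified).
            steps.insert(0, ("quantile", QuantileTransformer(n_quantiles=100,
                                                              output_distribution="normal")))
        search = GridSearchCV(Pipeline(steps), grid, cv=cv, scoring="neg_mean_absolute_error")
        searches[name] = search.fit(X, y)
    return searches


rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 3))
y = 10 * np.sin(3 * X[:, 0]) + 5 * X[:, 1] ** 2 + rng.normal(0, 0.5, 300)  # non-linear target
searches = tune_all(X, y, use_quantile=True)
for name, s in searches.items():
    print(f"{name:17s} CV MAE = {-s.best_score_:.2f}  {s.best_params_}")
```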
To evaluate our final models, we used the coefficient of determination (R²) and the mean absolute error (MAE) on a 30% hold-out test split. For AI prediction, the Random Forest model performed best; for LSI prediction, we chose the Gradient Boosting model, as it showed the best performance. The three linear models performed worse, most likely because they cannot model the non-linear relationships in the underlying data (refer to Table 3 for the complete test results).
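The hold-out evaluation can be sketched with `r2_score` and `mean_absolute_error`. One design choice is worth flagging. Every lesion contributes many time points, so a random 30% split over individual rows would place samples from the same lesion in both the training and the test set.

- The sketch therefore draws the split over lesions with `GroupShuffleSplit`. This is our assumption, not a detail stated in the text.
- The data are synthetic.
- The index-like target is illustrative and does not reproduce the proprietary AI formula.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(1)
n_lesions, n_steps = 100, 50
lesion_id = np.repeat(np.arange(n_lesions), n_steps)
t = np.tile(np.linspace(0.1, 20.0, n_steps), n_lesions)                           # s
cf = np.repeat(rng.uniform(5, 30, n_lesions), n_steps) + rng.normal(0, 1, t.size)  # g
power = np.repeat(rng.choice([30.0, 40.0, 50.0], n_lesions), n_steps)             # W
fti = cf * t                                                                       # crude FTI proxy
X = np.column_stack([cf, power, t, fti])
# Illustrative, saturating index-like target; NOT the proprietary AI formula.
y = 30.0 * np.log1p(fti * power / 400.0) + rng.normal(0, 2, t.size)

# 30% hold-out split drawn over lesions, so no lesion contributes to both sides.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=lesion_id))

scores = {}
for name, model in {"random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
                    "ridge": Ridge(alpha=1.0)}.items():
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    scores[name] = (r2_score(y[test_idx], pred), mean_absolute_error(y[test_idx], pred))
    print(f"{name}: R^2 = {scores[name][0]:.3f}, MAE = {scores[name][1]:.2f}")
```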
2.3. Data Analysis
Subsequently, we analyzed the atrial ablation data of 27 patients who underwent LI-drop-guided ablation. For patient demographics, refer to Table 4. To our knowledge, no patients experienced procedural complications or required a redo procedure. The data exported from the RHYTHMIA HDx™ mapping system (Boston Scientific, Marlborough, MA, USA) contained raw LI, CF, and power measurements. We used linear interpolation to resample every ablation trace onto a common time base with a sampling frequency of 997 Hz. Following the approach of the LOCALIZE trial, raw LI measurements were first filtered with a moving mean filter with a window length of
s (see
Figure 1) [7]. We based all further calculations on the filtered LI, which we refer to simply as LI in the following sections. In analogy to Ohm's law for alternating current, the RF current was calculated from the LI and power measurements.
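The preprocessing chain described above consists of linear interpolation to 997 Hz, moving mean filtering of LI, and calculation of the RF current from power and impedance. It might look roughly like the sketch below.

- The 1 s moving mean window is a placeholder value, not necessarily the length used in the study.
- `rf_current` assumes a purely resistive load, so that P = I²Z gives the RMS current I = √(P/Z).
- All function names and signals are hypothetical.

```python
import numpy as np


def resample(t_raw, x_raw, fs=997.0):
    """Linearly interpolate an irregularly sampled trace onto a uniform time base."""
    t_new = np.arange(t_raw[0], t_raw[-1], 1.0 / fs)
    return t_new, np.interp(t_new, t_raw, x_raw)


def moving_mean(x, window_samples):
    """Centred moving mean; near the edges, only the available samples are averaged."""
    x = np.asarray(x, dtype=float)
    window_samples = max(1, min(int(window_samples), x.size))  # keeps output length == len(x)
    kernel = np.ones(window_samples)
    total = np.convolve(x, kernel, mode="same")
    count = np.convolve(np.ones_like(x), kernel, mode="same")
    return total / count


def rf_current(power_w, impedance_ohm):
    """RMS current from P = I^2 * Z, i.e. I = sqrt(P / Z); treats Z as purely resistive."""
    return np.sqrt(np.asarray(power_w, dtype=float) / np.asarray(impedance_ohm, dtype=float))


rng = np.random.default_rng(2)
t_raw = np.sort(rng.uniform(0.0, 10.0, 4000))                                       # s, irregular
li_raw = 120.0 - 15.0 * (1.0 - np.exp(-t_raw / 3.0)) + rng.normal(0, 2, t_raw.size)  # ohm
t, li = resample(t_raw, li_raw)
li_filtered = moving_mean(li, window_samples=997)   # 1 s window: placeholder value
current = rf_current(40.0, li_filtered)              # constant 40 W, illustrative
```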
The data contained the three-dimensional coordinates of the ablation catheter tip. To calculate the linear interlesion distance (ILD) between adjacent lesions, we first calculated the mean tip position for each lesion and then applied the k-nearest neighbor algorithm [21,22].
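The ILD computation could be implemented with scikit-learn's `NearestNeighbors`, as sketched below.

- The sketch assumes that "adjacent" means the spatially nearest other lesion. It queries two neighbours because the first match is the lesion itself.
- Coordinates are assumed to be in millimetres.
- The input layout and function name are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def interlesion_distance(tip_positions_per_lesion):
    """Distance (mm) from each lesion's mean tip position to the nearest other lesion.

    tip_positions_per_lesion: list of (n_samples_i, 3) arrays of catheter-tip coordinates.
    Returns (distances, index_of_nearest_lesion).
    """
    centres = np.array([np.mean(np.asarray(p, dtype=float), axis=0)
                        for p in tip_positions_per_lesion])
    nn = NearestNeighbors(n_neighbors=2).fit(centres)
    dist, idx = nn.kneighbors(centres)
    # Column 0 is (normally) the query lesion itself at distance 0; column 1 is its nearest neighbour.
    return dist[:, 1], idx[:, 1]


# Four synthetic lesions along the x-axis, with tip jitter that averages out to zero
jitter = np.array([[0.2, 0.0, 0.0], [-0.2, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, -0.1, 0.0]])
lesion_centres = [np.array([x, 0.0, 0.0]) for x in (0.0, 4.0, 9.0, 15.0)]
lesions = [c + jitter for c in lesion_centres]
ild, nearest = interlesion_distance(lesions)
print("ILD (mm):", np.round(ild, 2), "nearest lesion:", nearest)
```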
We then calculated the predicted indices for every time point t on the ablation trace using our previously trained AI and LSI prediction models (Figure 2). The LI drop plateau was defined as the local minimum of the LI trace. Although only lesions that reached the operator's desired LI drop target were exported, we visually confirmed a correctly set LI drop point/plateau for each lesion. AI targets were set to ≥400, and LSI targets were set to ≥4.
Lesion metrics were then exported for further statistical analysis.
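Suppose a model produces one index value per time point, for example (hypothetically) `ai_model.predict(features)` with one feature row per sample. The RF application duration needed to reach a target, and the LI drop plateau, can then be read off the traces as sketched below.

- The plateau is taken as the global minimum of the filtered trace. This simplifies the local-minimum definition above.
- Returning NaN when a target is never reached is one way in which missing durations may arise.
- The traces are synthetic, and the function names are hypothetical.

```python
import numpy as np


def time_to_target(t, index_trace, target):
    """First time at which a predicted index trace reaches `target`; NaN if it never does."""
    hits = np.flatnonzero(np.asarray(index_trace) >= target)
    return float(t[hits[0]]) if hits.size else float("nan")


def li_drop_plateau(t, li_filtered):
    """LI drop (starting LI minus minimum LI) and the time at which the minimum occurs."""
    li_filtered = np.asarray(li_filtered, dtype=float)
    i_min = int(np.argmin(li_filtered))
    return float(li_filtered[0] - li_filtered[i_min]), float(t[i_min])


t = np.linspace(0.0, 20.0, 2001)                          # s, 100 Hz for brevity
ai_pred = 45.0 * t                                        # hypothetical model output
lsi_pred = 0.15 * t                                       # never reaches 4 in this example
li = 100.0 - 12.0 * (1.0 - np.exp(-t / 4.0)) + 0.2 * t    # ohm, dips and then recovers

t_ai = time_to_target(t, ai_pred, 400.0)    # AI target >= 400
t_lsi = time_to_target(t, lsi_pred, 4.0)    # LSI target >= 4
drop, t_plateau = li_drop_plateau(t, li)
print(f"AI target at {t_ai:.2f} s, LSI target: {t_lsi}, LI drop {drop:.1f} ohm at {t_plateau:.2f} s")
```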
2.4. Statistics
During data preparation, we excluded lesions with missing values for the AI prediction model. For the LSI prediction model, we assessed the missing values using Little's MCAR (Missing Completely at Random) test. With the test yielding a chi-square statistic of
and a
p-value of
, we employed Multiple Imputation by Chained Equations (MICE) with five iterations to impute the missing LSI RF application durations. To test the robustness of our results, we then performed a sensitivity analysis of the imputation process. It compared MICE against mean, median, and k-nearest neighbor (KNN) imputation, with consistent results of
for
and
for MAE. We also tested different numbers of MICE iterations (5, 10, and 15), which yielded stable performance metrics ranging from
to
for
and
to
for MAE. We therefore chose MICE as the imputation method for the LSI RF application duration data. Additionally, lesions with RF application durations over 30 s and an ILD over
were excluded beforehand according to the CLOSE protocol [4] and Kanamori et al. [23]. Overall, 13% of all lesions were excluded.
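A masking-based comparison of imputation methods is sketched below. A fraction of a complete column is hidden, imputed, and scored against the hidden values with R² and MAE.

- scikit-learn's `IterativeImputer`, a single-imputation, MICE-style chained-equations imputer, stands in for miceforest here.
- Iteration counts of 5, 10, and 15 mirror the text.
- The masking design, the data, and the column meanings are assumptions, not the study's procedure.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer
from sklearn.metrics import mean_absolute_error, r2_score


def imputation_sensitivity(X, col, missing_frac=0.2, seed=0):
    """Mask part of column `col`, impute it with several methods, and score against the truth."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape[0]) < missing_frac
    X_missing = X.copy()
    X_missing[mask, col] = np.nan
    imputers = {"mean": SimpleImputer(strategy="mean"),
                "median": SimpleImputer(strategy="median"),
                "knn": KNNImputer(n_neighbors=5)}
    for n_iter in (5, 10, 15):  # chained-equations imputation with different iteration counts
        imputers[f"iterative_{n_iter}"] = IterativeImputer(max_iter=n_iter, random_state=seed)
    scores = {}
    for name, imputer in imputers.items():
        filled = imputer.fit_transform(X_missing)[mask, col]
        scores[name] = (r2_score(X[mask, col], filled), mean_absolute_error(X[mask, col], filled))
    return scores


rng = np.random.default_rng(4)
mean_cf = rng.uniform(5, 30, 300)                                    # g
li_drop = rng.normal(15, 4, 300)                                     # ohm
duration = 150.0 / mean_cf + 0.3 * li_drop + rng.normal(0, 1, 300)   # s, synthetic
X = np.column_stack([mean_cf, li_drop, duration])
scores = imputation_sensitivity(X, col=2)
for name, (r2, mae) in scores.items():
    print(f"{name:12s} R^2 = {r2:.2f}  MAE = {mae:.2f}")
```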
We reported continuous variables as median with interquartile range (Q1–Q3) or mean ± standard deviation, and categorical variables as counts and percentages. We tested for a Gaussian distribution using the Shapiro–Wilk and Anderson–Darling tests and additionally used Q-Q plots and histograms for visual evaluation.
The Friedman test with post hoc Wilcoxon signed-rank tests was used to compare RF application durations between groups.
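The group comparison can be sketched with SciPy's `friedmanchisquare` and `wilcoxon`.

- The effect size follows the common convention r = |Z|/√N. Here |Z| is back-calculated from the two-sided p-value, and N is taken as the number of pairs.
- Other conventions exist (for example, N as the total number of observations), and the text does not say which one was used.
- Durations are synthetic and paired by lesion.

```python
import numpy as np
from itertools import combinations
from scipy import stats


def compare_durations(durations):
    """Friedman test across paired groups, then pairwise Wilcoxon signed-rank tests.

    durations: dict mapping a strategy name to a 1D array of RF application durations,
    paired by lesion (same lesion order in every array). Effect size r = |Z| / sqrt(N),
    with |Z| recovered from the two-sided p-value and N = number of pairs (one convention).
    """
    names = list(durations)
    chi2_stat, p_friedman = stats.friedmanchisquare(*(durations[n] for n in names))
    pairwise = {}
    for a, b in combinations(names, 2):
        res = stats.wilcoxon(durations[a], durations[b])
        p = max(res.pvalue, np.finfo(float).tiny)  # avoid an infinite Z if p underflows to 0
        z = stats.norm.isf(p / 2.0)
        pairwise[(a, b)] = {"W": res.statistic, "p": res.pvalue,
                            "r": z / np.sqrt(len(durations[a]))}
    return chi2_stat, p_friedman, pairwise


rng = np.random.default_rng(3)
n = 60
ai = rng.gamma(4.0, 2.0, n)               # s, synthetic
li = ai + rng.normal(4.0, 3.0, n)
lsi = ai + rng.normal(12.0, 3.0, n)
chi2_stat, p_friedman, pairs = compare_durations({"AI": ai, "LSI": lsi, "LI": li})
for (a, b), res in pairs.items():
    print(f"{a} vs {b}: p = {res['p']:.2g}, r = {res['r']:.2f}")
```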
The regression residuals were not normally distributed, and the Lagrange multiplier test indicated heteroscedasticity. Since homoscedasticity, one of the Gauss–Markov assumptions, was violated and the residuals were non-normal, we used Generalized Additive Models (GAMs) to capture potential non-linear relationships and Random Forests as a robust, non-parametric approach to regression analysis.
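The heteroscedasticity check and the Random Forest part of the predictor analysis can be sketched as follows.

- The LM test shown is the Koenker (studentized) form of the Breusch–Pagan test, which statsmodels' `het_breuschpagan` computes by default. Whether the study used this exact variant is an assumption.
- The GAM part (pyGAM) is omitted to keep the example dependency-light.
- The feature names and data are hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def lm_heteroscedasticity_test(X, y):
    """Koenker (studentized) Breusch-Pagan LM test: LM = n * R^2 of e^2 on X, ~ chi2(p)."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e2 = (y - Z @ beta) ** 2                        # squared OLS residuals
    gamma, *_ = np.linalg.lstsq(Z, e2, rcond=None)  # auxiliary regression of e^2 on X
    r2 = 1.0 - np.sum((e2 - Z @ gamma) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    lm = n * r2
    return lm, stats.chi2.sf(lm, df=p)


features = ["mean_cf", "start_cf", "li_drop", "li_start", "ild"]  # hypothetical column names
rng = np.random.default_rng(5)
n = 400
X = np.column_stack([rng.uniform(5, 30, n), rng.uniform(0, 40, n), rng.normal(15, 4, n),
                     rng.normal(110, 15, n), rng.uniform(2, 6, n)])
signal = 150.0 / X[:, 0] + 0.1 * X[:, 2]          # duration driven mainly by mean CF
y = signal + rng.normal(0.0, 0.15 * signal)       # noise grows with the signal -> heteroscedastic

lm, p_lm = lm_heteroscedasticity_test(X, y)
rf = RandomForestRegressor(n_estimators=200, random_state=0)
cv_mse = -cross_val_score(rf, X, y, cv=5, scoring="neg_mean_squared_error").mean()
importance = dict(zip(features, rf.fit(X, y).feature_importances_))
print(f"LM = {lm:.1f} (p = {p_lm:.2g}); mean CV MSE = {cv_mse:.2f}")
print({k: round(v, 3) for k, v in importance.items()})
```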
3. Results
Our main findings are presented in Table 5. The median RF application duration differed significantly depending on the lesion metric used (Friedman test). While the median RF application duration guided by an AI target of ≥400 was
(IQR = 5.05–9.57) s, the median RF application duration guided by an LSI target of ≥4 was
(IQR = 17.59–22.95) s. The LI drop plateau was reached after a median RF application duration of
(IQR = 8.02–16.72) s.
Comparing the LI-guided and AI-guided approaches revealed a moderate difference, with significantly shorter RF application durations for AI-guided procedures (Wilcoxon signed-rank test; effect size r according to Cohen, 1992). In contrast, the LI plateau was reached more quickly than an LSI target of ≥4. The comparison of AI-guided and LSI-guided RF application durations also revealed a substantial difference, with shorter RF application durations for AI-guided procedures.
For predicting the RF application duration in AI-guided procedures, GAMs and Random Forests consistently identified mean CF, starting CF, LI drop, and starting LI as critical predictors. While all of these parameters were significant in the GAMs, mean CF had the highest Random Forest importance for predicting RF application duration, followed by starting CF. ILD, however, had no predictive value for RF application duration in the AI-guided approach.
Regarding predictors of RF application duration in LSI-guided procedures, mean CF, starting LI, and LI drop were significant factors (GAMs). However, their importance differed, with mean CF having the highest importance, as identified by the Random Forest. In contrast to the AI-guided approach, ILD was a significant factor (GAMs), although it was less important than mean CF for predicting RF application duration.
In contrast to AI- and LSI-guided procedures, GAMs and Random Forests identified LI drop and starting LI as significant (GAMs) and important (Random Forest) predictors of RF application duration in LI-guided procedures. These findings must, however, be interpreted with caution in light of the Random Forest performance metrics. The mean cross-validated mean squared error (MSE) was high and the coefficient of determination (R²) was low. This makes the Random Forest model less reliable and possibly compromises the identified predictor importances.
4. Discussion
This study provides new insights into lesion creation during catheter ablation. It shows significant differences in RF application durations between AI-, LSI-, and LI-guided ablation strategies, with AI and LSI targets of 400 and 4, respectively. Because the AI and LSI values were predicted by our trained machine learning models, the presented data depend on the reliability of these models; the AI and LSI metrics themselves remain proprietary to their respective mapping systems. To our knowledge, this retrospective comparative cohort study is the first to provide a direct comparison of this extent.
RF catheter ablation is already an established treatment option, especially for AF. Its importance will grow with Europe's aging population and the consequently rising burden of arrhythmias for individuals and healthcare systems. As recent analyses, such as the study by Wita et al., show regional differences in ablation rates [24], it is essential to understand and compare current metrics to optimize outcomes across different healthcare systems.
Published data showed longer RF application durations in LSI-guided than in LI-guided procedures, which might result in more durable lesions [8]. Regarding RF application duration and energy delivery, high-power short-duration (HPSD) ablation is currently under investigation [3,25]. Recent studies, such as that by La Fazia et al., have shown the impact of the power setting on achieving transmural lesions and on arrhythmia recurrence [26].
This study focused on lesion creation. We report significant differences in RF application duration between AI-, LSI-, and LI-guided approaches. One of the most notable findings was the significantly shorter RF application duration with an AI-guided approach compared with an LSI-guided approach. Similarly, RF application durations were significantly shorter for an LI-guided approach than for an LSI-guided approach. The exact clinical impact of different RF application durations remains unclear. Nevertheless, our findings suggest that the shorter RF application durations of an AI-guided approach may lead to shorter, more efficient procedures while mitigating complications such as atrioesophageal fistula. On the other hand, the longer RF application durations of LSI-guided ablation may result in more durable lesions, as suggested by Lian et al. [8].
Predictors of RF application duration varied between approaches, with mean CF having the highest relative importance for both AI- and LSI-guided approaches. The AI formula already incorporates CF as an input variable, and the LSI formula contains a 6 s sliding window of mean CF. Identifying mean CF as the most important predictor is therefore not in itself novel. However, it supports that our machine learning models behave as expected, as AI and LSI depend on their input variables. Furthermore, the results underline the need for stable tissue–catheter contact to create durable lesions. The variability in CF between cases may, in turn, depend on anatomical differences, such as atrial wall thickness, or on operator technique.
Regression analysis identified LI drop and starting LI as important predictors of RF application duration during LI-guided procedures. These results are in line with the findings of the LOCALIZE trial [7] and additionally link them to RF application duration. The results of the LOCALIZE trial [7] may therefore serve as a reference for our machine learning models, as the influencing factors are assumed to be closely similar. In contrast to the LOCALIZE trial [7], however, LI drop and starting LI showed considerably lower relative importance as predictors of RF application duration than mean CF did in AI- and LSI-guided procedures. This suggests that LI, as a real-time indicator, does not fully capture the complexity of lesion creation. The lower predictive performance of the Random Forest model further underlines these findings, which may influence the operator's choice of metric to guide the procedure.
More studies are needed to evaluate the safety and efficacy implications of these results in a clinical setting. Assessing different machine learning techniques and incorporating more procedural and periprocedural parameters will help to develop a more patient-tailored ablation approach [27,28,29].
Study Limitations
Our study has several fundamental limitations that arise primarily from the two-step approach. First, there are limitations inherent to the use of machine learning. Second, there are limitations arising from methodological and underlying data challenges.
Because the AI and LSI metrics are proprietary, the exact formula coefficients have not been publicly disclosed. Our approach therefore relies on our models correctly predicting the respective indices. Although we used Random Forest and Gradient Boosting for our underlying models, the small training datasets of 52,968 (AI) and 4253 (LSI) time points t during lesion creation can lead to overfitting. This is particularly problematic when applying lesion metric prediction to unseen data. However, Random Forest is less susceptible to overfitting than Gradient Boosting because it reduces variance, which matters especially for a limited training data parameter space. Gradient Boosting, on the other hand, reduces bias, leading to a better fit within the parameter space. Random Forest also tends to be more resistant to outliers, as averaging across many trees reduces their impact. For the training data parameter space of our models, refer to Table 1 and Table 2. As shown by La Fazia et al., for example, different power settings affect transmural lesion creation [26]. We assume this is not a concern for the Random Forest model, because the effect of a different power setting is averaged across trees, as long as that power setting is included in our training data parameter space. In addition, it must be acknowledged that our models were trained on data exported from PVI procedures, whereas the data analysis included all atrial ablation procedures.
Thus, future studies using a similar machine learning approach must primarily address the challenges outlined above. In particular, this means larger and more specific training datasets, for instance, PVI data only, to enhance robustness and transferability. A refined model training process, such as evaluating different ensemble learning techniques or neural networks, would also be beneficial.
In the second step of our approach, we analyzed data from patients who underwent LI-guided ablation alongside our predicted AI and LSI metrics. This was a retrospective study, and the data were anonymized at export. Our dataset represents a heterogeneous patient population with a relatively small sample size of only 27 patients who underwent LI-guided ablation for different atrial arrhythmias. Likewise, data analysis was not conducted separately for different segments, such as anterior/superior or posterior/inferior. In addition, our dataset lacks the following:

- periprocedural baseline characteristics, such as echocardiographic parameters;
- information on operator variability in experience and technique;
- procedural characteristics, such as procedural time.

Our study also did not examine the effect of different RF application durations on the mitigation of complications or on long-term clinical outcomes. Thus, the potential for unmeasured confounders affecting the outcomes cannot be excluded, and the likelihood of statistical bias is increased.
The lower performance of our LSI prediction model led to missing values for the LSI RF application duration, so we had to rely on MICE to handle missing data. Based on Little's MCAR test and the sensitivity analysis, we assumed that our data were not Missing Completely at Random (MCAR) but likely Missing at Random (MAR). This further assumes that the data were not Missing Not at Random (MNAR). We could not verify this assumption, for example, by comparison with other parameters such as CF. In addition to the lower performance of our LSI model, this procedure may introduce bias. The prior exclusion of lesions with RF application durations over 30 s and large ILDs follows established practice, but it may introduce similar bias.
As already acknowledged above, future studies would benefit from more specific and larger datasets that allow for subpopulation analysis, such as different types of arrhythmias or adjusting AI and LSI targets for anterior/superior and posterior/inferior segments.