This section describes the four experiments in detail and gives the reasoning behind the chosen parameters.
Appendix B.5.1. Set-Transformer
The same network architecture and hyper-parameters as in [17] were used, namely a sequence of three ISAB blocks with a row-wise linear layer on top, with
induced points, a latent embedding dimension of
and 4 attention heads for each layer. This is a supervised approach, hence training and evaluation data sets are required. Since samples with blast cells are scarce, cross-validation was performed to maximize the amount of training and evaluation data. The model should be capable of detecting MRD in FCM data of new patients, so for a fair assessment samples taken from one patient must not be split between the training, evaluation and test sets. Therefore, we conducted a “patient-cross-validation”, where the samples taken from one patient form the test set and the remaining patients are divided between training and evaluation so that the corresponding samples are split in approximately the fractions of
and
, respectively. The architecture of the Set-Transformer can take a varying number of events per sample as input but requires a fixed length of the feature vector. As mentioned above, for many samples the backbone markers are not sufficient to successfully identify blasts and discriminate them from normal regenerative cells. On the other hand, using all markers of the 8-color tube-specific panel of CFU (+ drop ins) and LAIP tubes results in fewer training samples. Thus, there is a trade-off between the amount of training data and the number of markers used. We test this method using only backbone markers as well as using the tube-specific panel markers of CFU and LAIP tubes separately. A detailed listing of the number of training, evaluation and test samples is given in
Table A6.
Table A6.
Three experiments are performed separately: one with CFU data, one with LAIP data and one with backbone (BB) data. The available data sets are split into training, evaluation and test sets. Each patient generates one split because that patient's samples are held out as test data (meaning that the rightmost column corresponds to the number of CFU, LAIP and backbone samples of each patient). All remaining samples from other patients are divided into training and evaluation sets.
| Patient | Marker | Train | Eval | Test |
|---|---|---|---|---|
| A | CFU | - | - | 0 |
| | LAIP | 27 | 9 | 1 |
| | BB | 52 | 13 | 1 |
| B | CFU | 22 | 4 | 3 |
| | LAIP | 27 | 8 | 2 |
| | BB | 50 | 11 | 5 |
| C | CFU | 18 | 8 | 3 |
| | LAIP | 24 | 7 | 6 |
| | BB | 44 | 13 | 9 |
| D | CFU | 19 | 8 | 2 |
| | LAIP | 29 | 6 | 2 |
| | BB | 51 | 11 | 4 |
| E | CFU | 21 | 6 | 2 |
| | LAIP | 25 | 9 | 3 |
| | BB | 50 | 11 | 5 |
| F | CFU | 20 | 6 | 3 |
| | LAIP | 23 | 6 | 8 |
| | BB | 44 | 11 | 11 |
| G | CFU | 17 | 3 | 9 |
| | LAIP | 21 | 7 | 9 |
| | BB | 37 | 11 | 18 |
| H | CFU | 23 | 5 | 1 |
| | LAIP | 26 | 10 | 1 |
| | BB | 51 | 13 | 2 |
| I | CFU | 22 | 6 | 1 |
| | LAIP | 26 | 10 | 1 |
| | BB | 53 | 11 | 2 |
| J | CFU | 17 | 7 | 5 |
| | LAIP | 25 | 8 | 4 |
| | BB | 43 | 14 | 9 |
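The patient-wise cross-validation described in this subsection can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `patient_cross_validation`, the greedy assignment of whole patients to the evaluation set, and the default `eval_fraction` are assumptions made for illustration only (the actual fractions are those stated in the text). The property it demonstrates is that no patient contributes samples to more than one of the three sets.

```python
import random
from collections import defaultdict


def patient_cross_validation(sample_patients, eval_fraction=0.2, seed=0):
    """Yield one (test_patient, train_idx, eval_idx, test_idx) split per patient.

    sample_patients: list giving the patient ID of every sample.
    eval_fraction:   assumed target share of the remaining samples used for
                     evaluation (hypothetical default, not taken from the paper).
    """
    by_patient = defaultdict(list)
    for idx, patient in enumerate(sample_patients):
        by_patient[patient].append(idx)

    rng = random.Random(seed)
    for test_patient in sorted(by_patient):
        others = [p for p in sorted(by_patient) if p != test_patient]
        rng.shuffle(others)
        n_rest = sum(len(by_patient[p]) for p in others)
        target_eval = eval_fraction * n_rest

        train_idx, eval_idx = [], []
        for p in others:
            # Whole patients go to the evaluation set until the target share
            # is reached; all further patients go to the training set.
            if len(eval_idx) < target_eval:
                eval_idx.extend(by_patient[p])
            else:
                train_idx.extend(by_patient[p])
        yield test_patient, train_idx, eval_idx, list(by_patient[test_patient])


if __name__ == "__main__":
    patients = ["A", "B", "B", "C", "C", "C", "D", "D", "E", "F", "F"]
    for test_patient, tr, ev, te in patient_cross_validation(patients):
        print(test_patient, len(tr), len(ev), len(te))
```

Because patients contribute different numbers of samples, the resulting train/evaluation shares are only approximate, which matches the varying counts per split in Table A6.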
Appendix B.5.2. UMAP-HDBSCAN Classification Pipeline
The essence of the method proposed is to mix events of healthy cell populations (control-events) with the events to be classified (test-events) before UMAP is applied; it is crucial that the graph, on which the UMAP embedding is based, is constructed with control- and test-events pooled together. This way, differences of healthy cell populations between control- and test-events are smoothed out by the combination of the local distance metric with the push and pull characteristic of UMAP while optimizing the low dimensional embedding as explained in
Appendix B.2. Healthy cell populations exhibit only small variations between patients; usually the location is stable and variations occur mainly in density as shown in
Figure A1.
Figure A2 shows the variations of the cell populations of the same patient at different time points of the therapy. The main differences are in the density of the cell populations, as different stages of regeneration occur at different times of therapy.
Figure A1.
Healthy cell populations exhibit low variation between patients. Each row shows an FCM sample of a blast-free patient taken at the same stage during therapy (after the third cycle of consolidation therapy). Differences in density as well as minor shifts of locations are noticeable. Colors of events falling into the bermude area represent promyelocytes (blue), granulocytes (brown), proerythrocytes (purple), monocytes (green), CD34 progenitors (yellow).
Figure A2.
FCM samples of one patient from different stages during therapy are compared. Therapy mainly affects the density of different cell populations. Colors of events falling into the bermude area represent promyelocytes (blue), granulocytes (brown), proerythrocytes (purple), monocytes (green), CD34 progenitors (yellow).
Our test set contains samples from different timepoints during and after therapy. To account for possible variations, 15 control samples are randomly chosen such that each timepoint is represented at least once (if available). Since the number of control samples needed for this method is quite low, the trade-off between data and marker richness discussed for the Set-Transformer above does not apply. Therefore, we use the 8-color tube-specific panel of CFU (+ drop ins) and LAIP tubes. The steps conducted to predict MRD in each test sample are described in detail in the following paragraphs.
First, control-events are mixed with the events of the test sample. The control-events are sampled as follows:
Determine the total amount of events to be sampled from control samples.
The number of control-events mixed with the test sample varies with the number of its events and is determined by , where the ratio is a parameter that was set to .
Select events from all available control samples.
is divided by the overall number of available control samples M to obtain the number of events to be selected from each control sample, i.e., . Using events from different control samples (in our case ) ensures a representative variety of phenotype expressions. Each control sample should have an equal contribution.
Every cell population should be represented in the control-events.
Assuming we have cell populations A, B and C with
,
and
cells, respectively, the proportion of events of each population sampled is
. The populations as defined in
Section 2.2.1 are selected for sampling: monocytes, granulocytes, proerythrocytes, promyelocytes and CD34-positive cells. In addition, cells not belonging to any of these categories but falling in the bermude area are also selected.
Concatenate all selected events and shuffle them.
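The four sampling steps above can be sketched as follows. This is a hedged illustration, not the authors' code: the function name `sample_control_events`, the data layout (one dictionary per control sample mapping a population name to its events) and, in particular, the per-population quota are assumptions. Here each population is assumed to receive a share proportional to its size within its control sample, with at least one event so that every population is represented.

```python
import random


def sample_control_events(control_samples, n_test_events, ratio, seed=0):
    """Draw control-events following the four steps described in the text.

    control_samples: list of dicts, population name -> list of events
                     (hypothetical layout chosen for this sketch).
    ratio:           number of control-events relative to the test-events.
    """
    rng = random.Random(seed)

    # Step 1: total number of control-events, scaled by the test sample size.
    n_total = int(ratio * n_test_events)

    # Step 2: every control sample contributes equally.
    per_sample = n_total // len(control_samples)

    selected = []
    for populations in control_samples:
        sample_size = sum(len(events) for events in populations.values())
        if sample_size == 0:
            continue
        for events in populations.values():
            if not events:
                continue
            # Step 3 (assumed rule): quota proportional to the population
            # size, but at least one event per non-empty population.
            quota = max(1, round(per_sample * len(events) / sample_size))
            quota = min(quota, len(events))
            selected.extend(rng.sample(events, quota))

    # Step 4: concatenate (done above) and shuffle.
    rng.shuffle(selected)
    return selected
```

Because of rounding and the minimum of one event per population, the number of returned events can deviate slightly from the requested total.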
After the control-events have been sampled and pooled together with the test sample, a 3D UMAP embedding is created. For this procedure, default parameter settings are used except for the parameter , which is set to , allowing for denser packing of cells in the embedding space. This is recommended by the authors of UMAP if subsequent clustering is performed. A 3D embedding space was chosen because three is the maximum number of dimensions that still allows intuitive visualization.
Once the events are embedded, clusters are detected with the HDBSCAN algorithm, where again all parameters were left at their default settings except for , which determines the minimum number of events that can form one cluster. It was set to , since samples with fewer than 50 blasts are denoted as MRD negative.
UMAP and HDBSCAN allow unseen data to be represented and processed without altering the learned mappings. This makes it possible to transform new events into a previously learned embedding space and to assign these new events to the previously determined clusters. Due to the properties of UMAP, the embedding of unseen data into a fixed UMAP tends to lose significance if these unseen data are not close to the data from which the UMAP was generated. On the one hand, it is important that the UMAP is created with a reasonable proportion of test sample events. On the other hand, if too few control-events are involved in the generation of the UMAP, these events will not lie close to the healthy populations of the test sample and will produce unknown clusters falsely classified as blasts. Before predicting blasts based on the number of control-events in each identified cluster, we therefore make use of the transform function and add additional control-events to the clusters in the embedding space. By doing so, we maximize the ratio of control-events in non-blast clusters and hence lower the sensitivity to the threshold used for classifying clusters as blast clusters. These additional control-events are selected as described above; their number is again defined by a ratio, which we call
to distinguish them from the first round of control-events sampled. We set
, meaning that the same number of events as present in the test sample is selected from the control samples and added to the learned embedding and clustering. If control-events dominate, blasts run the risk of adhering to non-blasts, which impedes the identification of correct clusters, as shown by
Figure A3.
In general, the ratios do not seem to have a drastic effect as long as the test sample dominates the learned embedding and the variations of healthy cell populations are roughly covered by control-events. Therefore, we opted for a  slightly below 1 and a  of 1, so that in total almost twice as many control-events as test-events are randomly selected. Finally, all clusters identified by HDBSCAN that consist of less than  control-events are classified as blast clusters, meaning all events in these clusters are labelled as blasts and all others as non-blasts. In theory, the blast clusters should exclusively contain events from the test sample, since control samples are blast-free. However, this is usually not the case. If, for example, blasts are not entirely separated from healthy populations, or because events are transformed into the embedding space at a later stage, some contamination of blast clusters with control-events can be observed. An empirical evaluation of hold-out samples suggested an impurity-acceptance-rate of  to be reasonable. By assigning a threshold rather than taking the cluster with the fewest control-events, we open up the possibility of having no blast cluster as well as more than one blast cluster. More than one blast cluster can appear since blasts might differ in their CD34 expression or one biologically meaningful blast cluster might be separated by HDBSCAN.
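The threshold rule for blast clusters described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `find_blast_clusters` and the argument `max_control_fraction` (standing in for the impurity-acceptance-rate) are hypothetical, and events with the label -1, which HDBSCAN uses for noise, are assumed here to count as non-blasts.

```python
from collections import Counter


def find_blast_clusters(cluster_labels, is_control, max_control_fraction):
    """Classify clusters by their share of control-events.

    cluster_labels:       cluster label of every event (-1 marks noise).
    is_control:           True for control-events, False for test-events.
    max_control_fraction: clusters whose control share lies below this
                          threshold are treated as blast clusters.
    Returns the set of blast clusters and one blast flag per event.
    """
    totals = Counter(cluster_labels)
    controls = Counter(
        label for label, ctrl in zip(cluster_labels, is_control) if ctrl
    )

    blast_clusters = {
        label
        for label, n_events in totals.items()
        # Assumption: noise events (-1) never form a blast cluster.
        if label != -1 and controls[label] / n_events < max_control_fraction
    }
    in_blast_cluster = [label in blast_clusters for label in cluster_labels]
    return blast_clusters, in_blast_cluster
```

Since every cluster is compared against the threshold independently, the result can contain no blast cluster, exactly one, or several, which is the behaviour motivated in the paragraph above.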
Figure A3.
UMAP embedding of a test sample mixed with control-events for a fixed  but varying  (columns A–D). The first row shows the embedding learned with an increasing number of control-events mixed with the test sample. Blasts are in red, non-blast events are grey. The second row shows the same respective UMAP representation with the test-events (orange), where additional control-events (blue) are transformed into the embedding space. For low values of , blast events are well separated but additionally projected control-events are spread out between clusters (columns A–B). For higher values of , blasts adhere to healthy cell populations, which makes it harder to detect them as a separate cluster, but additionally projected control-events mix well with the clusters of healthy cell populations.