We propose a new pruning constraint for mining frequent temporal patterns that are to be used as classification and prediction features: the Semantic Adjacency Criterion (SAC), which exploits each medical domain’s knowledge to filter out temporal patterns that contain potentially semantically contradictory components. We defined three versions of SAC and tested them in three medical domains (oncology, hepatitis, and diabetes) within a frequent-temporal-pattern discovery framework. Previously, we showed that using SAC enhances the repeatability of discovering the same temporal patterns, in similar proportions, in different patient groups within the same clinical domain. Here, we focused on SAC’s computational implications for pattern discovery, and on its implications for classification and prediction when the discovered patterns are used as features by four machine-learning methods: Random Forests, Naïve Bayes, SVM, and Logistic Regression. Across all medical domains and classification methods, using SAC significantly reduced the number of discovered temporal patterns, by up to 97%, and the runtime of the discovery process, by up to 98%. Nevertheless, the highly reduced set of only semantically transparent patterns, when used as features, yielded classification and prediction models that performed at least as well as the models built from the complete temporal-pattern set.