Once the datasets were prepared, the experiments involving inducers could begin; they were divided into three groups. The first batch concerned the application of classifiers operating in the continuous domain. The second group of tests focused on discrete data, but only with standard discretisation approaches (one-level and uniform for all attributes). In the third phase, all classification systems were employed on the data discretised by the combined supervised–unsupervised two-level transformations.
5.2. Standard Discretisation Approaches
As stated before, discretisation involves a transformation of features that can result in an increase in performance, but a decrease is also possible; everything depends on the data and the selected discretisation algorithm. In standard approaches to the task, the same method is applied to all available features, and the popular opinion is that supervised procedures lead to better results than unsupervised ones. However, past studies have shown that, depending on the data, this bad reputation of the latter is not always deserved [25].
Furthermore, when several separate sets with the same features (such as training and test sets) are needed, the methodology of their transformation can play a significant role. Separate sets can be discretised entirely independently, with intervals and cut points constructed based only on the local context of each set. The other path of transformation leads through learning a discrete data model from the training data and then enforcing the formed intervals on the test data. In the research described here, an independent transformation of all sets was used.
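The two variants can be sketched minimally with scikit-learn's KBinsDiscretizer standing in for the discretisation step (the study itself relied on other implementations, and the random feature matrices below are purely illustrative):

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Illustrative continuous feature matrices; in the paper these would be
# the stylometric features of the writer datasets.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
X_test = rng.normal(size=(40, 5))

# Variant used in the described research: each set is discretised
# independently, so cut points reflect only the local context of a set.
X_train_d = KBinsDiscretizer(n_bins=5, encode="ordinal",
                             strategy="uniform").fit_transform(X_train)
X_test_d = KBinsDiscretizer(n_bins=5, encode="ordinal",
                            strategy="uniform").fit_transform(X_test)

# Alternative variant: cut points are learned from the training data
# only and then enforced on the test data.
disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform")
X_train_e = disc.fit_transform(X_train)
X_test_e = disc.transform(X_test)
```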
For discretisation using supervised MDL methods, the performance of the classification systems considered is shown in Table 3. In the majority of cases, discretisation resulted in worsened accuracy, sometimes only slightly, but for some inducers the degradation was severe, a consequence of data irregularities existing in the sets. Naive Bayes and Bayesian Net share some similarities and therefore reported close results, and they suffered the most. Some cases of improvement could be observed for the J48 and kNN classifiers, in particular for the male writer dataset. When the results of the Fayyad and Irani method were compared against those obtained with the Kononenko method, the latter led to overall higher accuracy.
For the two unsupervised methods examined, the input parameter specifying the required number of intervals to be constructed was always studied in the range from 2 to 10, so nine variants of the data were explored. For equal width binning, the performance of all inducers is included in Table 4.
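The cut-point rules of the two unsupervised methods can be written down directly; the short sketch below (with hypothetical helper names) also mirrors the studied sweep over 2 to 10 bins:

```python
import numpy as np

def equal_width_cuts(values, k):
    # k - 1 cut points splitting [min, max] into k bins of equal width
    lo, hi = values.min(), values.max()
    return lo + (hi - lo) * np.arange(1, k) / k

def equal_freq_cuts(values, k):
    # k - 1 cut points placing roughly the same number of values per bin
    return np.quantile(values, np.arange(1, k) / k)

# A skewed sample makes the difference between the two rules visible.
values = np.random.default_rng(1).exponential(size=200)
for k in range(2, 11):  # the studied range of the input parameter
    print(k, equal_width_cuts(values, k), equal_freq_cuts(values, k))
```

On skewed data, equal width binning concentrates most values in the first few bins, while equal frequency binning keeps the bins evenly populated, which is one reason the two methods can behave differently for the same inducer.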
The NB classifier again suffered for both the female and male writer datasets, regardless of the number of bins considered for equal width binning, yet not as badly as in the case of supervised discretisation. For BNet, the results varied depending on the number of intervals constructed and on the dataset. For M-writers, mostly improved performance was observed, while for F-writers such a situation occurred only once, when seven bins were formed. For J48, M-writers fared better through discretisation than F-writers. For kNN, the results showed some cases of improvement for the female writers, but for the male writer dataset, mostly worse recognition was noted. Random Forest worked surprisingly well (in comparison to the other learners) when higher numbers of intervals were constructed.
Table 5 lists the performance of the inducers for equal frequency binning. The two unsupervised discretisation methods can be compared against each other and measured against the supervised approaches, but first a look back at the continuous domain was needed.
For the male writers, NB always obtained worse results, but for the female writers, some improvement was noted in roughly half of the variants. BNet showed enhanced results only once for F-writers, yet for M-writers it did so in six out of nine cases. J48 recorded mostly enhanced results for the male writers, but for the female writers this happened rarely. The performance of kNN was more often worse than better, and the opposite was true for the RndF classifier; again, its improvement was more noticeable than for the other inducers.
When the unsupervised methods were compared against each other, it turned out that the resulting performance was often better for equal frequency binning, which confirmed some initial expectations and popular opinions. For most of the inducers, certain numbers of constructed bins led to higher accuracy.
For the female and male writer datasets, the discretisation method and, when applicable (as for the unsupervised algorithms), the value of the input parameter that led to the best performance of each studied classifier were analysed. They are presented in the section dedicated to the analysis of the obtained research results.
5.3. Two-Level Discretisation
As could be observed in the research results reported above, supervised discretisation can return variants of the data in which noticeably many input features are treated as useless in the discrete domain. At the same time, unsupervised algorithms can lead to more advantageous performance of classifiers working on the discretised datasets. These observations provided the motivation for a more detailed study combining both approaches.
In the proposed research framework, the fundamental notion was to get more out of the variables than supervised discretisation could offer. Therefore, for all features for which supervised processing found only single intervals to represent their values, an additional transformation step was executed, this time employing unsupervised methods. Consequently, in the discrete domain, all variables had at least two bins assigned, and all could be mined in data exploration processes.
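A minimal sketch of this second-level step is given below. It assumes that the supervised pass (e.g., the Fayyad and Irani or Kononenko filter) has already produced ordinal codes, and the function name is hypothetical rather than the authors' implementation:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

def second_level_discretise(X_sup, X_cont, n_bins=3, strategy="uniform"):
    """Re-bin every feature that the supervised pass collapsed into a
    single interval, so all variables end up with at least two bins.

    X_sup:    ordinal codes from supervised MDL discretisation
    X_cont:   original continuous values of the same features
    strategy: "uniform" (equal width) or "quantile" (equal frequency)
    """
    X_out = X_sup.astype(float)
    for j in range(X_sup.shape[1]):
        if np.unique(X_sup[:, j]).size == 1:  # single-interval feature
            kb = KBinsDiscretizer(n_bins=n_bins, encode="ordinal",
                                  strategy=strategy)
            X_out[:, j] = kb.fit_transform(X_cont[:, [j]]).ravel()
    return X_out
```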
Unsupervised discretisation algorithms were used while varying the input parameter that defined the requested number of bins, and they were paired with both the Fayyad and Irani and the Kononenko discretisation methods, which resulted in multiple versions of the datasets. All these variants were next mined with the group of selected classification systems, and their performance was observed. Some inducers showed similar trends, while others varied greatly. In the presented results, a bin number equal to one denotes the situation where a dataset was transformed only by supervised methods; thus, some variables were assigned single intervals. A number greater than one means additional transformation by some unsupervised algorithm, with the number corresponding to the input parameter. The combination of the Fayyad and Irani method with equal width binning was denoted as dsF-duw and with equal frequency binning as dsF-duf. In the same convention, dsK-duw and dsK-duf stand for the combination of the Kononenko algorithm with equal width binning and equal frequency binning, respectively.
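Under this naming convention, the grid of dataset variants can be enumerated as in the sketch below (an assumption for illustration; the upper bound of 10 bins mirrors the range studied for the unsupervised methods):

```python
# Bin number 1 = supervised discretisation only; 2..10 = additional
# unsupervised second-level transformation with that many bins.
variants = [
    (f"{sup}-{unsup}", bins)
    for sup in ("dsF", "dsK")    # Fayyad and Irani / Kononenko
    for unsup in ("duw", "duf")  # equal width / equal frequency
    for bins in range(1, 11)
]
```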
BNet stood out from the other learners. Its performance is included in Table 6. For the female and male writer datasets, for all combinations of supervised and unsupervised methods, a change in performance was observed only when the bin number in the test sets was varied. Changing the number of intervals in the training sets while keeping the number of bins in the test sets constant brought the same accuracy. For each pairing, it can be noted that additional transformations by unsupervised discretisation algorithms increased performance. For F-writers, the trends were monotonic; for M-writers, they were close to monotonic.
For the F-writers, the Kononenko method used as a base always led to better results than the Fayyad and Irani algorithm: dsK-duf and dsK-duw resulted in better performance of BNet than dsF-duf and dsF-duw, regardless of the number of bins constructed in the additional second-level transformations. However, for the M-writers, the visible trend depended on the unsupervised method: for equal width binning combined with either of the supervised discretisation processes, the accuracy was higher than for equal frequency binning.
For the remaining classifiers, the performance was much more varied, as shown in the plots in Figure 2, Figure 3, Figure 4 and Figure 5. In each chart, the categories on the horizontal axis correspond to the number of bins for the additionally transformed variables in the training sets, while the series are defined by the numbers of bins in the test sets. The top four plots are for F-writers and the bottom ones for the male writer dataset. The combinations of discretisation methods are indicated in the chart titles.
For the NB classifier (Figure 2), the trends indicate that for the female writers some similarities to the performance of BNet could be observed. It is visible that additional transformations of variables in the test sets brought more noticeable effects than subjecting the training sets to the second level of discretisation. Such a statement would not be true for the M-writers; here, differences were visible in both directions.
Figure 2. Performance [%] of the Naive Bayes classifier for the data transformed through supervised discretisation by the Fayyad and Irani (dsF) and Kononenko (dsK) algorithms combined with unsupervised equal frequency (duf) or equal width (duw) binning.
For both the female and male writer datasets, additional processing of variables caused enhanced performance in many cases, but rarely in the higher ranges of bin numbers. In particular, for M-writers, fewer bins led to enhanced predictions, while more intervals brought worsened accuracy. For F-writers, the trend lines were much clearer and showed that, for the Fayyad-based data variants, increasing the number of bins in the test sets above one brought enhanced results; for the Kononenko-based versions, the opposite was observed. For M-writers, the changes occurred in both directions.
An analysis of the plots for J48 (Figure 3) led to the conclusion that for the female writers the trends visible for the categories were very similar, with hardly any differences, which indicated a higher degree of dependence on transformations of the test sets than of the training sets. For the male writers, much more variation was noticed, yet again with many cases of improved predictions. As previously for NB, for F-writers closer similarities were observed between the plots based on the same supervised discretisation method, while for M-writers the closeness depended on the unsupervised method employed.
Figure 3. Performance [%] of the J48 classifier for the data transformed through supervised discretisation by the Fayyad and Irani (dsF) and Kononenko (dsK) algorithms combined with unsupervised equal frequency (duf) or equal width (duw) binning.
The kNN classifier (Figure 4) behaved in a rather distinctive manner. For the female writer dataset, the second-level discretisation of the training sets rarely brought better results, but additional transformations of the test sets almost always increased the accuracy by a noticeable degree. For the male writer dataset, in both groups of processing, the predictive power was mostly degraded, and hardly any improvement could be found among all tested variants of the data.
Figure 4. Performance [%] of the kNN classifier for the data transformed through supervised discretisation by the Fayyad and Irani (dsF) and Kononenko (dsK) algorithms combined with unsupervised equal frequency (duf) or equal width (duw) binning.
The performance of Random Forest is shown in Figure 5. For the female writer dataset, a high dependence on the number of bins in the test sets was visible for all combinations of discretisation methods. The opposite was true for M-writers, where transformations of the test sets rather worsened the results.
The presented research results showed that two-level (instead of standard one-level) discretisation of attributes in the training and test sets, combining supervised with unsupervised transformations, can be expected to influence the performance of classifiers working on discrete versions of the data. However, the charts revealed noticeable differences between trends across classifiers, datasets, and combinations of methods. Therefore, the proposed methodology proved its merit, yet the question of which method or combination of methods is most advantageous for each inducer, or which classifier works best, requires more analysis and is addressed in the next section of the paper.
Figure 5. Performance [%] of the Random Forest (RndF) classifier for the data transformed through supervised discretisation by the Fayyad and Irani (dsF) and Kononenko (dsK) algorithms combined with unsupervised equal frequency (duf) or equal width (duw) binning.