1. Introduction
Managing farming operations is a challenging task due to its complexity and unpredictability [1]. It covers activities such as irrigation scheduling [2,3], pest management [4,5], nutrient management [6,7], investment in agricultural machinery [8,9], harvesting [10,11], logistics [12,13] and so forth. Farmers and stakeholders not only need to deal with short-term (daily and weekly) scheduling problems but also have to consider long-term (yearly) management. Typically, farmers are accustomed to making these decisions according to their own observations and experiences [14,15]. However, an inappropriate decision may cause serious issues such as decreased productivity, damaged soil, increased costs and so forth. Owing to recent advances in the Internet of Things (IoT) and sensor techniques, data collected by climate sensors, ground sensors, radiation sensors and weather stations (composed of sensors) enable researchers to build IoT-based platforms and thereby execute tasks such as monitoring, knowledge mining, reasoning and control [16,17]. However, with the growing amount of data collected by various sensors, farmers, who are not data scientists, often have great difficulty in making proper judgments. As a consequence, farmers are now gradually adopting decision support systems (DSSs) [18,19] for obtaining advice, because DSSs are able to transform unstructured raw data into useful knowledge, thereby assisting farmers in managing agricultural activities efficiently and profitably.
As one of the most popular techniques in artificial intelligence, case-based reasoning (CBR) has gradually been employed for modelling DSSs in the domain of smart agriculture [20,21]. In general, an agricultural decision support system (ADSS) is a platform that gathers and analyses data collected from a variety of sources (meteorological, plant/crop-related and economic data). An ADSS aims at assisting farmers in smoothing the decision-making processes for agricultural management by providing a list of feasible solutions [18]. With its strong reasoning capability, CBR can be used to generate these solutions. Once a farmer encounters a new agricultural problem, the description of this problem is treated as a new case by a CBR-enabled ADSS. The ADSS then uses similarity measures to retrieve the most similar past cases from the case base, along with their corresponding solutions. It is acknowledged that if the new case and the retrieved past cases have great commonalities, the solutions of the retrieved cases can be used to solve the new case as well [22]. Therefore, farmers can obtain decision support from the ADSS for managing agricultural tasks.
Though applying case-based reasoning has promising advantages such as ease of use and precise responses, researchers have pointed out some critical issues in case retrieval [23]. For example, each past case is typically considered an independent individual in the case base and assigned a sequential number as its unique identifier. However, a case in the case base could share similar (or dissimilar) feature values with others, meaning that these cases can be interconnected by similarity associations. Under such circumstances, the retrieval task could skip unnecessary comparisons between the new and past cases and therefore accelerate the retrieval process. Unfortunately, few studies pay attention to using the internal associations between cases. Neglecting these relations may lead to poor performance at the case retrieval stage, because the retrieval algorithms sequentially traverse all past cases to match the most similar ones, even when a large volume of cases is stored in the case base.
To improve retrieval efficiency, methods such as rough set theory [24,25] and filtering techniques [26,27] have been adopted. On the one hand, rough set theory can reduce the number of compared cases by defining lower and upper approximations. However, all past cases have to be involved when generating the set of qualified cases that meets the approximations. On the other hand, some researchers manually defined rule sets for filtering past cases. The rules were specified based on the observation of cases and the researchers' own interests in the retrieval tasks. Unfortunately, both rough set theory and filtering techniques fail to address any associations between cases and they are task specific: once a new task is put forward, the filtering process has to be executed all over again.
Case retrieval plays an essential role in CBR systems because the remaining steps (reuse, revise and retain) cannot proceed without successfully retrieving the most similar past cases in the first place. Current studies on case retrieval algorithms mainly concern the following two aspects: (i) proposing new similarity measures and (ii) proposing new indexing methods.
In case-based reasoning, similarity measures are used to quantify the similarity between two objects [28]. Usually, a smaller distance value means that the two compared objects have more commonalities. Researchers have contributed much towards proposing new similarity measures for retrieving the most similar past cases.
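As a simple illustration of the distance-to-similarity idea (a generic sketch only, not one of the measures reviewed below):

```python
import math

def similarity(a, b):
    """Distance-based similarity for feature vectors normalized to [0, 1].

    The Euclidean distance between two points in the n-dimensional unit
    hypercube is at most sqrt(n); dividing by sqrt(n) scales the distance
    into [0, 1], and subtracting from 1 yields a similarity score where
    1.0 means identical and 0.0 means maximally different."""
    return 1.0 - math.dist(a, b) / math.sqrt(len(a))

print(similarity([0.2, 0.4, 0.6], [0.2, 0.4, 0.6]))  # identical cases -> 1.0
print(similarity([0.0, 0.0], [1.0, 1.0]))            # opposite corners -> 0.0
```

Any measure with this "larger value means more commonalities" property can be plugged into the retrieval process described later.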
Wang et al. [29] proposed a novel hybrid similarity measure for case retrieval in case-based reasoning systems, considering five formats of attribute values: crisp symbols, crisp numbers, fuzzy numbers, fuzzy linguistic variables and fuzzy intervals. The calculation formula for the global similarity was established by combining the hybrid similarity measure and the synthesis weight measure for retrieving the proper historical case. Yoon et al. [30] presented C-Rank, a link-based similarity measure for identifying similar scientific publications in databases. This similarity measure used both in-link and out-link references, disregarding the direction of the references. The experimental results demonstrated that C-Rank achieved higher accuracy than existing approaches. Yazid et al. [31] designed a new similarity measure based on Bayesian networks for retrieving brain tumour cases. Their proposal was based on graph correspondences and the comparison of signature nodes from the Bayesian classifiers. The promising experimental results indicated that the proposed similarity measure outperformed classical methods. Zhai et al. [32] proposed a novel triangular similarity measure, overcoming the shortcomings of cosine similarity and Euclidean distance similarity. The experimental results showed that their proposal had strong robustness and great accuracy. Jiang et al. [33] introduced a novel semantic similarity measure for formal concept analysis by taking advantage of linked data and WordNet. The proposed method was used not only for data analysis and knowledge representation but also for concept formation and learning.
Though newly proposed similarity measures indeed enable case-based reasoning systems to retrieve more accurate past cases, these measures do not improve the efficiency of case retrieval. During the retrieval step, the algorithms still have to traverse all past cases in the case base, leading to low efficiency when a large volume of cases is stored. Therefore, proposing new similarity measures alone is not enough to improve the performance of case-based reasoning systems.
As a computational data structure, an index enables a case to be stored and searched in memory. Case indexing assigns indexes to cases to facilitate their retrieval [34] and plays a key role in case base maintenance. Many studies have addressed indexing issues.
Honigl and Kung [35] proposed a data quality index method for maintaining the case base and avoiding redundant cases. Three indices (average solutions per case, count of similar retained queries and missing values) were used to build an index for the quality of the case base. Wiltgen et al. [36] presented two indexing methods, named functional indexing and structural indexing. Both methods generated separate discrimination networks and had mechanisms for preventing the networks from having duplicate nodes. Similar past cases could be retrieved by adopting the indexing methods together with similarity measurements. Ahmad et al. [37] adopted the locality-sensitive hashing (LSH) technique for obtaining short binary codes to represent medical radiographs. These hashing codes enabled indexing and efficient retrieval in large-scale image collections. Durmaz and Bilge [38] proposed an approach named randomized distributed hashing (RDH), which used LSH in a distributed scheme. RDH randomly distributed data to different nodes in a cluster and used hash functions for indexing. The query sample was then searched locally in different nodes during the query stage. The experimental results showed that the proposed distributed scheme had great potential for searching images in large datasets with multiple nodes. Ahmed and Sarma [39] observed that the accuracy of a system degraded as the size of the database increased, and therefore designed an indexing approach to deal with feature deviation under noise. For the retrieval task, the proposed indexing approach gave a higher hit rate than existing approaches, even at low penetration rates.
From the above review of the current literature, it is concluded that indexing methods have great influence on case retrieval and case base maintenance. LSH is especially popular in case indexing. LSH uses a family of hash functions to map data points into buckets [40]. As a result, data points that are near to each other are located in the same bucket with high probability, while data points that are far from each other are likely to be placed in different buckets. This makes it easier and more efficient to identify past cases that are similar to the new one. However, LSH does not guarantee the accuracy of classified cases. For example, two similar data points may be separated into different buckets due to the design of the hashing functions. Thus, improvements in indexing methods for case retrieval and case base maintenance are still expected. It is worth noticing that none of the above studies mentioned mining and using the internal associations between past cases. In other words, each case is still individually stored and searched.
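The bucketing idea can be illustrated with a minimal random-hyperplane LSH sketch (an illustration of the general technique only, not the specific variants used in References [37,38]; the dimensions and plane count are arbitrary):

```python
import random

def make_lsh_bucket(dim, n_planes, seed=7):
    """Random-hyperplane LSH: each hash bit records on which side of a
    random hyperplane a point lies, and the tuple of bits is the bucket
    key. Nearby points tend to share bits, hence buckets, with high
    probability."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
    def bucket(point):
        return tuple(int(sum(w * x for w, x in zip(plane, point)) >= 0.0)
                     for plane in planes)
    return bucket

bucket = make_lsh_bucket(dim=4, n_planes=8)
key = bucket([0.5, 0.4, 0.9, 0.1])
# A bucket key is a tuple of 0/1 bits; equal keys mean "same bucket".
# LSH offers no guarantee, however: two similar points can still receive
# different keys, which is exactly the weakness discussed above.
print(key)
```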
Therefore, in this paper, a new case retrieval algorithm for agricultural case-based reasoning systems is proposed. Before executing the algorithm, an association table containing the associations between past cases is constructed. At first, the new case is compared with an entry-point case. Based on the similarity measurement, the similar or dissimilar association is then selected for comparison in the next iteration, until the most similar past cases are detected. Under such circumstances, potentially similar past cases can be evaluated preferentially and the number of compared cases is reduced, because the proposed algorithm is able to skip unnecessary comparisons. Meanwhile, our proposal takes case base maintenance into account: the association table is updated during runtime. After resolving the problem, the new case is retained in the case base, along with its similar and dissimilar associations.
The rest of this paper is organized as follows. Section 2 presents the materials and methods of the proposed case retrieval algorithm. The results and discussions are presented in Section 3. Finally, conclusions are drawn in Section 4.
2. Materials and Methods
The proposed algorithm relies on a pre-constructed association table in which each past case is interconnected with several similar and dissimilar past cases. Once a new case is reported, it is first compared with an entry point (the starting case for comparison in the first iteration). If the similarity measurement between the new case and the entry point indicates that the two cases have great commonalities, the similar association of the entry point is selected for comparison in the next iteration; otherwise, the new case is compared with the dissimilar past cases associated with the entry point. The retrieval process continues until the termination condition of the algorithm is reached. In contrast to traversing all past cases, as typical case retrieval algorithms do, our proposal measures the similarity of associated cases preferentially. Under these circumstances, the number of compared cases can be greatly reduced and the efficiency of case retrieval improved. For case retention, the features of the new case, its similar and dissimilar associations, as well as its solutions, are stored in the case base. Meanwhile, the association table is updated if the new case shows closer relations than the old associations.
2.1. Case Representation Formalism
The proposed algorithm focuses on retrieving agricultural cases that are formalized by the feature vector representation [41]. As the simplest formalism of case representation, it represents cases by a set of features that describes the problems and corresponding solutions. In this manuscript, the agricultural case-based reasoning system manages pest problems; therefore, the agricultural cases are defined as shown in Table 1.
In Table 1, pest, crop and environment data are considered in agricultural cases [42]. Each case has the same type and number of features. For implementation, past cases are stored in CSV format. Since our agricultural case-based reasoning system is implemented in the Python programming language, libraries such as “Numpy” and “Pandas” can be used to manipulate the stored past cases easily. Furthermore, data in CSV format are understandable and readable for farmers, even if they do not have any expertise in knowledge management or computer science.
The contents of some features in Table 1 are given as text, such as “pest name,” “pest stage,” “crop name” and “growth stage.” To deal with these textual features, we encode them into integers. Transforming a linguistic feature into a real number is a common approach in case-based reasoning systems [43,44]. For example, the life cycle of a pest includes “egg,” “pupae,” “larvae” and “adult.” An integer is assigned to each stage: “1” represents “egg,” “2” represents “pupae,” “3” represents “larvae” and “4” represents “adult.” The same transformation works for the rest of the textual features as well. To normalize both numeric and textual features, we adopted the Min-Max feature scaling method, mapping the original features into the range from 0 to 1. Some features have internal interrelations; for instance, the life cycle of a pest follows a time sequence. Data normalization does not eliminate these interrelations, since the original values (1, 2, 3, 4) are normalized to (0, 0.3333, 0.6667, 1), which preserves the ordering.
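The encoding and scaling steps above can be sketched as follows (the stage codes follow the life-cycle example in the text; the helper function is an illustrative implementation of standard Min-Max scaling):

```python
# Integer encoding of the textual "pest stage" feature, following the
# life-cycle example in the text.
PEST_STAGE = {"egg": 1, "pupae": 2, "larvae": 3, "adult": 4}

def min_max(values):
    """Min-Max feature scaling: map raw values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

stages = [PEST_STAGE[s] for s in ("egg", "pupae", "larvae", "adult")]
# Scaling (1, 2, 3, 4) yields (0.0, 0.3333..., 0.6666..., 1.0), so the
# time-sequence ordering of the life cycle is preserved.
print(min_max(stages))
```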
It is worth mentioning that in the feature vector representation, no relationships are built between cases. In other words, each case is individually stored in memory. This is why an association table is constructed in the next section for case retrieval.
2.2. Construction of an Association Table
The association table contains the interconnections between past cases. Within the case base, a case can be similar or dissimilar to several other cases. An example of a case base with six past cases is given in Table 2. Each case has four features. After data normalization [45], all the cases are visualized in Figure 1.
According to the visualization result in Figure 1, it is obvious that case 1 is similar to cases 3 and 5 because their data deviation is small. Meanwhile, case 1 is dissimilar to cases 2 and 4 because their data distributions have major differences. Similarly, it is observed that case 2 is similar to cases 4 and 6, while case 2 is dissimilar to cases 3 and 5. Thus, the association table shown in Table 3 can be constructed. Each past case is given two similar and two dissimilar associations. The similarity measurements between two associated cases are stored in the association table as well. Both the similarity and the dissimilarity measurements are calculated according to Reference [32]. In Table 3, the cases achieving the two highest measurements are chosen to fill in the similar association, while the cases with the two lowest measurements are selected as the dissimilar association.
In Table 3, each case has two types of associations:
Similar association—This type of association indicates that the features of the two concerned cases have great commonalities. Consequently, the IDs of these similar cases are stored in the similar association, building interconnections to the source case. For example, case 1 is associated with cases 3 and 5. Once a new case is reported and case 1 is treated as the entry point, cases 3 and 5 are selected for comparison if the new case is considered similar to case 1. Because other potentially similar cases might exist among the similar association, the similar association offers the chance of evaluating past cases within a smaller range, instead of searching the whole case base. As a result, the number of compared cases can be reduced and retrieval efficiency improved.
Dissimilar association—This type of association specifies that there are significant differences between the features of the two concerned cases. The IDs of these dissimilar cases are stored in the association table as well. For example, case 2 is associated with cases 5 and 3. The dissimilar association aims at helping the new case identify a relatively similar case at the very beginning of case retrieval. This association is also helpful when the retrieval process is trapped in a local optimum: the dissimilar association can adjust the search trajectory in order to reach the global optimum.
To construct such an association table, it is necessary to measure the similarity between every pair of past cases. For instance, in Table 2, case 1 has to be compared with cases 2, 3, 4, 5 and 6, case 2 has to be compared with cases 1, 3, 4, 5 and 6, and so forth. After obtaining all the similarity measurements, each case can be associated with several similar and dissimilar ones. For instance, if the number of similar associations is two, then the two cases with the top similarity measurements are selected and stored in the association table. As shown in Table 3, cases 3 and 5 achieve the two highest similarity measurements when compared with case 1; therefore, cases 3 and 5 are selected as the similar associations of case 1. The number of associated cases depends on the size of the case base: the more cases are stored, the more associations should be built.
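The pairwise construction described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the similarity function is a generic Euclidean-based score rather than the triangular measure of Reference [32], and the toy feature values are made up:

```python
import math

def similarity(a, b):
    """Illustrative similarity in [0, 1] for normalized feature vectors."""
    return 1.0 - math.dist(a, b) / math.sqrt(len(a))

def build_association_table(cases, n_assoc=2):
    """For every past case, store the IDs (with measurements) of its
    n_assoc most similar and n_assoc most dissimilar cases."""
    table = {}
    for cid, feats in cases.items():
        scored = sorted(((similarity(feats, other), oid)
                         for oid, other in cases.items() if oid != cid),
                        reverse=True)
        table[cid] = {
            "similar": [(oid, s) for s, oid in scored[:n_assoc]],
            "dissimilar": [(oid, s) for s, oid in scored[-n_assoc:]],
        }
    return table

# Toy six-case base with normalized two-feature vectors (values invented).
cases = {1: [0.1, 0.2], 2: [0.9, 0.8], 3: [0.1, 0.3],
         4: [0.8, 0.9], 5: [0.2, 0.2], 6: [0.9, 0.7]}
table = build_association_table(cases)
print(table[1]["similar"])  # cases 3 and 5 are case 1's closest neighbours
```

Each table entry mirrors one row of Table 3: two similar and two dissimilar associations per case, together with their measurements.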
Though constructing the association table is time-consuming when a large volume of cases is stored, it is still essential to explore the relations between past cases, because these relations are useful for case retrieval. Besides, this construction is a one-time task. The association table is constructed at the initial stage, just before the system receives new inquiries. After the construction is complete, the association table is ready for use. For maintaining the association table, on the one hand, if a new retrieval task is completed and the CBR system decides to retain the new case, the associations of this new case are added to the association table, and any detected closer relations replace the old ones. On the other hand, if the CBR system decides not to retain the new case, the association table remains unchanged.
2.3. Case Retrieval Algorithm
The workflow of the proposed case retrieval algorithm is presented in Figure 2. The algorithm starts with the new case input by users. An entry point is randomly selected from the case base for comparison in the first iteration. Based on the similarity measurement, the algorithm decides whether the new case is similar (or dissimilar) to the entry-point case. The corresponding association is then determined and the associated similar (or dissimilar) cases are retrieved from the association table. Next, the similarity between the new case and the associated similar (or dissimilar) cases is measured in the next iteration, until the termination condition is reached. The termination condition of the algorithm is defined as: (i) the maximum iteration number is reached or (ii) a satisfactory similar past case is found.
To determine whether two compared cases are similar or dissimilar, Table 4 presents the correspondence between similarity levels and measurements.
Regarding the association determination (for selecting the associated similar or dissimilar cases), a set of policies is defined in the case retrieval algorithm as follows.
Policy 1—Detection of identical cases—If a past case is detected to be identical to the new case, the case retrieval algorithm terminates immediately. The output is the retrieved past case.
Policy 2—Token assignments—Once a past case is considered highly similar to the new case, three positive tokens are assigned to it; if it is considered similar, one positive token is assigned. Conversely, a past case considered highly dissimilar to the new case receives three negative tokens, and a past case considered dissimilar receives one negative token.
Policy 3—Association selection—In general, the association with more tokens is selected for comparison. When the number of positive tokens is greater than the number of negative ones, the past case with the highest similarity measurement is selected, and its associated similar cases are evaluated in the next iteration. When the comparative result of the current iteration suggests that negative tokens dominate, the past case with the lowest similarity measurement is selected, and its associated dissimilar cases are retrieved from the association table for comparison in the next iteration.
Policy 4—Selection of previous cases—It may happen that all associated cases in a single iteration have already been compared, because a past case can be associated in a 1-to-N relation. For instance, in Table 3, case 5 has a similar association with cases 1 and 3. It makes no sense to repeatedly evaluate cases that have already been compared, as this would result in an endless loop. Under this circumstance, the cases to be evaluated in the next iteration are selected from previous iterations: based on the number of tokens, the corresponding association is determined and the past case with the second highest (or lowest) similarity measurement from the previous iteration is chosen for comparison. If the past cases in the previous iteration have all been selected, the algorithm repeats Policy 4 once more.
In Table 4, apart from the identical, similar and dissimilar levels, we also define highly similar and highly dissimilar levels. Assume that the retrieval algorithm meets the following situation: the similarity measurements between new case 1 and past cases 1, 2 and 3 are 90.00%, 30.00% and 40.00% respectively. Without the highly similar and highly dissimilar levels, Policy 3 would select the dissimilar association of past case 2 for comparison in the next iteration. However, since past case 1 is so similar to new case 1, the similar association of past case 1 has a great chance of being similar to new case 1, so searching in the similar association of past case 1 would be the better choice. To avoid this situation, we define the highly similar and highly dissimilar levels, forcing the proposed retrieval algorithm to follow the potentially optimal search path. In summary, we equally divide the measurement range into four intervals, denoting highly similar, similar, dissimilar and highly dissimilar respectively.
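The four equal intervals and the token scheme of Policy 2 can be sketched as follows (the exact placement of the boundary values, open versus closed, is an assumption since Table 4 is not reproduced here):

```python
def similarity_level(measurement):
    """Map a similarity measurement in [0, 1] onto four equal intervals
    (boundary handling is an illustrative assumption)."""
    if measurement >= 0.75:
        return "highly similar"
    if measurement >= 0.5:
        return "similar"
    if measurement >= 0.25:
        return "dissimilar"
    return "highly dissimilar"

# Policy 2: tokens assigned per compared case.
TOKENS = {"highly similar": +3, "similar": +1,
          "dissimilar": -1, "highly dissimilar": -3}

# The example measurements from the text: 90% outweighs 30% and 40%,
# so the token sum is positive and the similar association is followed.
for m in (0.90, 0.30, 0.40):
    print(m, similarity_level(m), TOKENS[similarity_level(m)])
```

With these levels, the 90.00% case contributes +3 tokens against the two -1 contributions, so the similar association of past case 1 wins, as intended.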
The pseudo code of the proposed case retrieval algorithm is displayed in Table 5.
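As a complement to the pseudo code, the following is a minimal, runnable Python sketch of the association-guided loop under stated assumptions: a generic Euclidean-based similarity (not the triangular measure of Reference [32]), equal similarity intervals, a 98% identity threshold, and Policy 4 omitted for brevity. The toy case base and association table are invented for illustration:

```python
import math

def similarity(a, b):
    """Illustrative similarity in [0, 1] for normalized feature vectors."""
    return 1.0 - math.dist(a, b) / math.sqrt(len(a))

def retrieve(new_case, cases, table, entry, max_iter=10, identical=0.98):
    """Association-guided retrieval implementing Policies 1-3."""
    best_id, best_sim = None, -1.0
    frontier, visited = [entry], set()
    for _ in range(max_iter):
        tokens, scored = 0, []
        for cid in frontier:
            if cid in visited:
                continue
            visited.add(cid)
            s = similarity(new_case, cases[cid])
            scored.append((s, cid))
            if s > best_sim:
                best_sim, best_id = s, cid
            if s >= identical:                 # Policy 1: identical case found
                return best_id, best_sim
            # Policy 2: +-3 tokens for the "highly" levels, +-1 otherwise
            tokens += 3 if s >= 0.75 else 1 if s >= 0.5 else -1 if s >= 0.25 else -3
        if not scored:                         # nothing new left to compare
            break
        if tokens >= 0:                        # Policy 3: follow similar side
            pick = max(scored)[1]
            frontier = [oid for oid, _ in table[pick]["similar"]]
        else:                                  # Policy 3: follow dissimilar side
            pick = min(scored)[1]
            frontier = [oid for oid, _ in table[pick]["dissimilar"]]
    return best_id, best_sim

# Toy case base and association table; the stored measurements are
# placeholders, since only the associated IDs are used here.
cases = {1: [0.1, 0.2], 2: [0.9, 0.8], 3: [0.1, 0.3],
         4: [0.8, 0.9], 5: [0.2, 0.2], 6: [0.9, 0.7]}
table = {1: {"similar": [(3, 0.93), (5, 0.93)], "dissimilar": [(2, 0.29), (4, 0.30)]},
         2: {"similar": [(6, 0.93), (4, 0.90)], "dissimilar": [(1, 0.29), (3, 0.30)]},
         3: {"similar": [(1, 0.93), (5, 0.90)], "dissimilar": [(2, 0.30), (6, 0.31)]},
         4: {"similar": [(2, 0.90), (6, 0.87)], "dissimilar": [(1, 0.30), (5, 0.34)]},
         5: {"similar": [(1, 0.93), (3, 0.90)], "dissimilar": [(2, 0.34), (4, 0.34)]},
         6: {"similar": [(2, 0.93), (4, 0.87)], "dissimilar": [(1, 0.31), (3, 0.31)]}}
print(retrieve([0.15, 0.25], cases, table, entry=2))
```

Starting from a dissimilar entry point (case 2), the loop jumps to case 2's dissimilar associations and converges on the neighbourhood of cases 1, 3 and 5 without ever comparing the whole case base.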
To better demonstrate the proposed case retrieval algorithm, an example is presented in Figure 3.
In Figure 3, Pi represents the ith past case in the case base, while N1 is the first new case. Initially, P1 is selected as the entry point for comparison with N1 in the first iteration. The comparative result suggests that the dissimilar association of P1 should be chosen for comparison (Policies 2 and 3). Thus, P336, P157 and P479 are compared with N1. In the second iteration, the number of positive tokens is greater than the number of negative ones. Consequently, the associated similar cases of P157, which has the greatest similarity measurement, are selected for comparison in the next iteration (Policies 2 and 3). The case retrieval algorithm keeps running until, in the 6th iteration, all associated cases have already been compared. According to Policy 4, P339, which has the second highest similarity measurement in the 5th iteration, is chosen as a substitute. The output of this algorithm is the past case that has the greatest commonality with the new case. The termination condition of the proposed algorithm is defined as: (i) the maximum iteration number is reached; or (ii) an identical case is detected. The travelling sequence of the proposed case retrieval algorithm in the above scenario is presented in Figure 4.
2.4. Case Base Maintenance
After retrieving the most similar past case, the solution of this past case can be reused and revised to resolve the new problem. However, solution reuse and revision are not the main concern of this manuscript and are not discussed further. Our main objective focuses on case retrieval and case base maintenance.
In terms of case retention and case base maintenance, the typical approach is to directly add the newly solved case into the case base, along with its solution [46]. When the new case is extremely similar to a past case already stored in the case base, a case forgetting strategy can be applied after evaluating the quality of both cases [47]. In our case, we also have to pay attention to the association table, because the performance of our algorithm depends on this table.
The proposed case retrieval algorithm takes care of case base maintenance in the following two aspects: (i) storing the learned case and (ii) updating the existing associations of past cases.
Firstly, the case base should retain the learned case, which is composed of the problem description of the new case, the corresponding solution and its associations. In general, the learned case is assigned a sequential number as its unique identifier and then stored in the case base. The similar and dissimilar associations of this learned case are added at the end of the association table. The addition of new cases certainly increases the possibility of retrieving cases that are similar to the target problems; however, this continuous addition also enlarges the size of the case base, leading to complexity and low efficiency in case retrieval tasks [48]. As a consequence, if the new case is extremely similar to a certain past case in the case base, its retention should be dropped to avoid redundancy. This solution is known as the forgetting strategy [49]. By calculating the goodness of the learned case, the case-based reasoning system decides whether the learned case should be remembered or forgotten. To simplify the process of case retention, a threshold is defined at 98.00%: if the similarity measurement between the learned case and the retrieved most similar past case exceeds 98.00%, the learned case is forgotten and not stored in the case base. Otherwise, case retention follows the general procedure mentioned at the beginning of this paragraph.
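The retention decision with the 98.00% threshold can be sketched as follows (the plain-dict data layout and the function name are illustrative assumptions):

```python
RETENTION_THRESHOLD = 0.98  # the 98.00% forgetting threshold defined above

def retain_case(case_base, assoc_table, case_id, features, associations, best_sim):
    """Forgetting strategy sketch: drop the learned case if it is nearly
    identical to the retrieved most similar past case; otherwise store it
    together with its similar/dissimilar associations."""
    if best_sim >= RETENTION_THRESHOLD:
        return False                      # forget: the case is redundant
    case_base[case_id] = features         # remember the learned case...
    assoc_table[case_id] = associations   # ...and append its associations
    return True

cb, tbl = {}, {}
print(retain_case(cb, tbl, 7, [0.1, 0.2], {"similar": [], "dissimilar": []}, 0.95))  # True
print(retain_case(cb, tbl, 8, [0.1, 0.2], {"similar": [], "dissimilar": []}, 0.99))  # False
```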
Secondly, once the CBR system decides to retain the learned case, the existing association of past cases should be updated as well. The following two scenarios are considered in the manuscript.
Scenario 1—Updating the associations of the new case—If a closer similar or dissimilar association with the new case is detected during runtime, this past case should replace the old ones (Figure 5a).
Scenario 2—Updating the associations of the past cases—If the new case shows a closer association with the compared past case, the old association of this past case should be replaced by the new case (Figure 5b).
In Figure 5a, when N1 is compared with P133, P148 and P301, it is detected that the similarity measurement between N1 and P148 achieves the highest value. As a result, P148 takes the first position in the similar association of N1. Meanwhile, N1's existing associations with P407 and P157 are adjusted by moving their positions backward in the association table. The same procedure applies for updating the association with P301.
In Figure 5b, the similar and dissimilar associations of P14 are presented. During the iteration, P14 is compared with N1. The comparative result indicates that N1 has a closer association with P14 than P256 does. Consequently, N1 takes the third position in the similar association of P14 and P256 is removed from the association table.
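Both scenarios reduce to the same operation on an ordered association list, sketched below (the three-entry list length and the measurement values are illustrative; the IDs echo Figure 5):

```python
def update_similar_association(associations, candidate_id, candidate_sim):
    """Keep the similar association list sorted by similarity; a candidate
    that is closer than the weakest entry replaces it, and the displaced
    case is dropped (as P256 is dropped in Figure 5b)."""
    entries = associations["similar"]
    if candidate_sim > entries[-1][1]:    # closer than the weakest entry
        entries[-1] = (candidate_id, candidate_sim)
        entries.sort(key=lambda e: e[1], reverse=True)
        return True
    return False

# N1's similar association before the update (IDs follow Figure 5a;
# the measurements are made-up placeholders).
n1_assoc = {"similar": [(407, 0.90), (157, 0.85), (301, 0.80)]}
update_similar_association(n1_assoc, 148, 0.97)
print(n1_assoc["similar"])  # P148 now heads the list; the weakest entry is gone
```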
2.5. Scalability of the Retrieval Algorithm and the Case Base
It is necessary to consider scalability issues, since the number of cases in the case base may keep growing. On the one hand, for the retrieval algorithm, the number of positive and negative tokens can be increased as the size of the case base grows; the retrieval process itself remains the same as presented in Section 2.3. On the other hand, the size of the case base will not increase infinitely. Note that we do not record the value of each variable every day: the CBR system only stores useful past experiences, that is, the variables of executed tasks with a complete pair of problem and solution features, rather than daily measurements. Therefore, the case base in a CBR system is quite different from the databases used for social networks and weather stations, since a weather station stores all the measured data. For an agricultural task such as spraying pesticide on rice, spraying is performed around 5 to 6 times during a complete growing cycle. Even when the farmland is divided into a grid of 50 blocks, at most around 300 useful past experiences can be stored for a single farmland. With 50 farmlands in total, the maximum number of cases stored in the case base is 15,000. In conclusion, the scalability of the retrieval algorithm and the case base is not an obstacle for the CBR system.
4. Conclusions
Typical case retrieval approaches try to match the most similar past cases by traversing the whole case base, leading to low efficiency when a large volume of cases is stored. Therefore, this paper proposes a case retrieval algorithm for agricultural case-based reasoning systems. Before performing the retrieval tasks, an association table is constructed, consisting of both the similar and dissimilar relationships between past cases. The novelty of our proposal lies in selecting associated cases from the table and preferentially evaluating their similarity to the new case. Under these circumstances, the proposed case retrieval algorithm is able to retrieve similar past cases while comparing fewer cases in the case base. Our proposal also addresses the retention part of the case-based reasoning loop: the association table is updated during runtime. After successful retrieval, the new case is retained in the case base, along with its similar and dissimilar associations, when the similarity measurement between the new case and the retrieved most similar past case is smaller than 98.00%. Meanwhile, the associations of past cases are updated as well. The experimental results demonstrate that our proposal is able to retrieve similar past cases with great efficiency and accuracy. The case base is successfully maintained with newly retained cases and their associations.
It is acknowledged that case retrieval is one of the most significant parts of case-based reasoning, because the remaining processes, such as reuse and revision, cannot proceed without successful case retrieval. Thus, the proposed case retrieval algorithm is not only useful in CBR-enabled agricultural systems but also has great potential in CBR systems for other domains. With this efficient retrieval capability, a CBR-based ADSS can provide farmers with quick decision support for agricultural management.
Since this work was developed within the AFarCloud project, we expect to receive real data from the farms to verify the proposed case retrieval algorithm. To further improve the performance of the proposed retrieval algorithm, it is also worth looking into the selection of a preferable entry-point case: clustering similar cases and selecting the most representative case of a cluster as the entry point to be compared with the new case might improve the performance of the algorithm. Furthermore, setting the ranges of the similarity levels (presented in Table 4) more precisely might also help. Lastly, it would be interesting to investigate the performance of the proposed algorithm when the size of the case base increases by orders of magnitude; under such circumstances, the number of similar and dissimilar associations is expected to increase as well.