Incrementally Mining Column Constant Biclusters with FVSFP Tree
Round 1
Reviewer 1 Report
The authors propose an incremental mining strategy. The problem can be considered a special case of frequent pattern mining, and FP-tree-based methods are currently the most widely used approach for incrementally mining frequent patterns. The authors also develop an inventive incremental mining technique based on a modified FP tree data structure. The idea is interesting, but several issues need to be considered:
1. The introduction needs to be improved.
2. Future directions should be discussed.
3. More examples should be provided.
4. More references should be discussed:
Cluster-based information retrieval using pattern mining
A general-purpose distributed pattern mining system.
Author Response
Please see the attachment. Thank you for your helpful comment.
Author Response File: Author Response.pdf
Reviewer 2 Report
The paper is correct from the editorial point of view; however, I cannot agree with the experimental section. The Authors provide a new incremental approach to a kind of pattern extraction. That is interesting. Unfortunately, I have many doubts about the described experiments:
1) What was the intention behind splitting the dataset into disjoint original/incremental subsets?
2) How do the original/incremental labels relate to incremental/batch in Table 5 and further? If the incremental/batch calculations are carried out on the same data, why split them earlier (Table 4)? If they are carried out on different data (I suspect incremental/original, respectively), what is the sense of comparing computation time and memory usage?
3) The Authors do not try to compare the quality of the results of applying the different approaches: the number of found patterns, or even their volume or the correspondence of the results (do we get the same patterns from both approaches?).
I also miss a short discussion of whether the obtained patterns are inclusion-maximal (it is not possible to add any row or any column to the pattern without violating the CCB requirements). There are some approaches (e.g. based on Boolean reasoning – Michalak, Ślęzak: Boolean Representation for Exact Biclustering) whose results' inclusion-maximality property is mathematically proved.
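The inclusion-maximality property described above can be made concrete with a small sketch (an illustrative check only, not the paper's FVSFP algorithm; the helper names `is_ccb` and `is_inclusion_maximal` are hypothetical). A bicluster is column-constant when every selected column holds a single value across the selected rows, and it is inclusion-maximal when no extra row or column can be added without breaking that property:

```python
def is_ccb(M, rows, cols):
    """True if every selected column is constant across the selected rows."""
    return all(len({M[r][c] for r in rows}) == 1 for c in cols)

def is_inclusion_maximal(M, rows, cols):
    """True if (rows, cols) is a CCB and no row or column can be added."""
    if not is_ccb(M, rows, cols):
        return False
    all_rows, all_cols = range(len(M)), range(len(M[0]))
    # Adding any outside row must break the column-constant property.
    if any(is_ccb(M, rows | {r}, cols) for r in all_rows if r not in rows):
        return False
    # Adding any outside column must break it as well.
    if any(is_ccb(M, rows, cols | {c}) for c in all_cols if c not in cols):
        return False
    return True

M = [[1, 2, 3],
     [1, 2, 4],
     [5, 2, 3]]
# Rows {0, 1} with columns {0, 1} form a CCB that cannot be extended,
# while rows {0} with columns {0, 1} can still absorb row 1.
```

On this toy matrix, `is_inclusion_maximal(M, {0, 1}, {0, 1})` holds, whereas the sub-pattern `({0}, {0, 1})` is a CCB but not maximal, which is exactly the distinction the review asks the authors to discuss.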
Author Response
Please see the attachment. Thank you for your helpful comment
Author Response File: Author Response.pdf
Reviewer 3 Report
This article investigates the problem of incrementally mining CCBs. The topic is interesting, and the proposed FVSFP structure looks effective based on the experimental results. What concerns me most is that the contribution of this article is not enough to meet the journal's acceptance criteria. It would be better if the authors could highlight the contributions of this paper. The following are the detailed comments:
1. In Sec. 1, the authors keep emphasizing that this is the first article to consider mining CCBs in an incremental way. This motivation is not clear, since whether incremental CCB mining is necessary is an open question. It would be better if the authors could explain the need for incremental mining. Furthermore, the related work is not clear; that is to say, what are the state-of-the-art methods for mining CCBs?
2. In Sec. 1, the description of the definitions gives the impression of copying existing work, and the FVSFP structure is only a slight change to the FP structure. Once infrequent nodes are preserved, the space cost of the proposed FVSFP will increase. Besides, what is the reason for using feature-value-based sorting rather than sorting by feature values' counts/frequency?
3. In Sec. 2, when describing the steps of the proposed CCB mining, the authors write "same as FP tree". Such a writing style makes the authors' contribution seem very weak and makes the article hard to read. For example, the description of the header table construction is very abstract: the authors only mention that "unfrequent feature value is not deleted", but exactly how this header table is constructed is not clear; not even an example is given.
4. In Sec. 3, the article only tests the efficiency of the algorithm and lacks an experimental analysis of the effectiveness of the proposed techniques. For example, is the proposed FVSFP structure effective? How can that be proved?
5. In Sec. 3, the article adopts "batch" as the comparison algorithm; however, what are the latest available CCB mining algorithms? Why not use them in comparison experiments?
Author Response
Please see the attachment. Thank you for your helpful comment
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Dear Authors,
Thank you for answering my questions/doubts; however, I am still not satisfied with the most important of them.
-----------------------------------
„Thank you for pointing out the problem. We apologize for the confusion we caused. Incremental/batch calculations are carried out on the same data. "Incremental" in Table 5 and further refers to first dividing the whole dataset into two disjoint subsets, original and incremental (the original/incremental labels). "Batch" means no division. For example, if the whole dataset contains 1000 samples, batch calculation means processing the 1000 samples continuously. Incremental calculation means first processing samples 1-500, then deleting samples 1-500 and processing the remaining samples 501-1000. In the batch way, memory stores 1000 samples at most. In the incremental way, memory stores 500 samples at most. Therefore, the incremental way can cost less memory."
I am still not satisfied. If the data are too big (or the memory complexity is too high), two solutions should be considered: either do not analyze these data (and do not mention them in the text), or carry out the experiments on better-equipped hardware. Moreover, dividing the data may cause us to find two patterns (in the two divisions separately) instead of one bigger (wider) pattern. That also refers to the „inclusion maximality" of the found patterns – if a pattern is divided, none of its parts may be inclusion-maximal.
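The reviewer's concern that splitting the data can lose patterns is easy to demonstrate with a toy frequent-pattern sketch (a deliberately naive pair counter, not the paper's FVSFP algorithm; `frequent_pairs` and the transaction data are invented for illustration). A pattern that meets the support threshold over the whole dataset can fall below the threshold in each half when the halves are mined independently:

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Naive miner: count every item pair and keep those whose
    absolute count meets the support threshold."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {p for p, n in counts.items() if n >= min_support}

data = [("a", "b")] * 3 + [("a", "c")] * 2 + [("a", "b")] * 3
half1, half2 = data[:4], data[4:]

batch = frequent_pairs(data, 5)                       # mines all 8 transactions at once
split = frequent_pairs(half1, 5) | frequent_pairs(half2, 5)
# ("a", "b") occurs 6 times overall but only 3 times per half,
# so it is found in the batch run yet lost when halves are mined separately.
```

This is exactly why comparing a split-and-mine scheme against batch mining only makes sense if both provably return the same patterns; a correct incremental algorithm must carry state across chunks rather than mine each chunk in isolation.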
-----------------------------------
„We appreciate your helpful comment. In the revised manuscript, the number of found column constant biclusters is the same, and the quality is also the same. The difference between the methods lies only in the time and memory used to find the column constant biclusters."
Ok.
-----------------------------------
„Thank you for your suggestion. The obtained patterns are proved to be inclusion-maximal.”
I do not see any proof in the paper. Moreover, the second part of the sentence „Analyzing … are inclusion-maximal with Boolean reasoning[37]." makes no sense: a pattern is inclusion-maximal or not, and that has nothing to do with the way it was found. I only mentioned one Boolean-reasoning-based approach as an example of a method whose property of providing all and only inclusion-maximal patterns can be proved mathematically. If some other method provides patterns with such a property, it does not have to mean that for any data we would get inclusion-maximal results. A proof closes the case – here, we have no proof.
-----------------------------------
In conclusion, I sustain my previous review.
Author Response
Thank you for your second comment. For second response, please see the attachment
Author Response File: Author Response.pdf
Reviewer 3 Report
I appreciate the authors' efforts and their response to the comments on the previous version. Most of the problems raised in the previous version have been addressed, but the following changes are suggested to the authors before the article is published:
1. The response to Q1 is not very good. Firstly, the example given by the authors is common sense. If slicing the dataset is the reason for doing incremental mining, then every problem with massive data can be explained away by the authors' response. The article emphasises that its innovation is that no one else has done incremental CCB mining, but this motivation lacks support. Along the lines of this article, any data-driven research effort can be said to face the challenge of constrained computational resources, which could then all be solved with incrementality.
2. For the different datasets, the article uses almost the same description of the experimental results except for the names of the datasets (see Sec. 3.23-3.26). There is no mention at all of the varying "minimal support rates", so why change this value at all? If the authors do not want to describe a change in trend, then choosing a single value is sufficient.
3. Figure 5 looks strange. The text in this figure is larger than the body text, and, more importantly, the article talks throughout about constrained memory resources, yet none of the datasets consumes more than 500 MB of memory. With today's computer configurations, even a mobile phone would have no problem with this amount of data. The necessity of the article's work would be better demonstrated if the authors could run tests on a dataset whose "required computation resource is bigger than the maximal available computation resource".
Author Response
Thank you for your second comment. For second response, please see the attachment
Author Response File: Author Response.pdf