Developing Language-Specific Models Using a Neural Architecture Search
Round 1
Reviewer 1 Report
Thank you for recommending me as a reviewer. This study applied the neural architecture search (NAS) method to Korean and English grammaticality judgment tasks. Building on previous research, which only discussed the application of NAS to a Korean dataset, the authors extend the method to English grammaticality tasks and compare the two resulting architectures for Korean and English. If the authors complete the revision, the quality of the study will be further improved.
- The introduction section is well written. If the authors describe research trends for NAS application cases in more detail in the introduction section, it can help readers understand.
- The research methods section is well written.
- page 9: " Rectified Linear Unit (ReLU) functions as an activation function." - In recent studies, leaky ReLU is used as an activation function for deep learning due to the limitations of ReLU. However, the method of this study analyzed with ReLU is not wrong. I suggest that the authors add this to the study's limitations.
- Authors should add study limitations to the discussion section.
Author Response
- The introduction section is well written. If the authors describe research trends for NAS application cases in more detail in the introduction section, it can help readers understand.
(Answer): First of all, we appreciate the valuable comments of the reviewer for this paper. According to your advice, we added some more details with respect to the linguistic application of NAS in the introduction.
However, we take a different perspective on the application of NAS to linguistic phenomena. The main goal of this paper is to explore the architectures that result from applying NAS to various linguistic datasets that contain different syntactic operations. Previous research applying NAS to linguistic data focused on improving accuracy compared to existing language models [23]. However, as noted in that article, the research is somewhat limited because NAS does not provide a better language model. In this experiment, we compare the architecture resulting from the Korean grammaticality judgment dataset to the architecture resulting from the English grammaticality judgment dataset. Given that Korean and English have very different linguistic properties, we predict that NAS will generate different architectures that fit each dataset. The prediction is confirmed.
- The research methods section is well written.
(Answer): We appreciate this.
- Page 9: " Rectified Linear Unit (ReLU) functions as an activation function." - In recent studies, leaky ReLU is used as an activation function for deep learning due to the limitations of ReLU. However, the method of this study analyzed with ReLU is not wrong. I suggest that the authors add this to the study's limitations. ;
(Answer): As you mentioned, we have added an explanation of why we use the ReLU function.
It has one hidden layer with 5 nodes and 4 additional links between the hidden layer and the output layer, with ReLU as the activation function; we chose ReLU instead of leaky ReLU because of its calculation speed.
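The trade-off between the two activations can be seen in a minimal sketch (illustrative only, not the code used in the paper): plain ReLU zeroes out negative inputs, which is cheaper to compute but can leave units permanently inactive, whereas leaky ReLU preserves a small gradient for negative inputs.

```python
def relu(x):
    # Plain ReLU: negative inputs are zeroed, which is fast to compute
    # but can leave units permanently inactive ("dying ReLU").
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # Leaky ReLU keeps a small gradient for negative inputs,
    # at a slight extra computational cost.
    return x if x > 0 else slope * x

print(relu(-2.0), relu(3.0))              # 0.0 3.0
print(leaky_relu(-2.0), leaky_relu(3.0))  # -0.02 3.0
```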
- Authors should add study limitations to the discussion section.
(Answer) As you mentioned, we have added the limitation in the discussion, as follows.
The limitations of this research need to be clearly stated. The first issue is the volume of the database. Since the entire database has to be checked manually by individual linguists, it requires more time to expand the data. We predict that NAS is sensitive to the syntactic operations and thus that the volume would not affect the result, yet we still need to expand the database. The second issue is to develop a methodology for comparing the resulting structures and for understanding their implications. We plan to add a third language to this experiment to investigate this issue.
Author Response File: Author Response.doc
Reviewer 2 Report
The objective of the paper is to extend NAS, in an experimental setting already applied to Korean, to English.
Since the research rationale is very simple, the quality of presentation becomes essential.
Unfortunately, it is not good.
There are too many repetitions, syntactic errors, and some wrong claims.
Since the manuscript has no line numbering, suggestions are laborious to give.
However, here are some of my detailed comments:
- Introduction:
- you must assess that what you are considering is comparable (for comparable Korean and English syntactic phenomena)
- two tasks: but they are not 2 tasks, rather 1 task applied to 2 languages
- researchers have suggested an automated design process: here you must add “that” can be efficient
- we predict that NAS will generate different architecture: architectures
- The prediction is borne out: too colloquial (is confirmed)
- Table: caption missing
- section 2
- resented (p)
- example 1: you say that there are other 2 possible combinations, thus it is useful to have both of them. In addition, for 1b, you should give complete glosses as well
- who are native to K language: this is not used; write "who are Korean native speakers"
- 2.3: it has the same layers with initial architecture: no sense
- Section 3
- you must give an explanation for 1, 2, 3, 1 (according to the grammatical category, noun, verb, adjective, noun)
- Section 4
- The grammar of English: The syntactic structure of English (grammar is a wider concept)
- whereas English allows the verb to appear anywhere: this is wrong.
- You can say “with a major degree of placement for English “
- epoch is 20: with 20 epochs
- References: for [1] I suggest you give the published version, adding the publication details (2019, Journal of Machine Learning Research, 20(55), 1-21).
Author Response
- You must assess that what you are considering is comparable (for comparable Korean and English syntactic phenomena).
(Answer) First of all, we appreciate the valuable comments for this paper.
- Two tasks: but they are not 2 tasks, rather 1 task applied to 2 languages.
(Answer) As you mentioned, we have changed the term to 'two experiments'.
- Researchers have suggested an automated design process: here you must add “that” can be efficient:
(Answer) According to the reviewer’s comments, we have added the word in this paper.
- We predict that NAS will generate different architecture: architectures.:
(Answer) According to your comment, we have corrected the paper.
- The prediction is borne out: too colloquial (is confirmed).:
(Answer) According to your comment, we have corrected the paper.
- Resented (p).:
(Answer) According to your comment, we have corrected the paper.
- Example 1: you say that there are other 2 possible combinations, thus it is useful to have both of them. In addition, for 1b, you should give complete glosses as well.:
(Answer) Thanks for your comments: we have corrected the paper.
- Who are native to K language: is not used. who are K native speakers.:
(Answer) According to your suggestions, we have corrected the paper.
- 2.3: it has the same layers with initial architecture: no sense.
(Answer) According to your suggestions, we have corrected the sentence as follows.
>> It has the same number of layers as the initial architecture.
- You must give an explanation for 1, 2, 3, 1 (according to the grammatical category, noun, verb, adjective, noun).:
(Answer) According to your suggestions, we have added the relevant information as follows.
>> The data are expressed as a combination of digits according to their grammatical categories (1: noun, 2: verb, 3: adjective, etc.). For example, the sentence 'John likes beautiful Mary' is expressed as '1, 2, 3, 1'.
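This encoding can be sketched as follows (the part-of-speech lookup table here is a hypothetical illustration, not the authors' actual preprocessing code):

```python
# Hypothetical part-of-speech lookup, for illustration only.
POS = {"John": "noun", "likes": "verb", "beautiful": "adjective", "Mary": "noun"}
CATEGORY_CODE = {"noun": 1, "verb": 2, "adjective": 3}

def encode(sentence):
    # Map each word to the digit for its grammatical category.
    return [CATEGORY_CODE[POS[word]] for word in sentence.split()]

print(encode("John likes beautiful Mary"))  # [1, 2, 3, 1]
```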
- The grammar of English: The syntactic structure of English (grammar is a wider concept); whereas English allows the verb to appear anywhere: this is wrong; you can say "with a major degree of placement for English".
(Answer) Following your advice, we have revised the sentence as shown below:
>> the verbs in Korean must come at the end of the sentence, whereas English allows the verb a major degree of freedom in its placement
- epoch is 20: with 20 epochs.:
(Answer) According to your suggestions, we have corrected the phrase.
- References: for [1] I suggest you to give the published version, by adding the preprint id (2019. Journal of machine learning research, 20, 55, 1-21.):
(Answer) According to your suggestions, we have changed the reference.
Author Response File: Author Response.doc
Reviewer 3 Report
Developing language-specific models using Neural Architecture Search (NAS) is a non-trivial and very important task from the application point of view, the more so as the method was applied to Korean and English grammaticality judgment tasks. In addition, the two architectures generated for Korean and English were compared. NAS generated different models for Korean and English, which have different syntactic operations.
However, there are many (too many) inaccuracies in the work. It should be corrected once more by paying attention to the following points.
1. Literature on the subject should not be cited in the Abstract;
2. The introduction is too long, and it does not contain the purpose of the work and its contributions;
3. 'Image recognition' -> 'image recognition';
4. The examples on page 2 should be deleted;
5. Table 1 and the whole paper should be better formatted; there are colorful words in the text;
6. The VCGA algorithm is not described. It is not known what the mutation is, what the crossover is, and what the control parameters of this algorithm are.
7. Also, the experiment with AR itself has not been described in detail. Which function is optimized? What is the fitness of the VCGA algorithm? Is there access to the datasets?
I think that after reading the article carefully by the Authors and introducing the necessary changes, I can once again review this work, which undoubtedly has considerable scientific potential.
Author Response
- Literature on the subject should not be cited in the Abstract.
(Answer) First of all, we appreciate the valuable comments of the reviewer. As you mentioned, we have deleted the references.
- The introduction is too long, and it does not contain a purpose for work and contributions.;
(Answer) It is difficult for us to completely change the introduction. One reviewer mentioned that the introduction is well written, while you point out that it is too long. We ask the reviewer to reconsider the first paragraph of the introduction with respect to this issue.
We show an interesting result on the application of a modified neural architecture search (NAS) [7] to linguistic tasks (grammaticality judgment) for Korean and English syntactic phenomena. Based on the previous research on this subject in [4], we show that the extension of the NAS method to English grammaticality tasks produces a different architecture from the one generated for the Korean dataset. This is rather unexpected given the similarity of the input data. The major contribution of this paper is to show that the previous application of NAS to linguistically complex datasets of Korean [4] can be extended to the linguistic phenomena of English. Notably, the different resulting architectures in these two experiments clearly indicate that the NAS method is sensitive to the different word orders that contain multiple syntactic operations.
- 'Image recognition' -> 'image recognition';
(Answer) According to your advice, we have changed the capitalization.
- The examples on page 2 should be deleted.
(Answer) With respect to this issue, another reviewer pointed out that all the examples must be presented. We also believe that presenting the essential information for this experiment in the introduction is essential.
- Table 1 and the whole paper should be better formatted; there are colorful words in the text.:
(Answer): We have corrected the paper as you advised.
- The VCGA algorithm is not described. It is not known what the mutation is, what the crossover is, and what the control parameters of this algorithm are.
(Answer) As you mentioned, we have added a simple description of the operators of the VCGA. Details about the VCGA are presented in our previous paper "Variable chromosome genetic algorithm for structure learning in neural networks to imitate human brain".
>> These operators change hyperparameters such as the composition of layers, the linkage, the number of nodes, the activation functions, etc. [7].
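A minimal sketch of how such an operator might act, assuming a chromosome is represented as a dict of hyperparameters (this representation and the operator details are illustrative assumptions, not the VCGA implementation from [7]):

```python
import random

def mutate(chromosome, rate=0.05, rng=random):
    # Illustrative mutation: each hyperparameter field is perturbed
    # independently with probability `rate`.
    child = dict(chromosome)
    if rng.random() < rate:
        child["nodes"] = max(1, child["nodes"] + rng.choice([-1, 1]))
    if rng.random() < rate:
        child["activation"] = rng.choice(["relu", "leaky_relu", "tanh"])
    return child

parent = {"nodes": 5, "activation": "relu", "layers": 1}
child = mutate(parent, rate=1.0, rng=random.Random(0))  # rate=1.0 forces both mutations
```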
- Also, the experiment with AR itself has not been described in detail. Which function is optimized? What is the fitness of the VCGA algorithm? Is there access to the datasets?
(Answer) As you mentioned, we have added the description.
>> The proposed NAS algorithm searches neural architectures using the VCGA [7], which optimizes the overall structure of the input neural networks, including the composition of layers, the connections between layers, the number of nodes, and the activation functions. To optimize the initial neural network, we use the number of chromosomes and the loss value of the generated neural networks as the fitness value. The generated neural networks use the Korean and English datasets.
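A fitness value combining the loss of a generated network with its number of chromosomes might be sketched like this (the additive form and the `penalty` weight are assumptions for illustration, not the authors' actual formula):

```python
def fitness(loss, num_chromosomes, penalty=0.01):
    # Lower is better: validation loss plus an assumed complexity
    # penalty proportional to the number of chromosomes.
    return loss + penalty * num_chromosomes

# A smaller network with slightly higher loss can still win:
print(fitness(0.30, 4))   # approximately 0.34
print(fitness(0.28, 10))  # approximately 0.38
```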
Author Response File: Author Response.doc
Round 2
Reviewer 2 Report
The integrations improved the paper.
There are a couple of non-English expressions:
- and ect. (remove “and”)
- the paragraph added in the final section is not easy to understand: with "database" I suppose you mean "dataset"; in addition, it is not clear whether the volume (better: "size") would or would not affect the results
Author Response
We acknowledge the reviewer's helpful comments and suggestions. They are very important for improving the quality of our manuscript. All comments and suggestions have been incorporated into the revised manuscript, or we have explained why not.
- and ect. (remove “and”).;
(Answer) we have corrected it.
- the paragraph added in the final section is not easy to understand: with "database" I suppose you mean "dataset"; in addition, it is not clear whether the volume (better: "size") would or would not affect the results.
(Answer) Dear reviewer, we tried to improve the readability of the last section as follows:
The first issue is the size of the dataset. Since the entire dataset has to be checked manually by individual linguists, it requires more time to expand the data. We predict that NAS is sensitive to the syntactic operations and thus that the size would not affect the result; nevertheless, we still need to expand the dataset to confirm the resulting architecture.
Author Response File: Author Response.doc
Reviewer 3 Report
The authors corrected most of my comments from the previous review. However, there is still room for improvement.
1. Lack of a scientific purpose in the introduction.
2. Minor editorial errors:
a) page 2: the sentence 'To our knowledge, the finding sheds new light on the research of language modeling since the automation of architecture is sensitive to the grammatical information underlying the word order of languages.' should be written in the same paragraph.
b) reference to table 3 (p. 7) is after its occurrence in the text (p. 8).
c) 'mutant rate' => 'mutation rate'.
3. The loss function and fitness function are still not defined. These are fundamental functions in your experiments and should be clearly defined, even if this has been done in other works.
Author Response
We acknowledge the reviewer's helpful comments and suggestions. They are very important for improving the quality of our manuscript. All comments and suggestions have been incorporated into the revised manuscript, or we have explained why not.
- Lack of a scientific purpose in the introduction.;
(Answer) Dear reviewer, we have added the purpose as follows:
The scientific purpose of this paper is to develop language models using NAS.
- Page 2: the sentence 'To our knowledge, the finding sheds new light on the research of language modeling since the automation of architecture is sensitive to the grammatical information underlying the word order of languages. ' should be written in the same paragraph.;
(Answer) we have corrected the error
- Reference to table 3 (p. 7) is after its occurrence in the text (p. 8).:
(Answer)
>> Dear reviewer, we intentionally positioned the table on p. 8 for ease of exposition. If you think it is mandatory to change the order, we will. For now, we believe this arrangement improves the readability of the paper.
- 'mutant rate' => 'mutation rate'.:
(Answer) As you mentioned, we have changed the term to 'mutation rate'.
- The loss function and fitness function are still not defined. These are fundamental functions in your experiments and should be clearly defined, even if this has been done in other works.
(Answer) As you mentioned, we have added definitions of the fitness function and the loss function:
We use the fitness function to determine the next generation in the genetic algorithm. The fitness function (Equation (1)) is defined as follows:
Parameter | Value
Population | 50
Generations | 30
Mutation rate | 0.05
Cross-over rate | 0.05
Non-disjunction rate | 0.1
Learning rate | 0.01
Loss function | MSE Loss
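The MSE loss listed in the table above is the standard mean squared error; as a minimal sketch:

```python
def mse_loss(predictions, targets):
    # Mean of the squared differences between predictions and targets.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

print(mse_loss([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]))  # approximately 0.667
```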
Author Response File: Author Response.doc