Article
Peer-Review Record

A Possible Information Entropic Law of Genetic Mutations

Appl. Sci. 2022, 12(14), 6912; https://doi.org/10.3390/app12146912
by Melvin M. Vopson
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 7 June 2022 / Revised: 1 July 2022 / Accepted: 6 July 2022 / Published: 8 July 2022
(This article belongs to the Special Issue Shannon's Information Theory and Its Applications)

Round 1

Reviewer 1 Report

This is quite an interesting paper and certainly timely given the recent pandemic and the concerns about mutation rates and detecting mutations. This seems like a useful technique for trying to detect mutations given an already known reference genome. One could imagine calculating spectra for various window sizes and using the obtained IE ratio to identify which windows to focus upon in the search for mutations. That might speed up the search, though I imagine that if the entire genome is known, simple point-by-point comparison would be fastest. So the question of whether mutations can be predicted is a much more interesting possibility. I don't readily see how mutations can be predicted beforehand given a reference genome. Is there any evidence suggesting a relationship between the IE value of a window prior to the appearance of a mutation and the probability of it subsequently undergoing a mutation?
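The windowed comparison suggested above can be sketched in a few lines. This is an illustrative sketch, not the method used in the paper: it assumes that a window's IE is the plug-in Shannon entropy (in bits) of the base frequencies inside each non-overlapping window, that the reference and variant are already aligned and of equal length, and that the "IE ratio" is variant IE divided by reference IE. The names `shannon_entropy`, `window_ie` and `flag_windows` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Plug-in Shannon entropy (bits per base) of the base frequencies in seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def window_ie(seq: str, size: int) -> list[float]:
    """IE of consecutive non-overlapping windows; the last window may be shorter."""
    return [shannon_entropy(seq[i:i + size]) for i in range(0, len(seq), size)]


def flag_windows(reference: str, variant: str, size: int,
                 tol: float = 1e-12) -> list[tuple[int, float]]:
    """Return (window index, variant IE / reference IE) for every window whose IE changed."""
    if len(reference) != len(variant):
        raise ValueError("this sketch assumes aligned sequences of equal length")
    flagged = []
    pairs = zip(window_ie(reference, size), window_ie(variant, size))
    for k, (h_ref, h_var) in enumerate(pairs):
        if abs(h_var - h_ref) > tol:
            # A window with zero reference IE (one repeated base) has no finite ratio
            ratio = h_var / h_ref if h_ref > 0 else math.inf
            flagged.append((k, ratio))
    return flagged
```

For example, `flag_windows("ACGTACGTACGTAAAA", "AAGTACGTACGTAAAA", 8)` flags only window 0, with a ratio below 1, because the C-to-A substitution makes that window's base composition less even. Note that a substitution which leaves a window's entropy unchanged would not be flagged, so a scan of this kind narrows the search rather than replacing a direct comparison.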

I would certainly think that the comments on the applicability of these results need to be strengthened. In addition, this study was based upon mutation patterns for just a single virus. It needs confirmation through observations on multiple viral species. The coronaviruses seem to be capable of expressing very high mutation rates; perhaps there is something specific to the coronaviruses which influences how and how frequently mutations occur. If other viruses exhibit different patterns, perhaps that could shed some light on the differential mutation patterns produced by different species. The focus on just a single species seems to weaken the generalizability of the results.

Author Response

Thank you very much for the positive and constructive comments. The main comments revolve around the question of how mutations can be predicted beforehand given a reference genome. The reviewer asks whether there is any evidence suggesting a relationship between the IE value of a window prior to the appearance of a mutation and the probability of it subsequently undergoing a mutation.

Answer

The observed correlation between the IE and the mutations, which triggered the formulation of the governing law of genetic mutations, combined with the observation that this law applies not only to the whole genome but also to segments of it, called windows, is the pathway to achieving this. There is no evidence yet, because this is the first time such a result has been reported, but in section 3 we discuss how this might be achieved, while acknowledging that this is beyond the scope of this manuscript.

Text included in the manuscript:

"The careful application of this law facilitates a possible predictive approach to future genetic mutations, before they take place. This is a matter of future studies and beyond the scope of this report, but a possible way of taking advantage of the observed law could be a methodology that involves splitting the entire genome into subsets, called “windows”."

The reviewer also pointed out that this study was based upon mutation patterns for just a single virus and needs confirmation through observations on multiple viral species.

Answer

Indeed, this is a valid argument, and a new paragraph (see below) has been added in the conclusions section. The paragraph clarifies this, and I also acknowledge the need for further validation of the observed results.

"It is also important to mention that this study is based on a single RNA virus family and the proposed governing law of genetic mutations is a strong generalization. However, the governing law of genetic mutations is a particular case of a new physics law, called “the second law of infodynamics” [26], which states that information entropy of all information systems decreases over time. While the second law of infodynamics underpins the generalization of these results to all biological systems, we acknowledge that further studies on other organisms are necessary to validate these results. We therefore hope that this work will stimulate future studies leading to some form of predictive algorithm of genetic mutations."

Thank you very much for your report!

Reviewer 2 Report

The DNA or RNA genetic molecule is a highly complex information-encoding system, and a genetic mutation is a kind of error in the encoded biological information. It is important to develop mathematical-physical approaches to predict and quantify the dynamic characteristics of genetic mutations. The current work proposes an information entropic law of genetic mutations based on Shannon's information theory and the concept of information entropy, and analyzes the information entropy of SARS-CoV-2 genomes. Generally, the topic seems interesting; however, some problems should be addressed.

 

1) The author proposes an information entropic law of genetic mutations, which is a complex mathematical-physical system. However, only a limited number of cases were discussed in the current work. Therefore, at the current stage the author could discuss more specific problems instead of a law.

2) How could the author validate the results and conclusions?

3) Testing more cases is suggested.

4) The paper as a whole is not well written.

Author Response

Thank you very much for your positive and constructive comments. The first reviewer had similar comments. In particular, the validation of these results and the generalization based on the limited number of cases studied here are questioned. This is now fully acknowledged in the manuscript, and the changes made in response to the first reviewer address the second reviewer's comments, too. Even the title, “A possible information entropic law of genetic mutations”, acknowledges the need for further validation. However, reporting the existing results is the only way to advance our knowledge.

A new paragraph (see below) has been added in the conclusions section. The paragraph clarifies this, and I also acknowledge the need for further validation of the observed results.

"It is also important to mention that this study is based on a single RNA virus family and the proposed governing law of genetic mutations is a strong generalization. However, the governing law of genetic mutations is a particular case of a new physics law, called “the second law of infodynamics” [26], which states that information entropy of all information systems decreases over time. While the second law of infodynamics underpins the generalization of these results to all biological systems, we acknowledge that further studies on other organisms are necessary to validate these results. We therefore hope that this work will stimulate future studies leading to some form of predictive algorithm of genetic mutations."

Thank you very much for your report!

Reviewer 3 Report

The main problem with the paper under review is the lack of mathematical rigour. The problem is stated in such vague terms that it is impossible to clearly and exactly understand the author's argument. I will underline some weaknesses as examples. The author introduces the state space Y = {A, G, T, C}, the probability distribution p = (p_A, p_G, p_T, p_C) and the Shannon entropy of p, that is, H(p) = -sum_i p_i ln p_i. A sequence (DNA chain) can be seen as a discrete random process taking values in Y. The paper does not explain how the entropy of the sequence is computed. The formula N×H(p) does not make any sense because it is a constant.
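To make the objection above concrete: if p is a fixed model distribution, N×H(p) depends only on the sequence length N. It can differ between two sequences of the same length only if p is replaced by each sequence's own empirical frequencies, p̂_i = n_i/N, where n_i counts the occurrences of base i. The sketch below assumes that plug-in reading and uses natural logarithms to match the formula in the report; whether GENIES computes the entropy this way is not stated in this record, and the function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "AGTC"  # the state space Y used in the report


def empirical_distribution(seq: str) -> dict[str, float]:
    """Plug-in estimate p_hat_i = n_i / N from the sequence's own base counts."""
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in ALPHABET}


def entropy_nats(p: dict[str, float]) -> float:
    """H(p) = -sum_i p_i ln p_i, with the convention 0 ln 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p.values() if pi > 0)


def total_information(seq: str) -> float:
    """N * H(p_hat): sequence length times the entropy of its own base frequencies."""
    return len(seq) * entropy_nats(empirical_distribution(seq))
```

With a fixed p, any two sequences of length 8 give the same N×H(p). With the plug-in estimate, `total_information("ACGTACGT")` equals 8 ln 4 (about 11.09 nats) while `total_information("AAAAACGT")` is about 8.59 nats, so the quantity varies only through the composition of each sequence.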

A second point is that the author says that the entropy of a sequence is computed using software (GENIES), but there is no information on how this entropy is computed. This is a key point because it is stated that different (mutated) sequences have different entropies.

In short, if these two points are not clearly explained using precise mathematical language (not like "observing the set of events" or "the information extracted from the observation..."), one cannot trust the conclusions of the paper.

So potentially the problem addressed in the paper is interesting but there is much work to do.

 

Author Response

Response attached to this email!

Thank you very much.

Author Response File: Author Response.pdf

Reviewer 4 Report

I was rushed by the Editor, hence a short, cursory review.

 

Unfortunately, I don't find the paper suitable for publication.

 

The author is perplexed by the apparent drop in information entropy of SARS-CoV-2 genomes with time, and attributes this phenomenon to some mythical novel law of nature. This is, however, trivially explained: the SARS-CoV-2 virus is in a phase of niche shift (change of host) and simply converges to a new stationary state. This can be observed in basically any optimization performed by a genetic algorithm, including Dawkins' canonical weasel program.
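The weasel program mentioned above can be sketched as a minimal elitist genetic algorithm: random strings are mutated and selected until they reach a fixed target, with the best fitness never decreasing along the way. The population size, mutation rate, seed and the rule that the parent competes with its offspring are illustrative assumptions, not Dawkins' original settings.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "


def fitness(candidate: str) -> int:
    """Number of positions that already match the target."""
    return sum(c == t for c, t in zip(candidate, TARGET))


def mutate(parent: str, rate: float, rng: random.Random) -> str:
    """Replace each character with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in parent)


def weasel(pop_size: int = 100, rate: float = 0.05, seed: int = 0,
           max_gen: int = 10_000) -> tuple[str, list[int]]:
    """Run the search; return the final string and the best fitness per generation."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    history = [fitness(parent)]
    for _ in range(max_gen):
        if parent == TARGET:
            break
        offspring = [mutate(parent, rate, rng) for _ in range(pop_size)]
        # Elitist selection: the parent competes with its offspring,
        # so the best fitness can never decrease
        parent = max(offspring + [parent], key=fitness)
        history.append(fitness(parent))
    return parent, history
```

Calling `weasel()` returns the target string together with a non-decreasing fitness history; the exact number of generations depends on the seed and parameters.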

 

The author furthermore identifies a non-uniform distribution of mutations, and cites this as a counter-example to the alleged Darwinian law that mutations should be uniformly distributed. There is no such law; it is common-sense reasoning that the replication machinery tries to copy exactly but makes random errors. Still, no molecular biologist would be shocked by the news that replicases could make some classes of errors more likely than others.

However, the author's observation has an even simpler explanation: the analyzed sequences are not uniformly sampled from all SARS-CoV-2 RNAs in existence but, due to survivor bias, from the currently most fit variants. It is obvious that function-preserving mutations in epitope parts of the S protein are more likely to be observed than function-disrupting ones in the replicase, even when they are exactly as likely to happen during replication.
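The survivor-bias argument above can be illustrated with a toy simulation in which mutations arise uniformly along a genome, but only genomes that survive selection are ever sequenced. The genome length, the hypothetical "essential" region and its survival probability are illustrative assumptions, not estimates for SARS-CoV-2.

```python
import random


def observed_mutation_counts(genome_len: int = 1000, n_mutations: int = 20_000,
                             essential_end: int = 500, survival_if_essential: float = 0.1,
                             seed: int = 1) -> dict[str, int]:
    """Count the mutations that are observed, i.e. carried by surviving genomes.

    Positions [0, essential_end) form a hypothetical function-critical region: a mutation
    there lets its genome survive only with probability survival_if_essential.
    Mutations elsewhere are always tolerated. Mutation positions are uniform.
    """
    rng = random.Random(seed)
    observed = {"essential": 0, "tolerant": 0}
    for _ in range(n_mutations):
        pos = rng.randrange(genome_len)  # every position is equally likely to mutate
        if pos < essential_end:
            if rng.random() < survival_if_essential:
                observed["essential"] += 1
        else:
            observed["tolerant"] += 1
    return observed
```

Both regions are the same length, so mutations arise about equally often in each, yet with these settings roughly ten times more mutations are observed in the tolerant region. The non-uniform pattern comes from which genomes are sampled, not from how mutations arise.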

 

There are also methodological errors.

First of all, the author suggests a universal law based on 17 specimens of a single species over a two-year period. This is vastly insufficient: we have a great many sequenced genomes from many species, including fast-evolving ones, and it is necessary to check universal claims on a substantial sample of them.

Moreover, the entropy estimation is based on a bag-of-words approach which ignores the impact of non-local interactions, which are known to be very important both for expression and replication, especially for RNA genomes. The author should analyze this aspect thoroughly, or at least note this as a limitation.
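The bag-of-words limitation raised above can be shown directly: an entropy computed from single-base frequencies is identical for any two sequences with the same composition, whatever their order. The sketch below, which assumes a plug-in estimator over overlapping k-mers (a swapped-in illustration, not the paper's method), shows that k = 1 cannot tell two such sequences apart while k = 2 can. Even k-mer entropies capture only short-range order, not the long-range base pairing of an RNA fold.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 1) -> float:
    """Plug-in Shannon entropy (bits) of the overlapping k-mers of seq.

    k = 1 is the single-base composition ("bag of words"); larger k adds only
    short-range order and still ignores long-range interactions.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    n = len(kmers)
    return -sum((c / n) * math.log2(c / n) for c in Counter(kmers).values())


clustered = "AAAACCCC"    # same composition as below...
alternating = "ACACACAC"  # ...but a different order
```

Both sequences give exactly 1 bit at k = 1, but at k = 2 they give about 1.45 and 0.99 bits respectively.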

 

The idea of analyzing genomes via information theory measures is not novel, given for instance https://doi.org/10.1038/srep01033 (the first search engine hit I got).

 

 

Author Response

It is regrettable that the reviewer finds this article not suitable for publication, which contradicts the three other reviewers. The language used is also regrettable, i.e. “perplexed”, “mythical law of nature”, etc.

 

The fact that the information entropy decreases consistently when a mutation takes place is not a perplexing fact. It is an interesting and unique scientific observation.

 

This is in the context of a more universal physics law, similar to the second law of thermodynamics, called the second law of infodynamics. The second law of infodynamics is presented in a peer-reviewed and accepted (in-press) article; it is not a mythical law. My collaborator and I demonstrated how all information systems tend toward lower information entropy over time.

The observation reported here is just a particular example of this more universal entropic law.

The reviewer mentions this being trivial, “because SARS-CoV-2 virus is in a phase of niche shift (change of host), and simply converges to a new stationary state”.

However, why is this new state always at lower entropy? Does the reviewer not find this interesting?

What would be trivial is to find the information entropy jumping around randomly when mutations occur. Instead, a linear trend is observed.

The other comments of the 4th reviewer converge with the previous reviewers and have already been addressed as detailed above.

Anyway, thank you very much for your report.

Round 2

Reviewer 3 Report

The author has addressed my concerns, and the mathematical content of the paper is now more transparent and readable. The author finds experimental evidence that the entropy of a sequence decreases when mutations are present. However, since an increase in entropy due to a mutation is possible, as is easy to check mathematically, one must conclude that the law stated by the author holds probabilistically and not deterministically. While the result presented in the paper is interesting, it calls for a mathematical justification.
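The remark above that a mutation can increase the entropy is easy to check on toy sequences. Assuming the plug-in entropy of single-base frequencies (maximal, at 2 bits, for a perfectly even composition), a substitution that makes the composition more even can raise the entropy, and one that makes it less even can lower it. The helper names below are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Plug-in Shannon entropy (bits per base) of the base frequencies in seq."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def substitute(seq: str, pos: int, base: str) -> str:
    """Return seq with a single point substitution at position pos."""
    return seq[:pos] + base + seq[pos + 1:]


def entropy_change(seq: str, pos: int, base: str) -> float:
    """IE after the substitution minus IE before it."""
    return shannon_entropy(substitute(seq, pos, base)) - shannon_entropy(seq)
```

For example, `entropy_change("AAAAAAAA", 7, "C")` is about +0.54 bits, while `entropy_change("ACGTACGT", 7, "A")` is about -0.09 bits, so the sign of the change depends on the sequence and on the substitution.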

Author Response

Thank you very much for your report. We fully agree that an increase in entropy is possible after a mutation occurs, but this is the case only for the IE of the "windows". What we noticed is that the overall IE of the whole genome always decreases. This is now clearly stated in the manuscript on page 7, just above Table 2:

"Although in this example all four mutations resulted in a decrease of the IE values of their containing windows, it is important to mention that not all mutations display this behavior. Indeed, the majority of the mutations occur so that the IE value of their containing window decreases, but in some cases only 70-80% of the mutations obey this information entropic rule, especially for genomes that suffered a large number of mutations. When applied to the entire genomes, the observed entropic law is always fully applicable."

Reviewer 4 Report

The author failed to address my reservations, hence my opinion has not changed.

Please verify with more species & simulations, validate entropy estimation, then I would be able to reconsider.
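One way to act on the request above to validate the entropy estimation is to run the estimator on simulated sequences whose true entropy is known. The sketch below assumes i.i.d. bases drawn from a chosen distribution and compares the plug-in estimate, and the standard Miller-Madow bias correction, with the exact value. The distribution, sequence length and function names are illustrative assumptions; real genomes are not i.i.d., so passing this check is necessary but not sufficient.

```python
import math
import random
from collections import Counter


def true_entropy(p: dict[str, float]) -> float:
    """Exact Shannon entropy (bits) of a known base distribution."""
    return -sum(pi * math.log2(pi) for pi in p.values() if pi > 0)


def plugin_entropy(seq: str) -> float:
    """Plug-in estimate (bits) from the observed base frequencies."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def miller_madow(seq: str) -> float:
    """Plug-in estimate plus the Miller-Madow correction (m - 1) / (2 N ln 2) bits."""
    m = len(set(seq))  # number of distinct bases actually observed
    return plugin_entropy(seq) + (m - 1) / (2 * len(seq) * math.log(2))


def simulate(p: dict[str, float], n: int, seed: int = 0) -> str:
    """Draw n independent bases from the distribution p."""
    rng = random.Random(seed)
    bases, weights = zip(*p.items())
    return "".join(rng.choices(bases, weights=weights, k=n))
```

With `p = {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3}` and a simulated sequence of 200,000 bases, the plug-in estimate agrees with `true_entropy(p)` (about 1.85 bits) to well within 0.01 bits; for short windows the downward bias of the plug-in estimate, which the correction term offsets, becomes more noticeable.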

 

 

Author Response

Thank you very much for your report. The author would like to stress his appreciation of the reviewer's position. The most important thing in science is scientific debate and the ability to freely agree or disagree with each other.

However, when disagreement occurs, it is also beneficial to share the new results with the wider academic / scientific community and to allow these results to be scrutinized. If they check out, then we make some small progress. If they get disproved, then we also make progress as we can focus attention on other research.

I also agree with the reviewer on the point that this generalization is too strong, being based on only one study. This is clearly stated in the conclusions section:

"It is also important to mention that this study is based on a single RNA virus family and the proposed governing law of genetic mutations is a strong generalization. However, the governing law of genetic mutations is a particular case of a new physics law, called “the second law of infodynamics” [26], which states that information entropy of all information systems decreases over time. While the second law of infodynamics underpins the generalization of these results to all biological systems, we acknowledge that further studies on other organisms are necessary to validate these results. We therefore hope that this work will stimulate future studies leading to some form of predictive algorithm of genetic mutations."
