1. Introduction
When forensic DNA testing reveals a concordance between a crime scene sample and a person of interest’s (POI) DNA profile, it is necessary to provide a statistic to evaluate the strength of the correspondence or the weight of the evidence.
The likelihood ratio (LR) is acknowledged as the most powerful and relevant statistic used to calculate the weight of DNA evidence and is recommended by the DNA Commission of the International Society for Forensic Genetics (ISFG) for forensic DNA mixture interpretation [1].
The LR is a ratio of two conditional probabilities, probability densities, or numbers proportional to them. The LR is not exclusively used for the interpretation of forensic DNA evidence. It is used to assign the weight of evidence for other forensic evidence and in many other situations in statistics. It follows from Bayes’ theorem, where the odds form is:
$$\frac{\Pr(H_p \mid E, I)}{\Pr(H_d \mid E, I)} = \frac{\Pr(E \mid H_p, I)}{\Pr(E \mid H_d, I)} \times \frac{\Pr(H_p \mid I)}{\Pr(H_d \mid I)}$$
where E represents the evidence, I represents relevant background information, and Hp and Hd (or Ha) represent alternate hypotheses or propositions. Bayes’ theorem follows directly from the laws of probability and can be expressed in words as follows:
Posterior odds = likelihood ratio × prior odds.
An LR greater than one means the DNA evidence supports the proposition given in the numerator. An LR less than one means the evidence supports the alternate proposition given in the denominator. In forensic casework, the LR in Bayes’ theorem is typically written as shown above, with the probability of the evidence given the prosecution hypothesis forming the numerator and the probability of the evidence given the defence hypothesis as the denominator.
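As a minimal sketch of the odds form of Bayes’ theorem described above, the following Python snippet multiplies an LR by prior odds and reports which proposition the LR supports. The function names and the numerical values are illustrative assumptions and are not taken from any case or software.

```python
import math


def posterior_odds(lr: float, prior_odds: float) -> float:
    """Odds form of Bayes' theorem: posterior odds = LR x prior odds."""
    if lr < 0 or prior_odds < 0:
        raise ValueError("LR and prior odds must be non-negative")
    return lr * prior_odds


def supported_proposition(lr: float) -> str:
    """Return the proposition supported by the LR (numerator = Hp)."""
    if lr > 1:
        return "Hp"
    if lr < 1:
        return "Ha"
    return "neither"


# Hypothetical values chosen only to illustrate the arithmetic.
example_lr = 1000.0
example_prior_odds = 0.001
print(posterior_odds(example_lr, example_prior_odds))
print(supported_proposition(example_lr), math.log10(example_lr))
```

The same LR can lead to very different posterior odds depending on the prior odds, which is why the LR alone is reported as the weight of the evidence.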
The prosecution proposition (Hp) is generally known and straightforward to apply, especially when only one POI is being considered. The defence are under no requirement to offer a proposition, and often they do not. If the defence proposition is available, then that should be selected. If not, a sensible ‘alternate’ proposition consistent with exoneration should be chosen. Hence, the use of Ha for an alternate proposition can be a preferred descriptor.
There is a well-established hierarchy of propositions that are informed by the evidence being assessed. The original three levels within the hierarchy are offence, activity, and source-level propositions [2]. Forensic DNA evidence is typically evaluated at the sub-source or sub-sub-source level within the hierarchy [3,4]. Within this paper, we discuss LRs assigned using sub-source level proposition sets. Below, we give an example of a sub-source set of propositions for a two-person mixed DNA profile considering one POI as a contributor (set one):
Set one, the simple proposition pair, sub-source propositions (LR for a single POI, no conditioning):
Hp: The DNA originated from the POI and one unknown individual, unrelated to the POI
Ha: The DNA originated from two unknown individuals, unrelated to the POI or each other
The propositions assigned in a case should be mutually exclusive, address the issue of interest and be close to exhaustive in that they take account of relevant case information and ensure no reasonable consideration is omitted [4,5]. The propositions considered must be plausible or sensible within the known framework of circumstances. The use of non-sensible propositions can lead to misleading LRs [6,7].
If one is transparent about the information that has been used to form the propositions and willing to consider a re-evaluation of the findings given different propositions, should the information change, then this approach is robust.
A simple proposition pair is where no more than one POI considered within Hp is replaced with an unknown individual within Ha. Proposition set one above is an example of a simple proposition pair.
Where there is more than one POI, there are multiple propositions that may be considered under both Hp and Ha. Consider a two-person mixture where two POI both give inclusionary LRs using a simple proposition pair. In this case, it is prudent to test whether these POI could explain the profile when considered together. This could be undertaken using a compound proposition pair, defined as one where more than one POI within Hp is replaced with unknown donors in Ha ([8], hereafter the ASB (Academy Standards Board) draft standard; see also [9,10]).
Set two, the compound proposition pair, sub-source propositions (LR for all POI together, no conditioning):
Hp: The DNA originated from POI1 and POI2
Ha: The DNA originated from two unknown individuals, unrelated to either POI or each other
Although this proposition pair is highly effective in assessing whether both POI could be donors together, when it is reported without the simple LRs for each individual it can appear to greatly overstate the weight against a POI who gives a small inclusionary or uninformative LR when considered individually but who is carried in the compound LR by the much stronger other donor(s) to the mixture.
Another form of proposition pair assumes the contribution of all POIs under Hp and all but one POI under the alternate proposition. We cannot find a definition of this proposition pair in the ASB draft standard [8], although this appears to come under clause 4.5.b, where they are described as a variant of the simple proposition pair. We will term these conditional proposition pairs. If the contribution of all POIs is supported by the observations, then the LR for such a conditional proposition pair is a good approximation of the exhaustive LR, as described by Buckleton et al. [7] (their Equations (7a) and (7b)).
Set three, the conditional pair, considering POI 1 for a four-person mixture (LR for a single POI, uses conditioning profiles):
Hp: The DNA originated from POI1, POI2, POI3, and POI4
Ha: The DNA originated from POI2, POI3, POI4, and one other individual, unrelated to POI1, POI2, POI3, and POI4
Three additional conditional LRs would subsequently be assigned considering POI2, POI3, and POI4. This isolates the evidence for the contribution of each POI in turn. Note that there are other possible combinations of conditional propositions when considering mixtures of more than two individuals, for example, conditioning on only one or two known contributors within a four-person mixture. These partially conditioned LRs are not calculated within this paper but are explored by Duke et al. [9] (see, for example, the study’s Table 4).
Given these types of scenarios, the assignment of a compound LR is advised by the ASB draft standard [8]. However, as these LRs may overinflate the evidence, the standard advises that the LRs derived from simple proposition pairs are the ones reported, and not the compound LR, unless the compound LR is exclusionary.
Bright and Coble [11] report that for individuals who are well-represented in the mixture, the logarithm of the compound LR is approximately the sum of the logarithm of the LRs for each of the known contributors considering simple proposition pairs. This is only approximately true and then only for true donors. Within this research, we investigate the behaviour of LRs assigned for known and non-contributors to a set of mixed DNA profiles using compound, conditional, and simple proposition pairs. We demonstrate that compound likelihood ratios can be obtained as the product of conditional likelihood ratios. We also demonstrate that, on average, conditional LRs result in higher LRs for a true donor and more exclusionary LRs for non-contributors than their equivalents using simple proposition sets.
3. Results
For each mixture, the compound log10(LR) assigned in STRmix™ was the same as the sum of the conditional log10(LR)s and one simple log10(LR) assigned in DBLR™ (the log10(LR) values were compared to six decimal places). This is the expected result.
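The reason this equality is expected can be sketched with a short telescoping argument. The notation below is an assumption made for illustration: P_k denotes the k-th POI, U denotes an unknown contributor, and the background information I is omitted for brevity. The order in which the POIs are conditioned on is arbitrary here, and the identity requires every intermediate probability to be non-zero. For two POIs:

```latex
\mathrm{LR}_{\text{compound}}
  = \frac{\Pr(E \mid P_1, P_2)}{\Pr(E \mid U, U)}
  = \underbrace{\frac{\Pr(E \mid P_1, U)}{\Pr(E \mid U, U)}}_{\text{simple LR for } P_1}
    \times
    \underbrace{\frac{\Pr(E \mid P_1, P_2)}{\Pr(E \mid P_1, U)}}_{\text{conditional LR for } P_2 \text{ given } P_1}
```

Taking logarithms, and extending the same cancellation to N POIs:

```latex
\log_{10}\mathrm{LR}_{\text{compound}}
  = \log_{10}\mathrm{LR}_{\text{simple}}(P_1)
  + \sum_{k=2}^{N} \log_{10}\mathrm{LR}\left(P_k \mid P_1, \ldots, P_{k-1}\right)
```

Each numerator cancels the denominator of the next term, so the result is one simple log10(LR) plus N − 1 conditional log10(LR)s.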
A summary of the sub-source LRs assigned using the simple proposition set and compound proposition set for the Lab A and Lab B mixtures is given in Figure 1. LRs using simple proposition sets and the true donors are given as stacked columns where the LR for each contributor is given as a different colour. The compound LR is given for each mixture as a red asterisk.
Exclusions (LR = 0) were obtained for nine of the 32 mixtures using a compound proposition set and the true donors. These included one four-person mixture and all eight five-person mixtures. This is not unexpected when there are multiple unknown contributors. The sample space is so vast that it can be inadequately sampled by the default number of accepts. These profiles were re-interpreted in STRmix™ with 10 or 100 times the default accepts (100,000 or 1,000,000 burn-in and 500,000 or 5,000,000 post-burn-in accepts) per chain to better explore the probability space in the deconvolution (see Appendix A). Following reinterpretation, compound LRs > 1 were assigned for all nine mixtures. These are the results shown in Figure 1.
Inspection of Figure 1 shows that the compound log10(LR) was larger than the sum of the individual log10(LR)s using the simple proposition set for each known contributor for all but one sample. This is more pronounced for the high-order mixtures (N = 3 and greater). The compound LR therefore overrepresents the weight of evidence against each individual contributor.
The five-person mixture (Lab B, sample number 3), designed with donor ratios of 10:2:2:1:1 and with a 100 pg template for the lowest contributors interpreted using ×100 accepts resulted in a compound log10(LR) that was less than the sum of the individual log10(LR)s (52.26 versus 57.47). The mixture proportions assigned by STRmix™ were 64%, 16%, 11%, 8% and 1%. The contributor position with the highest LR for two of the contributors to this mixture using simple proposition sets differed from the contributor order they aligned with for the compound LR. The sub-sub-source LR for one contributor was approximately 20 times lower in its compound LR position. The sub-sub-source LR for the other contributor was around 17 orders of magnitude lower. This contributor best aligned in the third contributor position using simple propositions with an approximate mixture proportion of 11% but was aligned as the trace fifth contributor with an approximate mixture proportion of 1% using the compound proposition set. This individual is one of the two lowest template donors. Their alignment in the third contributor position using simple propositions is likely due to the presence of a D2S1338 18.3 peak not originating from any actual donor and likely drop-in, which is favoured as an allele for the fifth contributor, and also given the amount of allele sharing between donors. The sum of the individual log10(LR) for each donor with simple propositions when in their experimentally designed contributor positions was 40.67.
3.1. Conditional LRs
A plot of the log10(LR) assigned for the true donors to the 32 mixtures using the simple proposition set (per contributor) versus the conditional log10(LR)s (alternatively described as Slooten and Buckleton et al.’s approximation to the exhaustive LR) is given in the top pane of Figure 2. The LRs assigned given conditional propositions were larger than the LRs assigned using simple proposition sets for the same POI for all but one comparison. This was the five-person mixture from Lab B, sample number 3, discussed above. The data points on the x = y line are for mixtures that were fully or close to fully resolved, for which conditioning did not add any extra information to the interpretation.
A plot of the log10(LR) assigned for the mixtures using compound propositions versus the log10(LR)s for the conditional propositions is given in the bottom pane of Figure 2. The LRs assigned given conditional propositions were smaller than the LRs assigned using compound proposition sets. The data points at [~28, ~0] and [~28, ~27], indicated as filled data points in Figure 2, represent two different POIs contributing to the same mixture. The major is (almost) fully resolved, whereas the minor is very ambiguous. The major carries the minor in the log10(LR) considering compound propositions. When conditioning on the major (in the approximation of exhaustive propositions), no information is gained in relation to the minor’s genotype. Vice versa, when conditioning on the minor, no information is gained in relation to the major’s genotype.
3.2. Non-Contributor Tests
3.2.1. Compound Propositions
The twelve non-contributors (two for each of the six mixtures tested in Section 2.4.1) that had previously given inclusionary LRs using simple proposition sets resulted in exclusions (LR = 0) when using compound propositions, where they replaced, one by one, each of the true donors in the proposition.
3.2.2. High-Risk Database, Simple and Conditional Propositions
A plot of log10(LR) given a simple proposition set versus the template assigned in STRmix™ (in rfu) for the high-risk database of non-contributors is given in Figure 3. Overall, 56% of comparisons were exclusions (LR = 0) and are plotted around log10(LR) = −40 in Figure 3.
A plot of log10(LR) given a conditional proposition set versus the template assigned in STRmix™ (in rfu) for the high-risk database of 1000 non-contributors is given in Figure 4. The conditioned individual(s) was a known donor, and the POI was a database individual. Over 99% of comparisons resulted in LR = 0.
Compound log10(LR) values for non-contributors within the high-risk database, which resulted in LR > 0 when assigned using a conditional proposition set, are plotted against the corresponding conditional log10(LR) values in Figure 5. The compound LR is always greater than the conditional LR for the non-donors.
4. Discussion
The conditional LR showed that, on average, the LR assigned to true donors was larger than the LR assigned using simple proposition sets for the same POI. This is because conditioning on another true donor adds information to the interpretation, allowing for better resolution of the remaining genotypes. This is the known effect of conditional LRs. The data points on or about the line of equality in Figure 2 (top pane) are profiles that were fully resolved (or close to fully resolved), where conditioning on a contributor did not add extra information to the interpretation. The conditional LR was always lower than (or equal to) the LR using compound propositions (Figure 2 bottom pane).
The rate of adventitious matches for high-risk non-contributors created by sampling alleles from known contributors was significantly higher when using simple proposition sets compared with conditional proposition sets (Figure 3 versus Figure 4). Conditional LRs have an increased power to differentiate between true and false donors. Conditioning on a true donor should increase LRs for other true donors (as demonstrated in [15]) and lower them for false donors. This is demonstrated again within this work. The high-risk non-contributors represent a ‘worst case’ scenario not typically encountered in casework other than when mixtures of relatives are involved.
In relation to simple proposition sets, Slooten states [
14], “The hypotheses for
only use the person of interest under investigation here. It may seem at first sight as an unbiased way to present the evidence, not using any other POI whose contribution is also disputed as assumed contributors. But it is easily overlooked that in fact one then assumes that the other POI did not contribute, and that this assumption is not at all supported by the data.” The simple proposition under
Hp might also not represent the most logical scenario for the prosecution given the case circumstances.
The assignment of a compound LR is a natural extension if multiple POIs give inclusionary statistics when using simple proposition sets. We have shown that the logarithm of the compound LR is the sum of the conditional log10(LR)s and a simple log10(LR) for the individual contributors. However, the compound LR is only useful as a test of whether two or more POI can both be donors. In the overwhelming majority of cases, it is an inappropriate expression of the weight of evidence for any individual donor and may be too high or too low. Compound proposition sets have a higher chance of both false inclusionary support (a non-donor carried by the strong LRs of other donors), as shown in Figure 5, and false exclusionary support (LR = 0 due to the vast sampling space and computing limitations).
In general, if multiple POIs can be included in a mixture individually, and the ground truth is that all POIs have contributed, we expect the compound log10(LR) to be greater than the sum of the log10(LR)s assigned using simple proposition sets (Figure 1). Mixtures with the greatest ambiguity (or least well resolved) will typically have the greatest difference between the compound log10(LR) and the sum of the individual simple log10(LR)s (refer to Appendix B). This is because, in the compound LR, the LRs for the individual contributors (for example, LR1 and LR2 for a two-person mixture) are not independent. Conditioning on a POI adds information to the interpretation, reducing the number of genotype combinations possible for the remaining contributor/s.
Fully resolved mixtures are a special case where the compound log10(LR), the sum of conditional log10(LR)s, and the sum of the simple log10(LR)s for each true donor POI will all be equal, as long as sub-sub-source propositions are considered. This is because, when the mixture is fully resolved, the LRs for the individual contributors in the compound LR calculation are independent, i.e., conditioning on a POI being present does not add any extra information to the calculation.
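The dependence between the individual LRs can be illustrated with a deliberately small toy model. The sketch below is not the STRmix™ or DBLR™ calculation: it assumes a single locus, two contributors in fixed positions (sub-sub-source propositions), hypothetical genotype labels A to D with made-up population probabilities, and a table of weights standing in for Pr(E | genotype pair).

```python
import math
from itertools import product


def toy_lrs(weights, freqs, g1, g2):
    """LRs for POI1 (genotype g1, position 1) and POI2 (genotype g2, position 2).

    weights: {(genotype_pos1, genotype_pos2): Pr(E | genotype pair)}
    freqs:   {genotype: population probability}
    """
    def w(a, b):
        return weights.get((a, b), 0.0)

    # Pr(E | two unknowns): sum over all genotype pairs.
    both_unknown = sum(w(a, b) * freqs[a] * freqs[b]
                       for a, b in product(freqs, repeat=2))
    # Pr(E | POI1 and one unknown), Pr(E | one unknown and POI2).
    poi1_known = sum(w(g1, b) * freqs[b] for b in freqs)
    poi2_known = sum(w(a, g2) * freqs[a] for a in freqs)
    both_known = w(g1, g2)  # Pr(E | POI1 and POI2)
    return {
        "simple_1": poi1_known / both_unknown,
        "simple_2": poi2_known / both_unknown,
        "cond_2_given_1": both_known / poi1_known,
        "compound": both_known / both_unknown,
    }


# Hypothetical genotype probabilities (they sum to 1).
freqs = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4}

# Fully resolved: only the pair (A, B) explains the profile.
resolved = toy_lrs({("A", "B"): 1.0}, freqs, "A", "B")
# Ambiguous: the pair (C, D) explains the profile equally well.
ambiguous = toy_lrs({("A", "B"): 1.0, ("C", "D"): 1.0}, freqs, "A", "B")

for name, lrs in (("resolved", resolved), ("ambiguous", ambiguous)):
    print(name, {k: round(math.log10(v), 3) for k, v in lrs.items()})
```

In the resolved case the compound LR equals the product of the two simple LRs. In the ambiguous case the compound LR exceeds that product, while it still equals the simple LR for POI1 multiplied by the conditional LR for POI2 given POI1.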
We have demonstrated that, for some samples with a high number of contributors, the compound LR is zero even though the simple LR is inclusionary for each true donor POI. In these samples, the genotype combinations of all true donor contributors individually were accepted at least once across the post-burn-in iterations, but the genotype combination explaining all true donor contributors in combination was not accepted within any single iteration. This is not unexpected and arises because the sample space is vast. In these cases, we recommend the use of extended MCMC accepts within the interpretation. This allows more time to explore the sample space and was also a finding of Duke et al. [9].
Where it is necessary to determine if multiple POIs could together be donors to relatively high template complex mixtures comprising four or five contributors, this may require additional MCMC accepts to fully explore the range of possible genotype combinations at each locus. The additional accepts may allow a wider range of genotype combinations to be accepted, thereby preventing an exclusion.
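A rough sense of why additional accepts help can be obtained from a simplified calculation. The sketch below assumes, purely for illustration, that each post-burn-in accept is an independent draw in which the joint genotype combination of all true donors appears with a fixed probability p_joint. Real MCMC iterations are correlated and STRmix™ runs several chains, so this is an idealisation and not a description of the software. The value of p_joint is hypothetical; the accept counts are the post-burn-in values implied by the Results.

```python
def prob_never_accepted(p_joint: float, n_accepts: int) -> float:
    """Probability that a state with per-draw probability p_joint is never
    seen in n_accepts independent draws (idealised, not real MCMC)."""
    if not 0.0 <= p_joint <= 1.0:
        raise ValueError("p_joint must be a probability")
    return (1.0 - p_joint) ** n_accepts


p_joint = 1e-6  # hypothetical per-accept probability of the joint combination
for n_accepts in (50_000, 500_000, 5_000_000):
    print(n_accepts, prob_never_accepted(p_joint, n_accepts))
```

Under these assumptions, a rare joint combination is very likely to be missed at the lower accept count, which would give a compound LR of zero, and is very unlikely to be missed at 100 times that number.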
5. Conclusions
When assigning LRs in forensic casework, an analyst may have some idea of the most appropriate prosecution proposition but very rarely has knowledge of the most appropriate defence proposition. In the absence of this information, a reasonable set may be selected in a way that maintains the legitimate interests of the defence. This can be informed by case circumstances. An understanding of the performance of the LR under certain proposition sets can also help an analyst make this decision.
It may be worthwhile benchmarking two of the recommendations in the draft ASB standard: recommendations 4.4 and 4.5 [8]. We note that these recommendations are in draft.
Recommendation 4.4: A profile should be assigned as a conditioning profile to a mixture when an individual is identified as an intimate contributor or when it is reasonable to assume their presence based on case-specific information and the associated data supports the assumption. The conditioning profile could be from the complainant, POI, or other individuals, depending on the case scenario. In the published guidelines for setting sub-source propositions [3], the DNA Commission of the International Society for Forensic Genetics define relevant case circumstances as those that “include only the case information that is needed for the formulation of the propositions and for assigning the probabilities of the results”. Buckleton et al. [4] describe forensically relevant case circumstances for a DNA case as “information that will help formulate the appropriate alternative, determine the number(s) of contributors, and select the relevant population”. They do not consider “information such as prior conviction, motive, presence of other types of evidence, or a confession as relevant forensic information”. As much relevant case information should be gathered as practical before formulating the propositions.
The conclusions of this work suggest that this recommendation should be greatly strengthened. We reprise Slooten’s insightful comment that not conditioning is also an assumption [14]: that the profile being considered for conditioning is not a donor, and that this is not at all supported by the data. It is very tempting to feel that not assuming is somehow safe or conservative. But the choice is between assuming that the conditioning profile is, or is not, a donor. If the data support the presence of this profile, it can be very detrimental not to assume their presence. This is because of the much-enhanced ability to differentiate true from false donors when conditioning is applied. A useful way through this issue is to use the approximation to the exhaustive LR. This enables a balanced approach that assumes that the conditioning profile either is or is not a donor. However, in the event that two or more POIs cannot both or all be donors (or that the compound LR is much less than 1), it is still necessary to state this explicitly.
Recommendation 4.5: The analysis should separate the propositions into their simplified constituents (i.e., simple proposition pairs—recall that ASB describes both simple and conditional propositions as simple) when an LR favouring Hp has resulted from a compound proposition pair incorporating multiple POIs under Hp and none of the POIs under Ha, in order to establish the weighting and the consequent probative value of the evidence per contributor under Hp.
The conclusions of this work very strongly support this statement. Compound proposition pairs can misrepresent the weight of the evidence against an individual strongly in either direction. This work strongly favours the use of conditional proposition pairs rather than simple proposition pairs whenever the data support the presence of an individual as the conditioning profile since this increases the ability to differentiate true from false donors.
We have demonstrated by calculating conditional LRs in DBLR™ that the compound LR can be obtained as a product of simple and conditional LRs. This is also approximately true using LRs produced by STRmix™. The use of conditional LRs, described as an approximation to the exhaustive LR by Buckleton et al. [7], resulted in higher LRs for the known contributors and lower LRs for the non-donors than when using a simple proposition pair. This statistic makes the best use of the DNA profiling information.