1. Introduction
Olfactronic individual identification can objectify the currently subjective olfactory identification, which is at present carried out by specially trained canines handled by police cynologists. Such scent evidence is now used in police practice, but only as indirect proof or as an operational search tool [1]. This study demonstrates the potential of olfactronic identification [2].
Dogs are currently much more sensitive and faster than the best chromatographic instruments. In this paper, the canine model of identification was used when comparing a crime scene sample with candidate samples. A dog first forms a scent image from the known scent samples and then picks, from a selection of unknown objects, the one that corresponds to those samples. The idea is that sniffing creates a kind of scent signature, which is then used for comparison with unknown samples [1].
Papers about olfactronic identification are still very rare [2,3,4,5]. Once solved, this identification procedure could soon be as useful as fingerprint identification, although olfactronic identification is more complex because several basic tasks must be solved before the olfactronic identification of persons can be performed. A number of these problems are already being addressed. The first is the collection and storage of the original scent samples [6]; the second is the correct recognition of the compounds in these samples (peak alignment) [7,8]. Only then is the objective olfactronic identification of persons feasible.
Chemical compounds in human scent samples can be formally divided into three groups: the so-called primary, secondary, and tertiary scent compounds. The primary and the secondary scent compounds pass through the skin of a human individual. Since the primary compounds are genetically determined [9], their occurrence and relative amounts are not much affected by the diet or the physical or mental state of the particular person, nor by diseases, medications, cosmetics, etc. It is assumed that the genetic background of the person determines the occurrences and relative concentrations of these primary compounds [10,11].
Unlike primary compounds, the occurrence and relative concentrations of the secondary scent compounds vary. They depend on the person’s diet, physical as well as mental conditions (excitement, fear, or illness), rhythm of life (physical stress, fatigue, rest, and sleep), medications used, various cycles, etc. The presence of some secondary compounds may indicate drug addiction, diseases, mental disorders, etc. [12].
The tertiary scent compounds do not pass through the skin; they are usually contaminants from the surrounding environment or from cosmetics. They may contain compounds that arise from the decomposition of primary and secondary compounds because of the activities of bacteria and ferments on the surface of the human body, e.g., in the armpits, genitals, etc. Tertiary compounds also include primary and secondary scent compounds of other people with whom a particular person has come into personal contact. The same applies to scent transferred from animals as well as contamination from environmental emissions (e.g., the smell of hospitals, gas stations, smoky restaurants, chemical laboratories, etc.).
In general, the primary compounds can be used for the individual identification of persons, while the secondary compounds can be used for the recognition of some diseases, drugs, etc. Because the primary compounds are genetically determined, their concentration ratios and relative concentrations in scent samples should be relatively constant for any particular person. These findings are the basis for the odorological identification of persons.
In this article, primary compounds are not singled out in advance. Instead, all compounds are taken into account, and those that work best for identification are the best candidates for primary compounds.
2. Materials and Methods
This study is a continuation of three previous publications [4,6,13] devoted to the olfactronic identification of persons using comprehensive two-dimensional gas chromatography with mass spectrometry. The scent samples used in this study come from studies carried out in 2018–2020 [4,13]. All the samples were processed on the same GC×GC-MS chromatograph with the same set of parameters. The instrumentation and sampling details for the data were precisely defined in one study [6]. For this study, 10 samples from each of 40 volunteers were taken. The raw data were used in the presented experiments; that is, no data alignment was applied beyond the procedures incorporated in the ChromaTOF software by Leco, Version 4.72.0.0.
The samples were originally labeled as “system_1”, “system_2”, and “system_3”. This designation reflects a change in the chromatograph configuration, namely the replacement of the second column: the broken column was replaced with an identical one of the same length. Some adjustments had to be made, but all the parameters of the chromatograph remained the same. All 400 samples are in the olfact_system1_2_3_400.zip file. In the second experiment, only “system_1” samples were used. This restriction resulted in 181 samples in the second experiment, with 7 or 8 samples for each person. Those samples are in the olfact_system_1_181.zip file. Both data files can be downloaded from https://doi.org/10.5281/zenodo.11120467 (accessed on 4 September 2023).
Two approaches were used for the construction of the digital scent signatures of the individual persons. They are described below as the digital scent signature of type AA (area/area) and the signature of type AS (area standardized or normalized). The AA scent signature compares the mutual ratios of compound concentrations (areas) in the samples, whereas the AS scent signature compares the normalized concentrations (areas) of compounds in the digital scent samples.
The first step in this olfactronic identification procedure consists of the selection of the scent compounds that are expected to be present in all the scent samples of an examined individual and that retain their relative concentrations. From these compounds, the digital scent signatures can be constructed for each person.
3. Experiments
3.1. Database Creation
The database for the identification was built with the PostgreSQL database system [16] and the DBeaver database tool [17]. This system enables the creation of tables for data and procedures for working with these data. All structures and procedures are contained in the file database_structure.sql, which is attached to the article. Running the identification procedures is described in the file database_run.sql.
The tables can be divided into three groups: import tables, working tables, and tables for comparison. Digitized scent samples are stored in a directory as files; in the first step, all those files are inserted into the “insertxx” table and then transferred to the “odorfull” table. The next step creates the table “molecule”, into which all the distinct compounds are inserted for further processing. In this step, the following compounds are removed from further processing:
- Siloxanes and phthalates (contaminants from the glass);
- Compounds that are present only in a few samples;
- Duplicate compounds in samples.
Removing all the duplicates in the samples may be considered too strict, but summing them and working with the summed area value led to a deterioration in the identification results.
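The filtering step above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure from database_structure.sql: the function name `filter_compounds`, the keyword list, the threshold `min_samples`, and the choice to count occurrences after duplicates are dropped are all hypothetical, since the article names the contaminant classes but not the exact rules.

```python
from collections import Counter

# Hypothetical keyword list: the article names only the two contaminant classes.
CONTAMINANT_KEYWORDS = ("siloxane", "phthalate")


def filter_compounds(samples, min_samples=3):
    """Filter peaks per sample.

    samples: dict mapping sample_id -> list of (compound_name, area) tuples.
    Returns dict mapping sample_id -> dict compound_name -> area.
    min_samples is an assumed threshold for "present only in a few samples".
    """
    cleaned = {}
    for sample_id, peaks in samples.items():
        counts = Counter(name for name, _ in peaks)
        cleaned[sample_id] = {
            name: area
            for name, area in peaks
            # Drop every compound that occurs more than once in the sample.
            if counts[name] == 1
            # Drop siloxanes and phthalates by (assumed) name matching.
            and not any(k in name.lower() for k in CONTAMINANT_KEYWORDS)
        }
    # Count in how many samples each remaining compound occurs.
    presence = Counter(name for comp in cleaned.values() for name in comp)
    return {
        sample_id: {n: a for n, a in comp.items() if presence[n] >= min_samples}
        for sample_id, comp in cleaned.items()
    }
```

Dropping duplicated compounds entirely, rather than summing their areas, mirrors the choice described in the text.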
In the current article, the problem of data alignment is not resolved. Data are taken with the compound identification from ChromaTOF. Deleting duplicates from the samples surely removes some compounds that could be considered primary. Data alignment (DA) is a worldwide problem and has not yet been resolved completely. For olfactronic purposes, DA must work without manual handling (such as the manual detection of several peaks for alignment).
The presented algorithm does not try to identify chemical compounds for the scent signature first. Instead, it finds those compounds that preserve mutual ratios of abundances. Further analysis is needed to determine which of them are primary, secondary, or tertiary compounds.
The table “molecules” holds prepared pairs of molecules for computing the ratios of their “area” fields. This is important for the creation of the signatures. The pairs of compounds that have a similar ratio across the samples of a given person are the best candidates for that person’s primary compounds.
3.2. Import of Samples
Text files containing digitized information about the compounds in the human scent trace (i.e., samples or chromatograms) were uploaded into the database, processed, and evaluated. For each person, 5 to 9 samples were subsequently used to construct the digital scent signatures. The intention was to verify whether signatures built from more samples would serve better for identification. One other sample from each person was taken as a test sample, with which the identification was carried out. It was essential that the test sample was not used in the construction of signatures so that it could be considered an unknown sample. The flowchart of sample processing is shown in Figure 2.
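The hold-out scheme described above can be illustrated with a short Python sketch. It assumes that the test sample is drawn at random, which the article does not state; the function name `split_samples` and its parameters are hypothetical.

```python
import random


def split_samples(samples_by_person, n_signature, seed=0):
    """Hold out one test sample per person and keep n_signature others.

    samples_by_person: dict mapping person -> list of sample identifiers.
    Returns (signature_sets, test_samples). Persons with fewer than
    n_signature + 1 samples are skipped.
    Random selection is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    signature_sets, test_samples = {}, {}
    for person, samples in samples_by_person.items():
        if len(samples) < n_signature + 1:
            continue  # not enough samples for a signature plus a test sample
        shuffled = list(samples)
        rng.shuffle(shuffled)
        # The test sample is never part of the signature set.
        test_samples[person] = shuffled[0]
        signature_sets[person] = shuffled[1:n_signature + 1]
    return signature_sets, test_samples
```

Repeating the call with different `seed` values yields different test samples, in the spirit of the repeated runs described later in the text.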
3.2.1. Digitized Scent Signatures
Human scent contains more than one signature; dogs can also identify a person from a fraction of a scent sample. In the present article, two differently constructed signatures were used. In the first experiment, 40 unknown test samples (one from each person) were tested against 40 so-called AA and 40 so-called AS signatures. In the second experiment, 24 unknown test samples were similarly tested against 24 AA and 24 AS pre-prepared signatures. This process was repeated with different test samples.
3.2.2. Digitized Scent Signature of Type AA
The AA-type signature is built from the ratio of the areas of two different compounds in the same sample (hence area/area, AA). Such a ratio is characteristic of a given person when it remains approximately the same across that person’s samples. For each ratio of compound areas, the variance across the person’s samples was therefore calculated. The ratios with the lowest variance were selected for the AA-type digital signature. Along with the ratios, the minimum and maximum ratio values were stored with the signature in the database.
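A minimal Python sketch of this selection is given below. It assumes that each sample is a mapping from compound name to a positive area value and that only compounds present in all of a person’s samples are paired; the function name `build_aa_signature` and the parameter `n_ratios` are hypothetical and do not come from the attached SQL files.

```python
from itertools import combinations
from statistics import pvariance


def build_aa_signature(person_samples, n_ratios):
    """Select the area ratios with the lowest variance for one person.

    person_samples: non-empty list of dicts (compound name -> positive area),
    one dict per sample of the same person.
    Returns a list of (compound_a, compound_b, min_ratio, max_ratio).
    """
    # Only compounds found in every sample can give a ratio for every sample.
    common = set.intersection(*(set(s) for s in person_samples))
    scored = []
    for a, b in combinations(sorted(common), 2):
        ratios = [s[a] / s[b] for s in person_samples]
        scored.append((pvariance(ratios), a, b, min(ratios), max(ratios)))
    scored.sort()  # lowest variance first
    return [(a, b, lo, hi) for _, a, b, lo, hi in scored[:n_ratios]]
```

The stored minimum and maximum define the acceptance range that is later used when an unknown sample is compared with the signature.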
3.2.3. Digitized Scent Signature of Type AS
The value $\mathrm{area}$ corresponds to the concentration of the compound in the sample. To compare the concentrations of compounds in different samples, it is necessary to standardize or normalize the $\mathrm{area}$ value to $\mathrm{area}_s$ (hence the signature of type AS). The normalized $\mathrm{area}_s$ values were obtained by dividing each $\mathrm{area}_j$ by the average value of the areas of all the compounds in the sample:

$$\mathrm{area}_{s,j} = \frac{\mathrm{area}_j}{\frac{1}{n}\sum_{k=1}^{n}\mathrm{area}_k}$$

for $j$ ranging across all $n$ compounds in the sample.
The compounds with the smallest variation of their $\mathrm{area}_s$ values across each person’s samples were included in the AS-type digital signature for that person.
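The AS-type construction can be sketched in Python in the same spirit. The sketch assumes that each sample is a mapping from compound name to area, that "variation" means the variance of the normalized values, and that only compounds present in all of a person’s samples are considered; the names `normalize_areas`, `build_as_signature`, and `n_compounds` are hypothetical.

```python
from statistics import mean, pvariance


def normalize_areas(sample):
    """Divide every area by the mean area of all compounds in the sample."""
    avg = mean(sample.values())
    return {compound: area / avg for compound, area in sample.items()}


def build_as_signature(person_samples, n_compounds):
    """Select the compounds whose normalized areas vary least for one person.

    person_samples: non-empty list of dicts (compound name -> area).
    Returns a list of (compound, min_value, max_value).
    """
    normalized = [normalize_areas(s) for s in person_samples]
    common = set.intersection(*(set(s) for s in normalized))
    scored = []
    for compound in sorted(common):
        values = [s[compound] for s in normalized]
        scored.append((pvariance(values), compound, min(values), max(values)))
    scored.sort()  # lowest variance first
    return [(c, lo, hi) for _, c, lo, hi in scored[:n_compounds]]
```

Because every area is divided by the sample mean, two samples of different overall intensity but identical composition yield the same normalized values.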
3.2.4. Comparisons of Samples
A diagram of the comparisons of the test samples with the signatures is shown in Figure 3.
Once all signatures have been created, the identification process can begin. Every unknown sample is compared with every signature: every pair of compounds from the signature is compared with the corresponding compounds in the test sample. The number of successful comparisons is stored in the “person_compare” table. After all the comparisons are complete, the identification can start.
The comparison procedure takes three values into account:
- The total number of pairs of compounds that are both in the signature and in the unknown sample;
- The total number of pairs of compounds from the unknown sample that have a ratio between the min. and max. values of the same compounds stored in the signature;
- The percentage of successful comparisons.
Then, all the comparisons for the unknown sample are ordered according to these criteria, and the top-ranked one is reported as the identification result.
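The three values and the final ranking can be sketched in Python for the AA case. The article does not state the priority of the three criteria, so the ordering used here (number of in-range pairs, then percentage, then number of shared pairs) is an assumption, as are the function names `compare_with_aa_signature` and `identify`.

```python
def compare_with_aa_signature(test_sample, signature):
    """Compare one unknown sample with one AA signature.

    test_sample: dict compound name -> positive area.
    signature: list of (compound_a, compound_b, min_ratio, max_ratio).
    Returns (shared_pairs, in_range_pairs, percent_in_range).
    """
    shared = hits = 0
    for a, b, lo, hi in signature:
        if a in test_sample and b in test_sample:
            shared += 1
            if lo <= test_sample[a] / test_sample[b] <= hi:
                hits += 1
    percent = 100.0 * hits / shared if shared else 0.0
    return shared, hits, percent


def identify(test_sample, signatures):
    """Return the person whose signature ranks first for the unknown sample.

    signatures: dict person -> AA signature.
    The sort key below is an assumed ordering of the three criteria.
    """
    results = {
        person: compare_with_aa_signature(test_sample, sig)
        for person, sig in signatures.items()
    }
    return max(results, key=lambda p: (results[p][1], results[p][2], results[p][0]))
```

A different priority among the criteria only requires changing the tuple in the `key` function.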
The subject of this study was the comparison of the unknown test samples with the AA and AS signatures. For each test sample, the area ratios and the normalized areas were calculated, respectively. Afterwards, the corresponding values from the unknown test samples were checked for whether they fall between the minimum and maximum values of the same compounds in all signatures.
For the AA and AS signatures, the total number of compounds present in both the test sample and the signature was checked, as well as the number of area-ratio and normalized-area values that were within the range of the minimum and maximum values stored in the signatures for the same compounds.
The evaluation was performed repeatedly with an increasing number of samples from which the signatures were created. In addition, the total number of these ratios in the signatures was also increased (4000, 6000, 8000, and 10,000), and the success percentages of the individual identifications were computed (see Table 1 and Table 2).
4. Results
The number of scent samples (and corresponding two-dimensional chromatograms) from which the digital scent signatures were constructed, and the number of pairs of compounds included in the comparisons, affected the success of identifying the test samples against the prepared signatures. The best results were obtained for digitized scent signatures created from seven samples and 6000 (or 8000 in the second experiment) compound area ratios. The values in Table 1 indicate the percentages of successful identifications.
To interpret Table 1: for example, a percentage value of 92.3 means that about 3 of the 40 test samples were evaluated incorrectly. As the number of samples in the signature increased beyond seven and the number of pairs of compounds in a signature exceeded 6000, the percentage of correct identifications started to decrease. This may be due to inhomogeneity between the samples from different systems (system_1, system_2, and system_3).
4.1. Comparisons for 400 Samples
The results of the comparisons for 400 samples marked as “system_1”, “system_2”, and “system_3” are shown in Table 1 and in graphic form in Figure 4.
4.2. Comparisons for 181 Samples
Subsequently, tests were performed on a subset of the input data consisting of samples labeled as “system_1”, that is, samples measured before the chromatograph configuration was changed (primary or secondary column). Persons who had at least seven samples of system_1 were included in the processing. There were 24 persons and a total of 181 samples.
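The subset selection can be expressed as a short Python sketch. The record layout (person, system label, sample identifier) and the function name `select_persons` are assumptions made for illustration; the actual selection was done in the database.

```python
def select_persons(samples, system="system_1", min_samples=7):
    """Keep persons with at least min_samples samples of the given system.

    samples: iterable of (person_id, system_label, sample_id) tuples.
    Returns dict person_id -> list of sample_ids from that system.
    """
    by_person = {}
    for person, label, sample_id in samples:
        if label == system:
            by_person.setdefault(person, []).append(sample_id)
    return {p: ids for p, ids in by_person.items() if len(ids) >= min_samples}
```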
The results are summarized in Table 2. The percentage of correct identifications increased to up to 95%, so the homogeneity of the samples may be essential for the identification.
The results of the comparisons for the 181 samples, all labeled as “system_1”, are given in Table 2 and in graphic form in Figure 5.
Each field in the table summarizes 10 identification processes. The percentage value in Table 2 is the average of the identification success values. The accuracy and precision of those average percentage values are given for two values.
Obtaining these apparently simple results in Table 1 and Table 2 took several thousand computer hours on the server (AMD Ryzen 9 5950X 16-Core Processor, 3.40 GHz, 128 GByte RAM, 6 × 585 GByte) and about a thousand lines of source code.
Two different algorithms were tested to create digital scent signatures for individual identification. The first algorithm works with the area ratios, while the second one works with the relative concentrations (normalized areas) of the compounds identified in the chromatograms. The probability of correct identification increases with the number of chromatograms used to create the signatures and with the number of area ratio pairs of compounds in the comparisons (see Table 1 and Table 2), and reaches up to 95% success for samples measured under equivalent conditions.
5. Discussion
The olfactronic approach can be considered objective. Furthermore, it is a repeatable process, and it is possible to determine the accuracy and precision of the results. Last but not least, it is possible to store and compare digital samples in practically unlimited quantities. If a uniform procedure for determining scent signatures is found, then it will be possible to use the created signatures on an international scale as well.
Using the two differently constructed signatures made it possible to significantly increase the percentage of correct identification. The individual identification of persons can be the first step in forensic olfactronics, which can be followed by class identifications, e.g., gender, ethnic origin, blood groups, illnesses, psychiatric conditions, etc.
For further studies, it is essential to specify compounds in the signatures. In addition, the mutual compatibility of samples must be studied, working with samples that have different numbers of compounds and comparing strong and weak samples.
For the further development and validation of this method of identification, more detailed validation data are needed that demonstrate its consistency across various populations and environmental conditions. Real-world applications must first confirm the above conclusions in practice.
For forensic purposes and in law enforcement, it is necessary to process hundreds of persons and thousands of scent samples. The database of digital scent signatures must be built worldwide in cooperation with workplaces and laboratories that deal with scent samples and person identification.
6. Conclusions
The results presented in this paper clearly show that the computerized identification of people based on their digitized scent signatures prepared by GC×GC-MS is a very promising method.
The issue of the purity of samples must be taken into account. The samples processed in the current experiments are very clean, practically without contaminants. But in real-world situations, the scent samples are contaminated with another person’s scent.
This preliminary study has some limitations, and several topics need to be elaborated on. Above all, it concerns the mentioned compatibility of samples. Moreover, it is necessary to be able to compare data from different chromatographs and scent samples with different intensities and qualities.
This article uses results obtained with the ChromaTOF software, which uses NIST library version 20. However, the assignment of the compound name from the NIST database is not 100% reliable, especially in the case of homologues, where the given compound name may occur more than once with different retention times.
The identification algorithm can be improved in several ways. Two types of signatures are currently used, but scent samples probably contain more types. The multiplicity of scent signatures has also been demonstrated by the canine identification of persons from scent fractions.
In the current experiments, the unknown sample was compared with all the signatures prepared from the same number of known samples. But this cannot be guaranteed in a real environment for real identifications.