A Data-Challenge Case Study of Analyte Detection and Identification with Comprehensive Two-Dimensional Gas Chromatography with Mass Spectrometry (GC×GC-MS)
Abstract
1. Introduction
2. Data Quality and Preprocessing
2.1. Modulation-Cycle Phase Roll
2.2. Detector Baseline Correction
2.3. Detector Saturation
2.4. Detector Oscillation and Logarithmic Value Mapping
3. Analyte Detection
3.1. Blob Detection
- To deal with the detector saturation and oscillating noise described in Section 2.3 and Section 2.4, the 2D blur parameter was set to 4.3 datapoints, which is 21.5 msec;
- To increase sensitivity for detecting faint peaks, the minimum peak threshold was set to 7 (times the estimated noise standard deviation).
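The two settings above reduce to simple arithmetic, sketched here in Python. This is only an illustration under stated assumptions: the helper names are hypothetical, the 5 msec sampling interval is inferred from "4.3 datapoints, which is 21.5 msec" rather than stated, and the threshold is treated as inclusive. It does not reproduce the blob-detection software itself.

```python
# Hypothetical helpers illustrating the two detection settings;
# this is not the actual blob-detection implementation.

# Inferred (not stated in the text): 21.5 msec / 4.3 datapoints = 5 msec per datapoint.
SAMPLE_INTERVAL_MSEC = 21.5 / 4.3


def blur_width_msec(blur_datapoints: float, sample_interval_msec: float) -> float:
    """Convert a blur width given in datapoints to milliseconds."""
    return blur_datapoints * sample_interval_msec


def exceeds_peak_threshold(peak_value: float, noise_sd: float, factor: float = 7.0) -> bool:
    """Return True if a baseline-corrected peak value is at least
    `factor` times the estimated noise standard deviation
    (inclusive comparison is an assumption)."""
    return peak_value >= factor * noise_sd
```

For example, with an estimated noise standard deviation of 10 detector units, a peak value of 70 would meet the threshold and a value of 69.9 would not.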
3.2. Blob Filtering
3.3. True and False Blob Recognition
3.4. Interactive Blob Review and Editing
3.5. Ion-Peaks Detection
4. Analyte Identification
4.1. MS Search Optimization
4.1.1. Experimental Variables
4.1.2. Experimental Results
4.1.3. Maximum Performance
4.2. MS Search
- The library compound name and CAS number for the MS search #1 hit for each analyte, i.e., the preliminary identification, subject to confirmation by the analyst;
- The library search results for the #1 hit, including DMF, RMF, probability, I, and base peak.
4.3. Retention Index Calibration
4.4. Expressions for Checking Compound Identifications
- DMF. This check requires that the DMF of the analyte spectrum with the library compound spectrum meet or exceed a specified threshold, e.g.: IF(Library_Match_Factor >= 850, “PASS”, “FAIL”);
- Base Peak (BPk). This check requires that the base peak of the analyte spectrum is the same as the base peak of the library compound: IF(MASSRANK(1) = Library_Base_Peak, “PASS”, “FAIL”);
- RI. This check requires that the difference between the computed analyte IT and the I of the library compound is less than a specified threshold, e.g.: IF(ABS(LRI_I-LibraryRI) < 15, “PASS”, “FAIL”). This tolerance is somewhat large, but is justified by the preliminary purpose, the lack of details about the chromatographic conditions, and the pressure drop across the 1D column due to the outlet restriction generated by the modulator.
- For Analyte 8, the NIST 17 Replicates Library (replib) has a good match for “3-Carene” aka Δ-3-Carene (DMF = 905, RMF = 924), which has a passing I = 1011 (compared to 1010 for the computed IT).
- For Analyte 18, the #2 hit with mainlib (DMF = 906, RMF = 907) is “β-Ocimene” (unspecified isomer), which has a passing I = 1037 (compared to 1029 for the computed IT). However, the preliminary identification of Analyte 13 was “β-Ocimene”. Further examination suggests Analyte 13 matches “1,3,7-Octatriene, 3,7-dimethyl-” aka α-Ocimene (mainlib, DMF = 925, RMF = 925), with I = 1047, or “trans-β-Ocimene” (replib, DMF = 913, RMF = 914), with I = 1049; then, Analyte 18 matches “β-Ocimene” as above or “1,3,6-Octatriene, 3,7-dimethyl-, (Z)-” aka cis-β-Ocimene (replib, DMF = 904, RMF = 905), with I = 1038. Note that Analyte 13 was used in the preliminary IT calibration, so this identification change affects the IT model. Of course, it is preferable to analyze known standards for surer identification and IT calibration. The identifications are validated by the components of Cannabis Terpenes Standard #1 from Restek (Bellefonte, PA, USA) [16], the source sample for the chromatogram, which lists “Ocimene” (CAS 13877-91-3, β-Ocimene unspecified isomer). The 1D chromatogram for the standard supplied by Restek shows two Ocimene peaks, presumably trans-β-Ocimene (here, Analyte 13) and cis-β-Ocimene (here, Analyte 18).
- Analyte 21 is identified as “1,6,10-Dodecatrien-3-ol, 3,7,11-trimethyl-” (Nerolidol, unspecified isomer). In replib, there is a better match (DMF = 949, RMF = 954) with “Nerolidol” aka D-Nerolidol, which has a passing I = 1544 (compared to 1530 for the computed IT). It is notable that the NIST 17 record for D-Nerolidol lists I = 1544 ± 16, which is an exceptionally large range. Restek [16] lists Nerolidol (CAS 7212-44-4, unspecified isomer) and shows two Nerolidol peaks in the 1D chromatogram, corresponding here to Analytes 21 and 19.
- For Analyte 3, identified as “Linalool”, there are no clear alternatives with passing I and the same base peak among the top hits for the NIST libraries. However, the Wiley Registry of Mass Spectral Data (7th Edition) [17] lists 25 entries for linalool, of which 17 have base peak 71, 3 have base peak 93, and 1 has base peak 69. So, this Base Peak test failure could be due to variable EI conditions and/or mass analyzer. The identity is validated by the Restek documentation;
- For Analyte 6, one of the spectra in replib for “Isopulegol”, with a passing I = 1146 (compared to 1153 for the computed IT), is an even better match (DMF = 935, RMF = 942) than the #1 hit in mainlib, “(1R,2R,5S)-5-Methyl-2-(prop-1-en-2-yl)cyclohexanol” aka Neoisopulegol. In mainlib, the spectrum for “Isopulegol” was the #4 hit. The identity, Isopulegol, is validated by the Restek documentation;
- For Analyte 10, one of the spectra in replib for the #1 hit from mainlib, “Caryophyllene” aka trans-β-Caryophyllene, has a matching base peak of 41 (DMF = 921, RMF = 930, Prob = 29). Here also, the spectra for the #1 hit are variable enough to account for the failure of the Base Peak test. The identity is validated by the Restek documentation;
- For Analyte 14, replib has a spectrum for “1,3-Cyclohexadiene, 1-methyl-4-(1-methylethyl)-” aka α-Terpinene in which the peak for 93 is nearly as large as the base peak of 121, with intensity 906 on a scale of 999. Again, this failure may be due to somewhat different fragmentation. The identity is validated by the Restek documentation;
- For Analyte 19, replib has a spectrum for the #1 hit “1,6,10-Dodecatrien-3-ol, 3,7,11-trimethyl-, (E)-” aka E-Nerolidol that is a better match (DMF = 939, RMF = 946, Prob = 37) and has a matching base peak of 41. Another NIST library entry, “1,6,10-Dodecatrien-3-ol, 3,7,11-trimethyl-” aka Nerolidol (unspecified isomer), has the same I and base peaks of 41 or 69 in replicate spectra. As described above, Restek [16] shows two Nerolidol peaks in the 1D chromatogram, corresponding here to Analytes 21 and 19;
- For Analyte 20, replicate spectra for “Geraniol” have base peaks of 41 or 69. The identity is validated by the Restek documentation.
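The three checks in Section 4.4 (DMF, Base Peak, and RI) can be sketched in Python as follows. This is a minimal illustration, not the expression language used in the article: the function names are hypothetical, the thresholds 850 and 15 are taken from the example expressions, and returning None when the library lists no I value is an assumption based on the blank RI cells in the results table.

```python
from typing import Optional

DMF_THRESHOLD = 850   # from the example DMF expression
RI_TOLERANCE = 15     # from the example RI expression


def check_dmf(dmf: int, threshold: int = DMF_THRESHOLD) -> str:
    """PASS if the direct match factor meets or exceeds the threshold."""
    return "PASS" if dmf >= threshold else "FAIL"


def check_base_peak(analyte_base_peak: int, library_base_peak: int) -> str:
    """PASS if the analyte and library spectra share the same base peak (m/z)."""
    return "PASS" if analyte_base_peak == library_base_peak else "FAIL"


def check_ri(analyte_ri: float, library_ri: Optional[float],
             tolerance: float = RI_TOLERANCE) -> Optional[str]:
    """PASS if the computed and library retention indices differ by less
    than the tolerance; None if the library has no retention index
    (assumed handling for missing values)."""
    if library_ri is None:
        return None
    return "PASS" if abs(analyte_ri - library_ri) < tolerance else "FAIL"


# Values for Analytes 3 and 8 as listed in the results table.
print(check_dmf(929), check_base_peak(43, 71), check_ri(1086, 1086))  # PASS FAIL PASS
print(check_dmf(921), check_base_peak(93, 93), check_ri(1003, 933))   # PASS PASS FAIL
```

A failed check flags the preliminary identification for analyst review, as in the cases discussed above; it does not by itself reject the identification.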
5. Discussion and Conclusions
Supplementary Materials
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Reichenbach, S.E. Data Acquisition, Visualization, and Analysis. In Comprehensive Two-Dimensional Gas Chromatography; Ramos, L., Ed.; Elsevier Science: Oxford, UK, 2009; Chapter 4; pp. 77–106. ISBN 9780444532374.
- Izadmanesh, Y.; Garreta-Lara, E.; Ghasemi, J.B.; Lacorte, S.; Matamoros, V.; Tauler, R. Chemometric analysis of comprehensive two dimensional gas chromatography–mass spectrometry metabolomics data. J. Chromatogr. A 2017, 1488, 113–125.
- Titaley, I.A.; Ogba, O.M.; Chibwe, L.; Hoh, E.; Cheong, P.H.-Y.; Simonich, S.L.M. Automating data analysis for two-dimensional gas chromatography/time-of-flight mass spectrometry non-targeted analysis of comparative samples. J. Chromatogr. A 2018, 1541, 57–62.
- Higgins Keppler, E.A.; Jenkins, C.L.; Davis, T.J.; Bean, H.D. Advances in the application of comprehensive two-dimensional gas chromatography in metabolomics. TrAC Trends Anal. Chem. 2018, 109, 275–286.
- Ieda, T.; Hashimoto, S.; Isobe, T.; Kunisue, T.; Tanabe, S. Evaluation of a data-processing method for target and non-target screening using comprehensive two-dimensional gas chromatography coupled with high-resolution time-of-flight mass spectrometry for environmental samples. Talanta 2019, 194, 461–468.
- Harynuk, J.; Franchina, F. GC×GC Data Challenge. In Proceedings of the 10th Multidimensional Chromatography Workshop, Liege, Belgium, 2019.
- ASTM Standard E2077-00, 2016, Standard Specification for Analytical Data Interchange Protocol for Mass Spectrometric Data; ASTM International: West Conshohocken, PA, USA, 2016.
- Reichenbach, S.E.; Tao, Q. GC Image Users’ Guide; V2.8r3; GC Image, LLC: Lincoln, NE, USA, 2019.
- Reichenbach, S.E.; Ni, M.; Kottapalli, V.; Visvanathan, A. Information technologies for comprehensive two-dimensional gas chromatography. Chemom. Intell. Lab. Syst. 2004, 71, 107–120.
- Reichenbach, S.E.; Ni, M.; Zhang, D.; Ledford, E.B., Jr. Image background removal in comprehensive two-dimensional gas chromatography. J. Chromatogr. A 2003, 985, 47–56.
- Latha, I.; Reichenbach, S.E.; Tao, Q. Comparative analysis of peak-detection techniques for comprehensive two-dimensional chromatography. J. Chromatogr. A 2011, 1218, 6792–6798.
- Reichenbach, S.E.; Kottapalli, V.; Ni, M.; Visvanathan, A. Computer language for identifying chemicals with comprehensive two-dimensional gas chromatography and mass spectrometry. J. Chromatogr. A 2005, 1071, 263–269.
- NIST/EPA/NIH Mass Spectral Library with Search Program 2017; Data ver. 17, Software ver. 2.3; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2017.
- Stein, S.E.; Wallace, W. NIST Mass Spectral Search Program User’s Guide; Ver. 2.3; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2017.
- Bicchi, C.; Binello, A.; D’Amato, A.; Rubiolo, P. Reliability of Van den Dool retention indices in the analysis of essential oils. J. Chromatogr. Sci. 1999, 37, 288–294.
- Restek. Cannabis Terpenes Standard #1. Available online: https://www.restek.com/catalog/view/45361 (accessed on 25 June 2019).
- McLafferty, F.W. Wiley Registry of Mass Spectral Data, 8th ed.; Wiley: Hoboken, NJ, USA, 2005; ISBN 978-0470047859.
ID | Compound Name (Library Search #1 Hit) | CAS# | Blob %Rsp | Blob 1tR | Blob 2tR | Blob IT | Blob BPk | Library DMF | Library RMF | Library Prob | Library I | Library BPk | Test DMF | Test BPk | Test RI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | β-Myrcene | 123-35-3 | 5.38 | 14.33 | 0.70 | 983 | 41 | 907 | 907 | 56 | 983 | 41 | PASS | PASS | PASS |
2 | D-Limonene | 5989-27-5 | 5.59 | 16.17 | 0.70 | 1028 | 68 | 890 | 890 | 15 | | 68 | PASS | PASS | |
3 | Linalool | 78-70-6 | 4.99 | 19.21 | 0.92 | 1086 | 43 | 929 | 929 | 73 | 1086 | 71 | PASS | FAIL | PASS |
4 | β-Pinene | 127-91-3 | 5.41 | 13.92 | 0.75 | 973 | 93 | 944 | 945 | 50 | 973 | 93 | PASS | PASS | PASS |
5 | Camphene | 79-92-5 | 5.86 | 12.75 | 0.72 | 946 | 93 | 947 | 962 | 38 | 946 | 93 | PASS | PASS | PASS |
6 | (1R,2R,5S)-5-Methyl-2-(prop-1-en-2-yl)cyclohexanol | 29141-10-4 | 4.39 | 21.42 | 1.13 | 1139 | 41 | 930 | 931 | 45 | | 69 | PASS | FAIL | |
7 | Humulene | 6753-98-6 | 4.95 | 33.63 | 0.84 | 1451 | 93 | 886 | 888 | 49 | 1451 | 93 | PASS | PASS | PASS |
8 | α-Pinene | 80-56-8 | 5.13 | 15.29 | 0.70 | 1003 | 93 | 921 | 929 | 16 | 933 | 93 | PASS | PASS | FAIL |
9 | α-Pinene | 80-56-8 | 5.30 | 12.00 | 0.70 | 933 | 93 | 945 | 952 | 33 | 933 | 93 | PASS | PASS | PASS |
10 | Caryophyllene | 87-44-5 | 4.52 | 32.33 | 0.84 | 1419 | 41 | 926 | 926 | 36 | 1419 | 93 | PASS | FAIL | PASS |
11 | Cyclohexene, 1-methyl-4-(1-methylethylidene)- | 586-62-9 | 4.78 | 18.67 | 0.68 | 1079 | 93 | 916 | 923 | 15 | 1079 | 93 | PASS | PASS | PASS |
12 | γ-Terpinene | 99-85-4 | 5.43 | 17.46 | 0.69 | 1050 | 93 | 876 | 878 | 12 | 1050 | 93 | PASS | PASS | PASS |
13 | β-Ocimene | 13877-91-3 | 4.31 | 16.88 | 0.73 | 1037 | 93 | 938 | 938 | 35 | 1037 | 93 | PASS | PASS | PASS |
14 | 1,3-Cyclohexadiene, 1-methyl-4-(1-methylethyl)- | 99-86-5 | 4.93 | 15.63 | 0.74 | 1010 | 93 | 907 | 916 | 19 | 1010 | 121 | PASS | FAIL | PASS |
15 | α-Bisabolol | 515-69-5 | 4.49 | 41.25 | 0.85 | 1668 | 43 | 929 | 944 | 75 | 1668 | 43 | PASS | PASS | PASS |
16 | o-Cymene | 527-84-4 | 5.02 | 15.96 | 0.79 | 1025 | 119 | 944 | 958 | 63 | 1025 | 119 | PASS | PASS | PASS |
17 | 5-Azulenemethanol, 1,2,3,4,5,6,7,8-octahydro-α,α,3,8-tetramethyl- | 13822-35-0 | 4.10 | 38.42 | 0.94 | 1588 | 59 | 893 | 904 | 30 | | 59 | PASS | PASS | |
18 | α-Pinene | 80-56-8 | 2.52 | 16.38 | 0.69 | 1030 | 93 | 913 | 920 | 19 | 933 | 93 | PASS | PASS | FAIL |
19 | 1,6,10-Dodecatrien-3-ol, 3,7,11-trimethyl-, (E)- | 40716-66-3 | 2.43 | 37.04 | 0.82 | 1549 | 41 | 923 | 925 | 39 | 1549 | 69 | PASS | FAIL | PASS |
20 | Geraniol | 106-24-1 | 3.19 | 25.54 | 0.99 | 1237 | 41 | 922 | 922 | 67 | 1237 | 69 | PASS | FAIL | PASS |
21 | 1,6,10-Dodecatrien-3-ol, 3,7,11-trimethyl- | 7212-44-4 | 1.46 | 36.00 | 0.82 | 1519 | 41 | 939 | 953 | 61 | 1551 | 41 | PASS | PASS | FAIL |
Abbr. | Description | Options |
---|---|---|
Search | NIST Identity Search Presearch | [None, Quick, Normal] |
Source | Source of Mass Spectrum | [Apex, Blob] |
Integr. | Percent of Apex for Integration (Blob Source Type only) | [Number 0–100] |
Subtr. | Background Subtraction | [None, Start, Start & End] |
Thresh. | MS Peak Intensity Threshold | [Non-negative number] |
Abbr. | Description | Range |
---|---|---|
DMF All | Direct Match Factor for the #1 hit of each analyte, averaged over all analytes | [0–999] |
DMF Top | Direct Match Factor for the #1 hit of each analyte, averaged over the top analytes | [0–999] |
RMF All | Reverse Match Factor for the #1 hit of each analyte, averaged over all analytes | [0–999] |
RMF Top | Reverse Match Factor for the #1 hit of each analyte, averaged over the top analytes | [0–999] |
Prob. All | Probability for the #1 hit of each analyte, averaged over all analytes | [0–100] |
Prob. Top | Probability for the #1 hit of each analyte, averaged over the top analytes | [0–100] |
AMF All | Average of DMF and RMF for the #1 hit of each analyte, averaged over all analytes | [0–999] |
AMF Top | Average of DMF and RMF for the #1 hit of each analyte, averaged over the top analytes | [0–999] |
R² All | R² for the linear fit of (1tR, I) for the #1 hits of all analytes | [0–1] |
R² Top | R² for the linear fit of (1tR, I) for the #1 hits of the top analytes | [0–1] |
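Two of the metrics defined above, AMF and R², are derived quantities. A minimal Python sketch of how they might be computed is given below; the function names are hypothetical, and ordinary least squares is assumed for the linear fit of (1tR, I) because the fitting method is not specified here.

```python
from typing import Sequence


def amf(dmf: float, rmf: float) -> float:
    """Average of the direct and reverse match factors for one hit."""
    return (dmf + rmf) / 2.0


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination for an ordinary least-squares line
    y = intercept + slope * x (assumed fitting method)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

Here `x` would hold the first-dimension retention times (1tR) and `y` the library I values of the #1 hits. A low R² suggests that some #1 hits have retention indices inconsistent with their elution order, which is why it serves as a proxy for identification quality.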
Row | Src. | Intgr. | Subtr. | Thr. | Search | All DMF | All RMF | All Prob. | All AMF | All R² | Top DMF | Top RMF | Top Prob. | Top AMF | Top R² |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Apex | 0 | None | 0 | None | 775 | 818 | 21% | 797 | 0.80 | 923 | 928 | 39% | 926 | 0.99 |
2 | Apex | 0 | None | 0 | Quick | 772 | 820 | 25% | 796 | 0.75 | 923 | 928 | 40% | 926 | 0.99 |
3 | Apex | 0 | None | 0 | Norm | 774 | 818 | 25% | 796 | 0.78 | 923 | 928 | 40% | 926 | 0.99 |
4 | Blob | 0 | None | 0 | None | 785 | 835 | 18% | 810 | 0.72 | 914 | 922 | 38% | 918 | 0.99 |
5 | Blob | 0 | None | 0 | Quick | 783 | 834 | 23% | 809 | 0.68 | 914 | 922 | 39% | 918 | 0.99 |
6 | Blob | 0 | None | 0 | Norm | 784 | 835 | 23% | 809 | 0.72 | 914 | 922 | 40% | 918 | 0.99 |
21 | Apex | 0 | None | 27 | Norm | 791 | 831 | 27% | 811 | 0.92 | 923 | 928 | 40% | 926 | 0.99 |
43 | Blob | 0 | None | 700 | Norm | 808 | 840 | 26% | 824 | 0.86 | 914 | 922 | 40% | 918 | 0.99 |
51 | Apex | 0 | Start | 0 | Norm | 759 | 803 | 25% | 781 | 0.69 | 924 | 928 | 41% | 926 | 0.99 |
61 | Blob | 0 | Start | 0 | Norm | 753 | 795 | 24% | 774 | 0.66 | 913 | 920 | 39% | 917 | 0.99 |
72 | Apex | 0 | S&E | 0 | Norm | 761 | 801 | 25% | 781 | 0.57 | 923 | 929 | 40% | 926 | 0.99 |
82 | Blob | 0 | S&E | 0 | Norm | 765 | 814 | 24% | 789 | 0.70 | 915 | 921 | 39% | 918 | 0.99 |
154 | Blob | 45 | None | 250 | Norm | 826 | 851 | 30% | 838 | 0.95 | 920 | 926 | 40% | 923 | 0.99 |
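As a cross-check on the table above, the following Python sketch finds the best-scoring row for each of the "All" metrics. The dictionary simply transcribes the table values; the function name is hypothetical, and treating "highest value wins" as the selection rule is an assumption for illustration, not a description of how the settings were actually chosen.

```python
from typing import Dict, Tuple

# "All" results transcribed from the table: row -> (DMF, RMF, Prob %, AMF, R2).
RESULTS_ALL: Dict[int, Tuple[float, float, float, float, float]] = {
    1: (775, 818, 21, 797, 0.80),
    2: (772, 820, 25, 796, 0.75),
    3: (774, 818, 25, 796, 0.78),
    4: (785, 835, 18, 810, 0.72),
    5: (783, 834, 23, 809, 0.68),
    6: (784, 835, 23, 809, 0.72),
    21: (791, 831, 27, 811, 0.92),
    43: (808, 840, 26, 824, 0.86),
    51: (759, 803, 25, 781, 0.69),
    61: (753, 795, 24, 774, 0.66),
    72: (761, 801, 25, 781, 0.57),
    82: (765, 814, 24, 789, 0.70),
    154: (826, 851, 30, 838, 0.95),
}

METRICS = ("DMF", "RMF", "Prob", "AMF", "R2")


def best_row_per_metric(results: Dict[int, Tuple[float, ...]]) -> Dict[str, int]:
    """For each metric, return the row number with the highest value."""
    return {
        name: max(results, key=lambda row: results[row][i])
        for i, name in enumerate(METRICS)
    }


print(best_row_per_metric(RESULTS_ALL))
# {'DMF': 154, 'RMF': 154, 'Prob': 154, 'AMF': 154, 'R2': 154}
```

Among the rows shown, row 154 (Blob source, 45% integration, no background subtraction, threshold 250, Normal presearch) is highest on every "All" metric, while the "Top" metrics vary little across settings.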
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Reichenbach, S.E.; Tao, Q.; Cordero, C.; Bicchi, C. A Data-Challenge Case Study of Analyte Detection and Identification with Comprehensive Two-Dimensional Gas Chromatography with Mass Spectrometry (GC×GC-MS). Separations 2019, 6, 38. https://doi.org/10.3390/separations6030038