1. Introduction
Knowledge of spectroscopic parameters for transitions between energy levels in atoms and molecules is essential for interpreting and modeling the interaction of radiation (light) with different media. To aid researchers, spectroscopic parameters are compiled into reference databases. In particular, the HITRAN (high-resolution transmission) molecular spectroscopic database [1] is a compilation of spectroscopic parameters used to simulate and analyze the transmission and emission of light in gaseous media, with an emphasis on planetary atmospheres. For half a century HITRAN has been regarded as an international standard that provides one recommended value per parameter for millions of transitions of different molecules. HITRAN employs both experimental and theoretical data, which are gathered and processed from articles, books, proceedings, databases, theses, reports, presentations, unpublished data, papers in preparation and private communications. Beginning with the HITRAN1986 edition [2], HITRAN started to provide reference mapping for the line positions, transition intensities and air-broadening coefficients. Starting with the HITRAN2004 edition [3], the majority of parameters in HITRAN have had complete reference attributions. The current edition of HITRAN [1] contains references for dozens of parameters per transition. It is imperative that all of these contributions to HITRAN be acknowledged by properly referencing the cited material in the HITRAN database. Referencing gives users the option to read more about how the parameters were determined, acknowledges the contributing papers and enables users of the database to cite them. Furthermore, it assists the managers of the database in maintaining and updating complex segments of the database.
While HITRAN is unique in providing references for the majority of the spectral parameters of every transition, other reference molecular spectroscopic databases exist, and they adopt different approaches to citing original references. One such database is GEISA [4], which contains many of the same parameters as HITRAN. GEISA does not provide references for individual parameters and usually provides a reference index for the entire transition; however, the mapping for this index is available only to the managers of that database for maintenance purposes. Two further databases, NASA’s Jet Propulsion Laboratory (JPL) microwave catalogue [5] and the Cologne Database for Molecular Spectroscopy (CDMS) [6], concentrate mostly on spectroscopic parameters in the microwave region for molecules of astronomical interest. While individual parameters or transitions do not have references in these databases, a supplementary bibliography is provided for each line list. This bibliographic information lists references to the reported data that were used in the databases either directly or as input to the global model used for the calculation. A similar approach has been adopted by the databases of extensive ab initio line lists, including ExoMol [7] and TheoReTs [8].
In this paper we describe a new, automated referencing system that provides consistent, accurate and detailed bibliographies for every source of data on the website. Starting from the HITRAN2016 edition [1], the data are distributed through HITRANonline (https://hitran.org), which is built on a relational database model described in Hill et al. [9]. The relational database approach removes the constraints of fixed-width text fields for the storage of parameters and allows an arbitrary number of parameters to be stored and retrieved for each transition, along with their uncertainties and bibliographic references. Each data set returned by a search is accompanied by a bibliography, and each data file provides citations, hyperlinks and notes to the original data sources to make it easier for users to credit data providers. Bibliographies can be exported in several formats, including HTML, plain text and BibTeX.
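To make the relational idea concrete, the following minimal Python sketch gives each stored parameter value its own uncertainty and a foreign key to a source table, and assembles the bibliography for a query result from the distinct sources it cites. The class names, field names and placeholder data are hypothetical; this is not the HITRANonline schema of Hill et al. [9].

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    """A bibliographic source (hypothetical schema, for illustration only)."""
    source_id: int   # "global" integer ID shown in bibliographies
    citation: str    # pre-formatted citation text


@dataclass(frozen=True)
class ParameterValue:
    """One value of one parameter for one transition, with its provenance."""
    transition_id: int
    name: str                     # parameter name, e.g. "nu" or "sw"
    value: float
    uncertainty: Optional[float]  # None when no uncertainty is stored
    source_id: int                # foreign key into the Source table


def bibliography_for(results: list[ParameterValue],
                     sources: dict[int, Source]) -> list[Source]:
    """Return each distinct source cited by a query result, ordered by ID."""
    cited_ids = {p.source_id for p in results}
    return [sources[i] for i in sorted(cited_ids)]


# Usage with invented placeholder data:
sources = {1: Source(1, "A. Author (2001)"), 7: Source(7, "B. Author (2007)")}
results = [
    ParameterValue(10, "nu", 1000.0, 1e-6, 7),
    ParameterValue(10, "sw", 1e-20, None, 1),
    ParameterValue(11, "nu", 1001.0, 1e-6, 7),
]
for src in bibliography_for(results, sources):
    print(src.source_id, src.citation)
```

Because references are attached to individual values rather than to whole records, the bibliography for any subset of results reduces to collecting the distinct foreign keys.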
Previously, all of the information on contributions to HITRAN was entered into the database manually, a method that is error-prone and time-consuming. The purpose of this work is to create a convenient bibliographic system that enables contributors’ work to be cited easily. Users of this system need only enter a single line of information into the program, the DOI of the work, to obtain the complete bibliographic entry for the paper they wish to cite. This system was designed to prevent common mistakes and to enable faster updates to the reference system in HITRAN as well as in the Atomic and Molecular Bibliographic Data System (AMBDAS) database. We have previously reviewed existing data practices in molecular spectroscopy [10] and identified room for improvement that can be facilitated with specialized tools. The goal of this work is to foster an environment that promotes data sharing, provenance and good practice amongst researchers and databases.
The AMBDAS database is a collection of references to articles concerning collisional and spectroscopic processes in plasmas and plasma–material interaction data, with a particular focus on their application to nuclear fusion energy research. The database website, accessible from https://amdis.iaea.org/databases/, provides an interface for querying by collision species, process category and publication metadata. The bibliographic entries are maintained using the Python software described in this article, through established collaborations with the National Institute for Fusion Science (NIFS) in Japan, the National Fusion Research Institute (NFRI) in South Korea and the National Institute of Standards and Technology (NIST) in the United States of America, as well as through ad hoc arrangements with individual consultants. An intuitive, automated administration interface for importing data is an important means of minimizing errors, ambiguities and duplications within the AMBDAS database.
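One simple way an import step can catch duplicates is to compare DOIs after normalization, since the same DOI is often pasted as a bare string, with a `doi:` prefix or as a resolver URL, and DOI names are case-insensitive. The sketch below illustrates that idea under these assumptions; the function names are hypothetical and it is not presented as the AMBDAS import code.

```python
import re

# Prefixes under which the same DOI is often pasted (illustrative, not exhaustive).
_PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(doi: str) -> str:
    """Strip resolver prefixes and whitespace; DOIs are case-insensitive."""
    return _PREFIX.sub("", doi.strip()).lower()


def new_entries(existing: set[str], incoming: list[str]) -> list[str]:
    """Keep only incoming DOIs not already stored and not repeated in the batch."""
    seen = {normalize_doi(d) for d in existing}
    accepted = []
    for doi in incoming:
        key = normalize_doi(doi)
        if key not in seen:
            seen.add(key)
            accepted.append(key)
    return accepted


# Hypothetical batch: the first item duplicates a stored DOI, the last repeats the second.
print(new_entries({"10.1000/abc"},
                  ["https://doi.org/10.1000/ABC", "doi:10.1000/XYZ", "10.1000/xyz"]))
```

Only `10.1000/xyz` survives: the first item matches the stored DOI once its prefix and case are normalized, and the third repeats the second.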
The software tools presented here are described as applied to the HITRAN and AMBDAS databases, but they are self-contained and can be applied to other database services. The complete code for the new referencing system is provided on an open-source, free-of-charge basis under version 2 of the Apache Software Foundation License [11], along with detailed instructions and resources for customizing the output formats of citations. The hope is that, with sufficient availability and use, databases and researchers alike will be encouraged to share information between databases, in turn making their work more accessible to users. Enhancement of the HITRAN reference infrastructure will directly benefit the Virtual Atomic and Molecular Data Centre (VAMDC) [12] and its recommended citation practices [13,14]. VAMDC provides the infrastructure for disseminating atomic and molecular data from different databases through a single platform; for instance, one can simultaneously access data from the HITRAN, JPL and CDMS databases.
This paper does not describe in detail the HITRAN data itself, search methods, data access, graphing or other technical accessibility information. For these details, please refer to the papers describing the quadrennial HITRAN editions (the most recent being HITRAN2016 [1]) and to the article by Hill et al. [9] describing HITRANonline. The HITRANonline website has been available at https://hitran.org since May 2015. Registration, at https://hitran.org/register/ (also linked from the home page), is free and requires the user to provide only a name and email address.
Current Status of References in HITRAN
Changes to the referencing system in HITRAN are useful only if users know how to access this information when using HITRAN’s data access capabilities. This section therefore reviews in detail how to retrieve bibliographies from HITRAN. There are several sections of HITRAN in which the user may search for, graph, preview or download data. The corresponding bibliographies are made available in different ways, depending on the section the user is working in and on the specific information being retrieved.
First, in each bibliography the user will see a number assigned to every reference. This number is a unique “global” integer ID, which is referred to in the relevant data files and is recorded both for user accessibility and for administrative storage. Users of the legacy HITRAN 160-character .par format will find this “global” integer identification system convenient; the “per-molecule” identifying integers are retained and cross-referenced with their global equivalents when this default output format (.par) is selected. Alternatively, a custom output format may be created and used as described in the article by Hill et al. [9].
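The cross-referencing between per-molecule and global identifiers amounts to a two-way lookup table. The sketch below uses an invented table; the IDs, the function names and the key layout (molecule ID, per-molecule index) are assumptions for illustration only.

```python
# Hypothetical cross-reference table: (molecule ID, per-molecule reference index)
# -> global source ID. All values are invented; real HITRAN tables are not shown.
PER_MOLECULE_TO_GLOBAL: dict[tuple[int, int], int] = {
    (1, 5): 101,
    (2, 5): 230,  # the same local index denotes a different source for molecule 2
    (2, 8): 101,  # one global source can serve several molecules
}


def to_global(molecule_id: int, local_ref: int) -> int:
    """Translate a per-molecule reference index into its global ID."""
    try:
        return PER_MOLECULE_TO_GLOBAL[(molecule_id, local_ref)]
    except KeyError:
        raise KeyError(f"no global ID for molecule {molecule_id}, "
                       f"reference {local_ref}") from None


def to_local(global_id: int) -> list[tuple[int, int]]:
    """List every (molecule ID, per-molecule index) pair mapped to a global ID."""
    return sorted(key for key, gid in PER_MOLECULE_TO_GLOBAL.items()
                  if gid == global_id)


print(to_global(2, 5))
print(to_local(101))
```

The per-molecule index alone is ambiguous, which is why the sketch keys on the (molecule, index) pair.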
Some data in HITRAN are calculated from multiple references and sources; to give full credit to all contributors, HITRAN nests multiple references under a single bibliographic entry. The main bibliographic entry, under which the references are nested, is technically a “note”, while the nested references are complete bibliographic records stored in the database. Therefore, a note can be created and assigned to any data being referenced; the note then pulls in the complete bibliographic records of the papers from which the data set was generated. An example of this technique can be seen in Figure 1.
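A note can be modelled as an entry holding free text and a list of IDs of complete references; resolving the note pulls in those references. The following sketch, with hypothetical class and function names and invented placeholder records, shows that expansion.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    """Hypothetical note entry: explanatory text plus IDs of nested references."""
    text: str
    nested_ids: list[int] = field(default_factory=list)


def expand_entry(entry_id: int, notes: dict[int, Note],
                 references: dict[int, str]) -> list[str]:
    """Resolve one bibliography entry into printable lines.

    A plain reference yields its own citation; a note yields its text followed
    by the complete citation of every reference nested under it.
    """
    if entry_id in references:
        return [references[entry_id]]
    note = notes[entry_id]
    return [note.text] + [references[i] for i in note.nested_ids]


# Placeholder records, invented for illustration.
references = {1: "A. Author, J. Example 1 (2001) 1.",
              2: "B. Author, J. Example 2 (2002) 2."}
notes = {50: Note("Intensities combined from two sources.", [1, 2])}
for line in expand_entry(50, notes, references):
    print(line)
```

Storing only IDs in the note keeps a single copy of each full reference, so correcting a reference updates every note that nests it.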
HITRAN has several major sections that provide different types of spectroscopic data, the traditional one being the line-by-line (molecular transition) section. In the line-by-line section, when users retrieve their desired data, they see a query-results web page containing a “downloads” table. This table includes two bibliography files giving sources and notes relating to the returned data. One is an HTML file with links to the cited articles on their publishers’ websites and on the ADS database [15] (Figure 2). The other is a .bib file containing the same references in BibTeX format, for inclusion in a LaTeX document (Figure 3). If the query returns fewer than 1000 transitions, those transitions are also listed in an HTML table on the same results page (Figure 4). Hovering the mouse cursor over any parameter in this table brings up the bibliographic entry for that parameter, which contains links to the article and any relevant notes on the reference.
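The per-parameter hover can be approximated in plain HTML with the `title` attribute, which browsers show as a tooltip. The sketch below is an assumption about one way to do it, not a description of the HITRANonline front end; it also applies the 1000-transition threshold stated in the text before building the inline table. Function names are hypothetical.

```python
from html import escape
from typing import Optional

MAX_PREVIEW_ROWS = 1000  # results are previewed inline only below this many transitions


def parameter_cell(value: float, citation: str) -> str:
    """One table cell whose browser tooltip (title attribute) names its source."""
    return f'<td title="{escape(citation)}">{value:g}</td>'


def preview_table(rows: list[list[tuple[float, str]]]) -> Optional[str]:
    """HTML table of (value, citation) cells, or None if the result is too large."""
    if len(rows) >= MAX_PREVIEW_ROWS:
        return None
    body = "".join(
        "<tr>" + "".join(parameter_cell(v, c) for v, c in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


# One hypothetical transition with two parameters and their sources.
print(preview_table([[(1000.5, 'A. Author "2001"'), (1e-20, "B. Author 2002")]]))
```

`html.escape` quotes the citation text so that quotation marks in a reference cannot break the attribute.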
In the absorption cross section part of HITRANonline, a complete list of references is contained in the supplemental folder (provided at the top left of the window), where the referenced sources are listed in HTML, Excel and plain text formats. To view the full bibliography for a particular absorption cross section, the user clicks on the desired molecule and hovers the mouse cursor over any of the rows of data listed for that molecule. An example is displayed for an absorption cross section of formaldehyde (Figure 5). After selecting one or more data sets, the user is taken to a final page that presents links to the data files and a complete bibliography.
In the collision-induced absorption section of HITRANonline, the corresponding references are given in a reference document (PDF) provided at the top of the web page. In the aerosol properties section, the references for the sources used can likewise be obtained from a PDF document. In the HITEMP section, users are asked to cite the original sources of data using the reference codes assigned to each line transition; these codes are consistent with those in the HITRAN line-by-line section. In the supplemental section there is a list of references at the bottom of the window and in each subsection (Line-Mixing, Total Internal Partition Sums, Supplemental Absorption Cross Sections and Radioactive Isotopologues), and an instruction manual contains the references for the data provided. Overall, these bibliographies are displayed so that users understand where the data came from and can use this information for their records and further research. HITRAN also provides the HITRAN Application Programming Interface (HAPI) [16], which allows users to download spectroscopic data from HITRAN and carry out sophisticated calculations using predefined or custom functions. At the moment HAPI does not support downloading reference data, but work is underway to provide this option in the near future.
2. Results
2.1. Digital Object Identifier
A digital object identifier (DOI) is a string of numbers, letters and symbols used to permanently identify an article or document and link it to its original online source; a DOI is assigned to almost every work published today [17]. Even proceedings can have their own DOIs, which link public work electronically to its sources. The DOI system is traceable and permanent, which is why HITRAN and AMBDAS chose to use DOIs for citing and referencing sources in their respective databases. The DOI system is managed by the International DOI Foundation, and DOIs act as unique identifiers for content online and in print. Once assigned, a DOI does not change and therefore provides a permanent link to the individual work: a digital object may change its physical location, but its DOI will not. The object therefore remains accessible to the user through its DOI.
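Because a DOI is permanent, a reference record only needs to store the identifier itself, and a link can always be rebuilt through the doi.org resolver. The sketch below pulls a DOI out of pasted text or a URL using a commonly used regular expression (which covers most, but not all, DOIs) and builds the resolver link; the function names are hypothetical.

```python
import re
from typing import Optional

# A widely used pattern that matches most (not all) modern DOIs.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_doi(text: str) -> Optional[str]:
    """Return the first DOI found in free text or a URL, lower-cased."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # The pattern also captures trailing sentence punctuation, so trim it.
    return match.group(0).rstrip(".,;").lower()


def resolver_url(doi: str) -> str:
    """Persistent link: the doi.org resolver redirects to the current location."""
    return f"https://doi.org/{doi}"


print(resolver_url(extract_doi("see doi:10.1000/182.")))
```

Lower-casing is safe because DOI names are case-insensitive, and it makes later comparisons between stored DOIs straightforward.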
The DOI for an article can be found at the top or bottom corners of the published paper, or in the hyperlink to the paper. The new automated method implemented by the software described in this article enables the administrator to enter only the DOI for published work and in return retrieve the bibliography of the publication.
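One publicly available route from a DOI to a bibliographic record is DOI content negotiation: a request to `https://doi.org/<DOI>` with an `Accept` header such as `application/x-bibtex` returns metadata from the registration agency (for agencies that support it) instead of redirecting to the publisher page. The text does not say whether the software described here uses this route, so the sketch below is an assumption; it only builds the request with the standard library and leaves sending it as a commented line, because that needs network access. `MEDIA_TYPES` and `metadata_request` are hypothetical names.

```python
import urllib.request

# Media types offered by DOI content negotiation (Crossref, DataCite and others).
MEDIA_TYPES = {
    "bibtex": "application/x-bibtex",
    "csl-json": "application/vnd.citationstyles.csl+json",
}


def metadata_request(doi: str, fmt: str = "bibtex") -> urllib.request.Request:
    """Build, but do not send, a request for a DOI's bibliographic metadata."""
    return urllib.request.Request(f"https://doi.org/{doi}",
                                  headers={"Accept": MEDIA_TYPES[fmt]})


req = metadata_request("10.1000/182")
print(req.full_url, req.get_header("Accept"))
# Sending it needs network access, e.g.:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.read().decode("utf-8"))
```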
2.2. Automatic Referencing System
This section describes the process of querying and retrieving bibliographies and references with the new automated referencing system. All code for the automatic referencing system was written in Python in the web-based, interactive computing environment Jupyter Notebook, and is also implemented in a Django application available at https://github.com/hitranonline/pyref. Interested users are encouraged to test the referencing system using the Jupyter notebook, which provides detailed instructions. Users interested in deploying the code through the provided Django application can find further details in the setup.py, settings.py and README files.
The code relies on several libraries, and the necessary packages must be installed and imported by the users. The referencing system was developed to be easy to use for the administrators of the HITRAN and AMBDAS databases and to add accurate information reliably, for the benefit of the users and contributors of HITRAN alike.
The automatic referencing system provides three different output formats for every reference generated, so that users have several options for viewing and accessing the bibliographies in HITRAN. The three output formats are HTML, plain text and BibTeX.
Every referenced material in HITRAN also has the option of a detailed note added to its bibliographic entry. This note is routinely used to describe where in the article the data were taken from, which data were not used and any other information needed to understand and use the referenced information. The same note option is included in the new automatic referencing system. Hyperlinks are included for all references as well, so that users have access both to the full text of the paper and to its ADS record. The ADS link points to the ADS database, which provides bibliographic information to a majority of astronomical researchers worldwide. ADS offers several distinctive services: users can view author network visualizations, paper citations and paper downloads, and export citations in multiple formats, while registered users can build a private library for their records. HITRAN has therefore endeavored to include links to the ADS database for articles as well as DOI links to published work.
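Assuming the output formats are the HTML, plain-text and BibTeX exports mentioned in the Introduction, the sketch below renders one journal-article record in each of them, attaching the optional note and the DOI and ADS links to the HTML form. The record layout, citation style and function names are hypothetical; the ADS link follows the public `https://ui.adsabs.harvard.edu/abs/<bibcode>/abstract` URL pattern, and the placeholder DOI and bibcode are invented.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional
from urllib.parse import quote

ADS_ABS = "https://ui.adsabs.harvard.edu/abs/{}/abstract"


@dataclass
class Reference:
    """Hypothetical journal-article record (fields chosen for illustration)."""
    authors: list[str]
    title: str
    journal: str
    volume: str
    year: str
    pages: str
    doi: str
    bibcode: Optional[str] = None  # ADS bibcode, when the work is in ADS
    note: Optional[str] = None     # free-text note added by the administrator


def to_plain_text(ref: Reference) -> str:
    """One-line citation in a journal-like style."""
    return (f"{', '.join(ref.authors)}, {ref.title}, {ref.journal} "
            f"{ref.volume} ({ref.year}) {ref.pages}. doi:{ref.doi}")


def to_bibtex(key: str, ref: Reference) -> str:
    """BibTeX @article entry; the extra braces keep the title's capitalization."""
    fields = [("author", " and ".join(ref.authors)), ("title", "{" + ref.title + "}"),
              ("journal", ref.journal), ("volume", ref.volume), ("year", ref.year),
              ("pages", ref.pages), ("doi", ref.doi)]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@article{{{key},\n{body}\n}}"


def to_html(ref: Reference) -> str:
    """HTML list item with a DOI link, optional ADS link and optional note."""
    parts = [escape(to_plain_text(ref)),
             f'<a href="https://doi.org/{quote(ref.doi, safe="/")}">[DOI]</a>']
    if ref.bibcode:
        # Bibcodes can contain '&' (e.g. A&A), so percent-encode the whole code.
        parts.append(f'<a href="{ADS_ABS.format(quote(ref.bibcode, safe=""))}">[ADS]</a>')
    if ref.note:
        parts.append(f'<span class="note">{escape(ref.note)}</span>')
    return "<li>" + " ".join(parts) + "</li>"


ref = Reference(["A. Author", "B. Author"], "A hypothetical line list", "J. Example",
                "1", "2020", "1-10", "10.1000/example",
                bibcode="2020JExam...1....1A", note="Line positions only.")
print(to_plain_text(ref))
print(to_bibtex("Author2020", ref))
print(to_html(ref))
```

Generating all three formats from one record, rather than typing each by hand, keeps them consistent with each other.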
The new referencing system is detailed in six steps, and a visualization is provided in Figure 6. If the administrators are referencing a source that exists in the ADS database, they need to follow only the first three steps to cite the source properly in all three output formats and include relevant notes. The six steps are explained in further detail in Section 4 and are listed as follows:
1. Retrieve the DOI of the paper being cited and enter it in the prompt provided.
2. This step populates the corresponding ADS bibcode for the paper. If no bibcode is generated, skip to step 4.
3. Use the bibcode retrieved in step 2 as input to the tool described in Section 4.2. The output is customizable; all output formats are possible.
4. Use the DOI to search with the tool described in Section 4.3. The output is the bibliographic record in JSON format.
5. The initial DOI is automatically populated in the final step (Section 4.5) to retrieve the BibTeX citation for the paper.
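The branching in the steps above can be sketched as a function that takes the DOI (step 1) and four lookup functions. The lookups stand in for the bibcode lookup of step 2 and the tools of Sections 4.2, 4.3 and 4.5, whose interfaces are not reproduced here; their names and signatures are hypothetical, and they are passed in so that the flow can be exercised with stubs.

```python
from typing import Callable, Optional


def reference_for(doi: str,
                  doi_to_bibcode: Callable[[str], Optional[str]],
                  ads_export: Callable[[str], str],
                  doi_metadata: Callable[[str], dict],
                  doi_bibtex: Callable[[str], str]) -> dict:
    """Follow the listed steps for one DOI (step 1 is supplying the DOI).

    The four lookups are hypothetical stand-ins for external services;
    passing them in lets the branching run offline with stubs.
    """
    bibcode = doi_to_bibcode(doi)                     # step 2
    if bibcode is not None:                           # source is in ADS
        return {"doi": doi, "bibcode": bibcode,
                "formatted": ads_export(bibcode)}     # step 3
    return {"doi": doi, "bibcode": None,
            "metadata": doi_metadata(doi),            # step 4: JSON record
            "bibtex": doi_bibtex(doi)}                # step 5: BibTeX


# Stub lookups standing in for the real services.
in_ads = reference_for("10.1000/a", lambda d: "2020JExam...1....1A",
                       lambda b: f"export of {b}", lambda d: {}, lambda d: "")
not_in_ads = reference_for("10.1000/b", lambda d: None, lambda b: "",
                           lambda d: {"DOI": d}, lambda d: "@article{B2020}")
print(in_ads["formatted"])
print(not_in_ads["metadata"], not_in_ads["bibtex"])
```

Keeping the lookups as parameters separates the decision logic (ADS or not) from the network calls, which is also what makes the skip from step 2 to step 4 easy to test.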
Compared with manual entry, this new referencing method leaves much less scope for human error. The only manual task on the user side is to include any notes required for citing the source properly. Users of the HITRAN database will see contributors’ names and links to their publications in every window where their data are displayed, and all of this information will be accurate and up to date. Displaying contributors’ work in this way has been the conventional practice of HITRAN since 2015. Before this referencing system, a static file listed all of the references in HITRAN at the time and the molecules or data to which they referred.
HITRAN will continue the good practices it has always followed in providing references for the data in the database. This new automatic method for adding sources to HITRAN and AMBDAS is the next step towards the modernization of both databases.
3. Discussion
HITRAN obtains its data from contributors who publish their results, present their data at conferences or communicate privately with the HITRAN team. Until now, adding these contributors’ references to HITRAN has been done entirely by hand. This project therefore created a new automated method for adding references to HITRAN using only a DOI. The automated system will be easier for administrators and users to maintain, and it will simplify both data replacement and users’ access to references. Most importantly, the contributors to HITRAN and AMBDAS will receive proper credit and representation. Ideally, this change will set an example for the research community and encourage a productive data-sharing environment amongst researchers.
Before the new automated method was created, references in the HITRAN database were updated manually, either by transcribing reference metadata or by copying and pasting from websites, PDF files and so on. Every reference had to be processed painstakingly by hand, including correlating the bibliographic information with the main article or resource, adding the DOI of the article, checking spelling, and adding separate plain-text and HTML markup. About fourteen hundred references were corrected with up-to-date information, and their outputs were tested and double-checked before the database was updated. In addition, all changes are recorded for administrative record-keeping, which further increases the time and attention required of the administrator. Retrieving references automatically with a single click will speed up updates, reduce the time required and minimize mistakes.
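Part of the checking described above can be mechanized by comparing a stored record with freshly retrieved metadata field by field, so that the administrator reviews only what actually changed. The sketch below is a hypothetical helper operating on plain dictionaries with invented field values; it is not part of the published code.

```python
def field_changes(stored: dict[str, str],
                  retrieved: dict[str, str]) -> dict[str, tuple]:
    """Map each differing field to (stored value, retrieved value).

    A field present on only one side is reported with None on the other.
    """
    changes = {}
    for name in sorted(stored.keys() | retrieved.keys()):
        old, new = stored.get(name), retrieved.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Invented example: the stored page range is outdated and the volume is missing.
stored = {"title": "A hypothetical line list", "pages": "1-9"}
retrieved = {"title": "A hypothetical line list", "pages": "1-10", "volume": "1"}
print(field_changes(stored, retrieved))
```

The resulting dictionary can double as the change log that administrators already keep.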