1. Introduction
Database integration using Linked Open Data technologies [1] has been promoted in many scientific fields. In the life sciences, integration is progressing and fostering collaboration between different research fields, such as proteomics and genomics, to solve cross-cutting problems [2]. The importance of database integration is recognized in the glycosciences as well. After the international glycan repository GlyTouCan was released in 2015 [3], integration progressed greatly worldwide. Today, GlyTouCan has over 120,000 entries registered by researchers worldwide. This number is expected to keep increasing with the release of the GlyCosmos glycoscience portal [4], which provides an integrated interface for glycoscience data and is increasingly collaborating with other research fields to integrate glycoconjugate data, such as glycoproteins and glycolipids. We are also collaborating with the worldwide Protein Data Bank (wwPDB), which recently updated its data representation for carbohydrates (the Carbohydrate Remediation) in consultation with the glycoscience community [5]. This remediation makes glycan 3D structural information in the PDB more Findable, Accessible, Interoperable and Reusable (FAIR). To facilitate access to the increasing number of glycan entries in GlyTouCan and GlyCosmos, we have developed infrastructure for retrieving glycan-related information more efficiently; for example, we provide a web Application Programming Interface (API) for obtaining GlyTouCan IDs from WURCS [6], the main text representation used in GlyTouCan. WURCS is now also included as one of the linear descriptors of carbohydrates in the PDB.
At the same time, many web users need an easy-to-use search system for finding the glycan structures they want. Although the web API described above provides access to glycan entries, such informatics technologies are not easy for many experimental researchers to use. A search system should let non-experts search for glycans easily, and entering and searching glycans should not take much time. We therefore set the following requirements for a glycan search tool: (1) users should be able to input glycan structures by editing them in symbolic nomenclature within the tool; (2) database search should be available; and (3) all processes should be fast and completed online.
The symbolic nomenclature in requirement (1) refers to representing glycan structures with symbols for monosaccharides and lines for the linkages between them. The Symbol Nomenclature for Glycans (SNFG) [7], developed by a broad community of glycoscientists, is a standard nomenclature of this kind. It covers not only fully defined glycan structures but also glycans in ambiguous states, such as glycan fragments, i.e., structures containing undetermined linkages. Because many glycan fragments are registered in GlyTouCan and are therefore frequently queried, the tool must support glycan fragments as well. For requirement (2), the web API described above can be called from the tool to search the database online.
To develop the tool, we used or drew on several existing tools. GlycanBuilder [8], developed during the EuroCarb project [9], has long been used by many glycoscientists. It provides excellent features for editing glycan structures in symbolic nomenclature, and these features were updated in GlycanBuilder2 to make it compatible with the SNFG representation and with some special glycan structures, such as cyclic glycans [10]. The tool was also upgraded to run in a web environment and is used in the database search systems of GlyTouCan and UniCarbKB [11], but it has some limitations and is difficult to use and maintain as a web application. SugarSketcher [12], developed by the Swiss Institute of Bioinformatics, is a web-based glycan editing tool. It can edit many glycan structures, except those in an ambiguous state, and provides a simple editing interface for smartphones and tablets, but it requires many steps even to add a single monosaccharide. Because it is written entirely in JavaScript, it runs as a web application. It can export GlycoCT [13], which is supported by many software tools and databases. However, because it lacks online database search, users must copy the GlycoCT it exports and enter the string manually into the search interface of a database that supports GlycoCT search. The code is freely available from the SugarSketcher GitHub repository at https://github.com/alodavide/sugarSketcher (accessed on 30 October 2021). GlycoGlyph [14] is another online tool that provides features similar to those of SugarSketcher, such as GlycoCT export. It also provides a GlyTouCan search that retrieves GlyTouCan IDs and links to GlyTouCan entries not only for the input glycan but also for variants with different anomeric states.
After considering all these tools, we decided to develop a new web-based glycan search tool, SugarDrawer, built on SugarSketcher but providing an updated glycan editing interface and additional features. In this manuscript, we describe its main features: (1) an intuitive editing experience similar to that of GlycanBuilder; (2) structure editing and text-format export that support glycan fragments; and (3) GlyTouCan database access from within the tool. SugarDrawer can be accessed at https://glycoinfo.gitlab.io/sugardrawer/sugar-drawer-pages/ (accessed on 30 October 2021). The code is available on GitLab at https://gitlab.com/glycoinfo/sugardrawer/sugardrawer (accessed on 30 October 2021).
3. Discussion
SugarDrawer allows users to edit glycans using SNFG symbols on the web, and it can also be used as a database search tool and for converting glycans to text formats such as WURCS and GlycoCT. It currently supports the following types of glycans: simple and branched glycans, glycans containing substituents such as sulfated monosaccharides, and glycan fragments.
Various glycan editing tools have been developed by several groups; we compare them in Table 1. All of these tools support the standard glycans found mainly in mammals, and all can represent them using SNFG. GlycanBuilder2 and SugarDrawer can represent glycan fragments using brackets and glycan subgraphs. However, only SugarDrawer provides editing features that let users select the attachment sites of a fragment on the core part. Other tools, such as GlycanBuilder, can handle glycan fragments but cannot specify particular attachment sites. Moreover, all functionality in SugarDrawer, including glycan fragment editing and database search, is web-based.
For database search, we use a web API that retrieves GlyTouCan IDs from WURCS. We implemented this feature so that it could easily be used with other databases as well; however, SugarDrawer currently searches only GlyCosmos. By comparison, the “Get GlyTouCan ID” function in GlycoGlyph obtains the GlyTouCan accession number of the glycan being edited and, using the GlyCosmos API, provides details of this structure from GlyTouCan, GlyGen, PubChem, and ChEBI. We therefore plan to add such details from these databases to the “Search” function of SugarDrawer. Since wwPDB now provides WURCS as a linear descriptor in its data category, this “Search” function could also be used to search for glycan structures in wwPDB using WURCS. However, at the time of writing, there is no system or API for accessing WURCS descriptors in wwPDB directly; this is left for future work.
4. Materials and Methods
We chose JavaScript ES2015 (ES6) as the development language for SugarDrawer. Node.js (10.19.0) and the Node package manager, npm (6.14.1), were used to manage the dependencies on multiple libraries efficiently. React (15.6.1) and Semantic-ui-react were adopted for the user interface design. CreateJS-EaselJS (0.8.2) was used to generate glycan images.
Our tool uses SugarSketcher as a library, but we updated some of its features, including GlycoCT conversion, 2D coordinate generation and the data structures for storing glycan structure information. The updated code is available at https://gitlab.com/glycoinfo/sugardrawer/SugarSketcher2 (accessed on 30 October 2021). Webpack (3.0.0) was used to bundle the code so that React and CreateJS-EaselJS could run in the browser.
4.1. Data Structures
We created the Liaise class to manage and store all data in this tool. This class is used for editing glycan structures, for connecting to the SugarSketcher data structures, and for recording the user's editing steps to support undo and redo.
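The undo/redo role of the Liaise class can be illustrated with a snapshot store. The following is a minimal Python sketch, not the JavaScript Liaise implementation: the class name `EditStore`, its methods and the whole-state snapshot strategy are assumptions made for illustration.

```python
import copy
from typing import Any, Callable, List


class EditStore:
    """Hypothetical sketch of a central store with snapshot-based undo/redo.

    This is NOT the SugarDrawer Liaise class; it only illustrates one way
    to keep the glycan data and the user's editing history together.
    """

    def __init__(self) -> None:
        self.glycans: List[Any] = []        # current editing state
        self._undo: List[List[Any]] = []    # snapshots taken before each edit
        self._redo: List[List[Any]] = []    # snapshots removed by undo()

    def apply(self, edit: Callable[[List[Any]], None]) -> None:
        """Snapshot the current state, then let `edit` mutate it in place."""
        self._undo.append(copy.deepcopy(self.glycans))
        self._redo.clear()  # a fresh edit invalidates the redo history
        edit(self.glycans)

    def undo(self) -> bool:
        """Restore the previous snapshot; return False if there is none."""
        if not self._undo:
            return False
        self._redo.append(copy.deepcopy(self.glycans))
        self.glycans = self._undo.pop()
        return True

    def redo(self) -> bool:
        """Re-apply the most recently undone state; return False if none."""
        if not self._redo:
            return False
        self._undo.append(copy.deepcopy(self.glycans))
        self.glycans = self._redo.pop()
        return True
```

For example, applying two edits that append "GlcNAc" and "Gal" and then calling `undo()` leaves `["GlcNAc"]` in `glycans`. Copying the whole state on every edit is the simplest design; storing a per-edit inverse operation (the command pattern) uses less memory for large structures.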
4.2. Glycan Images
Image generation in this tool was implemented using the Glycan, Monosaccharide, and GlycosidicLinkage classes in SugarSketcher and the Shape class in CreateJS. Text images were generated using the Text class in CreateJS together with the GlycosidicLinkage and Substituent classes in SugarSketcher.
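As an illustration of symbol-based image generation, the following Python sketch writes SVG elements instead of CreateJS shapes. The `SNFG` table covers only a few monosaccharides and uses plain color names rather than the exact SNFG color codes; the function names and the left-to-right layout are hypothetical.

```python
from typing import Dict, List, Tuple

# SNFG shape and color for a few common monosaccharides (illustrative subset;
# plain color names stand in for the exact SNFG color codes).
SNFG: Dict[str, Tuple[str, str]] = {
    "Glc": ("circle", "blue"),
    "Man": ("circle", "green"),
    "Gal": ("circle", "yellow"),
    "GlcNAc": ("square", "blue"),
    "GalNAc": ("square", "yellow"),
    "Fuc": ("triangle", "red"),
    "Neu5Ac": ("diamond", "purple"),
}


def snfg_svg(name: str, x: float, y: float, size: float = 20.0) -> str:
    """Return one SVG element drawing the SNFG symbol of `name` centered at (x, y)."""
    shape, color = SNFG[name]
    h = size / 2
    style = f'fill="{color}" stroke="black"'
    if shape == "circle":
        return f'<circle cx="{x}" cy="{y}" r="{h}" {style}/>'
    if shape == "square":
        return f'<rect x="{x - h}" y="{y - h}" width="{size}" height="{size}" {style}/>'
    if shape == "triangle":
        pts = [(x, y - h), (x + h, y + h), (x - h, y + h)]
    else:  # diamond
        pts = [(x, y - h), (x + h, y), (x, y + h), (x - h, y)]
    points = " ".join(f"{px},{py}" for px, py in pts)
    return f'<polygon points="{points}" {style}/>'


def linear_chain_svg(names: List[str], spacing: float = 40.0) -> str:
    """Draw residues left to right; linkage lines are drawn first so symbols cover them."""
    y = 20.0
    parts = []
    for i in range(len(names) - 1):
        x1, x2 = 20 + i * spacing, 20 + (i + 1) * spacing
        parts.append(f'<line x1="{x1}" y1="{y}" x2="{x2}" y2="{y}" stroke="black"/>')
    for i, name in enumerate(names):
        parts.append(snfg_svg(name, 20 + i * spacing, y))
    width = 40 + spacing * max(len(names) - 1, 0)
    header = f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="40">'
    return header + "".join(parts) + "</svg>"
```

Drawing order matters in both SVG and a CreateJS stage: later elements paint over earlier ones, which is why the linkage lines are emitted before the symbols.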
4.3. Format Conversion
In this tool, SugarSketcher was used for GlycoCT import and export. For WURCS import and export, the GlycanFormatConverter API [15], which can convert between GlycoCT and WURCS, was used.
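Because SugarSketcher reads and writes only GlycoCT, WURCS support amounts to one extra conversion hop. A minimal Python sketch of that layering follows; `FormatBridge` and its four injected functions are hypothetical stand-ins for the SugarSketcher parser/writer and the GlycanFormatConverter API, whose actual interfaces are not described here.

```python
from typing import Callable, Generic, TypeVar

Model = TypeVar("Model")  # whatever in-memory glycan representation the editor uses


class FormatBridge(Generic[Model]):
    """Hypothetical sketch: WURCS support layered on top of GlycoCT support.

    The editor itself only knows GlycoCT (read_glycoct / write_glycoct);
    WURCS is reached through an external GlycoCT<->WURCS converter.
    """

    def __init__(
        self,
        read_glycoct: Callable[[str], Model],
        write_glycoct: Callable[[Model], str],
        glycoct_to_wurcs: Callable[[str], str],
        wurcs_to_glycoct: Callable[[str], str],
    ) -> None:
        self.read_glycoct = read_glycoct
        self.write_glycoct = write_glycoct
        self.glycoct_to_wurcs = glycoct_to_wurcs
        self.wurcs_to_glycoct = wurcs_to_glycoct

    def export_wurcs(self, model: Model) -> str:
        # model -> GlycoCT (local writer) -> WURCS (external converter)
        return self.glycoct_to_wurcs(self.write_glycoct(model))

    def import_wurcs(self, wurcs: str) -> Model:
        # WURCS -> GlycoCT (external converter) -> model (local parser)
        return self.read_glycoct(self.wurcs_to_glycoct(wurcs))
```

Injecting the converters keeps the editor independent of how the GlycoCT/WURCS conversion is performed, whether by a remote API call or a local library.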
4.4. Handling Glycan Fragments
Figure 8 shows an extension of the SugarSketcher data structure for glycan fragments. Because the Glycan class, the data structure SugarSketcher uses for glycans, could not originally handle glycan fragments, we used it for both the core and the fragment parts of a glycan fragment. The Glycan class was used unchanged for the core part, whereas for the fragment part we added two pieces of structured data: Linkage information and Attachment sites. Linkage information holds the positions of the linkage from the fragment part to the core part, and Attachment sites holds a list of monosaccharide identifiers indicating the candidate attachment sites on the core part. All the Glycans are stored in an array whose first element is the core part and whose subsequent elements are fragment parts. An array with only one element represents a simple glycan, not a glycan fragment. This array is a member of the Liaise class.
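The array-based representation described above can be sketched in Python as follows. The field names (`child_position`, `parent_positions`), the use of residue indices as monosaccharide identifiers, and the helper functions are assumptions for illustration, not the SugarSketcher2 code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LinkageInfo:
    """Linkage from a fragment to the core (hypothetical field names)."""
    child_position: int            # position on the fragment's reducing end, e.g. 1
    parent_positions: List[int]    # candidate positions on the core residue; -1 = unknown


@dataclass
class Glycan:
    residues: List[str]                                        # monosaccharide names
    linkage: Optional[LinkageInfo] = None                      # set only on fragment parts
    attachment_sites: List[int] = field(default_factory=list)  # core residue indices


def split_core_and_fragments(glycans: List[Glycan]) -> Tuple[Glycan, List[Glycan]]:
    """The first element is the core part; the rest are fragment parts."""
    if not glycans:
        raise ValueError("empty glycan array")
    return glycans[0], glycans[1:]


def is_glycan_fragment(glycans: List[Glycan]) -> bool:
    """A one-element array is a simple glycan, not a glycan fragment."""
    return len(glycans) > 1


def validate(glycans: List[Glycan]) -> None:
    """Check that every fragment has linkage data pointing at existing core residues."""
    core, fragments = split_core_and_fragments(glycans)
    for frag in fragments:
        if frag.linkage is None or not frag.attachment_sites:
            raise ValueError("fragment lacks linkage information or attachment sites")
        for site in frag.attachment_sites:
            if not 0 <= site < len(core.residues):
                raise ValueError(f"attachment site {site} is not a core residue")
```

Keeping the core and fragments in one flat array means code written for simple glycans only needs `glycans[0]`, while fragment-aware code iterates over the remainder.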
Since SugarSketcher has no GlycoCT conversion functionality for glycan fragments, we extended two of its classes, GlycoCTParser and GlycoCTWriter, into GlycoCTParserForFragment and GlycoCTWriterForFragment for GlycoCT import and export, respectively (Figure 9). GlycoCTParser outputs, and GlycoCTWriter accepts, only a single Glycan, whereas GlycoCTParserForFragment outputs, and GlycoCTWriterForFragment accepts, an array of Glycans representing a glycan fragment. Since the “ParentIDs” and “SubtreeLinkageID” lines in GlycoCT are specific to glycan fragments (see Figure 5), parsing and writing of these lines were newly implemented in GlycoCTParserForFragment and GlycoCTWriterForFragment, respectively. Through this functionality and the GlycanFormatConverter API, WURCS import and export can also handle glycan fragments.
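A simplified Python sketch of reading and writing the fragment-specific lines of a GlycoCT UND section follows. It handles only the `UNDn`, `ParentIDs`, `SubtreeLinkageIDn` and fragment `RES` lines, skips linkages inside a fragment, and writes a fixed `100.0:100.0` probability line; the function and class names are hypothetical, and this is not the GlycoCTParserForFragment/GlycoCTWriterForFragment implementation.

```python
import re
from dataclasses import dataclass, field
from typing import List

# e.g. "SubtreeLinkageID1:o(3|6+1)d" -> parent type, parent positions, child positions, child type
_LINKAGE = re.compile(r"^SubtreeLinkageID\d+:([a-z])\(([-\d|]+)\+([-\d|]+)\)([a-z])$")


@dataclass
class UndBlock:
    parent_ids: List[int] = field(default_factory=list)        # core residue IDs (ParentIDs)
    parent_type: str = "o"
    parent_positions: List[int] = field(default_factory=list)  # -1 means unknown
    child_positions: List[int] = field(default_factory=list)
    child_type: str = "d"
    residues: List[str] = field(default_factory=list)          # raw RES lines of the fragment


def _ints(text: str) -> List[int]:
    return [int(t) for t in text.split("|")]


def _join(values: List[int]) -> str:
    return "|".join(str(v) for v in values)


def parse_und(glycoct: str) -> List[UndBlock]:
    """Collect the fragment blocks that follow the UND header line."""
    blocks: List[UndBlock] = []
    in_und = in_res = False
    for line in (raw.strip() for raw in glycoct.splitlines()):
        if line == "UND":
            in_und = True
        elif not in_und or not line:
            continue  # core part (or blank line): handled elsewhere
        elif re.match(r"^UND\d+:", line):
            blocks.append(UndBlock())
            in_res = False
        elif line.startswith("ParentIDs:"):
            blocks[-1].parent_ids = _ints(line[len("ParentIDs:"):])
        elif line == "RES":
            in_res = True
        elif line == "LIN":
            in_res = False  # intra-fragment linkages are skipped in this sketch
        else:
            m = _LINKAGE.match(line)
            if m:
                b = blocks[-1]
                b.parent_type, b.child_type = m.group(1), m.group(4)
                b.parent_positions, b.child_positions = _ints(m.group(2)), _ints(m.group(3))
            elif in_res:
                blocks[-1].residues.append(line)
    return blocks


def write_und(blocks: List[UndBlock]) -> str:
    """Inverse of parse_und for the subset of GlycoCT handled here."""
    out = ["UND"]
    for i, b in enumerate(blocks, start=1):
        out.append(f"UND{i}:100.0:100.0")
        out.append("ParentIDs:" + _join(b.parent_ids))
        out.append(f"SubtreeLinkageID{i}:{b.parent_type}("
                   f"{_join(b.parent_positions)}+{_join(b.child_positions)}){b.child_type}")
        out.append("RES")
        out.extend(b.residues)
    return "\n".join(out)
```

A round trip `parse_und(write_und(blocks)) == blocks` is a convenient consistency check for this kind of paired parser and writer.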
4.5. Database Search
This new tool provides GlyTouCan database search based on WURCS. The tool calls a web API that receives WURCS and returns the corresponding GlyTouCan ID. It also uses another API that generates glycan images from a GlyTouCan ID, so that the images are shown in the same way as in GlyTouCan. The results are displayed in a table, in which each GlyTouCan ID is hyperlinked to its GlyTouCan entry page and shown together with the generated glycan image. These web APIs are provided as GlyCosmos web resources at https://glycosmos.org/glycans/show/ (accessed on 30 October 2021).
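The search flow described above (encode the WURCS string, query an API, and render a hyperlinked ID with its image as a table row) can be sketched in Python as follows. The API URLs, the `wurcs` and `id` query parameters and the JSON response shape are placeholders, not the documented GlyCosmos interface; the GlyTouCan entry-page pattern is also an assumption.

```python
import json
from html import escape
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder endpoints: the actual GlyCosmos API paths and parameter names
# are not given in the text and must be taken from the GlyCosmos documentation.
WURCS_SEARCH_API = "https://api.example.org/wurcs2gtcid"   # hypothetical
IMAGE_API = "https://api.example.org/glycan-image"         # hypothetical
ENTRY_PAGE = "https://glytoucan.org/Structures/Glycans/"   # assumed entry URL pattern


def build_search_url(wurcs: str) -> str:
    # WURCS contains "=", "/", "," and brackets, so it must be URL-encoded.
    return WURCS_SEARCH_API + "?" + urlencode({"wurcs": wurcs})


def parse_ids(body: str) -> list:
    """Assumes a JSON list of {"id": ...} objects (hypothetical response shape)."""
    return [item["id"] for item in json.loads(body) if "id" in item]


def result_row(gtc_id: str) -> str:
    """One HTML table row: hyperlinked GlyTouCan ID plus its glycan image."""
    link = escape(ENTRY_PAGE + gtc_id)
    img = escape(IMAGE_API + "?" + urlencode({"id": gtc_id}))
    return (f'<tr><td><a href="{link}">{escape(gtc_id)}</a></td>'
            f'<td><img src="{img}" alt="{escape(gtc_id)}"></td></tr>')


def search(wurcs: str, timeout: float = 10.0) -> list:
    """Network call; only works once the placeholder URL is replaced."""
    with urlopen(build_search_url(wurcs), timeout=timeout) as resp:
        return parse_ids(resp.read().decode("utf-8"))
```

Separating URL construction, response parsing and rendering from the network call lets each step be checked without contacting a server.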