Processing the Narrative: Innovative Graph Models and Queries for Textual Content Knowledge Extraction †
Abstract
1. Introduction
1.1. Contribution and Methodology
- [i]
- The extraction and representation of text content using data models, including matrices of term frequencies, text graphs, ontologies, property graphs, and vectorial representations, realised through the design of graph databases.
- [ii]
- The consequences of the chosen modelling approach for querying, updating, maintaining, and discovering knowledge within the text content.
- RQ1.
- What are the capabilities and limitations of graph-based representations (text graphs, ontologies, property graphs and vectorial representations) in capturing syntax and semantics within textual content? RQ1 examines the barriers and challenges in processing textual content. It can be answered by exploring examples and identifying experiments that illustrate these barriers and the challenges they pose.
- RQ2.
- How can a graph-based structural and semantic text representation address missing information, such as incomplete, low-quality, or contradictory annotations, and what are the implications for querying and maintenance? RQ2 studies how to address missing information—such as incomplete, low-quality, or contradictory annotations—within text graphs, ontologies, property graphs or vectorial representations that structure text content. It considers the implications for querying and maintenance and identifies unresolved issues that demand further attention.
- RQ3.
- How do graph-based text representations enable the extraction of explicit and implicit knowledge? RQ3 examines the maintenance and updating of text content representations. It asks how annotations should evolve in response to changes in repositories and corpora and how newly deduced, discovered, or explicitly inserted knowledge can be validated.
- RQ4.
- What querying possibilities are enabled by structural and semantic text graph representations? RQ4 explores the possibilities for querying different graph-based representations of text content. In the analytical sense, this includes graph queries such as traversals, pattern discovery, community detection, and centrality computation.
1.2. Organisation of the Paper
2. Structuring Textual Content into Graphs
2.1. Extracting Structured Semantic Information
2.1.1. Modelling Structural Text Content
- the techniques adopted for processing text in search of entities, taking into account the characteristics of the language (i.e., its grammar); and
- identifying the pertinent data structures for structuring the content.
- [1]
- Processing structural textual content. NLP tasks have been developed for building structured information from text [4]. Techniques address the structural and morphological dimensions of texts and semantics representation. Regarding structure and morphology, NLP tasks include the following:
- Text and speech processing [5] encompasses the conversion of text from various formats (image, sound clip, text) into a textual representation, typically comprising words or sentences. The complexity of this task varies depending on the linguistic characteristics of the language in which the content is produced. Tasks include optical character recognition, speech recognition, speech segmentation, text-to-speech, and word segmentation.
- Morphological analysis [6] breaks down words into their constituent morphemes or lemmas to create normalised representations or to identify parts of speech (POS) within sentences (e.g., noun, verb, adjective). Tasks include lemmatisation, morphological segmentation, part-of-speech tagging, and stemming.
- Syntactic analysis [7,8] aims to understand the structural aspects of natural languages. It involves generating a grammar describing the syntax of a language, identifying sentence boundaries within text chunks, and constructing parse trees (grammatical analysis) that represent relationships (dependencies) between words. Dependency parsing establishes relationships with or without probabilities, while constituency parsing produces probabilistic context-free or stochastic grammars. Tasks include grammar induction, sentence breaking, and parsing.
- [2]
- Data structures for structuring textual content. Different data structures can model textual content (e.g., matrices and text graphs). However, this study focuses on graphs as data structures for structuring textual content, since they can be adapted to querying and knowledge extraction by profiting from the mathematical properties of graphs.
2.1.2. Modelling Textual Semantics
- Lexical semantics of individual words in context [11]: focuses on determining the computational meaning of individual words within a given context or from data. It involves identifying proper names (e.g., people or places) and their types (e.g., person, location, organisation), extracting subjective information (e.g., sentiments), identifying terminology, and disambiguating words with multiple meanings. This complex task includes lexical semantics, distributional semantics, named entity recognition, sentiment analysis, terminology extraction, word-sense disambiguation, and entity linking.
- Relational semantics [12]: involves identifying relationships among named entities, disambiguating semantic predicates and semantic roles, and generating formal representations of text semantics. This task includes relationship extraction, semantic parsing, and semantic role labelling. For example, understanding that “bank” can mean a financial institution or the side of a river depending on the context, and modelling these different meanings accordingly.
- Discourse analysis [13] extends beyond individual sentence semantics and includes tasks such as co-reference resolution, discourse parsing to determine discourse relationships between sentences, recognising textual entailment, identifying topics within text segments, and identifying argumentative structures. For example, in a sentence such as “He entered John’s house through the front door”, “the front door” is a referring expression. The bridging relationship to be identified is that the door referred to is the front door of John’s house (rather than of some other structure that might also be referred to).
2.2. Building a Text Graph
Centrality indices answer the question, “What characterises an important vertex?” The answer is given in terms of a real-valued function on the vertices of a graph, where the values produced are expected to provide a ranking that identifies the most relevant nodes. The word “importance” has a vast number of meanings, leading to many different definitions of centrality.
- Information retrieval: represent the relationships between documents and the terms they contain, allowing for a more effective information retrieval process.
- Text summarization: represent the relationships between sentences in a document, allowing for the automatic generation of summaries.
- Topic modelling: represent the relationships between documents and the topics they contain, automatically identifying topics within a corpus of text.
- Sentiment analysis: represent the relationships between words and their sentiment, allowing for automatic sentiment analysis in text.
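To make this concrete, the following minimal Cypher sketch (the Word label, CO_OCCURS_WITH relationship, and text and sentence_id properties are illustrative assumptions, not a prescribed schema) builds a tiny word co-occurrence text graph and ranks words by degree, one of the simplest centrality notions discussed above:

    // Create word nodes and co-occurrence edges for one sentence (illustrative data)
    MERGE (w1:Word {text: 'artificial'})
    MERGE (w2:Word {text: 'intelligence'})
    MERGE (w3:Word {text: 'graph'})
    MERGE (w1)-[:CO_OCCURS_WITH {sentence_id: 1}]->(w2)
    MERGE (w2)-[:CO_OCCURS_WITH {sentence_id: 1}]->(w3);

    // Rank words by the number of co-occurrence edges (degree), a simple centrality proxy
    MATCH (w:Word)-[r:CO_OCCURS_WITH]-()
    RETURN w.text AS word, count(r) AS degree
    ORDER BY degree DESC;

The same graph can then feed the retrieval, summarisation, topic modelling, and sentiment analysis uses listed above.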
2.3. Querying Text Graphs
2.3.1. Exploring Text Graphs
- Attributed text graph queries extract information from attributed graphs, where vertices and edges are associated with attributes such as types, numbers, and texts. For example, suppose we have an attributed graph representing a document, where each node represents a word and has an attribute “word” containing the word itself. Edges represent the order of words in the document. We can use an RPQ to find all occurrences of a specific phrase in the document. If we want to find all occurrences of the words “artificial intelligence” in the document, we can use the following RPQ:
    [word="artificial"]/./[word="intelligence"]
- Structure extraction is used to extract the underlying structure of a text graph by identifying and grouping equivalent nodes. For example, we can assume that we have a graph as described in the previous example. Suppose we want to group all occurrences of the word “artificial” together. In that case, we can use structure extraction to identify all nodes representing the word “artificial” and group them into a single node. This would result in a simplified graph where all occurrences of the word “artificial” are represented by a single node. Similarly, we can use structure extraction to group nodes representing equivalent concepts, such as synonyms or related words. We could group nodes representing the words “artificial intelligence”, “AI”, and “machine learning” into a single node representing the concept of artificial intelligence.
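A possible Cypher realisation of this grouping (the Word and Concept labels and the INSTANCE_OF relationship are assumptions made for illustration): every occurrence node carrying an equivalent term is linked to a single canonical concept node, simplifying the graph without discarding the original occurrences.

    // Group every occurrence of the word 'artificial' under one concept node
    MATCH (w:Word {text: 'artificial'})
    MERGE (c:Concept {name: 'artificial'})
    MERGE (w)-[:INSTANCE_OF]->(c);

    // Synonyms or related terms can be attached to the same concept
    MATCH (w:Word)
    WHERE w.text IN ['artificial intelligence', 'AI', 'machine learning']
    MERGE (c:Concept {name: 'artificial intelligence'})
    MERGE (w)-[:INSTANCE_OF]->(c);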
2.3.2. Analysing Text Graphs
2.4. Processing the Nuances and Ambiguity of Natural Language
2.4.1. Dealing with Bias in NLP Processing
- preprocessing data to balance or neutralise biases, creating fairer datasets for training models [21];
- altering training procedures or model architectures to ensure equitable performance across different demographic groups [22];
- using attention mechanisms to enhance the transparency of NLP models [23];
- helping to pinpoint and correct biases in how models process data [24];
- adjusting model training to calibrate outputs and reduce bias [25];
- establishing robust ethical frameworks so that NLP applications adhere to principles of fairness and equity [26].
- [1]
- Visualization of Relationships and Associations. Text graphs can map out the relationships between entities, concepts, and words in the text, making it easier to identify biased associations. For example, if certain adjectives (like “aggressive” or “emotional”) are disproportionately associated with a particular gender or ethnicity, a text graph can visually highlight these associations, making the bias clear. By examining the structure of a text graph, one can spot imbalances in the representation of different groups or ideas. For instance, if the graph shows a significant clustering of negative sentiments around a specific demographic, it could indicate a bias in how that group is portrayed.
- [2]
- Contextual Analysis. Text graphs can capture the context in which certain entities or groups are mentioned. If certain stereotypes are consistently reinforced through specific relationships or descriptions (e.g., linking a profession predominantly to one gender), the graph will make these patterns visible, exposing potential biases in the text. By tracking the usage of specific terms across different contexts, text graphs can help identify biased language. For example, the graph can reveal whether terms like “leader” are more frequently associated with one gender while terms like “helper” are associated with the other, highlighting a bias in how roles are portrayed.
- [3]
- Quantifying Bias through Metrics. By analysing the nodes (entities) and edges (relationships) in a text graph, one can quantify the extent of bias. Metrics like centrality, frequency, and clustering can indicate how often certain groups or ideas are connected to negative or positive terms, allowing for a more objective bias assessment. Text graphs can integrate sentiment analysis to evaluate how different entities are described. If the graph reveals that certain groups are frequently linked to negative sentiments or emotions, it can indicate underlying bias (a query sketch after this list illustrates how such associations can be counted).
- [4]
- Comparative Analysis. Text graphs allow for comparing biases across multiple documents or corpora. By comparing the structure and relationships in graphs from different sources, one can identify consistent biases or disparities in how certain topics or groups are represented. Text graphs can be used to track how biases evolve. By comparing graphs generated from texts written in different periods, one can see if biases have increased, decreased, or changed in nature, providing insights into how societal attitudes are reflected in language.
- [5]
- Highlighting Ambiguity and Inconsistency. Text graphs can also highlight areas where the text is ambiguous or inconsistent in portraying specific ideas. For example, if a text graph shows conflicting relationships for a particular entity (e.g., an individual is described as both “competent” and “incompetent”), it could indicate a bias in how that entity is portrayed. Suppose the graph reveals that certain groups are represented inconsistently across different text parts (e.g., a minority group being portrayed positively in one section and negatively in another). In that case, this inconsistency might reflect an underlying bias in the content.
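As a minimal sketch of the metric-based assessment in point 3 (the GroupTerm and Descriptor labels, the ASSOCIATED_WITH relationship, and the sentiment property are hypothetical), the following Cypher query counts how often terms referring to a group co-occur with positively and negatively labelled descriptors:

    // Count positive vs. negative descriptors associated with each group term
    MATCH (g:GroupTerm)-[:ASSOCIATED_WITH]->(d:Descriptor)
    RETURN g.text AS group,
           sum(CASE WHEN d.sentiment = 'negative' THEN 1 ELSE 0 END) AS negative_links,
           sum(CASE WHEN d.sentiment = 'positive' THEN 1 ELSE 0 END) AS positive_links
    ORDER BY negative_links DESC;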
2.4.2. Maintaining Content Representation and Dealing with Annotations Evolution
- Nodes represent key entities, such as symptoms (e.g., “chest pain”, “shortness of breath”), diagnoses (e.g., “coronary artery disease”, “hypertension”), treatments (e.g., “beta blockers”, “angioplasty”), lab results (e.g., “elevated cholesterol”, “EKG results”), and visits (e.g., “initial consultation”, “six-month follow-up”).
- Edges represent relationships between these entities, such as “diagnosed_with”, “treated_with”, “led_to”, and “associated_with”. These edges can also carry temporal annotations to indicate when these relationships were recorded.
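A small Cypher sketch of such a temporally annotated clinical graph (labels, relationship types, and the recorded property are illustrative assumptions consistent with the entities listed above):

    // A patient, a symptom, a diagnosis, and a treatment, with temporal annotations on the edges
    MERGE (p:Patient {id: 'P001'})
    MERGE (s:Symptom {name: 'chest pain'})
    MERGE (c:Condition {name: 'coronary artery disease'})
    MERGE (t:Treatment {name: 'beta blockers'})
    MERGE (p)-[:REPORTED {recorded: date('2024-01-10')}]->(s)
    MERGE (p)-[:DIAGNOSED_WITH {recorded: date('2024-01-15')}]->(c)
    MERGE (c)-[:TREATED_WITH {recorded: date('2024-01-20')}]->(t);

    // Retrieve the events recorded directly for the patient, in chronological order
    MATCH (p:Patient {id: 'P001'})-[r]->(e)
    RETURN type(r) AS event, e.name AS entity, r.recorded AS recorded_on
    ORDER BY r.recorded;

When an annotation is revised, the old edge can be kept with its timestamp and a new edge added, so the graph preserves the evolution of the record rather than overwriting it.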
2.5. Wrap Up Example: Textual Content Representation, Maintenance and Exploration in Healthcare
2.5.1. Processing Textual Content in Healthcare
- [1]
- Clinical documentation: NLP significantly enhances the extraction and structuring of data from unstructured clinical notes, thereby improving patient care through streamlined analysis [32]. For instance, consider a patient’s clinical notes that describe “severe chest pain” and “shortness of breath”, mention the administration of “nitroglycerin”, and recommend a “stress test”. NLP technology can process this information by identifying and tagging medical terms, automatically detecting and classifying terms within the notes as symptoms (e.g., chest pain, shortness of breath), medications (e.g., nitroglycerin), and procedures (e.g., stress test). It can also structure the identified information into organised categories such as symptoms, diagnosed conditions, medications administered, and procedures recommended.
- [2]
- Predictive analytics: NLP analyses electronic medical records to identify patients at heightened risk of health disparities, bolstering surveillance efforts.
- [3]
- Clinical decision support: NLP aids clinical decision-making by furnishing clinicians with pertinent patient medical history information alongside the latest research findings and guidelines.
- [4]
- Patient experience: NLP enhances the patient experience by delivering personalised information and support across the patient’s care continuum.
- Variation in language: Many different dialects and language variations are used in medical records, making it difficult for NLP algorithms to understand and interpret the content accurately.
- Data standardisation: Poor standardisation of data elements, insufficient data governance policies, and variation in the design and programming of electronic health records (EHRs) can make it challenging to use NLP to fill the gaps in structured data.
- Domain-specific language: The application of NLP methodologies for domain-specific languages, such as biomedical text, can be challenging due to the complexity and specificity of the language used.
2.5.2. Structural and Semantic Modelling of Clinical Cases
2.5.3. Querying and Analysing Clinical Cases Represented as Graphs: From Facts to Knowledge
2.6. Discussion: Current Trends and Open Issues
- RQ1. What are the capabilities and limitations of graph-based representations in capturing syntax and semantics within textual content?
- RQ3. How do they enable the extraction of both explicit and implicit knowledge?
- RQ2. How can a graph-based structural and semantic text representation address missing information, such as incomplete, low-quality, or contradictory annotations, and what are the implications for querying and maintenance?
- RQ4. What querying possibilities are enabled by structural and semantic text graph representations?
3. Modelling Knowledge from Textual Content: Semantic Web
3.1. Modelling Textual Content
- Analysing domain-specific text to identify pertinent terms, concepts, and their relationships;
- Mapping these elements into an ontology using representation languages such as OWL (Web Ontology Language), RDF (Resource Description Framework), or RDFS (Resource Description Framework Schema);
- Evaluating the constructed ontology.
Concepts: COVID-19, Antiviral Medications, Nirmatrelvir with Ritonavir (Paxlovid), Remdesivir (Veklury), Fever, Cough, Shortness of Breath
Relationships:
- Nirmatrelvir with Ritonavir (Paxlovid) is a medication for COVID-19
- Remdesivir (Veklury) is a medication for COVID-19
- Fever is a symptom of COVID-19
- Cough is a symptom of COVID-19
- Shortness of Breath is a symptom of COVID-19
3.2. Reasoning (Querying) with Ontologies
- Hierarchical relationships include subclass or subtype relationships (e.g., “A dog is a mammal”). They establish a hierarchy within the ontology, helping to organise concepts into categories and subcategories.
- Associative relations connect entities in a non-hierarchical manner, such as “A doctor treats a patient” or “A drug is prescribed for a disease”. They define how different concepts interact or are related within the domain.
- Ontological relationships often come with logical constraints or axioms (e.g., inverse, transitive, symmetric properties). These constraints help in reasoning and inference, enabling the derivation of new knowledge from existing facts.
- Querying basic information: Retrieve basic information about all patients in the dataset.
    PREFIX cmdo: <http://example.org/cmdo#>
    SELECT ?patient ?name ?age ?gender
    WHERE {
      ?patient a cmdo:Patient ;
               cmdo:hasName ?name ;
               cmdo:hasAge ?age ;
               cmdo:hasGender ?gender .
    }
- Finding patients with specific conditions: Retrieve the list of patients diagnosed with a specific condition, such as diabetes.
    PREFIX cmdo: <http://example.org/cmdo#>
    SELECT ?patient ?name
    WHERE {
      ?patient a cmdo:Patient ;
               cmdo:hasName ?name ;
               cmdo:hasCondition ?condition .
      ?condition a cmdo:Condition ;
                 cmdo:hasName "Diabetes" .
    }
- Reasoning over treatment outcomes: Identify patients who were treated with a specific medication and their associated treatment outcomes.
    PREFIX cmdo: <http://example.org/cmdo#>
    SELECT ?patient ?name ?medication ?outcome
    WHERE {
      ?patient a cmdo:Patient ;
               cmdo:hasName ?name ;
               cmdo:receivedTreatment ?treatment .
      ?treatment a cmdo:Treatment ;
                 cmdo:usesMedication ?medication ;
                 cmdo:hasOutcome ?outcome .
      ?medication a cmdo:Medication ;
                  cmdo:hasName "Metformin" .
    }
- Identifying potential risk factors: Identify patients with multiple risk factors (e.g., high blood pressure and high cholesterol) who have not yet been diagnosed with a related condition (e.g., cardiovascular disease).
    PREFIX cmdo: <http://example.org/cmdo#>
    SELECT ?patient ?name
    WHERE {
      ?patient a cmdo:Patient ;
               cmdo:hasName ?name ;
               cmdo:hasRiskFactor ?riskFactor1, ?riskFactor2 .
      ?riskFactor1 a cmdo:RiskFactor ;
                   cmdo:hasName "High Blood Pressure" .
      ?riskFactor2 a cmdo:RiskFactor ;
                   cmdo:hasName "High Cholesterol" .
      FILTER NOT EXISTS {
        ?patient cmdo:hasCondition ?condition .
        ?condition a cmdo:Condition ;
                   cmdo:hasName "Cardiovascular Disease" .
      }
    }
- Detecting condition progression: Detect patients whose condition has progressed based on lab results or other clinical measurements.
    PREFIX cmdo: <http://example.org/cmdo#>
    SELECT ?patient ?name ?condition ?measurement ?value ?date
    WHERE {
      ?patient a cmdo:Patient ;
               cmdo:hasName ?name ;
               cmdo:hasCondition ?condition ;
               cmdo:hasMeasurement ?measurementRecord .
      ?condition a cmdo:Condition ;
                 cmdo:hasName "Diabetes" .
      ?measurementRecord a cmdo:Measurement ;
                         cmdo:hasMeasurementType ?measurement ;
                         cmdo:hasMeasurementValue ?value ;
                         cmdo:hasMeasurementDate ?date .
      FILTER (?measurement = "HbA1c" && ?value > 7.0)
    }
- Inferring new knowledge through reasoning: Infer patients at risk of a condition based on related factors and existing conditions, applying reasoning over the ontology.
    PREFIX cmdo: <http://example.org/cmdo#>
    CONSTRUCT {
      ?patient cmdo:atRiskOf ?condition .
    }
    WHERE {
      ?patient a cmdo:Patient ;
               cmdo:hasCondition ?existingCondition ;
               cmdo:hasRiskFactor ?riskFactor .
      ?existingCondition a cmdo:Condition ;
                         cmdo:isRelatedTo ?condition .
      ?condition a cmdo:Condition .
    }
- Finding treatment patterns: Identify common treatment patterns for a particular condition across multiple patients.
    PREFIX cmdo: <http://example.org/cmdo#>
    SELECT ?condition ?treatment (COUNT(?patient) AS ?numPatients)
    WHERE {
      ?patient a cmdo:Patient ;
               cmdo:hasCondition ?condition ;
               cmdo:receivedTreatment ?treatment .
      ?condition a cmdo:Condition ;
                 cmdo:hasName "Hypertension" .
    }
    GROUP BY ?condition ?treatment
    ORDER BY DESC(?numPatients)
- Predicting complications: Predict potential complications for patients with a specific condition based on past data.
    PREFIX cmdo: <http://example.org/cmdo#>
    SELECT ?patient ?name ?predictedComplication
    WHERE {
      ?patient a cmdo:Patient ;
               cmdo:hasName ?name ;
               cmdo:hasCondition ?condition ;
               cmdo:hasRiskFactor ?riskFactor .
      ?condition a cmdo:Condition ;
                 cmdo:hasName "Diabetes" .
      ?riskFactor a cmdo:RiskFactor ;
                  cmdo:predictsComplication ?predictedComplication .
    }
3.3. Discussion: Current Trends and Open Issues
- RQ1. What are the capabilities and limitations of ontology-based representations in capturing syntax and semantics within textual content?
- RQ2. How can an ontology-based structural and semantic text representation address missing information, such as incomplete, low-quality, or contradictory annotations, and what are the implications for querying and maintenance?
- RQ3. How do ontologies enable the extraction of both explicit and implicit knowledge?
- RQ4. What querying possibilities are enabled by structural and semantic ontology-based text representations?
4. Graph Databases for Storing, Querying and Analysing Textual Content
4.1. Building Graph Databases
Graph Database Modelling
- [1]
- Decision-Making Challenge in Transforming into a Property Graph. When transforming this clinical case text into a property graph, decisions must be made regarding which entities are modelled as nodes, properties, and edges. Deciding what to model as nodes, properties, and edges involves considering the importance and relationships of entities within the clinical context:
- Nodes: Entities central to the clinical decision-making process (e.g., patient, conditions, and medications) are modelled as nodes because they represent key concepts or actors in the clinical narrative.
- Properties: Attributes that describe these entities without requiring separate nodes are modelled as properties. For instance, the patient’s age and gender are properties of the patient node rather than separate nodes because they describe the patient rather than act as independent entities.
- Edges: Relationships critical to understanding the flow of the clinical case are modelled as edges. For instance, the connection between a diagnosis and its treatment is best represented as an edge to clearly show the therapeutic decision process.
- [2]
- Graph database modelling challenges:
- Ambiguity: Deciding whether to model “elevated cholesterol levels” as a property of a test result or as a separate node connected by an edge can be ambiguous and depends on the specific use case and the granularity of the graph (the sketch after this list contrasts the two options).
- Complexity: The more detailed the graph, the more complex it becomes. For instance, every medication might have dosage, frequency, and side effects as properties, but modelling these might complicate the graph unnecessarily for certain queries.
- Consistency: Ensuring that similar entities across different patient records are consistently modelled is crucial for making the graph queryable and comparable across cases.
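The ambiguity above can be made concrete with two alternative Cypher sketches for “elevated cholesterol levels” (all labels and property names are assumptions for illustration): the first models the finding as a property of a test result, the second promotes it to a node of its own.

    // Option A: cholesterol status as a property of the test result
    MERGE (p:Patient {id: 'P001'})
    MERGE (t:TestResult {id: 'T042', type: 'lipid panel', cholesterol: 'elevated'})
    MERGE (p)-[:HAS_RESULT]->(t);

    // Option B: the finding as a separate node, reusable across patients and queries
    MERGE (p2:Patient {id: 'P002'})
    MERGE (t2:TestResult {id: 'T043', type: 'lipid panel'})
    MERGE (f:Finding {name: 'elevated cholesterol levels'})
    MERGE (p2)-[:HAS_RESULT]->(t2)
    MERGE (t2)-[:SHOWS]->(f);

Option B makes the finding directly traversable (e.g., retrieving all patients who share it) at the cost of a more complex graph, which is precisely the granularity trade-off discussed above.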
4.2. Defining and Querying a Graph Database
4.2.1. Defining a Graph Database
4.2.2. Querying and Analysing a Graph Database
- make personalised recommendations;
- identify clusters or communities of nodes that are more densely connected than to the rest of the graph;
- predict the likelihood of a link forming between two nodes in a graph based on the characteristics of the nodes and their relationships with other nodes in the graph;
- classify the nodes in a graph based on their attributes and relationships with other nodes.
- [1]
- Analysing text graph databases through an example: a medical graph database.
- node types representing entities like patient, condition, symptom, medication, and treatment
- edges like:
    Patient -> condition (diagnosed_with);
    Patient -> medication (treated_with);
    Condition -> symptom (exhibits);
    Treatment -> outcome (results_in).
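A minimal Cypher sketch of this schema and a typical query over it (identifiers, names, and property values are illustrative):

    // Populate a fragment of the medical graph
    MERGE (p:Patient {name: 'Jane Doe'})
    MERGE (c:Condition {name: 'hypertension'})
    MERGE (m:Medication {name: 'beta blockers'})
    MERGE (s:Symptom {name: 'headache'})
    MERGE (p)-[:DIAGNOSED_WITH]->(c)
    MERGE (p)-[:TREATED_WITH]->(m)
    MERGE (c)-[:EXHIBITS]->(s);

    // Which medications are taken by patients diagnosed with hypertension?
    MATCH (p:Patient)-[:DIAGNOSED_WITH]->(:Condition {name: 'hypertension'}),
          (p)-[:TREATED_WITH]->(m:Medication)
    RETURN p.name AS patient, collect(m.name) AS medications;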
- [2]
- Graph analytics in GDBMS. A GDBMS and a graph data science platform can be utilised to calculate various graph metrics, including generic metrics such as betweenness centrality [62] and PageRank [16,63] score, as well as domain-specific metrics such as the number of known fraudsters indirectly connected to a given client. Additionally, these tools can generate graph embeddings [64], translating graph data into a more suitable format for machine learning.
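As an illustration, and assuming Neo4j’s Graph Data Science (GDS) library with its 2.x procedure names, the sketch below projects part of the medical graph in memory, ranks nodes by PageRank, and produces FastRP embeddings for downstream machine learning; the projection itself is an assumption about the example schema.

    // Project an in-memory graph of patients and conditions
    CALL gds.graph.project('medGraph', ['Patient', 'Condition'], ['DIAGNOSED_WITH']);

    // Rank nodes by PageRank
    CALL gds.pageRank.stream('medGraph')
    YIELD nodeId, score
    RETURN gds.util.asNode(nodeId).name AS node, score
    ORDER BY score DESC
    LIMIT 10;

    // Produce 64-dimensional FastRP node embeddings
    CALL gds.fastRP.stream('medGraph', {embeddingDimension: 64})
    YIELD nodeId, embedding
    RETURN gds.util.asNode(nodeId).name AS node, embedding;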
- [3]
- Using a GDBMS for graph analytics: concluding insights. Using a graph database management system (GDBMS) for performing graph analytics offers several significant advantages over traditional approaches based on programming languages and machine learning libraries.
4.2.3. Schema Inference
- [1]
- Inferring Node Types and Their Properties. Query the graph to identify the types of nodes and the properties associated with each type. The following expression gives an example of how to implement this task. The query MATCH (n) is used to match all nodes in the graph, allowing each node to be accessed and analysed individually. The function labels(n) retrieves the labels assigned to these nodes, which typically indicate the type of the node, such as patient or condition. The function keys(n) is then used to extract the properties associated with each node, revealing the attributes or data fields that describe the node (e.g., age, name, severity). The DISTINCT keyword ensures that the results returned are unique, meaning that each node type and its associated properties are only listed once, avoiding duplicates and providing a clear overview of the graph’s structure.
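A Cypher formulation of the query just described (a sketch; the exact output depends on the population of the graph):

    // List each node type together with the properties its nodes carry
    MATCH (n)
    RETURN DISTINCT labels(n) AS NodeType, keys(n) AS Properties;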
- [2]
- Inferring Relationships Between Node Types. Query the graph to infer the relationships (edges) between these node types. The query MATCH (a)-[r]->(b) is designed to match all relationships in the graph that connect one node to another, allowing us to examine how different entities in the graph are related. The type(r) function is then used to retrieve the type of each relationship, such as diagnosed_with or prescribed, which describes the nature of the connection between the nodes. Additionally, labels(a) and labels(b) are used to retrieve the labels of the nodes at each end of the relationship, indicating the types of entities that are being connected (e.g., a patient node connected to a condition node). This query helps in understanding the structure and semantics of the relationships within the graph.
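The corresponding Cypher query, whose output for the running example is illustrated in the table below:

    // Infer which node types are connected by which relationship types
    MATCH (a)-[r]->(b)
    RETURN DISTINCT type(r) AS RelationshipType,
                    labels(a) AS FromNodeType,
                    labels(b) AS ToNodeType;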
RelationshipType | FromNodeType | ToNodeType
---------------- | ------------ | ----------
"diagnosed_with" | ["Patient"] | ["Condition"]
"prescribed" | ["Doctor"] | ["Medication"]
"treated_by" | ["Patient"] | ["Doctor"]
- [3]
- Inferring Property Types and Values. To understand the kind of data each property holds, query for examples of property values. A function from the APOC library returns the type of the property (e.g., String, Integer, Float).
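A sketch of this step, assuming the APOC library is available (apoc.meta.cypher.type here; older APOC releases expose the equivalent apoc.meta.type function); the table below shows the kind of output expected:

    // Sample each property of each node and report its data type
    MATCH (n)
    UNWIND keys(n) AS property
    RETURN DISTINCT labels(n) AS NodeType,
                    property AS Property,
                    apoc.meta.cypher.type(n[property]) AS PropertyType;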
NodeType | Property | PropertyType
-------- | -------- | ------------
["Patient"] | "name" | "String"
["Patient"] | "age" | "Integer"
["Medication"] | "dosage" | "String"
["Condition"] | "severity" | "String"
- Node Types: patient, condition, medication, doctor.
- Relationships: diagnosed_with (patient -> condition), prescribed (doctor -> medication), treated_by (patient -> doctor).
- Properties: Nodes like patient have properties such as name, age, and gender, with types such as String and Integer.
4.3. Discussion: Current Trends and Open Issues
- RQ1. What are the capabilities and limitations of graph databases in capturing syntax and semantics within textual content?
- RQ2. How can a text graph database address missing information, such as incomplete, low-quality, or contradictory annotations, and what are the implications for querying and maintenance?
- RQ3. How do graph databases enable the extraction of both explicit and implicit knowledge?
- Explicit Knowledge: Direct queries on the graph can extract explicit knowledge, such as relationships between entities, specific properties of nodes, or predefined paths. For example, in a graph where nodes represent medical conditions and treatments, a query can directly retrieve all treatments associated with a specific condition. Pattern matching queries allow for the extraction of known relationships and connections within the graph (e.g., “Find all patients who have been treated for hypertension with beta-blockers”).
- Implicit Knowledge: Graph traversal queries enable the exploration of relationships that are not immediately obvious. By traversing paths between nodes, these databases can uncover hidden connections and dependencies, revealing implicit knowledge. For instance, traversing a medical knowledge graph might reveal an indirect link between two seemingly unrelated symptoms through a common underlying condition.
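Two Cypher sketches make the contrast concrete (labels, names, and relationship types are assumptions consistent with the earlier medical examples): a direct pattern match retrieves explicit knowledge, while a variable-length traversal surfaces indirect, implicit connections.

    // Explicit knowledge: patients treated for hypertension with beta blockers
    MATCH (p:Patient)-[:DIAGNOSED_WITH]->(:Condition {name: 'hypertension'}),
          (p)-[:TREATED_WITH]->(:Medication {name: 'beta blockers'})
    RETURN p.name AS patient;

    // Implicit knowledge: symptoms indirectly linked to 'fatigue' through intermediate nodes
    MATCH path = (s1:Symptom {name: 'fatigue'})-[*2..3]-(s2:Symptom)
    WHERE s1 <> s2
    RETURN DISTINCT s2.name AS related_symptom, length(path) AS distance
    ORDER BY distance;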
- RQ4. What querying possibilities are enabled by text graph databases?
- Graph analytics queries are implemented by GDBMSs with built-in graph algorithms such as PageRank, community detection, and similarity scoring. These algorithms help uncover implicit structures and patterns within the graph, such as identifying influential nodes, clustering similar entities, or finding potential correlations that are not directly stated in the textual content.
- Rule-based inference queries make it possible to infer new knowledge from existing data based on predefined rules. This allows the database to deduce new relationships or classify entities in ways that were not explicitly stated in the text.
5. Artificial Intelligence for Analysing and Discovering Knowledge from Text Graphs
5.1. Data Science Pipelines for Modelling Text Content
5.1.1. Building a Data Science Pipeline for Representing Textual Content
- [1]
- Text preprocessing. The process begins with document collection, where a diverse dataset is gathered, including text from sources such as articles, clinical reports, emails, or any other relevant textual data. Once collected, the text undergoes preprocessing to prepare it for analysis. This involves tokenization, which breaks the text down into individual words or phrases, known as tokens. Lemmatization or stemming is then applied to normalize these tokens to their root form, ensuring consistency across the dataset. Common words that do not add significant meaning, such as “the” or “and”, are removed during the stop words removal step. Finally, the text is split into individual sentences, making it easier to analyse and extract meaningful information in subsequent steps.
- [2]
- Entity Recognition and Classification. Named entity recognition (NER) involves using a pre-trained model or training a custom model to identify and classify key entities within the text, such as persons, organizations, locations, dates, or medical terms. Once these entities are recognized, entity linking is performed, where each entity is connected to a corresponding entry in a knowledge base or ontology—such as linking “New York” to the concept of a city in a geographical database. After linking, the entities are classified into predefined categories based on their roles within the text, such as “Person”, “Location”, “Disease”, or “Medication”, helping to organise and structure the extracted information for further analysis.
- [3]
- Relationship Extraction. Dependency parsing involves using models to analyse the syntactical structure of sentences, identifying relationships between words such as subject-verb-object connections. This process helps to map out how different words interact within a sentence, providing a foundation for understanding the text’s meaning. Once these relationships are identified, machine learning models are applied to classify them, determining specific types of connections between entities, such as “treated with”, “located in”, or “works for”. Additionally, co-reference resolution is performed to ensure that references to the same entity—such as “John” and “he”—are correctly linked, maintaining consistency in the extracted relationships throughout the text.
- [4]
- Initial Graph construction. In the graph construction phase, nodes are created for each identified entity, with each node labelled according to the entity type, such as “Person”, “Location”, or “Condition”. These nodes are then enriched with properties that reflect attributes extracted from the text, like age, date, or description, providing additional context for each entity. Following node creation, edges are established between nodes to represent the relationships identified during the text analysis. Each edge is labelled to indicate the type of relationship, such as “diagnosed_with” or “treated_with”, and, where applicable, properties like the date or context of the relationship are assigned to these edges, further enhancing the graph’s ability to capture and represent the intricacies of the textual content.
- [5]
- Graph population and optimization. In the graph population and optimization stage, the constructed nodes and edges are inserted into a GDBMS, creating a structured representation of the textual content (a Cypher sketch after this list illustrates this step). To ensure efficient access and analysis, indexing is applied to frequently queried nodes and edges, enhancing the speed and performance of graph traversal and querying. Additionally, the graph structure is optimised by merging similar nodes or relationships, which reduces redundancy and streamlines the overall graph, making it more efficient for storage, retrieval, and further analysis.
- [6]
- Validation and refinement. In the validation and refinement phase, the accuracy of the graph structure is validated by comparing it against a ground truth or leveraging feedback from domain experts to ensure that the relationships and entities accurately represent the underlying data. Based on this validation, the machine learning models used for entity recognition, relationship extraction, and graph-based predictions are refined to address any discrepancies or inaccuracies identified. This process of refinement enhances the precision and reliability of the graph. Additionally, continuous learning is implemented, where models are regularly updated and retrained as new documents are added to the dataset. This ongoing process ensures that the graph remains accurate, relevant, and reflective of the most current data available.
- [7]
- Deployment and maintenance. In the deployment and maintenance phase, the graph database is deployed for real-time applications such as question answering systems, recommendation engines, or decision support systems, where it can be actively used to derive insights and support decision-making. Maintenance involves regularly updating the graph with new data to ensure it remains current and relevant. Additionally, machine learning models are retrained as needed to accommodate evolving data patterns and improve system accuracy. Continuous monitoring of the graph’s performance and accuracy is essential to ensure the system functions optimally and delivers reliable results over time.
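A minimal Cypher sketch of the population and optimisation step referred to above (entity values, labels, relationship names, and the index name are illustrative assumptions): extracted entities are merged as nodes, the extracted relationship is stored with its context, and an index is created for frequently queried properties.

    // Merge entities extracted by the NLP pipeline as nodes
    MERGE (p:Person {name: 'John'})
    MERGE (c:Condition {name: 'coronary artery disease'})
    // Add the extracted relationship with its provenance as edge properties
    MERGE (p)-[:DIAGNOSED_WITH {source_document: 'note_123', date: date('2024-02-01')}]->(c);

    // Index frequently queried properties to speed up lookups and traversals
    CREATE INDEX person_name IF NOT EXISTS FOR (p:Person) ON (p.name);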
5.1.2. Data Science as First-Class Citizens
5.1.3. Implementing Data Science Pipelines
5.2. Vectorial Representation of Text Graphs
Graph Neural Networks for Building Text Graphs
5.3. Discussion: Current Trends and Open Issues
- RQ1. What are the capabilities and limitations of vectorial representations in capturing syntax and semantics within textual content?
- RQ2a. How can machine learning models address missing information, such as incomplete, low-quality, or contradictory annotations?
- RQ2b. How can vectorial representations of text graphs address missing information, such as incomplete, low-quality, or contradictory annotations, and what are the implications for querying and maintenance?
- Addressing Missing and Incomplete Information. Vectorial representations of text graphs can effectively address issues of missing information, incomplete or low-quality data, and contradictory annotations by leveraging the inherent ability of vector spaces to capture and generalize patterns across the data.
- Handling Contradictory Annotations. Vectorial models can capture the different contexts in which similar entities appear, allowing the model to distinguish between contradictory annotations based on their respective contexts. For example, in a text graph where “headache” might be annotated as a symptom of both stress and a neurological condition, vector representations can help differentiate these contexts, reducing the ambiguity in querying and analysis.
- Implications for Querying. With vectorial representations addressing missing and contradictory information, queries against the graph can return more accurate and contextually relevant results. The model’s ability to infer and generalize from incomplete data ensures that queries are less likely to fail or return incorrect results due to gaps or errors in the dataset.
- Implications for Maintenance. The ability of vectorial models to generalize from incomplete or low-quality data reduces the need for constant manual curation and correction of the dataset. This can significantly lower maintenance overheads, as the system becomes more resilient to imperfections in the data.
- RQ3. How do vectorial representations enable the extraction of both explicit and implicit knowledge?
- [1]
- Extraction of explicit knowledge. Vectorial representations encode explicit knowledge directly from the data, such as the meaning of words, relationships between entities, and their attributes. For example, in a medical context, a word embedding might directly capture the relationship between terms like “diabetes” and “insulin” based on their co-occurrence in the text. These vectors encode the explicit connections, allowing for straightforward retrieval of information through querying or similarity searches.
- [2]
- Extraction of implicit knowledge. Vectorial representations capture the broader context in which entities appear, allowing the model to infer relationships and meanings that are not explicitly stated. For instance, if “fatigue” frequently appears in medical cases involving both “anemia” and “thyroid disorders”, the model can implicitly associate fatigue with these conditions, even if this connection is not directly annotated.
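A sketch of how such implicit associations can be surfaced once embeddings are stored on the nodes, assuming Neo4j GDS’s gds.similarity.cosine function and a hypothetical embedding property:

    // Find the conditions whose embeddings are most similar to the symptom 'fatigue'
    MATCH (s:Symptom {name: 'fatigue'}), (c:Condition)
    WHERE s.embedding IS NOT NULL AND c.embedding IS NOT NULL
    RETURN c.name AS condition,
           gds.similarity.cosine(s.embedding, c.embedding) AS similarity
    ORDER BY similarity DESC
    LIMIT 5;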
- RQ4. What querying possibilities are enabled by vectorial text graph representations?
6. Conclusions and Outlook
- Content extraction: This involves using linguistic techniques to delve into the language and extract meaningful content.
- Knowledge representation: This assumes that textual content defines a network of representative concepts, semantic relations, and associated consistency rules that represent the knowledge contained in the text, including knowledge that can be inferred from it.
- Efficient query execution: This involves selecting appropriate graph data models to represent textual content so that specific queries can be answered efficiently.
- Knowledge modelling, discovery, and prediction: Given textual content represented as graphs, this involves modelling how concepts weave content by connecting words, sentences, and groups of sentences with the hypothesis that new knowledge can be produced and predicted. For example, graph machine learning, or geometric machine learning, can learn from complex data such as graphs and multi-dimensional points. Its applications have been relevant in fields such as biochemistry, drug design, and structural biology (https://www.gartner.com/smarterwithgartner/gartner-top-10-data-and-analytics-trends-for-2021, accessed on 19 August 2024).
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Turgunova, N.; Turgunov, B.; Umaraliyev, J. Automatic text analysis. Syntax and semantic analysis. In Engineering Problems and Innovations; TATUFF-EPAI: Chinobod, Uzbekistan, 2023. [Google Scholar]
- Nadkarni, P.M.; Ohno-Machado, L.; Chapman, W.W. Natural language processing: An introduction. J. Am. Med. Inform. Assoc. 2011, 18, 544–551. [Google Scholar] [CrossRef] [PubMed]
- Idnay, B.; Dreisbach, C.; Weng, C.; Schnall, R. A systematic review on natural language processing systems for eligibility prescreening in clinical research. J. Am. Med. Inform. Assoc. 2022, 29, 197–206. [Google Scholar] [CrossRef] [PubMed]
- Fanni, S.C.; Febi, M.; Aghakhanyan, G.; Neri, E. Natural language processing. In Introduction to Artificial Intelligence; Springer: Berlin, Germany, 2023; pp. 87–99. [Google Scholar]
- Trivedi, A.; Pant, N.; Shah, P.; Sonik, S.; Agrawal, S. Speech to text and text to speech recognition systems—A review. IOSR J. Comput. Eng 2018, 20, 36–43. [Google Scholar]
- Luerkens, D.W.; Beddow, J.K.; Vetter, A.F. Theory of morphological analysis. In Particle Characterization in Technology; CRC Press: Boca Raton, FL, USA, 2018; pp. 3–14. [Google Scholar]
- Chomsky, N. Systems of syntactic analysis. J. Symb. Log. 1953, 18, 242–256. [Google Scholar] [CrossRef]
- Chowdhary, K. Natural language processing. In Fundamentals of Artificial Intelligence; Springer: New Delhi, India, 2020; pp. 603–649. [Google Scholar]
- Eisenstein, J. Introduction to Natural Language Processing; MIT Press: Cambridge, MA, USA, 2019. [Google Scholar]
- Maulud, D.H.; Zeebaree, S.R.; Jacksi, K.; Sadeeq, M.A.M.; Sharif, K.H. State of art for semantic analysis of natural language processing. Qubahan Acad. J. 2021, 1, 21–28. [Google Scholar] [CrossRef]
- Geeraerts, D. Theories of Lexical Semantics; OUP Oxford: Oxford, UK, 2009. [Google Scholar]
- Wang, C.; Zhou, X.; Pan, S.; Dong, L.; Song, Z.; Sha, Y. Exploring relational semantics for inductive knowledge graph completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 28 February–1 March 2022; Volume 36, pp. 4184–4192. [Google Scholar]
- Potter, J. Discourse analysis. In Handbook of Data Analysis; Sage: London, UK, 2004; pp. 607–624. [Google Scholar]
- Chauhan, K.; Jain, K.; Ranu, S.; Bedathur, S.; Bagchi, A. Answering Regular Path Queries through Exemplars. Proc. VLDB Endow. 2021, 15, 299–311. [Google Scholar] [CrossRef]
- Arul, S.M.; Senthil, G.; Jayasudha, S.; Alkhayyat, A.; Azam, K.; Elangovan, R. Graph Theory and Algorithms for Network Analysis. E3S Web Conf. EDP Sci. 2023, 399, 08002. [Google Scholar] [CrossRef]
- Zhang, P.; Wang, T.; Yan, J. PageRank centrality and algorithms for weighted, directed networks. Phys. A Stat. Mech. Appl. 2022, 586, 126438. [Google Scholar] [CrossRef]
- Garrido-Muñoz, I.; Montejo-Ráez, A.; Martínez-Santiago, F.; Ureña-López, L.A. A survey on bias in deep NLP. Appl. Sci. 2021, 11, 3184. [Google Scholar] [CrossRef]
- Dev, S.; Sheng, E.; Zhao, J.; Amstutz, A.; Sun, J.; Hou, Y.; Sanseverino, M.; Kim, J.; Nishi, A.; Peng, N.; et al. On measures of biases and harms in NLP. arXiv 2021, arXiv:2108.03362. [Google Scholar]
- Hutto, C.; Gilbert, E. Vader: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the International AAAI Conference on Web and Social Media, Ann Arbor, MI, USA, 1–4 June 2014; Volume 8, pp. 216–225. [Google Scholar]
- Loper, E.; Bird, S. Nltk: The natural language toolkit. arXiv 2002, arXiv:cs/0205028. [Google Scholar]
- Bolukbasi, T.; Chang, K.W.; Zou, J.Y.; Saligrama, V.; Kalai, A.T. Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Proceedings of the Advances in Neural Information Processing Systems 29, Barcelona, Spain, 5–10 December 2016. [Google Scholar]
- Zhang, Y.; Ramesh, A. Learning fairness-aware relational structures. In ECAI 2020; IOS Press: Tepper Drive Clifton, VA, USA, 2020; pp. 2543–2550. [Google Scholar]
- Wiegreffe, S.; Pinter, Y. Attention is not not explanation. arXiv 2019, arXiv:1908.04626. [Google Scholar]
- Hardt, M.; Price, E.; Srebro, N. Equality of opportunity in supervised learning. In Proceedings of the Advances in Neural Information Processing Systems 29, Barcelona, Spain, 5–10 December 2016. [Google Scholar]
- Raji, I.D.; Smart, A.; White, R.N.; Mitchell, M.; Gebru, T.; Hutchinson, B.; Smith-Loud, J.; Theron, D.; Barnes, P. Closing the AI accountability gap: Defining an end-to-end framework for internal algorithmic auditing. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona Spain, 27–30 January 2020; pp. 33–44. [Google Scholar]
- Jobin, A.; Ienca, M.; Vayena, E. The global landscape of AI ethics guidelines. Nat. Mach. Intell. 2019, 1, 389–399. [Google Scholar] [CrossRef]
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of the Advances in Neural Information Processing Systems 26, Lake Tahoe, NV, USA, 5–10 December 2013. [Google Scholar]
- Le, Q.; Mikolov, T. Distributed representations of sentences and documents. In Proceedings of the International Conference on Machine Learning. PMLR, Beijing, China, 22–24 June 2014; pp. 1188–1196. [Google Scholar]
- Ma, S.; Sun, X.; Li, W.; Li, S.; Li, W.; Ren, X. Query and output: Generating words by querying distributed word representations for paraphrase generation. arXiv 2018, arXiv:1803.01465. [Google Scholar]
- Kaddari, Z.; Mellah, Y.; Berrich, J.; Belkasmi, M.G.; Bouchentouf, T. Natural language processing: Challenges and future directions. In Proceedings of the International Conference on Artificial Intelligence & Industrial Applications, Meknes, Morocco, 19–20 March 2020; Springer: Cham, Switzerland, 2020; pp. 236–246. [Google Scholar]
- Khurana, D.; Koli, A.; Khatter, K.; Singh, S. Natural language processing: State of the art, current trends and challenges. Multimed. Tools Appl. 2023, 82, 3713–3744. [Google Scholar] [CrossRef]
- Savary, A.; Silvanovich, A.; Minard, A.L.; Hiot, N.; Ferrari, M.H. Relation Extraction from Clinical Cases. In Proceedings of the New Trends in Database and Information Systems: ADBIS 2022 Short Papers, Doctoral Consortium and Workshops: DOING, K-GALS, MADEISD, MegaData, SWODCH, Turin, Italy, 5–8 September 2022; Proceedings. Springer Nature: Berlin, Germany, 2022; p. 353. [Google Scholar]
- Carriere, J.; Shafi, H.; Brehon, K.; Pohar Manhas, K.; Churchill, K.; Ho, C.; Tavakoli, M. Case report: Utilizing AI and NLP to assist with healthcare and rehabilitation during the COVID-19 pandemic. Front. Artif. Intell. 2021, 4, 613637. [Google Scholar] [CrossRef]
- Jozefowicz, R.; Vinyals, O.; Schuster, M.; Shazeer, N.; Wu, Y. Exploring the limits of language modeling. arXiv 2016, arXiv:1602.02410. [Google Scholar]
- Kouadri, W.M.; Ouziri, M.; Benbernou, S.; Echihabi, K.; Palpanas, T.; Amor, I.B. Quality of sentiment analysis tools: The reasons of inconsistency. Proc. VLDB Endow. 2020, 14, 668–681. [Google Scholar] [CrossRef]
- Rossiello, G.; Chowdhury, M.F.M.; Mihindukulasooriya, N.; Cornec, O.; Gliozzo, A.M. Knowgl: Knowledge generation and linking from text. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2023; Volume 37, pp. 16476–16478. [Google Scholar]
- Chimalakonda, S.; Nori, K.V. An ontology based modeling framework for design of educational technologies. Smart Learn. Environ. 2020, 7, 1–24. [Google Scholar] [CrossRef]
- Al-Aswadi, F.N.; Chan, H.Y.; Gan, K.H. Automatic ontology construction from text: A review from shallow to deep learning trend. Artif. Intell. Rev. 2020, 53, 3901–3928. [Google Scholar] [CrossRef]
- Bienvenu, M.; Leclère, M.; Mugnier, M.L.; Rousset, M.C. Reasoning with ontologies. In A Guided Tour of Artificial Intelligence Research: Volume I: Knowledge Representation, Reasoning and Learning; Springer: Cham, Switzerland, 2020; pp. 185–215. [Google Scholar]
- Zaihrayeu, I.; Sun, L.; Giunchiglia, F.; Pan, W.; Ju, Q.; Chi, M.; Huang, X. From web directories to ontologies: Natural language processing challenges. In Proceedings of the International Semantic Web Conference, Busan, Republic of Korea, 11–15 November 2007; Springer: Berlin, Germany, 2007; pp. 623–636. [Google Scholar]
- Maynard, D.; Bontcheva, K.; Augenstein, I. Natural Language Processing for the Semantic Web; Springer: Cham, Switzerland, 2017. [Google Scholar]
- Asim, M.N.; Wasim, M.; Khan, M.U.G.; Mahmood, W.; Abbasi, H.M. A survey of ontology learning techniques and applications. Database 2018, 2018, bay101. [Google Scholar] [CrossRef] [PubMed]
- Benbernou, S.; Ouziri, M. Enhancing data quality by cleaning inconsistent big RDF data. In Proceedings of the 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, USA, 11–14 December 2017; IEEE: New York, NY, USA, 2017; pp. 74–79. [Google Scholar]
- Mikroyannidi, E.; Quesada-Martínez, M.; Tsarkov, D.; Fernández Breis, J.T.; Stevens, R.; Palmisano, I. A quality assurance workflow for ontologies based on semantic regularities. In Proceedings of the Knowledge Engineering and Knowledge Management: 19th International Conference, EKAW 2014, Linköping, Sweden, 24–28 November 2014; Proceedings 19. Springer: Berlin, Germany, 2014; pp. 288–303. [Google Scholar]
- Wilson, R.S.I.; Goonetillake, J.S.; Ginige, A.; Indika, W.A. Ontology quality evaluation methodology. In Proceedings of the International Conference on Computational Science and Its Applications, Athens, Greece, 3–6 July 2022; Springer: Berlin, Germany, 2022; pp. 509–528. [Google Scholar]
- Sheveleva, T.; Herrmann, K.; Wawer, M.L.; Kahra, C.; Nürnberger, F.; Koepler, O.; Mozgova, I.; Lachmayer, R.; Auer, S. Ontology-Based Documentation of Quality Assurance Measures Using the Example of a Visual Inspection. In Proceedings of the International Conference on System-Integrated Intelligence, Genova, Italy, 7–9 September 2022; Springer: Berlin, Germany, 2022; pp. 415–424. [Google Scholar]
- Schneider, T.; Šimkus, M. Ontologies and data management: A brief survey. KI-Künstl. Intell. 2020, 34, 329–353. [Google Scholar] [CrossRef] [PubMed]
- Cardoso, S.D.; Pruski, C.; Da Silveira, M.; Lin, Y.C.; Groß, A.; Rahm, E.; Reynaud-Delaître, C. Leveraging the impact of ontology evolution on semantic annotations. In Proceedings of the Knowledge Engineering and Knowledge Management: 20th International Conference, EKAW 2016, Bologna, Italy, 19–23 November 2016; Proceedings 20. Springer: Berlin, Germany, 2016; pp. 68–82. [Google Scholar]
- Pietranik, M.; Kozierkiewicz, A. Methods of managing the evolution of ontologies and their alignments. Appl. Intell. 2023, 53, 20382–20401. [Google Scholar] [CrossRef]
- Ziebelin, M.D.; Pernelle, M.N.; Broisin, M.J.; Després, M.S.; Rousset, M.M.C.; Jouanot, M.F.; Druette, M.L. Interactive Ontology Modeling and Updating: Application to Simulation-based Training in Medicine. In Proceedings of the 2021 IEEE 30th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), Bayonne, France, 27–29 October 2021; pp. 177–182. [Google Scholar]
- Espinoza, A.; Del-Moral, E.; Martínez-Martínez, A.; Alí, N. A validation & verification driven ontology: An iterative process. Appl. Ontol. 2021, 16, 297–337. [Google Scholar]
- Ngom, A.N.; Diallo, P.F.; Kamara-Sangaré, F.; Lo, M. A method to validate the insertion of a new concept in an ontology. In Proceedings of the 2016 12th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Naples, Italy, 28 November–1 December 2016; IEEE: New York, NY, USA, 2016; pp. 275–281. [Google Scholar]
- Tartir, S.; Arpinar, I.B.; Sheth, A.P. Ontological evaluation and validation. In Theory and Applications of Ontology: Computer Applications; Springer: Dordrecht, The Netherlands, 2010; pp. 115–130. [Google Scholar]
- Della Valle, E.; Ceri, S. Querying the semantic web: SPARQL. In Handbook of Semantic Web Technologies; Springer: Berlin, Germany, 2011. [Google Scholar]
- Hogan, A.; Reutter, J.L.; Soto, A. In-database graph analytics with recursive SPARQL. In Proceedings of the International Semantic Web Conference, Athens, Greece, 2–6 November 2020; Springer: Berlin, Germany, 2020; pp. 511–528. [Google Scholar]
- Hogan, A.; Reutter, J.; Soto, A. Recursive SPARQL for Graph Analytics. arXiv 2020, arXiv:2004.01816. [Google Scholar]
- Mosser, M.; Pieressa, F.; Reutter, J.; Soto, A.; Vrgoč, D. Querying apis with SPARQL: Language and worst-case optimal algorithms. In Proceedings of the Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Greece, 3–7 June 2018; Proceedings 15. Springer: Berlin, Germany, 2018; pp. 639–654. [Google Scholar]
- Ali, W.; Saleem, M.; Yao, B.; Hogan, A.; Ngomo, A.C.N. A survey of RDF stores & SPARQL engines for querying knowledge graphs. VLDB J. 2022, 31, 1–26. [Google Scholar]
- Prevoteau, H.; Djebali, S.; Laiping, Z.; Travers, N. Propagation measure on circulation graphs for tourism behavior analysis. In Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, Virtual Event, 25–29 April 2022; pp. 556–563. [Google Scholar]
- Getoor, L.; Machanavajjhala, A. Entity resolution: Theory, practice & open challenges. Proc. VLDB Endow. 2012, 5, 2018–2019. [Google Scholar]
- Christophides, V.; Efthymiou, V.; Palpanas, T.; Papadakis, G.; Stefanidis, K. An overview of end-to-end entity resolution for big data. ACM Comput. Surv. (CSUR) 2020, 53, 1–42. [Google Scholar] [CrossRef]
- Grando, F.; Granville, L.Z.; Lamb, L.C. Machine learning in network centrality measures: Tutorial and outlook. ACM Comput. Surv. (CSUR) 2018, 51, 1–32. [Google Scholar] [CrossRef]
- Sargolzaei, P.; Soleymani, F. Pagerank problem, survey and future research directions. In Proceedings of the International Mathematical Forum, Copenhagen, Denmark, 4–11 July 2004; Citeseer: Roskilde, Denmark, 2010; Volume 5, pp. 937–956. [Google Scholar]
- Wang, X.; Bo, D.; Shi, C.; Fan, S.; Ye, Y.; Philip, S.Y. A survey on heterogeneous graph embedding: Methods, techniques, applications and sources. IEEE Trans. Big Data 2022, 9, 415–436. [Google Scholar] [CrossRef]
- Zhang, Z.; Wang, X.; Zhu, W. Automated machine learning on graphs: A survey. arXiv 2021, arXiv:2103.00742. [Google Scholar]
- Lbath, H.; Bonifati, A.; Harmer, R. Schema inference for property graphs. In Proceedings of the EDBT 2021—24th International Conference on Extending Database Technology, Nicosia, Cyprus, 23–26 March 2021; pp. 499–504. [Google Scholar]
- Lutov, A.; Roshankish, S.; Khayati, M.; Cudré-Mauroux, P. Statix—Statistical type inference on linked data. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; IEEE: New York, NY, USA, 2018; pp. 2253–2262. [Google Scholar]
- Bouhamoum, R.; Kellou-Menouer, K.; Lopes, S.; Kedad, Z. Scaling up schema discovery for RDF datasets. In Proceedings of the 2018 IEEE 34th International Conference on Data Engineering Workshops (ICDEW), Paris, France, 16–19 April 2018; IEEE: New York, NY, USA, 2018; pp. 84–89. [Google Scholar]
- Pokornỳ, J. Functional querying in graph databases. Viet. J. Comput. Sci. 2018, 5, 95–105. [Google Scholar] [CrossRef]
- Bellaachia, A.; Al-Dhelaan, M. Short text keyphrase extraction with hypergraphs. Prog. Artif. Intell. 2015, 3, 73–87. [Google Scholar] [CrossRef]
- Pokornỳ, J. Graph databases: Their power and limitations. In Proceedings of the Computer Information Systems and Industrial Management: 14th IFIP TC 8 International Conference, CISIM 2015, Warsaw, Poland, 24–26 September 2015; Proceedings 14. Springer: Berlin, Germany, 2015; pp. 58–69. [Google Scholar]
- Ashmore, R.; Calinescu, R.; Paterson, C. Assuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges. ACM Comput. Surv. 2021, 54, 111. [Google Scholar] [CrossRef]
- Liu, F.; Wu, J.; Xue, S.; Zhou, C.; Yang, J.; Sheng, Q. Detecting the evolving community structure in dynamic social networks. World Wide Web 2020, 23, 715–733. [Google Scholar] [CrossRef]
- Zhang, C.; Song, D.; Huang, C.; Swami, A.; Chawla, N.V. Heterogeneous graph neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 793–803. [Google Scholar]
- Agrawal, S.; Jain, S.K. Medical text and image processing: Applications, issues and challenges. In Machine Learning with Health Care Perspective: Machine Learning and Healthcare; Springer: Cham, Switzerland, 2020; pp. 237–262. [Google Scholar]
- Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359. [Google Scholar] [CrossRef]
- Shah, F.; Castelltort, A.; Laurent, A. Handling missing values for mining gradual patterns from NoSQL graph databases. Future Gener. Comput. Syst. 2020, 111, 523–538. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).