1. Summary
Stimulation of emotional states is the process of intentionally causing a person to experience a particular emotion [1]. This can be achieved through a variety of means, such as words, pictures, sounds, visualization, role-playing, or other experiential exercises [1]. The goal of stimulating emotional states is to elicit a specific emotional response from a person for use in therapy or psychological research. For example, one commonplace therapeutic intervention that uses visualizations and emotion-provoking images is cognitive behavioral therapy (CBT), which focuses on helping people identify and change the negative thought patterns and behaviors contributing to their problems [2]. CBT can be used to treat a wide range of conditions, including anxiety, depression, and addiction [2]. In therapeutic interventions, the goal of emotion stimulation is to help people become more aware of their emotions and how they affect their thoughts and behaviors, and to provide them with tools and strategies for managing their emotions more effectively [3]. This can ultimately lead to improved mental health and well-being [3,4].
One common way to elicit emotions is by using emotion-evoking images [1]. These images or pictures can be shown to people individually or in groups, and their emotional responses are commonly measured through self-reports and physiological features [5]. Emotion-evoking pictures specifically prepared for the controlled stimulation of emotions in laboratory settings are stored in affective picture databases along with additional semantic, emotion, and context descriptors [5,6]. Because they are intended to intentionally provoke specific emotional states, such documents are often referred to as stimuli, and pictures and videos specifically as visual stimuli [1,6].
The presented knowledge graph (KG) dataset is an extension of the Nencki Affective Picture System (NAPS) repository containing information about the relevant high-level semantics of visual stimuli [7]. The NAPS was built by the Polish Nencki Institute of Experimental Biology with the intention of providing researchers with additional high-quality visual stimuli in different categories that can be used in various areas of affective research [7]. The original database contains 1356 realistic, high-quality photographs divided into five disjoint categories: people, faces, animals, objects, and landscapes. Only some of the photographs are content-neutral, because the set was selected to evoke specific emotional responses in the general population. Since its introduction, the NAPS has been expanded with several additional datasets specialized for different domains of research in emotion processing. The extensions are: (i) NAPS Basic Emotions (NAPS BE), containing normative ratings based on the discrete model of emotions and additional dimensional ratings for a subset of 510 pictures from the original NAPS [8]; (ii) NAPS ERO, containing an additional 200 visual stimuli accompanied by self-reported subjective ratings of emotional valence and arousal from homosexual and heterosexual men and women (N = 80) [9]; and (iii) the Children-Rated Subset, the most recent extension of the NAPS, which includes 1128 pictures from the original NAPS database that were rated as appropriate for children based on various criteria and expert judgment [10]. The latter affective ratings were collected from a sample of N = 266 children aged 8–12 years [10]. One of the most important features of the NAPS set as a whole (i.e., with all its extensions) is that, compared with other stimuli sets, it combines a relatively large number of pictures with normative ratings identified according to both dimensional and categorical (discrete) emotion theories [11], and it also contains additional multiword semantic descriptions organized into different topics; these are all significant features contributing to the successful construction of stimuli sequences for the elicitation of emotional reactions [7,12].
The motivation for developing the presented dataset stems from the limitations imposed by the inadequacy and diversity of the existing semantic description models used for the annotation of stimuli in contemporary affective multimedia databases. Today, these databases are described loosely and with unsupervised vocabularies; they are domain-dependent and have different models and formats. New multimedia stimuli cannot simply be added to affective multimedia databases but require a separate set of affective ratings to be acquired through psychological experimentation with participants [1,13,14].
In our recent previous research, we conducted a systematic online survey of domain experts in emotion stimulation and estimation. The survey results showed that researchers predominantly identify and retrieve relevant stimuli manually, which is time-consuming and labor-intensive [15]. This is due to two reasons: (1) insufficient semantic descriptors and (2) limitations of the existing stimuli retrieval software. The survey of domain experts further revealed that the quality of semantic descriptors significantly impacts user satisfaction. The findings also highlighted the importance of a user-friendly, AI-based tool for the efficient retrieval of affective pictures, particularly those labeled with high-quality semantic descriptors. As a result, semantically enriching current multimedia stimuli databases with additional material and KG descriptions has been shown to be a critical requirement with the potential to dramatically improve precision, efficiency, and user satisfaction [15]. In this context, the creation of novel KG annotation datasets has emerged as a promising solution to enhance the efficiency of document retrieval and improve the overall management of stimuli repositories. The integration of KGs into unstructured affective multimedia databases can facilitate semantic understanding and reasoning over complex data relationships, allowing for more accurate and efficient identification of relevant documents.
The remainder of this paper is structured as follows: Section 2 explains how the dataset could be employed to semantically enrich the NAPS database and contribute to the success of personalized emotion elicitation. The WordNet and Suggested Upper Merged Ontology (SUMO) KGs are described with examples of their applications for the rich semantic description of NAPS pictures. In Section 3, we present the structure and format of the generated dataset, analyze the relationship between the knowledge graph concepts and the distribution of terms and concepts, and discuss the implications of our results for identifying relevant information in each of the five categories of NAPS stimuli. Section 4 provides an overview of the data collection process and the methodology for identifying knowledge graph concepts in stimuli pictures. Finally, in Section 5, we summarize our main observations and conclusions and outline plans for future research.
2. Advantages of Semantic Enrichment in Stimuli Retrieval and Personalized Emotion Stimulation
Semantic enrichment of a dataset involves adding information that helps to better describe and contextualize its content [16]. The semantic enrichment process aims to make the dataset more useful for analysis, search, and information retrieval by improving its ability to be understood by humans and machines [17]. Typically, this includes adding free-text labels, tags, or other forms of structured metadata that give additional meaning to the data. However, semantic enrichment can be further improved by linking data from knowledge taxonomies or ontologies, either specialized for specific domains or describing general knowledge [18]. In this paper, we used the WordNet lexical knowledge graph [19,20] and the general, shared, and reusable SUMO ontology [21], which contains formally described concepts, as the vocabularies for entity labeling and the semantic enrichment of stimuli. Our approach is explained in detail in Section 2.1 and Section 2.2. An example of a thorough semantic enrichment of a NAPS visual stimulus using SUMO formal concepts is illustrated in Figure 1.
The semantic enrichment of emotion stimuli is essential for achieving higher accuracy and precision in retrieval from affective multimedia databases and, subsequently, better personalization of stimuli sequences [22]. Personalization in emotion elicitation can be defined as the process of selecting optimal stimuli for a single subject or a group of subjects who share some collective knowledge, heritage, experiences, attitudes, or perceptions that together determine the effect and meaning of the stimuli for the subject. The optimization criterion depends on the goals of the exposure, but regardless of the intended purpose, the stimuli must affect the subjects’ cognition, behavior, and emotional states in a precise and timely manner. The desired stimulus effect and its dynamics, nature, and magnitude must be deliberately predetermined in the personalized stimuli before the exposure to ensure the expected impact on the elicitation, estimation, and regulation of emotion [1,14,23].
In the context of computerized or computer-assisted emotion elicitation, personalization is effectively an interactive and often iterative process of constructing stimulus sequences as time-dependent series of individual virtual reality (VR) or multimedia stimuli. Other stimuli, such as haptic, olfactory, and vestibular stimuli [24,25], are also amenable to personalization, although they are less common in practice, require specialized hardware, and lack stimuli databases as standardized as those for the more common audiovisual stimuli. The necessary prerequisite for any personalized computer-assisted exposure is identifying content that has powerful significance to a specific subject. The effects of these stimuli must produce observable manifestations that an expert or a specialized computer acquisition system can unambiguously identify. Clear examples of such objective phenomena are changes in physiological signals, facial expressions, and vocal expressions. Each of them can be monitored with specialized devices, such as sensors for heart rate (HR), skin temperature (SKT), and skin conductance (SC), ECG, voice and video recorders, or neuroimaging devices (fMRI, MRI, PET, EEG, MEG) [26,27]. By objectively measuring these manifestations, the success of the procedure and of the personalization itself can be verified.
The most important purpose of semantic enrichment in affective multimedia databases is to enable faster, simpler, and more accurate retrieval of relevant stimuli from affective multimedia databases to achieve a personalized emotion elicitation process. By doing this, semantic enrichment facilitates the creation of personalized emotion elicitation sequences. Also, as a secondary effect, the semantic enrichment process can help identify patterns and insights into the relationships between semantics and emotions that may not be apparent from the original data alone.
In the presented KG dataset, the semantic enrichment was carefully carried out manually by a group of raters. This process strictly followed a specific methodology, which is detailed in Section 4.
2.1. Representation of Stimuli Semantics with WordNet Knowledge Graph
WordNet is an extensive lexical database of the English language, developed by the Cognitive Science Laboratory at Princeton University, and is a well-known tool for describing the meanings of words and their mutual relationships [19,20]. WordNet is structured as a graph, with each word or term represented as a node. The nodes are organized into so-called synsets (“sets of synonyms”), which represent groups of words with similar meanings, and the relationships between words are represented as edges connecting the nodes. This allows WordNet to capture relationships between words such as hypernyms (more general terms), hyponyms (more specific terms), and meronyms (terms denoting parts of a larger whole). Hypernym and hyponym relations are usually referred to as IS-A relations, and meronym relations as PART-OF. In this respect, WordNet is a useful knowledge source for the high-level description of picture content because: (1) it defines a very large and supervised labeling glossary, and (2) the labels are organized in a taxonomy as a knowledge graph [28].
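For readers who wish to explore these relations programmatically, the following minimal sketch uses NLTK's WordNet interface; this library is an assumption for illustration only, since the presented dataset stores synset identifiers and does not depend on any particular tool, and sense numbers may differ between WordNet versions.

```python
# Minimal sketch of querying WordNet's graph structure with NLTK (assumed library).
from nltk.corpus import wordnet as wn  # requires the WordNet corpus: nltk.download('wordnet')

man = wn.synset('man.n.01')                    # synset for "man, adult male"
print(man.definition())                        # gloss (textual definition) of the synset
print(man.hypernyms())                         # IS-A: more general synsets
print(man.hyponyms()[:5])                      # inverse IS-A: more specific synsets
print(wn.synset('face.n.01').part_holonyms())  # PART-OF: wholes that "face" is part of
```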
A knowledge graph is a structured representation of real-world entities and the relationships between them [28]. In other words, knowledge graphs are structured representations of knowledge that model entities, attributes, and relationships between them in a graph-like structure. They enable the integration and organization of heterogeneous data sources, including textual, visual, and audio data. In this respect, WordNet’s hierarchical structure of labeled concepts and its rich set of properties and relationships between concepts make it well suited to be used as a knowledge graph and to determine semantic similarities between different concepts in the graph [29].
In this approach, as illustrated in Figure 2, WordNet terms are used as tags or semantic annotations of affective pictures, and KG relations expand the descriptions.
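As a rough illustration of how IS-A relations could expand a picture's tag set for retrieval, the following sketch collects the hypernym ancestors of an assigned synset. It again assumes NLTK's WordNet interface, which is not part of the dataset, and the helper function is hypothetical.

```python
# Hypothetical tag expansion along IS-A (hypernym) relations using NLTK's WordNet.
from nltk.corpus import wordnet as wn

def expand_with_hypernyms(synset_name: str, max_depth: int = 3) -> set[str]:
    """Collect IS-A ancestors of a synset up to max_depth levels."""
    expanded, frontier = set(), [wn.synset(synset_name)]
    for _ in range(max_depth):
        frontier = [h for s in frontier for h in s.hypernyms()]
        expanded.update(s.name() for s in frontier)
    return expanded

# A picture tagged with 'dog.n.01' would also match broader queries such as
# 'canine.n.02' or 'animal.n.01' after this kind of expansion.
print(expand_with_hypernyms('dog.n.01'))
```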
Building on our earlier studies [30,31,32], we have found that using WordNet knowledge graphs to annotate visual data is an effective strategy for enhancing information retrieval in multimedia stimuli databases. In our previous research, we developed a model for describing and retrieving stimuli pictures using WordNet and demonstrated the benefits of this approach using a custom software tool developed for this purpose [30]. The results were encouraging: over N = 40 queries, the average precision was 68.93% and the average relevant document count was 6.15. The highest achieved precision was 84.21% for the first stimuli pictures retrieved in the results. However, the number of pictures in the experimental dataset labeled with WordNet terms was too small for a thorough evaluation of retrieval performance [31,32].
2.2. Description of Relevant High-Level Visual Stimuli Semantics Using SUMO
For the formal representation of the semantics of complex stimuli, the dataset uses the general knowledge ontology SUMO to go beyond WordNet [21]. SUMO (http://www.ontologyportal.org; accessed on 15 August 2023) is one of the most comprehensive freely available formal upper, core, and common-sense ontologies [21]. It was developed within the IEEE P1600.1 Standard Upper Ontology Working Group (SUO WG) and is today owned and maintained by the IEEE. Its large knowledge base contains over 25,000 terms and 80,000 axioms. The available mappings from SUMO to WordNet help express the concepts in natural language terms [33], which facilitates the extension of the framework to existing tools for the informal representation of multimedia (especially pictures) with semantic networks and lexical ontologies. In addition, SUMO is the only formal ontology mapped to the entire WordNet lexicon. Because of these advantageous features and its comparative advantages over other candidate upper ontologies in the formal representation of multimedia semantics, we selected SUMO for the development of the presented corpus. As an illustration of the high-level ontology annotations, the stimulus People_172_v is originally described with the keyword “man swinging”. In the presented dataset, this description is first expanded to three WordNet KG synsets, “{09225146} <noun.object> body of water#1, water#2”, “{10287213} <noun.person> man#1, adult male#1”, and “{04371774} <noun.artifact> swing#2”, and then mapped to the subsuming SUMO concepts “WaterArea”, “RecreationOrExerciseDevice”, and “Man”, as shown in Figure 3.
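Purely as an illustration, the enrichment of People_172_v described above could be modeled in memory as follows; the class and field names are hypothetical and are not part of the dataset format, which is tabular.

```python
# Hypothetical in-memory model of one enriched stimulus (illustrative names only).
from dataclasses import dataclass, field

@dataclass
class EnrichedStimulus:
    picture_id: str
    category: str
    description: str                                   # original NAPS free-text keyword
    wordnet: list[str] = field(default_factory=list)   # assigned WordNet synsets
    sumo: list[str] = field(default_factory=list)      # subsuming SUMO concepts

people_172_v = EnrichedStimulus(
    picture_id="People_172_v",
    category="People",
    description="man swinging",
    wordnet=["{09225146} body of water#1, water#2",
             "{10287213} man#1, adult male#1",
             "{04371774} swing#2"],
    sumo=["WaterArea", "RecreationOrExerciseDevice", "Man"],
)
```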
3. Data Description
The dataset for the semantic enrichment of picture descriptions in the NAPS stimuli database with KGs is represented in a structured tabular form. It is organized into rows and columns resembling a table, with each row describing one NAPS picture and each cell containing specific information for a corresponding attribute. The dataset comprises 30 comma-separated value (CSV) files and 15 Microsoft Excel (XLSX) files, for 45 files in total. The CSVs are more suitable for automated software processing and the Excel files for data examination and manual processing.
The first group of five CSVs, NAPS_WordNet_Animals.csv, NAPS_WordNet_Faces.csv, NAPS_WordNet_Landscapes.csv, NAPS_WordNet_Objects.csv, and NAPS_WordNet_People.csv, contains the WordNet KGs associated with the NAPS picture categories Animals, Faces, Landscapes, Objects, and People, respectively.
Each row has the mandatory attributes or columns ‘Picture_ID’, ‘Category’, and ‘Description’, which are identical to the attributes in the NAPS database. The attribute ‘Picture_ID’ is the most important, as it represents a unique identifier for each NAPS picture (e.g., Animals_001_h, Faces_001_h, Landscapes_001_h, Objects_001_h, People_001_h). As such, it may be used for querying and integrating the KG dataset and the NAPS database. The attribute ‘Category’ denotes one of the five NAPS categories, and ‘Description’ represents the original, single free-text keyword loosely describing the picture content (e.g., “dead stork”, “children with a dog”, “concentration camp”, “burning car”, “sad woman”). In the NAPS, only the ‘Description’ attribute is available for descriptions of semantics. In addition to these three mandatory attributes, each row in the presented dataset contains at least one column containing WordNet KGs describing the picture. These columns are labeled ‘WordNet_1’, ‘WordNet_2’, ‘WordNet_3’, ‘WordNet_4’, ‘WordNet_5’, ‘WordNet_6’, and ‘WordNet_7’.
The first group of five CSV files contains only WordNet synset IDs without any other descriptive information. These files are the most suitable for machine processing and database indexing. The first 10 rows of the NAPS_WordNet_Animals.csv datafile are provided in Table 1.
The second group of five CSV files has the suffix “_Complete”. Their filenames are NAPS_WordNet_Animals_Complete.csv, NAPS_WordNet_Faces_Complete.csv, NAPS_WordNet_Landscapes_Complete.csv, NAPS_WordNet_Objects_Complete.csv, and NAPS_WordNet_People_Complete.csv. These files have the same structure as the first group and also describe the NAPS stimuli with WordNet KGs. However, the CSV files in this group contain the entire descriptive content of the WordNet synsets, including their ID, term type, enumerated synonyms, and other information. As an example, the first three rows of NAPS_WordNet_Animals_Complete.csv are shown in Table 2.
The third group of five CSV files in the presented dataset contains ontology concepts describing NAPS stimuli using the formal vocabulary defined by SUMO. They are: NAPS_SUMO_Animals.csv, NAPS_SUMO_Faces.csv, NAPS_SUMO_Landscapes.csv, NAPS_SUMO_Objects.csv, and NAPS_SUMO_People.csv. Each row in these documents also describes the semantics of a single picture and has the same tabular structure as the WordNet CSV files. The first three columns are ‘Picture_ID’, ‘Category’, and ‘Description’, while the remaining seven columns are denoted ‘SUMO_1’, ‘SUMO_2’, ‘SUMO_3’, ‘SUMO_4’, ‘SUMO_5’, ‘SUMO_6’, and ‘SUMO_7’.
Table 3 shows a sample of the dataset in NAPS_SUMO_Animals.csv.
The structure of the 15 Excel (XLSX) files is identical to that of the already described CSV files. Each row and column (i.e., attribute) in the Excel files corresponds to a row and column in the CSV files, ensuring consistency between the two file types. This enables seamless data comparison and processing as well as consistent data usage and analysis.
The CSV files were created using Microsoft Excel, which uses the semicolon (;) character as the default column separator. To facilitate interoperability with all data processing tools, the dataset contains an additional 15 CSV files with the comma (,) as the separator. These additional CSV files are denoted with the suffix “_CommaDelimited” in their filenames.
Because of the tabular structure of the knowledge graph dataset, empty or null values in certain cells appear as consecutive separators when exported to CSV format (e.g., “;;”, “;;;”, “;;;;”, etc.). However, standard spreadsheet applications and text editors can handle such data.
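The following sketch shows one way to load and inspect a category file, assuming pandas; the filename is taken from the list above, and cells that were empty in the export simply become missing values.

```python
# Minimal loading sketch (assumes pandas); use the "_CommaDelimited" variants with sep=','.
import pandas as pd

animals = pd.read_csv("NAPS_WordNet_Animals.csv", sep=";")

# Mandatory columns plus up to seven WordNet annotation columns;
# empty cells from the export appear here as NaN.
print(animals.columns.tolist())

wn_cols = [c for c in animals.columns if c.startswith("WordNet_")]
labels_per_picture = animals[wn_cols].notna().sum(axis=1)
print(labels_per_picture.describe())   # how many KG labels each picture has
```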
It is important to mention that the presented dataset is licensed under “Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)”. Other parties are free to copy and redistribute the material in any medium or format and to remix, transform, and build upon the material for noncommercial purposes. If other parties remix, transform, or build upon the presented dataset, they must distribute the dataset under the same license as the original. Other parties must give appropriate credit, provide a link to this license, and indicate if changes were made.
3.1. Data Utilization
The presented KG dataset incorporates the WordNet and SUMO vocabularies, providing a more sophisticated representation of the real-world entities in the content of the NAPS affective pictures. This semantic annotation method is more structured and expressive than the traditional free-text keyword model. As a result, the process of querying, retrieving, and analyzing affective pictures becomes much more streamlined and efficient.
However, the dataset does not contain the affective pictures or emotion information themselves. To utilize the dataset for document retrieval, the NAPS repository must be requested for nonprofit academic research purposes from the Nencki Institute of Experimental Biology, Laboratory of Brain Imaging (LOBI), at https://lobi.nencki.gov.pl/research/8/ (accessed on 15 August 2023).
An example of querying would involve utilizing the attribute ‘Picture_ID’ from both the KG dataset and the NAPS database as the common link or key. When aiming for document retrieval from the NAPS database, one would begin by selecting a specific ‘Picture_ID’ from the KG dataset. This ‘Picture_ID’ would then be matched with the corresponding ‘Picture_ID’ in the NAPS database. By ensuring that both attributes match, one can perform a join operation to merge the relevant data from the two records. As a result of this join operation based on the common attribute ‘Picture_ID’, the user will retrieve comprehensive document details from the NAPS database enriched with the semantic information from the KG. This method facilitates precise and detailed document searching and utilizes the content of both datasets: high-level semantics from the KG dataset and pictures and emotions from the NAPS database.
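A minimal sketch of this join, assuming pandas and a hypothetical CSV export of the NAPS metadata table (the NAPS pictures and ratings themselves must be obtained from LOBI as described above), could look as follows.

```python
# Sketch of the join on 'Picture_ID' described above (assumes pandas;
# "NAPS_People_metadata.csv" is a hypothetical export of the NAPS table).
import pandas as pd

kg_wordnet = pd.read_csv("NAPS_WordNet_People.csv", sep=";")
kg_sumo = pd.read_csv("NAPS_SUMO_People.csv", sep=";")
naps = pd.read_csv("NAPS_People_metadata.csv", sep=";")

# 1:1 joins on the shared key 'Picture_ID'
enriched = (naps
            .merge(kg_wordnet, on="Picture_ID", how="left", suffixes=("", "_wn"))
            .merge(kg_sumo, on="Picture_ID", how="left", suffixes=("", "_sumo")))

# Example query: all pictures annotated with the SUMO concept 'Man'
sumo_cols = [c for c in enriched.columns if c.startswith("SUMO_")]
mask = enriched[sumo_cols].eq("Man").any(axis=1)
print(enriched.loc[mask, ["Picture_ID", "Description"]])
```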
The integration of the KG dataset, with separate WordNet and SUMO data tables, and the NAPS data table is illustrated in Figure 4. The KG WordNet dataset is represented on the left side of the figure as a table with the columns ‘Picture_ID’, ‘Category’, ‘Description’, and ‘WordNet_1’ through ‘WordNet_7’. Similarly, the KG SUMO dataset is represented on the right side of the figure. The NAPS database table in the middle consists of three columns: ‘Picture_ID’, ‘Category’, and ‘Description’. The attribute ‘Picture_ID’ serves as the primary key (PK) for all data tables and also as the foreign key for joining the KG WordNet table (FK1) and the KG SUMO table (FK2) with 1:1 relationship cardinality (i.e., a multiplicity relationship attribute).
Effectively, by using the presented dataset, the NAPS attribute ‘Description’ is semantically enriched with the 14 KG attributes ‘WordNet_1’–‘WordNet_7’ and ‘SUMO_1’–‘SUMO_7’.
3.2. Data Distribution
When analyzing the attributes of the generated dataset, it is important to examine the distribution of data points in the corpus. The presented dataset comprises 6808 systematically and manually assigned annotations, or labels, for the 1356 NAPS pictures in 5 categories. Of the 6808 labels, 3429 are WordNet concepts and 3379 are SUMO concepts. This glossary comprises 935 unique WordNet synsets and 513 unique SUMO concepts. Because of their higher level of abstraction, substantially fewer SUMO concepts are needed to semantically describe the pictures.
Figure 5 depicts the frequency distribution of the synsets and ontology concepts, highlighting the dataset’s most and least commonly utilized KG entities. This provides insight into the overall distribution of the KGs and their usage trends.
The 10 most used annotating synsets in the dataset, with their respective frequencies in brackets, are: “{10287213} <noun.person> man#1, adult male#1” (82); “{10787470} <noun.person> woman#1, adult female#1” (61); “{08436759} <noun.group> vegetation#1, flora#1, botany#1” (39); “{06878071} <noun.communication> smile#1, smiling#1, grin#1, grinning#1” (30); “{13104059} <noun.plant> tree#1” (30); “{03544360} <noun.artifact> house#1” (29); “{02084071} <noun.animal> dog#1, domestic dog#1, Canis familiaris#1” (27); “{05600637} <noun.body> face#1, human face#1” (24); “{09436708} <noun.object> sky#1” (22); and “{02121620} <noun.animal> cat#1, true cat#1” (21). Likewise, the 10 most frequent SUMO concepts are: “Man” (159), “Woman” (107), “HumanChild” (74), “Smiling” (69), “Device” (64), “WaterArea” (58), “Human” (55), “SubjectiveAssessmentAttribute” (55), “BotanicalTree” (43), and “Plant” (42).
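Frequency counts such as these can be reproduced from the dataset itself; the following sketch, assuming pandas and the semicolon-delimited files described in Section 3, tallies the SUMO concepts across all five categories.

```python
# One possible way to reproduce concept frequency counts from the SUMO CSV files
# (a sketch assuming pandas; exact figures depend on the files used).
import pandas as pd
from collections import Counter

categories = ["Animals", "Faces", "Landscapes", "Objects", "People"]
sumo_counts = Counter()

for cat in categories:
    df = pd.read_csv(f"NAPS_SUMO_{cat}.csv", sep=";")
    sumo_cols = [c for c in df.columns if c.startswith("SUMO_")]
    for col in sumo_cols:
        sumo_counts.update(df[col].dropna())

print(sumo_counts.most_common(10))   # e.g. [('Man', 159), ('Woman', 107), ...]
```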
Another important data feature is how many KGs are used to describe each NAPS picture. The distribution remains consistent across all the common descriptive statistical parameters and the five picture categories, as shown in Figure 6. This uniform distribution can be credited to the thorough adherence to the strict formal rules applied in the picture annotation process. The approach enables reliable and accurate labeling, contributing to the general consistency of the KG distribution throughout the dataset.
The numbers of WordNet synsets and SUMO concepts used to semantically describe each picture are very similar in the presented dataset. This can be attributed to the use of mappings, which effectively connect each synset to a related concept. However, it is important to note that the overall number of distinct ontology concepts is much lower than the total number of unique synsets. This is primarily because, within the ontology, multiple synsets have been allocated to identical or subsuming concepts. This mapping technique contributes to the simplification and consolidation of the semantic representation of pictures, resulting in a more compact and coherent ontology-based description.
5. Summary
The choice of relevant stimuli is frequently limited in the available emotionally annotated databases that store different types of stimuli. Much can and should be done to improve the functionality and interoperability of existing emotionally annotated databases. The semantic enrichment of multimedia databases is a crucial step toward enhancing their accessibility and usability. By incorporating higher-level semantic metadata, such as knowledge graphs with formal concepts and relationships, we can facilitate more efficient identification and retrieval of relevant content from large multimedia databases. This improves the user experience and supports a wide range of applications, such as multimedia retrieval, stimuli recommendation, and emotion elicitation personalization.
The dataset uses the WordNet knowledge graph, the SUMO upper ontology, and SUMO to WordNet mappings to provide rich, high-level semantic expressivity with interfaces to commonly used models and existing systems. The dataset improves the knowledge reuse, interoperability, and formalization of picture stimuli information over current methods for representing stimuli based on keywords or tags. All these features enable formal, consistent, and systematic annotation of affective multimedia content and document properties.
Future research should explore innovative approaches and tools for semantic enrichment, with a focus on addressing the challenges of scalability, semantic heterogeneity, and accuracy. In this regard, the knowledge graph dataset should be expanded to include other affective multimedia databases that researchers use most frequently in addition to NAPS. By transforming keywords to high-level concepts and mapping them to an upper core formal ontology, it will be possible to achieve semantic integration of different multimedia stimuli databases, i.e., to combine emotion-elicitation documents from various sources, formats, or systems to allow for meaningful interpretation and analysis. In our previous research, we created the first versions of such ontologies [34,35] and plan to expand their knowledge models further and use them in the continuation of our work.
The presented dataset could be used to explore the finer relationships between emotions and semantics in affective multimedia in general (e.g., the semantic gap), and especially those encountered in specific stimuli sequences for certain domains. Both could provide further insights into the affective data properties and move to a more overarching affective model that includes emotion, cognition, behavior, and action properties.
In addition, the presented dataset could be used as a foundation to develop novel data retrieval software tools using emotional and semantic descriptors, allowing for a more efficient construction of personalized emotion-elicitation sequences for therapeutic interventions, personalized education, and interactive entertainment. For example, these tools could be used by therapists to help patients with anxiety or depression find images that evoke positive emotions. They could also be used by educators to create personalized learning materials tailored to the cognitive and emotional needs of students. Novel tools could likewise create personalized gaming experiences by tailoring game content to the player’s individual preferences and emotional state. Such intelligent tools would also enable meaningful and accurate data analysis in the domains of pedagogy, education, psychology, neuroscience, and cognitive sciences.
Finally, recent advancements in artificial intelligence (AI), such as machine learning (ML), deep learning (DL), and natural language processing (NLP), have the potential to significantly impact the development of KGs for affective multimedia databases. For example, DL with NLP techniques can be used to automatically detect objects in images or videos and extract semantic information from text descriptions. This information can then be used to create new semantic descriptors for affective multimedia or to improve the accuracy of existing descriptors. ML techniques can also be used to learn the relationships between semantic descriptors and emotional responses. This information can then be used to develop additional semantic and emotion descriptor datasets or more effective data retrieval tools for personalized stimuli sequences.