1. Introduction
Taste elicits robust hedonic responses, encompassing sensations of pleasure or displeasure, alongside distinct qualities such as sweetness, sourness, saltiness, bitterness, and umami. It is widely acknowledged that the hedonic aspects of taste trigger emotional responses, ultimately influencing food consumption behavior, consumer preferences, and the acceptance of food products [1]. While the qualitative attributes of taste are assessed primarily through explicit means, i.e., subjective ratings of perceived taste quality, rather than implicit measurements [2], the hedonic dimension of taste can be assessed both implicitly, through physiological reactions, and explicitly, through subjective measures such as self-reported overall liking on a nine-point hedonic scale. Hence, combining implicit and explicit measures is crucial for the hedonic evaluation of taste [3].
Implicit evaluation methods are diverse [4], encompassing heart rate variability [5,6,7], skin temperature [8], skin blood flow [9,10], skin conductance response [11], facial expressions [3,6,12,13], and brain activity [14,15,16,17]. Among these, facial expression analysis stands out as a relatively straightforward and practical approach, as it can draw on readily available automatic classifiers for facial affect recognition [18]. In taste research, Steiner [12] was the first to show that facial expressions occur reflexively in response to basic tastes and can be categorized by hedonic tone, i.e., whether they convey pleasantness or unpleasantness, rather than by expressions specific to each taste quality. Research on taste and facial expressions has since advanced considerably.
Numerous studies have examined the facial expressions of individuals while consuming various foods and beverages, extending beyond basic taste solutions [5,6,19,20,21,22,23,24,25,26]. Recent research has emphasized predictive modeling based on these analyses, encompassing the prediction of consumer preferences [27], consumer acceptance of food products [28,29], and even choices in beer selection [25]. However, to date, no studies have sought to predict explicit hedonic ratings by analyzing implicit facial expressions. To address this gap, our research group embarked in 2021 on a pioneering investigation into predicting the deliciousness of food and beverages through facial expression analysis [30]. In that study of 10 female students (21–22 years old), we quantified expressions of neutrality, happiness, sadness, surprise, fear, disgust, and anger for solutions of each of the five basic tastes (sweetness, saltiness, sourness, bitterness, and umami) and used these emotion scores as predictors (independent variables), with participants’ perceived hedonic ratings, obtained through explicit sensory assessments for each taste, as the dependent variable. Employing multiple regression analysis, we established a regression equation for predicting hedonic ratings. We then validated the model by applying facial expression analysis results from different participants (six females and six males; age range, 22–59 years) who consumed a range of taste solutions, including commercially available beverages. Notably, when the emotion scores of facial expressions from these male and female participants (five in their twenties, two in their thirties, three in their forties, and two in their fifties) were entered into the prediction formula derived from the 21–22-year-old female participants, the predicted values closely matched the perceived ratings for each individual, affirming facial expression analysis as a valuable and objective method for evaluating the hedonic aspects of taste perception.
However, this study identified three limitations: (1) a small sample size; (2) limited generalizability of the AI application used for analysis, which was locally available but not globally accessible; and (3) reliance on a single facial expression image selected by the experimenter (one-shot single image). Therefore, the primary goals of the present study were to address these limitations: (1) to increase the sample size to at least double that of the previous study; (2) to employ widely used facial expression analysis software, FaceReader, with broader accessibility; and (3) to explore the utility of various analysis methods, including averaging facial expression values over specific timeframes (in addition to one-shot analysis). The overarching objective was to reaffirm, with these enhancements, the predictability of food and beverage deliciousness or unpleasantness based on facial expressions.
Foods and beverages contain a myriad of chemical compounds whose individual tastes intricately intermingle, rendering qualitative analysis a formidable challenge. Nonetheless, hedonic judgments of whether something is delicious or not can be made easily and are readily manifested in facial expressions. By leveraging our methodology, which forecasts the degree of deliciousness from facial expressions, we therefore anticipate obtaining quantitative outcomes not only for sensory scientific inquiries at the laboratory level but also for evaluating product preferences, comparing preferences across different products, and expediting preference surveys for new food products.
2. Materials and Methods
2.1. Participants
In the previous report, a total of 22 participants took part, whereas in this study we recruited 49 participants from Kio University, including students and staff members. Based on responses to a pre-experiment questionnaire, we confirmed that no participant had sensory abnormalities, eating disorders, or mental disorders, or was taking any medication that could affect the sense of taste. We instructed all participants not to eat or drink anything for one hour prior to the start of the experiment. We provided a detailed explanation of the purpose of the experiment, safety measures, and the protection of personal information. After obtaining their understanding and consent, we collected written informed consent from each participant. This study received approval from the Kio University ethics committee (No. R2-31), and all experiments were conducted in adherence to the principles outlined in the Declaration of Helsinki.
2.2. Experiment 1
The experimental procedure and taste solutions were essentially the same as those of our previous study [30]. A group of 29 healthy female volunteers (age range, 18–55 years; mean ± SD, 23.1 ± 7.9) participated in an experiment aimed at assessing the effectiveness of AI in analyzing facial expressions and establishing a formula for predicting hedonic ratings. Taste solutions were administered to 16 of the 29 participants and included ten different concentrations of the five conventional basic tastes: 2.5%, 5%, 10%, and 20% sucrose; 0.5% and 2% monosodium glutamate (MSG); 1% citric acid; 1% and 5% sodium chloride (NaCl); and 0.01% quinine hydrochloride (QHCl). The remaining 13 participants received taste solutions of 5% glucose, 0.3% sodium guanylate, and 3% NaCl. Each solution was prepared with distilled water (DW). A 10 mL aliquot of the taste solution was placed in a small paper cup positioned in front of the seated participant. Participants were instructed to sip the 10 mL of liquid, hold it in their mouths for approximately 1 s, and then swallow. They were encouraged to display facial expressions naturally, without deliberate intent, and to provide brief remarks on the quality and/or palatability of the stimulus after recognition. Following the consumption of each solution, participants rinsed their mouths with DW. The task was repeated with a minimum inter-stimulus interval of 2 min. The order of stimulus presentation was randomized, except for QHCl, which was administered last owing to its lingering taste. Additionally, before commencing the next tasting session, participants assessed the overall hedonic rating of the stimulus on a scale ranging from −5 (extremely unpleasant) to +5 (extremely pleasant), with 0 indicating a neutral response. Note that in our previous paper [30], we employed a scale ranging from −10 to +10.
One researcher was positioned close to the participant and signaled the start of the drinking task while simultaneously recording a video. The video was captured with a digital camera (Cyber-shot DSC-WX350; Sony Corp., Tokyo, Japan) placed 2 m in front of the participant. The participant was instructed to maintain direct eye contact with the camera to obtain a frontal face view. Adequate and uniform white lighting was employed to ensure optimal recording conditions.
After the experiment, the recorded video was analyzed using the AI application FaceReader (ver. 8.1; Noldus Information Technology, Wageningen, The Netherlands). The most significant difference between this study and the previous one [30] is the use of FaceReader, whereas a different AI application was employed in the previous report. FaceReader processes facial expressions frame-by-frame at 30 Hz and classifies them into seven emotions (neutral, happy, sad, angry, surprised, scared, and disgusted), with scores ranging from 0 (no visible emotion) to 1 (emotion fully present). We analyzed the scores with four different methods: (1) a single facial expression image chosen by the experimenter as the most applicable to the taste stimulation (one-shot image); (2) the average emotion scores over 2 s centered on the one-shot image (one-shot ± 1 s, or 2 s image); (3) the average emotion scores over 4 s (one-shot ± 2 s, or 4 s image); and (4) the average emotion scores over 6 s (one-shot ± 3 s, or 6 s image).
Any part overlapping with a participant’s brief remark was excluded from the analysis, and the mean value was calculated from the remaining analysis time.
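As a hypothetical illustration of windowed-average methods (2)–(4) above — assuming FaceReader’s 30 Hz frame rate and a per-frame array of seven emotion scores; all function and variable names here are our own, not part of the FaceReader API — the averaging with remark exclusion could be sketched as:

```python
import numpy as np

FPS = 30  # FaceReader outputs emotion scores frame-by-frame at 30 Hz


def windowed_scores(scores, oneshot_idx, half_window_s, exclude=None):
    """Average emotion scores in a window centered on the one-shot frame.

    scores        : (n_frames, 7) array of emotion scores in [0, 1]
    oneshot_idx   : frame index of the experimenter-chosen one-shot image
    half_window_s : 1, 2, or 3 -> the 2 s, 4 s, or 6 s windows used here
    exclude       : optional boolean mask of frames overlapping a remark
    """
    half = half_window_s * FPS
    lo = max(0, oneshot_idx - half)
    hi = min(len(scores), oneshot_idx + half + 1)
    window = scores[lo:hi]
    if exclude is not None:
        # Drop frames that coincide with the participant's remark
        window = window[~exclude[lo:hi]]
    return window.mean(axis=0)
```

The one-shot method (1) is simply the single row `scores[oneshot_idx]`.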
In the subsequent phase, we conducted a multiple linear regression analysis to predict hedonic ratings from the seven emotions. The calculation was based on the scores of the seven emotions collected from the 29 participants for 13 stimuli, which served as the independent (predictor) variables; the dependent variable was the participants’ self-reported hedonic rating for each stimulus. Through this multiple regression analysis, we derived a regression equation for predicting hedonic ratings.
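The fitting and prediction steps can be sketched as follows. This is a minimal illustration under our own naming, not the authors’ actual code: ordinary least squares with an intercept over the seven emotion scores, yielding one weight per emotion.

```python
import numpy as np

EMOTIONS = ["neutral", "happy", "sad", "angry", "surprised", "scared", "disgusted"]


def fit_hedonic_model(emotion_scores, hedonic_ratings):
    """Fit rating ~ b0 + b1*neutral + ... + b7*disgusted by least squares.

    emotion_scores  : (n_obs, 7) array, one row per participant x stimulus
    hedonic_ratings : (n_obs,) array of ratings on the -5..+5 scale
    """
    X = np.column_stack([np.ones(len(emotion_scores)), emotion_scores])
    coefs, *_ = np.linalg.lstsq(X, hedonic_ratings, rcond=None)
    return coefs  # intercept followed by one weight per emotion


def predict_hedonic(coefs, emotion_scores):
    """Apply the fitted equation to new emotion scores (the Experiment 2 step)."""
    X = np.column_stack([np.ones(len(emotion_scores)), emotion_scores])
    return X @ coefs
```

`predict_hedonic` corresponds to entering a new participant’s emotion scores into the derived regression equation.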
2.3. Experiment 2
Another randomly chosen group of 20 healthy volunteers (19 females and 1 male; age range, 20–50 years; mean ± SD, 22.9 ± 6.3) took part in a second experiment to assess and validate the effectiveness of the formulae derived in Experiment 1 for predicting hedonic ratings. None of the participants in Experiment 2 had been involved in Experiment 1. The taste stimuli consisted of the following 11 liquids: natural mineral water (ILOHAS, Coca-Cola Bottlers Japan Inc., Tokyo, Japan), 1% malic acid, 2% monopotassium glutamate (MPG), 0.003% sucrose octaacetate (SOA), 7% calorie-free sweetener (Palsweet, Ajinomoto Co. Inc., Tokyo, Japan), peach juice (Peach Mix 100%, Dole Japan, Inc., Tokyo, Japan), noodle broth (Mentsuyu, Daitoku Food Co., Ltd., Nara, Japan), vegetable juice (Thick Vegetable Juice, Kagome Co. Ltd., Tokyo, Japan), 2.5% salt (Hakata-no-Shio, Hakata Salt Co., Ltd., Ehime, Japan), flat lemon juice (Shikwasa juice, Okinawa Aloe Co. Ltd., Okinawa, Japan), and catechin green tea (Healthya Green Tea, Kao Corp., Tokyo, Japan). These stimuli differed from those used in our previous study [30], except for SOA. The taste stimuli were given to the participants in random order, but SOA was given last.
Liquid intake, video recording, FaceReader analysis, and the rating of perceived hedonics followed the same procedures as those utilized in Experiment 1. The FaceReader outputs, representing emotional facial expressions in response to these stimuli, were incorporated into the respective emotion categories within the equations established in Experiment 1, resulting in predicted (or calculated) hedonic ratings. Subsequently, predicted and perceived hedonic ratings were assessed and compared.
To summarize the methodology: in Experiment 1, we collected both perceived hedonic ratings and emotional analyses of facial expressions for basic tastes, and derived predictive formulae for hedonic ratings by multiple regression analysis. In Experiment 2, a different set of participants was given different taste solutions, and we compared the perceived hedonic ratings with the ratings calculated by entering the emotion scores of their facial expressions into the formulae. The experimental method was the same as in the previous report [30], but the number of participants, the AI application used, and the analysis method differed.
2.4. Data Analysis
In Experiment 1, a boxplot analysis was conducted to assess the scores for the seven emotions associated with each of the 10 stimuli across 16 participants. The analysis yielded median values, interquartile ranges, and minimum and maximum scores. To investigate the similarity in hedonics among taste stimuli, Spearman’s correlation coefficients were computed between pairs of stimuli based on the scores of the seven emotions. To explore the relationships between these variables further, a multiple linear regression analysis was carried out, in which the scores for the seven emotions, derived from the responses of the 29 participants to 13 stimuli, served as the independent (predictor) variables, and the participants’ perceived hedonic rating for each stimulus as the dependent variable. To check for multicollinearity, which arises when predictors exhibit high correlations, correlation coefficients were computed among pairs of the seven emotions. In Experiment 2, the relationships between predicted and perceived hedonic ratings were explored and compared using Pearson’s and Spearman’s correlation coefficients, along with a one-way ANOVA and the Wilcoxon signed-rank test. Before the correlation analyses, data were assessed for normality using the Shapiro–Wilk test. All statistical analyses were performed using IBM SPSS Statistics (ver. 25) and Excel Statistics 2012, with statistical significance set at p < 0.05.
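The two correlation measures used above differ only in that Spearman’s rho is Pearson’s r computed on ranks. The study used SPSS for these analyses; purely as an illustration (function names our own), the two coefficients could be computed from scratch as:

```python
import numpy as np


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r on ranks (average ranks assigned to ties)."""
    def ranks(v):
        order = np.argsort(v)
        r = np.empty(len(v))
        r[order] = np.arange(1, len(v) + 1)
        for val in np.unique(v):  # average ranks over tied values
            mask = v == val
            r[mask] = r[mask].mean()
        return r

    return pearson_r(ranks(np.asarray(x, float)), ranks(np.asarray(y, float)))
```

Spearman’s rho is the appropriate choice when, as noted above, the Shapiro–Wilk test indicates non-normal data, since it depends only on rank order.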
4. Discussion
The present study was designed to confirm the validity of our previous findings [30], which indicated that the analysis of facial expressions in response to tastants can predict the hedonic ratings of those tastants. Using one-shot images captured through different AI applications and presenting different taste stimuli to different participants, we obtained consistent results: (1) based on AI analysis of facial expressions, the five basic tastes could be classified into three hedonic categories: positive, neutral, or negative; (2) we established a formula for predicting hedonic ratings using multiple linear regression analysis of emotional facial expressions in response to basic taste stimuli; and (3) by entering emotion scores of facial expressions in response to different tastants from different participants into this formula, we found a strong correlation and concordance between the predicted (calculated) and perceived (subjective) hedonic ratings. Although interpretation warrants caution because the sample size is still relatively small, these results suggest that a single image of a person’s face can quantitatively predict the extent to which that person enjoys food and beverages.
The function of taste lies in discriminating whether a food item is beneficial or detrimental to the body. Innately, the body is physiologically disposed to find beneficial items appealing, enhancing the appetite for them, while finding detrimental items unappetizing and avoiding their intake. Food-related behavior is therefore determined by hedonics, i.e., whether the food is considered delicious or not, and the emotions associated with it. The finding that FaceReader classified the basic tastes into three hedonic categories (positive, neutral, and negative) was similarly observed in our previous study [30] using a different AI application, and Steiner [12] described such a hedonic classification based on facial expressions 50 years ago. The analysis of taste quality information is also essential as a cognitive function of the brain: it involves storing information about the characteristics of the food, along with its aroma, visual attributes, texture, and more, for use in subsequent eating behavior. This sensory function is necessary for adapting future dietary choices.
For this study, we utilized FaceReader, a widely used, convenient, and accurate automated facial expression recognition system. FaceReader classifies facial expressions into the basic universal human emotions suggested by Ekman and Friesen [31]: happiness, sadness, anger, surprise, fear, disgust, and neutrality, with the intensity of each emotion ranging from 0 to 1. Although this software is not fully accurate in its emotion recognition performance [6,18], analyses of these emotions have been employed effectively in various experimental situations in food research [5,19,20,21,22,27,32]. In our previous study [30], we used a different facial expression analysis software. Comparing the two AI applications is challenging because the AI used in the previous study, unlike FaceReader, is essentially a free smartphone app, and detailed information about its algorithm and functionality is not available. Moreover, its functionality is extremely simple: it returns emotional analysis results when a single facial photo (one-shot) is uploaded. Nevertheless, the facial expression analysis results obtained were remarkably similar (compare Figure 1 and Figure 2 of this study with Figure 1 and Figure 2 of the previous report [30]). While detailed analysis is certainly necessary, it seems that the choice of AI software may not need to be overly meticulous.
In our previous study [30], the AI application displayed sadness exclusively, rather than disgust, for facial expressions induced by aversive taste stimuli such as 5% NaCl, 0.01% citric acid, and 0.01% QHCl, and only “happiness” was a significant predictor of hedonically positive ratings. We posited that these results might depend on the AI application used, since facial emotions and scores would be classified differently, with different accuracies, by different algorithms [18]. However, as shown in the present study, essentially the same results were obtained with FaceReader. The dominant appearance of happiness and sadness in these results may be related to a recent emotion recognition study by Wang et al. [33], who reported that happiness and sadness are unique and independent among the emotions.
FaceReader can provide time course data showing changes in each emotion after tasting. A challenging aspect of facial expression analysis is determining the appropriate time window for analysis, and approaches vary among researchers. In line with our previous study [30], we selected moments during video observation at which we judged that facial expressions had changed and ran the FaceReader analysis at those “one-shot” moments. However, we cannot be entirely certain that the chosen moments were the best ones. Therefore, we also calculated the average emotion values within a 2 s window (1 s before and 1 s after the one-shot moment), as well as within 4 s (±2 s) and 6 s (±3 s) windows. Satisfactory predictive results were obtained with the 2 s window, but predictive accuracy decreased as the window widened, because emotion scores fluctuate over time. That a 2 s window is acceptable implies that there is no need to be overly meticulous in selecting one-shot moments.
Although we generally obtained good correlation and concordance between the predicted (calculated) and perceived (subjective) hedonic ratings, a significant difference was detected between the predicted and perceived ratings for the very aversive SOA and the very palatable peach juice. This difference may reflect the tendency of the predicted ratings not to reach the extremes of the hedonic scale (−5 to +5); the same phenomenon was detected in our previous study [30], where we used a scale from −10 to +10. To address this issue, the following compensation may be effective: if the estimated rating for a tastant whose perceived rating is larger than 4.0 or smaller than −4.0 is multiplied by 1.6, the compensated ratings become very close to the perceived ratings, as demonstrated in the present study (see Figure 4C).
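The compensation described above amounts to a one-line scaling rule. As a sketch (the function name and signature are our own; the 4.0 threshold and 1.6 factor come from the text):

```python
def compensate(predicted, perceived, threshold=4.0, factor=1.6):
    """Scale the raw predicted rating when the perceived rating is extreme.

    On the -5..+5 scale, predicted ratings tend to undershoot at the
    extremes; multiplying by 1.6 when |perceived| > 4.0 compensates.
    """
    return predicted * factor if abs(perceived) > threshold else predicted
```

For example, a raw prediction of −2.8 for a stimulus perceived at −4.5 would be compensated to −4.48, much closer to the perceived value.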
We asked each participant to make a short remark after the oral intake of each tastant. An interesting new finding was that one-shot images for aversive tastants tended to appear before the remarks, whereas those for palatable tastants tended to appear after the remarks; these differences were statistically significant. This finding agrees with other studies showing that aversive facial expressions appear more intensely and quickly than pleasant expressions [34,35,36], reflecting that aversive tastes convey warning messages in the form of discomfort, harm, and urgency. These rapid reactions are believed to be mediated by reflex circuits in the brainstem [12]. Good tastes are pleasant, palatable, and nutritive; we enjoy them slowly, comfortably, and with emotional fulfillment. Reflex reactions to palatable tastants in the brainstem are more likely to trigger responses in the autonomic nervous and hormonal systems than in the motor system.
However, the bitter-tasting SOA elicited one-shot images about evenly before and after the remarks, despite being the most aversive stimulus. This may be explained by the fact that bitter stimuli activate taste cells in the foliate and circumvallate papillae at the back of the tongue more effectively than those in the anterior tongue [37,38]; about half of the participants felt a stronger bitter taste after their remarks, by which time the bitter substance had reached the posterior tongue.
There are some limitations to this study. (1) The prediction is not successful for individuals who show no, or very small, expressions of happiness toward very palatable tastants (e.g., peach juice and noodle broth), even though they can express aversive emotions toward unpalatable tastants (e.g., malic acid and SOA). In the present study, 4 of the 20 participants belonged to this category (see Figure 4A). On this point, Zeinstra et al. [39] reported that in school-aged children, facial expressions were suitable for measuring dislike but not responses to pleasant stimuli. Other facial expressive measurements, such as analyzing facial muscle movements, might overcome this shortcoming [4]. (2) For extremely delicious or extremely unpleasant beverages, adjusting the calculated values brought them closer to the participants’ sensory evaluations. Since such compensation is not ideal, investigating AI software that does not require compensation is a potential theme for future research. (3) The participants in this study were predominantly women in their twenties, which raises the question of whether factors such as gender, age, and ethnicity are reflected in the predictive formula for hedonic ratings derived from facial expressions. In a previous report [30], a single predictive formula obtained from young women was suggested to produce hedonic ratings that closely matched the actual values regardless of gender or age. Therefore, in future research, we would like to confirm these aspects by increasing the number of participants. Additionally, we are interested in exploring what results may arise if the AI used to derive the formula differs from the AI used during testing.
The present study has confirmed the validity of our previous findings, showing that hedonic ratings can be effectively predicted by a formula derived from multiple regression analysis of facial expressions obtained using AI software. Our method of predicting hedonic ratings from facial expression emotion analysis initially requires the derivation of equations. However, in future research, if standardized formulae for each AI application are established, the process of deriving formulae may be eliminated in practical situations. By incorporating these standardized formulae into AI systems, users would only need to specify the analysis point (one-shot) during food and beverage consumption, and the hedonic rating would be immediately displayed. This convenience and speed would enhance work efficiency and enable the retrieval of consumers’ hedonic ratings without relying on traditional subjective evaluation methods like analog scales. In product development and consumer preference surveys, this approach would allow for the use of a wide range of consumers without the need for specialized panelists, and hedonic ratings (overall deliciousness) could be obtained rapidly and conveniently. To achieve this, it is necessary to expedite research for standardizing versatile formulae.