1. Introduction
In the digital era, human–computer interaction (HCI) interfaces serve as the bridge for communication between humans and machines. As the design of HCI interfaces becomes increasingly complex and personalized, the demand for aesthetic evaluation is growing. Aesthetics is one of the most crucial elements of interface design [1], with demonstrated effects on many aspects of interaction, including perceived usability [2], interaction efficiency [3], user satisfaction [4], and usage intentions [5]. As interface designs become increasingly homogeneous, exceptional aesthetic appeal gains importance [6]: it enhances sustainable perceived value, breaks the homogeneity, and provides a competitive edge [7]. Given the significance of interface aesthetics in HCI, thorough evaluation is essential [2]. Aesthetic evaluation of interface layouts, particularly from a cognitive aesthetic perspective, has become a key approach to enhancing design quality. By analyzing aesthetic metrics such as density, symmetry, and balance, this evaluation reveals the intrinsic value and effectiveness of a design. Such assessments not only help designers strengthen the visual impact of interface layouts but also give users a more enjoyable and efficient interaction experience.
However, despite its significance, the evaluation phase is often overlooked in HCI interface design and application, and validated scales for measuring HCI aesthetics remain scarce [4]. In the design process in particular, traditional methods typically follow a linear approach in which evaluation comes at the end: after the transition from concept design to front-end implementation, or even after the interface has been launched. This linear process causes significant problems when evaluation outcomes are unsatisfactory, because the resulting reassessment and potential redesign waste time, effort, and resources. The inflexibility of this linear approach does not arise because designers undervalue interface evaluation; it stems from two main challenges in the evaluation process. First, interface evaluation is costly and complex. Aesthetic assessments often require experimental studies to gather user feedback, which in turn require participant recruitment. Evaluating website aesthetics through user ratings is resource-intensive [8]: the design, implementation, and data analysis demand considerable expertise and incur significant time and financial costs, increasing the complexity and delay of evaluations. Alternatively, with advances in artificial intelligence and machine learning, automated tools and algorithms for aesthetic evaluation are emerging. However, these methods require extensive data to train evaluation models, including direct user feedback on interface aesthetics as well as user behavior and interface usage data. Studies indicate that datasets need to contain over 16,000 webpage screenshots for deep learning models to be effective [9]. Collecting, processing, and analyzing such data requires substantial resources, including advanced computing capabilities and data storage, which poses a challenge for resource-limited research teams or companies.
Second, existing evaluation methods often involve multiple metrics, making it difficult to judge the merits of interface layout solutions intuitively. During rapid design iteration, designers need to compare subtle differences between design proposals and require a comprehensive index for overall decision making. Although computational methods based on multiple aesthetic dimensions have been proposed to help designers quantify interface aesthetics, these models have become exceedingly complex, leading to redundancy and overlap among some metrics [10]. Currently, a comprehensive index is often obtained through the Analytic Hierarchy Process (AHP), in which the weights of the metrics are determined through questionnaires or interviews [11]. Therefore, devising a more scientifically sound method to integrate multiple aesthetic metrics into a comprehensive evaluation indicator remains a significant challenge for interface layout evaluation.
In response to these challenges, this research simplifies and adjusts the 14 metrics proposed by Ngo [12] into seven aesthetic metrics: density, symmetry, balance, proportionality, uniformity, simplicity, and sequence. These metrics are integrated into a comprehensive evaluation indicator using multiple regression and the entropy method, and the validity of both fitting methods is verified. Using automatic segmentation and recognition of graphical user interface (GUI) screenshots, this research rapidly and automatically obtains the values of the seven metrics and of the comprehensive indicator. On this basis, a prototype system for interface layout aesthetic evaluation was designed. We aim to reduce the time, manpower, and resources required for interface evaluation; to enhance the universality, compatibility, and flexibility of layout assessment; and to allow evaluation to be integrated at any design stage, supporting lightweight rapid evaluation and iterative design cycles.
The remainder of the paper is structured as follows:
Section 2 elaborates on related research in interface layout aesthetic evaluation, further illustrating the challenges faced by the study.
Section 3 explains the setup, calculation, and recognition outcomes of the seven aesthetic metrics.
Section 4 and Section 5 describe how the seven aesthetic metrics are integrated into a comprehensive evaluation indicator using multiple regression and the entropy method, respectively.
Section 6 validates the effectiveness of the comprehensive evaluation indicator obtained through both methods via verification experiments.
Section 7 discusses the interface layout aesthetic evaluation software based on the methodology proposed in this study.
Section 8 and Section 9 discuss and summarize the findings of this research.
2. Related Works
Contemporary research and users often underestimate the immediate and profound importance of aesthetics in HCI interfaces [13]. Although aesthetics is a non-instrumental quality, it plays a crucial role: it can be perceived at a glance and instantly determines whether a user is attracted to the system. Interface aesthetics can produce a hedonic halo effect that influences the usability ratings of the interface [13]. Taking website interfaces as an example, in the absence of specific information, users must decide whether to continue interacting or to seek alternatives. At this juncture, the immediate evaluation of interface aesthetics plays a key role in the decision to stay or leave, and it strongly influences the usability and credibility of the interface [14]. For instance, in a fashion shopping scenario, elements such as the recommendation area and product highlights had a significant positive impact on customer attention because the interface design adhered to aesthetic rules [15]. Soui et al. investigated eight aesthetic flaws across 56 versions of five Android applications and found that, despite significant code improvements, some severe aesthetic flaws persisted, potentially necessitating additional maintenance effort [16].
Given the significance of aesthetics in HCI, effective and reliable tools are needed to assess it. This study summarizes common interface layout aesthetic evaluation metrics found in recent research. Ngo’s model is a classical approach that includes 14 aesthetic evaluation metrics and treats the interactions among the selected features as linear and all features as equally important [12]. Maity and Bhattacharya treated text, images, and whitespace as the main elements of their interface aesthetic computation model [17]. Wang et al. proposed an interpretable GUI design aesthetic index that integrates visual aesthetics (visual similarity and spatial proximity) and GUI structure (semantic similarity and whitespace) to simulate the distribution of visual grouping [18]. Chen et al. studied the impact of shape, contrast, and visual force on the visual weight of interface elements, providing empirical evidence for optimizing the balance calculation model [19]. Liu et al. evaluated the relationship between emotions and eight key elements (image–text ratio, color difference, color distribution, color harmony, thematic style, whitespace ratio, frame style, and number of colors) [20]. Deng explored how curvature and proportion relate to emotional preferences on five pairs of emotional indicators for interfaces (safety–danger, formal–lively, masculine–feminine, cold–warm, and soft–hard) [21]. Chen and Zhang selected four indicators from existing aesthetic metrics (balance, equilibrium, cohesion, and density) to evaluate laptop keyboard layouts [22]. Hynek and Hruška analyzed the applicability of selected object-based metrics to dashboard quality assessment and their ability to differentiate well-designed samples; focusing on users’ subjective perceptions, they constructed a model that rates and classifies object-based metrics by their ability to objectively distinguish well-designed dashboards [23]. In webpage evaluation, VisAWI is a common model that includes four metrics: simplicity, diversity, colorfulness, and craftsmanship [2,4,7].
With the rapid increase in demand for aesthetic evaluation, many tools for assessing interface aesthetics have emerged. Zen and Vanderdonckt built QUESTIM, which includes a simplified aesthetic model and computes GUI metrics through a web service tool [24]. Oulasvirta and colleagues developed Aalto Interface Metrics (AIM), which aggregates multiple models and metrics in an online service where users can submit the URL of a GUI design for evaluation [25]. Bessghaier et al. proposed an automated, data-model-based method for restructuring user interface designs and used the ADDET tool to assess the original and restructured versions of 511 user interfaces in terms of the number of aesthetic flaws and their aesthetic properties. The results indicate significant positive differences between the restructured and original interfaces across seven quality indicators, with an average value of 0.59 [26]. Samele and Burny developed OctoDollop, which can assess graphical user interfaces instantly and seamlessly from a limited number of samples without removing them from their usage context [27].
Although various tools are available to measure interface aesthetics, only a few have been shown to be effective and to assess actual aesthetics accurately. Reviewing a decade of research on user interface visual aesthetics, Lima and Gresse von Wangenheim contend that few methods have been comprehensively evaluated for reliability and validity and that, given the importance of visual aesthetics as part of software quality, further research is warranted [28]. Other measurement methods have been used in aesthetic assessments, but their lack of standardization, vague psychometric properties, and limited empirical support prevent them from serving as reliable tools [1]. Existing models for predicting aesthetics are limited in performance and capability [8]. Current research on HCI interface layouts focuses predominantly on ergonomic analysis, with insufficient study of layout aesthetics and the computation of aesthetic degree. The application of aesthetics in interface design is still in its infancy and lacks clear aesthetic standards to guide design [3]. Moreover, most studies concentrate on functional features, with considerably less attention to aesthetic design features [5], and research on the reliability and validity of proportion types as a unidimensional structure in visual aesthetics is lacking [29]. Additionally, evaluation indicators are overly abstract, the objective rationality of the evaluation process needs improvement, and evaluation results need to provide better feedback and guidance for HCI design [30].
Other studies apply machine learning to identify aesthetics-related features and build models that predict aesthetics. For instance, Soui and Haddad combined the DenseNet201 architecture with a K-Nearest Neighbor (KNN) classifier to evaluate mobile user interfaces and tested this approach on a large public dataset, achieving an average accuracy of 93% [31]. Such methods can capture rich or complex aesthetic perceptions and deliver excellent results. However, they often suffer from limited interpretability, generalizability, and flexibility. For example, machine-learned aesthetic perceptions can improve evaluation accuracy but do not intuitively tell web designers how to lay out web objects [32]. Moreover, such models typically target specific cases, such as interfaces of particular categories, and require complete retraining if modifications are needed [33].
In terms of specific applications, although previous research has provided evidence that website aesthetic design features affect user responses, the underlying mechanisms remain relatively unexplored [5]. For a complete webpage, design and layout often focus on specific visual areas and overlook the layout position of each element [32]. Furthermore, existing mobile marketing recommendation methods have not studied user data and the layout features of mobile marketing recommendation interfaces in depth, and thus fail to use users’ aesthetic preference information to improve the quality of mobile marketing recommendations. Research that uses the layout information of mobile marketing interfaces and users’ aesthetic preferences to guide layout from an aesthetic perspective remains insufficient [34].
3. Acquisition of Interface Elements and Determination of Metrics
In this study, OpenCV is used to acquire the positional information of interface elements automatically, and seven aesthetic evaluation metrics and their quantification methods are then determined on the basis of Ngo’s research. Finally, automatic acquisition and quantitative calculation are demonstrated on two example interfaces.
3.1. Extraction of Interface Element Position Information
Traditional methods of interface aesthetic evaluation generally require the edges of elements to be marked by manual dragging. This is inefficient when an interface has many elements or when multiple interfaces must be compared. In our previous research [35], we made preliminary attempts at edge detection for graphical user interfaces. In this study, OpenCV is further employed to automatically obtain the contour dimensions and positions of design elements in interface screenshots.
3.1.1. Preprocessing
Because this study does not consider the interface’s color design, each screenshot is first converted to grayscale when the file is read, using cv2.cvtColor with the cv2.COLOR_BGR2GRAY conversion code. The screenshot is then binarized to further simplify the grayscale image, making element edges more distinct and internal contours clearer. Binarization uses an adaptive threshold: the threshold for each pixel is the weighted average of its surrounding neighborhood, and this threshold is applied to the pixel itself. After binarization, the screenshot retains only two gray levels, 0 and 255, representing black and white.
3.1.2. Image Segmentation and Position Acquisition
The segmentation and detection method used in this study is essentially the same as that of Bakaev et al. [36], except that we do not use the DOM to assist recognition. Using OpenCV’s edge detection, elements in an interface screenshot are recognized as rectangles. The interface’s horizontal and vertical dimensions are denoted “width” and “height”, respectively, and each element i is represented by a tuple of its horizontal position, vertical position, length, and width within the interface. The definition of the position information is shown in Figure 1.
3.2. Aesthetic Evaluation Metrics
This section elucidates the specific meanings and quantification methods of the metrics frequently encountered in related research. Ngo’s study proposed 14 quantifiable metrics, the effectiveness of which has been widely recognized. However, a review of the literature reveals that most studies adopt only 4–6 of these metrics rather than using all 14.
In our research, Ngo’s metrics were modified in three ways. First, the metric system was reduced: because this study relies on multiple linear regression, preliminary research checked for collinearity and found it among some metrics (for example, between balance and equilibrium). The original 14 metrics were therefore reduced to the seven used in this study. Second, the metrics and their formulas were further streamlined, primarily for symmetry, proportionality, and simplicity. Third, because applying these seven metrics requires interface elements to be segmented and extracted, each element’s position was defined as a positional tuple, and Ngo’s formulas were modified on this basis to facilitate implementation with OpenCV.
In the metrics described below, the total number of elements in the interface is denoted N. Additionally, because this study focuses on layout aesthetics and does not consider color, the following metrics are applied to the interface after the preprocessing described in Section 3.1.1.
3.2.1. Density
In this study, density is determined by how close the proportion of whitespace in an interface is to its optimum. Whitespace divides the design space and gives an interface room to breathe; in graphic design, designers often use it to create a refined, high-end image for products or brands. Whitespace significantly affects the visual impact of an interface, and it is generally believed that interfaces achieve maximum aesthetic appeal and usability when whitespace occupies about 50% of the user-visible area. Across interface types, usability increases with more whitespace up to 50% and decreases as whitespace grows beyond that point. For e-commerce interfaces, where users typically expect product diversity, reducing product information may lower shopping intent, while overcrowded interface elements can significantly decrease perceived usability. Studies show that users of all ages are dissatisfied when the proportion of whitespace is either too high (above 90%) or too low (below 50%). When users answer open-ended questions about their usability and aesthetic needs, the most common responses relate to “simplicity”, such as clear layouts, high readability, and distinct titles. In line with recent design trends, minimalist styles are particularly popular among users. It can therefore be inferred that interfaces in which elements occupy 50% of the area provide the most comfortable user experience.
Definitions of whitespace vary slightly across studies. Some scholars treat line spacing, paragraph spacing, distances between elements and boundaries, gaps between text and images, and color blocks used to differentiate elements as whitespace; others regard any part of the interface that conveys no information as whitespace. Density can be quantified in two ways: by pixels, as the proportion of non-informative pixels among all pixels of the interface screenshot, or by element area, by abstracting elements into rectangles and computing the proportion of the page area left blank outside those rectangles. Although other metrics could also use either approach, this research abstracts every element in the same manner without considering its specific content or color, and therefore adopts the second method, based on the area of the elements’ bounding rectangles.
The specific formula is as follows:
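The area-based density described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study’s formula: element rectangles are given as hypothetical `(x, y, w, h)` tuples and assumed not to overlap, and occupancy is scored with a form in the style of Ngo’s density measure, 1 − 2|0.5 − occupancy|, which peaks when elements cover half of the page (that is, 50% whitespace).

```python
def density(boxes, width, height):
    """Area-based density score in [0, 1], peaking at 50% element occupancy.

    The scoring form 1 - 2*|0.5 - occupancy| is an assumption in the style of
    Ngo's density measure; the study's exact formula may differ.
    """
    page_area = width * height
    # Sum of bounding-rectangle areas; assumes rectangles do not overlap.
    covered = sum(w * h for _, _, w, h in boxes)
    occupancy = min(covered / page_area, 1.0)  # share of the page used by elements
    return 1 - 2 * abs(0.5 - occupancy)        # whitespace share = 1 - occupancy


print(density([(0, 0, 50, 100)], 100, 100))  # half the page covered -> 1.0
```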
3.2.2. Symmetry
Studies of graphical user interface layouts offer abundant evidence on how interface symmetry affects user perception and cognition, underscoring symmetry as a critical factor. Symmetry helps refine the interface structure, strengthens visual guidance through the information, and improves users’ comprehension of interface content.
Various methods for quantifying interface symmetry exist. Early quantification considered three orientations (horizontal, vertical, and diagonal symmetry) with equal weight. Recent studies, however, especially those on graphical user interfaces such as web and mobile layouts, focus predominantly on vertical symmetry, and research indicates a strong correlation between vertical symmetry and users’ aesthetic preferences. Therefore, this study considers only the vertical symmetry of interface layouts.
Vertical symmetry is defined as follows. A vertical line through the intersection of the interface’s diagonals divides the interface into left and right sections. The elements on the left are mirrored across this line, and vertical symmetry is the ratio of the area where the mirrored elements overlap the elements on the right to the total area on the right side. Specifically, the symmetry quantification algorithm proceeds as follows: first, divide the interface into left and right parts along the vertical symmetry axis; then identify pixel pairs (u, v) on the two sides that meet the following three criteria:
Here, the grayscale values of pixel points u and v must both be 0: because the image has been binarized, a value of 0 indicates that a pixel lies inside an element rather than in the background. The number of pixel pairs meeting the three criteria, relative to the total area of all elements, quantifies the interface’s vertical symmetry. Under this quantification method, the vertical symmetry of an interface layout is expressed as follows:
where the pair-count term is the number of pixel pairs that meet the above three criteria.
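The pixel-pair procedure can be sketched with NumPy as follows, under assumptions: the three criteria are taken to be that u lies in the left half, that v lies at u’s mirror position in the right half, and that both pixels have grayscale value 0. The denominator used here, the element area in the right half, follows the overlap-based definition given above and is likewise an assumption; with it, a perfectly mirrored layout scores 1.

```python
import numpy as np


def vertical_symmetry(binary: np.ndarray) -> float:
    """Share of right-half element pixels matched by mirrored left-half element pixels.

    Element pixels are 0 and background pixels are 255, as after binarization.
    """
    _, width = binary.shape
    half = width // 2
    left = binary[:, :half]
    right = binary[:, width - half:]   # skips the middle column when width is odd
    mirrored_left = left[:, ::-1]      # reflect the left half across the vertical axis
    # A pair (u, v) counts when both mirror-image pixels lie inside elements.
    matched_pairs = np.count_nonzero((mirrored_left == 0) & (right == 0))
    right_area = np.count_nonzero(right == 0)
    if right_area == 0:
        return 0.0  # assumed convention: nothing on the right to match
    return matched_pairs / right_area


page = np.full((10, 10), 255, dtype=np.uint8)
page[2:8, 1:3] = 0   # element left of the axis
page[2:8, 7:9] = 0   # its mirror image on the right
print(vertical_symmetry(page))  # 1.0

page[2:8, 2] = 255   # shrink the left element by one column
print(vertical_symmetry(page))  # 0.5
```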
3.2.3. Balance
Like interface symmetry, interface balance comes in several types, chiefly central balance and separately calculated left–right and top–bottom balance. In physics, balance means equal weight at both ends. Visually, balance means that the elements of an interface are arranged in an orderly way that creates a dynamic sense of stability. The two main factors affecting users’ perception of interface balance are visual weight and position.
In this study, interface balance is defined as the distribution of visual weight among elements within the interface. The interface is divided into four quadrants: top-left, top-right, bottom-left, and bottom-right, each with equal weight. The quantification approach compares the difference in visual weight between the sides of the vertical and horizontal symmetry axes. The specific formulas are as follows:
where L, R, T, and B denote the left, right, top, and bottom sides of the vertical and horizontal symmetry axes, respectively; the area term is the area of the elements in each quadrant, and the distance term is the distance between an element’s center point and the interface’s center point.
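A minimal sketch of the balance computation in the style of Ngo’s balance measure, assuming elements are given as hypothetical `(x, y, w, h)` rectangles. Following the text, each element’s visual weight is its area multiplied by the distance from its center to the interface center; assigning an element to a side by its center point and normalizing each side difference by the larger weight are assumptions, not details confirmed by the text.

```python
import math


def balance(boxes, width, height):
    """Balance score in [0, 1]; 1 means equal visual weight on opposite sides."""
    cx0, cy0 = width / 2, height / 2
    weight = {"L": 0.0, "R": 0.0, "T": 0.0, "B": 0.0}
    for x, y, w, h in boxes:
        cx, cy = x + w / 2, y + h / 2
        # Visual weight: area times distance from the element center to the page center.
        moment = w * h * math.hypot(cx - cx0, cy - cy0)
        weight["L" if cx < cx0 else "R"] += moment
        weight["T" if cy < cy0 else "B"] += moment

    def side_difference(a, b):
        # Normalized imbalance between two opposite sides, in [0, 1].
        largest = max(a, b)
        return 0.0 if largest == 0 else abs(a - b) / largest

    vertical = side_difference(weight["L"], weight["R"])
    horizontal = side_difference(weight["T"], weight["B"])
    return 1 - (vertical + horizontal) / 2


# Two equal elements placed diagonally opposite each other balance perfectly.
print(balance([(10, 10, 20, 20), (70, 70, 20, 20)], 100, 100))  # 1.0
```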
3.2.4. Proportionality
Well-chosen proportions are widely applied, and the golden ratio is considered the proportion that humans find most pleasing; it is therefore used extensively in designs ranging from grand architecture to delicate jewelry. Because this study is intended to apply to graphical user interfaces in general, interface proportionality cannot be quantified entirely in the manner proposed by Ngo. This research therefore quantifies layout proportionality with a simplified version of Ngo’s formula [12]. The specific formulas are as follows:
where N is the total number of elements within the interface. For each element, the ratio of its two side lengths is calculated, with the larger as the denominator and the smaller as the numerator, so that the ratio never exceeds 1. The preferred proportions are the five ratios given by Ngo: 1:1, 1:1.414, 1:1.618, 1:1.732, and 1:2 [12].
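The following sketch illustrates the simplified proportionality idea: each element’s smaller-over-larger side ratio is compared with the nearest of Ngo’s five preferred ratios. The input format `(x, y, w, h)`, the averaging over elements, and the division by 0.5 (the largest possible gap, which keeps the score in [0, 1]) are assumptions made for illustration.

```python
# Ngo's five preferred proportions, expressed as smaller side / larger side.
PREFERRED = (1 / 1, 1 / 1.414, 1 / 1.618, 1 / 1.732, 1 / 2)


def proportionality(boxes):
    """Score in [0, 1]: how close each element's side ratio is to a preferred ratio."""
    if not boxes:
        return 0.0
    total_gap = 0.0
    for _, _, w, h in boxes:
        ratio = min(w, h) / max(w, h)  # never exceeds 1
        total_gap += min(abs(ratio - p) for p in PREFERRED)
    # The gap to the nearest preferred ratio is at most 0.5 (ratio near 0 vs. 1:2).
    return 1 - (total_gap / len(boxes)) / 0.5


print(proportionality([(0, 0, 40, 40), (0, 0, 10, 20)]))  # 1:1 and 1:2 -> 1.0
```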
3.2.5. Uniformity
Uniformity refers to the consistency among elements belonging to the same functional module within an interface. As in architectural and industrial product design, uniformity plays a significant role in enhancing an interface’s aesthetics: it helps users understand the product’s functions and naturally guides them to perform the corresponding operations. Uniformity can be achieved by using similar element sizes, keeping the gaps between elements small, and keeping those gaps no larger than the margins between the elements and the interface boundaries.
Uniformity is defined as the degree to which all elements in an interface appear to be part of a whole. This includes two aspects: the similarity in element sizes and the spacing between elements compared to the spacing from the edges. The specific formulas are as follows:
where the size-similarity term is the degree to which interface elements are similar in size, calculated as follows:
where the layout-area term is the area of the bounding rectangle enclosing all elements within the interface, width and height are the dimensions of the interface screenshot, the element-dimension terms are the side lengths of each element, the size-count term is the number of distinct sizes among the interface elements, and N is the total number of elements within the interface.
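The variables listed above appear to correspond to those of Ngo’s unity measure, so the sketch below assumes that form: a size term, 1 − (number of distinct sizes − 1)/N, and a spacing term, 1 − (layout area − element area)/(page area − element area), averaged. The `(x, y, w, h)` input format, treating two sizes as identical only when both dimensions match exactly, and assuming non-overlapping rectangles are further assumptions.

```python
def uniformity(boxes, width, height):
    """Unity-style uniformity: mean of size similarity and spacing compactness, in [0, 1]."""
    n = len(boxes)
    if n == 0:
        return 0.0
    # Size similarity: fewer distinct (w, h) sizes means a more uniform layout.
    n_sizes = len({(w, h) for _, _, w, h in boxes})
    form = 1 - (n_sizes - 1) / n
    # Spacing: gaps inside the elements' joint bounding box relative to all
    # blank space on the page (assumes elements do not overlap).
    left = min(x for x, _, _, _ in boxes)
    top = min(y for _, y, _, _ in boxes)
    right = max(x + w for x, _, w, _ in boxes)
    bottom = max(y + h for _, y, _, h in boxes)
    layout_area = (right - left) * (bottom - top)
    element_area = sum(w * h for _, _, w, h in boxes)
    blank_page = width * height - element_area
    space = 1.0 if blank_page <= 0 else 1 - (layout_area - element_area) / blank_page
    return (form + space) / 2


print(uniformity([(10, 10, 20, 20), (70, 70, 20, 20)], 100, 100))
```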
3.2.6. Simplicity
Simplicity refers to the degree to which the elements of an interface are easily accepted by users and is typically measured by the number of elements and their degree of alignment. Following Ngo, this study defines simplicity in terms of the distribution and alignment of elements within an interface [12].
where the two alignment counts are the numbers of vertically and horizontally aligned elements, respectively, and N is the total number of elements within the interface. From the coordinates of each element’s top-left corner, the numbers of points sharing the same horizontal or vertical coordinate, that is, the numbers of vertical and horizontal alignment points, are counted. A higher simplicity value indicates stronger simplicity, while a lower value indicates weaker simplicity.
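A sketch in the style of Ngo’s simplicity measure, 3/(n_vap + n_hap + N), assuming that the vertical and horizontal alignment points are the distinct x and y coordinates of the elements’ top-left corners in hypothetical `(x, y, w, h)` tuples. Under this reading, elements that share coordinates shrink the denominator and raise the score, consistent with the statement that a higher value means stronger simplicity.

```python
def simplicity(boxes):
    """Simplicity score in (0, 1]; shared alignment lines raise the score."""
    n = len(boxes)
    if n == 0:
        return 0.0
    n_vap = len({x for x, _, _, _ in boxes})  # distinct left edges: vertical alignment points
    n_hap = len({y for _, y, _, _ in boxes})  # distinct top edges: horizontal alignment points
    return 3 / (n_vap + n_hap + n)


# Three elements on two columns and two rows: 3 / (2 + 2 + 3).
print(simplicity([(10, 10, 20, 20), (10, 50, 20, 20), (60, 10, 20, 20)]))
```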
3.2.7. Sequence
Sequence measures the degree to which the layout of elements in an interface facilitates eye movement. Typically, especially during free browsing, the gaze moves from the top-left corner across to the bottom-right corner, and high-contrast elements attract more attention. From a layout perspective, elements occupying larger areas are more likely to be noticed. The quantification formula for sequence is as follows:
where the area term is the area of element i in quadrant j, and the weight term is the dominance weight of the top-left, top-right, bottom-left, and bottom-right quadrants, equal to 4, 3, 2, and 1, respectively.
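A sketch following the structure of Ngo’s sequence measure: each quadrant’s element area is multiplied by its dominance weight (4, 3, 2, 1), the quadrants are ranked by the result, and that ranking is compared with the ideal reading order through 1 − Σ|q_j − v_j|/8. The `(x, y, w, h)` input format, assigning each element wholly to the quadrant containing its center, and breaking ties in reading order are assumptions.

```python
def sequence(boxes, width, height):
    """Sequence score in [0, 1]; 1 when weighted quadrant areas follow the reading order."""
    order = ("UL", "UR", "LL", "LR")
    q = {"UL": 4, "UR": 3, "LL": 2, "LR": 1}  # dominance weights from the text
    area = dict.fromkeys(order, 0)
    for x, y, w, h in boxes:
        cx, cy = x + w / 2, y + h / 2
        quadrant = ("U" if cy < height / 2 else "L") + ("L" if cx < width / 2 else "R")
        area[quadrant] += w * h
    weighted = {j: q[j] * area[j] for j in order}
    # Rank quadrants by weighted area: heaviest gets 4, lightest 1 (stable for ties).
    ranked = sorted(order, key=lambda j: -weighted[j])
    v = {j: 4 - rank for rank, j in enumerate(ranked)}
    # The largest possible total rank displacement is 8.
    return 1 - sum(abs(q[j] - v[j]) for j in order) / 8


four_equal = [(10, 10, 20, 20), (60, 10, 20, 20), (10, 60, 20, 20), (60, 60, 20, 20)]
print(sequence(four_equal, 100, 100))  # 1.0
```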
3.3. Metrics Acquisition and Calculation
Building on Sections 3.1 and 3.2, this section demonstrates multi-metric evaluation of interface layouts using two example interfaces. In Figure 2, (a) and (b) show two different interface layouts, and in Figure 3, (a) and (b) show the corresponding segmentation and detection results. From these detection results and the aesthetic formulas defined in this study, the evaluation results for the two layout schemes are obtained directly, as shown in Table 1.
The calculated results show that each scheme is stronger on some of the seven metrics and weaker on others, making it difficult to judge intuitively which design is better. Thus, in addition to detailed results for each metric, a scientific and reliable comprehensive index is needed to evaluate the strengths and weaknesses of the two layouts intuitively and comprehensively.
4. Evaluation Method Based on Multiple Regression Model
The first method for obtaining a composite index uses the seven metric values to model users’ overall evaluations of interface layouts. Specifically, user ratings of their overall perception of different interface layouts are collected through an online questionnaire, and the seven metrics for each layout are obtained using the method described in Section 3. These data are then modeled by multiple regression, as illustrated in Figure 4.
In this study, we hypothesize that users’ evaluations of interfaces are related to the seven selected metrics through a multiple regression relationship. The model is established from user ratings of different interface layouts collected via an online questionnaire. Ratings use a Likert scale; the average user rating for each layout is calculated, the seven metric values are computed for each interface, and these data are fitted to derive the regression formula. The explanatory variables are the seven metrics, density $X_1$, symmetry $X_2$, balance $X_3$, proportionality $X_4$, uniformity $X_5$, simplicity $X_6$, and sequence $X_7$, and the dependent variable Y is the user rating of a given interface. The multiple linear regression model is as follows:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_7 X_7 + \varepsilon,$$
where $\beta_0, \ldots, \beta_7$ are the regression coefficients and $\varepsilon$ is the random error term.
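The fitting step can be sketched with NumPy’s least-squares solver. This is illustrative only: the synthetic metric values and coefficients below are placeholders rather than the study’s data, and the study’s analysis software is not specified here. The seven metric columns are assumed to be ordered as listed above (density through sequence), and an intercept column is prepended.

```python
import numpy as np


def fit_rating_model(metrics: np.ndarray, ratings: np.ndarray):
    """Ordinary least squares fit of mean ratings on the seven metric columns.

    metrics: shape (n_layouts, 7); ratings: shape (n_layouts,).
    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    n = metrics.shape[0]
    design = np.column_stack([np.ones(n), metrics])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, ratings, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((ratings - fitted) ** 2)
    ss_tot = np.sum((ratings - ratings.mean()) ** 2)
    return coef, 1 - ss_res / ss_tot


# Synthetic, noise-free placeholder data: 50 layouts x 7 metrics.
rng = np.random.default_rng(0)
metrics = rng.random((50, 7))
true_coef = np.array([0.2, 0.5, -0.1, 0.3, 0.0, 0.4, 0.1, -0.2])
ratings = true_coef[0] + metrics @ true_coef[1:]
coef, r2 = fit_rating_model(metrics, ratings)
```

With noise-free synthetic ratings the solver recovers the placeholder coefficients and R² is 1; real questionnaire data will of course fit less tightly.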
4.1. Questionnaire Survey
Prior to conducting the interface layout evaluation, the objective of this evaluation was communicated to participants, along with a brief description of interface layout aesthetics. Additionally, basic information about the participants was recorded, including age, gender, educational background, experience in interface design, and daily usage duration of GUIs.
The interface screenshots in the online evaluation were displayed without a time limit. Time limits are typically imposed when rating the aesthetics of webpages and other GUIs to keep interface content from influencing users; because the questionnaire samples are abstract layout images without specific colors or content, no display time limit was set. Ratings used a 5-point Likert scale: “Very appealing” scored 5 points, “Quite appealing” 4, “Neutral” 3, “Slightly unappealing” 2, and “Very unappealing” 1. The questionnaire contained 55 images derived from real GUIs and abstracted into layout diagrams through binarization, with elements represented as gray rectangular blocks. Some of the experimental materials are shown in Figure 5.
4.2. Data Collection
A total of 320 questionnaires were distributed, of which 314 were valid. Among the respondents, 166 were female (52.87%) and 148 were male (47.13%). By age, 116 respondents were 18–25 years old, 113 were 26–30, 16 were over 30, and 8 were under 18. A total of 170 participants had experience related to interface design, and all participants had normal or corrected-to-normal vision.
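Let $r_{iK}$ denote the rating given by user $i$ to interface layout $K$. Let $R_K = \{r_{1K}, r_{2K}, \ldots, r_{NK}\}$ be the set of ratings of that interface from all users, where $N$ is the total number of participants. The arithmetic mean of the ratings for each sample is calculated as:

$$\bar{r}_K = \frac{1}{N}\sum_{i=1}^{N} r_{iK}$$

and the standard deviation for each sample is:

$$s_K = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(r_{iK} - \bar{r}_K\right)^2}$$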
The questionnaire survey on interface layout aesthetics and a summary of the data are presented in Table A1 of Appendix A. Samples 18, 32, 36, 37, and 54 had coefficients of variation (the ratio of the standard deviation to the mean) above 44%, markedly higher than those of the other samples. These five samples were therefore excluded from the data fitting. To remove the influence of differing dimensions, the evaluation values were normalized; the results are shown in Table A2 of Appendix A.
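The screening and scaling steps above can be sketched as follows, assuming the raw answers are arranged as a participants-by-layouts matrix. The 44% threshold and the Likert coding come from the text. The following are assumptions made for illustration: the helper names, the toy responses, the sample (N − 1) form of the standard deviation, and the use of min-max scaling as the normalization.

```python
import numpy as np

# 5-point Likert coding used in the questionnaire.
LIKERT = {
    "Very appealing": 5,
    "Quite appealing": 4,
    "Neutral": 3,
    "Slightly unappealing": 2,
    "Very unappealing": 1,
}


def summarize_ratings(R: np.ndarray, cv_threshold: float = 0.44):
    """Per-layout statistics for a rating matrix R of shape (N participants, K layouts).

    Returns the mean, the sample standard deviation (N - 1 denominator, an
    assumption), the coefficient of variation, and a mask of layouts whose
    CV does not exceed the threshold.
    """
    mean = R.mean(axis=0)
    std = R.std(axis=0, ddof=1)
    cv = std / mean
    keep = cv <= cv_threshold
    return mean, std, cv, keep


def min_max(v: np.ndarray) -> np.ndarray:
    # Assumed normalization: rescale the kept mean ratings to [0, 1].
    return (v - v.min()) / (v.max() - v.min())


# Hypothetical answers: 4 participants rating 3 layouts.
responses = [
    ["Very appealing", "Very unappealing", "Neutral"],
    ["Very appealing", "Very appealing", "Neutral"],
    ["Quite appealing", "Very unappealing", "Quite appealing"],
    ["Quite appealing", "Very appealing", "Neutral"],
]
R = np.array([[LIKERT[a] for a in row] for row in responses])
mean, std, cv, keep = summarize_ratings(R)
print(keep)  # [ True False  True]
normalized = min_max(mean[keep])
```

The second toy layout receives polarized ratings (1 and 5). Its CV is roughly 77%, so it is dropped, mirroring how the high-CV samples were excluded before fitting.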
4.3. Data Analysis
A multiple regression analysis was performed on the data, and the model was checked for multicollinearity. The relevant results are presented in Table 2.
Table 2 shows that the model passed the F-test (F = 4.029, p = 0.002 < 0.05), indicating that at least one of the independent variables has a significant effect on the dependent variable Y. Furthermore, the coefficient of determination $R^2$ is 0.402, meaning that the independent variables explain 40.2% of the variance in the dependent variable.
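As a quick consistency check on these statistics, the overall F statistic of a regression with k predictors and n observations can be computed from the coefficient of determination as $F = \frac{R^2/k}{(1-R^2)/(n-k-1)}$. Assume n = 50 (the 55 questionnaire samples minus the five excluded ones) and k = 7. Then $R^2 = 0.402$ gives F ≈ 4.03, which agrees with the reported F = 4.029 up to the rounding of $R^2$. The function name below is hypothetical.

```python
def overall_f_statistic(r2: float, n: int, k: int) -> float:
    """F statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Assumed sample size: 55 samples minus the 5 excluded ones; 7 metrics.
f_value = overall_f_statistic(0.402, n=50, k=7)
print(round(f_value, 2))  # 4.03
```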
Next, we estimated the effect of each independent variable on the dependent variable Y and tested the statistical significance of each regression coefficient with the t-statistic. The t-statistics and corresponding p-values for each variable, listed in Table 2, indicate that the coefficients are statistically significant. This confirms that the effects of the different independent variables are meaningful.
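The t-statistic of each coefficient is its estimate divided by its standard error. The sketch below computes both from an ordinary least squares fit using only NumPy. The function name and the synthetic data are hypothetical. This is the textbook OLS formula, not necessarily the exact procedure of the statistics software behind Table 2.

```python
import numpy as np


def coefficient_t_stats(X: np.ndarray, y: np.ndarray):
    """OLS coefficients, their standard errors, and t = beta / SE for each coefficient."""
    design = np.column_stack([np.ones(len(X)), X])  # intercept column first
    n, p = design.shape                              # p = number of predictors + 1
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    sigma2 = residuals @ residuals / (n - p)         # unbiased residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)  # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se


# Hypothetical data: only the first metric truly affects the rating.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(50, 7))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(0.0, 0.1, size=50)
beta, se, t = coefficient_t_stats(X, y)
print(np.round(t, 1))
```

Two-sided p-values follow from a t distribution with n − p degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df=n - p)` if SciPy is available.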
The correlations between variables were then calculated and analyzed, with the results displayed in Figure 6. Two pairs of metrics show significant positive correlations, with correlation coefficients of around 0.5 and p-values below 0.001. The correlations between the other pairs of variables are relatively weak.
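The correlation analysis above can be reproduced with a few lines of NumPy, together with the VIF and Durbin–Watson diagnostics discussed next. This is a sketch under stated assumptions. The function names are hypothetical. The VIF uses the textbook auxiliary-regression definition rather than any particular statistics package. The Durbin–Watson statistic assumes the residuals are ordered by sample number.

```python
import numpy as np


def correlation_matrix(X: np.ndarray) -> np.ndarray:
    # Pearson correlations between metrics; columns are treated as variables.
    return np.corrcoef(X, rowvar=False)


def vif(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing metric j on the others."""
    m, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(m), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


def durbin_watson(residuals: np.ndarray) -> float:
    # Sum of squared successive differences over the residual sum of squares;
    # values near 2 suggest no first-order autocorrelation.
    diff = np.diff(residuals)
    return float(diff @ diff / (residuals @ residuals))


# Hypothetical check: two uncorrelated, zero-mean metric columns.
X_orth = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
print(correlation_matrix(X_orth), vif(X_orth))
```

For uncorrelated columns, every VIF equals 1. A VIF grows without bound as one metric becomes a linear combination of the others, which is why values such as 5 or 10 serve as warning thresholds.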
Finally, multicollinearity among the variables was examined using the variance inflation factor (VIF). As shown in Table 2, the highest VIF is 1.966, well below the commonly used thresholds of 5 or 10. This indicates that the model does not suffer from severe multicollinearity. Although the correlation analysis identified some correlated pairs of variables, these relationships did not raise the risk of multicollinearity to a problematic level. In addition, the Durbin–Watson (D-W) statistic is 1.646. Because values near 2 indicate no autocorrelation, this suggests that the residuals show no strong autocorrelation, which supports the reliability of the model. These variables are therefore suitable for multiple linear regression analysis. Based on the coefficients in Table 2, the multiple linear regression model is:
5. Evaluation Method Based on Entropy Theory
The second method for obtaining a comprehensive index uses the entropy weight method to determine the weight of each metric, from which a comprehensive evaluation model for the interface layout metrics is constructed. Entropy-based weighting is now used in research across many disciplines; because it derives weights from the data themselves, it improves objectivity. The core idea of the entropy weight method is that a metric carrying more information (i.e., one whose values vary more across samples) has lower uncertainty and therefore receives a higher weight. Conversely, a metric carrying less information has greater uncertainty and receives a lower weight. Assuming that the seven metrics relate to the comprehensive evaluation of interfaces in this entropy-based way, we use the entropy weight method to determine the weights of the selected metrics and fit the comprehensive evaluation results. The procedure is shown in Figure 7.
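All metrics in this study are positive (benefit-type) indicators, meaning that higher values indicate better outcomes. The original data matrix is:

$$X = \left(x_{ij}\right)_{m \times n} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$

where $x_{ij}$ is the computed value of the $j$-th layout metric for the $i$-th sample. For an interface to be evaluated, $x_{ij}$ is obtained directly using the methods described in Section 3. $m$ denotes the number of samples, and $n$ denotes the number of metrics.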
The metrics undergo max-min normalization using Python’s NumPy library. Since all selected metrics are positive, the normalization is calculated as:

$$x'_{ij} = \frac{x_{ij} - \min_{i} x_{ij}}{\max_{i} x_{ij} - \min_{i} x_{ij}}$$

To mitigate the influence of extreme values, any value of 0 is replaced with 0.01. This keeps the logarithm in the subsequent entropy calculation defined and ensures valid results.
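The normalization step can be sketched in NumPy as follows. The text does not specify whether the 0 → 0.01 replacement is applied before or after scaling. This sketch assumes it is applied after scaling, because min-max normalization always maps each metric's minimum to 0. The function name and the toy values are hypothetical.

```python
import numpy as np


def normalize_benefit(X, floor: float = 0.01) -> np.ndarray:
    """Column-wise min-max normalization for benefit-type (higher-is-better) metrics.

    X has shape (m samples, n metrics). Zeros are replaced with `floor`
    (assumed to apply after scaling) so that ln(p_ij) stays defined later.
    Assumes no metric column is constant.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    Z = (X - col_min) / (col_max - col_min)
    Z[Z == 0] = floor
    return Z


# Hypothetical metric values for 3 layouts and 2 metrics.
Z = normalize_benefit([[0.2, 10.0], [0.6, 30.0], [1.0, 20.0]])
print(Z)
```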
The proportion $p_{ij}$ that the $i$-th sample contributes to the $j$-th layout metric is calculated as:

$$p_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{m} x'_{ij}}$$
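The entropy value $e_j$ of the $j$-th metric is:

$$e_j = -k \sum_{i=1}^{m} p_{ij} \ln p_{ij}$$

where $k = 1/\ln m$. The weight $w_j$ of each layout metric is then:

$$w_j = \frac{1 - e_j}{\sum_{j=1}^{n} \left(1 - e_j\right)}$$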
The calculations above determine the weight matrix $W = (w_1, w_2, \ldots, w_n)^{\mathrm{T}}$ of the interface layout metrics. Based on the entropy weight method evaluation mapping, the comprehensive evaluation of an interface layout is:

$$S = X' W, \qquad S_i = \sum_{j=1}^{n} w_j x'_{ij}$$

where $S_i$ is the comprehensive evaluation for the $i$-th sample, $X'$ is the normalized matrix of interface layout metrics, and $W$ is the matrix of layout metric weights.
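The steps above can be condensed into a short NumPy sketch. It takes an already-normalized, strictly positive matrix, such as one produced after the zero-to-0.01 replacement. From it, the sketch computes the proportions, the entropy of each metric with $k = 1/\ln m$, the weights, and the comprehensive scores. The function names and the toy matrix are hypothetical. This illustrates the standard entropy weight method and is not the authors' implementation.

```python
import numpy as np


def entropy_weights(Z: np.ndarray) -> np.ndarray:
    """Entropy weights w_j for a normalized (m samples, n metrics) matrix with positive entries."""
    m = Z.shape[0]
    P = Z / Z.sum(axis=0)                   # p_ij: share of sample i within metric j
    k = 1.0 / np.log(m)
    e = -k * np.sum(P * np.log(P), axis=0)  # entropy e_j of each metric
    d = 1.0 - e                             # divergence: larger means more information
    return d / d.sum()                      # weights sum to 1


def composite_scores(Z: np.ndarray, w: np.ndarray) -> np.ndarray:
    # S_i = sum_j w_j * z_ij, i.e. the matrix product S = Z W
    return Z @ w


# Hypothetical normalized matrix: the first metric is identical for all layouts,
# so it carries no information and should receive (almost) zero weight.
Z = np.array([[0.5, 0.01], [0.5, 1.0], [0.5, 0.5]])
w = entropy_weights(Z)
S = composite_scores(Z, w)
ranking = np.argsort(-S)  # layout indices from best to worst
print(np.round(w, 3), np.round(S, 3), ranking)
```

The toy example makes the core idea concrete: a metric that does not vary across samples has maximal entropy ($e_j = 1$) and therefore zero weight. The sketch requires at least two samples and at least one non-constant metric; otherwise, the weight denominator is zero.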
8. Discussion
Building on previous research, this study identified metrics highly correlated with the aesthetic appeal of interface layouts, adjusted their quantification methods and described them in detail, and used them as the basis for quantifying the aesthetics of interface layouts. The metrics are density, symmetry, balance, proportionality, uniformity, simplicity, and sequence. In practical evaluation, however, these seven metrics on their own rarely show at a glance which of several design schemes is better or worse. In the research by Li et al., the weights of the metrics were derived from scores obtained through user questionnaires and interviews [11], a method that inherently involves a degree of ambiguity [37]. In contrast, this study combines the seven metrics into a single comprehensive index using both multiple regression and the entropy weight method, approaches that are more objective and statistically grounded.
Moreover, unlike the study by Wan et al., which assumed a positive correlation between an interface’s popularity and its aesthetics and used popularity rankings and visit frequency as indicators of high aesthetic quality [32], this research focuses directly on the aesthetic features of the interface. It thus avoids the influence of other factors on visit frequency, such as interface functionality and user needs. Validation of the comprehensive index shows that the rankings produced by both multiple regression and the entropy weight method largely agree with the rankings given by participants. The first and second places are swapped, as are the third and fourth places. However, the differences between the comprehensive index values of these adjacent positions are minimal, so these ranking reversals fall within an acceptable range.
The final ranking results show that Layout 1 (The New York Times homepage), Layout 2 (BBC News homepage), and Layout 6 (Huxiu homepage) rank in the top three, both by index scores and by user ratings. Compared with the other layouts, they are noticeably more structured and content-rich and feature relatively larger fonts and images, which aligns with user expectations for interface aesthetics.
In the prototype of the interface layout evaluation software, we integrated the interface segmentation and recognition method from previous research with the multi-metric and comprehensive index calculation methods proposed in this study. This enables automated computation and evaluation of interface layouts. The software is significant in two main respects.
The first is its universality. Many current studies segment interfaces manually, bypassing or overlooking the acquisition of the position and size information of interface elements. Segmentation remains a bottleneck that hinders full automation of computational aesthetic assessment [38]. Some more automated approaches use web crawlers to read a webpage’s HTML source code directly [32], or browser extensions that support the inspection and analysis of webpage segmentation methods [39]. However, segmentation and recognition become barriers to rapid evaluation [40] when interfaces are evaluated during the design process or are presented in other formats (e.g., low-fidelity drawings or the interfaces of confidential systems). Therefore, this study evaluated interfaces from screenshots [14], using wireframe models to represent the position and size of elements within the interface [41]. Regardless of the interface’s current form or the frontend language in which it is written, the final user-facing visual interface can be captured. Recognizing and processing these screenshots thus allows for a more universal evaluation.
Second, the software can substantially improve the design cycle. Interface design should not be a linear process from design to evaluation. If evaluation occurs only after the requirements, low-fidelity, high-fidelity, and frontend design stages have been completed, any modifications required by the evaluation results consume considerable time and manpower. This is primarily because traditional interface layout assessments rely on anthropometric data to verify the accessibility and feasibility of human–computer interaction interfaces, focusing on the quantitative analysis and processing of experimental human–machine operation data [3]. The outcomes of this study enable lightweight, rapid, and automated evaluation of interface layout aesthetics. This offers an effective alternative to evaluation methods that require recruiting many participants for empirical experiments. It supports a shift from a linear design process to a cyclical design–evaluation process, especially when multiple design schemes are compared, because designers can compare different layouts intuitively or check whether a layout revision is actually an improvement. Embedding evaluation throughout the design cycle, rather than treating it as an afterthought, can significantly reduce subsequent testing costs and improve design efficiency and reliability [37]. Designers can then innovate and respond to user needs more effectively, cope with the complexity and dynamism of interface design and evaluation, and move toward more iterative, agile, and user-centered methodologies.
Our research has certain limitations, mainly concerning the generalizability of the validation experiments. In Section 6, all experimental materials came from a single category of interfaces, namely electronic newspaper interfaces. This choice was made because, unlike the more universally applicable abstract interfaces used in Section 4, the real interfaces in the validation experiments contain actual content that could influence user evaluations. For example, electronic news interfaces typically carry far more content than the home screens of ordinary apps. Therefore, to control variables, interfaces from the same category were used. Future research should validate the effectiveness of the proposed methods across more categories of real interfaces.
9. Conclusions
In this study, we proposed and implemented a method for evaluating the aesthetics of interface layouts by comprehensively considering seven key aesthetic metrics: density, symmetry, balance, proportionality, uniformity, simplicity, and sequence, aimed at enhancing the efficiency and accuracy of interface design evaluations. The main contributions of this research include the following aspects:
First, we adjusted and optimized existing aesthetic evaluation methods. We reduced the fourteen criteria from Ngo’s study to seven metrics and modified the calculation methods for three of them: symmetry, proportionality, and simplicity. Then, using two distinct statistical techniques (multiple regression analysis and the entropy weight method), we integrated the seven independent aesthetic metrics into a single comprehensive evaluation index. The success of this step not only validates the effectiveness of the chosen methods but also provides a reliable quantitative tool for subsequent assessments of interface layout aesthetics.
Second, by incorporating the automatic screenshot segmentation and recognition technology from previous research, this study can rapidly and automatically obtain the values of the seven metrics and their comprehensive evaluation value for interface layouts. This technology significantly speeds up the evaluation process and raises its level of automation. It also reduces the demand for manpower and resources and enhances the universality, compatibility, and flexibility of the assessment.
Further, based on the aforementioned methods and technology, we developed a prototype system for evaluating the aesthetic quality of interface layouts. This system not only facilitates rapid assessment of the aesthetic quality of interface layouts, but also promotes rapid iteration and optimization during the design phase, offering significant value in supporting lightweight and swift evaluations and cyclical iterative design.
In summary, this study not only theoretically expands the research on interface aesthetics evaluation but also provides an effective tool and method in practice to support and promote efficient, accurate interface design assessments. Future work will focus on further optimizing the accuracy of the evaluation model, expanding its applicability across different types of interface designs, and exploring its potential for integration and application within actual design processes.