1. Introduction
Globally, e-commerce continues to grow, with no signs of decline. E-commerce has revolutionized business, offering organizations limitless possibilities for growth and success [1]. Its ongoing growth is driven by advancements in product visualization, such as 3D presentation techniques [2,3]. Therefore, the user interface (UI) of websites is of great importance, and various graphical UI design principles, such as flexibility, simplicity, and learnability, should be ensured [4]. Users first judge a website by its UI and explore its different pages to find information related to their inquiries [5,6]. To create a good and lasting first impression, the UI should be consistent across all pages and over time. This is especially vital for e-commerce websites because they change regularly in line with evolving daily needs and promotional offers [7,8]. Usability is a quality aspect that helps users achieve their goals easily. Websites with good usability attract users and keep them satisfied [5,9]. Usability measures how easily a user can operate the interface [10]. The significance of usability evaluation is undeniable when designing a website [11], and it is also of crucial importance for websites already in operation [5].
Usability analysis is traditionally performed through methods including, but not limited to, click analysis and questionnaires. Nevertheless, such methods cannot address the aspects that technology-centered websites present: they provide no direct insight into users' cognitive processes and thinking, which makes these aspects challenging to evaluate [12]. To improve technology-centered e-commerce websites, a number of behavioral measurement techniques are adopted, such as usability testing [13,14,15], heuristic evaluation [15,16,17], eye tracking [18,19,20], electroencephalography (EEG) [21], and physiological measures [22]. However, each method has its advantages and disadvantages. Some methods report only limited or minor usability issues, raising questions about their effectiveness in identifying valid, significant, and consistent problems.
In light of existing studies in the usability literature, there is a notable lack of research applying behavioral measurement techniques to e-commerce websites, particularly in Saudi Arabia. As e-commerce is a growing industry in Saudi Arabia, it is important to ensure that websites are user-friendly and provide a positive user experience. There is therefore a pressing need to identify usability challenges and provide recommendations for improvements that enhance the online presence of businesses in the region and attract more customers. Accordingly, this research investigates the usability of two e-commerce websites in Saudi Arabia and compares the effectiveness of three behavioral measurement techniques: heuristic evaluation, usability testing, and eye tracking. Specifically, the Extra and Jarir e-commerce websites were selected based on a combined approach of criteria and ranking. This research adopts an experimental approach in which both qualitative and quantitative methods are used to collect and analyze the data. The contributions of this paper can be highlighted as follows:
- The utilization of three different behavioral measurement techniques, which has not been addressed in previous studies, and the comparison of their effectiveness in identifying usability issues make a significant contribution to the field of website usability evaluation.
- By incorporating a comprehensive comparative study of these behavioral measurement techniques, this research aims to advance the understanding and effectiveness of usability evaluation methods for e-commerce websites in Saudi Arabia, an area that has not received adequate attention thus far.
- Based on the obtained results, this study provides detailed recommendations divided into three primary groups: for the Extra and Jarir websites, for e-commerce websites in general, and for usability evaluators.
- The identification of the strengths and weaknesses of the Extra and Jarir websites through our research provides insights into their usability and user experience (UX). As these websites attract a large number of daily visitors and hold considerable importance within the e-commerce sector in Saudi Arabia, ensuring an optimal UX for their users is crucial.
In fact, this research fills a critical gap by evaluating the usability of e-commerce websites specifically within the context of Saudi Arabia. It serves as a foundational study addressing the importance of website usability in the Saudi Arabian context, where focused research on this specific problem is currently lacking. While the choice of Saudi Arabia may not be driven by a single explicit reason, the evaluation of the Extra and Jarir websites serves as a practical case within the e-commerce sector that is relevant and significant to the Saudi Arabian context, since e-commerce plays a crucial role in Saudi Arabia's Vision 2030, contributing to economic diversification, job creation, digital infrastructure development, small and medium-sized enterprise (SME) growth, consumer convenience, and global competitiveness. We believe that our methodology and results can be applied to other e-commerce websites in similar contexts. Through our recommendations and the insights gained from the study, we mainly aim to promote awareness and understanding of the significance of usability among Saudi websites and to motivate them to enhance the overall user experience.
The paper is organized as follows. Section 2 sheds light on relevant concepts and reviews previous related studies, while Section 3 describes the research methodology. Section 4 details the data analysis, and Section 5 presents the experimental results. Section 6 discusses the findings, and Section 7 presents the recommendations. Finally, Section 8 concludes the research, states its limitations, and outlines future work.
4. Data Analysis
This section presents the procedures followed and the findings of the experiments with the behavioral measurement techniques for the Extra and Jarir websites.
4.1. Heuristic Evaluation Analysis
Three experts were selected according to their experience with UI/UX and heuristic evaluation. They were contacted via e-mail and provided with three attached files. The introductory file included a welcome message and the purposes of this research. The second and third files were evaluation forms for the Jarir and Extra websites that described the heuristic evaluation process, Nielsen's 10 general principles with a description of each, and the severity rating scale. Each expert spent two days per website before submitting the corresponding report. The reported problems, along with their severity ratings, were then aggregated into one file. In total, the experts identified 50 usability issues: 24 for the Extra website and 26 for the Jarir website.
A detailed quantitative analysis of the Extra and Jarir websites was conducted based on the heuristic evaluation. For each heuristic principle, the number of usability issues found at each severity score was listed. For a comprehensive estimate, a weighted score was calculated for each principle by multiplying the number of usability issues at each severity level by the corresponding severity score; the total of these values for each principle was also computed. The average severity score for each principle was then calculated and rounded up. The higher the average severity score, the more severe the usability problems. The following subsections present the analysis for both websites.
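To make the calculation concrete, the following minimal sketch (in Python, with hypothetical issue counts) computes the weighted and average severity scores for one heuristic principle; Nielsen's 0 (not a problem) to 4 (catastrophic) severity scale is assumed.

import math

def principle_scores(issues_per_severity):
    """Compute weighted scores and the rounded-up average severity.

    issues_per_severity: dict mapping severity score -> number of issues
    found for one heuristic principle.
    """
    weighted = {s: n * s for s, n in issues_per_severity.items()}  # issues x severity
    total = sum(weighted.values())                                 # total weighted score
    avg = math.ceil(total / sum(issues_per_severity.values()))     # rounded up, as in the text
    return weighted, total, avg

# Hypothetical example: one minor (severity 2) and two major (severity 3) issues:
print(principle_scores({2: 1, 3: 2}))  # ({2: 2, 3: 6}, 8, 3) -> average severity 3 (major)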
4.1.1. Heuristic Evaluation Analysis for Extra Website
Table 3 shows the average and frequency of the severity of problems based on the principles of heuristic evaluation for the Extra website.
In an initial evaluation, it was noted that nine problems (37.5% of the total identified problems) were major, seven problems (29.17%) were minor, five problems (20.83%) were catastrophic, and three problems (12.5%) were merely cosmetic. Regarding individual principles, “flexibility and efficiency of use” and “help and documentation” resulted in average severity scores of 4 (catastrophic problem). Following that, “match between the system and real world”, “user control and freedom”, “consistency and standards”, and “aesthetic and minimal design” resulted in average severity scores of 3 (major problem). “Visibility of the system status”, “error prevention”, and “recognition rather than recall” resulted in average severity scores of 2 (minor problem). Despite the identified usability issues, no issues were found in “help users recognize, diagnose, and recover from errors”.
Major usability issues for the Extra website included vague system performance and unclear status. For instance, no UI change occurred when a user clicked on a menu item, so the user could not tell which page they were on. In addition, the Extra website does not support smart search or spell checking, even though users cannot be expected to recall model numbers or items' full names.
4.1.2. Heuristic Evaluation Analysis for Jarir Website
Table 4 shows the average and frequency of the severity of the problems based on the principles of heuristic evaluation for the Jarir website.
For the Jarir website, the initial evaluation showed that nine problems (34.62% of total identified problems) were minor, seven problems (26.92%) were major, five problems (19.23%) were catastrophic, and five problems (19.23%) were merely cosmetic. Regarding individual principles, “help users recognize, diagnose, and recover from errors” and “help and documentation” resulted in average severity scores of 4 (catastrophic problem). Following that, “match between the system and real world”, “user control and freedom”, “recognition rather than recall”, and “flexibility and efficiency of use” resulted in average severity scores of 3 (major problem). “Visibility of the system status”, “consistency and standards”, “error prevention”, and “aesthetic and minimal design” resulted in average severity scores of 2 (minor problem).
Among the notable usability issues for the Jarir website were limitations on user control, such as restrictions on the number of items in an order. Users could order at most 20 instances of certain items, selecting the quantity from a drop-down menu; they had to scroll to reach the required number and could not simply type it. For other items, only one instance could be ordered. Jarir also lacks basic error prevention functions, such as input validation: when an expert entered an invalid phone number, the website showed no error message or notification.
4.2. Usability Testing Analysis
This section presents the data analysis of the usability testing of the selected websites. The analysis utilized multiple metrics: task success rate, task time, and number of errors. In addition, this section lists the usability issues found for each website.
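As an illustration of how these metrics are computed, the minimal sketch below (with a hypothetical log structure) derives the success rate, average task time, and errors per task from per-attempt records; with 5 users and 4 tasks there are 20 attempts, which is how 11 total errors yield the 0.55 errors-per-task figure reported later for the Extra website.

def usability_metrics(logs):
    """logs: one record per (user, task) attempt."""
    n = len(logs)  # number of attempts = users x tasks
    return {
        "success_rate_%": 100 * sum(r["success"] for r in logs) / n,
        "avg_task_time_s": sum(r["time_s"] for r in logs) / n,
        "errors_per_task": sum(r["errors"] for r in logs) / n,
    }

# Hypothetical data: 5 users x 4 tasks = 20 attempts, 11 errors in total.
logs = ([{"success": True, "time_s": 120.0, "errors": 1}] * 11
        + [{"success": True, "time_s": 90.0, "errors": 0}] * 9)
print(usability_metrics(logs))  # errors_per_task = 11 / 20 = 0.55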
4.2.1. Usability Testing Analysis for Extra Website
The percentage of success and failure for all four tasks is shown in Figure 2. Three users completed Tasks 1, 2, 3, and 4 without any help, while one user needed help with Task 2 and one user failed to complete Task 3. Most users struggled to finish Task 2 because they could not find the “Compare” button; they completed the task by manually comparing product prices. Participant P1 was unable to complete Task 3; she added the wrong item to the cart.
The average time for each task is shown in Figure 3. Users completed Task 4 in a short time compared to the others, while Tasks 1, 2, and 3 took longer. Task 2 took the longest, with an average time of 175.4 s, because users searched for similar products to add to a comparison list but could not find the “Compare” button; they then had to go back, search again for the products they wanted to compare, and compare them manually.
After observing the users and analyzing their performance across the four tasks, as shown in Figure 4, errors were most frequent in Task 2, followed by Task 3 and Task 1. Most participants committed errors in Task 2: they selected the wrong product for comparison because the search option returned irrelevant products. In Tasks 1 and 3, some participants first selected the wrong product color before finding the correct one.
Considering all users and tasks, the overall success rate for the Extra website was 92.5%, which (according to [72]) is above the average completion rate of 78%. The minimum average completion score across all tasks was 80%, also above the 78% average. The total number of errors was 11, corresponding to 0.55 errors per task (11 errors across 5 users × 4 tasks = 20 attempts), which is below the average of 0.7 errors per task.
Table 5 shows the identified usability issues and severity ratings for the Extra website. The usability issues pointed out problems related to effectiveness and efficiency.
4.2.2. Usability Testing Analysis for Jarir Website
The percentage of success and failure for all four tasks is shown in Figure 5. One user completed Tasks 1–4 without any help, while two users needed help with Tasks 1–3 and two users failed to complete Tasks 1–3. All users completed Task 4. Users were unable to change the product color in Tasks 1 and 3, which prevented them from completing those tasks. Most users struggled to finish Task 2 because they could not find the comparison list.
The average time for each task is shown in Figure 6. Users completed Task 3 in a short time compared to the others, while Tasks 2, 1, and 4 took longer. Task 2 took the longest, with an average time of 186.6 s, because users searched for similar products to add to the comparison list after clicking the “Compare” button but then could not quickly find the comparison list in a visible place.
After observing the users and analyzing their performance across the four tasks, as shown in Figure 7, errors were most frequent in Task 2, followed by Tasks 1, 4, and 3. Most participants committed errors in Task 2: they selected the wrong product for comparison because the search option returned irrelevant products. In Tasks 1 and 3, some participants first selected the wrong product color before finding the correct one. One user made four errors in Task 4 because he could not quickly find the Customer Support telephone number.
Considering all users and tasks, the overall success rate for the Jarir website was 75%, which (according to [72]) is below the average completion rate of 78%. The minimum average completion score across all tasks was 50%, considerably below the 78% average, indicating a major usability issue in the corresponding task (comparing items). The total number of errors was 19, corresponding to 0.95 errors per task, above the literature average of 0.7 errors per task.
Discovered Usability Issues:
Table 6 shows the identified usability issues and the severity rating for the Jarir website. The usability issues pointed out problems related to effectiveness and efficiency.
4.3. Eye-Tracking Analysis
This section covers the usability analysis using eye tracking for both the Extra and Jarir websites. The experiment was carried out using Realeye.io, an online platform that offers subscription-based eye-tracking analysis. Using the platform, five participants per website performed the same tasks as in the usability testing. While attempting the tasks, each participant's fixation count, average fixation duration, time to first fixation (TTFF), time spent, areas of interest (AOIs), revisits, and heatmaps were recorded.
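For illustration, the minimal sketch below shows one way the per-AOI metrics reported in this section (fixation count, average fixation duration, TTFF, and revisits) can be computed from timestamped fixations; the data structures and AOI rectangle are hypothetical and do not reflect Realeye.io's actual export format.

from dataclasses import dataclass

@dataclass
class Fixation:
    t: float    # onset time in seconds from task start
    x: float    # screen coordinates in pixels
    y: float
    dur: float  # fixation duration in seconds

def aoi_metrics(fixations, aoi):
    """aoi: (left, top, right, bottom) rectangle in screen pixels."""
    left, top, right, bottom = aoi
    hit = lambda f: left <= f.x <= right and top <= f.y <= bottom
    inside = [f for f in fixations if hit(f)]
    if not inside:
        return None
    entries, was_inside = 0, False
    for f in fixations:           # count each entry of the gaze into the AOI
        if hit(f) and not was_inside:
            entries += 1
        was_inside = hit(f)
    return {
        "fixation_count": len(inside),
        "avg_fixation_dur": sum(f.dur for f in inside) / len(inside),
        "ttff": inside[0].t,      # time to first fixation
        "revisits": entries - 1,  # returns after the first entry
    }

# Hypothetical search-bar AOI and gaze sequence: TTFF 1.4 s, one revisit.
search_bar = (200, 60, 900, 110)
fix = [Fixation(0.8, 500, 400, 0.2), Fixation(1.4, 350, 80, 0.6),
       Fixation(2.1, 600, 500, 0.3), Fixation(2.9, 400, 90, 0.5)]
print(aoi_metrics(fix, search_bar))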
A heatmap is a data visualization method that indicates how the user views and interacts with a page: red marks hot areas on which the user focuses, and blue marks cold areas that receive very little attention. In terms of usability, a hot area can also indicate elements that are difficult to process. The same applies to fixations: while a long fixation may indicate a very interesting target, it may also indicate that the user could not immediately grasp the point of the element or had difficulty extracting information.
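The sketch below illustrates how such a heatmap can be generated from raw fixations: fixation durations are accumulated on a pixel grid and smoothed into hotspots. The fixation list, screen size, and smoothing radius are hypothetical.

import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_heatmap(fixations, width=1920, height=1080, sigma=40):
    """fixations: (x, y, duration) tuples; returns a 2D intensity grid
    where high values correspond to hot (red) areas."""
    grid = np.zeros((height, width))
    for x, y, dur in fixations:
        if 0 <= x < width and 0 <= y < height:
            grid[int(y), int(x)] += dur  # weight each point by dwell time
    return gaussian_filter(grid, sigma=sigma)  # blur points into smooth hotspots

# Two long fixations near a (hypothetical) search bar, one brief glance elsewhere:
heat = fixation_heatmap([(300, 80, 0.9), (320, 85, 0.7), (1500, 600, 0.1)])
print(heat.argmax())  # the hottest pixel lies in the search-bar region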
The analysis was carried out in three stages:
- An overall evaluation of the website during user browsing: an examination of the total time spent on tasks, the average fixation duration during tests, and the overall numbers of fixations and revisits.
- Task-specific areas of interest: an examination of the above-mentioned metrics for specific areas of interest, as shown in Table 7.
- Other non-task-related findings: an examination of user behavior on the website, with an analysis based on users' behavior and feedback.
4.3.1. Extra Website
On average, the participants spent 450 s (7.5 min) performing tasks, of which 214 s (3.6 min) were spent fixating on elements. The nature of the tasks and participant behavior during the experiments indicated that the long fixations were due to difficulty in locating what they were looking for.
Table 8 shows the overall fixation data for the Extra website.
Table 9 shows a considerable number of fixations and a significant number of revisits to the major areas of interest, mainly the search bar and main menu. The relatively long fixation time on the search bar was due to the lack of smart search; some users had to type and retype. In contrast, the high number of fixations on the item price and description was due to active searching: before users committed to the task, their eyes roamed over the item image, description, and primary features to ensure they had the right item.
The users had two ways to locate the item: typing its name or searching through categories. Users who typed the name were hindered by the lack of auto-complete functionality, which forced them to type the full name.
Figure 8 shows an aggregated heatmap for all users during Task 1; the main hot areas included the left side of the search bar and the area of the drop-down menu on which users expected to find smart suggestions.
On the other hand, users who searched through the categories had difficulty with the menus disappearing with any slight movement of the mouse.
Figure 9 shows users searching for items using menu categories; users kept flicking back to the source menu due to the disappearance of submenus.
As depicted in Table 10, the time to first fixation in this task was significantly reduced; the users had become more familiar with the position and purpose of the main elements on the website. The time spent on the search bar was significant because users searched first for one item and then for the second. The main menu was neglected by all users. All users performed the price comparison by browsing the first item and then the second. No user was able to detect the “Compare” button; therefore, no data were available for this element.
As expected, users spent considerable time fixating on item cards and item descriptions in an attempt to locate a “Compare” button, as shown in Figures 10 and 11.
As shown in Table 11, users became adept at using the website: they focused on the search bar immediately and spent little time fixating during the task. Despite the simplicity of this task, the users faced an issue: when scrolling through search results, the item cards did not contain an “Add to Cart” button. The heatmap in Figure 12 shows how, after the product was identified, users searched for an “Add to Cart” button on the right side of the page but found none. For a regular buyer or an experienced user, this could be frustrating. All users had to open the full product page before they could add the item to the cart. Furthermore, using a financing service should be an option, not a compulsory path: when adding an item to the cart, users were greeted with a financing service banner. This could be avoided by providing separate buttons for users who simply wish to buy the item and those who wish to apply for financing.
Table 12 shows the fixation data for the final task on the Extra website. The users were able to detect the support number easily in the footer, which was the first place every user searched, with only one fixation and no revisits. The heatmap is shown in Figure 13.
Table 13 shows the identified usability issues using eye tracking and the severity rating for the Extra website. The usability issues pointed out problems related to user behavior.
In addition to the tasks, a further analysis was conducted on the user behavior while browsing the website, and the findings are as follows:
- In general, following gazes and fixations, users did not pay attention to the main banner on the home page, despite its considerable size and central position. This is due either to their focus on the tasks or to the banner's inability to catch users' attention while browsing.
- User scan paths were regressive, tracing back and forth between certain areas of the page, an indicator of search inefficiency (a simplified way to quantify this is sketched below). A dedicated eye-tracking device, rather than a software-based platform, would be required for an accurate analysis.
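As a rough illustration of what such an analysis might involve, the sketch below counts saccades that return to the neighborhood of an earlier fixation; this simplification and its threshold are ours for illustration only, not the output of a dedicated eye-tracking system.

import math

def regression_ratio(scanpath, radius=100):
    """scanpath: ordered (x, y) fixation centers in pixels; returns the
    fraction of saccades that land near an already-visited location."""
    regressions = 0
    for i in range(1, len(scanpath)):
        x, y = scanpath[i]
        if any(math.hypot(x - px, y - py) < radius for px, py in scanpath[:i - 1]):
            regressions += 1
    return regressions / max(len(scanpath) - 1, 1)

# Hypothetical back-and-forth path between two page areas:
path = [(300, 80), (900, 400), (310, 90), (950, 420), (305, 85)]
print(regression_ratio(path))  # 0.75 -> highly regressive scanning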
4.3.2. Jarir Website
As shown in Table 14, the participants spent 408 s (6.8 min) on average performing tasks, of which 195 s (3.25 min) were spent fixating on elements. The nature of the tasks and participant behavior during the experiments indicated that the long fixations were due to difficulty in locating what they were looking for. Interestingly, the average time to first fixation was 2 s, which is relatively high. This is due to the extensive set of banners all over Jarir's homepage: users gazed across colors and images without focusing on any item that stood out.
As per Table 15, the TTFF for the main menu was considerably lower than that for the search bar. In fact, most users attempted to locate the item using the main menu because of its colorful design and clear, inviting labels, in contrast to the barely visible grey search bar. However, the menu categories were very complicated and similar, which resulted in longer task times, longer fixations, and considerably more revisits.
Figure 14 shows Jarir’s menu heatmap with hot areas around the path to locate the required TV set.
In contrast, the search functionality was very efficient: users who reverted to searching via the toolbar were able to browse the impressive smart suggestions and immediately locate the required item, as shown in Figure 15. Interestingly, the item description was easily located, and the price was immediately detected as it stood out in a bold, red font.
As shown in Table 16, all users reverted to the search bar, and the menu received very little attention, indicating the difficulty they had faced while using it. This is reflected in the shorter TTFF on the search bar compared to the main menu and the fewer fixations the latter received. However, the “Compare” button was located immediately, as shown in Figures 16 and 17.
Table 17 shows the users' bias toward the search bar and their lack of interest in the menu, reflected in the TTFF and the number of fixations depicted in Figure 18.
Jarir provides a clear call to action, and the “Add to Cart” button was easily located by users from the item card, as shown in Figure 19.
Table 18 shows the fixation data for the final task on the Jarir website. Users had to actively search for the support number: they scanned the website for a call button or a support number in the footer but finally managed to locate the help button in the top menu. Figure 20 shows the heatmap for identifying the support number location on the Jarir website.
Table 19 shows the identified usability issues using eye tracking and the severity rating for the Jarir website. The usability issues pointed out problems related to user behavior.
Further analysis of user behavior on the website revealed the following:
- The users did not register any notable interest in the homepage despite its colorful design and attractive banners. However, users noticed the main banner more than they did on the Extra website.
- User scan paths were somewhat regressive, tracing back and forth between certain areas of the page; on the Jarir website, this was also due to the crowded banners with small fonts and icons.
4.4. SEQ
After the participants completed the required tasks in the usability testing and eye-tracking techniques, they answered a single ease question (SEQ) to assess the difficulty of the tasks. The question was: Overall, how difficult or how easy were the tasks to complete?
The answers were given on a 7-point Likert scale, with 1 indicating very difficult and 7 very easy, and the results were calculated for each website. The average SEQ scores for the Extra and Jarir websites are shown in Table 20.
According to [73], the average task difficulty score using the SEQ is 4.8. The calculated average SEQ scores were 4.1 and 5.0 for the Extra and Jarir websites, respectively. The Extra score is below average, which corresponds with its higher number of task errors and more severe usability issues, whereas the Jarir score is above average. In general, users of the Jarir website had less difficulty performing the tasks, which is reflected in its higher SEQ score.
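The benchmark comparison amounts to a simple average, as in this minimal sketch with hypothetical per-participant ratings chosen to reproduce the reported scores:

SEQ_BENCHMARK = 4.8  # average SEQ difficulty reported in [73]

def average_seq(ratings):
    """ratings: 7-point scores, 1 = very difficult ... 7 = very easy."""
    return sum(ratings) / len(ratings)

scores = {
    "Extra": average_seq([4, 4, 3, 4, 4, 5, 4, 4, 4, 5]),  # -> 4.1
    "Jarir": average_seq([5, 5, 4, 6, 5, 5, 5, 5, 5, 5]),  # -> 5.0
}
for site, s in scores.items():
    print(site, s, "above" if s > SEQ_BENCHMARK else "below", "average")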
5. Results
This section presents the experimental results of the behavioral measurement techniques, comparing the effectiveness of the three techniques by listing the number of identified usability problems alongside their severity ratings. It covers the results of the selected techniques for both websites and analyzes these results.
Table 21 shows a comparison of the effectiveness of the selected techniques for the Extra website. The initial evaluation of the Extra website using the three behavioral measurement techniques uncovered 36 problems, 28 of which were determined to be unique. As the table shows, 24 of the total detected problems (66.66%) were discovered by heuristic evaluation, 5 (13.9%) by usability testing, and the remaining 7 (19.44%) by eye-tracking analysis. Of these, 14 problems (38.89% of the total) were major, 11 (30.55%) were minor, 6 (16.66%) were catastrophic, and 5 (13.9%) were merely cosmetic. A subsequent figure illustrates how each technique performed with regard to the severity of the problems detected for the Extra website. The highest number of problems across all severity ratings was obtained using heuristic evaluation, with usability testing and eye tracking next, yielding comparable numbers of detected problems.
For the minor (2) and major (3) severity ratings, eye tracking achieved better results than usability testing. However, for the catastrophic rating (4), usability testing detected one problem, whereas eye tracking detected none. Usability testing and heuristic evaluation yielded equal average severities of 2.6, while the average severity of the problems identified by eye tracking was 2.3. Overall, the problems mainly related to website functionality, such as weak search and the lack of compare and undo functionalities, and to website design, such as menu complexity, inconsistent item cards, and system response.
Figure 21 shows the severity rating for the Extra website based on the behavioral measurement techniques.
On the other hand, Table 22 shows a comparison of the effectiveness of the selected techniques for the Jarir website. The initial evaluation of the Jarir website using the three behavioral measurement techniques uncovered 35 problems, 30 of which were determined to be unique. As the table shows, 26 of the total detected problems (74.29%) were discovered by heuristic evaluation, 4 (11.43%) by usability testing, and the remaining 5 (14.28%) by eye-tracking analysis. Of these, 13 problems (37.13% of the total) were minor, 10 (28.58%) were major, 7 (20%) were cosmetic, and 5 (14.29%) were catastrophic. A subsequent figure illustrates how each technique performed with regard to the severity of the problems detected for the Jarir website. The highest number of problems across all severity ratings was obtained using the heuristic approach, with usability testing and eye tracking next, yielding comparable numbers of detected problems.
Eye tracking and usability testing achieved the same results for the cosmetic and minor severity ratings: one and two problems each, respectively. Neither method detected any catastrophic problems, whereas eye tracking detected two major problems against only one detected by usability testing. The highest average severity score, 2.5, was obtained by heuristic evaluation, followed by eye tracking with 2.2 and usability testing with 2. Overall, the problems mainly related to website functionality, such as the lack of undo features, and to website design, such as menu complexity and the crowded homepage.
Figure 22 shows the severity rating for the Jarir website based on the behavioral measurement techniques.
6. Discussion
This research integrated three behavioral measurement techniques (heuristic evaluation, usability testing, and eye tracking) to evaluate the usability of the Extra and Jarir websites. This section discusses the findings in three major respects: first, it highlights some of the detected usability problems; next, it illustrates the major differences between the applied techniques in terms of the number of detected problems, their severity, and their nature; finally, it lists the strengths and weaknesses observed on each website.
As indicated in [15], usability problems detected on e-commerce websites mostly relate to page navigation, search facilities, purchasing, consistency, design, security, and the lack of certain functionalities. The findings for both websites agree with these categories, particularly regarding search optimization and missing functionalities.
In general, the heuristic evaluation yielded many interesting usability problems. Among the catastrophic problems were the lack of input validation for the phone number on the Jarir website and the subsequent absence of any error message when verification failed. This issue can degrade the UX, especially as it occurs shortly after the buying decision is made: users may unknowingly mistype their phone number and then, having seen no error message, receive no notification of their order for no apparent reason.
One interesting usability problem is the restriction both websites place on the number of items that can be ordered. For instance, on both websites, users cannot order more than one expensive item, such as a smartphone or a TV. This is a critical weakness, as it limits shopping behavior and may force users to place several orders. Admittedly, such items are not commonly ordered in bulk; still, a better practice would be to display a warning message and emphasize the total cost of the order.
Moreover, both the Jarir and Extra websites lack documentation and user guides. This seems to be a common usability problem among e-commerce websites, as indicated by [74]. Business owners and web developers may assume that users are already familiar with online shopping; however, despite the recent rise in the number of online shoppers in Saudi Arabia and worldwide, many users are still inexperienced or unsure and may need to access help pages frequently.
The Extra and Jarir websites also share another usability problem: the inconvenient slider element, which moves one item at a time. If users wish to slide 10 items to the left, they need to click the left arrow 10 times. This may annoy users and push them toward another navigation option.
Among the problems identified by usability testing but not detected by the experts were the reappearing offers on both websites. Users were interrupted several times while performing tasks by the websites' ads, which led to distraction and confusion. In such scenarios, the marketing effort is not only wasted but also contributes to user dissatisfaction. This problem was detected neither by the experts nor via eye tracking: experts may not be as annoyed by the disruption as ordinary users, and in the eye-tracking data, no significant fixation was recorded on the pop-up messages because users simply attempted to locate the “Exit” button to close them. Such problems are therefore not detected by eye tracking.
Furthermore, the menu design, the lack of smart search, and the low quality of search results all led to usability problems, although the Jarir website performed better than the Extra website in terms of search results. These problems hinder users' activity on the websites and restrict their shopping experience.
The eye-tracking technique revealed that the main menu categories on the Extra website disappear with the slightest movement of the mouse pointer, which was reflected in longer and denser fixations around the main menu. Furthermore, the lack of an obvious comparison button was clearly illustrated in the heatmap and scan paths: users fixated, searched, and revisited the same side of the page on which they expected to find the button. On the Jarir website, the smart use of color and the placement of item prices in a bold, red font led to faster detection by users.
Heuristic evaluation yielded the highest number of problems, identifying 24 and 26 unique usability problems on the Extra and Jarir websites, respectively. In other words, heuristic evaluation yielded additional and more severe problems, at least twice the number identified by eye tracking and usability testing combined. Eye tracking detected seven usability problems on the Extra website and five on the Jarir website, whereas usability testing identified five and four usability problems on the Extra and Jarir websites, respectively. This corresponds to other works, such as [37,57,75]. The higher number of problems identified by heuristic evaluation can be attributed to the expertise of its participants in comparison with the ordinary users who performed the usability testing and eye tracking. Furthermore, the experts in the heuristic evaluation were free to roam the websites and navigate their pages, whereas the users were required to perform tasks that limited the scope of the usability problems they could encounter. For instance, in this experiment, the experts uncovered usability problems related to the number of items ordered and the phone verification process; in usability testing and eye tracking, users were not required to, and thus did not, reach this stage.
Regarding the severity of the identified problems, heuristic evaluation yielded the highest average severity on both websites. On the Extra website, the average severity scores of heuristic evaluation and usability testing were similar, while on the Jarir website, both heuristic evaluation and eye tracking achieved a higher average severity score than usability testing. This does not agree with previous works, such as [57,76,77], which reported that the average severity score obtained by usability testing is higher than that of heuristic evaluation. The difference could be due to several factors. For instance, the tasks required in the usability testing did not cover all aspects examined by the heuristic evaluation: many severe usability problems were detected while placing an order or verifying a location, steps beyond the scope of the required tasks.
In terms of the nature of the identified usability problems, the three techniques provided significantly different perspectives on the websites. Heuristic evaluation provided a fine-grained assessment that scanned all aspects of each website; the participants were professional UI/UX experts able to detect both cosmetic and severe problems in areas that user testing was unlikely to reach. Interestingly, the heuristic evaluation surfaced more design-related problems, such as menu design, item navigation and placement, and product display, whereas usability testing and eye tracking reported more workflow obstacles, such as the absence of comparison buttons, annoying update notifications, and search inefficiency. This observation is in line with the work of [78].
Regarding eye tracking, a high number of fixations and a long average fixation duration can be attributed either to interest in the element or to inefficient search and failure to extract information; given the nature of the tasks, the user was unlikely to be attracted to the “Compare” button area, for instance. The relatively high SEQ scores on both websites indicated that users found the tasks fairly easy, suggesting that the difficulties they had with the interface were minor. Such problems can be detected neither by heuristic evaluation nor by traditional usability testing. This makes eye tracking attractive as a supportive technique for usability evaluation, but not as a stand-alone technique, since it does not provide sufficient information on its own; this is supported by the works of [63,79]. Unfortunately, eye tracking requires extensive analysis to determine the exact underlying usability problem. Furthermore, to provide an authentic assessment of usability problems, user browsing behavior should be as natural as possible, allowing evaluators to assess all aspects of the UX, not merely the task-related ones. For instance, a browsing user may be distracted by a side banner or intrigued by a particular element, whereas in task-based studies, users are focused and quite restricted in their roaming around the website.
In short, the strengths and weaknesses of the Extra and Jarir websites are summarized in Table 23.
8. Conclusions
This work provided a comprehensive literature review of behavioral measurement techniques to identify the research gap. Another objective was to assess the usability of e-commerce websites in Saudi Arabia using three behavioral measurement techniques, namely heuristic evaluation, usability testing, and eye tracking, and to compare the effectiveness of these techniques in identifying and reporting usability issues.
This research assessed two major e-commerce websites: Extra and Jarir. Many usability problems with varying severity scores were identified, and appropriate recommendations were provided accordingly. The heuristic evaluation technique yielded the highest number of usability problems and the highest number of severe problems, whereas usability testing yielded fewer problems, most of which had already been identified by the experts.
Eye tracking provided critical information regarding page design and element placement and revealed user behavior patterns that indicated certain usability problems; for instance, longer fixations and retraced scan paths indicated search inefficiency and layout issues. Since eye tracking and usability testing required users to perform the same tasks, there was significant overlap in their results. Overall, when used properly, the three behavioral measurement techniques are complementary. It is recommended to apply all of them when the available resources, namely budget and time, are sufficient; under time and cost constraints, however, heuristic evaluation by experts is enough to detect most of the usability problems on a website.
There were, however, several obstacles that hindered the progress of this research, some of which were circumvented. This research used an online subscription-based platform to conduct the eye-tracking experiment because the researcher had no access to an eye-tracking device: none was available at the researcher's university, and purchasing one was not possible given its cost. Therefore, due to time constraints, the device was substituted with the online platform, which is less accurate, generates fewer metrics, and limits the length of user sessions.
Future work can further verify the obtained results by using a more accurate eye-tracking device, including more groups of participants and more tasks covering all website features, and combining usability testing with eye-tracking and think-aloud methods. We acknowledge that further research on the classification of the identified usability problems would strengthen the results and can be explored as a future direction. Moreover, real-life tests could also be beneficial, such as testing users with actual buying intentions; such users would exhibit more authentic behavior towards the website design and its functionalities. Furthermore, more focus can be directed to applying behavioral measurement techniques in other domains, such as Saudi governmental websites, as these are becoming increasingly important with Saudi Vision 2030's transformation towards digitalization. Accordingly, it is essential to identify usability issues and provide recommendations to enhance the UX.