AI-Enhanced Personality Identification of Websites

Chishti, Shafquat Ali; Ardekani, Iman; Varastehpour, Soheil

doi:10.3390/info15100623

by Shafquat Ali Chishti ¹, Iman Ardekani ¹,²,* and Soheil Varastehpour ¹

¹ School of Computing, Electrical and Applied Technology, Unitec Institute of Technology, Auckland 1025, New Zealand

² School of Arts and Sciences, The University of Notre Dame Australia, Broadway Campus, Chippendale, Sydney, NSW 2007, Australia

* Author to whom correspondence should be addressed.

Information 2024, 15(10), 623; https://doi.org/10.3390/info15100623

Submission received: 5 September 2024 / Revised: 7 October 2024 / Accepted: 8 October 2024 / Published: 10 October 2024

(This article belongs to the Special Issue Recent Developments and Implications in Web Analysis)


Abstract

This paper addresses the challenge of objectively determining a website’s personality. It develops a methodology based on automated quantitative analysis, thereby avoiding the biases inherent in human surveys. Using a database of 3000 websites, data extraction tools gather the relevant data, which are then analyzed with Artificial Intelligence (AI) techniques, including machine learning (ML) and natural language processing. Four ML algorithms (K-means, Expectation Maximization, Hierarchical Agglomerative Clustering, and DBSCAN) are implemented to assess and classify website personality traits. The strengths and weaknesses of each are evaluated in terms of data organization, cluster flexibility, and handling of outliers. A software tool is developed to support the whole research process, from database creation and data extraction to ML application and results analysis. Experimental validation was conducted with identical training and testing datasets for all four algorithms. It achieves a success rate of up to 94% (with an Error of ≤ 50%) in identifying website personality, as validated by a subsequent survey. The research highlights significant relationships between website attributes and personality traits, offering practical applications for website developers. For instance, developers can use these insights to design websites that align with business goals, enhance customer engagement, and foster brand loyalty. The methodology can also be applied to creating culturally resonant websites, supporting New Zealand’s cultural initiatives and promoting cross-cultural understanding. This research lays the groundwork for future studies and has broad applicability across domains, demonstrating the potential for automated, unbiased website personality classification.

Keywords:

website personality identification; machine learning algorithms; K-means algorithm


1. Introduction

The internet plays a vital role in our daily lives, particularly through social networks, where people constantly share comments, feelings, ideas, news, and pictures [1]. The internet now contains over 1 billion indexed pages [2] and has become a significant tool for interaction, information seeking, education, and relaxation [3]. E-commerce websites have also gained popularity [4]. They offer advantages such as overcoming geographical limitations, lower costs, 24/7 availability, easy product comparison, and no travel time or costs for customers [5]. E-commerce websites have enabled traditional companies to communicate with customers more easily, enter new markets, increase profits, and become more competitive [6]. Online shopping has become a significant trend, driving rapid growth in e-commerce [7], and global economic development has entered an era of e-commerce information that is transforming the world economy [8]. Because websites are central to e-commerce, it has become important to understand website personality, so that websites can be designed to be more attractive and better aligned with their specific business requirements.

The concept of brand personality evolved from the psychological science of personality theory. Brand personality is defined as the set of human personality traits associated with a brand. In the online setting, brand personality is known as web brand personality or website personality [9,10]. Website personality has been an established and evolving area of research for many years. It focuses on characterizing and differentiating websites by the unique traits they project to users; see, for example, [11,12]. These traits often mirror human personality traits, enabling websites to offer more relatable and engaging experiences to visitors. The concept involves understanding how websites communicate their identity, tone, and style through elements such as design, content, layout, and interactivity.

Website personality has been investigated in various ways. In [11,12,13], researchers developed different instruments to assess website personality. In [14], they investigated the usability trait of website personality, and in [4,15,16], they investigated its quality. In [11], researchers introduced a Website Personality Scale (WPS) for classifying website personality. The WPS breaks personality down into ‘Items’, such as ‘Satisfying’ and ‘Concise’. These ‘Items’ are grouped into personality ‘Facets’, for example, ‘Proficient’ and ‘Systematic’, which, in turn, are grouped into personality ‘Factors’ such as ‘Intelligent’. In [11], data were gathered through surveys and interviews with participants. However, surveys and interviews introduce human bias because human input is subjective. For instance, participants’ preferences for certain colors may influence their perception of a website’s personality. Ideally, a website’s personality should be determined objectively, free of human bias and preferences.

Most past work is either conducted manually, through surveys based on human preferences, or, when automated, focuses on only a few aspects of website personality. Unbiased assessment and classification of website personality require a neutral decision maker that is not biased toward any particular website. Given the biases in human judgment, a computer is well suited to this role. It can analyze and categorize website personality objectively, based on website attributes or ‘Quantitative Elements’ such as ‘Images’ and ‘Hyperlinks’. Hence, this research aims to classify website personality automatically, without the intervention of human preferences. The approach uses the WPS-defined ‘Items’ and detects website personality in terms of WPS-defined ‘Facets’. Four WPS-defined ‘Facets’ are considered in this paper: ‘Confusing’, ‘Engaging’, ‘Proficient’, and ‘Systematic’.

Because this paper illuminates the relationships between these website attributes and website personality, it can be applied in several ways. Primarily, website developers can use the insights from the proposed method to tailor their websites to specific business needs. By ensuring that a brand’s online representation matches its intended perception, developers can improve customer perceptions and cultivate lasting loyalty. This is crucial in today’s digital environment, where authentic and captivating online experiences are highly valued. The method could also be applied to developing cultural websites aligned with recent international initiatives, media, and public interests.

The rest of the paper is organized as follows. The remainder of this section introduces fundamental concepts and the research topic of Website Personality Identification. Section 2 describes the methods, data, and materials used in this research, including the development of a software tool for collecting training data and a survey conducted to collect validation data. Section 3 presents the results with an in-depth discussion, and Section 4 concludes the paper.

Website Personality Identification

A detailed literature review was conducted to establish the relationships between WPS ‘Items’ and website attributes (‘Quantitative Elements’). For instance, regarding the effect of ‘Images’ on website personality, research has found that ‘Images’ are easy for users to assess. They help users broaden their understanding of a subject by breaking the dullness and monotony of text. This makes a website more interesting for the user and adds to the ‘Satisfying’ quality of its webpages [17].

Similarly, further mappings between WPS ‘Items’ and website ‘Quantitative Elements’ were created from the literature, as discussed below.

In [18], Harpel states that the most important characteristic to look for on a website is the implementation of ‘Search Boxes’. They should be placed where they are easy to find and have an appropriate background color combination. In [19], Prakash et al. state that the number and placement of ‘Hyperlinks’ on a page provide valuable information about the broad category to which the page belongs. They computed the ratio of the number of characters in links to the total number of characters on the page. A high ratio indicates a high probability that the page is an ‘Informative’ page. In [20], Chtouki et al. found that people comprehend website content much better when it is presented visually in a ‘Video’ rather than only as text. A website with ‘Videos’ and text, rather than text alone, therefore conveys more information to the user and is more ‘Informative’. Prakash et al. [19] also note that the amount of text on a page indicates the type of page; generally, informational and personal homepages are sparse in text compared with research pages.

In [21], Sun et al. showed that a webpage is ‘Concise’ if it is less dense. Webpage ‘Density’ depends on the number of characters and the number of tags. If the number of characters is very large compared with the number of tags, the ‘Density’ is higher, and vice versa. Following [22], ‘Density’ is calculated in three steps:
1. Count the number of characters (‘A’, ‘3’, ‘@’, etc.) on the webpage, C.
2. Count the number of tags (‘img’, ‘a’, etc.) on the webpage, n.
3. Compute ‘Density’ = C/n.
In [23], Reinecke et al. discussed the effects of a website’s ‘Colors’. They reported that colorfulness and visual complexity play an important role in how ‘Attractive’ a website is to users. In [24], Hernandez et al. argued that speed is a very important factor in e-commerce: a slow website may impede both the decision to purchase and the decision to complete the transaction online. Study [25] also concerns website ‘Loading Time’. Gehrke and Turban conducted a user survey and, based on the results, concluded that page loading speed is the most important category. Users no longer want glitter: they want content and service, and they want it ‘Fast’. This demand will continue to drive website design toward speed, navigation efficiency, simplicity, and elegance, with an emphasis on customer focus and security.

In [26], Fallahnezhad et al. examined the text available on webpages and applied text analysis to the classification of website personality. Automatic recognition of a speaker’s personality has recently attracted many researchers, and the Five-Factor Personality Model has become the prevailing model of general personality structure. In [27], Singh et al. carried out sentiment analysis, noting that it is challenging to identify a user’s hidden sentiment from a text review. The Tone Analyzer service was used to detect sentiments in the text of webpages. In [28], Marouf et al. described the Tone Analyzer service as a general-purpose endpoint for analyzing the tone of any written text, which helps in understanding the emotional tones in communications. The Tone Analyzer service is provided as part of IBM Watson, an AI platform for business and multidisciplinary applications [29].

In [30], Baker created a ‘Joy Program’ that incorporated elements of ‘Joy’ to attract post-modern young adults to religious studies and engage them.
His research revealed that individuals were more attracted to learning through the ‘Joy Program’ than through conventional, straightforward study methods. Study [31] explores the connection between ‘Discouragement’ and ‘Sadness’. It examines how parents’ ‘Discouragement’ and support of their children’s expression of ‘Sadness’ relate to various indicators of internalizing behavior during middle childhood. The findings support a link between discouraging responses and the manifestation of ‘Sadness’ in children. In [32], Angelika and Jessie explored the relationship between ‘Irritating’ stimuli and the emotion of ‘Anger’. They used ‘Irritating’ noise to induce ‘Anger’ and observed that such annoying, harsh sounds heightened the experience of ‘Anger’ in their participants. In [33], Michael and Gerhard state that individuals with high levels of ‘Anger’ tend to perceive a broader spectrum of situations as triggering ‘Anger’, including experiences perceived as annoying, ‘Irritating’, or frustrating. In [34], Robin investigated the connection between ‘Irritating’ and ‘Disgust’. He concluded that, in everyday language, ‘Disgust’ seems to encompass not only things that are repellent but also things that are ‘Irritating’ or annoying. Study [35] focuses on the interplay between ‘Irritating’ factors and ‘Fear’. According to the researchers, living with ‘Fear’ significantly alters individuals’ behavior across many aspects of their lives. It affects their professional demeanor and forces a re-evaluation of their priorities. In essence, the researchers note that a persistent state of ‘Fear’ can be genuinely ‘Irritating’.
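Two of the text-based measures above can be sketched with Python's standard-library HTML parser: the ‘Density’ C/n of [21,22] and the link-character ratio of [19]. This is a minimal illustration under stated assumptions, not the JSoup-based extractor used in this research. The class and function names are hypothetical. The sketch counts only trimmed text-node characters and skips script and style content. Both choices are assumptions, because [22] does not say how whitespace or script text is treated.

```python
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Hypothetical helper that counts tags and text characters in one HTML page."""

    def __init__(self):
        super().__init__()
        self.tag_count = 0    # n: number of start tags
        self.text_chars = 0   # C: characters of visible text
        self.link_chars = 0   # characters of text inside <a> elements
        self._in_link = 0     # depth of currently open <a> elements
        self._skip = 0        # depth of open <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        self.tag_count += 1
        if tag == "a":
            self._in_link += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link -= 1
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return  # assumption: code and stylesheet text are not page characters
        chars = len(data.strip())
        self.text_chars += chars
        if self._in_link:
            self.link_chars += chars


def page_metrics(html):
    """Return (Density = C/n, link-character ratio) for an HTML string."""
    parser = PageStats()
    parser.feed(html)
    parser.close()
    density = parser.text_chars / parser.tag_count if parser.tag_count else 0.0
    link_ratio = parser.link_chars / parser.text_chars if parser.text_chars else 0.0
    return density, link_ratio
```

Take a page made of `<html><body><p>Hello world</p><a href="#">Go</a></body></html>`. It has C = 13 text characters and n = 4 tags, so its Density is 3.25, and 2 of its 13 characters sit inside a link.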

Based on the literature discussed above, mappings were created between WPS ‘Items’ and website attributes (‘Quantitative Elements’). Figure 1 shows these mappings, that is, the connections between the website’s ‘Quantitative Elements’ and the ‘Items’ defined in the WPS. Within the WPS, ‘Items’ are grouped to form ‘Facets’. For instance, the ‘Items’ ‘Searchable’, ‘Informative’, and ‘Satisfying’ combine to form the ‘Facet’ ‘Proficient’, as depicted in Figure 1. This paper uses these mappings to detect website personality across four WPS-defined ‘Facets’: ‘Confusing’, ‘Engaging’, ‘Proficient’, and ‘Systematic’.

AI techniques are used to extract insights from the collected data, reducing the need for human intervention in data extraction. AI encompasses a range of technologies with intelligent capabilities that can imitate, extend, and enhance human intelligence [36]. It is a broad concept referring to the use of computer systems to mimic intelligent behavior with minimal human intervention. The origins of AI are commonly traced back to the creation of robots [37]. AI remains a prevailing theme today, frequently discussed by technologists, academics, journalists, and venture capitalists [38].

This paper aims to use ML techniques to develop modules that detect a website’s personality from its personality traits. ML is regarded as a branch of AI concerned with creating algorithms that teach computers new behaviors based on input [39]. It focuses on creating computer programs that can access data and learn on their own. To support better decisions, this process begins with data evaluation and pattern search [40]. ML uses a variety of techniques to create mathematical models and make predictions based on previous information or data. Common tasks performed by such algorithms include classification, regression, clustering, and pattern recognition within large datasets [41]. ML is currently used in many fields, for example:
- healthcare [42,43,44,45];
- soil and mineral analysis [46,47];
- driving-style analysis [48];
- handwriting recognition [41];
- recommender systems, email filtering, Facebook auto-tagging, image identification, and speech recognition [49].

ML is broadly divided into two categories: supervised and unsupervised [50]. Algorithms trained on a labeled dataset form supervised learning, whereas algorithms applied to an unlabeled dataset form unsupervised learning [51,52].
In unsupervised ML, the system does not seek to identify specific outputs. Instead, it seeks to uncover insights within the dataset by revealing hidden patterns in unlabeled data [40]. This research uses unsupervised ML because no labeled data are available for training. Clustering methods were chosen to identify hidden patterns in the data for website personality identification. Clustering methods fall into four categories, as shown in Figure 2 [53].

Four clustering algorithms were chosen, one from each category:
- K-means (partition-based);
- Agglomerative (hierarchical);
- Expectation Maximization (model-based);
- DBSCAN (density-based).

This selection allows website personality identification results to be compared across four different approaches.
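To illustrate the partition-based family, the following is a minimal sketch of Lloyd's algorithm, the standard form of K-means. It is not the WEKA implementation used in this research. The random initialization from data points, the fixed seed, and the iteration cap are assumptions chosen for reproducibility.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k starting points drawn from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

The other three families differ mainly in how they form groups. EM fits a probabilistic mixture model, agglomerative clustering merges the closest groups bottom-up, and DBSCAN grows clusters from dense neighborhoods instead of taking k as input.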

2. Materials and Methods

2.1. Developing Data Collection Software Tool

A software tool is developed to streamline the research process: website downloading, data maintenance, and data extraction using data scrapers. It uses the ‘Wget’ command to download websites. It integrates JSoup, the Tone Analyzer service, and Selenium WebDriver to extract ‘Quantitative Elements’ details from the downloaded websites.

A dataset was prepared by downloading 3000 websites from five categories: academia, banks, e-commerce, news, and sports. Website names and their corresponding URLs were extracted systematically from multiple online sources, including Wikipedia, using custom programs developed to automate the extraction. These programs used the ‘Wget’ command to retrieve website data, streamlining collection and ensuring consistency across the extracted information. ‘Wget’ can download or mirror an entire website onto the computer where the command is executed, and it can also fetch specific files from a website’s hierarchy [54]. It also allows the download depth to be specified. In this research, ‘Wget’ is configured identically for every website to download directories up to two levels deep, so only files within these levels are included. Each website is downloaded into its own folder. Because of this setting, no content beyond the second directory level is retrieved. Files and folders located deeper are excluded from the scope of this research, ensuring uniformity across all sampled websites. The downloaded websites are then split into two datasets: a training dataset of 2700 websites and a test dataset of 300 websites. Data scraping is then used to extract the ‘Quantitative Elements’ details from the downloaded websites. Data scraping is the extraction of useful data from a given electronic file using a software program [55]. Current data extraction methods mostly rely on heuristics to extract certain features of a document, for example, the number of ‘Hyperlinks’ present or the page’s text ‘Density’ [56].
In this research, the ‘Quantitative Elements’ details of websites are extracted using three techniques. The first is JSoup, an open-source HTML parser that processes HTML pages to extract data about the tags on a page and other HTML structural information [57]. To perform scraping, it uses the Document Object Model (DOM), Cascading Style Sheets (CSS) selectors, and jQuery-like methods [58]. Here, JSoup scrapes the downloaded websites, retrieves the quantitative values of the required ‘Quantitative Elements’, and saves them in a database. The second technique is Selenium WebDriver, a widely adopted tool for evaluating web applications [59]. It records the ‘Loading Time’ of each website and determines the number of ‘Colors’ on each website’s homepage. To measure ‘Loading Time’, the WebDriver loads the website in a web browser and records the time taken. To determine the number of ‘Colors’, it loads the website and takes a screenshot, and the tool then analyzes the colors of the screenshot’s pixels. The resulting ‘Colors’ pixel counts are saved in the database for each website. The third technique is IBM Watson’s Natural Language Understanding service, which is used to analyze emotions in the text of websites. IBM Watson offers various AI services, one of which is the Tone Analyzer service. It is a general-purpose endpoint for analyzing the tone of any written text and helps in understanding the emotional tones in communications [28].
This AI-based technique is used to extract emotions from the text available on the websites. If the input text exceeds 10,000 characters, the code truncates it to that limit to keep the subsequent analysis efficient. The IBM Watson setup involves authenticating with an API key and configuring the service with the relevant version and URL. The service is configured for emotion and sentiment analysis of the text. An ‘Analyze Options’ instance extracts emotion scores (‘Anger’, ‘Disgust’, ‘Fear’, ‘Joy’, and ‘Sadness’), offering insight into the emotional nuances of the text. The source code then processes the extracted emotion scores and stores them in a database table.
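Two small preprocessing steps described above can be sketched in Python:
- truncating page text to the stated 10,000-character limit before emotion analysis;
- counting ‘Colors’ from a screenshot.

The text does not say whether the number of ‘Colors’ means distinct pixel colors, or how the screenshot is decoded. Treating it as the number of distinct RGB values is an assumption. Both function names are hypothetical.

```python
from collections import Counter

MAX_CHARS = 10_000  # truncation limit stated in the text


def truncate_text(text, limit=MAX_CHARS):
    """Keep at most `limit` characters before sending text for emotion analysis."""
    return text[:limit]


def count_colors(pixels):
    """Count distinct colors in a screenshot given as an iterable of (r, g, b) tuples.

    Assumption: 'number of Colors' = number of distinct RGB values.
    Returns (number of distinct colors, Counter of pixels per color).
    """
    counts = Counter(pixels)
    return len(counts), counts
```

If Pillow is installed, `Image.open(path).convert("RGB").getdata()` yields such (r, g, b) tuples from a saved screenshot. Exact RGB matching treats near-identical shades as different colors, so a real tool may quantize colors first.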

After data extraction, the tool interfaces with WEKA (Waikato Environment for Knowledge Analysis) to create clusters using four ML algorithms from the WEKA platform: K-means, EM, HAC, and DBSCAN. The tool then assigns ratings using WEKA and generates four distinct modules from the training dataset, one per ML algorithm. These modules are tested on the testing dataset and detect website personality in terms of the four WPS-defined ‘Facets’: ‘Confusing’, ‘Engaging’, ‘Proficient’, and ‘Systematic’. The tool also performs the necessary calculations, generates graphs, and produces result reports. In addition, it manages the survey: it records surveyors’ details, randomly assigns websites to surveyors, and stores their ratings for the websites. It calculates the Error (in %) between the survey ratings and the module ratings and supports the analysis of the results. The developed software tool thus serves as the backbone of this research, supporting each of its stages.

2.1.1. Training Data Processing (Clusters Creation)

Each identification module is built with the WEKA 3 tool from a training dataset of 2700 websites. Values of ‘k’ from ‘k2’ to ‘k15’ are applied when processing the training data for all modules except DBSCAN. DBSCAN generates clusters autonomously from the density of the data, based on two specified parameters: the epsilon distance (‘Eps’) and the minimum number of data points (‘MinPts’).

Each training website is rated on each of its ‘Quantitative Elements’ so that websites can be compared with each other. For each ‘Quantitative Element’, the values of the training websites are passed through the WEKA Normalize filter, which returns the converted data scaled to the range 0 to 10. The rating for each ‘Item’ in each cluster is then the average of the ratings of all ‘Quantitative Elements’ corresponding to that ‘Item’. For example, as shown in bold in Table 1, the rating of the ‘Item’ ‘Informative’ for cluster 0 is the average of the ratings of its ‘Quantitative Elements’ (‘Hyperlinks’, ‘Videos’, and ‘Wordcount’). This is performed for all ‘Items’ of each cluster. Likewise, the rating for each ‘Facet’ is the average of the ratings of all ‘Items’ corresponding to that ‘Facet’. For example, as shown in bold in Table 2, the rating of the ‘Facet’ ‘Proficient’ for cluster 0 is the average of the ratings of its ‘Items’ (‘Informative’, ‘Satisfying’, and ‘Searchable’). This is also performed for all ‘Facets’ of each cluster.
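The rating roll-up just described can be sketched in Python. This is a minimal illustration, not the tool's WEKA code, and it rests on several assumptions:
- Normalization is a min-max scaling to 0 to 10, our assumption of how the Normalize filter was configured.
- A cluster's element rating is the mean of its member websites' normalized values.
- The mapping dictionaries are a hypothetical, partial excerpt; only the ‘Informative’ and ‘Proficient’ groupings are stated in the text, and Figure 1 holds the full mappings.

```python
from statistics import mean

# Hypothetical, partial mappings for illustration only; the full mappings are in Figure 1.
ITEM_ELEMENTS = {
    "Informative": ["Hyperlinks", "Videos", "Wordcount"],  # as in Table 1
    "Satisfying": ["Images"],
    "Searchable": ["Search Boxes"],
}
FACET_ITEMS = {"Proficient": ["Informative", "Satisfying", "Searchable"]}  # as in Table 2


def normalize_0_10(values):
    """Min-max scale the raw values of one Quantitative Element to the range [0, 10]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant column: assumption, all rated 0
    return [10 * (v - lo) / (hi - lo) for v in values]


def element_ratings(cluster_rows):
    """Average the normalized element values of the websites in one cluster (assumption)."""
    return {e: mean(row[e] for row in cluster_rows) for e in cluster_rows[0]}


def item_and_facet_ratings(elements):
    """Item = mean of its elements' ratings; Facet = mean of its items' ratings."""
    items = {i: mean(elements[e] for e in es) for i, es in ITEM_ELEMENTS.items()}
    facets = {f: mean(items[i] for i in its) for f, its in FACET_ITEMS.items()}
    return items, facets
```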

As a result of these calculations, each cluster obtains ratings for its ‘Quantitative Elements’, ‘Items’, and ‘Facets’.

2.1.2. Elbow Creation

To identify the most effective clustering among the 14 candidates (‘k2’ to ‘k15’), the Elbow method is applied to all modules except DBSCAN. Centroids are computed for each cluster for every value in the ‘k2’ to ‘k15’ range. The Euclidean distance between each website and the centroid of its cluster is then calculated. The Elbow method uses these distances to determine the optimal number of clusters, which was found to be ‘k5’ for all three modules (K-means, EM, and HAC). For example, in the K-means module, the Elbow plot in Figure 3 shows that the data point ‘k5’ lies at the maximum distance from the line PQ. Consequently, ‘k5’ is identified as the optimal clustering and is selected for creating the K-means module. (The Elbow point is highlighted in red for easy identification.)
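The text does not define the line PQ. This sketch assumes the common reading: PQ is the straight line joining the first point (‘k2’) and the last point (‘k15’) of the Elbow curve. The elbow is then the point at the largest perpendicular distance from that line. The function name is hypothetical, and `costs` stands for whatever per-k distance total is plotted, for example the sum of website-to-centroid distances.

```python
import math


def elbow_k(ks, costs):
    """Return the k whose (k, cost) point is farthest from the line PQ.

    Assumption: P is the first point and Q the last point of the curve.
    """
    (x1, y1), (x2, y2) = (ks[0], costs[0]), (ks[-1], costs[-1])
    length = math.hypot(x2 - x1, y2 - y1)

    def distance(x, y):
        # Perpendicular distance from (x, y) to the line through P and Q.
        return abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / length

    return max(zip(ks, costs), key=lambda point: distance(*point))[0]
```

Every point is measured against the same line. Rescaling either axis therefore multiplies all distances by the same factor, so the selected k does not depend on the units of `costs`.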

2.1.3. Test Data Processing

Each of the four modules is tested on the same 300 test websites. For each test website, the tool calculates the Euclidean distance between the website's ‘Quantitative Element’ values and the midpoint (centroid) ‘Quantitative Element’ values of each training cluster for ‘k5’. The test website is then assigned the ‘Item’ and ‘Facet’ ratings of the closest training cluster. A detailed result report is generated from these ratings, as shown in Table 3.
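The nearest-cluster step can be sketched in Python as follows. The function name and argument layout are hypothetical. The sketch assumes that the test website's ‘Quantitative Element’ values have already been scaled the same way as the training data, in the same element order as the centroids.

```python
import math


def assign_ratings(test_vector, centroids, ratings_per_cluster):
    """Give a test website the Item/Facet ratings of its nearest training cluster.

    test_vector: the website's Quantitative Element values, in the same order and
    scale as the centroids (assumption). Returns (cluster index, that cluster's ratings).
    """
    nearest = min(range(len(centroids)),
                  key=lambda j: math.dist(test_vector, centroids[j]))
    return nearest, ratings_per_cluster[nearest]
```

Every test website in the same cluster therefore receives identical ‘Facet’ ratings. The modules can output at most as many distinct rating profiles as there are training clusters.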

According to the module results (see ‘Conclusion’ in Table 3), the test website ‘Macquarie University’ has the following ratings:
- ‘Confusing’: 6.43
- ‘Engaging’: 5.27
- ‘Proficient’: 9.41
- ‘Systematic’: 3.61

These ‘Facet’ ratings estimate how far a given website is perceived as ‘Confusing’, ‘Engaging’, ‘Proficient’, and ‘Systematic’. By assigning ‘Facet’ ratings, the developed modules can thus identify the personality of a test website at the ‘Facet’ level. These module results are validated through a survey.

The dataset used in this research is stored in a private GitHub repository. The data presented in this study are available on request from the corresponding author.

2.2. Developing Survey for Validating Results

To the best of our knowledge, no dataset of website personality traits exists, and no websites have previously been identified by their personality traits. This is the first study to classify websites by their personality traits, so a human survey was conducted to verify the results. Human surveys are vital in research because they provide essential data, but a survey should not be driven by a single participant’s preferences. To address this, three different surveyors rate each website’s ‘Facets’, and the final rating is the average of their ratings. The classification of a website for a given trait therefore depends on several participants rather than one, which reduces the chance that one person’s likes and dislikes affect the results. The 300 websites used in the survey are the same websites used as the test dataset for the four modules, so the test results can be verified against the survey results. The survey was conducted by 90 volunteer graduates. The tool randomly assigns ten websites to each surveyor. Each surveyor rates the four WPS ‘Facets’ (‘Proficient’, ‘Systematic’, ‘Confusing’, and ‘Engaging’) of each assigned website. An online Google survey form was developed for this purpose, with ten website links and a column for each ‘Facet’ in which to mark the ratings. The form also gave brief explanations of the four ‘Facets’ so that surveyors understood what to examine on the websites.
Surveyors were instructed to consider these explanations when rating an assigned website’s ‘Facets’ on a scale from 0 to 10, where 0 represents the lowest level and 10 the highest. The characteristics outlined in these explanations correspond to the ‘Quantitative Elements’ used by the four modules, which assess a test website for the same ‘Facets’ on the same 0 to 10 scale. This allows the module ratings to be verified against the survey ratings.
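The text does not describe how the tool randomizes the assignment. The sketch below is one hypothetical scheme that satisfies the stated constraints. There are 300 websites × 3 surveyors = 900 ratings, which equals 90 surveyors × 10 websites each, and no surveyor may receive the same website twice. The function names, and the independent shuffle in each round, are assumptions for illustration.

```python
import random
from statistics import mean


def assign_websites(website_ids, n_surveyors=90, per_surveyor=10,
                    raters_per_site=3, seed=None):
    """Deal websites to surveyors so each site gets `raters_per_site` distinct raters.

    Each round is an independent shuffle of all websites, cut into batches of
    `per_surveyor`. Because the number of websites is a multiple of the batch
    size, no batch spans two rounds, so no surveyor sees a website twice.
    """
    n = len(website_ids)
    if n % per_surveyor or n * raters_per_site != n_surveyors * per_surveyor:
        raise ValueError("website and surveyor counts do not divide evenly")
    rng = random.Random(seed)
    slots = []
    for _ in range(raters_per_site):
        round_ids = list(website_ids)
        rng.shuffle(round_ids)  # independent random order per round
        slots.extend(round_ids)
    return [slots[i * per_surveyor:(i + 1) * per_surveyor] for i in range(n_surveyors)]


def final_ratings(ratings_by_site):
    """Final rating of each website = mean of its surveyors' ratings."""
    return {site: mean(scores) for site, scores in ratings_by_site.items()}
```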

The survey outcomes show that a substantial number of records have different ratings in the survey and in the developed modules. This does not, however, mean that the developed modules are entirely inaccurate. Several factors may contribute to these errors.

First, errors in the survey and data collection process are inherent in any survey. Such errors are typical, so perfect alignment between survey and module ratings is not expected.

Second, the ratings vary substantially between the surveyors who rated the same website (as noted earlier, each website was evaluated by three surveyors). Personal biases and subjective judgments can influence individual assessments. For instance, a surveyor might view a website favorably because of personal attachment or other subjective factors, even if objective criteria suggest otherwise. Figure 4 shows this variability as the percentage of ‘Facet’ records falling within three levels of variance: 5, 10, and 15.
- At variance level 15, ‘Systematic’ has the highest percentage of records, ‘Engaging’ and ‘Proficient’ have identical percentages, and ‘Confusing’ has the lowest.
- At variance level 10, ‘Systematic’ still leads, but ‘Engaging’ drops below ‘Proficient’, and ‘Confusing’ remains the lowest.
- At variance level 5, ‘Systematic’ and ‘Proficient’ have nearly equal percentages, ‘Engaging’ remains lower than both, and ‘Confusing’ is still the lowest.

Overall, variance in survey ratings affects ‘Confusing’ the most, ‘Engaging’ moderately, and ‘Systematic’ and ‘Proficient’ the least.

Third, the developed modules are not perfectly accurate either. Minor errors and omissions in the modules may also contribute to differences between survey and module ratings.

3. Results

The results of the four developed modules are compared with those of the survey from two perspectives: a Facet-wise comparison and a Module-wise comparison. The Facet-wise comparison analyzes each personality ‘Facet’ individually and examines how the different modules assess it, which shows how consistent or variable the results are across ‘Facets’. The Module-wise comparison evaluates each module across all ‘Facets’, highlighting patterns or discrepancies in its assessments. The survey ratings are compared with each of the four modules' ratings over a total of 1200 ‘Facets’ records (300 for each of the four ‘Facets’: ‘Confusing’, ‘Engaging’, ‘Proficient’, and ‘Systematic’). The difference between survey and module ratings is expressed as an Error in percent. Let E denote the Error (%), MR the Module Rating, and SR the Survey Rating. Then:

E = \frac{|MR - SR|}{SR} \times 100 \qquad (1)

Equation (1) is used to calculate the Error (%) for all four developed modules (K-means, EM, HAC, and DBSCAN). The bar charts in Figure 5 and Figure 6 show the percentage of ‘Facets’ records with an Error ≤ 50%, out of the 1200 ‘Facets’ records (300 per ‘Facet’).
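Equation (1) and the Error ≤ 50% criterion behind Figures 5 and 6 can be written in a few lines. This is a minimal sketch: the function names and sample rating pairs are hypothetical. Equation (1) divides by SR, and the paper does not say how a survey rating of 0 is handled, so the zero case below is an assumption.

```python
import math


def error_percent(module_rating, survey_rating):
    """Error (%) from Equation (1): |MR - SR| / SR x 100."""
    if survey_rating == 0:
        # Equation (1) is undefined for SR = 0. Treating an exact match as 0% and
        # any other module rating as an infinite Error is an assumption of this sketch.
        return 0.0 if module_rating == 0 else math.inf
    return abs(module_rating - survey_rating) / survey_rating * 100


def percent_within_threshold(pairs, threshold=50.0):
    """Percentage of (MR, SR) records whose Error is at most `threshold` percent."""
    if not pairs:
        return 0.0
    hits = sum(1 for mr, sr in pairs if error_percent(mr, sr) <= threshold)
    return 100.0 * hits / len(pairs)


# Hypothetical (module rating, survey rating) pairs for one 'Facet'.
pairs = [(6.0, 8.0), (9.0, 5.0), (4.5, 4.0), (2.0, 1.0)]
print([error_percent(mr, sr) for mr, sr in pairs])
print(percent_within_threshold(pairs))  # 50.0
```

Because SR is the denominator, the same absolute difference produces a larger Error at low survey ratings. A 1-point difference gives 100% at SR = 1 but only 12.5% at SR = 8.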

3.1. Facet-Wise Comparison

In Figure 5, the ‘Facet’ ‘Systematic’ has the highest percentage of records for each of the four modules, indicating that the modules detect this ‘Facet’ most accurately. Specifically, K-means achieved 87.33%, EM 71.67%, HAC 85%, and DBSCAN 94%. This suggests that the mappings created for ‘Systematic’ are precise and that surveyors understood its rating requirements well.

In contrast, the ‘Facet’ ‘Proficient’ has the lowest percentage of records for every module except DBSCAN, indicating that the modules detect this ‘Facet’ least accurately. This could be due to weaknesses in its mappings or to surveyors not fully understanding its rating requirements.

For the remaining two ‘Facets’, ‘Confusing’ and ‘Engaging’, accuracy is notably better than for ‘Proficient’ but lower than for ‘Systematic’. Of these two, ‘Engaging’ yields better results than ‘Confusing’ for all modules except DBSCAN.

3.2. Module-Wise Comparison

Figure 6 shows that, overall, DBSCAN outperforms the other modules for all ‘Facets’ except ‘Engaging’, for which it has the lowest percentage of records.

Conversely, the EM module has the lowest percentage of records for ‘Confusing’ and ‘Systematic’. The two exceptions are ‘Engaging’, where EM achieves the highest percentage of any module (71.67%), and ‘Proficient’, where it achieves a higher percentage than HAC. The K-means and HAC modules perform comparably across all four ‘Facets’ and outperform EM, except that EM exceeds both of them on ‘Engaging’ and exceeds HAC on ‘Proficient’. However, K-means and HAC remain inferior to DBSCAN. These results indicate that the DBSCAN module is the most reliable for website personality detection.
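In effect, the module-wise conclusion ranks the modules for each ‘Facet’ and counts which module ranks first most often. The minimal sketch below fills in only the ‘Systematic’ percentages reported in Section 3.1. Percentages for the other ‘Facets’ would come from Figure 6 and are deliberately not reproduced here. The helper names are hypothetical.

```python
def rank_modules(percentages):
    """Order modules from highest to lowest share of records with Error <= 50%.

    Ties keep the dictionary's insertion order because sorted() is stable.
    """
    return sorted(percentages, key=percentages.get, reverse=True)


def count_wins(results):
    """Count, for each module, the 'Facets' on which it ranks first."""
    wins = {}
    for percentages in results.values():
        best = rank_modules(percentages)[0]
        wins[best] = wins.get(best, 0) + 1
    return wins


# Percentages of 'Systematic' records with Error <= 50%, as reported in Section 3.1.
results = {"Systematic": {"K-means": 87.33, "EM": 71.67, "HAC": 85.0, "DBSCAN": 94.0}}
print(rank_modules(results["Systematic"]))  # ['DBSCAN', 'K-means', 'HAC', 'EM']
print(count_wins(results))                  # {'DBSCAN': 1}
```

Once all four ‘Facets’ are entered, `count_wins` summarizes the argument in the text. A module that wins three of the four ‘Facets’, as DBSCAN does, is the natural overall choice.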

This paper can serve as a foundation for individuals or organizations aiming to create diverse cultural platforms online. It provides insights into website attributes such as logos, imagery, content, and features, which developers can use to create culturally resonant websites that foster cross-cultural understanding in our globally interconnected digital world. The study also offers insights for developers of websites used for ranking purposes, such as websites that evaluate the performance of other websites [60] or rate businesses such as restaurants [61], stores, and vehicles. Developers may further use the relationships between website attributes and website personality to tailor a company's website to its specific business requirements. More broadly, the research lays the groundwork for follow-up studies and for cross-sectoral applications in the contemporary AI-driven era, spanning domains such as business, education, and culture.

3.3. Contributions

This paper contributes to the field of website personality detection in the following five key aspects:

‘Facets’ Analysis for Website Personality Detection: This paper analyzes website personality through four distinct ‘Facets’: ‘Confusing’, ‘Engaging’, ‘Proficient’, and ‘Systematic’. Each ‘Facet’ comprises one or more ‘Items’, which are mapped to the ‘Quantitative Elements’ of websites. This approach allows results to be derived through quantitative analysis, providing insights into the personality ‘Facets’ that websites exhibit.
Development of Modules Based on Algorithms: Four modules are developed using different clustering algorithms for website personality detection: K-means, Hierarchical Agglomerative Clustering (HAC), Expectation Maximization (EM), and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Because several clustering methods are used, the website personality results can be obtained and cross-checked in multiple ways, which improves the robustness and reliability of the outcomes.
Development of Appropriate Software Tool: A comprehensive software tool is developed to support the research process. It integrates external libraries and services such as JSoup, WEKA, Selenium WebDriver, and the IBM Tone Analyzer Service. Its functionality covers website downloading, data mining, data maintenance, calculations, graph generation, report creation, survey administration, and result derivation.
Dataset Creation: To the best of our knowledge, no existing dataset offers complete information about websites, including their names, links, and characteristics. We therefore selected 3000 websites from five categories: academia, banks, e-commerce, news, and sports. The software was developed to identify the names of websites in the selected categories, with an emphasis on English-language websites. These websites were then downloaded and analyzed specifically for this research, a process that took several months. It produced several intermediate files, from which the final data files were generated.
The dataset used in this research is available in a GitHub repository, with the link provided in Section 3. Development of the dataset in the repository is ongoing.
Website Personality Analysis: Building on the contributions above, the research analyzes website personality across the identified ‘Facets’. Using the developed modules, software tool, and dataset, it examines ‘Confusing’, ‘Engaging’, ‘Proficient’, and ‘Systematic’ based on the websites' ‘Quantitative Elements’. This approach enables a nuanced understanding of website personality traits.

These contributions collectively enhance the understanding and application of website personality analysis methodologies.
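The paper's software tool integrates WEKA for machine learning. As an independent illustration of the K-means technique, the following plain-Python sketch implements Lloyd's algorithm. It also computes the within-cluster sum of squared errors (SSE), the quantity an elbow plot such as Figure 3 typically tracks. This is not the paper's implementation. Seeding with the first k points and the 2-D sample points are assumptions chosen to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means; returns (centroids, labels).

    Seeding with the first k points is a simplifying assumption; library
    implementations usually use random or k-means++ seeding instead.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


def sse(points, centroids, labels):
    """Within-cluster sum of squared errors, plotted against k for the elbow method."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical 2-D feature vectors forming three well-separated groups.
points = [(0, 0), (10, 10), (20, 0), (0, 1), (10, 11), (20, 1)]
for k in (1, 2, 3):
    centroids, labels = kmeans(points, k)
    print(k, labels, round(sse(points, centroids, labels), 2))
```

With these points, the SSE falls from about 534.83 at k = 1 to 201.5 at k = 2 and 1.5 at k = 3. Beyond k = 3 there is little left to gain, so the elbow appears at k = 3.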

4. Conclusions

The comparison and analysis of the Error ≤ 50% results aimed to identify patterns within the obtained results. Across all modules, regardless of the Error percentage considered, the highest percentage of records consistently pertained to the ‘Facet’ ‘Systematic’. The lowest percentage pertained to ‘Proficient’ for every module except DBSCAN, for which this trend was not observed. This suggests that the developed modules detect the website personality ‘Facet’ ‘Systematic’ more accurately than the other ‘Facets’ examined in this research. The module-wise comparison across the four ‘Facets’ for Error ≤ 50% shows a similar picture: the ratio of the percentages of records among the ‘Facets’ remained relatively consistent for each module. ‘Systematic’ consistently had the highest percentage of records, and ‘Proficient’ had the lowest for every module except DBSCAN. Among the four modules, DBSCAN was the most accurate for three personality ‘Facets’: ‘Systematic’, ‘Proficient’, and ‘Confusing’. The EM module was the most accurate for ‘Engaging’. Consequently, the DBSCAN module emerges as the most dependable module for detecting website personality ‘Facets’, with up to 94% of records (for ‘Systematic’) having an Error ≤ 50%.

Future research could prioritize developing specialized libraries for website personality detection, building on the current use of tools such as JSoup and WEKA. A significant challenge in this field is mapping the ‘Items’ of the Web Personality Scale (WPS) to the ‘Quantitative Elements’ of a website, because the subjective nature of some ‘Items’ complicates direct correlations. This paper covers thirteen ‘Quantitative Elements’, eight ‘Items’, and four ‘Facets’. These choices were constrained by time and by the lack of quantifiable elements for certain ‘Items’. Future work should incorporate more ‘Quantitative Elements’ to improve the accuracy and thoroughness of the ratings for each ‘Item’ and ‘Facet’.

Establishing new mappings between ‘Quantitative Elements’ and ‘Items’ can improve the explanations of ‘Facets’ and the related survey questions, which in turn would increase the accuracy of survey results. An iterative process of refining and expanding the surveys should lead to more sophisticated and precise modules for detecting website personality. Feedback from each round of surveys would improve the accuracy and reliability of the findings and contribute to a deeper understanding of website personality.

Website developers can use the insights from this research to create websites that embody the values of Indigenous communities and to develop customized e-commerce websites. The research offers initial guidance for creating culturally sensitive online platforms and strategies that align with business goals. Its findings can support further exploration and interventions across fields such as business, education, and culture in the AI-driven era.

Author Contributions

Conceptualization, S.A.C. and I.A.; methodology, S.A.C. and I.A.; software, S.A.C.; validation, S.A.C.; formal analysis, S.A.C.; investigation, S.A.C.; data curation, S.A.C.; writing—original draft preparation, S.A.C.; writing—review and editing, S.A.C., I.A. and S.V.; visualization, S.A.C. and I.A.; supervision, I.A. and S.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study did not require ethical approval.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Asadzadeh, L.; Rahimi, S. Analyzing Facebook Activities for Personal Recognition. In Proceedings of the 16th IEEE International Conference on Machine Learning and Applications (ICMLA), Cancun, Mexico, 18–21 December 2017; pp. 961–964.
2. Xu, H. Website Link Structure Optimization Based on SEO Algorithm. In Proceedings of the 2022 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC), Dalian, China, 14–16 April 2022; pp. 1300–1303.
3. Lu, H.; Na, W.; Wenfa, Z. Personality and Internet Use: A Meta-Analysis. In Proceedings of the 2021 4th International Conference on E-Business, Information Management and Computer Science (EBIMCS), Hong Kong, China, 29–31 December 2021; pp. 279–286.
4. Li, X.; Liu, L.; Fan, Z.; Li, W. A Quantitative Approach In heuristic Evolution of E-Commerce Websites. Int. J. Artif. Intell. Appl. 2018, 9, 1–13.
5. Sanyala, S.; Hisamb, M.W. Factors Affecting Customer Satisfaction with Ecommerce Websites—An Omani Perspective. In Proceedings of the 2019 International Conference on Digitization (ICD), Sharjah, United Arab Emirates, 18–19 November 2019; pp. 232–236.
6. Lee, M.; Lee, H.Y.; Yoon, M. Website development strategy for e-Commerce success. In Proceedings of the 40th International Conference on Computers & Industrial Engineering, Awaji, Japan, 25–28 July 2010.
7. Zhang, X. Content-based E-commerce Image Classification Research. IEEE Access 2020, 8, 160213–160220.
8. Liu, Y.; Li, S. Research on Marketing Strategy of Network Womenswear Brand Based on Big Data Statistics. In Proceedings of the 2019 34th Youth Academic Annual Conference of Chinese Association of Automation (YAC), Jinzhou, China, 6–8 June 2019; pp. 90–94.
9. Aaker, J.L. Dimensions of Brand Personality. J. Mark. Res. (JMR) 1997, 34, 347–356.
10. Ho, J.S.Y.; Chew, K.; Khan, N. Humanizing websites: Website personality for E-services. In Proceedings of the IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), Singapore, 6–12 December 2015; pp. 621–625.
11. Chen, Q.; Rodgers, S. Development of an Instrument to Measure Web Site Personality. J. Interact. Advert. 2006, 7, 47–64.
12. Ham, C.; Lee, H. Internet media personality: Scale development and advertising implications. Int. J. Advert. 2015, 34, 327–349.
13. Katerattanakul, P.; Siau, K. Measuring Information Quality of Web Sites: Development of an Instrument. In Proceedings of the 1999 20th International Conference on Information Systems, Charlotte, NC, USA, 12–15 December 1999; pp. 279–285.
14. Kaur, S.; Kaur, K.; Kaur, P. An Empirical Performance Evaluation of Universities Website. Int. J. Comput. Appl. 2016, 146, 10–16.
15. Jayanthi, B.; Krishnakumari, P. An Intelligent Method to Assess Webpage Quality using Extreme Learning Machine. Int. J. Comput. Sci. Netw. Secur. 2016, 16, 81–85.
16. Anusha, R. A Study on Website Quality Models. J. Sci. Res. Publ. 2014, 4, 1–5.
17. Jiang, N.; Feng, X.; Liu, H.; Liu, J. Emotional design of web page. In Proceedings of the 9th International Conference on Computer-Aided Industrial Design and Conceptual Design, Kunming, China, 22–25 November 2008; pp. 91–95.
18. Harpel, P. Library Homepage Design at Medium-sized Universities: A Comparision to Commercial Homepages via Nielson and Tahir. OCLC Syst. Serv. 2005, 21, 193–208.
19. Asirvatham, P.A.; Ravi, K.R. Web Page Categorization based on Document Structure; International Institute of Information Technology: Hyderabad, India, 2001.
20. Chtouki, Y.; Harroud, H.; Khalidi, M.; Bennani, S. The impact of YouTube videos on the student’s learning. In Proceedings of the 2012 International Conference on Information Technology Based Higher Education and Training (ITHET), Istanbul, Turkey, 21–23 June 2012.
21. Sun, F.; Song, D.; Liao, L. DOM Based Content Extraction via Text Density. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, 24–28 July 2011; pp. 245–254.
22. Chishti, S.; Li, X.; Sarrafzadeh, H. Identify Website Personality by Using Unsupervised Learning Based on Quantitative Website Elements. In Proceedings of the International Conference on Neural Information Processing, Istanbul, Turkey, 9–12 November 2015; pp. 522–530.
23. Reinecke, K.; Yeh, T.; Miratrix, L.; Mardiko, R.; Zhao, Y.; Liu, J.; Gajos, K. Predicting Users’ First Impressions of Website Aesthetics with a Quantification of Perceived Visual Complexity and Colorfulness. In Proceedings of the CHI 2013: Changing Perspectives, Paris, France, 27 April–2 May 2013; pp. 2049–2058.
24. Hernandez, B.; Jimenez, J.; Martin, M.J. Key website factors in e-business strategy. Int. J. Inf. Manag. 2009, 29, 362–371.
25. Gehrke, D.; Turban, E. Determinants of Successful Website Design: Relative Importance and Recommendations for Effectiveness. In Proceedings of the 32nd Annual Hawaii International Conference on Systems Sciences, Maui, HI, USA, 5–8 January 1999; pp. 1–8.
26. Fallahnezhad, M.; Vali, M.; Khalili, M. Automatic Personality Recognition from Reading Text Speech. In Proceedings of the Iranian Conference on Electrical Engineering (ICEE), Tehran, Iran, 2–4 May 2017; pp. 18–23.
27. Singh, P.K.; Sharma, S.; Paul, S. Identifying Hidden Sentiment in Text Using Deep Neural Network. In Proceedings of the 2nd International Conference on Data, Engineering and Applications (IDEA), Bhopal, India, 28–29 February 2020.
28. Marouf, A.; Hossain, R.; Sarker, M.R.K.R.; Pandey, B.; Siddiqui, S.M.T. Recognizing Language and Emotional Tone from Music Lyrics using IBM Watson Tone Analyzer. In Proceedings of the 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), Coimbatore, India, 20–22 February 2019.
29. Ralston, K.; Chen, Y.; Isah, H.; Zulkernine, F. A Voice Interactive Multilingual Student Support System using IBM Watson. In Proceedings of the 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; pp. 1924–1929.
30. Baker, D. An Evaluation of the Effectiveness of the Experiencing the Joy Program in Attracting and Connecting with Postmoderns in the Richmond Hill Seventh-Day Adventists Church; Andrews University: Aurora, ON, Canada, 2011.
31. Howard, K. (University of Memphis Digital Commons. Memphis, Tennessee). Supporting the Expression of Sadness: A Moderator in the Association between Parents’ Discouragement of Sadness and Child Internalizing Symptoms. 2010. Available online: https://digitalcommons.memphis.edu/etd/138 (accessed on 1 March 2024).
32. Seidel, A.; Prinz, J. Sound morality: Irritating and icky noises amplify judgments in divergent moral domains. Cognition 2013, 127, 1–5.
33. Potegal, M.; Spielberger, C.; Stemmler, G. International Handbook of Anger, 1st ed.; Springer: New York, NY, USA, 2010; pp. 407–408.
34. Nabi, R. The theoretical versus the lay meaning of disgust: Implications for emotion research. Cogn. Emot. 2002, 16, 695–703.
35. Goldsmith, B. Dealing with fear in the workplace. Cost Eng. 2002, 44, 39.
36. Chen, J.; Yang, P.; Liang, Y. Big Data Mining Algorithm of Internet of Things Based on Artificial Intelligence Technology. In Proceedings of the 2nd International Conference on Artificial Intelligence and Blockchain Technology (AIBT), Zibo, China, 2–4 June 2023; pp. 113–118.
37. Hamet, P.; Tremblay, J. Artificial intelligence in medicine. Metabolism 2017, 69, S36–S40.
38. Jordan, M. Artificial Intelligence The Revolution has not happened yet. Harv. Data Sci. Rev. 2019, 1.1, 1–9.
39. Chitralekha, G.; Roogi, J.M. A Quick Review of ML Algorithms. In Proceedings of the 2021 6th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 8–10 July 2021.
40. Saravanan, R.; Sujatha, P. A State of Art Techniques on Machine Learning Algorithms: A Perspective of Supervised Learning Approaches in Data Classification. In Proceedings of the 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 14–15 June 2018; pp. 945–949.
41. Sharma, N.; Sharma, R.; Jindal, N. Machine Learning and Deep Learning Applications-A Vision. Glob. Transit. Proc. 2021, 2, 24–28.
42. Reboredo, P.; Blanco, J.; Fernandez, N.; Cedron, F.; Novoa, F.J.; Carballal, A.; Maojo, V.; Pazos, A.; Lozano, C. A review on machine learning approaches and trends in drug discovery. Comput. Struct. Biotechnol. J. 2021, 19, 4538–4558.
43. Baminiwatta, A. Global trends of machine learning applications in psychiatric research over 30 years: A bibliometric analysis. Asian J. Psychiatry 2022, 69, 102986.
44. Zhang, Z.; Sejdic, E. Radiological images and machine learning: Trends, perspectives, and prospects. Comput. Biol. Med. 2019, 108, 354–370.
45. Aljaddouh, B.; Malathi, D. Trends of using machine learning for detection and classification of respiratory diseases: Investigation and analysis. Mater. Today Proc. 2022, 62, 4651–4658.
46. Chandan; Thakur, R. Recent Trends of Machine Learning In Soil Classification: A Review. Int. J. Comput. Eng. Res. 2018, 8, 25–32.
47. Behrens, T.; Forster, H.; Scholten, T.; Steinrucken, U.; Spies, E.; Goldschmitt, M. Digital soil mapping using artificial neural networks. J. Plant Nutr. Soil Sci. 2005, 168, 21–33.
48. Mohammadnazar, A.; Arvin, R.; Khattak, A.J. Classifying travelers’ driving style using basic safety messages generated by connected vehicles: Application of unsupervised machine learning. Transp. Res. Part C 2021, 122, 102917.
49. javaTpoint. Available online: https://www.javatpoint.com/machine-learning (accessed on 14 February 2022).
50. Soofi, A.A.; Awan, A. Classification Techniques in Machine Learning: Applications and Issues. J. Basic Appl. Sci. 2017, 13, 459–465.
51. Serrano, L. Grokking Machine Learning, 1st ed.; Manning Publications Company: New York, NY, USA, 2021; pp. 1–100.
52. Wu, S.; Flach, P.A. Feature Selection with Labelled and Unlabelled Data; University of Bristol: Bristol, UK, 2002.
53. Khalfallah, J.; Slama, J.B.H. A Comparative Study of the Various Clustering Algorithms in E-Learning Systems Using Weka Tools. In Proceedings of the 2018 JCCO Joint International Conference (JCCO: TICET-ICCA-GECO), Hammamet, Tunisia, 9–11 November 2018.
54. Milligan, I. Automated Downloading with Wget; University of Waterloo: Waterloo, ON, Canada, 2012.
55. Haddaway, N.R. The Use of Web-scraping Software in Searching for Grey Literature. GREY 2015, 11, 186–190.
56. Srivastava, S.; Haroon, M.; Bajaj, A. Web Document Information Extraction Using Class Attribute Approach. In Proceedings of the 4th International Conference on Computer and Communication Technology (ICCCT), Allahabad, India, 20–22 September 2013; pp. 17–22.
57. Coneglian, C.S.; Fusco, E.; Segundo, J.E.S. Semantic Agent in the Context of Big Data Usage in Ontological Information Retrieval in Scientific Research. In Proceedings of the International Conference on Internet of Things and Big Data, Rome, Italy, 23–25 April 2016; pp. 324–330. Available online: https://www.scitepress.org/PublishedPapers/2016/58757/pdf/index.html (accessed on 1 March 2024).
58. Thasal, R.; Yelkar, S.; Tare, A.; Gaikwad, S. Information Retrieval and De-duplication for Tourism Recommender System. Int. Res. J. Eng. Technol. 2018, 5, 1683–1687.
59. Gojare, S.; Joshi, R.; Gaigaware, D. Analysis and Design of Selenium WebDriver Automation Testing Framework. In Proceedings of the 2nd International Symposium on Big Data and Cloud Computing (ISBCC’15), Chennai, India, 12–13 March 2015; pp. 341–346.
60. WebScore AI. Available online: https://webscore.ai/ (accessed on 10 May 2020).
61. Urban List. Available online: https://www.theurbanlist.com/nz/a-list/restaurants-auckland (accessed on 1 February 2024).

Figure 1. Mappings created between WPS Items and Quantitative Elements.

Figure 2. Hierarchical Structure of Clustering Methods.

Figure 3. Elbow Creation for K-means Module.

Figure 4. Variance levels (5, 10, and 15) in three surveyors’ ratings for the same website ‘Facets’.

Figure 5. ‘Facet’-wise comparison of the number of ‘Facet’ records (%) vs. Modules, with Error (≤50%).

Figure 6. Module-wise comparison of the number of ‘Facets’ records (%) vs. Modules, with Error (≤50%).

Table 1. ‘Item’ ‘Informative’ Ratings for ‘k5’.

Cluster #	Item Rating	Quantitative Elements Ratings
	Informative	Hyperlinks	Videos	Wordcount
0	8.22	9.92	4.74	10
1	5.29	4.42	5.80	5.66
2	3.33	0	10	0
3	7.79	10	4.17	9.20
4	0.76	0.57	0	1.72
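The ratings in Table 1 are consistent with each cluster's ‘Informative’ rating being the unweighted mean of its three ‘Quantitative Elements’ ratings. For cluster 0, (9.92 + 4.74 + 10)/3 = 8.22. The paper does not state this rule explicitly at this point, so the short check below treats it as an inference from the table rather than the documented method. The helper name `item_rating` is hypothetical.

```python
# Table 1 ('k5'): cluster -> (Hyperlinks, Videos, Wordcount, reported Informative)
TABLE_1 = {
    0: (9.92, 4.74, 10.0, 8.22),
    1: (4.42, 5.80, 5.66, 5.29),
    2: (0.0, 10.0, 0.0, 3.33),
    3: (10.0, 4.17, 9.20, 7.79),
    4: (0.57, 0.0, 1.72, 0.76),
}


def item_rating(element_ratings):
    """Unweighted mean of the 'Quantitative Elements' ratings mapped to one 'Item'."""
    return sum(element_ratings) / len(element_ratings)


for cluster, (*elements, reported) in TABLE_1.items():
    # The recomputed mean matches the reported rating after rounding to 2 decimals.
    print(cluster, round(item_rating(elements), 2), reported)
```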

Table 2. ‘Facet’: ‘Proficient’ Ratings and ‘Items’ Ratings for ‘k5’.

Cluster #	Facet Ratings	Items Ratings
	Proficient	Informative	Satisfying	Searchable
0	9.41	8.22	10	10
1	4.46	5.29	4.51	3.59
2	1.14	3.33	0	0.09
3	5.78	7.79	0.30	9.24
4	1.68	0.76	4.27	0
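Table 2 follows the same pattern one level up. Each ‘Proficient’ rating matches the unweighted mean of the ‘Informative’, ‘Satisfying’, and ‘Searchable’ ratings; for cluster 0, (8.22 + 10 + 10)/3 ≈ 9.41. The two-‘Item’ ‘Facets’ in Table 3 agree as well. For example, ‘Confusing’ is (9.84 + 3.02)/2 = 6.43 and ‘Systematic’ is (0 + 7.22)/2 = 3.61. As before, this rule is inferred from the tables, and the helper name `facet_rating` is hypothetical.

```python
# Table 2 ('k5'): cluster -> (Informative, Satisfying, Searchable, reported Proficient)
TABLE_2 = {
    0: (8.22, 10.0, 10.0, 9.41),
    1: (5.29, 4.51, 3.59, 4.46),
    2: (3.33, 0.0, 0.09, 1.14),
    3: (7.79, 0.30, 9.24, 5.78),
    4: (0.76, 4.27, 0.0, 1.68),
}


def facet_rating(item_ratings):
    """Unweighted mean of the 'Item' ratings that make up one 'Facet'."""
    return sum(item_ratings) / len(item_ratings)


for cluster, (*items, reported) in TABLE_2.items():
    print(cluster, round(facet_rating(items), 2), reported)

# The same rule reproduces the two-'Item' 'Facets' reported in Table 3.
print(round(facet_rating([9.84, 3.02]), 2))  # Confusing: 6.43
print(round(facet_rating([0.0, 7.22]), 2))   # Systematic: 3.61
```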

Table 3. Test Website Ratings for ‘k5’.

Website Unique ID: 100620, Name: Macquarie University.
The test website is close to Train Cluster 0.
Facets	Facets Ratings	Items	Items Ratings
Confusing	6.43	Discouraging	9.84
		Irritating	3.02
Engaging	5.27	Attractive	5.27
Proficient	9.41	Informative	8.22
		Satisfying	10
		Searchable	10
Systematic	3.61	Concise	0.0
		Fast	7.22
Conclusion: The given test website, Macquarie University, has a rating of 6.43 in Confusing, 5.27 in Engaging, 9.41 in Proficient, and 3.61 in Systematic.
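Table 3 states that the test website is closest to Train Cluster 0. Its ‘Proficient’ ratings (9.41; ‘Items’ 8.22, 10, and 10) match cluster 0 in Tables 1 and 2 exactly, which suggests that a test website inherits the ratings of its nearest training cluster. The minimal sketch below performs that assignment using the Table 1 centroids. It assumes Euclidean distance on the ‘Quantitative Elements’ ratings, since the paper does not specify the distance measure here. The test vectors and helper names are hypothetical.

```python
import math

# Table 1 centroids: cluster -> ((Hyperlinks, Videos, Wordcount), Informative rating)
TRAIN_CLUSTERS = {
    0: ((9.92, 4.74, 10.0), 8.22),
    1: ((4.42, 5.80, 5.66), 5.29),
    2: ((0.0, 10.0, 0.0), 3.33),
    3: ((10.0, 4.17, 9.20), 7.79),
    4: ((0.57, 0.0, 1.72), 0.76),
}


def nearest_cluster(elements):
    """Index of the training cluster whose centroid is closest (Euclidean) to `elements`."""
    return min(TRAIN_CLUSTERS, key=lambda c: math.dist(elements, TRAIN_CLUSTERS[c][0]))


def rate_test_website(elements):
    """Assign a test website to its nearest cluster and inherit that cluster's rating."""
    cluster = nearest_cluster(elements)
    return cluster, TRAIN_CLUSTERS[cluster][1]


print(rate_test_website((9.5, 5.0, 9.8)))  # (0, 8.22)
print(rate_test_website((1.0, 0.2, 1.5)))  # (4, 0.76)
```

In this sketch a test website's rating depends only on which centroid it is nearest to, not on its exact distance. Two websites in the same cluster therefore receive identical ratings, consistent with how Table 3 reuses cluster 0's values.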

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

