1. Introduction
The Kuwait Construction Market offers a promising environment for applying state-of-the-art analytical tools to investigate its intricate dynamics [1, 2, 3]. By intricate dynamics, we refer to the complex interplay of multiple factors within the Kuwait Construction Market. These factors include fluctuating investment levels, varied project timelines, labour force dynamics, and material supply chain variability. Additionally, external influences such as global oil price volatility and local regulatory changes further contribute to these dynamics. Given the substantial increase in investments and the proliferation of projects within the industry, advanced analytical techniques become essential for revealing concealed patterns, inefficiencies, and potential opportunities. This research presents an innovative deep learning method that utilises an autoencoder [4] to analyse the complex dynamics of the Kuwait Construction Market. This study seeks to analyse the complex relationship between many market elements, such as cash allocations, project timeframes, and labour dynamics, in order to detect abnormalities that may indicate operational or financial problems [5, 6]. In the field of construction and building management, our technique also aims to facilitate a more informed and proactive decision-making process.
A study by Koushki and Kartam highlights the considerable influence of building materials on the schedules and expenses of projects in Kuwait’s construction sector [7]. Their research into time-related variables and material availability presents another dimension of anomaly detection, namely supply chain disruptions, which are critical in the construction industry. The findings of their study demonstrate that the selection and accessibility of construction supplies have a significant impact on project delays and cost overruns. The findings also suggest that imported materials have less impact on project delays and cost overruns than local materials. This is because imported goods are better planned, and there is a higher level of certainty regarding material availability before construction begins [8, 9].
The study in [10] delves deeper into the wider ramifications of fluctuations in oil prices on the construction industry, with a particular focus on the susceptibility of government project spending to changes in oil income. This scenario is demonstrated by the government’s endeavours to broaden revenue streams outside oil, as a component of the Kuwait Development Plan, with the objective of fostering non-oil GDP expansion. Nevertheless, this research indicates that the effectiveness of these diversification endeavours is constrained, given the ongoing volatility of the construction sector in relation to fluctuating oil prices [11, 12].
Taken together, these studies show the many problems that Kuwait’s building industry faces, such as its reliance on imported materials, the effects of changing oil prices, and the government’s role in funding building projects [13]. These problems highlight the necessity of implementing strategic planning, diversifying the economy, and effectively managing building supplies in order to minimise delays and cost overruns. The knowledge obtained from this analysis can provide valuable guidance for policy-making, project management approaches, and future research endeavours, thereby enhancing the resilience and sustainability of the construction industry in Kuwait and comparable economies [14, 15].
This research aims to enhance the understanding and prediction of market behaviours by detecting anomalies that may indicate inefficiencies or opportunities within the Kuwait Construction Market. By leveraging autoencoder neural networks, we aim to provide valuable insights for stakeholders to improve decision-making processes, boost operational efficiency, and support strategic planning in the construction sector. Our findings contribute to the broader academic discussion on integrating advanced machine learning techniques into construction market analysis.
This research paper is strategically designed to delve into the complexities of the Kuwait Construction Market, utilising data sourced from the Public Authority for Civil Information and the Ministry of Justice to underpin our analysis. Initially, we present a detailed examination of these market data, setting a foundational understanding of the current market dynamics, trends, and driving forces behind its growth. This provides the necessary context for our subsequent exploration into predicting the future size and investment patterns of the Kuwait Construction Market through statistical methods.
The core of our study focuses on the application of deep learning techniques, specifically the use of an autoencoder neural network. This approach is employed to analyse and interpret the intricate data and to identify abnormal patterns that may indicate underlying issues or opportunities within the market. Our hypothesis is that autoencoders, by learning a compressed representation of normal market behaviour, can effectively highlight deviations from the norm. In this way, our use of an autoencoder illustrates the capability of deep learning to detect anomalies and to inform predictions of future market behaviour.
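To make this hypothesis concrete, the following is a minimal sketch of reconstruction-error anomaly scoring with a small autoencoder written in NumPy. It is not the architecture or data used in this study: the synthetic three-feature data, the single tanh hidden unit, the learning rate, and the 99th-percentile threshold are all illustrative assumptions.

```python
import numpy as np

def train_autoencoder(X, hidden=1, epochs=2000, lr=0.05, seed=0):
    """Fit a one-hidden-layer autoencoder (tanh encoder, linear decoder)
    by full-batch gradient descent on the squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0.0, 0.1, (d, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0.0, 0.1, (hidden, d)), np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # compressed representation
        R = H @ W2 + b2                   # reconstruction of the input
        G = 2.0 * (R - X) / n             # gradient of the loss w.r.t. R
        dH = (G @ W2.T) * (1.0 - H ** 2)  # back-propagate through tanh
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def reconstruction_error(X, params):
    """Mean squared reconstruction error for each row of X."""
    W1, b1, W2, b2 = params
    R = np.tanh(X @ W1 + b1) @ W2 + b2
    return ((R - X) ** 2).mean(axis=1)

# Synthetic "normal" behaviour (assumption): three features that move together.
rng = np.random.default_rng(42)
t = rng.uniform(-1.0, 1.0, size=(500, 1))
direction = np.array([[1.0, 0.8, -0.6]])
X_train = t * direction + rng.normal(0.0, 0.05, size=(500, 3))

params = train_autoencoder(X_train)
train_err = reconstruction_error(X_train, params)
threshold = np.percentile(train_err, 99)  # illustrative cut-off

# First row follows the learned pattern; second row breaks it.
X_new = np.array([[0.5, 0.4, -0.3], [1.0, -0.8, 0.6]])
new_err = reconstruction_error(X_new, params)
flags = new_err > threshold
print(new_err, threshold, flags)
```

Because the network is trained only on data that follow the usual joint pattern, an observation that violates that pattern cannot be reconstructed well, and its error rises far above the errors seen during training.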
In the discussion section, we provide an analysis of our findings, contextualising them within the broader scope of construction market analytics and their implications for stakeholders. The conclusion synthesises our research outcomes and their implications for operational efficiency in the construction sector in Kuwait and potentially in similar markets worldwide.
In summary, here are the technical contributions of our paper:
Advanced Anomaly Detection in Market Data: Utilised autoencoder neural networks to identify abnormal patterns within the Kuwait Construction Market, providing a novel method for detecting underlying issues and opportunities through compressed representations of normal market behaviour.
Data-Driven Market Analysis and Predictions: Integrated diverse datasets from public authorities to analyse current market dynamics and trends, and employed deep learning alongside statistical methods to predict future market behaviours, aiding stakeholders in strategic planning and decision making.
Academic and Practical Impact: Contributed to the academic discussion on machine learning applications in construction market analysis, demonstrating the practical benefits of deep learning techniques in improving decision-making processes, operational efficiency, and financial outcomes for market stakeholders.
This research work is organised as follows: First, we discuss related work, followed by a discussion on the data. Next, we detail the methodology and present the experimental results and discussion. Additionally, we highlight the limitations of this work, and finally, we conclude with a summary of our findings.
2. Related Work
Anomaly detection in construction market data is a rapidly growing area of research, particularly pertinent to regions like Kuwait where economic activities are profoundly influenced by the dynamic construction industry. This literature review examines the developments in anomaly detection within the context of the Kuwaiti construction market, synthesising findings from various studies to identify trends, methodologies, and outcomes relevant to this niche field.
Deep learning is utilised across the construction industry to address numerous challenges such as resource planning, risk management, and logistics [16]. These issues often lead to design defects, project delivery delays, cost overruns, and contractual disputes, prompting research into advanced machine learning algorithms like deep learning for diagnostic and prescriptive analysis. The review in [16] notes that the publicity generated by tech giants like Google, Facebook, and Amazon about Artificial Intelligence and its applications to unstructured data is just the beginning, and that in the construction sector deep learning has vast potential in areas such as site planning and management, health and safety, and construction cost prediction, which remain largely unexplored. It surveys existing studies that have applied deep learning to prevalent construction challenges like structural health monitoring, construction site safety, building occupancy modelling, and energy demand prediction, and its authors state that, to the best of their knowledge, no extensive survey of the applications of deep learning techniques within the construction industry was available. The review aspires to inspire future research into the optimal application of deep learning techniques such as image processing, computer vision, and natural language processing to address numerous industry challenges. It also discusses the limitations of deep learning that construction researchers and practitioners may encounter, including the black box challenge, ethics and GDPR concerns, cybersecurity, and cost. Finally, it highlights how deep learning can be leveraged for automatic speech recognition for BIM tools, retrofitting advisers for energy savings, on-site safety and health monitoring, and project risk mitigation and analysis, and it explores the potential of interpretable deep learning models to address the black box challenge, serving as a resource for construction engineers and researchers interested in the possibilities of deep learning in the construction domain.
One pivotal study by Aslam et al. [17], although primarily focused on the oil industry, provides valuable insights into the application of machine learning algorithms for anomaly detection. The use of Random Forest (RF) and Explainable Artificial Intelligence (XAI) to manage and interpret multivariate time-series data can be analogous to detecting anomalies in construction market data, such as unexpected shifts in material costs or labour productivity metrics. However, these methods often require extensive feature engineering and may struggle with the high dimensionality and non-linear relationships inherent in construction data.
In their study, Al-Tabtabai and Soliman examine the consequences of declines in oil prices on the construction sector, with a specific emphasis on the period from 2007 to 2017 [10]. Their study provides a comprehensive analysis of the relationship between oil prices and building material expenses, along with the wider implications for Kuwait’s Gross Domestic Product (GDP). The impact of the decrease in oil prices on government expenditure in the construction industry is significant [18], particularly in light of Kuwait’s substantial dependence on oil-generated income. The study further introduces a regression model that predicts building costs from swings in oil prices, providing a prediction tool for stakeholders in economies that rely heavily on oil. However, traditional regression models used in this context may not effectively capture complex anomalies that arise from multifactorial influences.
Jarkas and Horner’s study on labour productivity in Kuwait’s construction industry provides a robust framework for establishing productivity baselines, which are essential for identifying anomalies in labour performance [14]. They emphasise the importance of understanding ‘normal’ performance metrics to better detect and interpret deviations, which could signify either risks or opportunities within the construction process. Despite this, their approach is limited by the reliance on predefined baselines, which may not account for evolving patterns in labour dynamics.
Al-Sabah and Refaat’s work on assessing construction risks in public projects in Kuwait presents a detailed categorisation of potential anomalies in the form of risks, including economic, regulatory, and environmental risks [19]. Their methodical quantification of risk probabilities and severities offers a structured approach to anomaly detection, where deviations from expected risk levels can indicate underlying issues in project management or execution. However, risk assessment frameworks often lack the capability to dynamically learn from new data, limiting their responsiveness to emerging trends.
These studies illuminate the multifaceted nature of anomaly detection in the Kuwait Construction Market. They reveal that while the methodologies may vary—from statistical analyses to machine learning techniques—the underlying goal remains consistent: to accurately identify, interpret, and respond to anomalies. This is crucial not only for maintaining economic stability and productivity in the construction sector but also for enhancing predictive capabilities and strategic planning.
Our proposed approach using autoencoder neural networks directly addresses these limitations. Autoencoders excel in handling high-dimensional, non-linear data without requiring extensive feature engineering, making them well suited for the complex nature of construction market data. Unlike traditional regression models or risk assessment frameworks, autoencoders dynamically learn from the data, continuously improving their ability to detect subtle and evolving anomalies. Additionally, the use of autoencoders mitigates the need for predefined baselines, offering a more flexible and adaptive solution compared with static baseline or rule-based methods. By leveraging these strengths, our approach provides a more robust and comprehensive tool for anomaly detection in the Kuwait Construction Market, enhancing both predictive accuracy and operational efficiency.
3. Kuwait Construction Market Data
Due to the absence of a standardised housing index in Kuwait, this study used similar data sourced from the Ministry of Justice’s Department of Property Registrations, as described in [20]. The dataset encompasses roughly 60,000 property transactions spanning from February 2004 to March 2017, documented in Arabic within unstructured PDF and Microsoft Excel 2021 formats. These records detail critical elements of each transaction, including property type, transaction date, price, plot size, and the location of the property. While additional details such as precise house addresses are occasionally available, their inconsistent presence led us to focus our analysis on the five consistently recorded attributes, namely, property type, transaction date, price, plot size, and the location of the property. The details of the data are shown in Figure 1.
To facilitate analysis, significant preprocessing was necessary. Initially, the dataset underwent language conversion from Arabic to English, as described in [20]. The data were then systematically catalogued not by the exact date but rather by the month or quarter in which each transaction occurred. A GPT-4 model [21] was then utilised to convert the separate unstructured datasets into a single structured dataset, facilitating comprehensive analysis and anomaly detection. Consistent with local practices, the pricing data were standardised to a per-square-meter basis by dividing the transaction price by the plot size, as described by Alfalah [20]. For example, a property with a 400-square-meter plot sold for KWD 200,000 would be recorded at a rate of KWD 500 per square meter. Considering that all residential plots in Kuwait have a built-up area that averages
of the land size, this measurement approach was deemed appropriate.
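The per-square-meter standardisation can be written as a one-line calculation. The sketch below reproduces the worked example from the text; the function name and the input check are our own illustrative choices.

```python
def price_per_square_meter(price_kwd: float, plot_size_m2: float) -> float:
    """Standardise a transaction price to KWD per square meter of plot."""
    if plot_size_m2 <= 0:
        raise ValueError("plot size must be positive")
    return price_kwd / plot_size_m2

# Worked example from the text: KWD 200,000 for a 400-square-meter plot.
rate = price_per_square_meter(200_000, 400)
print(rate)  # 500.0
```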
The data include entries related to different cities in Kuwait, with information on stock levels and transaction volumes separated by type (lands and houses). Here is an overview of the data:
City: Names of cities in Kuwait;
Stock: Stock levels, possibly related to some form of inventory or assets;
Transaction Volume Lands: Transaction volumes specifically for lands;
Transaction Volume Houses: Transaction volumes for houses.
To enhance analytical granularity, transactions were segregated by city. Each city’s data were then divided into two distinct time series, namely, one for house transactions and another for land transactions. This separation is particularly relevant in Kuwait and other emerging markets where combined sales of houses and plots are commonplace.
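As an illustration of this segregation, the sketch below groups per-square-meter prices first by city and property type and then by quarter. The records, the placeholder city names, and the helper names are hypothetical and serve only to show the data structure.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (city, property_type, transaction_date, price_per_m2)
transactions = [
    ("City A", "house", date(2010, 1, 15), 520.0),
    ("City A", "land", date(2010, 2, 3), 480.0),
    ("City A", "house", date(2010, 5, 20), 540.0),
    ("City B", "land", date(2010, 1, 9), 300.0),
]

def quarter_label(d: date) -> str:
    """Map a date to a label such as '2010-Q2'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"

def build_series(records):
    """Return {(city, property_type): {quarter: [prices per m2]}}."""
    series = defaultdict(lambda: defaultdict(list))
    for city, ptype, d, rate in records:
        series[(city, ptype)][quarter_label(d)].append(rate)
    return series

series = build_series(transactions)
for key, quarters in sorted(series.items()):
    print(key, dict(quarters))
```

Each key of `series` corresponds to one of the two time series per city, so house and land transactions can be analysed separately.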
Prior to constructing the indices, a thorough data cleaning process was essential. This included the exclusion of data pertaining to non-single-family residential units such as apartments and other property classes like investment properties, focusing solely on single-family homes.
Further, the data were scrutinised for inaccuracies in recorded prices and plot sizes, with approximately 6000 transactions deemed unreliable and subsequently removed [20]. These entries typically featured prices recorded as zero, or figures vastly exceeding typical values, as well as plot sizes that were either implausibly small or excessively large. An additional review targeted major outliers, resulting in the elimination of about 600 transactions. This selective exclusion did not remove all outliers due to the inherent price variability across different cities, which reflects the diverse nature of the real estate market and poses a significant challenge in real estate index construction.
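A plausibility filter of this kind can be sketched as follows. The numeric bounds are hypothetical placeholders, since the thresholds actually applied are not stated here, and the sample records are invented for illustration.

```python
def is_plausible(price_kwd, plot_size_m2,
                 min_plot_m2=100.0, max_plot_m2=5_000.0,
                 max_price_kwd=5_000_000.0):
    """Keep records with a non-zero, bounded price and a bounded plot size.
    All bounds are illustrative assumptions, not the study's thresholds."""
    price_ok = 0 < price_kwd <= max_price_kwd
    plot_ok = min_plot_m2 <= plot_size_m2 <= max_plot_m2
    return price_ok and plot_ok

records = [
    {"price": 200_000, "plot": 400},     # plausible
    {"price": 0, "plot": 400},           # price recorded as zero
    {"price": 150_000, "plot": 5},       # implausibly small plot
    {"price": 300_000, "plot": 90_000},  # excessively large plot
]

clean = [r for r in records if is_plausible(r["price"], r["plot"])]
print(len(records) - len(clean), "records removed")
```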
Table 1 is a summary of key statistical measures for stock levels and transaction volumes for lands and houses across cities in Kuwait.
The mean and median values show that the central tendency of stocks is higher than that of transaction volumes, indicating larger stock reserves relative to sales or transactions. The maximum values indicate peak occurrences that could be the subject of further investigation to understand the factors driving exceptionally high transactions or stock levels. The minimum values, especially the zero in transaction volumes for lands, suggest periods or places with no transactions, which could be indicative of market downturns or lack of demand. The standard deviation highlights the variability in each category, with transaction volumes for lands showing greater fluctuation compared with houses, suggesting a more volatile market for lands.
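The measures discussed above can be computed with Python's standard `statistics` module. The volumes below are hypothetical and are not the values reported in Table 1; they only illustrate how a single spike pulls the mean above the median and inflates the standard deviation.

```python
import statistics

def summarise(values):
    """Return the summary measures discussed in the text for one column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "std": statistics.stdev(values),  # sample standard deviation
    }

# Hypothetical transaction volumes for five cities, including one spike.
volumes = [0, 120, 150, 180, 900]
summary = summarise(volumes)
print(summary)
```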
Figure 2 illustrates the transaction volumes for lands and houses across six districts in Kuwait. It highlights significant disparities between districts, with the Alahmadi district exhibiting the highest transaction volume for both lands and houses. Notably, the Mobarak Alkaber district shows a pronounced preference for land transactions over houses, indicating a possible trend towards land investment in this region. The data suggest regional variations in real estate activity, which could be attributed to factors such as economic development, regulatory changes, or demographic shifts. This distribution can be considered as a critical tool for understanding the dynamics of the real estate market in Kuwait.
The spikes in transaction volumes for both lands and houses in certain cities are indicators of the existence of abnormalities as they are significantly higher than average. These could be driven by specific events like new developments, policy changes, or economic stimuli. Furthermore, the stock levels are relatively consistent, but sudden dips for some districts require further investigation to understand potential stock management issues or changes in demand.
Optimal Dataset Generation
Given that the current dataset has its own limitations, we outline here what an optimal dataset for future work would look like. To ensure that the model functions effectively, it is crucial to construct an optimal dataset characterised by comprehensive, high-quality data that accurately represent the complexities and variabilities of the Kuwait Construction Market. The optimal dataset would contain the following elements:
Comprehensive Data Coverage:
Temporal Granularity: Monthly or quarterly transaction data to capture short-term and long-term trends.
Spatial Granularity: Data segregated by districts or cities to account for regional variations.
Detailed Attributes: Variables such as property type, transaction date, price, plot size, location, project timelines, investment distributions, labour force dynamics, material supply data, and economic indicators.
Data Quality and Consistency:
Accurate and Complete Records: Ensure that all transactions are recorded with complete details and verified for accuracy.
Standardised Units: Normalise prices to a per-square-meter basis and standardise other units to ensure consistency across records.
Handling Missing Data: Implement robust methods for imputing missing data to maintain dataset integrity.
Inclusion of External Factors:
Economic Indicators: Incorporate data on interest rates, employment rates, oil prices, and other economic factors that influence the construction market.
Policy Changes: Include data on relevant government policies, subsidies, and zoning laws.
Market Sentiment Indicators: Use data from surveys or social media to gauge market sentiment.
5. Experimental Results and Discussion
5.4. Discussion
This paper presents three main findings derived from the application of our trained autoencoder to analyse real estate volatility. First, it reveals that the heteroscedasticity commonly associated with real estate is not the principal challenge in generating highly volatile indices. Instead, fluctuations are more influenced by factors related to land price in indices exclusively focused on land transactions. Despite lacking many housing characteristics, land sales inherently exhibit high volatility, leading to more volatile indices for land-only transactions compared with those including house transactions, which show moderate volatility. This variation may also stem from property traders who frequently flip land or from the high transaction rates in certain cities.
Second, employing the autoencoder to strategically reduce the sample size and select specific sub-samples has significantly enhanced index performance in various aspects. Excluding data from regions mostly known for luxury and vacation homes markedly improves outcomes. Similarly, it seemed that the network focused on high-frequency cities, which, as a result, enhanced performance by mitigating the impacts of cities with low transaction frequencies, high volatility, and elevated costs.
Third, the autoencoder’s capability to stratify data by city and long-term mean prices considerably bolsters the efficacy of central tendency measures used in index construction. While city stratification slightly edges out in performance, indices stratified by long-term mean prices hold competitive advantages. This approach aggregates a substantial volume of transactions per period, thereby minimising the potential skewing effects of outliers and offering superior performance in certain scenarios compared with other stratification techniques.
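For readers unfamiliar with stratified central-tendency indices, the following sketch shows a generic stratified median index in which each stratum (for example, a city) contributes its per-period median price, weighted by its share of all transactions. This is a textbook-style construction given for illustration, not necessarily the exact index used in this study, and the records are hypothetical.

```python
from collections import defaultdict
from statistics import median

def stratified_median_index(records, base_period):
    """records: iterable of (period, stratum, price_per_m2).
    Returns {period: index value}, with the base period set to 100."""
    by_cell = defaultdict(list)
    for period, stratum, rate in records:
        by_cell[(period, stratum)].append(rate)

    # Fixed stratum weights: number of transactions over the whole sample.
    counts = defaultdict(int)
    for (_, stratum), rates in by_cell.items():
        counts[stratum] += len(rates)

    periods = sorted({period for period, _ in by_cell})
    level = {}
    for p in periods:
        present = [s for s in counts if (p, s) in by_cell]
        weight_sum = sum(counts[s] for s in present)
        level[p] = sum(counts[s] * median(by_cell[(p, s)])
                       for s in present) / weight_sum
    return {p: 100.0 * level[p] / level[base_period] for p in periods}

# Hypothetical per-square-meter prices for two strata over two quarters.
records = [
    ("2010-Q1", "A", 500), ("2010-Q1", "A", 520), ("2010-Q1", "B", 300),
    ("2010-Q2", "A", 550), ("2010-Q2", "A", 570), ("2010-Q2", "B", 330),
]
index = stratified_median_index(records, base_period="2010-Q1")
print(index)
```

Taking the median within each stratum limits the influence of a few extreme transactions, which is the skew-reducing effect described above.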
Overall, based on our analysis, we found two main abnormalities in the data, which are the following:
Transaction Volume of Lands: The model identified significant outliers with volumes of 4278 and 13,211 transactions, which are substantially higher than typical volumes. These anomalies were correlated with specific market activities, such as new developments or policy changes.
Transaction Volume of Houses: Outliers included volumes of 1001 and 1150 transactions, also considerably higher than the average. These spikes were associated with specific events or shifts in market sentiment.
These outliers indicate special market activities in those specific cities. Abnormal behaviours, such as the extremely high transaction volumes in certain cities, seem to be driven by factors like new developments, policy changes, or other market influences, as stated earlier. The anomalies detected by the model highlight periods of unusual market activity, which could be critical for stakeholders in making informed decisions. For instance, the detection of abnormal spikes in transaction volumes can prompt further investigation into underlying causes such as economic changes, government policies, or new developments.
Based on our investigation of the existing data, we outline the following factors that might contribute to the detected abnormalities:
Economic Changes: Shifts in the economy, such as changes in interest rates, employment rates, impact of oil price change, and overall economic growth, have significantly impacted real estate transactions.
Government Policies: The introduction of new government policies and incentives, such as subsidies for buyers and changes in zoning laws, has led to a dramatic change in transaction volumes.
Market Sentiment: One of our hypotheses is that the general sentiment about the future of the property market might have caused fluctuations.
New Developments: New developments and announcements of future developments in several districts have led to increased transactions as investors and homebuyers try to get in early.