1. Introduction
The video game market has become one of the fastest-growing digital industries: an estimated 2.3 billion gamers across the globe spent USD 137.9 billion on games in 2018 [1]. As one of the biggest game publishing platforms, Steam had 120 million monthly active users and over 50 thousand games in its catalog as of 2021 (https://backlinko.com/steam-users) (accessed on 1 October 2022). Furthermore, with the transition of the video game market from isolated local experiences to more networked ones, millions more users can access an expanded universe of games and virtual communities over the internet. In ever-competitive and expanding business markets, serving such a growing consumer base generates massive amounts of data. Therefore, data-driven analysis is becoming an essential tool for analyzing consumer behavior, which is helpful for game developers, marketers, and streaming platforms [1,2].
As a rapidly growing game publishing platform, Steam has become popular among both well-established game studios and independent game developers, typically known as “indie” developers (or indie studios). The shift of game production toward “free” and accessible all-in-one game engines has come to dominate the development of game products and services [3], allowing more flexible “open–close” production that is supported by the Steam platform (production is “opened” in that multiple professional and leisure-based game-making identities are shared, and “closed” in that it is adopted under platform governance policies, proprietary technical requirements, and multisided market strategies [3]).
Creating video games is a lengthy and demanding process [4], which can cause management and production problems in the same proportion [5], over-bearing and over-confident developers [5,6], and even the need for urgent updates (software updates that fix problems deemed critical enough not to be left unfixed until a regular update cycle) [
7]. As such, a game studio’s financial success often depends on providing exciting experiences and reaching a diverse audience, especially on a densely populated platform such as Steam. Such risks are relevant to both small and large development studios, highlighting the importance of knowledge support and of understanding the current market situation. Steam’s game reviews are one rich source of knowledge for discerning suitable monetization models [8] as well as the everyday needs of players and the flaws in existing games [9]. Nevertheless, an epistemological problem arises: developers must distinguish between reviews that actually contribute to improving the game and those that merely express subjective wishes, especially when community reviews are regarded as a form of user feedback.
Game distribution platforms, such as the Steam platform, are expected to provide continual improvement on the games owned by the players. In such a context, incorporating feedback from players is paramount [
9]. However, making sense of the overwhelming volume of data available on such a platform to discern useful or beneficial content is challenging, so valuable insights in the feedback are often overlooked by developers and publishers alike [
7,
9]. As such, helpful attributes on the Steam platform can be a valuable tool to uncover players’ underlying intentions and wishes while identifying beneficial insights into the fast-paced behavior of the digital marketplace.
Considering data analytics in game-playing, the motion-in-mind model has been used to induce subjective association based on the objective matrices in the game-playing process [
10]. By adopting the analogy of motion and physical phenomena, information progression from uncertainty to certainty can be modeled, representing the ratio of difficulty (or challenge) to solve such information [
10,
11]. Associating such information analysis with the empirical data from the Steam platform may uncover the underlying characteristics of players’ interactions and trends of their game content consumption.
Therefore, this study aims to provide an in-depth empirical analysis of Steam as a publishing platform and game. The data analyzed combine public data provided by the Steam Store with data from a third-party Steam statistics service, Steam Spy, which provides valuable metrics, including an estimate of the total number of owners of each game on Steam. Moreover, one feature of the Steam platform is Steam Achievements. Developers can set in-game goals for players to achieve, and players are rewarded with an achievement emblem that is showcased on their Steam profile. Achievements can also serve as extrinsic motivation for players to explore a game’s content [
12]. Characterized by Steam games’ achievements and analysis of such achievements from the motion-in-mind perspective, this study attempts to address the following four research questions (RQs):
- Q1: What kinds of games are released and what kinds of multiplayer support are available on the Steam platform?
- Q2: How are ratings affected by game prices on the Steam platform?
- Q3: What kinds of developers and publishers are dominant on the Steam platform?
- Q4: Do Steam achievements affect game ratings and the types of games?
2. Related Works
In recent years, Steam has become a popular digital game distribution platform that has drawn much attention from academia. Developed by Valve Corporation and officially released on 12 September 2003, Steam offers services related to digital distribution, digital rights management (DRM), multiplayer gaming, and social networking [13,14], and it has since become the world’s largest gaming platform. Various game genres are listed on Steam for digital management and distribution, including first-person shooters (FPS), role-playing, racing, and even independent (i.e., indie) games. Steam is cross-platform as it supports multiple gaming environments [1]. Steam users interact with it via a local Steam client, available for operating systems such as Windows, Mac, and Linux.
Games can be purchased from the Steam Store or third-party vendors, which are then activated through the Steam platform and playable after logging in on Steam using the Steam client [
14]. Payments are available in various currencies and licenses are registered to the user library [
13]. Game ownership (or license) and updates will be automatically verified and installed since it is mandatory to play a game through Steam. Users can download and enjoy games from their library with their account information anywhere and at any time with ease.
Several veins of research have been conducted on the Steam platform and database. For example, Windleharth et al. [15] presented a conceptual analysis of all user-generated tags applied to video games in the Steam distribution system: the tag categories were sorted and compared to a video game metadata schema, and emergent terms beneficial to players were presented and discussed to uncover issues in organization and their implications for future work. Then, Li and Zhang [16] proposed an alternative approach to understanding video game genre classification via a preliminary network analysis of the user-generated game tags on the Steam platform using centrality analysis and community detection. Such an approach is intended to lay the ground for and encourage further investigation of the intertwined connections between genres, inconsistently defined abstraction levels, and different user focuses.
Meanwhile, some researchers have focused on the reliability of game reviews and their implications for game development. For instance, Kang et al. [
13] identified factors affecting the helpfulness of reviews uploaded by users on the communities by analyzing unrefined game data via data mining techniques, such as classification and regression trees (CART) and a multi-layered perceptron network, to predict the most significant variable in the Steam community’s game reviews. Moreover, Eberhard et al. [
17] analyzed a Steam game reviews dataset to discover the aspects of review usefulness from the community point-of-view. They extracted the text bodies from the review, obtained 58 different features from the reviews and separated them into three categories (unhelpful, helpful, and top review) based on the number of votes they received from the community. They found that reviews with more helpfulness votes tend to be longer, use more complex language, and express more negative sentiments as well as being more critical toward the product and going into greater detail about the individual aspect of the game. While these reviews tend to be longer or critical about the product, Eberhard et al. [
17] also found that there exist reviews with a large number of helpfulness votes but short or meaningless text, where the votes derive from humor or from the author being a popular personality. Meanwhile, Lin et al. [
14] performed an empirical study on game reviews on the Steam platform to obtain a better understanding of the user-perceived quality in games, analyzing the number and complexity of reviews, the type of information provided in the reviews, and the number of playing hours before posting a review. It was found that both positive and negative reviews provide helpful insights and that their association with playing hours differed between dimensions of game types while being distinct from mobile app reviews. Busurkina et al. [18] used netnographic research with a Structural Topic Model (STM) to evaluate the game-playing experience based on player reviews on the Steam platform. Seven dimensions were identified, which can be disentangled to generate more knowledge on the evaluation processes and the game itself. The findings extend the comprehension of consumer retention mechanisms and provide a better understanding of users’ motives and criteria in comparing games.
From another perspective, some work has also explored user profiling to determine players’ specific behavioral characteristics or to personalize content delivery. The first comprehensive analysis of hardcore gamer profiling was conducted on a dataset of over 100 million Steam platform users, with over 700,000 hardcore players (users playing more than 20 h per week). It covered over 3300 games and used a k-means clustering algorithm to determine the specific behavioral categories of hardcore players [
19]. The results identify six hardcore gamers’ behavioral clusters, where some were related to the sense of motives, consciousness, and openness to experience depending on the game genres played. Meanwhile, Li et al. [
12] uncover the underlying structures of the Steam user profiles using exploratory factor analysis to define the player’s preferences and personalized behavior characteristic of the Steam community. Finally, Vihanga et al. [
20] conducted a study to explore player population fluctuations within online games to identify weekly seasonality, archetypal weekly population patterns, and the relative frequency of these patterns from extensive Steam player population data covering 1963 games. The study identified that 77% of games displayed a recurring weekly pattern, clustered into nine diverse weekly player population fluctuation patterns. Of the nine clusters, the two highly similar dominant clusters indicated that most games display a weekly pattern in which the player population increases toward the weekend.
Other aspects of the players were also explored in conjunction with the market influences, business model, and decision-support system. Toy et al. [
1] discovered patterns among game ownership, genre, and geographical region from a vast Steam database via basic Heat map and clustered Heat map analysis. The result analysis revealed several interesting patterns, trends, and correlations of popular genres in the gaming industry (i.e., action games), shifting of current market practice and strategies (i.e., early access), and potentially leading to improved markets, business models, and a more responsive market in general. Ranti et al. [
21] proposed a k-prototypes algorithm that integrates both k-means and k-modes algorithms to cluster mixed numeric and categorical attributes of Steam’s user behavior telemetry data (40% or more of their total accumulative playtime) from the World of Warcraft game, resulting in three groups with a total of 15 clusters. It was found that there is a good correlation between sales data from the sample and actual sales data reported by game development companies. The analysis also provided better insights into the play patterns of the games bought and played by Steam users, patterns about the users themselves, and the importance of differentiating users (i.e., doubling the player base does not double the revenue). Furthermore, Wang et al. [
22] proposed a solution for a new video game recommendation system for the Steam platform called STEAMer, which utilizes the Steam user data and applies additional user data in conjunction with a deep autoencoder learning model to generate potential recommendations. Performance evaluation included comparing STEAMer with a baseline deep neural network-based system. The results showed that adding additional public Steam user data has a noticeable and positive effect on the game recommendations with a noticeable increase in the test metrics over the traditional deep neural network using the same features. Furthermore, Ahmad Kamal et al. [
23] implemented genre-based and topic modeling approaches in a recommender system to predict the ratings of games using a public Steam dataset. Although the results show that the genre-based model outperforms the topic modeling model, it does not outperform the models from previous research. Therefore, they concluded that genre is not a suitable parameter for recommending games.
Other related research on the Steam data and platform includes determining the network feasibility of the Steam In-Home streaming services in comparison to the regular network infrastructure [
24], discovering security vulnerability to serve as a guideline for computer forensics for Steam game platform [
25], and determining the impact of shifts of business model changes [
8], based on the analysis of Steam review data. However, few studies of the Steam platform have focused on game-level analysis, which provides valuable knowledge and intuitive insights for game developers.
Studies on game-level analysis were also conducted on the Steam platform focusing on different perspectives. Some studies explored the Steam platform in conjunction with other distinct platforms (i.e., Twitch.tv that focuses on streaming) and their influences on a specific game experience. For example, Gandolfi [
26] visualized the dynamics and trends of game platform analysis mediated by Gephi, a network-oriented software, on the role-playing game
Dark Souls 3, along with an exploratory counter-example using the action game
The Division. It was found that such a media trend generates two different reactions: a positive one when the game is no longer just a game but a performance to watch and a negative one when the interactive affordances were questioned. Meanwhile, Lin et al. [
27] conducted an empirical study on the characteristics of 1182 Early Access Games (EAGs), analyzing the interaction between players and the developers of EAGs, the Steam platform during and after leaving the early access stage, and the tolerance of players of the quality of EAGs. The study found that EAGs tend to be “indie” games (adopted by smaller development studios) and that fewer reviews were written during the early access stage than after leaving it, whereas the ratings showed the opposite pattern. Bailey and Miyata [
28] conducted data mining on the Steam “achievement” data of the video games in the Steam platform to discover trends in the game completion rates and correlated it to the factors outside of the game’s length. The study found that the completion rate can indicate the rate of players completing the game content and provide a benchmark for future scoping decisions in individual projects, which influences game development decisions and success rates.
Li et al. [
29] analyze and evaluate the playability of video games by mining players’ opinions from their reviews guided by the game-as-system definition, where sentiment analysis, binary classification, multi-label text classification, and topic modeling are sequentially performed. A total of 99,993 player reviews on the Steam platform were evaluated, which focused on the collective opinions relative to the maintenance and evolution of video games and helped game developers to understand it. Ullmann et al. [
30] investigated the aspects that describe a high-rated game through 200 video game projects on the Steam platform. Though genre, graphical perspective, game modes, and platforms do not correlate with the ratings of the games, the study found that games from smaller teams are often linked to higher ratings. Additionally, they analyzed post-mortems discussed by the developers of high-rated games. Furthermore, the study in [31] predicted whether a game on Steam is on discount using machine learning methods applied to data collected from the Steam database. The study compared Logistic Regression and Random Forest Classification and concluded that Random Forest reached the top performance with 79.5% accuracy. Such a model would benefit players by allowing them to purchase a game at the right time and save money, and game publishers by helping them optimize their discount strategies. Meanwhile, Badoni et al. [
32] conducted an observation based on a survey from 315 participants regarding which aspect of the game (graphics, gameplay, mechanics, and audio) is the most attractive among various desktop and mobile games. Based on the survey result, gameplay and graphics are mutually beneficial. In summary, the related works and their contribution summary relative to the current study are provided in
Table 1.
Drawing on physical motion, the motion-in-mind concept adopts a motion formulation to describe the entertainment aspects of games from the objective and subjective standpoints [
11], based on the basic assumption of move selection, game progression, and the ratio of winning (
v) and challenge (
m) being equalized (
) [
10]. The motion-in-mind model has previously been adopted to identify meta-gaming elements from the perspective of game evolution and its influence on culture [
33], linking entertainment with the game-tree search processes [
34], educational structure [
35], process fairness [
36], defining game features that cause it to be addictive [
37], and bridging physical comfort to comfort in the mind [
38]. More recent work adopts the motion-in-mind concept to identify the mechanisms to retain entertainment in long-term arcade games [
39] and key entertainment aspects (challenge, anticipation, and unpredictability) between different games in the
God of War series [
40]. These previous studies showed that the motion-in-mind concept provides a versatile metric that is suited for analyzing various meta-gaming aspects of the Steam publishing platform in addition to other conventional analyses, which serves as the primary motivation of this study.
5. Discussion
Games’ ratings were analyzed through the motion-in-mind $v$ value, measured using the model $v = G/T$, where $G$ is the number of positive ratings and $T$ is the total number of ratings a game received.
Table 8 shows the correlation values between the games’ rating (
v) to their Steam market stats based on the developer type (indie and non-indie developers). As shown in
Table 8,
v is positively correlated with the global achievement percentage, while it is negatively correlated with the number of days since the game was released. This implies that the longer a game has been released, the more likely it is to have a lower positive response from players, leading to a loss of popularity. In a sense, the
v captures a positive response (rewards) from their overall player base (total attempts). Therefore, when a game receives a high
v value (
), it puts the game in an advantageous position where it can lead to higher popularity and more players being attracted to play the game, including exploring its additional contents (games’ achievement). Meanwhile, when the game is of low
v value (
), it puts the game in a disadvantageous position, causing it to lose its current players and future players that might be attracted to buying/playing the game. Therefore, the
v value of a game fluctuates as the developer updates their games.
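The rating measure and the correlations discussed above can be illustrated with a minimal sketch. It assumes that $v$ is the ratio of positive to total ratings and that Pearson's coefficient is used (an assumption, since the coefficient type is not restated here). The records are made-up placeholders rather than values from the collected data, and the function names are hypothetical.

```python
import math


def rating_ratio(positive, total):
    """Rating measure v, assumed here to be positive ratings over total ratings."""
    if total <= 0:
        raise ValueError("total ratings must be positive")
    return positive / total


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical records:
# (positive ratings, total ratings, mean global achievement %, days since release)
games = [
    (900, 1000, 45.0, 200),
    (700, 1000, 30.0, 900),
    (400, 1000, 12.0, 2500),
    (850, 1000, 38.0, 400),
]

v_values = [rating_ratio(g, t) for g, t, _, _ in games]
achievement_pct = [a for _, _, a, _ in games]
days_released = [d for _, _, _, d in games]

# Sign of each coefficient mirrors the kind of relationship reported in Table 8.
corr_achievement = pearson(v_values, achievement_pct)
corr_age = pearson(v_values, days_released)
```

With these placeholder records, the first coefficient is positive and the second is negative, which is the pattern of signs described for Table 8; the magnitudes carry no meaning.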
Table 4 showed a positive correlation between the game price and the total rating received by a game. This condition implies that the higher the game price (higher price tag), the more ratings the games tend to receive from the players (more people will buy the game). However,
Figure 11 shows that there are peak points of game prices for each game type from different developer types. For instance, once a game is listed with a price tag above its peak point, it might not sell as well as other games with a lower price tag because players feel it is too expensive or overpriced. Another insight that can be gained from
Figure 11 is that the game developers can strategize how they will price their games to sell them or attract players to buy them efficiently. For example, they can sell their single-player game for USD 20 as their regular price tag and USD 15 as their
discounted price tag when they first release their game or during a Steam sales period to attract players. Another strategy that multiplayer games can use is transitioning to a free-to-play (F2P) business model to attract players, as seen with
Counter-Strike: Global Offensive (CS:GO) in Cluster 5, which significantly increased their player population [
8] and can lead to more revenue when players spend in their microtransactions scheme.
Moreover, a regression analysis was conducted in the Orange Data Mining (https://orangedatamining.com/) (accessed on 1 October 2022) software using its built-in Random Forest algorithm. First, the analysis was set up between the dependent variable (rating total) and the independent variables (game price, game type, number of achievements, Steam rating, rating ratio, and indicators of whether the game has achievements, is released through a publisher, is an indie game, is a free-to-play game, and is a game from a new developer) of the collected Steam data. Next, outlier games were removed using Orange’s built-in outlier detection feature with the Covariance Estimator method; 4655 outlier games were removed from the data. Then, the sample data were split with a train-to-test ratio of 80:20. Based on the result in Table 9, the $R^2$ value indicated that the independent variables explain 75% of the variance of the dependent variable. In addition, both the mean absolute error (MAE) and the root mean squared error (RMSE) were relatively low, at 261.983 and 3582.095, respectively, which are lower than the total rating’s average value of 814.45 and standard deviation of 10,106.77, respectively.
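For readers who want to reproduce the evaluation step outside Orange, the following is a minimal sketch of an 80:20 split and of the three reported metrics in plain Python. It does not reproduce the Random Forest model or the outlier removal. The function names, the random seed, and the toy numbers are hypothetical, and the formulas are the standard textbook definitions, which are assumed to match those used by Orange.

```python
import math
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and split it into 80% train and 20% test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Toy example: true total ratings versus hypothetical model predictions.
y_true = [100, 200, 300, 400]
y_pred = [110, 190, 320, 380]
metrics = regression_metrics(y_true, y_pred)

train_rows, test_rows = split_80_20(range(10))
```

Because RMSE squares the errors before averaging, it is more sensitive than MAE to a few badly predicted games, which is one reason to compare MAE against the mean of the target and RMSE against its standard deviation.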
In terms of game types, a difference between single-player and multiplayer games can be observed: multiplayer games tend to have lower achievement percentages. One possible explanation is that, in single-player games, players have more space or opportunities to explore the game’s additional objectives and obtain in-game achievements beyond the main objectives. On the other hand, players of multiplayer games are occupied with competing or cooperating with other players to reach the game’s primary objective. Moreover, multiplayer games were preferred by non-indie game developers since they attracted a more extensive player base and, consequently, required fewer releases throughout the years.
Moreover, it can be implied that the Steam platform indirectly played a role in gamifying the game releases and game ratings by providing a meta-game solution to well-known game studios and indie games.
Figure 12 showed the boxplot and scatter plot of the positive rating ratio against the total rating of the collected Steam games. By considering the
$v = G/T$ model, where $G$ is the number of positive ratings and $T$ is the total number of ratings a game received, Figure 12a showed that the majority of games, regardless of which cluster they belong to, were
(except for some outliers). These findings substantiated that the Steam platform is still the biggest and most popular digital game distribution platform compared to other similar platforms for players and developers (such as Epic Games and itch.io). In addition, the outlier games (
) that were shown in
Figure 12b implied that these outlier games are less popular or low rated (low
v, which also implies low
values).
In diverse situations, the clustering results revealed that a unique solution existed that allowed for a different manipulation of the Steam platform to maintain the sustainability of the game studio. For instance, the
achievement spam game takes advantage of the achievement system to attract players to the game regardless of the content [
60]. It also acts as an additional objective that a player can achieve outside of the intended content of the games, mainly to retain interests and continuity of the platform. Nevertheless, proper moderation is crucial to balancing and maintaining players’ interests on the platform.
From the developer’s perspective, the ratings and achievements of the Steam platform also serve as indicators for developers’ decision-making and risk assessment. The Steam platform helps indie developers build a reputation and user acceptance for their game releases. On several occasions, there have been indications that players care less about the type of developer (indie or non-indie) and more about whether the developer works with a publisher. Moreover, it can be implied that fast-paced developers under a small studio or publisher would earn more revenue and be better received by Steam platform users by releasing single-player games. In contrast, a large development company backed by well-known publishers would be better off focusing on multiplayer game releases to take advantage of the platform achievement features while maintaining continuity by incorporating downloadable content (or DLC) [
62].
From a business point-of-view, the Steam platform, besides being an instrument of monopoly, capitalism, commodification, and ecosystem of digital products (for instance, games) [63], not only provides a channel that connects developers and players via a gamified platform but also maintains dynamic interactions between them via constant engagement and structural processes. As such, the platform becomes a meta-game for developers and publishers to bridge the experience needed by the players. In addition, the Steam platform also provides a gaming experience beyond the game itself, a notion known as “extraludic” [
46,
47] while laying out the opportunity for learning and value-added social interactions by acting as both mechanical and social metagaming solutions [
64].
Finally, related to the analysis of Steam achievements, it can be implied that its proper implementation relative to the in-game and out-game contents regulates and drives a new form of “game economics” (cf. [
65]); thus, developers and publishers may take advantage of it as a form of gameplay activity or game experience to generate revenue. Although there have been some concerns regarding addiction and betting on the Steam platform [
63,
65], it does provide a unique opportunity to introduce a novel business model that emphasizes player experiences rather than exploiting the Games as a Service (GaaS) model just for revenue generation [
62]. Furthermore, prices and ratings ultimately make or break certain games, as found in the Steam platform analysis, highlighting the importance of cost transparency and cross-linkage to maintain existing players and attract new ones (cf. [
62,
66]).
In light of these research findings, the need for a harmonic balance between the play experience (of players) and the revenue generation (of developers and publishers) could be achieved via an experience-driven business model implemented in the context of the digital games distribution platform (
Figure 13). In this model, digital badging (such as Steam game achievements and ratings) would be combined with cross-linkage services (such as chat rooms, social media interfaces, and reviews) to dynamically compute aggregated scores (such as the motion-in-mind model) that rank the “expected experience” players can perceive from a game listing. However, such a business model requires some investment (cost and time) from developers and publishers. Instead of directly releasing the game, they have to undergo a moderation process (by the platform provider) to incorporate the appropriate digital badges and cross-linkage services. Finally, the aggregated metrics would be updated periodically when new data from the digital badges and cross-linkage services are acquired. The envisioned business model would enable game developers to improve the visibility of their games and direct them to their targeted users, enable Steam to provide more personalized game recommendations to Steam users depending on the types of games they are interested in (e.g., based on the game’s ratings, reviews, achievements, etc.), and, finally, improve user satisfaction, as users can discover games that they will enjoy.
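The periodic update and ranking step of such a model could be sketched as follows. This is only an illustration under strong simplifying assumptions: the aggregated score is reduced to the rating ratio (positive over total ratings), whereas the envisioned model would combine further badge and cross-linkage signals. The class name, function name, and sample listings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GameListing:
    """Running rating counts for one game listing (hypothetical structure)."""
    name: str
    positive: int = 0
    total: int = 0

    def update(self, new_positive, new_total):
        """Fold a newly acquired batch of ratings into the running counts."""
        if not 0 <= new_positive <= new_total:
            raise ValueError("positive ratings must lie between 0 and the batch total")
        self.positive += new_positive
        self.total += new_total

    @property
    def score(self):
        """Aggregated score, assumed here to be the positive rating ratio."""
        return self.positive / self.total if self.total else 0.0


def rank_listings(listings):
    """Order listings from highest to lowest aggregated score."""
    return sorted(listings, key=lambda listing: listing.score, reverse=True)


listings = [GameListing("Game A", 80, 100), GameListing("Game B", 45, 50)]
before = [listing.name for listing in rank_listings(listings)]

# A new batch of ratings arrives for Game A and the ranking is recomputed.
listings[0].update(120, 120)
after = [listing.name for listing in rank_listings(listings)]
```

Keeping running counts instead of a stored ratio means each periodic update only needs the new batch of data, not the full rating history.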
6. Limitation
The limitations of this research include its focus on games with in-game achievements released between 2006 and 2022 on Valve Corporation’s Steam platform, which excludes games without in-game achievements. The primary focus was on correlations in the collected data rather than causality. Furthermore, the data collected were mainly focused on game prices, categories, achievements, ratings, releases, developers, publishers, and other essential auxiliary labels (such as game tags, appids, names, dates, etc.). Due to the confidentiality of game data, the exact number of owners of each game was not accessible. In this research, the total ratings were used as a proxy for the number of players who own the game. Since only users who own a game in their library can review it on the Steam platform [67], the actual number of game owners can be higher because not every player writes a review of the games they purchased (or played).
In addition, the measure of v, based on the rating ratio a Steam game received, indicated the competitive comfort that the Steam platform presented to game players and game developers. However, such a method was based on the assumption that the level of a user’s ability was based on the objective and subjective analysis of popular board games. Therefore, game designers and developers should take these findings with a grain of salt.
This research analyzed the global achievement percentages of the achievements that exist in the games. Therefore, we do not differentiate the difficulty of each achievement in the games. However, there is always the possibility that completion rates could differ for games that force players to choose higher difficulty modes for obtaining achievements. Without an achievement to signify completion at the easier difficulty, there is no way to count the ratio of players who finish the game in the more accessible mode.
Moreover, all the games on Steam are digital downloads, so completion rates could differ from those of games purchased on a physical medium. Furthermore, the research covered both games that require money to purchase and “free-to-play” games. Therefore, content usage will still be an issue for those games. However, the business model involved is different enough to warrant a separate investigation, particularly concerning mobile games, such as those released on Apple’s App Store or Google Play.
Since the Steam platform provides digitally downloaded games, the nature of the games and their players may differ from that of other existing or competing platforms, such as Sony’s PlayStation, Microsoft’s Xbox, Nintendo’s Wii and Switch hardware, and so on. Finally, the present research is empirical by nature, and causation between factors cannot be precisely determined, as other independent factors may be at play. Therefore, this research focuses on the insights obtained from examining the three data repositories (Steam store, SteamSpy, and Steam IUserStats) via the publicly available Steamworks and SteamSpy application programming interfaces (APIs).
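As a rough illustration of how such data can be gathered, the sketch below builds request URLs for the two public APIs and averages the global achievement percentages of one game. The endpoint paths and the response layout reflect the publicly documented Steam Web API (`ISteamUserStats`) and SteamSpy API as best understood here and should be checked against the current documentation; they are not confirmed to be the exact calls used in this study. The helper names and the sample payload are hypothetical, and no network request is made.

```python
import json
from urllib.parse import urlencode

# Assumed public endpoints; verify against the current API documentation.
ACHIEVEMENTS_URL = (
    "https://api.steampowered.com/ISteamUserStats/"
    "GetGlobalAchievementPercentagesForApp/v0002/"
)
STEAMSPY_URL = "https://steamspy.com/api.php"


def achievements_request_url(app_id):
    """URL for the global achievement percentages of one app."""
    return ACHIEVEMENTS_URL + "?" + urlencode({"gameid": app_id, "format": "json"})


def steamspy_request_url(app_id):
    """URL for SteamSpy's per-app details (ratings, owner estimate, price)."""
    return STEAMSPY_URL + "?" + urlencode({"request": "appdetails", "appid": app_id})


def mean_achievement_percentage(payload_text):
    """Average the per-achievement global percentages in a JSON response.

    Assumes the layout {"achievementpercentages": {"achievements": [...]}},
    where each entry carries a "percent" field; returns None if the list is empty.
    """
    payload = json.loads(payload_text)
    achievements = payload["achievementpercentages"]["achievements"]
    if not achievements:
        return None
    return sum(float(a["percent"]) for a in achievements) / len(achievements)


# Hypothetical payload standing in for a real API response.
sample_payload = json.dumps({
    "achievementpercentages": {
        "achievements": [
            {"name": "FIRST_WIN", "percent": 60.0},
            {"name": "ALL_LEVELS", "percent": 20.0},
        ]
    }
})
mean_pct = mean_achievement_percentage(sample_payload)
```

Averaging treats every achievement equally, which mirrors the limitation noted above: the difficulty of individual achievements is not differentiated.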