1. Introduction
The purpose of the research is to explore how diversified decentralized cash systems are presented and characterized in the largest open-source knowledge base. During this study, a number of research questions (RQs) were raised. They are listed below:
Are the articles that describe cryptocurrencies within Wikipedia emotionally well balanced? To what extent are they neutral in their claims? (RQ1)
Is there any association between the sentiment of Wikipedia articles about crypto coins and their overall quality? (RQ2)
Do popular search engine statistics show patterns of interest similar to those of visits to Wikipedia articles about cryptocurrencies? (RQ3)
How can one model the popularity of particular cryptocurrencies based on demand for information on the Internet? Is it possible to track this popularity on a geographical basis? (RQ4)
Is it possible to link national attitudes towards the crypto economy with the popularity of cryptocurrencies in particular countries? How much does the legal approach towards cryptocurrency technology differ between countries? (RQ5)
Why are only several cryptocurrencies described in the world encyclopedia? How variable is the crypto economy subject matter presented in Wikipedia? (RQ6)
Nowadays, the huge amount of content created by individuals on the Internet supports research in different fields: medicine [1], tourism [2], marketing [3] and others. Internet users have various possibilities for creating such content. One of the most popular examples of such services is Wikipedia. This free collaborative knowledge base contains social and behavioral data, which have already proven to be useful for socio-economic forecasting [4]. For example, data from Wikipedia can be used for predicting movie box office success [5] or movements on the stock market [6].
The topic is important because both Wikipedia and cryptocurrency technology are now used on a mass scale and internationally. There is also a strong need for trustworthy and independent information about particular cryptocurrencies. Despite its popularity, Wikipedia was long perceived as a knowledge source of lesser reliability. This negative attitude towards its trustworthiness has evolved over time towards more positive opinions. One has to bear in mind that Wikipedia is one of the most important crowdsourcing web portals on the modern Internet. Therefore, it should be treated as a major means of modeling Internet users’ behaviors, especially since Wikipedia provides access not only to content but also to metadata about editing history, readers, editors’ actions and other potentially useful data.
Wikipedia articles often appear in the top positions of Google search results, and Google provides a dedicated tool for analyzing the popularity of search queries: Google Trends [7]. This tool was used as an additional source of data to analyze demand for information about cryptocurrencies in different countries and time periods.
The first cryptocurrency was launched at the beginning of 2009. It was designed to function analogously to normal (fiat) money. However, it had some exceptional features: it was completely digital, independent of third parties and anonymous to a certain degree. After a decade, together with other similar projects, it forms the basis of an international economic system. The size of this system is comparable to that of medium-sized national economies. More than two thousand cryptocurrency projects have emerged during this time. Nevertheless, only a small number of them are highly popular.
In this research, five relationships between information and metadata contained in Wikipedia and the particular instances of cryptocurrencies were studied. The described analytical instruments allow formulating assertions about the state of cryptocurrency technologies and the position of specific cryptocurrency projects in the realm of these technologies. Specifically, the study involves:
The quality of crypto coins and their descriptions in Wikipedia.
The extension of the formulation of the cryptocurrencies’ popularity model that allows the estimation of potential users in particular geographic locations.
The construction of a model that confronts the crypto coins’ popularity with the legality of their use in certain jurisdictions.
The analysis of Wikipedia articles’ dynamics and its comparison with numerous cryptocurrency features.
The contribution of this study consists of conducting a deletion analysis of cryptocurrency-related Wikipedia articles, preparing a sentiment ranking of these articles, extending the cryptocurrency popularity model proposed in [8] and providing a model that confronts national cryptocurrency popularity data with the legality of cryptocurrencies.
The methodological approach taken in this research follows design science research (DSR). DSR is a framework that systematizes the transition from theoretical background to materialized empirical artifacts. It is especially well suited to IT-related studies. Additionally, some statistical methods are employed as a supplement.
Wikipedia should be deemed an unbiased source of knowledge and information about cryptocurrency technologies. The idea behind every encyclopedia is to present facts about objects of diverse types in the form of articles. These articles reflect community knowledge from any domain. Compared with any other encyclopedic effort, Wikipedia is exceptional for several reasons. It is the largest international, free, open and collaboratively written web-based knowledge source. It has more than 300 language versions, which together contain more than 52 million articles. There is also an informative category describing cryptocurrencies and related concepts.
The details about Wikipedia treated as the source of information are covered in Section 3.1 and Section 4.1.1 of the article. Section 2 refers to other studies that may be considered relevant to the present one. Section 3.2 then gives an in-depth description of the technical side of data extraction from Wikipedia. All the mentioned analytical instruments are presented in Section 4 of the paper. The implications of the obtained results are discussed in Section 5.
2. Literature Review
There exists a small but constantly growing number of papers that involve both cryptocurrencies and Wikipedia. Some authors of these articles also turn their attention to other Internet knowledge and data sources. However, most of the earlier studies focus on the associations between market time series and particular indicators based on crowd media data (e.g., [9,10]). In contrast to these texts, this paper focuses on aspects of crypto coins other than strict market data. A similar approach was taken in a recent study presented in [11].
Figure 1 presents the distribution of articles related to the topics of interest of this paper. All the considered texts have been published since 2002, that is, shortly after Wikipedia was launched (2001). This is a demonstrative graph that is aimed at giving orientation in the numerical relations between scholarly contributions within the topics of Wikipedia and cryptocurrencies. The data come from the Scopus database. The articles devoted to Bitcoin started to appear in 2009. The third dataset represents the papers combining both the topic of Wikipedia and cryptocurrencies. The specifics of Wikipedia make it somewhat problematic to establish a precise number of relevant scientific texts. The reason is that Wikipedia may appear in scientific papers in two roles: as an object of research or as an information (reference) source. As the latter case is prevalent, the search results were narrowed to the title or abstract of the papers.
Researchers regularly use Wikipedia not only as a source of information but also as the main object of study. Research on Wikipedia is especially focused on the topics of information quality and the reliability of Internet services as an information source. The paper of Rowley and Johnson [12] is an example of a study that aims to recognize the important elements that form the basis of the process of assessing the trustworthiness of an Internet information source. This study consists of two phases, and Wikipedia was chosen as the object of examination. The same study also offers insights into how these elements are used to evaluate the web source. Most of the factors that were found are consistent with previous reports on the topic.
The authors of [13] present two main findings. First, they empirically show that Wikipedia is a very popular knowledge source for students. Second, the authors highlight that knowledge from the free online encyclopedia is used without any particular consideration of the quality of the obtained information. It is unlikely that students further verify the knowledge obtained from Wikipedia. This implies that the young generation often treats this Internet source as an ultimate trusted provider of knowledge, although this is a rather subjective view on their part.
The trustworthiness of Wikipedia is also acknowledged by another study [14]. This research is more precise in its assessment of the popularity of this source of knowledge. The authors indicate that a third of the tested college group extracted facts from this source of information. Although the paper does not deal with the issue of information verification, another text [15] suggests that some students read the content of the online encyclopedia critically. This observation is in contrast to the one mentioned earlier [13]. Another conclusion of [15] is that when the trustworthiness of an encyclopedic article was dubious, the study subjects used different quality measures to further determine the reliability of the content.
Data from Wikipedia can be used not only to analyze demand for information on specific topics but also to help predict the success of products and movements on stock markets. Mestyán et al. [5] presented a predictive model of the financial success of movies based on such measures as the number of authors and the pageviews of Wikipedia articles. Moat et al. [6] showed how Wikipedia data on user activity and article popularity can improve the decision-making process on the stock market. There is also a study based on Wikipedia usage that created a model for tourism demand forecasting [16].
Another source of data used in this study is Google Trends. It has been used to predict sales of different products: cars [17], telecom [18], fashion [19] and others. However, Google Trends gives only relative values of the popularity of search queries, on a scale from 0 to 100. Additionally, this tool has limitations on retrieving data for long periods of time and sometimes has problems with unifying queries across languages (when there are various spelling variants for the same topic). This last problem can be solved in Wikipedia by using the semantic connections between the language versions.
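The consequence of the relative 0 to 100 scale of Google Trends can be illustrated with a minimal Python sketch. It assumes that a series is simply rescaled so that its peak equals 100; the function name and the weekly counts are hypothetical, and the actual normalization applied by Google Trends is more involved and is not reproduced here.

```python
def to_relative_scale(counts):
    """Rescale raw counts so that the peak of the series equals 100.

    Simplifying assumption: only the peak-based rescaling is modeled,
    not the full normalization used by Google Trends.
    """
    peak = max(counts)
    if peak == 0:
        # A series without any queries stays at zero everywhere.
        return [0 for _ in counts]
    return [round(100 * c / peak) for c in counts]


# Hypothetical weekly query counts for two topics of very different volume.
popular_topic = [120, 300, 600, 150]
niche_topic = [12, 30, 60, 15]

print(to_relative_scale(popular_topic))
print(to_relative_scale(niche_topic))
```

Both hypothetical series map to the same relative values, which shows why such data allow the comparison of patterns over time but not of absolute demand.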
The article of Kristoufek [9] is one of the very few studies that compared Bitcoin market time series with Google Trends data and some Wikipedia statistics. The selection of the data sources is not coincidental. A few later studies, discussed below, process data from a similar or expanded set of sources. The finding of that research is that a correlation may be found between Bitcoin capitalization and Google queries. The importance of this report lies in its innovativeness: it opened a new line of inquiry into the relationships linking two significant Internet trends, the Social Web and cryptocurrencies.
The relations between Bitcoin and various other time-series datasets are also studied in [20]. In this research, the behavior of the cryptocurrency market for Bitcoin is compared with several traditional real-economy indicators as well as digital economy news feeds. The collection of processed datasets encompasses fiat currency exchange rates, stock exchange indices, social media news, as well as queries on Wikipedia, the Google search engine and Twitter posts. Numerous econometric techniques were employed during the research. The authors found a long-run negative relationship between the S&P500 and the Bitcoin value. The relation suggests substitution between stocks and Bitcoin as an alternative investment instrument. Investors allocate their capital according to the outlook for the global economy. Once the outlook is pessimistic, the alternatives grow in importance.
The approach presented in the mentioned articles ([9,20]) is also taken in the research of [21]. The latter paper is, however, much better grounded in economic theory. Besides considering a large number of determinants that may potentially influence both fiat currencies and crypto assets, the article fits the explanation of the obtained results into Barro’s well-established model for gold. The fundamental motivation for the research is to discover the forces behind the pricing mechanism. One of the specific factors that, according to the authors, should capture the investment attractiveness of the examined cryptocurrency is the daily number of views of the Wikipedia article about Bitcoin. The authors note, however, that such a measure has a flaw: it fails to differentiate the exact purpose of the demand for information. The motives of those visiting a particular article in the online encyclopedia may be manifold, and the visitors range from investors to (potential) technology users. According to the authors, long-run macro-financial developments have no measurable impact on Bitcoin prices. Nevertheless, certain factors, such as the demand-supply balance and Bitcoin attractiveness, have a considerable influence on the cryptocurrency market value. Another conclusion is that the strength of the correlation among the influential elements is not constant over time.
Kristoufek extended his initial research of 2013 in another article. Although the aim and motivation of the follow-up study [22] are different, the data used are very similar to those from the previous examination [9]. The main motivation for the 2015 paper is to assess the degree to which Bitcoin is a purely speculative asset and to identify the most important factors that affect the volatility of the Bitcoin price. The author categorizes these factors into technical, fundamental and speculative ones. The text comprehensively analyzes the possible ways in which these elements may affect cryptocurrency market prices. Not only are temporal changes taken into account, but a division into short- and long-term stimuli is also introduced. The latter is based on the signal frequency distribution and utilizes the continuous wavelets framework. Search engines are one of the groups of data series whose impact on the Bitcoin market is measured. The group includes two datasets: weekly Google Trends and daily Wikipedia visits, both accumulated to represent the term “Bitcoin”.
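Comparing weekly Google Trends values with daily Wikipedia visits requires bringing both series to a common frequency. The following minimal Python sketch sums daily page views into weekly totals and computes a plain Pearson correlation with a weekly series. It is only an illustration under stated assumptions: the function names and all numbers are hypothetical, and the Pearson coefficient is a simple stand-in for, not a reproduction of, the wavelet-based analysis in [22].

```python
from math import sqrt


def weekly_totals(daily, days_per_week=7):
    """Sum daily values into consecutive full weeks; a trailing partial week is dropped."""
    last_start = len(daily) - days_per_week
    return [sum(daily[i:i + days_per_week])
            for i in range(0, last_start + 1, days_per_week)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)


# Hypothetical data: three weeks of daily page views and three weekly
# search-interest values on the relative 0 to 100 scale.
daily_views = [10] * 7 + [20] * 7 + [40] * 7
weekly_interest = [25, 50, 100]

views_per_week = weekly_totals(daily_views)
print(views_per_week, pearson(views_per_week, weekly_interest))
```

Summing to weekly totals is one possible design choice; averaging within each week would give the same correlation, because the Pearson coefficient is invariant to rescaling by a positive constant.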
Social media were used for algorithmic trading on Bitcoin [23] and altcoin price prediction [24]. Some authors used sentiment analysis of microblogs related to cryptocurrencies [25,26,27].
Two main conclusions may be drawn from the quoted literature. First, a number of studies explore the interconnections between the cryptocurrency market and crowd media together with Google Trends. Wikipedia is among these prominent crowd media. This means that their authors are aware of the potential of these relationships as an object of study. Nevertheless, the studies conducted up to this point have merely scratched the surface of the possibly fruitful aspects that this source offers. At least two aspects are omitted by most studies. The first is the breadth of the cryptocurrency market, with its thousands of coins and tokens. The second is internationalization, which is a crucial attribute of the free online encyclopedia. As of 2019, there are more than 300 language editions, which are independently managed and which offer independent content.
Second, there is an ongoing discussion on the quality of articles and the credibility of Wikipedia as a source of knowledge. One thing is, however, beyond doubt: the critical attitude towards this issue has shifted over time. The encyclopedia community has made definite progress in improving its editorial standards, so that Wikipedia is now much more credible than it used to be. Unlike most similar works in this area, this study uses data from various language versions of Wikipedia. Moreover, a wider set of measures for analyzing cryptocurrencies was extracted.