1. Introduction
Financing a novel idea or project is not easy nowadays. The entities capable of giving financial support to new projects, far from facilitating access to credit, use their administrative tools to make sure that each project carries minimal risk, since low credibility signals a likely lack of return on investment. In this way they avoid liquidity problems during the project’s life cycle [1].
The advance of information and communication technologies allows access to unlimited and immediate sources of information, multiplying the impact of any activity simply by disseminating it across the appropriate forums, social networks or specific platforms [
2]. This new environment is particularly suitable for giving access to resource acquisition and improving the market for projects that would otherwise remain stagnant or backward [
3].
“Crowdfunding” has emerged as a way of financing ideas and turning them into projects, especially those difficult to finance because of their innovative character. Anyone can contribute economically, either to obtain a reward, or altruistically–for the satisfaction of having collaborated on a project. Thus, a double objective is achieved: obtaining the finance for its development and simultaneously gaining clients, since client and promoter are an indissoluble part of the method in this type of project.
By means of crowdfunding, unlike the traditional sale of products, the necessary funds are advanced, which reduces risk and makes the project development possible. The contact is carried out through specific platforms that connect people interested in a particular type of product with entrepreneurs offering them. Otherwise, they would not come into contact, causing unmet needs and unrealized expectations [
4]. So, a crowdfunding campaign is considered “successful” when the project reaches the minimum necessary funds (budgeted in advance) and is usually considered “very successful” if the funds reached exceed the budgeted amount.
The rise of these platforms has been unquestionable since 2006, when the Sellaband platform was created to raise funds for musical projects. It was the first online platform with an interface like those known today. Since then, the number of platforms has not stopped growing: 2011 ended with 348 platforms [5] and 2017 closed with more than 2000 active platforms [6] and a large amount of money raised. Kickstarter, the industry’s most popular platform, has accumulated a total revenue of just over $4 billion [7].
Despite these positive data, crowdfunding has not reached maturity, since the failure rate is not improving over time. According to a study by Professor Mollick, based on an extensive database of Kickstarter projects between 2009 and 2012, the average success rate during that period was 48.1% [8]. The figures that the platform reports in its own statistics are even worse: success rates of 43% and 46% in 2010 and 2011, respectively, falling to 36% and 32% in 2015 and 2016. Moreover, 2016 stands out as the first year with a decline in the number of projects proposed for funding, 14.5% fewer than the previous year [9]. The success rates of Indiegogo, the second most popular crowdfunding platform, are even lower: it closed 2015 with a success rate of less than 10% [10].
The sudden popularity of crowdfunding, attributed to the impact of the most successful projects, has led to this model being generalized to any new project, with the risk of a poor fit. In spite of the improvements introduced in the platforms, the apparent ease of obtaining financing leads creators to pursue this model even when the conditions for success are not met. To increase success, platforms try to avoid projects with certain associated risks by frequently updating their standards, as they are the party most interested in managing successful projects that make a profit [11].
Leaving aside considerations of taste, aesthetics or fashion, it is essential to know, both for platforms and for entrepreneurs, which criteria represent a greater guarantee of success, in order to avoid unnecessary work, resource consumption and bad image.
The greatest uncertainty for a creator of these projects is whether the required financing will be secured, due to the lack of reliable methods or tools to direct or reorient a project towards success, as discussed in Section 2.2. The main motivation for this study is the search for a tool based on the key factors of crowdfunding that allows a creator to adapt or modify the trajectory of a project based on a series of “pattern” scenarios with known characteristics. This interest is also stimulated by the absence of any tool capable of identifying which factors should be modified to steer a particular type of project towards success and away from failure. This contribution not only benefits the sponsors of the project by ensuring a product, but also benefits the creator and the platform that hosts the project, contributing to the sustainability of project funding. One of the foundations that allows the sustainable growth of this financing model is the constant search for satisfaction in both directions, through the reciprocity provided by this environment [12].
To solve this problem, we performed a retrospective analysis or AAR (After Action Review) of the project results, whether successful or failed, following the guidelines of the Project Management Institute (PMI), to determine the causes. Knowing how, why or what decisions have been taken to achieve the success of the project is crucial if we are to standardize the actions that ensure success or keep the project from failure [
13].
We use data mining tools on the data set described in Section 4.3 to induce new knowledge from a historical compilation of other projects [14]. For this purpose, some preliminary steps of exploration and cleaning are necessary in order to limit the problem to our needs. An AAR analysis is then performed on the data resulting from the closing of projects in crowdfunding environments.
This process will allow for the grouping of common factors in successful and failed projects, as shown in
Section 5.1 and
Section 5.2, giving rise to six different types of project stereotypes, characterized by their most influential factors. The advantage for the creator of a project is clear, as comparative models are provided on which to associate any crowdfunding project, discussed in
Section 6, allowing us to set a strategy and modify the project variables based on the most appropriate model, even after the funding period has started.
5. Clustering Methodology
5.1. Justification of Use
The generalization in data collection allows the development of sophisticated techniques for extracting knowledge from information and data. These techniques are embedded within the KDD (Knowledge Discovery in Databases) analysis stage and are known as data mining techniques. The use of these techniques enables subsequent clustering of projects for labeling, following the principle of “maximizing the intraclass similarity and minimizing the interclass similarity” [
14].
This approach relies on a Self-Organizing Map (SOM), a type of artificial neural network. Each project is presented as an input vector and, through a competitive process among the neurons, each neuron acquires a vector called a centroid, which is representative of all the projects mapped to that neuron.
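The competitive process described above can be sketched in a few lines. The following is a minimal, illustrative self-organizing map in plain Python, not the implementation used in this study: the grid size, learning rate, neighbourhood width, number of epochs, the names `train_som` and `euclidean`, and the toy data are all assumptions chosen for readability.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM; returns {(row, col): centroid vector}."""
    rng = random.Random(seed)
    sigma0 = sigma0 or max(rows, cols) / 2
    # Assumption: each centroid starts as a copy of a randomly chosen sample.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1 - frac)                   # decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)   # shrinking neighbourhood
            # Competition: the neuron whose centroid is closest to x wins.
            bmu = min(weights, key=lambda n: euclidean(weights[n], x))
            # Cooperation: the winner and its grid neighbours move towards x.
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights

# Hypothetical, already normalized project vectors (two attributes each).
data = [(0.1, 0.2), (0.15, 0.25), (0.2, 0.1),
        (0.8, 0.9), (0.85, 0.8), (0.9, 0.95)]
som = train_som(data, rows=2, cols=3)
```

Because every update moves a centroid part of the way towards a sample, the centroids always stay within the range spanned by the data.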
Subsequently, in order to group the input into k sets, the “K-means” algorithm is chosen as the most appropriate clustering model due to its non-hierarchical nature [54]. The Davies–Bouldin index is used to identify the optimum value of k. This index relates the dispersion of the cases within each cluster to the separation between clusters, so it decreases as the items in each cluster become more homogeneous [55]. Following this criterion, the seventy neurons are distributed into six clusters, k = 6 being the optimal value.
In
Figure 2, each of the six clusters obtained is shown by colors. This is the result of applying the K-means technique on the grid of representative centroids identified using SOM.
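The selection of k described in this section can be illustrated with a small sketch. The code below runs a plain K-means over a set of vectors standing in for the neuron centroids and scores each candidate k with the Davies–Bouldin index. It is a sketch under stated assumptions rather than the procedure actually used: the deterministic farthest-first seeding, the function names and the toy `neurons` list are hypothetical, and clusters are assumed to have distinct centroids.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from the seeds chosen so far (an assumption)."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points,
                         key=lambda p: min(euclidean(p, s) for s in seeds)))
    return seeds

def kmeans(points, k, max_iter=100):
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                      for p in points]
        if new_labels == labels:   # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:            # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return labels, centroids

def davies_bouldin(points, labels, centroids):
    """Mean over clusters of the worst (scatter_i + scatter_j) / distance_ij."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        scatter.append(sum(euclidean(p, centroids[j]) for p in members)
                       / len(members) if members else 0.0)
    worst = [max((scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

# Toy stand-in for the grid of neuron centroids (hypothetical values).
neurons = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 0), (10, 1), (11, 0), (11, 1),
           (5, 10), (5, 11), (6, 10), (6, 11)]
scores = {}
for k in range(2, 5):
    labels, centroids = kmeans(neurons, k)
    scores[k] = davies_bouldin(neurons, labels, centroids)
best_k = min(scores, key=scores.get)  # candidate k with the lowest index
```

On this toy input the three visibly separate groups give the lowest index, so `best_k` is 3; with real neuron centroids the range of candidate k values would need to be wider.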
5.2. Analysis of Success and Failure
Starting from the neuron grid generated for the k-means cluster map, the number of projects labeled as success or failure is displayed. To allow the understanding of the chart, the size of each cell has been drawn proportionally to the number of projects it contains. In this way,
Figure 3 shows the two projections made on
Figure 2, one of success in green and failure in red. By superimposing these on the k-means cluster map it is possible to identify which zones or which clusters have more success or failure.
When comparing the two charts, one area is of high interest, since it contains a higher number of successful projects and a reduced number of failed projects: the area in the lower left of the chart. This section is defined as a “Success Area”, and can be associated with clusters 3 and 2 in Figure 2. Together, these clusters contain 43.64% of all successful projects, with 76% success and only 24% failure, which clearly differs from the success–failure ratio in the initial data set.
Likewise, another section of the chart can be defined as a “Failure Area” (top section), that includes clusters 1 and 5 where 58.90% of all failed projects are located, with a 40% success rate and 60% failure rate, representing the inverse behavior to the initial set.
5.3. Cluster Taxonomy
Once the neurons that belong to each cluster have been identified, a first characterization of clusters is carried out based on the number of projects they contain and the proportion of successful to failed ones, which authors commonly call the success rate.
Table 4 shows the projects classified by cluster and success rate. Clusters 2, 3 and 4 stand out for surpassing the overall success rate, while the rest of the clusters fall below it.
Since the sample does not follow a normal distribution, the non-parametric Kruskal–Wallis test was used to test whether the samples presented the same distribution and to identify whether there were significant differences between clusters.
The results obtained with all the attributes determined that there were statistically significant differences between the distribution of each variable among the clusters. Therefore, the behavior of the attributes that define each cluster can be analyzed.
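For readers unfamiliar with the test, the Kruskal–Wallis H statistic can be computed directly from the pooled ranks. The sketch below is illustrative only: the function name `kruskal_wallis_h` and the sample values are hypothetical, and it assumes that not all observations are identical (otherwise the tie correction is zero). The statistic is compared with a chi-square critical value with (number of groups − 1) degrees of freedom.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction.

    groups: list of lists, one list of attribute values per cluster.
    """
    pooled = sorted((v, g) for g, values in enumerate(groups) for v in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2       # mean of the ranks i+1 .. j
        for _, g in pooled[i:j]:
            rank_sums[g] += avg_rank
        t = j - i                        # size of this group of tied values
        tie_term += t ** 3 - t
        i = j
    h = (12 / (n * (n + 1))
         * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
         - 3 * (n + 1))
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction

# Hypothetical values of one attribute in three clusters.
samples = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(samples)
CHI2_CRIT_DF2_ALPHA05 = 5.991            # chi-square critical value, 2 d.f., 5%
significant = h > CHI2_CRIT_DF2_ALPHA05
```

For these three fully separated samples H is 7.2, which exceeds the 5% critical value, so the hypothesis of a common distribution would be rejected.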
Table 5 shows the location in each cluster of the maximum and minimum average values for each variable—the empty cells indicate that there is no variable taking the highest or lowest values in that cluster.
Using the duration attribute as a guideline, the projects with the highest average duration are classified in cluster 1, whereas those with the lowest duration are classified in cluster 5. The same reasoning can be applied to the rest of the attributes.
5.4. Self-Organizing Map Analysis
The presence of similar behavior between projects is analyzed by the generation of clusters using Self-Organizing Maps (SOM), a technique introduced by Teuvo Kohonen [
56], which has been previously used to determine the success or failure of projects, based on determining the project characteristics of groups [
57].
The SOM model allows us to cluster a new project in a grid area and associate it with the trend of success or failure. However, it is necessary to characterize each cluster in order to know the suitability of a project according to the cluster to which it belongs.
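Placing a new project on the grid amounts to finding its best-matching neuron and reading off the cluster of that neuron. A minimal sketch follows; the 2 × 2 grid, the centroid values and the neuron-to-cluster map are hypothetical, while the two success rates are those reported in this paper for clusters 4 and 1. It assumes the new project is normalized in the same way as the data used to train the map.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(weights, neuron_cluster, cluster_success_rate, project):
    """Return (best-matching neuron, its cluster, that cluster's success rate)."""
    bmu = min(weights, key=lambda n: euclidean(weights[n], project))
    cluster = neuron_cluster[bmu]
    return bmu, cluster, cluster_success_rate[cluster]

# Hypothetical trained centroids on a 2 x 2 grid (two normalized attributes).
weights = {(0, 0): (0.9, 0.8), (0, 1): (0.7, 0.6),
           (1, 0): (0.2, 0.3), (1, 1): (0.1, 0.1)}
# Hypothetical assignment of neurons to clusters.
neuron_cluster = {(0, 0): 4, (0, 1): 4, (1, 0): 1, (1, 1): 1}
# Success rates reported in the text for clusters 4 and 1.
cluster_success_rate = {4: 0.6504, 1: 0.4013}

result = classify(weights, neuron_cluster, cluster_success_rate, (0.85, 0.75))
```

The returned rate is only the historical trend of the cluster, which is why each cluster still has to be characterized before the placement can guide a creator.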
In order to extend the capabilities of the study, a graph is created from the SOM results using the U-matrix, which represents the distance between the centroid of each neuron and those of its closest neighbors. Low values indicate a high degree of similarity between neurons in that region.
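The U-matrix computation can be sketched as the average distance from each neuron's centroid to the centroids of its adjacent neurons. The example below assumes a rectangular grid with a four-neighbour layout and at least two neurons; the grid topology used in this study may differ, and the function name and centroid values are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def u_matrix(weights, rows, cols):
    """Average centroid distance from each neuron to its grid neighbours."""
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Assumption: up, down, left and right neighbours only.
            neighbours = [(r + dr, c + dc)
                          for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                          if 0 <= r + dr < rows and 0 <= c + dc < cols]
            u[r][c] = (sum(euclidean(weights[(r, c)], weights[n])
                           for n in neighbours) / len(neighbours))
    return u

# Hypothetical single row of three neurons with one-dimensional centroids.
weights = {(0, 0): (0.0,), (0, 1): (1.0,), (0, 2): (5.0,)}
u = u_matrix(weights, rows=1, cols=3)
```

Here the third neuron lies far from the other two, so its cell takes the highest value (4.0 against 1.0 and 2.5), marking a boundary between dissimilar regions.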
Figure 4 consists of the SOM model U-matrix followed by thirteen charts, one for each variable. In the chart of the variables, each cell color represents the value taken by that variable in the centroid of the neuron. This rendering allows the comparison of one or several attributes through the grid.
The maps for Backers, Pledged, Comments and Updates take the highest values in the lower left corner, where there is also less similarity or greater distance between the centroids, as indicated by the U-matrix.
The decentralization of the Goal variable should be highlighted—taking high values in two different groups. The zone with higher Goal values is related to zones of high value in Max_RL and Range_RL and, although to a lesser extent, to zones of high value in Pledged_Backer and Levels. On the other hand, the area with not so high Goal values corresponds to high values of Backers, Updates, Pledged and Comments, which complement the importance of the Updates during the collection campaign, a trend observed by other authors [
29].
The attribute Rate_Pledged_Goal (or success quantity), although calculated from Pledged, shows no relationship with it in the maps. This means that the new variable provides information additional to that initially provided by Pledged.
5.5. Cluster Success Characterization
After analyzing the cluster-attribute performance, the relationship between the different project categories and each cluster is considered. For this purpose,
Table 6 shows the percentage of successful projects in each cluster by category, with green indicating values over 50% and red indicating values below.
This success distribution provides valuable information for subsequent cluster characterization but does not consider the number of projects in each cell. In order to do this, indicators are defined to quantify the success distribution by category and cluster considering the overall category success.
Table 7 provides the total number of projects by category and cluster which, together with
Table 6, allows for a better understanding of the distribution of success.
At first, two indexes are defined, which allow for the monitoring of both successful and failed projects. SRI (Success Rate Index) assesses the number of successful projects in relation to those that have failed, whereas FRI (Fail Rate Index) assesses the number of failed projects with respect to those that have been successful. In both cases, they are calculated for each category and cluster.
Let i denote the twelve project categories, and let j denote the six clusters. S and F correspond to the number of successful projects and the number of failed projects, respectively.
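Read literally, the verbal definitions of the two indexes correspond to simple ratios of counts per category and cluster. The following form is a sketch under that assumption, with the subscripted notation $S_{ij}$ and $F_{ij}$ introduced here for illustration; it is not necessarily the exact expression used in the study.

```latex
\mathrm{SRI}_{ij} = \frac{S_{ij}}{F_{ij}}, \qquad
\mathrm{FRI}_{ij} = \frac{F_{ij}}{S_{ij}}, \qquad
i = 1, \dots, 12, \quad j = 1, \dots, 6
```

Under this reading, a category–cluster cell with 65 successful and 35 failed projects would have an SRI of about 1.86 and an FRI of about 0.54.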
In order to increase the significance of SRI and FRI, two other indexes are defined: SOA (Success ratio Over the Average) and FOA (Fail ratio Over the Average). High SOA values in a cluster indicate that a category has a success rate above the overall average; similarly, high FOA values indicate that a category has a failure rate above the overall average. They help us to understand whether a category is highly successful or not, depending on the total success or failure by category.
Let $\overline{\mathrm{SRI}}_i$ denote the average success rate index, and $\overline{\mathrm{FRI}}_i$ the average fail rate index, in category $i$.
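One way to picture how SOA and FOA relate each cluster to the category average is as a relative deviation expressed as a percentage. The sketch below is an assumption made for illustration, not the formula of the study: the ratio form of the indexes, the relative-deviation form of `over_average`, the function names and the project counts are all hypothetical. Cells with a zero denominator are skipped, and at least one valid cell per category is assumed.

```python
def rate_index(numerator, denominator):
    """Ratio of project counts; None when the denominator is zero."""
    return numerator / denominator if denominator else None

def over_average(index_by_cluster):
    """Percentage deviation of each cluster's index from the category mean
    (assumed form of SOA/FOA)."""
    values = [v for v in index_by_cluster.values() if v is not None]
    mean = sum(values) / len(values)
    return {j: None if v is None else 100 * (v - mean) / mean
            for j, v in index_by_cluster.items()}

# Hypothetical (successes, failures) for ONE category in three clusters.
counts = {"A": (30, 10), "B": (10, 10), "C": (10, 5)}

sri = {j: rate_index(s, f) for j, (s, f) in counts.items()}  # success vs. failure
fri = {j: rate_index(f, s) for j, (s, f) in counts.items()}  # failure vs. success
soa = over_average(sri)
foa = over_average(fri)
```

With these counts the success index is 3, 1 and 2 in clusters A, B and C, so the assumed SOA is +50%, −50% and 0%, while the assumed FOA is positive only in cluster B.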
Table A1 in
Appendix A shows the distribution of the mean values obtained for SOA and FOA by category and cluster, and
Table A2 in
Appendix A shows the mean, median and standard deviation taken by the SRI, FRI, SOA and FOA indicators in each cluster. It should be noted that SOA and FOA are shown as percentages because they are relative values calculated according to SRI and FRI, respectively.
Figure 5 represents SOA indexes, where cluster 3 stands out for having a success rate above average in all project categories. The technology category stands out for having the highest values of SOA, surpassing the success rate of the cluster by more than four times in this particular category. With regard to the rest of the clusters, attention should be drawn to Dance in clusters 2 and 4, and Theater in cluster 2.
In contrast with the previous indicator, in the case of FOA in
Figure 6, clusters 1 and 5 stand out as the most unfavorable, especially for the Games category—which has the highest failure rate—as well as Fashion category in cluster 5.
6. Discussion
Six clusters have been identified by similarities in the characteristics of the attributes. They can be summarized in the two major groups identified in
Figure 3 as a success area. That area contains the greatest number of successful projects and the smallest number of failures.
Subsequently, relative indices have been calculated which make it possible to identify the extent of success and failure with respect to that achieved by the category average.
The strategy used to carry out the discussion of the paper consists of analyzing the attributes that define each one of the clusters, taking information from
Table 5, as well as the behavior of the SOA and FOA indicators in each one of the clusters, information that is taken from
Figure 5 and
Figure 6.
Merging these two information sources allows us to know the characteristics of each group, and which success or failure index would be expected by each category inside them.
The discussion starts by defining the behavior of each cluster through its representative attributes.
Observing the Pledged, Comments, Updates and Backers projections in the maps of Figure 4, the distributions of these variables are clearly similar, since the highest values are grouped in the lower left corner of each map, which also corresponds to cluster 3 in Figure 7. Cluster 3 is associated with a likely success of the project, since the SOA index is very positive for all categories and no FOA index is positive.
This relationship underscores the importance of maintaining high values in these attributes when aiming for success. Three of these attributes are classified as “Development Variables”, which means that they can be modified during the funding campaign to strengthen the project and induce success. For all these reasons, this cluster is named “Sponsors Engaged”.
The characteristics of cluster 3 are indisputable when aiming to prioritize the success of projects, but not to maximize profits. Even for categories associated with the performing arts, where success is scattered, it is in cluster 3 where the highest SOA values are found.
As shown in
Table 5, cluster 2 contains the highest average collection values for Rate_Pledged_Goal, as well as the second highest average SRI value (2.47). Therefore, cluster 2 is named “Top Collections”. Consistently, its FOA index never shows positive values, although only the Dance and Theater categories show a slight activation of the SOA index, with a significant success with respect to the category average.
Cluster 4 has a success rate of 65.04%, which can be considered an acceptable risk, keeping in mind the innovative nature of this type of project. It has the third greatest average SOA value and the highest values of Pledged_Backer, as shown in Table 5. This cluster is therefore named “Warning”.
Most projects contained in clusters 1 and 5 fail, with success rates of around 40%. They also have the highest average FOA values, so they are highlighted as groups to be avoided. These two clusters have certain similarities, since they contain a similar number of projects and have success rates of 40.13% and 39.77%, respectively. Neither cluster is suitable for hosting projects in the Games and Fashion categories, as both categories have the highest FOA values there.
The projects located in cluster 1 have the highest average duration and the lowest Rate_Pledged_Goal values. Cluster 5 is characterized by the lowest averages of Pledged, Comments, Updates, Backers and Duration. This situation is a disadvantage for the Technology category, which presents the highest FOA values. In order to differentiate the failure of these clusters, the duration attribute is used, naming them “Wide Hole” and “Deep Hole”, respectively.
Cluster 6 has a success rate lower than 48%, as can be seen in Table 4, as well as SOA values comparable to those obtained in clusters 1 and 5. Although it contains high-risk projects, its most representative feature is its high Goal, as shown in Table 5. Thus, cluster 6 has been named “Epic Goal”.
After this discussion, we have found the attributes that define the success of each cluster. We suggest that these attributes are a consequence of the behavior of the cluster. For example, belonging to cluster 3 indicates that a project will have many followers, but the converse is not guaranteed: the antecedent cannot be deduced from the consequent, that is, the presence or absence of that attribute does not determine cluster membership.
7. Conclusions
Crowdfunding arises as a response to the problem of financing innovative projects in an environment of strong economic crisis. The stalled growth of this model motivates this study’s interest in a support tool that makes the model more sustainable. To do this, a sufficiently representative data set of projects, known from other authors, was selected and modern data mining techniques were applied. As a result, our conclusions may add knowledge about this means of acquiring resources in this very representative period of time.
The association of projects in clusters has been decisive in understanding how they work, by grouping and characterizing them by their particularities. An examination of the data set has revealed that it can be distributed over six different clusters. This assignment of projects to clusters allows any project to be subsequently incorporated into its corresponding cluster, making it easier for a creator to define a strategy or reorient a project in order to drive it to success, based on its position in the system.
The clusters known as “Top Collections” and “Sponsors Engaged” are the most suitable to host a project with the maximum potential for success, being characterized by a collection much higher than expected and a strong commitment between the Backers and the project through good two-way communication. The name “Top Collections” makes it easier to identify the cluster with the highest amount collected above the goal, and “Sponsors Engaged” identifies a cluster with projects more closely linked to communication with backers.
The cluster called “Warning” has a success rate sufficient to allow the creator to safely modify the characteristics of the project and improve its potential for success.
The “Deep Hole” and “Wide Hole” clusters are characterized by a very high failure rate. The first contains projects with the lowest values in certain attributes that are basic to the good development of crowdfunding (Comments and Updates, among others). The second groups projects with the longest average duration. Both are considered clusters to be avoided by any creator, since relocating a project from them to the success zone requires maximum effort. The designation “Hole” has been chosen to help a creator identify the least appropriate clusters. The “Deep Hole” cluster contains projects with characteristics that are very difficult to modify in order to redirect the project to success, and the “Wide Hole” cluster is understood as a cluster with a greater number of failed projects, though its characteristics allow a greater margin of modification than “Deep Hole”.
The relevance of having identified these six clusters and their attributes allows project managers to use a tool that facilitates the estimation of the economic and financial viability of the crowdfunding project being undertaken. In this way, resources and efforts can be dedicated to improving the quality and benefit ratios of the project.