1. Background
The traditional folk songs and music of various cultural groups worldwide are important vehicles for reflecting cultural characteristics. However, the rich diversity of the world’s music traditions has not been sufficiently appreciated in contemporary music information retrieval (MIR) studies. In contrast to the vast resources and literature that mainstream Western music culture possesses, the systematic presentation of traditional Chinese folk songs in international academic and educational circles is fragmented, lagging behind the overall development of the music informatics community, which has also created difficulties for the education in this area. We take a step forward in the computational musicology of the traditional Chinese anhemitonic pentatonic folk songs, intending to provide a reference for the cultural variety of the MIR study.
Most scholars use the general word “Chinese” to refer to the Han Chinese people, China’s largest cultural group. However, there is considerable linguistic, custom, and social diversity among the subgroups of the Han, primarily due to historical events, geographical conditions, and assimilation of various regional ethnicities and tribes [1]. Traditional Chinese folk songs, represented by the Han Chinese songs, were created by local people’s improvisation and passed on from one generation to the next orally [2], having a significant influence on the development of other forms of traditional music, including the traditional dance music, opera, instrumental music, and Quyi (story-telling and story-singing) art [3].
1.1. Movable Do Solfège and the Chinese Anhemitonic Pentatonic Scales/Modes
To make traditional Chinese music notation correspond to Western music notation, numbered musical notation became the most common way of transcribing Chinese folk music in modern times. The melody is sung with do, re, mi, fa, sol, la, and si (ti in some regions, such as North America). Traditional Chinese folk songs were usually recorded with the movable do solfège [4], or tonic sol-fa, invented in the nineteenth century by Sarah Ann Glover. Regardless of the piece’s key, each note’s solfège is the same as if the song were in C major/A minor. For example, the melody “G–A–B–C–D” in a G major song is sung alphabetically or as “sol–la–si/ti–do–re” in the fixed do solfège, but in the movable do solfège, it is sung as “do–re–mi–fa–sol” because that is how the phrase reads when transposed to C major.
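The G major example above can be reproduced with a minimal sketch. The function name `movable_do`, the sharps-only pitch spelling, and the restriction to the seven diatonic syllables are assumptions made for illustration, not part of the original notation system.

```python
# Chromatic pitch classes, spelled with sharps only (a simplification of this sketch).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets above the major tonic for the seven diatonic syllables.
SOLFEGE_BY_OFFSET = {0: "do", 2: "re", 4: "mi", 5: "fa", 7: "sol", 9: "la", 11: "si"}


def movable_do(notes, major_tonic):
    """Map note names to movable do syllables relative to the given major tonic."""
    tonic_index = PITCH_CLASSES.index(major_tonic)
    syllables = []
    for note in notes:
        offset = (PITCH_CLASSES.index(note) - tonic_index) % 12
        # Chromatic notes (e.g., fa#) are outside this sketch and raise KeyError.
        syllables.append(SOLFEGE_BY_OFFSET[offset])
    return syllables


melody = ["G", "A", "B", "C", "D"]
print(movable_do(melody, "G"))  # ['do', 're', 'mi', 'fa', 'sol']
print(movable_do(melody, "C"))  # ['sol', 'la', 'si', 'do', 're']
```

With the tonic set to C, the function returns the same syllables as the fixed do reading, which is why the two systems coincide only for pieces in C major/A minor.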
Traditional Chinese music belongs to one of the central regions where the anhemitonic pentatonic scale is used. Han Chinese folk songs in particular frequently use do, re, mi, sol, and la, known in ancient Chinese as Gong, Shang, Jue, Zhi, and Yu. On the piano, they can be performed entirely on the black keys in F# major. Chinese cultural tradition has always considered pentatonic scales as modes, complete with a modal tonic and a classification of songs and instrumental pieces by way of the five types [5]. By altering which note is considered the tonic within the same collection of pitches (do, re, mi, sol, and la), one can rotate through all five possible modes of the anhemitonic pentatonic collection, the Gong/Shang/Jue/Zhi/Yu modes, which correspond to the Raga Bhupali/Megh/Malkauns/Durga/Dhani Carnatic in the Indian pentatonic scale from the perspective of modal pitch orders. Modes in Chinese and Indian music theory incorporate more than pitch orders; they are also associated with certain times of day, performance practices, emotional situations, and even weather conditions, which is beyond the scope of this work.
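The rotation through the five modes can be illustrated with a short sketch. It only covers the pitch order of each mode; the names `PENTATONIC`, `MODE_NAMES`, and `pentatonic_modes` are hypothetical.

```python
# The anhemitonic pentatonic collection in movable do solfège.
PENTATONIC = ["do", "re", "mi", "sol", "la"]

# Chinese mode names, indexed by the position of their tonic in PENTATONIC.
MODE_NAMES = ["Gong", "Shang", "Jue", "Zhi", "Yu"]


def pentatonic_modes():
    """Return each mode as the collection rotated to start on its tonic."""
    modes = {}
    for index, name in enumerate(MODE_NAMES):
        modes[name] = PENTATONIC[index:] + PENTATONIC[:index]
    return modes


for name, pitches in pentatonic_modes().items():
    print(f"{name}: {'-'.join(pitches)}")
# Gong: do-re-mi-sol-la
# Shang: re-mi-sol-la-do
# Jue: mi-sol-la-do-re
# Zhi: sol-la-do-re-mi
# Yu: la-do-re-mi-sol
```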
Fa and si in the movable do solfège are occasionally used in traditional Chinese anhemitonic pentatonic folk songs, only as ornaments or transitions, called Qing Jue/Bian Jue and Bian Gong. Other semitones with # or ♭ in the movable do solfège are extremely rare in most traditional Chinese anhemitonic pentatonic folk songs, among which fa# and si♭ are called Bian Zhi and Run, respectively.
1.2. Pitch Histograms, Merged Pitch Histograms, and Their Anhemitonic Pentatonic Variants
Pitch distributions have been used in computational music study since the 1960s and into the 1980s in various fields such as mathematical music structure analysis and computer-assisted generative composition [6,7]. Subsequently, the concept of the pitch histogram, describing the pitch content of notes in music pieces, was systematically introduced in [8]. In [9], the authors summarize various variants of the pitch histogram and an extensive list of features extracted from them. MIR studies using pitch histograms emerge from time to time [10,11,12,13].
Each note has not only pitch information but also duration characteristics. Therefore, rhythmic value histograms [9], beat histograms [14,15,16,17,18], and (normalized) duration histograms [10] were created and play roles in MIR to describe the temporal information of a song’s melody.
Our study combines temporal information (pitch continuity) with pitch information. Therefore, in addition to applying the basic pitch histogram [8], we also utilized the recently proposed merged pitch histogram [19] to visualize the statistics of songs’ main melodies. As a variant of the basic pitch histogram, the merged pitch histogram counts each run of successive identical pitches only once. This is one of the alternatives for how to “count” pitches, which has been debated and often noted in computational musicology works but was systematically summarized and named for the first time in [19]. In our research context, the merged pitch histogram has two benefits. Firstly, it downplays the continuous repetition of pitches caused by splitting words in songs of polysyllabic languages, making such songs more suitable for analysis together with songs of monosyllabic languages (e.g., Mandarin). This is helpful because Section 4 presents a classification study of traditional Chinese anhemitonic pentatonic folk songs together with songs in other languages from around the world. Secondly, some interesting phenomena in Chinese folk songs, such as the characteristic onomatopoeia, use successive identical pitches, for example, the “dong dong dong” simulating a rattle-drum. Merging the successive identical pitches improves the statistical pitch analysis, which will be explained in Section 2.2.
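The difference between the two counting schemes can be sketched in a few lines. The melody below is a made-up toy sequence, and the function names are assumptions for illustration; in practice each token would also carry its octave.

```python
from collections import Counter
from itertools import groupby


def basic_pitch_histogram(pitches):
    """Count every note occurrence."""
    return Counter(pitches)


def merged_pitch_histogram(pitches):
    """Count each run of successive identical pitches only once."""
    # groupby yields one key per run of equal neighbouring items.
    return Counter(pitch for pitch, _run in groupby(pitches))


# Toy melody with a repeated "do do do", similar to an onomatopoeic figure.
melody = ["sol", "do", "do", "do", "re", "mi", "mi", "re", "do"]

print(basic_pitch_histogram(melody))   # do: 4, re: 2, mi: 2, sol: 1
print(merged_pitch_histogram(melody))  # do: 2, re: 2, sol: 1, mi: 1
```

The repeated “do” drops from four counts to two, while pitches that never repeat consecutively keep their original counts.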
This study uses notation-based symbolic pitch histograms, and all histograms plotted are not normalized to facilitate the comparison between each pitch’s merged and original bin heights. The non-normalized visualization has no impact on the histograms’ shapes, the observation analysis, and the feature design in this study.
Because of the characteristics of Chinese anhemitonic pentatonic music introduced above, we will generate movable do solfège-based basic and merged pitch histograms by replacing the pitch letters (A–G) or frequencies in the traditional pitch histograms with the movable do solfège (do–si), as shown on the horizontal axis ticks in Figure 1. In addition, we will also apply the anhemitonic pentatonic pitch histograms that retain only the pitches fitting the pentatonic scale, which also have a merged version, the merged anhemitonic pentatonic pitch histogram (see Section 2.2).
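A minimal sketch of the anhemitonic pentatonic variant follows. It assumes that a histogram is stored as a mapping from `(syllable, octave)` pairs to counts, which is a hypothetical encoding chosen for this example, and it simply drops the bins of all non-pentatonic syllables from an already counted (basic or merged) histogram.

```python
PENTATONIC_SYLLABLES = {"do", "re", "mi", "sol", "la"}


def anhemitonic_pentatonic_histogram(histogram):
    """Keep only the bins whose syllable belongs to do, re, mi, sol, la.

    `histogram` maps (syllable, octave) pairs to counts; this key layout
    is an assumption of the sketch.
    """
    return {
        pitch: count
        for pitch, count in histogram.items()
        if pitch[0] in PENTATONIC_SYLLABLES
    }


# Toy histogram with two ornamental pitches (low si and fa).
toy = {("si", -1): 1, ("do", 0): 3, ("re", 0): 5, ("fa", 0): 1, ("sol", 0): 2}
print(anhemitonic_pentatonic_histogram(toy))
# {('do', 0): 3, ('re', 0): 5, ('sol', 0): 2}
```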
1.3. Related Research on the Statistical Visualization of Western Folk Songs and Comparative Study
Prior literature displayed folk song melodies’ properties through statistical visualization, represented by [20], where, interestingly, the curves depicted by the authors, referred to as the “melodic arch”, bear similarities to those in our study. However, the two studies’ objects (Western versus Chinese folk songs) and mathematical models (aggregate phase contours versus pitch histograms) are entirely different.
In terms of promoting cultural diversity in computational musicology, ref. [21] compared the interval size and phrase position of German and Chinese folk songs, in which the author of [20] also participated. Still, our study investigates a different object (traditional Chinese anhemitonic pentatonic folk songs) through a different approach (pitch distribution).
1.4. Bell Shapes and the Degree of Bell Shape (DoBell)
In this work, a monotonic rise to a peak followed by a monotonic fall on a histogram (see the envelopes in Figure 1) is described by the term bell shape. We do not call such curves “normal distributions” because of their asymmetry and non-smoothness. Besides being more appropriate than other names that describe this shape visually, such as pyramid, triangle, and arch, “bell” carries the symbolism of sound, suggesting musicological research.
Monotonic trends can be tested by applying the Mann–Kendall test [22,23,24], which is, however, unnecessarily complicated for this study. We propose a novel feature with a simple implementation, the Degree of Bell Shape (DoBell), which expresses the degree to which a histogram approximates a bell shape as a value between 0 and 1.
Consider a song s and its histogram with zero bins removed. The definition relies on the pitch of the highest bin, the bin height of each pitch p, and the lowest and highest pitches of s. We define p’s monotonic trend (MT) as in Equation (1), with zero bins removed, and the Degree of Bell Shape of song s as in Equation (2).
[Equation (1): monotonic trend (MT) of pitch p]
[Equation (2): Degree of Bell Shape (DoBell) of song s]
The DoBell of a V-shaped distribution is a value near 0.5, while 1 indicates a perfect bell shape, where adjacent equal values do not destroy the monotonicity. In this article, a histogram is considered to conform to the bell shape only in the perfect case that its DoBell is precisely equal to 1.
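The following sketch shows one possible reading of such a measure. It is not a transcription of Equations (1) and (2); it is an assumed formulation chosen only to reproduce the properties stated in the text: zero bins are removed, the value lies between 0 and 1, ties between adjacent bins do not break monotonicity, a perfect bell gives 1, and a V shape gives about 0.5. The function name `do_bell` and the choice of the denominator (number of non-zero bins minus one) are assumptions.

```python
def do_bell(heights):
    """Assumed DoBell-like score for bin heights ordered from low to high pitch."""
    bins = [h for h in heights if h > 0]  # remove zero bins
    if not bins:
        raise ValueError("the histogram has no non-zero bins")
    if len(bins) == 1:
        return 1.0  # a single bin is trivially bell-shaped

    peak = bins.index(max(bins))  # position of the (first) highest bin
    conforming = 0
    for i, height in enumerate(bins):
        if i < peak and height <= bins[i + 1]:
            conforming += 1  # still rising (or flat) toward the peak
        elif i > peak and height <= bins[i - 1]:
            conforming += 1  # still falling (or flat) away from the peak
    # Every bin except the peak is checked, hence len(bins) - 1.
    return conforming / (len(bins) - 1)


print(do_bell([1, 2, 4, 4, 3, 1]))  # 1.0, a bell with a flat top
print(do_bell([5, 3, 1, 3, 4]))     # 0.5, a V shape
```

Under this reading, each bin is compared with its neighbour on the side of the peak, so a single misplaced bin lowers the score by one step rather than invalidating the whole histogram.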
The pronunciation of the abbreviation is reminiscent of the bell-shaped pattern drawn on a doorbell, and the “Do” comes with a prompt for the movable do Solfège.
2. The Pitch Histogram of Traditional Chinese Anhemitonic Pentatonic Folk Songs
Jasmine Flower (Mo Li Hua) can be regarded as the most representative and popular traditional Chinese folk song globally. Both versions of Jasmine Flower from the Jiangsu province are lyrical and gentle [2]. They have different melodies, lyrics, and characteristics but are both in the blues major scale/Zhi mode (ending in sol). The version starting with the lyrics “Hao yi duo mei li de mo li hua” (what a beautiful jasmine flower) was adapted in Puccini’s opera Turandot and thus became widely known.
Figure 1 visualizes the basic and merged pitch histograms of both versions of Jasmine Flower, clearly revealing their bell-shaped envelopes (DoBell = 1). Inspired by these histograms of Jasmine Flower, we analyzed more Chinese anhemitonic pentatonic folk songs.
2.1. Data Preparation
We emphasize that the songs of interest for this study need to meet the following prerequisites:
1. (Truly) traditional. The traditional Chinese folk songs in this study were created by local people’s improvisation and handed down orally among the people, so the original composers and lyricists are generally unidentifiable. Many folk songs and pieces composed in modern times, even if they conform to the pentatonic modes, apply more musicological techniques and compositional means during their creation, which is contrary to traditionality.
2. Chinese anhemitonic pentatonic. This does not mean that non-anhemitonic-pentatonic notes cannot be used. Researchers with relevant knowledge can quickly distinguish by listening or humming whether a piece is Chinese pentatonic, even if it uses the infrequent fa, si, and other semitones.
Obtaining more songs’ data is a challenge. As mentioned in Section 1, traditional Chinese music has not been extensively studied in music informatics, let alone systematically compiled into a collection of traditional, anhemitonic pentatonic folk songs.
The Essen Folksong Collection (EFC) [25], developed in the early 1980s for folk music research, contains 1161 songs in its “Han” category. However, a large number could not be used directly in this study. A brief look at the titles, together with exploratory humming and listening, reveals that the “Han” category in the EFC contains:
Many folk songs composed in the modern era rather than arising from improvisation and oral transmission, contradicting the first prerequisite above. There are hundreds of songs with obvious historical backgrounds and compositional intent; they are generally regarded as “revolutionary songs” nowadays and labeled “new folk songs” in the EFC. For instance, over 50 songs with Hongjun, Gongnong, Balujun, Canjun, Jiefang, Suqu, Mao Zedong, or Zhouzongli in their titles, apparently formed in a specific temporal context, are unlikely to come from oral folkloric transmission from ancient times.
Many non-pentatonic songs reflecting the characteristics of other cultural groups or a cultural mixture, contradicting the second prerequisite above.
Some score errors, such as the 19th note of the first song in alphabetical order, Ai Erwa (an octave higher).
Those songs involved in the first two cases above can undoubtedly be widely used in general studies of Chinese music but are inapplicable to this study.
A typical example is Caoyuan Qingge (The Grassland Love Song) in EFC’s “Han” category. Besides this original title, the song is widely known nowadays as In That Place Wholly Faraway (Zai Na Yaoyuan De Difang). The song’s creation background is debated, but there is no doubt that it was composed/adapted by Wang, Luobin in 1939/1940 based mainly on Kazakh melodies. One prevailing theory is that it also draws on Tibetan and Uyghur melodies. In short, it should not be in the “Han” category, let alone be considered traditional or Chinese anhemitonic pentatonic. Besides, this song was created by a composer with extensive musical knowledge, again departing from the subject of our study.
In an initial confirmation of the first 26 songs/2.24% in EFC’s alphabetical “Han” category, we found that:
Three songs/11.54% contain obvious score errors: Ai Erwa (the 19th note being an octave higher), Baogumiao Xiangbadao (missing ♮s), and Bawang Haozi (wrongly using the Gong mode);
Two songs/7.69% belong to other or mixed cultures: Alima (Salar) and Suwugeng (Hua’er);
Eight songs/30.77% are not truly traditional (“revolutionary”), composed, adapted, or reworked;
Three songs/11.54% have linking errors.
We are working on comprehensive errata and supplementary information for the EFC for the benefit of other users.
In short, even if the EFC is filtered by the two prerequisites of this study (truly traditional and Chinese anhemitonic pentatonic), only a few dozen songs are expected to remain. In addition, surprisingly, many traditional Chinese folk songs familiar to Chinese people, such as those included in [26], do not appear in the EFC. The EFC’s collection, which was dominated by Western scholars, was probably oriented more toward fieldwork discoveries. Therefore, it is not convincing to use the songs in the EFC as representatives of the most popular traditional Chinese folk songs for general music education and introduction.
Moving from the “decremental” approach to an “incremental” collection, we built up a song collection straightforwardly and reliably. We referred to a collection of the 100 most well-known classic Chinese folk song scores [26] and digitally documented the scores of all 42 songs from it that meet prerequisites 1 and 2 as movable do solfège-based sequences. The other 58 songs not included in our collection are:
Originated from other cultural groups that do not use anhemitonic pentatonic scales, such as the Uyghurs; or
Revolutionary songs and new folk songs with obvious compositional backgrounds; or
From Northern Shaanxi, where the songs, usually also labeled as Chinese/Han Chinese, are subject to mixed culture and often carry the impact of a revolutionary adaptation [27].
In addition, we have added 11 songs based on senior folk musicians’ input, including second, widely sung versions of the same titles for 4 of the 42 songs mentioned above and 7 folk songs often performed in concerts, audio/video products, and on folk stages. We plan to open our collection of the 53 songs’ movable do solfège-based sequences and metadata as a public library after refinement since, to the best of our knowledge, it is the only comprehensive digital symbolic collection to date containing the most representative traditional Chinese anhemitonic pentatonic folk songs that are well recognized by Chinese people from all regions.
2.2. Illustrative and Quantitative Analysis
Figure 1, Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6 show different types of pitch histograms of all 53 songs from our collection of the most representative Chinese anhemitonic pentatonic folk songs. Coming from 23 provinces of China (Anhui, Fujian, Gansu, Guangdong, Guangxi, Guizhou, Hebei, Heilongjiang, Henan, Hubei, Hunan, Inner Mongolia, Jiangsu, Jiangxi, Jilin, Liaoning, Qinghai, Shandong, Shanghai, Shanxi, Sichuan, Yunnan, and Zhejiang), the 53 pieces in Figure 1, Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6 cover all five Chinese pentatonic modes:
In Figure 1 and Figure 2, both the basic and merged pitch histograms of all 25 songs fit the bell shape perfectly from the outset (DoBell = 1).
In Figure 3, the histograms of the 11 songs attain the bell shape (DoBell = 1) after the bins of the pitches si, fa, and si (in their respective octaves), which occur infrequently in Chinese pentatonic songs, are removed. We call histograms that retain only anhemitonic pentatonic pitches anhemitonic pentatonic pitch histograms, which also have their merged versions, merged anhemitonic pentatonic pitch histograms.
In Figure 4, each of the 11 songs’ basic pitch histograms contains individual bins that violate the bell shape, so their DoBell values are less than 1. Applying merged pitch histograms to them yields perfect bell curves (DoBell = 1). For example, the lowest pitch of Figure 4(2) is a significant exception: the “do do do” is frequently repeated at phrase ends of this song to express an upbeat rhythm, giving this pitch a higher count. With the merged pitch histogram, the pitch distribution of Figure 4(2) corresponds well to the bell shape.
In Figure 5, the two songs’ perfect bell shape (DoBell = 1) appears on their merged anhemitonic pentatonic pitch histograms.
The pitch histograms of four songs in the 53-song collection deviate from the bell shape (DoBell < 1), and applying the various histogram variants to them neither improves nor reduces the bell-shapedness (see Figure 6). Notably, they are all from Shanxi or Sichuan.
The basic pitch histogram of 25/53 (47.17%) songs innately fits the bell shape (DoBell = 1). The ideal bell-shapedness is reflected in 49/53 (92.45%) songs after retaining only anhemitonic pentatonic pitches or using the merged pitch histogram to circumvent some high counts due to the unique characteristics of the songs. Only 4/53 (7.55%) of the songs have a pitch distribution differing from the perfect bell shape (DoBell < 1).
2.3. Pearson’s χ² Test
Another reference corpus of 53 songs that are not traditional Chinese anhemitonic pentatonic (see Section 4.2 and Table A2) is added to the 53 songs presented in Section 2.2 for a Pearson χ² test.
Our null hypothesis is that whether a song is a traditional Chinese anhemitonic pentatonic folk song is independent of whether its pitch histogram has a perfect bell shape (DoBell = 1), and we set a significance level of 5% for the Pearson’s χ² test [28]. Using the statistics in Table A1 and Table A2, we calculated a χ² value of 24.41 and a p-value of 0.0000037. Therefore, it is reasonable to reject the null hypothesis: whether a song is a traditional Chinese anhemitonic pentatonic folk song is related to whether it has a perfect bell-shaped pitch histogram (DoBell = 1). They are not independent of each other.
The test applied to the merged pitch histogram results in a χ² value of 26.10 and a p-value of 0.00000032, indicating that whether a song is a traditional Chinese anhemitonic pentatonic folk song is related to whether it has a perfect bell-shaped merged pitch histogram (DoBell = 1).
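The mechanics of such an independence test on a 2 × 2 contingency table can be sketched as follows. The counts in `toy_table` are hypothetical placeholders, not the statistics of Table A1 and Table A2, and the sketch applies no continuity correction, so it is not claimed to reproduce the values reported above. For one degree of freedom, the p-value equals erfc(√(χ²/2)).

```python
import math


def pearson_chi2_2x2(table):
    """Pearson chi-squared statistic and p-value (1 degree of freedom) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: rows = song group, columns = (bell-shaped, not bell-shaped).
toy_table = [[30, 20], [10, 40]]
chi2, p_value = pearson_chi2_2x2(toy_table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
print("reject independence at 5%:", p_value < 0.05)
```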
3. Explanation, Comparison, Discussion, and Extension
Pitch histograms allow a visually resonant introduction and demonstration of traditional Chinese anhemitonic pentatonic folk songs. Although this study is grounded in a music informatics perspective, we attempt to provide some brief cultural clues and references that may be relevant to the observations.
From a cultural perspective, the bell shape described in Section 2 reflects the Chinese cultural way of Zhongyong [29] (the doctrine of the mean). In [30], the author described Zhongyong as a value of “mean is beautiful” for the purpose of harmony. The basic meaning and spirit of Zhongyong thinking is mastering the extremes but deploying the mean [31]. The bell pattern of “the closer to the middle the pitch, the more it is sung” is an excellent expression of the cultural background of Zhongyong in folk songs. This phenomenon is also in line with the fact that traditional Chinese folk songs’ folkloric transmission from one generation to the next makes each song more and more “moderate” (Zhongyong), and in turn, the moderate melodies promote the broader dissemination of the songs because they are easier to sing, a two-way process of mutual improvement. The movable do solfège for oral or simple written recording also facilitates such a transmission process.
Naturally, a song’s relatively low and high pitches should appear infrequently throughout the song, while pitches in the middle make up the vast majority. However, folk songs of cultural groups other than Chinese rarely display perfect bell shapes (DoBell = 1), as concluded from the visualization of dozens of well-known Gregorian, chromatic, or pentatonic traditional folk songs from more than 30 different countries, cultural groups, and languages, some examples of which are shown in Figure 7. Only three traditional Gregorian songs among the research materials we examined inherently have a bell-shaped histogram, i.e., DoBell = 1 (see Figure 8). It is particularly worth pointing out that the pitch histogram of Figure 7(1), a famous Scottish song in the anhemitonic pentatonic scale, is not bell-shaped (DoBell < 1) and becomes even less bell-shaped (a lower DoBell) after applying the merged pitch histogram, quite different from the Chinese anhemitonic pentatonic folk songs, where the merged pitch histogram never worsens the bell-shapedness.
It is worth mentioning that the three folk songs from Asia, Figure 7(4–6), show a bell shape (DoBell = 1) in their merged pitch distributions after the fa and si bins are removed, which is merely coincidental. Considering that these songs are not anhemitonic pentatonic songs per se, there is no reason to remove the fa and si bins for analysis. For example, Figure 7(6) is one of the most representative songs in the Japanese hemitonic in scale (Sakura pentatonic scale), where fa and si are two of the essential pitches. Such an observation reminds us that misjudgment may occur when predicting whether a song is a traditional Chinese anhemitonic pentatonic folk song simply by removing fa, si, and other semitones to construct an anhemitonic pentatonic histogram. In other song types, frequently used pitches may be removed when creating the anhemitonic pentatonic histogram, which makes the pitch distribution more bell-like. Therefore, “a traditional Chinese anhemitonic pentatonic folk song” is an approximately sufficient rather than a necessary condition for “a bell-shaped anhemitonic pentatonic pitch histogram”. In short, in the context of this study, we do not conclude that “traditional folk songs worldwide that do not use the Chinese anhemitonic pentatonic scale are unlikely to have bell-shaped pitch histograms”. The unique characteristics of each country’s folk songs need to be explored by researchers who are familiar with the cultures and materials involved.
Another interesting observation worth discussing is that anhemitonic pentatonic folk songs from cultural groups other than Chinese (more precisely, Han Chinese) may also have a perfect or approximate bell-shaped pitch histogram. For example, some cultural groups in East Asia, such as Korean and Japanese, have their own unique folk song systems [32,33], like the Japanese hemitonic in scale. There are also Korean and Japanese folk songs using the anhemitonic pentatonic scale (called yo scale in Japan), some of which fit the bell shape ideally or to a great extent. Historical interactions between the neighboring countries and peoples might be a reason; however, it is highly controversial whether the pentatonic scales present in different cultures originated independently or through cultural exchange, which is not the topic of this work. Several examples are provided in Figure 9, including Arirang, Doraji Taryeong, and Soran Bushi, the representative pieces of traditional Korean and Japanese folk songs. The repetition of the pitch la in Figure 9(3), exhibiting the rhythmic sense of the fishermen’s net-pulling work songs (“Dokkoisho”), is eliminated in the merged pitch histogram to produce a bell shape, similar to Figure 4(2). Some folk songs of non-Han cultural groups within China are also anhemitonic pentatonic. For instance, the basic pitch histogram of the Oroqen folk song in Figure 9(4) conforms to the bell shape, i.e., DoBell = 1. Some traditional Japanese and Korean anhemitonic pentatonic songs do not perfectly fit the bell-shaped pattern, as Figure 10 exemplifies. However, as stated at the end of the previous paragraph, a comprehensive analysis of Japanese and Korean pentatonic songs is not the subject of this paper.
4. Initial Attempts at Lightweight Machine Learning
We discovered that many traditional Chinese anhemitonic pentatonic songs widely recognized by Chinese people in all regions today have a bell-shaped pitch distribution in different types of pitch histograms, which is the major contribution of this study and can provide visual material for music education and demonstration. Taking it a step further, we make a preliminary attempt to turn the findings into features and models to provide references and promote cultural diversity in MIR studies.
4.1. Features
The bell shape is close to the normal distribution in terms of appearance. Unfortunately, the features used to portray the state of a normal distribution, such as skewness and kurtosis, are not applicable here since they are not used to determine whether a distribution is normal-like/bell-shaped in appearance or not.
Besides our newly proposed feature for describing the bell shape, the Degree of Bell Shape (DoBell) (see Section 1.4), another appropriate classic feature we found and validated is negative turning (NT), which we computed using the Time Series Feature Extraction Library (TSFEL) [34]. Treating the pitch histogram as a time series, NT counts its number of violations during the monotonic rise and fall, where adjacent equal values do not break the monotonicity. We found that NT is ideally equal to the number of local peaks throughout the histogram besides the highest peak. For example, in Figure 7(8), NT equals 2 for one histogram type, while it equals 1 for the other. A perfect bell shape’s NT equals 0. The NTs for both basic and merged pitch histograms are usually small integers, most likely between 0 and 3. Some songs with wide pitch ranges have relatively large NTs, such as You Raise Me Up (both NTs are 7).
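The idea behind negative turning can be illustrated without the library. The sketch below counts points where the sequence of bin heights stops falling and starts rising; it reflects our understanding of the feature and is an assumed re-implementation, not TSFEL’s code. Zero bins are removed first, as done for all histograms in this study.

```python
def negative_turning(heights):
    """Count strict local minima: a fall immediately followed by a rise."""
    bins = [h for h in heights if h > 0]  # remove zero bins
    diffs = [b - a for a, b in zip(bins, bins[1:])]
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 < 0 and d2 > 0)


print(negative_turning([1, 2, 4, 4, 3, 1]))  # 0, a perfect bell
print(negative_turning([1, 3, 2, 4, 1]))     # 1, one extra local peak
print(negative_turning([2, 1, 3, 1, 2, 1]))  # 2
```

Because the comparison is strict, a flat-bottomed valley (a fall, equal heights, then a rise) is not counted by this version; whether such cases should count is a design choice left open here.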
It is important to note that before calculating the DoBell and the NT features, the bins with zero height in all pitch histograms are removed.
Applying the DoBell and the NT features to the pitch histogram and the merged pitch histogram produces four different features. It can be learned from Section 2.2 that for a Chinese anhemitonic pentatonic folk song where non-anhemitonic-pentatonic pitches occur, the anhemitonic pentatonic pitch histogram may produce a better bell shape. Therefore, we also apply the DoBell and the NT to the anhemitonic pentatonic pitch histogram and the merged anhemitonic pentatonic pitch histogram to generate another four features. For instance, in Figure 3(1), after removing the two si bins, the NTs of the basic/merged anhemitonic pentatonic pitch histograms drop from 2 to 0, and the DoBells rise from 0.78 to 1.
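In the subsequent classification experiments, we generate eight types of features by taking one option from each of the three groups (DoBell, NT), (all pitches, anhemitonic pentatonic pitches), and (pitch histogram, merged pitch histogram):
DoBell of the pitch histogram;
DoBell of the merged pitch histogram;
DoBell of the anhemitonic pentatonic pitch histogram;
DoBell of the merged anhemitonic pentatonic pitch histogram;
NT of the pitch histogram;
NT of the merged pitch histogram;
NT of the anhemitonic pentatonic pitch histogram;
NT of the merged anhemitonic pentatonic pitch histogram.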
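The eight feature types are the Cartesian product of the three groups, which can be enumerated directly. The tuple labels below are descriptive placeholders rather than the notation used elsewhere in this article.

```python
from itertools import product

MEASURES = ("DoBell", "NT")
PITCH_SETS = ("all pitches", "anhemitonic pentatonic pitches")
HISTOGRAM_TYPES = ("pitch histogram", "merged pitch histogram")

# One option from each group: 2 x 2 x 2 = 8 feature types.
FEATURE_TYPES = list(product(MEASURES, PITCH_SETS, HISTOGRAM_TYPES))

for measure, pitch_set, histogram_type in FEATURE_TYPES:
    print(f"{measure} | {pitch_set} | {histogram_type}")
```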
4.2. Dataset for Pilot Classification Experiments
We used our 53-song collection mentioned in Section 2.1, which includes almost all the most representative traditional Chinese anhemitonic pentatonic folk songs well recognized by Chinese people from all regions nowadays.
Considering the data balance of binary classification, we applied a whole set of 106 songs in total. In addition to the 53 songs mentioned above, the other half consists of songs from the following types:
Representative traditional folk songs from various cultures of 30 countries/languages.
Folk songs adapted from major works, like symphonies, operas, musicals, and films.
Modern, popular songs in Mandarin.
Chinese new folk songs composed by musicians.
Table A1 and Table A2 in Appendix A list the information of all 106 songs. We believe that the amount of data, together with cross-validation, is sufficient for preliminary experimental validation to confirm whether the DoBell and the NT features are practical and whether non-deep machine learning using only succinct pitch statistics performs well in identifying traditional Chinese anhemitonic pentatonic folk songs.
To generate correct basic/merged anhemitonic pentatonic pitch histograms, the songs in the “not traditional Chinese anhemitonic pentatonic folk songs” set were also histogrammed using the movable do solfège, sixteen of which are illustrated in Figure 7 and Figure 8. Many of them do not have an original fixed key per se.
4.3. Classifiers
In this work, we preliminarily explored machine learning on the basis of the facts discovered; further applications to MIR are left for future work. Given this, we trained three classifiers, Gaussian Naive Bayes (GaussianNB), k-Nearest Neighbors (k-NN), and Support Vector Machines (SVM), with various combinations of the eight features introduced in Section 4.1, to perform binary classification of whether the input is a traditional Chinese anhemitonic pentatonic folk song. The validity of these classifiers was confirmed in previous work on European folk songs [35].
We applied the scikit-learn [36] implementations of the classifiers with all default settings. DoBell values are naturally in the [0, 1] interval due to the denominator in Equation (2), while NT values were normalized beforehand to be applicable in conjunction with DoBell, which is particularly important for the k-NN classifier.
4.4. Experimental Results
For each classifier, we performed ten-fold cross-validation with pseudo-random partitioning, where each fold was trained using 90% of the data and tested with the remaining 10%. Table 1 lists the average cross-validation accuracy of the binary classification for whether a song is a traditional Chinese anhemitonic pentatonic folk song, obtained by applying different combinations of features to the three classifiers, where the last two rows give the best leave-one-out cross-validation results with their corresponding feature combinations as references.
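The evaluation protocol can be sketched with scikit-learn as follows. The feature values are synthetic placeholders generated only to make the example runnable; they are not the data of Table A1 and Table A2, and the resulting accuracies say nothing about Table 1. Dividing NT by its maximum is an assumed form of the normalization, and the shuffled `KFold` with a fixed seed stands in for the pseudo-random partitioning.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_per_class = 53

# Synthetic placeholder features (hypothetical, for illustration only).
do_bell = np.concatenate([
    rng.uniform(0.8, 1.0, n_per_class),   # stand-in for the positive class
    rng.uniform(0.4, 1.0, n_per_class),   # stand-in for the reference corpus
])
nt = np.concatenate([
    rng.integers(0, 2, n_per_class),
    rng.integers(0, 5, n_per_class),
]).astype(float)

# Scale NT into [0, 1] so that it is comparable with DoBell (matters for k-NN).
nt_scaled = nt / max(nt.max(), 1.0)

X = np.column_stack([do_bell, nt_scaled])
y = np.array([1] * n_per_class + [0] * n_per_class)

classifiers = {
    "GaussianNB": GaussianNB(),
    "k-NN": KNeighborsClassifier(),
    "SVM": SVC(),
}

# Ten folds: each model is trained on 90% of the songs and tested on 10%.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
accuracies = {
    name: cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean()
    for name, clf in classifiers.items()
}

for name, accuracy in accuracies.items():
    print(f"{name}: mean accuracy = {accuracy:.3f}")
```

Replacing `KFold` with `LeaveOneOut` from the same module would give the leave-one-out variant of the protocol.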
Overall, all the classifiers show their ability to discriminate the traditional Chinese anhemitonic pentatonic folk songs. Note that the k-NN results are significantly affected by k, and the value of 49 shows superiority in the ten-fold cross-validation, while the value of 3 leads the results in the leave-one-out cross-validation.
From the feature combination perspective, applying the complete feature set produced satisfactory results but did not reach the optimal performance on the k-NN classifier. The features work viably on their own, even the top on the SVM classifier. The feature set outperforms other combinations on the k-NN classifier.
Features extracted for both types of merged pitch histograms generally perform better than those of the non-merged histograms. However, the way merged pitch histograms help Chinese pentatonic folk songs form better bell-shaped envelopes incidentally benefits other songs’ shapes, as Table A2 implies. For example, the merged pitch histogram in Figure 7(12) turns the distribution into a bell shape.
Anhemitonic pentatonic pitch histograms are not necessarily better than the original histograms and even perform worse in the merged case. We observed that the anhemitonic pentatonic pitch histogram, like the merged histogram, may unexpectedly improve the bell-shapedness of other songs while favoring the bell shape of the Chinese pentatonic folk songs, as exemplified in Figure 7(4–6) and explained in Section 3.
The merging operation and the anhemitonic pentatonic operation make some non-anhemitonic-pentatonic songs hard to distinguish, which is also reflected in the confusion matrices in Figure 11, where true positives are identified more reliably than true negatives overall. The simple classifiers trained with our designed features perform well in correctly labeling the traditional Chinese anhemitonic pentatonic songs but are likely to misidentify some other types of songs.
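The asymmetry between the two classes can be read off a confusion matrix by comparing the true positive rate with the true negative rate. The labels below are hypothetical and chosen only to show the computation; they are not the results in Figure 11.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = traditional Chinese anhemitonic pentatonic folk song,
# 0 = any other song.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]

# With labels=[0, 1], scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

true_positive_rate = tp / (tp + fn)  # share of target songs labeled correctly
true_negative_rate = tn / (tn + fp)  # share of other songs labeled correctly

print(f"TPR = {true_positive_rate:.2f}, TNR = {true_negative_rate:.2f}")
```

In this toy case every target song is recognized, while some other songs are mislabeled as target songs, which is the pattern described above.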
5. Conclusions
We discovered that the pitch histograms of many representative traditional Chinese anhemitonic pentatonic folk songs that are well known to Chinese people from all regions today naturally follow a bell-like shape, which reflects the Chinese characteristic of Zhongyong (the doctrine of the mean) in oral transmission.
Subsequently, we suggested that the recently proposed merged pitch histogram and the anhemitonic pentatonic pitch histogram put forward in this paper can reasonably eliminate the exceptions in the pitch histograms, endowing many traditional Chinese anhemitonic pentatonic folk songs with perfect bell shapes, as confirmed by the statistical visualization of numerous examples. Pearson’s test validated that whether a song is a traditional Chinese anhemitonic pentatonic folk song is related to whether it has a perfect bell-shaped basic/merged pitch histogram.
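As a sketch of how such a relationship can be tested, the example below assumes that the test in question is Pearson’s chi-squared test of independence on a 2×2 contingency table. The counts are hypothetical and serve only to show the mechanics.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table (NOT the paper's counts).
# Rows: target folk song / other song.
# Columns: perfect bell-shaped histogram / not bell-shaped.
table = [
    [40, 10],
    [15, 35],
]

# correction=False gives the plain Pearson statistic (no Yates correction).
chi2, p_value, dof, expected = chi2_contingency(table, correction=False)

print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.2e}")
```

A small p-value leads to rejecting the hypothesis that song type and bell-shapedness are independent; it does not by itself indicate how strong the association is.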
As the main contribution of this article, our findings about the bell-shaped pitch distribution of traditional Chinese anhemitonic pentatonic folk songs can provide visual material for music education and demonstration. Since we did not extend the study to a larger set of less widely circulated folk songs, we do not make a broad inference about traditional Chinese folk songs as a whole. Furthermore, in order to provide a reference for machine learning studies in the field of MIR, we also made preliminary attempts to derive features from and model this observation. We applied the classic negative turning feature and devised a new feature, Degree of Bell Shape, to measure whether the histogram approximates a bell shape.
Finally, we implemented three classifiers based on the different histograms and features. Preliminary experiments revealed that the classifiers can discriminate traditional Chinese anhemitonic pentatonic folk songs, which suggests that lightweight machine learning using only pitch statistics can advance cultural diversity in MIR.
This study of the pentatonic scale and the traditional songs may encourage more research on the folk songs of other cultures and the unique scales they use. The different types of histograms and features we proposed or applied can serve as a reference for MIR studies related to cultural variety and stimulate the design of more variants. More features that can be applied to pitch histograms, such as those summarized in [9], should be examined to determine whether they are applicable to judging bell-shapedness or directly identifying traditional Chinese anhemitonic pentatonic folk songs. In moving from overall pitch statistics to segment-based features, many topics, such as subsequence search [37], also deserve investigation.
It would also be promising to examine the peculiarities of other cultures’ anhemitonic pentatonic folk songs to distinguish them from Chinese ones. For example, traditional Korean folk songs often use triple meter, which is rare in traditional Chinese folk songs. Moreover, for other regions that systematically use the anhemitonic pentatonic scale, such as India and Scotland, whether their folk songs have similar or unique pitch characteristics needs to be studied by researchers familiar with the cultures and materials involved (e.g., [38]).