1. Introduction
Tourist flow, a special form of group social activity, refers to the movement of tourists within a region driven by similar tourism demands. It reflects the relationship between tourist sources and destinations as well as other flows in tourism behavior, such as information flow and material flow. Research on tourist flow has therefore always been a crucial topic in tourism geography [1]. At the same time, with the growing presence of tourism elements in urban agglomerations, cities have gradually evolved into vital nodes and spatial conduits for tourist flows. Numerous scholars have long studied tourist flow, providing a foundation for improving urban tourism. Such research gives researchers and managers a deeper understanding of the current state of tourism development, which in turn has significant implications for regional coordination and high-quality development.
The research on tourist flow in Western tourism geography started in the 1960s and has been carried out mainly considering the spatial patterns [
2,
3], influencing factors [
4], and forecast of the flow [
5,
6]. Research on tourist flows in China started later, in the 1980s. It mainly focused on the spatial distribution [
7], system research [
8], and temporal evolution patterns [
9]. In the early stage, relevant research was relatively scarce and mainly centered on basic concepts such as tourist flow quantity and quality. Through the interdisciplinary application of theories and methodologies from econometric statistics, GIS analysis, regional economics, and physics, research on tourist flows has since advanced significantly and established a relatively comprehensive research framework. It primarily encompasses the spatial structure of tourist flows, evolution mechanisms, influencing factors, flow effects, and integration with other social hotspots. The spatial structure and spatio-temporal evolution of tourist flows are the fundamental aspects of tourist flow research. Yang et al. used the Zipf and variance indices to analyze the spatial structure and variance characteristics of tourist flows in Sichuan Province [
10]. Li et al. assessed the spatial–temporal dynamic evolution of inbound tourist flows in Chinese tourist hotspots from 1998 to 2008 using a gravity model [
11]. Mou et al. proposed a novel research framework for the spatial distribution and changes of tourist flow and studied the structure of tourist flow in Shanghai [
12]. Using hotspot analysis and kernel density estimation methods, Scholz et al. explored the temporal variations in the spatial distribution of tourist flows extracted from Twitter data [
13]. As research on tourist flow spatial structure grows, some researchers have begun to focus on the factors affecting tourist flows. Taking the Beijing–Shanghai high-speed railway line as an example, Wang et al. explored the influencing factors of transportation on the spatial structure of tourist flows [
14]. Chen et al. explored the factors influencing the structure of tourist flows using linear weighted regression methods [
15]. In terms of impact effects, María et al. introduced an augmented gravity model to probe whether tourist traffic affects international trade [
16]. Yun et al. analyzed the effect of tourist flow on province-scale food resource spatial allocation in China [
17].
Research on the structure of tourist flows is the basis for other tourist flow studies and is of great importance for the spatial study of tourism. However, with the ongoing expansion of tourist destinations, the structure of tourist flows tends to become progressively more complex and networked. A perspective different from existing ones is therefore needed to elucidate the network properties inherent in the tourism phenomenon. Since the beginning of the 21st century, social network analysis (SNA) has been introduced into tourist flow research by both domestic and international researchers. SNA is a method grounded in social network theory, used to study the complex interactions among individuals [
18]. With this method, researchers can explore the tourist flow network between different destinations [
19]. Wu et al. used SNA to examine the structural characteristics of the inbound tourist flow network between Beijing and Shanghai and explore its relationship with the aviation network [
20,
21]. Yan et al. conducted a comprehensive investigation into the spatial network characteristics of tourist flows in Luoyang [
22] and, building on this research, further examined the dynamic mechanism behind the spatial pattern of urban domestic tourist flows. Zeng et al. identified five distinct flow network patterns of Chinese tourists in Japan from an SNA perspective [
23]. Yu et al. used SNA to examine the structure and characteristics of the tourist flow network in Guilin [
24]. However, these previous studies demonstrate a gap where the spatial differences and changes in the networks are somewhat unclear due to the weakness of SNA for deep spatial analysis. In these studies, GIS analysis methods are often confined to surface-level representations, with limited use of GIS-related analysis modules [
25,
26]. Therefore, there is a need to integrate appropriate geographic approaches with SNA to better delineate spatial variations in tourist flows.
In terms of research data, with the rapid development of digital technology, tourism digital footprint data have been used to accurately and quantitatively reflect the structure and spatio-temporal evolution of tourist flow networks. Mou et al. collected online travelogues from 2012 to 2018 and analyzed the spatial network patterns of tourist flows in Qingdao [
27]. Li et al. incorporated Weibo check-in data during the National Day holiday to study the network structure and spatial distribution of tourist flow in China [
28]. Wang et al. used cellular signaling data and SNA to compare the resident and non-local tourist flow networks in Nanjing [
29]. Online travelogues, a type of tourism digital footprint data, have been widely used by researchers due to the wealth of information they contain [
30,
31]. However, the increasing volume of data has highlighted certain limitations of online travelogue data, such as a lack of structure and high redundancy, which researchers must process [
32]. In existing domestic research, this process is mainly limited to manual identification [
33,
34], which is time-consuming and labor-intensive. Some studies rely on manually determining keyword matches for identification [
22,
26] but encounter issues such as imperfect rules and incomplete information extraction. Hence, there is an urgent need to explore a more efficient and accurate method to deal with online travelogues. Named entity recognition (NER) models based on deep learning (DL), which can rapidly and accurately extract specific named entities from massive unstructured texts, have been introduced in geography. Wang et al. proposed a Neuro-net Toponym Recognition Model for extracting locations from social media messages and then conducted comparisons with different models to confirm the superiority of deep learning [
35]. Zhang et al. proposed a RoBERTa-BiLSTM-CRF model to extract tourist attractions from tourism notes, which achieved an F1 value of 0.7141 [
36]. However, progress has been relatively slow in the application of deep learning for the processing and analysis of data in the network structure of tourist flow research.
In terms of research scale, research on the network structure of tourist flows can be broadly divided into the national, regional, urban, and individual scenic spot scales. Shi et al. took 21 APEC countries (regions) as the research area and analyzed the APEC tourism flow network from 2008 to 2018 [
37]. Peng et al. focused on tourist flow networks from a cross-provincial boundary perspective and explored their distribution and influencing factors in the Lugu Lake area [
38]. Wang et al. compared the evolution characteristics of the tourism flow network structure between the whole region and the central urban area of Wuhan [
39]. In general, studies at different scales are relatively comprehensive. However, a gap is noticeable at the urban scale, as much of the spatial network structure analysis of tourist flows is based directly on the tourist attractions extracted from online travelogues rather than placing them at the county level. This approach neglects a broader county-level view and can exclude certain areas from the tourist flow network of the city. The limitation largely arises from the extraction of tourist nodes, where less visited nodes are typically omitted to enhance the clarity of social network analysis. These neglected nodes are often scattered on the periphery of core tourism regions, whereas the retained nodes are mainly clustered in particular core areas. For instance, in Hangzhou, well-attended sites, which show strong results in SNA, are mainly located in Xihu District and Shangcheng, while Qiantang District and Fuyang have no tourist nodes [
40]. Moreover, “all-for-one tourism”, proposed in 2016 in China, seeks to surpass traditional sightseeing limitations and envisions expansive regional tourism areas to enhance the sustainability of tourism development [
41]. Under this initiative, two rounds of national demonstration zones for “all-for-one” tourism have been designated at the county level. Therefore, it is more meaningful to conduct research on the structure of urban tourist flow networks from a county-level perspective.
Hangzhou, celebrated as a captivating mix of China’s natural beauty and cultural heritage, attracts visitors from all over the world. However, the city’s current tourism development faces challenges such as uneven growth and underdevelopment in outer regions. There also appears to be a noticeable gap in academic research analyzing Hangzhou’s tourist flow network patterns at the county level. This lack of finer-scale analysis limits our understanding of the broader tourism growth in Hangzhou and may hinder the progress of the “all-for-one” tourism approach.
To address these issues, this study first constructed a BERT-BiLSTM-CRF model to extract and identify specific locations and scenic entities within online travelogues for Hangzhou. A pan-attraction database of Hangzhou was also created to facilitate the construction of the tourist flow matrix. Then, social network analysis and GIS spatial analysis were used to study the spatial network structure of tourist flows in Hangzhou, specifically at the county level. In addition to fixing the deficiency in existing research, this paper also provides a scientific basis for Hangzhou to make more rational decisions in aspects such as tourism spatial planning and tourism route designation.
3. Results
3.1. Model Results and Tourist Flow Network Construction
In the experiment, choosing 4000 annotated sentences as the corpus and randomly dividing them into training, validation, and test sets in a ratio of 7:2:1 [50] led to better recognition results with reduced annotation effort (Table 3).
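The 7:2:1 division described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_corpus`, the fixed random seed, and the placeholder sentences are all assumptions introduced here.

```python
import random


def split_corpus(sentences, ratios=(7, 2, 1), seed=42):
    """Shuffle sentences and split them into train/validation/test by ratio.

    The seed is a hypothetical choice made only for reproducibility.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Placeholder corpus standing in for the 4000 annotated sentences.
corpus = [f"sentence_{i}" for i in range(4000)]
train, val, test = split_corpus(corpus)
print(len(train), len(val), len(test))  # 2800 800 400
```

With 4000 sentences, this yields 2800 training, 800 validation, and 400 test sentences.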
The BERT model improves on the word2vec model by using the Transformer mechanism to better capture the meaning of words in their context.
Table 4 shows the results of experiments using different models, including the BERT-BiLSTM-CRF model and another five mainstream methods of NER. As shown in
Table 4, BERT-BiLSTM-CRF recognizes both entity types better than the other models, with the most significant improvements in precision, recall, and F1 values. Moreover, the results show that setting the number of training iterations to 50 achieves the highest F1 values for locations and scenes in the test set. The macro-average F1 value for the entire dataset reaches 87.52%, accompanied by relatively low loss values, suggesting a well-fitted model. Compared with RoBERTa-BiLSTM-CRF, this method has higher precision and F1 scores and significantly reduces the training time, despite a slight lag in recall for Location entities. Therefore, BERT-BiLSTM-CRF is able to extract tourist nodes from web travelogue data and is used for subsequent experiments.
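For readers unfamiliar with the metric, the macro-average F1 reported above is the unweighted mean of the per-entity-type F1 scores. The sketch below shows the arithmetic only; the precision and recall values are made-up placeholders, not figures from Table 4, and the function names are assumptions.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class):
    """Unweighted mean of per-class F1; per_class maps label -> (P, R)."""
    scores = [f1_score(p, r) for p, r in per_class.values()]
    return sum(scores) / len(scores)


# Illustrative values only; they are NOT the paper's results.
example = {"location": (1.0, 1.0), "scene": (0.5, 0.5)}
print(macro_f1(example))  # 0.75
```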
With the application of the well-refined BERT-BiLSTM-CRF model, tourism sites within all travelogues are identified following the authors’ narrative sequence. This exercise results in the recognition of 233,196 entities, revealing various tourist routes depicted in the travelogues, such as the “West Lake→Leifeng Pagoda→Hefang Street→Olympic Sports Center” trajectory. Within this sequence, three direct movements are observed: “West Lake→Leifeng Pagoda”, “Leifeng Pagoda→Hefang Street”, and “Hefang Street→Olympic Sports Center”. Subsequently, based on the principle of fuzzy matching and matching with the Hangzhou Pan-attraction Database, these identified routes were assigned to county-level administrative divisions. Finally, this process reveals two instances of direct movement between county (city, district) pairs: “Xihu District→Shangcheng District” and “Shangcheng District→Xiaoshan District”, both recorded as “1”. The movement “Xihu District→Xiaoshan District” is considered indirect and recorded as “0”.
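The conversion from an attraction-level route to county-level direct movements can be sketched as below. This is an illustration under stated assumptions: an exact dictionary lookup stands in for the fuzzy matching against the Hangzhou Pan-attraction Database, the attraction-to-district assignments are inferred from the example above, and the names `ATTRACTION_TO_COUNTY` and `county_movements` are hypothetical.

```python
# Hypothetical lookup table; the study matches against a pan-attraction
# database with fuzzy matching, which is not reproduced here.
ATTRACTION_TO_COUNTY = {
    "West Lake": "Xihu District",
    "Leifeng Pagoda": "Xihu District",
    "Hefang Street": "Shangcheng District",
    "Olympic Sports Center": "Xiaoshan District",
}


def county_movements(route, lookup):
    """Return direct county-to-county moves along a route.

    Consecutive stops in the same county produce no movement, and
    unmatched attractions are skipped.
    """
    counties = [lookup[name] for name in route if name in lookup]
    moves = []
    for origin, dest in zip(counties, counties[1:]):
        if origin != dest:
            moves.append((origin, dest))
    return moves


route = ["West Lake", "Leifeng Pagoda", "Hefang Street", "Olympic Sports Center"]
print(county_movements(route, ATTRACTION_TO_COUNTY))
# [('Xihu District', 'Shangcheng District'), ('Shangcheng District', 'Xiaoshan District')]
```

Only adjacent pairs are counted, so the indirect movement from Xihu District to Xiaoshan District does not appear in the output.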
Following the aforementioned recognition principle, a 13 × 13 directional tourist flow matrix is established, with the nodes represented by the 13 counties, cities, and districts in Hangzhou. In this matrix, rows represent the starting points for tourist routes, columns represent the points of destination, and a “1” is assigned for each instance of directional flow. According to software guidelines, an appropriate threshold is introduced, thereby converting the original matrix into a binary matrix. In the binary matrix, nodes with flows exceeding the threshold are marked as “1”, with the rest recorded as “0”. Through a series of iterative experiments, a threshold value of 7 is ultimately chosen for carrying out the social network analysis. This approach facilitates a more accurate determination of the central nodes and their mutual relationships within Hangzhou’s tourist flow network.
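A minimal sketch of the matrix construction and thresholding step follows. It assumes that "exceeding the threshold" means strictly greater than the threshold; the three-node example, the counts, and the function names are invented for illustration and do not come from the study's data.

```python
from collections import Counter


def build_flow_matrix(moves, nodes):
    """Count directed moves; rows are origins, columns are destinations."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for (origin, dest), count in Counter(moves).items():
        matrix[index[origin]][index[dest]] += count
    return matrix


def binarize(matrix, threshold):
    """Mark cells whose flow exceeds the threshold (assumed strict >)."""
    return [[1 if value > threshold else 0 for value in row] for row in matrix]


# Toy data with three placeholder nodes instead of the 13 counties.
nodes = ["A", "B", "C"]
moves = [("A", "B")] * 8 + [("B", "C")] * 7 + [("C", "A")] * 2
flows = build_flow_matrix(moves, nodes)
print(flows)               # [[0, 8, 0], [0, 0, 7], [2, 0, 0]]
print(binarize(flows, 7))  # [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
```

Under this reading, a flow of exactly 7 is dropped at a threshold of 7; if the software treats the cutoff as inclusive, the comparison would be `>=` instead.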
3.2. The Overall Network Structure Characteristics of Tourist Flows
3.2.1. Network Density and Average Distance
Network density serves as an indicator of the degree of interconnection among the nodes in a network. After analysis, it is found that a threshold value of 7 results in an optimal network structure, yielding a density of 0.532 (
Table 5). The network has 83 directed relationships, which is less than the theoretical maximum of 156 connections among the 13 nodes. This suggests that the overall network density and strength in Hangzhou are comparatively low. As the threshold increases to 10, there follows a decline in the network density to 0.474, and a noticeable reduction in the actual relationships to 70. This reduction indicates that there is less flow and weaker connections between certain counties, cities, or districts, revealing an imbalance in the distribution of the network structure in Hangzhou.
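The density figure can be checked with a short calculation: for a directed network without self-loops, density is the number of observed ties divided by n(n − 1). The function names below are assumptions used only for this sketch.

```python
def directed_density(n_edges, n_nodes):
    """Observed directed ties divided by the maximum n * (n - 1)."""
    return n_edges / (n_nodes * (n_nodes - 1))


def density_from_matrix(adj):
    """Same measure computed from a binary adjacency matrix (diagonal ignored)."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return directed_density(edges, n)


print(13 * 12)                             # 156 possible directed ties
print(round(directed_density(83, 13), 3))  # 0.532
```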
Figure 4 illustrates this tourist flow network at the county level in Hangzhou with the applied threshold. The size of the regional nodes shown in the figure represents the intensity of their connections with other regions. Some regions within the network exhibit relatively weak relationships. For example, Binjiang District is only associated with tourist inflow and lacks outflow. These observations highlight the requirement for enhanced collaboration and strengthening within the network.
As shown in
Figure 4, a unique spatial pattern pertaining to the network density of tourism flow across Hangzhou county regions is apparent from 2020 to 2022. This pattern showcases an intensified density in the northeast juxtaposed with a relatively diminished density in the southwest. The prevailing pattern is mainly associated with Xihu District, as shown in
Figure 5. Xihu District plays an important role in tourist flows, with frequent and noticeable interactions with four other regions. This can be attributed to its strong economy and abundant resources. In 2022, Xihu District witnessed a significant increase in the value-added of tertiary industry, escalating to RMB 1813.0 billion (roughly USD 259 billion). Meanwhile, it also boasts one 5A scenic spot, eight 4A scenic spots, and twenty-two 3A scenic villages in Zhejiang Province, establishing itself as an important tourism development center in Hangzhou. Additionally, it is observed that frequent tourist transfers occur between Chun’an and Jiande, as well as between Tonglu and Fuyang, due to their geographical closeness and convenient transportation links.
Average distance, the mean number of shortest-path edges linking any two nodes in the network, is an indicator of network accessibility. In Hangzhou’s tourist flow network, the average distance is 1.468, with a distance-based cohesion index of 0.766 (
Table 5). The values inferred from these metrics collectively imply a well-interlinked network structure with a smooth overall flow. Even though some counties, cities, or districts may lack intimate connections, they still maintain relatively straightforward relationships. This facet accentuates a notable degree of accessibility across the network.
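Both measures can be computed from a binary adjacency matrix with a breadth-first search. The sketch below assumes the common definitions: average distance is taken over reachable ordered pairs, and distance-based cohesion is the mean reciprocal distance over all ordered pairs, with unreachable pairs contributing zero. Whether the software used in the study applies exactly these conventions is an assumption.

```python
from collections import deque


def shortest_path_lengths(adj):
    """All-pairs shortest path lengths by BFS; None marks unreachable pairs."""
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[source][v] is None:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def average_distance_and_cohesion(adj):
    """Return (mean distance over reachable pairs, mean reciprocal distance)."""
    n = len(adj)
    dist = shortest_path_lengths(adj)
    reachable = [dist[i][j] for i in range(n) for j in range(n)
                 if i != j and dist[i][j] is not None]
    if not reachable:
        return float("nan"), 0.0
    average = sum(reachable) / len(reachable)
    cohesion = sum(1 / d for d in reachable) / (n * (n - 1))
    return average, cohesion


# Toy directed cycle 0 -> 1 -> 2 -> 0 (not the Hangzhou network).
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(average_distance_and_cohesion(cycle))  # (1.5, 0.75)
```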
3.2.2. Graph Centralization
Graph centralization provides insights into the overall structure of a network, including three indicators: degree centralization, betweenness centralization, and closeness centralization. Degree centralization measures the extent of overall centralization throughout the network, while betweenness centralization identifies potential central nodes and cliques. Closeness centralization indicates the degree of variation among nodes. Meanwhile, in-degree and out-degree signify the network’s tendency for internal consolidation and outward divergence, respectively. According to
Table 6, both the out-degree centralization and in-degree centralization of the overall tourist flow network in Hangzhou stand at 0.507. In a fully connected pairwise network, the degree centralization is expected to equal 0. This finding suggests an uneven regional distribution within Hangzhou’s tourist network. The closeness centralization for out-degree and in-degree reaches 0.679 and 0.675, respectively, indicating a clear trend of concentrated development in the network, with recognizable central nodes. The betweenness centralization is somewhat lower at 0.366, suggesting the existence of several regional nodes playing ‘middlemen’ roles to ensure effective connectivity between the core and peripheral network areas. These intermediary nodes should strengthen their intermediary roles, collaborate with peripheral regions, and drive the launch of premium tourist routes to promote the high-quality development of tourism in Hangzhou.
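As an illustration of how degree centralization behaves, the sketch below uses a Freeman-style formula for directed graphs: the sum of differences between the largest degree and every node's degree, divided by (n − 1)², the value reached by a pure star. This normalization is an assumption and may differ from the one applied by the software used in the study.

```python
def out_degree_centralization(adj):
    """Freeman-style out-degree centralization for a directed binary network."""
    n = len(adj)
    out_deg = [sum(1 for j, v in enumerate(row) if v and j != i)
               for i, row in enumerate(adj)]
    max_deg = max(out_deg)
    # (n - 1) ** 2 is the sum of differences in an out-star (assumed maximum).
    return sum(max_deg - d for d in out_deg) / ((n - 1) ** 2)


def in_degree_centralization(adj):
    """In-degree version, computed on the transposed matrix."""
    transposed = [list(col) for col in zip(*adj)]
    return out_degree_centralization(transposed)


# Toy examples with four nodes.
complete = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
out_star = [[0, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(out_degree_centralization(complete))  # 0.0
print(out_degree_centralization(out_star))  # 1.0
```

The fully connected case returns 0, matching the statement above, while a single hub sending flows to every other node returns 1.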
3.2.3. Core–Periphery Model and Correlation Analysis
The results of the Core–Periphery Model (
Table 7) further validate the earlier conclusions, outlining a “Core–Semi-Periphery–Periphery” structure based on coreness scores. Notably, Xihu District and Shangcheng attain a core status, given their coreness scores exceed 0.35, underscoring their significance as primary tourist destinations in Hangzhou. Fuyang, Tonglu, Xiaoshan, Chun’an, and Lin’an exhibit coreness scores above 0.30, placing them in a semi-peripheral role. Conversely, Yuhang, Gongshu, Jiande, Linping, Binjiang, and Qiantang occupy a peripheral position, displaying noticeable disparities in scores compared to the core zones. This suggests that the trickle-down effects from the core regions are limited, failing to effectively promote tourism development in the peripheral areas.
The quantity of cultural and tourism resources in each region is utilized as a measure of resource quantity, while the sum of cultural and tourism resources rated at levels 4–5 serves as an indicator of resource quality. With the assistance of SPSS 26.0, these elements are subjected to a Spearman correlation analysis alongside the core scores of each region. The results show a highly significant correlation (
p < 0.01) between coreness and both resource quantity and quality. Specifically, the correlation coefficient associated with resource quantity is 0.913, and with resource quality, it is 0.746 (
Table 8). This highlights the significant impact of resource conditions on the coreness scores of each region of Hangzhou, emphasizing the importance of resource endowment in regional tourism development. As a result, it is suggested that peripheral areas of tourists in Hangzhou should actively engage in collaborations with the core areas, leveraging the resource advantages of the core areas, enhancing their own tourist reception infrastructure, and benefiting from collaborative tourism development. Moreover, continuous exploration of unique resources, gradual adjustment in positioning, and leveraging the tourist flow network to attract more visitors should be part of their strategy. By aligning their development with the core areas, more new network cores can be established.
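The Spearman coefficient used above is the Pearson correlation of the ranked values. The following sketch reimplements it in plain Python for illustration; the study itself used SPSS 26.0, significance testing is not reproduced, and the sample numbers are placeholders rather than the paper's data.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder inputs: coreness scores and resource counts for four regions.
coreness = [0.10, 0.20, 0.30, 0.40]
resources = [5, 9, 20, 41]
print(round(spearman(coreness, resources), 3))  # 1.0
```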
3.3. The Node Structure Characteristics of Tourist Flow Network
3.3.1. Nodal Flow Direction and Flow Rate
Through the analysis of original directional flows from online travelogues, Sankey diagrams are constructed to visualize the tourist movement across different districts and counties in Hangzhou (
Figure 6). Xihu District and Shangcheng emerge as principal sources and destinations of the tourist flows, with a substantial number of movements between them. This pattern can be attributed to their shared scenic attractions in the West Lake area and their geographical adjacency. Additionally, frequent flows are observed between Xiaoshan, Yuhang, Gongshu, and the aforementioned districts. In contrast, regions like Fuyang, Lin’an, Tonglu, Chun’an, and Jiande show relatively weak connectivity with the core districts. However, noticeable internal tourist flows are detected among these five regions. Regions such as Binjiang, Qiantang, and Linping show limited connectivity with other areas, potentially constrained by their respective tourism resource endowments.
3.3.2. Point Centrality
Point centrality analysis is used to examine the specific role of nodes in social networks, involving three key metrics: degree centrality, betweenness centrality, and closeness centrality. Degree centrality measures the number of nodes that are directly connected to a particular node, which represents a local centrality index. Betweenness centrality characterizes a node’s ability to control connections between other nodes, while closeness centrality indicates how much a node can function independently from others.
By using Ucinet 6.0, a quantitative analysis of the nodes in the tourist network in Hangzhou is carried out, yielding the results as shown in
Table 9:
In terms of average values of these metrics, each tourist node in the network is connected to approximately 6.385 other nodes and takes on an intermediary role approximately 5.615 times. The mean shortest distance to all nodes is 17.615, implying that the network density and strength are moderate overall. However, the variances reveal an unbalanced distribution of network strength across Hangzhou’s nodes. The variance of betweenness centrality stands at 13.024, the largest among the parameters, indicating that node positions within the tourist network differ widely, with some nodes almost isolated.
Reviewing specific scores, Xihu District emerges dominant across all three indicators, having the highest betweenness centrality and the lowest closeness centrality, indicating direct interaction with other nodes. As the primary hub for incoming and outgoing tourist flows (
Figure 6), Xihu indisputably assumes a central role in Hangzhou’s tourist network. However, the tourist flow intensity between Xihu and other regions exhibits disparity, and Xihu appears to exert a comparatively limited impact on the peripheral areas. A notably high betweenness centrality score of 50.193 suggests that Xihu also plays a pivotal ‘middleman’ role, exerting considerable control over the tourist flow from other regions. Looking ahead, Xihu District should further explore its cultural and tourist potential, develop internationally appealing products, and amplify core strengths. This strategic direction would consolidate its role as an epicenter, driving optimization in Hangzhou’s overall tourist spatial dynamics.
The regions of Shangcheng, Xiaoshan, Fuyang, Tonglu, Chun’an, Yuhang, and Lin’an exhibit relatively high degree centrality and closeness centrality, indicating direct connections with most nodes and smooth tourist flows. These factors position them as secondary-level tourist nodes. As shown in
Figure 6, these regions attract and channel a substantial number of tourists to and from Xihu District, collectively forming major tourist destinations in Hangzhou. As a strategy to enhance collaboration, these regions could improve the quality of tourism products, attract more visitors through economies of scale, and construct solid interconnections with peripheral areas. Specifically, Xiaoshan District demonstrates a substantial betweenness centrality of 6.850, taking the second position, emphasizing its potent control capabilities. Several regions lean on it as a crucial intermediary to forge connections with other areas. In contrast, Chun’an County, known for the 5A-level scenic spot Thousand Island Lake, exhibits a relatively lower betweenness centrality score of 1.376, ranking seventh. Its prominence within the network is not as distinguished.
The regions of Gongshu, Jiande, Linping, Binjiang, and Qiantang all have low scores, with some even scoring zero. These nodes have limited influence on other regions and thus occupy a marginal position within the tourism network. Some are nearly isolated nodes. For instance, Binjiang District has only a single linkage, with Xihu. An urgent upgrade to the tourist infrastructure is vital for these areas. Furthermore, it is recommended that these peripheral regions connect with established tourist hotspots to co-develop tourism routes and thereby integrate into the tourism network.
3.4. Tourist Gravity Center Distribution and Migration
The tourist coreness scores for each county and district are individually computed for the years 2020 through 2022, aiming to unveil the tourist gravity center in Hangzhou during this period (
Table 10). The migration trajectory of the tourist gravity center is visualized using ArcGIS 10.6 (
Figure 7).
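A gravity center of this kind is a weighted mean of node coordinates. The sketch below assumes that each county is represented by a single point (for example, its centroid) and weighted by its coreness score; the coordinates, weights, and the function name `gravity_center` are placeholders, not values from Table 10.

```python
def gravity_center(points):
    """Weighted mean position; points is a list of (lon, lat, weight)."""
    total = sum(w for _, _, w in points)
    lon = sum(x * w for x, _, w in points) / total
    lat = sum(y * w for _, y, w in points) / total
    return lon, lat


# Placeholder points: (longitude, latitude, weight). Not real county data.
sample = [(0.0, 0.0, 1.0), (2.0, 2.0, 3.0)]
print(gravity_center(sample))  # (1.5, 1.5)
```

Repeating the calculation with each year's weights gives one center per year, and the distance between consecutive centers describes the migration.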
From a spatial perspective, the tourist gravity center was located in Fuyang District, between 119.79° E and 119.87° E and 30.01° N and 30.07° N, to the southwest of Hangzhou’s administrative center (119.95° E, 30.11° N). This suggests that the overall tourist activity in the semi-peripheral regions of the southwest surpasses that of the northeast, despite the presence of core areas such as West Lake and Shangcheng District in the latter. From a temporal perspective, a discernible southwestward migration trajectory of the tourist gravity center was evident. From 2020 to 2021, it moved southwest from Fuchun Street to the junction of Fuchun Street and Chunjian Township, covering a distance of 1.147 km. From 2021 to 2022, it continued to move southwest, from Fuchun Street to Xindeng Town, and the moving distance increased significantly to 8.74 km. The trajectory of the tourist gravity center indicates a clear tendency for the tourist network to move further into the suburban areas and away from central urban areas.
This shift can be largely attributed to the rapid spread of COVID-19 in urban areas, whose impact became increasingly pronounced as the pandemic progressed. Statistical data reveal a sharp increase in confirmed COVID-19 cases in Hangzhou, with the incidence rising over 40-fold in 2022 compared to 2021. Notably, 87.4% of cases were concentrated in the northeast. Driven by risk aversion, tourists show a strong preference for destinations perceived to have lower risk levels [51]. Moreover, tourists’ travel preferences have changed under the influence of the pandemic. Under regular epidemic prevention and control, natural scenic destinations have emerged as tourism hotspots, particularly in the southwest of Hangzhou, where attractions like Thousand Island Lake and Daciyan Scenic Spot are widely distributed.
3.5. Spatio-Temporal Analysis with the Standard Deviation Ellipse
To further illustrate the spatial characteristics of Hangzhou’s tourist network, SDE analysis was conducted for 2020, 2021, and 2022, as well as for the entire period of 2020–2022 (
Table 11 and
Figure 8).
From 2020 to 2022, the SDE of Hangzhou’s tourism network generally shows a “northeast–southwest” distribution. It mainly covers six districts and counties: Xihu District, Shangcheng District, Gongshu District, Binjiang District, Fuyang District, and Tonglu County. From 2020 to 2021, the major axis increased from 67.96 km to 69.76 km, the minor axis decreased from 25.33 km to 24.65 km, and the angle of rotation increased by 3.24°. These changes indicate a slight strengthening in the directional distribution of Hangzhou’s tourist network, particularly with a more noticeable difference in the east–west direction. There was also a trend towards expansion, as indicated by the increased area of the ellipses. From 2021 to 2022, the major axis decreased by 1.9 km, the minor axis increased by 0.15 km, and the angle of rotation decreased by 5.48°. This suggests a more noticeable shift in spatial distribution during this period, with the disparities in the north–south distribution widening once again. The network demonstrates a tendency to contract towards the southwest, accompanied by a distinct reduction in ellipse areas.
The small expansion of the SDE areas from 2020 to 2021 is primarily due to the development of the pandemic, resulting in a reduction in tourist concentration trends and a shift towards dispersion. Conversely, the contraction towards the southwest regions of the SDE from 2021 to 2022 can be attributed to the changes in tourist preferences for travel behavior under the influence of the pandemic, resulting in a further decline in the status of the northeastern core areas.
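The quantities reported for the ellipse (center, major and minor axes, rotation) can be approximated from weighted coordinates. The sketch below is a simplified version based on the eigen-decomposition of the weighted covariance matrix; it is not the GIS tool's implementation, which reports rotation clockwise from north and may scale the axes differently. Inputs are assumed to be in a projected coordinate system, and all names and sample points are hypothetical.

```python
import math


def standard_deviational_ellipse(points):
    """Approximate SDE from a list of (x, y, weight) tuples.

    Returns the weighted center, the semi-axis lengths (one standard
    deviation along each principal direction), and the major-axis angle
    in degrees counterclockwise from the x-axis (east).
    """
    total = sum(w for _, _, w in points)
    cx = sum(x * w for x, _, w in points) / total
    cy = sum(y * w for _, y, w in points) / total
    sxx = sum(w * (x - cx) ** 2 for x, _, w in points) / total
    syy = sum(w * (y - cy) ** 2 for _, y, w in points) / total
    sxy = sum(w * (x - cx) * (y - cy) for x, y, w in points) / total
    # Closed-form eigenvalues and orientation of the 2 x 2 covariance matrix.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return {
        "center": (cx, cy),
        "major": math.sqrt(half_trace + root),
        "minor": math.sqrt(max(half_trace - root, 0.0)),
        "angle_deg": math.degrees(theta),
    }


# Placeholder points stretched along the x-axis, with equal weights.
sample_points = [(-2, 0, 1), (2, 0, 1), (0, -1, 1), (0, 1, 1)]
print(standard_deviational_ellipse(sample_points))
```

A longer major axis relative to the minor axis indicates a stronger directional trend, which is how the year-to-year changes above are interpreted.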
4. Discussion
Drawing on travelogue data from tourism websites and the BERT-BiLSTM-CRF deep learning model, this research studied the spatial structures of both the overall and nodal networks of tourist flows in Hangzhou at the county level during the post-pandemic period from 2020 to 2022. GIS spatial analysis methods, such as the tourist gravity center model and standard deviation ellipse, were also employed to further elucidate the spatial differences within the network. In the following, we discuss the key findings, research contributions, and limitations of this study.
(1) More efficient methods for data processing and comprehensive extraction of data information
With the evolution of internet technology and data, an increasing amount of User-Generated Content data has been used in research on tourist flow. However, challenges such as data overload, complexity, and redundancy have limited the further application of tourism big data. This study, in contrast to traditional manual data processing methods [
33,
34], utilizes advanced deep learning techniques to recognize and extract extensive, unstructured web text data. In comparison with other mainstream models, the BERT-BiLSTM-CRF model demonstrates a higher F1 score of 87.52%, significantly enhancing the speed of information extraction while maintaining a high level of accuracy. This advancement can contribute to the efficacy of big data-driven research on tourism flows.
Moreover, as opposed to exclusively extracting data from a single website as seen in other studies, our research crawled data from seven different travel websites to ensure the authenticity of the tourist flow network structure. In contrast to other studies that are limited to researching A-level attractions, this study employs web crawling and on-site surveys to build a comprehensive database of attractions in Hangzhou. By incorporating popular online tourist destinations into our research, we have ensured a more accurate and reasonable identification of tourist node information.
(2) The spatial structure of tourist flow networks from a county perspective
While previous studies on tourist flows in Hangzhou have mainly focused on analyzing individual attractions at a micro-level, this study responds to the growing demand for “all-for-one” tourism development by conducting the analysis from a county perspective. This analysis bridges the gap in the existing literature and provides valuable insights for urban–rural integration development.
As the findings reveal, the overall tourist flow network structure in Hangzhou is in line with previous studies, demonstrating a pattern of higher density in the northeast and lower density in the southwest. This pattern showcases a “core–semi-periphery–periphery” structural characteristic, which is significantly influenced by the quantity and quality of tourism resources.
However, the study reveals unique findings at the county level, which are distinct from those in other studies. Firstly, in past analyses of node networks, Thousand Island Lake has typically demonstrated a significant effect on the tourist flow network [52]. However, despite housing this attraction, Chun’an County does not stand out within the network in Hangzhou. This may result from its relatively remote geographical location and the large scale of the Thousand Island Lake scenic area. Often, tourists opt to move between Thousand Island Lake and other attractions in Chun’an, rather than going to other districts and counties. This leads to Chun’an not appearing as closely linked to other areas from a county-level perspective. Secondly, despite Xiaoshan District’s lack of particularly prominent tourist nodes and fewer high-grade resources, its high betweenness centrality score suggests a significant position within the network. This could be attributed to Xiaoshan’s transportation hubs, such as Xiaoshan Airport and Hangzhou South Station. These instances underline the impact of infrastructure, regional connectivity, and other factors on individual positions within the network and overall network development, emphasizing the need for further exploration.
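The following sketch illustrates why a node with few attractions of its own can still score highly on betweenness centrality when it sits on the shortest paths between other nodes. It implements Brandes' algorithm for a small undirected, unweighted toy graph. The node names and edges are hypothetical and do not represent the Hangzhou network, which in this study is built from observed tourist flows.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalised betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search counting shortest paths from the source.
        order = []
        predecessors = {v: [] for v in adjacency}
        path_count = {v: 0 for v in adjacency}
        distance = {v: -1 for v in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source.
        dependency = {v: 0.0 for v in adjacency}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                share = path_count[v] / path_count[w]
                dependency[v] += share * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was visited from both endpoints.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical toy network: "hub" links four areas; only "a" and "b" are
# also linked directly to each other.
toy_network = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub", "b"],
    "b": ["hub", "a"],
    "c": ["hub"],
    "d": ["hub"],
}
toy_scores = betweenness_centrality(toy_network)
```

In this toy case the hub lies on every shortest path between areas that are not directly connected, so it receives the highest score regardless of how attractive it is as a destination.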
(3) Geographical spatial distribution differences from 2020 to 2022
This research also utilizes the tourist gravity center model and SDE method to complement the analysis of the geographical spatial changes in the tourist network. The results show that the tourist gravity center of Hangzhou is mainly located in Fuyang District and has a significant southwest shift, especially from 2021 to 2022. The SDE exhibited successive expansions in the north–south and east–west directions. The overall size expanded slightly and then contracted significantly, indicating a phase of decentralization followed by centralization in tourism. This trend was not observed in previous studies on the structure of the tourism network of Hangzhou during the epidemic period. By combining these approaches, the temporal and spatial evolution of the tourist network can be comprehensively evaluated.
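As a minimal sketch of the gravity center idea, the code below computes a weighted mean center for two periods and then the distance and compass bearing of the shift between them. It assumes planar (projected) coordinates. The function names, coordinates, and weights are hypothetical and are not the data of this study.

```python
import math


def gravity_center(nodes):
    """Weighted mean center of a list of (x, y, weight) tuples."""
    total = sum(w for _, _, w in nodes)
    x = sum(x * w for x, _, w in nodes) / total
    y = sum(y * w for _, y, w in nodes) / total
    return x, y


def center_shift(before, after):
    """Return (distance, bearing_deg) of the move from `before` to `after`.

    bearing_deg is measured clockwise from north (0 = N, 90 = E,
    180 = S, 270 = W), so a southwest shift is close to 225.
    """
    dx = after[0] - before[0]
    dy = after[1] - before[1]
    distance = math.hypot(dx, dy)
    bearing_deg = math.degrees(math.atan2(dx, dy)) % 360
    return distance, bearing_deg


# Hypothetical two-node example: flow weight moves towards the
# southwestern node in the second period.
period_one = [(0.0, 0.0, 1.0), (2.0, 2.0, 1.0)]
period_two = [(0.0, 0.0, 3.0), (2.0, 2.0, 1.0)]
shift = center_shift(gravity_center(period_one), gravity_center(period_two))
```

Reading the bearing together with the distance shows both how far and in which direction the center of the network has moved between two years.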
(4) Limitations and future research
At the same time, this research has some limitations. Regarding the research data, firstly, online travelogue data are often subjective, influenced by the personal opinions and emotions of the writer, and may not reflect the actual situation or general consensus. Secondly, the accuracy and credibility of such data may be hard to verify. A travel blogger might embellish or omit certain details, leading to a skewed representation of a place. To mitigate these issues, we undertook pre-processing measures to enhance the overall trustworthiness of the dataset. Lastly, access to data may be restricted by website policies, potentially leading to gaps in the dataset. As a result, the time range of this study is limited to 2020–2022, making comparative analysis of longer time series difficult.
Regarding the research content, this study conducted a basic correlation analysis of the factors influencing the structure of the tourism network and determined that the quantity and quality of tourism resources are important factors. It was also found that the spread of COVID-19 may substantially change the tourist gravity center. However, the causal analysis of the structure of the network deserves closer examination in future research.
Future research could consider collecting more data from other social media platforms to explore the temporal evolution of tourism networks, such as the periods before, during, and after the epidemic. It would also be meaningful to further analyze the causal mechanisms and spillover effects of the spatial network structure of tourist flow.
5. Conclusions
Using online travelogue data collected from seven platforms between 2020 and 2022 and applying social network analysis (SNA) and GIS spatial analysis, this paper investigates the spatial structure of Hangzhou’s tourist flow network at the county level. Due to the large number of online travelogues and the difficulties of manual identification, the paper also introduces the BERT-BiLSTM-CRF deep learning model to streamline data processing, which yields significant results. The specific conclusions are as follows:
Firstly, the BERT-BiLSTM-CRF model achieves an impressive F1 score of 87.52% during training, which outperforms other models and effectively identifies tourist entities in travel data. This highlights the model’s feasibility in tourist flow research, making it a reliable alternative to manual efforts. The successful use of deep learning methods in handling extensive data in the tourist flow domain provides valuable insights for future research.
Secondly, the distribution of the tourist flow network within Hangzhou’s counties is somewhat sparse and uneven, while presenting desirable accessibility. Geographically, it exhibits a distinct pattern of dense connections in the northeast and sparse connections in the southwest, indicating regional imbalances and a significant centralization trend. The structure follows a “core–semi-periphery–periphery” pattern. Xihu and Shangcheng serve as core nodes, exerting strong dominance and fostering close ties with semi-peripheral areas. However, connections with peripheral areas appear weaker, resulting in limited trickle-down effects. Some regions, such as Qiantang District, have relatively few connections, even approaching isolation. During the year-to-year evolution of the tourism network, the network’s center of gravity has consistently shifted towards the southwest, largely influenced by the COVID-19 pandemic. Furthermore, the overall size of the network exhibits a pattern of slight expansion followed by contraction.
Finally, in light of the above conclusions, the study proposes several suggestions for regional development. For areas such as Xihu District and Shangcheng District, which are strategically positioned for high-level tourism development, a dual focus is needed: first, improving the quality of tourism to build a global brand; second, enhancing trickle-down effects by strengthening connections with peripheral areas. Semi-peripheral regions should actively assume the role of the “middleman”, directing tourists towards peripheral areas to enhance network connectivity. Peripheral areas require swift enhancements to their infrastructure, collaboration with core regions to introduce high-quality tourist routes, and efforts to increase their visibility.