1. Introduction
This paper presents a complex disaster loss assessment database system that allows researchers to search, apply, and share data. A previous study focused mainly on the development of a Korean building inventory [1]; this paper, in contrast, presents an advanced system architecture designed for scalability, a range of functions built on construction information specific to the Korean peninsula, and a data management scheme for processing large volumes of disaster data.
Earthquake damage accompanied by floods and fires is increasing worldwide, and various types of large-scale complex disasters are occurring in succession. South Korea, long regarded as a safe zone against earthquakes, experienced the 2016 Gyeongju earthquake (magnitude 5.8) and the 2017 Pohang earthquake (magnitude 5.4), the largest in its earthquake observation history. These earthquakes confirmed that South Korea is no longer safe from earthquakes; therefore, it is necessary to develop technologies that can comprehensively predict and respond to the scale of complex disaster damage at the national level, as well as technologies for strengthening disaster resilience [2].
With the advent of the fourth industrial revolution (or the emergence of Industry 4.0), big data is attracting growing attention. In the disaster response area, advances in data collection and analysis technology, based on information and communication technology (ICT) and sufficient computing resources, make faster and more precise loss assessment and disaster risk prediction and analysis possible.
Hazards U.S. Multi-Hazard (HAZUS-MH) [3] and the Mid-America Earthquake Center Seismic Loss Assessment System (ERGO) [4] are representative damage analysis tools related to earthquakes. Most analysis tools first calculate earthquake-induced building damage, then assess losses based on the degree of damage, and finally calculate the number of refugees and the number of people requiring shelters. To perform a loss assessment, data describing the damaged area, such as buildings and population, are required. Relevant studies rely on data that each country provides in the form of files and databases.
In South Korea, studies are being conducted using disaster damage statistical data, relevant research data, and actual measurement data. For earthquakes, however, the data needed to estimate losses are insufficient. Therefore, analyses are conducted based on HAZUS-MH, a loss assessment tool of the Federal Emergency Management Agency (FEMA), using a large amount of accumulated actual data, earthquake-related data provided by the tool, and domestic and overseas studies. To use these data sets in applications such as HAZUS-MH or ERGO, considerable time is required to turn the data into useful information through the initial steps of data cleaning, preparation, and preprocessing.
HAZUS-MH was developed by FEMA and is used in many other regions to estimate losses from natural hazards, including earthquakes, floods, fires, and hurricanes. Among studies of natural disaster damage in the Korean peninsula, Kang et al. [5] estimated losses using HAZUS-MH and the ShakeMap of the USGS (United States Geological Survey), and Yu and Jo et al. [6,7] investigated flood loss estimation using HAZUS-MH. In addition, Scawthron and Hancilar et al. [8,9] analyzed HAZUS-MH-based loss assessment in Canada and Oman.
Because the earthquake-related data provided by HAZUS-MH (and the weight data used for loss assessments) reflect the environment and conditions of the country for which they were built, they differ from data describing the Korean environment. Therefore, it is necessary to provide datasets that reflect the Korean environment, such as the geological environment and the structures and uses of buildings, in order to assess losses properly. The study in [10] estimated only the direct economic damage to buildings caused by strong ground motion; it did not reflect ground liquefaction or indirect damage, which depend on the characteristics of the region. The studies in [11,12] presented damage prediction and inventory construction methods for a specific area and noted the need for a long-term project to expand the inventory through such a system [13].
Loss assessment uses a variety of data, such as demographic figures and integrated building information, that are gathered and used at the national level. Most open data are managed and provided through public open data portals and are easy to use for a variety of purposes. However, the data are generated by different institutions for different purposes, and their compatibility and interoperability are not sufficiently considered. Therefore, standard data processing should be applied to the various types of data formats, including structured, semi-structured, and unstructured data. This paper focuses on the construction of a database system for multi-disaster loss assessment analysis of earthquakes involving floods and fires, and presents a scalable, compatibility-aware design methodology for a database system that can serve as a data platform for disaster risk reduction and prevention. The proposed database system plays a crucial role in providing relevant inventories and suitable data sets to various applications for disaster risk reduction and prevention in Korea.
In this paper, we performed compatibility and interoperability testing of the proposed database system by applying it to two different systems: a big data analytics framework (real-time, distributed analysis) and the Ergo application software (batch-processing, standalone analysis). The goal of this testing is to check that various data sets and different types of systems can work together properly without redundant preprocessing.
Section 2 describes the characteristics and types of public data and overseas technologies used for estimating disaster losses, Section 3 explains a scalable disaster risk analysis platform, and Section 4 explains the data design method for constructing the database system. Section 5 discusses the comparison between existing systems and the proposed system. Finally, conclusions and future research directions are drawn in Section 6.
3. Scalable Disaster Risk Analysis Platform
Most studies related to disaster risk assessment estimate losses from a similar set of influencing factors, such as the characteristics of the location, buildings, ground types, population, and socio-economic elements. Therefore, this paper presents a disaster risk assessment database system that can be easily used to evaluate losses from natural hazards in Korea, using both inventories and compatible data sets.
During a disaster, it is difficult to grasp the situation on the spot in real time and to cope with rapidly changing conditions. Therefore, it is necessary to prepare in advance by evaluating damage through simulation. This study focuses on the design and implementation of a database system that provides researchers or stakeholders with appropriate data sets, taking into account the characteristics of specific local environments, including geological properties and population data, through risk analysis tools or real-time analysis platforms.
The disaster risk analysis platform consists of a data preprocessing process, a storage process, an analysis and visualization process, and a service process, as shown in Figure 5. In the preprocessing process, public data (buildings, population, and weights) scattered across various agencies are collected, and the data format is converted so that the data can be stored and processed. Currently, data are collected mainly from public sources, but social networking service (SNS) data will also be collected to reflect disaster situations in real time.
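The format conversion in the preprocessing process can be illustrated with a minimal sketch. The agency names, field names, and target schema below are hypothetical and are not taken from the proposed system; the sketch only shows the general idea of mapping records from differently structured sources onto one common layout before storage.

```python
from __future__ import annotations

from typing import Any

# Hypothetical field mappings: source column name -> common schema name.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "agency_a": {"bldg_id": "building_id", "area_m2": "floor_area_m2", "use": "occupancy"},
    "agency_b": {"ID": "building_id", "AREA": "floor_area_m2", "USAGE": "occupancy"},
}


def normalize(record: dict[str, Any], source: str) -> dict[str, Any]:
    """Convert one raw record from a given source into the common schema."""
    mapping = FIELD_MAPS[source]
    # Rename only the fields that the mapping knows about.
    out = {common: record[raw] for raw, common in mapping.items() if raw in record}
    # Harmonize types and text casing so that downstream tools see one format.
    out["floor_area_m2"] = float(out["floor_area_m2"])
    out["occupancy"] = str(out["occupancy"]).strip().lower()
    out["source"] = source  # keep provenance for later auditing
    return out


raw_a = {"bldg_id": "A-17", "area_m2": 84, "use": "Residential"}
raw_b = {"ID": "B-1", "AREA": "120.5", "USAGE": " Commercial "}
normalized = [normalize(raw_a, "agency_a"), normalize(raw_b, "agency_b")]
```

After this step both records share the same keys and value types, which is the property that lets analysis tools consume them without repeating the preprocessing.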
In the storage process covered in this study, the database is constructed to store public data, which are structured. The collected data are organized according to the relationships between data items, and a database suited to the data attributes is constructed on that basis. When image data such as hazard maps and disaster-related social data are added, large amounts of semi-structured and unstructured data will be collected in real time. Therefore, further storage methods will be added through schema design using the JSON (JavaScript Object Notation) [19] format and NoSQL data processing. JSON is a standard text-based format for representing structured data and objects. It is also a lightweight data-interchange format that makes it easy to exchange data among web, mobile, and other platforms. NoSQL refers to databases that do not depend on SQL (structured query language) and are well suited to big data processing. NoSQL databases are highly scalable, do not rely on a relational model, and provide high performance [20]. The analysis and visualization process and the service process provide information that helps decision-makers predict and prepare for disaster damage based on the data they hold.
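As a rough illustration of why JSON suits semi-structured disaster data, the sketch below builds two report documents whose optional fields differ and round-trips them through JSON text. The document layout (`report_id`, `hazard`, `location`, `attachments`) is an assumption made for this example and is not the schema of the proposed system.

```python
import json


def make_report(report_id, hazard, text, location=None, attachments=None):
    """Build a report document; optional fields are omitted when unknown."""
    doc = {"report_id": report_id, "hazard": hazard, "text": text}
    if location is not None:
        doc["location"] = {"lat": location[0], "lon": location[1]}
    if attachments:
        doc["attachments"] = list(attachments)
    return doc


# Two documents with different shapes, as is typical of social data.
docs = [
    make_report("r-001", "earthquake", "Cracks in the wall", location=(36.02, 129.36)),
    make_report("r-002", "flood", "Road under water", attachments=["photo1.jpg"]),
]

# Serialize to JSON text (as it would be exchanged or stored) and parse it back.
serialized = [json.dumps(d, ensure_ascii=False) for d in docs]
restored = [json.loads(s) for s in serialized]
```

A relational table would need nullable columns or extra tables for the optional parts, whereas a document store can keep each report as it arrives.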
In this study, we conducted a compatibility test of the proposed database system by applying it to both the big data analytics framework and Ergo. The proposed system can be used on various types of platforms to assess risk from natural hazards.
4. Multi-Disaster Loss Assessment Analysis Database System for South Korea
First, data on the Korean environment were collected to construct a complex disaster loss assessment system for South Korea. The database model was then developed with standardization in mind, after refining the data distributed across the individual institutions. The database was designed for scalability, and the loaded data can be used as open data in analysis systems other than disaster-related ones. The system workflow is shown in Figure 6.
The database consists of 30 tables, covering building information data, population information data, basic attribute data used for the loss assessment, weight data, and tables related to formulas and other data. The tables in the Entity-Relationship Diagram (ERD) mainly contain building-related information, because the system measures the various losses that occur according to the degree of damage to each building. Figure 7 shows the ERD of the complex disaster loss assessment and damage analysis database system designed in this paper.
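The relationship between the building, population, and weight data can be sketched with a much smaller relational layout than the 30-table design. The three tables, their column names, and the sample values below are hypothetical and chosen only to show how building records join to population and weight records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified tables; not the schema of the proposed system.
    CREATE TABLE building (
        building_id TEXT PRIMARY KEY,
        region_code TEXT NOT NULL,
        occupancy   TEXT NOT NULL,
        floor_area  REAL NOT NULL      -- floor area in square meters
    );
    CREATE TABLE population (
        region_code TEXT PRIMARY KEY,
        residents   INTEGER NOT NULL
    );
    CREATE TABLE weight (
        occupancy        TEXT PRIMARY KEY,
        replacement_cost REAL NOT NULL -- cost per square meter
    );
    """
)
conn.executemany(
    "INSERT INTO building VALUES (?, ?, ?, ?)",
    [("B1", "R1", "residential", 100.0), ("B2", "R1", "commercial", 200.0)],
)
conn.execute("INSERT INTO population VALUES (?, ?)", ("R1", 5000))
conn.executemany(
    "INSERT INTO weight VALUES (?, ?)",
    [("residential", 1000.0), ("commercial", 1500.0)],
)

# Join each building to its region's population and its occupancy weight.
rows = conn.execute(
    """
    SELECT b.building_id, p.residents,
           b.floor_area * w.replacement_cost AS replacement_value
    FROM building AS b
    JOIN population AS p ON p.region_code = b.region_code
    JOIN weight AS w ON w.occupancy = b.occupancy
    ORDER BY b.building_id
    """
).fetchall()
conn.close()
```

Keeping the building table at the center of the joins mirrors the statement that losses are measured from the degree of damage to each building.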
4.4. Weight-Related Table
The weight-related table is composed of tables that hold monetary values per unit area and related parameters, such as service life, durables and nondurables, loss of contents, building replacement cost, total production (limited to industrial buildings), and the demand for shelters according to age and income, as shown in Table 6.
Disaster damage in South Korea is calculated in accordance with the guidelines for natural disaster investigation and restoration plan establishment. For private property, damage to the internal contents of residential and commercial buildings is not considered, and the loss is calculated in a way that does not depend on the building area. In addition, when the losses for commercial and industrial buildings are calculated, additional economic losses (such as business losses owing to building damage) are excluded. To assess losses from a disaster, monetary values per unit area, such as the loss of contents inside the building ($), building replacement cost ($), and total production ($) (limited to commercial and industrial buildings) based on the square footage (SF), must be provided as weights. With these weights, it is possible to estimate damage to buildings, constructions, and assets (facilities, machinery, furniture, equipment, and vehicles) considering service life and depreciation, and to calculate economic losses suited to the Korean situation, such as the loss of contents inside the buildings and the total production.
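To show how per-unit-area weights, service life, and depreciation could combine into a direct loss, the following is a minimal sketch. The straight-line depreciation rule, the single damage ratio applied to both structure and contents, and all parameter names are assumptions made for illustration; they are not the formulas stored in the proposed database or used by HAZUS-MH.

```python
def depreciated_value(cost, age_years, service_life_years):
    """Straight-line depreciation (an assumed rule), floored at zero."""
    remaining = max(0.0, 1.0 - age_years / service_life_years)
    return cost * remaining


def direct_loss(area, replacement_cost_per_area, contents_cost_per_area,
                damage_ratio, age_years, service_life_years):
    """Estimate structure and contents loss for one building.

    area is the floor area, the two cost weights are monetary values per unit
    area, and damage_ratio is the damaged fraction between 0 and 1.
    """
    structure_value = depreciated_value(
        area * replacement_cost_per_area, age_years, service_life_years
    )
    contents_value = area * contents_cost_per_area
    return {
        "structure": structure_value * damage_ratio,
        "contents": contents_value * damage_ratio,
    }


# A 10-year-old building with a 40-year service life, half damaged.
loss = direct_loss(
    area=100.0,
    replacement_cost_per_area=1000.0,
    contents_cost_per_area=200.0,
    damage_ratio=0.5,
    age_years=10,
    service_life_years=40,
)
```

In this example the structure is worth 75% of its replacement cost after depreciation, so half damage gives a structure loss of 37,500 and a contents loss of 10,000 in the chosen currency unit.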
To calculate or predict the population movement based on the victims of building damage owing to a disaster, weights by age and income must be considered.
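One simple way to picture weights by age and income is a weighted sum over population groups. The group names, population shares, and weights below are invented for illustration, and the weighted-sum form is an assumption; it is not the shelter model of HAZUS-MH or of the proposed system.

```python
import math

# Hypothetical (age group, income group) cells:
# share = fraction of displaced people in the cell,
# weight = assumed fraction of that cell expected to seek public shelter.
CELLS = {
    ("under_65", "low"): {"share": 0.3, "weight": 0.5},
    ("under_65", "high"): {"share": 0.5, "weight": 0.1},
    ("65_plus", "low"): {"share": 0.1, "weight": 0.6},
    ("65_plus", "high"): {"share": 0.1, "weight": 0.2},
}


def shelter_demand(displaced, cells):
    """Expected number of displaced people seeking shelter."""
    total_share = sum(cell["share"] for cell in cells.values())
    if not math.isclose(total_share, 1.0):
        raise ValueError("population shares must sum to 1")
    rate = sum(cell["share"] * cell["weight"] for cell in cells.values())
    return displaced * rate


demand = shelter_demand(1000, CELLS)
```

With these invented values the overall shelter-seeking rate is 0.28, so about 280 of 1,000 displaced people would need shelter; replacing the weights with values from a Korean sociodemographic survey would change only the table, not the calculation.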
The prediction model of HAZUS-MH estimates risk in all regions of the United States using past disaster history, including hurricanes, floods, and earthquakes. In addition, its risk models incorporate social vulnerability to multi-hazard losses.
In South Korea, however, there is a lack of research on social vulnerability to disasters, such as emergency shelter needs. Most of the parameters and weights in use are based on HAZUS-MH models.
A sociodemographic survey considering the Korean environment must be conducted first to construct data suitable for the Korean environment. It is necessary to construct income information per household and the sales amount of the building unit. Additional tables must be designed by calculating the weight information considering the officially announced local building prices to assess the economic loss of buildings more accurately.