1. Introduction
Infrastructure networks enable the provision of connected services and of the resources that are necessary for the functioning of societies in cities. Digital technologies make economic investment more worthwhile by improving the delivery of these services and resources. For example, in order to design a more responsive energy grid that can deliver cleaner energy more flexibly [1], we need to be able to model buildings and their demand, the wider urban environment, and infrastructure networks [2,3]. The networks that transport gas, water, electricity, and heat from suppliers to consumers span multiple spatial scales, from macro-scale distribution down to micro-scale intra-building consumption [4]. This multi-scale nature presents significant challenges to the integrated topological modelling of these networks, and there is currently no accepted means by which the spatiotemporal dynamics of resource flows across these networks can be monitored and analysed.
The breadth of spatial scales spanned by infrastructure networks is such that the challenge of modelling them sits within both the Geospatial Information Science (GIS) and Building Information Modelling (BIM) domains. The disparity between these domains has yielded a diversity of data formats, which implement different semantic and geometric models and are underpinned by different base languages, thus hindering data integration and software interoperability [5,6,7]. In order to master the complexity of infrastructure systems for improved resource allocation and asset management, facility operators, engineers, and urban planners require an approach to modelling infrastructure networks that addresses the data integration challenges encountered in real-world, practical scenarios [7].
Research on GIS-BIM integration has not yet enabled this. Work has focused predominantly on geometry [8], with relatively little consideration given to capturing network topologies, and has relied on concepts that overlap between existing schemas. Furthermore, existing data standards do not readily support the representation of data dynamics [9], which limits the extent to which models can be used to simulate resource flows. There is a clear need for integrated modelling approaches that can represent complete flows through end-to-end topologies, enabling the deeper spatiotemporal analytical capabilities that are required in a range of application domains [10].
In this paper, we present a method that brings together the topologies that are represented in multiple, disparate data sets as a system-of-systems; we use a graph database to integrate the topology of infrastructure components that are connected across multiple scales but are modelled in separate static models and dynamic data streams. We also show how the spatiotemporal evolution of resource flows can be disseminated for ingestion by client-side analytics and visualisation systems.
The utility of the developed system is evaluated for electricity demand-supply visualisation across the GIS-BIM interface. We present a case study in which the topology of the circuited electrical consumer components of an Industry Foundation Classes (IFC) model is integrated with the local neighbourhood electricity distribution feeder network encoded in the CityGML UtilityNetwork Application Domain Extension (ADE). A real-time stream of JavaScript Object Notation (JSON) messages, derived from a building management system, is then used to enrich the topology of the integrated network and to capture the spatiotemporal dynamics of electricity consumption. A publish-subscribe message broker is deployed to disseminate the dynamics of the integrated network to client systems. A web-based visualisation system is developed as an example client that exploits this backend capability, validating the method for demand-supply visualisation.
2. Previous Research
Previous GIS-BIM integration research efforts have employed a range of techniques for a variety of purposes. Given the comprehensiveness and popularity of IFC for the exchange of BIM models, and the predominance of CityGML for the exchange of urban information [11,12], these two formats are either fundamental to many of these studies or have been used in the implementation or demonstration of their methods.
Irizarry and Karan [13] devised a multi-stage GIS-BIM integration workflow that facilitates the optimisation of tower crane locations on construction sites. Amirebrahimi et al. [14] designed a semi-automated process, involving the ArcGIS Interoperability Extension, that enables the visualisation of flood damage to buildings. Deng et al. [12] focus on mapping by means of a reference ontology, and Cheng et al. [15] propose a semi-automated framework for schema mapping that makes use of linguistic and text mining techniques. Semantic web technologies, Resource Description Framework (RDF) graphs, and the SPARQL query language were used by Karan et al. [16] and Hor et al. [17] to link information between BIM and GIS models. Kang et al. [18] proposed an Extract, Transform, Load (ETL) architecture that is specific to GIS-BIM integration. These studies made extensive use of the CityGML and IFC formats in addressing integration challenges. However, despite the breadth of techniques and target use cases, the integration of utility networks has received relatively little attention [5,19]. Berlo and Laat’s work [8] on the CityGML GeoBIM Application Domain Extension (ADE) strengthened the two formats’ capabilities by enriching CityGML with semantics from IFC, and the UtilityNetwork ADE [20] was developed to enable the representation of utility networks. Hijazi et al. [19] found that most IFC building service concepts can be mapped into the UtilityNetwork ADE without loss of information, and Becker et al. [21] later proposed a specialisation of the ADE with a geospatial information model for multi-infrastructure utility networks.
Although these developments have increased the richness of the topological information that can be represented and integrated, the flexibility and applicability of many integration methods are limited by their reliance on the overlap of concepts represented in established domain-specific schemas. Furthermore, the emergence of smart sensor networks [22] is producing a growing volume of sensor data that describe the flow of resources through infrastructure networks, and these data need to be related to the digital models. Standards organisations, such as buildingSMART and the Open Geospatial Consortium (OGC), are modifying and extending existing data formats to address such evolving requirements, but there remains a need for methods that represent the topologies of infrastructure networks and the dynamics of utility resource flows more generically.
3. Method Design and Implementation
In this section, we present a software systems approach to integrating the topologies of multi-scale infrastructure resource networks within cities and simulating the flow of resources through them. We show how a graph database can be used to integrate the topologies represented in instances of different schemas, regardless of whether those schemas share overlapping concepts. We also show that the method is flexible enough to integrate topology derived from real-time data streams, and how the same data streams can be used to simulate resource flows by updating the properties of graph relationships. Finally, we show a means of continuously disseminating the evolving state of the integrated network in support of demand-supply visualisation.
The overall software system is represented in Figure 1. The system comprises two main components. The first is an integration and simulation component, in which the topologies represented in the disparate static data models are derived and integrated, and the associated real-time data feeds describing flows across the integrated network are embedded. The second is a dissemination component that publishes messages indicating changes to the network’s structure and flows.
The integration and simulation component performs three main functions: (i) it identifies salient elements within the separate data models that form the sub-networks of the system; (ii) it integrates the sub-networks into a single network graph model in a graph database; and (iii) it uses real-time sensor streaming data to identify additional topological structure in the network graph and to modify dynamically the spatiotemporal properties of network elements.
As IFC is the predominant format for BIM model exchange [11,12], a Python-based application was developed to extract internal building network elements and their topology from IFC models using the IfcOpenShell-python module. Elements are matched against a pre-populated gazetteer of element types, and their IDs are used for unique identification. The IfcRelConnectsPortToElement and IfcRelConnectsPorts relationships are used to identify the connectivity between the elements. Given the wide usage of CityGML [11,12] for urban-scale modelling, a custom Java-based Document Object Model (DOM) parser was developed for processing instances of the format’s UtilityNetwork Application Domain Extension (ADE) [23]. The application traverses the XML tree, extracts Node elements from a FeatureGraph, and uses InteriorFeatureGraph XLink references to connect the nodes. The JSON format is commonly used for the transmission of real-time data and is used to represent building management data in the case study of Section 4. For these reasons, the integration system was designed to handle streaming data structured in the JSON format. The UtilityNetwork ADE and IFC are domain-specific modelling schemas, but JSON is not. The precise implementation of the JSON parser was therefore tailored to the structure of the JSON messages used in the case study.
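The connectivity logic of the two static-model parsers can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The original CityGML parser is Java-based. In practice, the IFC port pairs would be read from a model, for example via IfcOpenShell's `by_type` method; here they are passed in as plain tuples. The XML fragment, its default namespace, and the element names `nodeMember`, `linkMember`, `InteriorFeatureLink`, `start`, and `end` are hypothetical simplifications. The actual UtilityNetwork ADE schema should be consulted for the real structure.

```python
from xml.dom import minidom

GML_NS = "http://www.opengis.net/gml"
XLINK_NS = "http://www.w3.org/1999/xlink"


def ifc_edges(port_to_element, port_to_port):
    """Derive undirected element-to-element edges from IFC port relationships.

    port_to_element: (port_id, element_id) pairs, e.g. the RelatingPort and
        RelatedElement of each IfcRelConnectsPortToElement.
    port_to_port: (port_id, port_id) pairs, e.g. the RelatingPort and
        RelatedPort of each IfcRelConnectsPorts.
    """
    owner = dict(port_to_element)  # port -> element that owns it
    edges = set()
    for a, b in port_to_port:
        # Two connected ports link the elements that own them.
        if a in owner and b in owner and owner[a] != owner[b]:
            edges.add(frozenset((owner[a], owner[b])))
    return edges


# Simplified, hypothetical fragment loosely modelled on a UtilityNetwork ADE
# FeatureGraph; element names and the default namespace are illustrative only.
SAMPLE_XML = """
<FeatureGraph xmlns="urn:example:utility"
              xmlns:gml="http://www.opengis.net/gml"
              xmlns:xlink="http://www.w3.org/1999/xlink">
  <nodeMember><Node gml:id="N1"/></nodeMember>
  <nodeMember><Node gml:id="N2"/></nodeMember>
  <nodeMember><Node gml:id="N3"/></nodeMember>
  <linkMember><InteriorFeatureLink>
    <start xlink:href="#N1"/><end xlink:href="#N2"/>
  </InteriorFeatureLink></linkMember>
  <linkMember><InteriorFeatureLink>
    <start xlink:href="#N2"/><end xlink:href="#N3"/>
  </InteriorFeatureLink></linkMember>
</FeatureGraph>
"""


def citygml_edges(xml_text):
    """Return node gml:ids and (start, end) pairs resolved from XLink hrefs."""
    doc = minidom.parseString(xml_text.strip())
    nodes = [n.getAttributeNS(GML_NS, "id")
             for n in doc.getElementsByTagNameNS("*", "Node")]
    edges = set()
    for link in doc.getElementsByTagNameNS("*", "InteriorFeatureLink"):
        ends = []
        for tag in ("start", "end"):
            ref = link.getElementsByTagNameNS("*", tag)[0]
            # Local XLink references take the form "#<gml:id>".
            ends.append(ref.getAttributeNS(XLINK_NS, "href").lstrip("#"))
        edges.add(tuple(ends))
    return nodes, edges
```

Matching on local names with the `"*"` namespace wildcard keeps the sketch independent of the exact ADE namespace version. A production parser would more likely bind the real namespaces explicitly.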
In the system, extracted elements and their topological connectivity are pushed to a Neo4j graph database. A graph database was selected because such databases have been found to be efficient for storing and querying topologically connected data [24,25]. Furthermore, it has been proposed that graph theory and graph models are suitable for understanding urban topologies [26] and integrating models of urban data [27], and that graph databases can be used to detect spatial-semantic changes in CityGML documents [28]. The sub-networks derived from each data source are integrated into a single network by executing Neo4j Cypher queries. These queries run ‘merge’ clauses on building nodes with matching identifiers across the CityGML and IFC files and the JSON stream. The sub-networks are integrated through their connection to a common building node. The use of the merge clause (rather than the ‘create’ clause) avoids duplication, because a node is created only if it does not already exist in the database. For the CityGML files, the GML IDs of the building nodes are used for unique identification; for the IFC files, the name of the IfcBuilding element is used; for the JSON stream, the structure of the messages determines which key-value pair(s) identify the building and the consumer end-point to which a message relates. As with the building nodes, the use of merge clauses avoids the duplication of consumer nodes. The streams are used to update dynamically the temporal state of the integrated network through ‘set’ clauses, which update property values on the relationships connected to target consumer nodes. This real-time attribution of values to relationships constitutes the dynamic simulation of resource flows through the infrastructure network within the graph database.
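The effect of the ‘merge’ and ‘set’ clauses can be illustrated with a minimal in-memory stand-in for the graph database. This is a sketch, not the system's code. `TopologyGraph`, the `Building`/`Consumer` labels, the `SUPPLIES` relationship type, and the message keys `building`, `consumer`, and `power_kw` are all hypothetical. Node identity is also keyed on the identifier alone, whereas a Cypher MERGE matches on the full pattern.

```python
import json


class TopologyGraph:
    """Hypothetical in-memory stand-in for the graph database.

    Approximately equivalent Cypher for one stream message:
        MERGE (b:Building {id: $building})
        MERGE (c:Consumer {id: $consumer})
        MERGE (b)-[r:SUPPLIES]->(c)
        SET r.power_kw = $power_kw
    """

    def __init__(self):
        self.nodes = {}  # node id -> properties (including a "label")
        self.rels = {}   # (source id, rel type, target id) -> properties

    def merge_node(self, node_id, label):
        # MERGE semantics: create the node only if it does not already exist.
        return self.nodes.setdefault(node_id, {"label": label})

    def merge_rel(self, source, rel_type, target):
        # MERGE on the relationship: reuse it if present, otherwise create it.
        return self.rels.setdefault((source, rel_type, target), {})

    def set_rel_property(self, source, rel_type, target, key, value):
        # SET semantics: overwrite the property on an existing relationship.
        self.rels[(source, rel_type, target)][key] = value


def apply_stream_message(graph, raw_message):
    """Merge the building and consumer named in a JSON message, then set the flow."""
    msg = json.loads(raw_message)
    building, consumer = msg["building"], msg["consumer"]
    graph.merge_node(building, "Building")
    graph.merge_node(consumer, "Consumer")
    graph.merge_rel(building, "SUPPLIES", consumer)
    graph.set_rel_property(building, "SUPPLIES", consumer, "power_kw", msg["power_kw"])
```

Merging the same building identifier from the CityGML and IFC sub-networks yields a single node, so both sub-networks attach to it. Repeated messages for the same consumer overwrite the flow value on the existing relationship rather than adding new relationships.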
The dissemination component of the system in Figure 1 allows the integrated infrastructure network and its associated dynamic resource flows to be exploited. This is achieved by making the state of the graph network at any particular time available to other systems. Conceptually, this involves repeatedly capturing and then disseminating snapshots of the network’s state and flows at a regular time interval (e.g., every second). The technical implementation involves a daemon (background application) that repeatedly executes Cypher queries to capture the state of the entire database instance at a time interval set by the system administrator. Each snapshot is then published as a JSON message to a topic on an Apache Kafka [29] message broker. The JSON messages contain arrays of node and relationship objects that collectively describe the network topology and flow state at the instant the database was queried. Any other system may then connect to the broker, receive the messages by subscribing to the topic, and exploit them for analysis and/or visualisation. The dissemination component has been designed so that other processing and modelling workflows may publish their outputs to the same broker. Similarly, other exploitation systems may subscribe to messages for analysis and/or visualisation purposes.
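The snapshot daemon can be sketched as a loop that captures the network state, serialises it as one JSON message with node and relationship arrays, and hands it to a publisher. This is a hypothetical sketch. `capture` stands in for the Cypher queries, and `publish` stands in for a broker client call. The field names (`timestamp`, `nodes`, `relationships`) and the topic name are assumptions, not the system's actual message schema. With the kafka-python client, for instance, `publish` could wrap a `KafkaProducer`'s `send(topic, value)` call, with the message encoded to bytes.

```python
import json
import time


def snapshot_message(nodes, rels, captured_at):
    """Serialise the whole network state as one JSON message (hypothetical schema)."""
    return json.dumps({
        "timestamp": captured_at,
        "nodes": [{"id": nid, **props} for nid, props in sorted(nodes.items())],
        "relationships": [
            {"source": s, "type": t, "target": d, **props}
            for (s, t, d), props in sorted(rels.items())
        ],
    })


def run_snapshot_daemon(capture, publish, topic, interval_s, iterations):
    """Capture and publish `iterations` snapshots, `interval_s` seconds apart.

    capture: callable returning (nodes, rels), standing in for Cypher queries.
    publish: callable (topic, message_str), standing in for a broker producer.
    """
    for i in range(iterations):
        nodes, rels = capture()
        publish(topic, snapshot_message(nodes, rels, time.time()))
        if i < iterations - 1:
            time.sleep(interval_s)
```

A real daemon would loop indefinitely. Bounding the loop with `iterations` here simply keeps the sketch testable.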
5. Discussion
This paper presents a method, and a system that implements it, for the integrated modelling of infrastructure networks and the simulation of resource flows across the GIS-BIM divide. A case study was used to evaluate the method, applying the system to the intra-urban and intra-building scales of an electricity demand-supply network for the purpose of demand-supply visualisation. The case study showed that the method enables continuous, real-time integration of the elements of GIS and BIM models that are required for electricity flow simulation, without mappings between static data formats. Instead, a graph database was used to represent the elements from both domains and across multiple scales. This approach also enabled the integration of a real-time building management sensor data stream, which was used to enrich the network with additional elements, topological structure, and flow data. By parsing the models for salient elements only, the system avoided much of the modelling complexity inherent in integration methods that attempt lossless or near-lossless conversions [15,35]. The integrated model and flow data were made available for exploitation through a dissemination platform, which uses a message broker to manage messages describing the real-time state of the network.
It has been shown that a graph database can be used to represent and integrate the topologies of multi-scale electricity networks, and can be updated and queried in real time to support electrical power flow visualisation. Future work could attempt to verify that this method can be extended to networks of other resource types, such as gas supply, water supply, and waste water. In addition to being suitable for modelling and processing urban topologies [26,27], a graph database allows intuitive, human-readable conceptual models of systems to be reflected directly in the database structure, which results in a schema that is easier to interpret. There is no need to predefine the database schema, since new nodes, relationships, and properties can be added on the fly. This flexibility is powerful for a utility resource network model that must integrate data from diverse and dynamic sources. More generally, the requirement for integration methods to enable deeper mathematical modelling [10] would be served by the speed and efficiency with which graph databases execute queries on connected data [24,25], and by the relative simplicity of constructing query statements.
A message broker was demonstrated to be an effective platform for exposing the state of the integrated electricity network in real time for use in a web-based visualisation. Hosting the broker on a dedicated, persistent, and accessible (virtual) machine afforded confidence in the reliable availability of real-time data. Beyond the thematic data filtering applied prior to modelling in the graph database, further filtering could be applied by applications that execute database queries for specific subsets of the nodes, relationships, and their properties, and that return the results to a broker topic associated with the query. Subscribing to that topic would be equivalent to subscribing to the database query. The topic would serve as a use-case view of the graph database, which is itself a thematic view of the integrated elements derived from the diverse, semantically rich data sources. The Kafka message broker technology used in this study is highly scalable: it can be deployed as a cluster spanning several servers [29], with the ability to communicate data through multiple topics. The Kafka Streams API [36] could be used to support the development of applications and micro-services that process real-time data streams. Future work should consider employing this technology to implement thematic and use-case views of a graph database.
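The idea of a topic acting as a use-case view of the database can be sketched by filtering each full snapshot with a per-topic predicate and publishing the resulting subset. The topic names, predicates, and message fields below are hypothetical. In a deployed system, the filtering could instead be performed by a database query or by a stream-processing application, as suggested above.

```python
import json

# Hypothetical use-case views: topic name -> predicate on a relationship object.
VIEWS = {
    "view.high-load": lambda rel: rel.get("power_kw", 0.0) > 2.0,
    "view.building-B1": lambda rel: rel["source"] == "B1",
}


def route_views(snapshot_json, views, publish):
    """Publish, per topic, the matching relationships and the nodes they touch."""
    snapshot = json.loads(snapshot_json)
    for topic, predicate in views.items():
        rels = [r for r in snapshot["relationships"] if predicate(r)]
        ids = {r["source"] for r in rels} | {r["target"] for r in rels}
        # Keep only nodes at either end of a selected relationship, in snapshot order.
        nodes = [n for n in snapshot["nodes"] if n["id"] in ids]
        publish(topic, json.dumps({"timestamp": snapshot["timestamp"],
                                   "nodes": nodes, "relationships": rels}))
```

A subscriber to `view.high-load` receives only the heavily loaded part of the network. The full snapshot topic remains available to clients that need everything.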
The method presented in this paper could be scaled to integrate larger, more complex infrastructure network and BIM models, as well as more varied, higher-velocity real-time data streams, as these sources become increasingly available. To exploit fully the potential of the revolution in BIM and the instrumentation of urban environments, methods and tools need to be developed that can integrate, analyse, and visualise these sources in a way that facilitates critical decision-making. The system presented in this paper was developed to the extent that it could be used to evaluate the method for the case study, and further development is required to prove its scalability. For example, capturing snapshots of the entire state of the network is unlikely to be computationally efficient enough to scale to very large networks. Instead, the messages would need either to capture the entire state of smaller sub-networks (related to a particular theme, for example) or to capture only the changes that occur to the network. In the latter case, a client system would need to reconstruct complete topologies and flows from multiple messages stored on the broker. Similarly, the means of identifying elements that are common to multiple data sources would need to be generalised to ensure referencing consistency and universal uniqueness; this identification is fundamental to the integration and currently makes use of UK-specific TOIDs. Alternatively, the geometries and geolocations present in each source could be used to identify spatial coincidence of elements and thus infer their equivalence. It will also be important to establish a way to relate real-time data streams to entities found in data models. Currently, the system can relate flow data only to the topology derived from the same data source. Future research should consider enabling the association of a data feed with entities derived from other data streams and model instances.
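The change-only alternative mentioned above can be sketched as follows. A producer emits the difference between consecutive states, and a client rebuilds the current state by replaying deltas over a base state. Two assumptions are made for illustration: the network state is represented as a flat dict mapping element keys to property dicts, and the delta messages use `upserts`/`deletes` fields.

```python
def diff_state(old, new):
    """Return a delta describing how to turn `old` into `new`."""
    upserts = {k: v for k, v in new.items() if old.get(k) != v}
    deletes = sorted(k for k in old if k not in new)
    return {"upserts": upserts, "deletes": deletes}


def apply_delta(state, delta):
    """Return a new state with the delta applied (the input is not modified)."""
    result = dict(state)
    result.update(delta["upserts"])
    for k in delta["deletes"]:
        result.pop(k, None)
    return result


def reconstruct(initial, deltas):
    """Replay an ordered sequence of deltas, as a client reading the broker might."""
    state = dict(initial)
    for delta in deltas:
        state = apply_delta(state, delta)
    return state
```

A client that joins late still needs a base state to replay from. A practical design would therefore probably combine occasional full snapshots with the more frequent deltas.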
A more developed implementation of the method could provide many of the spatiotemporal analytical capabilities required for more general GIS-BIM integration [10]. For example, a regional energy provider could deploy the processing, modelling, and dissemination components of the system, publishing messages that describe the state of the integrated intra-urban and intra-building demand-supply network that it manages. A facility manager could then subscribe to a topic on the broker and exploit the published messages (in a similar way to component C of Figure 9) in a dashboard that supports fault diagnosis by highlighting anomalies in a flow visualisation. At the same time, a civil systems engineer or urban planner could subscribe to the same topic and analyse historic time-series data to assess the impact of proposed modifications to the supporting infrastructure.
The geometries and geolocations encoded in the data sources could be used for several other purposes:
- to render multi-scale 3D visualisations in augmented or virtual reality environments, providing a more intuitive and immersive visualisation platform for users [37,38];
- to detect clashes between physical assets that would otherwise be represented in disjoint GIS and BIM models;
- to enable the querying of intra-building flow data from geospatial software environments, such as drilling down from an urban-scale topographic map to view the real-time electricity flow within an individual dwelling or factory.

Timestamping messages in the broker topic, or updates to the graph database, would further exploit the method’s potential. By allowing replays of network flow evolution from historic time-series data, the system could facilitate the diagnosis of anomalies in usage patterns, with alerts issued for values falling outside a predefined tolerance. For example, such a replay could identify that an increase in energy costs is due to machinery in a factory activating erroneously at night.
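Replaying timestamped readings against an expected profile, and flagging deviations beyond a tolerance, can be sketched as below. The reading format, the `factory_profile` function, its 08:00 to 18:00 operating hours, and the 1 kW tolerance are hypothetical values chosen to mirror the night-time machinery example.

```python
from datetime import datetime


def replay_with_alerts(readings, expected_kw, tolerance_kw):
    """Yield (timestamp, consumer, value) for readings outside the tolerance band.

    readings: iterable of (iso_timestamp, consumer_id, power_kw). Sorting the
        tuples replays them chronologically, assuming all timestamps share one
        ISO 8601 format and time zone.
    expected_kw: function (consumer_id, datetime) -> expected power in kW.
    """
    for ts, consumer, value in sorted(readings):
        when = datetime.fromisoformat(ts)
        if abs(value - expected_kw(consumer, when)) > tolerance_kw:
            yield ts, consumer, value


def factory_profile(consumer, when):
    # Hypothetical expectation: machinery draws ~5 kW from 08:00 to 18:00, ~0 kW otherwise.
    return 5.0 if 8 <= when.hour < 18 else 0.0
```

For example, a 4.8 kW reading from machine `M1` at 02:00 would be flagged against this profile, whereas 5.2 kW at 10:00 would not.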
Future work should also focus on practical applications of the visualisation and analytical capabilities that the method allows, such as the dynamic Sankey visualisations of Figure 10 and Figure 11. Sankey diagrams have an extensive history in the mapping and exploration of flows in operational systems [39,40]. Researchers have studied their use for estimating energy flows from sparse sensor data [41] and for the visual representation of interdependencies between electricity, heat, and gas networks [42]. The results of the case study presented in this paper have shown that Sankey diagrams can be used to represent flows from real-time sensor data in systems with evolving topologies that are derived from disparate data sources. A more developed version of the underpinning integration system presented in this research, able to model multi-resource interdependencies, could be used to show visually the effects of these couplings on flow dynamics. Such a system would also enable analyses for purposes such as anomaly detection. For example, if a set of resource provision services has failed, root cause analysis can be applied by querying the graph for common dependencies [43]. Erdener et al. [44] identify gas-fired power plants in electricity systems and electricity-driven compressors in gas systems as the most significant dependencies in coupled gas-electricity systems. Active demand-side response (DSR) strategies also stand to benefit from the modelling of coupled systems: Qadrdan et al. [45] showed that a significant reduction in gas consumption can be achieved by electricity peak shaving through DSR. For the system presented in this paper to enable this at an intra-building scale, the parsing and graph representation of IFC models would need substantial further development. Capturing the topologies of building systems is likely to be key to exploiting smart building technologies.
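Root cause analysis by common dependencies can be sketched as the intersection of the upstream ancestor sets of the failed elements. The edge list and element names are hypothetical, and supply edges are assumed to point from supplier to dependent. In Neo4j, the ancestor search could instead be expressed with a variable-length path pattern.

```python
from collections import defaultdict, deque


def upstream(node, parents):
    """All nodes reachable by walking supply edges backwards from `node` (BFS)."""
    seen, queue = set(), deque([node])
    while queue:
        for p in parents[queue.popleft()]:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def common_dependencies(edges, failed):
    """Upstream elements shared by every failed element (candidate root causes)."""
    parents = defaultdict(set)
    for supplier, dependent in edges:
        parents[dependent].add(supplier)
    ancestor_sets = [upstream(n, parents) for n in failed]
    return set.intersection(*ancestor_sets) if ancestor_sets else set()


# Hypothetical toy network: supplier -> dependent.
EDGES = [
    ("Grid", "Substation"), ("Substation", "FeederA"), ("Substation", "FeederB"),
    ("FeederA", "Socket1"), ("FeederA", "Socket2"), ("FeederB", "Lamp1"),
    ("GasMain", "Boiler"),
]
```

For two sockets on the same feeder, the shared dependencies are `FeederA`, `Substation`, and `Grid`, of which the nearest (`FeederA`) is the most specific candidate. A socket and the gas boiler share none in this toy network, because the electricity and gas sub-networks are not coupled in it.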
6. Conclusions
This manuscript presents a GIS-BIM integration and simulation method and implementation that enables visualisation and analysis of resource flows through multi-scale infrastructure networks. The research demonstrates the effectiveness of combining a graph database, a message broker, and web technologies in a custom software system. The system integrates electricity network elements across multiple scales, derived from CityGML UtilityNetwork ADE models, IFC models, and JSON messages from a real-time data stream. It was evaluated in a case study focused on the challenge of dynamic visualisation of electricity demand-supply across the urban and building scales, validating the method as a means of understanding in real time the flow of utility resources across the internal-external building interface. The research contributes to addressing the integration challenge presented by the tradition of modelling building interiors and external urban environments separately in the disparate BIM and GIS domains. A graph representation enables a more flexible approach to integrating the topologies derived from diverse datasets. This approach does not rely on existing schemas and their overlapping concepts, and it is able to represent dynamics such as resource flows. Continuation research should focus on:
- associating data feeds with entities derived from other data streams and model instances;
- scaling the capability to multiple buildings and distribution networks;
- applying the approach to other types of utility resources and their interdependencies;
- devising a spatial method for matching entities across data sources;
- testing the operational robustness of the approach across a range of use cases.