Article

TopoSinGAN: Learning a Topology-Aware Generative Model from a Single Image

by Mohsen Ahmadkhani * and Eric Shook
Geography Environment and Society, University of Minnesota, Minneapolis, MN 55455, USA
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(21), 9944; https://doi.org/10.3390/app14219944
Submission received: 10 September 2024 / Revised: 24 October 2024 / Accepted: 29 October 2024 / Published: 30 October 2024
(This article belongs to the Special Issue Advances and Applications of Complex Data Analysis and Computing)

Abstract

Generative adversarial networks (GANs) have significantly advanced synthetic image generation, yet ensuring topological coherence remains a challenge. This paper introduces TopoSinGAN, a topology-aware extension of the SinGAN framework, designed to enhance the topological accuracy of generated images. TopoSinGAN incorporates a novel, differentiable topology loss function that minimizes terminal node counts along predicted segmentation boundaries, thereby addressing topological anomalies not captured by traditional losses. We evaluate TopoSinGAN using agricultural and dendrological case studies, demonstrating its capability to maintain boundary continuity and reduce undesired loop openness. A novel evaluation metric, Node Topology Clustering (NTC), is proposed to assess topological attributes independently of geometric variations. TopoSinGAN significantly improves topological accuracy, reducing NTC index values from 15.15 to 3.94 for agriculture and 14.55 to 2.44 for dendrology, compared to the baseline SinGAN. Modified FID evaluations also show improved realism, with lower FID scores: 0.1914 for agricultural fields compared to 0.2485 for SinGAN, and 0.0013 versus 0.0014 for dendrology. The topology loss enables end-to-end training with direct topological feedback. This new framework advances the generation of topologically accurate synthetic images, with applications in fields requiring precise structural representations, such as geographic information systems (GIS) and medical imaging.

1. Introduction

Since their introduction by Goodfellow et al. in 2014, generative adversarial networks (GANs) have revolutionized image generation [1]. At the core of a GAN are two neural networks, a generator and a discriminator, which are trained concurrently in a game-theoretic setting. This structure enables GANs to produce synthetic images that are nearly indistinguishable from real ones. GANs have been applied in various domains, including unconditional and conditional image synthesis [2], image inpainting [3], and image-to-image translation [4]. They have been particularly valuable in generating training images to address the scarcity of comprehensive datasets in image processing [2,5,6,7,8,9]. However, dealing with diverse datasets that include multiple object classes, such as ImageNet [10], remains challenging. This often requires training models tailored to specific tasks, such as super resolution [11], inpainting [3], and retargeting [12]. Training GANs typically demands large datasets, which can be resource-intensive. As a result, the focus can shift to training generative models on a limited number of images, or even a single image, to capture unique characteristics and address data scarcity [13].
The innovative SinGAN [13] provides a unique approach by training a GAN on a single image, focusing on unconditional image generation and harmonization. SinGAN uses a multitiered and multiresolution method, beginning with training at low resolutions such as 26 × 35 pixels. As training progresses through various scales, the generator’s complexity increases, and the image resolution is enhanced. Each new scale builds on the training from previous scales, with only the newly introduced layers being trained. This multiscale architecture is illustrated in Figure 1. Building on this concept, the SinGAN-Seg model [9] was developed to enhance SinGAN by adding a mask channel to the conventional RGB channels. This modification has proven effective, particularly in creating annotated images for medical segmentation tasks.
Despite their success in generating visually convincing images, these models struggle to ensure topological integrity in their predictions, which is crucial for applications such as agricultural field mapping, detailed medical delineation, and robotics. The precise nature of these tasks, whether they involve accurate robotic movements or detailed vascular representations in medical imaging, requires high topological accuracy [14,15].
To address this issue, we introduce TopoSinGAN. This model builds on the SinGAN framework and incorporates a novel differentiable topology loss function. The core of this loss function is its ability to identify and count terminals within the generated boundary mask, where a terminal is defined as a boundary pixel with only one neighboring boundary pixel. By reducing the number of these terminal nodes, TopoSinGAN aims to improve topological accuracy. We further validate that this approach is differentiable, making it suitable for optimization through backpropagation.

1.1. SinGAN

The original SinGAN architecture, as referenced in [13], is notable for its ability to generate synthetic data using a GAN model trained on a single image. The training process in SinGAN employs an image pyramid, which consists of differently scaled versions of the same input image, progressing from low to high resolution. In this hierarchical structure, SinGAN creates a GAN pyramid, where each level corresponds to an image from the image pyramid and undergoes training. SinGAN can input a three-channel RGB image and generate a three-channel RGB output. The use of multiple scales enables the creation of various synthetic versions of the input image across different resolutions. The number of scales is determined by the dimensions of the input image. This approach allows SinGAN to produce multiple synthetic variations from a single real image, illustrated in Figure 1 as a 1:N generation, where N represents the number of layers in the GAN pyramid.
Additionally, the intrinsic design and training mechanism of the SinGAN model provide it with versatility, which has been utilized and modified in studies such as SinGAN-Seg [9]. While SinGAN-Seg introduces modifications such as expanding to four channels (RGB plus a segmentation mask), the core architecture and the image pyramid-based training of the original SinGAN remain fundamental to its operation. SinGAN relies on two central loss functions. The adversarial loss $\mathcal{L}_{adv}$ measures the differences between patch distributions in the original image and the generated images. It uses a Markovian discriminator, $D_N$, that classifies patches as real or fake, similar to patch-based GANs. This adversarial method employs the WGAN-GP loss, known for its training stability, and focuses on the entire image, facilitating the learning of boundary conditions [16]. Meanwhile, the reconstruction loss $\mathcal{L}_{rec}$ ensures the model can recreate the original image using specific noise maps, with variations based on the scale to emphasize the pixel-wise similarity between the original and generated images. The reconstructed image further aids in determining the noise’s standard deviation across scales. While $\mathcal{L}_{adv}$ and $\mathcal{L}_{rec}$ are effective at capturing visual details, they may neglect topological nuances, which can present challenges in applications that require precise topological accuracy.

1.2. The Need for Topological Accuracy

Topological accuracy can have different meanings across fields such as graph theory and computer science. In image semantic segmentation models, it often refers to preserving the topological relationships between the input image and the predicted mask [17,18,19]. However, in the context of GANs, where there is no real image corresponding to each generated image for direct comparison, topological accuracy is generally defined by connectivity. This involves ensuring that objects expected to be connected, such as roads or agricultural field boundaries, do not display disconnections in the generated datasets [20,21,22]. Similarly, in this study, we define topological accuracy as minimizing open loops in the generated images. The case studies presented here, agricultural fields and dendrology data, involve scenarios where the generated masks should consist entirely of closed loops, as both agricultural fields and xylem cells in dendrology images are expected to appear as fully enclosed regions. Thus, our research focuses on reducing these discontinuities and enhancing the topological integrity of the generated masks.
Ensuring topological accuracy is crucial in GAN applications across various domains. While GANs are highly effective at generating visually compelling outputs, they may overlook the intrinsic structure and connectivity of linear features that define the topology. This distinction from geometric representations is critical. In contexts such as road network mapping or medical imaging, maintaining topological accuracy in GAN outputs is essential to prevent the misrepresentation of crucial structures. For instance, an inaccurately depicted road junction in an autonomous driving map or an incorrect representation of a neuronal pathway in a medical image can have serious consequences [23]. Therefore, maintaining topological accuracy in GAN outputs not only preserves the interconnected nature of the data but also enhances the reliability and practical utility of the models in real-world applications.
Attempts to improve the structural coherence of generated images began soon after the introduction of GANs. The work by Arjovsky et al. (2017), introducing Wasserstein GAN (WGAN), was an early example of this direction. WGAN aimed to improve training stability and address issues such as mode collapse by minimizing the Earth Mover (Wasserstein) distance, offering a more stable optimization process compared to traditional GANs [24]. In the domain of GANs that emphasize geometry, several models have been developed to capture and utilize geometric nuances from images. For example, geometricGAN [25] integrates the expansive margin principle from SVMs to guide both discriminator and generator training. Similarly, Localized GAN (LGAN) [26] utilizes local coordinates to focus on the specific geometry of the data manifold. GAGAN [27] is notable for its application to facial image generation, drawing on established facial geometries. Meanwhile, GcGAN [28] enforces geometry-consistency regulation to preserve the inherent semantics of images, limiting geometric modifications to simple rotations and flips.
Recent advancements further highlight the integration of topological features in GAN models to enhance their application across different fields. The TR-GAN framework, for instance, improves retinal artery/vein (A/V) classification by employing a topology ranking discriminator and triplet loss, which significantly enhances vessel connectivity, outperforming conventional methods on the AV-DRIVE dataset [29]. TW-GAN further advances A/V classification by incorporating both topology and vessel width awareness, markedly improving performance on datasets such as AV-DRIVE, which is crucial for diagnosing conditions such as hypertension and diabetes [30]. Similarly, Liu et al. (2019) employed a hierarchical architecture to extract and preserve both local and global graph features, leading to more accurate graph reconstruction [31].
Recently, the application of persistent homology (PH) to topology-aware learning has gained attention [32,33,34]. Wang et al. (2019) proposed TopoGAN, the pioneering GAN model that explicitly learns the topology of real images, such as connectedness and loops, by introducing a novel topological loss function based on PH. Their approach showed improvements in preserving the topological integrity of generated images [32]. In another study, Bao et al. (2023) developed PHGAN, which enhances image generation by incorporating topological features from PH into a GAN. These topological features capture global structural information, which is combined with convolutional features in the discriminator. The topological features extracted by a PH module are transformed into vectors that are fed into the network [35].
Although PH is a significant tool in topological data analysis (TDA) for addressing inaccuracies in GANs, its integration presents challenges. PH-based methods focus primarily on features such as connected components and loops, often overlooking other crucial aspects such as terminal nodes or disconnections in the generated masks. An example is a bicycle wheel with a missing spoke compared to one with a dangling spoke. In terms of TDA and PH, these configurations are treated as topologically identical because they present the same connected components and loops. However, from our perspective, the presence of a dangling spoke (i.e., a terminal node) is a critical difference that indicates an issue embedded in the structure. This limitation makes the application of PH in GANs less appropriate for certain cases where disconnections are considered important. In addition, the computational demands of PH can be substantial, especially with large datasets, potentially reaching $O(n^3)$ in complexity [32,33,34].

2. Materials and Methods

2.1. Topology Loss Algorithm

A differentiable topology loss was developed to evaluate the number of terminal nodes present in the generated boundary mask. This loss was integrated into the SinGAN architecture as an auxiliary loss term, $\mathcal{L}_{topo}$, for the generator. Equation (1) shows all three terms of the loss function. In this equation, $\lambda_1$, $\lambda_2$, and $\lambda_3$ represent the coefficients of the loss terms, which can be adjusted manually:
$$\text{Loss} = \lambda_1 \mathcal{L}_{adv} + \lambda_2 \mathcal{L}_{rec} + \lambda_3 \mathcal{L}_{topo} \tag{1}$$
We propose $\mathcal{L}_{topo}$, which primarily focuses on minimizing the presence of terminal nodes in the generated mask. The algorithm consists of three components as follows.
I. Soft Thresholding with Sigmoid Function ($\sigma(x)$)
To detect terminals, the input mask ($M$) is passed through a custom sigmoid function $\sigma(x)$, resulting in a soft binarization of the tensor. $\sigma(x)$ acts as a soft thresholding mechanism ensuring continuity across the entire domain. The mathematical representation is as follows:
$$\sigma(x) = \frac{1}{1 + e^{\alpha(\beta - x)}}$$
$$\lim_{\alpha \to +\infty} \sigma(x) = \begin{cases} 1, & x > \beta \\ 0.5, & x = \beta \\ 0, & x < \beta \end{cases}$$
$$\lim_{\alpha \to -\infty} \sigma(x) = \begin{cases} 0, & x > \beta \\ 0.5, & x = \beta \\ 1, & x < \beta \end{cases}$$
where $\beta$ is the threshold point and $\alpha$ is the thresholding slope. The sigmoid function is smooth and differentiable across its entire domain, ensuring that its gradient can be computed. Specifically, the derivative of $\sigma(x)$ with respect to $x$ is given by:
$$\frac{d\sigma(x)}{dx} = \alpha \cdot \sigma(x) \cdot \left(1 - \sigma(x)\right)$$
II. Detection of Terminal Nodes via Convolution
After applying the sigmoid function, the soft-binarized mask $M' = \sigma(M)$ is convolved with a set of eight fixed kernels, $K_1, K_2, \ldots, K_8$, which are depicted in Figure 2. These kernels are specifically designed to target the eight main cardinal and intercardinal directions in a 2D space (north, northeast, east, southeast, south, southwest, west, northwest). Empirically, the use of these kernels has proven effective in identifying discontinuities, thereby enhancing the model’s ability to evaluate topological performance. This evaluation is integral to the core of the developed loss function, enabling the application of corrective penalties during the training phase for improved topological accuracy.
$$C_i(M') = M' \otimes K_i$$
where $\otimes$ denotes the convolution operator. The convolution operation is differentiable, and thus, gradients can be propagated through this layer.
III. Loss Computation
The topology loss is computed as:
$$\mathcal{L}_{topo} = \frac{1}{8} \sum_{i=1}^{8} \sum_{(x,y)} C_i(M')_{x,y}$$
where $C_i(M')_{x,y}$ represents the value of the convolution output at position $(x, y)$. Since convolution and summation are linear operations, $\mathcal{L}_{topo}$ remains differentiable with respect to $M'$. For each convolution operation, the gradient of the loss $\mathcal{L}_{topo}$ with respect to $M'$ is computed using the chain rule:
$$\frac{\partial \mathcal{L}_{topo}}{\partial M'} = \sum_{i=1}^{8} K_i \otimes \frac{\partial \mathcal{L}_{topo}}{\partial C_i(M')}$$
The final gradient of the topology loss with respect to the original mask $M$ is:
$$\frac{\partial \mathcal{L}_{topo}}{\partial M} = \frac{\partial \mathcal{L}_{topo}}{\partial M'} \cdot \frac{\partial M'}{\partial M}$$
Since both components are differentiable, the overall gradient is well-defined, and the optimization function can be represented as:
$$\theta^{*} = \arg\min_{\theta} \mathcal{L}_{topo}(\theta)$$
where $\theta$ represents the parameters of the model and $\theta^{*}$ is the optimal set of parameters that minimize the loss.
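To make the quantities in the three steps concrete, the following is a minimal Python sketch under stated assumptions. It implements the soft threshold from step I and a plain, non-differentiable terminal counter based on the definition of a terminal (a boundary pixel with exactly one neighboring boundary pixel). It does not reproduce the eight kernels of Figure 2, which are not listed in the text; the values `alpha=50.0` and `beta=0.5`, the 8-connected neighborhood, and the helper names are illustrative assumptions.

```python
import math


def soft_threshold(x, alpha=50.0, beta=0.5):
    """Sigmoid soft threshold 1 / (1 + exp(alpha * (beta - x))).

    alpha (slope) and beta (threshold) are hypothetical values.
    """
    return 1.0 / (1.0 + math.exp(alpha * (beta - x)))


def count_terminals(mask):
    """Count boundary pixels with exactly one 8-connected boundary neighbour.

    This is a hard reference count, not the differentiable loss.
    """
    rows, cols = len(mask), len(mask[0])
    terminals = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            neighbours = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and mask[rr][cc]:
                        neighbours += 1
            if neighbours == 1:
                terminals += 1
    return terminals


def square_loop(n):
    """Binary mask whose boundary pixels form a closed n-by-n square loop."""
    return [[1 if r in (0, n - 1) or c in (0, n - 1) else 0
             for c in range(n)] for r in range(n)]


closed = square_loop(7)
opened = [row[:] for row in closed]
opened[0][3] = 0  # remove one boundary pixel to open the loop

print(count_terminals(closed), count_terminals(opened))
```

In this toy case the closed loop has no terminals, while removing one boundary pixel creates two, which is the kind of openness the topology loss is meant to penalize.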
Including $\mathcal{L}_{topo}$ in the model’s training process is conditional on the scale belonging to the second half of the GAN pyramid, which in our case studies corresponds to level four and beyond, as we had a total of eight levels according to the dimensions of the input image. This implementation strategy is based on empirical results indicating that introducing the topology loss at earlier stages can disrupt the learning of foundational features. By applying $\mathcal{L}_{topo}$ starting from the second half of the pyramid, the model has already developed basic structures, allowing $\mathcal{L}_{topo}$ to effectively enhance topological accuracy without impeding initial convergence. The architecture of the developed loss function is depicted in Figure 3.

2.2. Evaluation

2.2.1. Node Topology Clustering (NTC)

GAN evaluation metrics focusing on the topological structure of the synthetic images are rare. Conventionally, pretrained CNN-based models [36] are used to evaluate the quality of images generated by GANs. Inception score (IS) [37] and Fréchet Inception Distance (FID) [38] are common methods of this type. IS calculates the Kullback–Leibler divergence between the conditional and marginal class distributions across the generated data. Likewise, FID uses an Inception network pre-trained on ImageNet to map images into a feature space [39]. These metrics are biased toward the Inception model’s feature space, while the topological properties are not guaranteed to be preserved in the feature space [32]. Therefore, recent evaluation metrics have been developed to focus more on the topological structure of the generated images to provide a more reliable quantitative description of the model performance [23,32]. These metrics are mostly based on topological data analysis (TDA), which generally uses persistent homology (PH) to measure the similarity of the distribution of persistence diagrams for real and generated images.
However, while PH-based metrics focusing on topology can provide reliable evaluations, they have limitations. These metrics are typically based on the birth and death of holes and simplices at different scales, forming the final barcodes. Consequently, the formation of these barcodes is influenced not only by topological relationships but also by geometric properties such as shape and size [40,41]. In the context of GANs, the inherent stochasticity produces diverse images, which may include both geometric and topological variations. Moreover, TDA- and PH-based evaluation methods, such as Betti number error, are not well-suited for assessing models like ours, which consider the presence of a terminal node as a topological “error”. TDA primarily focuses on features such as connected components and holes without explicitly accounting for terminal nodes, which we consider critical in evaluating topological accuracy. Therefore, relying solely on TDA-based metrics may not fully capture the quality and variability of the generated images. In this paper, we propose Node Topology Clustering (NTC), a novel topology-aware metric to evaluate the performance of GAN models focusing on topology and structural integrity.
The NTC calculation begins with converting the input binary mask into an undirected graph. In this graph, nodes represent critical points such as terminals and junctions, and edges represent the connections between these points using straight lines. Nodes with a degree of one, referred to as terminals, are of particular interest because they indicate points of disconnectivity within the graph. Using these terminal nodes, we define a new measure called Terminal Distance (TD), which captures the distance from each node to its nearest terminal (Figure 4).
Definition 1 (Terminal Distance (TD)).
The Terminal Distance (TD) quantifies the geodesic distance [42] in a graph, derived from a binary image mask, between each node and its nearest terminal node. Let $G$ be an undirected graph constructed from a binary mask $M$, where nodes represent critical points such as terminals and junctions. The Terminal Distance $TD(v)$ for a node $v \in G$ is computed as:
$$TD(v) = \min_{t \in T} d(v, t)$$
where $T$ is the set of all terminal nodes and $d(v, t)$ is the geodesic distance between nodes $v$ and $t$.
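Definition 1 can be illustrated with a short Python sketch. It assumes the graph is given as an adjacency dictionary and that the geodesic distance is the unweighted hop count; a length-weighted distance would need a different shortest-path routine. The function name and the toy graph are hypothetical.

```python
from collections import deque


def terminal_distance(adjacency):
    """Return TD(v) for every node reachable from a terminal.

    Terminals are nodes of degree one. A multi-source breadth-first
    search started from all terminals gives each node the hop count
    to its nearest terminal (assumed unweighted geodesic distance).
    """
    terminals = [v for v, nbrs in adjacency.items() if len(nbrs) == 1]
    dist = {t: 0 for t in terminals}
    queue = deque(terminals)
    while queue:
        v = queue.popleft()
        for u in adjacency[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


# Toy graph: a triangle a-b-c with one dangling edge a-d.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}
td = terminal_distance(graph)
print(td)
```

Here `d` is the only terminal, so its TD is zero, `a` is one hop away, and `b` and `c` are two hops away.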
Using the TD values and the graph’s adjacency matrix, a local Moran’s I clustering analysis, also known as Local Indicators of Spatial Association (LISA) [43], is conducted to identify patterns and clusters based on topological relationships among nodes. This analysis categorizes nodes into High-High (HH), Low-High (LH), Low-Low (LL), and High-Low (HL) clusters, reflecting nodes with similar or contrasting TD values and their neighbors with a significance level of 95%. Due to the nature of the TD metric, the input graphs must have at least one terminal node. To ensure this, we randomly connected a single terminal node to each graph. In Appendix A, we demonstrate that the addition and position of this terminal node do not significantly affect the functionality of the proposed evaluation metric (Table A1). Since the predicted masks may have different node counts, we normalize cluster sizes into percentages. Finally, the NTC index is measured by calculating the multidimensional Euclidean distance between the four LISA cluster values of the predicted mask and the ground truth (GT) mask. This Euclidean distance reflects differences in the LISA attributes in multidimensional space, not spatial features.
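The last step of the NTC calculation can be sketched as follows. The sketch assumes the LISA clustering has already been run and only its four significant cluster counts are available; it also assumes that percentages are taken relative to the total node count, which the text does not state explicitly. All names and numbers are hypothetical.

```python
import math

CLUSTERS = ("HH", "LH", "LL", "HL")


def cluster_percentages(counts, n_nodes):
    """Convert LISA cluster counts into percentages of all nodes (assumed)."""
    return [100.0 * counts.get(name, 0) / n_nodes for name in CLUSTERS]


def ntc_index(pred_counts, pred_nodes, gt_counts, gt_nodes):
    """Euclidean distance between the four-dimensional cluster vectors."""
    pred = cluster_percentages(pred_counts, pred_nodes)
    gt = cluster_percentages(gt_counts, gt_nodes)
    return math.dist(pred, gt)


# Hypothetical cluster counts for a predicted mask and a ground truth mask.
example = ntc_index({"HH": 10, "LL": 20}, 100, {"HH": 13, "LL": 24}, 100)
print(example)
```

Because the vectors hold percentages, masks with different node counts can be compared directly, and identical cluster profiles give an index of zero.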
To evaluate the significance of our proposed NTC topological features, we created graphs in seven groups of extreme cases based on the topological distribution of terminal nodes in a two-dimensional grid of 15 by 15 nodes. These cases include random, dispersed, clustered, clustered-star, clustered-web, isolated edges, and randomly split-edge distributions, detailed in Appendix B, Figure A1. We generated 1000 instances of each group. We used Support Vector Machines (SVM) [44] with 10-fold cross-validation to ensure consistency with previous studies that developed feature extraction methods for graph structural clustering [45,46]. We employed a suite of five network metrics including network density, average clustering coefficient, transitivity, modularity, and assortativity, as proposed by [45]. Network density measures the ratio of actual connections to possible connections, indicating how interconnected the network is [47]. The average clustering coefficient indicates the degree to which nodes cluster together, offering insights into tightly connected groups [48]. Transitivity assesses the probability that adjacent nodes are interconnected, indicating global clustering [49]. Modularity measures the strength of division into communities, and assortativity evaluates the tendency of nodes to connect with others of similar degrees, highlighting hierarchical structures [50].
Additionally, we calculated interval probabilities, known as Degree Distribution Quantile Concentration (DDQC) features, to provide a detailed view of the degree distribution within the network [45]. DDQC, by segmenting node degrees into specific intervals based on statistical properties such as the mean and standard deviation, generates a histogram quantifying the proportion of nodes in each interval. These interval probabilities normalize the histogram values and offer a probabilistic perspective of how node degrees are distributed across the network. Later in this paper, we use these graph features to measure the similarity of graphs generated by GAN models.

2.2.2. Modified Fréchet Inception Distance (FID)

In this work, we employed a modified version of the Fréchet Inception Distance (FID) [38] to evaluate the quality of the images generated by TopoSinGAN compared to those from the baseline SinGAN. This modification accounts for having only a single real input image alongside multiple generated images. Feature representations were extracted using a pre-trained InceptionV3 model for both the real and generated images.
Next, we calculated the mean and covariance of the activations for both the real image and the generated images. Since there was only one real image, its covariance matrix was set as the identity matrix of appropriate dimensions to ensure a valid comparison. The modified FID score was then computed by measuring the distance between the means and covariances of the real and generated activations, following the Fréchet distance formulation. To handle numerical issues arising from the matrix square root calculation in the covariance product, only the real components were kept if complex values appeared. This modified approach enabled us to evaluate the quality and diversity of generated images relative to the single reference image in a statistically meaningful manner.
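As a worked illustration, the standard Fréchet distance between two Gaussians simplifies when the real covariance is replaced by the identity. The symbols $\mu_r$, $\mu_g$, $\Sigma_g$, and $s_i$ (the eigenvalues of $\Sigma_g$) are notation introduced here, and any additional scaling or normalization used in the actual implementation is not stated in the text and is not reflected below.

```latex
\begin{aligned}
\mathrm{FID}
  &= \lVert \mu_r - \mu_g \rVert_2^2
     + \mathrm{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right) \\
  &= \lVert \mu_r - \mu_g \rVert_2^2
     + \mathrm{Tr}\!\left(I + \Sigma_g - 2\,\Sigma_g^{1/2}\right)
     \qquad (\Sigma_r = I) \\
  &= \lVert \mu_r - \mu_g \rVert_2^2
     + \sum_i \left(\sqrt{s_i} - 1\right)^2
\end{aligned}
```

Under this assumption the covariance term is zero only when every eigenvalue of $\Sigma_g$ equals one, so it measures how far the spread of the generated features departs from the identity reference.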

2.3. Experimental Setup

For this research, we considered two experiments to showcase the practicality of the developed model: the first uses an image of agricultural fields, and the second uses a dendrology image. We also use the CREMI [51] dataset to compare our model with TopoGAN [32] and WGAN.

2.3.1. Agricultural Fields

For this experiment, we used a 175 × 240 pixel, 3-meter resolution XYZ tile of agricultural fields in Tehran province, Iran, downloaded using QGIS 3.34.0 software. This image contains red, green, and blue (RGB) channels. We manually annotated the agricultural field boundaries and appended the resulting mask to the image as a fourth channel. Notably, the agricultural plots within this region present considerable variation in shape and size, making them intricate subjects for accurate prediction. This study area is especially challenging given the intrinsic differences in field geometries, topological arrangements, irrigation systems, and the diverse range of crops.

2.3.2. Dendrology

In our research, we employed a single gigapixel macro photography (GMP) image, measuring 175 × 240 pixels, as developed by [52]. This image, part of a large dataset featuring an ultra-high resolution of 19,812 dpi, was manually cropped and annotated. It has a four-channel composition, including RGB and an additional mask channel that delineates cellular structures of xylem vessels in hardwood tree sections. In the field of dendrology, such images are invaluable, offering insights ranging from tree age to environmental adaptations, as discussed by [53]. A typical 1 mm² area of a tree ring contains approximately 750 vessels, presenting significant challenges in manual delineation. This complexity underscores the necessity of leveraging deep learning methodologies for efficient and accurate data synthesis in dendrology studies.

2.4. System Setup

2.4.1. System Configurations

In our study, we utilized the high-performance computing resources of the Minnesota Supercomputing Institute (MSI). The computational backbone of our research was supported by the NVIDIA A100 GPU (NVIDIA, Santa Clara, CA, USA) on the Agate cluster, specifically within the a100-4 partition. For reproducibility purposes, detailed documentation about the system configuration can be found at their website [54].

2.4.2. Training Procedure

Our training framework employs a multiscale approach. It starts at a base scale and progressively refines through larger scales. This progression is controlled by parameters that define the number of scales and when to stop refining. The Discriminator is trained with both real and synthetically generated images, incorporating traditional adversarial training steps and a gradient penalty for stability. The Generator is then trained to produce images that increasingly challenge the Discriminator’s detection capabilities. Additional loss functions, such as topological and reconstruction losses, are applied conditionally based on the scale and training parameters. Coefficients $\lambda_1$, $\lambda_2$, and $\lambda_3$ in the loss function (Equation (1)) were set to 10, 5, and 0.145, respectively, in our experiments. These values were determined through extensive hyperparameter tuning, carefully balancing adversarial, reconstruction, and topological components to preserve topological features and maintain overall image quality, as reflected in improved NTC scores discussed in Section 3.2. In Section 3.5, we provide a more detailed discussion on the hyperparameter tuning process for $\lambda_3$.
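A minimal sketch of how the weighted generator loss of Equation (1) could be assembled with these coefficients is given below. It assumes scales are zero-indexed and that the topology term is enabled once the scale index reaches half the pyramid depth, which matches "level four and beyond" for eight levels; the function names and the scalar loss inputs are hypothetical placeholders for the actual tensor losses.

```python
def use_topology_loss(scale, num_scales):
    """Assumed rule: zero-indexed scales in the second half of the pyramid."""
    return scale >= num_scales // 2


def generator_loss(l_adv, l_rec, l_topo, scale, num_scales,
                   lambda_adv=10.0, lambda_rec=5.0, lambda_topo=0.145):
    """Weighted sum of Equation (1); the topology term is added conditionally."""
    total = lambda_adv * l_adv + lambda_rec * l_rec
    if use_topology_loss(scale, num_scales):
        total += lambda_topo * l_topo
    return total


# With eight scales, scales 0-3 skip the topology term and 4-7 include it.
early = generator_loss(1.0, 1.0, 1.0, scale=0, num_scales=8)
late = generator_loss(1.0, 1.0, 1.0, scale=4, num_scales=8)
print(early, late)
```

Keeping the gate in a separate function makes it easy to try other schedules, for example enabling the topology term one scale earlier or later.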

3. Results

3.1. NTC-Based Graph Classification

The graph classification results, as depicted in Figure 5, illustrate the performance of different feature sets in categorizing graph structures using a Support Vector Machine (SVM) classifier. The “Features” set, which includes basic network attributes, achieved an accuracy of 71.35%. The addition of Degree Distribution Quantile Concentration (DDQC) features, labeled as “Features + DDQC”, significantly improved the classification accuracy to 94.73%. The addition of Node Topology Clustering (NTC) features shown as “Features + DDQC + NTC” achieved the highest accuracy of 98.39%. These results suggest that incorporating NTC graph features enhances the model’s ability to distinguish between different graph categories with disparate topological structures.

3.2. TopoSinGAN Performance Evaluation Using NTC

The qualitative evaluation of synthetic images generated by SinGAN and TopoSinGAN reveals notable differences, as illustrated in Figure 6. While SinGAN outputs visually resemble the input images, they exhibit topological inaccuracies upon closer inspection. These inaccuracies include discontinuities along field boundaries and isolated line segments within the generated masks. Such anomalies pose a risk to applications that depend on precise topological structures.
In contrast, TopoSinGAN demonstrates improved topological coherence. It maintains the continuity of agricultural field boundaries without introducing unintended gaps. Additionally, it minimizes the number of abnormally isolated line segments within fields. Quantitative assessment using the NTC metric reveals a substantial improvement in accuracy with TopoSinGAN. Specifically, for the agricultural and dendrology cases, the NTC index values between the GT and the 1000 generated images were 15.15 and 14.55, respectively, for SinGAN. These distances are reduced to 3.94 and 2.44 with TopoSinGAN. Lower NTC values indicate higher topological accuracy, thereby demonstrating the effectiveness of TopoSinGAN in enhancing topological coherence. The detailed NTC evaluation results are summarized in Table 1.
To further evaluate the performance of our model and facilitate comparison with existing approaches, we conducted experiments using the CREMI dataset, assessing the results through the NTC metric compared with the TopoGAN and WGAN models. The comparative results are summarized in Table 2. Our model demonstrated improved NTC scores, recording an NTC of 8.15, compared to TopoGAN, which achieved an NTC of 11.05. The WGAN and plain SinGAN models recorded similar NTC scores of 15.76 and 15.46, respectively. Additionally, Figure 7 presents sample generated masks produced by each model, visually comparing their performance.

3.3. Modified FID Evaluation

The experiments were performed on both case studies, dendrology and agricultural fields. For each input image, the TopoSinGAN and baseline SinGAN models were trained for 10 runs, generating 1000 synthetic images per experiment. The average FID values are reported in Table 3.
As shown in Table 3, for the agricultural fields image TopoSinGAN achieved an average FID of 0.1914, compared to 0.2485 for the baseline SinGAN. The agricultural fields image is characterized by significant variations in color, texture, and geometric patterns, leading to inherently higher FID values. The lower FID score for TopoSinGAN suggests that the topologically improved model produced more realistic images than SinGAN. Specifically, TopoSinGAN’s ability to preserve topological structures while maintaining visual diversity led to a reduction of 0.0571 in FID, demonstrating its effectiveness in generating images with enhanced topological accuracy while retaining natural variation.
The average FID for the dendrology experiment with TopoSinGAN was 0.0013, while the baseline SinGAN produced a similar value of 0.0014. Given the inherent similarity in the structure and geometry of xylem cells in the dendrology image, small FID values are expected. The similarity between the two models indicates that TopoSinGAN maintains the structural fidelity of the generated xylem images.
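For orientation, the standard Fréchet inception distance [38] compares Gaussian fits to the feature embeddings of real and generated images. The expression below is that standard definition only; the modified FID used in this study is specified in the methods and may differ in how the embeddings are obtained. The symbols $\mu_r, \Sigma_r$ (mean and covariance of real-image features) and $\mu_g, \Sigma_g$ (the same for generated images) are notation assumed here for illustration. The second line restates the agricultural result from Table 3 as arithmetic.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

% Agricultural fields (Table 3): absolute and relative reduction
0.2485 - 0.1914 = 0.0571, \qquad \frac{0.0571}{0.2485} \approx 0.23
```

Lower values indicate that the two feature distributions are closer, which is why the drop of about 23% for the agricultural image is read as improved realism.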

3.4. Comparative Efficiency Analysis

To assess the efficiency of our method relative to the original SinGAN model, we conducted experiments where the model processed input images of varying dimensions: 175 × 240, 350 × 400, and 500 × 500. The SinGAN model’s architecture, dictated by a scale factor parameter set by default to 0.75, generated 8, 11, and 12 scales for the respective image dimensions. At each scale of the pyramid, the training process ran for 1000 iterations. In all three experimental setups, topology loss was deliberately excluded during the first four scales to allow the model to form basic topological features without constraints. It was then introduced at finer scales (5–12) to refine the details while preserving structural integrity. All other configurations and parameters remained consistent with those detailed in Section 1.1. The experiment involved running the model on each image size ten times. Table 4 outlines the average recorded execution times for each configuration. On average, our TopoSinGAN model demonstrated an approximate 9.41%, 7.29%, and 5.98% increase in training time compared to the baseline SinGAN model for 175 × 240, 350 × 400, and 500 × 500 dimensions, respectively.
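The overhead percentages quoted above follow directly from the times in Table 4. The short Python sketch below reproduces that arithmetic and also illustrates the scale-gating rule described in the text (no topology loss for the first four scales). The helper names `relative_overhead` and `topo_loss_active` are hypothetical and are not taken from the project's code.

```python
# Average training times in minutes, copied from Table 4: (SinGAN, TopoSinGAN).
TIMES = {
    "175 x 240": (20.93, 22.90),
    "350 x 400": (65.13, 69.88),
    "500 x 500": (108.11, 114.58),
}


def relative_overhead(baseline: float, modified: float) -> float:
    """Percentage increase of `modified` over `baseline`."""
    return (modified - baseline) / baseline * 100.0


def topo_loss_active(scale: int, warmup_scales: int = 4) -> bool:
    """Hypothetical gating rule: scales are numbered from 1 (coarsest),
    and the topology loss is skipped for the first `warmup_scales` scales."""
    return scale > warmup_scales


overheads = {
    size: round(relative_overhead(singan, topo), 2)
    for size, (singan, topo) in TIMES.items()
}

if __name__ == "__main__":
    for size, pct in overheads.items():
        print(f"{size}: +{pct:.2f}% training time")
    active = [s for s in range(1, 13) if topo_loss_active(s)]
    print("Scales with topology loss (12-scale pyramid):", active)
```

The relative overhead shrinks as the image grows, even though the absolute difference in minutes increases.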

3.5. Hyperparameter Tuning of λ₃ for CREMI Experiment

According to our experiments, the value of λ₃, the coefficient of L_topo in Equation (1), depends on factors such as input image dimensions and the density of line features in the real mask. Therefore, hyperparameter tuning is crucial for each case study. Figure 8 shows the tuning process of the λ₃ parameter for the CREMI experiment. In this experiment, we gradually increased λ₃ in steps of 0.2, starting from 0 (baseline SinGAN), while keeping the other two hyperparameters constant, and recorded the average FID and NTC values for each trial.
In Figure 8, the blue line represents FID scores, which are initially low for SinGAN. As λ₃ increases to 0.2, the FID worsens to 0.1801, while the NTC slightly improves to 14.9. The best balance of FID and NTC occurs at λ₃ = 0.4, with FID at 0.1127 and NTC at 8.15. When λ₃ is further increased to 0.6, NTC decreases to 6.91, but the FID jumps to 0.15, indicating a negative impact on the realism of the generated contexts. At λ₃ = 0.8, this trend continues, with NTC dropping to 6.02 and FID rising sharply to 0.21. It is important to note that higher values of λ₃ lead to model divergence.
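The text does not state a formal rule for choosing the "best balance" between FID and NTC. One simple way to make such a choice explicit is sketched below in Python, under the assumption that both metrics are min-max normalized over the tested values and given equal weight. This criterion is an illustration only, not the selection procedure used in the study. The trial values are the ones reported above.

```python
# (lambda_3, FID, NTC) values reported in the text for the CREMI tuning runs.
TRIALS = [
    (0.2, 0.1801, 14.90),
    (0.4, 0.1127, 8.15),
    (0.6, 0.1500, 6.91),
    (0.8, 0.2100, 6.02),
]


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def best_tradeoff(trials):
    """Return the lambda whose normalized FID + NTC sum is smallest.

    Assumed criterion (equal weights); lower is better for both metrics.
    """
    fid_norm = min_max([t[1] for t in trials])
    ntc_norm = min_max([t[2] for t in trials])
    scores = [f + n for f, n in zip(fid_norm, ntc_norm)]
    best_index = min(range(len(trials)), key=scores.__getitem__)
    return trials[best_index][0]


best_lambda = best_tradeoff(TRIALS)

if __name__ == "__main__":
    print("Selected lambda_3:", best_lambda)
```

Under this assumed criterion the selected value agrees with the choice of 0.4 reported for the CREMI experiment; a different weighting of the two metrics could shift the result.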

4. Discussion

The research presented demonstrates the effectiveness of TopoSinGAN in enhancing topological accuracy through its integrated topology loss function. Unlike conventional losses, such as adversarial or reconstruction losses [13,55], the topology loss optimizes for lower terminal counts using its specialized convolutional kernels. This approach allows TopoSinGAN to minimize topological anomalies that could otherwise go unnoticed, thereby addressing a critical need in applications requiring high topological integrity. While SinGAN retains visual similarities, structural discrepancies can arise without dedicated topological supervision, which TopoSinGAN successfully mitigates by penalizing these errors.
A key feature of the topology loss function is its full differentiability, which facilitates end-to-end training via backpropagation. This seamless integration within the training cycle enables direct optimizations based on topological feedback, surpassing methods that rely solely on post-processing. The end-to-end nature of this approach ensures that topological feedback is continually refined throughout the training process, leading to more robust and accurate models. Furthermore, our study introduces a novel GAN evaluation metric focused purely on the topological attributes of the images, excluding geometric variations. This metric, termed the NTC score, provides a more realistic judgment of GAN models by quantifying topological integrity. Consistent improvements in NTC scores indicate the quantitative effectiveness of TopoSinGAN, as lower NTC values correspond to reduced loop openness and fewer topological anomalies. Importantly, these lower NTC scores denote higher topological accuracy, not merely structural similarity. The improved NTC scores reported in the comparative evaluation of TopoSinGAN against plain SinGAN, WGAN, and TopoGAN further highlight TopoSinGAN’s effectiveness in generating topologically enhanced images. According to Table 2, SinGAN and WGAN recorded similar NTC scores, as they both use similar adversarial losses [13].
In Appendix C, we use common graph metrics to demonstrate that our proposed topology loss has no significant negative impact on the learning flow of the generator compared to the original SinGAN (Table A2). The high similarity of common graph metrics for SinGAN and TopoSinGAN outputs suggests two observations. First, both models effectively replicate the general structure of real input images. This signifies that our topology loss has not significantly affected the realism of the generated image contexts, which aligns with the results of the modified FID evaluation. Second, it reveals a limitation of common graph metrics: they fail to capture the topological inaccuracies within mask graphs. The NTC metric addresses this gap by focusing on a graph’s topological structure, offering a more nuanced evaluation of topological accuracy. Unlike traditional metrics that compare graphs or graphlets for similarity [56,57,58,59], the NTC metric emphasizes topological integrity, making it a specialized tool for assessing the precision of topological features in a graph.
Additionally, the topological loss function in TopoSinGAN is uniquely characterized by its standalone application, assessing the model’s topological loss based solely on the generated mask channel. This approach penalizes the model for topological inaccuracies without comparing the generated masks to real inputs, focusing on minimizing the occurrence of terminal nodes. Despite the modest additional computational cost, its benefits in preserving the integrity of topological features make it a valuable tool for various applications demanding precise structural fidelity.
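The idea of scoring a generated mask on its own, by counting terminal nodes, can be illustrated with a small Python sketch. It assumes a one-pixel-wide binary mask and treats a foreground pixel with exactly one 8-connected foreground neighbour as a terminal. This is a plain, non-differentiable neighbour count written for illustration; it is not the set of eight convolution kernels shown in Figure 2, and the function name `count_terminals` is hypothetical.

```python
def count_terminals(mask):
    """Count foreground pixels that have exactly one 8-connected
    foreground neighbour (assumed definition of a terminal node)."""
    rows, cols = len(mask), len(mask[0])
    terminals = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1:
                continue
            neighbours = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the pixel itself
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        neighbours += mask[rr][cc]
            if neighbours == 1:
                terminals += 1
    return terminals


# A closed square boundary: no terminal nodes are expected.
CLOSED_LOOP = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

# The same boundary with one pixel removed: the gap leaves two loose ends.
OPEN_LOOP = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print("closed loop terminals:", count_terminals(CLOSED_LOOP))
    print("open loop terminals:", count_terminals(OPEN_LOOP))
```

Note that no reference mask appears anywhere in the computation: the count depends only on the generated mask, which mirrors the standalone nature of the loss described above.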
Nevertheless, TopoSinGAN has certain limitations that should be considered. Its treatment of hanging lines as anomalies makes it particularly suitable for applications requiring closed loops, such as agricultural field delineation or membrane cell segmentation. However, for applications where hanging lines are expected, such as road network analysis, TopoSinGAN may not be the optimal choice. This specificity underscores the importance of context when applying any approach, including TopoSinGAN. The limitation stems from the current optimization function (Equation (10)), which is designed to minimize the number of terminal nodes in the generated mask. To extend the applicability of our method to scenarios where hanging lines are expected, a possible adaptation could modify the optimization function to minimize the difference between the number of terminal nodes in the real input mask and those in the generated mask. For example, an adaptation of the optimization function (Equation (10)) could be expressed as:
$$\theta^{*} = \arg\min_{\theta} \left| \mathcal{L}_{\text{topo}}^{\text{Real}}(\theta) - \mathcal{L}_{\text{topo}}^{\text{Generated}}(\theta) \right|$$
This approach could allow the model to preserve a desired number of terminal nodes rather than eliminating them entirely. Implementing and validating this adaptation is beyond the scope of the current research and is left for future work. Another limitation of TopoSinGAN relates to the thickness of the boundary mask in the input real image. Our loss function uses fixed 3 × 3 terminal detector kernels, which are sensitive to boundary thickness, potentially leading to suboptimal performance in certain cases. Two potential adaptations would be to convert the mask layer to a centerline (skeletonized) binary mask or to introduce larger versions of the kernels, such as 4 × 4 or 5 × 5. Developing a version of TopoSinGAN that is robust to mask thickness will also be a focus of future work.

5. Conclusions

In this study, we introduced TopoSinGAN, an extension of SinGAN that incorporates a differentiable topology loss function to enhance topological accuracy in synthetic image generation. By minimizing terminal nodes, TopoSinGAN effectively addresses topological anomalies that conventional GAN losses often overlook, without significantly increasing computational complexity. Another contribution of this work is the proposal of the Node Topology Clustering (NTC) metric, which focuses on evaluating topological integrity rather than just geometric similarity. Our experiments, involving agricultural and dendrological images, demonstrated that TopoSinGAN achieves significantly better topological fidelity than SinGAN, as evidenced by lower NTC indexes. This indicates fewer instances of loop openness and isolated line segments, which are crucial for applications requiring precise topological features. While TopoSinGAN excels in preserving topological integrity, it may not be ideal for contexts where open structures are expected. Nevertheless, it represents a crucial advancement in generating topologically accurate synthetic data, with potential applications in GIS, medical imaging, and other fields where topological accuracy is essential. Future work will focus on refining the topology loss function and extending its application to a broader range of tasks, such as image semantic segmentation. Additionally, we plan to adapt the proposed model to overcome the limitations outlined in Section 4. Moreover, the development of more sophisticated evaluation metrics for topological accuracy will remain a priority, aiming to enhance the robustness and applicability of generative models across diverse fields.

Author Contributions

Conceptualization, M.A. and E.S.; methodology, M.A.; software, M.A.; validation, M.A. and E.S.; formal analysis, M.A.; investigation, M.A. and E.S.; resources, E.S.; data curation, M.A.; writing—original draft preparation, M.A.; writing—review and editing, M.A. and E.S.; visualization, M.A.; supervision, E.S.; project administration, M.A. and E.S.; funding acquisition, E.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code and data for this study are available in the project’s GitHub repository [60].

Acknowledgments

The authors would like to thank the Minnesota Supercomputing Institute (MSI) and the South Dakota State University Research Cyberinfrastructure (RCi) for providing the high-performance computing resources essential for our experimentation. We also extend our gratitude to everyone who contributed to the success of this project, including Fan Wang, Pegah Salehi, and Sajad Amouei Sheshkal for their valuable insights.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
|---|---|
| GAN | Generative adversarial network |
| GT | Ground truth |
| TD | Terminal distance |
| NTC | Node topology clustering |
| PH | Persistent homology |
| STD | Standard deviation |
| GIS | Geographic information science |
| TDA | Topological data analysis |
| MSI | Minnesota Supercomputing Institute |
| FID | Fréchet inception distance |
| RGB | Red, green, blue |

Appendix A. Demonstration of Independence of NTC Metric on the Placement of the Randomly Added Terminal Node

To verify that the addition of a terminal node does not significantly impact the proposed topological evaluation metric, we randomly added a terminal node to both case study graphs 1000 times and calculated the percentage of LISA clusters. The results from both experiments, presented in Table A1, indicate that the percentage of clusters remained consistent, with a similar mean and a small standard deviation (STD).
Table A1. LISA cluster analysis results for 1000 iterations with randomly added terminal nodes.

| Cluster Type | Agricultural Fields (Mean) | Agricultural Fields (STD) | Dendrology (Mean) | Dendrology (STD) |
|---|---|---|---|---|
| HH | 22.14 | 2.98 | 20.04 | 1.71 |
| LL | 18.91 | 1.84 | 18.76 | 1.78 |
| LH | 0.017 | 0.22 | 0 | 0 |
| HL | 0 | 0 | 0 | 0 |

Appendix B. Extreme Cases Considered for Graph Classification

Figure A1 illustrates an instance of the seven extreme cases used to simulate various topological structures of 2D graphs.
Figure A1. The seven extreme topological distributions of terminal nodes in a 15 × 15 2D grid, including random, dispersed, clustered, clustered-star, clustered-web, isolated edges, and randomly split-edge distributions.

Appendix C. Demonstrating the Non-Disruptive Effect of Topology Loss on SinGAN Learning

To verify that our proposed TopoLoss does not negatively impact the learning process, we first measure the common graph metrics specified in Section 2.2.1, excluding the NTC features so that the comparison reflects only the general attributes of the generated masks. Next, we use the cosine similarity metric to measure the similarity between the GTs and the synthetic images generated by both GAN models. The results in Table A2 show that SinGAN and TopoSinGAN perform similarly when NTC features are excluded from the evaluation, indicating comparable efficacy in replicating common graph metrics.
Table A2. Cosine similarity between GT and 1000 synthetic images for SinGAN and TopoSinGAN.

| Case Study | SinGAN (Mean) | SinGAN (STD) | TopoSinGAN (Mean) | TopoSinGAN (STD) |
|---|---|---|---|---|
| Agricultural | 0.9873 | 0.0023 | 0.9895 | 0.0001 |
| Dendrology | 0.9892 | 0.0011 | 0.9891 | 0.0004 |
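As a reminder of what the similarity values in Table A2 measure, the Python sketch below computes the cosine similarity between two graph-feature vectors. The feature values shown are hypothetical placeholders chosen only to make the example run; they are not measurements from this study, and the function name is illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical graph-feature vectors (placeholder values, not study data).
gt_features = [0.12, 3.4, 0.05, 2.0]
generated_features = [0.10, 3.1, 0.07, 2.2]

similarity = cosine_similarity(gt_features, generated_features)

if __name__ == "__main__":
    print(f"cosine similarity: {similarity:.4f}")
```

A value close to 1 means the two feature vectors point in nearly the same direction, regardless of their overall magnitude.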

References

  1. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. Adv. Neural Inf. Process. Syst. 2014, 27.
  2. Xu, M.; Yoon, S.; Fuentes, A.; Park, D.S. A Comprehensive Survey of Image Augmentation Techniques for Deep Learning. Pattern Recognit. 2023, 137, 109347.
  3. Liu, H.; Wan, Z.; Huang, W.; Song, Y.; Han, X.; Liao, J. PD-GAN: Probabilistic Diverse GAN for Image Inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9371–9381.
  4. Ren, Y.; Wu, J.; Zhang, P.; Zhang, M.; Xiao, X.; He, Q.; Wang, R.; Zheng, M.; Pan, X. UGC: Unified GAN Compression for Efficient Image-to-Image Translation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 17281–17291.
  5. Mahapatra, D.; Ge, Z. Training Data Independent Image Registration with GANs Using Transfer Learning and Segmentation Information. In Proceedings of the 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy, 8–11 April 2019; pp. 709–713.
  6. Jain, M.; Meegan, C.; Dev, S. Using GANs to Augment Data for Cloud Image Segmentation Task. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 3452–3455.
  7. Zhaoa, Z.; Wang, Y.; Liu, K.; Yang, H.; Sun, Q.; Qiao, H. Semantic Segmentation by Improved Generative Adversarial Networks. arXiv 2021, arXiv:2104.09917.
  8. Majurski, M.; Manescu, P.; Padi, S.; Schaub, N.; Hotaling, N.; Simon, C.; Bajcsy, P. Cell Image Segmentation Using Generative Adversarial Networks, Transfer Learning, and Augmentations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019.
  9. Thambawita, V.; Salehi, P.; Sheshkal, S.A.; Hicks, S.A.; Hammer, H.L.; Parasa, S.; de Lange, T.; Halvorsen, P.; Riegler, M.A. Singan-Seg: Synthetic Training Data Generation for Medical Image Segmentation. PLoS ONE 2022, 17, e0267976.
  10. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
  11. You, C.; Li, G.; Zhang, Y.; Zhang, X.; Shan, H.; Li, M.; Ju, S.; Zhao, Z.; Zhang, Z.; Cong, W.; et al. CT Super-Resolution GAN Constrained by the Identical, Residual, and Cycle Learning Ensemble (GAN-CIRCLE). IEEE Trans. Med. Imaging 2020, 39, 188–203.
  12. Dy, J.B.; Virtusio, J.J.; Tan, D.S.; Lin, Y.-X.; Ilao, J.; Chen, Y.-Y.; Hua, K.-L. MCGAN: Mask Controlled Generative Adversarial Network for Image Retargeting. Neural Comput. Appl. 2023, 35, 10497–10509.
  13. Shaham, T.R.; Dekel, T.; Michaeli, T. SinGAN: Learning a Generative Model From a Single Natural Image. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4570–4580.
  14. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y.; et al. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 4015–4026.
  15. Mo, Y.; Wu, Y.; Yang, X.; Liu, F.; Liao, Y. Review the State-Of-The-Art Technologies of Semantic Segmentation Based on Deep Learning. Neurocomputing 2022, 493, 626–646.
  16. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved Training of Wasserstein GANs. Adv. Neural Inf. Process. Syst. 2017, 30, 5767–5777.
  17. Liu, C.; Ma, B.; Ban, X.; Xie, Y.; Wang, H.; Xue, W.; Ma, J.; Xu, K. Enhancing Boundary Segmentation for Topological Accuracy with Skeleton-Based Methods. arXiv 2024, arXiv:2404.18539.
  18. Mosinska, A.; Marquez-Neila, P.; Kozinski, M.; Fua, P. Beyond the Pixel-Wise Loss for Topology-Aware Delineation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3136–3145.
  19. Hu, X. Structure-Aware Image Segmentation with Homotopy Warping. Adv. Neural Inf. Process. Syst. 2022, 35, 24046–24059.
  20. Costea, D.; Marcu, A.; Leordeanu, M.; Slusanschi, E. Creating Roadmaps in Aerial Images with Generative Adversarial Networks and Smoothing-Based Optimization. In Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 22–29 October 2017.
  21. Park, E. Refining Inferred Road Maps Using GANs. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2019.
  22. Guo, X.; Zhou, R. Data Augmentation Method for Extracting Partially Occluded Roads From High Spatial Resolution Remote Sensing Images. IEEE Access 2023, 11, 79232–79239.
  23. Patel, H.; Farrelly, C.; Hathaway, Q.A.; Rozenblit, J.Z.; Deepa, D.; Singh, Y.; Chaudhary, A.; Himeur, Y.; Mansoor, W.; Atalls, S. Topology-Aware GAN (TopoGAN): Transforming Medical Imaging Advances. In Proceedings of the 2023 Tenth International Conference on Social Networks Analysis, Management and Security (SNAMS), Abu Dhabi, United Arab Emirates, 21–24 November 2023; pp. 1–3.
  24. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein Generative Adversarial Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 214–223.
  25. Lim, J.H.; Ye, J.C. Geometric GAN. arXiv 2017, arXiv:1705.02894.
  26. Qi, G.-J.; Zhang, L.; Hu, H.; Edraki, M.; Wang, J.; Hua, X.-S. Global versus Localized Generative Adversarial Nets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1517–1525.
  27. Kossaifi, J.; Tran, L.; Panagakis, Y.; Pantic, M. GAGAN: Geometry-Aware Generative Adversarial Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 878–887.
  28. Fu, H.; Gong, M.; Wang, C.; Batmanghelich, K.; Zhang, K.; Tao, D. Geometry-Consistent Generative Adversarial Networks for One-Sided Unsupervised Domain Mapping. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2427–2436.
  29. Chen, W.; Yu, S.; Wu, J.; Ma, K.; Bian, C.; Chu, C.; Shen, L.; Zheng, Y. TR-GAN: Topology ranking GAN with triplet loss for retinal artery/vein classification. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, 4–8 October 2020; pp. 616–625.
  30. Chen, W.; Yu, S.; Ma, K.; Ji, W.; Bian, C.; Chu, C.; Shen, L.; Zheng, Y. TW-GAN: Topology and width aware GAN for retinal artery/vein classification. Med. Image Anal. 2022, 77, 102340.
  31. Liu, W.; Chen, P.Y.; Yu, F.; Suzumura, T.; Hu, G. Learning graph topological features via GAN. IEEE Access 2019, 7, 21834–21843.
  32. Wang, F.; Liu, H.; Samaras, D.; Chen, C. TopoGAN: A Topology-Aware Generative Adversarial Network. In Computer Vision—ECCV 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 118–136.
  33. Hu, X.; Li, F.; Samaras, D.; Chen, C. Topology-Preserving Deep Image Segmentation. Adv. Neural Inf. Process. Syst. 2019, 32, abs/1906.05404.
  34. Clough, J.R.; Byrne, N.; Oksuz, I.; Zimmer, V.A.; Schnabel, J.A.; King, A.P. A Topological Loss Function for Deep-Learning Based Image Segmentation Using Persistent Homology. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 8766–8778.
  35. Bao, J.; Wang, Z.; Wang, J.; Yan, C. Persistent Homology Based Generative Adversarial Network. In Proceedings of the VISIGRAPP (4: VISAPP), Lisbon, Portugal, 19–21 February 2023; pp. 196–203.
  36. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
  37. Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved Techniques for Training GANs. Adv. Neural Inf. Process. Syst. 2016, 29.
  38. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. Adv. Neural Inf. Process. Syst. 2017, 30.
  39. Horak, D.; Yu, S.; Salimi-Khorshidi, G. Topology Distance: A Topology-Based Approach for Evaluating Generative Adversarial Networks. AAAI 2021, 35, 7721–7728.
  40. Cerri, A.; Di Fabio, B.; Jabłoński, G.; Medri, F. Comparing Shapes through Multi-Scale Approximations of the Matching Distance. Comput. Vis. Image Underst. 2014, 121, 43–56.
  41. Sheehy, D.; Kisielius, O.; Cavanna, N.J. Computing the Shift-Invariant Bottleneck Distance for Persistence Diagrams. In Proceedings of the Canadian Conference on Computational Geometry, Winnipeg, MB, Canada, 8–10 August 2018; pp. 78–84.
  42. Bouttier, J.; Di Francesco, P.; Guitter, E. Geodesic Distance in Planar Graphs. Nucl. Phys. B 2003, 663, 535–567.
  43. Anselin, L. Local Indicators of Spatial association—LISA. Geogr. Anal. 1995, 27, 93–115.
  44. Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support Vector Machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28.
  45. Aliakbary, S.; Habibi, J.; Movaghar, A. Feature Extraction from Degree Distribution for Comparison and Analysis of Complex Networks. Comput. J. 2015, 58, 2079–2091.
  46. Attar, N.; Aliakbary, S. Classification of Complex Networks Based on Similarity of Topological Network Features. Chaos 2017, 27, 091102.
  47. Paul, E.; Rényi, A. On the Central Limit Theorem for Samples from a Finite Population. Sel. Pap. Alfréd Rényi 1959, 353, 49–61.
  48. Watts, D.; Strogatz, S. Collective Dynamics of “small-World” Networks. Nature 1998, 393, 440–442.
  49. Costa, L.d.F.; Rodrigues, F.A.; Travieso, G.; Villas Boas, P.R. Characterization of Complex Networks: A Survey of Measurements. Adv. Phys. 2007, 56, 167–242.
  50. Newman, M.E.J. Modularity and Community Structure in Networks. Proc. Natl. Acad. Sci. USA 2006, 103, 8577–8582.
  51. Miccai Challenge on Circuit Reconstruction from Electron Microscopy Images. Available online: http://cremi.org/ (accessed on 23 October 2024).
  52. Griffin, D.; Porter, S.T.; Trumper, M.L.; Carlson, K.E.; Crawford, D.J.; Schwalen, D.; McFadden, C.H. Gigapixel Macro Photography of Tree Rings. Tree-Ring Res. 2021, 77, 86–94.
  53. Hacke, U.G.; Spicer, R.; Schreiber, S.G.; Plavcová, L. An Ecophysiological and Developmental Perspective on Variation in Vessel Diameter. Plant Cell Environ. 2017, 40, 831–845.
  54. Research Computing, Research and Innovation Office. Minnesota Supercomputing Institute (MSI)-Agate Cluster. Available online: https://msi.umn.edu/about-msi-services/high-performance-computing/agate (accessed on 23 October 2024).
  55. Li, Y.; Xiao, N.; Ouyang, W. Improved Generative Adversarial Networks with Reconstruction Loss. Neurocomputing 2019, 323, 363–372.
  56. Yaveroğlu, Ö.N.; Milenković, T.; Pržulj, N. Proper Evaluation of Alignment-Free Network Comparison Methods. Bioinformatics 2015, 31, 2697–2704.
  57. Yaveroğlu, Ö.N.; Malod-Dognin, N.; Davis, D.; Levnajic, Z.; Janjic, V.; Karapandza, R.; Stojmirovic, A.; Pržulj, N. Revealing the Hidden Language of Complex Networks. Sci. Rep. 2014, 4, 4547.
  58. Przulj, N. Biological Network Comparison Using Graphlet Degree Distribution. Bioinformatics 2007, 23, e177–e183.
  59. Kuchaiev, O.; Stevanović, A.; Hayes, W.; Pržulj, N. GraphCrunch 2: Software Tool for Network Modeling, Alignment and Clustering. BMC Bioinform. 2011, 12, 24.
  60. Ahmadkhani, M.; Shook, E. TopoSinGAN Github Repository. Available online: https://github.com/mohsenumn/TopoSinGAN (accessed on 23 October 2024).
Figure 1. TopoSinGAN’s multiresolution architecture.
Figure 2. (a) The eight 3 × 3 convolution kernels employed in the topology enhancer loss. Dark pixels represent 1s; all other pixels are 0s. (b) The convolution kernels’ performance for topology loss. The output of the kernel convolution (highlighted points) is overlaid on the input tensor for better visualization.
Figure 3. The architecture of the developed loss function.
Figure 4. Illustration of defined terminal distance (TD) in a simple graph.
Figure 5. The result of SVM graph classification using different feature sets.
Figure 6. The outputs of the SinGAN and TopoSinGAN models trained on a single real image for agricultural fields and dendrology experiments. The red frame highlights the TopoSinGAN results, emphasizing the improved topological accuracy in comparison with SinGAN.
Figure 7. Sample outputs generated by TopoSinGAN and other GAN models. The single input used for the SinGAN and TopoSinGAN training is displayed on the left, while WGAN and TopoGAN were trained using the CREMI dataset. The red frame highlights the outputs of the TopoSinGAN model, showcasing its topological consistency relative to other models.
Figure 8. Tuning of the λ₃ parameter for the CREMI experiment, showing its impact on FID (blue line) and NTC (red line). The best balance between FID and NTC is achieved at λ₃ = 0.4 for this experiment.
Table 1. Measured NTC indexes for 1000 synthetic images.

| Case Study | SinGAN (Mean) | SinGAN (STD) | TopoSinGAN (Mean) | TopoSinGAN (STD) |
|---|---|---|---|---|
| Agricultural | 15.15 | 3.41 | 3.94 | 1.81 |
| Dendrology | 14.55 | 3.07 | 2.44 | 1.35 |
Table 2. NTC evaluation of TopoSinGAN in comparison with other models on the CREMI dataset.

| Dataset | SinGAN (Mean) | SinGAN (STD) | WGAN (Mean) | WGAN (STD) | TopoGAN (Mean) | TopoGAN (STD) | TopoSinGAN (Mean) | TopoSinGAN (STD) |
|---|---|---|---|---|---|---|---|---|
| CREMI | 15.46 | 2.86 | 15.76 | 2.77 | 11.05 | 2.80 | 8.15 | 2.01 |
Table 3. Average FID values over 10 runs for 1000 synthetic images per model for each case study.
| Case study | SinGAN Mean | SinGAN STD | TopoSinGAN Mean | TopoSinGAN STD |
| --- | --- | --- | --- | --- |
| Agricultural | 0.2485 | 0.0086 | 0.1914 | 0.0083 |
| Dendrology | 0.0014 | 0.0013 | 0.0014 | 0.0018 |
Table 4. The results of comparative performance analysis.
| Input Dimensions | Pyramid Scales | SinGAN (Minutes) | TopoSinGAN (Minutes) |
| --- | --- | --- | --- |
| 175 × 240 × 4 | 8 | 20.93 | 22.90 |
| 350 × 400 × 4 | 11 | 65.13 | 69.88 |
| 500 × 500 × 4 | 12 | 108.11 | 114.58 |
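The timings in Table 4 are easier to compare when expressed as the relative overhead of TopoSinGAN over SinGAN. The sketch below works through that arithmetic from the tabulated minutes; the tuple layout and the helper name `relative_overhead` are assumptions made for illustration, not part of the authors' code.

```python
# Rows copied from Table 4:
# (input dimensions, pyramid scales, SinGAN minutes, TopoSinGAN minutes).
# The layout and the helper name are hypothetical, chosen for illustration.
runs = [
    ("175 x 240 x 4", 8, 20.93, 22.90),
    ("350 x 400 x 4", 11, 65.13, 69.88),
    ("500 x 500 x 4", 12, 108.11, 114.58),
]


def relative_overhead(baseline_minutes: float, topo_minutes: float) -> float:
    """Return the extra run time as a fraction of the baseline run time."""
    return (topo_minutes - baseline_minutes) / baseline_minutes


overheads = [
    relative_overhead(singan, toposingan)
    for _dims, _scales, singan, toposingan in runs
]

for (dims, scales, _singan, _toposingan), overhead in zip(runs, overheads):
    print(f"{dims} ({scales} scales): {overhead:.1%} longer than SinGAN")
```

For these three inputs the overhead works out to roughly 9.4%, 7.3%, and 6.0%, so the added run time shrinks in relative terms as the input grows.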

Share and Cite

MDPI and ACS Style

Ahmadkhani, M.; Shook, E. TopoSinGAN: Learning a Topology-Aware Generative Model from a Single Image. Appl. Sci. 2024, 14, 9944. https://doi.org/10.3390/app14219944

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
