Conditional Community Search Based on Weight Information

Wang, Mengxiang; Ma, Dong; Fu, Qiang; Zong, Chuanyu

doi:10.3390/electronics13214321

Open AccessArticle

Conditional Community Search Based on Weight Information

¹

China National Institute of Standardization, Beijing 100191, China

²

School of Computer Science, Shenyang Aerospace University, Shenyang 110136, China

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Electronics 2024, 13(21), 4321; https://doi.org/10.3390/electronics13214321

Submission received: 8 October 2024 / Revised: 27 October 2024 / Accepted: 2 November 2024 / Published: 4 November 2024

(This article belongs to the Section Computer Science & Engineering)

Download

Browse Figures

Versions Notes

Abstract

:

Community search aims to identify cohesive subgraphs containing user-given query nodes in social networks. As information technology develops, user demands for community search have become increasingly sophisticated. The searched communities must not only meet the structural cohesiveness requirements but also adhere to some complex search conditions based on Boolean expressions. For example, certain desired nodes should be contained in the communities, while certain undesired nodes cannot exist in the communities, which is called conditional community search. However, existing solutions for conditional community search often introduce some undesired nodes into the identified communities and exhibit relatively low search efficiency. To overcome these drawbacks, therefore, this paper investigates the problem of conditional community search based on weight information. First, we refine the original problem definition of conditional community search and outline the need for an improved algorithm for calculating the weights of the nodes. Then, we explore two novel algorithms for searching conditional communities based on calculated weight information. Finally, we conduct extensive experiments on several real-world datasets to verify the accuracy and efficiency of our proposed searching algorithms.

Keywords:

community search; k-core; user search condition; weight information

1. Introduction

With the development of information networks, the community structure has become a very important component within these networks. It is now increasingly common to study community structures in social networks [1,2], biological networks, and communication networks. The goal of community search is to locate some cohesive subgraphs in the networks that meet the users’ requirements while taking into account the stated structural relationships between nodes and query nodes. The most commonly utilized cohesive subgraph structures in community search are k-core, k-clique, k-truss, and k-ECC [3]. A lot of studies have focused on the subgraph structures that desired communities should exhibit with most of these studies being based on the four cohesive structures mentioned above.

Most of the conditions established for community search allow users to specify one or more desired nodes that the community should contain; less attention has been given to users’ query conditions in community search. For example, users may require that certain nodes be contained in the community while simultaneously ensuring that specific nodes should not appear in the desired community. However, such conditions are increasingly insufficient to meet the expanding search requirements. In some cases, the users want to find some desired communities that can satisfy some more complex search conditions, such as requiring the community to include either node u and node v while excluding both nodes o and p. Current community search techniques do not adequately address these complex search requirements.

Motivations. To make the obtained community better meets the actual needs of users, therefore, Zhu et al. proposed a complex conditional community search method based on label propagation [4]. In this approach, node weights are introduced to distinguish the varying importance of each node, and a method for calculating these weights is proposed. For instance, Figure 1 illustrates a graph for community search where the user requires node A to be included in the community and node I cannot appear in the community. In this scenario, node A is assigned an initial weight of 1, node I an initial weight of

- 1

, and all other nodes an initial weight of 0. To ensure that as many nodes as possible in the graph have weight information, the method is implemented by employing multiple weight updates, with the current weight of each node being the average of the previous round weights of all its neighboring nodes. The solution proposed in [4] can satisfy the current requirements of conditional community search; however, it has some drawbacks. For example, consider node L in Figure 1. Since it is distant from the nodes specified in the user condition, its weight remains at 0 in the initial rounds because its neighbors do not change their weights. Consequently, it is inefficient to compute the weight of such a node in every round, as this leads to the unnecessary use of computational resources. Moreover, when conducting a community search in a large graph, there must exist many nodes similar to node L, so it is necessary to perform such computational optimization. To address these issues, we propose improvements to the method in this paper.

In addition, we identified a problem with the community search results when using the method proposed in [4]. The approach in [4] involves first obtaining a subgraph based on the qualified nodes (which satisfy the k-core requirement), which is followed by performing a community search on that subgraph by expanding outward from the necessary nodes to check whether their neighbors meet the k-core requirement. However, the method does not adequately handle unqualified nodes, which can lead to incorrect search results. For instance, Figure 2 illustrates a qualified subgraph for performing a 2-core community search, where node A is a required node. Nodes C and D are added to the community, and node B is also added to the community because it has two neighbors G and F. However, in the subsequent judgement, it is found that nodes G and F do not satisfy the k-core requirement. As a result, the degree of node B is affected by these unqualified nodes not being added to the community, causing node B to no longer satisfy the k-core requirement. Consequently, the original conditional community search strategy fails to produce the correct community result (

{A, C, D

}). To overcome the drawbacks mentioned above, therefore, we will investigate the problem of conditional community search based on the weight information in this paper.

Challenges and Contributions. The primary challenges in processing the problem of conditional community search based on weight information include the following: (1) how to design appropriate and effective search conditions for conditional community search; (2) how to develop incremental methods for updating the weights of nodes to avoid invalid calculations; (3) how to design efficient approaches for searching conditional communities based on weight information, ensuring both the precision of community results and optimal processing efficiency. The main contributions of this paper are summarized as follows.

The limitations of existing methods for conditional community search are addressed to provide users with more accurate conditional communities.
To improve the search efficiency, an improved method is proposed for effectively calculating the weights of the nodes.
Two improved methods are explored for performing conditional community search based on weight information.
Extensive experiments on three real graph datasets are conducted. The results demonstrate that our search methods are efficient in searching for conditional community in large graphs based on weight information.

The rest of the paper is organized as follows. In Section 2, we review related work in community search in graphs and conditional community search. In Section 3, we introduce some key definitions and formally present our problem definition. In Section 4, we explore effective algorithms for the proposed problem. In Section 5, we conduct extensive experiments on three real-world datasets to demonstrate the effectiveness and efficiency of both our proposed community structure and solutions. Finally, we conclude the paper and discuss the future work in Section 6.

2. Related Work

In this section, we review related work about community search in graphs and conditional community search.

2.1. Community Search in Graphs

The community search problem, which aims to identify tightly connected subgraphs with specified query nodes in a large graph, has attracted a lot of attention due to its applications becoming more widespread [5,6,7,8,9]. For various community models, community search efforts have predominantly focused on four classic structural cohesiveness metrics and their extensions: k-core [10,11,12], k-clique [13,14,15], k-truss [16,17,18], and k-ECC [19,20]. In response to the diversity of graph types, ref. [21] addressed the problem of attribute-sensitive community search in attributed heterogeneous information networks, ref. [22] searched for a maximal size constraint community over bipartite graphs, ref. [23] investigated the problem of a truss-based community search over streaming directed graphs, ref. [24] studied the community search problem in temporal graphs, and [24] researched the reliable community in dynamic networks. Traditional community search approaches have largely relied on data mining algorithms and indexing techniques, but recent studies have begun integrating advanced deep learning techniques into community search [25,26,27,28,29,30]. Despite these advancements, most existing community search methods are tailored to retrieve communities centered around given query nodes and cannot directly address the complexities of conditional community search based on weight information.

2.2. Conditional Community Search

Traditional community search methods require users to specify both the query nodes and the community’s structural cohesion parameters. Although numerous studies have explored this area [31,32,33], these approaches often fall short in addressing users’ actual needs. To overcome the limitation of merely specifying which types of nodes should be included in the community, Zhu et al. [4] introduced a new approach called conditional community search. This method formally defines the conditional community search problem using Boolean expressions, allowing users to express more complex requirements. For instance, it can specify that certain nodes must not be part of the community or that at least one of a given set of nodes should appear in the community. However, existing models and solutions for a conditional community search have some drawbacks as mentioned above. Therefore, in this paper, we investigate the problem of conditional community search based on weight information.

3. Problem Formulation

In this section, we first present some important definitions. Then, we present the problem definition of conditional community search.

Preliminary

Given an undirected graph

G (V, E)

, where V represents the set of nodes and E represents the set of edges, a graph H is considered a subgraph of G if

V (H) \subseteq V (G)

and

E (H) \subseteq E (G)

, which is denoted as

H \subseteq G

. In this paper, all the community search algorithms are designed to use the k-core structure to impose constrain on the cohesiveness of the community.

Definition 1

(k-core [34]). Given a graph G and a positive integer k, a subgraph H is defined as a k-core of G if it satisfies the following conditions:

(i): Degree condition. Every node $v \in H$ has at least k neighbors in H;
(ii): Connectivity. The subgraph H is connected;
(iii): Maximality. There exists no other subgraph $H^{'}$ satisfying the above two conditions, and $H \subset H^{'}$ .

In reference [4], the search condition is simplified. In conditional community search, the condition contains only two types of nodes: one is the nodes that must appear in the community, and the other is the nodes that are not allowed to appear in the community. To address more complex community search requirements of users, this paper extends the search condition as follows.

Definition 2

(Single conditional community search [4]). A conditional community search problem is termed a single conditional community search if there exists one and only one set of node variables whose values satisfy the search condition.

In this paper, our method is optimized for a single conditional community search, and we do not address the handing of complex conditions or the merging of final results.

According to the definition of single conditional community search, it is clear that the nodes in the condition are classified into two types: necessary nodes, which must be included in the community, and forbidden nodes, which cannot appear in the community. Therefore, the search condition of the community in our paper is defined as follows.

Definition 3

(Search condition). The condition is represented as two sets: one is the set of necessary nodes

i n_s e t = {v_{1}, v_{2}, \dots, v_{n}}

, and the other is the set of forbidden nodes

o u t_s e t = {v_{1}, v_{2}, \dots, v_{m}}

. The search condition is expressed as

S = {i n_s e t, o u t_s e t}

.

Additionally, to remove undesirable nodes from the graph, the user must specify a weight threshold,

λ

. Given the size of the graph, the proportion of nodes contained in the search condition is negligible, making it difficult to determine the weight of each node. To avoid excluding nodes that the query user may want to include in the community but that lack assigned weights, we would like to distribute the weights of the nodes set in the search condition. This approach enriches the weight information in the graph. The user also specifies the number of iterations for this weight distribution, which is denoted as y.

Definition 4

(Weight threshold). The weight threshold is denoted as λ. If node v appears in the conditional community result, the weight of node v after y rounds of computation is not less than λ.

Moreover, the original weight calculation method calculates the weight of a node by averaging the weights of its neighbor nodes, which introduces some unnecessary calculations in this process. To address this, we simplify the computation process.

Definition 5

(Conditional community). Given a graph

G (V, E)

, a search condition S, a cohesiveness parameter k, and a weight threshold λ, a subgraph H in G is a conditional community if the following conditions are satisfied:

(1): H contains all necessary nodes required in S;
(2): H is a k-core;
(3): For any node v in H, the weight of v is no less than the weight threshold λ.

Now, we present the problem definition of conditional community search.

Problem 1

(Conditional community search based on weight information). Given a graph

G (V, E)

, a search condition S, a cohesiveness parameter k, a weight threshold λ, and an iteration number y, the problem of conditional community search is to find a conditional community H from the graph G based on weight information.

4. Algorithm Design

In this section, we first improve the weight calculation method. Instead of updating the weights of all nodes in each round, we find out the nodes that that require weight updates and focus on updating only those. Additionally, we propose a method to obtain accurate community results, which serves as the foundation for our basic algorithm. Finally, we propose two effective algorithms to improve conditional community search, ensuring the accuracy of structural results while improving search efficiency. The architecture of our proposed approach is detailed in Figure 3. For an input conditional community search problem, first, based on the weight threshold

λ

, a set of qualified nodes

N_{Q}

is obtained by calculating the weight of each node in G. A basic algorithm BasicCCS is proposed by filtering nodes through backtracking to remove unqualified nodes. However, this method is inefficient due to repeated backtracking steps. To address this issue, we explore an improved algorithm ImCCS by initially deleting nodes based on the k-core constraint. However, it remains suboptimal when generating new subgraphs for conditional community search, as it requires verifying that a connected component contains all necessary nodes. To further improve performance, we explore an effective algorithm CDCCS based on core decomposition, which effectively addresses these issues.

4.1. Basic Algorithm for Conditional Community Search

In this section, we design and implement a basic method of conditional community search. Before performing the search, the search condition must be preprocessed. The basic method calculates the weight of a node in round i as the average of the weights of all its neighbors in round

i - 1

. The corresponding formula is presented in Equation (1). The default number of rounds y for calculating the weights is set to 4. Here,

W_{i} [v]

denotes the weight of node v in the i-th round.

W_{i} [v] = \frac{\sum_{u \in N_{G} (v)} W_{i - 1} [u]}{| N_{G} (v) |}

(1)

The weights of necessary nodes are set to 1, the weights of forbidden nodes are set to −1, and the weights of other nodes not included in the condition are defaulted to 0. As mentioned earlier, the weight calculation for some nodes is unnecessary; therefore, addressing these nodes can significantly improve the efficiency of calculating node weights.

In the case where the user-assigned weights remain constant, our analysis indicates that if a node A updates its weight in a certain round, it only needs to propagate the change to all its neighboring nodes. This allows the neighbors of A to use the updated information to efficiently calculate their own weights in the subsequent round.

W_{i} [v] = W_{i - 1} [v] + \frac{\sum_{u \in N_{G} (v)} W_{C} [u]}{| N_{G} (v) |}

(2)

In this paper, the change in weight between two iterations for node v is expressed as

W_{C} [v]

, where

W_{C} [v] = W_{i - 1} [v] - W_{i - 2} [v]

.

W_{C} [v]

is the initial weight of v as set by the user when i = 1.

N_{G} (v)

denotes the set of neighbor nodes of node v.

| N_{G} (v) |

denotes the number of these neighbors. Then, node weights are calculated using Equation (2).

Assuming that Table 1 outlines the condition used for community search, we illustrate the weight calculation method through Example 1.

Example 1.

Assume that Figure 1 is the graph used for performing a community search. According to the search condition in Table 1, the weight of node A is set to 1 and the weight of I is set to

- 1

. Then, node A sends its weight change,

W_{C} [A] = 1

, to nodes B, C, and D, while node I sends its weight change information to its neighbors. In round 1, the weights of nodes B, C, D, F, G, and H should be calculated as detailed in Figure 4. These nodes will then propagate the changes in their weights to their respective neighbors for subsequent weight calculations. In round 2, for instance, node B will use the received messages

W_{C} [A]

,

W_{C} [C]

, and

W_{C} [D]

to calculate

W_{2} [B]

by using Equation (2). The same calculation process is repeated for the remaining rounds.

Algorithm 1 is employed to calculate the weights of the nodes. First, the relevant variables required for the process are defined (Lines 1–2). Then, the weights of the nodes in the graph are initialized according to the specified search condition (Lines 3–11). The calculation of the node weights for each round is implemented by using the method described earlier in Example 1 (Lines 12–23). Finally, the algorithm verifies the qualified nodes and returns them (Lines 24–27).

Algorithm 1: Calculate the weights of nodes

After calculating the weights of the nodes in the graph, we exclude the nodes with weights that do not meet the user-given weight threshold,

λ

. Subsequently, a graph is generated from the remaining nodes for the subsequent conditional community search. The subgraph generated in Example 1 is detailed in Figure 5.

However, when applying this method to calculate the weights, we find that its efficiency diminishes when the number of nodes involved in the calculation reaches a certain threshold. This decrease in efficiency occurs because, at the later stages of the process, nearly all nodes become involved in the weight calculation, leading to additional computational overhead. Consequently, it is important to carefully consider when to employ this method. To address this, we draw upon a well-known principle from the field of economics—the “Pareto principle [35]”.

Pareto principle. Only about 20% of the variables manipulate 80% of the situation. That is, only 20% of all variables are the most important, and although the remaining 80% make up the majority, the contribution is much less than that of the critical minority. According to this principle, the node weight calculation should concentrate on the initial 20% of nodes in the graph that first engage in the process. As the remaining 80% of nodes begin to participate, it is likely that this will rapidly lead to the involvement of all nodes in the graph, ultimately resulting in reduced efficiency of the method.

The conditional community search can be implemented using Algorithm 2, which takes the subgraph of qualified nodes, necessary nodes, and the parameter k as input. The basic idea of Algorithm 2 is to start the search from the necessary nodes in the subgraph and add all nodes that satisfy the k-core requirement into the community result. After the community result is obtained, the nodes in the community are checked to ensure that no nodes are included if they fail to meet the k-core requirement due to the exclusion of their neighbors from the community.

The procedure of Algorithm 2 is presented as follows. First, a temporary community is formed with the necessary nodes (Lines 1–3). Next, the temporary community is extended outward by adding all nodes that satisfy the k-core requirement in the subgraph to the community (Lines 4–7). A new subgraph is then generated based on the updated temporary community. Then, the nodes in the community are traversed, the degree of each node in the subgraph is recorded, and the nodes with a degree less than k is added to a removed set. The nodes in the removed set are subsequently removed one by one. During this process, any nodes affected by the removal are also added to the removed set (Lines 4–19). Finally, the newly generated subgraph is checked to determine whether any connected components include all the necessary nodes, and then it returns the final result (Lines 20–25).

The time complexity of the basic algorithm BasicCCS for conditional community search is

O ((| N_{G^{'}} {(v)}^{'} | + | N_{o u t} | + 4) \cdot | H |)

, where

| N_{G^{'}} {(v)}^{'} |

represents the average number of neighbors of nodes in subgraph

G^{'}

, and

G^{'}

is constructed by using the nodes in

N_{Q}

. The analysis proceeds as follows: First, a preliminary inspection is performed on the graph nodes based on the necessary nodes to generate the initial community results, with a computational cost of

O (| H | \cdot | N_{G^{'}} {(v)}^{'} |)

. Next, a temporal subgrpah is created from the community results, incurring a cost of

O (| H |)

. Following this, the community results are refined by removing unqualified nodes, with this step requiring

O (| H | + | N_{o u t} | \cdot | H |)

operations. Lastly, based on the final community results, a new subgraph is generated, and the connected components within the subgraph are computed, contributing a computational cost of

O (2 | H |)

.

In Algorithm 2, candidate nodes are judged by expanding outward from the necessary nodes. The nodes are first sorted according to their closeness to the necessary nodes, and those that satisfy the specified condition are added to the temporary community one by one until the temporary community can satisfy the user’s search condition.

Algorithm 2: Basic CCS

After executing Algorithm 2, the final conditional community that satisfies the

t h r e e

-core requirement in Figure 1 is

{A, B, C, D}

. However, consider a situation where the failure of a single node to meet the k-core requirement triggers a cascading failure in other nodes, each sequentially failing to meet the k-core requirement as a result. In such a case, using the basic method would be inefficient, as it initially adds all the failed nodes in the community and then checks them one by one. To address this inefficiency, we propose two optimized methods in the following section.

4.2. Optimized Algorithms

In the previous section, we introduced the basic method for conditional community search. However, when dealing with a large number of unqualified nodes in the graph, the basic method becomes inefficient. To address this issue, we explore two improvement methods in this section.

4.2.1. Search Method after Removing Unqualified Nodes

As discussed in the basic method, inefficiency arises when unqualified nodes are added to the community, necessitating further validation of the obtained community results. To prevent the inclusion of nodes that do not meet the community criteria during the search process, we propose an improved method in this subsection.

The idea of the improved method is that first obtaining a subgraph for community search. Next, any nodes in the subgraph that do not meet the k-core requirement are deleted, ensuring that the degree of all remaining nodes in the subgraph is greater than or equal to k. Finally, we check whether a connected component exists in the subgraph that includes all the necessary nodes.

The improved method is implemented in Algorithm 3. Firstly, nodes whose degrees are less than k in the subgraph are identified and removed from the set of qualified nodes (Lines 1–12). Subsequently, a new subgraph is generated based on the updated set of qualified nodes, and the connected components within the graph are found (Line 13). Finally, we check whether there is a connected component that contains all the necessary nodes, returning the result H (Lines 14–18).

Algorithm 3: ImCCS

The time complexity of the improved algorithm ImCCS for conditional community search is

O ((| N_{o u t} | + 3) \cdot | N_{Q} | + | N_{o u t} | \cdot | N_{G^{'}} {(v)}^{'} |)

. The analysis is as follows: First, the degrees of the nodes in the graph with qualified weights are checked to verify if they meet the requirement, and any unqualified nodes are recorded. The computation cost of this step is

O (| N_{Q} |)

. Next, the algorithm counts all the unqualified nodes with the computation cost of

O (| N_{o u t} | \cdot | N_{G^{'}} {(v)}^{'} |)

. Following this, all unqualified nodes are removed from the set of qualified nodes, and a new subgraph is generated based on the updated set of qualified nodes. The connected components of the new subgraph are then computed with a total computation cost of

O (| N_{o u t} | \cdot | N_{Q} | + 2 | N_{Q} |)

.

Figure 6 shows the graph after nodes have been deleted using this method, in which there are two connected components. The connected component on the left can be returned since it contains the necessary nodes and satisfies the cohesion requirement. However, this improved method requires generating a new subgraph for conditional community search and verifying whether there is a connected component that contains all the necessary nodes. This method may be effective when only a small number of unqualified nodes are added; it can become inefficient when a large number of unqualified nodes need to be processed.

4.2.2. Conditional Community Search Based on Core Decomposition

To avoid inefficient processing, such as extensive node checking and subgraph regeneration, we propose a community search method based on core decomposition. First, the minimum degree of the subgraph is identified. Then, all nodes in the subgraph whose degree is less than this minimum degree are removed from the qualified node set, and the degrees of their neighboring nodes are reduced by 1. Then, the minimum degree is updated, and the above process is repeated until no qualified nodes remain. After the above processing, the remaining degree of each node corresponds to the k-core community that it can join. Then, a node is selected from the necessary nodes as the starting node for outward expansion, and the node whose remaining degree meets the k-core requirement is added to the community. Finally, whether the community contains all the necessary nodes is checked.

Figure 7 shows the coreness [36] of each node in the graph after executing core decomposition, where the coreness of a node v stands for the highest order of a k-core that contains v. In the first round, the degree of node F in the graph is the smallest. It is removed from the set of qualified nodes, and the degree of its neighbor node E is reduced by one, which changes from 3 to 2. In the next round, node E has the smallest degree, and its removal makes the degrees of B and J change from 4 to 3. In the final round, all remaining nodes in the graph have a degree of 3, and the core decomposition method ends. During the subsequent community search, only nodes with a remaining degree that meets the k-core requirement are added to the community. In this example, the final community result is

{A, B, C, D}

.

The improved method is presented in Algorithm 4. The algorithm begins by checking the minimum degree of all nodes in the current subgraph (Lines 1–6). Then, it removes nodes with the minimum degree from the set of qualified nodes and reduces the degree of nodes connected to them in the subgraph by 1 (Lines 7–13). This process of finding the minimum degree among the remaining qualified nodes and removing them is repeated until the set of qualified nodes is empty (Lines 14–17). Next, a node is selected from the set of necessary nodes to be added to the community, and the algorithm expands outward from this node, adding nodes whose remaining degree is greater than or equal to k to the community. During this process, the number of necessary nodes that have been added to the community is recorded. Finally, the algorithm checks whether all necessary nodes are included in the community and returns the result (Lines 18–25).

Algorithm 4: CDCCS

The time complexity of the conditional community search algorithm CDCCS based on core decomposition is calculated as follows:

O ((| N_{G^{'}} {(v)}^{'} | + 1) \cdot | N_{Q} | + | I n_N o d e | + | H | \cdot | N_{G^{'}} {(v)}^{'} |)

. For the analysis, first, the minimum subgraph degree (coreness) is identified among the set of qualified nodes, incurring a computation cost

O (| N_{Q} |)

. Next, the core decomposition process is conducted with a computation cost of

O (| N_{G^{'}} {(v)}^{'} | \cdot | N_{Q} |)

. Then, the necessary nodes are checked to ensure their inclusion in the community with a computation cost of

O (| I n_N o d e |)

. Finally, the community search process has a computation cost of

O (| H | \cdot | N_{G^{'}} {(v)}^{'} |)

.

5. Experiments

In this section, we report the experimental results. All algorithms proposed in this paper are implemented in C++ and executed on a machine with an Intel(R) Core(TM) i5-6300HQ CPU and 8 GB of RAM. We use real-world datasets to validate the effectiveness of the conditional community search algorithms. Details of these datasets are provided in Table 2. To evaluate the performance of the proposed approach, we first compare the performance of two weight calculation methods: one is the average method, which is denoted as AVMethod, and the other one is the weight propagation method presented in Algorithm 1, which is referred to as WPMethod. Then, we compare our proposed search algorithms: BasicCCS, textbfImCCS, and CDCCS. Notably, BasicCCS is implemented based on the state-of-the-art algorithm for conditional community search proposed in [4].

Exp-1: The running time of the two weight calculation methods is compared. As detailed in Figure 8, the weight propagation method WPMethod has a significant efficiency improvement compared with the average method AVMethod in calculating the weights, and the difference is more obvious in the datasets that have sparse relationships. The reason is that the nodes defined in the search condition can only affect a limited range of the graph when the datasets are sparse, and the weight propagation method WPMethod leverages this by focusing only on a small part of the nodes in the graph, making it much more efficient. In contrast, AVMethod needs to recalculate the weights of all nodes in each round, which becomes computationally expensive. It is also notable that while the running time of AVMethod is largely unaffected by the number of nodes in the search condition, the performance of WPMethod is influenced by the specific nodes in the condition, especially when the search condition involves a small number of nodes. However, the overall number of nodes still has a minimal effect on MPMethod, making it scalable in larger and sparse graphs.

Exp-2: We verify the effect of the cohesiveness parameter k on the three conditional community search methods. As shown in Figure 9, under different datasets and parameter settings, both ImCCS and CDCCS significantly outperform BasicCCS in every dataset and for each cohesiveness parameter k. In particular, the difference between the methods is more obvious in the first two datasets. This is because in the YouTube dataset, it becomes difficult to find community results that satisfy the search conditions when the cohesiveness parameter k exceeds 2. This difficulty explains why the basic method BasicSCC can return results more quickly—it tends to fail earlier, leading to the running time being reduced. As the value of k increases, fewer nodes satisfy the condition, and BasicSCC is less burdened by backtracking checks, further decreasing its running time. Additionally, in the YouTube dataset, when the cohesion parameter k exceeds 2, the running time of the two improved methods ImCCS and CDCCS is almost the same. The reason is the limited number of qualifying nodes when

k > 2

, leading both methods to converge toward similar performance.

Exp-3: As detailed in Figure 10, the effect of the weight threshold

λ

on the performance of the conditional community search methods is evaluated across different datasets and weight threshold settings. The results show that ImCCS and CDCCS continue to outperform BasicCCS under various weight threshold conditions. The reason is that when the weight threshold

λ

is set high, the number of weight-qualified nodes is small, which leads to no community results that satisfy the conditions. In these cases, the subgraphs generated by the weight-qualified nodes are smaller, which reduces the efficiency gap between BasicCCS and the two improved methods ImCCS and CDCCS. However, ImCCS and CDCCS still perform better because they are able to terminate the community search process faster when no qualified communities exist, leading to more efficient processing. This efficiency gap becomes more noticeable as the weight threshold increases.

Exp-4: We evaluate the effect of different search conditions on the efficiency of searching conditional communities. As detailed in Figure 11, the number of nodes in the search condition directly affects the running time of the three community search methods. However, the two improved methods ImCCS and CDCCS consistently outperform the base method. In the YouTube dataset, the number of nodes in the search condition has less of an effect on the performance of individual methods. This is due to the tighter community structure in this dataset, which reduces the complexity of the search even with more nodes involved. On the other hand, in the remaining two datasets, the more nodes included in the search condition, the larger the subgraph used for performing community search, leading to an increased running time for each method.

Exp-5: As seen in Figure 12, the effect of data sizes on the efficiency of conditional community search methods varies depending on the dataset’s structure. For more sparsely related datasets like Amazon and DBLP, the running time of methods BasicCCS, ImCCS, and CDCCS consistently increases with the data size. This reason is that larger datasets contain more nodes and edges, which increases the complexity of the search process. In contrast, for the more closely related YouTube dataset, there is no linear relationship between data size and conditional community search method. When the data size increases from 20% to 40%, the additional data cause more nodes to be involved in the community, increasing the running time of the community search method. However, as the data size continues to grow, the overall weight of nodes decreases, which leads to a decrease in the size of the conditional community that is searched.

6. Conclusions and Future Works

To overcome the drawbacks of existing methods for the problem of conditional community search, we first design an effective community search condition model that incorporates both necessary nodes and forbidden nodes, and we present a novel architecture for searching conditional communities. Then, an effective incremental calculation method, referred to as WPMethod, for calculating the weights of nodes is proposed. A basic algorithm, BasicCCS, is proposed, which backtracks to deleted unqualified nodes in the initialized community. To improve the search efficiency, we propose an improved algorithm, ImCCS, that deletes unqualified nodes based on the k-core constraint firstly. To further improve the efficiency of searching conditional community, we explore an effective algorithm, CDCCS, which utilizes core decomposition. Finally, we conduct extensive experiments on several real datasets, and the experimental results show that our approach achieve high accuracy and efficiency in conditional community search. Nonetheless, the designed community search condition model may not be suitable for more complex search scenarios, and the proposed algorithm CDCCS still exhibits inefficiencies in large social networks or other conditional community search methods in light of different subgraph models. In the future, therefore, we aim to develop more versatile conditional community search models and explore more effective solutions for conditional community search over large social networks as well as for conditional community search based on other subgraph models.

Author Contributions

Conceptualization, D.M. and M.W.; methodology, D.M. and C.Z.; software, D.M.; validation, M.W. and Q.F.; formal analysis, M.W.; investigation, D.M.; resources, D.M.; data curation, D.M.; writing—original draft preparation, D.M.; writing—review and editing, M.W.; funding acquisition, M.W. and C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The work is partially supported by the Special Fundamental Research Fund for the Central Public Scientific Research Institutes (NO. 602024Y-11792), the Natural Science Foundation of Liao Ning (No. 2022-MS-303), the Liaoning Provincial Department of Education Science Foundation (No. JYTMS20230270), and the Fundamental Research Funds for the Universities of Liaoning Province.

Data Availability Statement

All datasets used in this study are publicly available and are discussed in Section 5. They are also available from the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Wu, Y.; Peng, X.; Niu, Y.; Gui, Z. MFM: A Multiple-Features Model for Leisure Event Recommendation in Geotagged Social Networks. Electronics 2024, 13, 112. [Google Scholar] [CrossRef]
Chen, Z.; Zhuang, J.; Wang, X.; Tang, X.; Yang, K.; Du, M.; Zhou, J. Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic Graphs. Electronics 2024, 13, 192. [Google Scholar] [CrossRef]
Fang, Y.; Huang, X.; Qin, L.; Zhang, Y.; Zhang, W.; Cheng, R.; Lin, X. A survey of community search over big graphs. VLDB J. 2020, 29, 353–392. [Google Scholar] [CrossRef]
Zhu, J.C.; Wang, C.K. Approaches to community search under complex conditions. J. Softw. 2019, 30, 552–572. [Google Scholar]
Wu, Y.; Zhao, J.; Sun, R.; Chen, C.; Wang, X. Efficient personalized influential community search in large networks. Data Sci. Eng. 2021, 6, 310–322. [Google Scholar] [CrossRef]
Sun, H.; Huang, R.; Jia, X.; He, L.; Sun, M.; Wang, P.; Sun, Z.; Huang, J. Community search for multiple nodes on attribute graphs. Knowl.-Based Syst. 2020, 193, 105393. [Google Scholar] [CrossRef]
Lu, Z.; Zhu, Y.; Zhong, M.; Yu, J.X. On Time-optimal (k, p)-core Community Search in Dynamic Graphs. In Proceedings of the 38th International Conference on Data Engineering, Kuala Lumpur, Malaysia, 9–12 May 2022; pp. 1396–1407. [Google Scholar]
Miao, X.; Liu, Y.; Chen, L.; Gao, Y.; Yin, J. Reliable community search on uncertain graphs. In Proceedings of the 38th International Conference on Data Engineering, Kuala Lumpur, Malaysia, 9–12 May 2022; pp. 1166–1179. [Google Scholar]
Dong, Z.; Huang, X.; Yuan, G.; Zhu, H.; Xiong, H. Butterfly-core community search over labeled graphs. Proc. VLDB Endow. 2021, 14, 2006–2018. [Google Scholar] [CrossRef]
Barbieri, N.; Bonchi, F.; Galimberti, E.; Gullo, F. Efficient and effective community search. Data Min. Knowl. Discov. 2015, 29, 1406–1433. [Google Scholar] [CrossRef]
Cui, W.; Xiao, Y.; Wang, H.; Wang, W. Local search of communities in large graphs. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 22–27 June 2014; pp. 991–1002. [Google Scholar]
Sozio, M.; Gionis, A. The community-search problem and how to plan a successful cocktail party. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 25–28 July 2010; pp. 939–948. [Google Scholar]
Yuan, L.; Qin, L.; Zhang, W.; Chang, L.; Yang, J. Index-based densest clique percolation community search in networks. IEEE Trans. Knowl. Data Eng. 2017, 30, 922–935. [Google Scholar] [CrossRef]
Yang, D.N.; Chen, Y.L.; Lee, W.C.; Chen, M.S. On social-temporal group query with acquaintance constraint. Proc. VLDB Endow. 2011, 4, 397–408. [Google Scholar] [CrossRef]
Zhou, Y.; Guo, Q.; Fang, Y.; Ma, C. A Counting-based Approach for Efficient k-Clique Densest Subgraph Discovery. Proc. ACM Manag. Data 2024, 2, 1–27. [Google Scholar] [CrossRef]
Huang, X.; Lakshmanan, L.V.; Yu, J.X.; Cheng, H. Approximate closest community search in networks. Proc. VLDB Endow. 2015, 9, 276–287. [Google Scholar] [CrossRef]
Huang, X.; Lakshmanan, L.V.S. Attribute-driven community search. Proc. VLDB Endow. 2017, 10, 949–960. [Google Scholar] [CrossRef]
Zheng, Z.; Ye, F.; Li, R.H.; Ling, G.; Jin, T. Finding weighted k-truss communities in large networks. Inf. Sci. 2017, 417, 344–360. [Google Scholar] [CrossRef]
Hu, J.; Wu, X.; Cheng, R.; Luo, S.; Fang, Y. On minimal steiner maximum-connected subgraph queries. IEEE Trans. Knowl. Data Eng. 2017, 29, 2455–2469. [Google Scholar] [CrossRef]
Kim, D.; Kim, S.; Kim, J.; Kim, J.; Feng, K.; Lim, S.; Kim, J. Experimental analysis and evaluation of cohesive subgraph discovery. Inf. Sci. 2024, 672, 120664. [Google Scholar] [CrossRef]
Wang, J.; Zhou, L.; Wang, X.; Wang, L.; Li, S. Attribute-sensitive community search over attributed heterogeneous information networks. Expert Syst. Appl. 2024, 235, 121153. [Google Scholar] [CrossRef]
Li, M.; Borovica-Gajic, R.; Choudhury, F.M.; Cui, N.; Ding, L. Maximal size constraint community search over bipartite graphs. Knowl.-Based Syst. 2024, 297, 111961. [Google Scholar] [CrossRef]
Liao, X.; Liu, Q.; Huang, X.; Xu, J. Truss-Based Community Search over Streaming Directed Graphs. Proc. VLDB Endow. 2024, 17, 1816–1829. [Google Scholar] [CrossRef]
Lin, L.; Yuan, P.; Li, R.H.; Zhu, C.; Qin, H.; Jin, H.; Jia, T. QTCS: Efficient Query-Centered Temporal Community Search. Proc. VLDB Endow. 2024, 17, 1187–1199. [Google Scholar] [CrossRef]
Tang, Y.; Li, J.; Haldar, N.A.H.; Guan, Z.; Xu, J.; Liu, C. Reliability-driven local community search in dynamic networks. IEEE Trans. Knowl. Data Eng. 2023, 36, 809–822. [Google Scholar] [CrossRef]
Gao, J.; Chen, J.; Oloulade, B.M.; Al-Sabri, R.; Lyu, T.; Zhang, J.; Li, Z. Commgnas: Unsupervised graph neural architecture search for community detection. IEEE Trans. Emerg. Top. Comput. 2023, 12, 444–454. [Google Scholar] [CrossRef]
Chen, J.; Gao, J.; Cui, B. ICS-GNN+: Lightweight interactive community search via graph neural network. VLDB J. 2023, 32, 447–467. [Google Scholar] [CrossRef]
Wang, J.; Wang, K.; Lin, X.; Zhang, W.; Zhang, Y. Efficient Unsupervised Community Search with Pre-trained Graph Transformer. arXiv 2024, arXiv:2403.18869. [Google Scholar] [CrossRef]
Wang, Y.; Gou, X.; Xu, X.; Geng, Y.; Ke, X.; Wu, T.; Yu, Z.; Chen, R.; Wu, X. Scalable community search over large-scale graphs based on graph transformer. In Proceedings of the ACM SIGIR, Washington, DC, USA, 14–18 July 2024; pp. 1680–1690. [Google Scholar]
Wang, J.; Wang, K.; Lin, X.; Zhang, W.; Zhang, Y. Neural Attributed Community Search at Billion Scale. Proc. ACM Manag. Data 2024, 1, 1–25. [Google Scholar] [CrossRef]
Cheng, J.; Ke, Y.; Chu, S.; Özsu, M.T. Efficient core decomposition in massive networks. In Proceedings of the 2011 IEEE 27th International Conference on Data Engineering, Hannover, Germany, 11–16 April 2011; pp. 51–62. [Google Scholar]
Wang, J.; Cheng, J. Truss decomposition in massive networks. Proc. VLDB Endow. 2012, 5, 812–823. [Google Scholar] [CrossRef]
Qin, L.; Li, R.H.; Chang, L.; Zhang, C. Locally densest subgraph discovery. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015; pp. 965–974. [Google Scholar]
Seidman, S.B. Network structure and minimum degree. Soc. Netw. 1983, 5, 269–287. [Google Scholar] [CrossRef]
Chen, T.; Li, M. The weights can be harmful: Pareto search versus weighted search in multi-objective search-based software engineering. ACM Trans. Softw. Eng. Methodol. 2023, 32, 1–40. [Google Scholar] [CrossRef]
Batagelj, V.; Zaversnik, M. An o (m) algorithm for cores decomposition of networks. arXiv 2003, arXiv:cs/0310049. [Google Scholar]

Figure 1. A social network graph.

Figure 2. An instance of conditional community search.

Figure 3. The architecture of our approach.

Figure 4. Weights of nodes in round 1.

Figure 5. The graph after deleting nodes based on weight threshold.

Figure 6. The graph after deleting nodes based on constraint k.

Figure 7. Coreness of each node based on core decomposition.

Figure 8. Running times of different weight calculation methods.

Figure 9. Running times under different k.

Figure 10. Running times under different

λ

.

Figure 10. Running times under different

λ

.

Figure 11. Running time under different numbers of conditional nodes.

Figure 12. Running times under different dataset sizes.

Table 1. A search condition specifies that node A must be included in the conditional community, while node I must be excluded.

Search Condition	Detail
$n e c e s s a r y$ $n o d e$	${A}$
$f o r b i d e n$ $n o d e$	${I}$

Table 2. Real-world datasets.

Networks	# Nodes	# Edges	# Community
$A m a z o n$	334,863	925,872	75,149
$D B L P$	317,080	1,049,866	13,477
$Y o u T u B e$	1,134,890	2,987,624	8385

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, M.; Ma, D.; Fu, Q.; Zong, C. Conditional Community Search Based on Weight Information. Electronics 2024, 13, 4321. https://doi.org/10.3390/electronics13214321

AMA Style

Wang M, Ma D, Fu Q, Zong C. Conditional Community Search Based on Weight Information. Electronics. 2024; 13(21):4321. https://doi.org/10.3390/electronics13214321

Chicago/Turabian Style

Wang, Mengxiang, Dong Ma, Qiang Fu, and Chuanyu Zong. 2024. "Conditional Community Search Based on Weight Information" Electronics 13, no. 21: 4321. https://doi.org/10.3390/electronics13214321

APA Style

Wang, M., Ma, D., Fu, Q., & Zong, C. (2024). Conditional Community Search Based on Weight Information. Electronics, 13(21), 4321. https://doi.org/10.3390/electronics13214321

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Conditional Community Search Based on Weight Information

Abstract

1. Introduction

2. Related Work

2.1. Community Search in Graphs

2.2. Conditional Community Search

3. Problem Formulation

Preliminary

4. Algorithm Design

4.1. Basic Algorithm for Conditional Community Search

4.2. Optimized Algorithms

4.2.1. Search Method after Removing Unqualified Nodes

4.2.2. Conditional Community Search Based on Core Decomposition

5. Experiments

6. Conclusions and Future Works

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI