Clustering

Algorithms

jgrapht.algorithms.clustering.k_spanning_tree(graph, k)

The k spanning tree clustering algorithm.

The algorithm first finds a minimum spanning tree T using Prim’s algorithm, then runs Kruskal’s algorithm on the edges of T only, stopping as soon as exactly k trees remain. These k trees are the final clusters. The total running time is \(\mathcal{O}(m + n \log n)\), where \(n\) is the number of vertices and \(m\) is the number of edges.

The algorithm is strongly related to single linkage cluster analysis, also known as single-link clustering. For more information see: J. C. Gower and G. J. S. Ross. Minimum Spanning Trees and Single Linkage Cluster Analysis. Journal of the Royal Statistical Society. Series C (Applied Statistics), 18(1):54–64, 1969.

Parameters
  • graph – the input graph; it must be undirected

  • k – integer k, denoting the number of clusters

Returns

a clustering as an instance of Clustering
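
The two phases described above can be sketched in plain Python. This is an illustrative reimplementation, not the library's code: the function name `k_spanning_tree_sketch` and the edge-list input format are assumptions made for the example, the graph is assumed to be non-empty and connected, and vertex ids are assumed to be mutually comparable. The sketch uses a binary heap, so it runs in \(\mathcal{O}(m \log n)\) rather than the bound quoted above.

```python
import heapq


def k_spanning_tree_sketch(vertices, edges, k):
    """Cluster a connected, undirected, weighted graph into k groups.

    vertices: iterable of hashable, mutually comparable vertex ids (hypothetical format)
    edges: iterable of (u, v, weight) tuples (hypothetical format)
    Returns a list of sets of vertices.
    """
    vertices = list(vertices)
    adj = {v: [] for v in vertices}
    for u, v, w in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))

    # Phase 1: Prim's algorithm grows a minimum spanning tree from one vertex.
    start = vertices[0]
    visited = {start}
    heap = list(adj[start])
    heapq.heapify(heap)
    tree_edges = []
    while heap and len(visited) < len(vertices):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        tree_edges.append((w, u, v))
        for edge in adj[v]:
            if edge[2] not in visited:
                heapq.heappush(heap, edge)

    # Phase 2: Kruskal's algorithm on the tree edges only, cheapest first,
    # stopped as soon as k components remain.
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = len(vertices)
    for w, u, v in sorted(tree_edges):
        if components <= k:
            break
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1

    # Group vertices by the representative of their component.
    clusters = {}
    for v in vertices:
        clusters.setdefault(find(v), set()).add(v)
    return list(clusters.values())


# Two triangles with cheap edges, joined by one expensive bridge.
edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 10.0),
]
clusters = k_spanning_tree_sketch(range(6), edges, k=2)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4, 5]]
```

Stopping Kruskal early leaves out the k - 1 heaviest tree edges, which is why the expensive bridge is the edge that separates the two clusters here.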

jgrapht.algorithms.clustering.label_propagation(graph, max_iterations=None, seed=None)

Label propagation clustering.

This is a near-linear-time algorithm capable of discovering communities in large graphs. It is described in detail in the following paper:

  • Raghavan, U. N., Albert, R., and Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical review E, 76(3), 036106.

As the paper title suggests, the running time is close to linear. The algorithm proceeds in iterations, each of which takes \(\mathcal{O}(n + m)\) time, where \(n\) is the number of vertices and \(m\) is the number of edges. The authors found experimentally that in most cases at least 95% of the nodes are classified correctly by the end of the fifth iteration. See the paper for more details.

The algorithm is randomized, so two runs on the same graph may return different results. For deterministic behavior, provide a random generator seed as a parameter.

Parameters
  • graph – the input graph; it must be undirected

  • max_iterations – maximum number of iterations (None means no limit)

  • seed – seed for the random number generator; if None, the system time is used

Returns

a clustering as an instance of Clustering
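
To make the iteration scheme and the role of the seed concrete, here is a minimal pure-Python sketch of asynchronous label propagation in the spirit of Raghavan et al. It is not the library's implementation: the function name `label_propagation_sketch`, the adjacency-dict input, and the tie-breaking rule (keep the current label when it is among the most frequent ones) are assumptions made for this example.

```python
import random
from collections import Counter


def label_propagation_sketch(adj, max_iterations=None, seed=None):
    """adj maps each vertex to a list of its neighbours (hypothetical format)."""
    rng = random.Random(seed)  # the same seed reproduces the same run
    labels = {v: v for v in adj}  # every vertex starts in its own community
    order = list(adj)
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        rng.shuffle(order)  # visit vertices in a fresh random order
        changed = False
        for v in order:
            if not adj[v]:
                continue  # isolated vertices keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            # Switch only if the current label is not already a most frequent one.
            if labels[v] not in candidates:
                labels[v] = rng.choice(candidates)
                changed = True
        if not changed:
            break  # every vertex already carries a most frequent neighbour label

    clusters = {}
    for v in adj:
        clusters.setdefault(labels[v], set()).add(v)
    return list(clusters.values())


# Two 4-cliques joined by a single bridge edge (3, 4).
adj = {v: [] for v in range(8)}
for group in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for u in group:
        for v in group:
            if u != v:
                adj[u].append(v)
adj[3].append(4)
adj[4].append(3)

first = label_propagation_sketch(adj, seed=42)
second = label_propagation_sketch(adj, seed=42)
print(first == second)  # True: a fixed seed makes the run repeatable
```

Each pass touches every vertex and every incident edge once, which matches the per-iteration cost of \(\mathcal{O}(n + m)\) stated above. Without a seed, repeated calls may partition the same graph differently.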

Types

Vertex clusterings are represented using instances of the following class.

class jgrapht.types.Clustering

A vertex clustering.

abstract ith_cluster(i)

Set of vertices comprising the i-th cluster.

abstract number_of_clusters()

Number of clusters.
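
The two abstract methods are enough to walk over all clusters by index. The sketch below mirrors that interface with a stand-in abstract base class so that it runs without the library. `ClusteringSketch`, `ListClustering`, and `clusters_as_lists` are hypothetical names introduced for illustration, and returning a `frozenset` from `ith_cluster` is an assumption about the return type.

```python
from abc import ABC, abstractmethod


class ClusteringSketch(ABC):
    """Stand-in mirroring the two abstract methods documented above."""

    @abstractmethod
    def number_of_clusters(self):
        """Number of clusters."""

    @abstractmethod
    def ith_cluster(self, i):
        """Vertices comprising the i-th cluster."""


class ListClustering(ClusteringSketch):
    """Hypothetical concrete clustering backed by a list of vertex sets."""

    def __init__(self, clusters):
        self._clusters = [frozenset(c) for c in clusters]

    def number_of_clusters(self):
        return len(self._clusters)

    def ith_cluster(self, i):
        return self._clusters[i]


def clusters_as_lists(clustering):
    """Collect every cluster by index, using only the two interface methods."""
    return [
        sorted(clustering.ith_cluster(i))
        for i in range(clustering.number_of_clusters())
    ]


clustering = ListClustering([{0, 1, 2}, {3, 4, 5}])
print(clusters_as_lists(clustering))  # [[0, 1, 2], [3, 4, 5]]
```

Because `clusters_as_lists` relies only on `number_of_clusters()` and `ith_cluster(i)`, the same loop should work on any object exposing those two methods.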