Scoring

The algorithms in this module compute scores for the vertices of the graph.

Algorithms

jgrapht.algorithms.scoring.alpha_centrality(graph, damping_factor=0.01, exogenous_factor=1.0, max_iterations=100, tolerance=0.0001)[source]

Alpha centrality.

See https://en.wikipedia.org/wiki/Alpha_centrality for a description of alpha centrality.

This is a simple iterative implementation which stops after a given number of iterations or when the centrality values between two iterations change by less than a predefined tolerance.

Each iteration of the algorithm runs in linear time \(\mathcal{O}(n+m)\), where \(n\) is the number of nodes and \(m\) the number of edges in the graph. The maximum number of iterations can be adjusted by the caller.

By adjusting the exogenous factor, users may compute either eigenvector centrality (https://en.wikipedia.org/wiki/Eigenvector_centrality) or Katz centrality (https://en.wikipedia.org/wiki/Katz_centrality).

Parameters
  • graph – the graph

  • damping_factor – the damping factor

  • exogenous_factor – the exogenous factor

  • max_iterations – maximum iterations

  • tolerance – tolerance. The calculation stops when the centrality values change by less than this value between iterations

Returns

a dictionary from vertices to double values
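
A minimal usage sketch follows. The graph-construction calls (jgrapht.create_graph, add_vertex, add_edge) are assumed from the rest of the library rather than documented here, and the small example graph is purely illustrative:

    import jgrapht
    from jgrapht.algorithms.scoring import alpha_centrality

    g = jgrapht.create_graph(directed=True)
    for i in range(4):
        g.add_vertex(i)
    g.add_edge(0, 1)
    g.add_edge(0, 2)
    g.add_edge(1, 2)
    g.add_edge(2, 3)

    # Katz-style scores: constant exogenous factor, small damping factor
    scores = alpha_centrality(g, damping_factor=0.01, exogenous_factor=1.0)
    for v, score in scores.items():
        print(v, score)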

jgrapht.algorithms.scoring.betweenness_centrality(graph, incoming=False, normalize=False)[source]

Betweenness centrality.

For the definition see https://en.wikipedia.org/wiki/Betweenness_centrality.

The algorithm is based on:

  • Brandes, Ulrik (2001). “A faster algorithm for betweenness centrality”. Journal of Mathematical Sociology. 25 (2): 163–177.

Running time is \(\mathcal{O}(nm + n^2 \log n)\) for weighted and \(\mathcal{O}(mn)\) for unweighted graphs.

Note

If normalization is used, then the result is divided by \((n-1) \cdot (n-2)\), where \(n\) is the number of vertices in the graph.

Parameters
  • graph – the graph

  • incoming – if True then the incoming edges are used instead of the outgoing

  • normalize – whether to use normalization

Returns

a dictionary from vertices to double values
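
As a usage sketch, under the same graph-API assumptions as the alpha_centrality example:

    import jgrapht
    from jgrapht.algorithms.scoring import betweenness_centrality

    # path 0 - 1 - 2 - 3: the inner vertices lie on more shortest paths
    g = jgrapht.create_graph(directed=False)
    for i in range(4):
        g.add_vertex(i)
    g.add_edge(0, 1)
    g.add_edge(1, 2)
    g.add_edge(2, 3)

    scores = betweenness_centrality(g, normalize=True)
    # with normalize=True each score has been divided by (n-1)(n-2)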

jgrapht.algorithms.scoring.closeness_centrality(graph, incoming=False, normalize=True)[source]

Closeness centrality.

Computes the closeness centrality of each vertex of a graph. The closeness of a vertex \(x\) is defined as the reciprocal of the farness, that is \(H(x)= 1 / \sum_{y \neq x} d(x,y)\), where \(d(x,y)\) is the shortest path distance from \(x\) to \(y\). When normalization is used, the score is multiplied by \(n-1\) where \(n\) is the total number of vertices in the graph. For more details see https://en.wikipedia.org/wiki/Closeness_centrality and

  • Alex Bavelas. Communication patterns in task-oriented groups. J. Acoust. Soc. Am, 22(6):725–730, 1950.

This implementation computes by default the closeness centrality using outgoing paths and normalizes the scores. This behavior can be adjusted via the incoming and normalize arguments.

When the graph is disconnected, the closeness centrality score equals \(0\) for all vertices. In the case of weakly connected digraphs, the closeness centrality of several vertices might be \(0\). See harmonic_centrality() for a different approach to disconnected graphs.

Shortest paths are computed either by using Dijkstra’s algorithm or Floyd-Warshall depending on whether the graph has edges with negative edge weights. Thus, the running time is either \(\mathcal{O}(n (m + n \log n))\) or \(\mathcal{O}(n^3)\) respectively, where \(n\) is the number of vertices and \(m\) the number of edges of the graph.

Parameters
  • graph – the graph

  • incoming – if True then the incoming edges are used instead of the outgoing

  • normalize – whether to use normalization

Returns

a dictionary from vertices to double values
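
A usage sketch, under the same graph-API assumptions as the earlier examples:

    import jgrapht
    from jgrapht.algorithms.scoring import closeness_centrality

    # path 0 - 1 - 2 - 3
    g = jgrapht.create_graph(directed=False)
    for i in range(4):
        g.add_vertex(i)
    g.add_edge(0, 1)
    g.add_edge(1, 2)
    g.add_edge(2, 3)

    # defaults: outgoing paths, normalized scores
    scores = closeness_centrality(g)
    # vertex 1 has farness 1 + 1 + 2 = 4, so its normalized score is 3/4
    raw = closeness_centrality(g, normalize=False)
    # without normalization vertex 1 scores the plain reciprocal farness, 1/4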

jgrapht.algorithms.scoring.clustering_coefficient(graph)[source]

Clustering Coefficient.

For definitions see https://en.wikipedia.org/wiki/Clustering_coefficient.

Computes the local clustering coefficient for each vertex of a graph. It also computes both the global and the average clustering coefficient.

The global clustering coefficient is discussed in

  • R. D. Luce and A. D. Perry (1949). “A method of matrix analysis of group structure”. Psychometrika. 14 (1): 95–116. doi:10.1007/BF02289146 .

The average local clustering coefficient was introduced in

  • D. J. Watts and Steven Strogatz (June 1998). “Collective dynamics of ‘small-world’ networks”. Nature. 393 (6684): 440–442. doi:10.1038/30918

Running time is \(\mathcal{O}(n + \Delta(G)^2)\) where \(n\) is the number of vertices in the graph and \(\Delta(G)\) is the maximum degree of any vertex.

Parameters

graph – the graph

Returns

a tuple (global coefficient, average coefficient, dictionary from vertices to local coefficients)
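
A usage sketch for unpacking the returned tuple, under the same graph-API assumptions as the earlier examples:

    import jgrapht
    from jgrapht.algorithms.scoring import clustering_coefficient

    # triangle 0-1-2 with a pendant vertex 3 attached to vertex 2
    g = jgrapht.create_graph(directed=False)
    for i in range(4):
        g.add_vertex(i)
    g.add_edge(0, 1)
    g.add_edge(1, 2)
    g.add_edge(0, 2)
    g.add_edge(2, 3)

    global_cc, avg_cc, local_cc = clustering_coefficient(g)
    # local_cc maps each vertex to its local coefficient, e.g.
    # local_cc[0] == 1.0 because both neighbors of vertex 0 are adjacent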

jgrapht.algorithms.scoring.coreness(graph)[source]

Computes the coreness of each vertex in an undirected graph.

A \(k\)-core of a graph \(G\) is a maximal connected subgraph of \(G\) in which all vertices have degree at least \(k\). Equivalently, it is one of the connected components of the subgraph of \(G\) formed by repeatedly deleting all vertices of degree less than \(k\). A vertex \(u\) has coreness \(c\) if it belongs to a \(c\)-core but not to any \((c+1)\)-core.

If a non-empty \(k\)-core exists, then, clearly, \(G\) has degeneracy at least \(k\), and the degeneracy of \(G\) is the largest \(k\) for which \(G\) has a \(k\)-core.

As described in the following paper

  • D. W. Matula and L. L. Beck. Smallest-last ordering and clustering and graph coloring algorithms. Journal of the ACM, 30(3):417–427, 1983.

it is possible to find a vertex ordering of a finite graph \(G\) that optimizes the coloring number of the ordering, in linear time, by using a bucket queue to repeatedly find and remove the vertex of smallest degree.

Parameters

graph – the graph

Returns

a tuple containing the degeneracy and a dictionary from vertices to integer values (coreness of each vertex)
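
A usage sketch, under the same graph-API assumptions as the earlier examples; the values in the comments follow directly from the definition above:

    import jgrapht
    from jgrapht.algorithms.scoring import coreness

    # triangle 0-1-2 (a 2-core) with a pendant vertex 3 of coreness 1
    g = jgrapht.create_graph(directed=False)
    for i in range(4):
        g.add_vertex(i)
    g.add_edge(0, 1)
    g.add_edge(1, 2)
    g.add_edge(0, 2)
    g.add_edge(2, 3)

    degeneracy, core = coreness(g)
    # degeneracy == 2, core[0] == core[1] == core[2] == 2 and core[3] == 1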

jgrapht.algorithms.scoring.harmonic_centrality(graph, incoming=False, normalize=True)[source]

Harmonic Centrality. The harmonic centrality of a vertex \(x\) is defined as

\[H(x)=\sum_{y \neq x} 1/d(x,y)\]

where \(d(x,y)\) is the shortest path distance from \(x\) to \(y\). When a distance \(d(x,y)\) is infinite, the corresponding term \(1/d(x,y)\) is taken to be \(0\). When normalization is used, the score is divided by \(n-1\), where \(n\) is the total number of vertices in the graph.

For details see the following papers:

  • Yannick Rochat. Closeness centrality extended to unconnected graphs: The harmonic centrality index. Applications of Social Network Analysis, 2009.

  • Newman, Mark (2003). “The Structure and Function of Complex Networks”. SIAM Review. 45 (2): 167–256.

and https://en.wikipedia.org/wiki/Closeness_centrality.

This implementation computes by default the centrality using outgoing paths and normalizes the scores. This behavior can be adjusted by the arguments.

Shortest paths are computed either by using Dijkstra’s algorithm or Floyd-Warshall depending on whether the graph has edges with negative edge weights. Thus, the running time is either \(\mathcal{O}(n (m + n \log n))\) or \(\mathcal{O}(n^3)\) respectively, where \(n\) is the number of vertices and \(m\) the number of edges of the graph.

Parameters
  • graph – the graph

  • incoming – if True then the incoming edges are used instead of the outgoing

  • normalize – whether to use normalization

Returns

a dictionary from vertices to double values
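
A usage sketch on a disconnected graph, under the same graph-API assumptions as the earlier examples:

    import jgrapht
    from jgrapht.algorithms.scoring import harmonic_centrality

    # two components: pairs across components contribute 0 to the sum
    g = jgrapht.create_graph(directed=False)
    for i in range(4):
        g.add_vertex(i)
    g.add_edge(0, 1)
    g.add_edge(2, 3)

    scores = harmonic_centrality(g)
    # each vertex reaches exactly one other vertex at distance 1,
    # so every normalized score is 1 / (n-1) = 1/3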

jgrapht.algorithms.scoring.pagerank(graph, damping_factor=0.85, max_iterations=100, tolerance=0.0001)[source]

PageRank.

The Wikipedia article (https://en.wikipedia.org/wiki/PageRank) contains a nice description of PageRank. The method was introduced in: Sergey Brin and Larry Page. “The Anatomy of a Large-Scale Hypertextual Web Search Engine”. Proceedings of the 7th World-Wide Web Conference, Brisbane, Australia, April 1998.

This is a simple iterative implementation of PageRank which stops after a given number of iterations or when the PageRank values between two iterations change by less than a predefined tolerance. The implementation uses the variant which divides by the number of nodes, thus forming a probability distribution over the graph nodes.

Each iteration of the algorithm runs in linear time \(\mathcal{O}(n+m)\), where \(n\) is the number of nodes and \(m\) the number of edges of the graph. The maximum number of iterations can be adjusted by the caller.

If the graph is weighted, a weighted variant is used where the probability of following an edge \(e\) out of node \(v\) is equal to the weight of \(e\) divided by the sum of the weights of all outgoing edges of \(v\).

Parameters
  • graph – the graph

  • damping_factor – damping factor

  • max_iterations – max iterations

  • tolerance – tolerance. The calculation stops when the PageRank values change by less than this value between iterations

Returns

a dictionary from vertices to double values
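
A usage sketch of the weighted variant, under the same graph-API assumptions as the earlier examples; passing a weight keyword to add_edge is an additional assumption about the graph API:

    import jgrapht
    from jgrapht.algorithms.scoring import pagerank

    g = jgrapht.create_graph(directed=True, weighted=True)
    for i in range(3):
        g.add_vertex(i)
    g.add_edge(0, 1, weight=2.0)  # followed from vertex 0 with probability 2/3
    g.add_edge(0, 2, weight=1.0)  # followed from vertex 0 with probability 1/3
    g.add_edge(1, 2)
    g.add_edge(2, 0)

    ranks = pagerank(g, damping_factor=0.85, max_iterations=100, tolerance=0.0001)
    # the scores form a probability distribution: sum(ranks.values()) is close to 1.0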