API Reference¶
Graph¶

class
cugraph.
Graph
¶ cuGraph graph class containing basic graph creation and transformation operations.
Methods
add_adj_list
(self, offset_col, index_col[, …])Create the adjacency list representation of a Graph.
add_edge_list
(self, source_col, dest_col[, …])Create the edge list representation of a Graph.
add_transposed_adj_list
(self)Compute the transposed adjacency list from the edge list and add it to the existing graph.
degree
(self[, vertex_subset])Calculates and returns the degree of vertices.
delete_adj_list
(self)Delete the adjacency list.
delete_edge_list
(self)Delete the edge list.
Delete the transposed adjacency list.
get_two_hop_neighbors
(self)Return a dataframe containing vertex pairs such that each pair of vertices is connected by a path of two hops in the graph.
in_degree
(self[, vertex_subset])Calculates and returns the indegree of vertices.
number_of_edges
(self)Get the number of edges in the graph
number_of_vertices
(self)Get the number of vertices in the graph
out_degree
(self[, vertex_subset])Calculates and returns the outdegree of vertices.
renumber
(self, source_col, dest_col)Take a (potentially sparse) set of source and destination vertex ids and renumber the vertices to create a dense set of vertex ids using all values contiguously from 0 to the number of unique vertices  1.
view_adj_list
(self)Display the adjacency list.
view_edge_list
(self)Display the edge list.
view_transposed_adj_list
(self)Display the transposed adjacency list.

add_adj_list
(self, offset_col, index_col, value_col=None, copy=False)¶ Create the adjacency list representation of a Graph. The passed offset_col and index_col arguments wrap gdf_column objects that represent a graph using the adjacency list format. If value_col is None, an unweighted graph is created. If value_col is not None, an weighted graph is created. If copy is False, this function stores references to the passed objects pointed by offset_col and index_col. If copy is True, this funcion stores references to the deepcopies of the passed objects pointed by offset_col and index_col. If this class instance already stores a graph, invoking this function raises an error. Parameters ——— offset_col : cudf.Series
This cudf.Series wraps a gdf_column of size V + 1 (V: number of vertices). The gdf column contains the offsets for the vertices in this graph. Offsets must be in the range [0, E] (E: number of edges).
 index_colcudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices must be in the range [0, V) (V: number of vertices).
 value_col (optional)cudf.Series
This pointer can be
none
. If not, this cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the weight value for each edge. The expected type of the gdf_column element is floating point number.
>>> import numpy as np >>> import pytest >>> from scipy.io import mmread >>> >>> import cudf >>> import cugraph >>> >>> >>> mm_file = '../datasets/karate.mtx' >>> M = mmread(mm_file).asfptype() >>> M = M.tocsr() >>> offsets = cudf.Series(M.indptr) >>> indices = cudf.Series(M.indices) >>> >>> G = cugraph.Graph() >>> G.add_adj_list(offsets, indices, None)

add_edge_list
(self, source_col, dest_col, value_col=None, copy=False)¶ Create the edge list representation of a Graph. The passed source_col and dest_col arguments wrap gdf_column objects that represent a graph using the edge list format. If value_col is None, an unweighted graph is created. If value_col is not None, an weighted graph is created. If copy is False, this function stores references to the passed objects pointed by source_col and dest_col. If copy is True, this funcion stores references to the deepcopies of the passed objects pointed by source_col and dest_col. If this class instance already stores a graph, invoking this function raises an error. Parameters ——— source_col : cudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the source index for each edge. Source indices must be in the range [0, V) (V: number of vertices).
 dest_colcudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices must be in the range [0, V) (V: number of vertices).
 value_col (optional)cudf.Series
This pointer can be
none
. If not, this cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the weight value for each edge. The expected type of the gdf_column element is floating point number.
>>> import numpy as np >>> import pytest >>> from scipy.io import mmread >>> >>> import cudf >>> import cugraph >>> >>> >>> mm_file = '../datasets/karate.mtx' >>> M = mmread(mm_file).asfptype() >>> sources = cudf.Series(M.row) >>> destinations = cudf.Series(M.col) >>> >>> G = cugraph.Graph() >>> G.add_edge_list(sources, destinations, None)

add_transposed_adj_list
(self)¶ Compute the transposed adjacency list from the edge list and add it to the existing graph.

degree
(self, vertex_subset=None)¶ Calculates and returns the degree of vertices. Vertex degree is the number of edges adjacent to that vertex. Parameters ——— vertex_subset(optional, default=all vertices) : cudf.Series or iterable container
A container of vertices for displaying corresponding degree
df : cudf.DataFrame GPU data frame of size N (the default) or the size of the given vertices (vertex_subset) containing the degree. The ordering is relative to the adjacency list, or that given by the specified vertex_subset.
df[‘vertex’]: The vertex IDs (will be identical to vertex_subset if specified) df[‘degree’]: The computed degree of the corresponding vertex Examples ——– >>> import numpy as np >>> import pytest >>> from scipy.io import mmread >>> >>> import cudf >>> import cugraph >>> mm_file = ‘/datasets/networks/karate.mtx’ >>> M = mmread(mm_file).asfptype() >>> sources = cudf.Series(M.row) >>> destinations = cudf.Series(M.col) >>> >>> G = cugraph.Graph() >>> G.add_edge_list(sources, destinations) >>> degree_df = G.degree([0,9,12])

delete_adj_list
(self)¶ Delete the adjacency list.

delete_edge_list
(self)¶ Delete the edge list.

delete_transposed_adj_list
(self)¶ Delete the transposed adjacency list.

get_two_hop_neighbors
(self)¶ Return a dataframe containing vertex pairs such that each pair of vertices is connected by a path of two hops in the graph. The resulting pairs are returned in sorted order.
Returns: df : a cudf.DataFrame object df[‘first’] the first vertex id of a pair df[‘second’] the second vertex id of a pair

in_degree
(self, vertex_subset=None)¶ Calculates and returns the indegree of vertices. Vertex indegree is the number of edges pointing in to the vertex. Parameters ——— vertex_subset(optional, default=all vertices) : cudf.Series or iterable container
A container of vertices for displaying corresponding indegree
df : cudf.DataFrame GPU data frame of size N (the default) or the size of the given vertices (vertex_subset) containing the in_degree. The ordering is relative to the adjacency list, or that given by the specified vertex_subset.
df[‘vertex’]: The vertex IDs (will be identical to vertex_subset if specified) df[‘degree’]: The computed indegree of the corresponding vertex Examples ——– >>> import numpy as np >>> import pytest >>> from scipy.io import mmread >>> >>> import cudf >>> import cugraph >>> mm_file = ‘/datasets/networks/karate.mtx’ >>> M = mmread(mm_file).asfptype() >>> sources = cudf.Series(M.row) >>> destinations = cudf.Series(M.col) >>> >>> G = cugraph.Graph() >>> G.add_edge_list(sources, destinations) >>> in_degree_df = G.in_degree([0,9,12])

number_of_edges
(self)¶ Get the number of edges in the graph

number_of_vertices
(self)¶ Get the number of vertices in the graph

out_degree
(self, vertex_subset=None)¶ Calculates and returns the outdegree of vertices. Vertex outdegree is the number of edges pointing out from the vertex. Parameters ——— vertex_subset(optional, default=all vertices) : cudf.Series or iterable container
A container of vertices for displaying corresponding outdegree
df : cudf.DataFrame GPU data frame of size N (the default) or the size of the given vertices (vertex_subset) containing the out_degree. The ordering is relative to the adjacency list, or that given by the specified vertex_subset.
df[‘vertex’]: The vertex IDs (will be identical to vertex_subset if specified) df[‘degree’]: The computed outdegree of the corresponding vertex Examples ——– >>> import numpy as np >>> import pytest >>> from scipy.io import mmread >>> >>> import cudf >>> import cugraph >>> mm_file = ‘/datasets/networks/karate.mtx’ >>> M = mmread(mm_file).asfptype() >>> sources = cudf.Series(M.row) >>> destinations = cudf.Series(M.col) >>> >>> G = cugraph.Graph() >>> G.add_edge_list(sources, destinations) >>> out_degree_df = G.out_degree([0,9,12])

renumber
(self, source_col, dest_col)¶ Take a (potentially sparse) set of source and destination vertex ids and renumber the vertices to create a dense set of vertex ids using all values contiguously from 0 to the number of unique vertices  1.
Input columns can be either int64 or int32. The output will be mapped to int32, since many of the cugraph functions are limited to int32. If the number of unique values in source_col and dest_col > 2^311 then this function will return an error.
Return from this call will be three cudf Series  the renumbered source_col, the renumbered dest_col and a numbering map that maps the new ids to the original ids.
 Parameters
 source_colcudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the source index for each edge. Source indices must be an integer type.
 dest_colcudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices must be an integer type.
Examples
>>> import numpy as np >>> import pytest >>> from scipy.io import mmread >>> >>> import cudf >>> import cugraph >>> >>> >>> mm_file = '../datasets/karate.mtx' >>> M = mmread(mm_file).asfptype() >>> sources = cudf.Series(M.row) >>> destinations = cudf.Series(M.col) >>> >>> G = cugraph.Graph() >>> src_r, dst_r, numbering = G.renumber(sources, destinations)

view_adj_list
(self)¶ Display the adjacency list. Compute it if needed.

view_edge_list
(self)¶ Display the edge list. Compute it if needed.

view_transposed_adj_list
(self)¶ Display the transposed adjacency list. Compute it if needed.

Pagerank¶

cugraph.
pagerank
(G, alpha=0.85, max_iter=100, tol=1.0e5)¶ Find the PageRank vertex values for a graph. cuGraph computes an approximation of the Pagerank eigenvector using the power method. The number of iterations depends on the properties of the network itself; it increases when the tolerance descreases and/or alpha increases toward the limiting value of 1. The user is free to use default values or to provide inputs for the initial guess, tolerance and maximum number of iterations.
 Parameters
 graphcuGraph.Graph
cuGraph graph descriptor, should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The transposed adjacency list will be computed if not already present.
 alphafloat
The damping factor alpha represents the probability to follow an outgoing edge, standard value is 0.85. Thus, 1.0alpha is the probability to “teleport” to a random vertex. Alpha should be greater than 0.0 and strictly lower than 1.0.
 tolerancefloat
Set the tolerance the approximation, this parameter should be a small magnitude value. The lower the tolerance the better the approximation. If this value is 0.0f, cuGraph will use the default value which is 1.0E5. Setting too small a tolerance can lead to nonconvergence due to numerical roundoff. Usually values between 0.01 and 0.00001 are acceptable.
 max_iterint
The maximum number of iterations before an answer is returned. This can be used to limit the execution time and do an early exit before the solver reaches the convergence tolerance. If this value is lower or equal to 0 cuGraph will use the default value, which is 100.
 Returns
 PageRankcudf.DataFrame
GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding PageRank values.
Examples
>>> M = read_mtx_file(graph_file) >>> sources = cudf.Series(M.row) >>> destinations = cudf.Series(M.col) >>> G = cuGraph.Graph() >>> G.add_edge_list(sources,destinations,None) >>> pr = cuGraph.pagerank(G, alpha = 0.85, max_iter = 500, tol = 1.0e05)
Bfs¶

cugraph.
bfs
(G, start, directed=True)¶ Find the distances and predecessors for a breadth first traversal of a graph.
 Parameters
 Gcugraph.graph
cuGraph graph descriptor, should contain the connectivity information as an adjacency list.
 startInteger
The index of the graph vertex from which the traversal begins
 directedbool
Indicates whether the graph in question is a directed graph, or whether each edge has a corresponding reverse edge. (Allows optimizations if the graph is undirected)
 Returns
 dfcudf.DataFrame
df[‘vertex’][i] gives the vertex id of the i’th vertex df[‘distance’][i] gives the path distance for the i’th vertex from the starting vertex df[‘predecessor’][i] gives for the i’th vertex the vertex it was reached from in the traversal
Examples
>>> M = read_mtx_file(graph_file) >>> sources = cudf.Series(M.row) >>> destinations = cudf.Series(M.col) >>> G = cuGraph.Graph() >>> G.add_edge_list(sources,destinations,none) >>> dist, pred = cuGraph.bfs(G, 0, false)
Jaccard¶
Louvain¶

cugraph.
nvLouvain
(input_graph)¶ Compute the modularity optimizing partition of the input graph using the Louvain heuristic
 Parameters
 input_graphcuGraph.Graph
cuGraph graph descriptor, should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The adjacency list will be computed if not already present.
 Returns
 louvain_parts, modularity_scorecudf.DataFrame
 louvain_parts: GPU data frame of size V containing two columns: the vertex id
and the partition id it is assigned to.
modularity_score: a double value containing the modularity score of the partitioning
Examples
>>> M = read_mtx_file(graph_file) >>> sources = cudf.Series(M.row) >>> destinations = cudf.Series(M.col) >>> G = cuGraph.Graph() >>> G.add_edge_list(sources,destinations,None) >>> louvain_parts, modularity_score = cuGraph.louvain(G)
Spectral Clustering¶

cugraph.
spectralBalancedCutClustering
(G, num_clusters, num_eigen_vects=2, evs_tolerance=.00001, evs_max_iter=100, kmean_tolerance=.00001, kmean_max_iter=100)¶ Compute a clustering/partitioning of the given graph using the spectral balanced cut method.
 Parameters
 GcuGraph.Graph
cuGraph graph descriptor
 num_clustersinteger
Specifies the number of clusters to find
 num_eigen_vectsinteger
Specifies the number of eigenvectors to use. Must be lower or equal to num_clusters.
 evs_tolerance: float
Specifies the tolerance to use in the eigensolver
 evs_max_iter: integer
Specifies the maximum number of iterations for the eigensolver
 kmean_tolerance: float
Specifies the tolerance to use in the kmeans solver
 kmean_max_iter: integer
Specifies the maximum number of iterations for the kmeans solver
 Returns
 DFGPU data frame containing two cudf.Series of size V: the vertex identifiers and the
 corresponding cluster assignments.
DF[‘vertex’] contains the vertex identifiers DF[‘cluster’] contains the cluster assignments

cugraph.
spectralModularityMaximizationClustering
(G, num_clusters, num_eigen_vects=2, evs_tolerance=.00001, evs_max_iter=100, kmean_tolerance=.00001, kmean_max_iter=100)¶ Compute a clustering/partitioning of the given graph using the spectral modularity maximization method.
 Parameters
 GcuGraph.Graph
cuGraph graph descriptor
 num_clustersinteger
Specifies the number of clusters to find
 num_eigen_vectsinteger
Specifies the number of eigenvectors to use. Must be lower or equal to num_clusters
 evs_tolerance: float
Specifies the tolerance to use in the eigensolver
 evs_max_iter: integer
Specifies the maximum number of iterations for the eigensolver
 kmean_tolerance: float
Specifies the tolerance to use in the kmeans solver
 kmean_max_iter: integer
Specifies the maximum number of iterations for the kmeans solver
 Returns
 DFGPU data frame containing two cudf.Series of size V: the vertex identifiers and
 the corresponding cluster assignments.
DF[‘vertex’] contains the vertex identifiers DF[‘cluster’] contains the cluster assignments

cugraph.
analyzeClustering_modularity
(G, n_clusters, clustering)¶ Compute the modularity score for a partitioning/clustering
 Parameters
 GcuGraph.Graph
cuGraph graph descriptor
 n_clustersinteger
Specifies the number of clusters in the given clustering
 clusteringcudf.Series
The cluster assignment to analyze.
 Returns
 ——
 scorefloat
The computed modularity score

cugraph.
analyzeClustering_edge_cut
(G, n_clusters, clustering)¶ Compute the edge cut score for a partitioning/clustering
 Parameters
 GcuGraph.Graph
cuGraph graph descriptor
 n_clustersinteger
Specifies the number of clusters in the given clustering
 clusteringcudf.Series
The cluster assignment to analyze.
 Returns
 ——
 scorefloat
The computed edge cut score

cugraph.
analyzeClustering_ratio_cut
(G, n_clusters, clustering)¶ Compute the ratio cut score for a partitioning/clustering
 Parameters
 GcuGraph.Graph
cuGraph graph descriptor
 n_clustersinteger
Specifies the number of clusters in the given clustering
 clusteringcudf.Series
The cluster assignment to analyze.
 Returns
 ——
 scorefloat
The computed ratio cut score
Sssp¶

cugraph.
sssp
(G, source)¶ Compute the distance from the specified source to all vertices in the connected component. The distances column will store the distance from the source to each vertex.
 Parameters
 graphcuGraph.Graph
cuGraph graph descriptor, should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The transposed adjacency list will be computed if not already present.
 sourceint
Index of the source vertex
 Returns
 distances :
GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding SSSP distances.
Examples
>>> M = read_mtx_file(graph_file) >>> sources = cudf.Series(M.row) >>> destinations = cudf.Series(M.col) >>> G = cuGraph.Graph() >>> G.add_edge_list(sources,destinations,None) >>> distances = cuGraph.sssp(G, source)