cuGraph API Reference
Structure
Graph

class cugraph.structure.graph.Graph
cuGraph graph class containing basic graph creation and transformation operations.
Methods
add_adj_list(self, offset_col, index_col[, …]): Initialize a graph from the adjacency list.
add_edge_list(self, source_col, dest_col[, …]): Initialize a graph from the edge list.
add_transposed_adj_list(self): Compute the transposed adjacency list.
clear(self): Empty this graph.
degree(self[, vertex_subset]): Compute vertex degree.
degrees(self[, vertex_subset]): Compute vertex indegree and outdegree.
delete_adj_list(self): Delete the adjacency list.
delete_edge_list(self): Delete the edge list.
delete_transposed_adj_list(self): Delete the transposed adjacency list.
get_two_hop_neighbors(self): Compute vertex pairs that are two hops apart.
in_degree(self[, vertex_subset]): Compute vertex indegree.
number_of_edges(self): Get the number of edges in the graph.
number_of_nodes(self): An alias of number_of_vertices().
number_of_vertices(self): Get the number of vertices in the graph.
out_degree(self[, vertex_subset]): Compute vertex outdegree.
view_adj_list(self): Display the adjacency list.
view_edge_list(self): Display the edge list.
view_transposed_adj_list(self): Display the transposed adjacency list.

add_adj_list(self, offset_col, index_col, value_col=None, copy=False)
Initialize a graph from the adjacency list. It is an error to call this method on an initialized Graph object. The passed offset_col and index_col arguments wrap gdf_column objects that represent a graph using the adjacency list format. If value_col is None, an unweighted graph is created; otherwise, a weighted graph is created. If copy is False, this function stores references to the objects passed as offset_col and index_col. If copy is True, this function stores references to deep copies of those objects. Undirected edges must be stored as directed edges in both directions.
 Parameters
 offset_col : cudf.Series
This cudf.Series wraps a gdf_column of size V + 1 (V: number of vertices). The gdf column contains the offsets for the vertices in this graph. Offsets must be in the range [0, E] (E: number of edges).
 index_col : cudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices must be in the range [0, V) (V: number of vertices).
 value_col : cudf.Series, optional
This can be None. If not None, this cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the weight value for each edge. The gdf_column elements are expected to be floating point numbers.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> M = M.to_pandas()
>>> M = scipy.sparse.coo_matrix((M['2'], (M['0'], M['1'])))
>>> M = M.tocsr()
>>> offsets = cudf.Series(M.indptr)
>>> indices = cudf.Series(M.indices)
>>> G = cugraph.Graph()
>>> G.add_adj_list(offsets, indices, None)
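The offsets/indices pair that add_adj_list expects is the compressed sparse row (CSR) layout, the same one scipy.sparse exposes as indptr/indices in the example above. As a CPU-only sketch in plain Python (not cuGraph code; the helper name edges_to_csr is hypothetical), the following builds offsets of length V + 1 and indices of length E from an edge list, which makes the [0, E] and [0, V) range requirements concrete.

```python
from typing import List, Sequence, Tuple


def edges_to_csr(
    sources: Sequence[int], destinations: Sequence[int], num_vertices: int
) -> Tuple[List[int], List[int]]:
    """Build CSR offsets (length V + 1) and indices (length E) from an edge list.

    Hypothetical illustration only; this is not a cuGraph function.
    """
    if len(sources) != len(destinations):
        raise ValueError("sources and destinations must have the same length")
    # Count the out-edges of every vertex.
    counts = [0] * num_vertices
    for s in sources:
        counts[s] += 1
    # Exclusive prefix sum: offsets[v] is where vertex v's neighbors start.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + counts[v]
    # Scatter each destination into its source vertex's slot range.
    cursor = offsets[:-1]  # list slicing copies, so offsets is left untouched
    indices = [0] * len(sources)
    for s, d in zip(sources, destinations):
        indices[cursor[s]] = d
        cursor[s] += 1
    return offsets, indices
```

The neighbors of vertex v are then indices[offsets[v]:offsets[v + 1]]; this sketch keeps the input order inside each row rather than sorting it.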

add_edge_list(self, source_col, dest_col, value_col=None, copy=False)
Initialize a graph from the edge list. It is an error to call this method on an initialized Graph object. The passed source_col and dest_col arguments wrap gdf_column objects that represent a graph using the edge list format. Source and destination indices must be in the range [0, V), where V is the number of vertices, and they must be 32-bit integers. Please refer to cuGraph's renumbering feature if your input does not meet these requirements. When using cudf.read_csv to load a CSV edge list, make sure to set dtype to int32 for the source and destination columns. If value_col is None, an unweighted graph is created; otherwise, a weighted graph is created. If copy is False, this function stores references to the objects passed as source_col and dest_col. If copy is True, this function stores references to deep copies of those objects. Undirected edges must be stored as directed edges in both directions.
 Parameters
 source_col : cudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the source index for each edge. Source indices must be in the range [0, V) (V: number of vertices). Source indices must be 32-bit integers.
 dest_col : cudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices must be in the range [0, V) (V: number of vertices). Destination indices must be 32-bit integers.
 value_col : cudf.Series, optional
This can be None. If not None, this cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the weight value for each edge. The gdf_column elements are expected to be floating point numbers.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
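Because an undirected edge must be supplied as two directed edges, an edge list that stores each undirected edge once has to be symmetrized before loading. A minimal plain-Python sketch of that preprocessing step (the helper symmetrize is hypothetical, not a cuGraph function; dropping duplicate directed edges and sorting the output are choices of this sketch):

```python
from typing import List, Sequence, Tuple


def symmetrize(
    sources: Sequence[int], destinations: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Return an edge list containing both (u, v) and (v, u) for every input edge.

    Hypothetical illustration only. Duplicate directed edges are removed and the
    result is sorted by (source, destination).
    """
    pairs = set()
    for u, v in zip(sources, destinations):
        pairs.add((u, v))
        pairs.add((v, u))
    ordered = sorted(pairs)
    return [u for u, _ in ordered], [v for _, v in ordered]
```

The two returned lists could then be wrapped in int32 cudf.Series and passed to add_edge_list.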

add_transposed_adj_list(self)
Compute the transposed adjacency list. It is an error to call this method on an uninitialized Graph object or a Graph object without an existing edge list.
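The transposed adjacency list is the CSR form of the graph with every edge reversed, so row v lists the sources of the edges pointing into v (PageRank, documented below, computes it if absent). A CPU-only sketch, assuming the offsets/indices layout described for add_adj_list; transpose_csr is a hypothetical name and this is not cuGraph's implementation:

```python
from typing import List, Sequence, Tuple


def transpose_csr(
    offsets: Sequence[int], indices: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Reverse every edge of a CSR graph and return the transposed CSR arrays.

    Hypothetical illustration only.
    """
    num_vertices = len(offsets) - 1
    # In-degree of each vertex = number of times it appears as a destination.
    counts = [0] * num_vertices
    for d in indices:
        counts[d] += 1
    t_offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        t_offsets[v + 1] = t_offsets[v] + counts[v]
    cursor = t_offsets[:-1]
    t_indices = [0] * len(indices)
    # Walking rows in order leaves each transposed row sorted by source id.
    for src in range(num_vertices):
        for k in range(offsets[src], offsets[src + 1]):
            dst = indices[k]
            t_indices[cursor[dst]] = src
            cursor[dst] += 1
    return t_offsets, t_indices
```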

clear(self)
Empty this graph. This function is added for NetworkX compatibility.

degree(self, vertex_subset=None)
Compute vertex degree. By default, this method computes vertex degrees for the entire set of vertices. If vertex_subset is provided, only the degrees of the vertices listed in vertex_subset are returned.
 Parameters
 vertex_subset : cudf.Series or iterable container, optional
A container of vertices for displaying corresponding degree. If not set, degrees are computed for the entire set of vertices.
 Returns
 df : cudf.DataFrame
GPU data frame of size V (V: number of vertices; the default) or the length of vertex_subset, containing the degree. The ordering is relative to the adjacency list, or that given by the specified vertex_subset.
 df['vertex'] : cudf.Series
The vertex IDs (will be identical to vertex_subset if specified).
 df['degree'] : cudf.Series
The computed degree of the corresponding vertex.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = G.degree([0, 9, 12])

degrees(self, vertex_subset=None)
Compute vertex indegree and outdegree. By default, this method computes vertex degrees for the entire set of vertices. If vertex_subset is provided, only the degrees of the vertices listed in vertex_subset are returned.
 Parameters
 vertex_subset : cudf.Series or iterable container, optional
A container of vertices for displaying corresponding degree. If not set, degrees are computed for the entire set of vertices.
 Returns
 df : cudf.DataFrame
 df['vertex'] : cudf.Series
The vertex IDs (will be identical to vertex_subset if specified).
 df['in_degree'] : cudf.Series
The indegree of the vertex.
 df['out_degree'] : cudf.Series
The outdegree of the vertex.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = G.degrees([0, 9, 12])
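Indegree and outdegree reduce to counting how often each vertex appears as a destination or as a source. The plain-Python sketch below mirrors the column names documented above (vertex, in_degree, out_degree) and the vertex_subset filtering, but the helper itself is hypothetical and is not cuGraph's implementation:

```python
from typing import Dict, Iterable, List, Optional, Sequence


def degrees(
    sources: Sequence[int],
    destinations: Sequence[int],
    num_vertices: int,
    vertex_subset: Optional[Iterable[int]] = None,
) -> Dict[str, List[int]]:
    """Return 'vertex', 'in_degree' and 'out_degree' columns as plain lists.

    Hypothetical illustration only.
    """
    in_deg = [0] * num_vertices
    out_deg = [0] * num_vertices
    for s, d in zip(sources, destinations):
        out_deg[s] += 1  # the edge points out of s
        in_deg[d] += 1   # the edge points into d
    # Default to every vertex; otherwise keep the caller's order.
    vertices = list(range(num_vertices)) if vertex_subset is None else list(vertex_subset)
    return {
        "vertex": vertices,
        "in_degree": [in_deg[v] for v in vertices],
        "out_degree": [out_deg[v] for v in vertices],
    }
```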

delete_adj_list(self)
Delete the adjacency list.

delete_edge_list(self)
Delete the edge list.

delete_transposed_adj_list(self)
Delete the transposed adjacency list.

get_two_hop_neighbors(self)
Compute vertex pairs that are two hops apart. The resulting pairs are sorted before being returned.
 Returns
 df : cudf.DataFrame
 df['first'] : cudf.Series
The first vertex ID of a pair.
 df['second'] : cudf.Series
The second vertex ID of a pair.
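Two vertices u and w are two hops apart when some vertex v has an edge u -> v and an edge v -> w. The CPU sketch below enumerates such (first, second) pairs over CSR arrays and sorts them, as the description states cuGraph does. Skipping pairs of a vertex with itself and removing duplicate pairs are assumptions of this sketch, not documented cuGraph behavior, and the helper name is hypothetical:

```python
from typing import List, Sequence, Set, Tuple


def two_hop_pairs(offsets: Sequence[int], indices: Sequence[int]) -> List[Tuple[int, int]]:
    """Return sorted, de-duplicated (first, second) pairs joined by a 2-edge path.

    Hypothetical illustration only; excluding (u, u) pairs is an assumption.
    """
    pairs: Set[Tuple[int, int]] = set()
    num_vertices = len(offsets) - 1
    for u in range(num_vertices):
        for k in range(offsets[u], offsets[u + 1]):
            v = indices[k]  # first hop: u -> v
            for j in range(offsets[v], offsets[v + 1]):
                w = indices[j]  # second hop: v -> w
                if w != u:  # assumption: skip pairs of a vertex with itself
                    pairs.add((u, w))
    return sorted(pairs)
```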

in_degree(self, vertex_subset=None)
Compute vertex indegree. Vertex indegree is the number of edges pointing into the vertex. By default, this method computes vertex degrees for the entire set of vertices. If vertex_subset is provided, only the indegrees of the vertices listed in vertex_subset are returned.
 Parameters
 vertex_subset : cudf.Series or iterable container, optional
A container of vertices for displaying corresponding indegree. If not set, degrees are computed for the entire set of vertices.
 Returns
 df : cudf.DataFrame
GPU data frame of size V (V: number of vertices; the default) or the length of vertex_subset, containing the in_degree. The ordering is relative to the adjacency list, or that given by the specified vertex_subset.
 df['vertex'] : cudf.Series
The vertex IDs (will be identical to vertex_subset if specified).
 df['degree'] : cudf.Series
The computed indegree of the corresponding vertex.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = G.in_degree([0, 9, 12])

number_of_edges(self)
Get the number of edges in the graph.

number_of_nodes(self)
An alias of number_of_vertices(). This function is added for NetworkX compatibility.

number_of_vertices(self)
Get the number of vertices in the graph.

out_degree(self, vertex_subset=None)
Compute vertex outdegree. Vertex outdegree is the number of edges pointing out from the vertex. By default, this method computes vertex degrees for the entire set of vertices. If vertex_subset is provided, only the outdegrees of the vertices listed in vertex_subset are returned.
 Parameters
 vertex_subset : cudf.Series or iterable container, optional
A container of vertices for displaying corresponding outdegree. If not set, degrees are computed for the entire set of vertices.
 Returns
 df : cudf.DataFrame
GPU data frame of size V (V: number of vertices; the default) or the length of vertex_subset, containing the out_degree. The ordering is relative to the adjacency list, or that given by the specified vertex_subset.
 df['vertex'] : cudf.Series
The vertex IDs (will be identical to vertex_subset if specified).
 df['degree'] : cudf.Series
The computed outdegree of the corresponding vertex.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = G.out_degree([0, 9, 12])

view_adj_list(self)
Display the adjacency list. Compute it if needed.
 Returns
 offset_col : cudf.Series
This cudf.Series wraps a gdf_column of size V + 1 (V: number of vertices). The gdf column contains the offsets for the vertices in this graph. Offsets are in the range [0, E] (E: number of edges).
 index_col : cudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices are in the range [0, V) (V: number of vertices).
 value_col : cudf.Series or None
This is None for unweighted graphs. For weighted graphs, this cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the weight value for each edge. The gdf_column elements are expected to be floating point numbers.

view_edge_list(self)
Display the edge list. Compute it if needed.
 Returns
 source_col : cudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the source index for each edge. Source indices are 32-bit integers in the range [0, V) (V: number of vertices).
 dest_col : cudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices are 32-bit integers in the range [0, V) (V: number of vertices).
 value_col : cudf.Series or None
This is None for unweighted graphs. For weighted graphs, this cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the weight value for each edge. The gdf_column elements are expected to be floating point numbers.

view_transposed_adj_list(self)
Display the transposed adjacency list. Compute it if needed.
 Returns
 offset_col : cudf.Series
This cudf.Series wraps a gdf_column of size V + 1 (V: number of vertices). The gdf column contains the offsets for the vertices in this graph. Offsets are in the range [0, E] (E: number of edges).
 index_col : cudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the source index for each edge. Source indices are in the range [0, V) (V: number of vertices).
 value_col : cudf.Series or None
This is None for unweighted graphs. For weighted graphs, this cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the weight value for each edge. The gdf_column elements are expected to be floating point numbers.

Renumbering

cugraph.structure.renumber.renumber(source_col, dest_col)
Take a (potentially sparse) set of source and destination vertex ids and renumber the vertices to create a dense set of vertex ids using all values contiguously from 0 to the number of unique vertices - 1.
Input columns can be either int64 or int32. The output will be mapped to int32, since many of the cugraph functions are limited to int32. If the number of unique values in source_col and dest_col is greater than 2^31 - 1, this function will return an error.
This call returns three cudf Series: the renumbered source_col, the renumbered dest_col, and a numbering map that maps the new ids to the original ids.
 Parameters
 source_col : cudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the source index for each edge. Source indices must be an integer type.
 dest_col : cudf.Series
This cudf.Series wraps a gdf_column of size E (E: number of edges). The gdf column contains the destination index for each edge. Destination indices must be an integer type.
 Returns
 source_col : cudf.Series
The renumbered source indices (int32).
 dest_col : cudf.Series
The renumbered destination indices (int32).
 numbering_map : cudf.Series
This cudf.Series wraps a gdf column of size V (V: number of vertices). The gdf column contains a numbering map that maps the new ids to the original ids.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> source_col, dest_col, numbering_map = cugraph.renumber(sources,
...                                                        destinations)
>>> G = cugraph.Graph()
>>> G.add_edge_list(source_col, dest_col, None)
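Renumbering maps an arbitrary set of integer IDs onto the dense range 0 to (number of unique vertices - 1) and keeps a map back to the originals, so that numbering_map[new_id] is the original ID. A plain-Python sketch of the idea; the helper is hypothetical, and the first-seen order in which it hands out new IDs is an assumption (cuGraph may assign them in a different order):

```python
from typing import Dict, List, Sequence, Tuple


def renumber(
    sources: Sequence[int], destinations: Sequence[int]
) -> Tuple[List[int], List[int], List[int]]:
    """Map sparse vertex ids to dense ids 0..n-1.

    Hypothetical illustration only. Returns (new_sources, new_destinations,
    numbering_map) with numbering_map[new_id] == original_id.
    """
    new_id: Dict[int, int] = {}
    numbering_map: List[int] = []

    def lookup(original: int) -> int:
        # Assumption: the next dense id is assigned the first time an id is seen.
        if original not in new_id:
            new_id[original] = len(numbering_map)
            numbering_map.append(original)
        return new_id[original]

    new_sources = [lookup(s) for s in sources]
    new_destinations = [lookup(d) for d in destinations]
    # Mirrors the documented int32 limit on the number of unique vertices.
    if len(numbering_map) > 2**31 - 1:
        raise OverflowError("too many unique vertices for int32 ids")
    return new_sources, new_destinations, numbering_map
```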
Conversion from Other Formats

cugraph.structure.convert_matrix.from_cudf_edgelist(df, source='source', target='target', weight=None)
Return a new graph created from the edge list representation. This function is added for NetworkX compatibility (it is a RAPIDS version of NetworkX's from_pandas_edgelist()).
 Parameters
 df : cudf.DataFrame
This cudf.DataFrame contains columns storing edge source vertices, destination (or target, following NetworkX's terminology) vertices, and (optional) weights.
 source : string or integer
This is used to index the source column.
 target : string or integer
This is used to index the destination (or target, following NetworkX's terminology) column.
 weight : string or integer, optional
This can be None. If not None, this is used to index the weight column.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> G = cugraph.Graph()
>>> G = cugraph.from_cudf_edgelist(M, source='0', target='1', weight='2')
Community
Louvain

cugraph.community.louvain.louvain(input_graph)
Compute the modularity-optimizing partition of the input graph using the Louvain heuristic.
 Parameters
 input_graph : cugraph.Graph
cuGraph graph descriptor; should contain the connectivity information as an edge list. The adjacency list will be computed if not already present. The graph should be undirected, where an undirected edge is represented by a directed edge in both directions.
 Returns
 parts : cudf.DataFrame
GPU data frame of size V containing two columns: the vertex ID and the ID of the partition it is assigned to.
 modularity_score : float
A floating point number containing the modularity score of the partitioning.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> parts, modularity_score = cugraph.louvain(G)
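The modularity_score measures how much denser the clusters are than a random graph with the same degrees would predict. A CPU sketch of Newman's standard formula Q = sum over clusters c of (L_c / m - (d_c / (2m))^2), where m is the number of edges, L_c the number of edges inside cluster c and d_c the total degree of c. It assumes an unweighted undirected graph with each edge listed once; cuGraph's handling of weights and of symmetric (both-direction) storage may differ, and the helper name is hypothetical:

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def modularity(edges: Sequence[Tuple[int, int]], partition: Dict[int, int]) -> float:
    """Newman modularity of `partition` (vertex -> cluster id).

    Hypothetical illustration only; `edges` lists every undirected edge once.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c: edges with both endpoints in cluster c
    degree_sum = defaultdict(int)  # d_c: sum of vertex degrees in cluster c
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single edge, splitting along the bridge gives Q = 2 * (3/7 - 1/4) = 5/14, while putting every vertex in one cluster gives Q = 0.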
Spectral Clustering

cugraph.community.spectral_clustering.analyzeClustering_edge_cut(G, n_clusters, clustering)
Compute the edge cut score for a partitioning/clustering.
 Parameters
 G : cugraph.Graph
cuGraph graph descriptor.
 n_clusters : integer
Specifies the number of clusters in the given clustering.
 clustering : cudf.Series
The cluster assignment to analyze.
 Returns
 score : float
The computed edge cut score.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = cugraph.spectralBalancedCutClustering(G, 5)
>>> score = cugraph.analyzeClustering_edge_cut(G, 5, df['cluster'])

cugraph.community.spectral_clustering.analyzeClustering_modularity(G, n_clusters, clustering)
Compute the modularity score for a partitioning/clustering.
 Parameters
 G : cugraph.Graph
cuGraph graph descriptor. This graph should have edge weights.
 n_clusters : integer
Specifies the number of clusters in the given clustering.
 clustering : cudf.Series
The cluster assignment to analyze.
 Returns
 score : float
The computed modularity score.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> values = cudf.Series(M['2'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, values)
>>> df = cugraph.spectralBalancedCutClustering(G, 5)
>>> score = cugraph.analyzeClustering_modularity(G, 5, df['cluster'])

cugraph.community.spectral_clustering.analyzeClustering_ratio_cut(G, n_clusters, clustering)
Compute the ratio cut score for a partitioning/clustering.
 Parameters
 G : cugraph.Graph
cuGraph graph descriptor. This graph should have edge weights.
 n_clusters : integer
Specifies the number of clusters in the given clustering.
 clustering : cudf.Series
The cluster assignment to analyze.
 Returns
 score : float
The computed ratio cut score.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> values = cudf.Series(M['2'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, values)
>>> df = cugraph.spectralBalancedCutClustering(G, 5)
>>> score = cugraph.analyzeClustering_ratio_cut(G, 5, df['cluster'])
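For orientation, the textbook definitions behind the edge cut and ratio cut scores: the edge cut is the total weight of edges whose endpoints fall in different clusters, and the ratio cut sums, over clusters, the weight of edges leaving each cluster divided by the cluster's size (some texts add a factor of 1/2). The CPU sketch below assumes each undirected edge is listed once with its weight; the exact normalization cuGraph applies, for example whether an edge stored in both directions counts twice, is not stated here and may differ. Both helper names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight), each undirected edge listed once


def edge_cut(edges: Sequence[Edge], cluster: Dict[int, int]) -> float:
    """Total weight of edges whose endpoints are in different clusters (textbook form)."""
    return sum(w for u, v, w in edges if cluster[u] != cluster[v])


def ratio_cut(edges: Sequence[Edge], cluster: Dict[int, int]) -> float:
    """Sum over clusters of (weight leaving the cluster) / (cluster size), textbook form."""
    sizes = defaultdict(int)
    for c in cluster.values():
        sizes[c] += 1
    leaving = defaultdict(float)
    for u, v, w in edges:
        if cluster[u] != cluster[v]:
            # A crossing edge leaves both of the clusters it connects.
            leaving[cluster[u]] += w
            leaving[cluster[v]] += w
    return sum(leaving[c] / sizes[c] for c in sizes)
```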

cugraph.community.spectral_clustering.spectralBalancedCutClustering(G, num_clusters, num_eigen_vects=2, evs_tolerance=1e-05, evs_max_iter=100, kmean_tolerance=1e-05, kmean_max_iter=100)
Compute a clustering/partitioning of the given graph using the spectral balanced cut method.
 Parameters
 G : cugraph.Graph
cuGraph graph descriptor.
 num_clusters : integer
Specifies the number of clusters to find.
 num_eigen_vects : integer
Specifies the number of eigenvectors to use. Must be less than or equal to num_clusters.
 evs_tolerance : float
Specifies the tolerance to use in the eigensolver.
 evs_max_iter : integer
Specifies the maximum number of iterations for the eigensolver.
 kmean_tolerance : float
Specifies the tolerance to use in the k-means solver.
 kmean_max_iter : integer
Specifies the maximum number of iterations for the k-means solver.
 Returns
 df : cudf.DataFrame
GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding cluster assignments.
 df['vertex'] : cudf.Series
Contains the vertex identifiers.
 df['cluster'] : cudf.Series
Contains the cluster assignments.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = cugraph.spectralBalancedCutClustering(G, 5)

cugraph.community.spectral_clustering.spectralModularityMaximizationClustering(G, num_clusters, num_eigen_vects=2, evs_tolerance=1e-05, evs_max_iter=100, kmean_tolerance=1e-05, kmean_max_iter=100)
Compute a clustering/partitioning of the given graph using the spectral modularity maximization method.
 Parameters
 G : cugraph.Graph
cuGraph graph descriptor. This graph should have edge weights.
 num_clusters : integer
Specifies the number of clusters to find.
 num_eigen_vects : integer
Specifies the number of eigenvectors to use. Must be less than or equal to num_clusters.
 evs_tolerance : float
Specifies the tolerance to use in the eigensolver.
 evs_max_iter : integer
Specifies the maximum number of iterations for the eigensolver.
 kmean_tolerance : float
Specifies the tolerance to use in the k-means solver.
 kmean_max_iter : integer
Specifies the maximum number of iterations for the k-means solver.
 Returns
 df : cudf.DataFrame
 df['vertex'] : cudf.Series
Contains the vertex identifiers.
 df['cluster'] : cudf.Series
Contains the cluster assignments.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> values = cudf.Series(M['2'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, values)
>>> df = cugraph.spectralModularityMaximizationClustering(G, 5)
Subgraph Extraction

cugraph.community.subgraph_extraction.subgraph(G, vertices)
Compute a subgraph of the existing graph that includes only the specified vertices. This algorithm works for both directed and undirected graphs. It does not actually traverse the edges; it simply extracts every edge whose two endpoints are both contained in the vertices list.
 Parameters
 G : cugraph.Graph
cuGraph graph descriptor.
 vertices : cudf.Series
Specifies the vertices of the induced subgraph.
 Returns
 Sg : cugraph.Graph
A graph object containing the subgraph induced by the given vertex set.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> verts = numpy.zeros(3, dtype=numpy.int32)
>>> verts[0] = 0
>>> verts[1] = 1
>>> verts[2] = 2
>>> sverts = cudf.Series(verts)
>>> Sg = cugraph.subgraph(G, sverts)
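Because an induced subgraph keeps exactly the edges whose two endpoints both belong to the vertex set, it is a filter rather than a traversal. A plain-Python sketch of that filter (the helper is hypothetical and keeps the original vertex IDs; it is not cuGraph's implementation):

```python
from typing import Iterable, List, Sequence, Tuple


def induced_subgraph_edges(
    sources: Sequence[int], destinations: Sequence[int], vertices: Iterable[int]
) -> Tuple[List[int], List[int]]:
    """Keep only the edges whose source and destination are both in `vertices`.

    Hypothetical illustration only.
    """
    keep = set(vertices)  # constant-time membership tests
    sub_src: List[int] = []
    sub_dst: List[int] = []
    for s, d in zip(sources, destinations):
        if s in keep and d in keep:
            sub_src.append(s)
            sub_dst.append(d)
    return sub_src, sub_dst
```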
Triangle Counting

cugraph.community.triangle_count.triangles(G)
Compute the triangle count (the number of cycles of length three) of the input graph.
 Parameters
 G : cugraph.Graph
cuGraph graph descriptor; should contain the connectivity information (edge weights are not used in this algorithm).
 Returns
 count : int64
A 64-bit integer whose value gives the number of triangles in the graph.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> count = cugraph.triangles(G)
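One common way to count triangles, shown as a CPU sketch and not necessarily the algorithm cuGraph uses, is to orient each undirected edge from the lower to the higher vertex id and intersect neighbor sets, so each triangle u < v < w is counted exactly once. The sketch assumes an undirected graph given as an edge list in which each edge may appear in one or both directions; the helper name is hypothetical:

```python
from collections import defaultdict
from typing import Dict, Sequence, Set


def count_triangles(sources: Sequence[int], destinations: Sequence[int]) -> int:
    """Count triangles in an undirected graph given as an edge list.

    Hypothetical illustration only.
    """
    # Keep each undirected edge once, oriented from the smaller to the larger id.
    higher: Dict[int, Set[int]] = defaultdict(set)
    for u, v in zip(sources, destinations):
        if u != v:  # self-loops cannot be part of a triangle
            higher[min(u, v)].add(max(u, v))
    count = 0
    for u, nbrs in list(higher.items()):
        for v in nbrs:
            # Every common higher neighbor w of u and v closes the triangle u < v < w.
            count += len(nbrs & higher.get(v, set()))
    return count
```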
Components
Connected Components

cugraph.components.connectivity.strongly_connected_components(G)
Generate the strongly connected components and attach a component label to each vertex.
 Parameters
 G : cugraph.Graph
cuGraph graph descriptor; should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The graph can be either directed or undirected, where an undirected edge is represented by a directed edge in both directions. The adjacency list will be computed if not already present. The number of vertices should fit into a 32-bit integer.
 Returns
 df : cudf.DataFrame
df['labels'][i] gives the label ID of the i'th vertex.
df['vertices'][i] gives the vertex ID of the i'th vertex.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = cugraph.strongly_connected_components(G)
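A strongly connected component is a maximal set of vertices in which every vertex can reach every other one along directed edges. As a CPU reference, the sketch below uses Kosaraju's two-pass algorithm, written iteratively to avoid Python's recursion limit; it is a standard sequential technique, not cuGraph's GPU implementation, and labeling each component by its smallest vertex ID is a convention of this sketch (the helper name is hypothetical):

```python
from typing import List, Sequence


def strongly_connected_labels(
    sources: Sequence[int], destinations: Sequence[int], num_vertices: int
) -> List[int]:
    """Return labels[v] = smallest vertex id in v's strongly connected component.

    Hypothetical illustration only (Kosaraju's algorithm).
    """
    fwd: List[List[int]] = [[] for _ in range(num_vertices)]
    rev: List[List[int]] = [[] for _ in range(num_vertices)]
    for s, d in zip(sources, destinations):
        fwd[s].append(d)
        rev[d].append(s)

    # Pass 1: iterative DFS on the forward graph, recording finish order.
    visited = [False] * num_vertices
    order: List[int] = []
    for root in range(num_vertices):
        if visited[root]:
            continue
        visited[root] = True
        stack = [(root, 0)]  # (vertex, index of next out-edge to explore)
        while stack:
            v, i = stack[-1]
            if i < len(fwd[v]):
                stack[-1] = (v, i + 1)
                w = fwd[v][i]
                if not visited[w]:
                    visited[w] = True
                    stack.append((w, 0))
            else:
                stack.pop()
                order.append(v)  # all descendants of v are finished

    # Pass 2: DFS on the reversed graph in reverse finish order.
    comp = [-1] * num_vertices
    for root in reversed(order):
        if comp[root] != -1:
            continue
        members = [root]
        comp[root] = root
        stack = [root]
        while stack:
            v = stack.pop()
            for w in rev[v]:
                if comp[w] == -1:
                    comp[w] = root
                    members.append(w)
                    stack.append(w)
        label = min(members)
        for v in members:
            comp[v] = label
    return comp
```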

cugraph.components.connectivity.weakly_connected_components(G)
Generate the weakly connected components and attach a component label to each vertex.
 Parameters
 G : cugraph.Graph
cuGraph graph descriptor; should contain the connectivity information as an edge list (edge weights are not used for this algorithm). Currently, the graph should be undirected, where an undirected edge is represented by a directed edge in both directions. The adjacency list will be computed if not already present. The number of vertices should fit into a 32-bit integer.
 Returns
 df : cudf.DataFrame
df['labels'][i] gives the label ID of the i'th vertex.
df['vertices'][i] gives the vertex ID of the i'th vertex.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = cugraph.weakly_connected_components(G)
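Weakly connected components ignore edge direction, so a disjoint-set (union-find) structure over the edge list is enough to find them. A plain-Python sketch of this standard sequential technique (not cuGraph's GPU implementation; the helper name and the smallest-vertex-ID labeling are conventions of this sketch):

```python
from typing import List, Sequence


def weakly_connected_labels(
    sources: Sequence[int], destinations: Sequence[int], num_vertices: int
) -> List[int]:
    """Return labels[v] = smallest vertex id in v's weakly connected component.

    Hypothetical illustration only (union-find).
    """
    parent = list(range(num_vertices))

    def find(x: int) -> int:
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s, d in zip(sources, destinations):
        rs, rd = find(s), find(d)
        if rs != rd:
            # Attach the larger root under the smaller, so every root is its set's minimum.
            if rs < rd:
                parent[rd] = rs
            else:
                parent[rs] = rd
    return [find(v) for v in range(num_vertices)]
```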
Link Analysis
Pagerank

cugraph.link_analysis.pagerank.pagerank(G, alpha=0.85, personalization=None, max_iter=100, tol=1e-05, nstart=None)
Find the PageRank vertex values for a graph. cuGraph computes an approximation of the PageRank eigenvector using the power method. The number of iterations depends on the properties of the network itself; it increases when the tolerance decreases and/or alpha increases toward the limiting value of 1. The user is free to use default values or to provide inputs for the initial guess, tolerance and maximum number of iterations.
 Parameters
 G : cugraph.Graph
cuGraph graph descriptor; should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The transposed adjacency list will be computed if not already present.
 alpha : float
The damping factor alpha represents the probability to follow an outgoing edge; the standard value is 0.85. Thus, 1.0 - alpha is the probability to "teleport" to a random vertex. Alpha should be greater than 0.0 and strictly less than 1.0.
 personalization : cudf.DataFrame
GPU DataFrame containing the personalization information.
 personalization['vertex'] : cudf.Series
Subset of vertices of the graph for personalization.
 personalization['values'] : cudf.Series
Personalization values for the vertices.
 max_iter : int
The maximum number of iterations before an answer is returned. This can be used to limit the execution time and exit early before the solver reaches the convergence tolerance. If this value is less than or equal to 0, cuGraph will use the default value, which is 100.
 tol : float
Sets the tolerance of the approximation; this parameter should be a small magnitude value. The lower the tolerance, the better the approximation. If this value is 0.0f, cuGraph will use the default value, which is 1.0E-5. Setting too small a tolerance can lead to non-convergence due to numerical round-off. Usually values between 0.01 and 0.00001 are acceptable.
 nstart : cudf.DataFrame
GPU DataFrame containing the initial guess for PageRank.
 nstart['vertex'] : cudf.Series
Subset of vertices of the graph for the initial guess for PageRank values.
 nstart['values'] : cudf.Series
PageRank values for the vertices.
 Returns
 PageRank : cudf.DataFrame
GPU data frame containing two cudf.Series of size V: the vertex identifiers and the corresponding PageRank values.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> pr = cugraph.pagerank(G, alpha=0.85, max_iter=500, tol=1.0e-05)
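The power method described above repeatedly applies the PageRank update until the change between iterations drops below the tolerance or max_iter is reached. A CPU sketch in plain Python, assuming uniform teleportation, no personalization or nstart, that the rank held by dangling vertices (no out-edges) is spread uniformly over all vertices, and an L1 convergence test; these are assumptions of this sketch and may not match cuGraph's exact choices. The helper name is hypothetical:

```python
from typing import List, Sequence


def pagerank(
    sources: Sequence[int],
    destinations: Sequence[int],
    num_vertices: int,
    alpha: float = 0.85,
    max_iter: int = 100,
    tol: float = 1e-5,
) -> List[float]:
    """Power-method PageRank; returns one score per vertex, summing to 1.

    Hypothetical illustration only; dangling handling and the L1 stopping
    rule are assumptions.
    """
    n = num_vertices
    out_deg = [0] * n
    for s in sources:
        out_deg[s] += 1
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by vertices with no out-edges is redistributed uniformly.
        dangling = sum(rank[v] for v in range(n) if out_deg[v] == 0)
        base = (1.0 - alpha) / n + alpha * dangling / n
        new_rank = [base] * n
        for s, d in zip(sources, destinations):
            # With probability alpha, follow one of s's out-edges chosen uniformly.
            new_rank[d] += alpha * rank[s] / out_deg[s]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank
```

On a directed cycle every vertex keeps rank 1/n; when several vertices point at a single dangling vertex, that vertex ends up with the largest score.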
Link Prediction
Jaccard Coefficient

cugraph.link_prediction.jaccard.jaccard(input_graph, first=None, second=None)
Compute the Jaccard similarity between each pair of vertices connected by an edge, or between arbitrary pairs of vertices specified by the user. Jaccard similarity is defined between two sets as the ratio of the volume of their intersection divided by the volume of their union. In the context of graphs, the neighborhood of a vertex is seen as a set. The Jaccard similarity weight of each edge represents the strength of connection between vertices based on the relative similarity of their neighbors. If first is specified but second is not, or vice versa, an exception will be thrown.
 Parameters
 input_graph : cugraph.Graph
cuGraph graph descriptor; should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The graph should be undirected, where an undirected edge is represented by a directed edge in both directions. The adjacency list will be computed if not already present.
 first : cudf.Series
Specifies the first vertices of each pair of vertices to compute for; must be specified along with second.
 second : cudf.Series
Specifies the second vertices of each pair of vertices to compute for; must be specified along with first.
 Returns
 df : cudf.DataFrame
GPU data frame of size E (the default) or the size of the given pairs (first, second) containing the Jaccard weights. The ordering is relative to the adjacency list, or that given by the specified vertex pairs.
 df['source'] : cudf.Series
The source vertex ID (will be identical to first if specified).
 df['destination'] : cudf.Series
The destination vertex ID (will be identical to second if specified).
 df['jaccard_coeff'] : cudf.Series
The computed Jaccard coefficient between the source and destination vertices.
Examples
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = cugraph.jaccard(G)
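For a pair (u, v), the Jaccard coefficient is |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, where N(x) is the neighbor set of x. The plain-Python sketch below scores every stored edge by default, or the explicit (first, second) pairs, and rejects a lone first or second, mirroring the behavior described above. The helper is hypothetical, and returning 0.0 when both neighborhoods are empty is an assumption:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Set, Tuple


def jaccard(
    sources: Sequence[int],
    destinations: Sequence[int],
    first: Optional[Sequence[int]] = None,
    second: Optional[Sequence[int]] = None,
) -> List[Tuple[int, int, float]]:
    """Return (source, destination, jaccard_coeff) rows.

    Hypothetical illustration only.
    """
    if (first is None) != (second is None):
        raise ValueError("first and second must be specified together")
    nbrs: Dict[int, Set[int]] = defaultdict(set)
    for s, d in zip(sources, destinations):
        nbrs[s].add(d)
    # Default: score every stored edge; otherwise score the requested pairs.
    pairs = zip(sources, destinations) if first is None else zip(first, second)
    rows = []
    for u, v in pairs:
        a, b = nbrs.get(u, set()), nbrs.get(v, set())
        union = len(a | b)
        # Assumption: two empty neighborhoods score 0.0.
        rows.append((u, v, len(a & b) / union if union else 0.0))
    return rows
```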

cugraph.link_prediction.wjaccard.jaccard_w(input_graph, weights, first=None, second=None)
Compute the weighted Jaccard similarity between each pair of vertices connected by an edge, or between arbitrary pairs of vertices specified by the user. Jaccard similarity is defined between two sets as the ratio of the volume of their intersection divided by the volume of their union. In the context of graphs, the neighborhood of a vertex is seen as a set. The Jaccard similarity weight of each edge represents the strength of connection between vertices based on the relative similarity of their neighbors. If first is specified but second is not, or vice versa, an exception will be thrown.
 Parameters
 input_graph : cugraph.Graph
cuGraph graph descriptor; should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The adjacency list will be computed if not already present.
 weights : cudf.Series
Specifies the weights to be used for each vertex.
 first : cudf.Series
Specifies the first vertices of each pair of vertices to compute for; must be specified along with second.
 second : cudf.Series
Specifies the second vertices of each pair of vertices to compute for; must be specified along with first.
 Returns
 df : cudf.DataFrame
GPU data frame of size E (the default) or the size of the given pairs (first, second) containing the Jaccard weights. The ordering is relative to the adjacency list, or that given by the specified vertex pairs.
 df['source'] : cudf.Series
The source vertex ID.
 df['destination'] : cudf.Series
The destination vertex ID.
 df['jaccard_coeff'] : cudf.Series
The computed weighted Jaccard coefficient between the source and destination vertices.
Examples
>>> import cudf
>>> import cugraph
>>> import numpy
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> weights = cudf.Series(numpy.ones(
...     max(sources.max(), destinations.max()) + 1, dtype=numpy.float32))
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = cugraph.jaccard_w(G, weights)
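The weighted variant can be sketched the same way. The key assumption here, which this page does not spell out, is that the "volume" of a neighborhood is the sum of its vertex weights, so the coefficient is w(N(u) & N(v)) / w(N(u) | N(v)); with every weight equal to 1, as in the numpy.ones example above, it reduces to the unweighted Jaccard coefficient. `weighted_jaccard_reference` is a hypothetical name, and `weights` is a plain list indexed by vertex ID.

```python
from collections import defaultdict


def weighted_jaccard_reference(sources, destinations, weights):
    """Hypothetical CPU sketch of per-edge weighted Jaccard similarity.

    Assumes weights[v] is the weight of vertex v and that the volume of a
    vertex set is the sum of its vertex weights.
    """
    neighbors = defaultdict(set)
    for s, d in zip(sources, destinations):
        neighbors[s].add(d)

    rows = []
    for u, v in zip(sources, destinations):
        inter_w = sum(weights[x] for x in neighbors[u] & neighbors[v])
        union_w = sum(weights[x] for x in neighbors[u] | neighbors[v])
        coeff = inter_w / union_w if union_w else 0.0
        rows.append({"source": u, "destination": v, "jaccard_coeff": coeff})
    return rows


# Triangle 0-1-2 plus pendant edge 2-3, each edge stored in both directions.
src = [0, 1, 0, 2, 1, 2, 2, 3]
dst = [1, 0, 2, 0, 2, 1, 3, 2]
uniform = weighted_jaccard_reference(src, dst, [1.0] * 4)  # plain Jaccard
skewed = weighted_jaccard_reference(src, dst, [1.0, 2.0, 1.0, 5.0])
print(skewed[0], skewed[2])
```

Giving vertex 3 a large weight inflates every union that contains it, which lowers the coefficient of edge (0, 2) relative to edge (0, 1).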
Overlap Coefficient

cugraph.link_prediction.overlap.overlap(input_graph, first=None, second=None)
Compute the Overlap Coefficient between each pair of vertices connected by an edge, or between arbitrary pairs of vertices specified by the user. The Overlap Coefficient between two sets is defined as the ratio of the volume of their intersection to the smaller of their two volumes. In the context of graphs, the neighborhood of a vertex is seen as a set. The Overlap Coefficient weight of each edge represents the strength of connection between vertices based on the relative similarity of their neighbors. If first is specified but second is not, or vice versa, an exception is raised.
 Parameters
 input_graph : cugraph.Graph
cuGraph graph descriptor; it should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The adjacency list is computed if not already present.
 first : cudf.Series
Specifies the first vertices of each pair of vertices to compute for; must be specified along with second.
 second : cudf.Series
Specifies the second vertices of each pair of vertices to compute for; must be specified along with first.
 Returns
 df : cudf.DataFrame
GPU data frame of size E (the default) or the size of the given pairs (first, second) containing the Overlap coefficients. The ordering is relative to the adjacency list, or that given by the specified vertex pairs.
 df['source'] : cudf.Series
The source vertex ID (will be identical to first if specified).
 df['destination'] : cudf.Series
The destination vertex ID (will be identical to second if specified).
 df['overlap_coeff'] : cudf.Series
The computed Overlap coefficient between the source and destination vertices.
Examples
>>> import cudf
>>> import cugraph
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = cugraph.overlap(G)
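To make the "smaller of their two volumes" denominator concrete, here is a hypothetical plain-Python sketch (`overlap_reference` is not a cuGraph function) that computes |N(u) & N(v)| / min(|N(u)|, |N(v)|) for every edge. It assumes the edge list already stores both directions of each undirected edge and returns rows in edge-list order.

```python
from collections import defaultdict


def overlap_reference(sources, destinations):
    """Hypothetical CPU sketch of the per-edge Overlap Coefficient,
    |N(u) & N(v)| / min(|N(u)|, |N(v)|)."""
    neighbors = defaultdict(set)
    for s, d in zip(sources, destinations):
        neighbors[s].add(d)

    rows = []
    for u, v in zip(sources, destinations):
        smaller = min(len(neighbors[u]), len(neighbors[v]))
        inter = len(neighbors[u] & neighbors[v])
        coeff = inter / smaller if smaller else 0.0
        rows.append({"source": u, "destination": v, "overlap_coeff": coeff})
    return rows


# Triangle 0-1-2 plus pendant edge 2-3, each edge stored in both directions.
src = [0, 1, 0, 2, 1, 2, 2, 3]
dst = [1, 0, 2, 0, 2, 1, 3, 2]
rows = overlap_reference(src, dst)
print(rows[2])
```

Because the denominator is the smaller neighborhood rather than the union, the Overlap Coefficient is never smaller than the Jaccard coefficient for the same pair: for edge (0, 2) it is 1/2, whereas Jaccard gives 1/4 (intersection {1}, union {0, 1, 2, 3}).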

cugraph.link_prediction.woverlap.overlap_w(input_graph, weights, first=None, second=None)
Compute the weighted Overlap Coefficient between each pair of vertices connected by an edge, or between arbitrary pairs of vertices specified by the user. The Overlap Coefficient between two sets is defined as the ratio of the volume of their intersection to the smaller of their two volumes. In the context of graphs, the neighborhood of a vertex is seen as a set. The Overlap Coefficient weight of each edge represents the strength of connection between vertices based on the relative similarity of their neighbors. If first is specified but second is not, or vice versa, an exception is raised.
 Parameters
 input_graph : cugraph.Graph
cuGraph graph descriptor; it should contain the connectivity information as an edge list (edge weights are not used for this algorithm). The adjacency list is computed if not already present.
 weights : cudf.Series
Specifies the weight of each vertex.
 first : cudf.Series
Specifies the first vertices of each pair of vertices to compute for; must be specified along with second.
 second : cudf.Series
Specifies the second vertices of each pair of vertices to compute for; must be specified along with first.
 Returns
 df : cudf.DataFrame
GPU data frame of size E (the default) or the size of the given pairs (first, second) containing the weighted Overlap coefficients. The ordering is relative to the adjacency list, or that given by the specified vertex pairs.
 df['source'] : cudf.Series
The source vertex ID.
 df['destination'] : cudf.Series
The destination vertex ID.
 df['overlap_coeff'] : cudf.Series
The computed weighted Overlap coefficient between the source and destination vertices.
Examples
>>> import cudf
>>> import cugraph
>>> import numpy
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> weights = cudf.Series(numpy.ones(
...     max(sources.max(), destinations.max()) + 1, dtype=numpy.float32))
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = cugraph.overlap_w(G, weights)
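A plain-Python sketch of the weighted variant follows. It assumes, without confirmation from this page, that a neighborhood's volume is the sum of its vertex weights, giving w(N(u) & N(v)) / min(w(N(u)), w(N(v))); `weighted_overlap_reference` is a hypothetical name and `weights` is a plain list indexed by vertex ID.

```python
from collections import defaultdict


def weighted_overlap_reference(sources, destinations, weights):
    """Hypothetical CPU sketch of the per-edge weighted Overlap Coefficient.

    Assumes weights[v] is the weight of vertex v and that the volume of a
    neighborhood is the sum of its vertex weights.
    """
    neighbors = defaultdict(set)
    for s, d in zip(sources, destinations):
        neighbors[s].add(d)

    def volume(vertices):
        # Sum of vertex weights over a set of vertices.
        return sum(weights[x] for x in vertices)

    rows = []
    for u, v in zip(sources, destinations):
        smaller = min(volume(neighbors[u]), volume(neighbors[v]))
        inter = volume(neighbors[u] & neighbors[v])
        coeff = inter / smaller if smaller else 0.0
        rows.append({"source": u, "destination": v, "overlap_coeff": coeff})
    return rows


# Triangle 0-1-2 plus pendant edge 2-3, each edge stored in both directions.
src = [0, 1, 0, 2, 1, 2, 2, 3]
dst = [1, 0, 2, 0, 2, 1, 3, 2]
rows = weighted_overlap_reference(src, dst, [1.0, 2.0, 1.0, 5.0])
print(rows[0], rows[2])
```

With all weights set to 1 the result matches the unweighted Overlap Coefficient.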
Traversal
Breadth-first search

cugraph.traversal.bfs.bfs(G, start, directed=True)
Find the distances and predecessors for a breadth-first traversal of a graph.
 Parameters
 G : cugraph.Graph
cuGraph graph descriptor; it should contain the connectivity information as an adjacency list.
 start : int
The index of the graph vertex from which the traversal begins.
 directed : bool
True if the graph is directed; False if every edge has a corresponding reverse edge (an undirected graph), which allows optimizations.
 Returns
 df : cudf.DataFrame
df['vertex'][i] gives the vertex ID of the i-th vertex.
df['distance'][i] gives the path distance of the i-th vertex from the starting vertex.
df['predecessor'][i] gives the vertex from which the i-th vertex was reached in the traversal.
Examples
>>> import cudf
>>> import cugraph
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> df = cugraph.bfs(G, 0)
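For intuition about the three output columns, here is a plain-Python breadth-first traversal over a CSR adjacency list (the offsets/indices layout that add_adj_list accepts). It is a sketch, not cuGraph's GPU kernel: `bfs_reference` is hypothetical, it ignores the `directed` flag, and it assumes unreachable vertices receive the int32 maximum as their distance and -1 as their predecessor, mirroring the convention documented for sssp.

```python
from collections import deque

INT32_MAX = 2**31 - 1  # assumed "unreachable" distance sentinel (int32 max)


def bfs_reference(offsets, indices, start):
    """Hypothetical CPU sketch of BFS over a CSR adjacency list.

    offsets has V + 1 entries; the neighbors of vertex u are
    indices[offsets[u]:offsets[u + 1]].
    """
    n = len(offsets) - 1
    distance = [INT32_MAX] * n
    predecessor = [-1] * n  # stays -1 for the start vertex and unreachable ones
    distance[start] = 0

    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in indices[offsets[u]:offsets[u + 1]]:
            if distance[v] == INT32_MAX:  # first time v is discovered
                distance[v] = distance[u] + 1
                predecessor[v] = u
                queue.append(v)

    return {"vertex": list(range(n)), "distance": distance,
            "predecessor": predecessor}


# Edges 0->1, 0->2, 1->3; vertex 4 has no incoming edges.
result = bfs_reference(offsets=[0, 2, 3, 3, 3, 3], indices=[1, 2, 3], start=0)
print(result)
```

Because each vertex is labeled the first time it is discovered, `distance` counts hops, and following `predecessor` back from any reachable vertex traces a shortest-hop path to `start`.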
Single-source shortest path

cugraph.traversal.sssp.filter_unreachable(df)
Remove unreachable vertices from the result of SSSP or BFS.
 Parameters
 df : cudf.DataFrame
cudf.DataFrame that is the output of SSSP or BFS.
 Returns
 df : cudf.DataFrame
Filtered cudf.DataFrame containing only the reachable vertices.
df['vertex'][i] gives the vertex ID of the i-th vertex.
df['distance'][i] gives the path distance of the i-th vertex from the starting vertex.
df['predecessor'][i] gives the vertex that was reached before the i-th vertex in the traversal.
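The filtering step amounts to dropping rows whose distance equals the "unreachable" sentinel. The sketch below shows that on a dict-of-lists stand-in for a result DataFrame; `filter_unreachable_reference` is hypothetical, and the caller passes the sentinel explicitly (assumed here to be the maximum of the distance dtype, e.g. the int32 maximum for a BFS result).

```python
INT32_MAX = 2**31 - 1  # assumed "unreachable" sentinel for int32 distances


def filter_unreachable_reference(result, sentinel):
    """Hypothetical sketch: keep only rows whose distance is not the
    'unreachable' sentinel (the maximum value of the distance dtype)."""
    keep = [i for i, d in enumerate(result["distance"]) if d != sentinel]
    return {col: [values[i] for i in keep] for col, values in result.items()}


# A BFS-style result in which vertex 4 was never reached.
result = {
    "vertex": [0, 1, 2, 3, 4],
    "distance": [0, 1, 1, 2, INT32_MAX],
    "predecessor": [-1, 0, 0, 1, -1],
}
reachable = filter_unreachable_reference(result, INT32_MAX)
print(reachable)
```

The filter keys on the distance column rather than the predecessor column because the start vertex also has predecessor -1 even though it is reachable.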

cugraph.traversal.sssp.sssp(G, source)
Compute the distances and predecessors for shortest paths from the specified source to all the vertices in the graph. The distance column stores the distance from the source to each vertex. The predecessor column stores each vertex's predecessor on the shortest path. Unreachable vertices have a distance of infinity, represented by the maximum value of the data type, and a predecessor of -1. The source vertex's predecessor is also set to -1. Graphs with negative-weight cycles are not supported.
 Parameters
 G : cugraph.Graph
cuGraph graph descriptor with connectivity information. Edge weights, if present, should be single- or double-precision floating-point values.
 source : int
Index of the source vertex.
 Returns
 df : cudf.DataFrame
df['vertex'][i] gives the vertex ID of the i-th vertex.
df['distance'][i] gives the path distance of the i-th vertex from the starting vertex.
df['predecessor'][i] gives the vertex ID of the vertex that was reached before the i-th vertex in the traversal.
Examples
>>> import cudf
>>> import cugraph
>>> M = cudf.read_csv('datasets/karate.csv', delimiter=' ',
...                   dtype=['int32', 'int32', 'float32'], header=None)
>>> sources = cudf.Series(M['0'])
>>> destinations = cudf.Series(M['1'])
>>> G = cugraph.Graph()
>>> G.add_edge_list(sources, destinations, None)
>>> distances = cugraph.sssp(G, 0)
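The output conventions above (maximum-of-dtype distance and -1 predecessor for unreachable vertices, -1 predecessor for the source) can be illustrated with a small CPU sketch. It swaps in Dijkstra's algorithm over a weighted CSR adjacency list, which additionally requires non-negative weights; cuGraph's GPU algorithm is not necessarily Dijkstra. `sssp_reference` is a hypothetical name, and the float32 maximum is assumed as the "infinity" value.

```python
import heapq

FLOAT32_MAX = 3.4028234663852886e38  # largest float32; assumed "infinity"


def sssp_reference(offsets, indices, weights, source):
    """Hypothetical CPU sketch: Dijkstra over a weighted CSR adjacency list.

    weights[k] is the weight of the edge stored at indices[k]. Dijkstra
    requires non-negative weights.
    """
    n = len(offsets) - 1
    distance = [FLOAT32_MAX] * n
    predecessor = [-1] * n  # stays -1 for the source and unreachable vertices
    distance[source] = 0.0

    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > distance[u]:
            continue  # stale heap entry; a shorter path was already found
        for k in range(offsets[u], offsets[u + 1]):
            v, nd = indices[k], d + weights[k]
            if nd < distance[v]:
                distance[v] = nd
                predecessor[v] = u
                heapq.heappush(heap, (nd, v))

    return {"vertex": list(range(n)), "distance": distance,
            "predecessor": predecessor}


# Edges 0->1 (4), 0->2 (1), 1->3 (1), 2->1 (2); vertex 4 is unreachable.
result = sssp_reference(offsets=[0, 2, 3, 4, 4, 4], indices=[1, 2, 3, 1],
                        weights=[4.0, 1.0, 1.0, 2.0], source=0)
print(result)
```

Here the direct edge 0->1 (weight 4) loses to the path 0->2->1 (weight 3), so vertex 1's predecessor is 2, while vertex 4 keeps the sentinel distance and predecessor -1.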