The User Guide of FlashGraphR
Load and initialize FlashGraphR
Users can call fg.set.conf
to initialize FlashGraphR. Users can specify the number of threads to use and enable FlashGraphR to use SSDs to process large graphs. Here shows an example of a configuration file. This page explains important parameters in the configuration file.
> library(FlashGraphR)
> fg.set.conf("/path/to/conf/file")
Load a graph to FlashGraphR
Before running any graph algorithms on a graph, users need to first load the graph to FlashGraphR. There are multiple ways of loading a graph to FlashGraphR.
- Load a graph from a text file that contains an edge list in Linux filesystem:
g <- fg.load.graph("/path/to/edge/list/file")
. FlashGraphR reads the graph in the edge list file and converts it into the FlashGraph format. The graph is kept in memory. The function returns a FlashGraphR object. To utilize SSDs, a user should passin.mem=FALSE
to this function. In this case, the graph is constructed and stored on SSDs. - Load a graph from a file that contains a graph in the FlashGraph format:
g <- fg.load.graph("/path/to/adj/list/file", index="/path/to/index/file")
. A graph in the FlashGraph format has two parts: the graph file and the index file. When users specify an index file, FlashGraphR assumes that the input graph file is in the FlashGraph format. FlashGraphR reads the graph and its index and keeps them in memory. The function returns a FlashGraphR object.
The graph algorithms
Once a graph is loaded to FlashGraphR, users can perform graph algorithms on the graph. The graph applications supported by FlashGraph are listed below:
- weakly connected components:
fg.clusters(g, mode="weak")
- strongly connected components:
fg.clusters(g, mode="strong")
- graph transitivity:
fg.transitivity(g)
- PageRank:
fg.page.rank(g)
- triangle counting:
fg.directed.triangles(g)
andfg.undirected.triangles(g)
- scan statistics:
fg.topK.scan(g)
andfg.local.scan(g)
- coreness:
fg.coreness(g)
- diameter estimation:
fg.diameter(g)
- spectral embedding:
fg.spectral.embedding(g, nev)
. It supports performing spectral embedding on an adjacency matrix, a Laplacian matrix and a normalized Laplacian matrix.
Other functions
fg.list.graphs
: list all graphs loaded to FlashGraphR.fg.exist.graph
: test if a graph has been loaded to FlashGraphR.fg.vcount
: get the number of vertices in a graph.fg.ecount
: get the number of edges in a graph.fg.in.mem
: whether a graph is stored in memory or on disks.fg.is.directed
: whether a graph is directed.fg.degree
: get the degrees of all vertices.fg.fetch.subgraph
: extract an induced subgraph.
Examples of using FlashGraphR
Compute spectral embedding on the largest connected component in a graph.
library(FlashGraphR)
fg.set.conf("flash-graph/conf/run_test.txt")
# Load the Facebook graph as an undirected graph.
fg <- fg.load.graph("/mnt/nfs/graph-data/facebook_combined.txt", directed=FALSE)
# Get the strongly connected components.
# It returns an array whose elements are the component Ids of the vertices.
cc <- fg.clusters(fg, "weakly")
# Get the size of each component
counts <- as.data.frame(fm.table(cc[cc >= 0]))
# Get the largest component ID
lcc.id <- as.vector(counts$val)[which.max(as.vector(counts$Freq))]
# Get all vertices in the largest component
lcc.v <- which(cc == lcc.id)
# Get the induced subgraph with all vertices in the largest components
# The function outputs a graph in the FlashGraph format.
sub.fg <- fg.fetch.subgraph(fg, vertices=lcc.v, compress=FALSE)
# Compute spectral embedding on the largest connected component.
res <- fg.spectral.embedding(sub.fg, 10, which="A")
# Use kmeans to cluster the vertices.
clus.res <- kmeans(res$vectors, 10)