Device allows a personal computer to process huge graphs
With novel system, data scientists can analyze massive networks
without the need for power-hungry servers.
May 31, 2018
data-science parlance, graphs are structures of nodes and
connecting lines that are used to map scores of complex data
relationships. Analyzing graphs is useful for a broad range of
applications, such as ranking webpages, analyzing social
networks for political insights, or plotting neuron structures
in the brain.
Consisting of billions of nodes and lines, however, large graphs
can reach terabytes in size. The graph data are typically
processed in expensive dynamic random access memory (DRAM)
across multiple power-hungry servers.
Researchers from MIT’s Computer Science and Artificial
Intelligence Laboratory (CSAIL) have now designed a device that
uses cheap flash storage — the type used in smartphones — to
process massive graphs using only a single personal computer.
Flash storage is typically far slower than DRAM at processing
graph data. But the researchers developed a device consisting of
a flash chip array and computation “accelerator,” that helps
flash achieves DRAM-like performance.
Powering the device is a novel algorithm that sorts all access
requests for graph data into a sequential order that flash can
access quickly and easily. It also merges some requests to
reduce the overhead — the combined computation time, memory,
bandwidth, and other computing resources — of sorting.
The researchers ran the device against several traditional
high-performance systems processing several large graphs,
including the massive Web Data Commons Hyperlink Graph, which
has 3.5 billion nodes and 128 billion connecting lines. To
process that graph, the traditional systems all required a
server that cost thousands of dollars and contained 128
gigabytes of DRAM. The researchers achieved the same performance
by plugging two of their devices — totaling 1 gigabyte of DRAM
and 1 terabyte of flash — into a desktop computer. Moreover, by
combining several devices, they could process massive graphs —
up to 4 billion nodes and 128 billion connecting lines — that no
other system could handle on the 128-gigabyte server.
“The bottom line is that we can maintain performance with much
smaller, fewer, and cooler — as in temperature and power
consumption — machines,” says Sang-Woo Jun, a CSAIL graduate
student and first author on a paper describing the device, which
is being presented at the International Symposium on Computer
The device could be used to cut costs and energy associated with
graph analytics, and even improve performance, in a broad range
of applications. The researchers, for instance, are currently
creating a program that could identify genes that cause cancers.
Major tech companies such as Google could also leverage the
devices to reduce their energy footprint by using far fewer
machines to run analytics.
“Graph processing is such a general idea,” says co-author Arvind,
the Johnson Professor in Computer Science Engineering. “What
does page ranking have in common with gene detection? For us,
it’s the same computation problem — just different graphs with
different meanings. The type of application someone develops
will determine the impact it has on society.”
Paper co-authors are CSAIL graduate student Shuotao Xu, and Andy
Wright and Sizhuo Zhang, two graduate students in CSAIL and the
Department of Electrical Engineering and Computer Science.
Sort and reduce
In graph analytics, a system will basically search for and
update a node’s value based on its connections with other nodes,
among other metrics. In webpage ranking, for instance, each node
represents a webpage. If node A has a high value and connects to
node B, then node B’s value will also increase.
Traditional systems store all graph data in DRAM, which makes
them fast at processing the data but also expensive and
power-hungry. Some systems offload some data storage to flash,
which is cheaper but slower and less efficient, so they still
require substantial amounts of DRAM.
The researchers’ device runs on what the researchers call a
“sort-reduce” algorithm, which solves a major issue with using
flash as the primary storage source: waste.
Graph analytics systems require access to nodes that may be very
far from one another across a massive, sparse graph structure.
Systems generally request direct access to, say, 4 to 8 bytes of
data to update a node’s value. DRAM provides that direct access
very quickly. Flash, however, only accesses data in 4- to
8-kilobyte chunks, but still only updates a few bytes. Repeating
that access for every request while jumping across the graph
wastes bandwidth. “If you need to access the entire 8 kilobytes,
and use only 8 bytes and toss the rest, you end up throwing
1,000 times performance away,” Jun says.
The sort-reduce algorithm instead takes all direct access
requests and sorts them in sequential order by identifiers,
which show the destination of the request — such as grouping
together all updates for node A, all for node B, and so on.
Flash can then access kilobyte-sized chunks of thousands of
requests at once, making it far more efficient.
To further save computation power and bandwidth, the algorithm
simultaneously merges the data into the smallest groupings
possible. Whenever the algorithm notes matching identifiers, it
sums those into a single data packet — such as A1 and A2
becoming A3. It continues doing so, creating increasingly
smaller packets of data with matching identifiers, until it
produces the smallest possible packet to sort. This drastically
reduces the amount of duplicate requests to access.
Using the sort-reduce algorithm on two large graphs, the
researchers reduced the total data that needed to be updated in
flash by about 90 percent.
The sort-reduce algorithm is computation-intensive for a host
computer, however, so the researchers implemented a custom
accelerator in the device. The accelerator acts as a midway
point between the host and flash chips, executing all
computation for the algorithm. This offloads so much power to
the accelerator that the host can be a low-powered PC or laptop
that manages sorted data and executes other minor tasks.
“Accelerators are supposed to help the host compute, but we’ve
come so far [with the computations] that the host becomes
unimportant,” Arvind says.
MIT work shows a new way to perform analytics on very large
graphs: Their work exploits flash memory to store the graphs and
exploits ‘field-programmable gate arrays’ [custom integrated
circuits] in an ingenious way to perform both the analytics and
the data processing required to use flash memory effectively,”
says Keshav Pingali, a professor of computer science at the
University of Texas at Austin. “In the long run, this may lead
to systems that can process large amounts of data efficiently on
laptops or desktops, which will revolutionize how we do big-data
Because the host can be so low-powered, Jun says, a long-term
goal is to create a general-purpose platform and software
library for consumers to develop their own algorithms for
applications beyond graph analytics. “You could plug this
platform into a laptop, download [the software], and write
simple programs to get server-class performance on your laptop,”