Publications / Conference Poster

Using vertex-centric programming platforms to implement SPARQL queries on large graphs

Goodman, Eric G.; Grunwald, Dirk

In this paper we explore the fusion of two largely disparate but related communities, those of Big Data and the Semantic Web. Due to the rise of large real-world graph datasets, a number of graph-centric parallel platforms have been proposed and developed. Many of these platforms, notable among them Pregel, Giraph, GraphLab, GraphChi, the Graph Processing System, and GraphX, present a programming interface that is vertex-centric, a variant of Valiant's Bulk Synchronous Parallel model. These platforms seek to address growing analytical needs for very large graph datasets arising from a variety of sources, such as social, biological, and computer networks. Alongside this growing interest in large graphs, there has been a concomitant rise in the Semantic Web, which describes data in terms of subject-predicate-object triples. Each triple is in effect an edge of a graph: the predicate is a directed, labeled edge between two vertices, the subject and the object. Despite the graph-oriented nature of Semantic Web data, and the advent of an increasingly large web of data, no one has explored the use of these maturing graph platforms to analyze Semantic Web data. In this paper we outline a method of implementing SPARQL queries within the GraphLab framework, and we obtain good scaling up to the full size of our system, 51 nodes.
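The abstract does not describe the GraphLab implementation itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple chain-shaped query such as `?x knows ?y . ?y worksAt ?z`, treats each triple as a directed labeled edge, and matches one triple pattern per superstep in a vertex-centric, bulk-synchronous style. The function name `run_path_query` and the sample data are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def run_path_query(triples, predicates):
    """Match the chain ?v0 p0 ?v1 . ?v1 p1 ?v2 ..., one predicate per superstep.

    Illustrative sketch only: a single-process simulation of vertex-centric
    message passing, not GraphLab code.
    """
    # Each triple (s, p, o) becomes a directed, labeled out-edge of vertex s.
    out_edges = defaultdict(list)
    vertices = set()
    for s, p, o in triples:
        out_edges[s].append((p, o))
        vertices.update((s, o))

    # Initial inbox: every vertex holds one partial binding containing itself.
    inbox = {v: [(v,)] for v in vertices}

    for pred in predicates:
        outbox = defaultdict(list)
        # Compute phase: a vertex reads only its own inbox and its own out-edges.
        for v, partials in inbox.items():
            for label, target in out_edges.get(v, []):
                if label == pred:
                    # Extend every partial binding along the matching edge and
                    # send it to the vertex at the other end.
                    for binding in partials:
                        outbox[target].append(binding + (target,))
        # Barrier: messages sent in this superstep are delivered in the next.
        inbox = outbox

    # Bindings still alive after the last superstep are the query results.
    return sorted(b for partials in inbox.values() for b in partials)


triples = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "initech"),
]
results = run_path_query(triples, ["knows", "worksAt"])
for row in results:
    print(row)
# ('alice', 'bob', 'acme')
# ('alice', 'carol', 'initech')
# ('bob', 'carol', 'initech')
```

In this sketch the number of supersteps equals the number of triple patterns in the chain. A real platform would partition the vertices across machines and exchange the messages over the network at each barrier.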