Publications

Using vertex-centric programming platforms to implement SPARQL queries on large graphs

Proceedings of IA3 2014 - 4th Workshop on Irregular Applications: Architectures and Algorithms, Held in conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Goodman, Eric G.; Grunwald, Dirk

In this paper we explore the fusion of two largely disparate but related communities: Big Data and the Semantic Web. Due to the rise of large real-world graph datasets, a number of graph-centric parallel platforms have been proposed and developed. Many of these platforms, notable among them Pregel, Giraph, GraphLab, GraphChi, the Graph Processing System, and GraphX, present a programming interface that is vertex-centric, a variant of Valiant's Bulk Synchronous Parallel model. These platforms seek to address growing analytical needs for very large graph datasets arising from a variety of sources, such as social, biological, and computer networks. Alongside this growing interest in large graphs, there has been a concomitant rise of the Semantic Web, which describes data in terms of subject-predicate-object triples, or in other words edges of a graph, where the predicate is a directed, labeled edge between two vertices, the subject and object. Despite the graph-oriented nature of Semantic Web data, and the advent of an increasingly large web of data, no one has explored the use of these maturing graph platforms to analyze Semantic Web data. In this paper we outline a method of implementing SPARQL queries within the GraphLab framework, obtaining good scaling up to the full size of our system, 51 nodes.
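
As a rough illustration of the vertex-centric style described above, the C++ sketch below evaluates a two-pattern SPARQL-like query (?s knows ?o . ?o worksAt ?w) as two BSP supersteps over a toy edge-labeled graph. The graph, predicate names, and superstep loop are illustrative stand-ins, not the GraphLab API used in the paper.

```cpp
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

struct Edge { std::string pred; int dst; };
using Graph = std::unordered_map<int, std::vector<Edge>>;

int main() {
    // Toy edge-labeled graph: subject -> (predicate, object).
    Graph g = {
        {0, {{"knows", 1}}},
        {1, {{"worksAt", 2}, {"knows", 3}}},
        {3, {{"worksAt", 4}}},
    };

    // Superstep 1: every vertex matching "?s knows ?o" sends its id to ?o.
    // A real BSP platform buffers these messages between supersteps.
    std::unordered_map<int, std::vector<int>> inbox;  // ?o -> candidate ?s list
    for (const auto& [v, edges] : g)
        for (const Edge& e : edges)
            if (e.pred == "knows") inbox[e.dst].push_back(v);

    // Superstep 2: each ?o that received a message extends the binding with
    // its own "worksAt" edges, completing "?o worksAt ?w".
    for (const auto& [o, subjects] : inbox) {
        auto it = g.find(o);
        if (it == g.end()) continue;
        for (const Edge& e : it->second)
            if (e.pred == "worksAt")
                for (int s : subjects)
                    std::cout << "?s=" << s << " ?o=" << o << " ?w=" << e.dst << '\n';
    }
}
```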

Irregular large-scale computed tomography on multiple graphics processors improves energy-efficiency metrics for industrial applications

Proceedings of SPIE - The International Society for Optical Engineering

Jimenez, Edward S.; Goodman, Eric G.; Park, Ryeojin; Orr, Laurel J.; Thompson, Kyle R.

This paper investigates energy efficiency for various real-world industrial computed-tomography reconstruction algorithms, in both CPU- and GPU-based implementations. This work shows that the energy required for a given reconstruction depends on both performance and problem size. There are many ways to describe performance and energy efficiency, so this work investigates multiple metrics, including performance-per-watt, energy-delay product, and energy consumption. This work found that irregular GPU-based approaches [1] realized tremendous savings in energy consumption when compared to CPU implementations, while also significantly improving the performance-per-watt and energy-delay product metrics. Additional energy savings and further improvements in these metrics were realized on the GPU-based reconstructions by improving storage I/O through a parallel MIMD-like modularization of the compute and I/O tasks.
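
For readers unfamiliar with the metrics named above, the short sketch below computes all three for one hypothetical run; every number is invented for illustration and none come from the paper.

```cpp
#include <iostream>

int main() {
    // All numbers are invented for illustration only.
    double runtime_s   = 120.0;  // wall-clock time of one reconstruction
    double avg_power_w = 250.0;  // average power draw during the run
    double work_gupd   = 512.0;  // useful work, e.g. giga-updates performed

    double energy_j      = avg_power_w * runtime_s;               // joules
    double perf_per_watt = (work_gupd / runtime_s) / avg_power_w; // Gupd/s per W
    double edp           = energy_j * runtime_s;                  // joule-seconds

    std::cout << "energy consumption:   " << energy_j << " J\n"
              << "performance-per-watt: " << perf_per_watt << " Gupd/s/W\n"
              << "energy-delay product: " << edp << " J*s\n";
}
```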

Triangle finding: How graph theory can help the semantic web

CEUR Workshop Proceedings

Jimenez, Edward S.; Goodman, Eric G.

RDF data can be thought of as a graph where subjects and objects are vertices and the predicates joining them are edge attributes. Despite decades of research in graph theory, very little of this work has been applied to RDF datasets, and it has been largely ignored by the Semantic Web research community. We present a case study of triangle finding, where existing algorithms from graph theory provide excellent complexity bounds, growing at a significantly slower rate than those of the algorithms used within existing RDF triple stores. In order to scale to large volumes of data, the Semantic Web community should look to the many existing graph algorithms.
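
The kind of graph-theoretic algorithm the abstract has in mind can be illustrated with the classic sorted-adjacency intersection approach to triangle counting. The sketch below is a minimal sequential version on a toy undirected graph (an RDF graph would first be projected onto plain vertices and edges); it shows the idea, not the paper's implementation.

```cpp
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main() {
    // Sorted adjacency lists for a toy graph: triangle 0-1-2 plus edge 2-3.
    std::vector<std::vector<int>> adj = {{1, 2}, {0, 2}, {0, 1, 3}, {2}};

    long triangles = 0;
    for (int u = 0; u < (int)adj.size(); ++u)
        for (int v : adj[u]) {
            if (v <= u) continue;  // examine each edge once, oriented low -> high
            // Common neighbors w of u and v close a triangle u-v-w; requiring
            // w > v counts each triangle exactly once.
            std::vector<int> common;
            std::set_intersection(adj[u].begin(), adj[u].end(),
                                  adj[v].begin(), adj[v].end(),
                                  std::back_inserter(common));
            for (int w : common)
                if (w > v) ++triangles;
        }
    std::cout << "triangles = " << triangles << '\n';  // prints: triangles = 1
}
```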

Scalable hashing for shared memory supercomputers

Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

Goodman, Eric G.; Lemaster, M.N.; Jimenez, Edward S.

Hashing is a fundamental technique in computer science that allows O(1) inserts and lookups of items in an associative array. Here we present several thread coordination and hashing strategies and compare and contrast their performance on large, shared-memory symmetric multiprocessor machines, each possessing between a half and a full terabyte of memory. We show how our approach can be used as a key kernel for fundamental paradigms such as dynamic programming and MapReduce. We further show that a set of approaches yields close to linear speedup for both uniform random and more difficult power law distributions. This scalable performance is achieved despite the fact that our set of approaches is not completely lock-free. Our experimental results utilize and compare an SGI Altix UV with 4 Xeon processors (32 cores) and a Cray XMT with 128 processors. On the scale of data we addressed, on the order of 5 billion integers, we show that the Altix UV far exceeds the performance of the Cray XMT for power law distributions; however, the Cray XMT exhibits greater scalability.
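
One thread-coordination strategy in this family can be sketched with standard C++ atomics: open-addressed linear probing where threads claim empty slots via compare-and-swap. The table size, hash function, and EMPTY sentinel below are assumptions for illustration, and the code is a portable approximation rather than the paper's Altix UV or Cray XMT implementations.

```cpp
#include <atomic>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

constexpr uint64_t EMPTY = 0;    // sentinel: keys must be nonzero
constexpr size_t CAP = 1 << 20;  // table capacity (power of two)
std::vector<std::atomic<uint64_t>> table(CAP);

bool insert(uint64_t key) {
    size_t i = (key * 0x9E3779B97F4A7C15ULL) & (CAP - 1);  // cheap multiplicative hash
    for (size_t probes = 0; probes < CAP; ++probes, i = (i + 1) & (CAP - 1)) {
        uint64_t expected = EMPTY;
        // Claim an empty slot atomically; on failure `expected` holds the
        // occupant, so duplicates are detected without locking.
        if (table[i].compare_exchange_strong(expected, key)) return true;
        if (expected == key) return false;  // someone already inserted it
    }
    return false;  // table full
}

int main() {
    for (auto& slot : table) slot.store(EMPTY, std::memory_order_relaxed);

    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)  // every thread inserts the same keys
        workers.emplace_back([] {
            for (uint64_t k = 1; k <= 100000; ++k) insert(k);
        });
    for (auto& w : workers) w.join();

    size_t used = 0;
    for (auto& slot : table) used += (slot.load() != EMPTY);
    std::cout << "unique keys stored: " << used << '\n';  // 100000
}
```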

High-performance computing applied to semantic databases

Jimenez, Edward S.; Goodman, Eric G.

To date, the application of high-performance computing resources to Semantic Web data has largely focused on commodity hardware and distributed-memory platforms. In this paper we make the case that more specialized hardware can offer superior scaling and close to an order of magnitude improvement in performance. In particular we examine the Cray XMT. Its key characteristics, a large global shared memory and processors with a memory-latency-tolerant design, offer an environment conducive to programming for the Semantic Web and have engendered results that far surpass the current state of the art. We examine three fundamental pieces requisite for a fully functioning semantic database: dictionary encoding, RDFS inference, and query processing. We show scaling up to 512 processors (the largest configuration we had available) and the ability to process 20 billion triples completely in memory.
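
Of the three components, dictionary encoding is the easiest to illustrate: RDF terms are mapped to dense integer ids so that inference and querying operate on fixed-width integers instead of long strings. The sketch below shows only the sequential idea; the paper's version is a parallel structure resident on the Cray XMT.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

struct Dictionary {
    std::unordered_map<std::string, uint64_t> to_id;
    std::vector<std::string> to_term;  // id -> term, for decoding query results

    uint64_t encode(const std::string& term) {
        auto [it, inserted] = to_id.emplace(term, to_term.size());
        if (inserted) to_term.push_back(term);
        return it->second;
    }
};

int main() {
    Dictionary d;
    // One invented triple: each term becomes a fixed-width 64-bit id.
    uint64_t s = d.encode("<http://example.org/alice>");
    uint64_t p = d.encode("<http://xmlns.com/foaf/0.1/knows>");
    uint64_t o = d.encode("<http://example.org/bob>");
    std::cout << s << ' ' << p << ' ' << o << '\n';  // prints: 0 1 2
    std::cout << d.to_term[o] << '\n';               // decodes back to the URI
}
```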

Scalable in-memory RDFS closure on billions of triples

CEUR Workshop Proceedings

Goodman, Eric G.; Mizell, David

We present an RDFS closure algorithm, specifically designed and implemented on the Cray XMT supercomputer, that obtains inference rates of 13 million inferences per second on the largest system configuration we used. The Cray XMT, with its large global memory (4 TB for our experiments), permits the construction of a conceptually straightforward algorithm, fundamentally a series of operations on a shared hash table. Each thread is given a partition of triple data to process, a dedicated copy of the ontology to apply to the data, and a reference to the hash table into which it inserts inferred triples. The global nature of the hash table allows the algorithm to avoid a common obstacle for distributed-memory machines: the creation of duplicate triples. On LUBM data sets ranging between 1.3 billion and 5.3 billion triples, we obtain nearly linear speedup except in two portions: file I/O, which can be ameliorated with additional service nodes, and data structure initialization, which requires nearly constant time for runs involving 32 processors or more.
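
The shape of such a closure computation can be suggested with a single RDFS rule, rdfs9 (if C rdfs:subClassOf D and x rdf:type C, then infer x rdf:type D), iterated to a fixed point. In the single-threaded sketch below, an ordinary set plays the deduplicating role of the paper's shared hash table, and the toy ontology is invented for illustration.

```cpp
#include <iostream>
#include <set>
#include <string>
#include <tuple>

using Triple = std::tuple<std::string, std::string, std::string>;

int main() {
    // Invented toy data: a tiny class hierarchy plus one typed instance.
    std::set<Triple> triples = {
        {"Student", "rdfs:subClassOf", "Person"},
        {"Person",  "rdfs:subClassOf", "Agent"},
        {"alice",   "rdf:type",        "Student"},
    };

    bool changed = true;
    while (changed) {  // iterate rdfs9 to a fixed point
        changed = false;
        std::set<Triple> inferred;
        for (const auto& [c, p1, d] : triples) {
            if (p1 != "rdfs:subClassOf") continue;
            for (const auto& [x, p2, t] : triples)
                if (p2 == "rdf:type" && t == c)
                    inferred.insert({x, "rdf:type", d});
        }
        for (const auto& t : inferred)
            changed |= triples.insert(t).second;  // set dedups, like the shared table
    }

    for (const auto& [s, p, o] : triples)
        std::cout << s << ' ' << p << ' ' << o << '\n';
}
```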

High performance semantic factoring of giga-scale semantic graph databases

Goodman, Eric G.; Mackey, Greg

As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture, and present the results of our deploying that for the analysis of the Billion Triple dataset with respect to its semantic factors, including basic properties, connected components, namespace interaction, and typed paths.
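
One of the semantic factors listed, connected components, has a compact sequential core in union-find. The sketch below labels the components of an invented toy edge list, whereas the paper computes this at scale on the hybrid XMT/cluster system.

```cpp
#include <iostream>
#include <numeric>
#include <utility>
#include <vector>

// Find with path halving keeps the trees shallow.
int find(std::vector<int>& parent, int x) {
    while (parent[x] != x) x = parent[x] = parent[parent[x]];
    return x;
}

int main() {
    const int n = 6;
    // Invented toy edge list: components {0,1,2}, {3}, {4,5}.
    std::vector<std::pair<int, int>> edges = {{0, 1}, {1, 2}, {4, 5}};

    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);  // each vertex starts alone

    for (auto [u, v] : edges)
        parent[find(parent, u)] = find(parent, v);  // union the endpoints

    for (int v = 0; v < n; ++v)
        std::cout << "vertex " << v << " -> component " << find(parent, v) << '\n';
}
```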

Hashing strategies for the Cray XMT

Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010

Goodman, Eric G.; Haglin, David J.; Scherrer, Chad; Chavarría-Miranda, Daniel; Mogill, Jace; Feo, John

Two of the most commonly used hashing strategies, linear probing and hashing with chaining, are adapted for efficient execution on a Cray XMT. These strategies are designed to minimize memory contention. Datasets that follow a power law distribution pose significant performance challenges to shared-memory parallel hashing implementations. Experimental results show good scalability up to 128 processors on two power law datasets with different data types: integer and string. These implementations can be used in a wide range of applications.
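
The chaining variant can be suggested in portable C++ as an atomic push onto per-bucket linked lists, sketched below on invented keys. The actual XMT implementations rely on that machine's word-level synchronization (full/empty bits), which standard atomics only approximate.

```cpp
#include <atomic>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

struct Node { uint64_t key; Node* next; };

constexpr size_t BUCKETS = 1024;
std::atomic<Node*> heads[BUCKETS];  // zero-initialized bucket heads

void insert(uint64_t key) {
    auto& head = heads[key % BUCKETS];
    Node* n = new Node{key, head.load(std::memory_order_relaxed)};
    // Retry the head swap until our node is linked in front of the chain.
    // (Nodes are deliberately leaked in this sketch; no duplicate check.)
    while (!head.compare_exchange_weak(n->next, n)) {}
}

int main() {
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back([t] {
            for (uint64_t k = 0; k < 1000; ++k) insert(4 * k + t);  // disjoint keys
        });
    for (auto& w : workers) w.join();

    size_t count = 0;
    for (auto& h : heads)
        for (Node* node = h.load(); node; node = node->next) ++count;
    std::cout << "items stored: " << count << '\n';  // prints: items stored: 4000
}
```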
