Stochastic Optimization for Large-Scale Tensor Decomposition
Abstract not provided.
SIAM Journal on Scientific Computing
In this paper, we develop software for decomposing sparse tensors that is portable to and performant on a variety of multicore, manycore, and GPU computing architectures. The result is a single code whose performance matches that of optimized architecture-specific implementations. The key to a portable approach is to identify multiple levels of parallelism that can be mapped in different ways to different architectures, and we explain how to do this for the matricized tensor times Khatri-Rao product (MTTKRP), which is the key kernel in canonical polyadic tensor decomposition. Our implementation leverages the Kokkos framework, which enables a single code to achieve high performance across multiple architectures that differ in how they approach fine-grained parallelism. We also introduce a new construct for portable thread-local arrays, which we call compile-time polymorphic arrays. Not only are the specifics of our approach and implementation useful for tuning tensor computations, but they also provide a roadmap for developing other portable high-performance codes. As a final optimization, we modify the MTTKRP algorithm itself to traverse the tensor nonzeros in a permuted order, reducing atomic-write contention. We test the performance of our implementation on 16- and 68-core Intel CPUs and the NVIDIA K80 and P100 GPUs, showing that we are competitive with state-of-the-art architecture-specific codes while having the advantage of running on a variety of architectures.
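For readers unfamiliar with the kernel, the following is a minimal, sequential Python sketch of MTTKRP over a sparse tensor stored in coordinate (COO) form. It is illustrative only and is not the paper's Kokkos implementation; the function and variable names are our own. In a parallel version, the accumulation into the output rows becomes the atomic update whose contention the permuted traversal described above is designed to reduce.

```python
import numpy as np

def mttkrp_coo(subs, vals, factors, mode):
    """Sequential MTTKRP sketch for a sparse tensor in COO form.

    subs    : (nnz, ndims) integer array of nonzero coordinates
    vals    : (nnz,) array of nonzero values
    factors : list of factor matrices; factors[d] has shape (size_d, rank)
    mode    : index of the mode whose factor matrix is being updated
    """
    rank = factors[0].shape[1]
    out = np.zeros((factors[mode].shape[0], rank))
    for n in range(len(vals)):
        # Elementwise product of the sampled rows of all other factor matrices
        row = vals[n] * np.ones(rank)
        for d, fac in enumerate(factors):
            if d != mode:
                row *= fac[subs[n, d], :]
        # In a threaded implementation this accumulation is an atomic update;
        # permuting the order in which nonzeros are visited reduces contention.
        out[subs[n, mode], :] += row
    return out
```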
Journal of Complex Networks
Because hyperbolic space has properties that make it amenable to graph representations, there is significant interest in scalable methods for embedding graphs in hyperbolic space. These embeddings enable constant-time approximation of shortest-path distances, and so are significantly more efficient than full shortest-path computations. In this article, we improve on existing landmark-based hyperbolic embedding algorithms for large-scale graphs. Whereas previous methods compute the embedding using the derivative-free Nelder-Mead simplex method, our approach uses the limited-memory BFGS (L-BFGS) quasi-Newton method with analytic gradients. Our method is not only significantly faster but also produces higher-quality embeddings. Moreover, we are able to include the hyperbolic curvature as a variable in the optimization. We compare our Python implementation (called Hypy) against the best publicly available software, Rigel. Our method is an order of magnitude faster and shows significant improvements in the accuracy of the shortest-path distance calculations. Tests are performed on a variety of real-world networks, and we demonstrate the scalability of our method by embedding a graph with 1.8 billion edges and 65 million nodes.
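As a rough illustration of the kind of optimization being described, the sketch below places a single node relative to fixed landmark embeddings by minimizing a least-squares distance mismatch with SciPy's L-BFGS-B solver and an analytic gradient of the hyperboloid-model distance. This is a sketch under assumed details (fixed curvature, squared-error objective, illustrative function names), not the Hypy code itself.

```python
import numpy as np
from scipy.optimize import minimize

def hyp_dist_and_grad(u, v):
    """Hyperboloid-model distance between u and v in R^k, and its gradient in u."""
    a = np.sqrt(1.0 + u @ u)
    b = np.sqrt(1.0 + v @ v)
    z = max(a * b - u @ v, 1.0 + 1e-12)   # guard against round-off below 1
    d = np.arccosh(z)
    dz_du = (b / a) * u - v               # derivative of z with respect to u
    return d, dz_du / np.sqrt(z * z - 1.0)

def embed_node(landmarks, graph_dists, dim=10, x0=None):
    """Place one node so its hyperbolic distances to fixed landmark
    embeddings match the given graph distances in a least-squares sense."""
    x0 = np.zeros(dim) if x0 is None else x0

    def obj(u):
        f, g = 0.0, np.zeros_like(u)
        for L, delta in zip(landmarks, graph_dists):
            d, dd = hyp_dist_and_grad(u, L)
            f += (d - delta) ** 2
            g += 2.0 * (d - delta) * dd
        return f, g

    res = minimize(obj, x0, jac=True, method="L-BFGS-B")
    return res.x
```

Supplying the analytic gradient (jac=True) is what distinguishes this from the derivative-free Nelder-Mead approach and is the source of the speedups described above.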
Neuron
Perceptions, thoughts, and actions unfold over millisecond timescales, while learned behaviors can require many days to mature. While recent experimental advances enable large-scale and long-term neural recordings with high temporal fidelity, it remains a formidable challenge to extract unbiased and interpretable descriptions of how rapid single-trial circuit dynamics change slowly over many trials to mediate learning. We demonstrate that a simple tensor component analysis (TCA) can meet this challenge by extracting three interconnected, low-dimensional descriptions of neural data: neuron factors, reflecting cell assemblies; temporal factors, reflecting rapid circuit dynamics mediating perceptions, thoughts, and actions within each trial; and trial factors, describing both long-term learning and trial-to-trial changes in cognitive state. We demonstrate the broad applicability of TCA by revealing insights into diverse datasets derived from artificial neural networks, large-scale calcium imaging of rodent prefrontal cortex during maze navigation, and multielectrode recordings of macaque motor cortex during brain-machine interface learning. Williams et al. describe an unsupervised method that uncovers simple structure in large-scale recordings by extracting distinct cell assemblies with rapid within-trial dynamics, reflecting interpretable aspects of perceptions, actions, and thoughts, and slower across-trial dynamics reflecting learning and changes in internal state.
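TCA as described here is the canonical polyadic (CP) decomposition applied to a neurons x time x trials data tensor. The following is a bare-bones alternating-least-squares sketch, written only to make the three sets of factors concrete; it assumes a dense tensor and omits constraints (such as nonnegativity) and refinements that may be used in practice.

```python
import numpy as np

def cp_als(X, rank, n_iter=100, seed=0):
    """Bare-bones CP decomposition via alternating least squares.

    X    : dense data tensor, e.g. neurons x time x trials
    rank : number of components
    Returns a list of factor matrices, one per mode; for a three-way
    neural data tensor these are the neuron, temporal, and trial factors.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((s, rank)) for s in X.shape]
    for _ in range(n_iter):
        for mode in range(X.ndim):
            others = [f for d, f in enumerate(factors) if d != mode]
            # Khatri-Rao product of the other factors; earlier modes vary
            # slowest so that rows line up with the C-order unfolding below.
            kr = others[0]
            for f in others[1:]:
                kr = (kr[:, None, :] * f[None, :, :]).reshape(-1, rank)
            # Gram matrix of the Khatri-Rao product via Hadamard of small Grams.
            gram = np.ones((rank, rank))
            for f in others:
                gram *= f.T @ f
            unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
            factors[mode] = unfolded @ kr @ np.linalg.pinv(gram)
    return factors
```

Applying cp_als to the recorded data tensor and inspecting the columns of each returned factor matrix gives, respectively, the neuron, temporal, and trial factors described above.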
Through long-term investments in computing, algorithms, facilities, and instrumentation, DOE is an established leader in massive-scale, high-fidelity simulations, as well as in science-leading experimentation. In both cases, DOE is generating more data than it can analyze, and the problem is intensifying quickly. The need for advanced algorithms that can automatically convert the abundance of data into a wealth of useful information by discovering hidden structures is well recognized. Such efforts, however, are hindered by the massive volume of the data and its high velocity. The challenge, then, is to develop unsupervised learning methods that discover hidden structure in high-volume, high-velocity data.