TYPE Journal Article YEAR 2020

OSTI DOI

Faster Johnson–Lindenstrauss transforms via Kronecker products

Information and inference (Online)

Jin, Ruhui J.; Ward, Rachel W.; Kolda, Tamara G.

The Kronecker product is an important matrix operation with a wide range of applications in signal processing, graph theory, quantum computing and deep learning. In this work, we introduce a generalization of the fast Johnson–Lindenstrauss projection for embedding vectors with Kronecker product structure, the Kronecker fast Johnson–Lindenstrauss transform (KFJLT). The KFJLT reduces the embedding cost by an exponential factor of the standard fast Johnson–Lindenstrauss transform’s cost when applied to vectors with Kronecker structure, by avoiding explicitly forming the full Kronecker products. Here, we prove that this computational gain comes with only a small price in embedding power: consider a finite set of $p$ points in a tensor product of $d$ constituent Euclidean spaces $\bigotimes _{k=d}^{1}{\mathbb{R}}^{n_k}$, and let $N = \prod _{k=1}^{d}n_k$. With high probability, a random KFJLT matrix of dimension $m \times N$ embeds the set of points up to multiplicative distortion $(1\pm \varepsilon )$ provided $m \gtrsim \varepsilon ^{-2} \, \log ^{2d - 1} (p) \, \log N$. We conclude by describing a direct application of the KFJLT to the efficient solution of large-scale Kronecker-structured least squares problems for fitting the CP tensor decomposition.
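The saving from never forming the full Kronecker product rests on the mixed-product identity $(A \otimes B)(x \otimes y) = (Ax) \otimes (By)$. The sketch below only checks that identity on a toy example; it is not the KFJLT itself, and the helper names (`kron_vec`, `kron_mat`, `matvec`) and the small matrices are illustrative assumptions, not taken from the paper.

```python
# Toy check of the mixed-product identity (A ⊗ B)(x ⊗ y) = (Ax) ⊗ (By).
# All names and values here are hypothetical, chosen only for illustration.

def kron_vec(x, y):
    """Kronecker product of two vectors: entry (j, l) is x[j] * y[l]."""
    return [xj * yl for xj in x for yl in y]

def kron_mat(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]   # 2 x 3
B = [[2.0, 1.0], [1.0, 3.0]]              # 2 x 2
x = [1.0, 0.0, 2.0]
y = [3.0, -1.0]

# Expensive route: build the 4 x 6 matrix and the length-6 vector explicitly.
direct = matvec(kron_mat(A, B), kron_vec(x, y))

# Cheap route: apply each small factor to its own vector, then combine.
factored = kron_vec(matvec(A, x), matvec(B, y))

print(direct, factored)
```

The factored route touches only the small factors, which is why the cost gap grows with the number of Kronecker factors.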

TYPE Journal Article YEAR 2020

OSTI DOI

Mathematics: The Tao of Data Science

Harvard Data Science Review

Kolda, Tamara G.

The two pieces, “Ten Research Challenge Areas in Data Science” by Jeannette M. Wing and “Challenges and Opportunities in Statistics and Data Science: Ten Research Areas” by Xuming He and Xihong Lin, provide an impressively complete list of data science challenges from luminaries in the field of data science. They have done an extraordinary job, so this response offers a complementary viewpoint from a mathematical perspective and evangelizes advanced mathematics as a key tool for meeting the challenges they have laid out. Notably, we pick up the themes of scientific understanding of machine learning and deep learning, computational considerations such as cloud computing and scalability, balancing computational and statistical considerations, and inference with limited data. We propose that mathematics is an important key to establishing rigor in the field of data science and as such has an essential role to play in its future.

TYPE Journal Article YEAR 2020

OSTI DOI

ENLA Seminar: Practical Leverage-Based Sampling for Low-Rank Tensor Decomposition

Kolda, Tamara G.; Larsen, Brett L.

Abstract not provided.

TYPE Presentation YEAR 2020

OSTI

Practical Leverage Score-Based Sampling for Low-Rank Tensor Decompositions

Larsen, Brett L.; Kolda, Tamara G.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

SIAM AN08: Practical Leverage-Based Sampling for Low-Rank Tensor Decomposition

Kolda, Tamara G.; Larsen, Brett L.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition

ACM Transactions on Mathematical Software

SIAM Journal on Scientific Computing

Scopus OSTI

An Overview of Tensor Decompositions for Data Analysis with Emphasis on Computation and Scalability

Kolda, Tamara G.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Kolda, Tamara G.; Aksoy, Sinan A.; Pinar, Ali P.; Plantenga, Todd P.; Comandur, Seshadhri C.; Stark, Dylan S.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Parallel Tucker Compression for Large-Scale Scientific Data

Kolda, Tamara G.; Ballard, Grey B.; Austin, Woody

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Newton-based optimization for Kullback-Leibler nonnegative tensor factorizations

Optimization Methods and Software

Hansen, Samantha; Plantenga, Todd P.; Kolda, Tamara G.

Tensor factorizations with nonnegativity constraints have found application in analysing data from cyber traffic, social networks, and other areas. We consider application data best described as being generated by a Poisson process (e.g. count data), which leads to sparse tensors that can be modelled by sparse factor matrices. In this paper, we investigate efficient techniques for computing an appropriate canonical polyadic tensor factorization based on the Kullback-Leibler divergence function. We propose novel subproblem solvers within the standard alternating block variable approach. Our new methods exploit structure and reformulate the optimization problem as small independent subproblems. We employ bound-constrained Newton and quasi-Newton methods. We compare our algorithms against other codes, demonstrating superior speed for high accuracy results and the ability to quickly find sparse solutions.

TYPE Journal Article YEAR 2015

Scopus OSTI DOI

A brief summary on formalizing parallel tensor distributions redistributions and algorithm derivations

Schatz, Martin D.; Kolda, Tamara G.; van de Geijn, Robert v.

Large-scale datasets in computational chemistry typically require distributed-memory parallel methods to perform a special operation known as tensor contraction. Tensors are multidimensional arrays, and a tensor contraction is akin to matrix multiplication with special types of permutations. Creating an efficient algorithm and optimized implementation in this domain is complex, tedious, and error-prone. To address this, we develop a notation to express data distributions so that we can use automated methods to find optimized implementations for tensor contractions. We consider the spin-adapted coupled cluster singles and doubles method from computational chemistry and use our methodology to produce an efficient implementation. Experiments performed on the IBM Blue Gene/Q and Cray XC30 demonstrate both improved performance and reduced memory consumption.

TYPE SAND Report YEAR 2015

OSTI DOI

Numerical optimization for symmetric tensor decomposition

Mathematical Programming

A scalable generative graph model with community structure

SIAM Journal on Scientific Computing

Kolda, Tamara G.; Pinar, Ali; Plantenga, Todd P.; Comandur, Seshadhri C.

Network data is ubiquitous and growing, yet we lack realistic generative network models that can be calibrated to match real-world data. The recently proposed block two-level Erdős-Rényi (BTER) model can be tuned to capture two fundamental properties: degree distribution and clustering coefficients. The latter is particularly important for reproducing graphs with community structure, such as social networks. In this paper, we compare BTER to other scalable models and show that it gives a better fit to real data. We provide a scalable implementation that requires only $O(d_{\max})$ storage, where $d_{\max}$ is the maximum number of neighbors for a single node. The generator is trivially parallelizable, and we show results for a Hadoop MapReduce implementation for modeling a real-world Web graph with over 4.6 billion edges. We propose that the BTER model can be used as a graph generator for benchmarking purposes and provide idealized degree distributions and clustering coefficient profiles that can be tuned for user specifications.

TYPE Journal Article YEAR 2014

Scopus OSTI

An adaptive shifted power method for computing generalized tensor eigenpairs

SIAM Journal on Matrix Analysis and Applications

Kolda, Tamara G.; Mayo, Jackson M.

Several tensor eigenpair definitions have been put forth in the past decade, but these can all be unified under the generalized tensor eigenpair framework introduced by Chang, Pearson, and Zhang [J. Math. Anal. Appl., 350 (2009), pp. 416-422]. Given $m$th-order, $n$-dimensional real-valued symmetric tensors $\mathcal{A}$ and $\mathcal{B}$, the goal is to find $\lambda \in \mathbb{R}$ and $x \in \mathbb{R}^n$, $x \neq 0$, such that $\mathcal{A}x^{m-1} = \lambda \mathcal{B}x^{m-1}$. Different choices for $\mathcal{B}$ yield different versions of the tensor eigenvalue problem. We present our generalized eigenproblem adaptive power (GEAP) method for solving the problem, which is an extension of the shifted symmetric higher-order power method (SS-HOPM) for finding Z-eigenpairs. A major drawback of SS-HOPM is that its performance depends on choosing an appropriate shift, but our GEAP method also includes an adaptive method for choosing the shift automatically.
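As a rough illustration of the fixed-shift iteration that GEAP extends, the sketch below runs a shifted power iteration for Z-eigenpairs ($\mathcal{A}x^{m-1} = \lambda x$ with $\|x\| = 1$) on a toy symmetric third-order tensor. It does not reproduce the adaptive shift selection of GEAP; the function names, the toy tensor, and the shift value `alpha=5.0` are assumptions chosen for this example only.

```python
import math

def apply_tensor(A, x):
    """Return the vector A x^2 for a symmetric 3rd-order tensor A (nested lists)."""
    n = len(x)
    return [sum(A[i][j][k] * x[j] * x[k] for j in range(n) for k in range(n))
            for i in range(n)]

def shifted_power_method(A, x0, alpha, iters=500):
    """Fixed-shift power iteration: x <- (A x^2 + alpha x) / ||A x^2 + alpha x||."""
    norm = math.sqrt(sum(v * v for v in x0))
    x = [v / norm for v in x0]
    for _ in range(iters):
        Ax = apply_tensor(A, x)
        y = [a + alpha * xi for a, xi in zip(Ax, x)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    Ax = apply_tensor(A, x)
    lam = sum(a * xi for a, xi in zip(Ax, x))  # lambda = A x^3 for unit x
    return lam, x

# Hypothetical toy tensor: only two nonzero "diagonal" entries.
n = 2
A = [[[0.0] * n for _ in range(n)] for _ in range(n)]
A[0][0][0] = 2.0
A[1][1][1] = 1.0

# alpha is assumed large enough for this toy tensor; choosing it in general
# is exactly the difficulty that an adaptive shift is meant to remove.
lam, x = shifted_power_method(A, [0.8, 0.6], alpha=5.0)
Ax = apply_tensor(A, x)
residual = max(abs(a - lam * xi) for a, xi in zip(Ax, x))
print(lam, x, residual)
```

A shift that is too small can break the monotone convergence, while one that is too large slows the iteration, which is the trade-off the abstract refers to.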

TYPE Journal Article YEAR 2014

Scopus OSTI DOI

Newton-based Optimization for Nonnegative Tensor Factorizations

Optimization Methods and Software

Proposed for publication in arXiv, and then SIAM Journal on Matrix Analysis and Applications.

Plantenga, Todd P.; Kolda, Tamara G.

Abstract not provided.

TYPE Journal Article YEAR 2013

OSTI

Task, Christine T.; Pinar, Ali P.; Kolda, Tamara G.

TYPE Presentation YEAR 2012

OSTI

Physical Review E - Statistical, Nonlinear, and Soft Matter Physics

Kolda, Tamara G.; Dixon, Kevin R.; Kegelmeyer, William P.

Abstract not provided.

TYPE Conference YEAR 2011

OSTI

Nonnegative Tensor Factorizations for Sparse Count Data

Kolda, Tamara G.

Abstract not provided.

TYPE Presentation YEAR 2011

OSTI

The Block Two-Level Erdos-Renyi (BTER) Graph Model

Kolda, Tamara G.; Comandur, Seshadhri C.; Pinar, Ali P.

Abstract not provided.

TYPE Conference YEAR 2011

OSTI

Nonnegative Tensor Factorizations for Sparse Count Data

Kolda, Tamara G.

Abstract not provided.

TYPE Conference YEAR 2011

OSTI

An In-Depth Study of Stochastic Kronecker Graphs

Journal of ACM

Resolving the sign ambiguity in the singular value decomposition

Journal of Chemometrics

Bro, R.; Acar, E.; Kolda, Tamara G.

Many modern data analysis methods involve computing a matrix singular value decomposition (SVD) or eigenvalue decomposition (EVD). Principal component analysis is the time-honored example, but more recent applications include latent semantic indexing (LSI), hypertext induced topic selection (HITS), clustering, classification, etc. Though the SVD and EVD are well established and can be computed via state-of-the-art algorithms, it is not commonly mentioned that there is an intrinsic sign indeterminacy that can significantly impact the conclusions and interpretations drawn from their results. Here we provide a solution to the sign ambiguity problem and show how it leads to more sensible solutions. Copyright © 2008 John Wiley & Sons, Ltd.
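The indeterminacy itself is easy to see: flipping the signs of a left and right singular vector together leaves the rank-one term $\sigma u v^{T}$ unchanged. The sketch below shows this on a toy pair and applies one simple sign convention (make the largest-magnitude entry of $u$ positive). That convention is an assumption used only for illustration and is not the resolution proposed in the paper; all names and values are hypothetical.

```python
def outer(u, v, s=1.0):
    """Rank-one matrix s * u v^T as a list of rows."""
    return [[s * ui * vj for vj in v] for ui in u]

def fix_sign(u, v):
    """Illustrative convention: flip (u, v) together so that the
    largest-magnitude entry of u is positive."""
    pivot = max(u, key=abs)
    if pivot < 0:
        return [-a for a in u], [-b for b in v]
    return list(u), list(v)

sigma = 3.0
u = [0.6, -0.8]
v = [0.0, 1.0, 0.0]

# Both sign choices reproduce exactly the same rank-one term.
M_plus = outer(u, v, sigma)
M_minus = outer([-a for a in u], [-b for b in v], sigma)
same_matrix = M_plus == M_minus

# The convention picks one representative of the two equivalent pairs.
u_fixed, v_fixed = fix_sign(u, v)
print(same_matrix, u_fixed, v_fixed)
```

Because both choices fit the data equally well, any interpretation that depends on the sign of a component needs some rule of this kind.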

TYPE SAND Report YEAR 2008

Scopus OSTI DOI

Cross-language information retrieval using PARAFAC2

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Advances in Water Resources

Kegelmeyer, William P.; Kolda, Tamara G.

Abstract not provided.

TYPE Presentation YEAR 2006

OSTI

Kolda, Tamara G.

TYPE Conference YEAR 2006

OSTI

Heroux, Michael A.; Kolda, Tamara G.; Long, Kevin R.; Hoekstra, Robert J.; Pawlowski, Roger P.; Phipps, Eric T.; Salinger, Andrew G.; Williams, Alan B.; Hu, Jonathan J.; Lehoucq, Richard B.; Thornquist, Heidi K.; Tuminaro, Raymond S.; Willenbring, James M.; Bartlett, Roscoe B.; Howle, Victoria E.

The Trilinos Project is an effort to facilitate the design, development, integration and ongoing support of mathematical software libraries. In particular, our goal is to develop parallel solver algorithms and libraries within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific applications. Our emphasis is on developing robust, scalable algorithms in a software framework, using abstract interfaces for flexible interoperability of components while providing a full-featured set of concrete classes that implement all abstract interfaces. Trilinos uses a two-level software structure designed around collections of packages. A Trilinos package is an integral unit usually developed by a small team of experts in a particular algorithms area such as algebraic preconditioners, nonlinear solvers, etc. Packages exist underneath the Trilinos top level, which provides a common look-and-feel, including configuration, documentation, licensing, and bug-tracking. Trilinos packages are primarily written in C++, but provide some C and Fortran user interface support. We provide an open architecture that allows easy integration with other solver packages and we deliver our software to the outside community via the GNU Lesser General Public License (LGPL). This report provides an overview of Trilinos, discussing the objectives, history, current development and future plans of the project.

TYPE Report YEAR 2003

OSTI DOI