Publications

Results 151–175 of 199
Skip to search filters

Link prediction on evolving graphs using matrix and tensor factorizations

Kolda, Tamara G.; Dunlavy, Daniel D.

More Details

Scalable tensor factorizations with missing data

Dunlavy, Daniel D.; Kolda, Tamara G.

The problem of missing data is ubiquitous in domains such as biomedical signal processing, network traffic analysis, bibliometrics, social network analysis, chemometrics, computer vision, and communication networks|all domains in which data collection is subject to occasional errors. Moreover, these data sets can be quite large and have more than two axes of variation, e.g., sender, receiver, time. Many applications in those domains aim to capture the underlying latent structure of the data; in other words, they need to factorize data sets with missing entries. If we cannot address the problem of missing data, many important data sets will be discarded or improperly analyzed. Therefore, we need a robust and scalable approach for factorizing multi-way arrays (i.e., tensors) in the presence of missing data. We focus on one of the most well-known tensor factorizations, CANDECOMP/PARAFAC (CP), and formulate the CP model as a weighted least squares problem that models only the known entries. We develop an algorithm called CP-WOPT (CP Weighted OPTimization) using a first-order optimization approach to solve the weighted least squares problem. Based on extensive numerical experiments, our algorithm is shown to successfully factor tensors with noise and up to 70% missing data. Moreover, our approach is significantly faster than the leading alternative and scales to larger problems. To show the real-world usefulness of CP-WOPT, we illustrate its applicability on a novel EEG (electroencephalogram) application where missing data is frequently encountered due to disconnections of electrodes.

More Details

Poblano v1.0 : a Matlab toolbox for gradient-based optimization

Dunlavy, Daniel D.; Kolda, Tamara G.

We present Poblano v1.0, a Matlab toolbox for solving gradient-based unconstrained optimization problems. Poblano implements three optimization methods (nonlinear conjugate gradients, limited-memory BFGS, and truncated Newton) that require only first order derivative information. In this paper, we describe the Poblano methods, provide numerous examples on how to use Poblano, and present results of Poblano used in solving problems from a standard test collection of unconstrained optimization problems.

More Details

HOPSPACK: Hybrid Optimization Parallel Search Package

Kolda, Tamara G.

In this paper, we describe the technical details of HOPSPACK (Hybrid Optimization Parallel SearchPackage), a new software platform which facilitates combining multiple optimization routines into asingle, tightly-coupled, hybrid algorithm that supports parallel function evaluations. The frameworkis designed such that existing optimization source code can be easily incorporated with minimalcode modification. By maintaining the integrity of each individual solver, the strengths and codesophistication of the original optimization package are retained and exploited.4

More Details

Multilinear algebra for analyzing data with multiple linkages

Dunlavy, Daniel D.; Kolda, Tamara G.; Kegelmeyer, William P.

Link analysis typically focuses on a single type of connection, e.g., two journal papers are linked because they are written by the same author. However, often we want to analyze data that has multiple linkages between objects, e.g., two papers may have the same keywords and one may cite the other. The goal of this paper is to show that multilinear algebra provides a tool for multilink analysis. We analyze five years of publication data from journals published by the Society for Industrial and Applied Mathematics. We explore how papers can be grouped in the context of multiple link types using a tensor to represent all the links between them. A PARAFAC decomposition on the resulting tensor yields information similar to the SVD decomposition of a standard adjacency matrix. We show how the PARAFAC decomposition can be used to understand the structure of the document space and define paper-paper similarities based on multiple linkages. Examples are presented where the decomposed tensor data is used to find papers similar to a body of work (e.g., related by topic or similar to a particular author's papers), find related authors using linkages other than explicit co-authorship or citations, distinguish between papers written by different authors with the same name, and predict the journal in which a paper was published.

More Details

CPOPT : optimization for fitting CANDECOMP/PARAFAC models

Kolda, Tamara G.; Acar Ataman, Evrim N.; Dunlavy, Daniel D.

Tensor decompositions (e.g., higher-order analogues of matrix decompositions) are powerful tools for data analysis. In particular, the CANDECOMP/PARAFAC (CP) model has proved useful in many applications such chemometrics, signal processing, and web analysis; see for details. The problem of computing the CP decomposition is typically solved using an alternating least squares (ALS) approach. We discuss the use of optimization-based algorithms for CP, including how to efficiently compute the derivatives necessary for the optimization methods. Numerical studies highlight the positive features of our CPOPT algorithms, as compared with ALS and Gauss-Newton approaches.

More Details

Resolving the sign ambiguity in the singular value decomposition

Journal of Chemometrics

Bro, R.; Acar, E.; Kolda, Tamara G.

Many modern data analysis methods involve computing a matrix singular value decomposition (SVD) or eigenvalue decomposition (EVD). Principal component analysis is the time-honored example, but more recent applications include latent semantic indexing (LSI), hypertext induced topic selection (HITS), clustering, classification, etc. Though the SVD and EVD are well established and can be computed via state-of-the-art algorithms, it is not commonly mentioned that there is an intrinsic sign indeterminacy that can significantly impact the conclusions and interpretations drawn from their results. Here we provide a solution to the sign ambiguity problem and show how it leads to more sensible solutions. Copyright © 2008 John Wiley amp; Sons, Ltd.

More Details

Cross-language information retrieval using PARAFAC2

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Chew, Peter A.; Bader, Brett W.; Kolda, Tamara G.; Abdelali, Ahmed

A standard approach to cross-language information retrieval (CLIR) uses Latent Semantic Analysis (LSA) in conjunction with a multilingual parallel aligned corpus. This approach has been shown to be successful in identifying similar documents across languages - or more precisely, retrieving the most similar document in one language to a query in another language. However, the approach has severe drawbacks when applied to a related task, that of clustering documents 'language independently', so that documents about similar topics end up closest to one another in the semantic space regardless of their language. The problem is that documents are generally more similar to other documents in the same language than they are to documents in a different language, but on the same topic. As a result, when using multilingual LSA, documents will in practice cluster by language, not by topic. We propose a novel application of PARAFAC2 (which is a variant of PARAFAC, a multi-way generalization of the singular value decomposition [SVD]) to overcome this problem. Instead of forming a single multilingual term-by-document matrix which, under LSA, is subjected to SVD, we form an irregular three-way array, each slice of which is a separate term-by-document matrix for a single language in the parallel corpus. The goal is to compute an SVD for each language such that V (the matrix of right singular vectors) is the same across all languages. Effectively, PARAFAC2 imposes the constraint, not present in standard LSA, that the 'concepts' in all documents in the parallel corpus are the same regardless of language. Intuitively, this constraint makes sense, since the whole purpose of using a parallel corpus is that exactly the same concepts are expressed in the translations. We tested this approach by comparing the performance of PARAFAC2 with standard LSA in solving a particular CLIR problem. From our results, we conclude that PARAFAC2 offers a very promising alternative to LSA not only for multilingual document clustering, but also for solving other problems in crosslanguage information retrieval. © 2007 ACM.

More Details
Results 151–175 of 199
Results 151–175 of 199