Publications Search

Tensor decompositions for count data that leverage stochastic and deterministic optimization

Optimization Methods and Software

There is growing interest to extend low-rank matrix decompositions to multi-way arrays, or tensors. One fundamental low-rank tensor decomposition is the canonical polyadic decomposition (CPD). The challenge of fitting a low-rank, nonnegative CPD model to Poisson-distributed count data is of particular interest. Several popular algorithms use local search methods to approximate the maximum likelihood estimator (MLE) of the Poisson CPD model. This work presents two new algorithms that extend state-of-the-art local methods for Poisson CPD. Hybrid GCP-CPAPR combines Generalized Canonical Decomposition (GCP) with stochastic optimization and CP Alternating Poisson Regression (CPAPR), a deterministic algorithm, to increase the probability of converging to the MLE over either method used alone. Restarted CPAPR with SVDrop uses a heuristic based on the singular values of the CPD model unfoldings to identify convergence toward optimizers that are not the MLE and restarts within the feasible domain of the optimization problem, thus reducing overall computational cost when using a multi-start strategy. We provide empirical evidence that indicates our approaches outperform existing methods with respect to converging to the Poisson CPD MLE.

More Details

TYPE Other Report YEAR 2025

DOI OSTI Scopus

Computing Sparse Tensor Decompositions via Chapel and C++/MPI Interoperability without Intermediate I/O

Geronimo Anderson, Sean I.; Dunlavy, Daniel M.

We extend an existing approach for efficient use of shared mapped memory across Chapel and C++ for graph data stored as 1-D arrays to sparse tensor data stored using a combination of 2-D and 1-D arrays. We describe the specific extensions that provide use of shared mapped memory tensor data for a particular C++ tensor decomposition tool called GentenMPI. We then demonstrate our approach on several real-world datasets, providing timing results that illustrate minimal overhead incurred using this approach. Finally, we extend our work to improve memory usage and provide convenient random access to sparse shared mapped memory tensor elements in Chapel, while still being capable of leveraging high performance implementations of tensor algorithms in C++.

More Details

TYPE Other Report YEAR 2024

DOI OSTI

The Average Spectrum Norm and Near-Optimal Tensor Completion

Lopez, Oscar F.; Lehoucq, Rich; Llosa-Vite, Carlos; Prasadan, Arvind; Dunlavy, Daniel M.

We propose the average spectrum norm to study the minimum number of measurements required to approximate a multidimensional array (i.e., sample complexity) via low-rank tensor recovery. Our focus is on the tensor completion problem, where the aim is to estimate a multiway array using a subset of tensor entries corrupted by noise. Our average spectrum norm-based analysis provides near-optimal sample complexities, exhibiting dependence on the ambient dimensions and rank that do not suffer from exponential scaling as the order increases.

More Details

TYPE Other Report YEAR 2024

DOI OSTI

The Poisson Canonical Polyadic Tensor Model as a Latent-Variable Model

Llosa-Vite, Carlos; Dunlavy, Daniel M.; Lehoucq, Rich; Prasadan, Arvind; Lopez, Oscar F.

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2024

DOI OSTI

Zero-truncated Poisson regression for sparse multiway count data corrupted by false zeros

Information and Inference

Dunlavy, Daniel M.; Lehoucq, Rich; Lopez, Oscar F.

We propose a novel statistical inference methodology for multiway count data that is corrupted by false zeros that are indistinguishable from true zero counts. Our approach consists of zero-truncating the Poisson distribution to neglect all zero values. This simple truncated approach dispenses with the need to distinguish between true and false zero counts and reduces the amount of data to be processed. Inference is accomplished via tensor completion that imposes low-rank tensor structure on the Poisson parameter space. Our main result shows that an N-way rank-R parametric tensor M ∈ (0, ∞)I×.....×I generating Poisson observations can be accurately estimated by zero-truncated Poisson regression from approximately IR2 log22(I) non-zero counts under the nonnegative canonical polyadic decomposition. Our result also quantifies the error made by zero-truncating the Poisson distribution when the parameter is uniformly bounded from below. Therefore, under a low-rank multiparameter model, we propose an implementable approach guaranteed to achieve accurate regression in under-determined scenarios with substantial corruption by false zeros. Several numerical experiments are presented to explore the theoretical results.

More Details

TYPE Journal Article YEAR 2023

DOI OSTI Scopus DOI OSTI Scopus

Recent Improvements in CP Poisson Tensor Algorithms

Myers, Jeremy M.; Dunlavy, Daniel M.

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2023

DOI OSTI

Constrained Tucker Decompositions and Conservation Principles for Direct Numerical Simulation Data Compression

Dunlavy, Daniel M.; Phipps, Eric T.; Kolla, Hemanth; Shadid, John N.; Phillips, Edward

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2023

DOI OSTI

Computing Sparse Tensor Decompositions via Chapel and C++/MPI Interoperability without Intermediate I/O

Geronimo Anderson, Sean I.; Dunlavy, Daniel M.

Abstract not provided.

More Details

TYPE Conference Paper YEAR 2023

OSTI

Hybrid Methods for Tensor Decompositions that Leverage Stochastic and Deterministic Optimization

Myers, Jeremy M.; Dunlavy, Daniel M.

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2023

DOI OSTI

Low-Rank Tensor Decompositions with Nonlinear Constraints for Conserving Quantities of Interest in Numerical Simulation Data Modeling

Dunlavy, Daniel M.; Kolla, Hemanth; Phipps, Eric T.; Shadid, John N.; Phillips, Edward

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2023

DOI OSTI

Tensor Decompositions using Stochastic and Deterministic Optimization

Myers, Jeremy M.; Dunlavy, Daniel M.

Abstract not provided.

More Details

TYPE Conference Display YEAR 2023

DOI OSTI

Poisson-response Tensor-on-Tensor Regression

Llosa-Vite, Carlos; Dunlavy, Daniel M.

Abstract not provided.

More Details

TYPE Conference Display YEAR 2023

DOI OSTI

Low-Rank Models for Tensor Count Data Corrupted by False Zeros

Lopez, Oscar F.; Dunlavy, Daniel M.; Lehoucq, Rich

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2023

DOI OSTI

Conserving Quantities of Interest in Low-Rank Tensor Decompositions of Numerical Simulation Data

Dunlavy, Daniel M.; Kolla, Hemanth; Phipps, Eric T.

Abstract not provided.

More Details

TYPE Conference Display YEAR 2023

DOI OSTI

Chapel/C++ Interoperability: Enabling Large-Scale Data Science without Costly I/O

Dunlavy, Daniel M.; Geronimo Anderson, Sean I.

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2022

DOI OSTI

Generalized CP (GCP) Tensor Decompositions

Dunlavy, Daniel M.; Ballard, Grey; Kolda, Tamara G.

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2022

DOI OSTI

Zero-Truncated Poisson Regression for Multiway Count Data

Lopez, Oscar; Dunlavy, Daniel M.; Lehoucq, Rich

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2022

DOI OSTI

Low-Rank Tensor Decompositions for Large Sparse Count Data

Dunlavy, Daniel M.; Lehoucq, Rich; Lopez, Oscar

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2022

DOI OSTI

A Hybrid Method for Tensor Decompositions that Leverages Stochastic and Deterministic Optimization

Myers, Jeremy M.; Dunlavy, Daniel M.

In this paper, we propose a hybrid method that uses stochastic and deterministic search to compute the maximum likelihood estimator of a low-rank count tensor with Poisson loss via state-of-theart local methods. Our approach is inspired by Simulated Annealing for global optimization and allows for fine-grain parameter tuning as well as adaptive updates to algorithm parameters. We present numerical results that indicate our hybrid approach can compute better approximations to the maximum likelihood estimator with less computation than the state-of-the-art methods by themselves.

More Details

TYPE Other Report YEAR 2022

DOI OSTI

Cyclic GCP-CPAPR Hybrid

Myers, Jeremy M.; Dunlavy, Daniel M.

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2022

DOI OSTI

Zero-Truncated Poisson Tensor Decomposition for Sparse Count Data

Lopez, Oscar F.; Lehoucq, Rich; Dunlavy, Daniel M.

We propose a novel statistical inference paradigm for zero-inflated multiway count data that dispenses with the need to distinguish between true and false zero counts. Our approach ignores all zero entries and applies zero-truncated Poisson regression on the positive counts. Inference is accomplished via tensor completion that imposes low-rank structure on the Poisson parameter space. Our main result shows that an $\textit{N}$-way rank-R parametric tensor 𝓜 ϵ (0, ∞)^{$I$Χ∙∙∙Χ$I$} generating Poisson observations can be accurately estimated from approximately $IR^2 \text{log}^2_2(I)$ non-zero counts for a nonnegative canonical polyadic decomposition. Several numerical experiments are presented demonstrating that our zero-truncated paradigm is comparable to the ideal scenario where the locations of false zero counts are known $\textit{a priori}$.

More Details

TYPE Other Report YEAR 2022

DOI OSTI

Document Retrieval and Ranking using Similarity Graph Mean Hitting Times

Dunlavy, Daniel M.; Chew, Peter A.

We present a novel approach to information retrieval and document analysis based on graph analytic methods. Traditional information retrieval methods use a set of terms to define a query that is applied against a document corpus to identify the documents most related to those terms. In contrast, we define a query as a set of documents of interest and apply the query by computing mean hitting times between this set and all other documents on a document similarity graph abstraction of the semantic relationships between all pairs of documents. We present the steps of our approach along with a simple example application illustrating how this approach can be used to find documents related to two or more documents or topics of interest.

More Details

TYPE SAND Report YEAR 2021

DOI OSTI

Performance-Portable Sparse Tensor Decomposition Kernels on Emerging Parallel Architectures

Geronimo Anderson, Sean I.; Teranishi, Keita; Dunlavy, Daniel M.; Choi, Jee

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2021

DOI OSTI

Using Computation Effectively for Scalable Poisson Tensor Factorization: Comparing Methods Beyond Computational Efficiency

Myers, Jeremy M.; Dunlavy, Daniel M.

Abstract not provided.

More Details

TYPE Conference Paper YEAR 2021

DOI OSTI

Performance-Portable Sparse Tensor Decomposition Kernels on Emerging Parallel Architectures

Geronimo Anderson, Sean I.; Teranishi, Keita; Dunlavy, Daniel M.; Choi, Jee

Abstract not provided.

More Details

TYPE Conference Paper YEAR 2021

OSTI

Questa High Performance Data Analytics

Dunlavy, Daniel M.

Abstract not provided.

More Details

TYPE Presentation YEAR 2021

OSTI

Performance Portability of an SpMV Kernel Across Scientific Computing and Data Science Applications

2021 IEEE High Performance Extreme Computing Conference Hpec 2021

Olivier, Stephen L.; Ellingwood, Nathan D.; Berry, Jonathan; Dunlavy, Daniel M.

Both the data science and scientific computing communities are embracing GPU acceleration for their most demanding workloads. For scientific computing applications, the massive volume of code and diversity of hardware platforms at supercomputing centers has motivated a strong effort toward performance portability. This property of a program, denoting its ability to perform well on multiple architectures and varied datasets, is heavily dependent on the choice of parallel programming model and which features of the programming model are used. In this paper, we evaluate performance portability in the context of a data science workload in contrast to a scientific computing workload, evaluating the same sparse matrix kernel on both. Among our implementations of the kernel in different performance-portable programming models, we find that many struggle to consistently achieve performance improvements using the GPU compared to simple one-line OpenMP parallelization on high-end multicore CPUs. We show one that does, and its performance approaches and sometimes even matches that of vendor-provided GPU math libraries.

More Details

TYPE Conference Presentation YEAR 2021

DOI OSTI Scopus

Performance Portability of an SpMV Kernel Across Scientific Computing and Data Science Applications

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Olivier, Stephen L.; Ellingwood, Nathan D.; Berry, Jonathan; Dunlavy, Daniel M.

Both the data science and scientific computing communities are embracing GPU acceleration for their most demanding workloads. For scientific computing applications, the massive volume of code and diversity of hardware platforms at supercomputing centers has motivated a strong effort toward performance portability. This property of a program, denoting its ability to perform well on multiple architectures and varied datasets, is heavily dependent on the choice of parallel programming model and which features of the programming model are used. In this paper, we evaluate performance portability in the context of a data science workload in contrast to a scientific computing workload, evaluating the same sparse matrix kernel on both. Among our implementations of the kernel in different performance-portable programming models, we find that many struggle to consistently achieve performance improvements using the GPU compared to simple one-line OpenMP parallelization on high-end multicore CPUs. We show one that does, and its performance approaches and sometimes even matches that of vendor-provided GPU math libraries.

More Details

TYPE Conference Paper YEAR 2021

DOI OSTI Scopus

Using Computation Effectively for Scalable Poisson Tensor Factorization: Comparing Methods beyond Computational Efficiency

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Myers, Jeremy M.; Dunlavy, Daniel M.

Poisson Tensor Factorization (PTF) is an important data analysis method for analyzing patterns and relationships in multiway count data. In this work, we consider several algorithms for computing a low-rank PTF of tensors with sparse count data values via maximum likelihood estimation. Such an approach reduces to solving a nonlinear, non-convex optimization problem, which can leverage considerable parallel computation due to the structure of the problem. However, since the maximum likelihood estimator corresponds to the global minimizer of this optimization problem, it is important to consider how effective methods are at both leveraging this inherent parallelism as well as computing a good approximation to the global minimizer. In this work we present comparisons of multiple methods for PTF that illustrate the tradeoffs in computational efficiency and accurately computing the maximum likelihood estimator. We present results using synthetic and real-world data tensors to demonstrate some of the challenges when choosing a method for a given tensor.

More Details

TYPE Conference Presentation YEAR 2021

DOI OSTI Scopus

Parameter Sensitivity Analysis of the SparTen High Performance Sparse Tensor Decomposition Software (Extended Analysis)

Myers, Jeremy M.; Dunlavy, Daniel M.; Teranishi, Keita; Hollman, David S.

Tensor decomposition models play an increasingly important role in modern data science applications. One problem of particular interest is fitting a low-rank Canonical Polyadic (CP) tensor decomposition model when the tensor has sparse structure and the tensor elements are nonnegative count data. SparTen is a high-performance C++ library which computes a low-rank decomposition using different solvers: a first-order quasi-Newton or a second-order damped Newton method, along with the appropriate choice of runtime parameters. Since default parameters in SparTen are tuned to experimental results in prior published work on a single real-world dataset conducted using MATLAB implementations of these methods, it remains unclear if the parameter defaults in SparTen are appropriate for general tensor data. Furthermore, it is unknown how sensitive algorithm convergence is to changes in the input parameter values. This report addresses these unresolved issues with large-scale experimentation on three benchmark tensor data sets. Experiments were conducted on several different CPU architectures and replicated with many initial states to establish generalized profiles of algorithm convergence behavior.

More Details

TYPE Other Report YEAR 2020

DOI OSTI

Using Neural Architecture Search for Improving Software Flaw Detection in Multimodal Deep Learning Models

Cooper, Alexis; Zhou, Xin; Heidbrink, Scott; Dunlavy, Daniel M.

Software flaw detection using multimodal deep learning models has been demonstrated as a very competitive approach on benchmark problems. In this work, we demonstrate that even better performance can be achieved using neural architecture search (NAS) combined with multimodal learning models. We adapt a NAS framework aimed at investigating image classification to the problem of software flaw detection and demonstrate improved results on the Juliet Test Suite, a popular benchmarking data set for measuring performance of machine learning models in this problem domain.

More Details

TYPE Other Report YEAR 2020

DOI OSTI

SparTen: Leveraging Kokkos for On-node Parallelism in a Second-Order Method for Fitting Canonical Polyadic Tensor Models to Poisson Data

2020 IEEE High Performance Extreme Computing Conference, HPEC 2020

Teranishi, Keita; Dunlavy, Daniel M.; Myers, Jeremy M.; Barrett, Richard F.

Canonical Polyadic tensor decomposition using alternate Poisson regression (CP-APR) is an effective analysis tool for large sparse count datasets. One of the variants using projected damped Newton optimization for row subproblems (PDNR) offers quadratic convergence and is amenable to parallelization. Despite its potential effectiveness, PDNR performance on modern high performance computing (HPC) systems is not well understood. To remedy this, we have developed a parallel implementation of PDNR using Kokkos, a performance portable parallel programming framework supporting efficient runtime of a single code base on multiple HPC systems. We demonstrate that the performance of parallel PDNR can be poor if load imbalance associated with the irregular distribution of nonzero entries in the tensor data is not addressed. Preliminary results using tensors from the FROSTT data set indicate that using multiple kernels to address this imbalance when solving the PDNR row subproblems in parallel can improve performance, with up to 80% speedup on CPUs and 10-fold speedup on NVIDIA GPUs.

More Details

TYPE Conference Poster YEAR 2020

OSTI Scopus

Parameter Sensitivity Analysis of the SparTen High Performance Sparse Tensor Decomposition Software

2020 IEEE High Performance Extreme Computing Conference, HPEC 2020

Myers, Jeremy M.; Dunlavy, Daniel M.; Teranishi, Keita; Hollman, David S.

Tensor decomposition models play an increasingly important role in modern data science applications. One problem of particular interest is fitting a low-rank Canonical Polyadic (CP) tensor decomposition model when the tensor has sparse structure and the tensor elements are nonnegative count data. SparTen is a high-performance C++ library which computes a low-rank decomposition using different solvers: a first-order quasi-Newton or a second-order damped Newton method, along with the appropriate choice of runtime parameters. Since default parameters in SparTen are tuned to experimental results in prior published work on a single real-world dataset conducted using MATLAB implementations of these methods, it remains unclear if the parameter defaults in SparTen are appropriate for general tensor data. Furthermore, it is unknown how sensitive algorithm convergence is to changes in the input parameter values. This report addresses these unresolved issues with large-scale experimentation on three benchmark tensor data sets. Experiments were conducted on several different CPU architectures and replicated with many initial states to establish generalized profiles of algorithm convergence behavior.

More Details

TYPE Conference Poster YEAR 2020

OSTI Scopus

Joint Analysis of Program Data Representations using Machine Learning for Improved Software Assurance and Development Capabilities

Heidbrink, Scott; Rodhouse, Kathryn N.; Dunlavy, Daniel M.; Cooper, Alexis; Zhou, Xin

We explore the use of multiple deep learning models for detecting flaws in software programs. Current, standard approaches for flaw detection rely on a single representation of a software program (e.g., source code or a program binary). We illustrate that, by using techniques from multimodal deep learning, we can simultaneously leverage multiple representations of software programs to improve flaw detection over single representation analyses. Specifically, we adapt three deep learning models from the multimodal learning literature for use in flaw detection and demonstrate how these models outperform traditional deep learning models. We present results on detecting software flaws using the Juliet Test Suite and Linux Kernel.

More Details

TYPE SAND Report YEAR 2020

DOI OSTI

Multimodal Deep Learning for Flaw Detection in Software Programs

Heidbrink, Scott; Rodhouse, Kathryn N.; Dunlavy, Daniel M.

We explore the use of multiple deep learning models for detecting flaws in software programs. Current, standard approaches for flaw detection rely on a single representation of a software program (e.g., source code or a program binary). We illustrate that, by using techniques from multimodal deep learning, we can simultaneously leverage multiple representations of software programs to improve flaw detection over single representation analyses. Specifically, we adapt three deep learning models from the multimodal learning literature for use in flaw detection and demonstrate how these models outperform traditional deep learning models. We present results on detecting software flaws using the Juliet Test Suite and Linux Kernel.

More Details

TYPE Other Report YEAR 2020

DOI OSTI

Using Neural Architecture Search for Improving Software Flaw Detection in Multimodal Deep Learning Models

Cooper, Alexis; Zhou, Xin; Dunlavy, Daniel M.; Heidbrink, Scott

Software flaw detection using multimodal deep learning models has been demonstrated as a very competitive approach on benchmark problems. In this work, we demonstrate that even better performance can be achieved using neural architecture search (NAS) combined with multimodal learning models. We adapt a NAS framework aimed at investigating image classification to the problem of software flaw detection and demonstrate improved results on the Juliet Test Suite, a popular benchmarking data set for measuring performance of machine learning models in this problem domain.

More Details

TYPE SAND Report YEAR 2020

DOI OSTI

Multimodal Deep Learning for Flaw Detection in Software Programs

Heidbrink, Scott; Dunlavy, Daniel M.; Rodhouse, Kathryn N.

Abstract not provided.

More Details

TYPE Presentation YEAR 2020

OSTI

Tensor Decompositions for Analyzing Multi-Way Data

Dunlavy, Daniel M.

Abstract not provided.

More Details

TYPE Presentation YEAR 2020

OSTI

SparTen: Leveraging Kokkos for On-node Parallelism in a Second-Order Method for Fitting Canonical Polyadic Tensor Models to Poisson Data

Teranishi, Keita; Dunlavy, Daniel M.; Myers, Jeremy M.; Barrett, Richard F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2020

OSTI

TOPIC MODELING WITH NATURAL LANGUAGE PROCESSING FOR IDENTIFICATION OF NUCLEAR PROLIFERATION-RELEVANT SCIENTIFIC AND TECHNICAL PUBLICATIONS

Bisila, Jonathan; Dunlavy, Daniel M.; Gastelum, Zoe N.; Ulmer, Craig

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2020

OSTI

Parameter Sensitivity Analysis of the SparTen High Performance Sparse Tensor Decomposition Software

Myers, Jeremy M.; Dunlavy, Daniel M.; Teranishi, Keita; Hollman, David S.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2020

OSTI

TOPIC MODELING WITH NATURAL LANGUAGE PROCESSING FOR IDENTIFICATION OF NUCLEAR PROLIFERATION-RELEVANT SCIENTIFIC AND TECHNICAL PUBLICATIONS

Bisila, Jonathan; Dunlavy, Daniel M.; Gastelum, Zoe N.; Ulmer, Craig

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2020

OSTI

Detecting Flaws in Software Programs using Multimodal Deep Learning

Heidbrink, Scott; Rodhouse, Kathryn N.; Dunlavy, Daniel M.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2020

OSTI

Performance and Parallelization of CP-Alternate Poisson Regression Sparse Tensor Decomposition

Teranishi, Keita; Hollman, David S.; Myers, Jeremy M.; Dunlavy, Daniel M.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2020

OSTI

Natural Language Processing for Topic Identification Supporting Document Search and Identification for Nuclear Proliferation Detection

Bisila, Jonathan; Dunlavy, Daniel M.; Gastelum, Zoe N.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2020

OSTI

Modeling Complex Relationships in Large-Scale Data using Hypergraphs (LDRD Final Report)

Dunlavy, Daniel M.; Wang, Fulton; Wolf, Michael; Ellingwood, Nathan D.

This SAND report documents the findings of the LDRD project, "Modeling Complex Relationships in Large-Scale Data using Hypergraphs". The project ran from October 2017 through September 2019. The focus of the project was the development and application of hypergraph data analytics to Sandia relational data applications. In this project, we attempted to apply a hypergraph data analysis method—specifically, hypergraph eigenvector centrality—to Sandia mission problems to identify influential entities (people, location, times, etc.) in the data. Unfortunately, the application data led to graph and hypergraph representations containing disconnected components. To date, there are no well-established techniques for applying eigenvector centrality to such graphs and hypergraphs. In this report, we present several heuristics for computing eigenvector centrality for disconnected graphs. We believe this is an important start to understanding how to approach the similar problem for hypergraphs, but this project concluded before we made progress on that problem. The ideas, methods, and suggestions presented here can be used for further research into this challenging problem. We also present our ideas for generating graphs with known degree and centrality distributions. The goal in presenting this work is to identify a procedure for analyzing such graphs once the problem of addressing disconnected components has been addressed. When working with a single data set, this generator can be used to create many instances of graphs that can be used to analyze the robustness of the centrality computations for the original data set. Although the results did not match perfectly in the case of the Facebook Ego dataset used in the experiments presented here, this again represents a good start in the direction of a graph generator for such problems. We note that there are potential trade-offs between how the degree and centrality distributions are fit to the original data and suggested several possible avenues for follow-on research efforts.

More Details

TYPE SAND Report YEAR 2019

DOI OSTI