Publications

45 Results

MCS+: An Efficient Algorithm for Crawling the Community Structure in Multiplex Networks

ACM Transactions on Knowledge Discovery from Data

Laishram, Ricky; Wendt, Jeremy D.; Soundarajan, Sucheta

In this article, we consider the problem of crawling a multiplex network to identify the community structure of a layer-of-interest. A multiplex network is one where there are multiple types of relationships between the nodes. In many multiplex networks, some layers might be easier to explore (in terms of time, money, etc.). We propose MCS+, an algorithm that can use the information from the easier-to-explore layers to help in the exploration of a layer-of-interest that is expensive to explore. We consider the goal of exploration to be generating a sample that is representative of the communities in the complete layer-of-interest. This work has practical applications in areas such as exploration of dark (e.g., criminal) networks, online social networks, biological networks, and so on. For example, in a terrorist network, relationships such as phone records, e-mail records, and so on are easier to collect; in contrast, data on face-to-face communications are much harder to collect, but also potentially more valuable. We perform extensive experimental evaluations on real-world networks, and we observe that MCS+ consistently outperforms the best baseline: the similarity of the sample that MCS+ generates to the real network is up to three times that of the best baseline in some networks. We also perform theoretical and experimental evaluations of the scalability of MCS+ with respect to network properties, and find that it scales well with the budget, the number of layers in the multiplex network, and the average degree in the original network.
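
A minimal sketch of the crawling setup described in this abstract (not the MCS+ algorithm itself): a toy crawler spends a query budget on the expensive layer-of-interest, seeding its frontier with nodes discovered by first querying a cheaper layer. The layer contents, seeds, and budget split below are illustrative assumptions.

    import random

    # Toy multiplex network: two layers over the same node set (illustrative data).
    cheap_layer = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
    expensive_layer = {1: [2], 2: [1, 3], 3: [2, 5], 4: [5], 5: [3, 4]}

    def crawl_layer(layer, seeds, budget):
        """Breadth-first crawl of one layer, spending one unit of budget per query."""
        observed_edges, queried, frontier = set(), set(), list(seeds)
        while frontier and budget > 0:
            node = frontier.pop(0)
            if node in queried:
                continue
            queried.add(node)
            budget -= 1
            for nbr in layer.get(node, []):
                observed_edges.add(tuple(sorted((node, nbr))))
                if nbr not in queried:
                    frontier.append(nbr)
        return observed_edges, queried

    # Spend a small budget on the cheap layer to find promising seeds,
    # then use those seeds to crawl the expensive layer-of-interest.
    _, cheap_nodes = crawl_layer(cheap_layer, seeds=[1], budget=3)
    expensive_sample, _ = crawl_layer(expensive_layer, seeds=sorted(cheap_nodes), budget=4)
    print("sampled expensive-layer edges:", sorted(expensive_sample))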

NetProtect: Network Perturbations to Protect Nodes against Entry-Point Attack

ACM International Conference Proceeding Series

Laishram, Ricky; Hozhabrierdi, Pegah; Wendt, Jeremy D.; Soundarajan, Sucheta

In many network applications, it may be desirable to conceal certain target nodes from detection by a data collector, who is using a crawling algorithm to explore a network. For example, in a computer network, the network administrator may wish to protect those computers (target nodes) with sensitive information from discovery by a hacker who has exploited vulnerable machines and entered the network. These networks are often protected by hiding the machines (nodes) from external access and allowing only fixed entry points into the system (protection against external attacks). However, in this protection scheme, once one of the entry points is breached, the safety of all internal machines is jeopardized (i.e., the external attack turns into an internal attack). In this paper, we view this problem from the perspective of the data protector. We propose the Node Protection Problem: given a network with known entry points, which edges should be removed or added so as to protect as many target nodes from the data collector as possible? A trivial way to solve this problem would be to simply disconnect either the entry points or the target nodes, but that would make the network non-functional. Accordingly, we impose certain constraints: for each node, only a (1 - r) fraction of its edges can be removed, and the resulting network must not be disconnected. We propose two novel scoring mechanisms: the Frequent Path Score and the Shortest Path Score. Using these scores, we propose NetProtect, an algorithm that selects edges to be removed or added so as to best impede the progress of the data collector. We show experimentally that NetProtect outperforms baseline node protection algorithms across several real-world networks. In some datasets, with 1% of the edges removed by NetProtect, the data collector requires up to 6 (4) times the budget of the next best baseline in order to discover 5 (50) target nodes.
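
A rough illustration of the shortest-path intuition behind such scores (a simplified stand-in, not the paper's Frequent Path Score or Shortest Path Score): rank each edge by how many entry-point-to-target shortest paths pass through it, then consider removing the highest-scoring edges subject to the constraints. The graph, entry points, and targets below are invented for the example.

    from collections import deque, Counter

    # Toy undirected network (adjacency lists), with one entry point and one target.
    graph = {"entry": ["a", "b"], "a": ["entry", "c"], "b": ["entry", "c"],
             "c": ["a", "b", "target"], "target": ["c"]}
    entry_points, targets = ["entry"], ["target"]

    def shortest_path(graph, src, dst):
        """BFS shortest path from src to dst; returns a list of nodes or None."""
        parents, queue = {src: None}, deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while node is not None:
                    path.append(node)
                    node = parents[node]
                return path[::-1]
            for nbr in graph[node]:
                if nbr not in parents:
                    parents[nbr] = node
                    queue.append(nbr)
        return None

    # Score each edge by how many entry->target shortest paths use it.
    edge_score = Counter()
    for e in entry_points:
        for t in targets:
            path = shortest_path(graph, e, t)
            if path:
                for u, v in zip(path, path[1:]):
                    edge_score[tuple(sorted((u, v)))] += 1

    # Candidate edges to remove, highest score first (per-node and connectivity
    # constraints are omitted in this toy version).
    print(edge_score.most_common())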

Crawling the community structure of multiplex networks

33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019

Laishram, Ricky; Wendt, Jeremy D.; Soundarajan, Sucheta

We examine the problem of crawling the community structure of a multiplex network containing multiple layers of edge relationships. While there has been a great deal of work examining community structure in general, and some work on the problem of sampling a network to preserve its community structure, to the best of our knowledge this is the first work to consider this problem on multiplex networks. We consider the specific case in which the layers of a multiplex network have different query (collection) costs and reliabilities, and a data collector is interested in identifying the community structure of the most expensive layer. We propose MultiComSample (MCS), a novel algorithm for crawling a multiplex network. MCS uses multiple levels of multi-armed bandits to determine the best layers, communities, and node roles for selecting nodes to query. We test MCS against six baseline algorithms on real-world multiplex networks and achieve large gains in performance. For example, after consuming a budget equivalent to sampling 20% of the nodes in the expensive layer, we observe that MCS outperforms the best baseline by up to 49%.
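
The multi-armed bandit idea can be illustrated with a much simpler epsilon-greedy bandit that chooses which layer to query next based on the reward each layer has yielded so far. This is not MCS's multi-level construction; the layer names, simulated rewards, and epsilon value are illustrative assumptions.

    import random

    random.seed(0)
    layers = ["cheap_layer_1", "cheap_layer_2", "expensive_layer"]
    pulls = {name: 0 for name in layers}
    total_reward = {name: 0.0 for name in layers}

    def simulated_reward(layer):
        """Stand-in for 'how much new community information did this query yield?'."""
        base = {"cheap_layer_1": 0.2, "cheap_layer_2": 0.4, "expensive_layer": 0.7}
        return base[layer] + random.uniform(-0.1, 0.1)

    def choose_layer(epsilon=0.1):
        """Epsilon-greedy: usually pick the best average reward, sometimes explore."""
        if random.random() < epsilon or all(n == 0 for n in pulls.values()):
            return random.choice(layers)
        return max(layers, key=lambda l: total_reward[l] / max(pulls[l], 1))

    for _ in range(50):  # budget of 50 queries
        layer = choose_layer()
        pulls[layer] += 1
        total_reward[layer] += simulated_reward(layer)

    print({l: round(total_reward[l] / max(pulls[l], 1), 2) for l in layers}, pulls)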

Efficient transfer learning for neural network language models

Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2018

Skryzalin, Jacek S.; Link, Hamilton E.; Wendt, Jeremy D.; Field, Richard V.; Richter, Samuel N.

We apply transfer learning techniques to create topically and/or stylistically biased natural language models from small data samples, given generic long short-term memory (LSTM) language models trained on larger data sets. Although LSTM language models are powerful tools with wide-ranging applications, they require enormous amounts of data and time to train. Thus, we build general purpose language models that take advantage of large standing corpora and computational resources proactively, allowing us to build more specialized analytical tools from smaller data sets on demand. We show that it is possible to construct a language model from a small, focused corpus by first training an LSTM language model on a large corpus (e.g., the text from English Wikipedia) and then retraining only the internal transition model parameters on the smaller corpus. We also show that a single general language model can be reused through transfer learning to create many distinct special purpose language models quickly with modest amounts of data.
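
A minimal PyTorch sketch of the retraining strategy described above, assuming (as an illustration, not the paper's exact architecture or settings) a simple embedding, LSTM, and output-projection model: the embedding and output layers are frozen and only the internal recurrent transition parameters are updated on the small corpus.

    import torch
    import torch.nn as nn

    class LSTMLanguageModel(nn.Module):
        def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):
            hidden, _ = self.lstm(self.embed(tokens))
            return self.out(hidden)

    model = LSTMLanguageModel()
    # In practice, load weights pretrained on the large generic corpus here.

    # Freeze everything except the internal LSTM transition parameters.
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("lstm.")

    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    # One retraining step on a (toy) batch from the small, focused corpus.
    tokens = torch.randint(0, 10000, (8, 20))          # batch of token ids
    optimizer.zero_grad()
    logits = model(tokens[:, :-1])                     # predict the next token
    loss = loss_fn(logits.reshape(-1, 10000), tokens[:, 1:].reshape(-1))
    loss.backward()
    optimizer.step()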

An Example of Counter-Adversarial Community Detection Analysis

Kegelmeyer, William P.; Wendt, Jeremy D.; Pinar, Ali P.

Community detection is often used to understand the nature of a network. However, there may exist an adversarial member of the network who wishes to evade that understanding. We analyze one such specific situation, quantifying the efficacy of certain attacks against a particular analytic use of community detection and providing a preliminary assessment of a possible defense.

A dynamic model for social networks

Field, Richard V.; Link, Hamilton E.; Skryzalin, Jacek S.; Wendt, Jeremy D.

Social network graph models are data structures representing entities (often people, corporations, or accounts) as "vertices" and their interactions as "edges" between pairs of vertices. These graphs are most often total-graph models -- the overall structure of edges and vertices in a bidirectional or directional graph is described in global terms and the network is generated algorithmically. We are interested in "egocentric" or "agent-based" models of social networks, where the behavior of the individual participants is described and the graph itself is an emergent phenomenon. Our hope is that such graph models will allow us to ultimately reason from observations back to estimated properties of the individuals and populations, and result in not only more accurate algorithms for link prediction and friend recommendation, but also a more intuitive understanding of human behavior in such systems than is revealed by previous approaches. This report documents our preliminary work in this area; we describe several past graph models, two egocentric models of our own design, and our thoughts about the future direction of this research.
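
A toy sketch of the egocentric idea: each agent follows a simple local rule (here, occasionally befriending a friend-of-a-friend) and the global graph emerges from those choices. The rule and parameters are invented for illustration and are not one of the report's models.

    import random

    random.seed(1)
    agents = list(range(20))
    friends = {a: set() for a in agents}

    def step():
        """Each agent acts locally: meet a stranger or befriend a friend-of-a-friend."""
        for a in agents:
            candidates = {f2 for f in friends[a] for f2 in friends[f]} - friends[a] - {a}
            if candidates and random.random() < 0.7:
                b = random.choice(sorted(candidates))            # triadic closure
            else:
                b = random.choice([x for x in agents if x != a])  # random encounter
            friends[a].add(b)
            friends[b].add(a)

    for _ in range(10):
        step()

    # The emergent graph: a degree distribution produced only by local decisions.
    print(sorted(len(friends[a]) for a in agents))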

Estimating users’ mode transition functions and activity levels from social media

Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017

Link, Hamilton E.; Wendt, Jeremy D.; Field, Richard V.; Marthe, Jocelyn

We present a temporal model of individual-scale social media user behavior, comprising modal activity levels and mode switching patterns. We show that this model can be effectively and easily learned from available social media data, and that our model is sufficiently flexible to capture diverse users’ daily activity patterns. In applications such as electric power load prediction, computer network traffic analysis, disease spread modeling, and disease outbreak forecasting, it is useful to have a model of individual-scale patterns of human behavior. Our user model is intended to be suitable for integration into such population models, for future applications of prediction, change detection, or agent-based simulation.
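
A minimal sketch in the spirit of the model described above, assuming (purely for illustration) two modes, "active" and "quiet", a Markov transition matrix between them, and a mode-dependent Poisson posting rate; the modes, rates, and probabilities are invented, not learned from data.

    import math
    import random

    random.seed(0)

    # Illustrative two-mode model: transition probabilities and hourly posting rates.
    transition = {"active": {"active": 0.8, "quiet": 0.2},
                  "quiet":  {"active": 0.3, "quiet": 0.7}}
    post_rate = {"active": 4.0, "quiet": 0.3}   # expected posts per hour

    def sample_poisson(lam):
        """Knuth's method for a Poisson draw (avoids external dependencies)."""
        threshold, k, p = math.exp(-lam), 0, 1.0
        while True:
            k += 1
            p *= random.random()
            if p <= threshold:
                return k - 1

    mode, activity = "quiet", []
    for hour in range(24):
        r, cumulative = random.random(), 0.0
        for next_mode, prob in transition[mode].items():
            cumulative += prob
            if r < cumulative:
                mode = next_mode
                break
        activity.append(sample_poisson(post_rate[mode]))

    print(activity)   # simulated posts per hour over one day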

Data Inferencing on Semantic Graphs (DISeG) Final Report

Wendt, Jeremy D.; Quach, Tu-Thach Q.; Zage, David J.; Field, Richard V.; Wells, Randall W.; Soundarajan, Sucheta S.; Cruz, Gerardo C.

The Data Inferencing on Semantic Graphs project (DISeG) was a two-year investigation of applying inferencing techniques (focusing on belief propagation) to social graphs, with a focus on semantic graphs (also called multi-layer graphs). While working this problem, we developed a new directed version of inferencing we call Directed Propagation (Chapters 2 and 4) and identified new semantic graph sampling problems (Chapter 3).

Workshop on Incomplete Network Data Held at Sandia National Labs – Livermore

Soundarajan, Sucheta S.; Wendt, Jeremy D.

While network analysis is applied in a broad variety of scientific fields (including physics, computer science, biology, and the social sciences), how networks are constructed and the resulting bias and incompleteness have drawn more limited attention. For example, in biology, gene networks are typically developed via experiment -- many actual interactions are likely yet to be discovered. In addition to this incompleteness, the data-collection processes can introduce significant bias into the observed network datasets. For instance, if you observe part of the World Wide Web network through a classic random walk, then high-degree nodes are more likely to be found than if you had selected nodes at random. Unfortunately, such incomplete and biasing data collection methods must often be used.
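
The random-walk bias mentioned above is easy to see in a small simulation (the graph and walk length below are arbitrary choices for illustration): a random walk visits nodes roughly in proportion to their degree, so high-degree nodes dominate the observed sample, while uniform node selection does not share this bias.

    import random
    from collections import Counter

    random.seed(0)

    # A small hub-and-spoke graph: node 0 has high degree, the rest have degree 1 or 2.
    graph = {0: [1, 2, 3, 4, 5], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0], 5: [0]}

    def random_walk(graph, start, steps):
        visits, node = Counter(), start
        for _ in range(steps):
            visits[node] += 1
            node = random.choice(graph[node])
        return visits

    visits = random_walk(graph, start=3, steps=10000)
    uniform = Counter(random.choice(list(graph)) for _ in range(10000))
    print("random walk:", dict(visits))      # the hub (node 0) is heavily over-represented
    print("uniform sample:", dict(uniform))  # all nodes appear about equally often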

A diffusion model for maximizing influence spread in large networks

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Quach, Tu-Thach Q.; Wendt, Jeremy D.

Influence spread is an important phenomenon that occurs in many social networks. Influence maximization is the corresponding problem of finding the most influential nodes in these networks. In this paper, we present a new influence diffusion model, based on pairwise factor graphs, that captures dependencies and directions of influence among neighboring nodes. We use an augmented belief propagation algorithm to efficiently compute influence spread on this model so that the direction of influence is preserved. Due to its simplicity, the model can be used on large graphs with high-degree nodes, making the influence maximization problem practical on large, real-world graphs. Using large Flixster and Epinions datasets, we provide experimental results showing that our model predictions match well with ground-truth influence spreads, far better than other techniques. Furthermore, we show that the influential nodes identified by our model achieve significantly higher influence spread compared to other popular models. The model parameters can easily be learned from basic, readily available training data. In the absence of training, our approach can still be used to identify influential seed nodes.
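
For context, here is a greedy seed-selection loop under the standard independent cascade model; this is explicitly not the pairwise factor-graph model proposed in the paper, and the Monte Carlo scoring it relies on is exactly the expense that a belief-propagation approach aims to avoid. The graph and activation probability are illustrative.

    import random

    random.seed(0)
    graph = {0: [1, 2], 1: [2, 3], 2: [3, 4], 3: [4], 4: [0]}
    P = 0.3   # illustrative activation probability on every edge

    def simulate_spread(graph, seeds, trials=500):
        """Average number of activated nodes under independent cascade."""
        total = 0
        for _ in range(trials):
            active, frontier = set(seeds), list(seeds)
            while frontier:
                node = frontier.pop()
                for nbr in graph[node]:
                    if nbr not in active and random.random() < P:
                        active.add(nbr)
                        frontier.append(nbr)
            total += len(active)
        return total / trials

    def greedy_seeds(graph, k):
        """Repeatedly add the node that most increases the simulated spread."""
        seeds = []
        for _ in range(k):
            best = max((n for n in graph if n not in seeds),
                       key=lambda n: simulate_spread(graph, seeds + [n]))
            seeds.append(best)
        return seeds

    print(greedy_seeds(graph, k=2))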

Benchmarking Adiabatic Quantum Optimization for Complex Network Analysis

Parekh, Ojas D.; Wendt, Jeremy D.; Shulenburger, Luke N.; Landahl, Andrew J.; Moussa, Jonathan E.; Aidun, John B.

We lay the foundation for a benchmarking methodology for assessing current and future quantum computers. We pose and begin addressing fundamental questions about how to fairly compare computational devices at vastly different stages of technological maturity. We critically evaluate and offer our own contributions to current quantum benchmarking efforts, in particular those involving adiabatic quantum computation and the Adiabatic Quantum Optimizers produced by D-Wave Systems, Inc. We find that the performance of D-Wave's Adiabatic Quantum Optimizers scales roughly on par with classical approaches for some hard combinatorial optimization problems; however, architectural limitations of D-Wave devices present a significant hurdle in evaluating real-world applications. In addition to identifying and isolating such limitations, we develop algorithmic tools for circumventing these limitations on future D-Wave devices, assuming they continue to grow and mature at an exponential rate for the next several years.

Incremental learning for automated knowledge capture

Davis, Warren L.; Dixon, Kevin R.; Martin, Nathaniel M.; Wendt, Jeremy D.

People responding to high-consequence national-security situations need tools to help them make the right decision quickly. The dynamic, time-critical, and ever-changing nature of these situations, especially those involving an adversary, require models of decision support that can dynamically react as a situation unfolds and changes. Automated knowledge capture is a key part of creating individualized models of decision making in many situations because it has been demonstrated as a very robust way to populate computational models of cognition. However, existing automated knowledge capture techniques only populate a knowledge model with data prior to its use, after which the knowledge model is static and unchanging. In contrast, humans, including our national-security adversaries, continually learn, adapt, and create new knowledge as they make decisions and witness their effects. This artificial dichotomy between creation and use exists because the majority of automated knowledge capture techniques are based on traditional batch machine-learning and statistical algorithms. These algorithms are primarily designed to optimize the accuracy of their predictions and only secondarily, if at all, concerned with issues such as speed, memory use, or ability to be incrementally updated. Thus, when new data arrives, batch algorithms used for automated knowledge capture currently require significant recomputation, frequently from scratch, which makes them ill-suited for use in dynamic, time-critical, high-consequence decision making environments. In this work we seek to explore and expand upon the capabilities of dynamic, incremental models that can adapt to an ever-changing feature space.
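
The batch-versus-incremental distinction drawn above can be made concrete with scikit-learn's partial_fit interface, which updates a model from each new batch of data without retraining from scratch. The synthetic data and choice of classifier are only an example, not the project's models.

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    model = SGDClassifier()   # a linear classifier that supports incremental updates

    # Data arrives over time; each batch updates the existing model in place,
    # rather than recomputing over all data seen so far.
    for batch in range(5):
        X = rng.normal(size=(100, 4))
        y = (X[:, 0] + X[:, 1] > 0).astype(int)
        if batch == 0:
            model.partial_fit(X, y, classes=[0, 1])  # classes required on the first call
        else:
            model.partial_fit(X, y)

    print(model.predict(rng.normal(size=(5, 4))))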

Evaluating Near-Term Adiabatic Quantum Computing

Parekh, Ojas D.; Aidun, John B.; Dubicka, Irene D.; Landahl, Andrew J.; Shulenburger, Luke N.; Tigges, Chris P.; Wendt, Jeremy D.

This report summarizes the first year’s effort on the Enceladus project, under which Sandia was asked to evaluate the potential advantages of adiabatic quantum computing for analyzing large data sets in the near future, 5-to-10 years from now. We were not specifically evaluating the machine being sold by D-Wave Systems, Inc.; we were asked to anticipate what future adiabatic quantum computers might be able to achieve. While realizing that the greatest potential anticipated from quantum computation is still far into the future, a special purpose quantum computing capability, Adiabatic Quantum Optimization (AQO), is under active development and is maturing relatively rapidly; indeed, D-Wave Systems Inc. already offers an AQO device based on superconducting flux qubits. The AQO architecture solves a particular class of problem, namely unconstrained quadratic Boolean optimization. Problems in this class include many interesting and important instances. Because of this, further investigation is warranted into the range of applicability of this class of problem for addressing challenges of analyzing big data sets and the effectiveness of AQO devices at performing specific analyses on big data. Further, it is of interest to also consider the potential effectiveness of anticipated special purpose adiabatic quantum computers (AQCs), in general, for accelerating the analysis of big data sets. The objective of the present investigation is an evaluation of the potential of AQC to benefit analysis of big data problems in the next five to ten years, with our main focus being on AQO because of its relative maturity. We are not specifically assessing the efficacy of the D-Wave computing systems, though we do hope to perform some experimental calculations on that device in the sequel to this project, at least to provide some data to compare with our theoretical estimates.
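
The problem class mentioned above, unconstrained quadratic Boolean optimization (QUBO), asks for a binary vector x minimizing x^T Q x. A brute-force solver on a tiny made-up instance shows the form of the problem an AQO device is built to sample (the matrix Q below is illustrative only).

    import itertools

    # Illustrative 3-variable QUBO matrix (upper triangular by convention).
    Q = [[-1,  2,  0],
         [ 0, -1,  2],
         [ 0,  0, -1]]

    def qubo_value(x, Q):
        """Objective x^T Q x for a binary vector x."""
        return sum(Q[i][j] * x[i] * x[j] for i in range(len(x)) for j in range(len(x)))

    # Exhaustive search over all binary assignments; only feasible for tiny instances,
    # which is why special-purpose hardware for this class is of interest.
    best = min(itertools.product([0, 1], repeat=3), key=lambda x: qubo_value(x, Q))
    print(best, qubo_value(best, Q))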

Omen: identifying potential spear-phishing targets before the email is sent

Wendt, Jeremy D.

We present the results of a two-year project focused on a common social engineering attack method called "spear phishing". In a spear phishing attack, the user receives an email with information specifically focused on the user. This email contains either a malware-laced attachment or a link to download malware that has been disguised as a useful program. Spear phishing attacks have been one of the most effective avenues for attackers to gain initial entry into a target network. This project focused on a proactive approach to spear phishing. To create an effective, user-specific spear phishing email, the attacker must research the intended recipient. We believe that much of the information used by the attacker is provided by the target organization's own external website. Thus, when researching potential targets, the attacker leaves signs of this research in the web server's logs. We created tools and visualizations to improve cybersecurity analysts' ability to quickly understand a visitor's visit patterns and interests. With these log-parsing tools, analysts can more quickly identify truly suspicious visitors, search for potential spear-phishing targets, and improve security around those users before the spear phishing email is sent.
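
A toy version of the kind of log summarization described above (the Apache-style log lines, field positions, and the notion of "people-related pages" are assumptions for the example, not the project's tools): group requests by visitor IP and count how often each visitor hits pages about people, which is the sort of pattern an analyst might review.

    from collections import defaultdict

    # Toy web-server log lines, invented for the example.
    log_lines = [
        '10.1.1.5 - - [01/Mar/2014:10:02:01] "GET /staff/alice.html HTTP/1.1" 200 512',
        '10.1.1.5 - - [01/Mar/2014:10:02:09] "GET /staff/bob.html HTTP/1.1" 200 734',
        '10.9.9.9 - - [01/Mar/2014:10:03:44] "GET /products.html HTTP/1.1" 200 901',
        '10.1.1.5 - - [01/Mar/2014:10:04:12] "GET /staff/carol.html HTTP/1.1" 200 610',
    ]

    visits = defaultdict(list)
    for line in log_lines:
        ip = line.split()[0]
        path = line.split('"')[1].split()[1]
        visits[ip].append(path)

    # Flag visitors whose requests concentrate on people-related pages.
    for ip, paths in visits.items():
        staff_hits = sum(p.startswith("/staff/") for p in paths)
        print(ip, f"{staff_hits}/{len(paths)} staff-page requests")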

Reliable forward walking parameters from head-track data alone

Proceedings - IEEE Virtual Reality

Wendt, Jeremy D.; Whitton, Mary C.; Adalsteinsson, David; Brooks, Frederick P.

Head motion during real walking is complex: the basic translational path is obscured by head bobbing. Many VE applications would be improved if a bobbing-free path were available. This paper introduces a model that describes head position while walking in terms of a bobbing-free path and the head bobs. We introduce two methods to approximate the model from head-track data. © 2012 IEEE.
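
As a rough illustration of separating a smooth path from head bobs, one can low-pass filter the tracked positions and treat the residual as the bobbing component. The moving-average filter and the synthetic 1-D trajectory below are assumptions for the example, not one of the paper's two methods.

    import math

    # Toy 1-D head positions along the walking direction: a steadily advancing
    # path plus a small oscillation standing in for head bobbing (values invented).
    positions = [0.1 * t + 0.03 * math.sin(2 * math.pi * t / 10) for t in range(100)]

    def moving_average(signal, window=10):
        """Simple low-pass filter: average over a sliding window."""
        half = window // 2
        smoothed = []
        for i in range(len(signal)):
            lo, hi = max(0, i - half), min(len(signal), i + half + 1)
            smoothed.append(sum(signal[lo:hi]) / (hi - lo))
        return smoothed

    path = moving_average(positions)                 # estimate of the bobbing-free path
    bobs = [p - s for p, s in zip(positions, path)]  # residual head-bob component
    print(max(abs(b) for b in bobs))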

Trusted Computing Technologies, Intel Trusted Execution Technology

Wendt, Jeremy D.; Guise, Max G.

We describe the current state of the art in Trusted Computing Technologies, focusing mainly on Intel's Trusted Execution Technology (TXT). This document is based on existing documentation and tests of two existing TXT-based systems: Intel's Trusted Boot and Invisible Things Lab's Qubes OS. We describe what features are lacking in current implementations, describe what a mature system could provide, and present a list of developments to watch. Critical systems perform operation-critical computations on high-importance data. In such systems, the inputs, computation steps, and outputs may be highly sensitive. Sensitive components must be protected from both unauthorized release and unauthorized alteration: unauthorized users should not access the sensitive input and output data, nor be able to alter them; the computation contains intermediate data with the same requirements, and executes algorithms that unauthorized users should not be able to know or alter. Due to various system requirements, such critical systems are frequently built from commercial hardware, employ commercial software, and require network access. These hardware, software, and network system components increase the risk that sensitive input data, computation, and output data may be compromised.
