The Data Inferencing on Semantic Graphs project (DISeG) was a two-year investigation of inferencing techniques (focusing on belief propagation) to social graphs with a focus on semantic graphs (also called multi-layer graphs). While working this problem, we developed a new directed version of inferencing we call Directed Propagation (Chapters 2 and 4), identified new semantic graph sampling problems (Chapter 3).
Intuitively, nodes are actions that happen in a sequence. Thus, node 0 happens before node 1, node 1 happens before node 2, and so on. A link indicates that there is a relationship between two nodes, that is, (i, j) indicates that i influenced j and j was influenced by i.
We present history-independent alternatives to a B-tree, the primary indexing data structure used in databases. A data structure is history independent (HI) if it is impossible to deduce any information by examining the bit representation of the data structure that is not already available through the API. We show how to build a history-independent cache-oblivious B-tree and a history-independent external-memory skip list. One of the main contributions is a data structure we build on the way - a history-independent packed-memory array (PMA). The PMA supports efficient range queries, one of the most important operations for answering database queries. Our HI PMA matches the asymptotic bounds of prior non-HI packed-memory arrays and sparse tables. Specifically, a PMA maintains a dynamic set of elements in sorted order in a linearsized array. Inserts and deletes take an amortized O(log2 N) element moves with high probability. Simple experiments with our implementation of HI PMAs corroborate our theoretical analysis. Comparisons to regular PMAs give preliminary indications that the practical cost of adding history-independence is not too large. Our HI cache-oblivious B-tree bounds match those of prior non-HI cache-oblivious B-trees. Searches take O(logB N) I/Os; inserts and deletes take O(log2N/B + logB N) amortized I/Os with high probability; and range queries returning k elements take O(logB N + k/B) I/Os. Our HI external-memory skip list achieves optimal bounds with high probability, analogous to in-memory skip lists: O(logB N) I/Os for point queries and amortized O(logB N) I/Os for inserts/deletes. Range queries returning k elements run in O(logB N + k/B) I/Os. In contrast, the best possible high-probability bounds for inserting into the folklore B-skip list, which promotes elements with probability 1/B, is just Θ(log N) I/Os. This is no better than the bounds one gets from running an inmemory skip list in external memory.
In the realm of cyber security, recent events have demonstrated the need for a significant change in the philosophies guiding the identification and mitigation of attacks. The unprecedented increase in the quantity and sophistication of cyber attacks in the past year alone has proven the inadequacy of current defensive philosophies that do not assume continuous compromise. This has given rise to new perspectives on cyber defense where, instead of total prevention, threat intelligence is the crucial tool allowing the mitigation of cyber threats. This paper formalizes a new framework for obtaining threat intelligence from an active cyber attack and demonstrates the realization of this framework in the software tool, LinkShop. Specifically, using the behavioral analysis technique known as linkography, our framework allows cyber defenders to, in an automated fashion, quantitatively capture both general and nuanced patterns in attacker's behavior - pushing capabilities for generating threat intelligence far beyond what is currently possible with rudimentary indicators of compromise and into the realm of capability needed to combat future cyber attackers. Furthermore, this paper shows in detail how such knowledge can be achieved by using LinkShop on actual cyber event data and lays a foundation for further scientific investigation into cyber attacker behavior.
The Center for Strategic and International Studies estimates the annual cost from cyber crime to be more than $400 billion. Most notable is the recent digital identity thefts that compromised millions of accounts. These attacks emphasize the security problems of using clonable static information. One possible solution is the use of a physical device known as a Physically Unclonable Function (PUF). PUFs can be used to create encryption keys, generate random numbers, or authenticate devices. While the concept shows promise, current PUF implementations are inherently problematic: inconsistent behavior, expensive, susceptible to modeling attacks, and permanent. Therefore, we propose a new solution by which an unclonable, dynamic digital identity is created between two communication endpoints such as mobile devices. This Physically Unclonable Digital ID (PUDID) is created by injecting a data scrambling PUF device at the data origin point that corresponds to a unique and matching descrambler/hardware authentication at the receiving end. This device is designed using macroscopic, intentional anomalies, making them inexpensive to produce. PUDID is resistant to cryptanalysis due to the separation of the challenge response pair and a series of hash functions. PUDID is also unique in that by combining the PUF device identity with a dynamic human identity, we can create true two-factor authentication. We also propose an alternative solution that eliminates the need for a PUF mechanism altogether by combining tamper resistant capabilities with a series of hash functions. This tamper resistant device, referred to as a Quasi-PUDID (Q-PUDID), modifies input data, using a black-box mechanism, in an unpredictable way. By mimicking PUF attributes, Q-PUDID is able to avoid traditional PUF challenges thereby providing high-performing physical identity assurance with or without a low performing PUF mechanism. Three different application scenarios with mobile devices for PUDID and Q-PUDID have been analyzed to show their unique advantages over traditional PUFs and outline the potential for placement in a host of applications.
IEEE ISI 2013 - 2013 IEEE International Conference on Intelligence and Security Informatics: Big Data, Emergent Threats, and Decision-Making in Security Informatics
For critical infrastructure facilities, mitigation techniques for insider threats are primarily non-technical in nature and rely heavily on policies/procedures. Traditional access control measures (access cards, biometrics, PIN numbers, etc.) are built on a philosophy of trust that enables those with appropriate permissions to access facilities without additional monitoring or restrictions. Systems based on these measures have three main limitations: 1) access is typically bound to a single authentication occurrence; 2) the authentication factors have little impact against human (insider) threats to security systems; and 3) many of the authentication systems inconvenience end-users. In order to mitigate the aforementioned deficiencies, we propose utilizing the concept of Ephemeral Biometrics to construct strong, persistent authentication protocols.