Publications

Results 1–25 of 39

Using advanced data structures to enable responsive security monitoring

Cluster Computing

Vorobyeva, Janet; Delayo, Daniel R.; Bender, Michael A.; Farach-Colton, Martín; Pandey, Prashant; Phillips, Cynthia A.; Singh, Shikha; Thomas, Eric D.; Kroeger, Thomas M.

Write-optimized data structures (WODS) offer the potential to keep up with cyberstream event rates and give sub-second query response for key items like IP addresses. These data structures organize logs as the events are observed. To work in a real-world environment and not fill up the disk, WODS must efficiently expire older events. As the basis for our research into organizing security monitoring data, we implemented a tool, called Diventi, to index IP addresses in connection logs using RocksDB (a write-optimized LSM tree). We extended Diventi to automatically expire data as part of the data structures’ normal operations. We guarantee that Diventi always tracks the N most recent events and tracks no more than N + k events for a parameter k < N, while ensuring the index is opportunistically pruned. To test Diventi at scale in a controlled environment, we used anonymized traces of IP communications collected at SuperComputing 2019. We synthetically extended the 2.4 billion connection events to 100 billion events. We tested Diventi vs. Elasticsearch, a common log indexing tool. In our test environment, Elasticsearch saw an ingestion rate of at best 37,000 events/s while Diventi sustained ingestion rates greater than 171,000 events/s. Our query response times were as much as 100 times faster, typically answering queries in under 80 ms. Furthermore, we saw no noticeable degradation in Diventi from expiration. We have deployed Diventi for many months, during which it has performed well and supported new security analysis capabilities.
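The expiration guarantee in this abstract (always keep the N most recent events, never more than N + k) can be illustrated with a small in-memory sketch. This is not Diventi's implementation, which expires data inside RocksDB; the class name `ExpiringIndex`, its methods, and the batch-pruning trigger are hypothetical and chosen only to show the invariant.

```python
from collections import defaultdict, deque


class ExpiringIndex:
    """Toy index that holds between N and N + k of the most recent events.

    Hypothetical illustration only; it does not model an LSM tree.
    """

    def __init__(self, n, k):
        if not 0 < k < n:
            raise ValueError("need 0 < k < n")
        self.n, self.k = n, k
        self.events = deque()            # (sequence number, ip) in arrival order
        self.by_ip = defaultdict(deque)  # ip -> sequence numbers, oldest first
        self.next_seq = 0

    def insert(self, ip):
        self.events.append((self.next_seq, ip))
        self.by_ip[ip].append(self.next_seq)
        self.next_seq += 1
        # Prune in a batch once the slack k is used up, so the cost of
        # expiration is spread over many inserts.
        if len(self.events) > self.n + self.k:
            self._expire()

    def _expire(self):
        while len(self.events) > self.n:
            _, ip = self.events.popleft()
            seqs = self.by_ip[ip]
            seqs.popleft()  # the globally oldest event is also oldest for its ip
            if not seqs:
                del self.by_ip[ip]

    def query(self, ip):
        """Return sequence numbers of retained events for this ip."""
        return list(self.by_ip.get(ip, ()))


index = ExpiringIndex(n=5, k=2)
for i in range(20):
    index.insert("a" if i % 2 == 0 else "b")
print(len(index.events), index.query("a"))
```

Between calls to `insert`, the sketch never holds more than N + k events, and once N events have arrived it never holds fewer than N.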

More Details

TYPE Journal Article YEAR 2022

Scopus OSTI DOI

Advanced Data Structures for Monitoring Cyber Streams

Bender, Michael A.; Berry, Jonathan W.; Farach-Colton, Martin F.; Johnson, Rob J.; Kroeger, Thomas M.; Pandey, Prashant P.; Phillips, Cynthia A.; Singh, Shikha S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2021

OSTI

Timely Reporting of Heavy Hitters Using External Memory

ACM Transactions on Database Systems

Singh, Shikha; Pandey, Prashant; Bender, Michael A.; Berry, Jonathan W.; Farach-Colton, Martín; Johnson, Rob; Kroeger, Thomas M.; Phillips, Cynthia A.

Given an input stream S of size N, a φ-heavy hitter is an item that occurs at least φN times in S. The problem of finding heavy-hitters is extensively studied in the database literature. We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its T = φN-th occurrence (and hence it becomes a heavy hitter). We call this the Timely Event Detection (TED) Problem. The TED problem models the needs of many real-world monitoring systems, which demand accurate (i.e., no false negatives) and timely reporting of all events from large, high-speed streams with a low reporting threshold (high sensitivity). Like the classic heavy-hitters problem, solving the TED problem without false-positives requires large space (Ω(N) words). Thus in-RAM heavy-hitters algorithms typically sacrifice accuracy (i.e., allow false positives), sensitivity, or timeliness (i.e., use multiple passes). We show how to adapt heavy-hitters algorithms to external memory to solve the TED problem on large high-speed streams while guaranteeing accuracy, sensitivity, and timeliness. Our data structures are limited only by I/O-bandwidth (not latency) and support a tunable tradeoff between reporting delay and I/O overhead. With a small bounded reporting delay, our algorithms incur only a logarithmic I/O overhead. We implement and validate our data structures empirically using the Firehose streaming benchmark. Multi-threaded versions of our structures can scale to process 11M observations per second before becoming CPU bound. In comparison, a naive adaptation of the standard heavy-hitters algorithm to external memory would be limited by the storage device's random I/O throughput, i.e., ≈100K observations per second.
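The reporting rule defined in this abstract can be made concrete with a minimal exact-counting sketch that reports an item at the stream position of its T-th occurrence. It keeps every counter in RAM, which is exactly the large-space cost the abstract describes, so it illustrates the problem definition rather than the paper's external-memory data structures. The function name `timely_heavy_hitters` is hypothetical.

```python
import math
from collections import Counter


def timely_heavy_hitters(stream, phi):
    """Yield (position, item) when an item reaches its T-th occurrence.

    T = ceil(phi * N) for a stream of known length N. Exact counting
    gives no false positives and no false negatives, at the cost of one
    counter per distinct item.
    """
    items = list(stream)
    threshold = max(1, math.ceil(phi * len(items)))
    counts = Counter()
    for position, item in enumerate(items):
        counts[item] += 1
        if counts[item] == threshold:  # report once, with zero delay
            yield position, item


# N = 8 and phi = 0.25 give T = 2.
print(list(timely_heavy_hitters("abacabad", 0.25)))  # [(2, 'a'), (5, 'b')]
```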

More Details

TYPE Journal Article YEAR 2021

Scopus OSTI DOI

Advanced Data Algorithms & Architectures for Security Monitoring

Kroeger, Thomas M.; Wright, Brian J.; Phillips, Cynthia A.; Thomas, Eric D.

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2020

OSTI DOI

Timely Reporting of Heavy Hitters using External Memory

Proceedings of the ACM SIGMOD International Conference on Management of Data

Pandey, Prashant; Singh, Shikha; Bender, Michael A.; Berry, Jonathan W.; Farach-Colton, Martín; Johnson, Rob; Kroeger, Thomas M.; Phillips, Cynthia A.

Given an input stream S of size N, a φ-heavy hitter is an item that occurs at least φN times in S. The problem of finding heavy-hitters is extensively studied in the database literature. We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its T = φN-th occurrence (and hence becomes a heavy hitter). We call this the Timely Event Detection (TED) Problem. The TED problem models the needs of many real-world monitoring systems, which demand accurate (i.e., no false negatives) and timely reporting of all events from large, high-speed streams, and with a low reporting threshold (high sensitivity). Like the classic heavy-hitters problem, solving the TED problem without false-positives requires large space (Ω(N) words). Thus in-RAM heavy-hitters algorithms typically sacrifice accuracy (i.e., allow false positives), sensitivity, or timeliness (i.e., use multiple passes). We show how to adapt heavy-hitters algorithms to external memory to solve the TED problem on large high-speed streams while guaranteeing accuracy, sensitivity, and timeliness. Our data structures are limited only by I/O-bandwidth (not latency) and support a tunable trade-off between reporting delay and I/O overhead. With a small bounded reporting delay, our algorithms incur only a logarithmic I/O overhead. We implement and validate our data structures empirically using the Firehose streaming benchmark. Multi-threaded versions of our structures can scale to process 11M observations per second before becoming CPU bound. In comparison, a naive adaptation of the standard heavy-hitters algorithm to external memory would be limited by the storage device's random I/O throughput, i.e., ∼100K observations per second.

More Details

TYPE Conference Poster YEAR 2020

Scopus OSTI DOI

Quantifying Uncertainty in Emulations: LDRD Report

Crussell, Jonathan C.; Brown, Aaron B.; Jennings, Jeremy K.; Kavaler, David; Kroeger, Thomas M.; Phillips, Cynthia A.

This report summarizes the work performed under the project "Quantifying Uncertainty in Emulations." Emulation can be used to model real-world systems, typically using virtualization to run the real software on virtualized hardware. Emulations are increasingly used to answer mission-oriented questions, but how well they represent the real-world systems is still an open area of research. The goal of the project was to quantify where and how emulations differ from the real world. To do so, we ran a representative workload on both, and collected and compared metrics to identify differences. We aimed to capture behavioral, rather than performance, differences, as the latter are better understood in the literature. This report summarizes the project's major accomplishments, with the background to understand these accomplishments. It gathers the abstracts and references for the refereed publications that have appeared as part of this work. We then archive partial work not yet ready for publication.

More Details

TYPE SAND Report YEAR 2019

OSTI DOI

Data Architecture for Security Monitoring -- Project Summary

Kroeger, Thomas M.

Abstract not provided.

More Details

TYPE SAND Report YEAR 2019

OSTI DOI

Advanced Data Algorithms & Architectures for Security Monitoring

Kroeger, Thomas M.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

Lessons Learned from 10k Experiments to Compare Virtual and Physical Testbeds

Crussell, Jonathan C.; Kroeger, Thomas M.; Kavaler, David; Brown, Aaron B.; Phillips, Cynthia A.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

Virtually the Same: Comparing Physical and Virtual Testbeds

2019 International Conference on Computing, Networking and Communications, ICNC 2019

Crussell, Jonathan C.; Kroeger, Thomas M.; Brown, Aaron B.; Phillips, Cynthia A.

Network designers, planners, and security professionals increasingly rely on large-scale testbeds based on virtualization to emulate networks and make decisions about real-world deployments. However, there has been limited research on how well these virtual testbeds match their physical counterparts. Specifically, does the virtualization that these testbeds depend on actually capture real-world behaviors sufficiently well to support decisions? As a first step, we perform simple experiments on both physical and virtual testbeds to begin to understand where and how the testbeds differ. We set up a web service on one host and run ApacheBench against this service from a different host, instrumenting each system during these tests. We define an initial repeatable methodology (algorithm) to quantitatively compare physical and virtual testbeds. Specifically, we compare the testbeds at three levels of abstraction: application, operating system (OS) and network. For the application level, we use the ApacheBench results. For OS behavior, we compare patterns of system call orderings using Markov chains. This provides a unique visual representation of the workload and OS behavior in our testbeds. We also drill down into read-system-call behaviors and show how at one level both systems are deterministic and identical, but as we move up in abstractions that consistency declines. Finally, we use packet captures to compare network behaviors and performance. We reconstruct flows and compare per-flow and per-experiment statistics. From these comparisons, we find that the behavior of the workload in the testbeds is similar but that the underlying processes to support it do vary. The low-level network behavior can vary quite widely in packetization depending on the virtual network driver. While these differences can be important, and knowing about them will help experiment designers, the core application and OS behaviors still represent similar processes.
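As a rough illustration of comparing system call orderings with Markov chains, the sketch below estimates first-order transition probabilities from a trace of call names and measures the largest difference between two such chains. The function names, the toy traces, and the max-gap comparison are assumptions made for this example; the abstract does not specify the paper's actual comparison metric.

```python
from collections import Counter, defaultdict


def transition_matrix(calls):
    """Estimate first-order Markov transition probabilities from a trace."""
    counts = defaultdict(Counter)
    for current, following in zip(calls, calls[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for current, row in counts.items()
    }


def max_transition_gap(p, q):
    """Largest absolute difference between matching transition probabilities.

    A transition missing from one chain is treated as probability 0.
    """
    gap = 0.0
    for current in set(p) | set(q):
        row_p, row_q = p.get(current, {}), q.get(current, {})
        for nxt in set(row_p) | set(row_q):
            gap = max(gap, abs(row_p.get(nxt, 0.0) - row_q.get(nxt, 0.0)))
    return gap


# Hypothetical traces standing in for instrumented physical and virtual hosts.
physical = transition_matrix(["open", "read", "read", "close"])
virtual = transition_matrix(["open", "read", "close"])
print(max_transition_gap(physical, virtual))  # 0.5
```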

More Details

TYPE Conference Poster YEAR 2019

Scopus OSTI DOI

Tracking Network Events with Write Optimized Data Structures

Kroeger, Thomas M.; Raizes, Justin L.; West, Evan T.; Wright, Brian J.; Phillips, Cynthia A.; Berry, Jonathan W.; Johnson, Rob J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

Tracking Network Events with Write Optimized Data Structures

Kroeger, Thomas M.; West, Evan T.; Raizes, Justin L.; Phillips, Cynthia A.; Berry, Jonathan W.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

Virtually the Same: Comparing Physical and Virtual Testbeds

Crussell, Jonathan C.; Kroeger, Thomas M.; Brown, Aaron B.; Phillips, Cynthia A.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI DOI

Tracking Network Events with Write Optimized Data Structures

Kroeger, Thomas M.; Raizes, Justin L.; Phillips, Cynthia A.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Virtually the Same? The Empirical Differences Between Physical and Virtual Networks

Crussell, Jonathan C.; Kroeger, Thomas M.; Brown, Aaron B.; Phillips, Cynthia A.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Locally Operated Cooperative Key Sharing (LOCKS) Sharing Session Keys to Enable Deep Packet Inspection

Kroeger, Thomas M.; Bierma, Michael B.; Poston, Howard E.; Delano, Troy E.; Brown, Aaron B.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Locally Operated Cooperative Key Sharing (LOCKS) Sharing Session Keys to Enable Deep Packet IDS

Kroeger, Thomas M.; Bierma, Michael B.; Brown, Aaron B.; Delano, Troy E.; Poston, Howard E.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Secure Distributed Membership Tests via Secret Sharing

Kroeger, Thomas M.; Donoghue, Nolan P.; Zage, David J.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Locally Operated Cooperative Key Sharing

Bierma, Michael B.; Kroeger, Thomas M.; Delano, Troy E.; Brown, Aaron B.

More Details

TYPE Conference Poster YEAR 2017

OSTI DOI

RESAR: Reliable storage at exabyte scale

Proceedings - 2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2016

Schwarz, Thomas; Amer, Ahmed; Kroeger, Thomas M.; Miller, Ethan; Long, Darrell; Paris, Jehan F.

Stored data needs to be protected against device failure and irrecoverable sector read errors, yet doing so at exabyte scale can be challenging given the large number of failures that must be handled. We have developed RESAR (Robust, Efficient, Scalable, Autonomous, Reliable) storage, an approach to storage system redundancy that only uses XOR-based parity and employs a graph to lay out data and parity. The RESAR layout offers greater robustness and higher flexibility for repair at the same overhead as a declustered version of RAID 6. For instance, a RESAR-based layout with 16 data disklets per stripe has about 50 times lower probability of suffering data loss in the presence of a fixed number of failures than a corresponding RAID 6 organization. RESAR uses a layer of virtual storage elements to achieve better manageability, a broader potential for energy savings, as well as easier adoption of heterogeneous storage devices.
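The XOR-based parity that RESAR builds on can be shown in a few lines: the parity block is the bytewise XOR of the data blocks, and any single missing block is the XOR of the survivors with the parity. This sketch covers only that primitive for one stripe; it does not model RESAR's graph-based layout or its virtual storage elements, and the helper name `xor_blocks` is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_blocks(data)

# Lose the second block, then rebuild it from the survivors and the parity.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])  # True
```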

More Details

TYPE Conference Poster YEAR 2016

Scopus OSTI DOI

Anti-persistence on persistent storage: History-independent sparse tables and dictionaries

Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

Bender, Michael A.; Berry, Jonathan W.; Johnson, Rob; Kroeger, Thomas M.; McCauley, Samuel; Phillips, Cynthia A.; Simon, Bertrand; Singh, Shikha; Zage, David J.

We present history-independent alternatives to a B-tree, the primary indexing data structure used in databases. A data structure is history independent (HI) if it is impossible to deduce any information by examining the bit representation of the data structure that is not already available through the API. We show how to build a history-independent cache-oblivious B-tree and a history-independent external-memory skip list. One of the main contributions is a data structure we build on the way: a history-independent packed-memory array (PMA). The PMA supports efficient range queries, one of the most important operations for answering database queries. Our HI PMA matches the asymptotic bounds of prior non-HI packed-memory arrays and sparse tables. Specifically, a PMA maintains a dynamic set of elements in sorted order in a linear-sized array. Inserts and deletes take an amortized O(log^2 N) element moves with high probability. Simple experiments with our implementation of HI PMAs corroborate our theoretical analysis. Comparisons to regular PMAs give preliminary indications that the practical cost of adding history-independence is not too large. Our HI cache-oblivious B-tree bounds match those of prior non-HI cache-oblivious B-trees. Searches take O(log_B N) I/Os; inserts and deletes take O((log^2 N)/B + log_B N) amortized I/Os with high probability; and range queries returning k elements take O(log_B N + k/B) I/Os. Our HI external-memory skip list achieves optimal bounds with high probability, analogous to in-memory skip lists: O(log_B N) I/Os for point queries and amortized O(log_B N) I/Os for inserts/deletes. Range queries returning k elements run in O(log_B N + k/B) I/Os. In contrast, the best possible high-probability bound for inserting into the folklore B-skip list, which promotes elements with probability 1/B, is just Θ(log N) I/Os. This is no better than the bounds one gets from running an in-memory skip list in external memory.

More Details

TYPE Conference Poster YEAR 2016

Scopus OSTI DOI

Secure distributed membership tests via secret sharing: How to hide your hostile hosts: Harnessing Shamir secret sharing

2016 International Conference on Computing, Networking and Communications, ICNC 2016

Zage, David J.; Xu, Helen; Kroeger, Thomas M.; Hahn, Bridger; Donoghue, Nolan P.; Benson, Thomas R.

More Details

TYPE Conference Poster YEAR 2016

Scopus OSTI

TWIAD the Write-Optimized IP Address Database

Kroeger, Thomas M.; Xu, Helen; Donoghue, Nolan P.; Hahn, Bridger; Zage, David J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Track Network Events with Write Optimized Data Structures

Kroeger, Thomas M.; Xu, Helen; Donoghue, Nolan P.; Hahn, Bridger; Zage, David J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Secure Distributed Set Membership through Secret Sharing

Kroeger, Thomas M.; Zage, David J.; Phillips, Cynthia A.; Saia, J S.; Benson, T.R.B.

More Details

TYPE Presentation YEAR 2015

OSTI
