Publications Search

MalGen: Malware Generation with Specific Behaviors to Improve Machine Learning-based Detectors

Smith, Michael R.; Carbajal, Armida J.; Domschot, Eva D.; Johnson, Nicholas J.; Goyal, Akul A.; Lamb, Christopher L.; Lubars, Joseph L.; Kegelmeyer, William P.; Krishnakumar, Raga K.; Quynn, Sophie Q.; Ramyaa, Ramyaa R.; Verzi, Stephen J.; Zhou, Xin Z.

In recent years, infections and damage caused by malware have increased at exponential rates. At the same time, machine learning (ML) techniques have shown tremendous promise in many domains, often outperforming human efforts by learning from large amounts of data. Results in the open literature suggest that ML can provide similar results for malware detection, achieving greater than 99% classification accuracy [49]. However, the same detection rates have not been achieved in deployed settings. Malware is distinct from many other domains in which ML has shown success in that (1) it purposefully tries to hide, leading to noisy labels, and (2) its behavior is often similar to that of benign software, differing only in intent, among other complicating factors. This report details the reasons why detecting novel malware with ML methods is difficult and offers solutions to improve the detection of novel malware.

TYPE SAND Report YEAR 2022

Going Beyond Signature Malware Detection by Learning Behaviors

Johnson, Nicholas T.; Domschot, Eva D.; Khanna, Kanad K.; Kegelmeyer, William P.; Lamb, Christopher L.; Ramyaa, Ramyaa R.; Smith, Michael R.; Verzi, Stephen J.; Zhou, Xin Z.; Carbajal, Armida J.; Haus, Bridget H.; Ingram, Joey

Abstract not provided.

TYPE Conference Presentation YEAR 2021

Mind the Gap: On Bridging the Semantic Gap between Machine Learning and Malware Analysis

AISec 2020 - Proceedings of the 13th ACM Workshop on Artificial Intelligence and Security

Smith, Michael R.; Johnson, Nicholas T.; Ingram, Joey; Carbajal, Armida J.; Haus, Bridget I.; Domschot, Eva; Ramyaa, Ramyaa; Lamb, Christopher L.; Verzi, Stephen J.; Kegelmeyer, William P.

Machine learning (ML) techniques are being used to detect increasing amounts of malware and variants. Despite successful applications of ML, we hypothesize that the full potential of ML is not realized in malware analysis (MA) due to a semantic gap between the ML and MA communities, as demonstrated in the data that is used. Due in part to the available data, ML has primarily focused on detection, whereas MA is also interested in identifying behaviors. We review existing open-source malware datasets used in ML and find a lack of behavioral information that could facilitate stronger impact by ML in MA. As a first step in bridging this gap, we label existing data with behavioral information using open-source MA reports, thereby (1) altering the analysis from identifying malware to identifying behaviors, (2) aligning ML better with MA, and (3) allowing ML models to generalize to novel malware in a zero/few-shot learning manner. We classify the behavior of a malware family not seen during training using transfer learning from a state-of-the-art model for malware family classification and achieve 57% to 84% accuracy on behavioral identification but fail to outperform the baseline set by a majority class predictor. This highlights opportunities for improvement on this task related to the data representation, the need for malware-specific ML techniques, and a larger training set of malware samples labeled with behaviors.

TYPE Conference Presentation YEAR 2020

Mind the Gap: On Bridging the Semantic Gap between Machine Learning and Malware Analysis

Smith, Michael R.; Johnson, Nicholas T.; Ingram, Joey; Carbajal, Armida J.; Haus, Bridget I.; Domschot, Eva D.; Ramyaa, Ramyaa R.; Lamb, Christopher L.; Verzi, Stephen J.; Kegelmeyer, William P.

Abstract not provided.

TYPE Conference Poster YEAR 2020

MalGen: On Bridging the Semantic Gap between Machine Learning and Malware Analysis

Smith, Michael R.; Carbajal, Armida J.; Domschot, Eva D.; Haus, Bridget I.; Ingram, Joey; Johnson, Nicholas T.; Kegelmeyer, William P.; Lamb, Christopher L.; Ramyaa, Ramyaa R.; Verzi, Stephen J.

Abstract not provided.

TYPE Conference Poster YEAR 2020

Mind the Gap: On Bridging the Semantic Gap between Machine Learning and Information Security

Smith, Michael R.; Johnson, Nicholas T.; Ingram, Joey; Carbajal, Armida J.; Ramyaa, Ramyaa R.; Domschot, Evelyn D.; Lamb, Christopher L.; Verzi, Stephen J.; Kegelmeyer, William P.

Abstract not provided.

TYPE Conference Poster YEAR 2020

Focused Itemset Mining: Finding Anomalies in Code

Rodhouse, Kathryn N.; Kegelmeyer, William P.

Abstract not provided.

TYPE Conference Poster YEAR 2020

In-Situ Machine Learning for Intelligent Data Capture on Exascale Platforms

Davis IV, Warren L.; Shead, Timothy M.; Kolla, Hemanth K.; Reed, Kevin R.; Kegelmeyer, William P.; Popoola, Gabriel A.

Abstract not provided.

TYPE Presentation YEAR 2020

The Potential of Integrated Machine Learning Algorithms for Tropical Cyclone Detection in Advanced Climate Modeling

Davis IV, Warren L.; Shead, Timothy M.; Kolla, Hemanth K.; Popoola, Gabriel A.; Kegelmeyer, William P.; Konduri, Aditya K.

Abstract not provided.

TYPE Conference Poster YEAR 2019

Data Analytics are Powerful -- Handle with Care

Wendt, Jeremy D.; Kegelmeyer, William P.; Pinar, Ali P.; Shead, Timothy M.; Saavedra, Gary J.; Safta, Cosmin S.; Bertino, Joseph B.

Abstract not provided.

TYPE Conference Poster YEAR 2019

In-Situ Machine Learning for Intelligent Data Capture on Exascale Platforms

Davis, Warren L.; Shead, Timothy M.; Kolla, Hemanth K.; Kegelmeyer, William P.; Popoola, Gabriel A.; Reed, Kevin R.

Abstract not provided.

TYPE Presentation YEAR 2019

A Framework for In-Situ Anomaly Detection in HPC Environments

Shead, Timothy M.; Dunlavy, Daniel D.; Kolla, Hemanth K.; Konduri, Aditya K.; Popoola, Gabriel A.; Davis, Warren L.; Kegelmeyer, William P.; Reed, Kevin R.; Ling, Julia L.

Abstract not provided.

TYPE Conference Poster YEAR 2019

An Overview of Training Data Security Vulnerabilities: Machine Learning is a Leaky Black Box

Kegelmeyer, William P.; Wendt, Jeremy D.; Safta, Cosmin S.

Abstract not provided.

TYPE Conference Poster YEAR 2019

An Example of Counter-Adversarial Community Detection Analysis

Kegelmeyer, William P.; Wendt, Jeremy D.; Pinar, Ali P.

Community detection is often used to understand the nature of a network. However, there may exist an adversarial member of the network who wishes to evade that understanding. We analyze one such specific situation, quantifying the efficacy of certain attacks against a particular analytic use of community detection and providing a preliminary assessment of a possible defense.

TYPE SAND Report YEAR 2018

Adverse Event Prediction Using Graph-Augmented Temporal Analysis: Final Report

Brost, Randolph B.; Carrier, Erin E.; Carroll, Michelle C.; Groth, Katrina M.; Kegelmeyer, William P.; Leung, Vitus J.; Link, Hamilton E.; Patterson, Andrew J.; Phillips, Cynthia A.; Richter, Samuel N.; Robinson, David G.; Staid, Andrea S.; Woodbridge, Diane M.-K.

This report summarizes the work performed under the Sandia LDRD project "Adverse Event Prediction Using Graph-Augmented Temporal Analysis." The goal of the project was to develop a method for analyzing multiple time-series data streams to identify precursors providing advance warning of the potential occurrence of events of interest. The proposed approach combined temporal analysis of each data stream with reasoning about relationships between data streams using a geospatial-temporal semantic graph. This class of problems is relevant to several important topics of national interest. In the course of this work we developed new temporal analysis techniques, including temporal analysis using Markov Chain Monte Carlo techniques, temporal shift algorithms to refine forecasts, and a version of Ripley's K-function extended to support temporal precursor identification. This report summarizes the project's major accomplishments, and gathers the abstracts and references for the publication submissions and reports that were prepared as part of this work. We then describe work in progress that is not yet ready for publication.

TYPE SAND Report YEAR 2018

A Brief Survey of Adversarial Concerns in Machine Learning and Deep Learning

Kegelmeyer, William P.

Abstract not provided.

TYPE Presentation YEAR 2018

In-Situ Machine Learning for Intelligent Data Capture in HPC Simulations

Davis, Warren L.; Dunlavy, Daniel D.; Kegelmeyer, William P.; Kolla, Hemanth K.; Konduri, Aditya K.; Shead, Timothy M.; Reed, Kevin R.

Abstract not provided.

TYPE Conference Poster YEAR 2018

Event Detection in Multi-Variate Scientific Simulations Using Feature Anomaly Metrics

Konduri, Aditya K.; Kolla, Hemanth K.; Ling, Julia L.; Kegelmeyer, William P.; Dunlavy, Daniel D.; Shead, Timothy M.; Davis, Warren L.

Abstract not provided.

TYPE Conference Poster YEAR 2018

Counter-Adversarial Node Labeling

Kegelmeyer, William P.; Wendt, Jeremy D.; Pinar, Ali P.; Anderson-Bergman, Clifford I.

Abstract not provided.

TYPE Conference Poster YEAR 2018

Machine Learning Adversarial Label Tampering: Design and Detection

Kegelmeyer, William P.

Abstract not provided.

TYPE Conference Poster YEAR 2018