Publications

Publications / Conference

Streaming malware classification in the presence of concept drift and class imbalance

Kegelmeyer, William P.; Chiang, Ken C.; Ingram, Joey

Malware, or malicious software, is capable of performing any action or command that can be expressed in code and is typically used for illicit activities, such as e-mail spamming, corporate espionage, and identity theft. Most organizations rely on anti-virus software to identifymalware, which typically utilize signatures that can only identify previously-seen malware instances. We consider the detection ofmalware executables that are downloaded in streaming network data as a supervised machine learning problem. Using malwaredata collected over multiple years, we characterize the effect of concept drift and class imbalance on batch and streaming decision tree ensembles. In particular, we illustrate a surprising vulnerability generated by precisely the aspect of streaming methods that seemed most likely to help them, when compared to batch methods. © 2013 IEEE.