Sandia LabNews

Using biomimicry to detect outbreaks faster



Sandia collaborating with UNM, CDC to improve biosurveillance

Our immune systems are made up of billions of white blood cells searching for signs of infections and foreign invaders, ready to raise the alarm.

Sandia computer scientists Pat Finley and Drew Levin have been working to improve the U.S. biosurveillance system, which alerts authorities to disease outbreaks, by mimicking the human immune system. They are working with researchers at the University of New Mexico and the Centers for Disease Control and Prevention.

The CDC coordinates the National Syndromic Surveillance Program. It collects anonymized data from most emergency departments around the nation and analyzes public health indicators to speed up the response to hazardous events and disease outbreaks.

“The national biosurveillance system serves essentially the same purpose as the human immune system, just on a larger scale,” said Drew, who started working on the project as a UNM graduate student and was hired by Sandia to continue his work after he graduated. “The immune system is made up of numerous T-cells that all operate independently. There’s no centralized controller and yet we do pretty well not dying.”

The CDC uses traditional statistical analyses to look for anomalies, such as a large or sudden increase in ER visits, and to determine the likelihood of an outbreak. These algorithms are based on reliable, decades-old math but usually look at only one variable at a time, said Drew.
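To make the single-variable approach concrete, here is a minimal Python sketch of one common kind of test: flag a day when its visit count sits far above a trailing average. The article does not say which statistics the CDC actually uses, so the z-score rule, the function name `zscore_alerts`, the seven-day window, the threshold and the visit counts are all assumptions for illustration.

```python
from statistics import mean, stdev

def zscore_alerts(daily_visits, window=7, threshold=3.0):
    """Return indices of days whose count exceeds the trailing-window mean
    by more than `threshold` standard deviations (hypothetical rule)."""
    alerts = []
    for day in range(window, len(daily_visits)):
        baseline = daily_visits[day - window:day]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            sigma = 1.0  # fall back to 1 so a flat baseline does not divide by zero
        if (daily_visits[day] - mu) / sigma > threshold:
            alerts.append(day)
    return alerts

# Made-up daily ER visit counts with one spike on day 9
visits = [40, 42, 39, 41, 43, 40, 42, 41, 44, 90, 43]
print(zscore_alerts(visits))  # prints [9]
```

A rule like this only ever sees the visit count. It cannot tell a spike during flu season from the same spike in midsummer, which is the limitation Drew describes.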

The faster an emerging outbreak is detected, the more lives are saved; however, flagging non-outbreaks can waste resources.

Pat said the biosurveillance system faces the dual challenge of detecting new outbreaks of old diseases, such as seasonal influenza, and outbreaks of new diseases, such as the next Zika virus. That is a very tough problem, but it is one the immune system has been working on for millions of years.

Synthetic T-cells monitor multiple variables for nuanced alerts 

T-cells are a type of white blood cell that recognizes and kills virus-infected cells and other foreign pathogens. They are “trained” to focus on these invaders through a process in which every T-cell that attacks normal body cells is destroyed. Other than this initial negative-selection “training,” there’s no central “brain” telling the T-cells where to go or what to look for.

Pat thought that mimicking how T-cells work might speed up outbreak detection. In 2015, he began collaborating with immune system modeling experts at UNM as part of Sandia’s Academic Alliance program. The Academic Alliance is a partnership Sandia has built with five universities to promote collaborative research on tough problems and attract top talent to work on these challenges.

“The adaptive immune system in vertebrates is one of the most complex systems in biology with trillions of cells, dozens of cell types and signaling molecules,” said Melanie Moses, a UNM professor of computer science and biology involved in the project. “Through computer modeling and simulation, we understand how the immune system works which, in the long term, can lead to improved immuno-therapies, allergy treatments and vaccines. It also provides inspiration for the design of other decentralized systems for surveillance and protection.”

Working together, they created synthetic, mathematical “T-cells” that look at multiple different variables at the same time, such as number of clinic visits, day of the year and intake temperature. Then, mimicking the T-cell negative selection process, Drew ran the synthetic T-cell algorithms against past data collected by the CDC and New Mexico Department of Health. He compared the algorithms and selected the most accurate.
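The negative-selection idea can be sketched in a few lines of Python. This is not the team's code. It is a minimal version of the classical negative-selection scheme, assuming each record is a tuple of three features (visits, day of year, intake temperature) already scaled to the range 0 to 1. The function names, the radius, the candidate count and the sample records are all hypothetical.

```python
import math
import random

def train_detectors(normal_records, n_candidates=2000, radius=0.15, seed=0):
    """Generate random candidate detectors and discard any that react to
    (lie within `radius` of) a known-normal record."""
    rng = random.Random(seed)
    dims = len(normal_records[0])
    detectors = []
    for _ in range(n_candidates):
        candidate = tuple(rng.random() for _ in range(dims))
        # Negative selection: keep only candidates that match no normal record
        if all(math.dist(candidate, rec) > radius for rec in normal_records):
            detectors.append(candidate)
    return detectors

def is_anomalous(record, detectors, radius=0.15):
    """A record is flagged if any surviving detector lies within `radius` of it."""
    return any(math.dist(record, d) <= radius for d in detectors)

# Made-up "normal" history: (visits, day of year, intake temperature), scaled to 0..1
normal = [(0.30, 0.50, 0.40), (0.32, 0.52, 0.41),
          (0.28, 0.48, 0.39), (0.31, 0.49, 0.42)]
detectors = train_detectors(normal)

print(is_anomalous((0.30, 0.50, 0.40), detectors))  # a record seen in training
print(is_anomalous((0.90, 0.50, 0.90), detectors))  # far from anything normal
```

Each detector is independent and looks at all three variables at once, so no central rule decides what counts as unusual. Like a T-cell that survives training, a detector that survives is one that ignores everything normal.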

In 2016, initial tests on a pilot-scale biosurveillance system showed that Drew’s synthetic T-cells performed better than the traditional statistical methods, said Pat. Also, because they track multiple variables intrinsically, they could provide more nuanced alerts, such as separating an outbreak of a new disease from seasonal influenza, he said.

Brain-inspired machine learning improves chief complaint deciphering

The first piece of data the CDC receives from each emergency room visit is called the chief complaint. This is a concise statement describing why a patient has gone to the emergency room or clinic, recorded before the patient has seen a doctor or been diagnosed. Chief complaints range from “chest pain” and “fever three days” to specialized abbreviations.

Figure: Biosurveillance topology. Panels: current biosurveillance topology; lymph node biosurveillance topology.

These terse statements are full of medical jargon and even misspelled words, making them difficult to decipher by simple keyword searches or by the inexperienced. Also, many words describe the same symptoms, such as fever, hot, temperature and chills.

Technology companies have been using deep learning for similar natural-language processing problems. Deep learning is brain-inspired machine learning that excels at finding patterns without being explicitly programmed on what to look for. One such algorithm, called Word2vec, converts the context of words into mathematical vectors.

When Drew ran the Word2vec algorithm on anonymized chief complaint data collected by the New Mexico Department of Health, it outperformed a standard keyword search, as well as other state-of-the-art machine learning algorithms. However, it still had trouble with misspelled words and abbreviations.

To work around this, Drew tried two related neural network algorithms: one that converts letters into vectors and another that converts words into random vectors. The algorithm that converted words into random, or untrained, vectors was the most accurate. The Word2vec algorithm was trained on standard, non-medical prose and makes antonyms such as hot and cold too similar, mathematically speaking, which could be why it didn’t produce the best results, said Drew.
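What “random, or untrained, vectors” might look like can be shown with a short Python sketch. The article does not describe the network itself, so this covers only an assumed input step: each word gets a fixed random vector, and a complaint is represented by the average of its word vectors. The function names, the 16 dimensions and the hash-based seeding are hypothetical choices, and a classifier would still have to be trained on top of these vectors.

```python
import hashlib
import random

def random_word_vector(word, dims=16):
    """Map a word to a fixed random vector. Seeding from a hash of the word
    means the same word always gets the same vector."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(dims)]

def embed_complaint(text, dims=16):
    """Represent a chief complaint as the average of its word vectors."""
    words = text.split()
    if not words:
        return [0.0] * dims
    vectors = [random_word_vector(w, dims) for w in words]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

features = embed_complaint("fever three days")
print(len(features))  # prints 16
```

Because the vectors are random, “hot” and “cold” start out no more alike than any other pair of words. Any relationship between them has to be learned from the chief complaint data rather than inherited from non-medical prose.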

Though more optimization is needed, the team’s deep-learning algorithm for deciphering chief complaints could be particularly useful for the opioid epidemic, said Pat. He added, “New terms for street drugs tend to appear much more quickly than the public health community realizes. If we find that a weird word is popping up a lot in an area, it could be a new variety of fentanyl.”
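The idea of spotting a “weird word” can be illustrated with a simple count, separate from the deep-learning approach the team is developing. The Python sketch below assumes two lists of chief complaints from one area, an older baseline and a recent batch, and reports words that are new and frequent. The function name, the `min_count` threshold and the sample complaints are hypothetical, and “grayrock” is an invented placeholder, not a real drug name.

```python
from collections import Counter

def emerging_terms(baseline_complaints, recent_complaints, min_count=3):
    """Return words that appear at least `min_count` times in recent
    complaints but never appeared in the baseline complaints."""
    known = {w for text in baseline_complaints for w in text.lower().split()}
    recent = Counter(w for text in recent_complaints for w in text.lower().split())
    return sorted(w for w, n in recent.items() if n >= min_count and w not in known)

# Made-up complaints for illustration only
baseline = ["chest pain", "fever three days", "overdose heroin"]
recent = ["overdose grayrock", "grayrock od", "unresponsive grayrock", "fever"]
print(emerging_terms(baseline, recent))  # prints ['grayrock']
```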

Future of distributed biosurveillance centers 

Lymph nodes are distributed throughout the body and act as immune system hubs, chock full of T-cells and the B-cells that produce antibodies to fight off infections. 

Pat and his team are just beginning to explore how mimicking lymph nodes might improve the biosurveillance system. Pat believes it would be particularly helpful in detecting outbreaks of regional diseases like Lyme disease, plague and Hantavirus. Also, distributed detection algorithms could be more efficient by bypassing the physical and power consumption limits that Moore’s Law computers are now running up against, added Drew.

“We are working closely with the CDC to test a number of our deep learning approaches on a subset of the national data flow,” said Pat.

His goal is to have his bio-inspired system set up by October to allow side-by-side comparisons with the traditional statistical methods at the national scale. He believes the different approaches will have different strengths, and combining them will improve the speed and accuracy of outbreak detection.

This research was funded by Sandia’s Laboratory Directed Research and Development program. Sandia computer scientists Walt Beyeler and Michael Mitchell and UNM postdoctoral fellow Tatiana Flanagan also worked on the project, focusing on the lymph system-mimicking distributed detection algorithm.

“This project with Sandia has provided us with an opportunity to test the practical application of the concepts we’ve learned from our models,” said Melanie. “Ultimately, this project will lead to a more complete understanding of the immune system, as well as a practical way to quickly identify and respond to disease outbreaks and other biological threats.”