This research explores novel methods for extracting relevant information from EEG data to characterize individual differences in cognitive processing. Our approach combines expertise in machine learning, statistics, and cognitive science, advancing the state of the art in all three domains. Specifically, by using cognitive science expertise to interpret results and inform algorithm development, we have developed a generalizable and interpretable machine learning method that can accurately predict individual differences in cognition. The output of the machine learning method revealed surprising features of the EEG data that, when interpreted by the cognitive science experts, provided novel insights into the underlying cognitive task. Additionally, the outputs of the statistical methods show promise as a principled approach to quickly finding regions within the EEG data where individual differences lie, thereby supporting cognitive science analysis and informing machine learning models. This work lays the methodological groundwork for applying the large body of cognitive science literature on individual differences to high-consequence mission applications.
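As a concrete illustration of the kind of pipeline described above, the sketch below trains an interpretable classifier on per-trial EEG features and then inspects its weights to locate where in channel-by-time space the discriminative information lies. It is a minimal sketch under assumed conditions: the feature layout (trials x channels x timepoints), the synthetic data, and the choice of an L1-regularized logistic regression are illustrative assumptions, not the method developed in this work.

```python
# Illustrative sketch only: a simple, interpretable pipeline for predicting an
# individual-difference label from per-trial EEG features. The feature layout
# (trials x channels x timepoints) and the L1-regularized logistic regression
# are assumptions, not this project's actual method.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_trials, n_channels, n_times = 200, 32, 100
X = rng.normal(size=(n_trials, n_channels, n_times))   # placeholder EEG epochs
y = rng.integers(0, 2, size=n_trials)                  # placeholder labels

X_flat = X.reshape(n_trials, -1)                       # flatten channel x time features
clf = make_pipeline(StandardScaler(),
                    LogisticRegression(penalty="l1", solver="liblinear", C=0.1))
scores = cross_val_score(clf, X_flat, y, cv=5)
print("cross-validated accuracy: %.2f" % scores.mean())

# Inspecting the sparse weights (reshaped back to channel x time) gives a crude
# map of where in the EEG the discriminative information lies.
clf.fit(X_flat, y)
weights = clf.named_steps["logisticregression"].coef_.reshape(n_channels, n_times)
```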
With machine learning (ML) technologies rapidly expanding to new applications and domains, users are collaborating with artificial intelligence-assisted diagnostic tools to an ever-greater extent. But what impact does ML aid have on cognitive performance, especially when the ML output is not always accurate? Here, we examined the cognitive effects of the presence of simulated ML assistance (including both accurate and inaccurate output) on two tasks: a domain-specific nuclear safeguards task and a domain-general visual search task. Patterns of performance varied across the two tasks for both the presence of ML aid and the category of ML feedback (e.g., false alarm). These results indicate that differences such as domain could influence users' performance with ML aid, and they suggest the need to test the effects of ML output (and associated errors) in the specific context of use, especially when the stimuli of interest are vague or ill-defined.
Eye tracking is a useful tool for studying human cognition, both in the laboratory and in real-world applications. However, there are cases in which eye tracking is not possible, such as in high-security environments where recording devices cannot be introduced. After facing this challenge in our own work, we sought to test the effectiveness of using artificial foveation as an alternative to eye tracking for studying visual search performance. Two groups of participants completed the same list comparison task, which was a computer-based task designed to mimic an inventory verification process that is commonly performed by international nuclear safeguards inspectors. We manipulated the way in which the items on the inventory list were ordered and color coded. For the eye tracking group, an eye tracker was used to assess the order in which participants viewed the items and the number of fixations per trial in each list condition. For the artificial foveation group, the items were covered with a blurry mask except when participants moused over them. We tracked the order in which participants viewed the items by moving their mouse and the number of items viewed per trial in each list condition. We observed the same overall pattern of performance for the various list display conditions, regardless of the method. However, participants were much slower to complete the task when using artificial foveation and had more variability in their accuracy. Our results indicate that the artificial foveation method can reveal the same pattern of differences across conditions as eye tracking, but it can also impact participants’ task performance.
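A minimal sketch of the artificial-foveation manipulation is shown below, assuming PsychoPy and placeholder image files (item0.png, item0_blur.png, ...): each list item is rendered blurred, the sharp version is revealed only while the mouse is over it, and the order of reveals is logged per trial. It illustrates the general idea rather than the actual task code.

```python
# Illustrative sketch of the artificial-foveation idea (not the study's code):
# items are shown blurred and revealed only under the mouse; reveal order is logged.
from psychopy import visual, event, core

win = visual.Window(size=(1024, 768), color="grey", units="pix")
mouse = event.Mouse(win=win)

item_positions = [(-200, 150), (-200, 0), (-200, -150)]   # placeholder layout
blurred = [visual.ImageStim(win, image="item%d_blur.png" % i, pos=p)
           for i, p in enumerate(item_positions)]          # placeholder images
sharp = [visual.ImageStim(win, image="item%d.png" % i, pos=p)
         for i, p in enumerate(item_positions)]

viewing_order = []
clock = core.Clock()
while clock.getTime() < 10.0:                 # one 10 s trial for illustration
    for i, (b, s) in enumerate(zip(blurred, sharp)):
        if b.contains(mouse):                 # mouse over item -> reveal it
            s.draw()
            if not viewing_order or viewing_order[-1] != i:
                viewing_order.append(i)       # log the sequence of reveals
        else:
            b.draw()
    win.flip()

print("items viewed, in order:", viewing_order)
win.close()
```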
In this project, our goal was to develop methods that would allow us to make accurate predictions about individual differences in human cognition. Understanding such differences is important for maximizing human and human-system performance. There is a large body of research on individual differences in the academic literature. Unfortunately, it is often difficult to connect this literature to applied problems, where we must predict how specific people will perform or process information. In an effort to bridge this gap, we set out to answer the question: can we train a model to make predictions about which people understand which languages? We chose language processing as our domain of interest because of the well-characterized differences in neural processing that occur when people are presented with linguistic stimuli that they do or do not understand. Although our original plan to conduct several electroencephalography (EEG) studies was disrupted by the COVID-19 pandemic, we were able to collect data from one EEG study and a series of behavioral experiments in which data were collected online. The results of this project indicate that machine learning tools can make reasonably accurate predictions about an individual's proficiency in different languages, using EEG data or behavioral data alone.
This report details the results of a three-fold investigation of sensitivity analysis (SA) for machine learning (ML) explainability (MLE): (1) the mathematical assessment of the fidelity of an explanation with respect to a learned ML model, (2) quantifying the trustworthiness of a prediction, and (3) the impact of MLE on the efficiency of end users, assessed through multiple user studies. We focused on the cybersecurity domain because the data are inherently non-intuitive. As ML is being used in an increasing number of domains, including domains where being wrong can carry high consequences, MLE has been proposed as a means of generating end-user trust in a learned ML model. However, little analysis has been performed to determine whether the explanations accurately represent the target model and whether they themselves should be trusted beyond subjective inspection. Current state-of-the-art MLE techniques only provide a list of important features based on heuristic measures and/or make assumptions about the data and the model that are not representative of real-world data and models. Further, most are designed without considering their usefulness to an end user in a broader context. To address these issues, we present a notion of explanation fidelity based on Shapley values from cooperative game theory. We find that all of the investigated MLE methods produce explanations that are incongruent with the ML model being explained, because they make critical assumptions of feature independence and linear feature interactions for computational reasons. We also find that in deployed settings, explanations are rarely used, for a variety of reasons: several other tools are trusted more than the explanations, and there is little incentive to use them. In the cases when explanations are used, we found a danger that they persuade end users to wrongly accept false positives and false negatives. However, ML model developers and maintainers find the explanations more useful for helping to ensure that the ML model does not have obvious biases. In light of these findings, we suggest a number of future directions, including developing MLE methods that directly model nonlinear feature interactions and adopting design principles that take into account the usefulness of explanations to the end user. We also augment explanations with a set of trustworthiness measures that assess geometric aspects of the data to determine whether the model output should be trusted.
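To make the fidelity idea concrete, the sketch below computes exact Shapley-value attributions for a small toy model and compares them against a hypothetical explanation's attributions. The toy model, the baseline, and the particular fidelity score are illustrative assumptions, not the definitions used in this report.

```python
# Sketch: exact Shapley attributions for a toy model, used as a reference against
# which an explanation's feature attributions could be compared to gauge fidelity.
from itertools import combinations
from math import factorial
import numpy as np

def model(x):
    # toy nonlinear model with a feature interaction (illustrative only)
    return 2.0 * x[0] + x[1] * x[2] - 0.5 * x[2]

def shapley_values(f, x, baseline):
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                with_i = baseline.copy()
                without_i = baseline.copy()
                for j in S:
                    with_i[j] = x[j]
                    without_i[j] = x[j]
                with_i[i] = x[i]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = shapley_values(model, x, baseline)               # exact attributions
explanation = np.array([2.0, 0.0, -0.5])               # hypothetical linear explanation
fidelity = 1.0 - np.abs(phi - explanation).sum() / np.abs(phi).sum()
print("Shapley values:", phi, "fidelity score:", fidelity)
```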
Due to their recent increases in performance, machine learning and deep learning models are being increasingly adopted across many domains for visual processing tasks. One such domain is international nuclear safeguards, which seeks to verify the peaceful use of commercial nuclear energy across the globe. Despite recent impressive performance results from machine learning and deep learning algorithms, there is always at least some small level of error. Given the significant consequences of international nuclear safeguards conclusions, we sought to characterize how incorrect responses from a machine or deep learning-assisted visual search task would cognitively impact users. We found that not only do some types of model errors have larger negative impacts on human performance than others, but the scale of those impacts also changes depending on the accuracy of the model with which users are presented, and the impacts persist in scenarios of evenly distributed errors and single-error presentations. Further, we found that experiments conducted using a common visual search dataset from the psychology community have similar implications to a safeguards-relevant dataset of images containing hyperboloid cooling towers when the cooling tower images are presented to expert participants. While novice performance was considerably different (and worse) on the cooling tower task, we saw greater reliance on the model among novices than among experts for the most challenging cooling tower images. These findings are relevant not just to the cognitive science community, but also to developers of machine and deep learning models that will be implemented in multiple domains. For safeguards, this research provides key insights into how machine and deep learning projects should be implemented, given the domain's special requirement that information not be missed.
University partnerships play an essential role in sustaining Sandia's vitality as a national laboratory. The Sandia Academic Alliance (SAA) is an element of Sandia's broader University Partnerships program, which facilitates recruiting and research collaborations with dozens of universities annually. The SAA program has two three-year goals. First, SAA aims to realize a step increase in hiring results by growing the total annual inexperienced hires from each out-of-state SAA university. Second, SAA strives to establish and sustain strategic research partnerships by establishing several federally sponsored collaborations and multi-institutional consortia in science and technology (S&T) priorities such as autonomy, advanced computing, hypersonics, quantum information science, and data science. The SAA program facilitates access to talent, ideas, and research and development facilities through strong university partnerships. Earlier this year, the SAA program and campus executives hosted John Myers, Sandia's former Senior Director of Human Resources (HR) and Communications, and senior-level staff on visits to Georgia Tech, U of Illinois, Purdue, UNM, and UT Austin. These campus visits provided an opportunity for university leadership to share the history of the partnerships, along with tours of research facilities and discussions of ongoing technical work and potential recruiting opportunities. The visits also provided valuable feedback to HR management that will help Sandia realize a step increase in hiring from SAA schools. The 2020-2021 Collaboration Report is a compilation of accomplishments in 2020 and 2021 from SAA and Sandia's valued SAA university partners.
As the ability to collect and store data grows, so does the need to efficiently analyze that data. As human-machine teams that use machine learning (ML) algorithms to inform human decision-making grow in popularity, it becomes increasingly critical to understand the optimal methods of implementing algorithm-assisted search. To better understand how algorithm confidence values associated with object identification can influence participant accuracy and response times during a visual search task, we compared models that provided appropriate confidence, random confidence, and no confidence, as well as a model biased toward overconfidence and a model biased toward underconfidence. Results indicate that randomized confidence is likely harmful to performance, while non-random confidence values are likely better than no confidence value for maintaining accuracy over time. Providing participants with appropriate confidence values did not seem to benefit performance any more than providing participants with underconfident or overconfident models.
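The sketch below illustrates, under assumed parameter values, how the five confidence conditions compared here could be simulated for a single algorithm-flagged trial; it is not the experiment code, and the accuracy and noise parameters are hypothetical.

```python
# Illustrative simulation of the displayed-confidence conditions (assumed parameters).
import numpy as np

rng = np.random.default_rng(1)

def simulated_confidence(condition, true_accuracy=0.8):
    """Return a displayed confidence value (or None) for one flagged object."""
    if condition == "none":
        return None                                   # no confidence shown
    if condition == "random":
        return rng.uniform(0.0, 1.0)                  # uninformative value
    if condition == "appropriate":
        return float(np.clip(rng.normal(true_accuracy, 0.05), 0.0, 1.0))
    if condition == "overconfident":
        return float(np.clip(rng.normal(true_accuracy + 0.15, 0.05), 0.0, 1.0))
    if condition == "underconfident":
        return float(np.clip(rng.normal(true_accuracy - 0.15, 0.05), 0.0, 1.0))
    raise ValueError("unknown condition: %s" % condition)

for cond in ["appropriate", "random", "none", "overconfident", "underconfident"]:
    print(cond, simulated_confidence(cond))
```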
The impact of machine learning (ML) explanations and different attributes of explanations on human performance was investigated in a simulated spam detection task. Participants decided whether the metadata presented about an email indicated that it was spam or benign. The task was completed with the aid of an ML model, whose prediction was displayed on every trial. The inclusion of an explanation and, when an explanation was presented, its attributes were manipulated within subjects: the number of model input features (3, 7) and the visualization of feature importance values (graph, table), as was trial type (i.e., hit, false alarm). Overall model accuracy (50% vs. 88%) was manipulated between subjects, and user trust in the model was measured as an individual difference metric. Results suggest that a user's trust in the model had the largest impact on the decision process. Users showed better performance with the more accurate model but no differences in accuracy based on the number of input features or the visualization condition. Rather, users were more likely to detect false alarms made by the more accurate model; they were also more likely to comply with a model "miss" when more model explanation was provided. Finally, response times were longer in individuals reporting low model trust, especially when they did not comply with the model's prediction. Our findings suggest that the factors impacting the efficacy of ML explanations depend, minimally, on the task, the overall model accuracy, the likelihood of different model errors, and user trust.
Studies of bilingual language processing typically assign participants to groups based on their language proficiency and average across participants in order to compare the two groups. This approach loses much of the nuance and individual differences that could be important for furthering theories of bilingual language comprehension. In this study, we present a novel use of machine learning (ML) to develop a predictive model of language proficiency based on behavioral data collected in a priming task. The model achieved 75% accuracy in predicting which participants were proficient in both Spanish and English. Our results indicate that ML can be a useful tool for characterizing and studying individual differences.
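A minimal sketch of this kind of predictive model is shown below, assuming scikit-learn, placeholder per-participant behavioral features (e.g., priming effects and response times), and a random-forest classifier; the actual features and model used in the study may differ.

```python
# Illustrative sketch (not the study's code): predicting bilingual proficiency
# from per-participant behavioral features derived from a priming task.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_participants = 80
# placeholder columns: mean RT, Spanish->English priming effect,
# English->Spanish priming effect, error rate
X = rng.normal(size=(n_participants, 4))
y = rng.integers(0, 2, size=n_participants)   # 1 = proficient in both languages

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print("cross-validated proficiency prediction accuracy: %.2f" % scores.mean())
```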