This research explores novel methods for extracting relevant information from EEG data to characterize individual differences in cognitive processing. Our approach combines expertise in machine learning, statistics, and cognitive science, advancing the state of the art in all three domains. Specifically, by using cognitive science expertise to interpret results and inform algorithm development, we have developed a generalizable and interpretable machine learning method that can accurately predict individual differences in cognition. The output of the machine learning method revealed surprising features of the EEG data that, when interpreted by the cognitive science experts, provided novel insights into the underlying cognitive task. Additionally, the outputs of the statistical methods show promise as a principled approach for quickly finding regions within the EEG data where individual differences lie, thereby supporting cognitive science analysis and informing machine learning models. This work lays the methodological groundwork for applying the large body of cognitive science literature on individual differences to high-consequence mission applications.
With machine learning (ML) technologies rapidly expanding to new applications and domains, users increasingly collaborate with artificial intelligence-assisted diagnostic tools. But what impact does ML aid have on cognitive performance, especially when the ML output is not always accurate? Here, we examined the cognitive effects of the presence of simulated ML assistance—including both accurate and inaccurate output—on two tasks (a domain-specific nuclear safeguards task and a domain-general visual search task). Patterns of performance varied across the two tasks for both the presence of ML aid and the category of ML feedback (e.g., false alarm). These results indicate that differences such as domain could influence users’ performance with ML aid, and suggest the need to test the effects of ML output (and associated errors) in the specific context of use, especially when the stimuli of interest are vague or ill-defined.
Eye tracking is a useful tool for studying human cognition, both in the laboratory and in real-world applications. However, there are cases in which eye tracking is not possible, such as in high-security environments where recording devices cannot be introduced. After facing this challenge in our own work, we sought to test the effectiveness of using artificial foveation as an alternative to eye tracking for studying visual search performance. Two groups of participants completed the same list comparison task, a computer-based task designed to mimic an inventory verification process that is commonly performed by international nuclear safeguards inspectors. We manipulated the way in which the items on the inventory list were ordered and color coded. For the eye tracking group, an eye tracker was used to assess the order in which participants viewed the items and the number of fixations per trial in each list condition. For the artificial foveation group, the items were covered with a blurry mask except when participants moused over them; their mouse movements were used to track the order in which they viewed the items and the number of items viewed per trial in each list condition. We observed the same overall pattern of performance across the list display conditions, regardless of the method. However, participants were much slower to complete the task when using artificial foveation and showed more variability in their accuracy. Our results indicate that the artificial foveation method can reveal the same pattern of differences across conditions as eye tracking, but it can also impact participants’ task performance.
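The implementation details do not accompany the abstract, so the core display logic can only be sketched. The following minimal sketch assumes a Pillow-based rendering loop: the stimulus is blurred everywhere except a sharp circular window that follows the mouse. The blur strength, window radius, and file path are illustrative choices, not the study’s parameters.

from PIL import Image, ImageDraw, ImageFilter

def foveated_view(image_path, mouse_xy, radius=60):
    """Return the stimulus blurred everywhere except a window at mouse_xy."""
    sharp = Image.open(image_path).convert("RGB")
    blurred = sharp.filter(ImageFilter.GaussianBlur(radius=8))

    # White inside the circle keeps the sharp image; black keeps the blur.
    mask = Image.new("L", sharp.size, 0)
    draw = ImageDraw.Draw(mask)
    x, y = mouse_xy
    draw.ellipse([x - radius, y - radius, x + radius, y + radius], fill=255)

    return Image.composite(sharp, blurred, mask)

Logging which list item the window overlaps on each frame would yield the viewing-order and items-viewed measures described above.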
In this project, our goal was to develop methods that would allow us to make accurate predictions about individual differences in human cognition. Understanding such differences is important for maximizing human and human-system performance. There is a large body of research on individual differences in the academic literature. Unfortunately, it is often difficult to connect this literature to applied problems, where we must predict how specific people will perform or process information. In an effort to bridge this gap, we set out to answer the question: can we train a model to make predictions about which people understand which languages? We chose language processing as our domain of interest because of the well-characterized differences in neural processing that occur when people are presented with linguistic stimuli that they do or do not understand. Although our original plan to conduct several electroencephalography (EEG) studies was disrupted by the COVID-19 pandemic, we were able to collect data from one EEG study and a series of behavioral experiments in which data were collected online. The results of this project indicate that machine learning tools can make reasonably accurate predictions about an individual’s proficiency in different languages, using EEG data or behavioral data alone.
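The report does not commit to a particular model, but the kind of subject-level prediction it describes can be sketched with a standard cross-validated classifier. In this minimal sketch, the feature matrix, labels, and choice of logistic regression are illustrative assumptions; in practice each row would be one participant’s EEG or behavioral feature vector.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 64))    # 40 participants x 64 features (placeholder data)
y = rng.integers(0, 2, size=40)  # 1 = proficient in the probed language

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validated accuracy
print(f"mean CV accuracy: {scores.mean():.2f}")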
This report details the results of a three-fold investigation of sensitivity analysis (SA) for machine learning (ML) explainability (MLE): (1) the mathematical assessment of the fidelity of an explanation with respect to a learned ML model, (2) quantifying the trustworthiness of a prediction, and (3) the impact of MLE on the efficiency of end users, assessed through multiple user studies. We focused on the cybersecurity domain because its data are inherently non-intuitive. As ML is being used in an increasing number of domains, including domains where being wrong can elicit high consequences, MLE has been proposed as a means of generating end-user trust in learned ML models. However, little analysis has been performed to determine whether explanations accurately represent the target model and whether they themselves should be trusted beyond subjective inspection. Current state-of-the-art MLE techniques only provide a list of important features based on heuristic measures and/or make certain assumptions about the data and the model that are not representative of real-world data and models. Further, most are designed without considering their usefulness to an end user in a broader context. To address these issues, we present a notion of explanation fidelity based on Shapley values from cooperative game theory. We find that all of the investigated MLE methods produce explanations that are incongruent with the ML model being explained, because they make critical assumptions of feature independence and linear feature interactions for computational reasons. We also find that, in deployment, explanations are rarely used, for a variety of reasons: several other tools are trusted more than the explanations, and there is little incentive to use them. In the cases when explanations are used, we found a danger that they persuade end users to wrongly accept false positives and false negatives. However, ML model developers and maintainers find the explanations more useful for helping ensure that the ML model does not have obvious biases. In light of these findings, we suggest a number of future directions, including developing MLE methods that directly model non-linear feature interactions and adopting design principles that take into account the usefulness of explanations to the end user. We also augment explanations with a set of trustworthiness measures that assess geometric aspects of the data to determine whether the model output should be trusted.
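The fidelity notion above is anchored in exact Shapley values, which are computable by brute force when the feature count is small. The sketch below implements the standard Shapley formula with a baseline-replacement value function (one common choice, assumed here for illustration); the toy model’s multiplicative term shows the kind of non-linear interaction that methods assuming feature independence and linearity misattribute.

from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature for the prediction model(x).

    model:    callable mapping a feature vector (list) to a scalar output
    x:        the instance being explained
    baseline: reference values used for features outside a coalition
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi += weight * (value(set(S) | {i}) - value(set(S)))
        phis.append(phi)
    return phis

# Toy model with an interaction term: the exact values split the 2*z1*z2
# interaction evenly between features 1 and 2, and sum to f(x) - f(baseline).
model = lambda z: z[0] + 2 * z[1] * z[2]
print(shapley_values(model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0]))  # [1.0, 1.0, 1.0]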
Due to their recent increases in performance, machine learning and deep learning models are being increasingly adopted across many domains for visual processing tasks. One such domain is international nuclear safeguards, which seeks to verify the peaceful use of commercial nuclear energy across the globe. Despite recent impressive performance results from machine learning and deep learning algorithms, there is always at least some small level of error. Given the significant consequences of international nuclear safeguards conclusions, we sought to characterize how incorrect responses from a machine or deep learning-assisted visual search task would cognitively impact users. We found not only that some types of model errors have larger negative impacts on human performance than others, but also that the scale of those impacts changes depending on the accuracy of the model with which users are presented, and that these impacts persist in scenarios of evenly distributed errors and single-error presentations. Further, we found that experiments conducted using a common visual search dataset from the psychology community yield similar implications to those from a safeguards-relevant dataset of images containing hyperboloid cooling towers, when the cooling tower images are presented to expert participants. While novice performance was considerably different (and worse) on the cooling tower task, we saw increased novice reliance on the model for the most challenging cooling tower images compared to experts. These findings are relevant not just to the cognitive science community, but also to developers of machine and deep learning models that will be implemented in multiple domains. For safeguards, this research provides key insights into how machine and deep learning projects should be implemented, given the domain’s special requirement that information not be missed.
University partnerships play an essential role in sustaining Sandia’s vitality as a national laboratory. The Sandia Academic Alliance (SAA) is an element of Sandia’s broader University Partnerships program, which facilitates recruiting and research collaborations with dozens of universities annually. The SAA program has two three-year goals. SAA aims to realize a step increase in hiring results by growing the total annual inexperienced hires from each out-of-state SAA university. SAA also strives to establish and sustain strategic research partnerships by establishing several federally sponsored collaborations and multi-institutional consortia in science & technology (S&T) priorities such as autonomy, advanced computing, hypersonics, quantum information science, and data science. The SAA program facilitates access to talent, ideas, and research and development facilities through strong university partnerships. Earlier this year, the SAA program and campus executives hosted John Myers, Sandia’s former Senior Director of Human Resources (HR) and Communications, and senior-level staff at Georgia Tech, U of Illinois, Purdue, UNM, and UT Austin. These campus visits provided opportunities to hear the history of the partnerships from university leadership, tour research facilities, and discuss ongoing technical work and potential recruiting opportunities. The visits also provided valuable feedback to HR management that will help Sandia realize a step increase in hiring from SAA schools. The 2020–2021 Collaboration Report is a compilation of accomplishments in 2020 and 2021 from SAA and Sandia’s valued SAA university partners.
As the ability to collect and store data grows, so does the need to efficiently analyze that data. As human-machine teams that use machine learning (ML) algorithms to inform human decision-making grow in popularity, it becomes increasingly critical to understand the optimal methods of implementing algorithm-assisted search. To better understand how algorithm confidence values associated with object identification can influence participant accuracy and response times during a visual search task, we compared models that provided appropriate confidence, random confidence, and no confidence, as well as a model biased toward overconfidence and a model biased toward underconfidence. Results indicate that randomized confidence is likely harmful to performance, while non-random confidence values are likely better than no confidence value for maintaining accuracy over time. Providing participants with appropriate confidence values did not seem to benefit performance any more than providing participants with underconfident or overconfident models.
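The abstract does not specify how the five confidence conditions were generated, so the sketch below shows one plausible simulation; the Beta parameterizations are illustrative assumptions chosen only so that "appropriate" confidence tracks correctness while the biased models inflate or deflate it.

import numpy as np

rng = np.random.default_rng(0)

# (a, b) Beta parameters: first pair when the model is correct, second when wrong.
PARAMS = {
    "appropriate":    ((8, 2), (2, 8)),   # high when right, low when wrong
    "overconfident":  ((12, 1), (6, 3)),  # inflated in both cases
    "underconfident": ((5, 4), (1, 9)),   # deflated in both cases
}

def confidence(condition, model_is_correct):
    """Sample the confidence value displayed alongside one detection."""
    if condition == "none":
        return None                          # no confidence value shown
    if condition == "random":
        return float(rng.uniform(0.0, 1.0))  # uninformative confidence
    a, b = PARAMS[condition][0 if model_is_correct else 1]
    return float(rng.beta(a, b))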
The impact of machine learning (ML) explanations and different attributes of explanations on human performance was investigated in a simulated spam detection task. Participants decided whether the metadata presented about an email indicated that it was spam or benign. The task was completed with the aid of an ML model, whose prediction was displayed on every trial. The inclusion of an explanation and, if an explanation was presented, two of its attributes (the number of model input features: 3 or 7; the visualization of feature importance values: graph or table) were manipulated within subjects, as was trial type (i.e., hit, false alarm). Overall model accuracy (50% vs. 88%) was manipulated between subjects, and user trust in the model was measured as an individual difference metric. Results suggest that a user’s trust in the model had the largest impact on the decision process. Users showed better performance with a more accurate model, but no differences in accuracy based on the number of input features or the visualization condition. Rather, users were more likely to detect false alarms made by the more accurate model; they were also more likely to comply with a model “miss” when more model explanation was provided. Finally, response times were longer in individuals reporting low model trust, especially when they did not comply with the model’s prediction. Our findings suggest that the factors impacting the efficacy of ML explanations depend, minimally, on the task, the overall model accuracy, the likelihood of different model errors, and user trust.
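As a concrete picture of the trial-type analysis, the helper below labels each trial by model outcome and scores whether the user complied with the model’s prediction; the column names describe a hypothetical data layout, not the study’s actual schema.

import pandas as pd

def score_trials(df):
    """Label each trial (hit/miss/false alarm/correct rejection) and compliance."""
    def trial_type(row):
        if row.is_spam and row.model_says_spam:
            return "hit"
        if row.is_spam and not row.model_says_spam:
            return "miss"
        if not row.is_spam and row.model_says_spam:
            return "false alarm"
        return "correct rejection"

    out = df.copy()
    out["trial_type"] = out.apply(trial_type, axis=1)
    out["complied"] = out.user_says_spam == out.model_says_spam
    return out

# e.g., compliance rate by trial type, to ask whether users accept model
# misses more often when richer explanations are shown:
# score_trials(trials).groupby("trial_type")["complied"].mean()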
Studies of bilingual language processing typically assign participants to groups based on their language proficiency and average across participants in order to compare the two groups. This approach loses much of the nuance and individual differences that could be important for furthering theories of bilingual language comprehension. In this study, we present a novel use of machine learning (ML) to develop a predictive model of language proficiency based on behavioral data collected in a priming task. The model achieved 75% accuracy in predicting which participants were proficient in both Spanish and English. Our results indicate that ML can be a useful tool for characterizing and studying individual differences.
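One plausible feature-extraction step for such a model is to summarize each participant’s priming effect, the mean response-time difference between unrelated and related prime trials. The sketch below assumes a long-format trial table; the column and condition names are illustrative.

import pandas as pd

def priming_features(trials):
    """One row per participant: mean RT per prime condition plus their difference."""
    correct = trials[trials.accuracy == 1]       # use RTs from correct trials only
    rt = correct.pivot_table(index="participant",
                             columns="prime_condition",
                             values="rt",
                             aggfunc="mean")
    rt["priming_effect"] = rt["unrelated"] - rt["related"]
    return rt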
Recent event-related brain potential (ERP) experiments have demonstrated parafoveal N400 expectancy and congruity effects, showing that semantic information can be accessed from words in parafoveal vision (a conclusion also supported by some eye-tracking work). At the same time, it is unclear how higher-order integrative aspects of language comprehension unfold across the visual field during reading. In the current study, we recorded ERPs in a parafoveal flanker paradigm while readers were instructed either to read passively for comprehension or to judge the plausibility of sentences in which target words varied in their semantic expectancy and congruity. We directly replicated prior work showing graded N400 effects for parafoveal viewing, effects that were not then duplicated when the target words were processed foveally. Critically, although N400 effects were not modulated by task goals, a posteriorly distributed late positive component thought to reflect semantic integration processes was observed in response to semantic incongruities only in the plausibility judgment task. However, this effect was observed at a considerable delay, appearing only after words had moved into foveal vision. Our findings thus suggest that semantic access can be initiated in parafoveal vision, whereas central foveal vision may be necessary to enact higher-order (and task-dependent) integrative processing.
Much has been written on the potential for games to enhance our ability to study complex systems. In this chapter we focus on how we can use games to study national security issues. We reflect on the benefits of using games and the inherent difficulties that we must address. As a means of grounding the discussion, we present a case study of a retrospective analysis of gaming data.
International nuclear safeguards inspectors visit nuclear facilities to assess their compliance with international nonproliferation agreements. Inspectors note whether anything unusual is happening in the facility that might indicate the diversion or misuse of nuclear materials, or anything that changed since the last inspection. They must complete inspections under restrictions imposed by their hosts, regarding both their use of technology or equipment and the time allotted. Moreover, because inspections are sometimes completed by different teams months apart, it is crucial that their notes accurately facilitate change detection across a delay. The current study addressed these issues by investigating how note-taking methods (e.g., digital camera, hand-written notes, or their combination) impacted memory in a delayed recall test of a complex visual array. Participants studied four arrays of abstract shapes and industrial objects using a different note-taking method for each, then returned 48–72 h later to complete a memory test using their notes to identify objects that had changed (e.g., location, material, orientation). Accuracy was highest for both conditions using a camera, followed by hand-written notes alone, and all were better than having no aid. Although the camera-only condition benefitted study times, this benefit was not observed at test, suggesting drawbacks to using just a camera to aid recall. Change type interacted with note-taking method; although certain changes were overall more difficult, the note-taking method used helped mitigate these deficits in performance. Finally, elaborative hand-written notes produced better performance than simple ones, suggesting strategies for individual note-takers to maximize their efficacy in the absence of a digital aid.
International nuclear safeguards inspectors are tasked with verifying that nuclear materials in facilities around the world are not misused or diverted from peaceful purposes. They must conduct detailed inspections in complex, information-rich environments, but there has been relatively little research into the cognitive aspects of their jobs. We posit that the speed and accuracy of the inspectors can be supported and improved by designing the materials they take into the field such that the information is optimized to meet their cognitive needs. Many in-field inspection activities involve comparing inventory or shipping records to other records or to physical items inside of a nuclear facility. The organization and presentation of the records that the inspectors bring into the field with them could have a substantial impact on the ease or difficulty of these comparison tasks. In this paper, we present a series of mock inspection activities in which we manipulated the formatting of the inspectors’ records. We used behavioral and eye tracking metrics to assess the impact of the different types of formatting on the participants’ performance on the inspection tasks. The results of these experiments show that matching the presentation of the records to the cognitive demands of the task led to substantially faster task completion.
To understand the effect of economic interdependence on conflict and on deterrents to conflict, and to assess the viability of online games as platforms for experimental research, an online serious game was used to gather data on economic, political, and military factors in the game setting. These data were operationalized in forms analogous to variables from the real-world Militarized Interstate Disputes (MIDs) dataset. A set of economic predictor variables was analyzed using linear mixed-effects regression models in an attempt to discover relationships between the predictor variables and conflict outcomes. Differences between the online game results and results from the real world are discussed.
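A minimal sketch of that analysis, assuming a random intercept for each game session; the variable names stand in for the MID-analogous economic predictors and conflict outcome, which the abstract does not enumerate.

import statsmodels.formula.api as smf

def fit_conflict_model(df):
    """df: one row per dyad-round, with columns conflict, trade_dependence,
    gdp_ratio, and game_id (the grouping factor for random intercepts)."""
    model = smf.mixedlm("conflict ~ trade_dependence + gdp_ratio",
                        data=df, groups=df["game_id"])
    result = model.fit()
    print(result.summary())
    return result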
The doctrine of nuclear deterrence and a belief in its importance underpins many aspects of United States policy; it informs strategic force structures within the military, incentivizes multi-billion-dollar weapon-modernization programs within the Department of Energy, and impacts international alliances with the 29 member states of the North Atlantic Treaty Organization (NATO). The doctrine originally evolved under the stewardship of some of the most impressive minds of the twentieth century, including the physicist and H-bomb designer Herman Kahn, the Nobel Prize-winning economist Thomas Schelling, and the preeminent political scientist and diplomat Henry Kissinger.
Event-related potentials (ERPs) are a powerful tool for the study of reading, as they are both temporally precise and functionally specific. These are essential characteristics for studying a process that unfolds rapidly and consists of multiple, interactive subprocesses. In work with adults, clear, specific models exist linking components of the ERP with individual subprocesses of reading, including orthographic decoding, phonological processing, and semantic access (e.g., Grainger & Holcomb, 2009). The relationships between ERP components and reading subprocesses are less clear in development; here, we address two questions regarding these relationships. First, we ask whether there are ERP markers that predict future reading behaviors across a longitudinal year. Second, we ask whether any relationships observed between ERP components and reading behavior across time map onto the better-established relationships between ERPs and reading subprocesses in adults. To address these questions, we acquired ERPs from children engaging in a silent reading task and then, a year later, collected behavioral assessments of their reading ability. We find that ERPs collected in Year 1 do predict reading behaviors a year later. Further, we find that these relationships conform, at least to some extent, to the relationships between ERP components and reading subprocesses observed in adults, with, for example, N250 amplitude in Year 1 predicting phonological awareness in Year 2, and N400 amplitude in Year 1 predicting vocabulary in Year 2.
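Analyses of this kind typically reduce each child’s ERP to a mean amplitude within a component’s time window before relating it to later behavior. The sketch below shows that measure; the sampling rate, epoch timing, and component windows are illustrative assumptions rather than this study’s parameters.

import numpy as np
from scipy.stats import pearsonr

def mean_window_amplitude(epochs, sfreq, tmin_epoch, window):
    """Trial-averaged mean voltage in a time window at one electrode.

    epochs:     (n_trials, n_samples) array of voltages
    sfreq:      sampling rate in Hz
    tmin_epoch: epoch start time in seconds (e.g., -0.1)
    window:     (start, stop) of the component window in seconds
    """
    start = int((window[0] - tmin_epoch) * sfreq)
    stop = int((window[1] - tmin_epoch) * sfreq)
    return float(epochs[:, start:stop].mean())

# e.g., an N250 measure per child in Year 1, correlated with Year 2 scores:
# n250 = [mean_window_amplitude(e, 500.0, -0.1, (0.2, 0.3)) for e in children_epochs]
# r, p = pearsonr(n250, phonological_awareness_year2)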
An important question in the reading literature regards the nature of the semantic information readers can extract from the parafovea (i.e., the next word in a sentence). Recent eye-tracking studies have found a semantic parafoveal preview benefit under many circumstances, and findings from event-related brain potentials (ERPs) also suggest that readers can at least detect semantic anomalies parafoveally. We use ERPs to ask whether fine-grained aspects of semantic expectancy can affect the N400 elicited by a word appearing in the parafovea. In a rapid serial visual presentation (RSVP)-with-flankers paradigm, sentences were presented word by word, flanked 2° bilaterally by the previous and upcoming words. Stimuli consisted of high-constraint sentences that were identical up to the target word, which could be expected, unexpected but plausible, or anomalous, as well as low-constraint sentences that were always completed with the most expected ending. Findings revealed an N400 effect to the target word when it appeared in the parafovea, which was graded with respect to the target’s expectancy and congruency within the sentence context. Moreover, when targets appeared at central fixation, this graded congruency effect was mitigated, suggesting that the semantic information gleaned from parafoveal vision functionally changes the semantic processing of those words when foveated.
Data visualizations are used to communicate information to people in a wide variety of contexts, but few tools are available to help visualization designers evaluate the effectiveness of their designs. Visual saliency maps that predict which regions of an image are likely to draw the viewer’s attention could be a useful evaluation tool, but existing models of visual saliency often make poor predictions for abstract data visualizations. These models do not take into account the importance of features like text in visualizations, which may lead to inaccurate saliency maps. In this paper we use data from two eye tracking experiments to investigate attention to text in data visualizations. The data sets were collected under two different task conditions: a memory task and a free viewing task. Across both tasks, the text elements in the visualizations consistently drew attention, especially during early stages of viewing. These findings highlight the need to incorporate additional features into saliency models that will be applied to visualizations.
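One simple form the suggested extension could take is to blend an existing model’s saliency map with a mask of text regions; the sketch below is an illustrative weighting, not a model fitted to these eye tracking data.

import numpy as np

def text_weighted_saliency(base_saliency, text_mask, text_weight=0.5):
    """Blend a standard saliency map with a binary map of text regions.

    base_saliency: (H, W) map from any existing model, values in [0, 1]
    text_mask:     (H, W) binary map, 1 where text appears in the visualization
    """
    blended = (1 - text_weight) * base_saliency + text_weight * text_mask
    return blended / blended.max()  # renormalize to [0, 1]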
In 2 experiments, we examined the impact of foveal semantic expectancy and congruity on parafoveal word processing during reading. Experiment 1 utilized an eye-tracking gaze-contingent display change paradigm, and Experiment 2 measured event-related brain potentials (ERPs) in a modified flanker rapid serial visual presentation (RSVP) paradigm. Eye-tracking and ERP data converged to reveal graded effects of foveal load on parafoveal processing. In Experiment 1, when word n was highly expected, and thus foveal load was low, there was a large parafoveal preview benefit to word n + 1. When word n was unexpected but still plausible, preview benefits to n + 1 were reduced in magnitude, and when word n was semantically incongruent, the preview benefit to n + 1 was unreliable in early pass measures. In Experiment 2, ERPs indicated that when word n was expected, and thus foveal load was low, readers successfully discriminated between valid and orthographically invalid previews during parafoveal perception. However, when word n was unexpected, parafoveal processing of n + 1 was reduced, and it was eliminated when word n was semantically incongruent. Taken together, these findings suggest that sentential context modulates the allocation of attention in the parafovea, such that covert allocation of attention to parafoveal processing is disrupted when foveal words are inconsistent with expectations based on various contextual constraints.
We investigate the online processing consequences of encountering compound words with transposed letters (TLs), in order to determine if cross-morpheme TLs are more disruptive to reading than those within a single morpheme, as would be predicted by accounts of obligatory morpho-orthographic decomposition. Two measures of online processing, eye movements and event-related potentials (ERPs), were collected in separate experiments. Participants read sentences containing correctly spelled compound words (cupcake), or compounds with TLs occurring either across morphemes (cucpake) or within one morpheme (cupacke). Results showed that between- and within-morpheme transpositions produced equal processing costs in both measures, in the form of longer reading times (Experiment 1) and a late posterior positivity (Experiment 2) that did not differ between conditions. Our findings converge to suggest that within- and between-morpheme TLs are equally disruptive to recognition, providing evidence against obligatory morpho-orthographic processing and in favour of whole-word access of English compound words during sentence reading.