Publications

8 Results

Active Learning for Language Modeling

Kemp, Emily K.; Compton, Jonathan E.; McKenzie, Darrien M.

Foreign disinformation campaigns undermine national security. Various supervised language modeling techniques in NLP can help to understand and dismantle these campaigns, but they rely heavily on large datasets that are labeled, often by humans. This work addresses that problem with an active learning (AL) framework, which is used to generate labeled datasets and leverage human input for detecting disinformation. The AL framework uses task-adaptive pretraining to fully leverage the unlabeled data and boost the performance of the classifier used for labeling. A disinformation rhetoric metric was developed to measure the presence of common rhetorical techniques that are meant to deceive, for both the classifier and the human to use when identifying disinformation. This metric was combined with an uncertainty criterion to create a hybrid acquisition method for AL, and the hybrid method was tested alongside other acquisition functions. A stopping strategy was developed to signal when the AL process should terminate, so that human time is not spent on iterations that would not significantly benefit classifier performance.
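The abstract does not give the form of the hybrid acquisition function or the stopping rule, so the following is only a minimal sketch of the general idea under stated assumptions: uncertainty is taken to be normalized predictive entropy, the rhetoric metric is assumed to be a per-document score in [0, 1], and the two are combined with a hypothetical weight `alpha`. The plateau check in `should_stop` is a simple stand-in, not the strategy developed in the work. All function and parameter names are illustrative.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def hybrid_scores(
    class_probs: Sequence[Sequence[float]],
    rhetoric_scores: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Weighted sum of normalized entropy and a rhetoric score.

    Assumption: rhetoric_scores are already scaled to [0, 1];
    alpha is a hypothetical mixing weight.
    """
    if len(class_probs) != len(rhetoric_scores):
        raise ValueError("one rhetoric score is required per example")
    scores = []
    for probs, rhetoric in zip(class_probs, rhetoric_scores):
        max_entropy = math.log(len(probs))
        uncertainty = entropy(probs) / max_entropy if max_entropy > 0 else 0.0
        scores.append(alpha * uncertainty + (1.0 - alpha) * rhetoric)
    return scores


def select_batch(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring unlabeled examples."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def should_stop(
    val_scores: Sequence[float], patience: int = 3, min_delta: float = 0.005
) -> bool:
    """Stand-in plateau rule: stop when the last `patience` rounds did not
    beat the best earlier validation score by at least `min_delta`."""
    if len(val_scores) <= patience:
        return False
    best_before = max(val_scores[:-patience])
    return max(val_scores[-patience:]) - best_before < min_delta
```

In a loop, the selected indices would be sent to a human annotator, the classifier retrained on the enlarged labeled set, and `should_stop` consulted after each round.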


A Process to Colorize and Assess Visualizations of Noisy X-Ray Computed Tomography Hyperspectral Data of Materials with Similar Spectral Signatures

2021 IEEE Nuclear Science Symposium and Medical Imaging Conference Record, NSS/MIC 2021 and 28th International Symposium on Room-Temperature Semiconductor Detectors, RTSD 2022

Clifford, Joshua M.; Kemp, Emily K.; Limpanukorn, Ben L.; Jimenez, Edward S.

Dimension reduction techniques have frequently been used to summarize information from high-dimensional hyperspectral data, usually in an effort to classify or visualize the materials contained in the hyperspectral image. The main challenge in applying these techniques to Hyperspectral Computed Tomography (HCT) data is that when the materials in the field of view are of similar composition, a visualization of the hyperspectral image can struggle to differentiate between them. We propose novel alternative methods of preprocessing and summarizing HCT data in a single colorized image, and novel measures to assess desired qualities in the resulting colored image, such as the contrast between different materials and the consistency of color within the same object. The proposed processes include a new majority-voting method for multi-level thresholding; binary erosion; median filters; PAM clustering for grouping pixels into objects (of homogeneous materials); mean/median assignment along the spectral dimension for representing the underlying signature; UMAP or GLMs to assign colors; and quantitative coloring assessment with the developed measures. Strengths and weaknesses of various combinations of methods are discussed. These results have the potential to create more robust material identification methods from HCT data, which has wide use in industrial, medical, and security-based applications for detection and quantification, including visualization methods to assist with rapid human interpretation of these complex hyperspectral signatures.
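The abstract names a majority-voting method for multi-level thresholding without describing it. As a rough illustration of what such a vote could look like, the sketch below assumes each spectral channel has its own sorted list of thresholds, quantizes every pixel's channel value into a level, and lets the channels vote on the pixel's final level. The per-channel thresholds, the tie-breaking rule (lowest level wins), and all names are assumptions for illustration, not the method from the paper.

```python
from bisect import bisect_right
from collections import Counter
from typing import List, Sequence


def quantize(value: float, thresholds: Sequence[float]) -> int:
    """Map a value to a level index given sorted thresholds.

    With thresholds [t0, t1], values below t0 map to level 0,
    values in [t0, t1) to level 1, and values >= t1 to level 2.
    """
    return bisect_right(thresholds, value)


def majority_vote_levels(
    pixels: Sequence[Sequence[float]],
    thresholds_per_channel: Sequence[Sequence[float]],
) -> List[int]:
    """Assign each pixel the level chosen by most of its spectral channels.

    pixels: one spectrum (list of channel values) per pixel.
    thresholds_per_channel: one sorted threshold list per channel
    (hypothetical; how the thresholds are chosen is out of scope here).
    Ties are broken in favor of the lowest level, an arbitrary choice.
    """
    labels = []
    for spectrum in pixels:
        votes = Counter(
            quantize(v, t) for v, t in zip(spectrum, thresholds_per_channel)
        )
        top = max(votes.values())
        labels.append(min(level for level, n in votes.items() if n == top))
    return labels
```

The resulting level map would then be a candidate input for the later steps listed in the abstract, such as erosion, median filtering, and clustering pixels into objects.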
