We apply transfer learning techniques to create topically and/or stylistically biased natural language models from small data samples, given generic long short-term memory (LSTM) language models trained on larger data sets. Although LSTM language models are powerful tools with wide-ranging applications, they require enormous amounts of data and time to train. Thus, we build general purpose language models that take advantage of large standing corpora and computational resources proactively, allowing us to build more specialized analytical tools from smaller data sets on demand. We show that it is possible to construct a language model from a small, focused corpus by first training an LSTM language model on a large corpus (e.g., the text from English Wikipedia) and then retraining only the internal transition model parameters on the smaller corpus. We also show that a single general language model can be reused through transfer learning to create many distinct special purpose language models quickly with modest amounts of data.
Social network graph models are data structures representing entities (often people, corpora- tions, or accounts) as "vertices" and their interactions as "edges" between pairs of vertices. These graphs are most often total-graph models -- the overall structure of edges and vertices in a bidirectional or directional graph are described in global terms and the network is gen- erated algorithmically. We are interested in "egocentrie or "agent-based" models of social networks where the behavior of the individual participants are described and the graph itself is an emergent phenomenon. Our hope is that such graph models will allow us to ultimately reason from observations back to estimated properties of the individuals and populations, and result in not only more accurate algorithms for link prediction and friend recommen- dation, but also a more intuitive understanding of human behavior in such systems than is revealed by previous approaches. This report documents our preliminary work in this area; we describe several past graph models, two egocentric models of our own design, and our thoughts about the future direction of this research.
In this work, we approach topic tracking and meme trending in social media with a temporal focus; rather than analyzing topics, we aim to identify time periods whose content differs significantly from normal. We detail two approaches. The first is an information-theoretic analysis of the distributions of terms emitted during each time period. In the second, we cluster the documents from each time period and analyze the tightness of each clustering. We also discuss a method of combining the scores created by each technique, and we provide ample empirical analysis of our methodology on various Twitter datasets.
Topological data analysis is the study of data using techniques from algebraic topology. Often, one begins with a finite set of points representing data and a “filter” function which assigns a real number to each datum. Using both the data and the filter function, one can construct a filtered complex for further analysis. For example, applying the homology functor to the filtered complex produces an algebraic object known as a “one-dimensional persistence module”, which can often be interpreted as a finite set of intervals representing various geometric features in the data. If one runs the above process incorporating multiple filter functions simultaneously, one instead obtains a multidimensional persistence module. Unfortunately, these are much more difficult to interpret. In this article, we analyze the space of multidimensional persistence modules from the perspective of algebraic geometry. First we build a moduli space of a certain subclass of easily analyzed multidimensional persistence modules, which we construct specifically to capture much of the information which can be gained by using multidimensional persistence instead of one-dimensional persistence. Fruthermore, we argue that the global sections of this space provide interesting numeric invariants when evaluated against our subclass of multidimensional persistence modules. Finally, we extend these global sections to the space of all multidimensional persistence modules and discuss how the resulting numeric invariants might be used to study data. This paper extends the results of Adcock et al. (Homol Homotopy Appl 18(1), 381–402, 2016) by constructing numeric invariants from the computation of a multidimensional persistence module as given by Carlsson et al. (J Comput Geom 1(1), 72–100, 2010).
In this paper, we analyze the space of multidimensional persistence modules from the perspectives of algebraic geometry. We first build a moduli space of a certain subclass of easily analyzed multidimensional persistence modules, which we construct specifically to capture much of the information which can be gained by using multidimensional persistence over one-dimensional persistence. We argue that the global sections of this space provide interesting numeric invariants when evaluated against our subclass of multidimensional persistence modules. Lastly, we extend these global sections to the space of all multidimensional persistence modules and discuss how the resulting numeric invariants might be used to study data.