Me [a computer] Talk Pretty One Day


If I asked you to say which sentence was well-formed, you'd probably say the first sentence is and the second isn't. In linguistics, we would say that the first one is grammatical and the second one isn't. However, grammaticality is not always as simple as "this one works" and "this one doesn't work." In many studies, including my own BA thesis, people are asked to rate the grammaticality of a sentence on a scale (in my research, I asked subjects to rate sentences on a scale from 1 to 7, which is relatively standard). The reason I bring up this concept of grammaticality is to highlight an important aspect of language: its ambiguity.

Learning the language of viral evolution and escape


Viral mutations that evade neutralizing antibodies, an occurrence known as viral escape, can occur and may impede the development of vaccines. To predict which mutations may lead to viral escape, Hie et al. used a machine learning technique for natural language processing with two components: grammar (or syntax) and meaning (or semantics) (see the Perspective by Kim and Przytycka). Three different unsupervised language models were constructed for influenza A hemagglutinin, HIV-1 envelope glycoprotein, and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike glycoprotein. Semantic landscapes for these viruses predicted viral escape mutations that produce sequences that are syntactically and/or grammatically correct but effectively different in semantics and thus able to evade the immune system. Science, this issue p. 284 (doi:10.1126/science.abd7331); see also p. 233 (doi:10.1126/science.abf6894).

The ability for viruses to mutate and evade the human immune system and cause infection, called viral escape, remains an obstacle to antiviral and vaccine development. Understanding the complex rules that govern escape could inform therapeutic design. We modeled viral escape with machine learning algorithms originally developed for human natural language. We identified escape mutations as those that preserve viral infectivity but cause a virus to look different to the immune system, akin to word changes that preserve a sentence’s grammaticality but change its meaning. With this approach, language models of influenza hemagglutinin, HIV-1 envelope glycoprotein (HIV Env), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Spike viral proteins can accurately predict structural escape patterns using sequence data alone. Our study represents a promising conceptual bridge between natural language and viral evolution.
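The idea of an escape mutation as one that stays "grammatical" while changing "meaning" can be illustrated with a small ranking sketch. The code below is an assumption-laden toy, not the authors' implementation: the mutation names, embeddings, and grammaticality values are invented for illustration, semantic change is taken to be the L1 distance between embeddings, and the two criteria are combined by summing ranks with a hypothetical weight `beta`.

```python
from typing import Dict, List, Sequence, Tuple


def l1_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Sum of absolute coordinate differences between two embeddings."""
    return sum(abs(a - b) for a, b in zip(u, v))


def rank_ascending(values: Dict[str, float]) -> Dict[str, int]:
    """Map each key to its rank, where rank 1 is the smallest value."""
    ordered = sorted(values, key=lambda k: values[k])
    return {key: i + 1 for i, key in enumerate(ordered)}


def escape_priority(
    wild_type: Sequence[float],
    candidates: Dict[str, Tuple[Sequence[float], float]],
    beta: float = 1.0,
) -> List[str]:
    """Order candidate mutations from most to least escape-like.

    candidates maps a mutation name to (embedding, grammaticality).
    A mutation scores highly when it has both a large semantic change
    and a high grammaticality (assumed combination rule: rank sum).
    """
    change = {n: l1_distance(wild_type, emb) for n, (emb, _) in candidates.items()}
    grammar = {n: g for n, (_, g) in candidates.items()}
    change_rank = rank_ascending(change)
    grammar_rank = rank_ascending(grammar)
    score = {n: change_rank[n] + beta * grammar_rank[n] for n in candidates}
    return sorted(candidates, key=lambda n: score[n], reverse=True)


# Invented toy values: (embedding, grammaticality) per hypothetical mutation.
wild_type_embedding = [0.0, 0.0]
toy_candidates = {
    "A1G": ([0.1, 0.0], 0.60),  # plausible, but barely changes meaning
    "K2E": ([0.9, 0.8], 0.70),  # plausible and changes meaning
    "P3W": ([1.0, 1.0], 0.01),  # changes meaning, but implausible
}

ranking = escape_priority(wild_type_embedding, toy_candidates)
print(ranking)
```

In this toy setting the mutation that is both plausible and semantically distant comes first, which is the qualitative behavior the abstract describes. A real model would derive both quantities from a trained sequence language model.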

Differential Linguistic Features in U.S. Immigration Newspaper Articles: A Contrastive Corpus Analysis Using the Gramulator

AAAI Conferences

Immigration is a national issue in the United States; however, regional implications differ because of immigrants' varying effects on local economies. These implications are made manifest in the reportage of local newspapers, which, while ostensibly portraying "objective" language, may reveal the narrative of local perspectives on national issues.

Our corpus comprises 752 texts, culled from newspapers of U.S. border states (approximately 75 texts per state). Because four states border Mexico, we selected four matching states (of the 11) that border Canada. To do so, we considered the following criteria for all 15 terrestrial border states: total population, immigrant population, length of international border, and political leaning. These data were input into a custom PERL script designed to ...

Gradience in Grammar: Experimental and Computational Aspects of Degrees of Grammaticality

AITopics Original Links

This thesis deals with gradience in grammar, i.e., with the fact that some linguistic structures are not fully acceptable or unacceptable, but receive gradient linguistic judgments. The importance of gradient data for linguistic theory has been recognized at least since Chomsky's Logical Structure of Linguistic Theory. However, systematic empirical studies of gradience are largely absent, and none of the major theoretical frameworks is designed to account for gradient data.

Utilizing Evidence Spans via Sequence-Level Contrastive Learning for Long-Context Question Answering

Artificial Intelligence

Long-range transformer models have achieved encouraging results on long-context question answering (QA) tasks. Such tasks often require reasoning over a long document, and they benefit from identifying a set of evidence spans (e.g., sentences) that provide supporting evidence for addressing the question. In this work, we propose a novel method for equipping long-range transformers with an additional sequence-level objective for better identification of supporting evidence spans. We achieve this by proposing an additional contrastive supervision signal in finetuning, where the model is encouraged to explicitly discriminate supporting evidence sentences from negative ones by maximizing the question-evidence similarity. The proposed additional loss exhibits consistent improvements on three different strong long-context transformer models, across two challenging question answering benchmarks - HotpotQA and QAsper.
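A contrastive objective of the kind described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's exact loss: it assumes an InfoNCE-style formulation, cosine similarity between a question vector and sentence vectors, and a hypothetical `temperature` parameter. The vectors in the usage lines are invented.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(
    question: Sequence[float],
    sentences: Sequence[Sequence[float]],
    is_evidence: Sequence[bool],
    temperature: float = 0.1,
) -> float:
    """Negative log of the softmax mass assigned to evidence sentences.

    The loss is small when evidence sentences are more similar to the
    question than the non-evidence (negative) sentences are.
    """
    if not any(is_evidence):
        raise ValueError("at least one evidence sentence is required")
    logits = [cosine(question, s) / temperature for s in sentences]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    positive_mass = sum(e for e, flag in zip(exps, is_evidence) if flag)
    return -math.log(positive_mass / sum(exps))


# Invented toy vectors: the first sentence points the same way as the question.
question_vec = [1.0, 0.0]
sentence_vecs = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]

loss_correct = contrastive_loss(question_vec, sentence_vecs, [True, False, False])
loss_wrong = contrastive_loss(question_vec, sentence_vecs, [False, True, False])
print(loss_correct < loss_wrong)
```

In finetuning, a term like this would be added to the usual QA loss so that the encoder is pushed to place supporting sentences close to the question. How the two terms are weighted is not stated in the abstract.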