AITopics | augenstein

augenstein

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Topic-Guided Sampling For Data-Efficient Multi-Domain Stance Detection

Arakelyan, Erik, Arora, Arnav, Augenstein, Isabelle

arXiv.org Machine Learning | Jun-1-2023

Stance detection is concerned with identifying the attitudes expressed by an author towards a target of interest. This task spans a variety of domains, ranging from social media opinion identification to detecting the stance for a legal claim. However, the framing of the task varies across these domains in terms of the data collection protocol, the label dictionary, and the number of available annotations. Furthermore, these stance annotations are significantly imbalanced on a per-topic and inter-topic basis. These factors make multi-domain stance detection a challenging task, requiring standardization and domain adaptation. To overcome this challenge, we propose Topic Efficient StancE Detection (TESTED), consisting of a topic-guided diversity sampling technique and a contrastive objective that is used for fine-tuning a stance classifier. We evaluate the method on an existing benchmark of 16 datasets with in-domain (all topics seen) and out-of-domain (unseen topics) experiments. The results show that our method outperforms the state-of-the-art with an average increase of 3.5 F1 points in-domain, and is more generalizable with an average increase of 10.2 F1 points on out-of-domain evaluation while using ≤10% of the training data. We show that our sampling technique mitigates both inter- and per-topic class imbalances. Finally, our analysis demonstrates that the contrastive learning objective allows the model a more pronounced segmentation of samples with varying labels.

computational linguistic, machine learning, natural language, (14 more...)

arXiv.org Machine Learning

doi: 10.18653/v1/2023.acl-long.752

2306.00765

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(28 more...)

Genre: Research Report > New Finding (0.87)

Industry:

Government > Regional Government > North America Government > United States Government (0.69)
Law (0.67)
Media > News (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.46)

Measuring Intersectional Biases in Historical Documents

Borenstein, Nadav, Stańczak, Karolina, Rolskov, Thea, Perez, Natália da Silva, Käfer, Natacha Klein, Augenstein, Isabelle

arXiv.org Artificial Intelligence | May-21-2023

Data-driven analyses of biases in historical texts can help illuminate the origin and development of biases prevailing in modern society. However, digitised historical documents pose a challenge for NLP practitioners as these corpora suffer from errors introduced by optical character recognition (OCR) and are written in an archaic language. In this paper, we investigate the continuities and transformations of bias in historical newspapers published in the Caribbean during the colonial era (18th to 19th centuries). Our analyses are performed along the axes of gender, race, and their intersection. We examine these biases by conducting a temporal study in which we measure the development of lexical associations using distributional semantics models and word embeddings. Further, we evaluate the effectiveness of techniques designed to process OCR-generated data and assess their stability when trained on and applied to the noisy historical newspapers. We find that there is a trade-off between the stability of the word embeddings and their compatibility with the historical dataset. We provide evidence that gender and racial biases are interdependent, and their intersection triggers distinct effects. These findings align with the theory of intersectionality, which stresses that biases affecting people with multiple marginalised identities compound to more than the sum of their constituents.

computational linguistic, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.12376

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Trinidad and Tobago (0.14)
North America > Antigua and Barbuda (0.14)
(69 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)

Modeling Information Change in Science Communication with Semantically Matched Paraphrases

Wright, Dustin, Pei, Jiaxin, Jurgens, David, Augenstein, Isabelle

arXiv.org Artificial Intelligence | Oct-24-2022

Whether the media faithfully communicate scientific information has long been a core issue to the science community. Automatically identifying paraphrased scientific findings could enable large-scale tracking and analysis of information changes in the science communication process, but this requires systems to understand the similarity between scientific information across multiple domains. To this end, we present the SCIENTIFIC PARAPHRASE AND INFORMATION CHANGE DATASET (SPICED), the first paraphrase dataset of scientific findings annotated for degree of information change. SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers. We demonstrate that SPICED poses a challenging task and that models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims. Finally, we show that models trained on SPICED can reveal large-scale trends in the degrees to which people and organizations faithfully communicate new scientific findings. Data, code, and pre-trained models are available at http://www.copenlu.com/publication/2022_emnlp_wright/.

information, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2210.13001

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(13 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.69)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)

Augenstein

AAAI Conferences | Feb-8-2022, 11:33:56 GMT

A mixed-integer linear program (MILP) approach to scheduling a large constellation of Earth-imaging satellites is presented. The algorithm optimizes the assignment of imagery collects, image data downlinks, and "health & safety" contacts, generating schedules for all satellites and ground stations in a network. Hardware-driven constraints (e.g., the limited agility of the satellites) and operations-driven constraints (e.g., guaranteeing a minimum contact frequency for each satellite) are both addressed. Of critical importance to the use of this algorithm in real-world operations, it runs fast enough to allow for human operator interaction and repeated rescheduling. This is achieved by a partitioning of the problem into sequential steps for downlink scheduling and image scheduling, with a novel dynamic programming (DP) heuristic providing a stand-in for imaging activity in the MILP when scheduling the downlinks.

augenstein, constraint, satellite, (2 more...)

AAAI Conferences

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Does Typological Blinding Impede Cross-Lingual Sharing?

Bjerva, Johannes, Augenstein, Isabelle

arXiv.org Artificial Intelligence | Jan-28-2021

Bridging the performance gap between high- and low-resource languages has been the focus of much previous work. Typological features from databases such as the World Atlas of Language Structures (WALS) are a prime candidate for this, as such data exists even for very low-resource languages. However, previous work has only found minor benefits from using typological information. Our hypothesis is that a model trained in a cross-lingual setting will pick up on typological cues from the input data, thus overshadowing the utility of explicitly using such features. We verify this hypothesis by blinding a model to typological information, and investigate how cross-lingual sharing and performance are impacted. Our model is based on a cross-lingual architecture in which the latent weights governing the sharing between languages are learnt during training. We show that (i) preventing this model from exploiting typology severely reduces performance, while a control experiment reaffirms that (ii) encouraging sharing according to typology somewhat improves performance.

augenstein, computational linguistic, typological feature, (15 more...)

arXiv.org Artificial Intelligence

2101.11888

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(9 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

What reading 3.5 million books tells us about gender stereotypes

#artificialintelligence | Sep-10-2019, 07:53:27 GMT

Huge social questions like "how are men and women perceived differently" cannot be easily answered without analyzing rhetoric on a massive scale. But what if we could analyze millions of words, all at once, to get a sense of what patterns emerge in how men and women were described? It wasn't until recently that machine learning algorithms could help researchers do just that. In a recent study, Dr. Isabelle Augenstein, a computer scientist at the University of Copenhagen, worked with fellow researchers from the United States to analyze 11 billion words in an effort to find out whether there was a difference between the adjectives used to describe men and women in literature. The researchers examined a dataset of 3.5 million books, all published in English between 1900 and 2008.

artificial intelligence, augenstein, machine learning, (12 more...)

#artificialintelligence

Country:

North America > United States (0.25)
Europe > Denmark > Capital Region > Copenhagen (0.25)

Genre: Research Report (0.56)

Industry: Law > Civil Rights & Constitutional Law (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Massive Machine Learning Study Demonstrates Gender Stereotyping And Sexist Language In Literature

#artificialintelligence | Sep-1-2019, 13:43:14 GMT

An unsupervised machine learning study presented at the 2019 meeting of the Association for Computational Linguistics, which examined 3.5M books published between 1900 and 2008, indicates that men are described based on their behavior, whereas women are described based on appearance. Specifically, "beautiful" and "sexy" are two of the adjectives most frequently used to describe women, while common descriptors for men were "brave," "rational," and "righteous." The books, which amounted to approximately 11B words in total, included a mix of fiction and non-fiction. "We are clearly able to see that the words used for women refer much more to their appearances than the words used to describe men," said University of Copenhagen computer scientist and assistant professor Isabelle Augenstein in a statement. "Thus, we have been able to confirm a widespread perception, only now at a statistical level."

artificial intelligence, machine learning, university, (14 more...)

#artificialintelligence

Country:

Europe > Denmark > Capital Region > Copenhagen (0.26)
North America > United States > Massachusetts (0.16)

Industry: Media > News (0.74)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.73)

Massive Machine Learning Study Demonstrates Gender Stereotyping And Sexist Language In Literature

#artificialintelligence | Aug-30-2019, 14:00:13 GMT

artificial intelligence, machine learning, university, (12 more...)

#artificialintelligence

Country:

Europe > Denmark > Capital Region > Copenhagen (0.27)
North America > United States > Massachusetts (0.17)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.74)

Algorithms find top 11 adjectives for men v. women in 3.5M books - Futurity

#artificialintelligence | Aug-29-2019, 12:14:52 GMT

You are free to share this article under the Attribution 4.0 International license. Machine learning analyzed 3.5 million books to find that adjectives ascribed to women tend to describe physical appearance, whereas words that refer to behavior go to men. "Beautiful" and "sexy" are two of the adjectives most frequently used to describe women. Commonly used descriptors for men include righteous, rational, and courageous. Researchers trawled through an enormous quantity of books in an effort to find out whether there is a difference between the types of words that describe men and women in literature.

adjective, algorithm find top 11, university, (11 more...)

#artificialintelligence

Country:

Europe > Denmark > Capital Region > Copenhagen (0.07)
North America > United States > Massachusetts > Hampshire County > Amherst (0.05)
North America > United States > Maryland (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.74)

If AI Can Fix Peer Review in Science, AI Can Do Anything

WIRED | Feb-21-2017, 13:35:06 GMT

Here's how science works: You have a question about some infinitesimal sliver of the universe. You form a hypothesis, test it, and eventually gather enough data to support or disprove what you thought was going on. The next bit is less glamorous: You write a manuscript, submit it to an academic journal, and endure the gauntlet of peer review, where a small group of anonymous experts in your field scrutinize the quality of your work. Peer review has its flaws. Human beings (even scientists) are biased, lazy, and self-interested.

artificial intelligence, augenstein, manuscript, (15 more...)

WIRED

Country:

North America > United States > Oregon (0.05)
Europe > Finland > Uusimaa > Helsinki (0.05)

Technology: Information Technology > Artificial Intelligence (1.00)
