AITopics | Callison-Burch, Chris

Collaborating Authors

Callison-Burch, Chris

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

NSF-SciFy: Mining the NSF Awards Database for Scientific Claims

Rao, Delip, You, Weiqiu, Wong, Eric, Callison-Burch, Chris

arXiv.org Artificial IntelligenceMar-15-2025

We present NSF-SciFy, a large-scale dataset for scientific claim extraction derived from the National Science Foundation (NSF) awards database, comprising over 400K grant abstracts spanning five decades. While previous datasets relied on published literature, we leverage grant abstracts which offer a unique advantage: they capture claims at an earlier stage in the research lifecycle before publication takes effect. We also introduce a new task to distinguish between existing scientific claims and aspirational research intentions in proposals. Using zero-shot prompting with frontier large language models, we jointly extract 114K scientific claims and 145K investigation proposals from 16K grant abstracts in the materials science domain to create a focused subset called NSF-SciFy-MatSci. We use this dataset to evaluate 3 three key tasks: (1) technical to non-technical abstract generation, where models achieve high BERTScore (0.85+ F1); (2) scientific claim extraction, where fine-tuned models outperform base models by 100% relative improvement; and (3) investigation proposal extraction, showing 90%+ improvement with fine-tuning. We introduce novel LLM-based evaluation metrics for robust assessment of claim/proposal extraction quality. As the largest scientific claim dataset to date -- with an estimated 2.8 million claims across all STEM disciplines funded by the NSF -- NSF-SciFy enables new opportunities for claim verification and meta-scientific research. We publicly release all datasets, trained models, and evaluation code to facilitate further research.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.086

Country: North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.69)
Government > Regional Government > North America Government > United States Government (0.67)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

mStyleDistance: Multilingual Style Embeddings and their Evaluation

Qiu, Justin, Zhu, Jiacheng, Patel, Ajay, Apidianaki, Marianna, Callison-Burch, Chris

arXiv.org Artificial IntelligenceFeb-20-2025

Style embeddings are useful for stylistic analysis and style transfer; however, only English style embeddings have been made available. We introduce Multilingual StyleDistance (mStyleDistance), a multilingual style embedding model trained using synthetic data and contrastive learning. We train the model on data from nine languages and create a multilingual STEL-or-Content benchmark (Wegmann et al., 2022) that serves to assess the embeddings' quality. We also employ our embeddings in an authorship verification task involving different languages. Our results show that mStyleDistance embeddings outperform existing models on these multilingual style benchmarks and generalize well to unseen features and languages. We make our model publicly available at https://huggingface.co/StyleDistance/mstyledistance .

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.15168

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.68)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation

Yang, Yue, Patel, Ajay, Deitke, Matt, Gupta, Tanmay, Weihs, Luca, Head, Andrew, Yatskar, Mark, Callison-Burch, Chris, Krishna, Ranjay, Kembhavi, Aniruddha, Clark, Christopher

arXiv.org Artificial IntelligenceFeb-20-2025

Reasoning about images with rich text, such as charts and documents, is a critical application of vision-language models (VLMs). However, VLMs often struggle in these domains due to the scarcity of diverse text-rich vision-language data. To address this challenge, we present CoSyn, a framework that leverages the coding capabilities of text-only large language models (LLMs) to automatically create synthetic text-rich multimodal data. Given input text describing a target domain (e.g., "nutrition fact labels"), CoSyn prompts an LLM to generate code (Python, HTML, LaTeX, etc.) for rendering synthetic images. With the underlying code as textual representations of the synthetic images, CoSyn can generate high-quality instruction-tuning data, again relying on a text-only LLM. Using CoSyn, we constructed a dataset comprising 400K images and 2.7M rows of vision-language instruction-tuning data. Comprehensive experiments on seven benchmarks demonstrate that models trained on our synthetic data achieve state-of-the-art performance among competitive open-source models, including Llama 3.2, and surpass proprietary models such as GPT-4V and Gemini 1.5 Flash. Furthermore, CoSyn can produce synthetic pointing data, enabling VLMs to ground information within input images, showcasing its potential for developing multimodal agents capable of acting in real-world environments.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.14846

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Industry:

Government > Regional Government > North America Government > United States Government (0.46)
Media (0.46)
Materials > Chemicals (0.46)
Health & Medicine > Consumer Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

GenAI Content Detection Task 3: Cross-Domain Machine-Generated Text Detection Challenge

Dugan, Liam, Zhu, Andrew, Alam, Firoj, Nakov, Preslav, Apidianaki, Marianna, Callison-Burch, Chris

arXiv.org Artificial IntelligenceJan-15-2025

Recently there have been many shared tasks targeting the detection of generated text from Large Language Models (LLMs). However, these shared tasks tend to focus either on cases where text is limited to one particular domain or cases where text can be from many domains, some of which may not be seen during test time. In this shared task, using the newly released RAID benchmark, we aim to answer whether or not models can detect generated text from a large, yet fixed, number of domains and LLMs, all of which are seen during training. Over the course of three months, our task was attempted by 9 teams with 23 detector submissions. We find that multiple participants were able to obtain accuracies of over 99% on machine-generated text from RAID while maintaining a 5% False Positive Rate -- suggesting that detectors are able to robustly detect text from many domains and models simultaneously. We discuss potential interpretations of this result and provide directions for future research.

detector, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.08913

Country:

North America > United States (0.93)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.15)
North America > Mexico > Mexico City (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (0.70)
Government (0.48)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)

Add feedback

WHAT-IF: Exploring Branching Narratives by Meta-Prompting Large Language Models

Huang, Runsheng "Anson", Martin, Lara J., Callison-Burch, Chris

arXiv.org Artificial IntelligenceDec-17-2024

WHAT-IF--Writing a Hero's Alternate Timeline through Interactive Fiction--is a system that uses zero-shot meta-prompting to create branching narratives from a prewritten story. Played as an interactive fiction (IF) game, WHAT-IF lets the player choose between decisions that the large language model (LLM) GPT-4 generates as possible branches in the story. Starting with an existing linear plot as input, a branch is created at each key decision taken by the main character. By meta-prompting the LLM to consider the major plot points from the story, the system produces coherent and well-structured alternate storylines. WHAT-IF stores the branching plot tree in a graph which helps it to both keep track of the story for prompting and maintain the structure for the final IF system. Figure 1: The WHAT-IF user interface, filled with the A video demo of our system can be found here: main character, title, and the plot of the TV show WandaVision https://youtu.be/8vBqjqtupcc.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.10582

Country:

North America > United States (1.00)
Europe (0.93)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Media (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

Add feedback

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

Deitke, Matt, Clark, Christopher, Lee, Sangho, Tripathi, Rohun, Yang, Yue, Park, Jae Sung, Salehi, Mohammadreza, Muennighoff, Niklas, Lo, Kyle, Soldaini, Luca, Lu, Jiasen, Anderson, Taira, Bransom, Erin, Ehsani, Kiana, Ngo, Huong, Chen, YenSung, Patel, Ajay, Yatskar, Mark, Callison-Burch, Chris, Head, Andrew, Hendrix, Rose, Bastani, Favyen, VanderBilt, Eli, Lambert, Nathan, Chou, Yvonne, Chheda, Arnavi, Sparks, Jenna, Skjonsberg, Sam, Schmitz, Michael, Sarnat, Aaron, Bischoff, Byron, Walsh, Pete, Newell, Chris, Wolters, Piper, Gupta, Tanmay, Zeng, Kuo-Hao, Borchardt, Jon, Groeneveld, Dirk, Nam, Crystal, Lebrecht, Sophie, Wittlif, Caitlin, Schoenick, Carissa, Michel, Oscar, Krishna, Ranjay, Weihs, Luca, Smith, Noah A., Hajishirzi, Hannaneh, Girshick, Ross, Farhadi, Ali, Kembhavi, Aniruddha

arXiv.org Artificial IntelligenceDec-5-2024

Today's most advanced vision-language models (VLMs) remain proprietary. The strongest open-weight models rely heavily on synthetic data from proprietary VLMs to achieve good performance, effectively distilling these closed VLMs into open ones. As a result, the community has been missing foundational knowledge about how to build performant VLMs from scratch. We present Molmo, a new family of VLMs that are state-of-the-art in their class of openness. Our key contribution is a collection of new datasets called PixMo, including a dataset of highly detailed image captions for pre-training, a free-form image Q&A dataset for fine-tuning, and an innovative 2D pointing dataset, all collected without the use of external VLMs. The success of our approach relies on careful modeling choices, a well-tuned training pipeline, and, most critically, the quality of our newly collected datasets. Our best-in-class 72B model not only outperforms others in the class of open weight and data models, but also outperforms larger proprietary models including Claude 3.5 Sonnet, and Gemini 1.5 Pro and Flash, second only to GPT-4o based on both academic benchmarks and on a large human evaluation. Our model weights, new datasets, and source code are available at https://molmo.allenai.org/blog.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2409.17146

Country: North America > United States > Texas > Harris County > Houston (0.14)

Genre:

Research Report (0.63)
Questionnaire & Opinion Survey (0.46)

Industry:

Consumer Products & Services (0.67)
Media (0.67)
Leisure & Entertainment (0.67)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

WithdrarXiv: A Large-Scale Dataset for Retraction Study

Rao, Delip, Young, Jonathan, Dietterich, Thomas, Callison-Burch, Chris

arXiv.org Artificial IntelligenceDec-4-2024

Retractions play a vital role in maintaining scientific integrity, yet systematic studies of retractions in computer science and other STEM fields remain scarce. We present WithdrarXiv, the first large-scale dataset of withdrawn papers from arXiv, containing over 14,000 papers and their associated retraction comments spanning the repository's entire history through September 2024. Through careful analysis of author comments, we develop a comprehensive taxonomy of retraction reasons, identifying 10 distinct categories ranging from critical errors to policy violations. We demonstrate a simple yet highly accurate zero-shot automatic categorization of retraction reasons, achieving a weighted average F1-score of 0.96. Additionally, we release WithdrarXiv-SciFy, an enriched version including scripts for parsed full-text PDFs, specifically designed to enable research in scientific feasibility studies, claim verification, and automated theorem proving. These findings provide valuable insights for improving scientific quality control and automated verification systems. Finally, and most importantly, we discuss ethical issues and take a number of steps to implement responsible data release while fostering open science in this area.

category, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2412.03775

Country:

North America > United States (1.00)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.85)

Industry:

Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.94)
Law (0.70)
Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples

Patel, Ajay, Zhu, Jiacheng, Qiu, Justin, Horvitz, Zachary, Apidianaki, Marianna, McKeown, Kathleen, Callison-Burch, Chris

arXiv.org Artificial IntelligenceOct-16-2024

Style representations aim to embed texts with similar writing styles closely and texts with different styles far apart, regardless of content. However, the contrastive triplets often used for training these representations may vary in both style and content, leading to potential content leakage in the representations. We introduce StyleDistance, a novel approach to training stronger content-independent style embeddings. We use a large language model to create a synthetic dataset of near-exact paraphrases with controlled style variations, and produce positive and negative examples across 40 distinct style features for precise contrastive learning. We assess the quality of our synthetic data and embeddings through human and automatic evaluations. StyleDistance enhances the content-independence of style embeddings, which generalize to real-world benchmarks and outperform leading style representations in downstream applications. Our model can be found at https://huggingface.co/StyleDistance/styledistance .

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2410.12757

Country:

North America > United States > Louisiana (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

MiRAGeNews: Multimodal Realistic AI-Generated News Detection

Huang, Runsheng, Dugan, Liam, Yang, Yue, Callison-Burch, Chris

arXiv.org Artificial IntelligenceOct-11-2024

The proliferation of inflammatory or misleading "fake" news content has become increasingly common in recent years. Simultaneously, it has become easier than ever to use AI tools to generate photorealistic images depicting any scene imaginable. Combining these two -- AI-generated fake news content -- is particularly potent and dangerous. To combat the spread of AI-generated fake news, we propose the MiRAGeNews Dataset, a dataset of 12,500 high-quality real and AI-generated image-caption pairs from state-of-the-art generators. We find that our dataset poses a significant challenge to humans (60% F-1) and state-of-the-art multi-modal LLMs (< 24% F-1). Using our dataset we train a multi-modal detector (MiRAGe) that improves by +5.1% F-1 over state-of-the-art baselines on image-caption pairs from out-of-domain image generators and news publishers. We release our code and data to aid future work on detecting AI-generated content.

caption, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2410.09045

Country:

North America > United States (1.00)
Asia > China > Henan Province (0.14)

Genre: Research Report (1.00)

Industry:

Government (0.93)
Media > News (0.90)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.30)

Add feedback

BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation

Li, Bryan, Haider, Samar, Luo, Fiona, Agashe, Adwait, Callison-Burch, Chris

arXiv.org Artificial IntelligenceOct-1-2024

Large language models excel at creative generation but continue to struggle with the issues of hallucination and bias. While retrieval-augmented generation (RAG) provides a framework for grounding LLMs' responses in accurate and up-to-date information, it still raises the question of bias: which sources should be selected for inclusion in the context? And how should their importance be weighted? In this paper, we study the challenge of cross-lingual RAG and present a dataset to investigate the robustness of existing systems at answering queries about geopolitical disputes, which exist at the intersection of linguistic, cultural, and political boundaries. Our dataset is sourced from Wikipedia pages containing information relevant to the given queries and we investigate the impact of including additional context, as well as the composition of this context in terms of language and source, on an LLM's response. Our results show that existing RAG systems continue to be challenged by cross-lingual use cases and suffer from a lack of consistency when they are provided with competing information in multiple languages. We present case studies to illustrate these issues and outline steps for future research to address these challenges. We make our dataset and code publicly available at https://github.com/manestay/bordIRlines.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.01171

Country:

Europe > Ukraine (1.00)
Asia (1.00)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Europe > Russia > Southern Federal District > Krasnodar Krai (0.14)

Genre: Research Report > New Finding (0.54)

Industry: Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback