Goto

Collaborating Authors

 Rose, Carolyn


RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

arXiv.org Artificial Intelligence

We present RepoST, a scalable method to construct environments that provide execution feedback for repository-level code generation, for both training and evaluation. Unlike existing works that aim to build entire repositories for execution, which is challenging for both humans and LLMs, we provide execution feedback with sandbox testing, which isolates a given target function and its dependencies in a separate script for testing. Sandbox testing reduces the complexity of external dependencies and enables constructing environments at a large scale. We use our method to construct RepoST-Train, a large-scale training set with 7,415 functions from 832 repositories. Training with the execution feedback provided by RepoST-Train leads to a performance gain of 5.5% Pass@1 on HumanEval and 3.5% Pass@1 on RepoEval. We also build an evaluation dataset, RepoST-Eval, and benchmark 12 code generation models.
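
As a rough illustration of the sandbox-testing idea described above, the following Python sketch copies a target function and its local dependencies into a standalone script together with generated tests and runs it in an isolated subprocess. All function and file names are hypothetical and not taken from RepoST's actual implementation.

    # Minimal sketch, assuming execution feedback is the exit code of an
    # isolated test script (illustrative names only, not RepoST's API).
    import subprocess
    import sys
    import tempfile
    import textwrap
    from pathlib import Path

    def run_sandbox_test(function_src: str, dependency_src: str, test_src: str) -> bool:
        """Write an isolated test script and return True if its tests pass."""
        script = "\n\n".join([dependency_src, function_src, test_src])
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "sandbox_test.py"
            path.write_text(script)
            result = subprocess.run(
                [sys.executable, str(path)], capture_output=True, timeout=60
            )
            return result.returncode == 0

    if __name__ == "__main__":
        dep = "def _normalize(x):\n    return x.strip().lower()"
        fn = textwrap.dedent("""
            def greet(name):
                return f"hello, {_normalize(name)}"
        """)
        tests = "assert greet('  Ada ') == 'hello, ada'\nprint('ok')"
        print(run_sandbox_test(fn, dep, tests))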


Programming by Examples Meets Historical Linguistics: A Large Language Model Based Approach to Sound Law Induction

arXiv.org Artificial Intelligence

Historical linguists have long written "programs" that convert reconstructed words in an ancestor language into their attested descendants via ordered string rewrite functions (called sound laws). However, writing these programs is time-consuming, motivating the development of automated Sound Law Induction (SLI), which we formulate as Programming by Examples (PBE) with Large Language Models (LLMs) in this paper. While LLMs have been effective for code generation, recent work has shown that PBE is challenging but improvable by fine-tuning, especially with training data drawn from the same distribution as the evaluation data. In this paper, we create a conceptual framework of what constitutes a "similar distribution" for SLI and propose four kinds of synthetic data generation methods with varying amounts of inductive bias to investigate what leads to the best performance. Based on the results, we create a SOTA open-source model for SLI as PBE (+6% pass rate with a third of the parameters of the second-best LLM) and also highlight exciting future directions for PBE research.
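
To make the target of SLI concrete, the sketch below shows what a "program" of ordered sound laws looks like as string rewrites. The rules and word forms are toy examples for illustration, not drawn from the paper's data.

    # Toy sketch of sound laws as ordered string rewrite rules.
    import re

    def apply_sound_laws(word: str, laws: list[tuple[str, str]]) -> str:
        """Apply ordered rewrite rules (regex pattern, replacement) to a word."""
        for pattern, replacement in laws:
            word = re.sub(pattern, replacement, word)
        return word

    # Toy laws: intervocalic voicing of t -> d, then loss of the final vowel.
    laws = [
        (r"(?<=[aeiou])t(?=[aeiou])", "d"),
        (r"[aeiou]$", ""),
    ]

    print(apply_sound_laws("vita", laws))  # -> "vid"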


Improving Model Factuality with Fine-grained Critique-based Evaluator

arXiv.org Artificial Intelligence

Factuality evaluation aims to detect factual errors produced by language models (LMs) and hence guide the development of more factual models. Towards this goal, we train a factuality evaluator, FenCE, that provides LM generators with claim-level factuality feedback. We conduct data augmentation on a combination of public judgment datasets to train FenCE to (1) generate textual critiques along with scores and (2) make claim-level judgments based on diverse source documents obtained by various tools. We then present a framework that leverages FenCE to improve the factuality of LM generators by constructing training data. Specifically, we generate a set of candidate responses, leverage FenCE to revise and score each response without introducing lesser-known facts, and train the generator by preferring highly scored revised responses. Experiments show that our data augmentation methods improve the evaluator's accuracy by 2.9% on LLM-AggreFact. With FenCE, we improve Llama3-8B-chat's factuality rate by 14.45% on FActScore, outperforming state-of-the-art factuality finetuning methods by 6.96%.
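
The data-construction step described above can be sketched as building preference pairs from evaluator scores. In the hedged sketch below, generate_candidates and fence_score_and_revise are placeholders standing in for the generator and the FenCE evaluator; neither is reproduced here.

    # Hedged sketch: sample candidate responses, score (and revise) each one,
    # then keep (chosen, rejected) pairs for preference training.
    def build_preference_pairs(prompts, generate_candidates, fence_score_and_revise, k=4):
        pairs = []
        for prompt in prompts:
            candidates = generate_candidates(prompt, k)            # k sampled responses
            scored = [fence_score_and_revise(prompt, c) for c in candidates]
            # Each item is (revised_response, factuality_score).
            scored.sort(key=lambda item: item[1], reverse=True)
            best, worst = scored[0][0], scored[-1][0]
            pairs.append({"prompt": prompt, "chosen": best, "rejected": worst})
        return pairs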


CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells

arXiv.org Artificial Intelligence

The task of automated code review has recently gained a lot of attention from the machine learning community. However, current review comment evaluation metrics rely on comparisons with a human-written reference for a given code change (also called a diff), even though code review is a one-to-many problem, like generation and summarization, with many "valid reviews" for a diff. To tackle these issues, we develop CRScore, a reference-free metric to measure dimensions of review quality like conciseness, comprehensiveness, and relevance. We design CRScore to evaluate reviews in a way that is grounded in claims and potential issues detected in the code by LLMs and static analyzers. We demonstrate that CRScore can produce valid, fine-grained scores of review quality that have the greatest alignment with human judgment (0.54 Spearman correlation) and are more sensitive than reference-based metrics. We also release a corpus of 2.6k human-annotated review quality scores for machine-generated and GitHub review comments to support the development of automated metrics.
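
A heavily simplified sketch of the reference-free idea is shown below: the evaluation is grounded in a set of claims and smells detected for the diff, and the review is scored by how well it covers them. The actual CRScore uses LLMs and static analyzers plus stronger text similarity; the token-overlap matching here is only a placeholder.

    # Assumption-laden sketch of coverage-style, reference-free review scoring.
    def coverage_scores(review: str, detected_claims: list[str], threshold: float = 0.3):
        review_tokens = set(review.lower().split())

        def overlap(claim: str) -> float:
            claim_tokens = set(claim.lower().split())
            return len(claim_tokens & review_tokens) / max(len(claim_tokens), 1)

        covered = [c for c in detected_claims if overlap(c) >= threshold]
        comprehensiveness = len(covered) / max(len(detected_claims), 1)
        relevance = sum(overlap(c) for c in detected_claims) / max(len(detected_claims), 1)
        return {"comprehensiveness": comprehensiveness, "relevance": relevance}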


CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

arXiv.org Artificial Intelligence

To facilitate evaluation of code generation systems across diverse scenarios, we present CodeBenchGen, a framework to create scalable execution-based benchmarks that only requires light guidance from humans. Specifically, we leverage a large language model (LLM) to convert an arbitrary piece of code into an evaluation example, including test cases for execution-based evaluation. We illustrate the usefulness of our framework by creating a dataset, Exec-CSN, which includes 1,931 examples involving 293 libraries revised from code in 367 GitHub repositories taken from the CodeSearchNet dataset. To demonstrate the complexity and solvability of examples in Exec-CSN, we present a human study demonstrating that 81.3% of the examples can be solved by humans and 61% are rated as "requires effort to solve". We conduct code generation experiments on open-source and proprietary models and analyze the performance of both humans and models.
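
The construction loop can be pictured as in the rough sketch below: an LLM turns an arbitrary code snippet into a self-contained evaluation example with test cases, and only examples whose tests actually execute are kept. llm_convert_to_example stands in for the prompted model; it is not an API from the paper.

    # Rough sketch of generate-then-execute filtering for benchmark examples.
    import subprocess
    import sys
    import tempfile
    from pathlib import Path

    def build_examples(snippets, llm_convert_to_example):
        examples = []
        for snippet in snippets:
            candidate = llm_convert_to_example(snippet)   # code + generated tests
            with tempfile.TemporaryDirectory() as tmp:
                path = Path(tmp) / "example.py"
                path.write_text(candidate)
                ok = subprocess.run(
                    [sys.executable, str(path)], capture_output=True
                ).returncode == 0
            if ok:                                        # keep only executable examples
                examples.append(candidate)
        return examples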


Data Augmentation for Code Translation with Comparable Corpora and Multiple References

arXiv.org Artificial Intelligence

One major challenge of translating code between programming languages is that parallel training data is often limited. To overcome this challenge, we present two data augmentation techniques, one that builds comparable corpora (i.e., code pairs with similar functionality), and another that augments existing parallel data with multiple reference translations. Specifically, we build and analyze multiple types of comparable corpora, including programs generated from natural language documentation using a code generation model. Furthermore, to reduce overfitting to a single reference translation, we automatically generate additional translation references for available parallel data and filter the translations by unit tests, which increases variation in target translations. Experiments show that our data augmentation techniques significantly improve CodeT5 for translation between Java, Python, and C++ by an average of 7.5% Computational Accuracy (CA@1), which verifies the correctness of translations by execution. The code is available at https://github.com/Veronicium/CMTrans.
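
The multiple-reference augmentation step can be sketched as follows: sample several candidate translations of a source program and keep only those that pass the target-language unit tests, yielding extra verified references per example. translate and passes_unit_tests are placeholders, not the released code.

    # Hedged sketch of test-filtered reference augmentation.
    def augment_references(source_program, translate, passes_unit_tests, n_samples=10):
        references = []
        for _ in range(n_samples):
            candidate = translate(source_program)        # sampled candidate translation
            if passes_unit_tests(candidate) and candidate not in references:
                references.append(candidate)             # verified, deduplicated reference
        return references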


Linguistic representations for fewer-shot relation extraction across domains

arXiv.org Artificial Intelligence

Recent work has demonstrated the positive impact of incorporating linguistic representations as additional context and scaffolding on the in-domain performance of several NLP tasks. We extend this work by exploring the impact of linguistic representations on cross-domain performance in a few-shot transfer setting. An important question is whether linguistic representations enhance generalizability by providing features that function as cross-domain pivots. We focus on the task of relation extraction on three datasets of procedural text in two domains, cooking and materials science. Our approach augments a popular transformer-based architecture by alternately incorporating syntactic and semantic graphs constructed by freely available off-the-shelf tools. We examine their utility for enhancing generalization, and investigate whether earlier findings, e.g. that semantic representations can be more helpful than syntactic ones, extend to relation extraction in multiple domains. We find that while the inclusion of these graphs results in significantly higher performance in few-shot transfer, both types of graph exhibit roughly equivalent utility.
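
As a minimal illustration of the kind of syntactic graph such an approach consumes, the sketch below builds dependency edges with a freely available off-the-shelf parser (spaCy, assuming the en_core_web_sm model is installed). How the graph is injected into the transformer architecture is paper-specific and not shown.

    # Minimal sketch: dependency edges from an off-the-shelf parser.
    # Requires: python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")

    def dependency_edges(sentence: str):
        """Return (head_index, child_index, relation) edges for one sentence."""
        doc = nlp(sentence)
        return [(tok.head.i, tok.i, tok.dep_) for tok in doc if tok.head.i != tok.i]

    print(dependency_edges("Whisk the eggs until the mixture thickens."))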


Evaluating the Impact of a Hierarchical Discourse Representation on Entity Coreference Resolution Performance

arXiv.org Artificial Intelligence

The contribution of this paper is an empirical investigation of the impact of including a representation of the hierarchical structure of discourse within a neural entity coreference approach. To this end, we leverage a state-of-the-art RST discourse parser to convert a flat document into a tree-like structure from which we can derive features that model the structural constraints. We embed this representation within an architecture that is enabled to learn to use this information differentially depending upon the type of mention. The results demonstrate that this level of nuance enables a small but significant improvement in coreference accuracy, even with automatically constructed RST trees.
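
One example of a structural signal an RST tree can provide for coreference is the tree distance between the elementary discourse units (EDUs) containing two mentions. The sketch below is illustrative only and does not reproduce the paper's feature set; the tree is represented as a child-to-parent map over node ids.

    # Illustrative sketch: tree distance between two EDUs in an RST-style tree.
    def tree_distance(parent: dict[str, str], node_a: str, node_b: str) -> int:
        def ancestors(node):
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return path

        path_a, path_b = ancestors(node_a), ancestors(node_b)
        common = set(path_a) & set(path_b)
        # Distance = steps from each node up to the lowest common ancestor.
        lca_a = min(path_a.index(n) for n in common)
        lca_b = min(path_b.index(n) for n in common)
        return lca_a + lca_b

    # Toy RST tree: two EDUs under one relation node.
    parent = {"edu1": "rel1", "edu2": "rel1", "rel1": "root"}
    print(tree_distance(parent, "edu1", "edu2"))  # -> 2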


Using Type Information to Improve Entity Coreference Resolution

arXiv.org Artificial Intelligence

Coreference resolution (CR) is an essential part of discourse analysis. Most recently, neural approaches have been proposed to improve over SOTA models from earlier paradigms. So far none of the published neural models leverage external semantic knowledge such as type information. This paper offers the first such model and evaluation, demonstrating modest gains in accuracy by introducing either gold standard or predicted types. In the proposed approach, type information serves both to (1) improve mention representation and (2) create a soft type consistency check between coreference candidate mentions. Our evaluation covers two different grain sizes of types over four different benchmark corpora.
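
The soft type-consistency check in point (2) can be pictured as comparing the type distributions of two candidate mentions and using their agreement as an extra signal. The distributions below are made up for illustration; the paper's actual scoring differs in detail.

    # Hedged sketch of a soft type-consistency feature between two mentions.
    def type_consistency(p_types_a: dict[str, float], p_types_b: dict[str, float]) -> float:
        """Probability that two mentions share a type, given type distributions."""
        shared = set(p_types_a) & set(p_types_b)
        return sum(p_types_a[t] * p_types_b[t] for t in shared)

    mention_a = {"PERSON": 0.9, "ORG": 0.1}
    mention_b = {"PERSON": 0.7, "GPE": 0.3}
    print(type_consistency(mention_a, mention_b))  # -> 0.63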


You Too?! Mixed-Initiative LDA Story Matching to Help Teens in Distress

AAAI Conferences

Adolescent cyber-bullying on social networks is a phenomenon that has received widespread attention. Recent work by sociologists has examined this phenomenon under the larger context of teenage drama and its manifestations on social networks. Tackling cyber-bullying involves two key components: automatic detection of possible cases, and interaction strategies that encourage reflection and emotional support. A key element is showing distressed teenagers that they are not alone in their plight. Conventional topic spotting and document classification into labels like "dating" or "sports" are not enough to effectively match stories for this task. In this work, we examine a corpus of 5500 stories from distressed teenagers on a major youth social network. We combine Latent Dirichlet Allocation and human interpretation of its output using principles from sociolinguistics to extract high-level themes in the stories and use them to match new stories to similar ones. A user evaluation of the story matching shows that theme-based retrieval does a better job of finding relevant and effective stories for this application than conventional approaches.
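
The retrieval side of this pipeline can be sketched with standard tooling: infer topic distributions for a story corpus with LDA, then retrieve the stories whose topic mixtures are most similar to a new story. The human-in-the-loop interpretation of topics into high-level themes is not reproducible in code and is omitted; the tiny corpus below is a stand-in, not the paper's data.

    # Minimal sketch of LDA-based story matching (scikit-learn).
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    stories = [
        "my friends posted mean comments about me online",
        "i broke up with my boyfriend and feel alone",
        "people at school keep spreading rumors about me",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(stories)
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    topic_mix = lda.fit_transform(counts)              # per-story topic distributions

    new_story = "classmates are spreading rumors and posting about me"
    new_mix = lda.transform(vectorizer.transform([new_story]))
    ranked = cosine_similarity(new_mix, topic_mix)[0].argsort()[::-1]
    print([stories[i] for i in ranked[:2]])            # most thematically similar stories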