AITopics

2502.01636

Country:

North America > United States (1.00)
Asia (1.00)

Genre: Research Report (0.84)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)

Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications

Wu, Yixin, Yang, Ziqing, Shen, Yun, Backes, Michael, Zhang, Yang

Large language models (LLMs) have facilitated the generation of high-quality, cost-effective synthetic data for developing downstream models and conducting statistical analyses in various domains. However, the increased reliance on synthetic data may pose potential negative impacts. Numerous studies have demonstrated that LLM-generated synthetic data can perpetuate and even amplify societal biases and stereotypes, and produce erroneous outputs known as ``hallucinations'' that deviate from factual knowledge. In this paper, we aim to audit artifacts, such as classifiers, generators, or statistical plots, to identify those trained on or derived from synthetic data and raise user awareness, thereby reducing unexpected consequences and risks in downstream applications. To this end, we take the first step to introduce synthetic artifact auditing to assess whether a given artifact is derived from LLM-generated synthetic data. We then propose an auditing framework with three methods including metric-based auditing, tuning-based auditing, and classification-based auditing. These methods operate without requiring the artifact owner to disclose proprietary training details. We evaluate our auditing framework on three text classification tasks, two text summarization tasks, and two data visualization tasks across three training scenarios. Our evaluation demonstrates the effectiveness of all proposed auditing methods across all these tasks. For instance, black-box metric-based auditing can achieve an average accuracy of $0.868 \pm 0.071$ for auditing classifiers and $0.880 \pm 0.052$ for auditing generators using only 200 random queries across three scenarios. We hope our research will enhance model transparency and regulatory compliance, ensuring the ethical and responsible use of synthetic data.

classifier, large language model, machine learning, (21 more...)

2502.00808

Country:

North America > United States > California (0.14)
Asia > China (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Law (0.87)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Paper Copilot: The Artificial Intelligence and Machine Learning Community Should Adopt a More Transparent and Regulated Peer Review Process

Yang, Jing

The rapid growth of submissions to top-tier Artificial Intelligence (AI) and Machine Learning (ML) conferences has prompted many venues to transition from closed to open review platforms. Some have fully embraced open peer reviews, allowing public visibility throughout the process, while others adopt hybrid approaches, such as releasing reviews only after final decisions or keeping reviews private despite using open peer review systems. In this work, we analyze the strengths and limitations of these models, highlighting the growing community interest in transparent peer review. To support this discussion, we examine insights from Paper Copilot, a website launched two years ago to aggregate and analyze AI / ML conference data while engaging a global audience. The site has attracted over 200,000 early-career researchers, particularly those aged 18-34 from 177 countries, many of whom are actively engaged in the peer review process. Drawing on our findings, this position paper advocates for a more transparent, open, and well-regulated peer review aiming to foster greater community involvement and propel advancements in the field.

large language model, machine learning, natural language, (16 more...)

2502.00874

Country:

North America > United States (0.28)
Oceania > Australia (0.04)
Europe > United Kingdom (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Law > Intellectual Property & Technology Law (0.93)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

Loeffler, Christoffer, Freile, Andrea Martínez, Pizarro, Tomás Rey

Predicting potentially unfair clauses in Chilean terms of services with natural language processing

This study addresses the growing concern of information asymmetry in consumer contracts, exacerbated by the proliferation of online services with complex Terms of Service that are rarely even read. Even though research on automatic analysis methods is conducted, the problem is aggravated by the general focus on English-language Machine Learning approaches and on major jurisdictions, such as the European Union. We introduce a new methodology and a substantial dataset addressing this gap. We propose a novel annotation scheme with four categories and a total of 20 classes, and apply it on 50 online Terms of Service used in Chile. Our evaluation of transformer-based models highlights how factors like language- and/or domain-specific pre-training, few-shot sample size, and model architecture affect the detection and classification of potentially abusive clauses. Results show a large variability in performance for the different tasks and models, with the highest macro-F1 scores for the detection task ranging from 79% to 89% and micro-F1 scores up to 96%, while macro-F1 scores for the classification task range from 60% to 70% and micro-F1 scores from 64% to 80%. Notably, this is the first Spanish-language multi-label classification dataset for legal clauses, applying Chilean law and offering a comprehensive evaluation of Spanish-language models in the legal domain. Our work lays the ground for future research in method development for rarely considered legal analysis and potentially leads to practical applications to support consumers in Chile and Latin America as a whole.

information retrieval, large language model, machine learning, (20 more...)

2502.00865

Country:

North America > Central America (0.24)
South America > Brazil (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(8 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Government > Regional Government > Europe Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)

Mohammadi, Hadi, Bagheri, Ayoub, Giachanou, Anastasia, Oberski, Daniel L.

Explainability in Practice: A Survey of Explainable NLP Across Various Domains

Natural Language Processing (NLP) has become a cornerstone in many critical sectors, including healthcare, finance, and customer relationship management. This is especially true with the development and use of advanced models such as GPT-based architectures and BERT, which are widely used in decision-making processes. However, the black-box nature of these advanced NLP models has created an urgent need for transparency and explainability. This review explores explainable NLP (XNLP) with a focus on its practical deployment and real-world applications, examining its implementation and the challenges faced in domain-specific contexts. The paper underscores the importance of explainability in NLP and provides a comprehensive perspective on how XNLP can be designed to meet the unique demands of various sectors, from healthcare's need for clear insights to finance's emphasis on fraud detection and risk assessment. Additionally, this review aims to bridge the knowledge gap in XNLP literature by offering a domain-specific exploration and discussing underrepresented areas such as real-world applicability, metric evaluation, and the role of human interaction in model assessment. The paper concludes by suggesting future research directions that could enhance the understanding and broader application of XNLP.

explanation, large language model, machine learning, (23 more...)

2502.00837

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New York > New York County > New York City (0.04)
(19 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area (1.00)
Banking & Finance (1.00)
Law (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(8 more...)

Nikolic, Ivica, Baluta, Teodora, Saxena, Prateek

Model Provenance Testing for Large Language Models

Large language models are increasingly customized through fine-tuning and other adaptations, creating challenges in enforcing licensing terms and managing downstream impacts. Tracking model origins is crucial both for protecting intellectual property and for identifying derived models when biases or vulnerabilities are discovered in foundation models. We address this challenge by developing a framework for testing model provenance: Whether one model is derived from another. Our approach is based on the key observation that real-world model derivations preserve significant similarities in model outputs that can be detected through statistical analysis. Using only black-box access to models, we employ multiple hypothesis testing to compare model similarities against a baseline established by unrelated models. On two comprehensive real-world benchmarks spanning models from 30M to 4B parameters and comprising over 600 models, our tester achieves 90-95% precision and 80-90% recall in identifying derived models. These results demonstrate the viability of systematic provenance verification in production environments even when only API access is available.

large language model, machine learning, natural language, (21 more...)

2502.00706

Country:

Asia > Singapore (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs

Soroco, Mauricio, Song, Jialin, Xia, Mengzhou, Emond, Kye, Sun, Weiran, Chen, Wuyang

While recent AI-for-math has made strides in pure mathematics, areas of applied mathematics, particularly PDEs, remain underexplored despite their significant real-world applications. We present PDE-Controller, a framework that enables large language models (LLMs) to control systems governed by partial differential equations (PDEs). Our approach enables LLMs to transform informal natural language instructions into formal specifications, and then execute reasoning and planning steps to improve the utility of PDE control. We build a holistic solution comprising datasets (both human-written cases and 2 million synthetic samples), math-reasoning models, and novel evaluation metrics, all of which require significant effort. Our PDE-Controller significantly outperforms prompting the latest open-source and GPT models in reasoning, autoformalization, and program synthesis, achieving up to a 62% improvement in utility gain for PDE control. By bridging the gap between language generation and PDE systems, we demonstrate the potential of LLMs in addressing complex scientific and engineering challenges. We will release all data, model checkpoints, and code at https://pde-controller.github.io/.

large language model, machine learning, natural language, (14 more...)

2502.00963

Genre: Research Report (0.81)

Industry:

Law (0.92)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ou, Litu, Lapata, Mirella

Context-Aware Hierarchical Merging for Long Document Summarization

Hierarchical Merging is a technique commonly used to summarize very long texts ($>$100K tokens) by breaking down the input into smaller sections, summarizing those sections individually, and then merging or combining those summaries into a final coherent summary. Although it helps address the limitations of large language models (LLMs) with fixed input length constraints, the recursive merging process can amplify LLM hallucinations, increasing the risk of factual inaccuracies. In this paper, we seek to mitigate hallucinations by enriching hierarchical merging with context from the source document. Specifically, we propose different approaches to contextual augmentation ranging from \emph{replacing} intermediate summaries with relevant input context, to \emph{refining} them while using the context as supporting evidence, and \emph{aligning} them implicitly (via citations) to the input. Experimental results on datasets representing legal and narrative domains show that contextual augmentation consistently outperforms zero-shot and hierarchical merging baselines for the Llama 3.1 model family. Our analysis further reveals that refinement methods tend to perform best when paired with extractive summarization for identifying relevant input.

large language model, machine learning, natural language, (20 more...)

2502.00977

Country:

North America > United States > Virginia (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)
(17 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Fernandez-Bes, Jesus, Cid-Sueiro, Jesus, Marques, Antonio G.

An MDP Model for Censoring in Harvesting Sensors: Optimal and Approximated Solutions

In this paper, we propose a novel censoring policy for energy-efficient transmissions in energy-harvesting sensors. The problem is formulated as an infinite-horizon Markov Decision Process (MDP). The objective to be optimized is the expected sum of the importance (utility) of all transmitted messages. Assuming that such importance can be evaluated at the transmitting node, we show that, under certain conditions on the battery model, the optimal censoring policy is a threshold function on the importance value. Specifically, messages are transmitted only if their importance is above a threshold whose value depends on the battery level. Exploiting this property, we propose a model-based stochastic scheme that approximates the optimal solution, with less computational complexity and faster convergence speed than a conventional Q-learning algorithm. Numerical experiments in single-hop and multi-hop networks confirm the analytical advantages of the proposed scheme.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

doi: 10.1109/JSAC.2015.2391792

2502.0094

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Spain > Galicia > Madrid (0.05)
Europe > Spain > Canary Islands (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry:

Energy > Energy Storage (1.00)
Electrical Industrial Apparatus (1.00)
Law > Civil Rights & Constitutional Law (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Learning to Plan with Personalized Preferences

Xu, Manjie, Yang, Xinyi, Liang, Wei, Zhang, Chi, Zhu, Yixin

Effective integration of AI agents into daily life requires them to understand and adapt to individual human preferences, particularly in collaborative roles. Although recent studies on embodied intelligence have advanced significantly, they typically adopt generalized approaches that overlook personal preferences in planning. We address this limitation by developing agents that not only learn preferences from few demonstrations but also learn to adapt their planning strategies based on these preferences. Our research leverages the observation that preferences, though implicitly expressed through minimal demonstrations, can generalize across diverse planning scenarios. To systematically evaluate this hypothesis, we introduce Preference-based Planning (PbP) benchmark, an embodied benchmark featuring hundreds of diverse preferences spanning from atomic actions to complex sequences. Our evaluation of SOTA methods reveals that while symbol-based approaches show promise in scalability, significant challenges remain in learning to generate and execute plans that satisfy personalized preferences. We further demonstrate that incorporating learned preferences as intermediate representations in planning significantly improves the agent's ability to construct personalized plans. These findings establish preferences as a valuable abstraction layer for adaptive planning, opening new directions for research in preference-guided plan generation and execution.

large language model, machine learning, natural language, (17 more...)

2502.00858

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology (0.93)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)