AITopics

Topic model and document-clustering evaluations either use automated metrics that align poorly with human preferences or require expert labels that are intractable to scale. We design a scalable human evaluation protocol and a corresponding automated approximation that reflect practitioners' real-world usage of models. Annotators -- or an LLM-based proxy -- review text items assigned to a topic or cluster, infer a category for the group, then apply that category to other documents. Using this protocol, we collect extensive crowdworker annotations of outputs from a diverse set of topic models on two datasets. We then use these annotations to validate automated proxies, finding that the best LLM proxies are statistically indistinguishable from a human annotator and can therefore serve as a reasonable substitute in automated evaluations. Package, web interface, and data are at https://github.com/ahoho/proxann

annotator, large language model, machine learning, (20 more...)

2507.00828

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Ohio (0.04)
North America > United States > Maryland (0.04)
(25 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Baseball (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Machine Learning, Jul-2-2025

What Makes Local Updates Effective: The Role of Data Heterogeneity and Smoothness

Patel, Kumar Kshitij

This thesis contributes to the theoretical understanding of local update algorithms, especially Local SGD, in distributed and federated optimization under realistic models of data heterogeneity. A central focus is on the bounded second-order heterogeneity assumption, which is shown to be both necessary and sufficient for local updates to outperform centralized or mini-batch methods in convex and non-convex settings. The thesis establishes tight upper and lower bounds in several regimes for various local update algorithms and characterizes the min-max complexity of multiple problem classes. At its core is a fine-grained consensus-error-based analysis framework that yields sharper finite-time convergence bounds under third-order smoothness and relaxed heterogeneity assumptions. The thesis also extends to online federated learning, providing fundamental regret bounds under both first-order and bandit feedback. Together, these results clarify when and why local updates offer provable advantages, and the thesis serves as a self-contained guide for analyzing Local SGD in heterogeneous environments.
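As a rough illustration of the local-update pattern the abstract refers to, the following toy sketch runs Local SGD on scalar quadratics. The objective f_m(x) = 0.5 * (x - b_m)^2, the function name `local_sgd`, and all parameter values are assumptions chosen for illustration and are not taken from the thesis. Every worker here has the same curvature, so the objectives differ only in their minimizers.

```python
import random


def local_sgd(targets, rounds=20, local_steps=5, lr=0.1, noise_std=0.0, seed=0):
    """Toy Local SGD on f_m(x) = 0.5 * (x - targets[m])**2, one objective per worker.

    Hypothetical setup for illustration only; not the thesis's analysis framework.
    """
    rng = random.Random(seed)
    x = 0.0  # shared model, broadcast to every worker at the start of a round
    for _ in range(rounds):
        local_models = []
        for b in targets:
            x_m = x
            # Each worker takes several gradient steps without communicating.
            for _ in range(local_steps):
                grad = (x_m - b) + rng.gauss(0.0, noise_std)  # optional gradient noise
                x_m -= lr * grad
            local_models.append(x_m)
        # Communication step: average the local models.
        x = sum(local_models) / len(local_models)
    return x


if __name__ == "__main__":
    # With noise_std=0 the iterate approaches the mean of the targets (4.0 here),
    # which minimizes the average of the three objectives.
    print(local_sgd([1.0, 3.0, 8.0]))
```

In this toy case the averaged model moves toward the minimizer of the average objective even though the workers' minimizers differ; objectives with differing curvature would not behave this simply.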

data mining, machine learning, neural information processing system, (19 more...)

arXiv.org Machine Learning

2507.00195

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
South America > Peru > Lima Department > Lima Province > Lima (0.04)
North America > United States > Virginia (0.04)
(3 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Law (0.67)
Government (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
(2 more...)

Bush, Sam, DeLorenzo, Matthew, Tieu, Phat, Rajendran, Jeyavijayan

Free and Fair Hardware: A Pathway to Copyright Infringement-Free Verilog Generation using LLMs

Limitations in Large Language Model (LLM) capabilities for hardware design tasks, such as generating functional Verilog code, have motivated various fine-tuning optimizations utilizing curated hardware datasets from open-source repositories. However, these datasets remain limited in size and contain minimal checks on licensing for reuse, resulting in potential copyright violations by fine-tuned LLMs. Therefore, we propose an evaluation benchmark to estimate the risk that Verilog-trained LLMs generate copyright-protected code. To minimize this risk, we present an open-source Verilog dataset, FreeSet, containing over 220k files, along with the automated dataset curation framework utilized to provide additional guarantees of fair-use Verilog data. We then execute an LLM fine-tuning framework consisting of continual pre-training, resulting in a fine-tuned Llama model for Verilog, FreeV. Our results indicate that FreeV demonstrates the smallest risk of copyright infringement among prior works, with only a 3% violation rate. Furthermore, experimental results demonstrate improvements in Verilog generation functionality over its baseline model, improving VerilogEval pass@10 rates by over 10%.
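For readers unfamiliar with the pass@10 figure, the sketch below shows the unbiased pass@k estimator commonly used in code-generation benchmarks: with n samples per problem of which c pass, pass@k = 1 - C(n-c, k) / C(n, k). Whether the paper computes its VerilogEval numbers in exactly this way is an assumption, and the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n generated samples, c of which pass the tests.

    Returns the probability that at least one of k samples drawn without
    replacement from the n generated samples is correct.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k includes a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Illustrative numbers only: 20 samples for one problem, 1 of them correct.
    print(pass_at_k(n=20, c=1, k=10))
```

A benchmark-level score would average this quantity over all problems.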

large language model, machine learning, natural language, (19 more...)

2505.06096

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.54)

Industry: Law > Intellectual Property & Technology Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Seeking and Updating with Live Visual Knowledge

Fu, Mingyang, Peng, Yuyang, Chen, Dongping, Zhou, Zetong, Liu, Benlin, Wan, Yao, Zhao, Zhou, Yu, Philip S., Krishna, Ranjay

The visual world around us constantly evolves, from real-time news and social media trends to global infrastructure changes visible through satellite imagery and augmented reality enhancements. However, Multimodal Large Language Models (MLLMs), which automate many tasks, struggle to stay current, limited by the cutoff dates in their fixed training datasets. To quantify this stagnation, we introduce LiveVQA, a first-of-its-kind dataset featuring 107,143 samples and 12 categories of data specifically designed to support research in both seeking and updating with live visual knowledge. Drawing from recent news articles, video platforms, and academic publications from April 2024 to May 2025, LiveVQA enables evaluation of how models handle the latest visual information beyond their knowledge boundaries and how current methods help to update them. Our comprehensive benchmarking of 17 state-of-the-art MLLMs reveals significant performance gaps on content beyond the knowledge cutoff, while tool-use or agentic visual seeking frameworks yield an average improvement of 327%. Furthermore, we explore parameter-efficient fine-tuning (PEFT) methods to update MLLMs with new visual knowledge, and we examine in depth the critical balance between adapter capacity and model capability when doing so. All experimental datasets and source code are publicly available at: https://livevqa.github.io.

large language model, machine learning, question answering, (23 more...)

2504.05288

Country: North America > United States > California > Los Angeles County (0.27)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Media > News (1.00)
Media > Film (1.00)
Leisure & Entertainment (1.00)
(5 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(6 more...)

Grandury, María, Aula-Blasco, Javier, Falcão, Júlia, Fourrier, Clémentine, González, Miguel, Martínez, Gonzalo, Santamaría, Gonzalo, Agerri, Rodrigo, Aldama, Nuria, Chiruzzo, Luis, Conde, Javier, Gómez, Helena, Guerrero, Marta, Ivetta, Guido, López, Natalia, Plaza-del-Arco, Flor Miriam, Martín-Valdivia, María Teresa, Montoro, Helena, Muñoz, Carmen, Reviriego, Pedro, Rosado, Leire, Vaca, Alejandro, Vallecillo-Rodríguez, María Estrella, Vallego, Jorge, Zubiaga, Irune

La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America

Leaderboards showcase the current capabilities and limitations of Large Language Models (LLMs). To motivate the development of LLMs that represent the linguistic and cultural diversity of the Spanish-speaking community, we present La Leaderboard, the first open-source leaderboard to evaluate generative LLMs in languages and language varieties of Spain and Latin America. La Leaderboard is a community-driven project that aims to establish an evaluation standard for everyone interested in developing LLMs for the Spanish-speaking community. This initial version combines 66 datasets in Basque, Catalan, Galician, and different Spanish varieties, showcasing the evaluation results of 50 models. To encourage community-driven development of leaderboards in other languages, we explain our methodology, including guidance on selecting the most suitable evaluation setup for each downstream task. In particular, we provide a rationale for using fewer few-shot examples than typically found in the literature, aiming to reduce environmental impact and facilitate access to reproducible results for a broader research community.

large language model, machine learning, natural language, (19 more...)

2507.00999

Country:

South America (1.00)
North America (1.00)
Europe > Spain (1.00)
Asia > Middle East > UAE (0.46)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Education (0.94)
Information Technology > Security & Privacy (0.94)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Enhancing LLM Agent Safety via Causal Influence Prompting

Hahm, Dongyoon, Jin, Woogyeol, Choi, June Suk, Ahn, Sungsoo, Lee, Kimin

As autonomous agents powered by large language models (LLMs) continue to demonstrate potential across various assistive tasks, ensuring their safe and reliable behavior is crucial for preventing unintended consequences. In this work, we introduce CIP, a novel technique that leverages causal influence diagrams (CIDs) to identify and mitigate risks arising from agent decision-making. CIDs provide a structured representation of cause-and-effect relationships, enabling agents to anticipate harmful outcomes and make safer decisions. Our approach consists of three key steps: (1) initializing a CID based on task specifications to outline the decision-making process, (2) guiding agent interactions with the environment using the CID, and (3) iteratively refining the CID based on observed behaviors and outcomes. Experimental results demonstrate that our method effectively enhances safety in both code execution and mobile device control tasks.

large language model, machine learning, natural language, (21 more...)

2507.00979

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
(2 more...)

Natural language processing for African languages

Adelani, David Ifeoluwa

Recent advances in word embeddings and language models use large-scale, unlabelled data and self-supervised learning to boost NLP performance. Multilingual models, often trained on web-sourced data like Wikipedia, face challenges: few low-resource languages are included, their data is often noisy, and the lack of labelled datasets makes it hard to evaluate performance outside high-resource languages like English. In this dissertation, we focus on languages spoken in Sub-Saharan Africa, where all the indigenous languages can be regarded as low-resourced in terms of the availability of labelled data for NLP tasks and unlabelled data found on the web. We analyse the noise in the publicly available corpora and curate a high-quality corpus, demonstrating that the quality of semantic representations learned in word embeddings depends not only on the amount of data but also on the quality of pre-training data. We demonstrate empirically the limitations of word embeddings and the opportunities that multilingual pre-trained language models (PLMs) offer, especially for languages unseen during pre-training and for low-resource scenarios. We further study how to adapt and specialize multilingual PLMs to unseen African languages using a small amount of monolingual text. To address the under-representation of African languages in NLP research, we developed large-scale human-annotated labelled datasets for 21 African languages in two impactful NLP tasks: named entity recognition and machine translation. We conduct an extensive empirical evaluation using state-of-the-art methods across supervised, weakly-supervised, and transfer learning settings.

artificial intelligence, large language model, machine learning, (23 more...)

2507.00297

Country:

Africa > Middle East (1.00)
Africa > Nigeria (0.93)
Asia > Middle East (0.92)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (1.00)
Information Technology (1.00)
Government > Regional Government (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

ROSE: Toward Reality-Oriented Safety Evaluation of Large Language Models

Ding, Jiale, Zheng, Xiang, Wang, Cong, Lee, Wei-Bin, Ma, Xingjun, Jiang, Yu-Gang

As Large Language Models (LLMs) are increasingly deployed as black-box components in real-world applications, evaluating their safety, especially under adversarial prompting, has become critical. Arguably, effective safety evaluations should be adaptive, evolving with LLM capabilities, and also cover a broad spectrum of harmful topics and real-world scenarios to fully expose potential vulnerabilities. Existing manual safety benchmarks, built on handcrafted adversarial prompts, are limited by their static nature and the intensive labor required to update them, making it difficult to keep pace with rapidly advancing LLMs. In contrast, automated adversarial prompt generation offers a promising path toward adaptive evaluation. However, current methods often suffer from insufficient adversarial topic coverage (topic-level diversity) and weak alignment with real-world contexts. These shortcomings stem from the exploration-exploitation dilemma in black-box optimization and a lack of real-world contextualization, resulting in adversarial prompts that are both topically narrow and scenario-repetitive. To address these issues, we propose Reality-Oriented Safety Evaluation (ROSE), a novel framework that uses multi-objective reinforcement learning to fine-tune an adversarial LLM for generating topically diverse and contextually rich adversarial prompts. Experiments show that ROSE outperforms existing methods in uncovering safety vulnerabilities in state-of-the-art LLMs, with notable improvements in integrated evaluation metrics. We hope ROSE represents a step toward more practical and reality-oriented safety evaluation of LLMs. WARNING: This paper contains examples of potentially harmful text.

large language model, machine learning, natural language, (16 more...)

2507.00026

Country: Asia (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Law Enforcement & Public Safety (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Ye, Linfeng, Hamidi, Shayan Mohajer, Yang, En-hui

Towards Undistillable Models by Minimizing Conditional Mutual Information

A deep neural network (DNN) is said to be undistillable if, when used as a black-box input-output teacher, it cannot be distilled through knowledge distillation (KD). In this case, the distilled student (referred to as the knockoff student) does not outperform a student trained independently with label smoothing (LS student) in terms of prediction accuracy. To protect the intellectual property of DNNs, it is desirable to build undistillable DNNs. To this end, it is first observed that an undistillable DNN may have the trait that each cluster of its output probability distributions in response to all sample instances with the same label should be highly concentrated, to the extent that each cluster corresponding to each label should ideally collapse into one probability distribution. Based on this observation and by measuring the concentration of each cluster in terms of conditional mutual information (CMI), a new training method called the CMI minimized (CMIM) method is proposed, which trains a DNN by jointly minimizing the conventional cross entropy (CE) loss and the CMI values of all temperature-scaled clusters across the entire temperature spectrum. The resulting CMIM model is shown, by extensive experiments, to be undistillable by all tested KD methods existing in the literature. That is, the knockoff students distilled by these KD methods from the CMIM model underperform the respective LS students. In addition, the CMIM model is also shown to perform better than the model trained with the CE loss alone in terms of its own prediction accuracy.
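The cluster-concentration idea can be made concrete with a small sketch. It assumes an empirical CMI computed as the average KL divergence between each sample's output distribution and the mean output distribution (centroid) of its label's cluster. This particular estimator and the function name `empirical_cmi` are assumptions for illustration, and the abstract does not specify the paper's exact formulation or its temperature-scaled variant.

```python
from collections import defaultdict
from math import log


def empirical_cmi(probs, labels):
    """Average KL(P_x || Q_y) over samples, where Q_y is the label-y centroid.

    probs  : list of probability vectors (model outputs), one per sample
    labels : list of ground-truth labels, same length as probs
    Assumed estimator for illustration; not necessarily the paper's definition.
    """
    clusters = defaultdict(list)
    for p, y in zip(probs, labels):
        clusters[y].append(p)

    total = 0.0
    for cluster in clusters.values():
        dim = len(cluster[0])
        # Centroid of the cluster: mean output distribution for this label.
        centroid = [sum(p[j] for p in cluster) / len(cluster) for j in range(dim)]
        for p in cluster:
            # KL divergence; terms with zero probability contribute nothing.
            total += sum(pj * log(pj / qj) for pj, qj in zip(p, centroid) if pj > 0)
    return total / len(probs)


if __name__ == "__main__":
    collapsed = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8]]
    spread = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.8]]
    print(empirical_cmi(collapsed, [0, 0, 1]))  # clusters collapsed to one distribution
    print(empirical_cmi(spread, [0, 0, 1]))     # label-0 cluster is widely spread
```

Under this assumed estimator, a cluster that has collapsed into a single distribution contributes zero, which matches the ideal described in the abstract.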

artificial intelligence, machine learning, student, (19 more...)

2507.00012

Country:

North America (0.28)
Europe (0.28)

Genre: Research Report (1.00)

Industry:

Education (0.68)
Law > Intellectual Property & Technology Law (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Al Jazeera, Jul-1-2025, 17:20:39 GMT

UN report lists companies complicit in Israel's 'genocide': Who are they?

The United Nations special rapporteur on the situation of human rights in the occupied Palestinian territory (oPt) has released a new report mapping the corporations aiding Israel in the displacement of Palestinians and its genocidal war on Gaza, in breach of international law. Francesca Albanese's latest report, which is scheduled to be presented at a news conference in Geneva on Thursday, names 48 corporate actors, including United States tech giants Microsoft, Alphabet Inc. – Google's parent company – and Amazon. A database of more than 1000 corporate entities was also put together as part of the investigation. "[Israel's] forever-occupation has become the ideal testing ground for arms manufacturers and Big Tech – providing significant supply and demand, little oversight, and zero accountability – while investors and private and public institutions profit freely," the report said. "Companies are no longer merely implicated in occupation – they may be embedded in an economy of genocide," it said, in a reference to Israel's ongoing assault on the Gaza Strip.

genocide, israel, occupation, (13 more...)

Al Jazeera

Country:

Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.27)
North America > United States (0.25)
South America > Colombia (0.05)
(8 more...)

Genre: Research Report (0.54)

Industry:

Information Technology (1.00)
Banking & Finance > Trading (0.99)
Law > International Law (0.92)
Government > Military (0.91)

Technology: Information Technology > Artificial Intelligence (0.97)