AITopics

We present the iNaturalist Sounds Dataset (iNatSounds), a collection of 230,000 audio files capturing sounds from over 5,500 species, contributed by more than 27,000 recordists worldwide. The dataset encompasses sounds from birds, mammals, insects, reptiles, and amphibians, with audio and species labels derived from observations submitted to iNaturalist, a global citizen science platform. Each recording in the dataset varies in length and includes a single species annotation. We benchmark multiple backbone architectures, comparing multiclass classification objectives with multilabel objectives. Despite weak labeling, we demonstrate that iNatSounds serves as a useful pretraining resource by benchmarking it on strongly labeled downstream evaluation datasets. The dataset is available as a single, freely accessible archive, promoting accessibility and research in this important domain. We envision models trained on this data powering next-generation public engagement applications, and assisting biologists, ecologists, and land use managers in processing large audio collections, thereby contributing to the understanding of species compositions in diverse soundscapes.

artificial intelligence, machine learning, natural language, (19 more...)

2506.00343

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (0.46)
Law (0.34)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Ward, Chris M., Harguess, Josh

Adversarial Threat Vectors and Risk Mitigation for Retrieval-Augmented Generation Systems

Retrieval-Augmented Generation (RAG) systems, which integrate Large Language Models (LLMs) with external knowledge sources, are vulnerable to a range of adversarial attack vectors. This paper examines the importance of RAG systems through recent industry adoption trends and identifies the prominent attack vectors for RAG: prompt injection, data poisoning, and adversarial query manipulation. We analyze these threats under risk management lens, and propose robust prioritized control list that includes risk-mitigating actions like input validation, adversarial training, and real-time monitoring.

large language model, machine learning, natural language, (18 more...)

doi: 10.1117/12.3055931

2506.00281

Country: North America > United States (1.00)

Genre: Research Report (0.84)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.94)
Government > Regional Government > North America Government > United States Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sharma, Aasish Kumar, Kyosev, Dimitar, Kunkel, Julian

Ethical AI: Towards Defining a Collective Evaluation Framework

Artificial Intelligence (AI) is transforming sectors such as healthcare, finance, and autonomous systems, offering powerful tools for innovation. Yet its rapid integration raises urgent ethical concerns related to data ownership, privacy, and systemic bias. Issues like opaque decision-making, misleading outputs, and unfair treatment in high-stakes domains underscore the need for transparent and accountable AI systems. This article addresses these challenges by proposing a modular ethical assessment framework built on ontological blocks of meaning-discrete, interpretable units that encode ethical principles such as fairness, accountability, and ownership. By integrating these blocks with FAIR (Findable, Accessible, Interoperable, Reusable) principles, the framework supports scalable, transparent, and legally aligned ethical evaluations, including compliance with the EU AI Act. Using a real-world use case in AI-powered investor profiling, the paper demonstrates how the framework enables dynamic, behavior-informed risk classification. The findings suggest that ontological blocks offer a promising path toward explainable and auditable AI ethics, though challenges remain in automation and probabilistic reasoning.

ai system, artificial intelligence, ontological block, (14 more...)

2506.00233

Country: Europe (0.46)

Genre: Research Report > New Finding (0.34)

Industry:

Law (1.00)
Information Technology (1.00)
Banking & Finance (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

Bottom-Up Perspectives on AI Governance: Insights from User Reviews of AI Products

Pasch, Stefan

With the growing importance of AI governance, numerous high - level frameworks and principles have been articulated by policymakers, institutions, and expert communities to guide the development and application of AI . While such frameworks offer valuable normative orientation, they may not fully capture the practical concerns of those who interact with AI systems in organizational and operational contexts. To address this gap, this study adopts a bottom - up approach to explore how governance - relevant themes are expressed in user discourse. Drawing on over 100,000 user reviews of AI products from G2.com, we apply BERTopic to extract latent themes and identify those most semantically related to AI governance. The analysis reveals a diverse set of governance - relevant topics spanning both technical and non - technical domains. These include concerns across organizational processes -- such as planning, coordination, and communication -- as well as stages of the AI value chain, includ ing deployment infrastructure, data handling, and analytics. The findings show considerable overlap with institutional AI governance and ethics frameworks on issues like privacy and transparency, but also surface overlooked areas such as project management, strategy development, and customer interaction. This highlights the need for more empirically grounded, user - centered approaches to AI governance -- approaches that complement normative models by capturing how governance unfolds in applied settings . By foregrounding how governance is enacted in practice, this study contributes to more inclusive and operationally grounded approaches to AI governance and digital policy.

governance, machine learning, natural language, (22 more...)

2506.0008

Country: Europe (0.93)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Communications > Social Media (0.95)
(4 more...)

Neural Information Processing SystemsJun-2-2025, 13:37:14 GMT

Review for NeurIPS paper: Predictive coding in balanced neural networks with noise, chaos and delays

This paper theoretically investigates and experimentally verifies properties of predictive coding network. Although limited to simple representations, the mathematical analysis has impressed the reviewers. The AC thus recommends acceptance of this work.

balanced neural network, chaos and delay, neurips paper, (2 more...)

Neural Information Processing Systems

Industry: Law > Litigation (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

BBC NewsJun-2-2025, 06:22:46 GMT

The AI copyright standoff continues - with no solution in sight

The argument is over how best to balance the demands of two huge industries: the tech and creative sectors. More specifically, it's about the fairest way to allow AI developers access to creative content in order to make better AI tools - without undermining the livelihoods of the people who make that content in the first place. What's sparked it is the uninspiringly-titled Data (Use and Access) Bill. This proposed legislation was broadly expected to finish its long journey through parliament this week and sail off into the law books. Instead, it is currently stuck in limbo, ping-ponging between the House of Lords and the House of Commons.

standoff continue

BBC News

Country: North America > United States (0.20)

Industry:

Law > Statutes (1.00)
Government > Regional Government (0.81)

Technology: Information Technology > Artificial Intelligence (0.80)

Rabanser, Stephan, Shamsabadi, Ali Shahin, Franzese, Olive, Wang, Xiao, Weller, Adrian, Papernot, Nicolas

Confidential Guardian: Cryptographically Prohibiting the Abuse of Model Abstention

arXiv.org Machine LearningJun-2-2025

Cautious predictions -- where a machine learning model abstains when uncertain -- are crucial for limiting harmful errors in safety-critical applications. In this work, we identify a novel threat: a dishonest institution can exploit these mechanisms to discriminate or unjustly deny services under the guise of uncertainty. We demonstrate the practicality of this threat by introducing an uncertainty-inducing attack called Mirage, which deliberately reduces confidence in targeted input regions, thereby covertly disadvantaging specific individuals. At the same time, Mirage maintains high predictive performance across all data points. To counter this threat, we propose Confidential Guardian, a framework that analyzes calibration metrics on a reference dataset to detect artificially suppressed confidence. Additionally, it employs zero-knowledge proofs of verified inference to ensure that reported confidence scores genuinely originate from the deployed model. This prevents the provider from fabricating arbitrary model confidence values while protecting the model's proprietary details. Our results confirm that Confidential Guardian effectively prevents the misuse of cautious predictions, providing verifiable assurances that abstention reflects genuine model uncertainty rather than malicious intent.

artificial intelligence, machine learning, uncertainty region, (15 more...)

arXiv.org Machine Learning

2505.23968

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(7 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(2 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

arXiv.org Machine LearningJun-2-2025

M-learner:A Flexible And Powerful Framework To Study Heterogeneous Treatment Effect In Mediation Model

Li, Xingyu, Liu, Qing, Jiang, Tony, Xia, Hong Amy, Hobbs, Brian P., Wei, Peng

We propose a novel method, termed the M-learner, for estimating heterogeneous indirect and total treatment effects and identifying relevant subgroups within a mediation framework. The procedure comprises four key steps. First, we compute individual-level conditional average indirect/total treatment effect Second, we construct a distance matrix based on pairwise differences. Third, we apply tSNE to project this matrix into a low-dimensional Euclidean space, followed by K-means clustering to identify subgroup structures. Finally, we calibrate and refine the clusters using a threshold-based procedure to determine the optimal configuration. To the best of our knowledge, this is the first approach specifically designed to capture treatment effect heterogeneity in the presence of mediation. Experimental results validate the robustness and effectiveness of the proposed framework. Application to the real-world Jobs II dataset highlights the broad adaptability and potential applicability of our method.Code is available at https: //anonymous.4open.science/r/M-learner-C4BB.

artificial intelligence, machine learning, treatment effect, (17 more...)

arXiv.org Machine Learning

2505.17917

Country:

North America > United States > Texas (0.04)
North America > Greenland (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.94)
Law > Alternative Dispute Resolution (0.85)
Education (0.68)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

arXiv.org Artificial IntelligenceJun-2-2025

Mind the Quote: Enabling Quotation-Aware Dialogue in LLMs via Plug-and-Play Modules

Zhang, Yueqi, Yuan, Peiwen, Feng, Shaoxiong, Li, Yiwei, Wang, Xinglin, Shi, Jiayi, Tan, Chuyi, Pan, Boyuan, Hu, Yao, Li, Kan

Human-AI conversation frequently relies on quoting earlier text-"check it with the formula I just highlighted"-yet today's large language models (LLMs) lack an explicit mechanism for locating and exploiting such spans. We formalise the challenge as span-conditioned generation, decomposing each turn into the dialogue history, a set of token-offset quotation spans, and an intent utterance. Building on this abstraction, we introduce a quotation-centric data pipeline that automatically synthesises task-specific dialogues, verifies answer correctness through multi-stage consistency checks, and yields both a heterogeneous training corpus and the first benchmark covering five representative scenarios. To meet the benchmark's zero-overhead and parameter-efficiency requirements, we propose QuAda, a lightweight training-based method that attaches two bottleneck projections to every attention head, dynamically amplifying or suppressing attention to quoted spans at inference time while leaving the prompt unchanged and updating < 2.8% of backbone weights. Experiments across models show that QuAda is suitable for all scenarios and generalises to unseen topics, offering an effective, plug-and-play solution for quotation-aware dialogue.

large language model, machine learning, natural language, (18 more...)

2505.24292

Country:

Africa > Middle East > Egypt (0.46)
Asia > Middle East > Israel (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Government > Regional Government (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Dam, Harvey, Knochelmann, Jonas, Joseph, Vinu, Gopalakrishnan, Ganesh

Derailing Non-Answers via Logit Suppression at Output Subspace Boundaries in RLHF-Aligned Language Models

arXiv.org Artificial IntelligenceJun-2-2025

We introduce a method to reduce refusal rates of large language models (LLMs) on sensitive content without modifying model weights or prompts. Motivated by the observation that refusals in certain models were often preceded by the specific token sequence of a token marking the beginning of the chain-of-thought (CoT) block () followed by a double newline token (\n\n), we investigate the impact of two simple formatting adjustments during generation: suppressing \n\n after and suppressing the end-of-sequence token after the end of the CoT block (). Our method requires no datasets, parameter changes, or training, relying solely on modifying token probabilities during generation. In our experiments with official DeepSeek-R1 distillations, these interventions increased the proportion of substantive answers to sensitive prompts without affecting performance on standard benchmarks. Our findings suggest that refusal behaviors can be circumvented by blocking refusal subspaces at specific points in the generation process.

intervention, large language model, machine learning, (15 more...)

2505.23848

Country: Asia > China (0.29)

Genre: Research Report > New Finding (1.00)

Industry: Law > Civil Rights & Constitutional Law (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)