AITopics | black-box


Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Supplementary Material for Learning outside the Black-Box: The pursuit of interpretable models

Neural Information Processing Systems – Feb-10-2026, 11:14:35 GMT

This lemma is a trivial consequence of the definition of Meijer G-functions. The only nontrivial step in the above reasoning is going from the second to the third line. To speed up the process, the experiments are done by using a restriction of GH excluding the inverse trigonometric functions as well as some Bessel functions. Also note that, as suggested by LIME, X8, X9 also have an important weight in this polynomial. We finish on a last remark on the benefits offered by our projection pursuit approach. We see that both the symbolic model and its local approximation take a very concise form when we consider the new variables zk, k = 1, ..., K.

artificial intelligence, black-box, the pursuit of interpretable model, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Industry: Transportation > Air (0.42)

Technology: Information Technology > Artificial Intelligence (0.49)


Investigating Advanced Reasoning of Large Language Models via Black-Box Interaction

Yin, Congchi, Wu, Tianyi, Shu, Yankai, Gu, Alex, Wang, Yunhan, Shao, Jun, Jiang, Xun, Li, Piji

arXiv.org Artificial Intelligence – Aug-27-2025

Existing tasks fall short in evaluating the reasoning ability of Large Language Models (LLMs) in an interactive, unknown environment. This deficiency leads to the isolated assessment of deductive, inductive, and abductive reasoning, neglecting the integrated reasoning process that is indispensable for humans' discovery of the real world. We introduce a novel evaluation paradigm, black-box interaction, to tackle this challenge. A black-box is defined by a hidden function that maps a specific set of inputs to outputs. LLMs are required to unravel the hidden function behind the black-box by interacting with it within a given number of exploration turns and reasoning over the observed input-output pairs. Leveraging this idea, we build the Oracle benchmark, which comprises 6 types of black-box tasks and 96 black-boxes. 19 modern LLMs are benchmarked. o3 ranks first in 5 of the 6 tasks, achieving over 70% accuracy on most easy black-boxes. But it still struggles with some hard black-box tasks, where its average performance drops below 40%. Further analysis indicates a universal difficulty among LLMs: they lack the high-level planning capability to develop efficient and adaptive exploration strategies for hypothesis refinement.
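The interaction loop described in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `BlackBox` class, the linear hidden function, and the two-query solver are hypothetical and are not taken from the Oracle benchmark, whose tasks and scoring are not described on this page.

```python
from typing import Callable, List, Tuple


class BlackBox:
    """Hides a function and allows only a fixed number of queries (hypothetical)."""

    def __init__(self, hidden_fn: Callable[[int], int], max_turns: int) -> None:
        self._hidden_fn = hidden_fn                 # never exposed to the solver
        self.turns_left = max_turns                 # exploration budget
        self.history: List[Tuple[int, int]] = []    # observed (input, output) pairs

    def query(self, x: int) -> int:
        if self.turns_left <= 0:
            raise RuntimeError("exploration budget exhausted")
        self.turns_left -= 1
        y = self._hidden_fn(x)
        self.history.append((x, y))
        return y


def fit_linear_hypothesis(box: BlackBox) -> Tuple[int, int]:
    """Toy solver: assume y = a*x + b and recover (a, b) from two queries."""
    b = box.query(0)
    a = box.query(1) - b
    return a, b


# Assumed hidden function for the demo; a real task would not reveal it.
box = BlackBox(hidden_fn=lambda x: 3 * x + 7, max_turns=3)
a, b = fit_linear_hypothesis(box)

# Spend the remaining turn checking the hypothesis on an unseen input.
hypothesis_holds = box.query(5) == a * 5 + b
```

The solver commits to a hypothesis class up front and picks inputs that separate its parameters, which is a very small instance of the planned, adaptive exploration the abstract says current LLMs struggle with.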

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2508.19035

Country: Asia (0.27)

Genre: Research Report (1.00)

Industry:

Transportation > Air (1.00)
Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)


Supplementary Material for Learning outside the Black-Box: The pursuit of interpretable models

Neural Information Processing Systems – Aug-16-2025, 13:20:27 GMT

International series in pure and applied mathematics.

meijer g-function, symbolic model, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > Canada (0.04)

Industry: Transportation > Air (0.44)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.48)
Information Technology > Artificial Intelligence > Machine Learning (0.47)


JULI: Jailbreak Large Language Models by Self-Introspection

Wang, Jesson, Hu, Zhanhao, Wagner, David

arXiv.org Artificial Intelligence – Aug-8-2025

Large Language Models (LLMs) are trained with safety alignment to prevent generating malicious content. Although some attacks have highlighted vulnerabilities in these safety-aligned LLMs, they typically have limitations, such as necessitating access to the model weights or the generation process. Since proprietary models accessed through API calls do not grant users such permissions, these attacks struggle to compromise them. In this paper, we propose Jailbreaking Using LLM Introspection (JULI), which jailbreaks LLMs by manipulating the token log probabilities, using a tiny plug-in block, BiasNet. JULI relies solely on the knowledge of the target LLM's predicted token log probabilities. It can effectively jailbreak API-calling LLMs in a black-box setting, knowing only the top-5 token log probabilities. Our approach demonstrates superior effectiveness, outperforming existing state-of-the-art (SOTA) approaches across multiple metrics.

juli, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2505.1179

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Insights on Adversarial Attacks for Tabular Machine Learning via a Systematic Literature Review

Dyrmishi, Salijona, Djilani, Mohamed, Simonetto, Thibault, Ghamizi, Salah, Cordy, Maxime

arXiv.org Artificial Intelligence – Jun-19-2025

Adversarial attacks in machine learning have been extensively reviewed in areas like computer vision and NLP, but research on tabular data remains scattered. This paper provides the first systematic literature review focused on adversarial attacks targeting tabular machine learning models. We highlight key trends, categorize attack strategies and analyze how they address practical considerations for real-world applicability. Additionally, we outline current challenges and open research questions. By offering a clear and structured overview, this review aims to guide future efforts in understanding and addressing adversarial vulnerabilities in tabular machine learning.

adversarial example, evolutionary algorithm, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2506.15506

Country:

Asia > Japan (0.04)
Europe > United Kingdom > England > Staffordshire > Keele (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.87)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(3 more...)


Learning outside the Black-Box: The pursuit of interpretable models

Neural Information Processing Systems – Oct-11-2024, 10:00:06 GMT

Machine learning has proved its ability to produce accurate models -- but the deployment of these models outside the machine learning community has been hindered by the difficulties of interpreting these models. This paper proposes an algorithm that produces a continuous global interpretation of any given continuous black-box function. Our algorithm employs a variation of projection pursuit in which the ridge functions are chosen to be Meijer G-functions, rather than the usual polynomial splines. Because Meijer G-functions are differentiable in their parameters, we can "tune" the parameters of the representation by gradient descent; as a consequence, our algorithm is efficient. Using five familiar data sets from the UCI repository and two familiar machine learning algorithms, we demonstrate that our algorithm produces global interpretations that are both faithful (highly accurate) and parsimonious (involve a small number of terms).
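For orientation, the generic projection pursuit form that this abstract builds on can be written as follows. This is a sketch under stated assumptions: the symbols $w_k$, $\theta_k$, $\eta$ and the squared-error loss are introduced here for illustration, and the paper's exact objective and parameterisation are not given on this page.

```latex
% Black-box f is approximated by a sum of K ridge functions g_k,
% each applied to a one-dimensional projection z_k of the input x.
\hat{f}(x) = \sum_{k=1}^{K} g_k(z_k;\, \theta_k), \qquad z_k = w_k^{\top} x

% Assumed squared-error loss over n sampled inputs x_1, \dots, x_n.
\mathcal{L}(w, \theta) = \frac{1}{n} \sum_{i=1}^{n} \bigl( f(x_i) - \hat{f}(x_i) \bigr)^2

% Because each g_k is differentiable in \theta_k, a plain gradient step applies.
\theta_k \leftarrow \theta_k - \eta \, \nabla_{\theta_k} \mathcal{L}, \qquad
w_k \leftarrow w_k - \eta \, \nabla_{w_k} \mathcal{L}
```

In the abstract's setting each $g_k$ would be a Meijer G-function, so that $\theta_k$ denotes its parameters; classical projection pursuit would instead fit each $g_k$ with a polynomial spline.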

algorithm, black-box, interpretable model, (3 more...)

Neural Information Processing Systems

Industry: Transportation > Air (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models

Zou, Wei, Geng, Runpeng, Wang, Binghui, Jia, Jinyuan

arXiv.org Artificial Intelligence – Feb-12-2024

Large language models (LLMs) have achieved remarkable success due to their exceptional generative capabilities. Despite their success, they also have inherent limitations such as a lack of up-to-date knowledge and hallucination. Retrieval-Augmented Generation (RAG) is a state-of-the-art technique to mitigate those limitations. In particular, given a question, RAG retrieves relevant knowledge from a knowledge database to augment the input of the LLM. For instance, the retrieved knowledge could be a set of top-k texts that are most semantically similar to the given question when the knowledge database contains millions of texts collected from Wikipedia. As a result, the LLM could utilize the retrieved knowledge as the context to generate an answer for the given question. Existing studies mainly focus on improving the accuracy or efficiency of RAG, leaving its security largely unexplored. We aim to bridge the gap in this work. Particularly, we propose PoisonedRAG, a set of knowledge poisoning attacks to RAG, where an attacker could inject a few poisoned texts into the knowledge database such that the LLM generates an attacker-chosen target answer for an attacker-chosen target question. We formulate knowledge poisoning attacks as an optimization problem, whose solution is a set of poisoned texts. Depending on the background knowledge of an attacker on the RAG (e.g., black-box and white-box settings), we propose two solutions to solve the optimization problem. Our results on multiple benchmark datasets and LLMs show our attacks could achieve 90% attack success rates when injecting 5 poisoned texts for each target question into a database with millions of texts. We also evaluate recent defenses and our results show they are insufficient to defend against our attacks, highlighting the need for new defenses.
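The top-k retrieval step described in this abstract, and why the content of the knowledge database matters, can be sketched with a toy example. This is not the PoisonedRAG method: the bag-of-words `embed` function, the three-sentence database, and the injected text are all hypothetical stand-ins, and a real RAG system would use a learned text encoder over far more documents.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption; real systems use a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(question: str, database: List[str], k: int) -> List[str]:
    """Return the k texts most similar to the question."""
    q = embed(question)
    ranked = sorted(database, key=lambda t: cosine(q, embed(t)), reverse=True)
    return ranked[:k]


database = [
    "the eiffel tower is in paris",
    "mount everest is the highest mountain on earth",
    "python is a programming language",
]
question = "what is the highest mountain on earth"
clean_top = retrieve_top_k(question, database, k=1)

# Hypothetical injected text that repeats the question's wording.
injected = (
    "what is the highest mountain on earth "
    "the highest mountain on earth is mount example"
)
poisoned_top = retrieve_top_k(question, database + [injected], k=1)
```

In this toy setup the injected text outranks the genuine entry because it overlaps more with the question, so it is what the LLM would receive as context. That is the general risk the abstract points at; how the authors actually craft their texts is not shown here.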

knowledge database, poisonedrag, target question, (14 more...)

arXiv.org Artificial Intelligence

2402.07867

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom > England (0.04)
North America > United States > Florida > Broward County (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)
Media (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models

Klein, Tassilo, Nabi, Moin

arXiv.org Artificial Intelligence – Jan-24-2024

The generation of undesirable and factually incorrect content of large language models poses a significant challenge and remains largely an unsolved issue. This paper studies the integration of a contrastive learning objective for fine-tuning LLMs for implicit knowledge editing and controlled text generation. Optimizing the training objective entails aligning text perplexities in a contrastive fashion. To facilitate training the model in a self-supervised fashion, we leverage an off-the-shelf LLM for training data generation. We showcase applicability in the domain of detoxification. Herein, the proposed approach leads to a significant decrease in the generation of toxic content while preserving general utility for downstream tasks such as commonsense reasoning and reading comprehension. The proposed approach is conceptually simple but empirically powerful.
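As background for the quantity this abstract optimizes, the standard definition of perplexity is the exponential of the negative mean log-probability per token. The sketch below computes it from a list of token log-probabilities; the function name and the example values are assumptions for illustration, and the paper's contrastive objective itself is not reproduced here.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Made-up log-probabilities for two 4-token sequences.
likely_text = [math.log(0.5)] * 4     # model assigns p = 0.5 to every token
unlikely_text = [math.log(0.1)] * 4   # model assigns p = 0.1 to every token

ppl_likely = perplexity(likely_text)
ppl_unlikely = perplexity(unlikely_text)
```

Lower perplexity means the model finds the text more probable. Aligning perplexities "in a contrastive fashion" then plausibly means pushing this quantity in opposite directions for desirable and undesirable texts, though the exact loss is not stated on this page.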

computational linguistic, llm, toxicity, (15 more...)

arXiv.org Artificial Intelligence

2401.08491

Country:

North America > United States (0.14)
North America > Mexico (0.05)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Genre: Research Report (0.84)

Industry:

Law (0.46)
Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


Symbolic Imitation Learning: From Black-Box to Explainable Driving Policies

Sharifi, Iman, Fallah, Saber

arXiv.org Artificial Intelligence – Sep-27-2023

Current methods of imitation learning (IL), primarily based on deep neural networks, offer efficient means for obtaining driving policies from real-world data but suffer from significant limitations in interpretability and generalizability. These shortcomings are particularly concerning in safety-critical applications like autonomous driving. In this paper, we address these limitations by introducing Symbolic Imitation Learning (SIL), a groundbreaking method that employs Inductive Logic Programming (ILP) to learn driving policies which are transparent, explainable and generalisable from available datasets. Utilizing the real-world highD dataset, we subject our method to a rigorous comparative analysis against prevailing neural-network-based IL methods. Our results demonstrate that SIL not only enhances the interpretability of driving policies but also significantly improves their applicability across varied driving situations. Hence, this work offers a novel pathway to more reliable and safer autonomous driving systems, underscoring the potential of integrating ILP into the domain of IL.

black-box, explainable driving policy, symbolic imitation learning

arXiv.org Artificial Intelligence

2309.16025

Genre: Research Report > New Finding (0.53)

Industry: Transportation > Air (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.73)


PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection

Guo, Hanqing, Wang, Guangjing, Wang, Yuanda, Chen, Bocheng, Yan, Qiben, Xiao, Li

arXiv.org Artificial Intelligence – Sep-13-2023

In this paper, we propose PhantomSound, a query-efficient black-box attack toward voice assistants. Existing black-box adversarial attacks on voice assistants either apply substitution models or leverage the intermediate model output to estimate the gradients for crafting adversarial audio samples. However, these attack approaches require a significant amount of queries with a lengthy training stage. PhantomSound leverages the decision-based attack to produce effective adversarial audios, and reduces the number of queries by optimizing the gradient estimation. In the experiments, we perform our attack against 4 different speech-to-text APIs under 3 real-world scenarios to demonstrate the real-time attack impact. The results show that PhantomSound is practical and robust in attacking 5 popular commercial voice controllable devices over the air, and is able to bypass 3 liveness detection mechanisms with >95% success rate. The benchmark result shows that PhantomSound can generate adversarial examples and launch the attack in a few minutes. We significantly enhance the query efficiency and reduce the cost of a successful untargeted and targeted adversarial attack by 93.1% and 65.5% compared with the state-of-the-art black-box attacks, using merely ~300 queries (~5 minutes) and ~1,500 queries (~25 minutes), respectively.

phantomsound, query-efficient audio adversarial attack, split-second phoneme injection, (1 more...)

arXiv.org Artificial Intelligence

2309.0696

Genre: Research Report (0.89)

Industry:

Transportation > Air (1.00)
Information Technology > Security & Privacy (0.80)
Government > Military (0.80)

Technology:

Information Technology > Security & Privacy (0.80)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.73)
