Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings
Bean, Andrew M., Seedat, Nabeel, Chen, Shengzhuang, Schwarz, Jonathan Richard
The prohibitive cost of evaluating large language models (LLMs) on comprehensive benchmarks necessitates the creation of small yet representative data subsets (i.e., tiny benchmarks) that enable efficient assessment while retaining predictive fidelity. Current methods for this task operate under a model-centric paradigm, selecting benchmarking items based on the collective performance of existing models. Such approaches are limited by large upfront costs, an inability to immediately handle new benchmarks ('cold start'), and the fragile assumption that future models will share the failure patterns of their predecessors. In this work, we challenge this paradigm and propose an item-centric approach to benchmark subset selection, arguing that selection should be based on the intrinsic properties of the task items themselves, rather than on model-specific failure patterns. We instantiate this item-centric efficient benchmarking approach via a novel method, Scales++, where data selection is based on the cognitive demands of the benchmark samples. Empirically, we show Scales++ reduces the upfront selection cost by over 18x while achieving competitive predictive fidelity. On the Open LLM Leaderboard, using just a 0.5% data subset, we predict full benchmark scores with a 2.9% mean absolute error. We demonstrate that this item-centric approach enables more efficient model evaluation without significant fidelity degradation, while also providing better cold-start performance and more interpretable benchmarking.
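The item-centric selection idea can be sketched in a few lines: embed the benchmark items, pick a small subset whose embeddings cover the space, and estimate the full score from the subset alone. The paper's cognitive-scales embeddings are its own contribution, so generic vectors stand in for them below; all function names are hypothetical, and the clustering is ordinary k-means rather than the authors' method.

```python
import numpy as np

def select_representative_subset(item_embeddings, k, n_iter=20, seed=0):
    """Pick k items whose embeddings cover the benchmark: run a small
    k-means and return the item nearest to each centroid."""
    rng = np.random.default_rng(seed)
    X = np.asarray(item_embeddings, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each item to its nearest centroid, then recenter
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    # the nearest actual item to each centroid becomes the tiny benchmark
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return sorted(set(d.argmin(axis=0)))

def estimate_benchmark_score(correct, subset_idx):
    """Predict the full-benchmark accuracy as the mean over the subset."""
    return float(np.mean([correct[i] for i in subset_idx]))
```

In this framing, the only upfront cost is embedding the items once; no model runs are needed before a new benchmark can be subsetted, which is what gives the cold-start advantage the abstract claims.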
Improving Causal Interventions in Amnesic Probing with Mean Projection or LEACE
Dobrzeniecka, Alicja, Fokkens, Antske, Sommerauer, Pia
Amnesic probing is a technique used to examine the influence of specific linguistic information on the behaviour of a model. This involves identifying and removing the relevant information and then assessing whether the model's performance on the main task changes. If the removed information is relevant, the model's performance should decline. The difficulty with this approach lies in removing only the target information while leaving other information unchanged. It has been shown that Iterative Nullspace Projection (INLP), a widely used removal technique, introduces random modifications to representations when eliminating target information. We demonstrate that Mean Projection (MP) and LEACE, two proposed alternatives, remove information in a more targeted manner, thereby enhancing the potential for obtaining behavioural explanations through amnesic probing.
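Mean Projection is simple enough to sketch directly: for a binary property, remove from every representation the component along the direction connecting the two class means, so a linear probe can no longer read the property off. A minimal two-class version (this is a generic illustration of the idea, not the paper's code; LEACE itself has a more general closed form):

```python
import numpy as np

def mean_projection(X, labels):
    """Project every representation off the unit direction between the
    two class means, erasing linearly readable class information."""
    X = np.asarray(X, dtype=float)
    mu0 = X[labels == 0].mean(axis=0)
    mu1 = X[labels == 1].mean(axis=0)
    d = mu1 - mu0
    d = d / np.linalg.norm(d)          # unit direction carrying the property
    return X - np.outer(X @ d, d)      # subtract each row's component along d
```

After this projection the two class means coincide exactly, which is why MP perturbs representations less than INLP's iterated nullspace projections: only one direction is touched.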
XPath Agent: An Efficient XPath Programming Agent Based on LLM for Web Crawler
Li, Yu, Wang, Bryce, Luan, Xinyu
We present XPath Agent, a production-ready XPath programming agent specifically designed for web crawling and web GUI testing. A key feature of XPath Agent is its ability to automatically generate XPath queries from a set of sampled web pages using a single natural language query. To demonstrate its effectiveness, we benchmark XPath Agent against a state-of-the-art XPath programming agent across a range of web crawling tasks. Our results show that XPath Agent achieves comparable performance metrics while significantly reducing token usage and improving wall-clock efficiency. The well-designed two-stage pipeline allows for seamless integration into existing web crawling or web GUI testing workflows, thereby saving time and effort in manual XPath query development. The source code for XPath Agent is available at https://github.com/eavae/feilian.
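The validation half of such a pipeline is easy to sketch: run a candidate XPath (here hand-written, standing in for an LLM suggestion) against the sampled pages and keep it only if it extracts something on every page. Python's stdlib `ElementTree` supports a limited XPath subset, which is enough to illustrate; the details below are an assumption about the pipeline shape, not XPath Agent's actual implementation.

```python
import xml.etree.ElementTree as ET

def validate_xpath(candidate, sampled_pages):
    """Accept a candidate XPath only if it matches at least one element
    on every sampled page; return the extracted texts per page."""
    results = []
    for page in sampled_pages:
        root = ET.fromstring(page)
        hits = root.findall(candidate)
        if not hits:
            return None                 # candidate fails on this page
        results.append([e.text for e in hits])
    return results

pages = [
    "<html><body><div class='price'>9.99</div></body></html>",
    "<html><body><div class='price'>19.50</div></body></html>",
]
extracted = validate_xpath(".//div[@class='price']", pages)
```

A rejected candidate (returning `None`) would be fed back to the LLM with the failing page as context, which is the feedback loop that makes a two-stage generate-then-validate design token-efficient.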
A Target-Aware Analysis of Data Augmentation for Hate Speech Detection
Casula, Camilla, Tonelli, Sara
Hate speech is one of the main threats posed by the widespread use of social networks, despite efforts to limit it. Although attention has been devoted to this issue, the lack of datasets and case studies centered around scarcely represented phenomena, such as ableism or ageism, can lead to hate speech detection systems that do not perform well on underrepresented identity groups. Given the unprecedented capabilities of LLMs in producing high-quality data, we investigate the possibility of augmenting existing data with generative language models, reducing target imbalance. We experiment with augmenting 1,000 posts from the Measuring Hate Speech corpus, an English dataset annotated with target identity information, adding around 30,000 synthetic examples using both simple data augmentation methods and different types of generative models, comparing autoregressive and sequence-to-sequence approaches. We find traditional DA methods to often be preferable to generative models, but the combination of the two tends to lead to the best results. Indeed, for some hate categories such as origin, religion, and disability, hate speech classification using augmented data for training improves by more than 10% F1 over the no augmentation baseline. This work contributes to the development of systems for hate speech detection that are not only better performing but also fairer and more inclusive towards targets that have been neglected so far.
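The "simple data augmentation methods" in comparisons like this are typically EDA-style token edits that perturb the text while preserving the label. A minimal, label-preserving sketch (function names are illustrative, not from the paper):

```python
import random

def random_swap(tokens, rng):
    """Swap two random token positions."""
    t = tokens[:]
    i, j = rng.sample(range(len(t)), 2)
    t[i], t[j] = t[j], t[i]
    return t

def random_deletion(tokens, rng, p=0.1):
    """Drop each token with probability p, keeping at least one."""
    kept = [w for w in tokens if rng.random() > p]
    return kept or [tokens[0]]

def augment(post, label, n_copies=3, seed=0):
    """Produce n_copies perturbed variants of a post, label unchanged."""
    rng = random.Random(seed)
    tokens = post.split()
    out = []
    for _ in range(n_copies):
        op = rng.choice([random_swap, random_deletion])
        out.append((" ".join(op(tokens, rng)), label))
    return out
```

Applied only to posts from underrepresented target categories, edits like these rebalance the training distribution cheaply; generative models then add lexical diversity the token edits cannot, which is consistent with the finding that combining the two works best.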
Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell
Lu, Taiming, Gao, Muhan, Yu, Kuai, Byerly, Adam, Khashabi, Daniel
Large Language Models (LLMs) exhibit positional bias, struggling to utilize information from the middle or end of long contexts. Our study explores LLMs' long-context reasoning by probing their hidden representations. We find that while LLMs encode the position of target information, they often fail to leverage this in generating accurate responses. This reveals a disconnect between information retrieval and utilization, a "know but don't tell" phenomenon. We further analyze the relationship between extraction time and final accuracy, offering insights into the underlying mechanics of transformer models.
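Probing hidden representations for encoded position amounts to fitting a linear classifier on hidden states: if the probe recovers the target's position well above chance while the model's generated answers do not use it, the information is present but untold. A toy version on synthetic "hidden states" (a plain logistic-regression probe in numpy; everything here is illustrative, not the paper's setup):

```python
import numpy as np

def train_probe(H, y, lr=0.5, steps=300):
    """Fit a logistic-regression probe predicting a binary position label
    (e.g. target in first vs. second half of context) from hidden states."""
    w = np.zeros(H.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
        g = p - y                          # gradient of the log-loss
        w -= lr * H.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def probe_accuracy(H, y, w, b):
    pred = (H @ w + b) > 0
    return float((pred == y).mean())
```

In the real experiment, H would be layer activations at the answer position; the gap between probe accuracy and generation accuracy is the "know but don't tell" disconnect.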
SNAP: Unlearning Selective Knowledge in Large Language Models with Negative Instructions
Choi, Minseok, Rim, Daniel, Lee, Dohyun, Choo, Jaegul
Instruction-following large language models (LLMs), such as ChatGPT, have become increasingly popular with the general audience, many of whom are incorporating them into their daily routines. However, these LLMs inadvertently disclose personal or copyrighted information, which calls for a machine unlearning method to remove selective knowledge. Previous attempts sought to forget the link between the target information and its associated entities, but this instead led to undesirable responses about the target, compromising the end-user experience. In this work, we propose SNAP, an innovative framework designed to selectively unlearn information by 1) training an LLM with negative instructions to generate obliterated responses, 2) augmenting hard positives to retain the original LLM performance, and 3) applying a novel Wasserstein regularization to ensure adequate deviation from the initial weights of the LLM. We evaluate our framework on various NLP benchmarks and demonstrate that our approach retains the original LLM capabilities, while successfully unlearning the specified information.
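The regularization idea can be illustrated with the closed form of the 1-D Wasserstein-1 distance between two equal-size empirical distributions: sort both samples and average the gaps. A sketch of how such a term would enter an unlearning objective (this is an illustration of the distance, not SNAP's actual loss; the abstract only says the term controls deviation from the initial weights):

```python
import numpy as np

def wasserstein_1d(a, b):
    """Closed-form 1-D Wasserstein-1 distance between two equal-size
    empirical distributions: mean absolute gap between sorted samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    return float(np.mean(np.abs(a - b)))

def regularized_loss(task_loss, theta, theta_init, lam=0.1):
    """Unlearning objective sketch: the task (negative-instruction) loss
    plus a Wasserstein term comparing current weights to the originals."""
    return task_loss + lam * wasserstein_1d(theta, theta_init)
```

Because the distance compares weight distributions rather than individual coordinates, it constrains how far training moves the model as a whole without pinning every parameter, which suits the goal of deviating enough to forget while retaining general capability.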