Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings
Bean, Andrew M., Seedat, Nabeel, Chen, Shengzhuang, Schwarz, Jonathan Richard
The prohibitive cost of evaluating large language models (LLMs) on comprehensive benchmarks necessitates the creation of small yet representative data subsets (i.e., tiny benchmarks) that enable efficient assessment while retaining predictive fidelity. Current methods for this task operate under a model-centric paradigm, selecting benchmarking items based on the collective performance of existing models. Such approaches are limited by large upfront costs, an inability to immediately handle new benchmarks ('cold start'), and the fragile assumption that future models will share the failure patterns of their predecessors. In this work, we challenge this paradigm and propose an item-centric approach to benchmark subset selection, arguing that selection should be based on the intrinsic properties of the task items themselves, rather than on model-specific failure patterns. We instantiate this item-centric efficient benchmarking approach via a novel method, Scales++, where data selection is based on the cognitive demands of the benchmark samples. Empirically, we show Scales++ reduces the upfront selection cost by over 18x while achieving competitive predictive fidelity. On the Open LLM Leaderboard, using just a 0.5% data subset, we predict full benchmark scores with a 2.9% mean absolute error. We demonstrate that this item-centric approach enables more efficient model evaluation without significant fidelity degradation, while also providing better cold-start performance and more interpretable benchmarking.
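The item-centric selection idea can be sketched in a few lines: embed the benchmark items, pick a small subset whose embeddings cover the space, and estimate the full score from the subset alone. The paper's cognitive-scales embeddings are its own contribution, so generic vectors stand in for them below; all function names are hypothetical, and the clustering is ordinary k-means rather than the authors' method.

```python
import numpy as np

def select_representative_subset(item_embeddings, k, n_iter=20, seed=0):
    """Pick k items whose embeddings cover the benchmark: run a small
    k-means and return the item nearest to each centroid."""
    rng = np.random.default_rng(seed)
    X = np.asarray(item_embeddings, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each item to its nearest centroid, then recenter
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    # the nearest actual item to each centroid becomes the tiny benchmark
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return sorted(set(d.argmin(axis=0)))

def estimate_benchmark_score(correct, subset_idx):
    """Predict the full-benchmark accuracy as the mean over the subset."""
    return float(np.mean([correct[i] for i in subset_idx]))
```

In this framing, the only upfront cost is embedding the items once; no model runs are needed before a new benchmark can be subsetted, which is what gives the cold-start advantage the abstract claims.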
Improving Causal Interventions in Amnesic Probing with Mean Projection or LEACE
Dobrzeniecka, Alicja, Fokkens, Antske, Sommerauer, Pia
Amnesic probing is a technique used to examine the influence of specific linguistic information on the behaviour of a model. This involves identifying and removing the relevant information and then assessing whether the model's performance on the main task changes. If the removed information is relevant, the model's performance should decline. The difficulty with this approach lies in removing only the target information while leaving other information unchanged. It has been shown that Iterative Nullspace Projection (INLP), a widely used removal technique, introduces random modifications to representations when eliminating target information. We demonstrate that Mean Projection (MP) and LEACE, two proposed alternatives, remove information in a more targeted manner, thereby enhancing the potential for obtaining behavioural explanations through amnesic probing.
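Mean Projection is simple enough to sketch directly: for a binary property, remove from every representation the component along the direction connecting the two class means, so a linear probe can no longer read the property off. A minimal two-class version (this is a generic illustration of the idea, not the paper's code; LEACE itself has a more general closed form):

```python
import numpy as np

def mean_projection(X, labels):
    """Project every representation off the unit direction between the
    two class means, erasing linearly readable class information."""
    X = np.asarray(X, dtype=float)
    mu0 = X[labels == 0].mean(axis=0)
    mu1 = X[labels == 1].mean(axis=0)
    d = mu1 - mu0
    d = d / np.linalg.norm(d)          # unit direction carrying the property
    return X - np.outer(X @ d, d)      # subtract each row's component along d
```

After this projection the two class means coincide exactly, which is why MP perturbs representations less than INLP's iterated nullspace projections: only one direction is touched.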
XPath Agent: An Efficient XPath Programming Agent Based on LLM for Web Crawler
Li, Yu, Wang, Bryce, Luan, Xinyu
We present XPath Agent, a production-ready XPath programming agent specifically designed for web crawling and web GUI testing. A key feature of XPath Agent is its ability to automatically generate XPath queries from a set of sampled web pages using a single natural language query. To demonstrate its effectiveness, we benchmark XPath Agent against a state-of-the-art XPath programming agent across a range of web crawling tasks. Our results show that XPath Agent achieves comparable performance metrics while significantly reducing token usage and improving wall-clock efficiency. The well-designed two-stage pipeline allows for seamless integration into existing web crawling or web GUI testing workflows, thereby saving time and effort in manual XPath query development. The source code for XPath Agent is available at https://github.com/eavae/feilian.
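The validation half of such a pipeline is easy to sketch: run a candidate XPath (here hand-written, standing in for an LLM suggestion) against the sampled pages and keep it only if it extracts something on every page. Python's stdlib `ElementTree` supports a limited XPath subset, which is enough to illustrate; the details below are an assumption about the pipeline shape, not XPath Agent's actual implementation.

```python
import xml.etree.ElementTree as ET

def validate_xpath(candidate, sampled_pages):
    """Accept a candidate XPath only if it matches at least one element
    on every sampled page; return the extracted texts per page."""
    results = []
    for page in sampled_pages:
        root = ET.fromstring(page)
        hits = root.findall(candidate)
        if not hits:
            return None                 # candidate fails on this page
        results.append([e.text for e in hits])
    return results

pages = [
    "<html><body><div class='price'>9.99</div></body></html>",
    "<html><body><div class='price'>19.50</div></body></html>",
]
extracted = validate_xpath(".//div[@class='price']", pages)
```

A rejected candidate (returning `None`) would be fed back to the LLM with the failing page as context, which is the feedback loop that makes a two-stage generate-then-validate design token-efficient.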
A Target-Aware Analysis of Data Augmentation for Hate Speech Detection
Casula, Camilla, Tonelli, Sara
Hate speech is one of the main threats posed by the widespread use of social networks, despite efforts to limit it. Although attention has been devoted to this issue, the lack of datasets and case studies centered around scarcely represented phenomena, such as ableism or ageism, can lead to hate speech detection systems that do not perform well on underrepresented identity groups. Given the unprecedented capabilities of LLMs in producing high-quality data, we investigate the possibility of augmenting existing data with generative language models, reducing target imbalance. We experiment with augmenting 1,000 posts from the Measuring Hate Speech corpus, an English dataset annotated with target identity information, adding around 30,000 synthetic examples using both simple data augmentation methods and different types of generative models, comparing autoregressive and sequence-to-sequence approaches. We find traditional DA methods to often be preferable to generative models, but the combination of the two tends to lead to the best results. Indeed, for some hate categories such as origin, religion, and disability, hate speech classification using augmented data for training improves by more than 10% F1 over the no augmentation baseline. This work contributes to the development of systems for hate speech detection that are not only better performing but also fairer and more inclusive towards targets that have been neglected so far.
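The "simple data augmentation methods" in comparisons like this are typically EDA-style token edits that perturb the text while preserving the label. A minimal, label-preserving sketch (function names are illustrative, not from the paper):

```python
import random

def random_swap(tokens, rng):
    """Swap two random token positions."""
    t = tokens[:]
    i, j = rng.sample(range(len(t)), 2)
    t[i], t[j] = t[j], t[i]
    return t

def random_deletion(tokens, rng, p=0.1):
    """Drop each token with probability p, keeping at least one."""
    kept = [w for w in tokens if rng.random() > p]
    return kept or [tokens[0]]

def augment(post, label, n_copies=3, seed=0):
    """Produce n_copies perturbed variants of a post, label unchanged."""
    rng = random.Random(seed)
    tokens = post.split()
    out = []
    for _ in range(n_copies):
        op = rng.choice([random_swap, random_deletion])
        out.append((" ".join(op(tokens, rng)), label))
    return out
```

Applied only to posts from underrepresented target categories, edits like these rebalance the training distribution cheaply; generative models then add lexical diversity the token edits cannot, which is consistent with the finding that combining the two works best.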
Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell
Lu, Taiming, Gao, Muhan, Yu, Kuai, Byerly, Adam, Khashabi, Daniel
Large Language Models (LLMs) exhibit positional bias, struggling to utilize information from the middle or end of long contexts. Our study explores LLMs' long-context reasoning by probing their hidden representations. We find that while LLMs encode the position of target information, they often fail to leverage this in generating accurate responses. This reveals a disconnect between information retrieval and utilization, a "know but don't tell" phenomenon. We further analyze the relationship between extraction time and final accuracy, offering insights into the underlying mechanics of transformer models.
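Probing hidden representations for encoded position amounts to fitting a linear classifier on hidden states: if the probe recovers the target's position well above chance while the model's generated answers do not use it, the information is present but untold. A toy version on synthetic "hidden states" (a plain logistic-regression probe in numpy; everything here is illustrative, not the paper's setup):

```python
import numpy as np

def train_probe(H, y, lr=0.5, steps=300):
    """Fit a logistic-regression probe predicting a binary position label
    (e.g. target in first vs. second half of context) from hidden states."""
    w = np.zeros(H.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
        g = p - y                          # gradient of the log-loss
        w -= lr * H.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def probe_accuracy(H, y, w, b):
    pred = (H @ w + b) > 0
    return float((pred == y).mean())
```

In the real experiment, H would be layer activations at the answer position; the gap between probe accuracy and generation accuracy is the "know but don't tell" disconnect.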
SNAP: Unlearning Selective Knowledge in Large Language Models with Negative Instructions
Choi, Minseok, Rim, Daniel, Lee, Dohyun, Choo, Jaegul
Instruction-following large language models (LLMs), such as ChatGPT, have become increasingly popular with the general audience, many of whom are incorporating them into their daily routines. However, these LLMs inadvertently disclose personal or copyrighted information, which calls for a machine unlearning method to remove selective knowledge. Previous attempts sought to forget the link between the target information and its associated entities, but this instead led to undesirable responses about the target, compromising the end-user experience. In this work, we propose SNAP, an innovative framework designed to selectively unlearn information by 1) training an LLM with negative instructions to generate obliterated responses, 2) augmenting hard positives to retain the original LLM performance, and 3) applying a novel Wasserstein regularization to ensure adequate deviation from the initial weights of the LLM. We evaluate our framework on various NLP benchmarks and demonstrate that our approach retains the original LLM capabilities, while successfully unlearning the specified information.
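The regularization idea can be illustrated with the closed form of the 1-D Wasserstein-1 distance between two equal-size empirical distributions: sort both samples and average the gaps. A sketch of how such a term would enter an unlearning objective (this is an illustration of the distance, not SNAP's actual loss; the abstract only says the term controls deviation from the initial weights):

```python
import numpy as np

def wasserstein_1d(a, b):
    """Closed-form 1-D Wasserstein-1 distance between two equal-size
    empirical distributions: mean absolute gap between sorted samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    return float(np.mean(np.abs(a - b)))

def regularized_loss(task_loss, theta, theta_init, lam=0.1):
    """Unlearning objective sketch: the task (negative-instruction) loss
    plus a Wasserstein term comparing current weights to the originals."""
    return task_loss + lam * wasserstein_1d(theta, theta_init)
```

Because the distance compares weight distributions rather than individual coordinates, it constrains how far training moves the model as a whole without pinning every parameter, which suits the goal of deviating enough to forget while retaining general capability.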