malt
Semantic categories of artifacts and animals reflect efficient coding
Zaslavsky, Noga, Regier, Terry, Tishby, Naftali, Kemp, Charles
It has been argued that semantic categories across languages reflect pressure for efficient communication. Recently, this idea has been cast in terms of a general information-theoretic principle of efficiency, the Information Bottleneck (IB) principle, and it has been shown that this principle accounts for the emergence and evolution of named color categories across languages, including soft structure and patterns of inconsistent naming. However, it is not yet clear to what extent this account generalizes to semantic domains other than color. Here we show that it generalizes to two qualitatively different semantic domains: names for containers, and for animals. First, we show that container naming in Dutch and French is near-optimal in the IB sense, and that IB broadly accounts for soft categories and inconsistent naming patterns in both languages. Second, we show that a hierarchy of animal categories derived from IB captures cross-linguistic tendencies in the growth of animal taxonomies. Taken together, these findings suggest that fundamental information-theoretic principles of efficient coding may shape semantic categories across languages and across domains.
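For concreteness, the IB tradeoff invoked in this abstract is standardly written as an objective over naming distributions (notation follows the color-naming work this line of research builds on; M is a meaning, W a word, U the listener's reconstruction, and β ≥ 1 the tradeoff parameter):

```latex
% Information Bottleneck objective for semantic systems:
% minimize lexicon complexity I(M;W) while retaining informativeness I(W;U).
\min_{q(w \mid m)} \; I(M;W) \;-\; \beta \, I(W;U)
```

A naming system is near-optimal in the IB sense when it lies close to the frontier traced out by this objective as β varies.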
- North America > United States > California > Alameda County > Berkeley (0.14)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- Oceania > Australia (0.04)
- (4 more...)
MALT: Mechanistic Ablation of Lossy Translation in LLMs for a Low-Resource Language: Urdu
LLMs are predominantly trained on English data, which leads to a significant drop in performance on low-resource languages. Understanding how LLMs handle these languages is crucial for improving their effectiveness. This study focuses on Urdu as a use case for exploring the challenges LLMs face in processing low-resource languages. LLMs primarily reason in English when prompted in another language, with the final layers acting as translators that convert the English response into the target language. This study finds that even for low-resource languages, the internal latent response of the LLM in English is quite coherent; however, the translation features are lossy and yield poor translations, leading to reduced performance. Mechanistically removing these translation features and using a separate translation model to translate the LLM's internal latent response significantly improves performance while also preserving the cultural nuances of the input in low-resource languages.
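A minimal sketch of the idea described in this abstract: read the model's English latent response at an intermediate layer (here via a logit-lens-style projection) and hand it to an external translator instead of the model's own final layers. The model choice, layer index, and decoding shortcut are illustrative assumptions, not the paper's exact ablation procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B"  # assumed base model, for illustration only
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "..."  # an Urdu prompt

with torch.no_grad():
    out = model(**tok(prompt, return_tensors="pt"), output_hidden_states=True)

# Logit-lens shortcut: project an intermediate hidden state through the final
# norm and unembedding, skipping the late layers that act as lossy translators.
mid = out.hidden_states[-8]  # assumption: layers after this point mostly translate
logits = model.lm_head(model.model.norm(mid))
english_draft = tok.decode(logits[0].argmax(-1))

# Instead of letting the lossy final layers translate, pass the English draft
# to a dedicated MT model (placeholder; any English-to-Urdu translator would do):
# urdu_answer = external_translator(english_draft, target_lang="ur")
```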
MALT: Improving Reasoning with Multi-Agent LLM Training
Motwani, Sumeet Ramesh, Smith, Chandler, Das, Rocktim Jyoti, Rybchuk, Markian, Torr, Philip H. S., Laptev, Ivan, Pizzati, Fabio, Clark, Ronald, de Witt, Christian Schroeder
Enabling effective collaboration among LLMs is a crucial step toward developing autonomous systems capable of solving complex problems. While LLMs are typically used as single-model generators, with humans critiquing and refining their outputs, the potential for jointly trained collaborative models remains largely unexplored. Despite promising results in multi-agent communication and debate settings, little progress has been made in training models to work together on tasks. In this paper, we present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems. Our approach employs a sequential multi-agent setup with heterogeneous LLMs assigned specialized roles: a generator, a verifier, and a refinement model that iteratively solve problems. We propose a trajectory-expansion-based synthetic data generation process and a credit assignment strategy driven by joint, outcome-based rewards. This enables our post-training setup to use both positive and negative trajectories to autonomously improve each model's specialized capabilities as part of a joint sequential system. We evaluate our approach on MATH, GSM8k, and CQA, where MALT on Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40%, respectively, over the same baseline model. This demonstrates an early advance in multi-agent cooperative capabilities on mathematical and commonsense reasoning questions. More generally, our work provides a concrete direction for research on multi-agent LLM training approaches.
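A minimal sketch of the sequential generator → verifier → refiner setup and the trajectory-expansion idea from this abstract. `sample` and `is_correct` are stand-ins for an LLM call and an answer checker; the branching factors and the mean-outcome credit rule are illustrative assumptions, not the paper's exact recipe.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    answer: str     # generator output
    critique: str   # verifier output
    final: str      # refiner output
    reward: float   # joint outcome-based reward on the final answer

def expand_trajectories(question, sample, is_correct, n_gen=4, n_ver=4, n_ref=4):
    """Branch at every role to build a tree of trajectories, then score the leaves."""
    trajectories = []
    for _ in range(n_gen):
        answer = sample(role="generator", prompt=question)
        for _ in range(n_ver):
            critique = sample(role="verifier", prompt=f"{question}\n{answer}")
            for _ in range(n_ref):
                final = sample(role="refiner", prompt=f"{question}\n{answer}\n{critique}")
                trajectories.append(
                    Trajectory(answer, critique, final, float(is_correct(final)))
                )
    return trajectories

def answer_value(trajectories, answer):
    """Credit assignment: an intermediate step inherits the mean outcome of its leaves."""
    leaves = [t.reward for t in trajectories if t.answer == answer]
    return sum(leaves) / len(leaves)
```

Positive- and negative-value trajectories can then be routed into each role's post-training data, which is the sense in which both kinds of trajectories improve the joint system.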
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > Middle East > Jordan (0.04)
MALT Powers Up Adversarial Attacks
Melamed, Odelia, Yehudai, Gilad, Shamir, Adi
Neural networks are widely known to be susceptible to adversarial perturbations (Szegedy et al. [2013]), which are typically imperceptible to humans. Many papers have shown how to construct such attacks, where adding a small perturbation to the input significantly changes the output of the model (Carlini and Wagner [2017], Papernot et al. [2017], Athalye et al. [2018]). To protect against these attacks, researchers have tried to develop more robust models using several techniques, such as adversarial training with different attacks (Madry et al. [2017], Papernot et al. [2016], Liu et al. [2023], Wang et al. [2023]). The current state-of-the-art adversarial attack, known as AutoAttack (Croce and Hein [2020b]), combines several parameter-free attacks, some targeted and some untargeted. AutoAttack currently leads the RobustBench benchmark (Croce et al. [2020]), the standard benchmark for adversarial robustness. Notably, the targeted attacks used in AutoAttack pick the adversarial target classes according to the model's confidence levels and attack only the top nine classes, even though CIFAR-100 and ImageNet have many more possible target classes. The reason for attacking only a limited number of classes, rather than all possible classes, is computational: each such attack has a significant running time. Why adversarial examples exist remains a topic of debate, specifically whether they are due to the highly non-linear landscape of neural networks or rather to their local linearity properties.
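A minimal sketch of the confidence-ranked target selection described above (AutoAttack's top-nine scheme), paired with a plain one-step targeted perturbation for illustration; the attack step itself is a generic FGSM-style move, not the attack proposed in this paper.

```python
import torch
import torch.nn.functional as F

def topk_targeted_attacks(model, x, true_label, k=9, eps=8 / 255):
    """x: a single input of shape (1, C, H, W); returns one attack per target class."""
    with torch.no_grad():
        logits = model(x)
    # Rank candidate target classes by the model's confidence, excluding the true class.
    candidates = logits[0].argsort(descending=True).tolist()
    targets = [c for c in candidates if c != true_label][:k]
    adv_examples = []
    for t in targets:
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), torch.tensor([t]))
        loss.backward()
        # Step *toward* the target class (descend its loss), then clamp to valid pixels.
        adv_examples.append((t, (x_adv - eps * x_adv.grad.sign()).clamp(0, 1).detach()))
    return adv_examples
```

Limiting the loop to k targets is exactly the computational trade-off the abstract points to: each targeted attack costs a full optimization, so attacking every class of CIFAR-100 or ImageNet is impractical.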
- Asia > Middle East > Israel (0.04)
- North America > United States > New York (0.04)
- Europe > Slovakia > Bratislava > Bratislava (0.04)
MALT: Multi-scale Action Learning Transformer for Online Action Detection
Yang, Zhipeng, Wang, Ruoyu, Tan, Yang, Xie, Liping
Online action detection (OAD) aims to identify ongoing actions from streaming video in real time, without access to future frames. Since these actions manifest at varying scales of granularity, from coarse to fine, projecting an entire set of action frames onto a single latent encoding can lose local information, so action features must be acquired at multiple scales. In this paper, we propose a multi-scale action learning transformer (MALT), which includes a novel recurrent decoder (used for feature fusion) that has fewer parameters and can be trained more efficiently. A hierarchical encoder with multiple encoding branches is further proposed to capture multi-scale action features. The output of each branch is incrementally fed to the subsequent branch as part of a cross-attention calculation, so output features transition from coarse to fine as the branches deepen. We also introduce an explicit frame scoring mechanism employing sparse attention, which filters irrelevant frames more efficiently without requiring an additional network. The proposed method achieved state-of-the-art performance on two benchmark datasets (THUMOS'14 and TVSeries), outperforming all existing models used for comparison, with gains of 0.2% mAP on THUMOS'14 and 0.1% mcAP on TVSeries.
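A minimal sketch of the coarse-to-fine branch chaining this abstract describes: each branch cross-attends to the preceding branch's output, so features are refined as branches deepen. Dimensions, pooling, and branch count are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CoarseToFineEncoder(nn.Module):
    def __init__(self, dim=256, scales=(8, 4, 1)):  # temporal pooling factors, coarse to fine
        super().__init__()
        self.scales = scales
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=4, batch_first=True) for _ in scales
        )

    def forward(self, frames):  # frames: (batch, time, dim)
        prev = None
        for scale, attn in zip(self.scales, self.attn):
            # Pool frames to this branch's temporal granularity.
            x = nn.functional.avg_pool1d(frames.transpose(1, 2), scale).transpose(1, 2)
            # Queries come from the current scale; keys/values from the preceding
            # branch's output, implementing the cross-attention hand-off.
            kv = x if prev is None else prev
            prev, _ = attn(x, kv, kv)
        return prev  # finest-scale features

feats = CoarseToFineEncoder()(torch.randn(2, 64, 256))  # -> (2, 64, 256)
```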
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Information Technology > Artificial Intelligence > Vision (0.71)
- Information Technology > Sensing and Signal Processing > Image Processing (0.68)
- Information Technology > Data Science > Data Mining (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Interpretable Causal Inference for Analyzing Wearable, Sensor, and Distributional Data
Katta, Srikar, Parikh, Harsh, Rudin, Cynthia, Volfovsky, Alexander
Many modern causal questions ask how treatments affect complex outcomes that are measured using wearable devices and sensors. Current analysis approaches require summarizing these data into scalar statistics (e.g., the mean), but these summaries can be misleading. For example, disparate distributions can have the same means, variances, and other statistics. Researchers can overcome the loss of information by instead representing the data as distributions. We develop an interpretable method for distributional data analysis that ensures trustworthy and robust decision-making: Analyzing Distributional Data via Matching After Learning to Stretch (ADD MALTS). We (i) provide analytical guarantees of the correctness of our estimation strategy, (ii) demonstrate via simulation that ADD MALTS outperforms other distributional data analysis methods at estimating treatment effects, and (iii) illustrate ADD MALTS' ability to verify whether there is enough cohesion between treatment and control units within subpopulations to trustworthily estimate treatment effects. We demonstrate ADD MALTS' utility by studying the effectiveness of continuous glucose monitors in mitigating diabetes risks.
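A small illustration of the abstract's opening point, that scalar summaries can mislead: two glucose traces with identical means but very different distributions, which a distributional metric tells apart. The 1-D Wasserstein distance here is an assumed stand-in for illustration, not the ADD MALTS estimator itself.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
stable = rng.normal(loc=120, scale=5, size=1000)     # steady glucose readings
swinging = rng.normal(loc=120, scale=40, size=1000)  # same mean, large swings

print(stable.mean(), swinging.mean())          # nearly identical means
print(wasserstein_distance(stable, swinging))  # clearly nonzero distributional distance
```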
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
MALTS: Matching After Learning to Stretch
Parikh, Harsh, Rudin, Cynthia, Volfovsky, Alexander
We introduce a flexible framework that produces high-quality almost-exact matches for causal inference. Most prior work in matching uses ad-hoc distance metrics, often leading to poor quality matches, particularly when there are irrelevant covariates. In this work, we learn an interpretable distance metric for matching, which leads to substantially higher quality matches. The learned distance metric stretches the covariate space according to each covariate's contribution to outcome prediction: this stretching means that mismatches on important covariates carry a larger penalty than mismatches on irrelevant covariates. Our ability to learn flexible distance metrics leads to matches that are interpretable and useful for the estimation of conditional average treatment effects.
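A minimal sketch of the stretching idea in this abstract: weight each covariate by its contribution to outcome prediction, then match on the weighted distance. Using gradient-boosting feature importances as the stretch weights is an illustrative stand-in for MALTS's learned metric.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def learn_stretch(X, y):
    """Stretch weights from outcome prediction: important covariates get large weights."""
    return GradientBoostingRegressor().fit(X, y).feature_importances_

def stretched_distance(x, x_prime, w):
    """Mismatches on important covariates cost more; irrelevant ones cost ~nothing."""
    return np.sqrt(np.sum(w * (x - x_prime) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + rng.normal(size=200)  # only the first covariate matters
w = learn_stretch(X, y)
print(w)  # weight concentrates on covariate 0, so matches ignore the rest
```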
- North America > United States > North Carolina > Durham County > Durham (0.04)
- South America > Chile (0.04)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- Health & Medicine (1.00)
- Education (0.67)
#hardtoparse: POS Tagging and Parsing the Twitterverse
Foster, Jennifer (Dublin City University) | Cetinoglu, Ozlem (Dublin City University) | Wagner, Joachim (Dublin City University) | Le Roux, Joseph (LIF - CNRS) | Hogan, Stephen (Dublin City University) | Nivre, Joakim (Uppsala University) | Hogan, Deirdre (Dublin City University) | van Genabith, Josef (Dublin City University)
We evaluate the statistical dependency parser, Malt, on a new dataset of sentences taken from tweets. We use a version of Malt which is trained on gold standard phrase structure Wall Street Journal (WSJ) trees converted to Stanford labelled dependencies. We observe a drastic drop in performance moving from our in-domain WSJ test set to the new Twitter dataset, much of which has to do with the propagation of part-of-speech tagging errors. Retraining Malt on dependency trees produced by a state-of-the-art phrase structure parser, which has itself been self-trained on Twitter material, results in a significant improvement. We analyse this improvement by examining in detail the effect of the retraining on individual dependency types.
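A minimal sketch of the dependency evaluation implied by this abstract: unlabeled and labeled attachment scores (UAS/LAS) over gold versus predicted (head, label) pairs, the standard way a performance drop like the one reported here is measured. Purely illustrative, not the paper's evaluation code.

```python
def attachment_scores(gold, predicted):
    """gold, predicted: one (head_index, dependency_label) pair per token."""
    assert len(gold) == len(predicted)
    uas = sum(g[0] == p[0] for g, p in zip(gold, predicted)) / len(gold)
    las = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    return uas, las

# A single tagging-induced attachment error lowers both scores:
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (1, "obj")]
print(attachment_scores(gold, pred))  # ≈ (0.667, 0.667)
```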
- Europe > Ireland (0.04)
- North America > United States > Pennsylvania (0.04)
- Europe > Sweden > Uppsala County > Uppsala (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Information Technology > Services (0.47)
- Media > News (0.34)