Overview
Language Model Tokenizers Introduce Unfairness Between Languages
Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. However, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences of up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
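To make the disparity concrete, here is a minimal sketch of how one might measure it, assuming the tiktoken library and its cl100k_base encoding; the sample sentences are rough translations chosen for illustration, not the paper's evaluation data.

```python
# Minimal sketch: compare token counts for the same sentence across
# languages, assuming `tiktoken` and the "cl100k_base" encoding.
# The translations are approximate and purely illustrative.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "The quick brown fox jumps over the lazy dog.",
    "German": "Der schnelle braune Fuchs springt über den faulen Hund.",
    "Greek": "Η γρήγορη καφέ αλεπού πηδάει πάνω από τον τεμπέλη σκύλο.",
}

baseline = len(enc.encode(samples["English"]))
for lang, text in samples.items():
    n_tokens = len(enc.encode(text))
    # Report each language's token count as a multiple of the English length.
    print(f"{lang}: {n_tokens} tokens ({n_tokens / baseline:.1f}x English)")
```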
Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities
We investigate safe multi-agent reinforcement learning, where agents seek to collectively maximize an aggregate sum of local objectives while satisfying their own safety constraints. The objective and constraints are described by general utilities, i.e., nonlinear functions of the long-term state-action occupancy measure, which encompass broader decision-making goals such as risk, exploration, or imitation. The exponential growth of the state-action space with the number of agents presents challenges for global observability, further exacerbated by the global coupling arising from agents' safety constraints. To tackle this issue, we propose a primal-dual method utilizing shadow reward and κ-hop neighbor truncation under a form of correlation-decay property, where κ is the communication radius.
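A minimal sketch of the generic primal-dual update that this kind of method builds on, assuming a constrained formulation max_θ f(θ) subject to g(θ) ≥ 0; the functions and step sizes below are illustrative placeholders, not the paper's shadow-reward estimators or κ-hop truncated critics.

```python
# Generic primal-dual step for max_theta f(theta) s.t. g_i(theta) >= 0,
# via the Lagrangian L(theta, lam) = f(theta) + sum_i lam_i * g_i(theta).
# All functions and step sizes are illustrative placeholders.
import numpy as np

def primal_dual_step(theta, lam, grad_f, grad_g, g,
                     eta_theta=1e-2, eta_lam=1e-2):
    """One gradient-ascent step on theta, one projected descent step on lam.

    grad_f: callable, gradient of the objective, shape (d,)
    grad_g: callable, Jacobian of the constraints, shape (m, d)
    g:      callable, constraint values, shape (m,)
    """
    # Primal: ascend the Lagrangian in the policy parameters.
    theta = theta + eta_theta * (grad_f(theta) + grad_g(theta).T @ lam)
    # Dual: descend in the multipliers, projecting onto lam >= 0;
    # lam_i grows while agent i's safety constraint g_i is violated.
    lam = np.maximum(lam - eta_lam * g(theta), 0.0)
    return theta, lam
```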
Appendix A Related Work
A.1 Multimodal Large Language Models
A.2 Trustworthiness of LLMs
A.1 Multimodal Large Language Models
Building on the foundational capabilities of groundbreaking Large Language Models (LLMs) such as GPT [3], PaLM [6], Mistral [49], and LLaMA [108], which excel in language understanding and reasoning, recent innovations have integrated these models with other modalities (especially vision), leading to the development of Multimodal Large Language Models (MLLMs). These advanced MLLMs combine and process visual and textual data, demonstrating enhanced versatility in addressing both traditional vision tasks [21, 40, 42, 133] and complex multimodal challenges [34, 70, 136]. Among MLLMs, proprietary models consistently perform well. OpenAI's GPT-4-Vision [82] pioneered this space by adeptly handling both text and image content. Anthropic's Claude 3 series [7] integrates advanced vision capabilities and multilingual support, broadening its applicability across diverse cognitive and real-time tasks.
Automatic differentiation in ML: Where we are and where we should be going
Bart van Merriënboer, Olivier Breuleux, Arnaud Bergeron, Pascal Lamblin
We review the current state of automatic differentiation (AD) for array programming in machine learning (ML), including the different approaches such as operator overloading (OO) and source transformation (ST) used for AD, graph-based intermediate representations for programs, and source languages. Based on these insights, we introduce a new graph-based intermediate representation (IR) which specifically aims to efficiently support fully general AD for array programming. Unlike existing dataflow programming representations in ML frameworks, our IR naturally supports function calls, higher-order functions and recursion, making ML models easier to implement. The ability to represent closures allows us to perform AD using ST without a tape, making the resulting derivative (adjoint) program amenable to ahead-of-time optimization using tools from functional language compilers, and enabling higher-order derivatives. Lastly, we introduce a proof-of-concept compiler toolchain called Myia which uses a subset of Python as a front end.
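As a rough illustration of the tape-free idea, the sketch below implements reverse-mode AD in plain Python, with each primitive returning its result together with a closure that maps the output adjoint to input adjoints; the names and structure are illustrative and are not Myia's actual API.

```python
# Minimal sketch of closure-based reverse-mode AD: no runtime tape,
# the adjoint computation is captured in nested closures.
# All names here are illustrative, not Myia's actual API.

def add(x, y):
    # d(x + y)/dx = 1, d(x + y)/dy = 1
    return x + y, lambda adj: (adj, adj)

def mul(x, y):
    # d(x * y)/dx = y, d(x * y)/dy = x
    return x * y, lambda adj: (adj * y, adj * x)

def f(x, y):
    """f(x, y) = (x + y) * y, built from differentiable primitives."""
    s, s_back = add(x, y)
    p, p_back = mul(s, y)

    def backward(adj):
        # Chain adjoints back through the closures, innermost first.
        d_s, d_y2 = p_back(adj)
        d_x, d_y1 = s_back(d_s)
        return d_x, d_y1 + d_y2  # accumulate both paths into dy
    return p, backward

value, backward = f(3.0, 4.0)  # value = (3 + 4) * 4 = 28
dx, dy = backward(1.0)         # dx = y = 4, dy = x + 2*y = 11
```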
Encoding Time-Series Explanations through Self-Supervised Model Behavior Consistency
Interpreting time series models is uniquely challenging because it requires identifying both the locations of the time series signals that drive model predictions and matching them to an interpretable temporal pattern. While explainers from other modalities can be applied to time series, their inductive biases do not transfer well to the inherently challenging task of interpreting time series.