As described in Section 3.2, we implement categorical attention by associating each attention head [...]. In this example, an attention head (left) calculates the histogram for each position. An MLP (top right) reads the histogram values and outputs a value of 0 if the histogram value is greater than one, and 4 otherwise. Inspecting the corresponding classifier weights (bottom right), we see that an output value of 0 (meaning a histogram count greater than 1) increases the likelihood that the double-histogram value is 1 or 2, and decreases the likelihood of larger values. Because the input length is limited to 8, this reflects the fact that if one number appears many times, it is unlikely that another number appears the same number of times. An output of 4 (meaning a histogram count of 1) increases the likelihood that the double-histogram value is greater than 1.
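To make the caption concrete, here is a minimal plain-Python sketch (ours, not the paper's code) of the two quantities involved: hist maps each position to the count of its token, and double-hist maps each position to the number of distinct tokens that share that count.

```python
from collections import Counter

def hist(tokens):
    """For each position, how often that position's token appears in the sequence."""
    counts = Counter(tokens)
    return [counts[t] for t in tokens]

def double_hist(tokens):
    """For each position, how many distinct tokens share its histogram value."""
    h = hist(tokens)
    # Map each histogram value to the set of distinct tokens that have it.
    by_count = {}
    for t, c in zip(tokens, h):
        by_count.setdefault(c, set()).add(t)
    return [len(by_count[c]) for c in h]

tokens = list("aabbc")        # short input, as in the paper (length <= 8)
print(hist(tokens))           # [2, 2, 2, 2, 1]
print(double_hist(tokens))    # [2, 2, 2, 2, 1]  ('a' and 'b' both occur twice)
```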
Learning Transformer Programs
Recent research in mechanistic interpretability has attempted to reverse-engineer Transformer models by carefully inspecting network weights and activations. However, these approaches require considerable manual effort and still fall short of providing complete, faithful descriptions of the underlying algorithms. In this work, we introduce a procedure for training Transformers that are mechanistically interpretable by design. We build on RASP [Weiss et al., 2021], a programming language that can be compiled into Transformer weights. Instead of compiling human-written programs into Transformers, we design a modified Transformer that can be trained using gradient-based optimization and then automatically converted into a discrete, human-readable program. We refer to these models as Transformer Programs. To validate our approach, we learn Transformer Programs for a variety of problems, including an in-context learning task and a suite of algorithmic problems (e.g., sorting and recognizing Dyck languages).
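The abstract's central move, learning discrete structure with gradient-based optimization, is commonly implemented with a Gumbel-softmax relaxation. Below is a minimal, hypothetical sketch of that pattern, not the authors' code: a categorical choice (e.g., which predicate an attention head uses, here just 4 unnamed candidates with made-up supervision) is relaxed during training and read off by argmax afterward.

```python
import torch
import torch.nn.functional as F

num_choices = 4                  # hypothetical: 4 candidate predicates a head could use
logits = torch.zeros(num_choices, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)

target = 2                       # hypothetical supervision: choice 2 is the "right" predicate
for step in range(200):
    # hard=True yields a one-hot sample in the forward pass with a
    # straight-through gradient in the backward pass.
    sample = F.gumbel_softmax(logits, tau=1.0, hard=True)
    loss = 1.0 - sample[target]  # reward picking the target choice
    opt.zero_grad()
    loss.backward()
    opt.step()

# Discretization step: after training, the choice is read off as a program.
print("learned discrete choice:", logits.argmax().item())  # -> 2
```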
Entropy testing and its application to testing Bayesian networks
This paper studies the problem of entropy identity testing: given sample access to a distribution p and a fully described distribution q (both discrete distributions over a domain of size k), and the promise that either p = q or |H(p) - H(q)| ≥ ε, where H(·) denotes the Shannon entropy, a tester needs to distinguish between the two cases with high probability.
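For intuition, here is a minimal plug-in sketch of the quantities involved. This is illustrative only: the naive plug-in estimator is biased, and the paper's contribution concerns the sample complexity of a sound tester, not this simple rule.

```python
import math
from collections import Counter

def shannon_entropy(dist):
    """H(p) = -sum_i p_i * log(p_i), in nats; terms with p_i = 0 contribute 0."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def empirical_entropy(samples):
    """Plug-in entropy estimate from the empirical distribution of the samples."""
    counts = Counter(samples)
    n = len(samples)
    return shannon_entropy([c / n for c in counts.values()])

def entropy_identity_test(samples, q, eps):
    """Naive decision rule: declare p = q if the entropy gap looks small."""
    gap = abs(empirical_entropy(samples) - shannon_entropy(q))
    return "p = q" if gap < eps / 2 else "|H(p) - H(q)| >= eps"

q = [0.25, 0.25, 0.25, 0.25]
samples = [0, 1, 2, 3] * 250                        # 1000 draws matching q exactly
print(entropy_identity_test(samples, q, eps=0.2))   # -> "p = q"
```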
Jungo Kasai
Q: How many home runs has Shohei Ohtani hit?
Why was the dataset created? RealTime QA was created to provide a dynamic platform to benchmark question answering about the present time: it asks questions about the current world, and the answers (e.g., the number of Shohei Ohtani's home runs) change in real time, challenging QA systems to keep up. RealTime QA may identify areas of potential research, such as improving how QA systems deal with unanswerable questions.
What are the instances?
Right this way: Can VLMs Guide Us to See More to Answer Questions?
In question-answering scenarios, humans can assess whether the available information is sufficient and seek additional information if necessary, rather than providing a forced answer. In contrast, Vision Language Models (VLMs) typically generate direct, one-shot responses without evaluating the sufficiency of the information. To investigate this gap, we identify a critical and challenging task in the Visual Question Answering (VQA) scenario: can VLMs indicate how to adjust an image when the visual information is insufficient to answer a question? This capability is especially valuable for assisting visually impaired individuals who often need guidance to capture images correctly. To evaluate this capability of current VLMs, we introduce a human-labeled dataset as a benchmark for this task. Additionally, we present an automated framework that generates synthetic training data by simulating "where to know" scenarios. Our empirical results show significant performance improvements in mainstream VLMs when fine-tuned with this synthetic data. This study demonstrates the potential to narrow the gap between information assessment and acquisition in VLMs, bringing their performance closer to humans.
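As a rough illustration of what such an automated framework might look like, here is a hypothetical sketch; the function and field names are ours, not the paper's. The idea is to crop an image so that the evidence region needed to answer a question falls outside the frame, and to record the direction the camera should move as the supervision label.

```python
from PIL import Image

def make_guidance_example(image_path, answer_box, direction="left"):
    """Hypothetical generator: answer_box = (x0, y0, x1, y1) is the region
    containing the visual evidence needed to answer the question."""
    img = Image.open(image_path)
    w, h = img.size
    x0, y0, x1, y1 = answer_box
    if direction == "left":
        # Keep only the part of the image to the right of the evidence,
        # so the correct guidance becomes "move the camera left".
        crop = img.crop((min(x1, w - 1), 0, w, h))
    else:
        # Symmetric case: keep only the part left of the evidence.
        crop = img.crop((0, 0, max(x0, 1), h))
        direction = "right"
    return {"image": crop, "label": f"move {direction}"}
```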