 magnet


MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

Neural Information Processing Systems

Large multimodal models (LMMs) have shown remarkable progress in audiovisual understanding, yet they struggle with real-world scenarios that require complex reasoning across extensive video collections. Existing benchmarks for video question answering remain limited in scope, typically involving one clip per query, which falls short of representing the challenges of large-scale audiovisual retrieval and reasoning encountered in practical applications. To bridge this gap, we introduce a novel task named AVHaystacksQA, where the goal is to identify salient segments across different videos in response to a query and link them together to generate the most informative answer. To this end, we present AVHaystacks, an audio-visual benchmark comprising 3100 annotated QA pairs designed to assess the capabilities of LMMs in multi-video retrieval and temporal grounding tasks. Additionally, we propose a model-agnostic, multi-agent framework, MAGNET, to address this challenge, achieving up to 89% and 65% relative improvements over baseline methods on BLEU@4 and GPT evaluation scores in the QA task on our proposed AVHaystacks. To enable robust evaluation of multi-video retrieval and temporal grounding for optimal response generation, we introduce two new metrics: STEM, which captures alignment errors between a ground-truth and a predicted step sequence, and MTGS, which facilitates balanced and interpretable evaluation of segment-level grounding performance.



Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function

Neural Information Processing Systems

Text-to-image diffusion models, particularly Stable Diffusion, have revolutionized the field of computer vision. However, the synthesis quality often deteriorates when asked to generate images that faithfully represent complex prompts involving multiple attributes and objects. While previous studies suggest that blended text embeddings lead to improper attribute binding, few have explored this in depth. In this work, we critically examine the limitations of the CLIP text encoder in understanding attributes and investigate how this affects diffusion models. We discern a phenomenon of attribute bias in the text space and highlight a contextual issue in padding embeddings that entangle different concepts. We propose Magnet, a novel training-free approach to tackle the attribute binding problem. We introduce positive and negative binding vectors to enhance disentanglement, together with a neighbor strategy to increase accuracy. Extensive experiments show that Magnet significantly improves synthesis quality and binding accuracy with negligible computational cost, enabling the generation of unconventional and unnatural concepts.


MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization

Neural Information Processing Systems

In multilingual settings, non-Latin scripts and low-resource languages are usually disadvantaged in terms of language models' utility, efficiency, and cost. Specifically, previous studies have reported multiple modeling biases that the current tokenization algorithms introduce to non-Latin script languages, the main one being over-segmentation. In this work, we propose MAGNET (multilingual adaptive gradient-based tokenization) to reduce over-segmentation via adaptive gradient-based subword tokenization. MAGNET learns to predict segment boundaries between byte tokens in a sequence via sub-modules within the model, which act as internal boundary predictors (tokenizers). Previous gradient-based tokenization methods aimed for uniform compression across sequences by integrating a single boundary predictor during training and optimizing it end-to-end through stochastic reparameterization alongside the next token prediction objective. However, this approach still results in over-segmentation for non-Latin script languages in multilingual settings. In contrast, MAGNET offers a customizable architecture where byte-level sequences are routed through language-script-specific predictors, each optimized for its respective language script. This modularity enforces equitable segmentation granularity across different language scripts compared to previous methods. Through extensive experiments, we demonstrate that in addition to reducing segmentation disparities, MAGNET also enables faster language modeling and improves downstream utility.
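A minimal sketch of the two ideas in the abstract above, not the paper's implementation: why byte-level sequences are longer for non-Latin scripts (UTF-8 uses more bytes per character), and how per-position boundary scores turn a byte sequence into segments. The function names `bytes_per_char` and `segment`, the 0.5 threshold, and the hard-coded scores are all assumptions made for illustration; in MAGNET the boundaries come from learned, script-specific predictors.

```python
from typing import List


def bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes needed per Unicode code point."""
    return len(text.encode("utf-8")) / len(text)


def segment(data: bytes, boundary_probs: List[float],
            threshold: float = 0.5) -> List[bytes]:
    """Cut `data` after every byte whose boundary score exceeds `threshold`.

    `boundary_probs[i]` is a hypothetical score for "a segment ends at
    byte i"; here it is supplied by hand rather than by a trained model.
    """
    segments, start = [], 0
    for i, prob in enumerate(boundary_probs):
        if prob > threshold:
            segments.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        # Keep any trailing bytes that were not closed by a boundary.
        segments.append(data[start:])
    return segments


if __name__ == "__main__":
    # Latin, Cyrillic and Devanagari text need 1, 2 and 3 bytes per character.
    for word in ("hello", "привет", "नमस्ते"):
        print(word, bytes_per_char(word))

    print(segment(b"abcdef", [0.1, 0.9, 0.2, 0.8, 0.1, 0.3]))
```

With a single shared predictor, the same boundary rate yields segments that cover fewer characters in scripts with more bytes per character, which is the disparity the abstract attributes to earlier methods.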


Homemade chess board moves its own pieces. And wins.

Popular Science

Maker Joshua Stanley Robotics used magnets and an open source chess platform to build this unique board. It's been nearly 30 years since chess champion Garry Kasparov lost to IBM's Deep Blue, marking the first time a reigning world champion was defeated by a computer in a match. Chess engines have since improved so dramatically that even a simple smartphone app can now make top grandmasters sweat.





Dennis Whyte's fusion quest

MIT Technology Review

When the US Department of Energy announced that it would stop funding the tokamak at MIT's Plasma Science and Fusion Center, Dennis Whyte considered giving up on fusion research. But then he had a brainstorm, and challenged his students to bring the idea to life. This full-scale high-temperature superconducting magnet, designed and built by Commonwealth Fusion Systems and MIT's Plasma Science and Fusion Center (PSFC), has demonstrated a record-breaking 20 tesla magnetic field. It is the strongest fusion magnet in the world. Ever since nuclear fusion was discovered in the 1930s, scientists have wondered if we could somehow replicate and harness the phenomenon behind starlight: the smashing together of hydrogen atoms to form helium and a stupendous amount of clean energy. Fusing hydrogen would yield times more energy than simply burning it. Unlike nuclear fission, which powers the world's 440 atomic reactors, hydrogen fusion produces no harmful radiation, only neutrons that are captured and added back to the reaction.


MAgNet: Mesh Agnostic Neural PDE Solver

Neural Information Processing Systems

As an important example, climate predictions require fine spatio-temporal resolutions to resolve all turbulent scales in the fluid simulations. This makes the task of accurately resolving these scales computationally out of reach even with modern supercomputers. As a result, current numerical modelers solve PDEs on grids that are too coarse (3km to 200km on each side), which hinders the accuracy and usefulness of the predictions. In this paper, we leverage the recent advances in Implicit Neural Representations (INR) to design a novel architecture that predicts the spatially continuous solution of a PDE given a spatial position query. By augmenting coordinate-based architectures with Graph Neural Networks (GNN), we enable zero-shot generalization to new non-uniform meshes and long-term predictions up to 250 frames ahead that are physically consistent. Our Mesh Agnostic Neural PDE Solver (MAgNet) is able to make accurate predictions across a variety of PDE simulation datasets and compares favorably with existing baselines. Moreover, our model generalizes well to different meshes and resolutions up to four times those trained on.