
7 Ways to Get So Good at AI, People Will Think You Are AI

WIRED

From killing your chatbots to optimizing your prompts, here are the best ways to go full AI native and conquer the new world. Sam Liang is appalled as I confess my technique for recording an interview: running the Voice Memos app on an iPhone and transferring the transcript manually to a Google Doc. The CEO of Otter, a transcription service for analyzing meetings, looks at me as if I tried to call into our video chat using a rotary phone. He believes, naturally, that I should switch to Otter. Time-saving productivity tools like next-gen note-takers, task-based agents, and chatty inbox assistants are exploding in popularity as they invade every nook and cranny of our digital lives.


Unleashing Region Understanding in Intermediate Layers for MLLM-based Referring Expression Generation

Neural Information Processing Systems

The Multi-modal Large Language Model (MLLM) based Referring Expression Generation (REG) task has gained increasing popularity, which aims to generate an unambiguous text description that applies to exactly one object or region in the image by leveraging foundation models. We empirically found that there exists a potential trade-off between the detailedness and the correctness of the descriptions for the referring objects. On the one hand, generating sentences with more details is usually required in order to provide more precise object descriptions. On the other hand, complicated sentences could easily increase the probability of hallucinations. To address this issue, we propose a training-free framework, named "unleash-then-eliminate", which first elicits the latent information in the intermediate layers, and then adopts a cycle-consistency-based decoding method to alleviate the production of hallucinations. Furthermore, to reduce the computational load of cycle-consistency-based decoding, we devise a Probing-based Importance Estimation method to statistically estimate the importance weights of intermediate layers within a subset. These importance weights are then incorporated into the decoding process over the entire dataset, intervening in the next token prediction from intermediate layers. Extensive experiments conducted on the RefCOCOg and PHD benchmarks show that our proposed framework could outperform existing methods on both semantic and hallucination-related metrics.


Diffusion4D: Fast Spatial-temporal Consistent 4D generation via Video Diffusion Models

Neural Information Processing Systems

The availability of large-scale multimodal datasets and advancements in diffusion models have significantly accelerated progress in 4D content generation. Most prior approaches rely on multiple images or video diffusion models, utilizing score distillation sampling for optimization or generating pseudo novel views for direct supervision. However, these methods are hindered by slow optimization speeds and multi-view inconsistency issues. Spatial and temporal consistency in 4D geometry has been extensively explored respectively in 3D-aware diffusion models and traditional monocular video diffusion models. Building on this foundation, we propose a strategy to migrate the temporal consistency in video diffusion models to the spatial-temporal consistency required for 4D generation.


Communication Efficient Distributed Training with Distributed Lion

Neural Information Processing Systems

The Lion optimizer has been a promising competitor to AdamW for training large AI models, with advantages in memory, computation, and sample efficiency. In this paper, we introduce Distributed Lion, an innovative adaptation of Lion for distributed training environments. Leveraging the sign operator in Lion, Distributed Lion only needs to communicate binary or lower-precision vectors between the workers and the central server, significantly reducing the communication cost. Our theoretical analysis confirms Distributed Lion's convergence properties. Empirical results demonstrate its robustness across a range of tasks, worker counts, and batch sizes, on both vision and language problems. Notably, Distributed Lion attains comparable performance to standard Lion or AdamW optimizers applied on aggregated gradients, but with significantly reduced communication bandwidth. This feature is particularly advantageous for training large models. In addition, we also demonstrate that Distributed Lion presents a more favorable performance-bandwidth balance compared to existing efficient distributed methods such as deep gradient compression and ternary gradients.
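To make the communication pattern concrete, here is a minimal NumPy sketch of one sign-based step. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how the server combines the binary vectors, so the elementwise majority vote below is an assumed choice, and the function names (`lion_worker_step`, `majority_vote`, `distributed_lion_step`) and default hyperparameters are hypothetical.

```python
import numpy as np

def lion_worker_step(grad, momentum, beta1=0.9, beta2=0.99):
    """One local Lion step on a worker: returns a +1/-1 update and the new momentum."""
    # Lion takes the sign of an interpolation between momentum and gradient.
    direction = np.sign(beta1 * momentum + (1.0 - beta1) * grad)
    direction[direction == 0] = 1.0  # break ties so each entry fits in one bit
    new_momentum = beta2 * momentum + (1.0 - beta2) * grad
    return direction, new_momentum

def majority_vote(updates):
    """Assumed server-side aggregation: elementwise majority vote over binary updates."""
    vote = np.sign(np.sum(updates, axis=0))
    vote[vote == 0] = 1.0  # ties resolved to +1 (an arbitrary illustrative choice)
    return vote

def distributed_lion_step(params, grads, momenta, lr=1e-2, weight_decay=0.0):
    """Each worker sends only its binary update; the server votes and applies the result."""
    results = [lion_worker_step(g, m) for g, m in zip(grads, momenta)]
    updates = [u for u, _ in results]
    new_momenta = [m for _, m in results]
    aggregated = majority_vote(updates)
    # Decoupled weight decay, as in standard Lion.
    new_params = params - lr * (aggregated + weight_decay * params)
    return new_params, new_momenta
```

In this sketch each worker transmits one bit per parameter instead of a full-precision gradient, which is where the bandwidth saving described above would come from.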



On Exact Computation with an Infinitely Wide Neural Net

Neural Information Processing Systems

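Moreover, under random initialization, $H(0)$ converges to a deterministic kernel matrix $H^*$ as the width goes to infinity, which is the Neural Tangent Kernel $\mathrm{ker}(\cdot,\cdot)$ (Equation (2)) evaluated on the training data. If $H(t) = H^*$ for all $t$, then (3) becomes $\frac{du(t)}{dt} = -H^* \cdot (u(t) - y)$. Suppose $\sigma(z) = \max(0, z)$, $1/\kappa = \mathrm{poly}(1/\epsilon, \log(n/\delta))$ and $d_1 = d_2 = \cdots = d_L = m$ with $m \geq \mathrm{poly}(1/\kappa, L, 1/\lambda_0, n, \log(1/\delta))$.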
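When the kernel matrix is held fixed, dynamics of the form $\frac{du(t)}{dt} = -H \cdot (u(t) - y)$ are linear and have the closed-form solution $u(t) = y + e^{-tH}(u(0) - y)$. The sketch below evaluates that solution through an eigendecomposition. It assumes a symmetric positive semidefinite $H$; the small matrix used here is a hypothetical stand-in rather than an actual Neural Tangent Kernel, and `linear_dynamics` is an illustrative name.

```python
import numpy as np

def linear_dynamics(H, y, u0, t):
    """Closed-form solution of du/dt = -H (u - y) for a fixed symmetric PSD matrix H."""
    eigvals, eigvecs = np.linalg.eigh(H)
    decay = np.exp(-eigvals * t)
    # exp(-tH) (u0 - y) = V diag(exp(-t * lambda)) V^T (u0 - y)
    residual = eigvecs @ (decay * (eigvecs.T @ (u0 - y)))
    return y + residual

# Hypothetical 2x2 positive definite matrix standing in for the kernel matrix.
H = np.array([[2.0, 0.5], [0.5, 1.0]])
y = np.array([1.0, -1.0])   # training labels
u0 = np.zeros(2)            # network outputs at initialization
u_late = linear_dynamics(H, y, u0, 50.0)
```

Each eigendirection of the residual $u(t) - y$ decays at a rate set by the corresponding eigenvalue, so when the smallest eigenvalue is strictly positive the outputs converge to the labels.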




cffb6e2288a630c2a787a64ccc67097c-Paper.pdf

Neural Information Processing Systems

In this paper, we theoretically extend spectral-based graph convolution to digraphs and derive a simplified form using personalized PageRank. Specifically, we present the Digraph Inception Convolutional Networks (DiGCN), which utilizes digraph convolution and kth-order proximity to achieve larger receptive fields and learn multi-scale features in digraphs.