Oceania


Revisiting Few-Shot Object Detection with Vision-Language Models

Neural Information Processing Systems

The era of vision-language models (VLMs) trained on web-scale datasets challenges conventional formulations of "open-world" perception. In this work, we revisit the task of few-shot object detection (FSOD) in the context of recent foundational VLMs. First, we point out that zero-shot predictions from VLMs such as GroundingDINO significantly outperform state-of-the-art few-shot detectors (48 vs. 33 AP) on COCO. Despite their strong zero-shot performance, such foundation models may still be sub-optimal. For example, trucks on the web may be defined differently from trucks for a target application such as autonomous vehicle perception.


American tennis star Danielle Collins defends outburst toward cameraman during tournament

FOX News

PongBot is an artificial intelligence-powered tennis robot. American tennis star Danielle Collins on Tuesday defended her outburst toward a cameraman during a tournament last week. Collins' incident occurred at the Internationaux de Strasbourg against Emma Raducanu. During a changeover, she told the cameraman to keep their distance as she refilled her water bottle. She said the cameraman was acting "wildly inappropriate."


Beyond accuracy: Tracking more like Human via Visual Search Xuchen Li1,2

Neural Information Processing Systems

Human visual search ability enables efficient and accurate tracking of an arbitrary moving target, which is a significant research interest in cognitive neuroscience. The recently proposed Central-Peripheral Dichotomy (CPD) theory sheds light on how humans effectively process visual information and track moving targets in complex environments. However, existing visual object tracking algorithms still fall short of matching human performance in maintaining tracking over time, particularly in complex scenarios requiring robust visual search skills. These scenarios often involve Spatio-Temporal Discontinuities (i.e., STDChallenge), prevalent in long-term tracking and global instance tracking. To address this issue, we conduct research from a human-like modeling perspective: (1) Inspired by the CPD, we propose a new tracker named CPDTrack to achieve human-like visual search ability. The central vision of CPDTrack leverages the spatio-temporal continuity of videos to introduce priors and enhance localization precision, while the peripheral vision improves global awareness and detects object movements.



The Expressive Power of Neural Networks: A View from the Width

Neural Information Processing Systems

The expressive power of neural networks is important for understanding deep learning. Most existing works consider this problem from the view of the depth of a network. In this paper, we study how width affects the expressiveness of neural networks. Classical results state that depth-bounded (e.g.


These robot cats have glowing eyes and artificial heartbeats โ€“ and could help reduce stress in children

The Guardian

At Springwood library in the Blue Mountains, a librarian appears with a cat carrier in each hand. About 30 children gather around in a semicircle. Inside each carrier, a pair of beaming, sci-fi-like eyes peer out at the expectant crowd. "That is the funniest thing ever," one child says. The preschoolers have just finished reading The Truck Cat by Deborah Frenkel and Danny Snell for the annual National Simultaneous Storytime.


Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs

Neural Information Processing Systems

The loss functions of deep neural networks are complex and their geometric properties are not well understood. We show that the optima of these complex loss functions are in fact connected by simple curves over which training and test accuracy are nearly constant. We introduce a training procedure to discover these high-accuracy pathways between modes. Inspired by this new geometric insight, we also propose a new ensembling method entitled Fast Geometric Ensembling (FGE). Using FGE we can train high-performing ensembles in the time required to train a single model. We achieve improved performance compared to the recent state-of-the-art Snapshot Ensembles, on CIFAR-10, CIFAR-100, and ImageNet.


Decentralize and Randomize: Faster Algorithm for Wasserstein Barycenters

Neural Information Processing Systems

We study the decentralized distributed computation of discrete approximations for the regularized Wasserstein barycenter of a finite set of continuous probability measures distributedly stored over a network. We assume there is a network of agents/machines/computers, and each agent holds a private continuous probability measure and seeks to compute the barycenter of all the measures in the network by getting samples from its local measure and exchanging information with its neighbors. Motivated by this problem, we develop, and analyze, a novel accelerated primal-dual stochastic gradient method for general stochastic convex optimization problems with linear equality constraints. Then, we apply this method to the decentralized distributed optimization setting to obtain a new algorithm for the distributed semi-discrete regularized Wasserstein barycenter problem. Moreover, we show explicit non-asymptotic complexity for the proposed algorithm. Finally, we show the effectiveness of our method on the distributed computation of the regularized Wasserstein barycenter of univariate Gaussian and von Mises distributions, as well as some applications to image aggregation.