 image







Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment

Neural Information Processing Systems

Existing image-text modality alignment in Vision Language Models (VLMs) treats each text token equally in an autoregressive manner. Despite being simple and effective, this method results in sub-optimal cross-modal alignment by over-emphasizing text tokens that are less correlated with, or even contradict, the input images. In this paper, we advocate for distinct contributions for each text token based on its visual correlation. Specifically, we show that, by contrasting image inputs, the difference in prediction logits on each text token provides strong guidance on visual correlation. We therefore introduce Contrastive Alignment (CAL), a simple yet effective re-weighting strategy that prioritizes training on visually correlated tokens.
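
The re-weighting idea described in this abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the clipping range (`w_min`, `w_max`), and the normalization by the sum of weights are all assumptions made for illustration. The only part taken from the abstract is the core idea, namely that the gap between each target token's logit with and without the image serves as that token's training weight.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def contrastive_token_weights(logits_with_image, logits_without_image,
                              labels, w_min=0.0, w_max=5.0):
    # Logit of each target token with and without the image input.
    idx = np.arange(labels.shape[0])
    diff = logits_with_image[idx, labels] - logits_without_image[idx, labels]
    # Clipping range is an assumption, not taken from the abstract.
    return np.clip(diff, w_min, w_max)

def reweighted_nll(logits_with_image, labels, weights):
    # Per-token negative log-likelihood, averaged with the token weights.
    logp = log_softmax(logits_with_image)
    nll = -logp[np.arange(labels.shape[0]), labels]
    # Normalizing by the weight sum is an assumption for this sketch.
    return float((weights * nll).sum() / max(weights.sum(), 1e-8))

# Toy example: 3 target tokens, vocabulary of 4 (values are made up).
logits_with = np.array([[2.0, 0.0, 0.0, 0.0],
                        [0.0, 3.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 0.0]])
logits_without = np.array([[2.0, 0.0, 0.0, 0.0],
                           [0.0, 0.5, 0.0, 0.0],
                           [0.0, 0.0, 1.5, 0.0]])
labels = np.array([0, 1, 2])

weights = contrastive_token_weights(logits_with, logits_without, labels)
loss = reweighted_nll(logits_with, labels, weights)
print(weights, loss)
```

In this toy input only the second token's logit rises when the image is present, so it is the only token with a non-zero weight and the loss reduces to that token's negative log-likelihood.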


From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos

Neural Information Processing Systems

Three-dimensional (3D) understanding of objects and scenes plays a key role in humans' ability to interact with the world and has been an active area of research in computer vision, graphics, and robotics. Large-scale synthetic and object-centric 3D datasets have been shown to be effective in training models that have 3D understanding of objects. However, applying a similar approach to real-world objects and scenes is difficult due to a lack of large-scale data. Videos are a potential source of real-world 3D data, but finding diverse yet corresponding views of the same content has been shown to be difficult at scale. Furthermore, standard videos come with fixed viewpoints, determined at the time of capture.


American tennis star Danielle Collins accuses cameraman of 'wildly inappropriate' behavior

FOX News

American tennis player Danielle Collins had some choice words for the cameraman during her Internationaux de Strasbourg match against Emma Raducanu on Wednesday afternoon. Collins was in the middle of a changeover when she felt the cameraman's hovering was a bit too close for comfort in the middle of the third and defining set. She got off the bench and made the point clear. Danielle Collins celebrates during her match against Madison Keys in the third round of the women's singles at the 2025 Australian Open at Melbourne Park in Melbourne, Australia, on Jan. 18, 2025.


Images of AI – between fiction and function

AIHub

In this blog post, Dominik Vrabič Dežman provides a summary of his recent research article, 'Promising the future, encoding the past: AI hype and public media imagery'. Dominik also draws attention to the algorithms that perpetuate the dominance of familiar and sensationalist visuals, and calls for movements that reshape media systems to make better images of AI more visible in public discourse. The full paper is published in the AI and Ethics Journal's special edition on 'The Ethical Implications of AI Hype', a collection edited by We and AI. AI promises innovation, yet its imagery remains trapped in the past. Deep-blue, sci-fi-inflected visuals have flooded public media, saturating our collective imagination with glowing, retro-futuristic interfaces and humanoid robots.


Bronny James explains what fuels him throughout tumultuous rookie season: 'People think I'm a f---ing robot'

FOX News

Los Angeles Lakers shooting guard Bronny James has been the center of debate from the moment he was drafted in June. The 20-year-old said he tries to filter it all out, but he sees it all. "My first thought about everything is I always try to just let it go through one ear and out the other, put my head down and come to work and be positive every day. I see everything that people are saying, and people think, like, I'm a f---ing robot, like I don't have any feelings or emotions," James said via The Athletic.