ObjectMover: Generative Object Movement with Video Prior

Yu, Xin, Wang, Tianyu, Kim, Soo Ye, Guerrero, Paul, Chen, Xi, Liu, Qing, Lin, Zhe, Qi, Xiaojuan

arXiv.org Artificial Intelligence

Simple as it seems, moving an object to another location within an image is, in fact, a challenging image-editing task that requires re-harmonizing the lighting, adjusting the pose based on perspective, accurately filling occluded regions, and ensuring coherent synchronization of shadows and reflections while maintaining the object identity. In this paper, we present ObjectMover, a generative model that can perform object movement in highly challenging scenes. Our key insight is that we model this task as a sequence-to-sequence problem and fine-tune a video generation model to leverage its knowledge of consistent object generation across video frames. We show that with this approach, our model is able to adjust to complex real-world scenarios, handling extreme lighting harmonization and object effect movement. As large-scale data for object movement are unavailable, we construct a data generation pipeline using a modern game engine to synthesize high-quality data pairs. We further propose a multi-task learning strategy that enables training on real-world video data to improve the model generalization. Through extensive experiments, we demonstrate that ObjectMover achieves outstanding results and adapts well to real-world scenarios.


GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats

Deng, Kai, Yang, Jian, Wang, Shenlong, Xie, Jin

arXiv.org Artificial Intelligence

Tracking and mapping in large-scale, unbounded outdoor environments using only monocular RGB input presents substantial challenges for existing SLAM systems. Traditional Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) SLAM methods are typically limited to small, bounded indoor settings. To overcome these challenges, we introduce GigaSLAM, the first NeRF/3DGS-based SLAM framework for kilometer-scale outdoor environments, as demonstrated on the KITTI and KITTI-360 datasets. Our approach employs a hierarchical sparse voxel map representation, where Gaussians are decoded by neural networks at multiple levels of detail. This design enables efficient, scalable mapping and high-fidelity viewpoint rendering across expansive, unbounded scenes. For front-end tracking, GigaSLAM utilizes a metric depth model combined with epipolar geometry and PnP algorithms to accurately estimate poses, while incorporating a Bag-of-Words-based loop closure mechanism to maintain robust alignment over long trajectories. Consequently, GigaSLAM delivers high-precision tracking and visually faithful rendering on urban outdoor benchmarks, establishing a robust SLAM solution for large-scale, long-term scenarios, and significantly extending the applicability of Gaussian Splatting SLAM systems to unbounded outdoor environments.
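The abstract mentions a hierarchical sparse voxel map but does not give voxel sizes, level counts, or how the Gaussians are decoded. The following is a minimal sketch of the sparse, multi-level indexing idea only. The assumptions are that each level halves the voxel edge length and that raw points are stored where GigaSLAM would keep neural features. `voxel_keys` and `SparseVoxelMap` are hypothetical names, not GigaSLAM's code.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int, int]  # (level, ix, iy, iz)

def voxel_keys(p: Point, base_size: float, n_levels: int) -> List[VoxelKey]:
    """Voxel containing p at each level; level l has edge length base_size / 2**l (assumed)."""
    keys: List[VoxelKey] = []
    for level in range(n_levels):
        size = base_size / (2 ** level)
        ix, iy, iz = (math.floor(c / size) for c in p)
        keys.append((level, ix, iy, iz))
    return keys

class SparseVoxelMap:
    """Hash map that only allocates voxels that actually receive points.

    Hypothetical illustration: a real system would store decodable features
    per voxel rather than the raw points kept here.
    """

    def __init__(self, base_size: float = 1.0, n_levels: int = 3):
        self.base_size = base_size
        self.n_levels = n_levels
        self.voxels: Dict[VoxelKey, List[Point]] = defaultdict(list)

    def insert(self, p: Point) -> None:
        # Register the point at every level of detail.
        for key in voxel_keys(p, self.base_size, self.n_levels):
            self.voxels[key].append(p)

    def occupied(self, level: int) -> int:
        # Number of allocated voxels at one level; empty space costs nothing.
        return sum(1 for key in self.voxels if key[0] == level)
```

Two nearby points share a coarse voxel but can fall into different fine voxels. Only occupied cells are allocated, which is what keeps memory bounded in unbounded outdoor scenes.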


A linguistically-motivated evaluation methodology for unraveling models' abilities in reading comprehension tasks

Antoine, Elie, Béchet, Frédéric, Damnati, Géraldine, Langlais, Philippe

arXiv.org Artificial Intelligence

We introduce an evaluation methodology for reading comprehension tasks based on the intuition that certain examples, by virtue of their linguistic complexity, consistently yield lower scores regardless of model size or architecture. We capitalize on semantic frame annotation to characterize this complexity, and study seven complexity factors that may account for models' difficulty. We first deploy this methodology on a carefully annotated French reading comprehension benchmark, showing that two of those complexity factors are indeed good predictors of models' failure, while others are less so. We further deploy our methodology on a well-studied English benchmark, using ChatGPT as a proxy for semantic annotation. Our study reveals that fine-grained, linguistically-motivated automatic evaluation of a reading comprehension task is not only possible, but also helps to understand models' abilities to handle specific linguistic characteristics of input examples. It also shows that current state-of-the-art models fail on some of those characteristics, which suggests that adequately handling them requires more than merely increasing model size.
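The core check behind "complexity factors as predictors of failure" can be illustrated by comparing accuracy on examples with and without a given factor. The following is a minimal sketch assuming each example has already been annotated with a set of factor labels and a correct/incorrect outcome. The factor names and `accuracy_by_factor` are hypothetical, and the paper's own statistical analysis may be more elaborate.

```python
from typing import Dict, FrozenSet, List, Tuple

# (complexity factors annotated on the example, whether the model answered correctly)
Example = Tuple[FrozenSet[str], bool]

def accuracy_by_factor(examples: List[Example], factor: str) -> Dict[str, float]:
    """Accuracy on examples that carry `factor` versus those that do not."""
    groups: Dict[str, List[bool]] = {"with": [], "without": []}
    for factors, correct in examples:
        groups["with" if factor in factors else "without"].append(correct)
    # Skip an empty group rather than dividing by zero.
    return {name: sum(hits) / len(hits) for name, hits in groups.items() if hits}

examples = [
    (frozenset({"long_distance_argument"}), False),
    (frozenset({"long_distance_argument"}), True),
    (frozenset(), True),
    (frozenset({"nominal_predicate"}), True),
]
print(accuracy_by_factor(examples, "long_distance_argument"))
```

A factor whose "with" accuracy stays well below its "without" accuracy across several models, regardless of size, is the kind of signal the abstract describes.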


FRAME: Forward Recursive Adaptive Model Extraction -- A Technique for Advanced Feature Selection

Kapure, Nachiket, Joshi, Harsh, Kumari, Parul, Mistri, Rajeshwari, Mali, Manasi

arXiv.org Artificial Intelligence

Feature selection is a crucial preprocessing step in machine learning, impacting model performance, interpretability, and computational efficiency. This study introduces a novel hybrid approach, the Forward Recursive Adaptive Model Extraction Technique (FRAME), which combines Forward Selection and Recursive Feature Elimination (RFE) to enhance feature selection across diverse datasets. FRAME integrates the strengths of both methods, balancing exploration and exploitation of features to optimize selection. A comprehensive evaluation of FRAME was conducted against traditional methods such as SelectKBest and Lasso Regression, using high-dimensional, noisy, and heterogeneous datasets. The results demonstrate that FRAME consistently delivers superior predictive performance based on downstream machine learning evaluation metrics. It effectively reduces dimensionality while maintaining robust model performance, making it particularly valuable for applications requiring interpretable and accurate predictions, such as biomedical diagnostics. This study highlights the importance of assessing feature selection methods across varied datasets to ensure their robustness and generalizability. The findings suggest that FRAME has significant potential for further enhancement, particularly through integration with deep learning architectures for adaptive and real-time feature selection in dynamic environments. By advancing feature selection methodologies, FRAME offers a practical and effective solution to improve machine learning applications across multiple domains.
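The abstract says FRAME combines Forward Selection with Recursive Feature Elimination (RFE) but does not specify how the two stages interact. The following is a minimal sketch under one plausible assumption: a greedy forward pass shortlists candidates, then RFE prunes the shortlist. The scoring and importance callbacks stand in for a fitted model (e.g. cross-validated accuracy and coefficient magnitudes). `forward_then_rfe` is a hypothetical name, not the authors' implementation.

```python
from typing import Callable, Dict, List, Sequence

ScoreFn = Callable[[List[str]], float]                   # higher is better, e.g. CV accuracy
ImportanceFn = Callable[[List[str]], Dict[str, float]]   # per-feature weight after refitting

def forward_select(features: Sequence[str], score: ScoreFn, k: int) -> List[str]:
    """Greedily add the feature that yields the best score until k are chosen."""
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

def recursive_eliminate(features: Sequence[str], importance: ImportanceFn, k: int) -> List[str]:
    """RFE: refit on the current subset, then drop the least important feature."""
    kept = list(features)
    while len(kept) > k:
        weights = importance(kept)
        kept.remove(min(kept, key=lambda f: weights[f]))
    return kept

def forward_then_rfe(features: Sequence[str], score: ScoreFn, importance: ImportanceFn,
                     k_forward: int, k_final: int) -> List[str]:
    """Hypothetical two-stage hybrid: forward shortlist, then recursive pruning."""
    shortlist = forward_select(features, score, k_forward)
    return recursive_eliminate(shortlist, importance, k_final)

# Toy additive model: each feature contributes a fixed, known weight.
w = {"a": 3.0, "b": 2.0, "c": 0.5, "d": 0.1, "e": 1.0}
score = lambda subset: sum(w[f] for f in subset)
importance = lambda subset: {f: w[f] for f in subset}
print(forward_then_rfe(list(w), score, importance, k_forward=4, k_final=2))  # ['a', 'b']
```

The forward stage explores cheaply from an empty set, while RFE exploits the refitted model's view of the shortlisted features. This is one way to read the "exploration and exploitation" balance the abstract mentions.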


Exploring through Random Curiosity with General Value Functions

Ramesh, Aditya, Kirsch, Louis, van Steenkiste, Sjoerd, Schmidhuber, Jürgen

arXiv.org Artificial Intelligence

Efficient exploration in reinforcement learning is a challenging problem commonly addressed through intrinsic rewards. Recent prominent approaches are based on state novelty or variants of artificial curiosity. However, directly applying them to partially observable environments can be ineffective and lead to premature dissipation of intrinsic rewards. Here we propose random curiosity with general value functions (RC-GVF), a novel intrinsic reward function that draws upon connections between these distinct approaches. Instead of using only the current observation's novelty or a curiosity bonus for failing to predict precise environment dynamics, RC-GVF derives intrinsic rewards through predicting temporally extended general value functions. We demonstrate that this improves exploration in a hard-exploration diabolical lock problem. Furthermore, RC-GVF significantly outperforms previous methods in the absence of ground-truth episodic counts in the partially observable MiniGrid environments. Panoramic observations on MiniGrid further boost RC-GVF's performance, making it competitive with baselines that exploit privileged information in the form of episodic counts.
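A general value function (GVF) predicts the discounted sum of some signal (a "cumulant") rather than reward. Predicting it therefore requires anticipating the future, not just the next observation. The following is a minimal sketch of that idea. It assumes cumulants come from a fixed, randomly initialised linear map of the observation and that the bonus is plain squared prediction error. The learned predictor is omitted and its outputs are passed in. All names are hypothetical, and the paper's exact architecture and reward may differ.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]

def make_random_cumulants(obs_dim: int, n_cumulants: int,
                          seed: int = 0) -> Callable[[Sequence[float]], Vector]:
    """Fixed random linear map from an observation to a vector of cumulants (assumed form)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)] for _ in range(n_cumulants)]

    def cumulants(obs: Sequence[float]) -> Vector:
        return [sum(wi * oi for wi, oi in zip(row, obs)) for row in weights]

    return cumulants

def gvf_targets(cumulant_seq: List[Vector], gamma: float) -> List[Vector]:
    """Discounted sum of future cumulants at each step, truncated at the episode end."""
    running = [0.0] * len(cumulant_seq[0])
    targets: List[Vector] = []
    for c in reversed(cumulant_seq):
        running = [ci + gamma * ri for ci, ri in zip(c, running)]
        targets.append(running)
    targets.reverse()
    return targets

def intrinsic_rewards(predictions: List[Vector], targets: List[Vector]) -> List[float]:
    """Per-step mean squared error between predicted and target GVF values."""
    return [sum((p - g) ** 2 for p, g in zip(pred, tgt)) / len(tgt)
            for pred, tgt in zip(predictions, targets)]

targets = gvf_targets([[1.0], [1.0], [1.0]], gamma=0.5)      # [[1.75], [1.5], [1.0]]
print(intrinsic_rewards([[1.75], [1.5], [0.0]], targets))     # [0.0, 0.0, 1.0]
```

Because the targets depend on future cumulants, the bonus stays informative in states whose immediate observation looks familiar. This is the contrast the abstract draws with purely novelty-based rewards.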


Redeeming Intrinsic Rewards via Constrained Optimization

Chen, Eric, Hong, Zhang-Wei, Pajarinen, Joni, Agrawal, Pulkit

arXiv.org Artificial Intelligence

State-of-the-art reinforcement learning (RL) algorithms typically use random sampling (e.g., $\epsilon$-greedy) for exploration, but this method fails on hard exploration tasks like Montezuma's Revenge. To address the challenge of exploration, prior works incentivize exploration by rewarding the agent when it visits novel states. Such intrinsic rewards (also called exploration bonuses or curiosity) often lead to excellent performance on hard exploration tasks. However, on easy exploration tasks, the agent gets distracted by intrinsic rewards and performs unnecessary exploration even when sufficient task (also called extrinsic) reward is available. Consequently, such an overly curious agent performs worse than an agent trained with only task reward. Such inconsistency in performance across tasks prevents the widespread use of intrinsic rewards with RL algorithms. We propose a principled constrained optimization procedure called Extrinsic-Intrinsic Policy Optimization (EIPO) that automatically tunes the importance of the intrinsic reward: it suppresses the intrinsic reward when exploration is unnecessary and increases it when exploration is required. The result is superior exploration that does not require manual tuning to balance the intrinsic reward against the task reward. Consistent performance gains across sixty-one Atari games validate our claim. The code is available at https://github.com/Improbable-AI/eipo.
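One way to "automatically tune" the intrinsic reward through constrained optimization is a Lagrange multiplier updated by dual ascent. The following is a generic sketch of that mechanism, not EIPO's exact update. It assumes the constraint is that the curiosity-driven policy must earn at least as much extrinsic return as an extrinsic-only policy, with returns estimated elsewhere. `dual_update` and `shaped_reward` are hypothetical names.

```python
def shaped_reward(r_ext: float, r_int: float, lam: float) -> float:
    """Reward for the exploring policy from the Lagrangian J_{E+I} + lam * (J_E - J_E^ext-only).

    As lam grows, extrinsic reward dominates and the intrinsic term's
    relative weight 1 / (1 + lam) shrinks.
    """
    return (1.0 + lam) * r_ext + r_int

def dual_update(lam: float, j_ext_mixed: float, j_ext_only: float, lr: float) -> float:
    """Projected gradient step on lam for the constraint J_E(mixed) - J_E(ext-only) >= 0."""
    slack = j_ext_mixed - j_ext_only      # negative means the constraint is violated
    return max(0.0, lam - lr * slack)     # violation raises lam; keep lam non-negative

lam = 0.0
# Curious policy lags the extrinsic-only policy (5.0 vs 8.0), so lam rises.
lam = dual_update(lam, j_ext_mixed=5.0, j_ext_only=8.0, lr=0.1)
print(round(lam, 6), round(shaped_reward(1.0, 0.5, lam), 6))  # 0.3 1.8
```

When exploration stops paying off in task reward, the multiplier grows and curiosity is damped. When the constraint holds with slack, the multiplier decays toward zero and the intrinsic term regains weight. That is the qualitative behaviour the abstract describes.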


It Happened One Frame: incredibly accurate video content search with OpenAI CLIP

#artificialintelligence

I love movies, so as a fun exercise for my fast.ai […] It's named "It Happened One Frame", in tribute to the classic 1934 romantic comedy "It Happened One Night". To use this app, all you need is the link to a YouTube video. For example, you could search "Macaulay Culkin screams with hands on his cheeks" in a Home Alone movie clip and get the screenshots that capture the most iconic scene in this classic. This particular image is so popular that you can easily get it from a Google search.
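Text-to-frame search with CLIP typically works by embedding the query text and each sampled video frame into the same vector space, then ranking frames by cosine similarity. The post does not show its code. The following is a minimal sketch assuming the embeddings have already been computed by a CLIP model (not shown) and are passed in as plain lists. `top_frames` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_frames(query_emb: Sequence[float],
               frame_embs: List[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Return (frame index, similarity) for the k frames closest to the text query."""
    scored = [(i, cosine(query_emb, emb)) for i, emb in enumerate(frame_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-D "embeddings"; real CLIP vectors have hundreds of dimensions.
print([i for i, _ in top_frames([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2)])  # [1, 2]
```

Frame indices map back to timestamps, so the top hits can be shown as screenshots, as in the Home Alone example.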


Graph data science: What you need to know

#artificialintelligence

Whether you're genuinely interested in getting insights and solving problems using data, or just attracted by what has been called "the most promising career" by LinkedIn and the "best job in America" by Glassdoor, chances are you're familiar with data science. As we've elaborated previously, graphs are a universal data structure with manifestations that span a wide spectrum: from analytics to databases, and from knowledge management to data science, machine learning and even hardware. Graph data science means answering questions not just with your data, but with the connections between your data points; that's the 30-second explanation, according to Alicia Frame. Frame is the senior director of product management for data science at Neo4j, a leading graph database vendor.
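As a small illustration of answering a question "with the connections between your data points", the following sketch counts the contacts two people share in an undirected graph. Common-neighbour counts are a classic input to link prediction. This is plain Python under our own assumptions, not Neo4j's API or its Graph Data Science library, and the names are hypothetical.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # adjacency sets for an undirected graph

def add_edge(g: Graph, a: str, b: str) -> None:
    """Record an undirected connection between a and b."""
    g.setdefault(a, set()).add(b)
    g.setdefault(b, set()).add(a)

def common_neighbors(g: Graph, a: str, b: str) -> Set[str]:
    """Nodes connected to both a and b; the answer lives in the edges, not the node records."""
    return g.get(a, set()) & g.get(b, set())

g: Graph = {}
for x, y in [("alice", "bob"), ("alice", "carol"), ("dave", "bob"),
             ("dave", "carol"), ("dave", "erin")]:
    add_edge(g, x, y)
print(sorted(common_neighbors(g, "alice", "dave")))  # ['bob', 'carol']
```

Alice and Dave are not directly linked, yet two shared contacts suggest a likely connection. A table of individual records alone would not surface that.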


RheFrameDetect: A Text Classification System for Automatic Detection of Rhetorical Frames in AI from Open Sources

Ghosh, Saurav, Loustaunau, Philippe

arXiv.org Artificial Intelligence

Rhetorical Frames in AI can be thought of as expressions that describe AI development as a competition between two or more actors, such as governments or companies. Examples of such Frames include robotic arms race, AI rivalry, technological supremacy, cyberwarfare dominance and 5G race. Detection of Rhetorical Frames from open sources can help us track the attitudes of governments or companies towards AI, specifically whether attitudes are becoming more cooperative or competitive over time. Given the rapidly increasing volumes of open sources (online news media, Twitter, blogs), it is difficult for subject matter experts to identify Rhetorical Frames in (near) real-time. Moreover, these sources are generally unstructured (noisy), so detecting Frames in them requires state-of-the-art text classification techniques. In this paper, we develop RheFrameDetect, a text classification system for (near) real-time capture of Rhetorical Frames from open sources. Given an input document, RheFrameDetect employs text classification techniques at multiple levels (document level and paragraph level) to identify all occurrences of Frames used in the discussion of AI. We performed an extensive evaluation of the text classification techniques used in RheFrameDetect against human-annotated Frames from multiple news sources. To further demonstrate the effectiveness of RheFrameDetect, we show multiple case studies comparing the Frames identified by RheFrameDetect with human-annotated Frames.
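The abstract describes classification at both document and paragraph level but does not name the classifiers. The following is a minimal sketch of the two-level structure only. It assumes paragraphs are separated by blank lines and uses a keyword matcher as a placeholder for a trained text classifier. `detect_frames`, `keyword_classifier`, and `FrameReport` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Classifier = Callable[[str], Set[str]]  # text -> set of Frame labels

@dataclass
class FrameReport:
    document_frames: Set[str]              # labels from the whole-document pass
    paragraph_frames: Dict[int, Set[str]]  # paragraph index -> labels found there

    def all_frames(self) -> Set[str]:
        return set(self.document_frames).union(*self.paragraph_frames.values())

def keyword_classifier(keywords: Dict[str, List[str]]) -> Classifier:
    """Placeholder classifier: a Frame fires if any of its cue phrases appears."""
    def clf(text: str) -> Set[str]:
        lower = text.lower()
        return {frame for frame, cues in keywords.items() if any(c in lower for c in cues)}
    return clf

def detect_frames(document: str, doc_clf: Classifier, para_clf: Classifier) -> FrameReport:
    """Run one classifier on the full document and another on each paragraph."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    hits = {i: para_clf(p) for i, p in enumerate(paragraphs)}
    return FrameReport(doc_clf(document), {i: f for i, f in hits.items() if f})

clf = keyword_classifier({"AI arms race": ["arms race"], "5G race": ["5g race"]})
doc = "Officials warned of an AI arms race.\n\nThe 5G race continues."
report = detect_frames(doc, clf, clf)
print(report.paragraph_frames)  # {0: {'AI arms race'}, 1: {'5G race'}}
```

The paragraph pass localizes each occurrence, while the document pass can catch Frames that only emerge from the text as a whole.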