Graded strength of comparative illusions is explained by Bayesian inference
Zhang, Yuhan, Wang, Erxiao, Shain, Cory
Like visual processing, language processing is susceptible to illusions in which people systematically misperceive stimuli. In one such case--the comparative illusion (CI), e.g., More students have been to Russia than I have--comprehenders tend to judge the sentence as acceptable despite its underlying nonsensical comparison. Prior research has argued that this phenomenon can be explained as Bayesian inference over a noisy channel: the posterior probability of an interpretation of a sentence is proportional to both the prior probability of that interpretation and the likelihood of corruption into the observed (CI) sentence. Initial behavioral work has supported this claim by evaluating a narrow set of alternative interpretations of CI sentences and showing that comprehenders favor interpretations that are more likely to have been corrupted into the illusory sentence. In this study, we replicate and go substantially beyond this earlier work by directly predicting the strength of illusion with a quantitative model of the posterior probability of plausible interpretations, which we derive through a novel synthesis of statistical language models with human behavioral data. Our model explains not only the fine gradations in the strength of CI effects, but also a previously unexplained effect caused by pronominal vs. full noun phrase than-clause subjects. These findings support a noisy-channel theory of sentence comprehension by demonstrating that the theory makes novel predictions about the comparative illusion that bear out empirically. This outcome joins related evidence of noisy channel processing in both illusory and non-illusory contexts to support noisy channel inference as a unified computational-level theory of diverse language processing phenomena.
- Europe > Russia (0.26)
- Asia > Russia (0.26)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (7 more...)
- Leisure & Entertainment (1.00)
- Media (0.93)
- Health & Medicine > Therapeutic Area (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
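The noisy-channel account in the abstract above, a posterior proportional to the prior of an interpretation times the likelihood of its corruption into the observed sentence, can be sketched numerically. A minimal Python toy (all probabilities are illustrative, not values from the study):

```python
import math

# Toy noisy-channel inference: the posterior probability of an interpretation
# is proportional to its prior times the likelihood that it was corrupted
# into the observed comparative-illusion (CI) sentence.

def posterior(interpretations):
    """interpretations: dict name -> (log_prior, log_corruption_likelihood).
    Returns the normalized posterior over interpretations."""
    logs = {k: lp + ll for k, (lp, ll) in interpretations.items()}
    m = max(logs.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logs.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

# Illustrative candidates for "More students have been to Russia than I have."
candidates = {
    "More students than me have been to Russia": (math.log(0.6), math.log(0.3)),
    "More students have been to Russia than I thought": (math.log(0.4), math.log(0.1)),
}
post = posterior(candidates)
```

Under these toy numbers the first interpretation dominates the posterior, mirroring the claim that comprehenders favor interpretations more likely to have been corrupted into the illusory sentence.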
Perspective from a Broader Context: Can Room Style Knowledge Help Visual Floorplan Localization?
Chen, Bolei, Yan, Shengsheng, Cui, Yongzheng, Kang, Jiaxu, Zhong, Ping, Wang, Jianxin
Since a building's floorplan remains consistent over time and is inherently robust to changes in visual appearance, visual Floorplan Localization (FLoc) has received increasing attention from researchers. However, as a compact and minimalist representation of the building's layout, floorplans contain many repetitive structures (e.g., hallways and corners) that easily result in ambiguous localization. Existing methods either pin their hopes on matching 2D structural cues in floorplans or rely on 3D geometry-constrained visual pre-training, ignoring the richer contextual information provided by visual images. In this paper, we suggest using broader visual scene context to empower FLoc algorithms with scene layout priors to eliminate localization uncertainty. In particular, we propose an unsupervised learning technique with clustering constraints to pre-train a room discriminator on self-collected unlabeled room images. Such a discriminator can empirically extract the hidden room type of the observed image and distinguish it from other room types. By injecting the scene context information summarized by the discriminator into an FLoc algorithm, the room style knowledge is effectively exploited to guide definite visual FLoc. We conducted extensive comparative studies on two standard visual FLoc benchmarks. Our experiments show that our approach outperforms state-of-the-art methods and achieves significant improvements in robustness and accuracy.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > China > Hunan Province (0.04)
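One way the room-type signal described above could disambiguate repetitive structures is to weight each candidate pose's geometric match score by the discriminator's belief in that pose's room type. A hypothetical sketch (the function names and the simple multiplicative fusion rule are assumptions for illustration, not the paper's method):

```python
# Hypothetical fusion of a room-style discriminator with floorplan matching.
# room_probs stands in for the pre-trained discriminator's output on the
# current observation; each candidate pose carries the room type annotated
# at its location in the floorplan.

def rerank_poses(candidates, room_probs):
    """candidates: list of (pose_id, match_score, room_type).
    Weight each geometric match score by the discriminator's belief that
    the observation was taken in that room type; return the best pose."""
    scored = [(pid, score * room_probs.get(rtype, 1e-6))
              for pid, score, rtype in candidates]
    return max(scored, key=lambda x: x[1])

# A hallway pose matches the 2D structure slightly better, but the
# discriminator is confident the camera is in a kitchen.
room_probs = {"hallway": 0.1, "kitchen": 0.8, "bedroom": 0.1}
candidates = [("poseA", 0.9, "hallway"), ("poseB", 0.7, "kitchen")]
best = rerank_poses(candidates, room_probs)
```

In this toy case the kitchen pose wins despite its weaker geometric score, which is the kind of ambiguity resolution the abstract argues for.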
Perspective from a Higher Dimension: Can 3D Geometric Priors Help Visual Floorplan Localization?
Chen, Bolei, Kang, Jiaxu, Yang, Haonan, Zhong, Ping, Wang, Jianxin
Since a building's floorplans are easily accessible, consistent over time, and inherently robust to changes in visual appearance, self-localization within the floorplan has attracted researchers' interest. However, since floorplans are minimalist representations of a building's structure, modal and geometric differences between visual perceptions and floorplans pose challenges to this task. While existing methods cleverly utilize 2D geometric features and pose filters to achieve promising performance, they fail to address the localization errors caused by frequent visual changes and view occlusions due to variously shaped 3D objects. To tackle these issues, this paper views the 2D Floorplan Localization (FLoc) problem from a higher dimension by injecting 3D geometric priors into the visual FLoc algorithm. For the 3D geometric prior modeling, we first model geometrically aware view invariance using multi-view constraints, i.e., leveraging imaging geometric principles to provide matching constraints between multiple images that see the same points. Then, we further model view-scene aligned geometric priors, enhancing the cross-modal geometry-color correspondences by associating the scene's surface reconstruction with the RGB frames of the sequence. Both 3D priors are modeled through self-supervised contrastive learning, so no additional geometric or semantic annotations are required. These 3D priors, summarized from extensive realistic scenes, bridge the modal gap while improving localization success without increasing the computational burden on the FLoc algorithm. Extensive comparative studies demonstrate that our method significantly outperforms state-of-the-art methods and substantially boosts FLoc accuracy. All data and code will be released after the anonymous review.
- Asia > China > Hunan Province (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
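The self-supervised contrastive learning mentioned in the abstract above is commonly realized with an InfoNCE-style objective that pulls together features of pixels observing the same 3D point across views while pushing apart non-matching ones. A minimal sketch under that assumption (the paper's exact loss and feature extractor may differ):

```python
import math

# InfoNCE-style contrastive objective over per-pixel feature vectors.
# anchor/positive are features of pixels that project to the same 3D point
# in two views; negatives are features of unrelated pixels.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Negative log of the softmax probability assigned to the matching view.
    Low loss means the anchor is closest to its geometric correspondence."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)  # stabilize the log-sum-exp
    z = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(z))
```

When the positive is well separated from the negatives the loss approaches zero; when a negative is indistinguishable from the positive, the loss stays high, which is what drives the view-invariant features the abstract describes.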
Strategic resource allocation in memory encoding: An efficiency principle shaping language processing
How is the limited capacity of working memory efficiently used to support human linguistic behaviors? In this paper, we investigate strategic resource allocation as an efficiency principle for memory encoding in sentence processing. The idea is that working memory resources are dynamically and strategically allocated to prioritize novel and unexpected information, enhancing their representations to make them less susceptible to memory decay and interference. Theoretically, from a resource-rational perspective, we argue that this efficiency principle naturally arises from two functional assumptions about working memory, namely, its limited capacity and its noisy representation. Empirically, through naturalistic corpus data, we find converging evidence for strategic resource allocation in the context of dependency locality from both the production and the comprehension side, where non-local dependencies with less predictable antecedents are associated with a reduced locality effect. However, our results also reveal considerable cross-linguistic variability, highlighting the need for a closer examination of how strategic resource allocation, as a universal efficiency principle, interacts with language-specific phrase structures.
- Asia > Indonesia > Bali (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.66)
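The allocation principle in the abstract above can be illustrated with a toy rule that splits a fixed encoding budget across words in proportion to their surprisal, so less predictable items receive higher-precision (less noisy) memory representations. Both the proportional rule and the probabilities are illustrative only, not the paper's model:

```python
import math

# Toy strategic resource allocation: distribute a fixed encoding budget
# across words in proportion to surprisal (-log probability), so novel and
# unexpected words get a larger share of working memory resources.

def allocate_budget(word_probs, budget=1.0):
    """word_probs: dict word -> predictability under some language model.
    Returns dict word -> share of the encoding budget."""
    surprisal = {w: -math.log(p) for w, p in word_probs.items()}
    total = sum(surprisal.values())
    return {w: budget * s / total for w, s in surprisal.items()}

# A highly predictable function word vs. a surprising content word.
shares = allocate_budget({"the": 0.5, "platypus": 0.01})
```

The unexpected word receives most of the budget, which is the mechanism the abstract invokes to explain why dependencies with less predictable antecedents show a reduced locality effect.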
'Trump Gaza' AI video intended as political satire, says creator
The creator of the viral "Trump Gaza" AI-generated video depicting the Gaza Strip as a Dubai-style paradise has said it was intended as a political satire of Trump's "megalomaniac idea". The video – posted by Trump on his Truth Social account last week – depicts a family emerging from the wreckage of war-torn Gaza into a beachside resort town lined with skyscrapers. Trump is seen sipping cocktails with a topless Benjamin Netanyahu on sun loungers, while Elon Musk tears flatbread into dips. The video first emerged in February, shortly after Trump unveiled his property development plan for Gaza, under which he said he wants to "clean out" the population of about 2 million people to create the "Riviera of the Middle East". Trump then posted the clip without any explanation on his Truth Social platform on 26 February.
- Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (1.00)
- Asia > Middle East > Israel (0.36)
- Europe > Middle East (0.25)
- (3 more...)
Engadget Podcast: iPhone 16e review and Amazon's AI-powered Alexa
The keyword for the iPhone 16e seems to be "compromise." In this episode, Devindra chats with Cherlynn about her iPhone 16e review and tries to figure out who this phone is actually for. Also, they dive into Amazon's Alexa event, where we finally learned more about the company's AI-powered voice assistant. Alexa seems useful, but can we trust it? Listen below or subscribe on your podcast app of choice. If you've got suggestions or topics you'd like covered on the show, be sure to email us or drop a note in the comments! And be sure to check out our other podcast, Engadget News! Devindra: This week, it's the iPhone 16e, which Cherlynn has reviewed. We're going to get her full thoughts on that thing. And also, Amazon held an AI event this week. We expected a lot of devices, but they spent 75 minutes talking about Alexa+, which is the AI-powered Alexa. Cherlynn: We expected a lot of devices. One, at least one. It's been a while. Devindra: Mr. Panos Panay was there, the father of the service, and no devices, just him talking about AI. Cherlynn: Oh, and stay tuned at the end of this episode. Uh, I, we included an interview that I did with, um, the vice president of Alexa to talk more about the new Alexa+. Devindra: Anyway, folks, if you're enjoying the show, please be sure to subscribe to us on iTunes or your podcaster of choice, leave us a review on iTunes and drop us an email at podcast@engadget.com. You can also join us on our live [00:01:00] stream on Thursday mornings, typically around 11 a.m. Um, you'll see our faces. Sometimes we'll do Q&A and show off devices as well. This week, uh, Cherlynn has the iPhone 16e, which is the least, um, impressive thing to show off. It's just like, hey, you have an iPhone from 10 years ago, five, a while ago. Devindra: Last, was there a single-camera-back iPhone? Cherlynn: Oh God, before that was 11. So, you know, it's like a flashback.
So let's talk about this thing, Cherlynn. And I checked out your review. First of all, you gave it a really, um, I think serviceable score. Your title is "what's your acceptable compromise." And really, when we were talking about it last week, compromise really did seem like the key word. The thing we kept coming back to was like just one camera, no MagSafe, no fast wireless charging. What are your overall thoughts on this thing? Cherlynn: I mean, so that headline is like all thanks to our EIC, Aaron [00:02:00] Souppouris, because I was like, where, where do I go from here? How do I, so, so he's right. It is like, instead of what's in your wallet, it's like, what are you willing to take out of your wallet? I'll tell you the story. So yesterday I was at the Amazon devices and services event where there were no devices, and a bunch of other reporters had gathered, and we were all like, you know, the, like, review's going up soon, right?
- North America > United States > New York (0.04)
- Asia > China (0.04)
- Leisure & Entertainment (1.00)
- Information Technology > Services (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- (2 more...)
Review for NeurIPS paper: Semantic Visual Navigation by Watching YouTube Videos
This paper proposes to leverage (mostly real-estate) unlabelled YouTube videos of egocentric navigation in indoor environments to train the Q-value function network for the high-level part of a hierarchical RL policy for goal-driven indoor robot navigation. The lower-level part relies on depth-based obstacle avoidance and planning in 2D maps. The method works in an unsupervised way by pseudo-labelling the egocentric navigation video dataset in two ways: 1) extracting action labels from motion classifiers and 2) extracting semantic goal labels from object detection. It uses these two to 3) build experience-replay tuples of (previous image, action, next image, goal) and then trains the goal-conditioned value function using Q-learning. The high-level policy predicts Q-values for navigating a topological graph.
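The training loop the review describes, Q-learning over mined (previous image, action, next image, goal) replay tuples, can be sketched with a tabular stand-in for the value network. States are strings and the reward rule is a toy; every detail here is illustrative, not the paper's implementation:

```python
# Toy one-step Q-learning over replay tuples mined from videos.
# In the paper a network predicts Q(observation, goal, action); here a dict
# keyed by (observation, goal, action) stands in for that network.

def q_update(q, tup, alpha=0.5, gamma=0.9, actions=("forward", "left", "right")):
    prev_obs, action, next_obs, goal = tup
    # Toy reward: 1 when the next frame contains the semantic goal.
    reward = 1.0 if next_obs == goal else 0.0
    best_next = max(q.get((next_obs, goal, a), 0.0) for a in actions)
    key = (prev_obs, goal, action)
    old = q.get(key, 0.0)
    # Standard one-step temporal-difference target.
    q[key] = old + alpha * (reward + gamma * best_next - old)

q = {}
# One mined tuple: moving forward from the hall reached the "kitchen" goal.
replay = [("hall", "forward", "kitchen", "kitchen")]
for tup in replay:
    q_update(q, tup)
```

Each pass over the replay buffer nudges the tabular Q-values toward the bootstrapped target, which is the role the mined tuples play for the value network in the paper.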
A New Group Aims to Protect Whistleblowers In the Trump Era
The world needs whistleblowers, perhaps now more than ever. But whistleblowing has never been more dangerous. Jennifer Gibson has seen this problem develop up close. As a whistleblower lawyer based in the U.K., she has represented concerned insiders in the national security and tech worlds for more than a decade. She's represented family members of civilians killed by Pentagon drone strikes, and executives from top tech companies who've turned against their billionaire bosses.
- Europe > United Kingdom (0.25)
- North America > United States > California (0.05)
- Europe > France (0.05)
- Information Technology (1.00)
- Government > Military (0.90)
- Government > Regional Government > North America Government > United States Government (0.48)
- Information Technology > Communications > Social Media (0.49)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.36)
FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation
Learning to navigate to an image-specified goal is an important but challenging task for autonomous systems like household robots. The agent must understand and reason about the location of the navigation goal from a picture taken at the goal position. Existing methods try to solve this problem by learning a navigation policy that captures semantic features of the goal image and the observation image independently and then fuses them to predict a sequence of navigation actions. However, these methods suffer from two major limitations. In this paper, we aim to overcome these limitations by designing a Fine-grained Goal Prompting (FGPrompt) method for image-goal navigation. In particular, we leverage fine-grained and high-resolution feature maps of the goal image as prompts to perform conditioned embedding, which preserves detailed information in the goal image and guides the observation encoder to pay attention to goal-relevant regions.
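One plausible reading of the conditioned embedding described above is FiLM-style modulation, where the goal feature map yields per-location scale and shift terms applied to the observation features. This operator is an assumption for illustration, not necessarily the paper's exact design:

```python
# Hypothetical FiLM-style conditioning: the fine-grained goal feature map
# modulates the observation feature map location by location, so regions
# that resemble the goal are amplified rather than discarded by pooling.

def condition(obs_feat, goal_feat):
    """obs_feat, goal_feat: lists of equal-length feature vectors, one per
    spatial location. Returns the goal-conditioned observation features."""
    out = []
    for o, g in zip(obs_feat, goal_feat):
        gamma = [1.0 + x for x in g]  # per-channel scale from goal features
        beta = g                       # per-channel shift from goal features
        out.append([gm * ov + b for gm, ov, b in zip(gamma, o, beta)])
    return out

# One spatial location with two channels, modulated by a goal feature.
out = condition([[1.0, 2.0]], [[0.0, 1.0]])
```

Because the scale and shift come from high-resolution goal features rather than a single pooled goal vector, detailed goal information survives into the fused representation, which is the intuition the abstract emphasizes.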