visor
VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models
Phute, Mansi, Balakrishnan, Ravikumar
Vision Language Models (VLMs) are increasingly being used in a broad range of applications, bringing their security and behavioral control to the forefront. While existing approaches for behavioral control or output redirection, like system prompting in VLMs, are easily detectable and often ineffective, activation-based steering vectors require invasive runtime access to model internals, which is incompatible with API-based services and closed-source deployments. We introduce VISOR (Visual Input-based Steering for Output Redirection), a novel method that achieves sophisticated behavioral control through optimized visual inputs alone. By crafting universal steering images that induce target activation patterns, VISOR enables practical deployment across all VLM serving modalities while remaining imperceptible compared to explicit textual instructions. We validate VISOR on LLaVA-1.5-7B across three critical alignment tasks: refusal, sycophancy, and survival instinct. A single 150KB steering image matches steering vector performance within 1-2% for positive behavioral shifts while dramatically exceeding it for negative steering, achieving up to 25% shifts from baseline compared to steering vectors' modest changes. Unlike system prompting (3-4% shifts), VISOR provides robust bidirectional control while maintaining 99.9% performance on 14,000 unrelated MMLU tasks. Beyond eliminating runtime overhead and model access requirements, VISOR exposes a critical security vulnerability: adversaries can achieve sophisticated behavioral manipulation through visual channels alone, bypassing text-based defenses. Our work fundamentally reimagines multimodal model control and highlights the urgent need for defenses against visual steering attacks.
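The abstract describes optimizing a universal input so that a frozen model's internal activations match a target pattern. The paper's actual method backpropagates through a VLM's vision encoder; the sketch below is a toy stand-in under stated assumptions: the "model" is a fixed random linear map, the "steering image" is a flat vector, and plain gradient descent minimizes the squared activation-matching error.

```python
import numpy as np

# Toy sketch of steering-input optimization (assumptions: the real VISOR
# optimizes an image through a frozen VLM; here the frozen model is a
# random linear map W and the "image" x is a 16-dim vector).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))   # frozen "vision encoder"
target = rng.standard_normal(8)    # desired activation pattern

x = np.zeros(16)                   # the steering input, optimized in place
lr = 0.01
for _ in range(2000):
    resid = W @ x - target         # activation error
    grad = 2 * W.T @ resid         # gradient of squared error w.r.t. x
    x -= lr * grad

final_loss = float(np.sum((W @ x - target) ** 2))
print(f"final activation-matching loss: {final_loss:.6f}")
```

In the underdetermined setting (more input dimensions than activation dimensions) an exact match exists, so the loss drives toward zero; in a real VLM the same objective is nonconvex and is typically optimized with an autodiff framework instead of a hand-written gradient.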
Leveraging Tactile Sensing to Render both Haptic Feedback and Virtual Reality 3D Object Reconstruction in Robotic Telemanipulation
Giudici, Gabriele, Bonzini, Aramis Augusto, Coppola, Claudio, Althoefer, Kaspar, Farkhatdinov, Ildar, Jamone, Lorenzo
Dexterous robotic manipulator teleoperation is widely used in many applications, either where it is convenient to keep the human in the control loop or where it is used to train advanced robot agents. So far, this technology has been used in combination with camera systems with remarkable success. On the other hand, only a limited number of studies have focused on leveraging haptic feedback from tactile sensors in contexts where camera-based systems fail, for example due to self-occlusions or poor lighting conditions such as smoke. This study demonstrates the feasibility of precise pick-and-place teleoperation without cameras by leveraging tactile-based 3D object reconstruction in VR and providing haptic feedback to a blindfolded user. Our preliminary results show that integrating these technologies enables the successful completion of telemanipulation tasks previously dependent on cameras, paving the way for more complex future applications.
The Immersed Visor aims for spatial computing's sweet spot
The $1,050 device has 4K per-eye resolution and weighs less than an iPhone 16 Pro. An Austin-based startup best known for its VR and mixed reality workspace software for other companies' headsets now has hardware of its own. The Immersed Visor appears to sit somewhere between a Vision Pro Lite and Xreal Plus: a lightweight head-worn device that creates a high-resolution spatial computing environment on the cheap (well, relatively speaking). After months of teasers, Immersed founder Renji Bijoy finally unveiled the Visor at an Austin event on Thursday. The device, a bit more than glasses but much less than a full headset, gives each eye the equivalent of a 4K OLED screen.
Benchmarking Spatial Relationships in Text-to-Image Generation
Gokhale, Tejas, Palangi, Hamid, Nushi, Besmira, Vineet, Vibhav, Horvitz, Eric, Kamar, Ece, Baral, Chitta, Yang, Yezhou
Spatial understanding is a fundamental aspect of computer vision and integral for human-level reasoning about images, making it an important component for grounded language understanding. While recent text-to-image synthesis (T2I) models have shown unprecedented improvements in photorealism, it is unclear whether they have reliable spatial understanding capabilities. We investigate the ability of T2I models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image. To benchmark existing models, we introduce a dataset, $\mathrm{SR}_{2D}$, that contains sentences describing two or more objects and the spatial relationships between them. We construct an automated evaluation pipeline to recognize objects and their spatial relationships, and employ it in a large-scale evaluation of T2I models. Our experiments reveal a surprising finding that, although state-of-the-art T2I models exhibit high image quality, they are severely limited in their ability to generate multiple objects or the specified spatial relations between them. Our analyses demonstrate several biases and artifacts of T2I models such as the difficulty with generating multiple objects, a bias towards generating the first object mentioned, spatially inconsistent outputs for equivalent relationships, and a correlation between object co-occurrence and spatial understanding capabilities. We conduct a human study that shows the alignment between VISOR and human judgement about spatial understanding. We offer the $\mathrm{SR}_{2D}$ dataset and the VISOR metric to the community in support of T2I reasoning research.
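The VISOR metric scores whether the spatial relationship described in the prompt actually appears in the generated image, using detected objects. The sketch below illustrates the core centroid-comparison idea under assumptions: the function names, box format `(x_min, y_min, x_max, y_max)`, and the exact conditioning on detection are hypothetical, not the paper's implementation.

```python
def centroid(box):
    """Center (x, y) of a box given as (x_min, y_min, x_max, y_max)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def relation_holds(box_a, box_b, relation):
    """Check a 2D spatial relation between detected boxes
    (image coordinates: x grows rightward, y grows downward)."""
    (ax, ay), (bx, by) = centroid(box_a), centroid(box_b)
    if relation == "left of":
        return ax < bx
    if relation == "right of":
        return ax > bx
    if relation == "above":
        return ay < by
    if relation == "below":
        return ay > by
    raise ValueError(f"unknown relation: {relation}")

def visor_score(samples):
    """Fraction of (box_a, box_b, relation) samples in which the
    described relation holds -- a sketch of VISOR-style accuracy."""
    hits = sum(relation_holds(a, b, r) for a, b, r in samples)
    return hits / len(samples)

samples = [
    ((10, 40, 50, 80), (120, 40, 160, 80), "left of"),   # holds
    ((10, 40, 50, 80), (120, 40, 160, 80), "right of"),  # fails
    ((30, 10, 70, 40), (30, 90, 70, 120), "above"),      # holds
    ((30, 10, 70, 40), (30, 90, 70, 120), "below"),      # fails
]
print(f"VISOR-style score: {visor_score(samples):.2f}")  # 2 of 4 hold
```

In the full pipeline both objects must first be detected by an object recognizer; samples where detection fails are handled separately, which this sketch omits.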
EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations
Darkhalil, Ahmad, Shan, Dandan, Zhu, Bin, Ma, Jian, Kar, Amlan, Higgins, Richard, Fidler, Sanja, Fouhey, David, Damen, Dima
We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked, where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, and 67K hand-object relations, covering 36 hours of 179 untrimmed videos. Along with the annotations, we introduce three challenges in video object segmentation, interaction understanding and long-term reasoning. For data, code and leaderboards: http://epic-kitchens.github.io/VISOR
Forget vacuum cleaners! Dyson is secretly developing ROBOTS to carry out household chores
While Dyson is best known for its vacuum cleaners and hairdryers, the tech giant has revealed that it is secretly developing a range of futuristic robots. The robots are designed to carry out a variety of household chores, including tidying up toys and doing the dishes. Dyson has given a glimpse of the new robot prototypes in a video released at the International Conference on Robotics and Automation in Philadelphia today. Jake Dyson, Chief Engineer at Dyson, said: 'There's a big future in robotics and saving people time, performing chores for people, and improving daily lives. 'I'm a parent, I spend half my life cleaning up after my kids, and it's pretty tedious.'