Collaborating Authors

 finn


Distributionally Adaptive Meta Reinforcement Learning

Neural Information Processing Systems

The diversity and dynamism of the real world require reinforcement learning (RL) agents that can quickly adapt and learn new behaviors when placed in novel situations.



076c3e48fa502c660902105965fdd9f6-Paper-Conference.pdf

Neural Information Processing Systems

We compare to state-of-the-art imitation learning and LVM baselines and see that QueST's architecture leads to strong performance on several multitask and few-shot learning benchmarks. Further results and videos are available at https://quest-model.github.io.


KDPE: A Kernel Density Estimation Strategy for Diffusion Policy Trajectory Selection

arXiv.org Artificial Intelligence

Learning robot policies that capture multimodality in the training data has been a long-standing open challenge for behavior cloning. Recent approaches tackle the problem by modeling the conditional action distribution with generative models. One of these approaches is Diffusion Policy, which relies on a diffusion model to denoise random points into robot action trajectories. While achieving state-of-the-art performance, it has two main drawbacks that may lead the robot out of the data distribution during policy execution. First, the stochasticity of the denoising process can strongly affect the quality of the generated action trajectory. Second, being a supervised learning approach, it can learn outliers from the training dataset. Recent work focuses on mitigating these limitations by combining Diffusion Policy either with large-scale training or with classical behavior cloning algorithms. Instead, we propose KDPE, a Kernel Density Estimation-based strategy that filters out potentially harmful trajectories output by Diffusion Policy while keeping a low test-time computational overhead. For Kernel Density Estimation, we propose a manifold-aware kernel to model a probability density function for actions composed of end-effector Cartesian position, orientation, and gripper state. Overall, KDPE achieves better performance than Diffusion Policy on simulated single-arm tasks and real robot experiments. Additional material and code are available on our project page at https://hsp-iit.github.io/KDPE/.
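The density-based filtering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each candidate trajectory has been flattened into a plain Euclidean vector, uses an ordinary Gaussian kernel instead of the manifold-aware kernel the abstract describes, and assumes that the densest candidate is the one to keep. The function name `select_trajectory` and the `bandwidth` parameter are hypothetical.

```python
import numpy as np


def select_trajectory(candidates: np.ndarray, bandwidth: float = 0.5) -> int:
    """Return the index of the candidate with the highest leave-one-out KDE score.

    candidates: array of shape (n_candidates, dim), one flattened trajectory per row.
    bandwidth:  Gaussian kernel width (hypothetical default, tune per task).
    """
    n = candidates.shape[0]
    # Pairwise squared Euclidean distances between all candidates.
    diffs = candidates[:, None, :] - candidates[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Gaussian kernel; the normalising constant is omitted because only
    # the ranking of the candidates matters here.
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    # Leave-one-out: a candidate should not contribute to its own density.
    np.fill_diagonal(kernel, 0.0)
    density = kernel.sum(axis=1) / (n - 1)
    return int(np.argmax(density))


# Three candidates clustered near the origin and one far-away outlier.
samples = np.array([[0.0, 0.0], [0.2, 0.0], [-0.2, 0.0], [5.0, 5.0]])
best = select_trajectory(samples)
```

In this toy case the outlier at (5, 5) receives a density close to zero, so it is never selected. A real system would need a kernel that respects orientation and gripper-state geometry, as the abstract notes.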


10 Open Challenges Steering the Future of Vision-Language-Action Models

arXiv.org Artificial Intelligence

Due to their ability to follow natural language instructions, vision-language-action (VLA) models are increasingly prevalent in the embodied AI arena, following the widespread success of their precursors -- LLMs and VLMs. In this paper, we discuss 10 principal milestones in the ongoing development of VLA models -- multimodality, reasoning, data, evaluation, cross-robot action generalization, efficiency, whole-body coordination, safety, agents, and coordination with humans. Furthermore, we discuss the emerging trends of using spatial understanding, modeling world dynamics, post training, and data synthesis -- all aiming to reach these milestones. Through these discussions, we hope to bring attention to the research avenues that may accelerate the development of VLA models into wider acceptability.


A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning

arXiv.org Artificial Intelligence

Robotic real-world reinforcement learning (RL) with vision-language-action (VLA) models is bottlenecked by sparse, handcrafted rewards and inefficient exploration. We introduce VLAC, a general process reward model built upon InternVL and trained on large-scale heterogeneous datasets. Given pairwise observations and a language goal, it outputs a dense progress delta and a done signal, eliminating task-specific reward engineering, and supports one-shot in-context transfer to unseen tasks and environments. VLAC is trained on vision-language datasets to strengthen perception, dialogic and reasoning capabilities, together with robot and human trajectory data that ground action generation and progress estimation. It is additionally strengthened to reject irrelevant prompts and to detect regression or stagnation by constructing large numbers of negative and semantically mismatched samples. With prompt control, a single VLAC model alternately generates reward and action tokens, unifying critic and policy. Within an asynchronous real-world RL loop, we layer a graded human-in-the-loop protocol (offline demonstration replay, return and explore, human guided explore) that accelerates exploration and stabilizes early learning. Across four distinct real-world manipulation tasks, VLAC lifts success rates from about 30% to about 90% within 200 real-world interaction episodes; incorporating human-in-the-loop interventions yields a further 50% improvement in sample efficiency and achieves up to 100% final success.
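As a rough illustration of how a pairwise critic of this kind could supply dense rewards, the sketch below walks over consecutive observation pairs and collects the progress deltas until the critic reports completion. The `Critic` signature, the `dense_rewards` helper, and the toy critic are all hypothetical stand-ins and do not reflect VLAC's actual interface.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical critic signature (an assumption, not VLAC's API):
# (previous_obs, current_obs, goal) -> (progress_delta, done)
Critic = Callable[[float, float, str], Tuple[float, bool]]


def dense_rewards(observations: Sequence[float], goal: str, critic: Critic) -> List[float]:
    """Collect one reward per transition, stopping once the critic signals done."""
    rewards: List[float] = []
    for prev_obs, cur_obs in zip(observations, observations[1:]):
        delta, done = critic(prev_obs, cur_obs, goal)
        rewards.append(delta)
        if done:
            break
    return rewards


def toy_critic(prev_obs: float, cur_obs: float, goal: str) -> Tuple[float, bool]:
    # Toy stand-in: each observation is already a scalar task-progress value,
    # so the delta is a difference and the task is done at progress 1.0.
    return cur_obs - prev_obs, cur_obs >= 1.0


rewards = dense_rewards([0.0, 0.25, 0.5, 1.0, 1.0], "stack the blocks", toy_critic)
```

With the toy critic, a negative delta would indicate regression and a zero delta stagnation, which is the kind of signal the abstract says the model is trained to detect.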


85b9a5ac91cd629bd3afe396ec07270a-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their time, feedback, and highly encouraging comments. If the reviewer recommends, we will add a sensitivity analysis for network sizes to the Appendix. We shall remove this figure if it is not considered informative by the reviewers. RL, where models learnt online from a temporal data stream should undergo considerable forgetting. R1: Lookahead search: We added the following: "In optimisation literature, lookahead search usually evaluates the These proposals are then modified based on evaluated fitness to make an actual update.


A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks

arXiv.org Artificial Intelligence

Recent research has demonstrated that state-of-the-art LLMs and defenses remain susceptible to multi-turn jailbreak attacks. These attacks require only closed-box model access and are often easy to perform manually, posing a significant threat to the safe and secure deployment of LLM-based systems. We study the effectiveness of the Crescendo multi-turn jailbreak at the level of intermediate model representations and find that safety-aligned LMs often represent Crescendo responses as more benign than harmful, especially as the number of conversation turns increases. Our analysis indicates that at each turn, Crescendo prompts tend to keep model outputs in a "benign" region of representation space, effectively tricking the model into fulfilling harmful requests. Further, our results help explain why single-turn jailbreak defenses like circuit breakers are generally ineffective against multi-turn attacks, motivating the development of mitigations that address this generalization gap.


Red grape detection with accelerated artificial neural networks in the FPGA's programmable logic

arXiv.org Artificial Intelligence

Robots usually slow down for scanning to detect objects while moving. Additionally, the robot's camera is configured with a low framerate to match the speed of the detection algorithms. These constraints hamper task execution and exploration, increasing the robot's task execution time. AMD has developed the Vitis-AI framework to deploy detection algorithms into FPGAs. However, this tool does not fully use the FPGAs' PL. In this work, we use the FINN architecture to deploy three ANNs, MobileNet v1 with 4-bit quantisation, CNV with 2-bit quantisation, and CNV with 1-bit quantisation (BNN), inside an FPGA's PL. The models were trained on the RG2C dataset, a self-acquired dataset released in open access. MobileNet v1 performed better, reaching a success rate of 98% and an inference speed of 6611 FPS. In this work, we proved that we can use FPGAs to speed up ANNs and make them suitable for attention mechanisms.


Biotech firm aims to create 'ChatGPT of biology' – will it work?

New Scientist

A British biotech firm called Basecamp Research has spent the past few years collecting troves of genetic data from microbes living in extreme environments around the world, identifying more than a million species and nearly 10 billion genes new to science. It claims that this massive database of the planet's biodiversity will help train a "ChatGPT of biology" that will answer questions about life on Earth – but there's no guarantee this will work. Jörg Overmann at the Leibniz Institute DSMZ in Germany, which houses one of the world's most diverse collections of microbial cultures, says increasing known genetic sequences is valuable, but may not result in useful findings for things like drug discovery or chemistry without more information about the organisms from which they were collected. "I'm not convinced that in the end the understanding of really novel functions will be accelerated by this brute-force increase in the sequence space," he says. Recent years have seen researchers develop a number of machine learning models trained to identify patterns and predict relationships amid vast amounts of biological data.