


076c3e48fa502c660902105965fdd9f6-Paper-Conference.pdf

Neural Information Processing Systems

We compare to state-of-the-art imitation learning and LVM baselines and see that QueST's architecture leads to strong performance on several multitask and few-shot learning benchmarks. Further results and videos are available at https://quest-model.github.io.


10 Open Challenges Steering the Future of Vision-Language-Action Models

Poria, Soujanya, Majumder, Navonil, Hung, Chia-Yu, Bagherzadeh, Amir Ali, Li, Chuan, Kwok, Kenneth, Wang, Ziwei, Tan, Cheston, Wu, Jiajun, Hsu, David

arXiv.org Artificial Intelligence

Due to their ability to follow natural language instructions, vision-language-action (VLA) models are increasingly prevalent in the embodied AI arena, following the widespread success of their precursors -- LLMs and VLMs. In this paper, we discuss 10 principal milestones in the ongoing development of VLA models -- multimodality, reasoning, data, evaluation, cross-robot action generalization, efficiency, whole-body coordination, safety, agents, and coordination with humans. Furthermore, we discuss the emerging trends of using spatial understanding, modeling world dynamics, post-training, and data synthesis -- all aiming to reach these milestones. Through these discussions, we hope to bring attention to the research avenues that may accelerate the development of VLA models toward wider acceptance.


A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning

Zhai, Shaopeng, Zhang, Qi, Zhang, Tianyi, Huang, Fuxian, Zhang, Haoran, Zhou, Ming, Zhang, Shengzhe, Liu, Litao, Lin, Sixu, Pang, Jiangmiao

arXiv.org Artificial Intelligence

Robotic real-world reinforcement learning (RL) with vision-language-action (VLA) models is bottlenecked by sparse, handcrafted rewards and inefficient exploration. We introduce VLAC, a general process reward model built upon InternVL and trained on large-scale heterogeneous datasets. Given pairwise observations and a language goal, it outputs a dense progress delta and a done signal, eliminating task-specific reward engineering, and supports one-shot in-context transfer to unseen tasks and environments. VLAC is trained on vision-language datasets to strengthen perception, dialogue, and reasoning capabilities, together with robot and human trajectory data that ground action generation and progress estimation; it is additionally strengthened to reject irrelevant prompts and to detect regression or stagnation by constructing large numbers of negative and semantically mismatched samples. With prompt control, a single VLAC model alternately generates reward and action tokens, unifying critic and policy. Deployed inside an asynchronous real-world RL loop, we layer a graded human-in-the-loop protocol (offline demonstration replay, return and explore, human-guided explore) that accelerates exploration and stabilizes early learning. Across four distinct real-world manipulation tasks, VLAC lifts success rates from about 30% to about 90% within 200 real-world interaction episodes; incorporating human-in-the-loop interventions yields a further 50% improvement in sample efficiency and achieves up to 100% final success.
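The pairwise progress-reward idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the actual VLAC model: `process_reward` and the per-step progress estimates below are hypothetical stand-ins for the model's learned progress head.

```python
from dataclasses import dataclass

@dataclass
class RewardSignal:
    progress_delta: float  # dense progress change, clipped to [-1, 1]
    done: bool             # task-completion flag

def process_reward(prev_progress: float, curr_progress: float,
                   done_threshold: float = 0.95) -> RewardSignal:
    """Toy stand-in for a pairwise process reward head: given progress
    estimates for two consecutive observations under a language goal,
    emit a dense delta and a done signal (no handcrafted task reward)."""
    delta = max(-1.0, min(1.0, curr_progress - prev_progress))
    return RewardSignal(progress_delta=delta,
                        done=curr_progress >= done_threshold)

# A short rollout of (hypothetical) progress estimates per step.
trajectory = [0.1, 0.3, 0.25, 0.6, 0.97]
signals = [process_reward(a, b) for a, b in zip(trajectory, trajectory[1:])]
# The dip at step 2 yields a negative delta (regression detection);
# the final step crosses the threshold and fires `done`.
```

The point of the sketch is the interface, not the model: a dense delta per observation pair gives the RL loop a shaped reward, while the done flag replaces a task-specific success detector.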


85b9a5ac91cd629bd3afe396ec07270a-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their time, feedback, and highly encouraging comments. If the reviewer recommends, we will add a sensitivity analysis for network sizes to the Appendix. We shall remove this figure if it is not considered informative by the reviewers. RL, where models learnt online from a temporal data stream, should undergo considerable forgetting. R1: Lookahead search: We added the following: "In optimisation literature, lookahead search usually evaluates the [text truncated] These proposals are then modified based on evaluated fitness to make an actual update."


A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks

Bullwinkel, Blake, Russinovich, Mark, Salem, Ahmed, Zanella-Beguelin, Santiago, Jones, Daniel, Severi, Giorgio, Kim, Eugenia, Hines, Keegan, Minnich, Amanda, Zunger, Yonatan, Kumar, Ram Shankar Siva

arXiv.org Artificial Intelligence

Recent research has demonstrated that state-of-the-art LLMs and defenses remain susceptible to multi-turn jailbreak attacks. These attacks require only closed-box model access and are often easy to perform manually, posing a significant threat to the safe and secure deployment of LLM-based systems. We study the effectiveness of the Crescendo multi-turn jailbreak at the level of intermediate model representations and find that safety-aligned LMs often represent Crescendo responses as more benign than harmful, especially as the number of conversation turns increases. Our analysis indicates that at each turn, Crescendo prompts tend to keep model outputs in a "benign" region of representation space, effectively tricking the model into fulfilling harmful requests. Further, our results help explain why single-turn jailbreak defenses like circuit breakers are generally ineffective against multi-turn attacks, motivating the development of mitigations that address this generalization gap.
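The representation-level analysis can be illustrated with a minimal linear probe. This is a toy sketch with synthetic 2-d vectors standing in for intermediate LM activations; the difference-of-means direction is one common probing choice, not necessarily the paper's exact method.

```python
# Fit a "harmful vs. benign" direction as the difference of class means
# (the simplest linear probe), then score new activations by projection.
# All vectors here are synthetic stand-ins for LM hidden states.

def mean_vec(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

benign_acts  = [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2]]
harmful_acts = [[0.1, 0.9], [0.0, 1.1], [0.2, 0.8]]

# Direction pointing from the benign region toward the harmful region.
direction = [h - b for h, b in zip(mean_vec(harmful_acts),
                                   mean_vec(benign_acts))]

def harm_score(activation):
    """Projection onto the harmful-minus-benign direction; higher means
    the representation sits closer to the 'harmful' region."""
    return dot(activation, direction)

# The paper's finding, in these terms: a Crescendo-turn response whose
# representation stays in the benign region scores low on this axis
# even when its surface content is harmful.
score_benign_like  = harm_score([0.85, 0.15])
score_harmful_like = harm_score([0.10, 0.95])
```

A single-turn defense trained on activations like `harmful_acts` would miss responses that project near `benign_acts`, which is the generalization gap the abstract describes.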


Red grape detection with accelerated artificial neural networks in the FPGA's programmable logic

Magalhães, Sandro Costa, Almeida, Marco, Santos, Filipe Neves dos, Moreira, António Paulo, Dias, Jorge

arXiv.org Artificial Intelligence

Robots usually slow down while scanning to detect objects while moving. Additionally, the robot's camera is configured with a low framerate to match the speed of the detection algorithms. These constraints during task execution and exploration increase the task execution time. AMD has developed the Vitis-AI framework to deploy detection algorithms into FPGAs. However, this tool does not fully use the FPGAs' programmable logic (PL). In this work, we use the FINN architecture to deploy three ANNs -- MobileNet v1 with 4-bit quantisation, CNV with 2-bit quantisation, and CNV with 1-bit quantisation (BNN) -- inside an FPGA's PL. The models were trained on the RG2C dataset, a self-acquired dataset released in open access. MobileNet v1 performed best, reaching a success rate of 98% and an inference speed of 6611 FPS. In this work, we show that FPGAs can speed up ANNs and make them suitable for attention mechanisms.
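The 1-bit quantisation behind the CNV BNN can be sketched in isolation. This is a minimal illustration of the idea (sign binarisation with a mean-absolute-value scale, in the style of XNOR-Net), not the FINN toolflow itself, which quantises during training and compiles the network to PL.

```python
def binarize(weights):
    """1-bit quantisation of a weight vector: keep only the sign of
    each weight, scaled by the mean absolute value so the binarised
    vector roughly preserves the original magnitude. On an FPGA the
    resulting {-alpha, +alpha} weights reduce multiplies to XNOR/popcount
    logic, which is what makes multi-thousand-FPS inference feasible."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]

quantised = binarize([0.5, -0.25, 0.75, -0.5])
```

Each weight now carries one bit of information (its sign) plus one shared scale per vector, which is the storage and compute saving the abstract's BNN variant exploits.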


Biotech firm aims to create 'ChatGPT of biology' – will it work?

New Scientist

A British biotech firm called Basecamp Research has spent the past few years collecting troves of genetic data from microbes living in extreme environments around the world, identifying more than a million species and nearly 10 billion genes new to science. It claims that this massive database of the planet's biodiversity will help train a "ChatGPT of biology" that will answer questions about life on Earth – but there's no guarantee this will work. Jörg Overmann at the Leibniz Institute DSMZ in Germany, which houses one of the world's most diverse collections of microbial cultures, says increasing known genetic sequences is valuable, but may not result in useful findings for things like drug discovery or chemistry without more information about the organisms from which they were collected. "I'm not convinced that in the end the understanding of really novel functions will be accelerated by this brute-force increase in the sequence space," he says. Recent years have seen researchers develop a number of machine learning models trained to identify patterns and predict relationships amid vast amounts of biological data.


Perception-Informed Neural Networks: Beyond Physics-Informed Neural Networks

Mazandarani, Mehran, Najariyan, Marzieh

arXiv.org Artificial Intelligence

This article introduces Perception-Informed Neural Networks (PrINNs), a framework designed to incorporate perception-based information into neural networks, addressing systems with both known and unknown physics laws or differential equations. Moreover, PrINNs extend the concept of Physics-Informed Neural Networks (PINNs) and their variants, offering a platform for the integration of diverse forms of perception precisiation, including singular, probability distribution, possibility distribution, interval, and fuzzy graph. In fact, PrINNs allow neural networks to model dynamical systems by integrating expert knowledge and perception-based information through loss functions, enabling the creation of modern data-driven models. Some of the key contributions include Mixture of Experts Informed Neural Networks (MOEINNs), which combine heterogeneous expert knowledge into the network, and Transformed-Knowledge Informed Neural Networks (TKINNs), which facilitate the incorporation of meta-information for enhanced model performance. Additionally, Fuzzy-Informed Neural Networks (FINNs), a modern class of fuzzy deep neural networks, leverage fuzzy logic constraints within a deep learning architecture, allowing online training without pre-training and eliminating the need for defuzzification. PrINNs represent a significant step forward in bridging the gap between traditional physics-based modeling and modern data-driven approaches, enabling neural networks to learn from both structured physics laws and flexible perception-based rules. This approach empowers neural networks to operate in uncertain environments, model complex systems, and discover new forms of differential equations, making PrINNs a powerful tool for advancing computational science and engineering.
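The loss-function mechanism described above can be sketched generically. This is a hypothetical composite loss under stated assumptions, not the paper's exact formulation: the weights and the interval-style perception penalty (one of the precisiation forms the abstract lists) are illustrative choices.

```python
# PrINN-style composite loss: a data-fit term, a physics-residual term,
# and a perception term encoding expert knowledge such as an interval
# constraint ("the output should lie roughly between lo and hi").

def data_loss(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def physics_loss(residuals):
    # Residuals of a known ODE/PDE evaluated at collocation points,
    # exactly as in a PINN.
    return sum(r ** 2 for r in residuals) / len(residuals)

def interval_penalty(preds, lo=0.0, hi=1.0):
    # Perception term: quadratic penalty for outputs outside the
    # expert-specified interval [lo, hi].
    return sum(max(0.0, lo - p) ** 2 + max(0.0, p - hi) ** 2
               for p in preds) / len(preds)

def prinn_loss(preds, targets, residuals,
               w_data=1.0, w_phys=1.0, w_perc=1.0):
    return (w_data * data_loss(preds, targets)
            + w_phys * physics_loss(residuals)
            + w_perc * interval_penalty(preds))

# Predictions respecting the expert interval vs. predictions violating it,
# with identical targets and physics residuals in both cases.
loss_in_range  = prinn_loss([0.2, 0.8], [0.25, 0.75], [0.01, -0.02])
loss_violating = prinn_loss([1.5, -0.3], [0.25, 0.75], [0.01, -0.02])
```

The design point is that expert perception enters only through an extra differentiable term, so the same training loop covers known-physics, unknown-physics, and mixed cases by reweighting the three terms.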


The AI Black-Scholes: Finance-Informed Neural Network

Aboussalah, Amine M., Li, Xuanze, Chi, Cheng, Patel, Raj

arXiv.org Machine Learning

In the realm of option pricing, existing models are typically classified into principle-driven methods, such as solving the partial differential equations (PDEs) that the pricing function satisfies, and data-driven approaches, such as machine learning (ML) techniques that parameterize the pricing function directly. While principle-driven models offer a rigorous theoretical framework, they often rely on unrealistic assumptions, such as asset processes adhering to fixed stochastic differential equations (SDEs). Moreover, they can become computationally intensive, particularly in high-dimensional settings where analytical solutions are not available and numerical solutions are needed. In contrast, data-driven models excel at capturing market data trends, but they often lack alignment with core financial principles, raising concerns about interpretability and predictive accuracy, especially when dealing with limited or biased datasets. This work proposes a hybrid approach to address these limitations by integrating the strengths of both principled and data-driven methodologies. Our framework combines the theoretical rigor and interpretability of PDE-based models with the adaptability of machine learning techniques, yielding a more versatile methodology for pricing a broad spectrum of options. We validate our approach across different volatility modeling approaches, both with constant volatility (Black-Scholes) and stochastic volatility (Heston), demonstrating that our proposed framework, Finance-Informed Neural Network (FINN), not only enhances predictive accuracy but also maintains adherence to core financial principles. FINN presents a promising tool for practitioners, offering robust performance across a variety of market conditions.
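The "adherence to financial principles" the abstract refers to is, in the constant-volatility case, the Black-Scholes PDE. The sketch below is not the FINN model itself; it only verifies numerically that the closed-form Black-Scholes call price satisfies the PDE residual a PDE-informed network would penalize in its loss, using finite differences in place of automatic differentiation.

```python
import math

# Black-Scholes PDE:  dV/dt + (1/2) sigma^2 S^2 d2V/dS2 + r S dV/dS - r V = 0
# A finance-informed network drives this residual toward zero at sampled
# (S, t) collocation points; here we check it on the analytic solution.

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, t, K=100.0, T=1.0, r=0.05, sigma=0.2):
    """Closed-form Black-Scholes price of a European call at time t."""
    tau = T - t
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def pde_residual(S, t, r=0.05, sigma=0.2, h=1e-3):
    # Central finite differences stand in for autograd.
    dV_dt   = (bs_call(S, t + h) - bs_call(S, t - h)) / (2 * h)
    dV_dS   = (bs_call(S + h, t) - bs_call(S - h, t)) / (2 * h)
    d2V_dS2 = (bs_call(S + h, t) - 2 * bs_call(S, t) + bs_call(S - h, t)) / h ** 2
    V = bs_call(S, t)
    return dV_dt + 0.5 * sigma ** 2 * S ** 2 * d2V_dS2 + r * S * dV_dS - r * V

residual = pde_residual(S=110.0, t=0.3)
# The analytic solution satisfies the PDE, so the residual is near zero
# up to finite-difference error.
```

In a FINN-style training loop, `bs_call` would be replaced by the network's pricing function and this residual would be added to the data-fit loss, which is how the hybrid approach keeps learned prices consistent with the underlying PDE.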