 zest


Scaling to Long Videos

Neural Information Processing Systems

We introduce a full-stack framework that scales up reasoning in vision-language models (VLMs) to long videos, leveraging reinforcement learning. We address the unique challenges of long video reasoning by integrating three critical components: (1) a large-scale dataset, LongVideo-Reason, comprising 104K long video QA pairs with high-quality reasoning annotations across diverse domains such as sports, games, and vlogs; (2) a two-stage training pipeline that extends VLMs with chain-of-thought supervised fine-tuning (CoT-SFT) and reinforcement learning (RL); and (3) a training infrastructure for long video RL, named Multi-modal Reinforcement Sequence Parallelism (MR-SP), which incorporates sequence parallelism and a vLLM-based engine tailored for long video, using cached video embeddings for efficient rollout and prefilling. In our experiments, LongVILA-R1-7B achieves strong performance on video benchmarks, reaching 65.1% and 71.1% accuracy on VideoMME without and with subtitles, respectively, and consistently outperforming LongVILA-7B across multiple benchmarks. Moreover, LongVILA-R1-7B supports processing up to 8,192 video frames per video and configurable FPS settings. Notably, our MR-SP system achieves up to a 2.1x speedup on long video RL training. In addition, we publicly release our training system, which supports RL training on various modalities (video, text, and audio), various models (VILA and Qwen series), and even image and video generation models. On a single A100 node (8 GPUs), it supports RL training on hour-long videos (e.g., 3,600 frames). Code and models are available at https://github.com/NVlabs/Long-RL


ZeST: an LLM-based Zero-Shot Traversability Navigation for Unknown Environments

arXiv.org Artificial Intelligence

The advancement of robotics and autonomous navigation systems hinges on the ability to accurately predict terrain traversability. Traditional methods for generating datasets to train these prediction models often involve putting robots into potentially hazardous environments, posing risks to equipment and safety. To solve this problem, we present ZeST, a novel approach leveraging visual reasoning capabilities of Large Language Models (LLMs) to create a traversability map in real-time without exposing robots to danger. Our approach not only performs zero-shot traversability and mitigates the risks associated with real-world data collection but also accelerates the development of advanced navigation systems, offering a cost-effective and scalable solution. To support our findings, we present navigation results in both controlled indoor and unstructured outdoor environments. As shown in the experiments, our method provides safer navigation when compared to other state-of-the-art methods, consistently reaching the final goal. The development of autonomous navigation systems is a cornerstone of robotics, with terrain traversability prediction being a critical component [1], [2], [3], [4], [5]. Traversability prediction refers to the ability of a robot to assess whether a given terrain is passable or poses risks to its operation.


Zeroth-Order Sharpness-Aware Learning with Exponential Tilting

arXiv.org Machine Learning

Classic zeroth-order optimization approaches typically optimize for a smoothed version of the original function, i.e., the expected objective under randomly perturbed model parameters. This can be interpreted as encouraging the loss values in the perturbation set to be small on average. Popular sharpness-aware minimization (SAM) objectives, however, typically focus on the largest loss within the neighborhood to arrive at flat minima more effectively. In this work, we connect zeroth-order optimization (and its corresponding objectives) with SAM approaches explicitly, through an exponential tilting objective that provides a smooth transition between the average- and the max-loss formulations. We explore new zeroth-order algorithms to solve a soft SAM objective parameterized by a tilting parameter $t$. We provide precise characterizations of the sharpness notions of the tilted SAM framework. Practically, our approach can be used as a gradient-free and memory-efficient alternative to SAM variants, and it achieves better generalization compared to vanilla zeroth-order baselines on a wide range of downstream tasks, including classification, multiple choice QA, and language generation.
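
The transition between the average-loss and max-loss formulations described above can be illustrated with a small sketch. It assumes the standard exponential tilting form, (1/t) log of the mean of exp(t * loss) over sampled perturbations; the abstract does not state the paper's exact objective, so the function name `tilted_loss` and the sample loss values are hypothetical and purely illustrative.

```python
import math

def tilted_loss(losses, t):
    """Exponentially tilted aggregate of losses: (1/t) * log(mean(exp(t * l))).

    Assumed standard form of exponential tilting, not taken from the paper.
    t == 0 is treated as the limiting case, the plain average.
    """
    if t == 0:
        return sum(losses) / len(losses)
    # Subtract the largest exponent before exponentiating for numerical stability.
    m = max(t * l for l in losses)
    s = sum(math.exp(t * l - m) for l in losses)
    return (m + math.log(s / len(losses))) / t

# Hypothetical losses evaluated at four random parameter perturbations.
losses = [0.2, 0.5, 1.0, 3.0]

mean_loss = tilted_loss(losses, 0)      # average-loss view (smoothed objective)
soft = tilted_loss(losses, 1.0)         # in between the two views
near_max = tilted_loss(losses, 200.0)   # approaches the max-loss (SAM-like) view
```

Only loss evaluations are needed here, no gradients, which is why a tilted aggregate of this kind is compatible with a zeroth-order setting. Small values of t weight all perturbations roughly equally, while large values let the worst perturbation dominate.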


Zero-Shot Contextual Embeddings via Offline Synthetic Corpus Generation

arXiv.org Artificial Intelligence

Context-aware embedding methods boost retrieval accuracy by conditioning on corpus statistics (e.g., term co-occurrence and topical patterns) extracted from neighboring documents. However, this context-aware approach requires access to the target corpus or requires domain-specific finetuning, posing practical barriers in privacy-sensitive or resource-constrained settings. We present ZEST, a zero-shot contextual adaptation framework that replaces real corpus access with a one-time offline synthesis of a compact proxy. Given only a handful of exemplar documents representative of the general target domain, we use a multi-step hierarchical procedure to generate a synthetic context corpus of several hundred documents that aims to emulate key domain-specific distributions. At inference, the frozen context-aware encoder uses this proxy corpus -- without any finetuning or target corpus access -- to produce domain-adapted embeddings. Across the MTEB benchmark, ZEST's zero-shot synthetic context adaptation using only five example documents performs within 0.5% of models leveraging full target corpus access -- demonstrating remarkable efficacy without any retraining. ZEST thus provides a practical method for deploying high-performance, adaptable embeddings in constrained environments.


Integrating a Digital Twin Concept in the Zero Emission Sea Transporter (ZEST) Project for Sustainable Maritime Transport using Stonefish Simulator

arXiv.org Artificial Intelligence

In response to stringent emission reduction targets imposed by the International Maritime Organization (IMO) and the European Green Deal's Fit for 55 legislation package, the maritime industry has shifted its focus towards decarbonization. This abstract introduces the Zero Emission Sea Transporter (ZEST) project, designed to address this issue by developing a zero-emissions multi-purpose catamaran for short sea routes, shown in Figure 1. The ZEST [1] is envisioned as a vessel and a multifaceted research platform with a broad spectrum of applications. Its objectives encompass supporting the research activities of the CMMI Cyprus Marine and Maritime Institute and its vast partners network, serving as a testing ground for industrial technologies, and aiding CMMI's vocational education and [...] into distinct activities, each addressing critical aspects of sustainable maritime transport and education and training activities: Decarbonization Technologies: ZEST provides a test bed for various decarbonization technologies and methodologies. It is a platform for evaluating alternative propulsion systems, including fuel cells and hybrid systems, and testing various alternative fuels in conventional internal combustion engines, such as gaseous and liquid bio-fuels and blends with fossil fuels. Navigational Autonomy: The project involves designing, testing, and validating algorithms for navigational autonomy.


RLPeri: Accelerating Visual Perimetry Test with Reinforcement Learning and Convolutional Feature Extraction

arXiv.org Artificial Intelligence

Visual perimetry is an important eye examination that helps detect vision problems caused by ocular or neurological conditions. During the test, a patient's gaze is fixed at a specific location while light stimuli of varying intensities are presented in central and peripheral vision. Based on the patient's responses to the stimuli, the visual field mapping and sensitivity are determined. However, maintaining high levels of concentration throughout the test can be challenging for patients, leading to increased examination times and decreased accuracy. In this work, we present RLPeri, a reinforcement learning-based approach to optimize visual perimetry testing. By determining the optimal sequence of locations and initial stimulus values, we aim to reduce the examination time without compromising accuracy. Additionally, we incorporate reward shaping techniques to further improve the testing performance. To monitor the patient's responses over time during testing, we represent the test's state as a pair of 3D matrices. We apply two different convolutional kernels to extract spatial features across locations as well as features across different stimulus values for each location. Through experiments, we demonstrate that our approach results in a 10-20% reduction in examination time while maintaining the accuracy as compared to state-of-the-art methods. With the presented approach, we aim to make visual perimetry testing more efficient and patient-friendly, while still providing accurate results.


ZEST: Attention-based Zero-Shot Learning for Unseen IoT Device Classification

arXiv.org Artificial Intelligence

Recent research has proposed machine learning models for classifying IoT devices connected to a network. However, there is still a practical challenge: not all devices (and hence their traffic) are available during the training of a model. This means that, during the operational phase, we need to classify new devices not seen in the training phase. To address this challenge, we propose ZEST -- a ZSL (zero-shot learning) framework based on self-attention for classifying both seen and unseen devices. ZEST consists of i) a self-attention based network feature extractor, termed SANE, for extracting latent space representations of IoT traffic, ii) a generative model that trains a decoder using latent features to generate pseudo data, and iii) a supervised model that is trained on the generated pseudo data for classifying devices. We carry out extensive experiments on real IoT traffic data; our experiments demonstrate that i) ZEST achieves significant improvement (in terms of accuracy) over the baselines; ii) SANE is able to extract more meaningful representations than LSTM, which has been commonly used for modeling network traffic.


Top 25 Machine Learning Startups To Watch In 2021 Based On Crunchbase

#artificialintelligence

Throughout 2020, venture capital firms continued expanding into new global markets, with London, New York, Tel Aviv, Toronto, Boston, Seattle and Singapore startups receiving increased funding. Out of the 79 most popular A.I. & ML startup locations, 15 are in the San Francisco Bay Area, making that region home to 19% of startups who received funding in the last year. Israel's Tel Aviv region has 37 startups who received venture funding over the last year, including those launched in Herzliya, a region of the city known for its robust startup and entrepreneurial culture. Please see the Roundup Of Machine Learning Forecasts And Market Estimates, 2020 for additional market research on A.I. and machine learning. The following graphic compares the top 10 most popular locations for A.I. & ML startups globally based on Crunchbase data as of today: Augury – Augury combines real-time monitoring data from production machinery with AI and machine learning algorithms to determine machine health, asset performance management (APM) and predictive maintenance (PdM) to provide manufacturing companies with new insights into their operations.


How You Can Tell If An AI Startup Is Bogus

#artificialintelligence

Written by Zest AI CTO Jay Budzik. Zest's ZAML software uses machine learning technology to help lenders make more effective credit decisions safely, fairly and transparently. Founded by Google CIO Douglas Merrill and backed by Matrix Partners, Lightspeed, Upfront, Flybridge and Baidu, Zest works with finance companies worldwide to help more people access fair and transparent credit. It's been a year since MMC Ventures printed the accidental finding that 40 percent of AI startups had no material use of AI in their tech stack. As an AI company CTO, I can tell you the buzz can be deafening.


10 Ways AI Is Going To Improve Fintech In 2020

#artificialintelligence

Bottom Line: AI & machine learning will improve Fintech in 2020 by increasing the accuracy and personalization of payment, lending, and insurance services while also helping to discover new borrower pools. Zest.ai's 2020 Predictions For AI In Credit And Lending captures the gradual improvements I've also been seeing across Fintech, especially at the tech stack level. Fintech startups, enterprise software providers, and the investors backing them believe cloud-based payments, lending, and insurance apps are must-haves to drive future growth. Combined with Internet & public cloud infrastructure and mobile apps, Fintech is evolving into a fourth platform that provides embedded financial services to any business needing to subscribe to them, as Matt Harris of Bain Capital Ventures writes in Fintech: The Fourth Platform – Part Two. Embedded Fintech has the potential to deliver $3.6 trillion in market value, according to Bain's estimates, surpassing the $3 trillion in value created by cloud and mobile platforms.