Sarkar, Soumik
Optimizing Navigation And Chemical Application in Precision Agriculture With Deep Reinforcement Learning And Conditional Action Tree
Khosravi, Mahsa, Jiang, Zhanhong, Waite, Joshua R, Jones, Sarah, Torres, Hernan, Singh, Arti, Ganapathysubramanian, Baskar, Singh, Asheesh Kumar, Sarkar, Soumik
This paper presents a novel reinforcement learning (RL)-based planning scheme for optimized robotic management of biotic stresses in precision agriculture. We introduce a domain-specific reward mechanism that maximizes yield recovery while minimizing chemical usage by effectively handling noisy infection data and enforcing physical field constraints via action masking. We conduct a rigorous empirical evaluation across diverse, realistic biotic stress scenarios, capturing varying infection distributions and severity levels in row-crop fields. This thorough evaluation demonstrates the framework's effectiveness and robustness: our approach significantly reduces non-target spraying, chemical consumption, and operational costs compared to baseline methods.
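As an illustration of the kind of mechanism the abstract describes, the sketch below shows how a yield-recovery reward with a chemical-usage penalty and a boundary-enforcing action mask could be wired together for a discrete-action agent on a grid-shaped field. All names, weights, and the grid-world setup are our own assumptions for illustration, not the paper's implementation.

```python
import numpy as np

# Hypothetical field: each cell holds a (noisy) estimated infection level in [0, 1].
rng = np.random.default_rng(0)
field = np.clip(rng.normal(0.2, 0.15, size=(10, 10)), 0.0, 1.0)

ACTIONS = ["up", "down", "left", "right", "spray", "skip"]
LAMBDA_CHEM = 0.5  # assumed trade-off weight between yield recovery and chemical usage


def action_mask(pos, shape):
    """Mask out moves that would leave the field (physical constraint)."""
    r, c = pos
    mask = np.ones(len(ACTIONS), dtype=bool)
    mask[0] = r > 0               # up
    mask[1] = r < shape[0] - 1    # down
    mask[2] = c > 0               # left
    mask[3] = c < shape[1] - 1    # right
    return mask


def step_reward(pos, action, field, dose=1.0):
    """Reward yield recovered by spraying infected cells; penalize chemical use."""
    if action == "spray":
        recovered = field[pos]            # treating an infected cell recovers yield
        field[pos] = 0.0
        return recovered - LAMBDA_CHEM * dose
    return 0.0                            # movement/skip costs could also be added


pos = (5, 5)
mask = action_mask(pos, field.shape)
legal = [a for a, m in zip(ACTIONS, mask) if m]
print("legal actions:", legal, "| reward if spraying:", step_reward(pos, "spray", field))
```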
FUSE: First-Order and Second-Order Unified SynthEsis in Stochastic Optimization
Jiang, Zhanhong, Hasan, Md Zahid, Balu, Aditya, Waite, Joshua R., Huang, Genyi, Sarkar, Soumik
Stochastic optimization methods play a critical role in delivering strong performance for modern machine learning algorithms. While numerous diverse approaches have been proposed and developed, first-order and second-order methods occupy very different positions: the former dominate in modern deep learning but only guarantee convergence to a stationary point, whereas second-order methods remain less popular due to their computational cost on high-dimensional problems. This paper presents a novel method that leverages both first-order and second-order updates in a unified algorithmic framework, termed FUSE, from which a practical version (PV) is derived. FUSE-PV is a simple yet efficient optimization method that switches between first-order and second-order phases, and we develop several criteria that determine when to switch. FUSE-PV is shown to have a provably smaller computational complexity than SGD and Adam. To validate the proposed scheme, we present an ablation study on several simple test functions and a comparison with baselines on benchmark datasets.
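To make the switch-over idea concrete, here is a minimal sketch of a first-to-second-order optimizer on a small convex test problem, using a gradient-norm threshold as the switching criterion. The threshold, step size, and Newton-style second-order step are our assumptions; the paper develops its own switching criteria and practical version.

```python
import numpy as np

# Simple strongly convex test problem f(x) = 0.5 x^T A x - b^T x.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
grad = lambda x: A @ x - b
hess = lambda x: A

x = np.zeros(2)
lr, switch_tol = 0.1, 1e-1   # assumed step size and switch-over criterion

for t in range(100):
    g = grad(x)
    if np.linalg.norm(g) > switch_tol:
        x = x - lr * g                          # cheap first-order phase
    else:
        x = x - np.linalg.solve(hess(x), g)     # second-order phase near the solution
    if np.linalg.norm(grad(x)) < 1e-10:
        break

print(f"converged after {t + 1} iterations, x = {x}")
```

A gradient-norm trigger is only one plausible switching rule; any criterion that detects proximity to a solution could play the same role in this sketch.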
Enhancing PPO with Trajectory-Aware Hybrid Policies
Liu, Qisai, Jiang, Zhanhong, Yang, Hsin-Jung, Khosravi, Mahsa, Waite, Joshua R., Sarkar, Soumik
Proximal policy optimization (PPO) is one of the most popular state-of-the-art on-policy algorithms and has become a standard baseline in modern reinforcement learning, with applications in numerous fields. Although it delivers stable performance with theoretical policy-improvement guarantees, high variance and high sample complexity remain critical challenges for on-policy algorithms. To alleviate these issues, we propose Hybrid-Policy Proximal Policy Optimization (HP3O), which utilizes a trajectory replay buffer to make efficient use of trajectories generated by recent policies. In particular, the buffer applies a "first in, first out" (FIFO) strategy so that only recent trajectories are kept, attenuating data distribution drift. A batch consisting of the trajectory with the best return and other trajectories randomly sampled from the buffer is used to update the policy networks. This strategy helps the agent improve on top of its most recent best performance and, in turn, empirically reduces variance. We theoretically establish policy improvement guarantees for the proposed algorithm. HP3O is validated and compared against several baseline algorithms on multiple continuous control environments. Our code is available here.
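A minimal sketch of the trajectory buffer described above: a FIFO container of recent trajectories from which each batch pairs the best-return trajectory with a few randomly sampled ones. Class names, capacity, and batch sizes are assumptions for illustration, not HP3O's released code.

```python
import random
from collections import deque


class TrajectoryBuffer:
    """FIFO buffer of recent trajectories; batches pair the best-return
    trajectory with randomly sampled ones (names/capacity are assumed)."""

    def __init__(self, capacity=10):
        self.trajectories = deque(maxlen=capacity)  # oldest trajectories dropped first

    def add(self, trajectory, ret):
        self.trajectories.append((ret, trajectory))

    def sample_batch(self, n_random=3):
        best = max(self.trajectories, key=lambda x: x[0])
        others = random.sample(list(self.trajectories), min(n_random, len(self.trajectories)))
        return [best] + others


buf = TrajectoryBuffer(capacity=5)
for ep in range(8):  # only the 5 most recent episodes are retained
    buf.add(trajectory=[("s", "a", "r")] * 3, ret=float(ep))
batch = buf.sample_batch()
print("returns in batch:", [ret for ret, _ in batch])  # best retained return (7.0) always included
```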
RLS3: RL-Based Synthetic Sample Selection to Enhance Spatial Reasoning in Vision-Language Models for Indoor Autonomous Perception
Waite, Joshua R., Hasan, Md. Zahid, Liu, Qisai, Jiang, Zhanhong, Hegde, Chinmay, Sarkar, Soumik
Vision-language model (VLM) fine-tuning for application-specific visual grounding based on natural language instructions has become one of the most popular approaches for learning-enabled autonomous systems. However, such fine-tuning relies heavily on high-quality datasets to achieve successful performance in various downstream tasks, and VLMs often encounter limitations due to insufficient and imbalanced fine-tuning data. To address these issues, we propose a new generalizable framework that improves VLM fine-tuning by integrating it with a reinforcement learning (RL) agent. Our method uses the RL agent to manipulate objects within an indoor setting, creating synthetic data that addresses specific vulnerabilities of the VLM. Specifically, the performance of the VLM provides feedback to the RL agent, which generates informative data that efficiently fine-tunes the VLM on the targeted task (e.g., spatial reasoning). The key contribution of this work is a framework in which the RL agent serves as an informative data sampling tool that helps the VLM enhance performance and address task-specific vulnerabilities. By targeting the data sampling process at the weaknesses of the VLM, we can effectively train a more context-aware model. In addition, generating synthetic data gives us precise control over each scene and allows granular ground-truth captions. Our results show that the proposed data generation approach improves the spatial reasoning performance of VLMs, demonstrating the benefits of RL-guided data generation in vision-language tasks.
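The control flow of the proposed loop can be sketched as follows; every component below is a placeholder (scene renderer, VLM evaluation, RL agent), and only the structure of rewarding the agent for producing samples the VLM handles poorly reflects the idea in the abstract.

```python
# Skeleton of a VLM-in-the-loop data generation cycle. Every function is a
# stand-in for components whose details we do not know (simulator, VLM, agent);
# only the control flow is meant to illustrate the idea.

def render_scene(layout):
    """Placeholder: render an indoor scene and its ground-truth caption."""
    return {"image": layout, "caption": f"objects at {layout}"}


def vlm_spatial_accuracy(sample):
    """Placeholder: evaluate the current VLM on the generated sample."""
    return 0.5  # pretend accuracy


def finetune_vlm(samples):
    """Placeholder: one fine-tuning pass over the synthetic samples."""
    pass


class SceneAgent:
    def propose_layout(self):
        return [("chair", (1, 2)), ("table", (3, 1))]

    def update(self, layout, reward):
        pass  # a policy-gradient update would go here


agent, dataset = SceneAgent(), []
for step in range(3):
    layout = agent.propose_layout()
    sample = render_scene(layout)
    reward = 1.0 - vlm_spatial_accuracy(sample)  # low VLM accuracy => informative sample
    agent.update(layout, reward)
    dataset.append(sample)
finetune_vlm(dataset)
```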
Procedural Generation of 3D Maize Plant Architecture from LIDAR Data
Hadadi, Mozhgan, Saraeian, Mehdi, Godbersen, Jackson, Jubery, Talukder, Li, Yawei, Attigala, Lakshmi, Balu, Aditya, Sarkar, Soumik, Schnable, Patrick S., Krishnamurthy, Adarsh, Ganapathysubramanian, Baskar
This study introduces a robust framework for generating procedural 3D models of maize (Zea mays) plants from LiDAR point cloud data, offering a scalable alternative to traditional field-based phenotyping. Our framework leverages Non-Uniform Rational B-Spline (NURBS) surfaces to model the leaves of maize plants, combining Particle Swarm Optimization (PSO) for an initial approximation of the surface with a differentiable programming framework for precise refinement of the surface to fit the point cloud data. In the first optimization phase, PSO generates an approximate NURBS surface by optimizing its control points, aligning the surface with the LiDAR data and providing a reliable starting point for refinement. The second phase uses NURBS-Diff, a differentiable programming framework, to enhance the accuracy of the initial fit by refining the surface geometry and capturing intricate leaf details. Our results demonstrate that, while PSO establishes a robust initial fit, the integration of differentiable NURBS significantly improves the overall quality and fidelity of the reconstructed surface. This hierarchical optimization strategy enables accurate 3D reconstruction of maize leaves across diverse genotypes, facilitating the subsequent extraction of complex traits such as phyllotaxy. We demonstrate our approach on diverse genotypes of field-grown maize plants. All our code is open source to democratize these phenotyping approaches.
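As a toy stand-in for the two-phase fit, the sketch below fits a quadratic Bézier curve (instead of a NURBS surface) to noisy points, using a tiny particle swarm for initialization and plain gradient descent for refinement. None of this is the paper's code; the actual framework optimizes NURBS control points against LiDAR point clouds and uses NURBS-Diff for the refinement phase.

```python
import numpy as np

# Toy target: noisy samples of a smooth arc, fit by a quadratic Bezier curve.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
target = np.stack([t, np.sin(np.pi * t)], axis=1) + rng.normal(0, 0.01, (50, 2))
basis = np.stack([(1 - t) ** 2, 2 * t * (1 - t), t ** 2], axis=1)  # Bernstein basis


def loss(ctrl):  # ctrl: (3, 2) control points
    return np.mean(np.sum((basis @ ctrl - target) ** 2, axis=1))


# Phase 1: minimal particle swarm over control points (stand-in for PSO).
particles = rng.normal(0, 1, (30, 3, 2))
velocities = np.zeros_like(particles)
personal_best = particles.copy()
for _ in range(100):
    scores = np.array([loss(p) for p in particles])
    pb_scores = np.array([loss(p) for p in personal_best])
    improved = scores < pb_scores
    personal_best[improved] = particles[improved]
    global_best = personal_best[np.argmin([loss(p) for p in personal_best])]
    r1, r2 = rng.random(particles.shape), rng.random(particles.shape)
    velocities = 0.7 * velocities + 1.5 * r1 * (personal_best - particles) \
                 + 1.5 * r2 * (global_best - particles)
    particles += velocities

pso_loss = loss(global_best)
ctrl = global_best.copy()

# Phase 2: gradient refinement. The curve is linear in its control points, so
# the gradient is available in closed form; the paper instead uses automatic
# differentiation through the NURBS evaluation (NURBS-Diff).
for _ in range(500):
    grad = 2 * basis.T @ (basis @ ctrl - target) / len(t)
    ctrl -= 0.5 * grad

print("PSO loss:", round(pso_loss, 5), "-> refined loss:", round(loss(ctrl), 5))
```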
STITCH: Surface reconstrucTion using Implicit neural representations with Topology Constraints and persistent Homology
Jignasu, Anushrut, Herron, Ethan, Jiang, Zhanhong, Sarkar, Soumik, Hegde, Chinmay, Ganapathysubramanian, Baskar, Balu, Aditya, Krishnamurthy, Adarsh
We present STITCH, a novel approach for neural implicit surface reconstruction of a sparse and irregularly spaced point cloud while enforcing topological constraints (such as having a single connected component). We develop a new differentiable framework based on persistent homology to formulate topological loss terms that enforce the prior of a single 2-manifold object. Our method demonstrates excellent performance in preserving the topology of complex 3D geometries, evident through both visual and empirical comparisons. We supplement this with a theoretical analysis, and provably show that optimizing the loss with stochastic (sub)gradient descent leads to convergence and enables reconstructing shapes with a single connected component. Our approach showcases the integration of differentiable topological data analysis tools for implicit surface reconstruction.
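The quantity behind a single-connected-component prior can be illustrated with 0-dimensional persistence: the death times of H0 features are the scales at which extra components merge, and penalizing them encourages a single component. The numpy sketch below computes these death times via a minimum-spanning-tree construction; it only illustrates the quantity, whereas the paper formulates a differentiable persistent-homology loss within its reconstruction framework.

```python
import numpy as np


def h0_deaths(points):
    """0-dim persistence death times via single linkage (MST edge lengths).
    Each finite death is the scale at which an extra connected component merges."""
    n = len(points)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    in_tree, out, deaths = [0], list(range(1, n)), []
    while out:  # Prim's algorithm: each added MST edge kills one H0 feature
        d = dist[np.ix_(in_tree, out)]
        i, j = np.unravel_index(np.argmin(d), d.shape)
        deaths.append(d[i, j])
        in_tree.append(out.pop(j))
    return np.array(deaths)


def connectivity_loss(points):
    """Small when the sample forms one tight component; large when extra
    components persist to large scales."""
    return float(np.sum(h0_deaths(points) ** 2))


# Two well-separated clusters -> one large death time -> large loss.
rng = np.random.default_rng(0)
cluster_a = rng.normal(0.0, 0.05, (20, 3))
cluster_b = rng.normal(5.0, 0.05, (20, 3))
print("two components:", round(connectivity_loss(np.vstack([cluster_a, cluster_b])), 3))
print("one component: ", round(connectivity_loss(cluster_a), 3))
```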
Geometry Matters: Benchmarking Scientific ML Approaches for Flow Prediction around Complex Geometries
Rabeh, Ali, Herron, Ethan, Balu, Aditya, Sarkar, Soumik, Hegde, Chinmay, Krishnamurthy, Adarsh, Ganapathysubramanian, Baskar
Rapid yet accurate simulations of fluid dynamics around complex geometries are critical in a variety of engineering and scientific applications, including aerodynamics and biomedical flows. However, while scientific machine learning (SciML) has shown promise, most studies are constrained to simple geometries, leaving complex, real-world scenarios underexplored. This study addresses this gap by benchmarking diverse SciML models, including neural operators and vision transformer-based foundation models, for fluid flow prediction over intricate geometries. Using a high-fidelity dataset of steady-state flows across various geometries, we evaluate the impact of geometric representations -- Signed Distance Fields (SDF) and binary masks -- on model accuracy, scalability, and generalization. Central to this effort is the introduction of a novel, unified scoring framework that integrates metrics for global accuracy, boundary layer fidelity, and physical consistency to enable a robust, comparative evaluation of model performance. Our findings demonstrate that foundation models significantly outperform neural operators, particularly in data-limited scenarios, and that SDF representations yield superior results given sufficient training data. Despite these advancements, all models struggle with out-of-distribution generalization, highlighting a critical challenge for future SciML applications. By advancing both evaluation methodologies and modeling capabilities, this work paves the way for robust and scalable ML solutions for fluid dynamics across complex geometries.
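The two geometry encodings compared in the study can be illustrated on a toy geometry; the sketch below builds a signed distance field and a binary occupancy mask for a circular obstacle on a regular grid (the benchmark's geometries, resolutions, and models are far more involved).

```python
import numpy as np

# Toy geometry: a circular obstacle on a regular grid, encoded two ways.
n = 64
xs, ys = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n))
radius, center = 0.3, (0.1, -0.2)
dist_to_center = np.sqrt((xs - center[0]) ** 2 + (ys - center[1]) ** 2)

sdf = dist_to_center - radius                 # signed distance: negative inside, zero on the boundary
binary_mask = (sdf <= 0).astype(np.float32)   # 1 inside the solid, 0 in the fluid

# Either array would be stacked as an input channel to a neural operator or a
# vision-transformer model; the SDF carries smooth distance information
# everywhere, while the mask only marks occupancy.
print("SDF range:", sdf.min().round(3), sdf.max().round(3),
      "| solid fraction:", binary_mask.mean().round(3))
```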
FAWAC: Feasibility Informed Advantage Weighted Regression for Persistent Safety in Offline Reinforcement Learning
Koirala, Prajwal, Jiang, Zhanhong, Sarkar, Soumik, Fleming, Cody
Safe offline reinforcement learning aims to learn policies that maximize cumulative rewards while adhering to safety constraints, using only offline data for training. A key challenge is balancing safety and performance, particularly when the policy encounters out-of-distribution (OOD) states and actions, which can lead to safety violations or overly conservative behavior during deployment. To address these challenges, we introduce Feasibility Informed Advantage Weighted Actor-Critic (FAWAC), a method that prioritizes persistent safety in constrained Markov decision processes (CMDPs). FAWAC formulates policy optimization with feasibility conditions derived specifically for offline datasets, enabling safe policy updates in a non-parametric policy space, followed by projection into parametric space for constrained actor training. By incorporating a cost-advantage term into Advantage Weighted Regression (AWR), FAWAC ensures that safety constraints are respected while maximizing performance. Additionally, we propose a strategy for a more challenging class of problems involving tempting datasets, where trajectories are predominantly high-reward but unsafe. Empirical evaluations on standard benchmarks demonstrate that FAWAC achieves strong results, effectively balancing safety and performance when learning policies from static datasets.
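A minimal sketch of what a cost-aware advantage-weighted regression weight can look like is given below; the exponential form, temperature, trade-off weight, and clipping are our assumptions for illustration, while the paper derives its feasibility conditions and exact objective formally.

```python
import numpy as np

# Generic pattern of cost-aware AWR weights: favor transitions with high reward
# advantage and down-weight those with high cost advantage. The temperature
# (beta), safety trade-off (lam), and clip value are assumed for illustration.
beta, lam, w_max = 1.0, 2.0, 20.0


def awr_weights(reward_adv, cost_adv):
    """Exp-transformed advantages with a cost-advantage penalty and clipping."""
    w = np.exp((reward_adv - lam * cost_adv) / beta)
    return np.clip(w, 0.0, w_max)  # clipping keeps the weighted regression stable


# Toy batch of offline transitions: (reward advantage, cost advantage).
reward_adv = np.array([1.0, 1.0, -0.5, 0.2])
cost_adv = np.array([0.0, 1.5, 0.0, -0.3])
w = awr_weights(reward_adv, cost_adv)
print(np.round(w / w.sum(), 3))  # the unsafe-but-rewarding transition gets little weight

# In training, these weights would multiply the log-likelihood of the dataset
# actions, e.g. loss = -(w * log_pi(a | s)).mean() for the constrained actor.
```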
Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning
Koirala, Prajwal, Jiang, Zhanhong, Sarkar, Soumik, Fleming, Cody
In safe offline reinforcement learning (RL), the objective is to develop a policy that maximizes cumulative rewards while strictly adhering to safety constraints, utilizing only offline data. Traditional methods often face difficulties in balancing these constraints, leading to either diminished performance or increased safety risks. We address these issues with a novel approach that begins by learning a conservatively safe policy through the use of Conditional Variational Autoencoders, which model the latent safety constraints. Subsequently, we frame this as a Constrained Reward-Return Maximization problem, wherein the policy aims to optimize rewards while complying with the inferred latent safety constraints. This is achieved by training an encoder with a reward-Advantage Weighted Regression objective within the latent constraint space. Our methodology is supported by theoretical analysis, including bounds on policy performance and sample complexity. Extensive empirical evaluation on benchmark datasets, including challenging autonomous driving scenarios, demonstrates that our approach not only maintains safety compliance but also excels in cumulative reward optimization, surpassing existing methods. Additional visualizations provide further insights into the effectiveness and underlying mechanisms of our approach. Although reinforcement learning (RL) is a popular approach for decision-making and control applications across various domains, its deployment in industrial contexts is limited by safety concerns during the training phase. In traditional online RL, agents learn optimal policies through trial and error, interacting with their environments to maximize cumulative rewards. This process inherently involves exploration, which can lead to the agent encountering unsafe states and/or taking unsafe actions, posing substantial risks in industrial applications such as autonomous driving, robotics, and manufacturing systems (García & Fernández, 2015; Gu et al., 2022; Moldovan & Abbeel, 2012; Shen et al., 2014; Yang et al., 2020). The primary challenge lies in ensuring that the agent's learning process does not compromise safety, as failures during training can result in costly damages, operational disruptions, or even endanger human lives (Achiam et al., 2017; Stooke et al., 2020). To address these challenges, researchers have explored several approaches aimed at minimizing safety risks while maintaining the efficacy of RL algorithms. One effective method to mitigate such risks is offline RL, in which the agent learns from a pre-collected, static dataset rather than through online interaction. This dataset comprises trajectory rollouts generated by an arbitrary behavior policy or multiple policies, collected beforehand.
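The structure of the approach, as we read the abstract, can be sketched in PyTorch: a conditional VAE whose decoder maps (state, latent) to an action and captures the latent safety constraints, plus a reward-advantage weighted objective for a policy operating in that latent space. Dimensions, architectures, and weights below are invented for illustration and are not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

S, A, Z = 8, 2, 4  # assumed state, action, and latent dimensions


class CVAE(nn.Module):
    """Conditional VAE over actions given states; its latent space stands in
    for the learned safety-constraint manifold."""

    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(S + A, 64), nn.ReLU(), nn.Linear(64, 2 * Z))
        self.dec = nn.Sequential(nn.Linear(S + Z, 64), nn.ReLU(), nn.Linear(64, A))

    def forward(self, s, a):
        mu, logvar = self.enc(torch.cat([s, a], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(torch.cat([s, z], -1)), mu, logvar


cvae = CVAE()
latent_policy = nn.Sequential(nn.Linear(S, 64), nn.ReLU(), nn.Linear(64, Z))

s, a, adv = torch.randn(32, S), torch.randn(32, A), torch.randn(32)

# 1) Fit the CVAE to the offline data (reconstruction + KL), learning latent constraints.
a_hat, mu, logvar = cvae(s, a)
vae_loss = F.mse_loss(a_hat, a) - 0.5 * (1 + logvar - mu**2 - logvar.exp()).mean()

# 2) Reward-advantage weighted regression in the latent space: push the latent
#    policy toward latents of high-advantage dataset actions, staying on-manifold.
with torch.no_grad():
    _, mu_data, _ = cvae(s, a)
    weights = torch.clamp(torch.exp(adv / 1.0), max=20.0)  # assumed temperature/clip
awr_loss = (weights * ((latent_policy(s) - mu_data) ** 2).sum(-1)).mean()
print(float(vae_loss), float(awr_loss))
```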
Incorporating System-level Safety Requirements in Perception Models via Reinforcement Learning
Fan, Weisi, Lane, Jesse, Liu, Qisai, Sarkar, Soumik, Wongpiromsarn, Tichakorn
Perception components in autonomous systems are often developed and optimized independently of downstream decision-making and control components, relying on established performance metrics like accuracy, precision, and recall. Traditional loss functions, such as cross-entropy loss and negative log-likelihood, focus on reducing misclassification errors but fail to consider their impact on system-level safety, overlooking the varying severities of system-level failures caused by these errors. To address this limitation, we propose a novel training paradigm that augments the perception component with an understanding of system-level safety objectives. Central to our approach is the translation of system-level safety requirements, formally specified using the rulebook formalism, into safety scores. These scores are then incorporated into the reward function of a reinforcement learning framework for fine-tuning perception models with system-level safety objectives. Simulation results demonstrate that models trained with this approach outperform baseline perception models in terms of system-level safety.
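One way to picture the training paradigm is a REINFORCE-style fine-tuning step in which the reward is the negative severity of each prediction under a rulebook-like severity table, rather than a pure accuracy objective. The classes, severity values, and model below are invented; the paper derives its safety scores from formally specified system-level requirements.

```python
import torch
import torch.nn as nn

classes = ["pedestrian", "vehicle", "background"]
# severity[true][pred]: assumed values; missing a pedestrian is far worse than a false alarm.
severity = torch.tensor([[0.0, 5.0, 10.0],
                         [1.0, 0.0, 4.0],
                         [0.5, 0.5, 0.0]])

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, len(classes)))
features = torch.randn(64, 16)
labels = torch.randint(0, len(classes), (64,))

# REINFORCE-style fine-tuning step: sample predictions, set reward = -severity,
# and weight the log-likelihood by the baseline-subtracted reward.
logits = model(features)
dist = torch.distributions.Categorical(logits=logits)
preds = dist.sample()
reward = -severity[labels, preds]
loss = -((reward - reward.mean()) * dist.log_prob(preds)).mean()
loss.backward()
print("mean severity of sampled predictions:", float(-reward.mean()))
```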