DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping

Fan, Wei, Yao, Wenlin, Li, Zheng, Yao, Feng, Liu, Xin, Qiu, Liang, Yin, Qingyu, Song, Yangqiu, Yin, Bing

arXiv.org Artificial Intelligence

Large language models (LLMs) augmented with multi-step reasoning and action generation abilities have shown promise in leveraging external tools to tackle complex tasks that require long-horizon planning. However, existing approaches either rely on implicit planning in the reasoning stage or introduce explicit planners without systematically addressing how to optimize the planning stage. As evidence, we observe that under vanilla reinforcement learning (RL), planning tokens exhibit significantly higher entropy than other action tokens, revealing uncertain decision points that remain under-optimized. To address this, we propose DeepPlanner, an end-to-end RL framework that effectively enhances the planning capabilities of deep research agents. Our approach shapes token-level advantages with an entropy-based term to allocate larger updates to high-entropy tokens, and selectively upweights sample-level advantages for planning-intensive rollouts. Extensive experiments across seven deep research benchmarks demonstrate that DeepPlanner improves planning quality and achieves state-of-the-art results under a substantially lower training budget.
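
The idea of giving larger updates to high-entropy tokens can be illustrated with a small sketch. The abstract does not state the exact form of the entropy-based term, so the multiplicative scaling `1 + alpha * entropy`, the names `shape_advantages`, `alpha`, and `rollout_weight`, and the toy distributions below are all assumptions made for illustration, not the paper's formulation.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def shape_advantages(advantages, token_dists, alpha=0.5, rollout_weight=1.0):
    """Scale each token's advantage by an entropy-dependent factor.

    Assumed form (not taken from the paper): shaped = w * (1 + alpha * H) * A,
    where H is the token's entropy and w is a per-rollout weight that could be
    set above 1.0 for planning-intensive rollouts.
    """
    shaped = []
    for adv, dist in zip(advantages, token_dists):
        scale = 1.0 + alpha * token_entropy(dist)
        shaped.append(rollout_weight * scale * adv)
    return shaped


# Toy example: one confident token and two maximally uncertain binary tokens.
advantages = [1.0, 1.0, -1.0]
token_dists = [
    [1.0, 0.0],  # entropy 0: advantage is left unchanged
    [0.5, 0.5],  # entropy ln 2: advantage magnitude grows
    [0.5, 0.5],
]
shaped = shape_advantages(advantages, token_dists)
```

Scaling the magnitude rather than adding a constant keeps the sign of each advantage, so uncertain tokens are pushed harder in whichever direction the reward signal already points.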


Structure of the supplementary material

Neural Information Processing Systems

Appendix B provides the proofs for the results of the basic setting presented in Section 3. Appendix C provides the proofs and additional discussion for the results of the concave-convex setting presented in Section 4. Appendix F provides auxiliary concentration lemmas useful for the derivation of our results. RL, is presented in Algorithm 1. In this setting, unlike the basic setting, the objective and constraints are not linear. As before, expressing this program in terms of occupation measures yields a convex program. We define the bonus-enhanced cMDP, i.e.


Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks

Erdogan, Lutfi Eren, Lee, Nicholas, Kim, Sehoon, Moon, Suhong, Furuta, Hiroki, Anumanchipalli, Gopala, Keutzer, Kurt, Gholami, Amir

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown remarkable advancements in enabling language agents to tackle simple tasks. However, applying them to complex, multi-step, long-horizon tasks remains a challenge. Recent work has found success by separating high-level planning from low-level execution, which enables the model to effectively balance high-level planning objectives and low-level execution details. However, generating accurate plans remains difficult since LLMs are not inherently trained for this task. To address this, we propose Plan-and-Act, a novel framework that incorporates explicit planning into LLM-based agents and introduces a scalable method to enhance plan generation through a novel synthetic data generation method. Plan-and-Act consists of a Planner model, which generates structured, high-level plans to achieve user goals, and an Executor model, which translates these plans into environment-specific actions. To train the Planner effectively, we introduce a synthetic data generation method that annotates ground-truth trajectories with feasible plans, augmented with diverse and extensive examples to enhance generalization. We evaluate Plan-and-Act using web navigation as a representative long-horizon planning environment, demonstrating a state-of-the-art 54% success rate on the WebArena-Lite benchmark.


A Game-Theoretic Framework for Joint Forecasting and Planning

Kedia, Kushal, Dan, Prithwish, Choudhury, Sanjiban

arXiv.org Artificial Intelligence

Planning safe robot motions in the presence of humans requires reliable forecasts of future human motion. However, simply predicting the most likely motion from prior interactions does not guarantee safety. Such forecasts fail to model the long tail of possible events, which are rarely observed in limited datasets. On the other hand, planning for worst-case motions leads to overly conservative behavior and a "frozen robot". Instead, we aim to learn forecasts that predict counterfactuals that humans guard against. We propose a novel game-theoretic framework for joint planning and forecasting with the payoff being the performance of the planner against the demonstrator, and present practical algorithms to train models in an end-to-end fashion. We demonstrate that our proposed algorithm results in safer plans in a crowd navigation simulator and real-world datasets of pedestrian motion. We release our code at https://github.com/portal-cornell/Game-Theoretic-Forecasting-Planning.


Tree-Planner: Efficient Close-loop Task Planning with Large Language Models

Hu, Mengkang, Mu, Yao, Yu, Xinmiao, Ding, Mingyu, Wu, Shiguang, Shao, Wenqi, Chen, Qiguang, Wang, Bin, Qiao, Yu, Luo, Ping

arXiv.org Artificial Intelligence

This paper studies close-loop task planning, which refers to the process of generating a sequence of skills (a plan) to accomplish a specific goal while adapting the plan based on real-time observations. Recently, prompting Large Language Models (LLMs) to generate actions iteratively has become a prevalent paradigm due to its superior performance and user-friendliness. However, this paradigm is plagued by two inefficiencies: high token consumption and redundant error correction, both of which hinder its scalability for large-scale testing and applications. To address these issues, we propose Tree-Planner, which reframes task planning with LLMs into three distinct phases: plan sampling, action tree construction, and grounded deciding. Tree-Planner starts by using an LLM to sample a set of potential plans before execution, which are then aggregated to form an action tree. Finally, the LLM performs a top-down decision-making process on the tree, taking into account real-time environmental information. Experiments show that Tree-Planner achieves state-of-the-art performance while maintaining high efficiency. By decomposing LLM queries into a single plan-sampling call and multiple grounded-deciding calls, a considerable part of the prompt is less likely to be repeatedly consumed. As a result, token consumption is reduced by 92.2% compared to the previously best-performing model. Additionally, by enabling backtracking on the action tree as needed, the correction process becomes more flexible, leading to a 40.5% decrease in error corrections. Project page: https://tree-planner.github.io/
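
The action tree construction phase can be pictured as merging sampled plans by their shared prefixes. The sketch below is a minimal illustration under that assumption; `ActionNode`, `build_action_tree`, and the example plans are hypothetical names and data, not the project's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class ActionNode:
    """One action in the tree; children are keyed by the next action string."""
    action: str
    children: dict = field(default_factory=dict)
    count: int = 0  # number of sampled plans passing through this node


def build_action_tree(plans):
    """Aggregate sampled plans into a prefix tree of actions."""
    root = ActionNode(action="<root>")
    for plan in plans:
        node = root
        node.count += 1
        for action in plan:
            if action not in node.children:
                node.children[action] = ActionNode(action=action)
            node = node.children[action]
            node.count += 1
    return root


def count_nodes(node):
    """Total number of nodes in the tree, including the root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Hypothetical plans, as if sampled from an LLM before execution.
plans = [
    ["walk to kitchen", "open fridge", "grab milk"],
    ["walk to kitchen", "open fridge", "grab juice"],
    ["walk to kitchen", "open cabinet", "grab cup"],
]
tree = build_action_tree(plans)
```

Because the three plans share the prefix "walk to kitchen", the tree stores it once; a grounded-deciding step would then only need to choose among the children of the current node, and backtracking amounts to returning to a parent and picking a different child.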


CEM-GD: Cross-Entropy Method with Gradient Descent Planner for Model-Based Reinforcement Learning

Huang, Kevin, Lale, Sahin, Rosolia, Ugo, Shi, Yuanyuan, Anandkumar, Anima

arXiv.org Machine Learning

Current state-of-the-art model-based reinforcement learning algorithms use trajectory sampling methods, such as the Cross-Entropy Method (CEM), for planning in continuous control settings. These zeroth-order optimizers require sampling a large number of trajectory rollouts to select an optimal action, which scales poorly for large prediction horizons or high-dimensional action spaces. First-order methods that use the gradients of the rewards with respect to the actions as an update can mitigate this issue, but suffer from local optima due to the non-convex optimization landscape. To overcome these issues and achieve the best of both worlds, we propose a novel planner, Cross-Entropy Method with Gradient Descent (CEM-GD), that combines first-order methods with CEM. At the beginning of execution, CEM-GD uses CEM to sample a large number of trajectory rollouts to explore the optimization landscape and avoid poor local minima. It then uses the top trajectories as initialization for gradient descent and applies gradient updates to each of these trajectories to find the optimal action sequence. At each subsequent time step, however, CEM-GD samples far fewer trajectories from CEM before applying gradient updates. We show that as the dimensionality of the planning problem increases, CEM-GD maintains desirable performance with a constant small number of samples by using the gradient information, while avoiding local optima using initially well-sampled trajectories. Furthermore, CEM-GD achieves better performance than CEM on a variety of continuous control benchmarks in MuJoCo with 100x fewer samples per time step, resulting in around 25% less computation time and 10% less memory usage. The implementation of CEM-GD is available at https://github.com/KevinHuang8/CEM-GD.
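
The two-stage idea (CEM sampling first, then gradient descent from the top candidates) can be sketched on a toy problem. This is not the released implementation: the quadratic `cost`, its analytic `grad`, the function name `cem_gd`, and every hyperparameter below are assumptions chosen for illustration, and the sketch covers a single planning step rather than the reduced sampling at later time steps.

```python
import random


def cost(actions):
    """Toy stand-in for the negative return of an action sequence."""
    return sum((x - 2.0) ** 2 for x in actions)


def grad(actions):
    """Analytic gradient of the toy cost with respect to each action."""
    return [2.0 * (x - 2.0) for x in actions]


def cem_gd(dim, n_samples=50, n_elite=5, cem_iters=3, gd_steps=20, lr=0.1, seed=0):
    rng = random.Random(seed)
    mean, std = [0.0] * dim, [3.0] * dim
    elites = []
    # Stage 1: CEM explores by sampling, keeping elites, and refitting a Gaussian.
    for _ in range(cem_iters):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        samples.sort(key=cost)
        elites = samples[:n_elite]
        mean = [sum(e[i] for e in elites) / n_elite for i in range(dim)]
        std = [max(1e-3, (sum((e[i] - mean[i]) ** 2 for e in elites) / n_elite) ** 0.5)
               for i in range(dim)]
    # Stage 2: gradient descent refines each elite candidate independently.
    refined = []
    for candidate in elites:
        a = list(candidate)
        for _ in range(gd_steps):
            a = [x - lr * g for x, g in zip(a, grad(a))]
        refined.append(a)
    # Return the refined candidate with the lowest cost.
    return min(refined, key=cost)


best = cem_gd(dim=4)
```

Starting gradient descent from several elites rather than one is what guards against a single poor local minimum on non-convex landscapes; on this convex toy cost every start converges to the same point, so the example only shows the mechanics.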


Lanner Partners with Zededa to Deliver Industrial-Strength Cloud-Managed Gateway

#artificialintelligence

The joint partnership announces a secure, cloud-managed gateway solution ready for deployment in critical infrastructures, such as power substations, manufacturing, transportation, intelligent buildings, and smart cities. MISSISSAUGA, ONT. Lanner Electronics, Inc., a global leader in design and manufacturing of network appliances and industrial IoT gateways, will be demonstrating a joint solution with Zededa at the IoT Tech Expo in Santa Clara, Nov. 28-29, 2018. Zededa has developed an innovative cloud-managed software solution which provisions, administers and secures IIoT infrastructures. The joint solution of Lanner's validated white-box gateway integrated with the Zededa real-time software is developed to enable interoperability and connectivity on both hardware and software levels in industrial automation, energy and retail applications. Since 2013, Lanner has introduced a series of industrial gateway products designed for key applications, such as cyber security, edge computing, wireless connectivity and SD-WAN.


Manufacturing Digitalization: RPA a Key Enabler of Industry 4.0 Lanner

#artificialintelligence

According to a study conducted by Information Services Group, 72% of companies will be relying on Robotic Process Automation (RPA) by 2019 to increase operational efficiency, productivity and compliance. Manufacturing companies that traditionally rely heavily on robotics and automation for production are now looking into RPA to transform other departments that may have a number of error-prone, slow or costly processes. The changing landscape within industries such as manufacturing is due to the vast digital transformation brought about by the increasing integration of cutting-edge technologies such as the Internet of Things, artificial intelligence and machine learning, as well as advanced automation. Within manufacturing in particular, the use of automated robotic machines on assembly lines has been common practice for many years now. However, transferring this kind of automation to other key areas within manufacturing, such as accounts receivable, invoice processing and purchase order management, has been somewhat difficult until fairly recently. In this article, we'll be looking at what robotic process automation is, the role it has within manufacturing and the benefits it brings, as well as how it could affect the future of automation within the manufacturing industry.


What is SD Branch and How is It Helping Drive SDN Adoption in Business? Lanner

#artificialintelligence

Over the past few years, significant changes have been seen within both branch architecture and networking in business, and technology has been at the forefront. Network function virtualization (NFV) and software-defined networking (SDN) are currently two hot topics within this area, and with them automation, artificial intelligence (AI), machine learning and other cutting-edge technologies are being brought into the fold. One of the more recent developments has been the advent of SD Branch, a concept that incorporates SDN into branch architecture so as to bring the various benefits of SD-WAN to branch networking. It is also currently being suggested that SD Branch represents the next stage of SD-WAN's evolution and that this process is redefining how businesses and organisations connect their branch sites to either corporate or cloud data centers. In this article, we'll be looking at what exactly SD Branch is, what benefits it brings to those who invest in it, and what it could mean for the future of SD-WAN and software-defined networking.


50 Examples of Video Analytics Applications (Infographic) Lanner

#artificialintelligence

Artificial Intelligence and advancements in cloud edge computing infrastructure are enabling the development of intelligent real-time video analytics solutions that are solving several problems and creating many opportunities in different sectors. By harnessing the capabilities of machine learning and big data, AI-powered video analytics solutions have started to play a crucial role in automating several functions and duties based on video intelligence collected through application-specific cameras. From street crime deterrence to missing person search, patient monitoring, land surveying, vehicle classification, product fault detection and wildlife poaching control, there are numerous applications of video analytics solutions. In the infographic linked below, we show you 50 of those applications across different sectors.