Retrieval-Augmented Perception: High-Resolution Image Perception Meets Visual RAG

arXiv.org Artificial Intelligence

High-resolution (HR) image perception remains a key challenge in multimodal large language models (MLLMs). To overcome the limitations of existing methods, this paper shifts away from prior dedicated heuristic approaches and revisits the most fundamental idea to HR perception by enhancing the long-context capability of MLLMs, driven by recent advances in long-context techniques like retrieval-augmented generation (RAG) for general LLMs. Towards this end, this paper presents the first study exploring the use of RAG to address HR perception challenges. Specifically, we propose Retrieval-Augmented Perception (RAP), a training-free framework that retrieves and fuses relevant image crops while preserving spatial context using the proposed Spatial-Awareness Layout. To accommodate different tasks, the proposed Retrieved-Exploration Search (RE-Search) dynamically selects the optimal number of crops based on model confidence and retrieval scores. Experimental results on HR benchmarks demonstrate the significant effectiveness of RAP, with LLaVA-v1.5-13B achieving a 43% improvement on $V^*$ Bench and 19% on HR-Bench.


Adversarial Generative Flow Network for Solving Vehicle Routing Problems

arXiv.org Artificial Intelligence

Recent research into solving vehicle routing problems (VRPs) has gained significant traction, particularly through the application of deep (reinforcement) learning for end-to-end solution construction. However, many current construction-based neural solvers predominantly utilize Transformer architectures, which can face scalability challenges and struggle to produce diverse solutions. To address these limitations, we introduce a novel framework beyond Transformer-based approaches, i.e., Adversarial Generative Flow Networks (AGFN). These models are trained alternately in an adversarial manner to improve the overall solution quality, followed by a proposed hybrid decoding method to construct the solution. We apply the AGFN framework to solve the capacitated vehicle routing problem (CVRP) and the travelling salesman problem (TSP), and our experimental results demonstrate that AGFN surpasses the popular construction-based neural solvers, showcasing strong generalization capabilities on synthetic and real-world benchmark instances. Our code is available at https://github.com/ZHANG-NI/AGFN.
The vehicle routing problem (VRP) represents a fundamental and intricate combinatorial optimization challenge with extensive real-world implications (Toth & Vigo, 2014), including supply chain management (Lee et al., 2006), last-mile delivery services (Koc et al., 2020), and public transportation (Hassold & Ceder, 2014). Given its widespread occurrence across numerous domains, the VRP has been the subject of extensive research for decades within the Operations Research (OR) community. In particular, practitioners employ both exact and heuristic methods to tackle complex optimization problems including VRPs. Exact methods, such as branch-and-bound (Lawler & Wood, 1966), branch-and-cut (Tawarmalani & Sahinidis, 2005), and column generation (Barnhart et al., 1998), guarantee optimal solutions but often face computational limitations for large-scale instances.


Apollo-MILP: An Alternating Prediction-Correction Neural Solving Framework for Mixed-Integer Linear Programming

arXiv.org Artificial Intelligence

Leveraging machine learning (ML) to predict an initial solution for mixed-integer linear programming (MILP) has gained considerable popularity in recent years. These methods predict a solution and fix a subset of variables to reduce the problem dimension. Then, they solve the reduced problem to obtain the final solutions. However, directly fixing variable values can lead to low-quality solutions or even infeasible reduced problems if the predicted solution is not accurate enough. To address this challenge, we propose an Alternating prediction-correction neural solving framework (Apollo-MILP) that can identify and select accurate and reliable predicted values to fix. In each iteration, Apollo-MILP conducts a prediction step for the unfixed variables, followed by a correction step to obtain an improved solution (called reference solution) through a trust-region search. By incorporating the predicted and reference solutions, we introduce a novel Uncertainty-based Error upper BOund (UEBO) to evaluate the uncertainty of the predicted values and fix those with high confidence. A notable feature of Apollo-MILP is the superior ability for problem reduction while preserving optimality, leading to high-quality final solutions. Experiments on commonly used benchmarks demonstrate that our proposed Apollo-MILP significantly outperforms other ML-based approaches in terms of solution quality, achieving over a 50% reduction in the solution gap.
Mixed-integer linear programming (MILP) is one of the most fundamental models for combinatorial optimization with broad applications in operations research (Bixby et al., 2004), engineering (Ma et al., 2019), and daily scheduling or planning (Li et al., 2024b). However, solving large-size MILPs remains time-consuming and computationally expensive, as many are NP-hard and have exponential expansion of search spaces as instance sizes grow.
To mitigate this challenge, researchers have explored a wide suite of machine learning (ML) methods (Gasse et al., 2022). In practice, MILP instances from the same scenario often share similar patterns and structures, which ML models can capture to achieve improved performance (Bengio et al., 2021). Recently, extensive research has focused on using ML models to predict solutions for MILPs. Notable approaches include Neural Diving (ND) (Nair et al., 2020; Yoon, 2021; Paulus & Krause, 2023) and Predict-and-Search (PS) (Han et al., 2023; Huang et al., 2024), as illustrated in Figure 1. Given a MILP instance, ND and PS begin by employing an ML model to predict an initial solution. ND with SelectiveNet (Nair et al., 2020) assigns fixed values to a subset of variables based on the prediction, thereby constructing a reduced MILP problem with a reduced dimensionality of decision variables. Then, ND solves the reduced problem to obtain the final solutions.
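
The predict-then-fix step described above can be illustrated with a small sketch. It is not the Apollo-MILP, ND, or PS procedure: it assumes binary variables, a hypothetical dictionary of predicted probabilities, and a hand-set `coverage` fraction (ND with SelectiveNet learns which variables to fix rather than ranking by distance from 0.5). The function name `fix_confident_variables` is also an assumption made for illustration.

```python
def fix_confident_variables(predicted_prob, coverage):
    """Fix the most confidently predicted binary variables.

    predicted_prob: dict mapping variable name -> predicted P(variable = 1)
    coverage: fraction of variables to fix (an assumed, hand-set parameter)
    Returns a dict of variable name -> fixed value (0 or 1); the remaining
    variables stay free and would be left to the MILP solver.
    """
    num_to_fix = int(coverage * len(predicted_prob))
    # Confidence is taken here as the distance of the prediction from 0.5.
    ranked = sorted(predicted_prob,
                    key=lambda name: abs(predicted_prob[name] - 0.5),
                    reverse=True)
    return {name: int(predicted_prob[name] >= 0.5) for name in ranked[:num_to_fix]}


prediction = {"x1": 0.98, "x2": 0.51, "x3": 0.03, "x4": 0.60}
fixed = fix_confident_variables(prediction, coverage=0.5)
```

In this toy call, the two most confident predictions (`x1` and `x3`) are fixed and `x2`, `x4` remain free. If a fixed value is wrong, the reduced problem may be poor or infeasible, which is the failure mode the abstract sets out to address.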


Hybrid Metaheuristic Vehicle Routing Problem for Security Dispatch Operations

arXiv.org Artificial Intelligence

This paper investigates the optimization of the Vehicle Routing Problem for Security Dispatch (VRPSD). VRPSD focuses on security and patrolling applications which involve challenging constraints including precise timing and strict time windows. We propose three algorithms based on different metaheuristics, which are Adaptive Large Neighborhood Search (ALNS), Tabu Search (TS), and Threshold Accepting (TA). The first algorithm combines single-phase ALNS with TA, the second employs a multiphase ALNS with TA, and the third integrates multiphase ALNS, TS, and TA. Experiments are conducted on an instance comprising 251 customer requests. The results demonstrate that the third algorithm, the hybrid multiphase ALNS-TS-TA algorithm, delivers the best performance. This approach simultaneously leverages the large-area search capabilities of ALNS for exploration and effectively escapes local optima when the multiphase ALNS is coupled with TS and TA. Furthermore, in our experiments, the hybrid multiphase ALNS-TS-TA algorithm is the only one that shows potential for improving results with increased computation time across all attempts.
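
As a rough illustration of the Threshold Accepting component only, the sketch below applies the standard TA acceptance rule (accept a candidate when its cost increase is below the current threshold) to a toy ordering problem. The threshold schedule, the swap neighborhood, and the `route_length` cost are assumptions chosen for brevity; the paper's algorithms combine TA with ALNS and Tabu Search under time-window constraints, none of which is modeled here.

```python
import random


def threshold_accepting(initial, cost, neighbor, thresholds, iters_per_level=200, seed=0):
    """Generic Threshold Accepting loop; returns the best solution seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    for threshold in thresholds:  # thresholds are typically non-increasing
        for _ in range(iters_per_level):
            candidate = neighbor(current, rng)
            candidate_cost = cost(candidate)
            # Accept any move that worsens the cost by less than the threshold.
            if candidate_cost - current_cost < threshold:
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
    return best, best_cost


def route_length(route):
    """Toy cost: total distance travelled between consecutive stops on a line."""
    return sum(abs(b - a) for a, b in zip(route, route[1:]))


def swap_two(route, rng):
    """Toy neighborhood: swap two randomly chosen stops."""
    i, j = rng.sample(range(len(route)), 2)
    new_route = list(route)
    new_route[i], new_route[j] = new_route[j], new_route[i]
    return new_route


best_route, best_cost = threshold_accepting(
    [0, 5, 1, 4, 2, 3], route_length, swap_two, thresholds=[3, 2, 1, 0])
```

Because a positive threshold lets the search accept slightly worse routes, it can leave a local optimum that a strict-improvement rule would be stuck in; the final threshold of 0 reduces the loop to pure descent.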


Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction

arXiv.org Artificial Intelligence

If we were to apply MCTS directly to this abstracted space, we would encounter two main issues: inefficient utilization of our pre-built search space, with the search potentially diverging prematurely into unexplored regions, and difficulty in building sufficiently deep trees for high-quality long-term decision-making, particularly in areas of high stochasticity or uncertainty (Couëtoux et al., 2011). Therefore, we use progressive widening to extend MCTS to incrementally expand the search tree. It balances the exploration of new states with the exploitation of already visited states based on two hyperparameters: $\alpha \in [0, 1]$ and $\epsilon \in \mathbb{R}^+$. Let $|C(s, z)|$ denote the number of children for the state-action pair $(s, z)$. The key idea is to alternate between adding new child nodes and selecting among existing child nodes, depending on the number of times a state-action pair $(s, z)$ has been visited. A new state is added to the tree if $|C(s, z)| < \epsilon N(s, z)^{\alpha}$, where $N(s, z)$ is the number of times the state-action pair has been visited. The hyperparameter $\alpha$ controls the propensity to select among existing children, with $\alpha = 0$ leading to always selecting among existing children and $\alpha = 1$ leading to vanilla MCTS behavior (always adding a new child). In this way, we could enhance our approach by efficiently utilizing the pre-built search space, prioritizing the exploration of promising macro actions while allowing for incremental expansion of the search tree. This technique enables our method to make quick decisions in an anytime manner, leveraging the cached information, and further refine the planning tree if additional time is available.
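
The widening rule above can be sketched in a few lines. This is a minimal illustration of the test "add a child if the number of children is below epsilon times the visit count raised to alpha", not the paper's planner: the function names are hypothetical, and counting the current visit in N(s, z) before applying the test is an assumption.

```python
def should_expand(num_children, visits, alpha, eps):
    """Progressive-widening test: expand iff |C(s, z)| < eps * N(s, z) ** alpha."""
    return num_children < eps * (visits ** alpha)


def simulate_widening(total_visits, alpha, eps):
    """Count the children one (s, z) node accumulates over repeated visits."""
    children = 0
    for visits in range(1, total_visits + 1):
        if should_expand(children, visits, alpha, eps):
            children += 1  # a new successor state would be sampled and added here
        # otherwise the search would select among the existing children
    return children


children_alpha_0 = simulate_widening(100, alpha=0.0, eps=1.0)
children_alpha_1 = simulate_widening(100, alpha=1.0, eps=1.0)
children_alpha_half = simulate_widening(100, alpha=0.5, eps=1.0)
```

With eps = 1, alpha = 0 stops expanding after the first child, alpha = 1 adds a child on every visit, and intermediate values grow the child set sublinearly in the visit count.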


Enabling AutoML for Zero-Touch Network Security: Use-Case Driven Analysis

arXiv.org Artificial Intelligence

Zero-Touch Networks (ZTNs) represent a state-of-the-art paradigm shift towards fully automated and intelligent network management, enabling the automation and intelligence required to manage the complexity, scale, and dynamic nature of next-generation (6G) networks. ZTNs leverage Artificial Intelligence (AI) and Machine Learning (ML) to enhance operational efficiency, support intelligent decision-making, and ensure effective resource allocation. However, the implementation of ZTNs is subject to security challenges that need to be resolved to achieve their full potential. In particular, two critical challenges arise: the need for human expertise in developing AI/ML-based security mechanisms, and the threat of adversarial attacks targeting AI/ML models. In this survey paper, we provide a comprehensive review of current security issues in ZTNs, emphasizing the need for advanced AI/ML-based security mechanisms that require minimal human intervention and protect AI/ML models themselves. Furthermore, we explore the potential of Automated ML (AutoML) technologies in developing robust security solutions for ZTNs. Through case studies, we illustrate practical approaches to securing ZTNs against both conventional and AI/ML-specific threats, including the development of autonomous intrusion detection systems and strategies to combat Adversarial ML (AML) attacks. The paper concludes with a discussion of the future research directions for the development of ZTN security approaches.


Enhanced Derivative-Free Optimization Using Adaptive Correlation-Induced Finite Difference Estimators

arXiv.org Machine Learning

Gradient-based methods are well-suited for derivative-free optimization (DFO), where finite-difference (FD) estimates are commonly used as gradient surrogates. Traditional stochastic approximation methods, such as Kiefer-Wolfowitz (KW) and simultaneous perturbation stochastic approximation (SPSA), typically utilize only two samples per iteration, resulting in imprecise gradient estimates and necessitating diminishing step sizes for convergence. In this paper, we first explore an efficient FD estimate, referred to as correlation-induced FD estimate, which is a batch-based estimate. Then, we propose an adaptive sampling strategy that dynamically determines the batch size at each iteration. By combining these two components, we develop an algorithm designed to enhance DFO in terms of both gradient estimation efficiency and sample efficiency. Furthermore, we establish the consistency of our proposed algorithm and demonstrate that, despite using a batch of samples per iteration, it achieves the same convergence rate as the KW and SPSA methods. Additionally, we propose a novel stochastic line search technique to adaptively tune the step size in practice. Finally, comprehensive numerical experiments confirm the superior empirical performance of the proposed algorithm.
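
For orientation, a finite-difference gradient surrogate and a Kiefer-Wolfowitz-style iteration with diminishing step sizes can be sketched as follows. This is the textbook central-difference scheme on a noise-free objective, not the correlation-induced batch estimator or the adaptive sampling strategy proposed in the paper; the step-size constant `a`, the perturbation `h`, and the test function are assumptions.

```python
def central_fd_gradient(f, x, h=1e-4):
    """Central finite-difference estimate of the gradient of f at x (two evaluations per coordinate)."""
    grad = []
    for i in range(len(x)):
        x_up = list(x)
        x_down = list(x)
        x_up[i] += h
        x_down[i] -= h
        grad.append((f(x_up) - f(x_down)) / (2.0 * h))
    return grad


def fd_descent(f, x0, steps=50, a=0.5, h=1e-4):
    """Descent driven by FD gradients with diminishing step size a / k."""
    x = list(x0)
    for k in range(1, steps + 1):
        grad = central_fd_gradient(f, x, h)
        step = a / k
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


def sphere(x):
    """Assumed test objective: sum of squares, minimized at the origin."""
    return sum(xi * xi for xi in x)


gradient_at_point = central_fd_gradient(sphere, [1.0, -2.0])
minimizer = fd_descent(sphere, [1.0, -2.0])
```

When function evaluations are noisy, each FD estimate built from only two samples has high variance, which is why such schemes need diminishing step sizes and why batch-based estimates are of interest.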


Minimax Optimal Kernel Two-Sample Tests with Random Features

arXiv.org Machine Learning

Reproducing Kernel Hilbert Space (RKHS) embedding of probability distributions has proved to be an effective approach, via MMD (maximum mean discrepancy), for nonparametric hypothesis testing problems involving distributions defined over general (non-Euclidean) domains. While a substantial amount of work has been done on this topic, only recently have minimax optimal two-sample tests been constructed that incorporate, unlike MMD, both the mean element and a regularized version of the covariance operator. However, as with most kernel algorithms, the computational complexity of the optimal test scales cubically in the sample size, limiting its applicability. In this paper, we propose a spectral regularized two-sample test based on random Fourier feature (RFF) approximation and investigate the trade-offs between statistical optimality and computational efficiency. We show the proposed test to be minimax optimal if the approximation order of RFF (which depends on the smoothness of the likelihood ratio and the decay rate of the eigenvalues of the integral operator) is sufficiently large. We develop a practically implementable permutation-based version of the proposed test with a data-adaptive strategy for selecting the regularization parameter and the kernel. Finally, through numerical experiments on simulated and benchmark datasets, we demonstrate that the proposed RFF-based test is computationally efficient and performs almost as well as the exact test (with a small drop in power).
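
To make the RFF idea concrete, the sketch below approximates the plain (unregularized) squared MMD between two samples with random Fourier features for a Gaussian kernel. It is not the spectral regularized test from the paper and includes no permutation step; the bandwidth, number of features, and sample sizes are assumptions chosen for illustration.

```python
import math
import random


def rff_mmd2(xs, ys, num_features=200, bandwidth=1.0, seed=0):
    """Biased RFF estimate of squared MMD for a Gaussian kernel.

    xs, ys: lists of points, each point a list of floats of equal dimension.
    Features are z(x) = sqrt(2 / D) * cos(w . x + b) with w ~ N(0, I / bandwidth^2)
    and b ~ Uniform[0, 2 * pi].
    """
    rng = random.Random(seed)
    dim = len(xs[0])
    freqs = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def mean_embedding(points):
        # Average the feature map over the sample: cost is linear in sample size.
        totals = [0.0] * num_features
        for x in points:
            for j, (w, b) in enumerate(zip(freqs, phases)):
                dot = sum(wk * xk for wk, xk in zip(w, x))
                totals[j] += scale * math.cos(dot + b)
        return [t / len(points) for t in totals]

    mean_x = mean_embedding(xs)
    mean_y = mean_embedding(ys)
    return sum((a - b) ** 2 for a, b in zip(mean_x, mean_y))


data_rng = random.Random(1)
sample_p = [[data_rng.gauss(0.0, 1.0)] for _ in range(200)]
sample_q = [[data_rng.gauss(3.0, 1.0)] for _ in range(200)]
mmd2_same = rff_mmd2(sample_p, sample_p)
mmd2_diff = rff_mmd2(sample_p, sample_q)
```

The statistic is computed from two averaged feature vectors, so the cost grows linearly with the sample size and with the number of features, in contrast to the cubic scaling of the exact test mentioned above.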


Toward Fully Autonomous Flexible Chunk-Based Aerial Additive Manufacturing: Insights from Experimental Validation

arXiv.org Artificial Intelligence

A novel autonomous chunk-based aerial additive manufacturing framework is presented, supported with experimental demonstration advancing aerial 3D printing. An optimization-based decomposition algorithm transforms structures into sub-components, or chunks, treated as individual tasks coordinated via a dependency graph, ensuring sequential assignment to UAVs considering inter-dependencies and printability constraints for seamless execution. A specially designed hexacopter equipped with a pressurized canister for lightweight expandable foam extrusion is utilized to deposit the material in a controlled manner. To further enhance precise execution of the printing, an offset-free Model Predictive Control mechanism is considered, compensating reactively for disturbances and ground effect during execution. Additionally, an interlocking mechanism is introduced in the chunking process to enhance structural cohesion and improve layer adhesion. Extensive experiments demonstrate the framework's effectiveness in constructing precise structures of various shapes, while seamlessly adapting to practical challenges, proving its potential for a transformative leap in aerial robotic capability for autonomous construction. A video with the overall demonstration can be found here: https://youtu.be/WC1rLMLKEg4.
Preprint submitted to Journal of Automation In Construction, February 27, 2025.
1. Introduction
In recent times, groundbreaking advancements in additive manufacturing, seamlessly integrated with autonomous robotics, are unlocking an exciting frontier in next-generation construction and manufacturing processes. Additive manufacturing has demonstrated a paradigm-shifting impact, addressing complex manufacturing processes with unprecedented precision and efficiency. Its transformative potential is becoming increasingly evident as it evolves and finds applications across a wide range of industries [1, 2, 3], while simultaneously paving the way for further innovations in the future.
An intriguing development is its recent integration into the construction industry, capitalizing on its ability to automate construction processes, provide extensive design flexibility, and construct intricate structures designed using Computer-Aided Design (CAD) software [4, 5]. Numerous studies have demonstrated the design and deployment of large-scale robotic arms and gantry systems for printing building components and even entire houses using a variety of base materials [6]. A key advantage of such methods is their ability to adapt with a high level of automation throughout the construction process, making them particularly well-suited for deployment in remote, inaccessible, and harsh environments [7, 8]. Notable examples include disaster-stricken areas, such as regions impacted by fires and earthquakes, where the rapid construction of shelters and basic infrastructure is imperative.


FSMP: A Frontier-Sampling-Mixed Planner for Fast Autonomous Exploration of Complex and Large 3-D Environments

arXiv.org Artificial Intelligence

In this paper, we propose a systematic framework for fast exploration of complex and large 3-D environments using micro aerial vehicles (MAVs). The key insight is the organic integration of the frontier-based and sampling-based strategies that can achieve rapid global exploration of the environment. Specifically, a field-of-view-based (FOV) frontier detector with the guarantee of completeness and soundness is devised for identifying 3-D map frontiers. Unlike random sampling-based methods, a deterministic sampling technique is employed to build and maintain an incremental road map based on the recorded sensor FOVs and newly detected frontiers. With the resulting road map, we propose a two-stage path planner. First, it quickly computes the globally optimal exploration path on the road map using the lazy evaluation strategy. Then, the best exploration path is smoothed to further improve the exploration efficiency. We validate the proposed method both in simulation and real-world experiments. The comparative results demonstrate the promising performance of our planner in terms of exploration efficiency, computational time, and explored volume.