design and implementation
Design and Implementation of a Secure RAG-Enhanced AI Chatbot for Smart Tourism Customer Service: Defending Against Prompt Injection Attacks -- A Case Study of Hsinchu, Taiwan
As smart tourism evolves, AI-powered chatbots have become indispensable for delivering personalized, real-time assistance to travelers while promoting sustainability and efficiency. However, these systems are increasingly vulnerable to prompt injection attacks, where adversaries manipulate inputs to elicit unintended behaviors such as leaking sensitive information or generating harmful content. This paper presents a case study on the design and implementation of a secure retrieval-augmented generation (RAG) chatbot for Hsinchu smart tourism services. The system integrates RAG with API function calls, multi-layered linguistic analysis, and guardrails against injections, achieving high contextual awareness and security. Key features include a tiered response strategy, RAG-driven knowledge grounding, and intent decomposition across lexical, semantic, and pragmatic levels. Defense mechanisms include system norms, gatekeepers for intent judgment, and reverse RAG text to prioritize verified data. We also benchmark a GPT-5 variant (released 2025-08-07) to assess inherent robustness. Evaluations with 674 adversarial prompts and 223 benign queries show over 95% accuracy on benign tasks and substantial detection of injection attacks. GPT-5 blocked about 85% of attacks, showing progress yet highlighting the need for layered defenses. Findings emphasize contributions to sustainable tourism, multilingual accessibility, and ethical AI deployment. This work offers a practical framework for deploying secure chatbots in smart tourism and contributes to resilient, trustworthy AI applications.
Verifying Computational Graphs in Production-Grade Distributed Machine Learning Frameworks
Zulkifli, Kahfi S., Qian, Wenbo, Zhu, Shaowei, Zhou, Yuan, Zhang, Zhen, Lou, Chang
Modern machine learning frameworks support very large models by incorporating parallelism and optimization techniques. Yet, these very techniques add new layers of complexity, introducing silent errors that severely degrade model performance. Existing solutions are either ad hoc or too costly for production. We present Scalify, a lightweight framework that exposes silent errors by verifying semantic equivalence of computational graphs using equality saturation and Datalog-style reasoning. To scale, Scalify partitions graphs with parallel rewriting and layer memoization, reuses rewrite templates, and augments equality saturation with relational reasoning and symbolic bijection inference. It further localizes discrepancies to precise code sites, turning verification results into actionable debugging guidance. Scalify verifies models as large as Llama-3.1-405B within minutes on a commodity machine and exposed five unknown bugs in Amazon production machine learning frameworks.
Boosting Skeleton-Driven SMT Solver Fuzzing by Leveraging LLM to Produce Formula Generators
Sun, Maolin, Yang, Yibiao, Zhou, Yuming
Satisfiability Modulo Theory (SMT) solvers are foundational to modern systems and programming languages research, providing the foundation for tasks like symbolic execution and automated verification. Because these solvers sit on the critical path, their correctness is essential, and high-quality test formulas are key to uncovering bugs. However, while prior testing techniques performed well on earlier solver versions, they struggle to keep pace with rapidly evolving features. Recent approaches based on Large Language Models (LLMs) show promise in exploring advanced solver capabilities, but two obstacles remain: nearly half of the generated formulas are syntactically invalid, and iterative interactions with the LLMs introduce substantial computational overhead. In this study, we present Chimera, a novel LLM-assisted fuzzing framework that addresses both issues by shifting from direct formula generation to the synthesis of reusable term (i.e., logical expression) generators. Particularly, Chimera uses LLMs to (1) automatically extract context-free grammars (CFGs) for SMT theories, including solver-specific extensions, from documentation, and (2) synthesize composable Boolean term generators that adhere to these grammars. During fuzzing, Chimera populates structural skeletons derived from existing formulas with the terms iteratively produced by the LLM-synthesized generators. This design ensures syntactic validity while promoting semantic diversity. Notably, Chimera requires only one-time LLM interaction investment, dramatically reducing runtime cost. We evaluated Chimera on two leading SMT solvers: Z3 and cvc5. Our experiments show that Chimera has identified 43 confirmed bugs, 40 of which have already been fixed by developers.
Design and Implementation of an OCR-Powered Pipeline for Table Extraction from Invoices
This paper presents a robust system for automated invoice data extraction using a hybrid pipeline that combines OpenCV-based pre-processing with OCR and advanced table extraction techniques. Our approach addresses real-world challenges including skewed perspectives, variable lighting, noise from signatures, barcodes, staplers, and broken table structures. We segment invoices into detail and product sections, apply hybrid table detection using both Img2Table and manual fallback methods, and finally generate structured JSON outputs using row-wise OCR. This method proves particularly effective for physical invoices with multiple products and complex layouts, significantly reducing the need for manual data entry.
VeriLocc: End-to-End Cross-Architecture Register Allocation via LLM
Jin, Lesheng, Ruan, Zhenyuan, Mai, Haohui, Shang, Jingbo
Modern GPUs evolve rapidly, yet production compilers still rely on hand-crafted register allocation heuristics that require substantial re-tuning for each hardware generation. We introduce VeriLocc, a framework that combines large language models (LLMs) with formal compiler techniques to enable generalizable and verifiable register allocation across GPU architectures. VeriLocc fine-tunes an LLM to translate intermediate representations (MIRs) into target-specific register assignments, aided by static analysis for cross-architecture normalization and generalization and a verifier-guided regeneration loop to ensure correctness. Evaluated on matrix multiplication (GEMM) and multi-head attention (MHA), VeriLocc achieves 85-99% single-shot accuracy and near-100% pass@100. Case study shows that VeriLocc discovers more performant assignments than expert-tuned libraries, outperforming rocBLAS by over 10% in runtime.
SurfAAV: Design and Implementation of a Novel Multimodal Surfing Aquatic-Aerial Vehicle
Liu, Kun, Xiao, Junhao, Lin, Hao, Cao, Yue, Peng, Hui, Huang, Kaihong, Lu, Huimin
Despite significant advancements in the research of aquatic-aerial robots, existing configurations struggle to efficiently perform underwater, surface, and aerial movement simultaneously. In this paper, we propose a novel multimodal surfing aquatic-aerial vehicle, SurfAA V, which efficiently integrates underwater navigation, surface gliding, and aerial flying capabilities. Thanks to the design of the novel differential thrust vectoring hydrofoil, SurfAA V can achieve efficient surface gliding and underwater navigation without the need for a buoyancy adjustment system. This design provides flexible operational capabilities for both surface and underwater tasks, enabling the robot to quickly carry out underwater monitoring activities. Additionally, when it is necessary to reach another water body, SurfAA V can switch to aerial mode through a gliding takeoff, flying to the target water area to perform corresponding tasks. The main contribution of this letter lies in proposing a new solution for underwater, surface, and aerial movement, designing a novel hybrid prototype concept, developing the required control laws, and validating the robot's ability to successfully perform surface gliding and gliding takeoff. SurfAA V achieves a maximum surface gliding speed of 7.96 m/s and a maximum underwater speed of 3.1 m/s. The prototype's surface gliding maneuverability and underwater cruising maneuverability both exceed those of existing aquatic-aerial vehicles. N recent years, with the rapid development of robotics technology, unmanned aquatic-aerial vehicles(UAA Vs) capable of adapting to complex environments and performing diversified tasks have gradually become a research hotspot. These robots integrate the advantages of both autonomous underwater vehicles(AUVs) and unmanned aerial vehicles(UA Vs), allowing them to freely switch between motion modes in water and air. This capability greatly broadens the application scope of traditional robots, demonstrating enormous potential in multi-domain missions such as environmental monitoring[1], disaster rescue[2], and national defense[3].
Design and Implementation of an FPGA-Based Tiled Matrix Multiplication Accelerator for Transformer Self-Attention on the Xilinx KV260 SoM
Li, Zhaoqin "Richie", Chen, Sicheng
Transformer-based LLMs spend most of their compute in large matrix multiplications for attention and feed-forward layers. Recognizing that the Q, K, and V linear projections within the Multi-Head Self-Attention (MHA) module represent a critical computational bottleneck, we strategically focused our efforts on accelerating these operations. We present a tiled matrix multiplication accelerator optimized for such workloads on a Xilinx KV260 on-board FPGA. Key innovations include persistent on-chip storage for one matrix operand, two-level tiling for data reuse, and a systolic-like unrolled compute engine. Implemented via high-level synthesis (HLS) and integrated with DistilBERT for Q, K, V projections, our accelerator achieves significant speedup and energy efficiency gains over CPU baselines. Standalone GEMM benchmarks show up to a 7x speedup over an ARM CPU (PyTorch) and ~200x over naive numpy, with a throughput of up to 3.1 GFLOPs on 768x3072 matrices. Although the overall end-to-end DistilBERT acceleration is more modest, our results validate the potential of FPGA-based acceleration for critical components of Transformer models.
NeurDB: On the Design and Implementation of an AI-powered Autonomous Database
Zhao, Zhanhao, Cai, Shaofeng, Gao, Haotian, Pan, Hexiang, Xiang, Siqi, Xing, Naili, Chen, Gang, Ooi, Beng Chin, Shen, Yanyan, Wu, Yuncheng, Zhang, Meihui
Databases are increasingly embracing AI to provide autonomous system optimization and intelligent in-database analytics, aiming to relieve end-user burdens across various industry sectors. Nonetheless, most existing approaches fail to account for the dynamic nature of databases, which renders them ineffective for real-world applications characterized by evolving data and workloads. This paper introduces NeurDB, an AI-powered autonomous database that deepens the fusion of AI and databases with adaptability to data and workload drift. NeurDB establishes a new in-database AI ecosystem that seamlessly integrates AI workflows within the database. This integration enables efficient and effective in-database AI analytics and fast-adaptive learned system components. Empirical evaluations demonstrate that NeurDB substantially outperforms existing solutions in managing AI analytics tasks, with the proposed learned components more effectively handling environmental dynamism than state-of-the-art approaches.
HawkVision: Low-Latency Modeless Edge AI Serving
Lao, ChonLam, Gao, Jiaqi, Ananthanarayanan, Ganesh, Akella, Aditya, Yu, Minlan
The trend of modeless ML inference is increasingly growing in popularity as it hides the complexity of model inference from users and caters to diverse user and application accuracy requirements. Previous work mostly focuses on modeless inference in data centers. To provide low-latency inference, in this paper, we promote modeless inference at the edge. The edge environment introduces additional challenges related to low power consumption, limited device memory, and volatile network environments. To address these challenges, we propose HawkVision, which provides low-latency modeless serving of vision DNNs. HawkVision leverages a two-layer edge-DC architecture that employs confidence scaling to reduce the number of model options while meeting diverse accuracy requirements. It also supports lossy inference under volatile network environments. Our experimental results show that HawkVision outperforms current serving systems by up to 1.6X in P99 latency for providing modeless service. Our FPGA prototype demonstrates similar performance at certain accuracy levels with up to a 3.34X reduction in power consumption.
Visual Servoing Based on 3D Features: Design and Implementation for Robotic Insertion Tasks
Rosales, Antonio, Heikkilä, Tapio, Suomalainen, Markku
This paper proposes a feature-based Visual Servoing (VS) method for insertion task skills. A camera mounted on the robot's end-effector provides the pose relative to a cylinder (hole), allowing a contact-free and damage-free search of the hole and avoiding uncertainties emerging when the pose is computed via robot kinematics. Two points located on the hole's principal axis and three mutually orthogonal planes defining the flange's reference frame are associated with the pose of the hole and the flange, respectively. The proposed VS drives to zero the distance between the two points and the three planes aligning the robot's flange with the hole's direction. Compared with conventional VS where the Jacobian is difficult to compute in practice, the proposed featured-based uses a Jacobian easily calculated from the measured hole pose. Furthermore, the feature-based VS design considers the robot's maximum cartesian velocity. The VS method is implemented in an industrial robot and the experimental results support its usefulness.