Wang, Zehao
Multi-Span Optical Power Spectrum Evolution Modeling using ML-based Multi-Decoder Attention Framework
Raj, Agastya, Wang, Zehao, Slyne, Frank, Chen, Tingjun, Kilper, Dan, Ruffini, Marco
We implement an ML-based attention framework with component-specific decoders, improving optical power spectrum prediction in multi-span networks. By reducing the need for in-depth training on each component, the framework can be scaled to multi-span topologies with minimal data collection, making it suitable for brown-field scenarios.
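As a rough illustration of the architecture named above (a shared attention encoder with component-specific decoders), the sketch below shows one way such a model could be wired up; the layer sizes, component names, and channel count are placeholders, not the paper's configuration.

```python
# Hedged sketch only: shared attention encoder with per-component decoder heads
# for power-spectrum prediction. Sizes and component names are illustrative.
import torch
import torch.nn as nn

class MultiDecoderSpectrumModel(nn.Module):
    def __init__(self, n_channels=95, d_model=128, components=("edfa", "fiber", "wss")):
        super().__init__()
        self.embed = nn.Linear(n_channels, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # one lightweight decoder per component type, so new spans reuse the
        # shared encoder instead of requiring in-depth per-component training
        self.decoders = nn.ModuleDict({
            c: nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                             nn.Linear(d_model, n_channels))
            for c in components})

    def forward(self, spectra, component):
        # spectra: (batch, n_components_so_far, n_channels) power spectra along the span
        h = self.encoder(self.embed(spectra))
        return self.decoders[component](h[:, -1])  # spectrum after the next component

model = MultiDecoderSpectrumModel()
print(model(torch.randn(2, 4, 95), "edfa").shape)  # torch.Size([2, 95])
```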
A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models
Wu, Yixi, He, Pengfei, Wang, Zehao, Wang, Shaowei, Tian, Yuan, Chen, Tse-Hsun
Large language models (LLMs) like GitHub Copilot and ChatGPT have emerged as powerful tools for code generation, significantly enhancing productivity and accelerating software development. However, existing benchmarks primarily focus on general code generation without considering API-oriented code generation, i.e., generating code that invokes APIs from specific libraries. Given the growing demand for API-oriented code generation, there is a pressing need for a systematic and automated approach to evaluate LLMs on API-oriented code generation. To address this gap, we propose AutoAPIEval, a lightweight and automated framework designed to evaluate the capabilities of LLMs in API-oriented code generation. Our framework works with any library that provides API documentation and focuses on two unit tasks: API recommendation and code example generation, along with four metrics to evaluate the generated APIs and code examples, such as the proportion of incorrect API recommendations for Task 1, and the proportions of code examples that invoke no specific API and of uncompilable/unexecutable code examples for Task 2. In addition, we conducted a case study on three LLMs (ChatGPT, MagiCoder, and DeepSeek Coder) and Java Runtime Environment 8 to demonstrate the framework's effectiveness. Our findings reveal substantial variability in LLM performance across tasks, with ChatGPT adhering to instructions better than its counterparts (i.e., MagiCoder and DeepSeek Coder) while achieving similar effectiveness in code example generation. We also identify key factors associated with code quality, such as API popularity and model confidence, and build classifiers that achieve high accuracy in detecting incorrect API recommendations and erroneous code examples. Retrieval-augmented generation enhances the quality of code generated by LLMs, though its effectiveness varies across different LLMs.
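To make the Task 1 metric concrete, here is a minimal sketch of how a "proportion of incorrect API recommendations" could be computed against a library's documented API set; the function and data below are hypothetical, not AutoAPIEval's implementation.

```python
# Hypothetical sketch: flag recommended APIs that do not exist in the documentation.
def incorrect_api_rate(recommendations, documented_apis):
    """recommendations: API names returned by the LLM for one library.
    documented_apis: set of fully qualified API names from the documentation."""
    if not recommendations:
        return 0.0
    wrong = [api for api in recommendations if api not in documented_apis]
    return len(wrong) / len(recommendations)

documented = {"java.util.List.add", "java.util.List.remove", "java.util.Map.put"}
recs = ["java.util.List.add", "java.util.List.append"]  # the second API does not exist
print(incorrect_api_rate(recs, documented))  # 0.5
```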
Navigating the Nuances: A Fine-grained Evaluation of Vision-Language Navigation
Wang, Zehao, Wu, Minye, Cao, Yixin, Ma, Yubo, Chen, Meiqi, Tuytelaars, Tinne
This study presents a novel evaluation framework for the Vision-Language Navigation (VLN) task. It aims to diagnose current models across various instruction categories at a fine-grained level. The framework is structured around the context-free grammar (CFG) of the task. The CFG serves as the basis for the problem decomposition and the core premise of the instruction-category design. We propose a semi-automatic method for CFG construction with the help of Large Language Models (LLMs). We then systematically generate data spanning five principal instruction categories (i.e., direction change, landmark recognition, region recognition, vertical movement, and numerical comprehension). Our analysis of different models reveals notable performance discrepancies and recurrent issues. The stagnation of numerical comprehension, heavy selective biases toward directional concepts, and other findings inform the development of future language-guided navigation systems.
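To illustrate the CFG-driven data generation at a toy scale, the sketch below expands a tiny hand-written grammar into instruction templates for a few of the categories; the grammar and fillers are invented for illustration and are not the CFG constructed in the paper.

```python
# Toy sketch: expand a miniature grammar into category-labeled instruction templates.
import itertools

GRAMMAR = {                      # category -> templates (illustrative only)
    "direction change": ["turn {dir} at the {landmark}"],
    "vertical movement": ["go {vert} the stairs"],
    "numerical comprehension": ["pass {num} doors, then stop"],
}
FILLERS = {"dir": ["left", "right"], "vert": ["up", "down"],
           "landmark": ["sofa", "plant"], "num": ["two", "three"]}

def expand(template):
    slots = [s.split("}")[0] for s in template.split("{")[1:]]
    for combo in itertools.product(*(FILLERS[s] for s in slots)):
        yield template.format(**dict(zip(slots, combo)))

for category, templates in GRAMMAR.items():
    for t in templates:
        for instruction in expand(t):
            print(f"[{category}] {instruction}")
```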
FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs
Zeng, Shulin, Liu, Jun, Dai, Guohao, Yang, Xinhao, Fu, Tianyu, Wang, Hongyi, Ma, Wenheng, Sun, Hanbo, Li, Shiyao, Huang, Zixiao, Dai, Yadong, Li, Jintao, Wang, Zehao, Zhang, Ruoyu, Wen, Kairui, Ning, Xuefei, Wang, Yu
Transformer-based Large Language Models (LLMs) have made a significant impact on various domains. However, LLM efficiency suffers from both heavy computation and memory overheads. Compression techniques like sparsification and quantization are commonly used to mitigate the gap between LLMs' computation/memory overheads and hardware capacity. However, existing GPU and transformer-based accelerators cannot efficiently process compressed LLMs, due to the following unresolved challenges: low computational efficiency, underutilized memory bandwidth, and large compilation overheads. This paper proposes FlightLLM, enabling efficient LLM inference with a complete mapping flow on FPGAs. In FlightLLM, we highlight an innovative solution: the computation and memory overheads of LLMs can be addressed by utilizing FPGA-specific resources (e.g., DSP48 and the heterogeneous memory hierarchy). First, we propose a configurable sparse DSP chain to support different sparsity patterns with high computation efficiency. Second, we propose an always-on-chip decode scheme to boost memory bandwidth with mixed-precision support. Finally, to make FlightLLM available for real-world LLMs, we propose a length-adaptive compilation method to reduce the compilation overhead. Implemented on the Xilinx Alveo U280 FPGA, FlightLLM achieves 6.0$\times$ higher energy efficiency and 1.8$\times$ better cost efficiency than commercial GPUs (e.g., NVIDIA V100S) on modern LLMs (e.g., LLaMA2-7B) using vLLM and SmoothQuant at a batch size of one. FlightLLM beats the NVIDIA A100 GPU with 1.2$\times$ higher throughput using the latest Versal VHK158 FPGA.
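For context on the sparsity patterns mentioned above, the snippet below shows a generic 2:4 structured-sparsity mask of the kind sparsity-aware accelerators commonly target; it is a standalone illustration of the compression idea, not FlightLLM's sparse DSP chain or its compiler.

```python
# Generic illustration: keep the 2 largest-magnitude weights in every group of 4.
import numpy as np

def two_four_sparsify(weights):
    w = weights.reshape(-1, 4)
    drop_idx = np.argsort(np.abs(w), axis=1)[:, :2]   # 2 smallest per group
    mask = np.ones_like(w)
    np.put_along_axis(mask, drop_idx, 0.0, axis=1)
    return (w * mask).reshape(weights.shape)

w = np.random.randn(8)
print(w.round(2))
print(two_four_sparsify(w).round(2))  # exactly two non-zeros per group of four
```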
From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding
Li, Yong-Lu, Wu, Xiaoqian, Liu, Xinpeng, Wang, Zehao, Dou, Yiming, Ji, Yikun, Zhang, Junyi, Li, Yixing, Tan, Jingru, Lu, Xudong, Lu, Cewu
Action understanding is a vital step toward intelligent agents and has attracted long-term attention. It can be formulated as a mapping from the physical action space to the semantic space. Typically, researchers have built action datasets with idiosyncratic class definitions to push the envelope of their respective benchmarks. Thus, datasets are incompatible with each other, like "Isolated Islands", due to semantic gaps and differing class granularities, e.g., do housework in dataset A and wash plate in dataset B. We argue that a more principled semantic space is urgently needed to concentrate community efforts and enable us to use all datasets together in pursuit of generalizable action learning. To this end, we design a structured action semantic space based on a verb taxonomy hierarchy that covers a massive set of actions. By aligning the classes of previous datasets to our semantic space, we gather (image/video/skeleton/MoCap) datasets into a unified database with a unified label system, i.e., bridging "isolated islands" into a "Pangea". Accordingly, we propose a novel model that maps from the physical space to the semantic space to fully exploit Pangea. In extensive experiments, our new system shows significant superiority, especially in transfer learning. Code and data will be made publicly available.
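A minimal sketch of the alignment step described above: dataset-specific class names are mapped onto nodes of a shared verb hierarchy so that samples from different datasets land in one label space. The mapping table and sample format below are invented for illustration, not the paper's actual alignment.

```python
# Hypothetical sketch: re-label samples from separate datasets into a unified
# semantic space via a class-to-hierarchy-node alignment table.
LABEL_ALIGNMENT = {                         # (dataset, native class) -> unified node
    ("dataset_A", "do housework"): "clean/do housework",
    ("dataset_B", "wash plate"): "clean/wash/wash plate",
}

def unify(samples):
    """samples: iterable of (dataset, native_label, data_id).
    Yields (unified_label, data_id), dropping classes that cannot be aligned."""
    for dataset, label, data_id in samples:
        node = LABEL_ALIGNMENT.get((dataset, label))
        if node is not None:
            yield node, data_id

print(list(unify([("dataset_A", "do housework", "img_001"),
                  ("dataset_B", "wash plate", "vid_042")])))
```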
Self-Normalizing Neural Network, Enabling One Shot Transfer Learning for Modeling EDFA Wavelength Dependent Gain
Raj, Agastya, Wang, Zehao, Slyne, Frank, Chen, Tingjun, Kilper, Dan, Ruffini, Marco
We present a novel ML framework for modeling the wavelength-dependent gain of multiple EDFAs, based on semi-supervised, self-normalizing neural networks, enabling one-shot transfer learning. Our experiments on 22 EDFAs in the Open Ireland and COSMOS testbeds show high-accuracy transfer learning even across different amplifier types.
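A self-normalizing network in the sense of Klambauer et al. pairs SELU activations with alpha dropout and LeCun-style initialization so activations stay roughly zero-mean and unit-variance through depth. The sketch below shows such a regressor for per-channel gain; the layer sizes, inputs, and channel count are assumptions, not the paper's model.

```python
# Minimal self-normalizing MLP sketch (SELU + AlphaDropout) for gain regression.
import torch
import torch.nn as nn

class GainModel(nn.Module):
    def __init__(self, n_channels=95):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_channels + 1, 256), nn.SELU(), nn.AlphaDropout(0.05),
            nn.Linear(256, 256), nn.SELU(), nn.AlphaDropout(0.05),
            nn.Linear(256, n_channels),               # per-channel gain (dB)
        )
        for m in self.net:                            # LeCun-normal initialization
            if isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight, nonlinearity="linear")
                nn.init.zeros_(m.bias)

    def forward(self, channel_loading, gain_setting):
        return self.net(torch.cat([channel_loading, gain_setting], dim=-1))

model = GainModel()
loading = torch.randint(0, 2, (4, 95)).float()        # on/off channel occupancy (assumed input)
setting = torch.full((4, 1), 18.0)                    # target gain in dB (assumed input)
print(model(loading, setting).shape)                  # torch.Size([4, 95])
```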
Few-shot Event Detection: An Empirical Study and a Unified View
Ma, Yubo, Wang, Zehao, Cao, Yixin, Sun, Aixin
Few-shot event detection (ED) has been widely studied, but this breadth brings noticeable discrepancies, e.g., various motivations, tasks, and experimental settings, that hinder the understanding of models and future progress. This paper presents a thorough empirical study, a unified view of ED models, and a better unified baseline. For fair evaluation, we compare 12 representative methods on three datasets, roughly grouped into prompt-based and prototype-based models for detailed analysis. Experiments consistently demonstrate that prompt-based methods, including ChatGPT, still significantly trail prototype-based methods in overall performance. To investigate the superiority of prototype-based methods, we break down their design elements along several dimensions and build a unified framework over them. Under this unified view, each prototype-based method can be seen as a combination of modules drawn from these design elements. We further combine all advantageous modules and propose a simple yet effective baseline, which outperforms existing methods by a large margin (e.g., 2.7% F1 gains under the low-resource setting).
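For readers unfamiliar with the prototype-based family that the unified view builds on, here is a bare-bones prototypical classifier: class prototypes are the mean support embeddings, and each query is assigned to the nearest prototype. This is the textbook mechanism rather than the paper's full baseline, and the embeddings and event types below are made up.

```python
# Bare-bones prototypical classification sketch over precomputed embeddings.
import numpy as np

def build_prototypes(support_emb, support_labels):
    classes = sorted(set(support_labels))
    protos = np.stack([support_emb[[l == c for l in support_labels]].mean(axis=0)
                       for c in classes])
    return classes, protos

def classify(query_emb, classes, protos):
    dists = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return [classes[i] for i in dists.argmin(axis=1)]

support = np.random.randn(6, 16)                      # toy trigger embeddings
labels = ["Attack", "Attack", "Meet", "Meet", "Transfer", "Transfer"]
classes, protos = build_prototypes(support, labels)
print(classify(np.random.randn(2, 16), classes, protos))
```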
AeCoM: An Aerial Continuum Manipulator with Precise Kinematic Modeling for Variable Loading and Tendon-slacking Prevention
Peng, Rui, Wang, Zehao, Lu, Peng
Aerial robotic systems have attracted growing interest in recent years. In this article, we propose a novel aerial manipulator system that is significantly different from conventional aerial discrete manipulators: an Aerial Continuum Manipulator (AeCoM). The AeCoM compactly integrates a quadrotor with a tendon-driven continuum robotic manipulator. Owing to the compact design and the payload-bearing ability of tendon-driven continuum arms, the proposed system resolves the conflict between payload capacity and dexterity found in conventional aerial manipulators. This paper makes two contributions: 1) a sensor-based kinematic model for precise modeling under variable loading; and 2) a tendon-slacking prevention mechanism for aggressive motions. We present the detailed design of the system and perform extensive experiments to validate self-initialization, payload capacity, precise kinematic modeling with variable end-effector (EE) loading during aerial grasping, and tendon-slacking prevention. The experimental results demonstrate that the proposed aerial continuum manipulator overcomes the constraints of conventional aerial manipulators and has broader potential applications in cluttered environments.
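For context on continuum-arm kinematics, the snippet below evaluates the standard constant-curvature forward kinematics of a single segment (tip position from curvature, bending-plane angle, and arc length). This is the textbook model, not AeCoM's sensor-based formulation, which refines such modeling under variable loading.

```python
# Textbook constant-curvature forward kinematics for one continuum segment.
import numpy as np

def tip_position(kappa, phi, length):
    """kappa: curvature (1/m), phi: bending-plane angle (rad), length: arc length (m)."""
    if abs(kappa) < 1e-9:                 # straight segment
        return np.array([0.0, 0.0, length])
    r = 1.0 / kappa
    return np.array([np.cos(phi) * r * (1.0 - np.cos(kappa * length)),
                     np.sin(phi) * r * (1.0 - np.cos(kappa * length)),
                     r * np.sin(kappa * length)])

print(tip_position(kappa=2.0, phi=np.pi / 4, length=0.3))
```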
Layout-aware Dreamer for Embodied Referring Expression Grounding
Li, Mingxiao, Wang, Zehao, Tuytelaars, Tinne, Moens, Marie-Francine
In this work, we study the problem of Embodied Referring Expression Grounding, where an agent needs to navigate in a previously unseen environment and localize a remote object described by a concise high-level natural language instruction. When facing such a situation, a human tends to imagine what the destination may look like and to explore the environment based on prior knowledge of environmental layout, such as the fact that a bathroom is more likely to be found near a bedroom than near a kitchen. We design an autonomous agent called Layout-aware Dreamer (LAD), comprising two novel modules, the Layout Learner and the Goal Dreamer, to mimic this cognitive decision process. The Layout Learner infers the room-category distribution of neighboring unexplored areas along the path for coarse layout estimation, which effectively introduces layout common sense about room-to-room transitions to our agent. To learn effective exploration of the environment, the Goal Dreamer imagines the destination beforehand. Our agent achieves new state-of-the-art performance on the public leaderboard of the REVERIE dataset in challenging unseen test environments, improving navigation success (SR) by 4.02% and remote grounding success (RGS) by 3.43% over the previous state of the art. The code is released at https://github.com/zehao-wang/LAD
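As a rough illustration of the Layout Learner's output described above, the snippet below predicts a room-category distribution for an unexplored viewpoint with a small softmax head; the feature size, categories, and head design are assumptions, not the released LAD code.

```python
# Illustrative sketch: a softmax head producing a room-category distribution
# for a neighboring unexplored viewpoint. Names and sizes are placeholders.
import torch
import torch.nn as nn

ROOM_CATEGORIES = ["bedroom", "bathroom", "kitchen", "living room", "hallway"]

layout_head = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, len(ROOM_CATEGORIES)),
)

view_feature = torch.randn(1, 512)                    # feature of an unexplored viewpoint
room_dist = layout_head(view_feature).softmax(dim=-1)
print(dict(zip(ROOM_CATEGORIES, room_dist.squeeze(0).tolist())))
```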