AITopics | Chen, Zhang

Chen, Zhang

InfinitePOD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers

Shou, Chenchen, Liu, Guyue, Nie, Hao, Meng, Huaiyu, Zhou, Yu, Jiang, Yimin, Lv, Wenqing, Xu, Yelong, Lu, Yuanwei, Chen, Zhang, Yu, Yanbo, Shen, Yichen, Zhu, Yibo, Jiang, Daxin

arXiv.org Artificial Intelligence, Feb-7-2025

Scaling Large Language Model (LLM) training relies on multi-dimensional parallelism, where High-Bandwidth Domains (HBDs) are critical for communication-intensive parallelism like Tensor Parallelism (TP) and Expert Parallelism (EP). However, existing HBD architectures face fundamental limitations in scalability, cost, and fault resiliency: switch-centric HBDs (e.g., NVL-72) incur prohibitive scaling costs, while GPU-centric HBDs (e.g., TPUv3/Dojo) suffer from severe fault propagation. Switch-GPU hybrid HBDs such as TPUv4 take a middle-ground approach by leveraging Optical Circuit Switches, but the fault explosion radius remains large at the cube level (e.g., 64 TPUs). We propose InfinitePOD, a novel transceiver-centric HBD architecture that unifies connectivity and dynamic switching at the transceiver level using Optical Circuit Switching (OCS). By embedding OCS within each transceiver, InfinitePOD achieves reconfigurable point-to-multipoint connectivity, allowing the topology to adapt into variable-size rings. This design provides: i) datacenter-wide scalability without cost explosion; ii) fault resilience by isolating failures to a single node; and iii) full bandwidth utilization for fault-free GPUs. Key innovations include a Silicon Photonic (SiPh) based low-cost OCS transceiver (OCSTrx), a reconfigurable k-hop ring topology co-designed with intra-/inter-node communication, and an HBD-DCN orchestration algorithm that maximizes GPU utilization while minimizing cross-ToR datacenter network traffic. The evaluation demonstrates that InfinitePOD achieves 31% of the cost of NVL-72, a near-zero GPU waste ratio (over one order of magnitude lower than NVL-72 and TPUv4), and near-zero cross-ToR traffic when node fault ratios are under 7%, and that it improves Model FLOPs Utilization by 3.37x compared to NVIDIA DGX (8 GPUs per node).

large language model, machine learning, natural language, (20 more...)

2502.03885

Country: North America > United States > California (0.14)

Genre: Research Report (0.50)

Industry: Information Technology (0.69)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Equilibrium Adaptation-Based Control for Track Stand of Single-Track Two-Wheeled Robots

Wang, Boyi, Deng, Yang, Jing, Feilong, Sun, Yiyong, Chen, Zhang, Liang, Bin

arXiv.org Artificial Intelligence, Nov-7-2024

Stationary balance control is challenging for single-track two-wheeled (STTW) robots due to the lack of elegant balancing mechanisms and the conflict between the limited attraction domain and external disturbances. To address the absence of balancing mechanisms, we draw inspiration from cyclists and leverage the track stand maneuver, which relies solely on steering and rear-wheel actuation. To achieve accurate tracking in the presence of matched and mismatched disturbances, we propose an equilibrium adaptation-based control (EABC) scheme that can be seamlessly integrated with standard disturbance observers and controllers. This scheme adapts to slowly varying disturbances by using a disturbed equilibrium estimator, handling both matched and mismatched disturbances in a unified manner while ensuring accurate tracking with zero steady-state error. We integrate the EABC scheme with nonlinear model predictive control (MPC) for the track stand of STTW robots and validate its effectiveness through two experimental scenarios. Our method demonstrates significant improvements in tracking accuracy, reducing errors by several orders of magnitude.

artificial intelligence, disturbance, robot, (15 more...)

2410.19615

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas (0.59)

Technology: Information Technology > Artificial Intelligence > Robots > Locomotion (0.64)

Multi-Agent Causal Discovery Using Large Language Models

Le, Hao Duong, Xia, Xin, Chen, Zhang

arXiv.org Artificial Intelligence, Jul-21-2024

Large Language Models (LLMs) have demonstrated significant potential in causal discovery tasks by utilizing their vast expert knowledge from extensive text corpora. However, the multi-agent capabilities of LLMs in causal discovery remain underexplored. This paper introduces a general framework, consisting of three models, to investigate this potential. The first is the Meta Agents Model, which relies exclusively on reasoning and discussions among LLM agents to conduct causal discovery. The second is the Coding Agents Model, which leverages the agents' ability to plan, write, and execute code, utilizing advanced statistical libraries for causal discovery. The third is the Hybrid Model, which integrates the Meta Agents Model and Coding Agents Model approaches, combining the statistical analysis and reasoning skills of multiple agents. Our proposed framework shows promising results by effectively utilizing LLMs' expert knowledge, reasoning capabilities, multi-agent cooperation, and statistical causal methods. By exploring the multi-agent potential of LLMs, we aim to establish a foundation for further research on using multi-agent LLMs to solve causal-related problems.

large language model, machine learning, natural language, (20 more...)

2407.15073

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
(2 more...)

Over-parameterization and Adversarial Robustness in Neural Networks: An Overview and Empirical Analysis

Chen, Zhang, Demetrio, Luca, Gupta, Srishti, Feng, Xiaoyi, Xia, Zhaoqiang, Cinà, Antonio Emanuele, Pintor, Maura, Oneto, Luca, Demontis, Ambra, Biggio, Battista, Roli, Fabio

arXiv.org Artificial Intelligence, Jun-14-2024

Having a large parameter space is considered one of the main suspects of neural networks' vulnerability to adversarial examples, i.e., input samples crafted ad hoc to induce a desired misclassification. The relevant literature has made contradictory claims both for and against the robustness of over-parameterized networks. These contradictory findings might be due to the failure of the attack employed to evaluate the networks' robustness. Previous research has demonstrated that, depending on the considered model, the algorithm employed to generate adversarial examples may not function properly, leading to overestimating the model's robustness. In this work, we empirically study the robustness of over-parameterized networks against adversarial examples. However, unlike previous works, we also evaluate the reliability of the considered attack to support the veracity of the results. Our results show that over-parameterized networks are robust against adversarial attacks, as opposed to their under-parameterized counterparts.

artificial intelligence, machine learning, robustness, (14 more...)

2406.1009

Country:

Europe > Italy (0.46)
North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

PERL: Parameter Efficient Reinforcement Learning from Human Feedback

Sidahmed, Hakim, Phatale, Samrat, Hutcheson, Alex, Lin, Zhuonan, Chen, Zhang, Yu, Zac, Jin, Jarvis, Komarytsia, Roman, Ahlheim, Christiane, Zhu, Yonghao, Chaudhary, Simral, Li, Bowen, Ganesh, Saravanan, Byrne, Bill, Hoffmann, Jessica, Mansoor, Hassan, Li, Wei, Rastogi, Abhinav, Dixon, Lucas

arXiv.org Artificial Intelligence, Mar-15-2024

Reinforcement Learning from Human Feedback (RLHF) has proven to be a strong method to align Pretrained Large Language Models (LLMs) with human preferences. But training models with RLHF is computationally expensive, and an overall complex process. In this work, we study RLHF where the underlying models are trained using the parameter efficient method of Low-Rank Adaptation (LoRA) introduced by Hu et al. [2021]. We investigate the setup of "Parameter Efficient Reinforcement Learning" (PERL), in which we perform reward model training and reinforcement learning using LoRA. We compare PERL to conventional fine-tuning (full-tuning) across various configurations for 7 benchmarks, including 2 novel datasets, of reward modeling and reinforcement learning. We find that PERL performs on par with the conventional RLHF setting, while training faster, and with less memory. This enables the high performance of RLHF, while reducing the computational burden that limits its adoption as an alignment technique for Large Language Models. We also release 2 novel thumbs up/down preference datasets: "Taskmaster Coffee", and "Taskmaster Ticketing" to promote research around RLHF.
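The Low-Rank Adaptation idea referenced in the abstract above can be illustrated with a minimal, dependency-free sketch. It is not the PERL implementation: the class name `LoRALinear`, the `alpha / rank` scaling, and the initialization (small random `A`, zero `B`) are assumptions that follow the common LoRA convention of keeping the pretrained weight frozen and learning only a low-rank update.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical linear layer computing y = W x + (alpha / r) * B (A x).

    W is the frozen pretrained weight; only A (r x d_in) and B (d_out x r)
    would be updated during training.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen, never updated
        # A starts small and random, B starts at zero, so the adapted layer
        # initially reproduces the pretrained layer exactly.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]

    def trainable_params(self):
        """Number of parameters in A and B (W is excluded because it is frozen)."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    layer = LoRALinear(identity, rank=1)
    print(layer.forward([1.0, 2.0, 3.0, 4.0]))
    print(layer.trainable_params(), "trainable vs", 4 * 4, "frozen")
```

The trainable parameter count grows as `rank * (d_in + d_out)` rather than `d_in * d_out`, which is where the memory savings of this family of methods come from.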

large language model, machine learning, reinforcement learning, (17 more...)

2403.10704

Country:

North America > United States (0.93)
Europe (0.67)

Genre: Research Report (1.00)

Industry:

Education (0.46)
Health & Medicine > Therapeutic Area (0.46)
Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Bearing-Angle Approach for Unknown Target Motion Analysis Based on Visual Measurements

Ning, Zian, Zhang, Yin, Li, Jianan, Chen, Zhang, Zhao, Shiyu

arXiv.org Artificial Intelligence, Jan-30-2024

Vision-based estimation of the motion of a moving target is usually formulated as a bearing-only estimation problem where the visual measurement is modeled as a bearing vector. Although the bearing-only approach has been studied for decades, a fundamental limitation of this approach is that it requires extra lateral motion of the observer to enhance the target's observability. Unfortunately, the extra lateral motion conflicts with the desired motion of the observer in many tasks. It is well known that, once a target has been detected in an image, a bounding box that surrounds the target can be obtained. Surprisingly, this common visual measurement, especially its size information, has not been well explored up to now. In this paper, we propose a new bearing-angle approach to estimate the motion of a target by modeling its image bounding box as bearing-angle measurements. Both theoretical analysis and experimental results show that this approach can significantly enhance the observability without relying on additional lateral motion of the observer. The benefit of the bearing-angle approach comes at no additional cost because a bounding box is a standard output of object detection algorithms. The approach simply exploits information that has not been fully exploited in the past. No additional sensing devices or special detection algorithms are required.
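To make the idea of treating a bounding box as a bearing-angle measurement concrete, here is a rough sketch under a pinhole camera model. It is an illustration, not the paper's formulation: the function names, the use of the box width along the center row, and the relation `size = 2 * range * tan(angle / 2)` (which assumes a target of known physical width facing the camera) are all assumptions.

```python
import math


def pixel_ray(u, v, fx, fy, cx, cy):
    """Unit vector in the camera frame pointing through pixel (u, v)."""
    x, y, z = (u - cx) / fx, (v - cy) / fy, 1.0
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)


def bearing_angle(box, fx, fy, cx, cy):
    """Convert a box (u_min, v_min, u_max, v_max) to (bearing, subtended angle).

    The bearing is the ray through the box center; the angle is measured
    between the rays through the left and right edges on the center row.
    """
    u_min, v_min, u_max, v_max = box
    u_c, v_c = (u_min + u_max) / 2.0, (v_min + v_max) / 2.0
    bearing = pixel_ray(u_c, v_c, fx, fy, cx, cy)
    left = pixel_ray(u_min, v_c, fx, fy, cx, cy)
    right = pixel_ray(u_max, v_c, fx, fy, cx, cy)
    cos_angle = sum(a * b for a, b in zip(left, right))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return bearing, math.acos(cos_angle)


def range_from_angle(angle, size):
    """Range implied by a subtended angle, assuming a known target width."""
    return size / (2.0 * math.tan(angle / 2.0))


if __name__ == "__main__":
    # Hypothetical intrinsics and detection, for illustration only.
    g, theta = bearing_angle((270.0, 200.0, 370.0, 280.0), 500.0, 500.0, 320.0, 240.0)
    print(g, theta, range_from_angle(theta, size=1.0))
```

The point of the sketch is that the angle carries range-related information that a bearing vector alone does not, which is consistent with the abstract's claim that observability improves without extra lateral motion.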

artificial intelligence, machine learning, observer, (18 more...)

doi: 10.1177/02783649241229172

2401.17117

Country:

Asia > China (0.28)
North America > United States > Rhode Island (0.14)

Genre: Research Report (0.84)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Graphics (0.90)
Information Technology > Artificial Intelligence > Vision (0.88)
(2 more...)

NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions

Chen, Zhang, Li, Zhong, Song, Liangchen, Chen, Lele, Yu, Jingyi, Yuan, Junsong, Xu, Yi

arXiv.org Artificial Intelligence, Sep-27-2023

We present a novel type of neural fields that uses general radial bases for signal representation. State-of-the-art neural fields typically rely on grid-based representations for storing local neural features and N-dimensional linear kernels for interpolating features at continuous query points. The spatial positions of their neural features are fixed on grid nodes and cannot adapt well to target signals. Our method instead builds upon general radial bases with flexible kernel position and shape, which have higher spatial adaptivity and can more closely fit target signals. To further improve the channel-wise capacity of radial basis functions, we propose to compose them with multi-frequency sinusoid functions. This technique extends a radial basis to multiple Fourier radial bases of different frequency bands without requiring extra parameters, facilitating the representation of details. Moreover, by marrying adaptive radial bases with grid-based ones, our hybrid combination inherits both adaptivity and interpolation smoothness. We carefully designed weighting schemes to let radial bases adapt to different types of signals effectively. Our experiments on 2D image and 3D signed distance field representation demonstrate that our method is more accurate and more compact than prior art. When applied to neural radiance field reconstruction, our method achieves state-of-the-art rendering quality, with small model size and comparable training speed.
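The composition of a radial basis with multi-frequency sinusoids described above might look roughly like the following sketch. The Gaussian kernel, the per-axis inverse scale as the "shape" parameter, and the form `sin(phi * m_j + b_j)` with fixed frequencies `m_j` and phases `b_j` are assumptions chosen for illustration; the paper's actual kernel and composition may differ.

```python
import math


def gaussian_rbf(x, center, inv_scale):
    """Anisotropic Gaussian radial basis with adaptive center and per-axis shape."""
    d2 = sum(((xi - ci) * si) ** 2 for xi, ci, si in zip(x, center, inv_scale))
    return math.exp(-d2)


def fourier_rbf_channels(x, center, inv_scale, freqs, phases):
    """Expand one radial basis value into several sinusoidal channels.

    The frequencies and phases are fixed constants here, not learned, so the
    expansion adds channels without adding trainable parameters.
    """
    phi = gaussian_rbf(x, center, inv_scale)
    return [math.sin(phi * m + b) for m, b in zip(freqs, phases)]


if __name__ == "__main__":
    freqs = [math.pi / 2.0, math.pi, 2.0 * math.pi]  # hypothetical bands
    phases = [0.0, 0.0, 0.0]
    print(fourier_rbf_channels((0.2, 0.1), (0.0, 0.0), (1.0, 3.0), freqs, phases))
```

Each channel responds to the same kernel but at a different frequency band, which is one plausible reading of how a single radial basis can represent finer detail.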

adaptive radial basis function, artificial intelligence, neural field representation, (1 more...)

2309.15426

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.60)
Information Technology > Artificial Intelligence > Vision (0.53)
