AITopics | Wang, Yan

Collaborating Authors

Wang, Yan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Causal Explainable Guardrails for Large Language Models

Chu, Zhixuan, Wang, Yan, Li, Longfei, Wang, Zhibo, Qin, Zhan, Ren, Kui

arXiv.org Artificial IntelligenceMay-7-2024

Large Language Models (LLMs) have shown impressive performance in natural language tasks, but their outputs can exhibit undesirable attributes or biases. Existing methods for steering LLMs towards desired attributes often assume unbiased representations and rely solely on steering prompts. However, the representations learned from pre-training can introduce semantic biases that influence the steering process, leading to suboptimal results. We propose LLMGuardaril, a novel framework that incorporates causal analysis and adversarial learning to obtain unbiased steering representations in LLMs. LLMGuardaril systematically identifies and blocks the confounding effects of biases, enabling the extraction of unbiased steering representations. Additionally, it includes an explainable component that provides insights into the alignment between the generated output and the desired direction. Experiments demonstrate LLMGuardaril's effectiveness in steering LLMs towards desired attributes while mitigating biases. Our work contributes to the development of safe and reliable LLMs that align with desired attributes. We discuss the limitations and future research directions, highlighting the need for ongoing research to address the ethical implications of large language models.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2405.0416

Country:

North America > United States (0.14)
North America > Canada (0.14)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Language-Image Models with 3D Understanding

Cho, Jang Hyun, Ivanovic, Boris, Cao, Yulong, Schmerling, Edward, Wang, Yue, Weng, Xinshuo, Li, Boyi, You, Yurong, Krähenbühl, Philipp, Wang, Yan, Pavone, Marco

arXiv.org Artificial IntelligenceMay-6-2024

Multi-modal large language models (MLLMs) have shown incredible capabilities in a variety of 2D vision and language tasks. We extend MLLMs' perceptual capabilities to ground and reason about images in 3-dimensional space. To that end, we first develop a large-scale pre-training dataset for 2D and 3D called LV3D by combining multiple existing 2D and 3D recognition datasets under a common task formulation: as multi-turn question-answering. Next, we introduce a new MLLM named Cube-LLM and pre-train it on LV3D. We show that pure data scaling makes a strong 3D perception capability without 3D specific architectural design or training objective. Cube-LLM exhibits intriguing properties similar to LLMs: (1) Cube-LLM can apply chain-of-thought prompting to improve 3D understanding from 2D context information. (2) Cube-LLM can follow complex and diverse instructions and adapt to versatile input and output formats. (3) Cube-LLM can be visually prompted such as 2D box or a set of candidate 3D boxes from specialists. Our experiments on outdoor benchmarks demonstrate that Cube-LLM significantly outperforms existing baselines by 21.3 points of AP-BEV on the Talk2Car dataset for 3D grounded reasoning and 17.7 points on the DriveLM dataset for complex reasoning about driving scenarios, respectively. Cube-LLM also shows competitive results in general MLLM benchmarks such as refCOCO for 2D grounding with (87.0) average score, as well as visual question answering benchmarks such as VQAv2, GQA, SQA, POPE, etc. for complex reasoning. Our project is available at https://janghyuncho.github.io/Cube-LLM.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2405.03685

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.81)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Multi-stream Transmission for Directional Modulation Network via Distributed Multi-UAV-aided Multi-active-IRS

Yang, Ke, Dong, Rongen, Gao, Wei, Shu, Feng, Shi, Weiping, Wang, Yan, Wang, Xuehui, Wang, Jiangzhou

arXiv.org Artificial IntelligenceApr-28-2024

Active intelligent reflecting surface (IRS) is a revolutionary technique for the future 6G networks. The conventional far-field single-IRS-aided directional modulation(DM) networks have only one (no direct path) or two (existing direct path) degrees of freedom (DoFs). This means that there are only one or two streams transmitted simultaneously from base station to user and will seriously limit its rate gain achieved by IRS. How to create multiple DoFs more than two for DM? In this paper, single large-scale IRS is divided to multiple small IRSs and a novel multi-IRS-aided multi-stream DM network is proposed to achieve a point-to-point multi-stream transmission by creating $K$ ($\geq3$) DoFs, where multiple small IRSs are placed distributively via multiple unmanned aerial vehicles (UAVs). The null-space projection, zero-forcing (ZF) and phase alignment are adopted to design the transmit beamforming vector, receive beamforming vector and phase shift matrix (PSM), respectively, called NSP-ZF-PA. Here, $K$ PSMs and their corresponding beamforming vectors are independently optimized. The weighted minimum mean-square error (WMMSE) algorithm is involved in alternating iteration for the optimization variables by introducing the power constraint on IRS, named WMMSE-PC, where the majorization-minimization (MM) algorithm is used to solve the total PSM. To achieve a lower computational complexity, a maximum trace method, called Max-TR-SVD, is proposed by optimize the PSM of all IRSs. Numerical simulation results has shown that the proposed NSP-ZF-PA performs much better than Max-TR-SVD in terms of rate. In particular, the rate of NSP-ZF-PA with sixteen small IRSs is about five times that of NSP-ZF-PA with combining all small IRSs as a single large IRS. Thus, a dramatic rate enhancement may be achieved by multiple distributed IRSs.

artificial intelligence, irs, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2404.15297

Country: North America > United States (1.00)

Genre: Research Report (0.69)

Industry:

Government > Tax (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.70)
Information Technology > Communications > Networks (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

Task-Aware Encoder Control for Deep Video Compression

Ge, Xingtong, Luo, Jixiang, Zhang, Xinjie, Xu, Tongda, Lu, Guo, He, Dailan, Geng, Jing, Wang, Yan, Zhang, Jun, Qin, Hongwei

arXiv.org Artificial IntelligenceApr-20-2024

Prior research on deep video compression (DVC) for machine tasks typically necessitates training a unique codec for each specific task, mandating a dedicated decoder per task. In contrast, traditional video codecs employ a flexible encoder controller, enabling the adaptation of a single codec to different tasks through mechanisms like mode prediction. Drawing inspiration from this, we introduce an innovative encoder controller for deep video compression for machines. This controller features a mode prediction and a Group of Pictures (GoP) selection module. Our approach centralizes control at the encoding stage, allowing for adaptable encoder adjustments across different tasks, such as detection and tracking, while maintaining compatibility with a standard pre-trained DVC decoder. Empirical evidence demonstrates that our method is applicable across multiple tasks with various existing pre-trained DVCs. Moreover, extensive experiments demonstrate that our method outperforms previous DVC by about 25% bitrate for different tasks, with only one pre-trained decoder.

artificial intelligence, image understanding, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2404.04848

Country: Asia > China (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.46)

Add feedback

RoT: Enhancing Large Language Models with Reflection on Search Trees

Hui, Wenyang, Jiang, Chengyue, Wang, Yan, Tu, Kewei

arXiv.org Artificial IntelligenceApr-11-2024

Large language models (LLMs) have demonstrated impressive capability in reasoning and planning when integrated with tree-search-based prompting methods. However, since these methods ignore the previous search experiences, they often make the same mistakes in the search process. To address this issue, we introduce Reflection on search Trees (RoT), an LLM reflection framework designed to improve the performance of tree-search-based prompting methods. It uses a strong LLM to summarize guidelines from previous tree search experiences to enhance the ability of a weak LLM. The guidelines are instructions about solving this task through tree search which can prevent the weak LLMs from making similar mistakes in the past search process. In addition, we proposed a novel state selection method, which identifies the critical information from historical search processes to help RoT generate more specific and meaningful guidelines. In our extensive experiments, we find that RoT significantly improves the performance of LLMs in reasoning or planning tasks with various tree-search-based prompting methods (e.g., BFS and MCTS). Non-tree-search-based prompting methods such as Chain-of-Thought (CoT) can also benefit from RoT guidelines since RoT can provide task-specific knowledge collected from the search experience.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2404.05449

Country:

North America > United States (0.14)
North America > Canada (0.14)
Africa > Rwanda (0.14)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Protein Conformation Generation via Force-Guided SE(3) Diffusion Models

Wang, Yan, Wang, Lihao, Shen, Yuning, Wang, Yiqun, Yuan, Huizhuo, Wu, Yue, Gu, Quanquan

arXiv.org Artificial IntelligenceMar-20-2024

The conformational landscape of proteins is crucial to understanding their functionality in complex biological processes. Traditional physics-based computational methods, such as molecular dynamics (MD) simulations, suffer from rare event sampling and long equilibration time problems, hindering their applications in general protein systems. Recently, deep generative modeling techniques, especially diffusion models, have been employed to generate novel protein conformations. However, existing score-based diffusion methods cannot properly incorporate important physical prior knowledge to guide the generation process, causing large deviations in the sampled protein conformations from the equilibrium distribution. In this paper, to overcome these limitations, we propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation. By incorporating a force-guided network with a mixture of data-based score models, ConfDiff can can generate protein conformations with rich diversity while preserving high fidelity. Experiments on a variety of protein conformation prediction tasks, including 12 fast-folding proteins and the Bovine Pancreatic Trypsin Inhibitor (BPTI), demonstrate that our method surpasses the state-of-the-art method.

artificial intelligence, conformation, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2403.14088

Country: North America > United States > California (0.14)

Genre: Research Report > Promising Solution (0.48)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Add feedback

Content-aware Masked Image Modeling Transformer for Stereo Image Compression

Zhang, Xinjie, Gao, Shenyuan, Liu, Zhening, Shao, Jiawei, Ge, Xingtong, He, Dailan, Xu, Tongda, Wang, Yan, Zhang, Jun

arXiv.org Artificial IntelligenceMar-19-2024

Existing learning-based stereo image codec adopt sophisticated transformation with simple entropy models derived from single image codecs to encode latent representations. However, those entropy models struggle to effectively capture the spatial-disparity characteristics inherent in stereo images, which leads to suboptimal rate-distortion results. In this paper, we propose a stereo image compression framework, named CAMSIC. CAMSIC independently transforms each image to latent representation and employs a powerful decoder-free Transformer entropy model to capture both spatial and disparity dependencies, by introducing a novel content-aware masked image modeling (MIM) technique. Our content-aware MIM facilitates efficient bidirectional interaction between prior information and estimated tokens, which naturally obviates the need for an extra Transformer decoder. Experiments show that our stereo image codec achieves state-of-the-art rate-distortion performance on two stereo image datasets Cityscapes and InStereo2K with fast encoding and decoding speed.

artificial intelligence, entropy model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2403.08505

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Industry: Information Technology (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Boosting Neural Representations for Videos with a Conditional Decoder

Zhang, Xinjie, Yang, Ren, He, Dailan, Ge, Xingtong, Xu, Tongda, Wang, Yan, Qin, Hongwei, Zhang, Jun

arXiv.org Artificial IntelligenceMar-16-2024

Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing, showing remarkable versatility across various video tasks. However, existing methods often fail to fully leverage their representation capabilities, primarily due to inadequate alignment of intermediate features during target frame decoding. This paper introduces a universal boosting framework for current implicit video representation approaches. Specifically, we utilize a conditional decoder with a temporal-aware affine transform module, which uses the frame index as a prior condition to effectively align intermediate features with target frames. Besides, we introduce a sinusoidal NeRV-like block to generate diverse intermediate features and achieve a more balanced parameter distribution, thereby enhancing the model's capacity. With a high-frequency information-preserving reconstruction loss, our approach successfully boosts multiple baseline INRs in the reconstruction quality and convergence speed for video regression, and exhibits superior inpainting and interpolation results. Further, we integrate a consistent entropy minimization technique and develop video codecs based on these boosted INRs. Experiments on the UVG dataset confirm that our enhanced codecs significantly outperform baseline INRs and offer competitive rate-distortion performance compared to traditional and learning-based codecs. Code is available at https://github.com/Xinjie-Q/Boosting-NeRV.

artificial intelligence, compression, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2402.18152

Country: Asia > China (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Few-shot Learning on Heterogeneous Graphs: Challenges, Progress, and Prospects

Ding, Pengfei, Wang, Yan, Liu, Guanfeng

arXiv.org Artificial IntelligenceMar-9-2024

Few-shot learning on heterogeneous graphs (FLHG) is attracting more attention from both academia and industry because prevailing studies on heterogeneous graphs often suffer from label sparsity. FLHG aims to tackle the performance degradation in the face of limited annotated data and there have been numerous recent studies proposing various methods and applications. In this paper, we provide a comprehensive review of existing FLHG methods, covering challenges, research progress, and future prospects. Specifically, we first formalize FLHG and categorize its methods into three types: single-heterogeneity FLHG, dual-heterogeneity FLHG, and multi-heterogeneity FLHG. Then, we analyze the research progress within each category, highlighting the most recent and representative developments. Finally, we identify and discuss promising directions for future research in FLHG. To the best of our knowledge, this paper is the first systematic and comprehensive review of FLHG.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2403.13834

Genre: Overview (1.00)

Industry: Information Technology (0.69)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Add feedback

EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

Wang, Zhe, Fan, Siqi, Huo, Xiaoliang, Xu, Tongda, Wang, Yan, Liu, Jingjing, Chen, Yilun, Zhang, Ya-Qin

arXiv.org Artificial IntelligenceFeb-23-2024

In autonomous driving, cooperative perception makes use of multi-view cameras from both vehicles and infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint. Currently, two major challenges persist in vehicle-infrastructure cooperative 3D (VIC3D) object detection: $1)$ inherent pose errors when fusing multi-view images, caused by time asynchrony across cameras; $2)$ information loss in transmission process resulted from limited communication bandwidth. To address these issues, we propose a novel camera-based 3D detection framework for VIC3D task, Enhanced Multi-scale Image Feature Fusion (EMIFF). To fully exploit holistic perspectives from both vehicles and infrastructure, we propose Multi-scale Cross Attention (MCA) and Camera-aware Channel Masking (CCM) modules to enhance infrastructure and vehicle features at scale, spatial, and channel levels to correct the pose error introduced by camera asynchrony. We also introduce a Feature Compression (FC) module with channel and spatial compression blocks for transmission efficiency. Experiments show that EMIFF achieves SOTA on DAIR-V2X-C datasets, significantly outperforming previous early-fusion and late-fusion methods with comparable transmission costs.

artificial intelligence, detection, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2402.15272

Country:

Asia > China (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (0.35)
Automobiles & Trucks (0.35)
Information Technology (0.35)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Add feedback