AITopics | Gao, Wen

Collaborating Authors

Gao, Wen

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

OpenEarthSensing: Large-Scale Fine-Grained Benchmark for Open-World Remote Sensing

Xiang, Xiang, Xu, Zhuo, Deng, Yao, Zhou, Qinhao, Liang, Yifan, Chen, Ke, Zheng, Qingfang, Wang, Yaowei, Chen, Xilin, Gao, Wen

arXiv.org Artificial Intelligence, Feb-27-2025

In open-world remote sensing, deployed models must continuously adapt to a steady influx of new data, which often exhibits various shifts relative to the data the model encountered during training. To handle the new data effectively, models are required to detect semantic shifts, adapt to covariate shifts, and continuously update themselves. These challenges give rise to a variety of open-world tasks. However, existing open-world remote sensing studies typically train and test within a single dataset to simulate open-world conditions, and there is currently no large-scale benchmark capable of evaluating multiple open-world tasks. In this paper, we introduce OpenEarthSensing, a large-scale fine-grained benchmark for open-world remote sensing. OpenEarthSensing includes 189 scene and object categories, covering the vast majority of potential semantic shifts that may occur in the real world. Additionally, OpenEarthSensing encompasses five data domains with significant covariate shifts: two RGB satellite domains, one RGB aerial domain, one MS RGB domain, and one infrared domain. These domains provide a more comprehensive testbed for evaluating the generalization performance of open-world models. We conduct baseline evaluations of current mainstream open-world tasks and methods on OpenEarthSensing, demonstrating that it serves as a challenging benchmark for open-world remote sensing.

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2502.20668

Country:

Asia > China (0.46)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.50)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI

Liu, Yang, Chen, Weixing, Bai, Yongjie, Luo, Jingzhou, Song, Xinshuai, Jiang, Kaixuan, Li, Zhida, Zhao, Ganlong, Lin, Junyi, Li, Guanbin, Gao, Wen, Lin, Liang

arXiv.org Artificial Intelligence, Jul-18-2024

Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) has attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities, making them a promising architecture for the brain of embodied agents. However, there is no comprehensive survey of Embodied AI in the era of MLMs. In this survey, we comprehensively explore the latest advancements in Embodied AI. Our analysis first surveys representative cutting-edge work on embodied robots and simulators to understand the research focuses and their limitations. Then, we analyze four main research targets: 1) embodied perception, 2) embodied interaction, 3) embodied agent, and 4) sim-to-real adaptation, covering state-of-the-art methods, essential paradigms, and comprehensive datasets. Additionally, we explore the complexities of MLMs in virtual and real embodied agents, highlighting their significance in facilitating interactions in dynamic digital and physical environments. Finally, we summarize the challenges and limitations of Embodied AI and discuss potential future directions. We hope this survey will serve as a foundational reference for the research community and inspire continued innovation. The associated project can be found at https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List.

large language model, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

2407.06886

Country: Asia > China > Guangdong Province (0.14)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.65)

Industry:

Health & Medicine (1.00)
Education > Educational Setting (0.67)
Information Technology > Robotics & Automation (0.45)
Energy > Oil & Gas (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)
Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Understanding is Compression

Li, Ziguang, Huang, Chao, Wang, Xuliang, Hu, Haibo, Wyeth, Cole, Bu, Dongbo, Yu, Quan, Gao, Wen, Liu, Xingwu, Li, Ming

arXiv.org Artificial Intelligence, Jun-23-2024

We have previously shown that, under reasonable assumptions, all understanding or learning is compression. In principle, a better understanding of data should yield better data compression. Traditional compression methods focus on encoding frequencies or other computable properties of data. Large language models approximate the uncomputable Solomonoff distribution, opening up a new avenue for justifying our theory. Under this new uncomputable paradigm, we present LMCompress, which compresses data based on how well large models understand it. LMCompress achieves significantly better lossless compression ratios than all other lossless data compression methods, doubling the compression ratios of JPEG-XL for images, FLAC for audio, and H.264 for video, and tripling or quadrupling the compression ratio of bz2 for text. The better a large model understands the data, the better LMCompress compresses it.
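The link between prediction and compression can be made concrete. A standard way to turn a predictive model into a lossless compressor is arithmetic coding, whose output comes within about two bits of the sum of -log2 p(symbol | context) over the input. The sketch below does not reproduce LMCompress and uses no large model; as an illustrative assumption, it swaps in tiny adaptive count-based context models (the helper `adaptive_code_length` is hypothetical) to show that a model capturing more structure assigns fewer ideal bits to the same text.

```python
import math
from collections import Counter, defaultdict


def adaptive_code_length(text: str, alphabet: str, order: int) -> float:
    """Ideal code length in bits for `text` under an adaptive order-`order`
    context model with add-one (Laplace) smoothing.

    Hypothetical helper for illustration only: counts are updated after each
    symbol, so a decoder could replay exactly the same predictions."""
    counts = defaultdict(Counter)  # context string -> counts of the symbols that followed it
    bits = 0.0
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]  # empty string when order == 0
        seen = counts[ctx]
        p = (seen[ch] + 1) / (sum(seen.values()) + len(alphabet))
        bits -= math.log2(p)  # cost an ideal arithmetic coder would pay for this symbol
        seen[ch] += 1
    return bits


if __name__ == "__main__":
    text = "ab" * 16  # 32 symbols with strong sequential structure
    uniform = len(text) * math.log2(2)  # no model at all: 1 bit per binary symbol
    order0 = adaptive_code_length(text, "ab", order=0)  # tracks symbol frequencies only
    order1 = adaptive_code_length(text, "ab", order=1)  # also conditions on the previous symbol
    print(f"uniform: {uniform:.1f} bits, order-0: {order0:.1f} bits, order-1: {order1:.1f} bits")
```

Here the order-1 model, which "understands" that each symbol predicts the next, needs far fewer bits than the 32-bit uniform baseline, while the frequency-only order-0 model spends slightly more than the baseline. This mirrors, in miniature, the claim that better understanding yields better compression.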

artificial intelligence, compression, natural language

arXiv.org Artificial Intelligence

2407.07723

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.53)

VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model

Zhang, Jie, Wang, Sibo, Cao, Xiangkui, Yuan, Zheng, Shan, Shiguang, Chen, Xilin, Gao, Wen

arXiv.org Artificial Intelligence, Jun-20-2024

The emergence of Large Vision-Language Models (LVLMs) marks significant strides towards achieving general artificial intelligence. However, these advancements are tempered by outputs that often reflect biases, a concern that has not yet been extensively investigated. Existing benchmarks are not sufficiently comprehensive in evaluating biases due to their limited data scale, single questioning format, and narrow sources of bias. To address this problem, we introduce VLBiasBench, a benchmark aimed at comprehensively evaluating biases in LVLMs. In VLBiasBench, we construct a dataset encompassing nine distinct categories of social bias (age, disability status, gender, nationality, physical appearance, race, religion, profession, and socioeconomic status) and two intersectional bias categories (race x gender, and race x socioeconomic status). To create a large-scale dataset, we use the Stable Diffusion XL model to generate 46,848 high-quality images, which are combined with different questions to form 128,342 samples. These questions are divided into open-ended and closed-ended types, fully considering the sources of bias and evaluating the biases of LVLMs from multiple perspectives. We subsequently conduct extensive evaluations on 15 open-source models as well as one advanced closed-source model, providing new insights into the biases revealed by these models. Our benchmark is available at https://github.com/Xiangkui-Cao/VLBiasBench.

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2406.14194

Country:

Africa (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.81)

Industry:

Health & Medicine (0.92)
Media > Music (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Stream State-tying for Sign Language Recognition

Ma, Jiyong, Gao, Wen, Wang, Chunli

arXiv.org Artificial Intelligence, Apr-21-2024

Sign language is a visual language conveyed through hand and arm movements, accompanied by facial expression and lip motion. Facial expression and lip motion are less important than hand gestures in sign language, but they can help in understanding some hand gestures. Digitized devices can be used to measure the temporal and spatial information of hand gestures; typical devices are data gloves and position trackers. In this paper, we use two CyberGloves and a Polhemus 3SPACE position tracker as input devices to measure gestures, with two receivers positioned on the wrists (one per CyberGlove) and one fixed at the thorax. Chinese sign language is classified into two categories. One is hand gestures, in which each gesture corresponds to a Chinese phrase. The other is fingerspelling, in which each letter of the alphabet corresponds to a posture, and each Chinese sign corresponds to several postures performed continuously.

artificial intelligence, probability, recognition

arXiv.org Artificial Intelligence

2407.10975

Country: Asia > China (0.47)

Genre: Research Report (0.64)

Industry: Education > Curriculum > Subject-Specific Education (0.87)

Technology: Information Technology > Artificial Intelligence > Vision > Gesture Recognition (1.00)

IME: Integrating Multi-curvature Shared and Specific Embedding for Temporal Knowledge Graph Completion

Wang, Jiapu, Cui, Zheng, Wang, Boyue, Pan, Shirui, Gao, Junbin, Yin, Baocai, Gao, Wen

arXiv.org Artificial Intelligence, Mar-28-2024

Temporal Knowledge Graphs (TKGs) incorporate a temporal dimension, allowing them to precisely capture the evolution of knowledge and reflect the dynamic nature of the real world. TKGs typically contain complex, interwoven geometric structures. However, existing Temporal Knowledge Graph Completion (TKGC) methods either model TKGs in a single space or neglect the heterogeneity of different curvature spaces, which constrains their capacity to capture these intricate geometric structures. In this paper, we propose a novel Integrating Multi-curvature shared and specific Embedding (IME) model for TKGC tasks. Concretely, IME models TKGs in multi-curvature spaces, including hyperspherical, hyperbolic, and Euclidean spaces. IME then incorporates two key properties: a space-shared property and a space-specific property. The space-shared property facilitates learning commonalities across different curvature spaces and alleviates the spatial gap caused by the heterogeneous nature of multi-curvature spaces, while the space-specific property captures characteristic features. Meanwhile, IME proposes an Adjustable Multi-curvature Pooling (AMP) approach to effectively retain important information. Furthermore, IME designs similarity, difference, and structure loss functions to attain these objectives. Experimental results demonstrate the superior performance of IME over existing state-of-the-art TKGC models.
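The abstract names three curvature spaces without defining how distance behaves in each. As background, the sketch below computes geodesic distance under standard textbook models, which is an assumption rather than IME's own parameterization (the abstract does not give it): Euclidean distance (zero curvature), the Poincaré ball model of hyperbolic space (curvature -1), and great-circle distance on the unit hypersphere (curvature +1). It does not implement IME's shared/specific embeddings, AMP pooling, or loss functions, and the function names are hypothetical.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _norm(v: Vec) -> float:
    return math.sqrt(sum(x * x for x in v))


def euclidean_dist(x: Vec, y: Vec) -> float:
    """Straight-line distance in flat space (curvature 0)."""
    return _norm([a - b for a, b in zip(x, y)])


def poincare_dist(x: Vec, y: Vec) -> float:
    """Geodesic distance in the Poincare ball (curvature -1); requires ||x||, ||y|| < 1."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1 - _norm(x) ** 2) * (1 - _norm(y) ** 2)
    return math.acosh(1 + 2 * sq / denom)


def spherical_dist(x: Vec, y: Vec) -> float:
    """Great-circle distance between the directions of x and y on the unit sphere (curvature +1)."""
    cos = sum(a * b for a, b in zip(x, y)) / (_norm(x) * _norm(y))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp guards against rounding error


if __name__ == "__main__":
    x, y = [0.9, 0.0], [0.0, 0.9]  # two points near the boundary of the unit ball
    print("euclidean:", euclidean_dist(x, y))
    print("poincare: ", poincare_dist(x, y))
    print("spherical:", spherical_dist(x, y))
```

Near the boundary of the ball, hyperbolic distance grows much faster than Euclidean distance, which is why hyperbolic spaces are often used for tree-like, hierarchical structure, while spherical distance depends only on direction (here, a right angle). A multi-curvature model can exploit these different geometries side by side.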

machine learning, relation, temporal reasoning

arXiv.org Artificial Intelligence

2403.19881

Country:

Asia > Singapore (0.18)
Asia > China (0.16)
Oceania > Australia (0.14)
North America > United States (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (0.83)

AI Alignment: A Comprehensive Survey

Ji, Jiaming, Qiu, Tianyi, Chen, Boyuan, Zhang, Borong, Lou, Hantao, Wang, Kaile, Duan, Yawen, He, Zhonghao, Zhou, Jiayi, Zhang, Zhaowei, Zeng, Fanzhi, Ng, Kwan Yee, Dai, Juntao, Pan, Xuehai, O'Gara, Aidan, Lei, Yingshan, Xu, Hua, Tse, Brian, Fu, Jie, McAleer, Stephen, Yang, Yaodong, Wang, Yizhou, Zhu, Song-Chun, Guo, Yike, Gao, Wen

arXiv.org Artificial Intelligence, Jan-2-2024

AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do the risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey we delve into the core concepts, methodology, and practice of alignment. First, we identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality (RICE). Guided by these four principles, we outline the landscape of current alignment research and decompose it into two key components: forward alignment and backward alignment. The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks. On forward alignment, we discuss techniques for learning from feedback and learning under distribution shift. On backward alignment, we discuss assurance techniques and governance practices. We also release and continually update a website (www.alignmentsurvey.com) featuring tutorials, collections of papers, blog posts, and other resources.

large language model, machine learning, simulation of human behavior

arXiv.org Artificial Intelligence

2310.19852

Country:

North America > United States > California (1.00)
Asia > Middle East (0.67)
Europe > United Kingdom > England > Greater London > London (0.27)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Transportation (1.00)
Social Sector (1.00)
Law (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)

Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

Wang, Xiao, Chen, Guangyao, Qian, Guangwu, Gao, Pengcheng, Wei, Xiao-Yong, Wang, Yaowei, Tian, Yonghong, Gao, Wen

arXiv.org Artificial Intelligence, Oct-31-2023

With the urgent demand for generalized deep models, many large pre-trained models have been proposed, such as BERT, ViT, and GPT. Inspired by the success of these models in single domains (like computer vision and natural language processing), multi-modal pre-trained big models have also drawn increasing attention in recent years. In this work, we give a comprehensive survey of these models and hope this paper can provide new insights and help new researchers track the most cutting-edge work. Specifically, we first introduce the background of multi-modal pre-training by reviewing conventional deep learning and pre-training work in natural language processing, computer vision, and speech. Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-training models (MM-PTMs), and discuss MM-PTMs with a focus on data, objectives, network architectures, and knowledge-enhanced pre-training. After that, we introduce the downstream tasks used to validate large-scale MM-PTMs, including generative, classification, and regression tasks. We also visualize and analyze the model parameters and results on representative downstream tasks. Finally, we point out possible research directions for this topic that may benefit future work. In addition, we maintain a continuously updated paper list for large-scale pre-trained multi-modal big models: https://github.com/wangxiao5791509/MultiModal_BigModels_Survey

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2302.10035

Country:

Asia > China (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Overview (1.00)

Industry:

Leisure & Entertainment (1.00)
Education (0.92)
Media (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Intelligence-Endogenous Management Platform for Computing and Network Convergence

Hong, Zicong, Qiu, Xiaoyu, Lin, Jian, Chen, Wuhui, Yu, Yue, Wang, Hui, Guo, Song, Gao, Wen

arXiv.org Artificial Intelligence, Aug-7-2023

Massive emerging applications are driving demand for the ubiquitous deployment of computing power today. This trend not only spurs the recent popularity of Computing and Network Convergence (CNC), but also creates an urgent need for an intelligent management platform to coordinate changing resources and tasks in the CNC. Therefore, in this article, we present the concept of an intelligence-endogenous management platform for CNCs, called the CNC brain, based on artificial intelligence technologies. It aims to efficiently and automatically match highly heterogeneous supply and demand in a CNC via four key building blocks, i.e., perception, scheduling, adaptation, and governance, throughout the CNC's life cycle. We present their functionalities, goals, and challenges. To examine the effectiveness of the proposed concept and framework, we also implement a prototype of the CNC brain based on deep reinforcement learning. The prototype is evaluated on a CNC testbed that integrates two popular open-source frameworks (OpenFaaS and Kubernetes) and a real-world business dataset provided by Microsoft Azure. The evaluation results demonstrate the proposed method's effectiveness in terms of resource utilization and performance. Finally, we highlight future research directions for the CNC brain.

cloud computing, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

2308.0345

Country: Asia (0.14)

Genre: Research Report (0.84)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

A Survey on Temporal Knowledge Graph Completion: Taxonomy, Progress, and Prospects

Wang, Jiapu, Wang, Boyue, Qiu, Meikang, Pan, Shirui, Xiong, Bo, Liu, Heng, Luo, Linhao, Liu, Tengfei, Hu, Yongli, Yin, Baocai, Gao, Wen

arXiv.org Artificial Intelligence, Aug-4-2023

Temporal characteristics are prominently evident in a substantial volume of knowledge, which underscores the pivotal role of Temporal Knowledge Graphs (TKGs) in both academia and industry. However, TKGs often suffer from incompleteness for three main reasons: the continuous emergence of new knowledge, the weakness of algorithms for extracting structured information from unstructured data, and the lack of information in source datasets. Thus, the task of Temporal Knowledge Graph Completion (TKGC), which aims to predict missing items based on the available information, has attracted increasing attention. In this paper, we provide a comprehensive review of TKGC methods and their details. Specifically, this paper consists of three main components: 1) Background, which covers the preliminaries of TKGC methods, the loss functions required for training, and the datasets and evaluation protocols; 2) Interpolation, which estimates and predicts missing elements or sets of elements from the relevant available information, with the corresponding TKGC methods further categorized by how they process temporal information; and 3) Extrapolation, which typically focuses on continuous TKGs and predicts future events, with all extrapolation methods classified by the algorithms they use. We further pinpoint the challenges and discuss future research directions for TKGC.
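To make the interpolation/extrapolation split concrete: a TKG is commonly represented as a set of (subject, relation, object, timestamp) quadruples, and a completion query leaves one element missing. The sketch below, with hypothetical names and toy data, labels a query by where its timestamp falls relative to the observed time span and answers a query with a simple frequency heuristic (the most frequent past object for the same subject and relation). The heuristic is an illustrative assumption, not one of the surveyed TKGC methods.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class Quad(NamedTuple):
    subject: str
    relation: str
    obj: str
    time: int


def query_setting(graph: List[Quad], t: int) -> str:
    """Label a query at timestamp t relative to the observed span (graph must be non-empty)."""
    times = [q.time for q in graph]
    if t > max(times):
        return "extrapolation"  # predicting future events
    if t >= min(times):
        return "interpolation"  # filling gaps inside the observed span
    return "before-span"  # hypothetical label for timestamps before any observation


def frequency_baseline(graph: List[Quad], subject: str, relation: str, t: int) -> Optional[str]:
    """Answer (subject, relation, ?, t) with the object seen most often for this
    subject and relation strictly before t; None if there is no history."""
    history = Counter(
        q.obj for q in graph
        if q.subject == subject and q.relation == relation and q.time < t
    )
    return history.most_common(1)[0][0] if history else None


if __name__ == "__main__":
    tkg = [
        Quad("Alice", "visits", "Paris", 1),
        Quad("Alice", "visits", "Rome", 2),
        Quad("Alice", "visits", "Paris", 4),
    ]
    print(query_setting(tkg, 3))  # interpolation: inside the span [1, 4]
    print(query_setting(tkg, 5))  # extrapolation: after the span
    print(frequency_baseline(tkg, "Alice", "visits", 5))  # Paris
```

Restricting the heuristic to facts strictly before the query time matters for extrapolation, where no future facts may be used; interpolation methods, by contrast, may draw on observations on both sides of the missing timestamp.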

information, machine learning, temporal reasoning

arXiv.org Artificial Intelligence

2308.02457

Country:

North America > United States (0.28)
Asia > China (0.28)
Oceania > Australia (0.28)

Genre: Overview (1.00)

Industry:

Health & Medicine (0.46)
Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.87)
