AITopics | Li, Xinyang

Collaborating Authors

Li, Xinyang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization

Shen, You, Zhang, Zhipeng, Li, Xinyang, Qu, Yansong, Lin, Yu, Zhang, Shengchuan, Cao, Liujuan

arXiv.org Artificial IntelligenceMar-2-2025

Representing 3D scenes from multiview images is a core challenge in computer vision and graphics, which requires both precise rendering and accurate reconstruction. Recently, 3D Gaussian Splatting (3DGS) has garnered significant attention for its high-quality rendering and fast inference speed. Yet, due to the unstructured and irregular nature of Gaussian point clouds, ensuring accurate geometry reconstruction remains difficult. Existing methods primarily focus on geometry regularization, with common approaches including primitive-based and dual-model frameworks. However, the former suffers from inherent conflicts between rendering and reconstruction, while the latter is computationally and storage-intensive. To address these challenges, we propose CarGS, a unified model leveraging Contribution-adaptive regularization to achieve simultaneous, high-quality rendering and surface reconstruction. The essence of our framework is learning adaptive contribution for Gaussian primitives by squeezing the knowledge from geometry regularization into a compact MLP. Additionally, we introduce a geometry-guided densification strategy with clues from both normals and Signed Distance Fields (SDF) to improve the capability of capturing high-frequency details. Our design improves the mutual learning of the two tasks, meanwhile its unified structure does not require separate models as in dual-model based approaches, guaranteeing efficiency. Extensive experiments demonstrate the ability to achieve state-of-the-art (SOTA) results in both rendering fidelity and reconstruction accuracy while maintaining real-time speed and minimal storage size.

artificial intelligence, machine learning, reconstruction, (18 more...)

arXiv.org Artificial Intelligence

2503.00881

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.34)

Add feedback

Learning to Predict Global Atrial Fibrillation Dynamics from Sparse Measurements

Jenkins, Alexander, Cini, Andrea, Barker, Joseph, Sharp, Alexander, Sau, Arunashis, Valentine, Varun, Valasang, Srushti, Li, Xinyang, Wong, Tom, Betts, Timothy, Mandic, Danilo, Alippi, Cesare, Ng, Fu Siong

arXiv.org Artificial IntelligenceFeb-14-2025

Catheter ablation of Atrial Fibrillation (AF) consists of a one-size-fits-all treatment with limited success in persistent AF. This may be due to our inability to map the dynamics of AF with the limited resolution and coverage provided by sequential contact mapping catheters, preventing effective patient phenotyping for personalised, targeted ablation. Here we introduce FibMap, a graph recurrent neural network model that reconstructs global AF dynamics from sparse measurements. Trained and validated on 51 non-contact whole atria recordings, FibMap reconstructs whole atria dynamics from 10% surface coverage, achieving a 210% lower mean absolute error and an order of magnitude higher performance in tracking phase singularities compared to baseline methods. Clinical utility of FibMap is demonstrated on real-world contact mapping recordings, achieving reconstruction fidelity comparable to non-contact mapping. FibMap's state-spaces and patient-specific parameters offer insights for electrophenotyping AF. Integrating FibMap into clinical practice could enable personalised AF care and improve outcomes.

artificial intelligence, machine learning, predict global atrial fibrillation dynamic, (1 more...)

arXiv.org Artificial Intelligence

2502.09473

Genre: Research Report (0.69)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)

Add feedback

Generative AI for Cel-Animation: A Survey

Tang, Yunlong, Guo, Junjia, Liu, Pinxin, Wang, Zhiyuan, Hua, Hang, Zhong, Jia-Xing, Xiao, Yunzhong, Huang, Chao, Song, Luchuan, Liang, Susan, Song, Yizhi, He, Liu, Bi, Jing, Feng, Mingqian, Li, Xinyang, Zhang, Zeliang, Xu, Chenliang

arXiv.org Artificial IntelligenceJan-8-2025

Traditional Celluloid (Cel) Animation production pipeline encompasses multiple essential steps, including storyboarding, layout design, keyframe animation, inbetweening, and colorization, which demand substantial manual effort, technical expertise, and significant time investment. These challenges have historically impeded the efficiency and scalability of Cel-Animation production. The rise of generative artificial intelligence (GenAI), encompassing large language models, multimodal models, and diffusion models, offers innovative solutions by automating tasks such as inbetween frame generation, colorization, and storyboard creation. This survey explores how GenAI integration is revolutionizing traditional animation workflows by lowering technical barriers, broadening accessibility for a wider range of creators through tools like AniDoc, ToonCrafter, and AniSora, and enabling artists to focus more on creative expression and artistic innovation. Despite its potential, issues such as maintaining visual consistency, ensuring stylistic coherence, and addressing ethical considerations continue to pose challenges. Furthermore, this paper discusses future directions and explores potential advancements in AI-assisted animation. For further exploration and resources, please visit our GitHub repository: https://github.com/yunlong10/Awesome-AI4Animation

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.0625

Country: Asia > Japan > Honshū (0.14)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.34)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Graphics > Animation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.72)

Add feedback

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Tang, Yunlong, Guo, Junjia, Hua, Hang, Liang, Susan, Feng, Mingqian, Li, Xinyang, Mao, Rui, Huang, Chao, Bi, Jing, Zhang, Zeliang, Fazli, Pooyan, Xu, Chenliang

arXiv.org Artificial IntelligenceNov-25-2024

The advancement of Multimodal Large Language Models (MLLMs) has enabled significant progress in multimodal understanding, expanding their capacity to analyze video content. However, existing evaluation benchmarks for MLLMs primarily focus on abstract video comprehension, lacking a detailed assessment of their ability to understand video compositions, the nuanced interpretation of how visual elements combine and interact within highly compiled video contexts. We introduce VidComposition, a new benchmark specifically designed to evaluate the video composition understanding capabilities of MLLMs using carefully curated compiled videos and cinematic-level annotations. VidComposition includes 982 videos with 1706 multiple-choice questions, covering various compositional aspects such as camera movement, angle, shot size, narrative structure, character actions and emotions, etc. Our comprehensive evaluation of 33 open-source and proprietary MLLMs reveals a significant performance gap between human and model capabilities. This highlights the limitations of current MLLMs in understanding complex, compiled video compositions and offers insights into areas for further improvement. The leaderboard and evaluation code are available at https://yunlong10.github.io/VidComposition/.

benchmark, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2411.10979

Country: North America > United States (0.28)

Genre:

Research Report (0.50)
Questionnaire & Opinion Survey (0.34)

Industry:

Media > Film (0.49)
Media > Television (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching

Yao, Gongxin, Li, Xinyang, Fu, Luowei, Pan, Yu

arXiv.org Artificial IntelligenceOct-8-2024

Achieving monocular camera localization within pre-built LiDAR maps can bypass the simultaneous mapping process of visual SLAM systems, potentially reducing the computational overhead of autonomous localization. To this end, one of the key challenges is cross-modal place recognition, which involves retrieving 3D scenes (point clouds) from a LiDAR map according to online RGB images. In this paper, we introduce an efficient framework to learn descriptors for both RGB images and point clouds. It takes visual state space model (VMamba) as the backbone and employs a pixel-view-scene joint training strategy for cross-modal contrastive learning. To address the field-of-view differences, independent descriptors are generated from multiple evenly distributed viewpoints for point clouds. A visible 3D points overlap strategy is then designed to quantify the similarity between point cloud views and RGB images for multi-view supervision. Additionally, when generating descriptors from pixel-level features using NetVLAD, we compensate for the loss of geometric information, and introduce an efficient scheme for multi-view generation. Experimental results on the KITTI and KITTI-360 datasets demonstrate the effectiveness and generalization of our method. The code will be released upon acceptance.

artificial intelligence, machine learning, point cloud, (14 more...)

arXiv.org Artificial Intelligence

2410.06285

Country: Asia > China (0.14)

Genre:

Instructional Material (0.46)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Morphology and Behavior Co-Optimization of Modular Satellites for Attitude Control

Wang, Yuxing, Li, Jie, Yu, Cong, Li, Xinyang, Huang, Simeng, Chang, Yongzhe, Wang, Xueqian, Liang, Bin

arXiv.org Artificial IntelligenceSep-19-2024

The emergence of modular satellites marks a significant transformation in spacecraft engineering, introducing a new paradigm of flexibility, resilience, and scalability in space exploration endeavors. In addressing complex challenges such as attitude control, both the satellite's morphological architecture and the controller are crucial for optimizing performance. Despite substantial research on optimal control, there remains a significant gap in developing optimized and practical assembly strategies for modular satellites tailored to specific mission constraints. This research gap primarily arises from the inherently complex nature of co-optimizing design and control, a process known for its notorious bi-level optimization loop. Conventionally tackled through artificial evolution, this issue involves optimizing the morphology based on the fitness of individual controllers, which is sample-inefficient and computationally expensive. In this paper, we introduce a novel gradient-based approach to simultaneously optimize both morphology and control for modular satellites, enhancing their performance and efficiency in attitude control missions. Our Monte Carlo simulations demonstrate that this co-optimization approach results in modular satellites with better mission performance compared to those designed by evolution-based approaches. Furthermore, this study discusses potential avenues for future research.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2409.13166

Country: Asia > China > Guangdong Province (0.16)

Genre: Research Report (1.00)

Industry: Government (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Add feedback

R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models

Djuhera, Aladin, Andrei, Vlad C., Li, Xinyang, Mönich, Ullrich J., Boche, Holger, Saad, Walid

arXiv.org Artificial IntelligenceJul-16-2024

Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML), where components of large ML models are outsourced to remote servers. A significant challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming that could jeopardize the learning process. This is particularly pronounced for word embedding parameters in large language models (LLMs), which are crucial for language understanding. In this paper, rigorous insights are provided into the influence of jamming LLM word embeddings in SFL by deriving an expression for the ML training loss divergence and showing that it is upper-bounded by the mean squared error (MSE). Based on this analysis, a physical layer framework is developed for resilient SFL with LLMs (R-SFLLM) over wireless networks. R-SFLLM leverages wireless sensing data to gather information on the jamming directions-of-arrival (DoAs) for the purpose of devising a novel, sensing-assisted anti-jamming strategy while jointly optimizing beamforming, user scheduling, and resource allocation. Extensive experiments using BERT and RoBERTa models demonstrate R-SFLLM's effectiveness, achieving close-to-baseline performance across various natural language processing (NLP) tasks and datasets. The proposed methodology further introduces an adversarial training component, where controlled noise exposure significantly enhances the LLM's resilience to perturbed parameters during training. The results show that more noise-sensitive models, such as RoBERTa, benefit from this feature, especially when resource allocation is unfair. It is also shown that worst-case jamming in particular translates into worst-case model outcomes, thereby necessitating the need for jamming-resilient SFL protocols.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2407.11654

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

A Digital Twinning Platform for Integrated Sensing, Communications and Robotics

Andrei, Vlad C., Li, Xinyang, Fees, Maresa, Feik, Andreas, Mönich, Ullrich J., Boche, Holger

arXiv.org Artificial IntelligenceFeb-23-2024

In this paper, a digital twinning framework for indoor integrated sensing, communications, and robotics is proposed, designed, and implemented. Besides leveraging powerful robotics and ray-tracing technologies, the framework also enables integration with real-world sensors and reactive updates triggered by changes in the environment. The framework is designed with commercial, off-the-shelf components in mind, thus facilitating experimentation in the different areas of communication, sensing, and robotics. Experimental results showcase the feasibility and accuracy of indoor localization using digital twins and validate our implementation both qualitatively and quantitatively.

artificial intelligence, communication, robot, (14 more...)

arXiv.org Artificial Intelligence

2402.15191

Country:

Europe > Germany (0.29)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.64)

Industry:

Telecommunications (0.48)
Information Technology (0.48)
Energy > Oil & Gas (0.35)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

VRMN-bD: A Multi-modal Natural Behavior Dataset of Immersive Human Fear Responses in VR Stand-up Interactive Games

Zhang, He, Li, Xinyang, Sun, Yuanxi, Fu, Xinyi, Qiu, Christine, Carroll, John M.

arXiv.org Artificial IntelligenceJan-22-2024

Understanding and recognizing emotions are important and challenging issues in the metaverse era. Understanding, identifying, and predicting fear, which is one of the fundamental human emotions, in virtual reality (VR) environments plays an essential role in immersive game development, scene development, and next-generation virtual human-computer interaction applications. In this article, we used VR horror games as a medium to analyze fear emotions by collecting multi-modal data (posture, audio, and physiological signals) from 23 players. We used an LSTM-based model to predict fear with accuracies of 65.31% and 90.47% under 6-level classification (no fear and five different levels of fear) and 2-level classification (no fear and fear), respectively. We constructed a multi-modal natural behavior dataset of immersive human fear responses (VRMN-bD) and compared it with existing relevant advanced datasets. The results show that our dataset has fewer limitations in terms of collection method, data scale and audience scope. We are unique and advanced in targeting multi-modal datasets of fear and behavior in VR stand-up interactive environments. Moreover, we discussed the implications of this work for communities and applications. The dataset and pre-trained model are available at https://github.com/KindOPSTAR/VRMN-bD.

artificial intelligence, human computer interaction, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2401.12133

Country: North America > United States > New York (0.14)

Genre:

Research Report > New Finding (0.88)
Research Report > Experimental Study (0.67)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)

Add feedback

MGR: Multi-generator Based Rationalization

Liu, Wei, Wang, Haozhao, Wang, Jun, Li, Ruixuan, Li, Xinyang, Zhang, Yuankai, Qiu, Yang

arXiv.org Artificial IntelligenceJul-23-2023

Rationalization is to employ a generator and a predictor to construct a self-explaining NLP model in which the generator selects a subset of human-intelligible pieces of the input text to the following predictor. However, rationalization suffers from two key challenges, i.e., spurious correlation and degeneration, where the predictor overfits the spurious or meaningless pieces solely selected by the not-yet well-trained generator and in turn deteriorates the generator. Although many studies have been proposed to address the two challenges, they are usually designed separately and do not take both of them into account. In this paper, we propose a simple yet effective method named MGR to simultaneously solve the two problems. The key idea of MGR is to employ multiple generators such that the occurrence stability of real pieces is improved and more meaningful pieces are delivered to the predictor. Empirically, we show that MGR improves the F1 score by up to 20.9% as compared to state-of-the-art methods. Codes are available at https://github.com/jugechengzi/Rationalization-MGR .

generator, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.04492

Country:

Europe (0.93)
Asia > Middle East > Qatar (0.14)
Asia > China > Hubei Province (0.14)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback