 shenzhen


Thousands of Companies Are Driving China's AI Boom. A Government Registry Tracks Them All

WIRED

How the Cyberspace Administration of China inadvertently made a guide to the country's homegrown AI revolution. When DeepSeek burst onto the global stage in January 2025, it seemed to appear out of nowhere. But the large language model was just one of the thousands of generative AI tools that have been released in China since 2023, and there's a public archive of every single one of them.


VC-Agent: An Interactive Agent for Customized Video Dataset Collection

Zhang, Yidan, Xu, Mutian, Hao, Yiming, Zhou, Kun, Chang, Jiahao, Liu, Xiaoqiang, Wan, Pengfei, Fu, Hongbo, Han, Xiaoguang

arXiv.org Artificial Intelligence

Facing scaling laws, video data from the internet becomes increasingly important. However, collecting extensive videos that meet specific needs is extremely labor-intensive and time-consuming. In this work, we study the way to expedite this collection process and propose VC-Agent, the first interactive agent that is able to understand users' queries and feedback, and accordingly retrieve/scale up relevant video clips with minimal user input. Specifically, considering the user interface, our agent defines various user-friendly ways for the user to specify requirements based on textual descriptions and confirmations. As for agent functions, we leverage existing multi-modal large language models to connect the user's requirements with the video content. More importantly, we propose two novel filtering policies that can be updated when user interaction is continually performed. Finally, we provide a new benchmark for personalized video dataset collection, and carefully conduct the user study to verify our agent's usage in various real scenarios. Extensive experiments demonstrate the effectiveness and efficiency of our agent for customized video dataset collection. Project page: https://allenyidan.github.io/vcagent_page/.


Robots in China are riding the subway to make 7-Eleven deliveries

Popular Science

Subway commuters in Shenzhen, China, may soon need to make room for a fleet of chunky, snack-carrying delivery robots. Earlier this week, more than three dozen autonomous, four-wheeled delivery robots boarded and exited active subway trains, and eventually delivered packages to several 7-Eleven convenience stores. Although this demonstration was only a preliminary test and took place during off-peak hours, the company behind the subway-riding robots believes they could soon help stock shelves at around 100 7-Eleven locations. The initiative is part of a broader effort in China and other countries to normalize the presence of delivery robots operating in public spaces.


An Object-Based Deep Learning Approach for Building Height Estimation from Single SAR Images

Memar, Babak, Russo, Luigi, Ullo, Silvia Liberata, Gamba, Paolo

arXiv.org Artificial Intelligence

Accurate estimation of building heights using very high resolution (VHR) synthetic aperture radar (SAR) imagery is crucial for various urban applications. This paper introduces a Deep Learning (DL)-based methodology for automated building height estimation from single VHR COSMO-SkyMed images: an object-based regression approach based on bounding box detection followed by height estimation. This model was trained and evaluated on a unique multi-continental dataset comprising eight geographically diverse cities across Europe, North and South America, and Asia, employing a cross-validation strategy to explicitly assess out-of-distribution (OOD) generalization. The results demonstrate highly promising performance, particularly on European cities where the model achieves a Mean Absolute Error (MAE) of approximately one building story (2.20 m in Munich), significantly outperforming recent state-of-the-art methods in similar OOD scenarios. Despite the increased variability observed when generalizing to cities in other continents, particularly in Asia with its distinct urban typologies and prevalence of high-rise structures, this study underscores the significant potential of DL for robust cross-city and cross-continental transfer learning in building height estimation from single VHR SAR data.


Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control

Li, Rongpeng, Zhu, Jianhang, Huang, Jiahao, Zhao, Zhifeng, Zhang, Honggang

arXiv.org Artificial Intelligence

Intelligent Transportation Systems (ITSs) have emerged as a promising solution towards ameliorating urban traffic congestion, with Traffic Signal Control (TSC) identified as a critical component. Although Multi-Agent Reinforcement Learning (MARL) algorithms have shown potential in optimizing TSC through real-time decision-making, their scalability and effectiveness often suffer from large-scale and complex environments. Typically, these limitations primarily stem from a fundamental mismatch between the exponential growth of the state space driven by the environmental heterogeneities and the limited modeling capacity of current solutions. To address these issues, this paper introduces a novel MARL framework that integrates Dynamic Graph Neural Networks (DGNNs) and Topological Data Analysis (TDA), aiming to enhance the expressiveness of environmental representations and improve agent coordination. Furthermore, inspired by the Mixture of Experts (MoE) architecture in Large Language Models (LLMs), a topology-assisted spatial pattern disentangling (TSD)-enhanced MoE is proposed, which leverages topological signatures to decouple graph features for specialized processing, thus improving the model's ability to characterize dynamic and heterogeneous local observations. The TSD module is also integrated into the policy and value networks of the Multi-agent Proximal Policy Optimization (MAPPO) algorithm, further improving decision-making efficiency and robustness. Extensive experiments conducted on real-world traffic scenarios, together with comprehensive theoretical analysis, validate the superior performance of the proposed framework, highlighting the model's scalability and effectiveness in addressing the complexities of large-scale TSC tasks.


FuXi-Air: Urban Air Quality Forecasting Based on Emission-Meteorology-Pollutant multimodal Machine Learning

Geng, Zhixin, Fan, Xu, Lu, Xiqiao, Zhang, Yan, Yu, Guangyuan, Huang, Cheng, Wang, Qian, Li, Yuewu, Ma, Weichun, Yu, Qi, Wu, Libo, Li, Hao

arXiv.org Artificial Intelligence

Air pollution has emerged as a major public health challenge in megacities. Numerical simulations and single-site machine learning approaches have been widely applied in air quality forecasting tasks. However, these methods face multiple limitations, including high computational costs, low operational efficiency, and limited integration with observational data. With the rapid advancement of artificial intelligence, there is an urgent need to develop a low-cost, efficient air quality forecasting model for smart urban management. This study constructs FuXi-Air, an air quality forecasting model based on multimodal data fusion, to support high-precision air quality forecasting, and operates it in typical megacities. The model integrates meteorological forecasts, emission inventories, and pollutant monitoring data under the guidance of air pollution mechanisms. By combining an autoregressive prediction framework with a frame interpolation strategy, the model completes 72-hour forecasts for six major air pollutants at an hourly resolution across multiple monitoring sites within 25-30 seconds. In terms of both computational efficiency and forecasting accuracy, it outperforms the mainstream numerical air quality models in operational forecasting work. Ablation experiments concerning key influencing factors show that although meteorological data contribute more to model accuracy than emission inventories do, the integration of multimodal data significantly improves forecasting precision and ensures that reliable predictions are obtained under differing pollution mechanisms across megacities. This study provides both a technical reference and a practical example for applying multimodal data-driven models to air quality forecasting and offers new insights into building hybrid forecasting systems to support air pollution risk warning in smart city management.


Humanoid workers and surveillance buggies: 'embodied AI' is reshaping daily life in China

The Guardian

On a misty Saturday afternoon in Shenzhen's Central Park, a gaggle of teenage girls are sheltering from the drizzle under a concrete canopy. With their bags of crisps piled high in front of them, they crowd around a couple of smartphones to sing along to Mandopop ballads. The sound of their laughter rings out across the surrounding lawn – until it is pierced by a mechanical buzzing sound. A few metres away from the impromptu karaoke session is an "airdrop cabinet", one of more than 40 in Shenzhen that is operated by Meituan, China's biggest food delivery platform. Hungry park-goers can order anything from rice noodles to Subway sandwiches to bubble tea.


Multi-Objective Large Language Model Unlearning

Pan, Zibin, Zhang, Shuwen, Zheng, Yuesheng, Li, Chi, Cheng, Yuheng, Zhao, Junhua

arXiv.org Artificial Intelligence

Machine unlearning for large language models (LLMs), which aims to effectively eliminate undesirable behaviors from LLMs without full retraining from scratch, has recently attracted great attention. In this paper, we explore the Gradient Ascent (GA) approach in LLM unlearning, which is a proactive way to decrease the prediction probability of the model on the target data in order to remove their influence. We analyze two challenges that render the process impractical: gradient explosion and catastrophic forgetting. To address these issues, we propose the Multi-Objective Large Language Model Unlearning (MOLLM) algorithm. We first formulate LLM unlearning as a multi-objective optimization problem, in which the cross-entropy loss is modified to the unlearning version to overcome the gradient explosion issue. A common descent update direction is then calculated, which enables the model to forget the target data while preserving the utility of the LLM. Our empirical results verify that MOLLM outperforms the SOTA GA-based LLM unlearning methods in terms of unlearning effect and model utility preservation. The source code is available at https://github.com/zibinpan/MOLLM.
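
To make the idea of a common descent direction concrete, here is a minimal sketch for the two-objective case. It is an assumption, not the MOLLM implementation: it uses the well-known closed-form min-norm combination of two gradients, and the names `g_forget` (gradient of an unlearning loss to be minimized) and `g_retain` (gradient of a utility loss) are hypothetical. The abstract does not say how MOLLM computes its direction.

```python
def common_descent_direction(g_forget, g_retain):
    """Return the min-norm convex combination of two gradient vectors.

    Assumption: both inputs are gradients of losses to be minimized.
    The result d has a non-negative dot product with both inputs, so a
    small step along -d does not increase either loss to first order.
    """
    diff = [a - b for a, b in zip(g_forget, g_retain)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Identical gradients: any convex combination is the same vector.
        return list(g_forget)
    # gamma minimizes ||gamma * g_forget + (1 - gamma) * g_retain||^2,
    # then is clipped to [0, 1] to stay inside the convex hull.
    gamma = sum((b - a) * b for a, b in zip(g_forget, g_retain)) / denom
    gamma = min(1.0, max(0.0, gamma))
    return [gamma * a + (1.0 - gamma) * b for a, b in zip(g_forget, g_retain)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Two partially conflicting gradients (illustrative numbers only).
g_forget = [1.0, 0.0]
g_retain = [-1.0, 1.0]
d = common_descent_direction(g_forget, g_retain)
print(d, dot(d, g_forget), dot(d, g_retain))
```

In this toy case the two gradients disagree along the first coordinate, yet the combined direction still has a non-negative inner product with each of them, which is the property a shared update needs.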


Loss-Aware Curriculum Learning for Chinese Grammatical Error Correction

Zhang, Ding, Li, Yangning, Bai, Lichen, Zhang, Hao, Li, Yinghui, Lin, Haiye, Zheng, Hai-Tao, Su, Xin, Shan, Zifei

arXiv.org Artificial Intelligence

Chinese grammatical error correction (CGEC) aims to detect and correct errors in the input Chinese sentences. Recently, Pre-trained Language Models (PLMs) have been employed to improve the performance. However, current approaches ignore that correction difficulty varies across different instances and treat these samples equally, which makes model learning harder. To address this problem, we propose a multi-granularity Curriculum Learning (CL) framework. Specifically, we first calculate the correction difficulty of these samples and feed them into the model from easy to hard batch by batch. Then Instance-Level CL is employed to help the model optimize in the appropriate direction automatically by regulating the loss function. Extensive experimental results and comprehensive analyses of various datasets prove the effectiveness of our method.
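
The easy-to-hard batching step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `curriculum_batches` and the difficulty scores are hypothetical, and the abstract does not specify how correction difficulty is computed or how the instance-level loss is regulated, so neither is modeled here.

```python
def curriculum_batches(samples, difficulty, batch_size):
    """Order samples from easy to hard and split them into batches.

    `difficulty` is any callable returning a number per sample
    (hypothetical here; e.g. a loss from a reference model).
    """
    ordered = sorted(samples, key=difficulty)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Hypothetical per-sample difficulty scores, for illustration only.
scores = {"s1": 0.9, "s2": 0.1, "s3": 0.5, "s4": 0.3, "s5": 0.7}
batches = curriculum_batches(list(scores), scores.get, batch_size=2)
print(batches)  # [['s2', 's4'], ['s3', 's5'], ['s1']]
```

A training loop would then consume `batches` in order, so the model sees the lowest-difficulty samples first.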


Adaptive Conditional Expert Selection Network for Multi-domain Recommendation

Dong, Kuiyao, Lou, Xingyu, Liu, Feng, Wang, Ruian, Yu, Wenyi, Wang, Ping, Wang, Jun

arXiv.org Artificial Intelligence

Mixture-of-Experts (MOE) has recently become the de facto standard in multi-domain recommendation (MDR) due to its powerful expressive ability. However, such MOE-based methods typically employ all experts for each instance, leading to scalability issues and low discriminability between domains and experts. Furthermore, the design of commonly used domain-specific networks exacerbates the scalability issues. To tackle these problems, we propose a novel method named CESAA, which consists of a Conditional Expert Selection (CES) module and an Adaptive Expert Aggregation (AEA) module. Specifically, CES first combines a sparse gating strategy with domain-shared experts. Then AEA utilizes mutual information loss to strengthen the correlations between experts and specific domains and significantly improve the distinction between experts. As a result, only domain-shared experts and selected domain-specific experts are activated for each instance, striking a balance between computational efficiency and model performance. Experimental results on both public ranking and industrial retrieval datasets verify the effectiveness of our method in MDR tasks.
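
The activation pattern described above (shared experts always on, only a few domain-specific experts selected per instance) can be sketched with a generic top-k gate. This is an assumption for illustration, not the CESAA architecture: the function names, the use of top-k with a softmax over the kept scores, and the scalar expert outputs are all hypothetical, and the mutual information loss is not modeled.

```python
import math


def sparse_gate(scores, k):
    """Softmax over the k largest gate scores; all other weights are 0."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]


def combine(shared_outputs, specific_outputs, gate_scores, k):
    """Sum always-active shared experts with the k selected specific experts."""
    weights = sparse_gate(gate_scores, k)
    out = sum(shared_outputs)
    # Experts with zero weight are skipped, so they need not be evaluated.
    out += sum(w * y for w, y in zip(weights, specific_outputs) if w > 0.0)
    return out


weights = sparse_gate([2.0, 0.0, 2.0, -1.0], k=2)
result = combine([1.0], [10.0, 20.0, 30.0, 40.0], [2.0, 0.0, 2.0, -1.0], k=2)
print(weights, result)  # [0.5, 0.0, 0.5, 0.0] 21.0
```

With `k` much smaller than the number of domain-specific experts, per-instance compute grows with `k` rather than with the total expert count, which is the efficiency argument the abstract makes.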