 Shanghai


China's AI-powered humanoid robots aim to transform manufacturing

The Japan Times

In a sprawling warehouse in a Shanghai suburb, dozens of humanoid robots are maneuvered by their operators to carry out tasks like folding a T-shirt, making a sandwich and opening doors, over and over again. Operating 17 hours a day, the site's goal is to generate reams of data that its owner, Chinese humanoid startup AgiBot, uses to train robots it hopes will become ubiquitous and change the way humans live, work and play. "Just imagine that one day in our own robot factory, our robots are assembling themselves," said Yao Maoqing, a partner at AgiBot.


Lenovo's Legion 9i laptop brings powerful glasses-free 3D to creators

PCWorld

Lenovo just introduced its most powerful creative laptop yet: the Legion 9i. Announced at Lenovo Tech World in Shanghai, it's built for serious game developers, 3D artists, and creators who need top-tier performance. With more than half of upcoming titles for current-gen systems built on Unreal Engine, the Legion 9i couldn't have come at a better time. The massive 18-inch PureSight display is something to behold, offering up to either 4K resolution in traditional 2D or 2K resolution in glasses-free 3D. The 3D effect is achieved using eye-tracking tech and a clever lens system, no headset or special 3D glasses needed!


As the US and China lock horns, Malaysia hopes to harness an AI revolution

Al Jazeera

Kulim, Malaysia – When tech giant AT&S decided a few years ago that it needed to ramp up production to keep pace with the artificial intelligence (AI) boom, it did not look to its largest manufacturing facilities in China. The Austrian firm's plants in Chongqing and Shanghai – opened in 2022 and 2016, respectively – employ some 9,000 workers between them, churning out high-end components used in everything from consumer electronics to cars. But AT&S was at the same time coming to grips with the risks of concentrating production in one country. Like many tech firms grappling with the disruption of the COVID-19 pandemic and the trade war salvoes between the United States and China, AT&S decided it needed to diversify its supply chains. Malaysia quickly emerged at the top of the company's list of potential locations for its next plant.


Safe and Sparse Newton Method for Entropic-Regularized Optimal Transport Zihao Tang School of Statistics and Data Science Shanghai University of Finance and Economics

Neural Information Processing Systems

Computational optimal transport (OT) has attracted considerable interest in the machine learning community, and great advances have been made in the direction of entropic-regularized OT. The Sinkhorn algorithm, as well as its many improved versions, has become the de facto solution to large-scale OT problems. However, most of the existing methods behave like first-order methods, which typically require a large number of iterations to converge. More recently, Newton-type methods using sparsified Hessian matrices have demonstrated promising results on OT computation, but many open questions remain. In this article, we make major progress in this direction: first, we propose a novel Hessian sparsification scheme that promises strict control of the approximation error; second, based on this sparsification scheme, we develop a safe Newton-type method that is guaranteed to avoid singularity in computing the search directions; third, the developed algorithm has a clear implementation for practical use, avoiding most hyperparameter tuning; and remarkably, we provide rigorous global and local convergence analysis of the proposed algorithm, which is lacking in the prior literature. Various numerical experiments are conducted to demonstrate the effectiveness of the proposed algorithm in solving large-scale OT problems.
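For readers unfamiliar with the baseline mentioned in this abstract, the following is a minimal sketch of the classical Sinkhorn iteration for entropic-regularized OT. It is not the Newton-type method the paper proposes; the function name `sinkhorn`, the parameters `reg` and `n_iter`, and the toy cost matrix are all assumptions chosen for illustration.

```python
import math

def sinkhorn(cost, a, b, reg=0.1, n_iter=500):
    """Baseline Sinkhorn scaling for entropic-regularized OT (illustrative only).

    cost: n x m cost matrix (list of lists); a, b: marginals summing to 1;
    reg: entropic regularization strength (hypothetical default).
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / reg)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows to match a, then columns to match b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy 2x2 problem: moving mass between distinct points costs 1
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(cost, [0.5, 0.5], [0.5, 0.5])
```

Each pass only rescales rows and columns, which is why the abstract describes such methods as behaving like first-order schemes: for small `reg` many passes may be needed before both marginals are matched.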


Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning Shanghai Jiao Tong University UT Austin Xianyuan Zhan Weinan Zhang

Neural Information Processing Systems

One important property of DIstribution Correction Estimation (DICE) methods is that the solution is the optimal stationary distribution ratio between the optimized policy and the data collection policy. In this work, we show that DICE-based methods can be viewed as a transformation from the behavior distribution to the optimal policy distribution. Based on this, we propose a novel approach, Diffusion-DICE, that directly performs this transformation using diffusion models. We find that the optimal policy's score function can be decomposed into two terms: the behavior policy's score function and the gradient of a guidance term which depends on the optimal distribution ratio. The first term can be obtained from a diffusion model trained on the dataset and we propose an in-sample learning objective to learn the second term.
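The two-term decomposition described in this abstract can be sketched as follows. The notation is an assumption made for illustration and may differ from the paper's: $\pi^{\mathcal{D}}$ is the behavior policy, $w^*$ the optimal distribution ratio, $a_t$ the action noised to diffusion time $t$, and $a_0$ the clean action.

```latex
% Assumed starting point: the optimal policy reweights the behavior policy
\pi^*(a \mid s) \;\propto\; \pi^{\mathcal{D}}(a \mid s)\, w^*(s, a)

% Noising both sides with the same forward kernel gives, at diffusion time t,
\pi^*_t(a_t \mid s) \;\propto\; \pi^{\mathcal{D}}_t(a_t \mid s)\;
  \mathbb{E}_{a_0 \sim \pi^{\mathcal{D}}(a_0 \mid a_t, s)}\!\left[ w^*(s, a_0) \right]

% Taking the gradient of the log yields the two terms
\nabla_{a_t} \log \pi^*_t(a_t \mid s)
  = \underbrace{\nabla_{a_t} \log \pi^{\mathcal{D}}_t(a_t \mid s)}_{\text{behavior score}}
  + \underbrace{\nabla_{a_t} \log \mathbb{E}_{a_0 \sim \pi^{\mathcal{D}}(a_0 \mid a_t, s)}
      \!\left[ w^*(s, a_0) \right]}_{\text{guidance term}}
```

The normalizing constant depends only on $s$, so it vanishes under the gradient with respect to $a_t$.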


CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework Yunzhuo Liu Worcester Polytechnic Institute Shanghai Jiao Tong University Bo Jiang

Neural Information Processing Systems

This work presents a novel approach to neural architecture search (NAS) that aims to increase carbon efficiency for the model design process. The proposed framework CE-NAS addresses the key challenge of high carbon cost associated with NAS by exploring the variations in the carbon emissions of energy and the energy differences among NAS algorithms. At a high level, CE-NAS leverages a reinforcement-learning agent to dynamically adjust GPU resources based on carbon intensity, predicted by a time-series transformer, to balance energy-efficient sampling and energy-intensive evaluation tasks. Furthermore, CE-NAS leverages a recently proposed multi-objective optimizer to effectively reduce the NAS search space. We demonstrate the efficacy of CE-NAS in lowering carbon emissions while achieving SOTA results for both NAS benchmarks and open-domain NAS tasks. For example, on the HW-NasBench, CE-NAS reduces carbon emissions by up to 7.22X while maintaining a search efficiency comparable to vanilla NAS. For open-domain NAS tasks, CE-NAS achieves SOTA results with 97.35% top-1 accuracy on CIFAR-10 with only 1.68M parameters and a carbon consumption of 38.53 lbs of CO


Supplementary for Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning

Neural Information Processing Systems

Shanghai Jiao Tong University
enlighten@sjtu.edu.cn

In Tab. 1, we summarize the notations used in this work for clarity.

The size of the premise symbols set M.
Logic AND.
S is the symbol set, and R is the rule set.
A\B: The set difference of A and B.
D: A very large-scale activity images database.
A: The activity set, which contains multiple activity classes. A = {A}
C: The conclusion set, which contains multiple conclusions. A and C are equivalent.
The premise symbols set for activity A. It is implied from an LLM.
e: The entailment score of a rule.
The entailment score threshold to accept/reject a rule.


Aether: Geometric-Aware Unified World Modeling

arXiv.org Artificial Intelligence

The integration of geometric reconstruction and generative modeling remains a critical challenge in developing AI systems capable of human-like spatial reasoning. This paper proposes Aether, a unified framework that enables geometry-aware reasoning in world models by jointly optimizing three core capabilities: (1) 4D dynamic reconstruction, (2) action-conditioned video prediction, and (3) goal-conditioned visual planning. Through task-interleaved feature learning, Aether achieves synergistic knowledge sharing across reconstruction, prediction, and planning objectives. Building upon video generation models, our framework demonstrates unprecedented synthetic-to-real generalization despite never observing real-world data during training. Furthermore, our approach achieves zero-shot generalization in both action following and reconstruction tasks, thanks to its intrinsic geometric modeling. Remarkably, even without real-world data, its reconstruction performance is comparable with or even better than that of domain-specific models. Additionally, Aether employs camera trajectories as geometry-informed action spaces, enabling effective action-conditioned prediction and visual planning. We hope our work inspires the community to explore new frontiers in physically-reasonable world modeling and its applications.


A Comparative Analysis of Word Segmentation, Part-of-Speech Tagging, and Named Entity Recognition for Historical Chinese Sources, 1900-1950

arXiv.org Artificial Intelligence

This paper compares large language models (LLMs) and traditional natural language processing (NLP) tools for performing word segmentation, part-of-speech (POS) tagging, and named entity recognition (NER) on Chinese texts from 1900 to 1950. Historical Chinese documents pose challenges for text analysis due to their logographic script, the absence of natural word boundaries, and significant linguistic changes. Using a sample dataset from the Shanghai Library Republican Journal corpus, traditional tools such as Jieba and spaCy are compared to LLMs, including GPT-4o, Claude 3.5, and the GLM series. The results show that LLMs outperform traditional methods in all metrics, albeit at considerably higher computational costs, highlighting a trade-off between accuracy and efficiency. Additionally, LLMs better handle genre-specific challenges such as poetry and temporal variations (i.e., pre-1920 versus post-1920 texts), demonstrating that their contextual learning capabilities can advance NLP approaches to historical texts by reducing the need for domain-specific training data.


MoVie: Visual Model-Based Policy Adaptation for View Generalization Sizhe Yang, Shanghai Qi Zhi Institute

Neural Information Processing Systems

Visual Reinforcement Learning (RL) agents trained on limited views face significant challenges in generalizing their learned abilities to unseen views. This inherent difficulty is known as the problem of view generalization. In this work, we systematically categorize this fundamental problem into four distinct and highly challenging scenarios that closely resemble real-world situations. Subsequently, we propose a straightforward yet effective approach to enable successful adaptation of visual Model-based policies for View generalization (MoVie) during test time, without any need for explicit reward signals or any modification during training time. Our method demonstrates substantial advancements across all four scenarios encompassing a total of 18 tasks sourced from DMControl, xArm, and Adroit, with relative improvements of 33%, 86%, and 152%, respectively. The superior results highlight the immense potential of our approach for real-world robotics applications. Code and videos are available at yangsizhe.github.io/MoVie.