Qin, Zhiwei
Sim2Rec: A Simulator-based Decision-making Approach to Optimize Real-World Long-term User Engagement in Sequential Recommender Systems
Chen, Xiong-Hui, He, Bowei, Yu, Yang, Li, Qingyang, Qin, Zhiwei, Shang, Wenjie, Ye, Jieping, Ma, Chen
Long-term user engagement (LTE) optimization in sequential recommender systems (SRS) is well suited to reinforcement learning (RL), which finds a policy that maximizes long-term rewards. RL has its shortcomings, however, particularly in requiring a large number of online samples for exploration, which is risky in real-world applications. One appealing way to avoid this risk is to build a simulator and learn the optimal recommendation policy in the simulator. In LTE optimization, the simulator must simulate multiple users' daily feedback on given recommendations. However, building a user simulator with no reality gap, i.e., one that predicts users' feedback exactly, is unrealistic, because users' reaction patterns are complex and historical logs for each user are limited; the resulting gap might mislead the simulator-based recommendation policy. In this paper, we present a practical simulator-based recommendation policy training approach, Simulation-to-Recommendation (Sim2Rec), to handle the reality-gap problem for LTE optimization. Specifically, Sim2Rec introduces a simulator set to generate various possibilities of user behavior patterns, then trains an environment-parameter extractor to recognize users' behavior patterns in the simulators. Finally, a context-aware policy is trained to make the optimal decisions for all of the user variants based on the inferred environment parameters. The policy is directly transferable to unseen environments (e.g., the real world) because it has learned to recognize the various user behavior patterns and to make the correct decisions based on the inferred environment parameters. Experiments are conducted in synthetic environments and on a real-world large-scale ride-hailing platform, DidiChuxing. The results show that Sim2Rec achieves significant performance improvement and produces robust recommendations in unseen environments.
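As a rough illustration of the Sim2Rec idea (not the paper's actual architecture), the sketch below uses a one-parameter toy user model: a hidden sensitivity `theta` defines each simulator in the set, an "extractor" infers `theta` from logged interactions by least squares, and a context-aware policy conditions its decision on the inferred parameter. All names, the quadratic engagement model, and the numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical user simulator: engagement feedback to a recommendation
# action depends on a hidden sensitivity parameter theta.
def simulate_feedback(theta, actions):
    return theta * actions + rng.normal(0.0, 0.1, size=len(actions))

# Simulator set: each element is one plausible user behavior pattern.
simulator_set = [0.2, 0.5, 0.9]

# Environment-parameter "extractor": a least-squares estimate of theta
# from logged (action, feedback) pairs.
def extract_theta(actions, feedback):
    return float(actions @ feedback / (actions @ actions))

# Context-aware policy: pick the action whose predicted long-term
# engagement (toy concave objective) is highest given the inferred theta.
def policy(theta_hat, candidate_actions):
    return max(candidate_actions, key=lambda a: theta_hat * a - 0.5 * a ** 2)

# Deploy in an "unseen" environment (theta = 0.7, not in the set):
# infer the parameter first, then act on the inference.
actions = rng.uniform(0.0, 1.0, size=50)
theta_hat = extract_theta(actions, simulate_feedback(0.7, actions))
best_action = policy(theta_hat, np.linspace(0.0, 1.0, 101))
```

Because the policy conditions only on the inferred parameter, the same code serves every simulator in the set and the unseen environment alike, which is the transfer property the abstract describes.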
A Unified Representation Framework for Rideshare Marketplace Equilibrium and Efficiency
Chin, Alex, Qin, Zhiwei
Ridesharing platforms are a type of two-sided marketplace where "supply-demand balance" is critical for market efficiency and yet is complex to define and analyze. We present a unified analytical framework based on the graph-based equilibrium metric (GEM) for quantifying the spatiotemporal supply-demand state and efficiency of a ridesharing marketplace. GEM was developed as a generalized Wasserstein distance between the supply and demand distributions in a ridesharing market and has been used as an evaluation metric for algorithms expected to improve supply-demand alignment. Building upon GEM, we develop SD-GEM, a dual-perspective (supply- and demand-side) representation of rideshare market equilibrium. We show that there are often disparities between the two views and examine how this dual view leads to the notion of market efficiency, for which we propose novel statistical tests that capture improvement and explain the underlying driving factors.
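To give a feel for an earth-mover-style supply-demand metric like GEM, here is a minimal sketch for the special case of city zones arranged on a line, where the 1-Wasserstein distance reduces to the L1 distance between cumulative distributions (GEM itself is defined on a general graph; this 1-D shortcut is illustrative only, and the zone counts are invented).

```python
import numpy as np

def w1_line(supply, demand):
    """Toy supply-demand imbalance metric for zones on a line."""
    supply = np.asarray(supply, float)
    demand = np.asarray(demand, float)
    supply = supply / supply.sum()          # normalize to distributions
    demand = demand / demand.sum()
    # On a line, W1 equals the sum of |CDF_supply - CDF_demand|;
    # the final cumulative term is always zero, so it is dropped.
    return float(np.abs(np.cumsum(supply - demand))[:-1].sum())

balanced = w1_line([1, 1, 1], [1, 1, 1])   # supply matches demand everywhere
shifted = w1_line([1, 0, 0], [0, 0, 1])    # all mass must travel two zones
```

A zero value indicates perfect spatial alignment; larger values mean drivers (supply mass) would have to travel farther to meet demand, which is the intuition an equilibrium metric quantifies.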
A Deep Value-network Based Approach for Multi-Driver Order Dispatching
Tang, Xiaocheng, Qin, Zhiwei, Zhang, Fan, Wang, Zhaodong, Xu, Zhe, Ma, Yintai, Zhu, Hongtu, Ye, Jieping
Recent works on ride-sharing order dispatching have highlighted the importance of taking into account both the spatial and temporal dynamics of the dispatching process for improving transportation system efficiency. At the same time, deep reinforcement learning has advanced to the point where it achieves superhuman performance in a number of fields. In this work, we propose a deep reinforcement learning based solution for order dispatching, and we conduct large-scale online A/B tests on DiDi's ride-dispatching platform to show that the proposed method achieves significant improvement on both total driver income and user-experience-related metrics. In particular, we model the ride dispatching problem as a Semi-Markov Decision Process to account for the temporal aspect of the dispatching actions. To improve the stability of value iteration with nonlinear function approximators such as neural networks, we propose Cerebellar Value Networks (CVNet) with a novel distributed state representation layer. We further derive a regularized policy evaluation scheme for CVNet that penalizes a large Lipschitz constant of the value network for additional robustness against adversarial perturbations and noise. Finally, we adapt various transfer learning methods to CVNet for increased learning adaptability and efficiency across multiple cities. We conduct extensive offline simulations based on real dispatching data as well as online A/B tests on DiDi's platform. Results show that CVNet consistently outperforms other recently proposed dispatching methods. We finally show that the performance can be further improved through the efficient use of transfer learning.
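The semi-MDP modeling mentioned above has a simple core: a dispatch action occupies the driver for `tau` time steps, so the next state's value is discounted by `gamma ** tau` rather than `gamma`. A tabular sketch of that update (not CVNet's neural architecture; the zone names, reward, and trip duration are invented):

```python
# Toy tabular semi-MDP value update for dispatching: an assignment lasts
# tau steps, so bootstrapping discounts the next state by gamma ** tau.
gamma, alpha = 0.9, 0.5
V = {"downtown": 0.0, "airport": 0.0}

def smdp_update(s, r, tau, s_next):
    target = r + (gamma ** tau) * V[s_next]   # semi-MDP TD target
    V[s] += alpha * (target - V[s])

# One observed trip: a 3-step ride from downtown to the airport earning 10.
smdp_update("downtown", 10.0, 3, "airport")
```

Treating trips of different durations with a uniform one-step discount would systematically misvalue long rides; the `gamma ** tau` term is what corrects for that.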
Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms
Tang, Xiaocheng, Zhang, Fan, Qin, Zhiwei, Wang, Yansheng, Shi, Dingyuan, Song, Bingchen, Tong, Yongxin, Zhu, Hongtu, Ye, Jieping
Large ride-hailing platforms, such as DiDi, Uber and Lyft, connect tens of thousands of vehicles in a city to millions of ride demands throughout the day, holding great promise for improving transportation efficiency through the tasks of order dispatching and vehicle repositioning. Existing studies, however, usually consider the two tasks in simplified settings that hardly address the complex interactions between the two, the real-time fluctuations between supply and demand, and the coordination necessitated by the large-scale nature of the problem. In this paper we propose a unified value-based dynamic learning framework (V1D3) for tackling both tasks. At the center of the framework is a globally shared value function that is updated continuously using online experiences generated from real-time platform transactions. To improve sample efficiency and robustness, we further propose a novel periodic ensemble method combining fast online learning with a large-scale offline training scheme that leverages the abundant historical driver trajectory data. This allows the proposed framework to adapt quickly to the highly dynamic environment, to generalize robustly to recurrent patterns, and to drive implicit coordination among the population of managed vehicles. Extensive experiments based on real-world datasets show considerable improvements over other recently proposed methods on both tasks. In particular, V1D3 outperforms the first-prize winners of both the dispatching and repositioning tracks of the KDD Cup 2020 RL competition, achieving state-of-the-art results in improving both total driver income and user-experience-related metrics.
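One plausible reading of the online/offline ensemble idea is a fast value table updated from live transactions, blended with a slower value table fit offline on historical trajectories. The sketch below is only that reading, with an invented blend weight and invented numbers; V1D3's actual ensemble mechanism is more involved.

```python
import numpy as np

# Hypothetical blend of a fast online value learner with a slow offline one.
w_online = 0.3                           # invented blend weight
V_offline = np.array([4.0, 6.0])         # from large-scale offline training
V_online = V_offline.copy()              # warm-start the online learner

def online_td_step(s, r, s_next, alpha=0.5, gamma=0.9):
    # Fast update from one real-time platform transaction.
    V_online[s] += alpha * (r + gamma * V_online[s_next] - V_online[s])

def serving_value():
    # Values actually used for dispatching/repositioning decisions.
    return w_online * V_online + (1 - w_online) * V_offline

online_td_step(0, 2.0, 1)
V_serve = serving_value()
```

The offline component anchors the values against noise in the online stream, while the online component lets the served values track real-time supply-demand shifts, matching the fast-adaptation motivation in the abstract.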
Reinforcement Learning for Ridesharing: A Survey
Qin, Zhiwei, Zhu, Hongtu, Ye, Jieping
In this paper, we present a comprehensive, in-depth survey of the literature on reinforcement learning approaches to ridesharing problems. Papers on the topics of rideshare matching, vehicle repositioning, ride-pooling, and dynamic pricing are covered. Popular data sets and open simulation environments are also introduced. Subsequently, we discuss a number of challenges and opportunities for reinforcement learning research in this important domain.
Real-world Ride-hailing Vehicle Repositioning using Deep Reinforcement Learning
Jiao, Yan, Tang, Xiaocheng, Qin, Zhiwei, Li, Shuaiji, Zhang, Fan, Zhu, Hongtu, Ye, Jieping
We present a new practical framework based on deep reinforcement learning and decision-time planning for real-world vehicle repositioning on ride-hailing (a type of mobility-on-demand, MoD) platforms. Our approach learns the spatiotemporal state-value function using a batch training algorithm with deep value networks. The optimal repositioning action is generated on demand through value-based policy search, which combines planning and bootstrapping with the value networks. For large-fleet problems, we develop several algorithmic features that we incorporate into our framework and that we demonstrate to induce coordination among the algorithmically guided vehicles. We benchmark our algorithm against baselines in a ride-hailing simulation environment to demonstrate its superiority in improving income efficiency measured by income per hour. We have also designed and run a real-world experiment program with regular drivers on a major ride-hailing platform. We have observed significantly positive results on key metrics comparing our method with experienced drivers who performed idle-time repositioning based on their own expertise.
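The essence of value-based policy search at decision time is a one-step lookahead: score each candidate repositioning destination by its immediate cost plus the discounted learned value of arriving there, and take the best. A toy sketch (zone names, values, and costs are all invented; the paper's planner also handles travel times and fleet coordination):

```python
# Toy decision-time planning for repositioning: one-step lookahead that
# bootstraps with a learned state-value function.
gamma = 0.95
V = {"suburb": 2.0, "downtown": 8.0, "airport": 6.0}   # learned values
travel_cost = {"downtown": 1.5, "airport": 2.5}        # idle-cruise costs

def reposition(current_zone, candidates):
    def score(z):
        if z == current_zone:              # staying put costs nothing
            return gamma * V[z]
        return -travel_cost[z] + gamma * V[z]
    return max(candidates, key=score)

best = reposition("suburb", ["suburb", "downtown", "airport"])
```

Because the policy is computed on demand from the value function rather than stored explicitly, updating the value estimates immediately changes repositioning behavior everywhere, which is what makes the batch-trained value network the central learned object.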
Value-based Bayesian Meta-reinforcement Learning and Traffic Signal Control
Zou, Yayi, Qin, Zhiwei
Reinforcement learning methods for traffic signal control have gained increasing interest recently and have achieved better performance than traditional transportation methods. However, reinforcement-learning-based methods usually require large amounts of training data and computational resources, which largely limits their application to real-world traffic signal control. This makes meta-learning, which enables data-efficient and fast-adapting training by leveraging knowledge from previous learning experiences, attractive for traffic signal control. In this paper, we propose a novel value-based Bayesian meta-reinforcement learning framework, BM-DQN, to robustly speed up the learning process in new scenarios by utilizing well-trained prior knowledge learned from existing scenarios. The framework is based on our proposed fast-adaptation variant of Gradient-EM Bayesian meta-learning and the fast-update advantage of DQN, which together allow fast adaptation to new scenarios with continual-learning ability and robustness to uncertainty. Experiments on 2D navigation and traffic signal control show that our proposed framework adapts more quickly and robustly in new scenarios than previous methods and, in particular, exhibits much better continual-learning ability in heterogeneous scenarios.
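The broad intuition of value-based meta-RL can be shown with a tabular caricature (emphatically not the paper's Gradient-EM machinery): Q-values for a new scenario are regularized toward a prior Q learned on earlier scenarios, so the prior dominates early learning while accumulating TD updates gradually take over. Every number and the regularization rule here are invented.

```python
import numpy as np

# Hypothetical prior-regularized Q-update: pull toward a meta-learned
# prior while applying the usual TD correction.
alpha, gamma, lam = 0.5, 0.9, 0.1
Q_prior = np.array([[5.0, 1.0]])   # meta-learned prior, 1 state x 2 actions
Q = np.zeros_like(Q_prior)         # fresh table for the new scenario

def regularized_q_step(s, a, r, q_next_max):
    td = r + gamma * q_next_max - Q[s, a]
    Q[s, a] += alpha * td + lam * (Q_prior[s, a] - Q[s, a])

# One transition observed in the new scenario (terminal next state).
regularized_q_step(0, 0, 1.0, 0.0)
```

After a single update the value already reflects both the fresh reward and the prior's preference for action 0, illustrating how prior knowledge accelerates early-stage learning in a data-scarce new intersection.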
Similarity Kernel and Clustering via Random Projection Forests
Yan, Donghui, Gu, Songxiang, Xu, Ying, Qin, Zhiwei
Similarity plays a fundamental role in many areas, including data mining, machine learning, statistics and various applied domains. Inspired by the success of ensemble methods and the flexibility of trees, we propose to learn a similarity kernel called rpf-kernel through random projection forests (rpForests). Our theoretical analysis reveals a highly desirable property of rpf-kernel: far-away (dissimilar) points have a low similarity value while nearby (similar) points have a high similarity, and the similarities have a native interpretation as the probability of points remaining in the same leaf nodes during the growth of rpForests. The learned rpf-kernel leads to an effective clustering algorithm, rpfCluster. On a wide variety of real and benchmark datasets, rpfCluster compares favorably to K-means clustering, spectral clustering and a state-of-the-art clustering ensemble algorithm, Cluster Forests. Our approach is simple to implement and readily adapts to the geometry of the underlying data. Given its desirable theoretical properties and competitive empirical performance when applied to clustering, we expect rpf-kernel to be applicable to many problems of an unsupervised nature or as a regularizer in some supervised or weakly supervised settings.
Environment Reconstruction with Hidden Confounders for Reinforcement Learning based Recommendation
Shang, Wenjie, Yu, Yang, Li, Qingyang, Qin, Zhiwei, Meng, Yiping, Ye, Jieping
Reinforcement learning aims at searching for the best policy model for decision making, and has been shown to be powerful for sequential recommendations. The training of the policy by reinforcement learning, however, takes place in an environment, and in many real-world applications training in the real environment can incur an unbearable cost due to the exploration involved. Environment reconstruction from past data is thus an appealing way to unleash the power of reinforcement learning in these applications. The reconstruction of the environment is, basically, to extract the causal effect model from the data. However, real-world applications are often too complex to offer fully observable environment information, so quite possibly there are unobserved confounding variables lying behind the data. The hidden confounder can obstruct an effective reconstruction of the environment. In this paper, by treating the hidden confounder as a hidden policy, we propose a deconfounded multi-agent environment reconstruction (DEMER) approach to learn the environment together with the hidden confounder. DEMER adopts a multi-agent generative adversarial imitation learning framework: it introduces a confounder-embedded policy and uses a compatible discriminator to train the policies. We then apply DEMER to an application of driver program recommendation. We first use an artificial driver program recommendation environment, abstracted from the real application, to verify and analyze the effectiveness of DEMER. We then test DEMER in the real application at Didi Chuxing. Experiment results show that DEMER can effectively reconstruct the hidden confounder and thus build a better environment model. DEMER also derives a recommendation policy with significantly improved performance in the test phase of the real application.
Cost-sensitive Selection of Variables by Ensemble of Model Sequences
Yan, Donghui, Qin, Zhiwei, Gu, Songxiang, Xu, Haiping, Shao, Ming
Many applications require the collection of data on different variables or measurements over a number of system performance metrics. For example, some cyber systems rely on scanning various system metrics to detect or predict potential cyber intrusions or threats. In the maintenance of airplanes or major factory machinery, measurements of different system components and their usage statistics are collected to determine when maintenance is required. In medical diagnosis, a patient may be asked to take various medical tests, such as blood pressure, cholesterol level, heart rate and so on, so that the doctor can determine whether the patient has a certain disease. In the development of an e-commerce product that predicts the click or purchase of a product at an e-commerce website, much data related to a user's shopping behavior is collected, and often extra data relevant to the product or the user's shopping behavior is purchased from a third-party vendor. The data collected on various measures need to be combined, and if cost is a concern, a subset of measures needs to be selected to satisfy the budget constraint. The problem of combining measures for a target application can be formulated as follows.
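As a simple baseline for the budget-constrained selection problem described above, one can greedily add the variable with the best utility-per-cost ratio until the budget is exhausted. The utilities, costs, and the greedy rule below are illustrative only; the paper instead builds an ensemble of model sequences for cost-sensitive selection.

```python
# Toy greedy variable selection under a budget constraint.
def greedy_select(utility, cost, budget):
    chosen, spent = [], 0.0
    remaining = set(utility)
    while remaining:
        # Pick the variable with the best utility-per-cost ratio.
        best = max(remaining, key=lambda v: utility[v] / cost[v])
        remaining.discard(best)
        if spent + cost[best] > budget:      # too expensive: skip it
            continue
        chosen.append(best)
        spent += cost[best]
    return chosen, spent

# Invented medical-test example: utilities and dollar costs per measure.
utility = {"blood_pressure": 3.0, "cholesterol": 2.0, "mri_scan": 8.0}
cost = {"blood_pressure": 1.0, "cholesterol": 1.0, "mri_scan": 10.0}
chosen, spent = greedy_select(utility, cost, budget=3.0)
```

The greedy ratio rule is a knapsack-style heuristic: cheap, informative measures are taken first, and an expensive measure like the MRI is dropped once it no longer fits the remaining budget.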