AITopics | Ding, Rui

Collaborating Authors

Ding, Rui

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System

Ma, Pingchuan, Ding, Rui, Wang, Shuai, Han, Shi, Zhang, Dongmei

arXiv.org Artificial IntelligenceNov-12-2023

Exploring data is crucial in data analysis, as it helps users understand and interpret the data more effectively. However, performing effective data exploration requires in-depth knowledge of the dataset and expertise in data analysis techniques. Not being familiar with either can create obstacles that make the process time-consuming and overwhelming for data analysts. To address this issue, we introduce InsightPilot, an LLM (Large Language Model)-based, automated data exploration system designed to simplify the data exploration process. InsightPilot automatically selects appropriate analysis intents, such as understanding, summarizing, and explaining. Then, these analysis intents are concretized by issuing corresponding intentional queries (IQueries) to create a meaningful and coherent exploration sequence. In brief, an IQuery is an abstraction and automation of data analysis operations, which mimics the approach of data analysts and simplifies the exploration process for users. By employing an LLM to iteratively collaborate with a state-of-the-art insight engine via IQueries, InsightPilot is effective in analyzing real-world datasets, enabling users to gain valuable insights through natural language inquiries. We demonstrate the effectiveness of InsightPilot in a case study, showing how it can help users gain valuable insights from their datasets.

insightpilot, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2304.00477

Genre: Research Report (0.64)

Industry:

Education (0.47)
Automobiles & Trucks > Manufacturer (0.30)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

FRGNN: Mitigating the Impact of Distribution Shift on Graph Neural Networks via Test-Time Feature Reconstruction

Ding, Rui, Yang, Jielong, Ji, Feng, Zhong, Xionghu, Xie, Linbo

arXiv.org Artificial IntelligenceOct-13-2023

Due to inappropriate sample selection and limited training data, a distribution shift often exists between the training and test sets. This shift can adversely affect the test performance of Graph Neural Networks (GNNs). Existing approaches mitigate this issue by either enhancing the robustness of GNNs to distribution shift or reducing the shift itself. However, both approaches necessitate retraining the model, which becomes unfeasible when the model structure and parameters are inaccessible. To address this challenge, we propose FR-GNN, a general framework for GNNs to conduct feature reconstruction. FRGNN constructs a mapping relationship between the output and input of a well-trained GNN to obtain class representative embeddings and then uses these embeddings to reconstruct the features of labeled nodes. These reconstructed features are then incorporated into the message passing mechanism of GNNs to influence the predictions of unlabeled nodes at test time. Notably, the reconstructed node features can be directly utilized for testing the well-trained model, effectively reducing the distribution shift and leading to improved test performance. This remarkable achievement is attained without any modifications to the model structure or parameters. We provide theoretical guarantees for the effectiveness of our framework. Furthermore, we conduct comprehensive experiments on various public datasets. The experimental results demonstrate the superior performance of FRGNN in comparison to multiple categories of baseline methods.

artificial intelligence, machine learning, node, (15 more...)

arXiv.org Artificial Intelligence

2308.09259

Genre: Research Report > New Finding (0.48)

Industry:

Education (0.55)
Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

XInsight: eXplainable Data Analysis Through The Lens of Causality

Ma, Pingchuan, Ding, Rui, Wang, Shuai, Han, Shi, Zhang, Dongmei

arXiv.org Artificial IntelligenceMay-30-2023

In light of the growing popularity of Exploratory Data Analysis (EDA), understanding the underlying causes of the knowledge acquired by EDA is crucial. However, it remains under-researched. This study promotes a transparent and explicable perspective on data analysis, called eXplainable Data Analysis (XDA). For this reason, we present XInsight, a general framework for XDA. XInsight provides data analysis with qualitative and quantitative explanations of causal and non-causal semantics. This way, it will significantly improve human understanding and confidence in the outcomes of data analysis, facilitating accurate data interpretation and decision making in the real world. XInsight is a three-module, end-to-end pipeline designed to extract causal graphs, translate causal primitives into XDA semantics, and quantify the quantitative contribution of each explanation to a data fact. XInsight uses a set of design concepts and optimizations to address the inherent difficulties associated with integrating causality into XDA. Experiments on synthetic and real-world datasets as well as a user study demonstrate the highly promising capabilities of XInsight.

explanation, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2207.12718

Country:

Asia > China > Hong Kong (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.31)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.45)

Add feedback

A Unified and Fast Interpretable Model for Predictive Analytics

Jiang, Yuanyuan, Ding, Rui, Qiao, Tianchi, Zhu, Yunan, Han, Shi, Zhang, Dongmei

arXiv.org Artificial IntelligenceMar-11-2023

Predictive analytics aims to build machine learning models to predict behavior patterns and use predictions to guide decision-making. Predictive analytics is human involved, thus the machine learning model is preferred to be interpretable. In literature, Generalized Additive Model (GAM) is a standard for interpretability. However, due to the one-to-many and many-to-one phenomena which appear commonly in real-world scenarios, existing GAMs have limitations to serve predictive analytics in terms of both accuracy and training efficiency. In this paper, we propose FXAM (Fast and eXplainable Additive Model), a unified and fast interpretable model for predictive analytics. FXAM extends GAM's modeling capability with a unified additive model for numerical, categorical, and temporal features. FXAM conducts a novel training procedure called Three-Stage Iteration (TSI). TSI corresponds to learning over numerical, categorical, and temporal features respectively. Each stage learns a local optimum by fixing the parameters of other stages. We design joint learning over categorical features and partial learning over temporal features to achieve high accuracy and training efficiency. We prove that TSI is guaranteed to converge to the global optimum. We further propose a set of optimization techniques to speed up FXAM's training algorithm to meet the needs of interactive analysis. Thorough evaluations conducted on diverse data sets verify that FXAM significantly outperforms existing GAMs in terms of training speed, and modeling categorical and temporal features. In terms of interpretability, we compare FXAM with the typical post-hoc approach XGBoost+SHAP on two real-world scenarios, which shows the superiority of FXAM's inherent interpretability for predictive analytics.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2111.08255

Country:

Asia > Middle East > UAE (0.14)
Asia > China > Jiangsu Province (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.46)
Banking & Finance (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

ML4C: Seeing Causality Through Latent Vicinity

Dai, Haoyue, Ding, Rui, Jiang, Yuanyuan, Han, Shi, Zhang, Dongmei

arXiv.org Machine LearningOct-1-2021

Supervised Causal Learning (SCL) aims to learn causal relations from observational data by accessing previously seen datasets associated with ground truth causal relations. This paper presents a first attempt at addressing a fundamental question: What are the benefits from supervision and how does it benefit? Starting from seeing that SCL is not better than random guessing if the learning target is non-identifiable a priori, we propose a two-phase paradigm for SCL by explicitly considering structure identifiability. Following this paradigm, we tackle the problem of SCL on discrete data and propose ML4C. The core of ML4C is a binary classifier with a novel learning target: it classifies whether an Unshielded Triple (UT) is a v-structure or not. Starting from an input dataset with the corresponding skeleton provided, ML4C orients each UT once it is classified as a v-structure. These v-structures are together used to construct the final output. To address the fundamental question of SCL, we propose a principled method for ML4C featurization: we exploit the vicinity of a given UT (i.e., the neighbors of UT in skeleton), and derive features by considering the conditional dependencies and structural entanglement within the vicinity. We further prove that ML4C is asymptotically perfect. Last but foremost, thorough experiments conducted on benchmark datasets demonstrate that ML4C remarkably outperforms other state-of-the-art algorithms in terms of accuracy, robustness, tolerance and transferability. In summary, ML4C shows promising results on validating the effectiveness of supervision for causal learning.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2110.00637

Country:

Asia (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Data Science > Data Mining (0.93)

Add feedback