AITopics | Chen, Xingyan

Collaborating Authors

Chen, Xingyan

MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models

Chen, Zhongpu, Liu, Yinfeng, Shi, Long, Wang, Zhi-Jie, Chen, Xingyan, Zhao, Yu, Ren, Fuji

arXiv.org Artificial Intelligence, Jan-24-2025

Large language models (LLMs) are expected to offer structured Markdown responses for the sake of readability in web chatbots (e.g., ChatGPT). Although there are a myriad of metrics to evaluate LLMs, they fail to evaluate readability from the perspective of output content structure. To this end, we focus on an overlooked yet important metric, Markdown Awareness, which directly impacts the readability and structure of the content generated by these language models. In this paper, we introduce MDEval, a comprehensive benchmark to assess Markdown Awareness for LLMs, by constructing a dataset with 20K instances covering 10 subjects in English and Chinese. Unlike traditional model-based evaluations, MDEval provides excellent interpretability by combining model-based generation tasks and statistical methods. Our results demonstrate that MDEval achieves a Spearman correlation of 0.791 and an accuracy of 84.1% with human judgments, outperforming existing methods by a large margin. Extensive experimental results also show that through fine-tuning over our proposed dataset, less performant open-source models are able to achieve comparable performance to GPT-4o in terms of Markdown Awareness. To ensure reproducibility and transparency, MDEval is open sourced at https://github.com/SWUFE-DB-Group/MDEval-Benchmark.
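
The Spearman correlation cited in the abstract above measures how well a benchmark's ranking of responses agrees with a human ranking. The following is a minimal sketch of that statistic in plain Python. The function names and the sample scores are illustrative assumptions, not taken from MDEval, and the sketch does not reproduce the benchmark's scoring pipeline.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


# Hypothetical scores: an automatic metric vs. human ratings for five responses.
metric_scores = [0.2, 0.9, 0.5, 0.7, 0.4]
human_scores = [1, 5, 3, 4, 2]
print(spearman(metric_scores, human_scores))
```

A value near 1 means the two rankings almost coincide; a value near 0 means they are unrelated.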

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2501.15

Country:

Asia > China > Sichuan Province (0.14)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Graph Dimension Attention Networks for Enterprise Credit Assessment

Wei, Shaopeng, Egressy, Beni, Chen, Xingyan, Zhao, Yu, Zhuang, Fuzhen, Wattenhofer, Roger, Kou, Gang

arXiv.org Artificial Intelligence, Jul-16-2024

Enterprise credit assessment is critical for evaluating financial risk, and Graph Neural Networks (GNNs), with their advanced capability to model inter-entity relationships, are a natural tool to get a deeper understanding of these financial networks. However, existing GNN-based methodologies predominantly emphasize entity-level attention mechanisms for contagion risk aggregation, often overlooking the heterogeneous importance of different feature dimensions, thus falling short in adequately modeling credit risk levels. To address this issue, we propose a novel architecture named Graph Dimension Attention Network (GDAN), which incorporates a dimension-level attention mechanism to capture fine-grained risk-related characteristics. Furthermore, we explore the interpretability of the GNN-based method in financial scenarios and propose a simple but effective data-centric explainer for GDAN, called GDAN-DistShift. DistShift provides edge-level interpretability by quantifying distribution shifts during the message-passing process. Moreover, we collected a real-world, multi-source Enterprise Credit Assessment Dataset (ECAD) and have made it accessible to the research community since high-quality datasets are lacking in this field. Extensive experiments conducted on ECAD demonstrate the effectiveness of our methods. In addition, we ran GDAN on the well-known datasets SMEsD and DBLP, also with excellent results.

artificial intelligence, deep learning, machine learning

arXiv.org Artificial Intelligence

2407.11615

Country:

Asia > China (0.29)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Banking & Finance > Credit (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Towards Optimal Customized Architecture for Heterogeneous Federated Learning with Contrastive Cloud-Edge Model Decoupling

Chen, Xingyan, Du, Tian, Wang, Mu, Gu, Tiancheng, Zhao, Yu, Kou, Gang, Xu, Changqiao, Wu, Dapeng Oliver

arXiv.org Artificial Intelligence, Mar-4-2024

Federated learning, as a promising distributed learning paradigm, enables collaborative training of a global model across multiple network edge clients without the need for centralized data collection. However, the heterogeneity of edge data distributions drags the model toward local minima, which can be distant from the global optimum. Such heterogeneity often leads to slow convergence and substantial communication overhead. To address these issues, we propose a novel federated learning framework called FedCMD, a model decoupling approach tailored to cloud-edge supported federated learning that separates deep neural networks into a body for capturing shared representations in the cloud and a personalized head for mitigating data heterogeneity. Our motivation is that, through an in-depth investigation of the performance of selecting different neural network layers as the personalized head, we found that rigidly assigning the last layer as the personalized head, as current studies do, is not always optimal. Instead, it is necessary to dynamically select the personalized layer that maximizes the training performance by taking the representation difference between neighboring layers into account. To find the optimal personalized layer, we utilize the low-dimensional representation of each layer to contrast feature distribution transfer and introduce a Wasserstein-based layer selection method, aimed at identifying the best-match layer for personalization. Additionally, a weighted global aggregation algorithm is proposed based on the selected personalized layer for the practical application of FedCMD. Extensive experiments on ten benchmarks demonstrate the efficiency and superior performance of our solution compared with nine state-of-the-art solutions. All code and results are available at https://github.com/elegy112138/FedCMD.
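
To make the idea of comparing neighboring layers with a Wasserstein distance concrete, here is a rough sketch. It assumes each layer has already been reduced to a one-dimensional sample of equal size, and it uses a hypothetical selection rule (pick the layer that follows the largest shift). The abstract does not state FedCMD's actual criterion, so this should not be read as the authors' method.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    For equal-size samples this is the mean absolute gap between sorted values.
    """
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def neighbor_layer_shifts(layer_features):
    """Distance between each layer's features and those of the next layer."""
    return [
        wasserstein_1d(layer_features[i], layer_features[i + 1])
        for i in range(len(layer_features) - 1)
    ]


def pick_layer(layer_features):
    """Hypothetical rule: return the index of the layer after the largest shift."""
    shifts = neighbor_layer_shifts(layer_features)
    return max(range(len(shifts)), key=lambda i: shifts[i]) + 1


# Illustrative 1-D projections of four layers' activations (made-up numbers).
layers = [[0.0, 1.0], [0.0, 1.0], [2.0, 3.0], [2.0, 3.0]]
print(neighbor_layer_shifts(layers))
print(pick_layer(layers))
```

In this toy input the feature distribution only moves between the second and third layers, so the rule selects the third layer (index 2).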

artificial intelligence, deep learning, machine learning

arXiv.org Artificial Intelligence

2403.0236

Country:

Asia > China > Sichuan Province (0.14)
North America > United States > Florida > Alachua County > Gainesville (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Graph Learning and Its Advancements on Large Language Models: A Holistic Survey

Wei, Shaopeng, Zhao, Yu, Chen, Xingyan, Li, Qing, Zhuang, Fuzhen, Liu, Ji, Ren, Fuji, Kou, Gang

arXiv.org Artificial Intelligence, Nov-18-2023

Graph learning is a prevalent domain that endeavors to learn the intricate relationships among nodes and the topological structure of graphs. Over the years, graph learning has progressed from graph theory to graph data mining. With the advent of representation learning, it has attained remarkable performance in diverse scenarios. Owing to its extensive application prospects, graph learning attracts considerable attention. While some researchers have produced impressive surveys on graph learning, they failed to connect related objectives, methods, and applications in a coherent way. As a result, they did not cover the wide range of current scenarios and challenging problems arising from the rapid expansion of graph learning. In particular, large language models have recently had a disruptive effect on human life, but they also show relative weakness in structured scenarios. The question of how to make these models more powerful with graph learning remains open. Our survey focuses on the most recent advancements in integrating graph learning with pre-trained language models, specifically emphasizing their application within the domain of large language models. Different from previous surveys on graph learning, we provide a holistic review that analyzes current works from the perspective of graph structure, and discusses the latest applications, trends, and challenges in graph learning. Specifically, we commence by proposing a taxonomy and then summarize the methods employed in graph learning. We then provide a detailed elucidation of mainstream applications. Finally, we propose future directions.

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2212.08966

Country:

Asia (1.00)
North America > United States > Wisconsin > Dane County > Madison (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology > Services (0.68)
Education (0.67)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Taming Gradient Variance in Federated Learning with Networked Control Variates

Chen, Xingyan, Liu, Yaling, Du, Huaming, Wang, Mu, Zhao, Yu

arXiv.org Artificial Intelligence, Oct-26-2023

Federated learning, a decentralized approach to machine learning, faces significant challenges such as extensive communication overheads, slow convergence, and unstable improvements. These challenges primarily stem from the gradient variance due to heterogeneous client data distributions. To address this, we introduce a novel Networked Control Variates (FedNCV) framework for Federated Learning. We adopt the REINFORCE Leave-One-Out (RLOO) as a fundamental control variate unit in the FedNCV framework, implemented at both client and server levels. At the client level, the RLOO control variate is employed to optimize local gradient updates, mitigating the variance introduced by data samples. Once relayed to the server, the RLOO-based estimator further provides an unbiased and low-variance aggregated gradient, leading to robust global updates. This dual-side application is formalized as a linear combination of composite control variates. We provide a mathematical expression capturing this integration of double control variates within FedNCV and present three theoretical results with corresponding proofs. This unique dual structure equips FedNCV to address data heterogeneity and scalability issues, thus potentially paving the way for large-scale applications. Moreover, we tested FedNCV on six diverse datasets under a Dirichlet distribution with α = 0.1, and benchmarked its performance against six SOTA methods, demonstrating its superiority.
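
The leave-one-out idea behind RLOO can be illustrated on plain scalars: each sample is compared against a baseline built from the mean of all the other samples. The sketch below shows only this generic baseline computation. The function names are illustrative, and how FedNCV applies the control variate to client and server gradients is not specified in the abstract, so none of that is modeled here.

```python
def leave_one_out_baselines(values):
    """For each sample, return the mean of all the other samples."""
    n = len(values)
    if n < 2:
        raise ValueError("leave-one-out needs at least two samples")
    total = sum(values)
    return [(total - v) / (n - 1) for v in values]


def rloo_centered(values):
    """Subtract each sample's leave-one-out baseline from the sample itself."""
    baselines = leave_one_out_baselines(values)
    return [v - b for v, b in zip(values, baselines)]


# Illustrative per-sample signals (made-up numbers).
samples = [1.0, 2.0, 3.0]
print(leave_one_out_baselines(samples))
print(rloo_centered(samples))
```

Because each baseline excludes the sample it is applied to, it is independent of that sample, which is what allows it to reduce variance without introducing bias in the REINFORCE setting.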

artificial intelligence, control variate, machine learning

arXiv.org Artificial Intelligence

2310.172

Country: Asia > China (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

ESIE-BERT: Enriching Sub-words Information Explicitly with BERT for Joint Intent Classification and Slot Filling

Guo, Yu, Xie, Zhilong, Chen, Xingyan, Chen, Huangen, Wang, Leilei, Du, Huaming, Wei, Shaopeng, Zhao, Yu, Li, Qing, Wu, Gang

arXiv.org Artificial Intelligence, Feb-2-2023

Natural language understanding (NLU) has two core tasks: intent classification and slot filling. The success of pre-trained language models has resulted in a significant breakthrough in the two tasks. One promising solution, BERT, can jointly optimize the two tasks. We note that BERT-based models convert each complex token into multiple sub-tokens via the WordPiece algorithm, which generates a mismatch between the lengths of the tokens and the labels. As a result, BERT-based models perform poorly in label prediction, which limits model performance improvement. Many existing models can accommodate this issue, but some hidden semantic information is discarded in the fine-tuning process. We address the problem by introducing a novel joint method on top of BERT which explicitly models the features of the multiple sub-tokens after WordPiece tokenization, thereby contributing to the two tasks. Our method can effectively extract the contextual features from complex tokens through the proposed sub-words attention adapter (SAA), which preserves overall utterance information. Additionally, we propose an intent attention adapter (IAA) to obtain the full sentence features to help predict intent. Experimental results confirm that our proposed model is significantly improved on two public benchmark datasets. In particular, the slot filling F1 score is improved from 96.1 to 98.2 (2.1% absolute) on the Airline Travel Information Systems (ATIS) dataset.
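
The length mismatch described above can be seen in a small sketch. The sub-token splits below are hand-written and hypothetical (a real WordPiece vocabulary may split these words differently), and the alignment shown is a common workaround that labels only the first sub-token of each word. It is not the SAA method from the paper; it is the kind of approach that leaves the remaining sub-tokens unused.

```python
# Hypothetical sub-token splits; a real WordPiece vocabulary may differ.
TOY_SPLITS = {
    "flights": ["flight", "##s"],
    "baltimore": ["balt", "##imore"],
}


def split_words(words):
    """Split words into sub-tokens and record which word each piece came from."""
    sub_tokens, word_index = [], []
    for idx, word in enumerate(words):
        pieces = TOY_SPLITS.get(word, [word])
        sub_tokens.extend(pieces)
        word_index.extend([idx] * len(pieces))
    return sub_tokens, word_index


def first_piece_labels(word_labels, word_index, ignore="X"):
    """Give each word's label to its first sub-token and mark the rest as ignored."""
    aligned, previous = [], None
    for idx in word_index:
        aligned.append(word_labels[idx] if idx != previous else ignore)
        previous = idx
    return aligned


words = ["flights", "to", "baltimore"]
slot_labels = ["O", "O", "B-toloc.city_name"]

sub_tokens, word_index = split_words(words)
print(len(slot_labels), len(sub_tokens))  # three labels, five sub-tokens
print(first_piece_labels(slot_labels, word_index))
```

The positions marked `X` carry no label during training, which is one way sub-token information ends up discarded.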

information, machine learning, natural language

arXiv.org Artificial Intelligence

2211.14829

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Stock Movement Prediction Based on Bi-typed and Hybrid-relational Market Knowledge Graph via Dual Attention Networks

Zhao, Yu, Du, Huaming, Liu, Ying, Wei, Shaopeng, Chen, Xingyan, Feng, Huali, Shuai, Qinghong, Li, Qing, Zhuang, Fuzhen, Kou, Gang

arXiv.org Artificial Intelligence, Jan-11-2022

Stock Movement Prediction (SMP) aims at predicting the future price trends of listed companies' stocks, which is a challenging task due to the volatile nature of financial markets. Recent financial studies show that the momentum spillover effect plays a significant role in stock fluctuation. However, previous studies typically learn only simple connection information among related companies, and so inevitably fail to model the complex relations of listed companies in the real financial market. To address this issue, we first construct a more comprehensive Market Knowledge Graph (MKG) which contains bi-typed entities, including listed companies and their associated executives, and hybrid relations, including explicit relations and implicit relations. Afterward, we propose DanSmp, a novel Dual Attention Networks model, to learn the momentum spillover signals based upon the constructed MKG for stock prediction. The empirical experiments on our constructed datasets against nine SOTA baselines demonstrate that the proposed DanSmp is capable of improving stock prediction with the constructed MKG.

machine learning, natural language, relation

arXiv.org Artificial Intelligence

2201.04965

Country:

Asia > China (0.48)
North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Dual Hierarchical Attention Networks for Bi-typed Heterogeneous Graph Learning

Zhao, Yu, Wei, Shaopeng, Du, Huaming, Chen, Xingyan, Li, Qing, Zhuang, Fuzhen, Liu, Ji, Kou, Gang

arXiv.org Artificial Intelligence, Dec-24-2021

Bi-typed multi-relational heterogeneous graph (BMHG) is one of the most common graphs in practice, for example, academic networks, e-commerce user behavior graphs, and enterprise knowledge graphs. Learning a numerical representation for each node that characterizes subtle structures is a critical and challenging problem. However, most previous studies treat all node relations in BMHG as the same class of relation without distinguishing the different characteristics between the intra-class relations and inter-class relations of the bi-typed nodes, causing the loss of significant structure information. To address this issue, we propose a novel Dual Hierarchical Attention Networks (DHAN) model based on bi-typed multi-relational heterogeneous graphs to learn comprehensive node representations with intra-class and inter-class attention-based encoders under a hierarchical mechanism. Moreover, to sufficiently model node multi-relational information in BMHG, we adopt a newly proposed hierarchical mechanism.

data mining, machine learning, natural language

arXiv.org Artificial Intelligence

2112.13078

Country:

Asia > China (0.95)
North America > United States > Wisconsin > Dane County > Madison (0.14)

Genre: Research Report (1.00)

Industry:

Education > Educational Setting (0.46)
Information Technology > Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
