
Collaborating Authors

 Yang, Shenghao


Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles

arXiv.org Artificial Intelligence

User simulators are crucial for replicating human interactions with dialogue systems, supporting both collaborative training and automatic evaluation, especially for large language models (LLMs). However, existing simulators often rely solely on text utterances, missing implicit user traits such as personality, speaking style, and goals. In contrast, persona-based methods lack generalizability, as they depend on predefined profiles of famous individuals or archetypes. To address these challenges, we propose User Simulator with implicit Profiles (USP), a framework that infers implicit user profiles from human-machine conversations and uses them to generate more personalized and realistic dialogues. We first develop an LLM-driven extractor with a comprehensive profile schema. Then, we refine the simulation through conditional supervised fine-tuning and reinforcement learning with cycle consistency, optimizing it at both the utterance and conversation levels. Finally, we adopt a diverse profile sampler to capture the distribution of real-world user profiles. Experimental results demonstrate that USP outperforms strong baselines in terms of authenticity and diversity while achieving comparable performance in consistency. Furthermore, dynamic multi-turn evaluations based on USP strongly align with mainstream benchmarks, demonstrating its effectiveness in real-world applications.
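
The abstract conditions the simulator on an inferred profile and optimizes it with a cycle-consistency signal at the conversation level. As a rough, purely hypothetical illustration of that idea (not the paper's actual schema, extractor, or reward), the sketch below scores a simulated dialogue by how much of the conditioning profile a stand-in extractor can recover from it; `extract_profile` and the field names are placeholders.

```python
# Hypothetical sketch of a cycle-consistency reward in the spirit of USP's
# conversation-level objective: the simulator is conditioned on an implicit
# profile, and the reward measures whether an extractor can recover that
# profile from the generated dialogue. The extractor and field names below
# are placeholders, not the paper's schema or models.

def extract_profile(dialogue: list[str]) -> dict:
    """Stand-in for the LLM-driven profile extractor (placeholder heuristic)."""
    text = " ".join(dialogue).lower()
    avg_len = sum(len(u.split()) for u in dialogue) / max(len(dialogue), 1)
    return {
        "style": "terse" if avg_len < 8 else "verbose",
        "goal": "booking" if "book" in text else "chitchat",
    }

def cycle_consistency_reward(target_profile: dict, dialogue: list[str]) -> float:
    """Fraction of profile fields recovered from the simulated conversation."""
    recovered = extract_profile(dialogue)
    matched = sum(1 for k, v in target_profile.items() if recovered.get(k) == v)
    return matched / max(len(target_profile), 1)

profile = {"style": "terse", "goal": "booking"}
simulated = ["Book a table for two.", "Tonight at 7 works."]
print(cycle_consistency_reward(profile, simulated))  # 1.0 when both fields are recovered
```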


Using Pre-trained LLMs for Multivariate Time Series Forecasting

arXiv.org Artificial Intelligence

Time series forecasting refers to a class of techniques for predicting events through a sequence of time, typically to inform strategic or tactical decision making. Going beyond the strategic forecasting problems historically common in statistics and econometrics [1], operational forecasting problems are increasingly important. For example, at large internet retail companies these include demand forecasting for products sold online, workforce cohorts across a company's locations, and compute capacity needs per region and server type; in scientific machine learning, they include prediction of extreme events in, e.g., climate and weather models. In particular, MQCNN [2] and MQTransformer [3] are state-of-the-art (SOTA) neural network (NN) based multivariate time series forecasting models used to predict future demand at the product level for hundreds of millions of products.


Positional Attention: Out-of-Distribution Generalization and Expressivity for Neural Algorithmic Reasoning

arXiv.org Artificial Intelligence

Transformers [Vaswani et al., 2017] are versatile models used in various applications, including vision [Yuan et al., 2021, Khan et al., 2022, Dehghani et al., 2023] and natural language processing [Wei et al., 2022b, Touvron et al., 2023]. Their effectiveness in complex tasks is particularly notable in Large Language Models (LLMs) [Wang et al., 2018, Hendrycks et al., 2021], where they excel at generating coherent text and understanding context. This strong performance has led to increased interest in understanding the Transformer architecture as a computational model capable of executing instructions and solving algorithmic reasoning problems. In this context, Pérez et al. [2021] and Wei et al. [2022a] show that Transformers are Turing complete, and Giannou et al. [2023], Back De Luca and Fountoulakis [2024], and Yang et al. [2024] demonstrate that Transformers can effectively encode instructions to solve linear algebra and graph problems. Additionally, it has been shown that Transformers can perform reasoning tasks using far fewer layers than the number of reasoning steps [Liu et al., 2023], indicating a connection between Transformers and parallel algorithms. To this end, Sanford et al. [2024] further demonstrate that Transformers can simulate the Massively Parallel Computation (MPC) model [Andoni et al., 2018], which is based on the MapReduce framework for large-scale data processing [Dean and Ghemawat, 2008]. Complementing this theoretical framework, empirical studies have demonstrated the capabilities of Transformers, among other models, in executing algorithms [Veličković and Blundell, 2021]. Notable applications include basic arithmetic [Lee et al., 2024], sorting [Tay et al., 2020, Yan et al., 2020], and dynamic programming.


Local Graph Clustering with Noisy Labels

arXiv.org Machine Learning

The growing interest in machine learning problems over graphs with additional node information such as texts, images, or labels has popularized methods that require the costly operation of processing the entire graph. Yet, little effort has been devoted to developing fast local methods (i.e., methods that do not access the entire graph) that extract useful information from such data. To that end, we propose a study of local graph clustering using noisy node labels as a proxy for additional node information. In this setting, nodes receive initial binary labels based on cluster affiliation: 1 if they belong to the target cluster and 0 otherwise. Subsequently, a fraction of these labels is flipped. We investigate the benefits of incorporating noisy labels for local graph clustering. By constructing a weighted graph with such labels, we study the performance of a graph diffusion-based local clustering method on both the original and the weighted graphs. From a theoretical perspective, we consider recovering an unknown target cluster with a single seed node in a random graph with independent noisy node labels. We provide sufficient conditions on the label noise under which, with high probability, using diffusion in the weighted graph yields a more accurate recovery of the target cluster. This approach proves more effective than using the given labels alone or using diffusion in the label-free original graph. Empirically, we show that reliable node labels can be obtained with just a few samples from an attributed graph. Moreover, utilizing these labels via diffusion in the weighted graph leads to significantly better local clustering performance across several real-world datasets, improving F1 scores by up to 13%.
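
A minimal sketch of the idea described in the abstract: upweight edges whose endpoints both carry a (noisy) label of 1, then run a seeded diffusion on the reweighted graph. The specific weighting rule and the personalized-PageRank-style iteration below are illustrative assumptions, not the paper's exact construction or parameters.

```python
import numpy as np

def reweight(A: np.ndarray, labels: np.ndarray, boost: float = 2.0) -> np.ndarray:
    """Upweight edges whose endpoints both carry label 1 (assumed weighting rule)."""
    both_one = np.outer(labels, labels)          # 1 where both endpoints are labeled 1
    return A * (1.0 + (boost - 1.0) * both_one)

def seeded_diffusion(W: np.ndarray, seed: int, alpha: float = 0.15, iters: int = 100) -> np.ndarray:
    """Personalized-PageRank-style diffusion mass from a single seed node."""
    d = W.sum(axis=1)
    P = W / np.where(d > 0, d, 1.0)[:, None]     # row-stochastic transition matrix
    s = np.zeros(W.shape[0]); s[seed] = 1.0
    x = s.copy()
    for _ in range(iters):
        x = alpha * s + (1 - alpha) * P.T @ x
    return x

# Toy graph: two triangles joined by one edge; noisy labels flag the first triangle.
A = np.array([[0,1,1,0,0,0],
              [1,0,1,0,0,0],
              [1,1,0,1,0,0],
              [0,0,1,0,1,1],
              [0,0,0,1,0,1],
              [0,0,0,1,1,0]], dtype=float)
labels = np.array([1, 1, 1, 0, 1, 0])            # one label flipped by noise
scores = seeded_diffusion(reweight(A, labels), seed=0)
print(np.argsort(-scores))                       # target-cluster nodes should rank first
```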


Polynomial Width is Sufficient for Set Representation with High-dimensional Features

arXiv.org Artificial Intelligence

Set representation has become ubiquitous in deep learning for modeling the inductive bias of neural networks that are insensitive to the input order. DeepSets is the most widely used neural network architecture for set representation. It involves embedding each set element into a latent space with dimension $L$, followed by a sum pooling to obtain a whole-set embedding, and finally mapping the whole-set embedding to the output. In this work, we investigate the impact of the dimension $L$ on the expressive power of DeepSets. Previous analyses either oversimplified high-dimensional features to be one-dimensional features or were limited to analytic activations, thereby diverging from practical use or resulting in $L$ that grows exponentially with the set size $N$ and feature dimension $D$. To investigate the minimal value of $L$ that achieves sufficient expressive power, we present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE). We demonstrate that $L$ being poly$(N, D)$ is sufficient for set representation using both embedding layers. We also provide a lower bound of $L$ for the LP embedding layer. Furthermore, we extend our results to permutation-equivariant set functions and the complex field.
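
The abstract describes the DeepSets pipeline concretely: embed each of the $N$ elements (feature dimension $D$) into an $L$-dimensional latent space, sum-pool, then map the pooled vector to the output. The sketch below shows that pipeline with a "linear + power activation" element embedding in the spirit of the LP layer; the weights, dimensions, and the single-layer output map are arbitrary placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, L = 5, 3, 16                      # set size, feature dim, latent dim (placeholders)

W_phi = rng.normal(size=(D, L))         # linear part of the element embedding
W_rho = rng.normal(size=(L, 1))         # output map (a single linear layer here)

def deepsets(X: np.ndarray, power: int = 2) -> np.ndarray:
    """X has shape (N, D); the output is invariant to permutations of the rows."""
    phi = (X @ W_phi) ** power          # linear + power activation, shape (N, L)
    pooled = phi.sum(axis=0)            # sum pooling over set elements, shape (L,)
    return pooled @ W_rho               # whole-set embedding -> output

X = rng.normal(size=(N, D))
perm = rng.permutation(N)
print(np.allclose(deepsets(X), deepsets(X[perm])))  # True: permutation invariance
```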


Graph Attention Retrospective

arXiv.org Artificial Intelligence

Graph-based learning is a rapidly growing sub-field of machine learning with applications in social networks, citation networks, and bioinformatics. One of the most popular models is graph attention networks. They were introduced to allow a node to aggregate information from features of neighbor nodes in a non-uniform way, in contrast to simple graph convolution which does not distinguish the neighbors of a node. In this paper, we theoretically study the behaviour of graph attention networks. We prove multiple results on the performance of the graph attention mechanism for the problem of node classification for a contextual stochastic block model. Here, the node features are obtained from a mixture of Gaussians and the edges from a stochastic block model. We show that in an "easy" regime, where the distance between the means of the Gaussians is large enough, graph attention is able to distinguish inter-class from intra-class edges. Thus it maintains the weights of important edges and significantly reduces the weights of unimportant edges. Consequently, we show that this implies perfect node classification. In the "hard" regime, we show that every attention mechanism fails to distinguish intra-class from inter-class edges. In addition, we show that graph attention convolution cannot (almost) perfectly classify the nodes even if intra-class edges could be separated from inter-class edges. Beyond perfect node classification, we provide a positive result on graph attention's robustness against structural noise in the graph. In particular, our robustness result implies that graph attention can be strictly better than both the simple graph convolution and the best linear classifier of node features. We evaluate our theoretical results on synthetic and real-world data.
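
A minimal sketch of the data model the abstract analyzes: a two-class contextual stochastic block model, with edges drawn from an SBM (intra-class probability p, inter-class probability q) and node features drawn from a mixture of two Gaussians with means +mu and -mu. All parameter values below are illustrative, not the "easy" or "hard" regimes characterized in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5                                   # nodes, feature dimension (placeholders)
p, q, sigma = 0.5, 0.1, 1.0                     # SBM probabilities and noise level
mu = np.ones(d)                                 # class means are +mu and -mu

y = rng.integers(0, 2, size=n)                  # class assignments
same = (y[:, None] == y[None, :])               # True for intra-class node pairs
probs = np.where(same, p, q)
A = (rng.random((n, n)) < probs).astype(float)
A = np.triu(A, 1); A = A + A.T                  # symmetric adjacency, no self-loops

X = np.where(y[:, None] == 1, mu, -mu) + sigma * rng.normal(size=(n, d))

# In the "easy" regime (class means far apart relative to sigma), attention
# scores computed from X can separate intra-class from inter-class edges.
edges = np.argwhere(np.triu(A, 1) > 0)
intra = np.mean(y[edges[:, 0]] == y[edges[:, 1]])
print(f"{len(edges)} edges, {intra:.2f} intra-class")
```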


Equivariant Hypergraph Diffusion Neural Operators

arXiv.org Artificial Intelligence

Hypergraph neural networks (HNNs), which use neural networks to encode hypergraphs, provide a promising way to model higher-order relations in data and to solve prediction tasks built upon such relations. However, higher-order relations in practice contain complex patterns and are often highly irregular, so it is challenging to design an HNN that is expressive enough to capture them while remaining computationally efficient. Inspired by hypergraph diffusion algorithms, this work proposes a new HNN architecture named ED-HNN, which provably approximates any continuous equivariant hypergraph diffusion operator and can therefore model a wide range of higher-order relations. ED-HNN can be implemented efficiently by combining star expansions of hypergraphs with standard message passing neural networks. ED-HNN further shows great superiority in processing heterophilic hypergraphs and in constructing deep models. We evaluate ED-HNN for node classification on nine real-world hypergraph datasets. ED-HNN uniformly outperforms the best baselines over these nine datasets and improves prediction accuracy by more than 2% on four of them.

Machine learning on graphs has recently attracted great attention in the community due to the ubiquity of graph-structured data and the associated inference and prediction problems (Zhu, 2005; Hamilton, 2020; Nickel et al., 2015). Current works primarily focus on graphs, which can model only pairwise relations in data. Emerging research has shown that higher-order relations involving more than two entities often reveal more significant information in many applications (Benson et al., 2021; Schaub et al., 2021; Battiston et al., 2020; Lambiotte et al., 2019; Lee et al., 2021). For example, higher-order network motifs form the fundamental building blocks of many real-world networks (Mangan & Alon, 2003; Benson et al., 2016; Tsourakakis et al., 2017; Li et al., 2017; Li & Milenkovic, 2017), and session-based (multi-step) behaviors often indicate the preferences of web users in more precise ways (Xia et al., 2021; Wang et al., 2020; 2021; 2022). To capture these higher-order relations, hypergraphs provide a dedicated mathematical abstraction (Berge, 1984). However, learning algorithms on hypergraphs remain far less developed than those on graphs.
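
A minimal sketch of the implementation idea mentioned in the abstract: represent the hypergraph by its star expansion (a bipartite graph between nodes and hyperedges, given here by an incidence matrix H) and pass messages node -> hyperedge -> node. The plain mean aggregations below are illustrative stand-ins; ED-HNN learns these maps with neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Incidence matrix H: H[v, e] = 1 if node v belongs to hyperedge e.
H = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]], dtype=float)             # 4 nodes, 2 hyperedges (toy example)
X = rng.normal(size=(4, 3))                     # node features

def star_expansion_round(H: np.ndarray, X: np.ndarray) -> np.ndarray:
    """One round of bipartite message passing on the star expansion."""
    edge_deg = H.sum(axis=0, keepdims=True)      # number of nodes per hyperedge
    node_deg = H.sum(axis=1, keepdims=True)      # number of hyperedges per node
    E = (H.T @ X) / edge_deg.T                   # hyperedge messages: mean of member nodes
    return (H @ E) / node_deg                    # node update: mean over incident hyperedges

print(star_expansion_round(H, X).shape)          # (4, 3): updated node features
```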


$p$-Norm Flow Diffusion for Local Graph Clustering

arXiv.org Machine Learning

Local graph clustering and the closely related seed set expansion problem are primitives on graphs that are central to a wide range of analytic and learning tasks such as local clustering, community detection, node ranking, and feature inference. Prior work on local graph clustering mostly falls into two categories, with numerical and combinatorial roots respectively. In this work, we draw inspiration from both fields and propose a family of convex optimization formulations based on the idea of diffusion with $p$-norm network flow for $p\in (1,\infty)$. In the context of local clustering, we characterize the optimal solutions of these optimization problems and show their usefulness in finding low-conductance cuts around an input seed set. In particular, we achieve a quadratic approximation of conductance in the case of $p=2$, similar to the Cheeger-type bounds of spectral methods, a constant-factor approximation when $p\rightarrow\infty$, similar to max-flow-based methods, and a smooth transition for general $p$ values in between. Thus, our optimization formulation can be viewed as bridging the numerical and combinatorial approaches, achieving the best of both worlds in terms of speed and noise robustness. We show that the proposed problem can be solved in strongly local running time for $p\ge 2$, and we conduct empirical evaluations on both synthetic and real-world graphs to illustrate that our approach compares favorably with existing methods.
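
The abstract emphasizes using the diffusion solutions to find low-conductance cuts around the seed set. The sketch below shows the standard sweep-cut post-processing step for any diffusion embedding: sort nodes by their diffusion value, compute the conductance of every prefix, and return the best one. This is the generic step that pairs with such diffusions, not the paper's $p$-norm optimization itself; the toy graph and values of x are illustrative.

```python
import numpy as np

def conductance(A: np.ndarray, S: np.ndarray) -> float:
    """Cut(S, complement) divided by the smaller of vol(S) and vol(complement)."""
    d = A.sum(axis=1)
    inside = S.astype(bool)
    cut = A[inside][:, ~inside].sum()
    vol = min(d[inside].sum(), d[~inside].sum())
    return cut / vol if vol > 0 else np.inf

def sweep_cut(A: np.ndarray, x: np.ndarray) -> tuple[np.ndarray, float]:
    """Best prefix of nodes ordered by decreasing diffusion value x."""
    order = np.argsort(-x)
    best_set, best_phi = None, np.inf
    for k in range(1, len(x)):                   # proper, nonempty prefixes only
        S = np.zeros(len(x), dtype=bool)
        S[order[:k]] = True
        phi = conductance(A, S)
        if phi < best_phi:
            best_set, best_phi = S.copy(), phi
    return best_set, best_phi

# Toy graph: two triangles joined by one edge; x mimics a diffusion from node 0.
A = np.array([[0,1,1,0,0,0],[1,0,1,0,0,0],[1,1,0,1,0,0],
              [0,0,1,0,1,1],[0,0,0,1,0,1],[0,0,0,1,1,0]], dtype=float)
x = np.array([1.0, 0.8, 0.7, 0.1, 0.05, 0.02])
S, phi = sweep_cut(A, x)
print(np.flatnonzero(S), phi)                    # expect the first triangle, conductance 1/7
```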