
Collaborating Authors

 Yang, Jing


Non-asymptotic Convergence of Training Transformers for Next-token Prediction

arXiv.org Machine Learning

Transformers have achieved extraordinary success in modern machine learning due to their excellent ability to handle sequential data, especially in next-token prediction (NTP) tasks. However, the theoretical understanding of their performance in NTP is limited, with existing studies focusing mainly on asymptotic performance. This paper provides a fine-grained non-asymptotic analysis of the training dynamics of a one-layer transformer consisting of a self-attention module followed by a feed-forward layer. We first characterize the essential structural properties of training datasets for NTP using a mathematical framework based on partial orders. We then design a two-stage training algorithm, in which a pre-processing stage trains the feed-forward layer and a main stage trains the attention layer; both stages exhibit fast convergence. Specifically, both layers converge sub-linearly to the direction of their corresponding max-margin solutions. We also show that the cross-entropy loss enjoys a linear convergence rate. Furthermore, we show that the trained transformer retains non-trivial prediction ability under dataset shift, which sheds light on the remarkable generalization performance of transformers. Our analysis technique involves establishing novel properties of the attention gradient and an in-depth analysis of how these properties contribute to the convergence of the training process. Our experiments further validate our theoretical findings.
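
A minimal sketch of the architecture the analysis studies, assuming a standard PyTorch setup: a single causal self-attention module followed by a linear feed-forward layer, trained with cross-entropy on next-token prediction. All hyperparameters here are illustrative, and the two-stage schedule is only indicated in a comment.

```python
import torch
import torch.nn as nn

class OneLayerTransformer(nn.Module):
    def __init__(self, vocab_size, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        self.ffn = nn.Linear(d_model, vocab_size)   # feed-forward output layer

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        x = self.embed(tokens)
        L = tokens.size(1)
        causal = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
        h, _ = self.attn(x, x, x, attn_mask=causal) # causal self-attention
        return self.ffn(h)                          # next-token logits

vocab = 16
model = OneLayerTransformer(vocab)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
tokens = torch.randint(0, vocab, (8, 12))           # toy training batch
# Two-stage flavour: stage 1 would train self.ffn with self.attn frozen,
# stage 2 the attention weights; here we simply take joint SGD steps.
for _ in range(10):
    logits = model(tokens[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```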


Federated Online Prediction from Experts with Differential Privacy: Separations and Regret Speed-ups

arXiv.org Machine Learning

We study the problem of differentially private federated online prediction from experts against both stochastic adversaries and oblivious adversaries. We aim to minimize the average regret on $m$ clients working in parallel over time horizon $T$ with explicit differential privacy (DP) guarantees. Against stochastic adversaries, we propose a Fed-DP-OPE-Stoch algorithm that achieves a $\sqrt{m}$-fold speed-up of the per-client regret compared to the single-player counterparts under both pure DP and approximate DP constraints, while maintaining logarithmic communication costs. Against oblivious adversaries, we establish non-trivial lower bounds indicating that collaboration among clients does not lead to a regret speed-up in general. We then consider a special case of the oblivious-adversaries setting where there exists a low-loss expert. For this case we design a new algorithm, Fed-SVT, and show that it achieves an $m$-fold regret speed-up under both pure DP and approximate DP constraints over the single-player counterparts; our lower bound indicates that Fed-SVT is nearly optimal up to logarithmic factors. Experiments demonstrate the effectiveness of our proposed algorithms. To the best of our knowledge, this is the first work to examine differentially private online prediction from experts in the federated setting.
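
To make the averaging intuition concrete, here is a toy sketch (not the paper's Fed-DP-OPE-Stoch algorithm): each of $m$ clients privatizes its expert-loss vector with Laplace noise, and the server averages the noisy reports to drive a shared exponential-weights update. Averaging $m$ independent noise draws shrinks the privacy-noise standard deviation by a factor of $\sqrt{m}$, which is the intuition behind the regret speed-up.

```python
import numpy as np

rng = np.random.default_rng(0)
K, m, T, eps, eta = 5, 10, 200, 1.0, 0.1
log_w = np.zeros(K)                        # shared log-weights over K experts

for t in range(T):
    losses = rng.uniform(size=(m, K))      # stand-in for stochastic losses
    # Each client adds Laplace noise (per-entry sensitivity 1 assumed here).
    noisy = losses + rng.laplace(scale=1.0 / eps, size=(m, K))
    avg = noisy.mean(axis=0)               # noise std shrinks like 1/sqrt(m)
    log_w -= eta * avg                     # exponential-weights update

p = np.exp(log_w - log_w.max()); p /= p.sum()
print("final expert distribution:", np.round(p, 3))
```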


TemPrompt: Multi-Task Prompt Learning for Temporal Relation Extraction in RAG-based Crowdsourcing Systems

arXiv.org Artificial Intelligence

Temporal relation extraction (TRE) aims to grasp the evolution of events or actions, and thus to shape the workflow of associated tasks, so it holds promise for understanding task requests initiated by requesters in crowdsourcing systems. However, existing methods still struggle with limited and unevenly distributed annotated data. Inspired by the abundant global knowledge stored within pre-trained language models (PLMs), we therefore propose a multi-task prompt learning framework for TRE (TemPrompt), incorporating prompt tuning and contrastive learning to tackle these issues. To elicit more effective prompts for PLMs, we introduce a task-oriented prompt construction approach that takes the many factors of TRE into account for automatic prompt generation. In addition, we design temporal event reasoning, in the form of masked language modeling, as an auxiliary task to strengthen the model's focus on events and temporal cues. Experimental results demonstrate that TemPrompt outperforms all compared baselines on the majority of metrics under both standard and few-shot settings. A case study on designing and manufacturing printed circuit boards validates its effectiveness in crowdsourcing scenarios.
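
As a rough illustration of the prompt-plus-MLM mechanism (the template and verbalizer below are our own toy choices, not TemPrompt's learned prompts), one can cast TRE as a cloze task in which the masked LM scores relation words at the [MASK] position:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "bert-base-uncased"                  # any masked LM works here
tok = AutoTokenizer.from_pretrained(name)
mlm = AutoModelForMaskedLM.from_pretrained(name)

sentence = "The board was soldered. It was then inspected."
template = f"{sentence} The first event happened {tok.mask_token} the second event."
verbalizer = {"before": "BEFORE", "after": "AFTER", "during": "SIMULTANEOUS"}

inputs = tok(template, return_tensors="pt")
mask_pos = (inputs.input_ids[0] == tok.mask_token_id).nonzero().item()
with torch.no_grad():
    logits = mlm(**inputs).logits[0, mask_pos]
scores = {label: logits[tok.convert_tokens_to_ids(word)].item()
          for word, label in verbalizer.items()}
print(max(scores, key=scores.get))          # predicted temporal relation
```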


Leveraging data-driven weather models for improving numerical weather prediction skill through large-scale spectral nudging

arXiv.org Artificial Intelligence

Operational meteorological forecasting has long relied on physics-based numerical weather prediction (NWP) models. Recently, this landscape has been disrupted by the advent of data-driven artificial intelligence (AI)-based weather models, which offer tremendous computational performance and competitive forecasting skill. However, data-driven models for medium-range forecasting generally suffer from major limitations, including low effective resolution and a narrow range of predicted variables. This study illustrates the relative strengths and weaknesses of these competing paradigms using the GEM (Global Environmental Multiscale) and GraphCast models to represent physics-based and AI-based approaches, respectively. By analyzing global predictions from these two models against observations and analyses in both physical and spectral spaces, this study demonstrates that GraphCast-predicted large scales outperform GEM, particularly for longer lead times. Building on this insight, a hybrid NWP-AI system is proposed, wherein GEM-predicted large-scale state variables are spectrally nudged toward GraphCast predictions, while allowing GEM to freely generate fine-scale details critical for weather extremes. Results indicate that this hybrid approach is capable of leveraging the strengths of GraphCast to enhance the prediction skill of the GEM model. Importantly, trajectories of tropical cyclones are predicted with enhanced accuracy without significant changes in intensity. Furthermore, this new hybrid system ensures that meteorologists have access to a complete set of forecast variables, including those relevant for high-impact weather events.
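
Spectral nudging itself is straightforward to illustrate. The one-dimensional toy below (the real system works with global spectral fields, and all coefficients here are invented) relaxes only the low-wavenumber part of an NWP state toward an AI prediction, leaving the fine scales free:

```python
import numpy as np

n, k_cut, alpha = 256, 8, 0.5      # grid size, large-scale cutoff, nudge strength
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
nwp = np.sin(2 * x) + 0.3 * np.sin(40 * x)   # large-scale error + fine detail
ai = 1.2 * np.sin(2 * x)                     # AI model: better large scales only

F_nwp, F_ai = np.fft.rfft(nwp), np.fft.rfft(ai)
k = np.arange(F_nwp.size)
low = k <= k_cut                             # the "large scales"
F_nwp[low] = (1 - alpha) * F_nwp[low] + alpha * F_ai[low]
nudged = np.fft.irfft(F_nwp, n)              # large scales moved toward AI,
                                             # fine scales preserved from NWP
```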


Graph in Graph Neural Network

arXiv.org Artificial Intelligence

Existing Graph Neural Networks (GNNs) can only process graphs in which each vertex is represented by a vector or a single value, which limits their capability to describe complex objects. In this paper, we propose the first GNN (called the Graph in Graph Neural (GIG) Network) that can process graph-style data (called a GIG sample) whose vertices are themselves represented by graphs. Given a set of graphs, or a data sample whose components can be represented by a set of graphs (called a multi-graph data sample), our GIG network starts with a GIG sample generation (GSG) module which encodes the input as a GIG sample, where each GIG vertex contains a graph. Then, a set of GIG hidden layers is stacked, each consisting of: (1) a GIG vertex-level updating (GVU) module that individually updates the graph in every GIG vertex based on its internal information; and (2) a global-level GIG sample updating (GGU) module that updates the graphs in all GIG vertices based on their relationships, making the updated GIG vertices globally context-aware. This way, both the internal cues within the graph contained in each GIG vertex and the relationships among GIG vertices can be utilized for downstream tasks. Experimental results demonstrate that our GIG network generalizes well not only to various generic graph analysis tasks but also to real-world multi-graph data analysis (e.g., human skeleton video-based action recognition), achieving new state-of-the-art results on 13 out of 14 evaluated datasets. Our code is publicly available at https://github.com/wangjs96/Graph-in-Graph-Neural-Network.
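
A schematic rendering of one GIG hidden layer, under our simplified reading of the abstract (plain graph-convolution updates with random untrained weights): GVU updates each inner graph from its own structure, then GGU pools each inner graph, exchanges information across GIG vertices, and broadcasts the context back.

```python
import torch

def gcn_step(X, A, W):
    # one vanilla graph-convolution step: mean aggregation + linear map
    deg = A.sum(-1, keepdim=True).clamp(min=1)
    return torch.relu((A @ X) / deg @ W)

d = 16
inner = [(torch.randn(5, d), torch.eye(5)) for _ in range(4)]  # 4 GIG vertices
A_outer = torch.ones(4, 4)                   # relations among GIG vertices
W_gvu, W_ggu = torch.randn(d, d), torch.randn(d, d)

# GVU: update each inner graph from its own internal structure
inner = [(gcn_step(X, A, W_gvu), A) for X, A in inner]
# GGU: pool each inner graph, exchange context across GIG vertices, broadcast
pooled = torch.stack([X.mean(0) for X, _ in inner])            # (4, d)
context = gcn_step(pooled, A_outer, W_ggu)                     # (4, d)
inner = [(X + context[i], A) for i, (X, A) in enumerate(inner)]
```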


Federated Representation Learning in the Under-Parameterized Regime

arXiv.org Artificial Intelligence

In the development of machine learning (ML), the role of representation learning has become increasingly essential. It transforms raw data into meaningful features, reveals hidden patterns and insights in data, and facilitates efficient learning across various ML tasks such as meta-learning (Tripuraneni et al., 2021), multi-task learning (Wang et al., 2016a), and few-shot learning (Du et al., 2020). Recently, representation learning has been introduced into the federated learning (FL) framework to cope with heterogeneous local datasets at participating clients (Liang et al., 2020). In the FL setting, it is often assumed that all clients share a common representation, which works in conjunction with personalized local heads to realize personalized prediction while harnessing the collective training power (Arivazhagan et al., 2019; Collins et al., 2021; Zhong et al., 2022; Shen et al., 2023). Existing theoretical analyses of representation learning usually assume that the adopted model is over-parameterized enough to almost fit the ground-truth model (Tripuraneni et al., 2021; Wang et al., 2016a). While this may be valid for expressive models such as deep neural networks (He et al., 2016; Liu et al., 2017) or large language models (OpenAI, 2023; Touvron et al., 2023), it may be too restrictive for FL on resource-constrained devices, as adopting over-parameterized models in such a framework faces several significant challenges, as elaborated below.
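
The shared-representation setup described above can be sketched in the alternating style of Collins et al. (2021); the concrete update rules below are our illustration, not the algorithm analyzed in this paper. Each client refits its personal head locally, and the server averages client gradients on a common linear representation B, whose width k may be smaller than the ground-truth dimension (the under-parameterized regime):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, m, n = 20, 3, 8, 50            # ambient dim, rep width, clients, samples
B = rng.normal(size=(d, k))          # shared representation (k may be smaller
W = rng.normal(size=(m, k))          # than the truth) and personalized heads

clients = []
for _ in range(m):
    X = rng.normal(size=(n, d))
    clients.append((X, X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)))

for _ in range(100):
    grad = np.zeros_like(B)
    for i, (X, y) in enumerate(clients):
        Z = X @ B
        W[i] = np.linalg.lstsq(Z, y, rcond=None)[0]   # local head refit
        resid = Z @ W[i] - y
        grad += X.T @ np.outer(resid, W[i]) / n       # client gradient w.r.t. B
    B -= 0.01 * grad / m                              # server averages & steps
```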


RAG-based Crowdsourcing Task Decomposition via Masked Contrastive Learning with Prompts

arXiv.org Artificial Intelligence

Crowdsourcing is a critical technology in social manufacturing, which leverages an extensive and boundless reservoir of human resources to handle a wide array of complex tasks. The successful execution of these complex tasks relies on task decomposition (TD) and allocation, with the former being a prerequisite for the latter. Recently, methods based on pre-trained language models (PLMs) have garnered significant attention. However, they are constrained to handling straightforward common-sense tasks due to their inherent restrictions: limited and difficult-to-update knowledge as well as the presence of hallucinations. To address these issues, we propose a retrieval-augmented generation-based crowdsourcing framework that reimagines TD as event detection from the perspective of natural language understanding. Yet existing event detection methods fail to distinguish between event types and typically depend on heuristic rules and external semantic analysis tools. We therefore present a Prompt-Based Contrastive learning framework for TD (PBCT), which incorporates a prompt-based trigger detector to overcome this dependence. Additionally, trigger-attentive sentinel and masked contrastive learning are introduced to provide varying attention to trigger and contextual features according to different event types. Experimental results demonstrate the competitiveness of our method in both supervised and zero-shot detection. A case study on printed circuit board manufacturing showcases its adaptability to unfamiliar professional domains.
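
The contrastive ingredient can be sketched generically (this is a standard supervised contrastive loss, not PBCT's exact masked objective): sentence representations sharing an event type are pulled together while others are pushed apart.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive(z, labels, tau=0.1):
    z = F.normalize(z, dim=1)
    sim = z @ z.T / tau                               # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_p = sim - torch.logsumexp(sim.masked_fill(self_mask, -1e9), 1, keepdim=True)
    return (-(log_p * pos).sum(1) / pos.sum(1).clamp(min=1)).mean()

z = torch.randn(6, 32, requires_grad=True)            # sentence embeddings
labels = torch.tensor([0, 0, 1, 1, 2, 2])             # event types
supervised_contrastive(z, labels).backward()
```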


ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze

arXiv.org Artificial Intelligence

Monte Carlo Tree Search (MCTS)-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains. These algorithms employ a reanalyze process to enhance sample efficiency from stale data, albeit at the expense of significant wall-clock time. To address this issue, we propose a general approach named ReZero that speeds up tree search operations for MCTS-based algorithms. Specifically, drawing inspiration from the one-armed bandit model, we reanalyze training samples through a backward-view reuse technique that obtains the value estimate of a given child node in advance. To further support this design, we periodically reanalyze the entire buffer instead of frequently reanalyzing mini-batches. The synergy of these two designs significantly reduces search cost while maintaining or even improving performance, simplifying both data collection and reanalysis. Experiments on Atari environments and board games demonstrate that ReZero substantially improves training speed while maintaining high sample efficiency. The code is available as part of the LightZero benchmark at https://github.com/opendilab/LightZero.
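
The backward-view reuse can be paraphrased in a few lines (our schematic reading, not the ReZero implementation): reanalyzing a trajectory from its end backwards lets the root value just computed for step t+1 seed the search at step t, because that state is exactly the child reached by the action actually taken.

```python
def reanalyze_backward(trajectory, search):
    """trajectory: list of (state, action); search(state, hints) -> root value."""
    child_value = None                  # root value of s_{t+1} from the
    targets = []                        # previous (later) iteration
    for state, action in reversed(trajectory):
        hints = {action: child_value} if child_value is not None else {}
        value = search(state, hints)    # tree search with the taken child
        targets.append(value)           # pre-filled, so no sub-search needed
        child_value = value
    return list(reversed(targets))      # fresh value targets for training

# toy usage: a fake "search" that blends the hint with a constant estimate
traj = [((s,), 0) for s in range(5)]
targets = reanalyze_backward(traj, lambda s, h: 0.5 * (h.get(0, 1.0) + 1.0))
```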


AMCEN: An Attention Masking-based Contrastive Event Network for Two-stage Temporal Knowledge Graph Reasoning

arXiv.org Artificial Intelligence

Temporal knowledge graphs (TKGs) can effectively model the ever-evolving nature of real-world knowledge, and their completeness and enhancement can be achieved by reasoning new events from existing ones. However, reasoning accuracy is adversely affected by an imbalance between new and recurring events in the datasets. To achieve more accurate TKG reasoning, we propose an attention masking-based contrastive event network (AMCEN) with local-global temporal patterns for two-stage prediction of future events. In the network, historical and non-historical attention mask vectors are designed to control the attention bias towards historical and non-historical entities, which is the key to alleviating the imbalance. A local-global message-passing module is proposed to comprehensively capture multi-hop structural dependencies and local-global temporal evolution, enabling an in-depth exploration of the latent impact factors of different event types. A contrastive event classifier incorporates local-global temporal patterns into contrastive learning to classify events more accurately. AMCEN thus refines the prediction scope with the results of the contrastive event classification and then utilizes attention masking-based decoders to finalize the specific outcomes. The results of our experiments on four benchmark datasets highlight the superiority of AMCEN. In particular, the considerable improvements in Hits@1 show that AMCEN can make more precise predictions about future occurrences.
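
The masking mechanism can be sketched as follows (our rendering of the idea, not AMCEN's decoder): binary historical and non-historical mask vectors split the candidate-entity scores into two branches, and the first-stage classifier decides which branch the final prediction is read from.

```python
import torch

scores = torch.randn(7)                        # raw scores over 7 entities
seen = torch.tensor([True, False, True, False, False, True, False])

p_hist = torch.softmax(scores.masked_fill(~seen, float("-inf")), 0)
p_new = torch.softmax(scores.masked_fill(seen, float("-inf")), 0)

# Stage 1 decides recurring vs. new (AMCEN trains a contrastive classifier
# for this); stage 2 reads the entity off the corresponding branch.
is_recurring = True                            # stand-in for the classifier
prediction = (p_hist if is_recurring else p_new).argmax()
```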


Visualizing Routes with AI-Discovered Street-View Patterns

arXiv.org Artificial Intelligence

Street-level visual appearances play an important role in studying social systems, such as understanding the built environment, driving routes, and associated social and economic factors. However, this information has not been integrated into typical geographical visualization interfaces (e.g., map services) for planning driving routes. In this paper, we study this new visualization task and make several contributions. First, we experiment with a set of AI techniques and propose a solution that uses semantic latent vectors to quantify visual appearance features. Second, we compute image similarities among a large set of street-view images and then discover spatial imagery patterns. Third, we integrate these discovered patterns into driving route planners with new visualization techniques. Finally, we present VivaRoutes, an interactive visualization prototype, to show how visualizations that leverage these discovered patterns can help users effectively and interactively explore multiple routes. We also conducted a user study to assess the usefulness and utility of VivaRoutes.
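
The first two steps admit a simple generic sketch (our stand-in pipeline; the paper's actual embedding models may differ): embed street-view images into semantic latent vectors with a pretrained CNN and compare them by cosine similarity.

```python
import torch
import torchvision.models as models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()            # keep the 512-d latent vector
backbone.eval()

images = torch.rand(10, 3, 224, 224)         # stand-in for street-view crops
with torch.no_grad():
    z = backbone(images)                     # (10, 512) latent vectors
z = torch.nn.functional.normalize(z, dim=1)
similarity = z @ z.T                         # pairwise cosine similarities
# Clustering these vectors (e.g., k-means on z) yields the spatial imagery
# patterns that the route visualizations are built on.
```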