
Collaborating Authors

 Xu, Yuanyuan


Unlocking Multi-Modal Potentials for Dynamic Text-Attributed Graph Representation

arXiv.org Artificial Intelligence

Dynamic Text-Attributed Graphs (DyTAGs) are a novel graph paradigm that captures evolving temporal edges alongside rich textual attributes. A prior approach to representing DyTAGs leverages pre-trained language models to encode text attributes and subsequently integrates them into dynamic graph models. However, it follows edge-centric modeling, as in dynamic graph learning, which is limited to local structures and fails to exploit the unique characteristics of DyTAGs, leading to suboptimal performance. We observe that DyTAGs inherently comprise three distinct modalities (temporal, textual, and structural) that often exhibit dispersed or even orthogonal distributions, with the first two largely overlooked in existing research. Building on this insight, we propose MoMent, a model-agnostic multi-modal framework that can seamlessly integrate with dynamic graph models for structural modality learning. The core idea is to shift from edge-centric to node-centric modeling, fully leveraging all three modalities for node representation. Specifically, MoMent presents non-shared node-centric encoders based on the attention mechanism to capture global temporal and semantic contexts from the temporal and textual modalities, together with local structure learning, thus generating modality-specific tokens. To prevent a disjoint latent space, we propose a symmetric alignment loss, an auxiliary objective that aligns temporal and textual tokens, ensuring global temporal-semantic consistency with a theoretical guarantee. Finally, we design a lightweight adaptor to fuse these tokens, generating comprehensive and cohesive node representations. We theoretically demonstrate that MoMent enhances discriminative power over exclusive edge-centric modeling. Extensive experiments across seven datasets and two downstream tasks show that MoMent achieves up to a 33.62% improvement over the baseline across four dynamic graph models.
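The abstract does not give the exact form of the symmetric alignment loss; a common way to align two modality-specific token sets symmetrically is a CLIP-style bidirectional contrastive objective. The sketch below illustrates that idea; the batch size, dimensionality, temperature, and function name are illustrative assumptions, not MoMent's published formulation.

```python
# A minimal sketch of a symmetric alignment loss between temporal and
# textual node tokens, in the spirit of MoMent's auxiliary objective.
import torch
import torch.nn.functional as F

def symmetric_alignment_loss(temporal_tokens, textual_tokens, temperature=0.07):
    """CLIP-style symmetric contrastive loss over a batch of node tokens.

    temporal_tokens, textual_tokens: (N, d) modality-specific tokens
    for the same N nodes.
    """
    t = F.normalize(temporal_tokens, dim=-1)
    s = F.normalize(textual_tokens, dim=-1)
    logits = t @ s.T / temperature          # (N, N) cross-modal similarities
    targets = torch.arange(t.size(0), device=t.device)
    # Align temporal -> textual and textual -> temporal symmetrically.
    loss_ts = F.cross_entropy(logits, targets)
    loss_st = F.cross_entropy(logits.T, targets)
    return 0.5 * (loss_ts + loss_st)

# Toy usage: 8 nodes with 64-dimensional modality tokens.
loss = symmetric_alignment_loss(torch.randn(8, 64), torch.randn(8, 64))
```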


UniDyG: A Unified and Effective Representation Learning Approach for Large Dynamic Graphs

arXiv.org Artificial Intelligence

Dynamic graphs, which capture time-evolving edges between nodes, are formulated as continuous-time or discrete-time dynamic graphs. The two differ in temporal granularity: Continuous-Time Dynamic Graphs (CTDGs) exhibit rapid, localized changes, while Discrete-Time Dynamic Graphs (DTDGs) show gradual, global updates. This difference has led to isolated developments in representation learning for each type. To advance dynamic graph representation learning, recent research attempts to design a unified model capable of handling both CTDGs and DTDGs, achieving promising results. However, such work typically focuses on local dynamic propagation for temporal structure learning in the time domain, failing to accurately capture the underlying structural evolution associated with each temporal granularity and thus compromising model effectiveness. In addition, existing works, whether specific or unified, often overlook the issue of temporal noise, compromising the model's robustness. To better model both types of dynamic graphs, we propose UniDyG, a unified and effective representation learning approach that can scale to large dynamic graphs. Specifically, we first propose a novel Fourier Graph Attention (FGAT) mechanism that can model local and global structural correlations based on recent neighbors and complex-number selective aggregation, while theoretically ensuring consistent representations of dynamic graphs over time. Based on approximation theory, we demonstrate that FGAT is well-suited to capture the underlying structures in both CTDGs and DTDGs. We further enhance FGAT to resist temporal noise by designing an energy-gated unit, which adaptively filters out high-frequency noise according to its energy. Finally, we leverage our proposed FGAT mechanisms for temporal structure learning and employ a frequency-enhanced linear function for node-level dynamic updates, facilitating the generation of high-quality temporal embeddings. Extensive experiments show that our UniDyG achieves an average improvement of 14.4% over sixteen baselines across nine dynamic graphs while exhibiting superior robustness in noisy scenarios.
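The energy-gated unit is described only at a high level; one plausible reading is a frequency-domain filter that attenuates components whose spectral energy falls below an adaptive threshold. The sketch below shows that idea on node features; the hard gating rule, the quantile threshold, and the function name are assumptions for illustration, not UniDyG's exact design.

```python
# A minimal sketch of an energy-gated frequency filter: transform features
# to the frequency domain, keep only components with sufficient energy,
# and transform back, suppressing high-frequency noise.
import torch

def energy_gated_filter(x, keep_ratio=0.5):
    """x: (N, d) node features; filter along the feature dimension."""
    xf = torch.fft.rfft(x, dim=-1)               # complex spectrum
    energy = xf.abs() ** 2                       # per-frequency energy
    # Adaptive threshold: keep components above the per-row energy quantile.
    thresh = torch.quantile(energy, 1.0 - keep_ratio, dim=-1, keepdim=True)
    gate = (energy >= thresh).to(xf.real.dtype)  # hard energy gate
    return torch.fft.irfft(xf * gate, n=x.size(-1), dim=-1)

smoothed = energy_gated_filter(torch.randn(16, 128))
```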


Post-hoc Interpretability Illumination for Scientific Interaction Discovery

arXiv.org Machine Learning

Model interpretability and explainability have garnered substantial attention in recent years, particularly in decision-making applications. However, existing interpretability tools often fall short in delivering satisfactory performance due to limited capabilities or efficiency issues. To address these challenges, we propose a novel post-hoc method: Iterative Kings' Forests (iKF), designed to uncover complex multi-order interactions among variables. iKF iteratively selects the next most important variable, the "King", and constructs King's Forests by placing it at the root node of each tree to identify variables that interact with the "King". It then generates ranked short lists of important variables and interactions of varying orders. Additionally, iKF provides inference metrics to analyze the patterns of the selected interactions and classify them into one of three interaction types: Accompanied Interaction, Synergistic Interaction, and Hierarchical Interaction. Extensive experiments demonstrate the strong interpretive power of our proposed iKF, highlighting its great potential for explainable modeling and scientific discovery across diverse scientific fields.
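The abstract outlines the iterative King-selection loop without pseudocode. The sketch below emulates it loosely with scikit-learn forests: pick the most important remaining variable as the King, refit forests conditioned on the King's split to surface interaction partners, and repeat. Rooting every tree at the King is approximated here by splitting the data on the King's median; this simplification and all names are illustrative assumptions, not the authors' algorithm.

```python
# A rough, assumption-laden proxy for iKF's iterative "King" selection.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def iterative_kings(X, y, n_kings=3, seed=0):
    remaining = list(range(X.shape[1]))
    kings, interactions = [], []
    for _ in range(n_kings):
        forest = RandomForestRegressor(n_estimators=100, random_state=seed)
        forest.fit(X[:, remaining], y)
        king = remaining[int(np.argmax(forest.feature_importances_))]
        kings.append(king)
        # Proxy for a "King's Forest": refit on each half of the King's
        # split and read off which variable becomes most important there.
        mask = X[:, king] > np.median(X[:, king])
        for half in (mask, ~mask):
            sub = RandomForestRegressor(n_estimators=50, random_state=seed)
            sub.fit(X[half][:, remaining], y[half])
            partner = remaining[int(np.argmax(sub.feature_importances_))]
            if partner != king:
                interactions.append((king, partner))
        remaining.remove(king)
    return kings, interactions

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=200)  # planted x0-x1 interaction
print(iterative_kings(X, y))
```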


Learning with Alignments: Tackling the Inter- and Intra-domain Shifts for Cross-multidomain Facial Expression Recognition

arXiv.org Artificial Intelligence

Facial Expression Recognition (FER) holds significant importance in human-computer interaction. Existing cross-domain FER methods often transfer knowledge solely from a single labeled source domain to an unlabeled target domain, neglecting the comprehensive information available across multiple sources. However, cross-multidomain FER (CMFER) is highly challenging due to (i) the inherent inter-domain shifts across multiple domains and (ii) the intra-domain shifts stemming from ambiguous expressions and low inter-class distinctions. In this paper, we propose a novel Learning with Alignments CMFER framework, named LA-CMFER, to handle both inter- and intra-domain shifts. Specifically, LA-CMFER is constructed with a global branch and a local branch to extract features from full images and local subtle expressions, respectively. Based on this, LA-CMFER presents a dual-level inter-domain alignment method that forces the model to prioritize hard-to-align samples in knowledge transfer at the sample level while gradually generating a well-clustered feature space with the guidance of class attributes at the cluster level, thus narrowing the inter-domain shifts. To address the intra-domain shifts, LA-CMFER introduces a multi-view intra-domain alignment method with a multi-view clustering consistency constraint, where a prediction similarity matrix is built to pursue consistency between the global and local views, thus refining pseudo-labels and eliminating latent noise. Extensive experiments on six benchmark datasets validate the superiority of our LA-CMFER.
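The abstract mentions a prediction similarity matrix used to enforce consistency between the global and local views. A minimal sketch of that idea follows: build a pairwise similarity matrix from each branch's softmax predictions and penalize their disagreement. The shapes, the MSE penalty, and the function name are illustrative assumptions, not the paper's exact constraint.

```python
# A minimal sketch of multi-view clustering consistency between the
# global and local branches via prediction-similarity matrices.
import torch
import torch.nn.functional as F

def view_consistency_loss(global_logits, local_logits):
    """global_logits, local_logits: (N, C) predictions for the same batch."""
    pg = F.softmax(global_logits, dim=-1)
    pl = F.softmax(local_logits, dim=-1)
    # Pairwise prediction-similarity matrices (N, N) for each view.
    sim_g = pg @ pg.T
    sim_l = pl @ pl.T
    return F.mse_loss(sim_g, sim_l)

# Toy usage: a batch of 32 samples over 7 expression classes.
loss = view_consistency_loss(torch.randn(32, 7), torch.randn(32, 7))
```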


LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

arXiv.org Artificial Intelligence

A large context window is a desirable feature in large language models (LLMs). However, due to high fine-tuning costs, the scarcity of long texts, and catastrophic values introduced by new token positions, current extended context windows are limited to around 128k tokens. This paper introduces LongRoPE, which, for the first time, extends the context window of pre-trained LLMs to an impressive 2048k tokens, with up to only 1k fine-tuning steps at training lengths within 256k, while maintaining performance at the original short context window. This is achieved by three key innovations: (i) we identify and exploit two forms of non-uniformity in positional interpolation through an efficient search, providing a better initialization for fine-tuning and enabling an 8x extension in non-fine-tuning scenarios; (ii) we introduce a progressive extension strategy that first fine-tunes a 256k-length LLM and then conducts a second positional interpolation on the fine-tuned extended LLM to achieve a 2048k context window; (iii) we readjust LongRoPE on 8k lengths to recover the short-context-window performance. Extensive experiments on LLaMA2 and Mistral across various tasks demonstrate the effectiveness of our method. Models extended via LongRoPE retain the original architecture with minor modifications to the positional embedding, and can reuse most pre-existing optimizations.
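To make the "non-uniform positional interpolation" concrete: standard positional interpolation divides all RoPE frequencies by one global factor, whereas LongRoPE searches for a separate rescale factor per frequency dimension. The sketch below contrasts the two; the random per-dimension factors stand in for the searched values and are purely illustrative.

```python
# A minimal sketch of uniform vs. non-uniform RoPE positional interpolation.
import torch

def rope_angles(positions, dim=64, base=10000.0, scale=None):
    """Return rotary angles of shape (len(positions), dim/2).

    scale: optional (dim/2,) per-dimension rescale factors; scale=None
    reproduces vanilla RoPE, a constant vector reproduces plain PI.
    """
    inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)
    if scale is not None:                  # non-uniform interpolation
        inv_freq = inv_freq / scale
    return positions[:, None].float() * inv_freq[None, :]

pos = torch.arange(4096)
uniform = rope_angles(pos, scale=torch.full((32,), 8.0))         # plain PI, 8x
nonuniform = rope_angles(pos, scale=1.0 + 7.0 * torch.rand(32))  # searched per-dim factors
```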


CNS-Net: Conservative Novelty Synthesizing Network for Malware Recognition in an Open-set Scenario

arXiv.org Artificial Intelligence

We study the challenging task of malware recognition over both known and novel unknown malware families, called malware open-set recognition (MOSR). Previous works usually assume the malware families are known to the classifier in a closed-set scenario, i.e., the testing families are a subset of, or at most identical to, the training families. However, novel unknown malware families frequently emerge in real-world applications, so malware instances must be recognized in an open-set scenario, i.e., some unknown families are also included in the test set; this setting has rarely been thoroughly investigated in the cyber-security domain. One practical solution for MOSR is to jointly classify known and detect unknown malware families with a single classifier (e.g., a neural network), based on the variance of the predicted probability distribution over known families. However, conventional well-trained classifiers tend to produce overly high recognition probabilities in their outputs, especially when instance feature distributions are similar to each other, e.g., unknown vs. known malware families, which dramatically degrades recognition of novel unknown malware families. In this paper, we propose a novel model that conservatively synthesizes malware instances to mimic unknown malware families and supports more robust training of the classifier. Moreover, we build a new large-scale malware dataset, named MAL-100, to fill the gap left by the lack of a large open-set malware benchmark. Experimental results on two widely used malware datasets and our MAL-100 demonstrate the effectiveness of our model compared with other representative methods.
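The abstract does not specify how the synthesized instances are produced; one simple, conservative stand-in is to interpolate features of samples drawn from different known families and train a (K+1)-way head whose extra logit is the "unknown" class. The mixing rule, the interpolation range, and all names below are illustrative assumptions, not CNS-Net's actual generator.

```python
# A minimal sketch of feature-space novelty synthesis for open-set training.
import torch
import torch.nn.functional as F

def synthesize_unknowns(feats, labels, low=0.35, high=0.65):
    """Mix pairs of samples from different families to mimic unknowns."""
    perm = torch.randperm(feats.size(0))
    diff = labels != labels[perm]                    # cross-family pairs only
    lam = torch.empty(int(diff.sum()), 1).uniform_(low, high)
    return lam * feats[diff] + (1 - lam) * feats[perm][diff]

K, d = 10, 128                                       # known families, feature dim
feats, labels = torch.randn(64, d), torch.randint(0, K, (64,))
fake = synthesize_unknowns(feats, labels)
head = torch.nn.Linear(d, K + 1)                     # class index K = "unknown"
logits = head(torch.cat([feats, fake]))
targets = torch.cat([labels, torch.full((fake.size(0),), K)])
loss = F.cross_entropy(logits, targets)
```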


MDENet: Multi-modal Dual-embedding Networks for Malware Open-set Recognition

arXiv.org Artificial Intelligence

Malware open-set recognition (MOSR) aims to jointly classify malware samples from known families and detect those from novel unknown families. Existing works mostly rely on a well-trained classifier that considers the predicted probabilities of each known family, paired with threshold-based detection, to achieve MOSR. However, our observation reveals that the feature distributions of malware samples are extremely similar to each other, even between known and unknown families. Thus, the obtained classifier may produce overly high probabilities for unknown test samples toward known families and degrade model performance. In this paper, we propose the Multi-modal Dual-Embedding Networks, dubbed MDENet, which take advantage of comprehensive malware features (i.e., malware images and malware sentences) from different modalities to enhance the diversity of the malware feature space, making it more representative and discriminative for downstream recognition. Finally, to further support open-set recognition, we dually embed the fused multi-modal representation into one primary space and an associated sub-space, i.e., discriminative and exclusive spaces, with contrastive sampling and rho-bounded enclosing-sphere regularizations, which serve classification and detection, respectively. Moreover, we enrich our previously proposed large-scale malware dataset MAL-100 with multi-modal characteristics and contribute an improved version dubbed MAL-100+. Experimental results on the widely used malware dataset Mailing and the proposed MAL-100+ demonstrate the effectiveness of our method.
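A rho-bounded enclosing sphere can be read as pulling known-family embeddings inside a ball of radius rho around a center, so that unknowns are flagged by distance at test time. The hinge-style penalty, the fixed center, and the function names below are illustrative assumptions sketching that reading, not MDENet's exact regularizer.

```python
# A minimal sketch of a rho-bounded enclosing-sphere regularizer on the
# detection sub-space, plus the distance-based unknown test it enables.
import torch

def sphere_regularizer(embeddings, center, rho=1.0):
    """Penalize known embeddings that fall outside the rho-sphere."""
    dist = (embeddings - center).norm(dim=-1)
    return torch.clamp(dist - rho, min=0.0).pow(2).mean()

def detect_unknown(embedding, center, rho=1.0):
    """Flag test samples as unknown if they lie outside the sphere."""
    return (embedding - center).norm(dim=-1) > rho

z = torch.randn(32, 64)                 # embeddings of known-family samples
center = z.mean(dim=0).detach()         # sphere center (assumed fixed here)
reg = sphere_regularizer(z, center)
is_novel = detect_unknown(torch.randn(5, 64), center)
```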