AITopics | Chen, Jiahui

Collaborating Authors

Chen, Jiahui

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Augmented Conditioning Is Enough For Effective Training Image Generation

Chen, Jiahui, Zhang, Amy, Romero-Soriano, Adriana

arXiv.org Artificial IntelligenceFeb-6-2025

Image generation abilities of text-to-image diffusion models have significantly advanced, yielding highly photo-realistic images from descriptive text and increasing the viability of leveraging synthetic images to train computer vision models. To serve as effective training data, generated images must be highly realistic while also sufficiently diverse within the support of the target data distribution. Yet, state-of-the-art conditional image generation models have been primarily optimized for creative applications, prioritizing image realism and prompt adherence over conditional diversity. In this paper, we investigate how to improve the diversity of generated images with the goal of increasing their effectiveness to train downstream image classification models, without fine-tuning the image generation model. We find that conditioning the generation process on an augmented real image and text prompt produces generations that serve as effective synthetic datasets for downstream training. Conditioning on real training images contextualizes the generation process to produce images that are in-domain with the real image distribution, while data augmentations introduce visual diversity that improves the performance of the downstream classifier. We validate augmentation-conditioning on a total of five established long-tail and few-shot image classification benchmarks and show that leveraging augmentations to condition the generation process results in consistent improvements over the state-of-the-art on the long-tailed benchmark and remarkable gains in extreme few-shot regimes of the remaining four benchmarks. These results constitute an important step towards effectively leveraging synthetic data for downstream training.

artificial intelligence, diffusion model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.04475

Country: North America > Canada (0.14)

Genre: Research Report (0.65)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

SimAD: A Simple Dissimilarity-based Approach for Time Series Anomaly Detection

Zhong, Zhijie, Yu, Zhiwen, Xi, Xing, Xu, Yue, Chen, Jiahui, Yang, Kaixiang

arXiv.org Artificial IntelligenceMay-18-2024

Despite the prevalence of reconstruction-based deep learning methods, time series anomaly detection remains challenging. Existing approaches often struggle with limited temporal contexts, inadequate representation of normal patterns, and flawed evaluation metrics, hindering their effectiveness in identifying aberrant behavior. To address these issues, we introduce $\textbf{{SimAD}}$, a $\textbf{{Sim}}$ple dissimilarity-based approach for time series $\textbf{{A}}$nomaly $\textbf{{D}}$etection. SimAD incorporates an advanced feature extractor adept at processing extended temporal windows, utilizes the EmbedPatch encoder to integrate normal behavioral patterns comprehensively, and introduces an innovative ContrastFusion module designed to accentuate distributional divergences between normal and abnormal data, thereby enhancing the robustness of anomaly discrimination. Additionally, we propose two robust evaluation metrics, UAff and NAff, addressing the limitations of existing metrics and demonstrating their reliability through theoretical and experimental analyses. Experiments across $\textbf{seven}$ diverse time series datasets demonstrate SimAD's superior performance compared to state-of-the-art methods, achieving relative improvements of $\textbf{19.85%}$ on F1, $\textbf{4.44%}$ on Aff-F1, $\textbf{77.79%}$ on NAff-F1, and $\textbf{9.69%}$ on AUC on six multivariate datasets. Code and pre-trained models are available at https://github.com/EmorZz1G/SimAD.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2405.11238

Country:

Asia > China (0.15)
North America > United States (0.14)
Asia > Malaysia (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PepHarmony: A Multi-View Contrastive Learning Framework for Integrated Sequence and Structure-Based Peptide Encoding

Zhang, Ruochi, Wu, Haoran, Liu, Chang, Li, Huaping, Wu, Yuqian, Li, Kewei, Wang, Yifan, Deng, Yifan, Chen, Jiahui, Zhou, Fengfeng, Gao, Xin

arXiv.org Artificial IntelligenceJan-20-2024

Recent advances in protein language models have catalyzed significant progress in peptide sequence representation. Despite extensive exploration in this field, pre-trained models tailored for peptide-specific needs remain largely unaddressed due to the difficulty in capturing the complex and sometimes unstable structures of peptides. This study introduces a novel multi-view contrastive learning framework PepHarmony for the sequence-based peptide encoding task. PepHarmony innovatively combines both sequence- and structure-level information into a sequence-level encoding module through contrastive learning. We carefully select datasets from the Protein Data Bank (PDB) and AlphaFold database to encompass a broad spectrum of peptide sequences and structures. The experimental data highlights PepHarmony's exceptional capability in capturing the intricate relationship between peptide sequences and structures compared with the baseline and fine-tuned models. The robustness of our model is confirmed through extensive ablation studies, which emphasize the crucial roles of contrastive loss and strategic data sorting in enhancing predictive performance. The proposed PepHarmony framework serves as a notable contribution to peptide representations, and offers valuable insights for future applications in peptide drug discovery and peptide engineering. We have made all the source code utilized in this study publicly accessible via GitHub at https://github.com/zhangruochi/PepHarmony or http://www.healthinformaticslab.org/supp/.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2401.1136

Country:

Asia > China (0.69)
Asia > Middle East > Saudi Arabia (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

FedDef: Defense Against Gradient Leakage in Federated Learning-based Network Intrusion Detection Systems

Chen, Jiahui, Zhao, Yi, Li, Qi, Feng, Xuewei, Xu, Ke

arXiv.org Artificial IntelligenceAug-2-2023

Deep learning (DL) methods have been widely applied to anomaly-based network intrusion detection system (NIDS) to detect malicious traffic. To expand the usage scenarios of DL-based methods, federated learning (FL) allows multiple users to train a global model on the basis of respecting individual data privacy. However, it has not yet been systematically evaluated how robust FL-based NIDSs are against existing privacy attacks under existing defenses. To address this issue, we propose two privacy evaluation metrics designed for FL-based NIDSs, including (1) privacy score that evaluates the similarity between the original and recovered traffic features using reconstruction attacks, and (2) evasion rate against NIDSs using adversarial attack with the recovered traffic. We conduct experiments to illustrate that existing defenses provide little protection and the corresponding adversarial traffic can even evade the SOTA NIDS Kitsune. To defend against such attacks and build a more robust FL-based NIDS, we further propose FedDef, a novel optimization-based input perturbation defense strategy with theoretical guarantee. It achieves both high utility by minimizing the gradient distance and strong privacy protection by maximizing the input distance. We experimentally evaluate four existing defenses on four datasets and show that our defense outperforms all the baselines in terms of privacy protection with up to 7 times higher privacy score, while maintaining model accuracy loss within 3% under optimal parameter combination.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TIFS.2023.3297369

2210.04052

Country: Asia > China (0.14)

Genre: Research Report (0.81)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

TargetUM: Targeted High-Utility Itemset Querying

Miao, Jinbao, Wan, Shicheng, Gan, Wensheng, Sun, Jiayi, Chen, Jiahui

arXiv.org Artificial IntelligenceOct-30-2021

Traditional high-utility itemset mining (HUIM) aims to determine all high-utility itemsets (HUIs) that satisfy the minimum utility threshold (\textit{minUtil}) in transaction databases. However, in most applications, not all HUIs are interesting because only specific parts are required. Thus, targeted mining based on user preferences is more important than traditional mining tasks. This paper is the first to propose a target-based HUIM problem and to provide a clear formulation of the targeted utility mining task in a quantitative transaction database. A tree-based algorithm known as Target-based high-Utility iteMset querying using (TargetUM) is proposed. The algorithm uses a lexicographic querying tree and three effective pruning strategies to improve the mining efficiency. We implemented experimental validation on several real and synthetic databases, and the results demonstrate that the performance of \textbf{TargetUM} is satisfactory, complete, and correct. Finally, owing to the lexicographic querying tree, the database no longer needs to be scanned repeatedly for multiple queries.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2111.00309

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Retail (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Information Management (0.93)
(2 more...)

Add feedback

Graph-Enhanced Multi-Task Learning of Multi-Level Transition Dynamics for Session-based Recommendation

Huang, Chao, Chen, Jiahui, Xia, Lianghao, Xu, Yong, Dai, Peng, Chen, Yanqing, Bo, Liefeng, Zhao, Jiashu, Huang, Jimmy Xiangji

arXiv.org Artificial IntelligenceOct-8-2021

Session-based recommendation plays a central role in a wide spectrum of online applications, ranging from e-commerce to online advertising services. However, the majority of existing session-based recommendation techniques (e.g., attention-based recurrent network or graph neural network) are not well-designed for capturing the complex transition dynamics exhibited with temporally-ordered and multi-level inter-dependent relation structures. These methods largely overlook the relation hierarchy of item transitional patterns. In this paper, we propose a multi-task learning framework with Multi-level Transition Dynamics (MTD), which enables the jointly learning of intra- and inter-session item transition dynamics in automatic and hierarchical manner. Towards this end, we first develop a position-aware attention mechanism to learn item transitional regularities within individual session. Then, a graph-structured hierarchical relation encoder is proposed to explicitly capture the cross-session item transitions in the form of high-order connectivities by performing embedding propagation with the global graph context. The learning process of intra- and inter-session transition dynamics are integrated, to preserve the underlying low- and high-level item relationships in a common latent space. Extensive experiments on three real-world datasets demonstrate the superiority of MTD as compared to state-of-the-art baselines.

artificial intelligence, information technology services, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2110.03996

Country: Asia > China > Guangdong Province (0.14)

Genre: Research Report (0.82)

Industry: Information Technology > Services (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.93)

Add feedback

Practical and Configurable Network Traffic Classification Using Probabilistic Machine Learning

Chen, Jiahui, Breen, Joe, Phillips, Jeff M., Van der Merwe, Jacobus

arXiv.org Artificial IntelligenceJul-10-2021

Network traffic classification that is widely applicable and highly accurate is valuable for many network security and management tasks. A flexible and easily configurable classification framework is ideal, as it can be customized for use in a wide variety of networks. In this paper, we propose a highly configurable and flexible machine learning traffic classification method that relies only on statistics of sequences of packets to distinguish known, or approved, traffic from unknown traffic. Our method is based on likelihood estimation, provides a measure of certainty for classification decisions, and can classify traffic at adjustable certainty levels. Our classification method can also be applied in different classification scenarios, each prioritizing a different classification goal. We demonstrate how our classification scheme and all its configurations perform well on real-world traffic from a high performance computing network environment.

artificial intelligence, classification, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10586-021-03393-2

2107.0608

Country: North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Telecommunications > Networks (0.69)
Information Technology > Security & Privacy (0.66)
Information Technology > Networks (0.48)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback