AITopics | Chen, Zihao

Collaborating Authors

Chen, Zihao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Domain Adaptation for Japanese Sentence Embeddings with Contrastive Learning based on Synthetic Sentence Generation

Chen, Zihao, Handa, Hisashi, Ohsaki, Miho, Shirahama, Kimiaki

arXiv.org Artificial IntelligenceMar-12-2025

Such sentence embeddings can be further enhanced by domain adaptation that adapts a backbone model to a specific domain. However, domain adaptation for low-resource languages like Japanese is often difficult due to the scarcity of large-scale labeled datasets. To overcome this, this paper introduces SDJC (Self-supervised Domain adaptation for Japanese sentence embeddings with Contrastive learning) that utilizes a data generator to generate sentences, which have the same syntactic structure to a sentence in an unlabeled specific domain corpus but convey different semantic meanings. Generated sentences are then used to boost contrastive learning that adapts a backbone model to accurately discriminate sentences in the specific domain. In addition, the components of SDJC like a backbone model and a method to adapt it need to be carefully selected, but no benchmark dataset is available for Japanese. Thus, a comprehensive Japanese STS (Semantic Textual Similarity) benchmark dataset is constructed by combining datasets machine-translated from English with existing datasets. The experimental results validates the effectiveness of SDJC on two domain-specific downstream tasks as well as the usefulness of the constructed dataset.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.09094

Country: Asia > Japan > Honshū > Kansai (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
(2 more...)

Add feedback

Your contrastive learning problem is secretly a distribution alignment problem

Chen, Zihao, Lin, Chi-Heng, Liu, Ran, Xiao, Jingyun, Dyer, Eva L

arXiv.org Artificial IntelligenceFeb-27-2025

Despite the success of contrastive learning (CL) in vision and language, its theoretical foundations and mechanisms for building representations remain poorly understood. In this work, we build connections between noise contrastive estimation losses widely used in CL and distribution alignment with entropic optimal transport (OT). This connection allows us to develop a family of different losses and multistep iterative variants for existing CL methods. Intuitively, by using more information from the distribution of latents, our approach allows a more distribution-aware manipulation of the relationships within augmented sample sets. We provide theoretical insights and experimental evidence demonstrating the benefits of our approach for {\em generalized contrastive alignment}. Through this framework, it is possible to leverage tools in OT to build unbalanced losses to handle noisy views and customize the representation space by changing the constraints on alignment. By reframing contrastive learning as an alignment problem and leveraging existing optimization tools for OT, our work provides new insights and connections between different self-supervised learning models in addition to new tools that can be more easily adapted to incorporate domain knowledge into learning.

artificial intelligence, machine learning, optimization problem, (19 more...)

arXiv.org Artificial Intelligence

2502.20141

Country:

Asia (0.28)
North America > Canada > Ontario > Toronto (0.14)
Europe > France (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.46)
Education > Focused Education > Special Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Add feedback

ComposeOn Academy: Transforming Melodic Ideas into Complete Compositions Integrating Music Learning

Pu, Hongxi, Jiang, Futian, Chen, Zihao, Song, Xingyue

arXiv.org Artificial IntelligenceFeb-21-2025

Music composition has long been recognized as a significant art form. However, existing digital audio workstations and music production software often present high entry barriers for users lacking formal musical training. To address this, we introduce ComposeOn, a music theory-based tool designed for users with limited musical knowledge. ComposeOn enables users to easily extend their melodic ideas into complete compositions and offers simple editing features. By integrating music theory, it explains music creation at beginner, intermediate, and advanced levels. Our user study (N=10) compared ComposeOn with the baseline method, Suno AI, demonstrating that ComposeOn provides a more accessible and enjoyable composing and learning experience for individuals with limited musical skills. ComposeOn bridges the gap between theory and practice, offering an innovative solution as both a composition aid and music education platform. The study also explores the differences between theory-based music creation and generative music, highlighting the former's advantages in personal expression and learning.

composeon, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.15255

Country: North America > United States > California (0.14)

Genre: Research Report > Promising Solution (0.66)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Education > Curriculum > Subject-Specific Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Human Computer Interaction (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

An Attack Traffic Identification Method Based on Temporal Spectrum

Xie, Wenwei, Yin, Jie, Chen, Zihao

arXiv.org Artificial IntelligenceNov-11-2024

To address the issues of insufficient robustness, unstable features, and data noise interference in existing network attack detection and identification models, this paper proposes an attack traffic detection and identification method based on temporal spectrum. First, traffic data is segmented by a sliding window to construct a feature sequence and a corresponding label sequence for network traffic. Next, the proposed spectral label generation methods, SSPE and COAP, are applied to transform the label sequence into spectral labels and the feature sequence into temporal features. Spectral labels and temporal features are used to capture and represent behavioral patterns of attacks. Finally, the constructed temporal features and spectral labels are used to train models, which subsequently detects and identifies network attack behaviors. Experimental results demonstrate that compared to traditional methods, models trained with the SSPE or COAP method improve identification accuracy by 10%, and exhibit strong robustness, particularly in noisy environments.

data mining, machine learning, traffic, (18 more...)

arXiv.org Artificial Intelligence

2411.0751

Country: North America > United States (0.94)

Genre: Research Report > New Finding (0.49)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Isometric Immersion Learning with Riemannian Geometry

Chen, Zihao, Wang, Wenyong, Xiang, Yu

arXiv.org Artificial IntelligenceSep-23-2024

Manifold learning has been proven to be an effective method for capturing the implicitly intrinsic structure of non-Euclidean data, in which one of the primary challenges is how to maintain the distortion-free (isometry) of the data representations. Actually, there is still no manifold learning method that provides a theoretical guarantee of isometry. Inspired by Nash's isometric theorem, we introduce a new concept called isometric immersion learning based on Riemannian geometry principles. Following this concept, an unsupervised neural network-based model that simultaneously achieves metric and manifold learning is proposed by integrating Riemannian geometry priors. What's more, we theoretically derive and algorithmically implement a maximum likelihood estimation-based training method for the new model. In the simulation experiments, we compared the new model with the state-of-the-art baselines on various 3-D geometry datasets, demonstrating that the new model exhibited significantly superior performance in multiple evaluation metrics. Moreover, we applied the Riemannian metric learned from the new model to downstream prediction tasks in real-world scenarios, and the accuracy was improved by an average of 8.8%.

artificial intelligence, machine learning, manifold, (17 more...)

arXiv.org Artificial Intelligence

2409.1476

Genre: Research Report (0.64)

Industry: Education (0.77)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Energy-efficiency Limits on Training AI Systems using Learning-in-Memory

Chen, Zihao, Leugering, Johannes, Cauwenberghs, Gert, Chakrabartty, Shantanu

arXiv.org Artificial IntelligenceMay-21-2024

Learning-in-memory (LIM) is a recently proposed paradigm to overcome fundamental memory bottlenecks in training machine learning systems. While compute-in-memory (CIM) approaches can address the so-called memory-wall (i.e. energy dissipated due to repeated memory read access) they are agnostic to the energy dissipated due to repeated memory writes at the precision required for training (the update-wall), and they don't account for the energy dissipated when transferring information between short-term and long-term memories (the consolidation-wall). The LIM paradigm proposes that these bottlenecks, too, can be overcome if the energy barrier of physical memories is adaptively modulated such that the dynamics of memory updates and consolidation match the Lyapunov dynamics of gradient-descent training of an AI model. In this paper, we derive new theoretical lower bounds on energy dissipation when training AI systems using different LIM approaches. The analysis presented here is model-agnostic and highlights the trade-off between energy efficiency and the speed of training. The resulting non-equilibrium energy-efficiency bounds have a similar flavor as that of Landauer's energy-dissipation bounds. We also extend these limits by taking into account the number of floating-point operations (FLOPs) used for training, the size of the AI model, and the precision of the training parameters. Our projections suggest that the energy-dissipation lower-bound to train a brain scale AI system (comprising of $10^{15}$ parameters) using LIM is $10^8 \sim 10^9$ Joules, which is on the same magnitude the Landauer's adiabatic lower-bound and $6$ to $7$ orders of magnitude lower than the projections obtained using state-of-the-art AI accelerator hardware lower-bounds.

artificial intelligence, energy dissipation, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2402.14878

Country: North America > United States > California > San Diego County (0.14)

Genre: Research Report (0.63)

Industry:

Information Technology (0.67)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

Statistical Barriers to Affine-equivariant Estimation

Chen, Zihao, Cherapanamjeri, Yeshwanth

arXiv.org Artificial IntelligenceOct-16-2023

We investigate the quantitative performance of affine-equivariant estimators for robust mean estimation. As a natural stability requirement, the construction of such affine-equivariant estimators has been extensively studied in the statistics literature. We quantitatively evaluate these estimators under two outlier models which have been the subject of much recent work: the heavy-tailed and adversarial corruption settings. We establish lower bounds which show that affine-equivariance induces a strict degradation in recovery error with quantitative rates degrading by a factor of $\sqrt{d}$ in both settings. We find that classical estimators such as the Tukey median (Tukey '75) and Stahel-Donoho estimator (Stahel '81 and Donoho '82) are either quantitatively sub-optimal even within the class of affine-equivariant estimators or lack any quantitative guarantees. On the other hand, recent estimators with strong quantitative guarantees are not affine-equivariant or require additional distributional assumptions to achieve it. We remedy this by constructing a new affine-equivariant estimator which nearly matches our lower bound. Our estimator is based on a novel notion of a high-dimensional median which may be of independent interest. Notably, our results are applicable more broadly to any estimator whose performance is evaluated in the Mahalanobis norm which, for affine-equivariant estimators, corresponds to an evaluation in Euclidean norm on isotropic distributions.

artificial intelligence, estimator, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2310.10758

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.28)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.67)

Add feedback

Break The Spell Of Total Correlation In betaTCVAE

Chen, Zihao, Wang, Wenyong, Zou, Sai

arXiv.org Artificial IntelligenceApr-27-2023

In the absence of artificial labels, the independent and dependent features in the data are cluttered. How to construct the inductive biases of the model to flexibly divide and effectively contain features with different complexity is the main focal point of unsupervised disentangled representation learning. This paper proposes a new iterative decomposition path of total correlation and explains the disentangled representation ability of VAE from the perspective of model capacity allocation. The newly developed objective function combines latent variable dimensions into joint distribution while relieving the independence constraints of marginal distributions in combination, leading to latent variables with a more manipulable prior distribution. The novel model enables VAE to adjust the parameter capacity to divide dependent and independent data features flexibly. Experimental results on various datasets show an interesting relevance between model capacity and the latent variable grouping size, called the "V"-shaped best ELBO trajectory. Additionally, we empirically demonstrate that the proposed method obtains better disentangling performance with reasonable parameter capacity allocation.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2210.08794

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.69)

Add feedback

An Interpretable Loan Credit Evaluation Method Based on Rule Representation Learner

Chen, Zihao, Wang, Xiaomeng, Huang, Yuanjiang, Jia, Tao

arXiv.org Artificial IntelligenceApr-3-2023

The interpretability of model has become one of the obstacles to its wide application in the high-stake fields. The usual way to obtain interpretability is to build a black-box first and then explain it using the post-hoc methods. However, the explanations provided by the post-hoc method are not always reliable. Instead, we design an intrinsically interpretable model based on RRL(Rule Representation Learner) for the Lending Club dataset. Specifically, features can be divided into three categories according to their characteristics of themselves and build three sub-networks respectively, each of which is similar to a neural network with a single hidden layer but can be equivalently converted into a set of rules. During the training, we learned tricks from previous research to effectively train binary weights. Finally, our model is compared with the tree-based model. The results show that our model is much better than the interpretable decision tree in performance and close to other black-box, which is of practical significance to both financial institutions and borrowers. More importantly, our model is used to test the correctness of the explanations generated by the post-hoc method, the results show that the post-hoc method is not always reliable.

artificial intelligence, explanation, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2304.00731

Genre: Research Report > New Finding (0.87)

Industry:

Banking & Finance > Loans (0.69)
Banking & Finance > Credit (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

JCSE: Contrastive Learning of Japanese Sentence Embeddings and Its Applications

Chen, Zihao, Handa, Hisashi, Shirahama, Kimiaki

arXiv.org Artificial IntelligenceJan-19-2023

Contrastive learning is widely used for sentence representation learning. Despite this prevalence, most studies have focused exclusively on English and few concern domain adaptation for domain-specific downstream tasks, especially for low-resource languages like Japanese, which are characterized by insufficient target domain data and the lack of a proper training strategy. To overcome this, we propose a novel Japanese sentence representation framework, JCSE (derived from ``Contrastive learning of Sentence Embeddings for Japanese''), that creates training data by generating sentences and synthesizing them with sentences available in a target domain. Specifically, a pre-trained data generator is finetuned to a target domain using our collected corpus. It is then used to generate contradictory sentence pairs that are used in contrastive learning for adapting a Japanese language model to a specific task in the target domain. Another problem of Japanese sentence representation learning is the difficulty of evaluating existing embedding methods due to the lack of benchmark datasets. Thus, we establish a comprehensive Japanese Semantic Textual Similarity (STS) benchmark on which various embedding models are evaluated. Based on this benchmark result, multiple embedding methods are chosen and compared with JCSE on two domain-specific tasks, STS in a clinical domain and information retrieval in an educational domain. The results show that JCSE achieves significant performance improvement surpassing direct transfer and other training strategies. This empirically demonstrates JCSE's effectiveness and practicability for downstream tasks of a low-resource language.

artificial intelligence, natural language, text processing, (16 more...)

arXiv.org Artificial Intelligence

2301.08193

Country: Asia > Japan (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback