AITopics | Song, Yue

Collaborating Authors

Song, Yue

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Is CLIP ideal? No. Can we fix it? Yes!

Kang, Raphi, Song, Yue, Gkioxari, Georgia, Perona, Pietro

arXiv.org Artificial IntelligenceMar-10-2025

Contrastive Language-Image Pre-Training (CLIP) is a popular method for learning multimodal latent spaces with well-organized semantics. Despite its wide range of applications, CLIP's latent space is known to fail at handling complex visual-textual interactions. Recent works attempt to address its shortcomings with data-centric or algorithmic approaches. But what if the problem is more fundamental, and lies in the geometry of CLIP? Toward this end, we rigorously analyze CLIP's latent space properties, and prove that no CLIP-like joint embedding space exists which can correctly do any two of the following at the same time: 1. represent basic descriptions and image content, 2. represent attribute binding, 3. represent spatial location and relationships, 4. represent negation. Informed by this analysis, we propose Dense Cosine Similarity Maps (DCSMs) as a principled and interpretable scoring method for CLIP-like models, which solves the fundamental limitations of CLIP by retaining the semantic topology of the image patches and text tokens. This method improves upon the performance of classical CLIP-like joint encoder models on a wide array of benchmarks. We share our code and data here for reproducibility: https://github.com/Raphoo/DCSM_Ideal_CLIP

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2503.08723

Country: North America > United States > California (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Unsupervised Region-Based Image Editing of Denoising Diffusion Models

Li, Zixiang, Song, Yue, Tao, Renshuai, Jia, Xiaohong, Zhao, Yao, Wang, Wei

arXiv.org Artificial IntelligenceDec-17-2024

Although diffusion models have achieved remarkable success in the field of image generation, their latent space remains under-explored. Current methods for identifying semantics within latent space often rely on external supervision, such as textual information and segmentation masks. In this paper, we propose a method to identify semantic attributes in the latent space of pre-trained diffusion models without any further training. By projecting the Jacobian of the targeted semantic region into a low-dimensional subspace which is orthogonal to the non-masked regions, our approach facilitates precise semantic discovery and control over local masked areas, eliminating the need for annotations. We conducted extensive experiments across multiple datasets and various architectures of diffusion models, achieving state-of-the-art performance. In particular, for some specific face attributes, the performance of our proposed method even surpasses that of supervised approaches, demonstrating its superior ability in editing local image properties.

artificial intelligence, diffusion model, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2412.12912

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Media > Photography (0.43)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Morphological-Symmetry-Equivariant Heterogeneous Graph Neural Network for Robotic Dynamics Learning

Xie, Fengze, Wei, Sizhe, Song, Yue, Yue, Yisong, Gan, Lu

arXiv.org Artificial IntelligenceDec-2-2024

These structural priors are embedded into the learning architecture as constraints, ensuring high generalizability, sample and model efficiency. The proposed MS-HGNN is a versatile and general architecture that is applicable to various multi-body dynamic systems and a wide range of dynamics learning problems.

artificial intelligence, machine learning, symmetry, (17 more...)

arXiv.org Artificial Intelligence

2412.01297

Country: North America > United States > New York (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Unsupervised Representation Learning from Sparse Transformation Analysis

Song, Yue, Keller, Thomas Anderson, Yue, Yisong, Perona, Pietro, Welling, Max

arXiv.org Artificial IntelligenceOct-7-2024

There is a vast literature on representation learning based on principles such as coding efficiency, statistical independence, causality, controllability, or symmetry. In this paper we propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components. Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model, before being decoded to predict a future input state. The flow model is decomposed into a number of rotational (divergence-free) vector fields and a number of potential flow (curl-free) fields. Our sparsity prior encourages only a small number of these fields to be active at any instant and infers the speed with which the probability flows along these fields. Training this model is completely unsupervised using a standard variational objective and results in a new form of disentangled representations where the input is not only represented by a combination of independent factors, but also by a combination of independent transformation primitives given by the learned flow fields. When viewing the transformations as symmetries one may interpret this as learning approximately equivariant representations. Empirically we demonstrate that this model achieves state of the art in terms of both data likelihood and unsupervised approximate equivariance errors on datasets composed of sequence transformations.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.05564

Country:

Europe > Netherlands (0.14)
Europe > Italy (0.14)
North America > United States (0.14)
Europe > Belgium (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Add feedback

RMLR: Extending Multinomial Logistic Regression into General Geometries

Chen, Ziheng, Song, Yue, Wang, Rui, Wu, Xiaojun, Sebe, Nicu

arXiv.org Artificial IntelligenceOct-2-2024

Riemannian neural networks, which extend deep learning techniques to Riemannian spaces, have gained significant attention in machine learning. To better classify the manifold-valued features, researchers have started extending Euclidean multinomial logistic regression (MLR) into Riemannian manifolds. However, existing approaches suffer from limited applicability due to their strong reliance on specific geometric properties. This paper proposes a framework for designing Riemannian MLR over general geometries, referred to as RMLR. Our framework only requires minimal geometric properties, thus exhibiting broad applicability and enabling its use with a wide range of geometries. Specifically, we showcase our framework on the Symmetric Positive Definite (SPD) manifold and special orthogonal group, i.e., the set of rotation matrices. On the SPD manifold, we develop five families of SPD MLRs under five types of power-deformed metrics. On rotation matrices we propose Lie MLR based on the popular bi-invariant metric. Extensive experiments on different Riemannian backbone networks validate the effectiveness of our framework.

artificial intelligence, machine learning, mlr, (18 more...)

arXiv.org Artificial Intelligence

2409.19433

Country: Europe > France (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.67)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry

Chen, Ziheng, Song, Yue, Wu, Xiao-Jun, Liu, Gaowen, Sebe, Nicu

arXiv.org Artificial IntelligenceJul-15-2024

Global Covariance Pooling (GCP) has been demonstrated to improve the performance of Deep Neural Networks (DNNs) by exploiting second-order statistics of high-level representations. GCP typically performs classification of the covariance matrices by applying matrix function normalization, such as matrix logarithm or power, followed by a Euclidean classifier. However, covariance matrices inherently lie in a Riemannian manifold, known as the Symmetric Positive Definite (SPD) manifold. The current literature does not provide a satisfactory explanation of why Euclidean classifiers can be applied directly to Riemannian features after the normalization of the matrix power. To mitigate this gap, this paper provides a comprehensive and unified understanding of the matrix logarithm and power from a Riemannian geometry perspective. The underlying mechanism of matrix functions in GCP is interpreted from two perspectives: one based on tangent classifiers (Euclidean classifiers on the tangent space) and the other based on Riemannian classifiers. Via theoretical analysis and empirical validation through extensive experiments on fine-grained and large-scale visual classification datasets, we conclude that the working mechanism of the matrix functions should be attributed to the Riemannian classifiers they implicitly respect.

artificial intelligence, classifier, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2407.10484

Country: Europe > France (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Product Geometries on Cholesky Manifolds with Applications to SPD Manifolds

Chen, Ziheng, Song, Yue, Wu, Xiao-Jun, Sebe, Nicu

arXiv.org Artificial IntelligenceJul-2-2024

This paper presents two new metrics on the Symmetric Positive Definite (SPD) manifold via the Cholesky manifold, i.e., the space of lower triangular matrices with positive diagonal elements. We first unveil that the existing popular Riemannian metric on the Cholesky manifold can be generally characterized as the product metric of a Euclidean metric and a Riemannian metric on the space of n-dimensional positive vectors. Based on this analysis, we propose two novel metrics on the Cholesky manifolds, i.e., Diagonal Power Euclidean Metric and Diagonal Generalized Bures-Wasserstein Metric, which are numerically stabler than the existing Cholesky metric. We also discuss the gyro structures and deformed metrics associated with our metrics. The gyro structures connect the linear and geometric properties, while the deformed metrics interpolate between our proposed metrics and the existing metric. Further, by Cholesky decomposition, the proposed deformed metrics and gyro structures are pulled back to SPD manifolds. Compared with existing Riemannian metrics on SPD manifolds, our metrics are easy to use, computationally efficient, and numerically stable.

artificial intelligence, machine learning, manifold, (13 more...)

arXiv.org Artificial Intelligence

2407.02607

Country: Europe > Italy (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine > Health Care Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Add feedback

Aligning Large Language Models with Representation Editing: A Control Perspective

Kong, Lingkai, Wang, Haorui, Mu, Wenhao, Du, Yuanqi, Zhuang, Yuchen, Zhou, Yifei, Song, Yue, Zhang, Rongzhi, Wang, Kai, Zhang, Chao

arXiv.org Artificial IntelligenceJun-11-2024

Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabilities. To address these challenges, we propose aligning LLMs through representation editing. The core of our method is to view a pre-trained autoregressive LLM as a discrete-time stochastic dynamical system. To achieve alignment for specific objectives, we introduce external control signals into the state space of this language dynamical system. We train a value function directly on the hidden states according to the Bellman equation, enabling gradient-based optimization to obtain the optimal control signals at test time. Our experiments demonstrate that our method outperforms existing test-time alignment techniques while requiring significantly fewer resources compared to fine-tuning methods.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2406.05954

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Consumer Health (0.68)
Banking & Finance (0.67)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Navigating Chemical Space with Latent Flows

Wei, Guanghao, Huang, Yining, Duan, Chenru, Song, Yue, Du, Yuanqi

arXiv.org Artificial IntelligenceMay-7-2024

Recent progress of deep generative models in the vision and language domain has stimulated significant interest in more structured data generation such as molecules. However, beyond generating new random molecules, efficient exploration and a comprehensive understanding of the vast chemical space are of great importance to molecular science and applications in drug design and materials discovery. In this paper, we propose a new framework, ChemFlow, to traverse chemical space through navigating the latent space learned by molecule generative models through flows. We introduce a dynamical system perspective that formulates the problem as learning a vector field that transports the mass of the molecular distribution to the region with desired molecular properties or structure diversity. Under this framework, we unify previous approaches on molecule latent space traversal and optimization and propose alternative competing methods incorporating different physical priors.

artificial intelligence, machine learning, optimization problem, (19 more...)

arXiv.org Artificial Intelligence

2405.03987

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

A Lie Group Approach to Riemannian Batch Normalization

Chen, Ziheng, Song, Yue, Liu, Yunmei, Sebe, Nicu

arXiv.org Artificial IntelligenceMar-17-2024

Manifold-valued measurements exist in numerous applications within computer vision and machine learning. Recent studies have extended Deep Neural Networks (DNNs) to manifolds, and concomitantly, normalization techniques have also been adapted to several manifolds, referred to as Riemannian normalization. Nonetheless, most of the existing Riemannian normalization methods have been derived in an ad hoc manner and only apply to specific manifolds. This paper establishes a unified framework for Riemannian Batch Normalization (RBN) techniques on Lie groups. Our framework offers the theoretical guarantee of controlling both the Riemannian mean and variance. Empirically, we focus on Symmetric Positive Definite (SPD) manifolds, which possess three distinct types of Lie group structures. Using the deformation concept, we generalize the existing Lie groups on SPD manifolds into three families of parameterized Lie groups. Specific normalization layers induced by these Lie groups are then proposed for SPD neural networks. We demonstrate the effectiveness of our approach through three sets of experiments: radar recognition, human action recognition, and electroencephalography (EEG) classification. The code is available at https://github.com/GitZH-Chen/LieBN.git.

artificial intelligence, machine learning, manifold, (19 more...)

arXiv.org Artificial Intelligence

2403.11261

Country: Europe > France (0.28)

Genre: Research Report (0.85)

Industry: Health & Medicine > Therapeutic Area (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback