AITopics | Zhang, Kun

Collaborating Authors

Zhang, Kun

OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad

Tang, Luyao, Yuan, Yuxuan, Chen, Chaoqi, Zhang, Zeyu, Huang, Yue, Zhang, Kun

arXiv.org Artificial Intelligence, Mar-24-2025

Although foundation models (FMs) claim to be powerful, their generalization ability significantly decreases when faced with distribution shifts, weak supervision, or malicious attacks in the open world. On the other hand, most domain generalization or adversarial fine-tuning methods are task-related or model-specific, ignoring the universality in practical applications and the transferability between FMs. This paper delves into the problem of generalizing FMs to out-of-domain data. We propose a novel framework, the Object-Concept-Relation Triad (OCRT), that enables FMs to extract sparse, high-level concepts and intricate relational structures from raw visual inputs. The key idea is to bind objects in visual scenes and a set of object-centric representations through unsupervised decoupling and iterative refinement. Specifically, we project the object-centric representations onto a semantic concept space that the model can readily interpret, and estimate their importance to filter out irrelevant elements. Then, a concept-based graph, which has a flexible degree, is constructed to incorporate the set of concepts and their corresponding importance, enabling the extraction of high-order factors from informative concepts and facilitating relational reasoning among these concepts. Extensive experiments demonstrate that OCRT can substantially boost the generalizability and robustness of SAM and CLIP across multiple downstream tasks.

large language model, machine learning, ocrt, (18 more...)

arXiv.org Artificial Intelligence

2503.18695

Country:

North America > United States (0.28)
Asia > China (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.90)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Information Technology > Security & Privacy (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Analytic DAG Constraints for Differentiable DAG Learning

Zhang, Zhen, Ng, Ignavier, Gong, Dong, Liu, Yuhang, Gong, Mingming, Huang, Biwei, Zhang, Kun, Hengel, Anton van den, Shi, Javen Qinfeng

arXiv.org Artificial Intelligence, Mar-24-2025

Recovering the underlying Directed Acyclic Graph (DAG) structures from observational data presents a formidable challenge, partly due to the combinatorial nature of the DAG-constrained optimization problem. Recently, researchers have identified gradient vanishing as one of the primary obstacles in differentiable DAG learning and have proposed several DAG constraints to mitigate this issue. By developing the necessary theory to establish a connection between analytic functions and DAG constraints, we demonstrate that analytic functions from the set $\{f(x) = c_0 + \sum_{i=1}^{\infty}c_ix^i | \forall i > 0, c_i > 0; r = \lim_{i\rightarrow \infty}c_{i}/c_{i+1} > 0\}$ can be employed to formulate effective DAG constraints. Furthermore, we establish that this set of functions is closed under several functional operators, including differentiation, summation, and multiplication. Consequently, these operators can be leveraged to create novel DAG constraints based on existing ones. Using these properties, we design a series of DAG constraints and develop an efficient algorithm to evaluate them. Experiments in various settings demonstrate that our DAG constraints outperform previous state-of-the-art comparators. Our implementation is available at https://github.com/zzhang1987/AnalyticDAGLearning.
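As a rough illustration of how a power series with positive coefficients can act as a DAG constraint, the sketch below evaluates the truncated sum of c_i * tr(B^i) for i = 1..d, where B is the elementwise square of a weighted adjacency matrix W. Because B is nonnegative, tr(B^i) sums the weights of closed walks of length i, so the value is zero exactly when the graph has no cycle. The function name `power_series_dag_penalty`, the truncation at d terms, and the default coefficients c_i = 1 are assumptions made for this example; this is not the efficient evaluation algorithm described in the abstract or the code in the linked repository.

```python
import numpy as np

def power_series_dag_penalty(W, coeffs=None):
    """Truncated sum_{i=1..d} c_i * tr(B^i), with B = W * W (elementwise)."""
    W = np.asarray(W, dtype=float)
    d = W.shape[0]
    if coeffs is None:
        # Assumption for this sketch: c_i = 1, the series of 1 / (1 - x).
        coeffs = np.ones(d)
    B = W * W                      # nonnegative surrogate adjacency matrix
    power = np.eye(d)
    penalty = 0.0
    for c in coeffs[:d]:           # closed walks longer than d add no new information
        power = power @ B          # power now holds B^i
        penalty += c * np.trace(power)
    return penalty

acyclic = np.array([[0.0, 1.5], [0.0, 0.0]])   # single edge 0 -> 1
cyclic = np.array([[0.0, 1.0], [1.0, 0.0]])    # 0 -> 1 -> 0

print(power_series_dag_penalty(acyclic))  # 0.0
print(power_series_dag_penalty(cyclic))   # 2.0
```

Swapping in other positive coefficient sequences (for example c_i = 1/i!, the exponential series) changes the scale of the penalty on cyclic graphs but not the set of matrices on which it vanishes.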

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

2503.19218

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

ProtoGS: Efficient and High-Quality Rendering with 3D Gaussian Prototypes

Gao, Zhengqing, Hu, Dongting, Bian, Jia-Wang, Fu, Huan, Li, Yan, Liu, Tongliang, Gong, Mingming, Zhang, Kun

arXiv.org Artificial Intelligence, Mar-21-2025

3D Gaussian Splatting (3DGS) has made significant strides in novel view synthesis but is limited by the substantial number of Gaussian primitives required, posing challenges for deployment on lightweight devices. Recent methods address this issue by compressing the storage size of densified Gaussians, yet fail to preserve rendering quality and efficiency. To overcome these limitations, we propose ProtoGS to learn Gaussian prototypes to represent Gaussian primitives, significantly reducing the total number of Gaussians without sacrificing visual quality. Our method directly uses Gaussian prototypes to enable efficient rendering and leverages the resulting reconstruction loss to guide prototype learning. To further optimize memory efficiency during training, we incorporate structure-from-motion (SfM) points as anchor points to group Gaussian primitives. Gaussian prototypes are derived within each group by K-means clustering, and both the anchor points and the prototypes are optimized jointly. Our experiments on real-world and synthetic datasets show that our method outperforms existing methods, achieving a substantial reduction in the number of Gaussians and enabling high rendering speed while maintaining or even enhancing rendering fidelity.
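The grouping-then-clustering step described in this abstract can be sketched as follows, under strong simplifying assumptions: each primitive is reduced to a 3D position, primitives are assigned to their nearest anchor point, and a plain Lloyd-style K-means runs inside each group. The helper names `kmeans` and `group_prototypes`, the synthetic data, and the choice of three prototypes per group are all hypothetical. The joint optimization of anchors and prototypes through a rendering loss is not modeled here.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd iterations; returns at most k cluster centers."""
    rng = np.random.default_rng(seed)
    k = min(k, len(points))
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:       # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def group_prototypes(primitives, anchors, k_per_group):
    """Assign primitives to the nearest anchor, then cluster within each group."""
    dists = np.linalg.norm(primitives[:, None, :] - anchors[None, :, :], axis=2)
    owner = dists.argmin(axis=1)
    prototypes = []
    for a in range(len(anchors)):
        group = primitives[owner == a]
        if len(group) > 0:
            prototypes.append(kmeans(group, k_per_group))
    return np.concatenate(prototypes, axis=0)

# Synthetic stand-in data: 200 primitive positions around two anchor points.
rng = np.random.default_rng(1)
anchors = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]])
primitives = np.concatenate([
    rng.normal(0.0, 1.0, size=(100, 3)),
    rng.normal(10.0, 1.0, size=(100, 3)),
])
prototypes = group_prototypes(primitives, anchors, k_per_group=3)
print(primitives.shape[0], "primitives ->", prototypes.shape[0], "prototypes")
```

Clustering inside anchor groups rather than over the whole scene keeps each distance matrix small, which is one plausible reading of the memory-efficiency motivation given in the abstract.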

artificial intelligence, machine learning, prototype, (17 more...)

arXiv.org Artificial Intelligence

2503.17486

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Nonparametric Factor Analysis and Beyond

Zheng, Yujia, Liu, Yang, Yao, Jiaxiong, Hu, Yingyao, Zhang, Kun

arXiv.org Machine Learning, Mar-21-2025

Nearly all identifiability results in unsupervised representation learning inspired by, e.g., independent component analysis, factor analysis, and causal representation learning, rely on assumptions of additive independent noise or noiseless regimes. In contrast, we study the more general case where noise can take arbitrary forms, depend on latent variables, and be non-invertibly entangled within a nonlinear function. We propose a general framework for identifying latent variables in nonparametric noisy settings. We first show that, under suitable conditions, the generative model is identifiable up to certain submanifold indeterminacies even in the presence of non-negligible noise. Furthermore, under structural or distributional variability conditions, we prove that latent variables of the general nonlinear models are identifiable up to trivial indeterminacies. Based on the proposed theoretical framework, we have also developed corresponding estimation methods and validated them in various synthetic and real-world settings. Interestingly, our estimate of the true GDP growth from alternative measurements suggests more insightful information on the economies than official reports. We expect our framework to provide new insight into how both researchers and practitioners deal with latent variables in real-world scenarios.

artificial intelligence, latent variable, machine learning, (15 more...)

arXiv.org Machine Learning

2503.16865

Country:

Asia > Japan > Honshū (0.14)
Asia > Middle East > Jordan (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Banking & Finance > Economy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Causal Discovery and Counterfactual Reasoning to Optimize Persuasive Dialogue Policies

Zeng, Donghuo, Legaspi, Roberto, Sun, Yuewen, Dong, Xinshuai, Ikeda, Kazushi, Spirtes, Peter, Zhang, Kun

arXiv.org Artificial Intelligence, Mar-19-2025

Tailoring persuasive conversations to users leads to more effective persuasion. However, existing dialogue systems often struggle to adapt to dynamically evolving user states. This paper presents a novel method that leverages causal discovery and counterfactual reasoning for optimizing system persuasion capability and outcomes. We employ the Greedy Relaxation of the Sparsest Permutation (GRaSP) algorithm to identify causal relationships between user and system utterance strategies, treating user strategies as states and system strategies as actions. GRaSP identifies user strategies as causal factors influencing system responses, which inform Bidirectional Conditional Generative Adversarial Networks (BiCoGAN) in generating counterfactual utterances for the system. Subsequently, we use the Dueling Double Deep Q-Network (D3QN) model to utilize counterfactual data to determine the best policy for selecting system utterances. Our experiments with the PersuasionForGood dataset show measurable improvements in persuasion outcomes using our approach over baseline methods. The observed increase in cumulative rewards and Q-values highlights the effectiveness of causal discovery in enhancing counterfactual reasoning and optimizing reinforcement learning policies for online dialogue systems.
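For readers unfamiliar with the D3QN component named above, the sketch below shows the two standard ingredients it combines, as they appear in the general reinforcement learning literature: the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'), and the double Q-learning target, in which the online network selects the next action and the target network evaluates it. The function names and all numbers are hypothetical toy values. Nothing here reproduces the authors' model, the GRaSP step, or the BiCoGAN step.

```python
import numpy as np

def dueling_q(value, advantages):
    """Combine a state value and per-action advantages into Q-values."""
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean(axis=-1, keepdims=True)

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Online network selects the next action; target network evaluates it."""
    best_action = int(np.argmax(q_online_next))
    bootstrap = 0.0 if done else gamma * float(q_target_next[best_action])
    return reward + bootstrap

# Toy numbers: two candidate system strategies (actions) in one user state.
q_values = dueling_q(1.0, [1.0, 3.0])
print(q_values)  # [0. 2.]

target = double_dqn_target(
    reward=1.0,
    gamma=0.5,
    q_online_next=np.array([0.2, 0.9]),   # online net prefers action 1
    q_target_next=np.array([4.0, 2.0]),   # target net scores action 1 as 2.0
    done=False,
)
print(target)  # 2.0
```

Note that the target uses 2.0 rather than the target network's own maximum of 4.0; decoupling action selection from evaluation in this way is what the "double" part is meant to achieve.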

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.16544

Country:

North America > United States (0.14)
Oceania > Australia (0.14)
Europe > Germany (0.14)
(2 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry:

Social Sector (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

Type Information-Assisted Self-Supervised Knowledge Graph Denoising

Sun, Jiaqi, Zheng, Yujia, Dong, Xinshuai, Dai, Haoyue, Zhang, Kun

arXiv.org Artificial Intelligence, Mar-12-2025

Knowledge graphs serve as critical resources supporting intelligent systems, but they can be noisy due to imperfect automatic generation processes. Existing approaches to noise detection often rely on external facts, logical rule constraints, or structural embeddings. These methods are often challenged by imperfect entity alignment, flexible knowledge graph construction, and overfitting on structures. In this paper, we propose to exploit the consistency between entity and relation type information for noise detection, resulting in a novel self-supervised knowledge graph denoising method that avoids those problems. We formalize type inconsistency noise as triples that deviate from the majority with respect to type-dependent reasoning along the topological structure. Specifically, we first extract a compact representation of a given knowledge graph via an encoder that models the type dependencies of triples. Then, the decoder reconstructs the original input knowledge graph based on the compact representation. It is worth noting that our proposal has the potential to address the problems of knowledge graph compression and completion, although this is not our focus. For the specific task of noise detection, the discrepancy between the reconstruction results and the input knowledge graph provides an opportunity for denoising, which is facilitated by the type consistency embedded in our method. Experimental validation demonstrates the effectiveness of our approach in detecting potential noise in real-world data.
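The notion of a triple that "deviates from the majority" with respect to type information can be illustrated with a much simpler stand-in than the encoder-decoder described above: count how often each (head type, relation, tail type) signature occurs for a relation, and score a triple by how rare its signature is. The function `type_inconsistency_scores`, the toy triples, and the entity types are all invented for this example, and frequency counting is a substitute technique, not the authors' method.

```python
from collections import Counter

def type_inconsistency_scores(triples, entity_type):
    """Score each triple by 1 - (share of its type signature within its relation)."""
    signatures = [(entity_type[h], r, entity_type[t]) for h, r, t in triples]
    signature_counts = Counter(signatures)
    relation_counts = Counter(r for _, r, _ in triples)
    return [
        1.0 - signature_counts[sig] / relation_counts[sig[1]]
        for sig in signatures
    ]

# Hypothetical toy graph: the last triple has head and tail types swapped.
entity_type = {
    "alice": "person", "bob": "person", "carol": "person",
    "paris": "city", "rome": "city", "lima": "city",
}
triples = [
    ("alice", "born_in", "paris"),
    ("bob", "born_in", "rome"),
    ("carol", "born_in", "lima"),
    ("paris", "born_in", "bob"),
]

for triple, score in zip(triples, type_inconsistency_scores(triples, entity_type)):
    print(triple, score)
```

On this toy graph the three (person, born_in, city) triples score 0.25 and the swapped triple scores 0.75, so a threshold between the two would flag it as likely noise.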

artificial intelligence, information, knowledge graph, (16 more...)

arXiv.org Artificial Intelligence

2503.09916

Country:

North America > United States > Pennsylvania (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)

SDD-4DGS: Static-Dynamic Aware Decoupling in Gaussian Splatting for 4D Scene Reconstruction

Sun, Dai, Guan, Huhao, Zhang, Kun, Xie, Xike, Zhou, S. Kevin

arXiv.org Artificial Intelligence, Mar-12-2025

Dynamic and static components in scenes often exhibit distinct properties, yet most 4D reconstruction methods treat them indiscriminately, leading to suboptimal performance in both cases. This work introduces SDD-4DGS, the first framework for static-dynamic decoupled 4D scene reconstruction based on Gaussian Splatting. Our approach is built upon a novel probabilistic dynamic perception coefficient that is naturally integrated into the Gaussian reconstruction pipeline, enabling adaptive separation of static and dynamic components. With carefully designed implementation strategies to realize this theoretical framework, our method effectively facilitates explicit learning of motion patterns for dynamic elements while maintaining geometric stability for static structures. Extensive experiments on five benchmark datasets demonstrate that SDD-4DGS consistently outperforms state-of-the-art methods in reconstruction fidelity, with enhanced detail restoration for static structures and precise modeling of dynamic motions. The code will be released.

artificial intelligence, reconstruction, sdd-4dgs, (13 more...)

arXiv.org Artificial Intelligence

2503.09332

Country:

North America > United States (0.14)
Asia > China (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Generative Artificial Intelligence in Robotic Manipulation: A Survey

Zhang, Kun, Yun, Peng, Cen, Jun, Cai, Junhao, Zhu, Didi, Yuan, Hangjie, Zhao, Chao, Feng, Tao, Wang, Michael Yu, Chen, Qifeng, Pan, Jia, Zhang, Wei, Yang, Bo, Chen, Hua

arXiv.org Artificial Intelligence, Mar-10-2025

This survey provides a comprehensive review of recent advances in generative learning models for robotic manipulation, addressing key challenges in the field. Robotic manipulation faces critical bottlenecks, including insufficient data and inefficient data acquisition, long-horizon and complex task planning, and the multi-modality reasoning ability needed for robust policy learning performance across diverse environments. To tackle these challenges, this survey introduces several generative model paradigms, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), diffusion models, probabilistic flow models, and autoregressive models, highlighting their strengths and limitations. The applications of these models are categorized into three hierarchical layers: the Foundation Layer, focusing on data generation and reward generation; the Intermediate Layer, covering language, code, visual, and state generation; and the Policy Layer, emphasizing grasp generation and trajectory generation. Each layer is explored in detail, along with notable works that have advanced the state of the art. Finally, the survey outlines future research directions and challenges, emphasizing the need for improved efficiency in data utilization, better handling of long-horizon tasks, and enhanced generalization across diverse robotic scenarios. All the related resources, including research papers, open-source data, and projects, are collected for the community at https://github.com/GAI4Manipulation/AwesomeGAIManipulation

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.03464

Country: Asia > China > Guangdong Province (0.14)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.67)

Industry:

Education (0.67)
Energy > Oil & Gas > Upstream (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.82)

When Selection Meets Intervention: Additional Complexities in Causal Discovery

Dai, Haoyue, Ng, Ignavier, Sun, Jianle, Tang, Zeyu, Luo, Gongxu, Dong, Xinshuai, Spirtes, Peter, Zhang, Kun

arXiv.org Artificial Intelligence, Mar-10-2025

We address the common yet often-overlooked selection bias in interventional studies, where subjects are selectively enrolled into experiments. For instance, participants in a drug trial are usually patients of the relevant disease; A/B tests on mobile applications target existing users only, and gene perturbation studies typically focus on specific cell types, such as cancer cells. Ignoring this bias leads to incorrect causal discovery results. Even when recognized, the existing paradigm for interventional causal discovery still fails to address it. This is because subtle differences in when and where interventions happen can lead to significantly different statistical patterns. We capture this dynamic by introducing a graphical model that explicitly accounts for both the observed world (where interventions are applied) and the counterfactual world (where selection occurs while interventions have not been applied). We characterize the Markov property of the model, and propose a provably sound algorithm to identify causal relations as well as selection mechanisms up to the equivalence class, from data with soft interventions and unknown targets. Through synthetic and real-world experiments, we demonstrate that our algorithm effectively identifies true causal relations despite the presence of selection bias.
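The claim that ignoring selection leads to incorrect causal conclusions can be made concrete with a generic simulation, offered here only as a sketch of ordinary selection bias and not of the interventional setting or the algorithm in the paper. Two variables are generated independently, and then only samples passing a hypothetical enrollment rule (x + y > 1) are kept. The variable names, the threshold, and the sample size are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Two causally unrelated variables in the full population.
x = rng.normal(size=n)
y = rng.normal(size=n)

# Hypothetical enrollment rule: only subjects with x + y > 1 enter the study.
selected = (x + y) > 1.0

corr_population = np.corrcoef(x, y)[0, 1]
corr_selected = np.corrcoef(x[selected], y[selected])[0, 1]

print("population correlation:", round(corr_population, 3))
print("selected-sample correlation:", round(corr_selected, 3))
```

In the full population the correlation is close to zero, while among the selected samples it is clearly negative, even though neither variable influences the other. A discovery method that sees only the selected data and ignores how it was selected could mistake this dependence for a causal link.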

artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.07302

Country: North America > United States (0.45)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.67)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.68)
Health & Medicine > Therapeutic Area > Immunology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Infant Cry Detection Using Causal Temporal Representation

Fu, Minghao, Li, Danning, Gadhiya, Aryan, Lambright, Benjamin, Alowais, Mohamed, Bahnassy, Mohab, Elletter, Saad El Dine, Toyin, Hawau Olamide, Jiang, Haiyan, Zhang, Kun, Aldarmaki, Hanan

arXiv.org Artificial Intelligence, Mar-8-2025

Identifying relevant audio features in domestic environments is challenging due to diverse background sounds and the limited availability of high-quality annotated data for specific cases like baby cries. We address this issue through manual annotation and data augmentation techniques, improving baby cry analysis models by reducing noise during cry interval extraction. In addition, as the acquisition of annotated data is both costly and challenging, we propose a viable alternative using unsupervised methods to detect infant cry segment boundaries by approximating the underlying data-generating process.

Caring for newborns, especially for first-time parents, is a significant challenge. One of the main difficulties is understanding the meaning of infant cries. In response, numerous studies have emerged to address this problem. Early research showed that trained adult listeners could differentiate between types of cries. For example, [1] first identified four types of cries (pain, hunger, birth, and pleasure) by training nurses to recognize them. However, at best, the accuracy of trained nurses is only up to 33.09%. Beyond recognizing infants' daily needs, disease prediction is another critical task in infant cry …

artificial intelligence, detection, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2503.06247

Country: North America (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
