AITopics | Song, Qi

Collaborating Authors

Song, Qi

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Unleashing the Potential of Two-Tower Models: Diffusion-Based Cross-Interaction for Large-Scale Matching

Wang, Yihan, Xiong, Fei, Han, Zhexin, Song, Qi, Zhan, Kaiqiao, Wang, Ben

arXiv.org Artificial IntelligenceFeb-27-2025

Two-tower models are widely adopted in the industrial-scale matching stage across a broad range of application domains, such as content recommendations, advertisement systems, and search engines. This model efficiently handles large-scale candidate item screening by separating user and item representations. However, the decoupling network also leads to a neglect of potential information interaction between the user and item representations. Current state-of-the-art (SOTA) approaches include adding a shallow fully connected layer(i.e., COLD), which is limited by performance and can only be used in the ranking stage. For performance considerations, another approach attempts to capture historical positive interaction information from the other tower by regarding them as the input features(i.e., DAT). Later research showed that the gains achieved by this method are still limited because of lacking the guidance on the next user intent. To address the aforementioned challenges, we propose a "cross-interaction decoupling architecture" within our matching paradigm. This user-tower architecture leverages a diffusion module to reconstruct the next positive intention representation and employs a mixed-attention module to facilitate comprehensive cross-interaction. During the next positive intention generation, we further enhance the accuracy of its reconstruction by explicitly extracting the temporal drift within user behavior sequences. Experiments on two real-world datasets and one industrial dataset demonstrate that our method outperforms the SOTA two-tower models significantly, and our diffusion approach outperforms other generative models in reconstructing item representations.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.20687

Country:

Europe (0.93)
Asia (0.69)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

KnowPath: Knowledge-enhanced Reasoning via LLM-generated Inference Paths over Knowledge Graphs

Zhao, Qi, Yang, Hongyu, Song, Qi, Yao, Xinwei, Li, Xiangyang

arXiv.org Artificial IntelligenceFeb-17-2025

Large language models (LLMs) have demonstrated remarkable capabilities in various complex tasks, yet they still suffer from hallucinations. Introducing external knowledge, such as knowledge graph, can enhance the LLMs' ability to provide factual answers. LLMs have the ability to interactively explore knowledge graphs. However, most approaches have been affected by insufficient internal knowledge excavation in LLMs, limited generation of trustworthy knowledge reasoning paths, and a vague integration between internal and external knowledge. Therefore, we propose KnowPath, a knowledge-enhanced large model framework driven by the collaboration of internal and external knowledge. It relies on the internal knowledge of the LLM to guide the exploration of interpretable directed subgraphs in external knowledge graphs, better integrating the two knowledge sources for more accurate reasoning. Extensive experiments on multiple real-world datasets confirm the superiority of KnowPath.

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2502.12029

Country:

Europe (1.00)
Asia > China (0.68)
North America > United States (0.46)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

LLMBind: A Unified Modality-Task Integration Framework

Zhu, Bin, Ning, Munan, Jin, Peng, Lin, Bin, Huang, Jinfa, Song, Qi, Zhang, Junwu, Tang, Zhenyu, Pan, Mingjun, Zhou, Xing, Yuan, Li

arXiv.org Artificial IntelligenceApr-18-2024

In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress. To address this challenge, we introduce \textbf{LLMBind}, a novel framework designed to unify a diverse array of multi-modal tasks. By harnessing a Mixture-of-Experts (MoE) Large Language Model (LLM), LLMBind processes multi-modal inputs and generates task-specific tokens, enabling the invocation of corresponding models to accomplish tasks. This unique approach empowers LLMBind to interpret inputs and generate outputs across various modalities, including image, text, video, and audio. Furthermore, we have constructed an interaction dataset comprising 400k instructions, which unlocks the ability of LLMBind for interactive visual generation and editing tasks. Extensive experimentation demonstrates that LLMBind achieves very superior performance across diverse tasks and outperforms existing models in user evaluations conducted in real-world scenarios. Moreover, the adaptability of LLMBind allows for seamless integration with the latest models and extension to new modality tasks, highlighting its potential to serve as a unified AI agent for modeling universal modalities.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.14891

Country:

Oceania > Australia (0.17)
Asia > China (0.14)
Europe > Netherlands (0.14)

Genre: Research Report (0.82)

Industry: Media (0.69)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Stochastic Planner-Actor-Critic for Unsupervised Deformable Image Registration

Luo, Ziwei, Hu, Jing, Wang, Xin, Hu, Shu, Kong, Bin, Yin, Youbing, Song, Qi, Wu, Xi, Lyu, Siwei

arXiv.org Artificial IntelligenceDec-14-2021

Large deformations of organs, caused by diverse shapes and nonlinear shape changes, pose a significant challenge for medical image registration. Traditional registration methods need to iteratively optimize an objective function via a specific deformation model along with meticulous parameter tuning, but which have limited capabilities in registering images with large deformations. While deep learning-based methods can learn the complex mapping from input images to their respective deformation field, it is regression-based and is prone to be stuck at local minima, particularly when large deformations are involved. To this end, we present Stochastic Planner-Actor-Critic (SPAC), a novel reinforcement learning-based framework that performs step-wise registration. The key notion is warping a moving image successively by each time step to finally align to a fixed image. Considering that it is challenging to handle high dimensional continuous action and state spaces in the conventional reinforcement learning (RL) framework, we introduce a new concept `Plan' to the standard Actor-Critic model, which is of low dimension and can facilitate the actor to generate a tractable high dimensional action. The entire framework is based on unsupervised training and operates in an end-to-end manner. We evaluate our method on several 2D and 3D medical image datasets, some of which contain large deformations. Our empirical results highlight that our work achieves consistent, significant gains and outperforms state-of-the-art methods.

diagnostic medicine, machine learning, pattern recognition, (21 more...)

arXiv.org Artificial Intelligence

2112.07415

Country: Asia > China (0.28)

Genre: Research Report (0.70)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.88)

Add feedback

Imperceptible Adversarial Examples for Fake Image Detection

Liao, Quanyu, Li, Yuezun, Wang, Xin, Kong, Bin, Zhu, Bin, Lyu, Siwei, Yin, Youbing, Song, Qi, Wu, Xi

arXiv.org Artificial IntelligenceJun-3-2021

Fooling people with highly realistic fake images generated with Deepfake or GANs brings a great social disturbance to our society. Many methods have been proposed to detect fake images, but they are vulnerable to adversarial perturbations -- intentionally designed noises that can lead to the wrong prediction. Existing methods of attacking fake image detectors usually generate adversarial perturbations to perturb almost the entire image. This is redundant and increases the perceptibility of perturbations. In this paper, we propose a novel method to disrupt the fake image detection by determining key pixels to a fake image detector and attacking only the key pixels, which results in the $L_0$ and the $L_2$ norms of adversarial perturbations much less than those of existing works. Experiments on two public datasets with three fake image detectors indicate that our proposed method achieves state-of-the-art performance in both white-box and black-box attacks.

deep learning, neural network, perturbation, (19 more...)

arXiv.org Artificial Intelligence

2106.01615

Country: Asia > China (0.29)

Genre: Research Report (1.00)

Industry:

Law > Criminal Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

TumorNet: Lung Nodule Characterization Using Multi-View Convolutional Neural Network with Gaussian Process

Hussein, Sarfaraz, Gillies, Robert, Cao, Kunlin, Song, Qi, Bagci, Ulas

arXiv.org Machine LearningMar-2-2017

Characterization of lung nodules as benign or malignant is one of the most important tasks in lung cancer diagnosis, staging and treatment planning. While the variation in the appearance of the nodules remains large, there is a need for a fast and robust computer aided system. In this work, we propose an end-to-end trainable multi-view deep Convolutional Neural Network (CNN) for nodule characterization. First, we use median intensity projection to obtain a 2D patch corresponding to each dimension. The three images are then concatenated to form a tensor, where the images serve as different channels of the input image. In order to increase the number of training samples, we perform data augmentation by scaling, rotating and adding noise to the input image. The trained network is used to extract features from the input image followed by a Gaussian Process (GP) regression to obtain the malignancy score. We also empirically establish the significance of different high level nodule attributes such as calcification, sphericity and others for malignancy determination. These attributes are found to be complementary to the deep multi-view CNN features and a significant improvement over other methods is obtained.

deep learning, neural network, nodule, (19 more...)

arXiv.org Machine Learning

1703.00645

Country: North America > United States > Florida > Orange County > Orlando (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback