AITopics | Yang, Yang

Plotting

Yang, Yang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Semi-Supervised Image Captioning Considering Wasserstein Graph Matching

Yang, Yang

arXiv.org Artificial IntelligenceMar-26-2024

Image captioning can automatically generate captions for the given images, and the key challenge is to learn a mapping function from visual features to natural language features. Existing approaches are mostly supervised ones, i.e., each image has a corresponding sentence in the training set. However, considering that describing images always requires a huge of manpower, we usually have limited amount of described images (i.e., image-text pairs) and a large number of undescribed images in real-world applications. Thereby, a dilemma is the "Semi-Supervised Image Captioning". To solve this problem, we propose a novel Semi-Supervised Image Captioning method considering Wasserstein Graph Matching (SSIC-WGM), which turns to adopt the raw image inputs to supervise the generated sentences. Different from traditional single modal semi-supervised methods, the difficulty of semi-supervised cross-modal learning lies in constructing intermediately comparable information among heterogeneous modalities. In this paper, SSIC-WGM adopts the successful scene graphs as intermediate information, and constrains the generated sentences from two aspects: 1) inter-modal consistency. SSIC-WGM constructs the scene graphs of the raw image and generated sentence respectively, then employs the wasserstein distance to better measure the similarity between region embeddings of different graphs. 2) intra-modal consistency. SSIC-WGM takes the data augmentation techniques for the raw images, then constrains the consistency among augmented images and generated sentences. Consequently, SSIC-WGM combines the cross-modal pseudo supervision and structure invariant measure for efficiently using the undescribed images, and learns more reasonable mapping function.

consistency, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2403.17995

Country:

North America > Canada (1.00)
Asia (0.93)
North America > United States > California (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Solution for Point Tracking Task of ICCV 1st Perception Test Challenge 2023

Pan, Hongpeng, Yang, Yang, Fu, Zhongtian, Zhang, Yuxuan, Du, Shian, Xu, Yi, Ji, Xiangyang

arXiv.org Artificial IntelligenceMar-26-2024

This report proposes an improved method for the Tracking Any Point (TAP) task, which tracks any physical surface through a video. Several existing approaches have explored the TAP by considering the temporal relationships to obtain smooth point motion trajectories, however, they still suffer from the cumulative error caused by temporal prediction. To address this issue, we propose a simple yet effective approach called TAP with confident static points (TAPIR+), which focuses on rectifying the tracking of the static point in the videos shot by a static camera. To clarify, our approach contains two key components: (1) Multi-granularity Camera Motion Detection, which could identify the video sequence by the static camera shot. (2) CMR-based point trajectory prediction with one moving object segmentation approach to isolate the static point from the moving object. Our approach ranked first in the final test with a score of 0.46.

artificial intelligence, prediction, video, (13 more...)

arXiv.org Artificial Intelligence

2403.17994

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Add feedback

The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge

Chao, Dian, Song, Xin, Zhong, Shupeng, Wang, Boyuan, Wu, Xiangyu, Zhu, Chen, Yang, Yang

arXiv.org Artificial IntelligenceMar-25-2024

In this paper, we propose a solution for improving the quality of captions generated for figures in papers. We adopt the approach of summarizing the textual content in the paper to generate image captions. Throughout our study, we encounter discrepancies in the OCR information provided in the official dataset. To rectify this, we employ the PaddleOCR toolkit to extract OCR information from all images. Moreover, we observe that certain textual content in the official paper pertains to images that are not relevant for captioning, thereby introducing noise during caption generation. To mitigate this issue, we leverage LLaMA to extract image-specific information by querying the textual content based on image mentions, effectively filtering out extraneous information. Additionally, we recognize a discrepancy between the primary use of maximum likelihood estimation during text generation and the evaluation metrics such as ROUGE employed to assess the quality of generated captions. To bridge this gap, we integrate the BRIO model framework, enabling a more coherent alignment between the generation and evaluation processes. Our approach ranked first in the final test with a score of 4.49.

information, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2403.17342

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Add feedback

Path-GPTOmic: A Balanced Multi-modal Learning Framework for Survival Outcome Prediction

Wang, Hongxiao, Yang, Yang, Zhao, Zhuo, Gu, Pengfei, Sapkota, Nishchal, Chen, Danny Z.

arXiv.org Artificial IntelligenceMar-17-2024

For predicting cancer survival outcomes, standard approaches in clinical research are often based on two main modalities: pathology images for observing cell morphology features, and genomic (e.g., bulk RNA-seq) for quantifying gene expressions. However, existing pathology-genomic multi-modal algorithms face significant challenges: (1) Valuable biological insights regarding genes and gene-gene interactions are frequently overlooked; (2) one modality often dominates the optimization process, causing inadequate training for the other modality. In this paper, we introduce a new multi-modal ``Path-GPTOmic" framework for cancer survival outcome prediction. First, to extract valuable biological insights, we regulate the embedding space of a foundation model, scGPT, initially trained on single-cell RNA-seq data, making it adaptable for bulk RNA-seq data. Second, to address the imbalance-between-modalities problem, we propose a gradient modulation mechanism tailored to the Cox partial likelihood loss for survival prediction. The contributions of the modalities are dynamically monitored and adjusted during the training process, encouraging that both modalities are sufficiently trained. Evaluated on two TCGA(The Cancer Genome Atlas) datasets, our model achieves substantially improved survival prediction accuracy.

artificial intelligence, machine learning, modality, (20 more...)

arXiv.org Artificial Intelligence

2403.11375

Country: North America > United States (0.47)

Genre:

Research Report > New Finding (0.49)
Research Report > Experimental Study (0.35)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

SEMRes-DDPM: Residual Network Based Diffusion Modelling Applied to Imbalanced Data

Zheng, Ming, Yang, Yang, Zhao, Zhi-Hang, Gan, Shan-Chao, Chen, Yang, Ni, Si-Kai, Lu, Yang

arXiv.org Artificial IntelligenceMar-11-2024

In the field of data mining and machine learning, commonly used classification models cannot effectively learn in unbalanced data. In order to balance the data distribution before model training, oversampling methods are often used to generate data for a small number of classes to solve the problem of classifying unbalanced data. Most of the classical oversampling methods are based on the SMOTE technique, which only focuses on the local information of the data, and therefore the generated data may have the problem of not being realistic enough. In the current oversampling methods based on generative networks, the methods based on GANs can capture the true distribution of data, but there is the problem of pattern collapse and training instability in training; in the oversampling methods based on denoising diffusion probability models, the neural network of the inverse diffusion process using the U-Net is not applicable to tabular data, and although the MLP can be used to replace the U-Net, the problem exists due to the simplicity of the structure and the poor effect of removing noise. problem of poor noise removal. In order to overcome the above problems, we propose a novel oversampling method SEMRes-DDPM.In the SEMRes-DDPM backward diffusion process, a new neural network structure SEMST-ResNet is used, which is suitable for tabular data and has good noise removal effect, and it can generate tabular data with higher quality. Experiments show that the SEMResNet network removes noise better than MLP; SEMRes-DDPM generates data distributions that are closer to the real data distributions than TabDDPM with CWGAN-GP; on 20 real unbalanced tabular datasets with 9 classification models, SEMRes-DDPM improves the quality of the generated tabular data in terms of three evaluation metrics (F1, G-mean, AUC) with better classification performance than other SOTA oversampling methods.

artificial intelligence, classification model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2403.05918

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Graph-Skeleton: ~1% Nodes are Sufficient to Represent Billion-Scale Graph

Cao, Linfeng, Deng, Haoran, Yang, Yang, Wang, Chunping, Chen, Lei

arXiv.org Artificial IntelligenceMar-6-2024

Due to the ubiquity of graph data on the web, web graph mining has become a hot research spot. Nonetheless, the prevalence of large-scale web graphs in real applications poses significant challenges to storage, computational capacity and graph model design. Despite numerous studies to enhance the scalability of graph models, a noticeable gap remains between academic research and practical web graph mining applications. One major cause is that in most industrial scenarios, only a small part of nodes in a web graph are actually required to be analyzed, where we term these nodes as target nodes, while others as background nodes. In this paper, we argue that properly fetching and condensing the background nodes from massive web graph data might be a more economical shortcut to tackle the obstacles fundamentally. To this end, we make the first attempt to study the problem of massive background nodes compression for target nodes classification. Through extensive experiments, we reveal two critical roles played by the background nodes in target node classification: enhancing structural connectivity between target nodes, and feature correlation with target nodes. Followingthis, we propose a novel Graph-Skeleton1 model, which properly fetches the background nodes, and further condenses the semantic and topological information of background nodes within similar target-background local structures. Extensive experiments on various web graph datasets demonstrate the effectiveness and efficiency of the proposed method. In particular, for MAG240M dataset with 0.24 billion nodes, our generated skeleton graph achieves highly comparable performance while only containing 1.8% nodes of the original graph.

data mining, machine learning, node, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3589334.3645452

2402.09565

Country:

North America > United States (0.28)
Asia > Singapore (0.17)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications (0.93)
(3 more...)

Add feedback

Can GNN be Good Adapter for LLMs?

Huang, Xuanwen, Han, Kaiqiao, Yang, Yang, Bao, Dezheng, Tao, Quanjin, Chai, Ziwei, Zhu, Qi

arXiv.org Artificial IntelligenceFeb-20-2024

Recently, large language models (LLMs) have demonstrated superior capabilities in understanding and zero-shot learning on textual data, promising significant advances for many text-related domains. In the graph domain, various real-world scenarios also involve textual data, where tasks and node features can be described by text. These text-attributed graphs (TAGs) have broad applications in social media, recommendation systems, etc. Thus, this paper explores how to utilize LLMs to model TAGs. Previous methods for TAG modeling are based on million-scale LMs. When scaled up to billion-scale LLMs, they face huge challenges in computational costs. Additionally, they also ignore the zero-shot inference capabilities of LLMs. Therefore, we propose GraphAdapter, which uses a graph neural network (GNN) as an efficient adapter in collaboration with LLMs to tackle TAGs. In terms of efficiency, the GNN adapter introduces only a few trainable parameters and can be trained with low computation costs. The entire framework is trained using auto-regression on node text (next token prediction). Once trained, GraphAdapter can be seamlessly fine-tuned with task-specific prompts for various downstream tasks. Through extensive experiments across multiple real-world TAGs, GraphAdapter based on Llama 2 gains an average improvement of approximately 5\% in terms of node classification. Furthermore, GraphAdapter can also adapt to other language models, including RoBERTa, GPT-2. The promising results demonstrate that GNNs can serve as effective adapters for LLMs in TAG modeling.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2402.12984

Country:

North America > United States (0.28)
Asia > Singapore (0.17)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Services (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

Chu, Xiangxiang, Qiao, Limeng, Zhang, Xinyu, Xu, Shuang, Wei, Fei, Yang, Yang, Sun, Xiaofei, Hu, Yiming, Lin, Xinyang, Zhang, Bo, Shen, Chunhua

arXiv.org Artificial IntelligenceFeb-6-2024

We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM benchmarks compared with much larger VLMs at the 3B scale. Notably, our 3B model outperforms a large variety of VLMs at the 7B+ scale. Our models will be released at https://github.com/Meituan-AutoML/MobileVLM .

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2402.03766

Country: North America > United States (0.68)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

One Graph Model for Cross-domain Dynamic Link Prediction

Huang, Xuanwen, Chow, Wei, Wang, Yang, Chai, Ziwei, Wang, Chunping, Chen, Lei, Yang, Yang

arXiv.org Artificial IntelligenceFeb-3-2024

This work proposes DyExpert, a dynamic graph model for cross-domain link prediction. It can explicitly model historical evolving processes to learn the evolution pattern of a specific downstream graph and subsequently make pattern-specific link predictions. DyExpert adopts a decode-only transformer and is capable of efficiently parallel training and inference by \textit{conditioned link generation} that integrates both evolution modeling and link prediction. DyExpert is trained by extensive dynamic graphs across diverse domains, comprising 6M dynamic edges. Extensive experiments on eight untrained graphs demonstrate that DyExpert achieves state-of-the-art performance in cross-domain link prediction. Compared to the advanced baseline under the same setting, DyExpert achieves an average of 11.40% improvement Average Precision across eight graphs. More impressive, it surpasses the fully supervised performance of 8 advanced baselines on 6 untrained graphs.

data mining, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2402.02168

Country: North America > United States > California (0.14)

Genre: Research Report (0.82)

Industry: Education > Educational Setting (0.68)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
(2 more...)

Add feedback

Unveiling Latent Causal Rules: A Temporal Point Process Approach for Abnormal Event Explanation

Kuang, Yiling, Yang, Chao, Yang, Yang, Li, Shuang

arXiv.org Artificial IntelligenceFeb-3-2024

In high-stakes systems such as healthcare, it is critical to understand the causal reasons behind unusual events, such as sudden changes in patient's health. Unveiling the causal reasons helps with quick diagnoses and precise treatment planning. In this paper, we propose an automated method for uncovering "if-then" logic rules to explain observational events. We introduce temporal point processes to model the events of interest, and discover the set of latent rules to explain the occurrence of events. To achieve this, we employ an Expectation-Maximization (EM) algorithm. In the E-step, we calculate the likelihood of each event being explained by each discovered rule. In the M-step, we update both the rule set and model parameters to enhance the likelihood function's lower bound. Notably, we optimize the rule set in a differential manner. Our approach demonstrates accurate performance in both discovering rules and identifying root causes. We showcase its promising results using synthetic and real healthcare datasets.

artificial intelligence, expert system, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2402.05946

Country:

South America > Chile (0.14)
Europe > Germany (0.14)
Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Consumer Health (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.89)

Add feedback