Yang Yang
The Solution for Temporal Action Localisation Task of Perception Test Challenge 2024
Han, Yinan, Jiang, Qingyuan, Mei, Hongming, Yang, Yang, Tang, Jinhui
This report presents our method for Temporal Action Localisation (TAL), which focuses on identifying and classifying actions within specific time intervals throughout a video sequence. We employ a data augmentation technique by expanding the training dataset using overlapping labels from the Something-SomethingV2 dataset, enhancing the model's ability to generalize across various action classes. For feature extraction, we utilize state-of-the-art models, including UMT and VideoMAEv2 for video features, and BEATs and CAV-MAE for audio features. Our approach involves

Each action is represented by start and end timestamps along with its corresponding class label, as illustrated in Figure 1. This task is critical for various applications, including video surveillance, content analysis, and human-computer interaction. The dataset provided for this challenge is derived from the Perception Test, comprising high-resolution videos (up to 35 seconds long, 30 fps, and a maximum resolution of 1080p). Each video contains multiple action segment annotations. To facilitate experimentation, both video and audio features are provided, along with detailed annotations for the training and validation phases.
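As a rough illustration of how features from several backbones might be combined per video, the sketch below concatenates temporally aligned snippet features along the channel axis. The feature dimensions are illustrative stand-ins, not the challenge's actual shapes:

```python
import numpy as np

# Hypothetical sketch: per-snippet features from several backbones are
# aligned on the temporal axis and concatenated along the channel axis.
# All dimensions are illustrative, not the models' actual output sizes.
T = 8                                   # temporal snippets in one clip
umt      = np.random.rand(T, 1024)      # UMT video features (assumed dim)
videomae = np.random.rand(T, 768)       # VideoMAEv2 video features (assumed dim)
beats    = np.random.rand(T, 768)       # BEATs audio features (assumed dim)
cavmae   = np.random.rand(T, 512)       # CAV-MAE audio features (assumed dim)

fused = np.concatenate([umt, videomae, beats, cavmae], axis=1)
print(fused.shape)  # (8, 3072)
```

The concatenated features would then feed a localisation head; the report itself does not specify the fusion scheme, so this is only one plausible arrangement.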
Solution for Temporal Sound Localisation Task of ECCV Second Perception Test Challenge 2024
Gu, Haowei, Zhu, Weihao, Yang, Yang
This report proposes an improved method for the Temporal Sound Localisation (TSL) task, which localizes and classifies the sound events occurring in a video according to a predefined set of sound classes. The champion solution from last year's first competition explored TSL by fusing audio and video modalities with equal weight. Since the TSL task aims to localize sound events, we conducted experiments that demonstrate the superiority of sound features (Section 3). Based on these findings, we employ various models to extract stronger audio features, such as InterVideo, CaVMAE, and VideoMAE. Our approach ranks first in the final test with a score of 0.4925.
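The finding that sound features outweigh video features suggests moving from a fixed 50/50 fusion to a weight chosen on validation data. A minimal sketch, assuming synthetic predictions and a toy validation metric in place of the challenge's actual score:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sketch: sweep the audio weight on validation data and keep
# the best, rather than fixing a 50/50 fusion. `score` is a stand-in for
# the challenge metric; the toy ground truth is deliberately audio-heavy.
audio_pred = rng.random(100)
video_pred = rng.random(100)
target     = 0.7 * audio_pred + 0.3 * video_pred   # toy ground truth

def score(pred, target):
    return -np.mean((pred - target) ** 2)          # higher is better

best_w, best_s = None, -np.inf
for w in np.linspace(0, 1, 21):                    # candidate audio weights
    s = score(w * audio_pred + (1 - w) * video_pred, target)
    if s > best_s:
        best_w, best_s = w, s

print(best_w)  # the audio-leaning weight wins on this toy setup
```

On this synthetic data the sweep recovers the audio-heavy mixture; on real validation data the chosen weight would depend on the actual metric.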
The Solution for the 5th GCAIAC Zero-shot Referring Expression Comprehension Challenge
Huang, Longfei, Yu, Feng, Guan, Zhihao, Wan, Zhonghua, Yang, Yang
This report presents a solution for the zero-shot referring expression comprehension task. Visual-language multimodal base models (such as CLIP and SAM) have gained significant attention in recent years as a cornerstone of mainstream research. One of the key applications of multimodal base models lies in their ability to generalize to zero-shot downstream tasks. Unlike traditional referring expression comprehension, zero-shot referring expression comprehension aims to apply pre-trained visual-language models directly to the task without task-specific training. Recent studies have enhanced the zero-shot performance of multimodal base models in referring expression comprehension by introducing visual prompts. To address this challenge, we combined visual prompts with carefully considered textual prompts and employed joint prediction tailored to the data characteristics. Ultimately, our approach achieved accuracy rates of 84.825 on the A leaderboard and 71.460 on the B leaderboard, securing first place.
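One way to picture the joint prediction over visual and textual prompts is to score each candidate box with both a visually prompted image embedding and a rephrased textual prompt, then take the argmax of the combined score. The sketch below uses random vectors as stand-ins for CLIP embeddings; the weighting and prompt design are assumptions, not the report's exact recipe:

```python
import numpy as np

rng = np.random.default_rng(1)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical sketch of joint prediction: each candidate box gets one score
# against the raw expression embedding and one against a rephrased textual
# prompt embedding; the box maximizing the combined score is selected.
# Random vectors stand in for CLIP image/text embeddings.
text_emb        = rng.standard_normal(512)   # embedding of the expression
text_prompt_emb = rng.standard_normal(512)   # embedding of a rephrased prompt
boxes = [rng.standard_normal(512) for _ in range(5)]  # visually prompted crops

scores = [0.5 * cosine(b, text_emb) + 0.5 * cosine(b, text_prompt_emb)
          for b in boxes]
best_box = int(np.argmax(scores))
print(best_box)
```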
The Solution for the sequential task continual learning track of the 2nd Greater Bay Area International Algorithm Competition
Pan, Sishun, Wu, Xixian, Li, Tingmin, Huang, Longfei, Feng, Mingxu, Wan, Zhonghua, Yang, Yang
This paper presents a data-free, parameter-isolation-based continual learning algorithm we developed for the sequential task continual learning track of the 2nd Greater Bay Area International Algorithm Competition. The method learns an independent parameter subspace for each task within the network's convolutional and linear layers and freezes the batch normalization layers after the first task. Specifically, for the domain incremental setting where all domains share a classification head, we freeze the shared classification head after the first task is completed, effectively solving the issue of catastrophic forgetting. Additionally, because the domain incremental setting provides no task identity at inference, we designed a task-identity inference strategy that selects an appropriate mask matrix for each sample. Furthermore, we introduced a gradient supplementation strategy to enhance the importance of unselected parameters for the current task, facilitating learning for new tasks. We also implemented an adaptive importance scoring strategy that dynamically adjusts the number of parameters to optimize single-task performance while reducing parameter usage. Moreover, considering the limitations of storage space and inference time, we designed a mask matrix compression strategy to save storage space and speed up encryption and decryption of the mask matrix. Our approach does not require expanding the core network or using external auxiliary networks or data, and performs well under both task incremental and domain incremental settings. This solution ultimately won a second-place prize in the competition.
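The mask-compression idea can be illustrated with bit-packing: a per-task binary mask over parameters stored one entry per byte shrinks 8x when packed to one entry per bit. This is only a minimal sketch of the storage-saving principle, not the paper's actual compression scheme:

```python
import numpy as np

# Hypothetical sketch of mask compression: a per-task binary mask over the
# network's parameters is bit-packed, cutting storage 8x versus one byte
# per entry. The mask size and sparsity are illustrative.
mask = (np.random.rand(1_000_000) > 0.9).astype(np.uint8)  # ~10% selected

packed   = np.packbits(mask)                  # 8 mask entries per byte
restored = np.unpackbits(packed)[:mask.size]  # decode before inference

assert np.array_equal(mask, restored)         # lossless round trip
print(mask.nbytes, packed.nbytes)             # 1000000 125000
```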
The Solution for the GAIIC2024 RGB-TIR object detection Challenge
Wu, Xiangyu, Xu, Jinling, Huang, Longfei, Yang, Yang
This report introduces a solution to the task of RGB-TIR object detection from the perspective of unmanned aerial vehicles. Unlike traditional object detection methods, RGB-TIR object detection aims to utilize both RGB and TIR images for complementary information during detection. The challenges of RGB-TIR object detection from the perspective of unmanned aerial vehicles include highly complex image backgrounds, frequent changes in lighting, and uncalibrated RGB-TIR image pairs. To address these challenges at the model level, we utilized a lightweight YOLOv9 model with extended multi-level auxiliary branches that enhance the model's robustness, making it more suitable for practical applications in unmanned aerial vehicle scenarios. For image fusion in RGB-TIR detection, we incorporated a fusion module into the backbone network to fuse images at the feature level, implicitly addressing calibration issues. Our proposed method achieved mAP scores of 0.516 and 0.543 on the A and B benchmarks respectively, while maintaining the highest inference speed among all models.
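Feature-level fusion of the two modalities might look like the sketch below: RGB and TIR feature maps of matching spatial size are each passed through a 1x1 projection and summed, so misalignment between the raw image pair is absorbed in feature space rather than requiring explicit calibration. Shapes and weights are illustrative stand-ins, not the actual fusion module:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sketch of feature-level fusion: project each modality with a
# 1x1 convolution (a per-pixel matrix multiply) and add the results.
C, H, W = 64, 16, 16
rgb_feat = rng.standard_normal((C, H, W))
tir_feat = rng.standard_normal((C, H, W))
w_rgb = rng.standard_normal((C, C)) / np.sqrt(C)   # 1x1 conv weights (RGB)
w_tir = rng.standard_normal((C, C)) / np.sqrt(C)   # 1x1 conv weights (TIR)

fused = np.einsum('oc,chw->ohw', w_rgb, rgb_feat) + \
        np.einsum('oc,chw->ohw', w_tir, tir_feat)
print(fused.shape)  # (64, 16, 16)
```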
First Place Solution of 2023 Global Artificial Intelligence Technology Innovation Competition Track 1
Wu, Xiangyu, Zhang, Hailiang, Yang, Yang, Lu, Jianfeng
In this paper, we present our champion solution to the Global Artificial Intelligence Technology Innovation Competition Track 1: Medical Imaging Diagnosis Report Generation. We select CPT-BASE as our base model for the text generation task. During the pre-training stage, we discard the masked language modeling task of CPT-BASE and instead reconstruct the vocabulary, adopting a span mask strategy and gradually increasing the masking ratio to perform the denoising auto-encoder pre-training task. In the fine-tuning stage, we design iterative retrieval augmentation and noise-aware similarity bucket prompt strategies. The retrieval augmentation constructs a mini-knowledge base, enriching the input information of the model, while the similarity bucket further perceives the noise information within the mini-knowledge base, guiding the model to generate higher-quality diagnostic reports based on the similarity prompts. Surprisingly, our single model has achieved a score of 2.321 on leaderboard A, and the multi-model fusion scores are 2.362 and 2.320 on the A and B leaderboards respectively, securing first place in the rankings.
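The span mask strategy can be sketched as follows: instead of masking isolated tokens, contiguous spans are replaced with a mask symbol until a target ratio is reached, and that ratio can be raised across pre-training stages. Tokens, span length, and ratio below are illustrative assumptions:

```python
import random

random.seed(0)

# Hypothetical sketch of span masking for denoising pre-training: mask
# contiguous spans of tokens until the target ratio is reached.
def span_mask(tokens, mask_ratio=0.3, span_len=3):
    n_mask = int(len(tokens) * mask_ratio)
    out = list(tokens)
    masked = 0
    while masked < n_mask:
        start = random.randrange(len(out))          # random span start
        for i in range(start, min(start + span_len, len(out))):
            if out[i] != '[MASK]':
                out[i] = '[MASK]'
                masked += 1
            if masked >= n_mask:
                break
    return out

toks = [f'tok{i}' for i in range(20)]
print(span_mask(toks))  # 6 of 20 tokens masked, mostly in contiguous runs
```

Raising `mask_ratio` over successive stages would reproduce the gradual-increase schedule described above.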
The Solution for The PST-KDD-2024 OAG-Challenge
Zhong, Shupeng, Li, Xinger, Jin, Shushan, Yang, Yang
In this paper, we introduce the second-place solution in the KDD-2024 OAG-Challenge paper source tracing track. Our solution is mainly based on two methods, BERT and GCN, and combines the reasoning results of BERT and GCN in the final submission to achieve complementary performance. In the BERT solution, we focus on processing the fragments that appear in the references of the paper, and use a variety of operations to reduce the redundant interference in the fragments, so that the information received by BERT is more refined. In the GCN solution, we map information such as paper fragments, abstracts, and titles to a high-dimensional semantic space through an embedding model, and try to build edges between titles, abstracts, and fragments to integrate contextual relationships for judgment. In the end, our solution achieved a remarkable score of 0.47691 in the competition.
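The graph construction for the GCN branch, where edges connect titles, abstracts, and fragments, might be sketched as embedding each text unit and linking pairs whose similarity crosses a threshold. The embeddings below are random stand-ins and the threshold is an assumption:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sketch of graph construction: embed titles, abstracts and
# reference fragments, then add an edge wherever cosine similarity exceeds
# a threshold. Random vectors stand in for real embedding-model outputs.
nodes = {f'frag{i}': rng.standard_normal(64) for i in range(4)}
nodes['title']    = rng.standard_normal(64)
nodes['abstract'] = rng.standard_normal(64)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

names = list(nodes)
edges = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
         if cos(nodes[a], nodes[b]) > 0.1]          # assumed threshold
print(len(edges))
```

A GCN over such a graph could then aggregate contextual signals across fragments; the final submission would combine its scores with the BERT branch.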
Complementary Fusion of Deep Network and Tree Model for ETA Prediction
Huang, YuRui, Zhang, Jie, Bao, HengDa, Yang, Yang, Yang, Jian
Estimated time of arrival (ETA) is a very important factor in the transportation system. It has attracted increasing attention and has been widely used as a basic service in navigation systems and intelligent transportation systems. In this paper, we propose a novel solution to the ETA estimation problem: an ensemble of tree models and neural networks. We demonstrated the accuracy and robustness of the solution on the A/B leaderboards and ultimately won first place in the SIGSPATIAL 2021 GISCUP competition.
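One common way to ensemble a tree model with a neural network, sketched here on synthetic data, is to fit blend weights on validation predictions by least squares and apply them at test time. All numbers are synthetic stand-ins, and the blend rule is an assumption rather than the paper's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical sketch of the ensemble idea: given validation predictions
# from a tree model and a neural network, fit blend weights by least
# squares. The synthetic "predictions" are ground truth plus noise.
y_true   = rng.uniform(300, 1800, size=200)          # travel times (s)
tree_val = y_true + rng.normal(0, 60, size=200)      # GBDT predictions
nn_val   = y_true + rng.normal(0, 90, size=200)      # NN predictions

X = np.stack([tree_val, nn_val], axis=1)
w, *_ = np.linalg.lstsq(X, y_true, rcond=None)       # blend weights

blend = X @ w
print(round(float(np.mean(np.abs(blend - y_true))), 1),
      round(float(np.mean(np.abs(nn_val - y_true))), 1))
```

On this toy data the fitted blend leans toward the lower-noise tree model, which is the intuition behind combining complementary predictors.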
The Solution for the ICCV 2023 Perception Test Challenge -- Task 6 -- Grounded videoQA
Zhang, Hailiang, Chao, Dian, Guan, Zhihao, Yang, Yang
In this paper, we introduce a grounded video question-answering solution. Our research reveals that the fixed official baseline method for video question answering involves two main steps: visual grounding and object tracking. However, a significant challenge emerges during the initial step, where selected frames may lack clearly identifiable target objects. Furthermore, single images cannot address questions like "Track the container from which the person pours the first time." To tackle this issue, we propose an alternative two-stage approach: (1) first, we leverage the VALOR model to answer questions based on video information; (2) then we concatenate the answered questions with their respective answers. Finally, we employ TubeDETR to generate bounding boxes for the targets.
Proposal Report for the 2nd SciCAP Competition 2024
Li, Pengpeng, Li, Tingmin, Wang, Jingyuan, Wang, Boyuan, Yang, Yang
In this paper, we propose a method for document summarization using auxiliary information. This approach effectively summarizes descriptions related to specific images, tables, and appendices within lengthy texts. Our experiments demonstrate that leveraging high-quality OCR data and information initially extracted from the original text enables efficient summarization of the content related to the described objects. Based on these findings, we enhanced popular text generation models by incorporating additional auxiliary branches to improve summarization performance. Our method achieved top scores of 4.33 and 4.66 in the long caption and short caption tracks, respectively, of the 2024 SciCAP competition, ranking highest in both categories.