AITopics | Zhong, Yihao

Collaborating Authors

Zhong, Yihao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Advanced Multimodal Deep Learning Architecture for Image-Text Matching

Wang, Jinyin, Zhang, Haijing, Zhong, Yihao, Liang, Yingbin, Ji, Rongwei, Cang, Yiru

arXiv.org Artificial IntelligenceJun-13-2024

Image-text matching is a key multimodal task that aims to model the semantic association between images and text as a matching relationship. With the advent of the multimedia information age, image, and text data show explosive growth, and how to accurately realize the efficient and accurate semantic correspondence between them has become the core issue of common concern in academia and industry. In this study, we delve into the limitations of current multimodal deep learning models in processing image-text pairing tasks. Therefore, we innovatively design an advanced multimodal deep learning architecture, which combines the high-level abstract representation ability of deep neural networks for visual information with the advantages of natural language processing models for text semantic understanding. By introducing a novel cross-modal attention mechanism and hierarchical feature fusion strategy, the model achieves deep fusion and two-way interaction between image and text feature space. In addition, we also optimize the training objectives and loss functions to ensure that the model can better map the potential association structure between images and text during the learning process. Experiments show that compared with existing image-text matching models, the optimized new model has significantly improved performance on a series of benchmark data sets. In addition, the new model also shows excellent generalization and robustness on large and diverse open scenario datasets and can maintain high matching performance even in the face of previously unseen complex situations.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.15306

Country:

North America > United States > New York (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Research on Image Recognition Technology Based on Multimodal Deep Learning

Wang, Jinyin, Li, Xingchen, Jin, Yixuan, Zhong, Yihao, Zhang, Keke, Zhou, Chang

arXiv.org Artificial IntelligenceMay-5-2024

This project investigates the human multi-modal behavior identification algorithm utilizing deep neural networks. According to the characteristics of different modal information, different deep neural networks are used to adapt to different modal video information. Through the integration of various deep neural networks, the algorithm successfully identifies behaviors across multiple modalities. In this project, multiple cameras developed by Microsoft Kinect were used to collect corresponding bone point data based on acquiring conventional images. In this way, the motion features in the image can be extracted. Ultimately, the behavioral characteristics discerned through both approaches are synthesized to facilitate the precise identification and categorization of behaviors. The performance of the suggested algorithm was evaluated using the MSR3D data set. The findings from these experiments indicate that the accuracy in recognizing behaviors remains consistently high, suggesting that the algorithm is reliable in various scenarios. Additionally, the tests demonstrate that the algorithm substantially enhances the accuracy of detecting pedestrian behaviors in video footage.

artificial intelligence, machine learning, neural network, (17 more...)

arXiv.org Artificial Intelligence

2405.03091

Country:

North America > United States > New York (0.14)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback