AITopics | Shen, Huawen

Collaborating Authors

Shen, Huawen

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LDP: Generalizing to Multilingual Visual Information Extraction by Language Decoupled Pretraining

Shen, Huawen, Li, Gengluo, Zhong, Jinwen, Zhou, Yu

arXiv.org Artificial IntelligenceDec-19-2024

Visual Information Extraction (VIE) plays a crucial role in the comprehension of semi-structured documents, and several pre-trained models have been developed to enhance performance. However, most of these works are monolingual (usually English). Due to the extremely unbalanced quantity and quality of pre-training corpora between English and other languages, few works can extend to non-English scenarios. In this paper, we conduct systematic experiments to show that vision and layout modality hold invariance among images with different languages. If decoupling language bias from document images, a vision-layout-based model can achieve impressive cross-lingual generalization. Accordingly, we present a simple but effective multilingual training paradigm LDP (Language Decoupled Pre-training) for better utilization of monolingual pre-training data. Our proposed model LDM (Language Decoupled Model) is first pre-trained on the language-independent data, where the language knowledge is decoupled by a diffusion model, and then the LDM is fine-tuned on the downstream languages. Extensive experiments show that the LDM outperformed all SOTA multilingual pre-trained models, and also maintains competitiveness on downstream monolingual/English benchmarks.

data mining, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.14596

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Data Science > Data Mining > Text Mining (0.61)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.61)

Add feedback

Falcon-UI: Understanding GUI Before Following User Instructions

Shen, Huawen, Liu, Chang, Li, Gengluo, Wang, Xinlong, Zhou, Yu, Ma, Can, Ji, Xiangyang

arXiv.org Artificial IntelligenceDec-12-2024

Pursuing human-like interaction for Graphical User Interface (GUI) agents requires understanding the GUI context and following user instructions. However, existing works typically couple these two aspects and focus more on instruct-following abilities, while ignoring the importance of understanding the GUI context. In this paper, we introduce an instruction-free GUI navigation dataset, termed Insight-UI Dataset, to enhance model comprehension of GUI environments. Insight-UI Dataset is automatically generated from the Common Crawl corpus, simulating various platforms -- including iOS, Android, Windows, and Linux -- across multiple resolutions on 312K domains. Although GUI interactions vary by context, diverse interfaces share common internal patterns, such as clicking an item to view its details. It implies the feasibility of independent GUI operation learning, followed by joint optimization with instruction tuning. Thereby, we develop the GUI agent model Falcon-UI, which is initially pretrained on Insight-UI Dataset and subsequently fine-tuned on Android and Web GUI datasets, including AITW, AITZ, Android Control, and Mind2Web. With 7 billion parameters, Falcon-UI achieves accuracy comparable to the 72 billion-parameter Qwen2VL on AITZ, validating the alignment between GUI context comprehension and agent performance. Our code and dataset will be open-sourced.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.09362

Genre:

Workflow (0.68)
Research Report (0.64)

Industry: Information Technology (0.93)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition

Wu, Daiqing, Yang, Dongbao, Shen, Huawen, Ma, Can, Zhou, Yu

arXiv.org Artificial IntelligenceJul-9-2024

With the proliferation of social media posts in recent years, the need to detect sentiments in multimodal (image-text) content has grown rapidly. Since posts are user-generated, the image and text from the same post can express different or even contradictory sentiments, leading to potential \textbf{sentiment discrepancy}. However, existing works mainly adopt a single-branch fusion structure that primarily captures the consistent sentiment between image and text. The ignorance or implicit modeling of discrepant sentiment results in compromised unimodal encoding and limited performances. In this paper, we propose a semantics Completion and Decomposition (CoDe) network to resolve the above issue. In the semantics completion module, we complement image and text representations with the semantics of the OCR text embedded in the image, helping bridge the sentiment gap. In the semantics decomposition module, we decompose image and text representations with exclusive projection and contrastive learning, thereby explicitly capturing the discrepant sentiment between modalities. Finally, we fuse image and text representations by cross-attention and combine them with the learned discrepant sentiment for final classification. Extensive experiments conducted on four multimodal sentiment datasets demonstrate the superiority of CoDe against SOTA methods.

machine learning, natural language, sentiment, (16 more...)

arXiv.org Artificial Intelligence

2407.07026

Country:

Europe (1.00)
North America > Canada (0.68)
North America > United States > Hawaii (0.14)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(3 more...)

Add feedback