AITopics | Wang, Haowen

Wang, Haowen

Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation

Wang, Xiao, Wang, Fuling, Wang, Haowen, Jiang, Bo, Li, Chuanfu, Wang, Yaowei, Tian, Yonghong, Tang, Jin

arXiv.org Artificial Intelligence, Jan-6-2025

Abstract: X-ray image-based medical report generation has achieved significant progress in recent years with the help of large language models. However, these models have not fully exploited the effective information in visual image regions, resulting in reports that are linguistically sound but insufficient in describing key diseases. In this paper, we propose a novel associative memory-enhanced X-ray report generation model that effectively mimics the process of professional doctors writing medical reports. It considers both the mining of global and local visual information and associates historical report information to better complete the writing of the current report.

Some researchers already exploit the effectiveness of LLMs in X-ray based medical report generation, such as R2Gen-GPT [1], R2Gen-I … high-quality text at the linguistic level, but they struggle to accurately identify abnormal conditions, diseases, and other critical information in clinical diagnostic indicators. … result, although the obtained medical reports may appear to be well-structured, they are actually difficult to address the practical problems. … shown in Figure 1, our framework contains two stages, i.e., the disease-aware visual token mining and the associative memory augmented X-ray medical report generation. … the first stage, we extract the vision features of a given X-ray image using the Swin Transformer network [4].

This task can greatly alleviate the work pressure on doctors and reduce the waiting time for patients, providing a feasible method for empowering artificial intelligence in … Although the task has made considerable progress in recent years, there are still many issues, such as the difficulty in detecting key diseases and the challenge … In MRG, models typically need to process two primary sources of information: visual information from medical images and linguistic information from existing medical … R2Gen [9] introduces a memory-driven Transformer for radiology report generation, using relational …

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2501.03458

Country: Asia > China > Anhui Province (0.28)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.46)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.93)
Health & Medicine > Nuclear Medicine (0.91)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers

Zhou, Hang, Ma, Yuezhou, Wu, Haixu, Wang, Haowen, Long, Mingsheng

arXiv.org Artificial Intelligence, Jun-1-2024

Deep models have recently emerged as a promising tool to solve partial differential equations (PDEs), known as neural PDE solvers. While neural solvers trained from either simulation data or physics-informed loss can solve the PDEs reasonably well, they are mainly restricted to a specific set of PDEs, e.g. a certain equation or a finite set of coefficients. This bottleneck limits the generalizability of neural solvers, which is widely recognized as their major advantage over numerical solvers. In this paper, we present the Universal PDE solver (Unisolver), capable of solving a wide scope of PDEs by leveraging a Transformer pre-trained on diverse data and conditioned on diverse PDEs. Instead of simply scaling up data and parameters, Unisolver stems from the theoretical analysis of the PDE-solving process. Our key finding is that a PDE solution is fundamentally under the control of a series of PDE components, e.g. equation symbols, coefficients, and initial and boundary conditions. Inspired by the mathematical structure of PDEs, we define a complete set of PDE components and correspondingly embed them as domain-wise (e.g. equation symbols) and point-wise (e.g. boundaries) conditions for Transformer PDE solvers. Integrating physical insights with recent Transformer advances, Unisolver achieves consistent state-of-the-art results on three challenging large-scale benchmarks, showing impressive gains and endowing favorable generalizability and scalability.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2405.17527

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning

Jin, Congyun, Zhang, Ming, Ma, Xiaowei, Li, Yujiao, Wang, Yingbo, Jia, Yabo, Du, Yuliang, Sun, Tao, Wang, Haowen, Fan, Cong, Gu, Jinjie, Chi, Chenfei, Lv, Xiangguo, Li, Fangzhou, Xue, Wei, Huang, Yiran

arXiv.org Artificial Intelligence, Feb-19-2024

Recent advancements in Large Language Models (LLMs) and Large Multi-modal Models (LMMs) have shown potential in various medical applications, such as Intelligent Medical Diagnosis. Although impressive results have been achieved, we find that existing benchmarks do not reflect the complexity of real medical reports and specialized in-depth reasoning capabilities. In this work, we introduce RJUA-MedDQA, a comprehensive benchmark in the field of medical specialization, which poses several challenges: comprehensively interpreting image content across diverse challenging layouts, possessing numerical reasoning ability to identify abnormal indicators, and demonstrating clinical reasoning ability to provide statements of disease diagnosis, status and advice based on medical contexts. We carefully design the data generation pipeline and propose the Efficient Structural Restoration Annotation (ESRA) Method, aimed at restoring textual and tabular content in medical report images. This method substantially enhances annotation efficiency, doubling the productivity of each annotator, and yields a 26.8% improvement in accuracy. We conduct extensive evaluations, including few-shot assessments of 5 LMMs which are capable of solving Chinese medical QA tasks. To further investigate the limitations and potential of current LMMs, we conduct comparative experiments on a set of strong LLMs by using image-text generated by the ESRA method. We report the performance of baselines and offer several observations: (1) The overall performance of existing LMMs is still limited; (2) however, LMMs are more robust to low-quality and diverse-structured images compared to LLMs. (3) Reasoning across context and image content presents significant challenges. We hope this benchmark helps the community make progress on these challenging tasks in multi-modal medical document understanding and facilitates its application in healthcare.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.1484

Country:

Asia > China (0.15)
Europe > Spain (0.14)
Europe > Romania (0.14)
Europe > Greece (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Transolver: A Fast Transformer Solver for PDEs on General Geometries

Wu, Haixu, Luo, Huakun, Wang, Haowen, Wang, Jianmin, Long, Mingsheng

arXiv.org Artificial Intelligence, Feb-4-2024

Transformers have empowered many milestones across various fields and have recently been applied to solve partial differential equations (PDEs). However, since PDEs are typically discretized into large-scale meshes with complex geometries, it is challenging for Transformers to capture intricate physical correlations directly from massive individual points. Going beyond superficial and unwieldy meshes, we present Transolver based on a more foundational idea, which is learning intrinsic physical states hidden behind discretized geometries. Specifically, we propose a new Physics-Attention to adaptively split the discretized domain into a series of learnable slices of flexible shapes, where mesh points under similar physical states will be ascribed to the same slice. By calculating attention to physics-aware tokens encoded from slices, Transolver can effectively capture intricate physical correlations under complex geometries, which also empowers the solver with endogenetic geometry-general modeling capacity and can be efficiently computed in linear complexity. Transolver achieves consistent state-of-the-art with 22% relative gain across six standard benchmarks and also excels in large-scale industrial simulations, including car and airfoil designs.
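
The slice-then-attend idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `slice_attention`, the random projection `proj` (standing in for a learned one), the single attention head, and the absence of learned query/key/value maps are all assumptions made for illustration. The point is only that attention runs among M slice tokens rather than N mesh points, so cost grows linearly with N.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def slice_attention(points, proj):
    """points: N x C point features; proj: C x M hypothetical slice projection."""
    n, c = len(points), len(points[0])
    m = len(proj[0])
    # 1. Soft-assign every mesh point to the M slices (each row sums to 1).
    logits = [[sum(p[k] * proj[k][j] for k in range(c)) for j in range(m)]
              for p in points]
    w = [softmax(row) for row in logits]
    # 2. Aggregate points into M slice tokens (weighted means of the features).
    tokens = []
    for j in range(m):
        mass = sum(w[i][j] for i in range(n))
        tokens.append([sum(w[i][j] * points[i][k] for i in range(n)) / mass
                       for k in range(c)])
    # 3. Dot-product self-attention among the M tokens only: O(M^2 C).
    scale = math.sqrt(c)
    attended = []
    for q in tokens:
        scores = softmax([sum(a * b for a, b in zip(q, t)) / scale
                          for t in tokens])
        attended.append([sum(scores[j] * tokens[j][k] for j in range(m))
                         for k in range(c)])
    # 4. Broadcast the updated tokens back to the points with the same weights.
    out = [[sum(w[i][j] * attended[j][k] for j in range(m)) for k in range(c)]
           for i in range(n)]
    return out, w


random.seed(0)
points = [[random.gauss(0, 1) for _ in range(4)] for _ in range(50)]  # N=50, C=4
proj = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]     # M=3 slices
out, w = slice_attention(points, proj)
```

Steps 1, 2 and 4 each touch every point once per slice (O(N·M·C)), while step 3 is independent of N, which is where the linear scaling in the number of mesh points comes from in this sketch.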

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2402.02366

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

OrchMoE: Efficient Multi-Adapter Learning with Task-Skill Synergy

Wang, Haowen, Sun, Tao, Ji, Kaixiang, Wang, Jian, Fan, Cong, Gu, Jinjie

arXiv.org Artificial Intelligence, Jan-19-2024

We advance the field of Parameter-Efficient Fine-Tuning (PEFT) with our novel multi-adapter method, OrchMoE, which capitalizes on modular skill architecture for enhanced forward transfer in neural networks. Unlike prior models that depend on explicit task identification inputs, OrchMoE automatically discerns task categories, streamlining the learning process. This is achieved through an integrated mechanism comprising an Automatic Task Classification module and a Task-Skill Allocation module, which collectively deduce task-specific classifications and tailor skill allocation matrices. Our extensive evaluations on the 'Super Natural Instructions' dataset, featuring 1,600 diverse instructional tasks, indicate that OrchMoE substantially outperforms comparable multi-adapter baselines in terms of both performance and sample utilization efficiency, all while operating within the same parameter constraints. These findings suggest that OrchMoE offers a significant leap forward in multi-task learning efficiency.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2401.10559

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

SM$^3$: Self-Supervised Multi-task Modeling with Multi-view 2D Images for Articulated Objects

Wang, Haowen, Zhao, Zhen, Jin, Zhao, Che, Zhengping, Qiao, Liang, Huang, Yakun, Fan, Zhipeng, Qiao, Xiuquan, Tang, Jian

arXiv.org Artificial Intelligence, Jan-17-2024

Reconstructing real-world objects and estimating their movable joint structures are pivotal technologies within the field of robotics. Previous research has predominantly focused on supervised approaches, relying on extensively annotated datasets to model articulated objects within limited categories. However, this approach falls short of effectively addressing the diversity present in the real world. To tackle this issue, we propose a self-supervised interaction perception method, referred to as SM$^3$, which leverages multi-view RGB images captured before and after interaction to model articulated objects, identify the movable parts, and infer the parameters of their rotating joints. By constructing 3D geometries and textures from the captured 2D images, SM$^3$ achieves integrated optimization of movable part and joint parameters during the reconstruction process, obviating the need for annotations. Furthermore, we introduce the MMArt dataset, an extension of PartNet-Mobility, encompassing multi-view and multi-modal data of articulated objects spanning diverse categories. Evaluations demonstrate that SM$^3$ surpasses existing benchmarks across various categories and objects, while its adaptability in real-world scenarios has been thoroughly validated.

artificial intelligence, machine learning, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2401.09133

Country:

Asia > China (0.14)
Europe > Netherlands (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

GACE: Learning Graph-Based Cross-Page Ads Embedding For Click-Through Rate Prediction

Wang, Haowen, Du, Yuliang, Jin, Congyun, Li, Yujiao, Wang, Yingbo, Sun, Tao, Qin, Piqi, Fan, Cong

arXiv.org Artificial Intelligence, Jan-14-2024

Predicting click-through rate (CTR) is the core task of many online ad recommendation systems, which helps improve user experience and increase platform revenue. In this type of recommendation system, we often encounter two main problems: the joint usage of multi-page historical advertising data and the cold start of new ads. In this paper, we propose GACE, a graph-based cross-page ads embedding generation method. It can warm up and generate the representation embedding of cold-start and existing ads across various pages. Specifically, we carefully build linkages and a weighted undirected graph model considering semantic and page-type attributes to guide the direction of feature fusion and generation. We design a variational auto-encoding task as a pre-training module and generate embedding representations for new and old ads based on this task. The results evaluated on the public dataset AliEC from RecBole and the real-world industry dataset from Alipay show that our GACE method is significantly superior to the SOTA method. In the online A/B test, the click-through rate on three real-world pages from Alipay increased by 3.6%, 2.13%, and 3.02%, respectively. In the cold-start task in particular, the CTR increased by 9.96%, 7.51%, and 8.97%, respectively.

information, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-981-99-8184-7_33

2401.07445

Country: North America > United States (0.14)

Genre:

Research Report (1.00)
Instructional Material (0.69)

Industry:

Marketing (0.47)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

From Beginner to Expert: Modeling Medical Knowledge into General LLMs

Li, Qiang, Yang, Xiaoyan, Wang, Haowen, Wang, Qin, Liu, Lei, Wang, Junjie, Zhang, Yang, Chu, Mingyuan, Hu, Sen, Chen, Yicheng, Shen, Yue, Fan, Cong, Zhang, Wangshu, Xu, Teng, Gu, Jinjie, Zheng, Jing, Zhang, Guannan (Ant Group)

arXiv.org Artificial Intelligence, Jan-7-2024

Recently, large language model (LLM) based artificial intelligence (AI) systems have demonstrated remarkable capabilities in natural language understanding and generation. However, these models face a significant challenge when it comes to sensitive applications, such as reasoning over medical knowledge and answering medical questions in a physician-like manner. Prior studies attempted to overcome this challenge by increasing the model size (>100B) to learn more general medical knowledge, while there is still room for improvement in LLMs with smaller-scale model sizes (<100B). In this work, we start from a pre-trained general LLM (AntGLM-10B) and fine-tune it from a medical beginner towards a medical expert (called AntGLM-Med-10B), which leverages a 3-stage optimization procedure, i.e., general medical knowledge injection, medical domain instruction tuning, and specific medical task adaptation. Our contributions are threefold: (1) We specifically investigate how to adapt a pre-trained general LLM to the medical domain, especially for a specific medical task. (2) We collect and construct large-scale medical datasets for each stage of the optimization process. These datasets encompass various data types and tasks, such as question-answering, medical reasoning, multi-choice questions, and medical conversations. (3) Specifically for multi-choice questions in the medical domain, we propose a novel Verification-of-Choice approach for prompt engineering, which significantly enhances the reasoning ability of LLMs. Remarkably, by combining the above approaches, our AntGLM-Med-10B model can outperform most LLMs on PubMedQA, including both general and medical LLMs, even when these LLMs have a larger model size.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2312.0104

Country: Asia > China (0.28)

Genre:

Workflow (0.88)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Exploring Popularity Bias in Session-based Recommendation

Wang, Haowen

arXiv.org Artificial Intelligence, Dec-12-2023

Existing work has revealed that large-scale offline evaluation of recommender systems for user-item interactions is prone to bias caused by the deployed system itself, as a form of closed loop feedback. Many adopt the propensity concept to analyze or mitigate this empirical issue. In this work, we extend the analysis to the session-based setup and adapt the propensity calculation to the unique characteristics of session-based recommendation tasks. Our experiments incorporate neural models and KNN-based models, and cover both the music and the e-commerce domains. We study the distributions of propensity and different stratification techniques on different datasets and find that propensity-related traits are actually dataset-specific. We then leverage the effect of stratification and achieve promising results compared to the original models.
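
To make the propensity concept concrete, here is a small sketch of propensity-weighted offline evaluation. The abstract does not state how propensity is computed, so everything specific here is an assumption: the power-law popularity estimate with exponent `gamma`, the helper names `item_propensities` and `snips_hit_rate`, and the toy sessions are all hypothetical and only illustrate the general technique.

```python
from collections import Counter


def item_propensities(sessions, gamma=0.5):
    """Assumed estimate: propensity grows as a power of relative item popularity."""
    counts = Counter(item for session in sessions for item in session)
    top = max(counts.values())
    return {item: (n / top) ** gamma for item, n in counts.items()}


def snips_hit_rate(targets, hits, propensity):
    """Self-normalised inverse-propensity-weighted hit rate.

    targets: held-out next item per test case; hits: 1 if the model ranked it
    in the top-k, else 0. Rarely exposed items receive larger weights.
    """
    weights = [1.0 / propensity[t] for t in targets]
    return sum(w * h for w, h in zip(weights, hits)) / sum(weights)


# Toy data: item "a" dominates the logged sessions.
sessions = [["a", "b"], ["a", "c"], ["a", "b"], ["a", "d"]]
prop = item_propensities(sessions)

targets = ["a", "b", "c", "d"]
hits = [1, 1, 0, 0]  # the hypothetical model only gets the popular items right

plain = sum(hits) / len(hits)
weighted = snips_hit_rate(targets, hits, prop)
```

In this toy case the weighted score is lower than the plain hit rate, because the model's successes are concentrated on the items that the logging system exposed most often.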

artificial intelligence, machine learning, propensity, (18 more...)

arXiv.org Artificial Intelligence

2312.07855

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Information Technology (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Hypergraph-Guided Disentangled Spectrum Transformer Networks for Near-Infrared Facial Expression Recognition

Luo, Bingjun, Wang, Haowen, Wang, Jinpeng, Zhu, Junjie, Zhao, Xibin, Gao, Yue

arXiv.org Artificial Intelligence, Dec-10-2023

With its strong robustness to illumination variations, near-infrared (NIR) imaging can be an effective and essential complement to visible (VIS) facial expression recognition in low lighting or complete darkness conditions. However, facial expression recognition (FER) from NIR images presents a more challenging problem than traditional FER due to the limitations imposed by the data scale and the difficulty of extracting discriminative features from incomplete visible lighting contents. In this paper, we make the first attempt at deep NIR facial expression recognition and propose a novel method called near-infrared facial expression transformer (NFER-Former). Specifically, to make full use of the abundant label information in the field of VIS, we introduce a Self-Attention Orthogonal Decomposition mechanism that disentangles the expression information and spectrum information from the input image, so that the expression features can be extracted without the interference of spectrum variation. We also propose a Hypergraph-Guided Feature Embedding method that models some key facial behaviors and learns the structure of the complex correlations between them, thereby alleviating the interference of inter-class similarity. Additionally, we have constructed a large NIR-VIS Facial Expression dataset that includes 360 subjects to better validate the efficiency of NFER-Former. Extensive experiments and ablation studies show that NFER-Former significantly improves the performance of NIR FER and achieves state-of-the-art results on the only two available NIR FER datasets, Oulu-CASIA and Large-HFE.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2312.05907

Country: Europe > Finland > Northern Ostrobothnia > Oulu (0.26)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
