AITopics | Feng, Rui

Collaborating Authors

Feng, Rui

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Repurposing the scientific literature with vision-language models

Alyakin, Anton, Stryker, Jaden, Alber, Daniel Alexander, Sangwon, Karl L., Duderstadt, Brandon, Save, Akshay, Kurland, David, Frome, Spencer, Singh, Shrutika, Zhang, Jeff, Yang, Eunice, Park, Ki Yun, Orillac, Cordelia, Valliani, Aly A., Neifert, Sean, Liu, Albert, Patel, Aneek, Livia, Christopher, Lau, Darryl, Laufer, Ilya, Rozman, Peter A., Hidalgo, Eveline Teresa, Riina, Howard, Feng, Rui, Hollon, Todd, Aphinyanaphongs, Yindalon, Golfinos, John G., Snyder, Laura, Leuthardt, Eric, Kondziolka, Douglas, Oermann, Eric Karl

arXiv.org Artificial IntelligenceFeb-26-2025

Research in AI for Science often focuses on using AI technologies to augment components of the scientific process, or in some cases, the entire scientific method; how about AI for scientific publications? Peer-reviewed journals are foundational repositories of specialized knowledge, written in discipline-specific language that differs from general Internet content used to train most large language models (LLMs) and vision-language models (VLMs). We hypothesized that by combining a family of scientific journals with generative AI models, we could invent novel tools for scientific communication, education, and clinical care. We converted 23,000 articles from Neurosurgery Publications into a multimodal database - NeuroPubs - of 134 million words and 78,000 image-caption pairs to develop six datasets for building AI models. We showed that the content of NeuroPubs uniquely represents neurosurgery-specific clinical contexts compared with broader datasets and PubMed. For publishing, we employed generalist VLMs to automatically generate graphical abstracts from articles. Editorial board members rated 70% of these as ready for publication without further edits. For education, we generated 89,587 test questions in the style of the ABNS written board exam, which trainee and faculty neurosurgeons found indistinguishable from genuine examples 54% of the time. We used these questions alongside a curriculum learning process to track knowledge acquisition while training our 34 billion-parameter VLM (CNS-Obsidian). In a blinded, randomized controlled trial, we demonstrated the non-inferiority of CNS-Obsidian to GPT-4o (p = 0.1154) as a diagnostic copilot for a neurosurgical service. Our findings lay a novel foundation for AI with Science and establish a framework to elevate scientific communication using state-of-the-art generative artificial intelligence while maintaining rigorous quality standards.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2502.19546

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.87)

Add feedback

FineMedLM-o1: Enhancing the Medical Reasoning Ability of LLM from Supervised Fine-Tuning to Test-Time Training

Yu, Hongzhou, Cheng, Tianhao, Cheng, Ying, Feng, Rui

arXiv.org Artificial IntelligenceJan-15-2025

Recent advancements in large language models (LLMs) have shown promise in medical applications such as disease diagnosis and treatment planning. However, most existing medical LLMs struggle with the advanced reasoning required for complex clinical scenarios, such as differential diagnosis or personalized treatment suggestions. We proposed FineMedLM-o1, which leverages high-quality synthetic medical data and long-form reasoning data for Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO), enabling advanced dialogue and deep reasoning capabilities. Additionally, we introduced Test-Time Training (TTT) in the medical domain for the first time, facilitating domain adaptation and ensuring reliable, accurate reasoning. Experimental results demonstrate that FineMedLM-o1 achieves a 23% average performance improvement over prior models on key medical benchmarks. Furthermore, the introduction of TTT provides an additional 14% performance boost, highlighting its effectiveness in enhancing medical reasoning capabilities. To support this process, we also proposed a novel method for synthesizing medical dialogue. Compared to other open-source datasets, our dataset stands out as superior in both quality and complexity. The project and data will be released on GitHub.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.09213

Country:

Asia > Thailand (0.14)
Asia > China (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Cross-Fundus Transformer for Multi-modal Diabetic Retinopathy Grading with Cataract

Xiao, Fan, Hou, Junlin, Zhao, Ruiwei, Feng, Rui, Zou, Haidong, Lu, Lina, Xu, Yi, Zhang, Juzhao

arXiv.org Artificial IntelligenceNov-1-2024

Diabetic retinopathy (DR) is a leading cause of blindness worldwide and a common complication of diabetes. As two different imaging tools for DR grading, color fundus photography (CFP) and infrared fundus photography (IFP) are highly-correlated and complementary in clinical applications. To the best of our knowledge, this is the first study that explores a novel multi-modal deep learning framework to fuse the information from CFP and IFP towards more accurate DR grading. Specifically, we construct a dual-stream architecture Cross-Fundus Transformer (CFT) to fuse the ViT-based features of two fundus image modalities. In particular, a meticulously engineered Cross-Fundus Attention (CFA) module is introduced to capture the correspondence between CFP and IFP images. Moreover, we adopt both the single-modality and multi-modality supervisions to maximize the overall performance for DR grading. Extensive experiments on a clinical dataset consisting of 1,713 pairs of multi-modal fundus images demonstrate the superiority of our proposed method. Our code will be released for public access.

artificial intelligence, machine learning, representation, (13 more...)

arXiv.org Artificial Intelligence

2411.00726

Country: Asia > China (0.31)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CT2C-QA: Multimodal Question Answering over Chinese Text, Table and Chart

Zhao, Bowen, Cheng, Tianhao, Zhang, Yuejie, Cheng, Ying, Feng, Rui, Zhang, Xiaobo

arXiv.org Artificial IntelligenceOct-28-2024

Multimodal Question Answering (MMQA) is crucial as it enables comprehensive understanding and accurate responses by integrating insights from diverse data representations such as tables, charts, and text. Most existing researches in MMQA only focus on two modalities such as image-text QA, table-text QA and chart-text QA, and there remains a notable scarcity in studies that investigate the joint analysis of text, tables, and charts. In this paper, we present C$\text{T}^2$C-QA, a pioneering Chinese reasoning-based QA dataset that includes an extensive collection of text, tables, and charts, meticulously compiled from 200 selectively sourced webpages. Our dataset simulates real webpages and serves as a great test for the capability of the model to analyze and reason with multimodal data, because the answer to a question could appear in various modalities, or even potentially not exist at all. Additionally, we present AED (\textbf{A}llocating, \textbf{E}xpert and \textbf{D}esicion), a multi-agent system implemented through collaborative deployment, information interaction, and collective decision-making among different agents. Specifically, the Assignment Agent is in charge of selecting and activating expert agents, including those proficient in text, tables, and charts. The Decision Agent bears the responsibility of delivering the final verdict, drawing upon the analytical insights provided by these expert agents. We execute a comprehensive analysis, comparing AED with various state-of-the-art models in MMQA, including GPT-4. The experimental outcomes demonstrate that current methodologies, including GPT-4, are yet to meet the benchmarks set by our dataset.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3664647.3681053

2410.21414

Country: Oceania > Australia (0.15)

Genre: Research Report (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.44)

Add feedback

ADSNet: Cross-Domain LTV Prediction with an Adaptive Siamese Network in Advertising

Wang, Ruize, Xu, Hui, Cheng, Ying, He, Qi, Zhou, Xing, Feng, Rui, Xu, Wei, Huang, Lei, Jiang, Jie

arXiv.org Artificial IntelligenceJun-15-2024

Advertising platforms have evolved in estimating Lifetime Value (LTV) to better align with advertisers' true performance metric. However, the sparsity of real-world LTV data presents a significant challenge to LTV predictive model(i.e., pLTV), severely limiting the their capabilities. Therefore, we propose to utilize external data, in addition to the internal data of advertising platform, to expand the size of purchase samples and enhance the LTV prediction model of the advertising platform. To tackle the issue of data distribution shift between internal and external platforms, we introduce an Adaptive Difference Siamese Network (ADSNet), which employs cross-domain transfer learning to prevent negative transfer. Specifically, ADSNet is designed to learn information that is beneficial to the target domain. We introduce a gain evaluation strategy to calculate information gain, aiding the model in learning helpful information for the target domain and providing the ability to reject noisy samples, thus avoiding negative transfer. Additionally, we also design a Domain Adaptation Module as a bridge to connect different domains, reduce the distribution distance between them, and enhance the consistency of representation space distribution. We conduct extensive offline experiments and online A/B tests on a real advertising platform. Our proposed ADSNet method outperforms other methods, improving GINI by 2$\%$. The ablation study highlights the importance of the gain evaluation strategy in negative gain sample rejection and improving model performance. Additionally, ADSNet significantly improves long-tail prediction. The online A/B tests confirm ADSNet's efficacy, increasing online LTV by 3.47$\%$ and GMV by 3.89$\%$.

data mining, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.10517

Country:

Europe (0.48)
Asia > China (0.29)
North America > United States > New York (0.14)

Genre: Research Report (1.00)

Industry:

Marketing (0.47)
Information Technology > Services (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)

Add feedback

Anatomical Structure-Guided Medical Vision-Language Pre-training

Li, Qingqiu, Yan, Xiaohan, Xu, Jilan, Yuan, Runtian, Zhang, Yuejie, Feng, Rui, Shen, Quanli, Zhang, Xiaobo, Wang, Shujun

arXiv.org Artificial IntelligenceMar-14-2024

Learning medical visual representations through vision-lang uage pre-training has reached remarkable progress. Despite the promising performance, it still faces challenges, i.e., local alignment lacks interpretability and clinical relevance, and the insufficient internal and external representation learning of image-report pairs. To address these issues, we propose an Anatomical Structure-Guided (ASG) framework. Specifically, we parse raw reports into triplets , and fully utilize each element as supervision to enhance representation learning. For anatomical region, we design an automatic anatomical region-sentence alignment paradigm in collaboration with radiologists, considering them as the minimum semantic units to explore fine-grained local alignment. For finding and existence, we regard them as image tags, applying an image-tag recognition decoder to associate image features with their respective tags within each sample and constructing soft labels for contrastive learning to improve the semantic association of different image-report pairs. We evaluate the proposed ASG framework on two downstream tasks, including five public benchmarks. Experimental results demonstrate that our method outperforms the state-of-the-art methods.

artificial intelligence, learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2403.09294

Country: Europe > Germany (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Large Language Models are Complex Table Parsers

Zhao, Bowen, Ji, Changkai, Zhang, Yuejie, He, Wen, Wang, Yingwen, Wang, Qing, Feng, Rui, Zhang, Xiaobo

arXiv.org Artificial IntelligenceDec-12-2023

With the Generative Pre-trained Transformer 3.5 (GPT-3.5) exhibiting remarkable reasoning and comprehension abilities in Natural Language Processing (NLP), most Question Answering (QA) research has primarily centered around general QA tasks based on GPT, neglecting the specific challenges posed by Complex Table QA. In this paper, we propose to incorporate GPT-3.5 to address such challenges, in which complex tables are reconstructed into tuples and specific prompt designs are employed for dialogues. Specifically, we encode each cell's hierarchical structure, position information, and content as a tuple. By enhancing the prompt template with an explanatory description of the meaning of each tuple and the logical reasoning process of the task, we effectively improve the hierarchical structure awareness capability of GPT-3.5 to better parse the complex tables. Extensive experiments and results on Complex Table QA datasets, i.e., the open-domain dataset HiTAB and the aviation domain dataset AIT-QA show that our approach significantly outperforms previous work on both datasets, leading to state-of-the-art (SOTA) performance.

header, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2312.11521

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine > Health Care Providers & Services (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

DeepPointMap: Advancing LiDAR SLAM with Unified Neural Descriptors

Zhang, Xiaze, Ding, Ziheng, Jing, Qi, Zhang, Yuejie, Ding, Wenchao, Feng, Rui

arXiv.org Artificial IntelligenceDec-5-2023

Point clouds have shown significant potential in various domains, including Simultaneous Localization and Mapping (SLAM). However, existing approaches either rely on dense point clouds to achieve high localization accuracy or use generalized descriptors to reduce map size. Unfortunately, these two aspects seem to conflict with each other. To address this limitation, we propose a unified architecture, DeepPointMap, achieving excellent preference on both aspects. We utilize neural network to extract highly representative and sparse neural descriptors from point clouds, enabling memory-efficient map representation and accurate multi-scale localization tasks (e.g., odometry and loop-closure). Moreover, we showcase the versatility of our framework by extending it to more challenging multi-agent collaborative SLAM. The promising results obtained in these scenarios further emphasize the effectiveness and potential of our approach.

artificial intelligence, descriptor, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2312.02684

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Enhanced Knowledge Injection for Radiology Report Generation

Li, Qingqiu, Xu, Jilan, Yuan, Runtian, Chen, Mohan, Zhang, Yuejie, Feng, Rui, Zhang, Xiaobo, Gao, Shang

arXiv.org Artificial IntelligenceNov-1-2023

Automatic generation of radiology reports holds crucial clinical value, as it can alleviate substantial workload on radiologists and remind less experienced ones of potential anomalies. Despite the remarkable performance of various image captioning methods in the natural image field, generating accurate reports for medical images still faces challenges, i.e., disparities in visual and textual data, and lack of accurate domain knowledge. To address these issues, we propose an enhanced knowledge injection framework, which utilizes two branches to extract different types of knowledge. The Weighted Concept Knowledge (WCK) branch is responsible for introducing clinical medical concepts weighted by TF-IDF scores. The Multimodal Retrieval Knowledge (MRK) branch extracts triplets from similar reports, emphasizing crucial clinical information related to entity positions and existence. By integrating this finer-grained and well-structured knowledge with the current image, we are able to leverage the multi-source knowledge gain to ultimately facilitate more accurate report generation. Extensive experiments have been conducted on two public benchmarks, demonstrating that our method achieves superior performance over other state-of-the-art methods. Ablation studies further validate the effectiveness of two extracted knowledge sources.

artificial intelligence, enhanced knowledge injection, radiology report generation

arXiv.org Artificial Intelligence

2311.00399

Genre: Research Report (0.69)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Nuclear Medicine (0.89)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.53)

Add feedback

PolyGET: Accelerating Polymer Simulations by Accurate and Generalizable Forcefield with Equivariant Transformer

Feng, Rui, Tran, Huan, Toland, Aubrey, Chen, Binghong, Zhu, Qi, Ramprasad, Rampi, Zhang, Chao

arXiv.org Artificial IntelligenceSep-1-2023

Polymer simulation with both accuracy and efficiency is a challenging task. Machine learning (ML) forcefields have been developed to achieve both the accuracy of ab initio methods and the efficiency of empirical force fields. However, existing ML force fields are usually limited to single-molecule settings, and their simulations are not robust enough. In this paper, we present PolyGET, a new framework for Polymer Forcefields with Generalizable Equivariant Transformers. PolyGET is designed to capture complex quantum interactions between atoms and generalize across various polymer families, using a deep learning model called Equivariant Transformers. We propose a new training paradigm that focuses exclusively on optimizing forces, which is different from existing methods that jointly optimize forces and energy. This simple force-centric objective function avoids competing objectives between energy and forces, thereby allowing for learning a unified forcefield ML model over different polymer families. We evaluated PolyGET on a large-scale dataset of 24 distinct polymer types and demonstrated state-of-the-art performance in force accuracy and robust MD simulations. Furthermore, PolyGET can simulate large polymers with high fidelity to the reference ab initio DFT method while being able to generalize to unseen polymers.

accurate and generalizable forcefield, artificial intelligence, machine learning, (4 more...)

arXiv.org Artificial Intelligence

2309.00585

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Add feedback