Yang, Lin
Learning and generalization of robotic dual-arm manipulation of boxes from demonstrations via Gaussian Mixture Models (GMMs)
Lee, Qian Ying, Kulkarni, Suhas Raghavendra, Wong, Kenzhi Iskandar, Yang, Lin, Noronha, Bernardo, Wee, Yongjun, Hung, Tzu-Yi, Campolo, Domenico
Learning from demonstration (LfD) is an effective method for teaching robots to move and manipulate objects in a human-like manner. This is especially true for complex robotic systems, such as dual-arm setups employed for their improved payload capacity and manipulability. A key challenge, however, is extending the learned movements beyond the demonstrated scenarios to accommodate both minor and major variations. In this work, we propose a novel learning and generalization approach that adapts a Gaussian Mixture Model (GMM)-parameterized policy derived from human demonstrations. Our method requires only a small number of human demonstrations and eliminates the need for a robotic system during the demonstration phase, which can significantly reduce both cost and time. The generalization process takes place directly in the parameter space, leveraging the lower-dimensional representation of GMM parameters. With only three parameters per Gaussian component, this process is computationally efficient and yields results immediately upon request. We validate our approach through real-world experiments on dual-arm robotic manipulation of boxes. Starting with just five demonstrations of a single task, our approach successfully generalizes to new, unseen scenarios, including new target locations, orientations, and box sizes. These results highlight the practical applicability and scalability of our method for complex manipulation tasks.
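To make the parameter-space generalization concrete, below is a minimal sketch, assuming the policy is encoded by GMM component means that can be shifted and scaled toward a new target frame; the placeholder data and the `generalize_to_target` helper are illustrative, not the paper's exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a GMM to a handful of demonstrated end-effector trajectories
# (stacked as (time, x, y) samples), mimicking learning from demonstration.
demos = np.random.default_rng(0).normal(size=(500, 3))  # placeholder demo data
gmm = GaussianMixture(n_components=5, covariance_type="full").fit(demos)

def generalize_to_target(gmm, old_target, new_target, scale=1.0):
    """Adapt the learned policy in parameter space: shift and scale the
    spatial part of every component mean toward a new target pose."""
    adapted = np.array(gmm.means_, copy=True)
    adapted[:, 1:] = (adapted[:, 1:] - old_target) * scale + new_target
    return adapted

new_means = generalize_to_target(gmm, old_target=np.zeros(2),
                                 new_target=np.array([0.3, -0.1]), scale=1.2)
```

Because the adaptation touches only a few numbers per component, no re-demonstration or retraining is needed for a new target.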
PolyPath: Adapting a Large Multimodal Model for Multi-slide Pathology Report Generation
Ahmed, Faruk, Yang, Lin, Jaroensri, Tiam, Sellergren, Andrew, Matias, Yossi, Hassidim, Avinatan, Corrado, Greg S., Webster, Dale R., Shetty, Shravya, Prabhakara, Shruthi, Liu, Yun, Golden, Daniel, Wulczyn, Ellery, Steiner, David F.
Recent applications of vision-language modeling in digital histopathology have been predominantly designed to generate text describing individual regions of interest extracted from a single digitized histopathology image, or Whole Slide Image (WSI). An emerging line of research approaches the more practical clinical use case of slide-level text generation (Ahmed et al., 2024, Chen et al., 2024). However, in the typical clinical use case, there can be multiple biological tissue parts associated with a case, with each part having multiple slides. Pathologists write up a report summarizing their part-level diagnostic findings by microscopically reviewing each of the available slides per part and integrating information across these slides. This many-to-one relationship of slides to clinical descriptions is a recognized challenge for vision-language modeling in this space (Ahmed et al., 2024). The common approach taken in recent literature is to restrict modeling and analysis to single-slide cases or to manually identify a single slide within a case or part that is most representative of the clinical findings in reports (Ahmed et al., 2024, Chen et al., 2024, Guo et al., 2024, Shaikovski et al., 2024, Xu et al., 2024, Zhou et al., 2024). This strategy of selecting representative slides was also adopted in constructing one of the most widely used histopathology datasets, TCGA (Cooper et al., 2018).
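One simple way to realize the many-to-one slide-to-report structure is to concatenate per-slide embeddings with part separators before handing them to the language decoder; the sketch below illustrates this packing under assumed shapes and is not the paper's actual architecture.

```python
import torch

# Hypothetical shapes: each case has several parts, each part several slides,
# each slide already encoded into a fixed-size embedding by a WSI encoder.
slide_embs = [torch.randn(n, 768) for n in (3, 5, 2)]  # 3 parts of one case

# Prepend a separator token per part, then flatten everything into a single
# token sequence so the multimodal model's decoder can integrate findings
# across all slides of all parts when generating the report text.
sep = torch.zeros(1, 768)  # would be a learned separator in a real model
sequence = torch.cat([torch.cat([sep, e], dim=0) for e in slide_embs], dim=0)
print(sequence.shape)  # (num_slides + num_parts, 768) tokens for the decoder
```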
I3S: Importance Sampling Subspace Selection for Low-Rank Optimization in LLM Pretraining
Zhang, Haochen, Yin, Junze, Wang, Guanchu, Liu, Zirui, Zhang, Tianyi, Shrivastava, Anshumali, Yang, Lin, Braverman, Vladimir
Low-rank optimization has emerged as a promising approach to enabling memory-efficient training of large language models (LLMs). Existing low-rank optimization methods typically project gradients onto a low-rank subspace, reducing the memory cost of storing optimizer states. A key challenge in these methods is identifying suitable subspaces to ensure an effective optimization trajectory. Most existing approaches select the dominant subspace to preserve gradient information, as this intuitively provides the best approximation. However, we find that in practice, the dominant subspace stops changing during pretraining, thereby constraining weight updates to similar subspaces. In this paper, we propose importance sampling subspace selection (I3S) for low-rank optimization, which theoretically offers a comparable convergence rate to the dominant subspace approach. Empirically, we demonstrate that I3S significantly outperforms previous methods in LLM pretraining tasks.
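A minimal sketch of the importance-sampling idea follows, under the assumption that subspace directions are drawn with probability proportional to their squared singular values rather than always keeping the dominant ones; the function name and exact sampling rule are illustrative.

```python
import numpy as np

def i3s_subspace(grad, rank, rng):
    """Sketch: sample a rank-`rank` set of left singular vectors of the
    gradient with probability proportional to squared singular values,
    instead of always keeping the top-`rank` (dominant) directions."""
    U, s, _ = np.linalg.svd(grad, full_matrices=False)
    probs = s**2 / np.sum(s**2)
    idx = rng.choice(len(s), size=rank, replace=False, p=probs)
    return U[:, idx]  # projection basis P

rng = np.random.default_rng(0)
G = rng.normal(size=(1024, 256))        # a weight matrix's gradient
P = i3s_subspace(G, rank=32, rng=rng)
low_rank_grad = P.T @ G                 # (32, 256): what the optimizer stores
```

Because the sampled basis changes from refresh to refresh, weight updates are not confined to a single frozen subspace, which is the failure mode described for the dominant-subspace choice.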
Transition Transfer $Q$-Learning for Composite Markov Decision Processes
Chai, Jinhang, Chen, Elynn, Yang, Lin
To bridge the gap between empirical success and theoretical understanding in transfer reinforcement learning (RL), we study a principled approach with provable performance guarantees. We introduce a novel composite MDP framework in which high-dimensional transition dynamics are modeled as the sum of a low-rank component representing shared structure and a sparse component capturing task-specific variations. This relaxes the common assumption of purely low-rank transition models, allowing for more realistic scenarios where tasks share core dynamics but retain individual variations. We introduce UCB-TQL (Upper Confidence Bound Transfer Q-Learning), designed for transfer RL scenarios where multiple tasks share core linear MDP dynamics but diverge along sparse dimensions. When applying UCB-TQL to a target task after training on a source task with sufficiently many trajectories, we achieve a regret bound of $\tilde{O}(\sqrt{eH^5N})$ that scales independently of the ambient dimension. Here, $N$ is the number of trajectories in the target task, and $e$ quantifies the sparse difference between the tasks. This result demonstrates a substantial improvement over single-task RL by effectively leveraging the structural similarities between tasks. Our theoretical analysis provides rigorous guarantees for how UCB-TQL simultaneously exploits shared dynamics while adapting to task-specific variations.
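The optimistic ingredient of such algorithms can be sketched generically: a least-squares Q-estimate over linear features plus an elliptical confidence bonus. The snippet below shows only this generic UCB step, not the transfer mechanism or the sparse correction specific to UCB-TQL.

```python
import numpy as np

def ucb_q(phi, w, Lambda_inv, beta):
    """Optimistic Q-value in a linear MDP: least-squares estimate plus an
    elliptical confidence bonus (the 'UCB' in UCB-TQL)."""
    return phi @ w + beta * np.sqrt(phi @ Lambda_inv @ phi)

d = 8                                            # ambient feature dimension
phi = np.random.default_rng(0).normal(size=d)    # feature of (state, action)
w = np.zeros(d)          # weights, which transfer would warm-start from the source task
Lambda_inv = np.eye(d)   # inverse regularized Gram matrix of observed features
print(ucb_q(phi, w, Lambda_inv, beta=1.0))
```

In the transfer setting, the bonus need only cover the sparse task-specific directions, which is what lets the regret scale with $e$ rather than the ambient dimension.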
Health AI Developer Foundations
Kiraly, Atilla P., Baur, Sebastien, Philbrick, Kenneth, Mahvar, Fereshteh, Yatziv, Liron, Chen, Tiffany, Sterling, Bram, George, Nick, Jamil, Fayaz, Tang, Jing, Bailey, Kai, Ahmed, Faruk, Goel, Akshay, Ward, Abbi, Yang, Lin, Sellergren, Andrew, Matias, Yossi, Hassidim, Avinatan, Shetty, Shravya, Golden, Daniel, Azizi, Shekoofeh, Steiner, David F., Liu, Yun, Thelin, Tim, Pilgrim, Rory, Kirmizibayrak, Can
Robust medical Machine Learning (ML) models have the potential to revolutionize healthcare by accelerating clinical research, improving workflows and outcomes, and producing novel insights or capabilities. Developing such ML models from scratch is cost-prohibitive and requires substantial compute, data, and time (e.g., expert labeling). To address these challenges, we introduce Health AI Developer Foundations (HAI-DEF), a suite of pre-trained, domain-specific foundation models, tools, and recipes to accelerate building ML for health applications. The models cover various modalities and domains, including radiology (X-rays and computed tomography), histopathology, dermatological imaging, and audio. These models provide domain-specific embeddings that facilitate AI development with less labeled data, shorter training times, and reduced computational costs compared to traditional approaches. In addition, we utilize a common interface and style across these models and prioritize usability to enable developers to integrate HAI-DEF efficiently. We present model evaluations across various tasks and conclude with a discussion of their application and evaluation, covering the importance of ensuring efficacy, fairness, and equity. Finally, while HAI-DEF, and specifically the foundation models, lowers the barrier to entry for ML in healthcare, we emphasize the importance of validation with problem- and population-specific data for each desired usage setting. This technical report will be updated over time as more modalities and features are added.
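The intended developer workflow can be sketched as follows: a foundation model supplies fixed-length embeddings, and a small classifier is then trained on modest labeled data. The `embed` function here is a stand-in placeholder, not a real HAI-DEF API.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical workflow: a domain-specific foundation model maps each image
# (e.g., a chest X-ray) to a fixed-length embedding; a lightweight classifier
# is then trained on a modest labeled set.
def embed(image):                      # placeholder, not a real HAI-DEF call
    return np.asarray(image).ravel()[:512]

rng = np.random.default_rng(0)
images = rng.normal(size=(200, 32, 32))          # stand-in images
labels = rng.integers(0, 2, size=200)            # stand-in labels
X = np.stack([embed(im) for im in images])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
```

The heavy lifting (representation learning) is frozen in the embedding model, which is what reduces the labeled-data, time, and compute requirements described above.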
Path Planning in Complex Environments with Superquadrics and Voronoi-Based Orientation
Yang, Lin, Iyer, Ganesh, Lou, Baichuan, Turlapati, Sri Harsha, Lv, Chen, Campolo, Domenico
Path planning through narrow passages is a challenging problem in many applications. Traditional planning algorithms often struggle in complex environments such as mazes and traps, where narrow entrances require special orientation control for successful navigation. In this work, we present a novel approach that combines superquadric (SQ) representations and Voronoi diagrams to solve the narrow passage problem in both 2D and 3D environments. Our method uses the SQ formulation to expand obstacles, eliminating impassable passages, while the Voronoi hyperplane ensures a maximum-clearance path. Additionally, the hyperplane provides a natural reference for robot orientation, aligning the robot's long axis with the passage direction. We validate our framework on a 2D object retrieval task and a 3D drone simulation, demonstrating that our approach outperforms classical planners and a state-of-the-art drone planner by ensuring passable trajectories with maximum clearance.
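A minimal 2D sketch of the two ingredients: a superquadric inside-outside function whose axes can be inflated to expand obstacles, and a Voronoi diagram whose vertices serve as maximum-clearance waypoints. The sampling and parameters are illustrative, not the paper's implementation.

```python
import numpy as np
from scipy.spatial import Voronoi

def sq_inside(points, center, a, b, eps):
    """Superquadric inside-outside function in 2D: values < 1 are inside.
    Inflating (a, b) by the robot's radius 'expands' the obstacle so that
    impassable gaps close and only traversable passages remain."""
    p = points - center
    return (np.abs(p[:, 0] / a) ** (2 / eps)
            + np.abs(p[:, 1] / b) ** (2 / eps))

# Sample obstacle boundaries and build a Voronoi diagram; its edges are the
# equidistant ridges along which a maximum-clearance path can be threaded,
# and their direction provides a natural heading reference for the robot.
rng = np.random.default_rng(0)
obstacle_pts = rng.uniform(0, 10, size=(60, 2))
vor = Voronoi(obstacle_pts)
waypoints = vor.vertices     # Voronoi vertices: candidate clearance waypoints
```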
Planning for quasi-static manipulation tasks via an intrinsic haptic metric
Yang, Lin, Turlapati, Sri Harsha, Lv, Chen, Campolo, Domenico
Contact-rich manipulation often requires strategic interactions with objects, such as pushing, to accomplish specific tasks. We propose a novel scenario in which a robot inserts a book into a crowded shelf by pushing aside neighboring books to create space before slotting the new book into place. Classical planning algorithms fail in this context due to the limited space and their tendency to avoid contact. Additionally, they do not handle indirectly manipulable objects or consider force interactions. Our key contributions are: i) re-framing quasi-static manipulation as a planning problem on an implicit manifold derived from equilibrium conditions; ii) utilizing an intrinsic haptic metric instead of ad-hoc cost functions; and iii) proposing an adaptive algorithm that simultaneously updates robot states, object positions, contact points, and haptic distances. We evaluate our method on this crowded bookshelf insertion task, though the formulation generalizes to other rigid-body manipulation tasks. We propose proxies to capture contact points and forces, representing objects with superellipses; this simplified model guarantees differentiability. Our framework autonomously discovers strategic wedging-in policies, and our simplified contact model reproduces behavior similar to real-world scenarios. We also vary the stiffness and initial positions to analyze our framework comprehensively. The video can be found at https://youtu.be/eab8umZ3AQ0.
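A toy sketch of the quasi-static idea, assuming a virtual spring between the robot command and the object plus a differentiable contact penalty: at each commanded pose, the world settles onto the equilibrium manifold by local energy minimization. The potential and the penetration proxy are illustrative, not the paper's model.

```python
import numpy as np
from scipy.optimize import minimize

def potential_energy(q, q_robot, k, contact_penalty):
    """Total potential: a virtual spring of stiffness k between the robot
    command q_robot and the object state q, plus a soft penalty that keeps
    rigid bodies from interpenetrating (a differentiable contact proxy)."""
    spring = 0.5 * k * np.sum((q - q_robot) ** 2)
    overlap = max(0.0, 1.0 - np.linalg.norm(q))   # toy penetration measure
    return spring + contact_penalty * overlap ** 2

# Quasi-static rollout: for each commanded robot pose, the world settles into
# the nearest local minimum of the potential, i.e., onto the equilibrium
# manifold along which planning takes place.
q = np.array([2.0, 0.0])
for q_robot in np.linspace([2.0, 0.0], [0.5, 0.5], num=10):
    q = minimize(potential_energy, q, args=(q_robot, 50.0, 1e3)).x
```

Varying the stiffness `k`, as the experiments above do, changes how strongly the commanded motion is transmitted through contact to neighboring objects.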
WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
Chen, Pingyi, Zhu, Chenglu, Zheng, Sunyi, Li, Honglin, Yang, Lin
Whole slide imaging is routinely adopted for carcinoma diagnosis and prognosis. Substantial experience is required for pathologists to achieve accurate and reliable diagnoses from whole slide images (WSIs), and the huge size and heterogeneous features of WSIs make the pathological reading workflow extremely time-consuming. In this paper, we propose a novel framework (WSI-VQA) to interpret WSIs through generative visual question answering. WSI-VQA achieves universality by reframing various slide-level tasks in a question-answering pattern, through which pathologists can perform immunohistochemical grading, survival prediction, and tumor subtyping via human-machine interaction. Furthermore, we establish a WSI-VQA dataset containing 8,672 slide-level question-answering pairs over 977 WSIs. Beyond handling different slide-level tasks, our generative model, named the Wsi2Text Transformer (W2T), outperforms existing discriminative models in medical correctness, revealing its potential for application in clinical scenarios. Additionally, we visualize the co-attention mapping between word embeddings and WSIs as an intuitive explanation of the diagnostic results.
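The generative question-answering pattern can be sketched with a single cross-attention step, in which answer tokens attend jointly to the question and WSI patch embeddings; the dimensions, vocabulary size, and single-layer design are assumptions, not the W2T architecture.

```python
import torch
import torch.nn as nn

# Minimal sketch: answer tokens attend to both the question and a bag of WSI
# patch embeddings via cross-attention, then an (untrained, for illustration)
# language-model head scores the next answer token.
patch_embs = torch.randn(1, 200, 512)     # 200 patch embeddings from one WSI
question = torch.randn(1, 12, 512)        # embedded question tokens
answer_so_far = torch.randn(1, 3, 512)    # embedded answer prefix

cross_attn = nn.MultiheadAttention(512, num_heads=8, batch_first=True)
context = torch.cat([question, patch_embs], dim=1)
fused, attn_weights = cross_attn(answer_so_far, context, context)
logits = nn.Linear(512, 30522)(fused[:, -1])   # next-token distribution
# attn_weights over the patch positions is the co-attention map that can be
# visualized to show which regions supported the generated answer.
```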
PathAlign: A vision-language model for whole slide images in histopathology
Ahmed, Faruk, Sellergren, Andrew, Yang, Lin, Xu, Shawn, Babenko, Boris, Ward, Abbi, Olson, Niels, Mohtashamian, Arash, Matias, Yossi, Corrado, Greg S., Duong, Quang, Webster, Dale R., Shetty, Shravya, Golden, Daniel, Liu, Yun, Steiner, David F., Wulczyn, Ellery
Microscopic interpretation of histopathology images underlies many important diagnostic and treatment decisions. While advances in vision-language modeling raise new opportunities for analysis of such images, the gigapixel-scale size of whole slide images (WSIs) introduces unique challenges. Additionally, pathology reports simultaneously highlight key findings from small regions while also aggregating interpretation across multiple slides, often making it difficult to create robust image-text pairs. As such, pathology reports remain a largely untapped source of supervision in computational pathology, with most efforts relying on region-of-interest annotations or self-supervision at the patch level. In this work, we develop a vision-language model based on the BLIP-2 framework using WSIs paired with curated text from pathology reports. This enables applications utilizing a shared image-text embedding space, such as text or image retrieval for finding cases of interest, as well as integration of the WSI encoder with a frozen large language model (LLM) for WSI-based generative text capabilities such as report generation or AI-in-the-loop interactions. We utilize a de-identified dataset of over 350,000 WSIs and diagnostic text pairs, spanning a wide range of diagnoses, procedure types, and tissue types. We present pathologist evaluations of text generation and text retrieval using WSI embeddings, as well as results for WSI classification and workflow prioritization (slide-level triaging). Model-generated text for WSIs was rated by pathologists as accurate, without clinically significant error or omission, for 78% of WSIs on average. This work demonstrates exciting potential capabilities for language-aligned WSI embeddings.
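For intuition, a contrastive alignment objective between WSI and report-text embeddings, of the kind used in BLIP-2-style training, looks like the sketch below; this is the standard CLIP-style loss, shown as one plausible ingredient rather than the paper's full training recipe.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(wsi_embs, text_embs, temperature=0.07):
    """CLIP-style objective: paired WSI and report-text embeddings attract
    while mismatched pairs repel, producing the shared image-text embedding
    space that supports text and image retrieval."""
    wsi = F.normalize(wsi_embs, dim=-1)
    txt = F.normalize(text_embs, dim=-1)
    logits = wsi @ txt.T / temperature     # cosine similarities, scaled
    targets = torch.arange(len(wsi))       # the diagonal holds true pairs
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2

loss = contrastive_alignment_loss(torch.randn(16, 256), torch.randn(16, 256))
```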
LLMEmbed: Rethinking Lightweight LLM's Genuine Function in Text Classification
Liu, Chun, Zhang, Hongguang, Zhao, Kainan, Ju, Xinghai, Yang, Lin
With the rapid development of Large Language Models (LLMs), prompt learning has become a promising method studied across various research areas. Recently, many attempts based on prompt learning have been made to improve the performance of text classification. However, most of these methods are based on heuristic Chain-of-Thought (CoT) reasoning and tend to be more complex but less efficient. In this paper, we rethink LLM-based text classification methodology and propose a simple and effective transfer learning strategy, namely LLMEmbed, to address this classical but challenging task. Specifically, we first study how to properly extract and fuse text embeddings from various lightweight LLMs at different network depths to improve their robustness and discrimination, and then use such embeddings to train the classifier. We perform extensive experiments on publicly available datasets, and the results show that LLMEmbed achieves strong performance with low training overhead using lightweight LLM backbones, compared to recent methods based on larger LLMs (e.g., GPT-3) and sophisticated prompt-based strategies. LLMEmbed achieves adequate accuracy on publicly available benchmarks without any fine-tuning while using merely 4% of the model parameters, 1.8% of the electricity consumption, and 1.5% of the runtime of its counterparts. Code is available at: https://github.com/ChunLiu-cs/LLMEmbed-ACL2024.
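The extract-and-fuse strategy can be sketched with a small open model standing in for the paper's lightweight backbones: pool hidden states from several depths, concatenate them, and train a classifier on top. The layer choices and mean pooling here are illustrative assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

# GPT-2 stands in for the paper's lightweight LLM backbones.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

def fused_embedding(text, layers=(4, 8, 12)):
    """Mean-pool hidden states from several network depths and concatenate
    them, so the classifier sees both shallow and deep representations."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        hs = model(**inputs).hidden_states        # tuple: one tensor per layer
    return torch.cat([hs[l].mean(dim=1).squeeze(0) for l in layers]).numpy()

# Train a simple classifier on the fused embeddings; no LLM fine-tuning.
X = [fused_embedding(t) for t in ["great movie", "dull plot"]]
clf = LogisticRegression().fit(X, [1, 0])
```

Since the backbone is frozen and only the small classifier is trained, the compute and energy footprint stays a fraction of prompt-based pipelines built on much larger models.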