AITopics | Huang, Hao

Collaborating Authors

Huang, Hao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Integrating Retrospective Framework in Multi-Robot Collaboration

Liang, Jiazhao, Huang, Hao, Hao, Yu, Bethala, Geeta Chandra Raju, Wen, Congcong, Rizzo, John-Ross, Fang, Yi

arXiv.org Artificial IntelligenceFeb-16-2025

Recent advancements in Large Language Models (LLMs) have demonstrated substantial capabilities in enhancing communication and coordination in multi-robot systems. However, existing methods often struggle to achieve efficient collaboration and decision-making in dynamic and uncertain environments, which are common in real-world multi-robot scenarios. To address these challenges, we propose a novel retrospective actor-critic framework for multi-robot collaboration. This framework integrates two key components: (1) an actor that performs real-time decision-making based on observations and task directives, and (2) a critic that retrospectively evaluates the outcomes to provide feedback for continuous refinement, such that the proposed framework can adapt effectively to dynamic conditions. Extensive experiments conducted in simulated environments validate the effectiveness of our approach, demonstrating significant improvements in task performance and adaptability. This work offers a robust solution to persistent challenges in robotic collaboration.

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2502.11227

Country: Asia > Middle East (0.29)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)

Add feedback

Impact of Data Distribution on Fairness Guarantees in Equitable Deep Learning

Luo, Yan, Wen, Congcong, Shi, Min, Huang, Hao, Fang, Yi, Wang, Mengyu

arXiv.org Artificial IntelligenceDec-29-2024

We present a comprehensive theoretical framework analyzing the relationship between data distributions and fairness guarantees in equitable deep learning. Our work establishes novel theoretical bounds that explicitly account for data distribution heterogeneity across demographic groups, while introducing a formal analysis framework that minimizes expected loss differences across these groups. We derive comprehensive theoretical bounds for fairness errors and convergence rates, and characterize how distributional differences between groups affect the fundamental trade-off between fairness and accuracy. Through extensive experiments on diverse datasets, including FairVision (ophthalmology), CheXpert (chest X-rays), HAM10000 (dermatology), and FairFace (facial recognition), we validate our theoretical findings and demonstrate that differences in feature distributions across demographic groups significantly impact model fairness, with performance disparities particularly pronounced in racial categories. The theoretical bounds we derive crroborate these empirical observations, providing insights into the fundamental limits of achieving fairness in deep learning models when faced with heterogeneous data distributions. This work advances our understanding of fairness in AI-based diagnosis systems and provides a theoretical foundation for developing more equitable algorithms. The code for analysis is publicly available via \url{https://github.com/Harvard-Ophthalmology-AI-Lab/fairness_guarantees}.

artificial intelligence, machine learning, mean, (16 more...)

arXiv.org Artificial Intelligence

2412.20377

Country:

North America > United States > New York (0.14)
North America > United States > Louisiana (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

GAMap: Zero-Shot Object Goal Navigation with Multi-Scale Geometric-Affordance Guidance

Yuan, Shuaihang, Huang, Hao, Hao, Yu, Wen, Congcong, Tzes, Anthony, Fang, Yi

arXiv.org Artificial IntelligenceOct-31-2024

Zero-Shot Object Goal Navigation (ZS-OGN) enables robots or agents to navigate toward objects of unseen categories without object-specific training. Traditional approaches often leverage categorical semantic information for navigation guidance, which struggles when only objects are partially observed or detailed and functional representations of the environment are lacking. To resolve the above two issues, we propose \textit{Geometric-part and Affordance Maps} (GAMap), a novel method that integrates object parts and affordance attributes as navigation guidance. Our method includes a multi-scale scoring approach to capture geometric-part and affordance attributes of objects at different scales. Comprehensive experiments conducted on HM3D and Gibson benchmark datasets demonstrate improvements in Success Rate and Success weighted by Path Length, underscoring the efficacy of our geometric-part and affordance-guided navigation approach in enhancing robot autonomy and versatility, without any additional object-specific training or fine-tuning with the semantics of unseen objects and/or the locomotions of the robot.

large language model, natural language, navigation, (16 more...)

arXiv.org Artificial Intelligence

2410.23978

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > New York (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Reliable Semantic Understanding for Real World Zero-shot Object Goal Navigation

Unlu, Halil Utku, Yuan, Shuaihang, Wen, Congcong, Huang, Hao, Tzes, Anthony, Fang, Yi

arXiv.org Artificial IntelligenceOct-29-2024

We introduce an innovative approach to advancing semantic understanding in zero-shot object goal navigation (ZS-OGN), enhancing the autonomy of robots in unfamiliar environments. Traditional reliance on labeled data has been a limitation for robotic adaptability, which we address by employing a dual-component framework that integrates a GLIP Vision Language Model for initial detection and an Instruction-BLIP model for validation. This combination not only refines object and environmental recognition but also fortifies the semantic interpretation, pivotal for navigational decision-making. Our method, rigorously tested in both simulated and real-world settings, exhibits marked improvements in navigation precision and reliability.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2410.21926

Country:

North America > United States (0.28)
Europe > Switzerland (0.28)

Genre:

Research Report > Promising Solution (0.48)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)

Add feedback

Exploring the Reliability of Foundation Model-Based Frontier Selection in Zero-Shot Object Goal Navigation

Yuan, Shuaihang, Unlu, Halil Utku, Huang, Hao, Wen, Congcong, Tzes, Anthony, Fang, Yi

arXiv.org Artificial IntelligenceOct-28-2024

In this paper, we present a novel method for reliable frontier selection in Zero-Shot Object Goal Navigation (ZS-OGN), enhancing robotic navigation systems with foundation models to improve commonsense reasoning in indoor environments. Our approach introduces a multi-expert decision framework to address the nonsensical or irrelevant reasoning often seen in foundation model-based systems. The method comprises two key components: Diversified Expert Frontier Analysis (DEFA) and Consensus Decision Making (CDM). DEFA utilizes three expert models: furniture arrangement, room type analysis, and visual scene reasoning, while CDM aggregates their outputs, prioritizing unanimous or majority consensus for more reliable decisions. Demonstrating state-of-the-art performance on the RoboTHOR and HM3D datasets, our method excels at navigating towards untrained objects or goals and outperforms various baselines, showcasing its adaptability to dynamic real-world conditions and superior generalization capabilities.

large language model, machine learning, natural language, (12 more...)

arXiv.org Artificial Intelligence

2410.21037

Country:

North America > United States (0.46)
Europe > Switzerland (0.28)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Zero-shot Object Navigation with Vision-Language Models Reasoning

Wen, Congcong, Huang, Yisiyuan, Huang, Hao, Huang, Yanjia, Yuan, Shuaihang, Hao, Yu, Lin, Hui, Liu, Yu-Shen, Fang, Yi

arXiv.org Artificial IntelligenceOct-24-2024

Object navigation is crucial for robots, but traditional methods require substantial training data and cannot be generalized to unknown environments. Zero-shot object navigation (ZSON) aims to address this challenge, allowing robots to interact with unknown objects without specific training data. Language-driven zero-shot object navigation (L-ZSON) is an extension of ZSON that incorporates natural language instructions to guide robot navigation and interaction with objects. In this paper, we propose a novel Vision Language model with a Tree-of-thought Network (VLTNet) for L-ZSON. VLTNet comprises four main modules: vision language model understanding, semantic mapping, tree-of-thought reasoning and exploration, and goal identification. Among these modules, Tree-of-Thought (ToT) reasoning and exploration module serves as a core component, innovatively using the ToT reasoning framework for navigation frontier selection during robot exploration. Compared to conventional frontier selection without reasoning, navigation using ToT reasoning involves multi-path reasoning processes and backtracking when necessary, enabling globally informed decision-making with higher accuracy. Experimental results on PASTURE and RoboTHOR benchmarks demonstrate the outstanding performance of our model in LZSON, particularly in scenarios involving complex natural language as target instructions.

large language model, natural language, navigation, (17 more...)

arXiv.org Artificial Intelligence

2410.1857

Country: Asia (0.47)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

$\texttt{dattri}$: A Library for Efficient Data Attribution

Deng, Junwei, Li, Ting-Wei, Zhang, Shiyuan, Liu, Shixuan, Pan, Yijun, Huang, Hao, Wang, Xinhe, Hu, Pingbang, Zhang, Xingjian, Ma, Jiaqi W.

arXiv.org Artificial IntelligenceOct-6-2024

Data attribution methods aim to quantify the influence of individual training samples on the prediction of artificial intelligence (AI) models. As training data plays an increasingly crucial role in the modern development of large-scale AI models, data attribution has found broad applications in improving AI performance and safety. However, despite a surge of new data attribution methods being developed recently, there lacks a comprehensive library that facilitates the development, benchmarking, and deployment of different data attribution methods. In this work, we introduce $\texttt{dattri}$, an open-source data attribution library that addresses the above needs. Specifically, $\texttt{dattri}$ highlights three novel design features. Firstly, $\texttt{dattri}$ proposes a unified and easy-to-use API, allowing users to integrate different data attribution methods into their PyTorch-based machine learning pipeline with a few lines of code changed. Secondly, $\texttt{dattri}$ modularizes low-level utility functions that are commonly used in data attribution methods, such as Hessian-vector product, inverse-Hessian-vector product or random projection, making it easier for researchers to develop new data attribution methods. Thirdly, $\texttt{dattri}$ provides a comprehensive benchmark framework with pre-trained models and ground truth annotations for a variety of benchmark settings, including generative AI settings. We have implemented a variety of state-of-the-art efficient data attribution methods that can be applied to large-scale neural network models, and will continuously update the library in the future. Using the developed $\texttt{dattri}$ library, we are able to perform a comprehensive and fair benchmark analysis across a wide range of data attribution methods. The source code of $\texttt{dattri}$ is available at https://github.com/TRAIS-Lab/dattri.

artificial intelligence, data attribution method, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2410.04555

Country: North America > United States > Illinois (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

Curvature Diversity-Driven Deformation and Domain Alignment for Point Cloud

Wu, Mengxi, Huang, Hao, Fang, Yi, Rostami, Mohammad

arXiv.org Artificial IntelligenceOct-4-2024

Unsupervised Domain Adaptation (UDA) is crucial for reducing the need for extensive manual data annotation when training deep networks on point cloud data. A significant challenge of UDA lies in effectively bridging the domain gap. To tackle this challenge, we propose \textbf{C}urvature \textbf{D}iversity-Driven \textbf{N}uclear-Norm Wasserstein \textbf{D}omain Alignment (CDND). Our approach first introduces a \textit{\textbf{Curv}ature Diversity-driven Deformation \textbf{Rec}onstruction (CurvRec)} task, which effectively mitigates the gap between the source and target domains by enabling the model to extract salient features from semantically rich regions of a given point cloud. We then propose \textit{\textbf{D}eformation-based \textbf{N}uclear-norm \textbf{W}asserstein \textbf{D}iscrepancy (D-NWD)}, which applies the Nuclear-norm Wasserstein Discrepancy to both \textit{deformed and original} data samples to align the source and target domains. Furthermore, we contribute a theoretical justification for the effectiveness of D-NWD in distribution alignment and demonstrate that it is \textit{generic} enough to be applied to \textbf{any} deformations. To validate our method, we conduct extensive experiments on two public domain adaptation datasets for point cloud classification and segmentation tasks. Empirical experiment results show that our CDND achieves state-of-the-art performance by a noticeable margin over existing approaches.

artificial intelligence, domain adaptation, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2410.0272

Country:

North America > United States (0.28)
Europe > France (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Vision (0.69)
(2 more...)

Add feedback

FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification

Tian, Yu, Wen, Congcong, Shi, Min, Afzal, Muhammad Muneeb, Huang, Hao, Khan, Muhammad Osama, Luo, Yan, Fang, Yi, Wang, Mengyu

arXiv.org Artificial IntelligenceJul-11-2024

Addressing fairness in artificial intelligence (AI), particularly in medical AI, is crucial for ensuring equitable healthcare outcomes. Recent efforts to enhance fairness have introduced new methodologies and datasets in medical AI. However, the fairness issue under the setting of domain transfer is almost unexplored, while it is common that clinics rely on different imaging technologies (e.g., different retinal imaging modalities) for patient diagnosis. This paper presents FairDomain, a pioneering systemic study into algorithmic fairness under domain shifts, employing state-of-the-art domain adaptation (DA) and generalization (DG) algorithms for both medical segmentation and classification tasks to understand how biases are transferred between different domains. We also introduce a novel plug-and-play fair identity attention (FIA) module that adapts to various DA and DG algorithms to improve fairness by using self-attention to adjust feature importance based on demographic attributes. Additionally, we curate the first fairness-focused dataset with two paired imaging modalities for the same patient cohort on medical segmentation and classification tasks, to rigorously assess fairness in domain-shift scenarios. Excluding the confounding impact of demographic distribution variation between source and target domains will allow clearer quantification of the performance of domain transfer models. Our extensive evaluations reveal that the proposed FIA significantly enhances both model performance accounted for fairness across all domain shift settings (i.e., DA and DG) with respect to different demographics, which outperforms existing methods on both segmentation and classification. The code and data can be accessed at https://ophai.hms.harvard.edu/datasets/

artificial intelligence, fairness, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2407.08813

Country:

Europe (0.67)
North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.66)

Add feedback

Benchmarking Complex Instruction-Following with Multiple Constraints Composition

Wen, Bosi, Ke, Pei, Gu, Xiaotao, Wu, Lindong, Huang, Hao, Zhou, Jinfeng, Li, Wenchuang, Hu, Binxin, Gao, Wendy, Xu, Jiaxin, Liu, Yiming, Tang, Jie, Wang, Hongning, Huang, Minlie

arXiv.org Artificial IntelligenceJul-11-2024

Instruction following is one of the fundamental capabilities of large language models (LLMs). As the ability of LLMs is constantly improving, they have been increasingly applied to deal with complex human instructions in real-world scenarios. Therefore, how to evaluate the ability of complex instruction-following of LLMs has become a critical research problem. Existing benchmarks mainly focus on modeling different types of constraints in human instructions while neglecting the composition of different constraints, which is an indispensable constituent in complex instructions. To this end, we propose ComplexBench, a benchmark for comprehensively evaluating the ability of LLMs to follow complex instructions composed of multiple constraints. We propose a hierarchical taxonomy for complex instructions, including 4 constraint types, 19 constraint dimensions, and 4 composition types, and manually collect a high-quality dataset accordingly. To make the evaluation reliable, we augment LLM-based evaluators with rules to effectively verify whether generated texts can satisfy each constraint and composition. Furthermore, we obtain the final evaluation score based on the dependency structure determined by different composition types. ComplexBench identifies significant deficiencies in existing LLMs when dealing with complex instructions with multiple constraints composition.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2407.03978

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Industry:

Education > Educational Setting (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback