Collaborating Authors

Xiao, Cao


Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement

arXiv.org Artificial Intelligence

Large vision-language models (LVLMs) have achieved impressive results in various visual question-answering and reasoning tasks through vision instruction tuning on specific datasets. However, there is still significant room for improvement in the alignment between the visual and language modalities. Previous methods for enhancing this alignment typically require external models or data and depend heavily on their capabilities and quality, which inevitably sets an upper bound on performance. In this paper, we propose SIMA, a framework that enhances visual and language modality alignment through self-improvement, eliminating the need for external models or data. SIMA leverages prompts from existing vision instruction tuning datasets to self-generate responses and employs an in-context self-critic mechanism to select response pairs for preference tuning. The key innovation is the introduction of three vision metrics during the in-context self-critic process, which guide the LVLM in selecting responses that enhance image comprehension. Through experiments across 14 hallucination and comprehensive benchmarks, we demonstrate that SIMA not only improves model performance across all benchmarks but also achieves superior modality alignment, outperforming previous approaches.
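
A minimal sketch of the self-improvement loop, assuming hypothetical lvlm_generate and lvlm_critic callables (the paper's actual prompting templates and the three vision metrics are not reproduced here):

    def self_improve_round(lvlm_generate, lvlm_critic, dataset):
        """Collect preference pairs by sampling responses and self-critiquing.

        lvlm_generate(image, prompt) -> str:         sample one response.
        lvlm_critic(image, prompt, a, b) -> "A"|"B": in-context self-critic
            judging which response better satisfies the vision metrics
            (e.g., faithfulness to the image).
        """
        preference_pairs = []
        for image, prompt in dataset:
            # Self-generate two candidate responses from an existing
            # vision instruction tuning prompt.
            a = lvlm_generate(image, prompt)
            b = lvlm_generate(image, prompt)
            # The model critiques its own outputs; the winner becomes the
            # "chosen" response and the loser the "rejected" one.
            winner = lvlm_critic(image, prompt, a, b)
            chosen, rejected = (a, b) if winner == "A" else (b, a)
            preference_pairs.append(
                {"image": image, "prompt": prompt,
                 "chosen": chosen, "rejected": rejected})
        return preference_pairs  # fed into preference tuning (e.g., DPO)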


Certifiably Byzantine-Robust Federated Conformal Prediction

arXiv.org Artificial Intelligence

Conformal prediction has shown impressive capacity in constructing statistically rigorous prediction sets for machine learning models with exchangeable data samples. Siloed datasets, coupled with escalating privacy concerns around local data sharing, have inspired recent innovations extending conformal prediction into federated environments with distributed data samples. However, this framework for distributed uncertainty quantification is susceptible to Byzantine failures: a minor subset of malicious clients can significantly compromise the practicality of coverage guarantees. To address this vulnerability, we introduce Rob-FCP, a novel framework that executes robust federated conformal prediction, effectively countering malicious clients capable of reporting arbitrary statistics in the conformal calibration process. We theoretically provide the conformal coverage bound of Rob-FCP in the Byzantine setting and show that the coverage of Rob-FCP is asymptotically close to the desired coverage level. We also propose a malicious-client-number estimator to tackle a more challenging setting where the number of malicious clients is unknown to the defender, and we theoretically show its effectiveness. We empirically demonstrate the robustness of Rob-FCP against diverse proportions of malicious clients under a variety of Byzantine attacks on five standard benchmark and real-world healthcare datasets.
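
A minimal sketch of robust calibration: each client reports its empirical CDF of nonconformity scores on a shared grid, and a coordinate-wise median bounds the influence of arbitrary (Byzantine) reports. The median aggregator and the [0, 1] score scaling are illustrative stand-ins, not Rob-FCP's exact estimator:

    import numpy as np

    def robust_federated_quantile(client_scores, alpha=0.1, grid_size=201):
        """Estimate a conformal score threshold from per-client calibration scores.

        client_scores: list of 1-D arrays, one per client; some may be
        Byzantine and contain arbitrary values. Returns tau such that the
        prediction sets {y : score(x, y) <= tau} target 1 - alpha coverage.
        """
        grid = np.linspace(0.0, 1.0, grid_size)  # assumes scores scaled to [0, 1]
        # Each client's empirical CDF evaluated on the shared grid.
        cdfs = np.stack([np.mean(s[:, None] <= grid[None, :], axis=0)
                         for s in client_scores])
        # Coordinate-wise median limits the influence of malicious reports.
        agg_cdf = np.median(cdfs, axis=0)
        # Smallest grid point whose aggregated CDF reaches 1 - alpha.
        idx = min(np.searchsorted(agg_cdf, 1 - alpha), grid_size - 1)
        return grid[idx]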


KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge

arXiv.org Artificial Intelligence

Knowledge Graph Embedding (KGE) techniques are crucial for learning compact representations of entities and relations within a knowledge graph, facilitating efficient reasoning and knowledge discovery. While existing methods typically focus either on training KGE models solely on graph structure or on fine-tuning pre-trained language models with KG classification data, KG-FIT leverages LLM-guided refinement to construct a semantically coherent hierarchical structure of entity clusters. By incorporating this hierarchical knowledge along with textual information during fine-tuning, KG-FIT effectively captures both global semantics from the LLM and local semantics from the KG. Extensive experiments on the benchmark datasets FB15K-237, YAGO3-10, and PrimeKG demonstrate the superiority of KG-FIT over state-of-the-art pre-trained language model-based methods, achieving improvements of 14.4%, 13.5%, and 11.9% in the Hits@10 metric for the link prediction task, respectively. Furthermore, KG-FIT yields substantial performance gains of 12.6%, 6.7%, and 17.7% over the structure-based base models upon which it is built. These results highlight the effectiveness of KG-FIT in incorporating open-world knowledge from LLMs to significantly enhance the expressiveness and informativeness of KG embeddings.
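
As a rough illustration of the hierarchy-construction step, one could agglomeratively cluster LLM-derived entity embeddings and treat the resulting tree as the semantic prior used during fine-tuning; this sketch assumes precomputed embeddings and is not KG-FIT's exact LLM-guided refinement:

    from scipy.cluster.hierarchy import fcluster, linkage

    def build_entity_hierarchy(entity_embeddings, n_clusters=50):
        """Cluster entities into a semantic hierarchy from LLM text embeddings.

        entity_embeddings: (num_entities, dim) array, e.g., from embedding each
        entity's name and description with an LLM encoder. Returns flat cluster
        labels plus the full linkage tree that KGE fine-tuning could traverse.
        """
        tree = linkage(entity_embeddings, method="average", metric="cosine")
        labels = fcluster(tree, t=n_clusters, criterion="maxclust")
        return labels, tree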


TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale

arXiv.org Artificial Intelligence

The advent of large language models (LLMs) has significantly advanced natural language processing tasks such as text summarization. However, their large size and computational demands, coupled with privacy concerns in data transmission, limit their use in resource-constrained and privacy-centric settings. To overcome this, we introduce TriSum, a framework for distilling LLMs' text summarization abilities into a compact local model. First, an LLM extracts a set of aspect-triple rationales and summaries, which are refined using a dual-scoring method for quality. Next, a smaller local model is trained on these tasks, employing a curriculum learning strategy that progresses from simple to complex tasks. Our method improves local model performance on various benchmarks (CNN/DailyMail, XSum, and ClinicalTrial), outperforming baselines by 4.5%, 8.5%, and 7.4%, respectively. It also improves interpretability by providing insight into the summarization rationale.
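
A hedged sketch of the curriculum stage: order the distilled (document, rationale, summary) examples from simple to complex and train the local model in stages. Rationale length as the difficulty proxy is an assumption for illustration, not necessarily TriSum's scoring:

    def curriculum_schedule(examples, n_stages=3):
        """Split distilled training examples into simple-to-complex stages.

        Each example is a dict with a "rationale_triples" list; fewer aspect
        triples is treated as simpler. Stage k contains every example up to
        its difficulty cap, so the curriculum grows monotonically.
        """
        ranked = sorted(examples, key=lambda ex: len(ex["rationale_triples"]))
        stage_size = max(1, len(ranked) // n_stages)
        stages = []
        for k in range(1, n_stages + 1):
            end = len(ranked) if k == n_stages else k * stage_size
            stages.append(ranked[:end])
        return stages  # train the local model on stages[0], then stages[1], ...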


Recent Advances in Predictive Modeling with Electronic Health Records

arXiv.org Artificial Intelligence

The development of electronic health records (EHR) systems has enabled the collection of a vast amount of digitized patient data. However, utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics. With the advancements in machine learning techniques, deep learning has demonstrated its superiority in various applications, including healthcare. This survey systematically reviews recent advances in deep learning-based predictive models using EHR data. Specifically, we begin by introducing the background of EHR data and providing a mathematical definition of the predictive modeling task. We then categorize and summarize predictive deep models from multiple perspectives. Furthermore, we present benchmarks and toolkits relevant to predictive modeling in healthcare. Finally, we conclude this survey by discussing open challenges and suggesting promising directions for future research.
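
For concreteness, the predictive modeling task such surveys formalize can be written schematically as learning a function from a patient's visit history to a clinical target; a minimal sketch (the survey's own notation may differ):

    from typing import Callable, List

    # A patient record is a time-ordered sequence of visits; each visit is a
    # set of medical codes (diagnoses, procedures, drugs). Timestamps and
    # continuous measurements are omitted here for brevity.
    Visit = List[str]            # e.g., ["ICD10:E11.9", "ATC:A10BA02"]
    Record = List[Visit]

    def predict(f: Callable[[Record], float], record: Record) -> float:
        """EHR predictive modeling: y_hat = f(x_1, ..., x_T), where x_t is the
        t-th visit and y is a target such as mortality, readmission, or
        next-visit diagnoses."""
        return f(record)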


PILOT: Legal Case Outcome Prediction with Case Law

arXiv.org Artificial Intelligence

Machine learning shows promise in predicting the outcome of legal cases, but most research has concentrated on civil-law cases rather than case-law systems. We identify two unique challenges in making legal case outcome predictions with case law. First, it is crucial to identify relevant precedent cases that serve as fundamental evidence for judges during decision-making. Second, it is necessary to consider the evolution of legal principles over time, as early cases may follow different legal contexts. In this paper, we propose a new model named PILOT (PredictIng Legal case OuTcome) for case outcome prediction. It comprises two modules for relevant case retrieval and temporal pattern handling, respectively. To benchmark the performance of existing legal case outcome prediction models, we curate a dataset from a large-scale case-law database. We demonstrate the importance of accurately identifying precedent cases and mitigating temporal shift when making predictions for case law, as our method shows a significant improvement over prior methods that focus on civil-law case outcome predictions.
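
To make the two modules concrete, here is a minimal retrieval sketch: score candidate precedents by embedding similarity, then discount older decisions to mitigate temporal shift. The exponential recency decay is an illustrative choice, not necessarily PILOT's mechanism:

    import numpy as np

    def retrieve_precedents(query_vec, case_vecs, case_years, query_year,
                            k=5, decay=0.05):
        """Rank precedent cases by semantic similarity with a recency discount.

        query_vec:  (d,) embedding of the current case.
        case_vecs:  (n, d) embeddings of candidate precedent cases.
        case_years: (n,) decision years; older precedents are down-weighted
                    because early cases may follow different legal contexts.
        """
        sims = case_vecs @ query_vec / (
            np.linalg.norm(case_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
        age = np.maximum(query_year - np.asarray(case_years), 0)
        scores = sims * np.exp(-decay * age)  # temporal-shift discount
        return np.argsort(-scores)[:k]        # indices of the top-k precedents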


Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations

arXiv.org Artificial Intelligence

Molecule representation learning is crucial for various downstream applications, such as understanding and predicting molecular properties and side effects. In this paper, we propose a novel method called GODE, which accounts for the two-level structure of individual molecules: each molecule has an intrinsic graph structure and is also a node in a larger molecular knowledge graph. GODE integrates graph representations of individual molecules with multi-domain biochemical data from knowledge graphs. By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures. This fusion yields a more robust and informative representation, which enhances molecular property prediction by harnessing both chemical and biological information. When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC improvement of 13.8% on classification tasks and an average RMSE/MAE improvement of 35.1% on regression tasks. It also surpasses the current leading model in molecular property prediction, with average gains of 2.1% on classification and 6.4% on regression tasks.
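
The alignment term of such bi-level contrastive learning can be sketched as a symmetric InfoNCE objective between a molecule's own graph embedding and the embedding of its knowledge graph substructure, each produced by a separate GNN encoder. This PyTorch snippet shows only the contrastive loss, with the encoder outputs assumed given:

    import torch
    import torch.nn.functional as F

    def bilevel_contrastive_loss(mol_emb, kg_emb, temperature=0.1):
        """Symmetric InfoNCE aligning molecule-graph and KG-subgraph embeddings.

        mol_emb: (B, d) embeddings from a GNN over each molecule's own graph.
        kg_emb:  (B, d) embeddings from a GNN over each molecule's knowledge
                 graph substructure; row i of both tensors is the same molecule.
        """
        mol = F.normalize(mol_emb, dim=1)
        kg = F.normalize(kg_emb, dim=1)
        logits = mol @ kg.t() / temperature          # (B, B) similarity matrix
        targets = torch.arange(mol.size(0), device=mol.device)
        # Matched pairs (the diagonal) are positives; all other pairs in the
        # batch serve as negatives.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))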


GraphCare: Enhancing Healthcare Predictions with Personalized Knowledge Graphs

arXiv.org Artificial Intelligence

Clinical predictive models often rely on patients' electronic health records (EHR), but integrating medical knowledge to enhance predictions and decision-making is challenging because personalized predictions require personalized knowledge graphs (KGs), which are difficult to generate from patient EHR data. Our method, GraphCare, extracts knowledge from large language models (LLMs) and external biomedical KGs to build patient-specific KGs, which are then used to train our proposed Bi-attention AugmenTed (BAT) graph neural network (GNN) for healthcare predictions. On MIMIC-III, it boosts AUROC by 17.6% and 6.6% for mortality and readmission prediction, and F1-score by 7.9% and 10.8% for length-of-stay (LOS) prediction and drug recommendation, respectively.

To improve predictive performance and integrate expert knowledge with data insights, clinical KGs have been adopted to complement EHR modeling (Chen et al., 2019; Choi et al., 2020; Rotmensch et al., 2017). These KGs represent medical concepts (e.g., diagnoses, procedures, drugs) and their relationships, enabling effective learning of patterns and dependencies. However, existing approaches mainly focus on simple hierarchical relations (Choi et al., 2017; 2018; 2020) rather than comprehensive relationships among biomedical entities, even though incorporating contextual information from established biomedical knowledge bases (e.g., UMLS (Bodenreider, 2004)) could enhance predictions. Moreover, large language models (LLMs) such as GPT (Brown et al., 2020; Chowdhery et al., 2022; Luo et al., 2022; OpenAI, 2023), pre-trained on web-scale biomedical literature, could serve as alternative resources for extracting clinical knowledge given their remarkable reasoning abilities on open-world data; a substantial body of research demonstrates their potential as knowledge bases (Lv et al., 2022; Petroni et al., 2019; AlKhamissi et al., 2022). To fill the gap in personalized medical KGs, we propose to leverage the reasoning abilities of LLMs to extract and integrate personalized KGs from open-world data. As shown in Figure 1, our patient KG generation module first takes medical concepts as input and generates concept-specific KGs by prompting LLMs or retrieving subgraphs from existing graphs. For each patient, we then compose a patient-specific graph by combining the concept-specific KGs associated with them and make the graph temporal using sequential data across the patient's visits (Section 3.2). To utilize the patient graph for predictions, we employ a bi-attention augmented GNN that highlights essential visits and nodes with attention weights (Section 3.3).
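
A minimal sketch of the bi-attention readout: attention over nodes within each visit, then attention over visits, pooled into one patient representation. Dimensions and layer choices are illustrative, not the exact BAT architecture:

    import torch
    import torch.nn as nn

    class BiAttentionReadout(nn.Module):
        """Pool (visits x nodes x dim) node states into one patient vector."""

        def __init__(self, dim):
            super().__init__()
            self.node_scorer = nn.Linear(dim, 1)   # attention over nodes
            self.visit_scorer = nn.Linear(dim, 1)  # attention over visits

        def forward(self, h):
            # h: (num_visits, num_nodes, dim) node embeddings from a GNN
            # over the temporal patient-specific knowledge graph.
            node_attn = torch.softmax(self.node_scorer(h), dim=1)
            visit_repr = (node_attn * h).sum(dim=1)              # (V, dim)
            visit_attn = torch.softmax(self.visit_scorer(visit_repr), dim=0)
            patient_repr = (visit_attn * visit_repr).sum(dim=0)  # (dim,)
            return patient_repr  # fed to a prediction head (e.g., mortality)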


ConSequence: Synthesizing Logically Constrained Sequences for Electronic Health Record Generation

arXiv.org Artificial Intelligence

Generative models can produce synthetic patient records for analytical tasks when real data is unavailable or limited. However, current methods struggle to adhere to domain-specific knowledge and to remove invalid data. We present ConSequence, an effective approach to integrating domain knowledge into the outputs of sequential generative neural networks. Our rule-based formulation includes temporal aggregation and antecedent evaluation modules, implemented through an efficient matrix multiplication formulation, to satisfy hard and soft logical constraints across time steps. Existing constraint methods often fail to guarantee constraint satisfaction, cannot handle temporal constraints, and hinder the learning and computational efficiency of the model. In contrast, our approach efficiently handles all such constraints with guaranteed logical coherence. We demonstrate ConSequence's effectiveness in generating electronic health records, outperforming competitors in achieving complete temporal and spatial constraint satisfaction without compromising runtime performance or generative quality. Specifically, ConSequence prevents all rule violations while improving model quality, reducing test perplexity by 5% and incurring less than a 13% slowdown in generation speed compared to an unconstrained model.
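
The matrix multiplication formulation can be illustrated on binary code vectors: stack rule antecedents into a matrix, detect which rules fire with a single multiply, and force the corresponding consequents. This numpy sketch covers only simple static hard rules; ConSequence's temporal aggregation across time steps and its soft constraints are omitted:

    import numpy as np

    def enforce_rules(x, A, antecedent_sizes, consequents):
        """Force hard rules of the form (c1 AND ... AND ck) -> c on one step.

        x:                (d,) 0/1 vector of generated codes at one time step.
        A:                (r, d) 0/1 matrix; row j marks rule j's antecedents.
        antecedent_sizes: (r,) number of antecedent codes in each rule.
        consequents:      (r,) index of the code each rule forces on.
        """
        fired = (A @ x) >= antecedent_sizes  # rule fires iff all antecedents set
        out = x.copy()
        out[consequents[fired]] = 1          # set consequents of fired rules
        return out

    # Example: rule 0 encodes "code 2 AND code 5 -> code 7".
    A = np.zeros((1, 10), dtype=int)
    A[0, [2, 5]] = 1
    x = np.zeros(10, dtype=int)
    x[[2, 5]] = 1
    print(enforce_rules(x, A, np.array([2]), np.array([7])))  # code 7 is now set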


Synthesize High-dimensional Longitudinal Electronic Health Records via Hierarchical Autoregressive Language Model

arXiv.org Artificial Intelligence

Synthetic electronic health records (EHRs) that are both realistic and privacy-preserving can serve as an alternative to real EHRs for machine learning (ML) modeling and statistical analysis. However, generating high-fidelity, granular EHR data in its original high-dimensional form is challenging for existing methods due to the complexities inherent in high-dimensional data. In this paper, we propose the Hierarchical Autoregressive Language mOdel (HALO) for generating longitudinal, high-dimensional EHRs that preserve the statistical properties of real EHRs and can be used to train accurate ML models without privacy concerns. HALO, designed as a hierarchical autoregressive model, generates a probability density function over medical codes, clinical visits, and patient records, allowing realistic EHR data to be generated in its original, unaggregated form without variable selection or aggregation. The model also produces high-quality continuous variables in a longitudinal and probabilistic manner. Extensive experiments demonstrate that HALO can generate high-fidelity EHR data with high-dimensional disease code probabilities (d > 10,000), disease co-occurrence probabilities within visits (d > 1,000,000), and conditional probabilities across consecutive visits (d > 5,000,000), achieving above 0.9 R2 correlation with real EHR data. This fidelity in turn enables downstream ML models trained on synthetic data to achieve accuracy comparable to models trained on real data (0.938 AUROC with HALO data vs. 0.943 with real data). Finally, combining real and synthetic data enhances ML model accuracy beyond that achieved with real EHR data alone.
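
The hierarchical factorization can be sketched as nested autoregression: a record is a sequence of visits and a visit a sequence of codes, so p(record) = prod_t p(visit_t | visit_<t), with each visit itself expanded code by code. The sampler below assumes hypothetical visit-level and code-level model callables:

    import random

    def sample_record(visit_model, code_model, max_visits=20, max_codes=40):
        """Sample one synthetic patient record hierarchically.

        visit_model(visits) -> float:           probability of emitting another
                                                visit given the history so far.
        code_model(visits, codes) -> (p, code): probability of emitting one more
                                                code in the current visit and,
                                                if so, which code.
        """
        visits = []
        for _ in range(max_visits):
            if random.random() > visit_model(visits):
                break                       # the model chose to end the record
            codes = []
            for _ in range(max_codes):
                p_more, code = code_model(visits, codes)
                if random.random() > p_more:
                    break                   # the model chose to end the visit
                codes.append(code)
            visits.append(codes)
        return visits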