Goto

Collaborating Authors

 Large Language Model


Adapting Pretrained Language Models for Solving Tabular Prediction Problems in the Electronic Health Record

arXiv.org Artificial Intelligence

We propose an approach for adapting the DeBERTa model for electronic health record (EHR) tasks using domain adaptation. We pretrain a small DeBERTa model on a dataset consisting of MIMIC-III discharge summaries, clinical notes, radiology reports, and PubMed abstracts. We compare this model's performance with a DeBERTa model pre-trained on clinical texts from our institutional EHR (MeDeBERTa) and an XGBoost model. We evaluate performance on three benchmark tasks for emergency department outcomes using the MIMIC-IV-ED dataset. We preprocess the data to convert it into text format and generate four versions of the original datasets to compare data processing and data inclusion. The results show that our proposed approach outperforms the alternative models on two of three tasks (p<0.001) and matches performance on the third task, with the use of descriptive columns improving performance over the original column names.


Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning

arXiv.org Artificial Intelligence

This paper presents a systematic overview and comparison of parameter-efficient fine-tuning methods covering over 40 papers published between February 2019 and February 2023. These methods aim to resolve the infeasibility and impracticality of fine-tuning large language models by only training a small set of parameters. We provide a taxonomy that covers a broad range of methods and present a detailed method comparison with a specific focus on real-life efficiency and fine-tuning multibillion-scale language models.


Opportunities and Challenges in Neural Dialog Tutoring

arXiv.org Artificial Intelligence

Designing dialog tutors has been challenging as it involves modeling the diverse and complex pedagogical strategies employed by human tutors. Although there have been significant recent advances in neural conversational systems using large language models (LLMs) and growth in available dialog corpora, dialog tutoring has largely remained unaffected by these advances. In this paper, we rigorously analyze various generative language models on two dialog tutoring datasets for language learning using automatic and human evaluations to understand the new opportunities brought by these advances as well as the challenges we must overcome to build models that would be usable in real educational settings. We find that although current approaches can model tutoring in constrained learning scenarios when the number of concepts to be taught and possible teacher strategies are small, they perform poorly in less constrained scenarios. Our human quality evaluation shows that both models and ground-truth annotations exhibit low performance in terms of equitable tutoring, which measures learning opportunities for students and how engaging the dialog is. To understand the behavior of our models in a real tutoring setting, we conduct a user study using expert annotators and find a significantly large number of model reasoning errors in 45% of conversations. Finally, we connect our findings to outline future work.


Zero-Shot On-the-Fly Event Schema Induction

arXiv.org Artificial Intelligence

What are the events involved in a pandemic outbreak? What steps should be taken when planning a wedding? The answers to these questions can be found by collecting many documents on the complex event of interest, extracting relevant information, and analyzing it. We present a new approach in which large language models are utilized to generate source documents that allow predicting, given a high-level event definition, the specific events, arguments, and relations between them to construct a schema that describes the complex event in its entirety. Using our model, complete schemas on any topic can be generated on-the-fly without any manual data collection, i.e., in a zero-shot manner. Moreover, we develop efficient methods to extract pertinent information from texts and demonstrate in a series of experiments that these schemas are considered to be more complete than human-curated ones in the majority of examined scenarios. Finally, we show that this framework is comparable in performance with previous supervised schema induction methods that rely on collecting real texts while being more general and flexible without the need for a predefined ontology.


Learning Attention as Disentangler for Compositional Zero-shot Learning

arXiv.org Artificial Intelligence

Compositional zero-shot learning (CZSL) aims at learning visual concepts (i.e., attributes and objects) from seen compositions and combining concept knowledge into unseen compositions. The key to CZSL is learning the disentanglement of the attribute-object composition. To this end, we propose to exploit cross-attentions as compositional disentanglers to learn disentangled concept embeddings. For example, if we want to recognize an unseen composition "yellow flower", we can learn the attribute concept "yellow" and object concept "flower" from different yellow objects and different flowers respectively. To further constrain the disentanglers to learn the concept of interest, we employ a regularization at the attention level. Specifically, we adapt the earth mover's distance (EMD) as a feature similarity metric in the cross-attention module. Moreover, benefiting from concept disentanglement, we improve the inference process and tune the prediction score by combining multiple concept probabilities. Comprehensive experiments on three CZSL benchmark datasets demonstrate that our method significantly outperforms previous works in both closed- and open-world settings, establishing a new state-of-the-art.


LMCanvas: Object-Oriented Interaction to Personalize Large Language Model-Powered Writing Environments

arXiv.org Artificial Intelligence

Beyond their generative capabilities, LLMs demonstrate significant few-shot and Large language models (LLMs) can enhance writing by automating zero-shot performance [4] meaning that they are able to perform or supporting specific tasks in writers' workflows (e.g., paraphrasing, previously unseen tasks with only an instruction and/or a couple creating analogies). Leveraging this capability, a collection of of examples--i.e., a prompt. By leveraging this ability of LLMs, interfaces have been developed that provide LLM-powered tools for writers can potentially automate or augment specific tasks in their specific writing tasks. However, these interfaces provide limited support workflows by using adequate prompts and, thus, further facilitate for writers to create personal tools for their own unique tasks, the writing process. For instance, based only on prompt examples and may not comprehensively fulfill a writer's needs--requiring provided for GPT-3 [17], writers can use LLMs to correct grammar, them to continuously switch between interfaces during writing. In create an outline, produce analogies, or even change the point-ofview this work, we envision LMCanvas, an interface that enables writers of a scene.


Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning

arXiv.org Artificial Intelligence

Recent compositional zero-shot learning (CZSL) methods adapt pre-trained vision-language models (VLMs) by constructing trainable prompts only for composed state-object pairs. Relying on learning the joint representation of seen compositions, these methods ignore the explicit modeling of the state and object, thus limiting the exploitation of pre-trained knowledge and generalization to unseen compositions. With a particular focus on the universality of the solution, in this work, we propose a novel paradigm for CZSL models that establishes three identification branches (i.e., Multi-Path) to jointly model the state, object, and composition. The presented Troika is our implementation that aligns the branch-specific prompt representations with decomposed visual features. To calibrate the bias between semantically similar multi-modal representations, we further devise a Cross-Modal Traction module into Troika that shifts the prompt representation towards the current visual content. We conduct extensive experiments on three popular benchmarks, where our method significantly outperforms existing methods in both closed-world and open-world settings.


Audio Visual Language Maps for Robot Navigation

arXiv.org Artificial Intelligence

While interacting in the world is a multi-sensory experience, many robots continue to predominantly rely on visual perception to map and navigate in their environments. In this work, we propose Audio-Visual-Language Maps (AVLMaps), a unified 3D spatial map representation for storing cross-modal information from audio, visual, and language cues. AVLMaps integrate the open-vocabulary capabilities of multimodal foundation models pre-trained on Internet-scale data by fusing their features into a centralized 3D voxel grid. In the context of navigation, we show that AVLMaps enable robot systems to index goals in the map based on multimodal queries, e.g., textual descriptions, images, or audio snippets of landmarks. In particular, the addition of audio information enables robots to more reliably disambiguate goal locations. Extensive experiments in simulation show that AVLMaps enable zero-shot multimodal goal navigation from multimodal prompts and provide 50% better recall in ambiguous scenarios. These capabilities extend to mobile robots in the real world - navigating to landmarks referring to visual, audio, and spatial concepts. Videos and code are available at: https://avlmaps.github.io.


GitHub - microsoft/visual-chatgpt: Official repo for the paper: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

#artificialintelligence

Visual ChatGPT connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. On the one hand, ChatGPT (or LLMs) serves as a general interface that provides a broad and diverse understanding of a wide range of topics. On the other hand, Foundation Models serve as domain experts by providing deep knowledge in specific domains. By leveraging both general and deep knowledge, we aim at building an AI that is capable of handling various tasks. For help or issues using the Visual ChatGPT, please submit a GitHub issue.


How ChatGPT Can Help with Grading • TechNotes Blog

#artificialintelligence

I enjoy teaching, but I don't enjoy grading. Using rubrics makes grading easier, but it can still be a chore to review each assignment with fresh eyes so that the student you are grading now gets the same attention as the first few. This is where ChatGPT can come in handy. It doesn't get tired of grading. And if you have a tight rubric (well-designed with little or no loopholes), you can expect consistent results from ChatGPT, but there are a few important things to consider.