Large Language Model
IERL: Interpretable Ensemble Representation Learning -- Combining CrowdSourced Knowledge and Distributed Semantic Representations
Zi, Yuxin, Roy, Kaushik, Narayanan, Vignesh, Gaur, Manas, Sheth, Amit
Large Language Models (LLMs) encode meanings of words in the form of distributed semantics. Distributed semantics capture common statistical patterns among language tokens (words, phrases, and sentences) from large amounts of data. LLMs perform exceedingly well across General Language Understanding Evaluation (GLUE) tasks designed to test a model's understanding of the meanings of the input tokens. However, recent studies have shown that LLMs tend to generate unintended, inconsistent, or incorrect text when processing inputs that were seen rarely during training, or inputs that are associated with diverse contexts (e.g., the well-known hallucination phenomenon in language generation tasks). Crowdsourced and expert-curated knowledge graphs such as ConceptNet are designed to capture the meaning of words from a compact set of well-defined contexts. Thus, LLMs may benefit from leveraging such knowledge contexts to reduce inconsistencies in outputs. We propose a novel ensemble learning method, Interpretable Ensemble Representation Learning (IERL), that systematically combines LLM and crowdsourced knowledge representations of input tokens. Unlike state-of-the-art (SOTA) methods, IERL is interpretable by design: it shows when the LLM context was used and when the knowledge context was used. This allows the inputs to be scrutinized in conjunction with the model's parameters and facilitates the analysis of inconsistent or irrelevant outputs. Although IERL is agnostic to the choice of LLM and crowdsourced knowledge, we demonstrate our approach using BERT and ConceptNet. We report improved or competitive results with IERL across GLUE tasks over current SOTA methods and significantly enhanced model interpretability.
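To make "interpretable by design" concrete, the sketch below mixes one token's LLM vector with its knowledge-graph vector through a scalar gate, and the gate value itself reports which context dominated. This is not IERL's published architecture: the gated mixture, the function names `sigmoid` and `gated_combine`, and the toy 2-d vectors are all illustrative assumptions standing in for BERT and ConceptNet embeddings.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a real score into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_combine(
    llm_vec: Sequence[float],
    kg_vec: Sequence[float],
    gate_weights: Sequence[float],
    gate_bias: float = 0.0,
) -> Tuple[List[float], float, str]:
    """Mix one token's LLM vector with its knowledge-graph vector (hypothetical scheme).

    A scalar gate g in (0, 1) is computed from both inputs. The output is
    g * llm_vec + (1 - g) * kg_vec, so g records which context dominated:
    g >= 0.5 leans on the LLM, g < 0.5 on the knowledge graph.
    """
    if len(llm_vec) != len(kg_vec):
        raise ValueError("LLM and knowledge vectors must have the same size")
    features = list(llm_vec) + list(kg_vec)
    if len(gate_weights) != len(features):
        raise ValueError("gate_weights must match the concatenated input size")
    score = sum(w * f for w, f in zip(gate_weights, features)) + gate_bias
    g = sigmoid(score)
    combined = [g * a + (1.0 - g) * b for a, b in zip(llm_vec, kg_vec)]
    source = "LLM" if g >= 0.5 else "knowledge"
    return combined, g, source


if __name__ == "__main__":
    # Toy vectors; in practice these would be learned token representations.
    combined, g, source = gated_combine([1.0, 0.0], [0.0, 1.0], [0.0] * 4, gate_bias=-2.0)
    print(f"gate={g:.3f} source={source} combined={combined}")
```

Logging `g` per token is what makes the choice inspectable; a real system would learn `gate_weights` jointly with the task head rather than fixing them by hand.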
Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models
Zhang, Sarah J., Florin, Samuel, Lee, Ariel N., Niknafs, Eamon, Marginean, Andrei, Wang, Annie, Tyser, Keith, Chin, Zad, Hicke, Yann, Singh, Nikhil, Udell, Madeleine, Kim, Yoon, Buonassisi, Tonio, Solar-Lezama, Armando, Drori, Iddo
We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of large language models to fulfill the graduation requirements for any MIT major in Mathematics and EECS. Our results demonstrate that GPT-3.5 successfully solves a third of the entire MIT curriculum, while GPT-4, with prompt engineering, achieves a perfect solve rate on a test set excluding questions based on images. We fine-tune an open-source large language model on this dataset. We employ GPT-4 to automatically grade model responses, providing a detailed performance breakdown by course, question, and answer type. By embedding questions in a low-dimensional space, we explore the relationships between questions, topics, and classes and discover which questions and classes are required for solving other questions and classes through few-shot learning. Our analysis offers valuable insights into course prerequisites and curriculum design, highlighting language models' potential for learning and improving Mathematics and EECS education.
MOFI: Learning Image Representations from Noisy Entity Annotated Images
Wu, Wentao, Timofeev, Aleksei, Chen, Chen, Zhang, Bowen, Duan, Kun, Liu, Shuangning, Zheng, Yantao, Shlens, Jon, Du, Xianzhi, Gan, Zhe, Yang, Yinfei
We present MOFI, a new vision foundation model designed to learn image representations from noisy entity annotated images. MOFI differs from previous work in two key aspects: (i) pre-training data and (ii) training recipe. Regarding data, we introduce a new approach to automatically assign entity labels to images from noisy image-text pairs. Our approach involves employing a named entity recognition model to extract entities from the alt-text, and then using a CLIP model to select the correct entities as labels of the paired image. The approach is simple, does not require costly human annotation, and can be readily scaled up to billions of image-text pairs mined from the web. Through this method, we have created Image-to-Entities (I2E), a new large-scale dataset with 1 billion images and 2 million distinct entities, covering rich visual concepts in the wild. Building upon the I2E dataset, we study different training recipes, including supervised pre-training, contrastive pre-training, and multi-task learning. For contrastive pre-training, we treat entity names as free-form text, and further enrich them with entity descriptions. Experiments show that supervised pre-training with large-scale fine-grained entity labels is highly effective for image retrieval tasks, and multi-task training further improves the performance. The final MOFI model achieves 86.66% mAP on the challenging GPR1200 dataset, surpassing the previous state-of-the-art performance of 72.19% from OpenAI's CLIP model. Further experiments on zero-shot and linear probe image classification also show that MOFI outperforms a CLIP model trained on the original image-text data, demonstrating the effectiveness of the I2E dataset in learning strong image representations.
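The second step of the labeling pipeline (keeping only the alt-text entities that CLIP judges to match the image) can be sketched as a similarity filter. This is a minimal sketch under stated assumptions: the image and entity-name embeddings are taken as already computed, and the cosine threshold, `top_k` cap, and function names are hypothetical choices rather than the paper's selection rule.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_entity_labels(
    image_emb: Sequence[float],
    candidate_embs: Dict[str, Sequence[float]],
    threshold: float = 0.25,
    top_k: int = 3,
) -> List[str]:
    """Keep NER-extracted entity names whose text embedding matches the image.

    candidate_embs maps each entity string found in the alt-text to its text
    embedding; image_emb is the paired image's embedding in the same space.
    Survivors are returned best-first, at most top_k of them.
    """
    scored = [(name, cosine(image_emb, emb)) for name, emb in candidate_embs.items()]
    kept = [(name, s) for name, s in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in kept[:top_k]]


if __name__ == "__main__":
    # Toy 3-d embeddings standing in for CLIP outputs (hypothetical values).
    image = [1.0, 0.0, 0.0]
    candidates = {
        "Eiffel Tower": [0.9, 0.1, 0.0],
        "Paris": [0.6, 0.8, 0.0],
        "sunset": [0.0, 0.0, 1.0],
    }
    print(select_entity_labels(image, candidates, threshold=0.5))
```

Because the filter only needs precomputed embeddings and a dot product per candidate, it parallelizes trivially, which is consistent with the abstract's claim that the approach scales to billions of pairs.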
Multilingual LLMs are Better Cross-lingual In-context Learners with Alignment
Tanwar, Eshaan, Dutta, Subhabrata, Borthakur, Manish, Chakraborty, Tanmoy
In-context learning (ICL) emerges as large language models become capable of inferring test labels conditioned on a few labeled samples without any gradient update. ICL-enabled large language models provide a promising step toward bypassing recurrent annotation costs in a low-resource setting. Yet, only a handful of past studies have explored ICL in a cross-lingual setting, in which transferring label knowledge from a high-resource language to a low-resource one is crucial. To bridge the gap, we provide the first in-depth analysis of ICL for cross-lingual text classification. We find that the prevalent mode of selecting random input-label pairs to construct the prompt context is severely limited in the case of cross-lingual ICL, primarily due to the lack of alignment in the input as well as the output spaces. To mitigate this, we propose a novel prompt construction strategy: Cross-lingual In-context Source-Target Alignment (X-InSTA). By injecting coherence into the semantics of the input examples and adding a task-based alignment across the source and target languages, X-InSTA outperforms random prompt selection by a large margin across three different tasks using 44 different cross-lingual pairs.
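The contrast between random demonstrations and semantically aligned ones can be sketched as a prompt builder that ranks source-language examples by similarity to the target-language input. This is an illustrative sketch, not X-InSTA's exact procedure: it assumes multilingual sentence embeddings are already available, and the `Input:`/`Label:` template, the function name, and the optional aligner sentence are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (source-language text, label, multilingual embedding)
Example = Tuple[str, str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_aligned_prompt(
    target_text: str,
    target_emb: Sequence[float],
    source_pool: List[Example],
    k: int = 2,
    task_aligner: str = "",
) -> str:
    """Build a cross-lingual ICL prompt from the k most similar source examples.

    Demonstrations are ranked by cosine similarity to the target input
    (standing in for semantic alignment); an optional aligner sentence
    relating the source and target languages is placed before the query.
    """
    ranked = sorted(source_pool, key=lambda ex: cosine(target_emb, ex[2]), reverse=True)
    blocks = [f"Input: {text}\nLabel: {label}" for text, label, _ in ranked[:k]]
    if task_aligner:
        blocks.append(task_aligner)
    blocks.append(f"Input: {target_text}\nLabel:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Toy English pool with hand-made 2-d embeddings (hypothetical values).
    pool = [
        ("A great movie", "positive", [1.0, 0.0]),
        ("Boring and slow", "negative", [0.0, 1.0]),
        ("I loved it", "positive", [0.9, 0.2]),
    ]
    aligner = "The next review is in Spanish; label it positive or negative as above."
    print(build_aligned_prompt("Una película excelente", [1.0, 0.1], pool, k=2, task_aligner=aligner))
```

Random selection would draw from `pool` without looking at `target_emb`; ranking by similarity is the simplest way to make the demonstrations coherent with the query.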
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
Li, Yunxiang, Li, Zihan, Zhang, Kai, Dan, Ruilong, Jiang, Steve, Zhang, You
The primary aim of this research was to address the limitations observed in the medical knowledge of prevalent large language models (LLMs) such as ChatGPT by creating a specialized language model with enhanced accuracy in medical advice. We achieved this by adapting and refining Large Language Model Meta AI (LLaMA) using a large dataset of 100,000 patient-doctor dialogues sourced from a widely used online medical consultation platform. These conversations were cleaned and anonymized to protect patient privacy. In addition to the model refinement, we incorporated a self-directed information retrieval mechanism, allowing the model to access and utilize real-time information from online sources like Wikipedia and data from curated offline medical databases. Fine-tuning the model on real-world patient-doctor interactions significantly improved its ability to understand patient needs and provide informed advice. By equipping the model with self-directed information retrieval from reliable online and offline sources, we observed substantial improvements in the accuracy of its responses. Our proposed ChatDoctor represents a significant advancement in medical LLMs, with clear gains in understanding patient inquiries and providing accurate advice. Given the high stakes and low error tolerance in the medical field, such enhancements in providing accurate and reliable information are not only beneficial but essential.
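The retrieval step described above (look something up, then hand it to the model alongside the question) can be sketched with plain keyword overlap over a small offline knowledge base. This is a hypothetical stand-in, not ChatDoctor's actual retrieval mechanism: the word-overlap scoring, the prompt template, and the function names are assumptions, and the knowledge-base entries in the example are placeholders rather than medical content.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word set; a deliberately crude stand-in for keyword extraction."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, kb: Dict[str, str], top_k: int = 1) -> List[str]:
    """Rank knowledge-base entries by word overlap with the question."""
    q = tokenize(question)
    scored = []
    for title, body in kb.items():
        overlap = len(q & tokenize(title + " " + body))
        if overlap:
            scored.append((overlap, title))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in scored[:top_k]]


def build_context_prompt(question: str, kb: Dict[str, str], top_k: int = 1) -> str:
    """Prepend retrieved entries to the question before it is sent to the model."""
    titles = retrieve(question, kb, top_k)
    context = "\n".join(f"[{t}] {kb[t]}" for t in titles) or "(no match found)"
    return f"Reference material:\n{context}\n\nPatient question: {question}\nAnswer:"


if __name__ == "__main__":
    # Placeholder entries; a real system would query curated databases or Wikipedia.
    kb = {
        "Entry A": "placeholder notes mentioning fever and cough",
        "Entry B": "placeholder notes mentioning rash and itching",
    }
    print(build_context_prompt("I have a fever and a cough", kb))
```

Word overlap without stop-word removal is easily fooled (common words like "and" count as matches), so a practical system would swap in a proper search index or embedding retrieval while keeping the same retrieve-then-prompt shape.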
Chuck Schumer Wants AI to Be Explainable. It's Harder Than It Sounds
Earlier this week, Senate majority leader Chuck Schumer unveiled his SAFE Innovation Framework for artificial intelligence (AI), calling on Congress to take swift, decisive action. Leaders in the AI industry have been calling for regulation. But Schumer's proposal reveals how difficult it could be in practice for policymakers to regulate a technology that even experts struggle to fully understand. The SAFE Innovation Framework has a number of policy goals: make sure AI systems are secure against cyber attacks, protect jobs, ensure accountability for those deploying AI systems, and defend U.S. democratic values, all without stifling innovation. The part of Schumer's framework that comes closest to making a concrete policy proposal, rather than setting a policy goal, is his call for explainability.
Natural Language Programming AIs are taking the drudgery out of coding
"Learn to code." That three-word pejorative is perpetually on the lips and at the fingertips of internet trolls and tech bros whenever media layoffs are announced. A useless sentiment in its own right, but with the recent advent of code generating AIs, knowing the ins and outs of a programming language like Python could soon be about as useful as knowing how to fluently speak a dead language like Sanskrit. In fact, these genAIs are already helping professional software developers code faster and more effectively by handling much of the programming grunt work. Two of today's most widely distributed and written coding languages are Java and Python. The former almost single-handedly revolutionized cross-platform operation when it was released in the mid-'90s and now drives "everything from smartcards to space vehicles," according to Java Magazine in 2020 -- not to mention Wikipedia's search function and all of Minecraft.
US lawyers fined $5,000 after including fake case citations generated by ChatGPT
It's something that's drilled into you from the first essay you write in school: Always check your sources. Yet, New York attorney Steven Schwartz relied on ChatGPT to find and review them for him -- a decision that's led a judge to issue a $5,000 fine to him, his associate Peter LoDuca and their law firm Levidow, Levidow and Oberman, The Guardian reports. Schwartz used it for a case in which a man was suing Colombian airline Avianca, alleging he was injured on a flight to New York City. In this case, ChatGPT produced six cases as precedent, such as "Martinez v. Delta Airlines" and "Miller v. United Airlines," that were either inaccurate or simply didn't exist. In the decision to fine Schwartz and co., Judge P Kevin Castel explained, "Technological advances are commonplace and there is nothing inherently improper about using a reliable artificial intelligence tool for assistance. But existing rules impose a gatekeeping role on attorneys to ensure the accuracy of their filings."
AI Could Help Free Human Creativity
We're more distracted than ever. Why remember anything when I can just Google it? Why summon the attention to read a book when I can just scroll through Twitter? Some philosophers believe that ChatGPT and its siblings will further diminish our ability to do the kind of "deep work" needed to spark creativity and breed big ideas. What good are the tools if we begin to rely on them so much that we no longer have the capacity to think bigger?
Two US lawyers fined for submitting fake court citations from ChatGPT
A US judge has fined two lawyers and a law firm $5,000 (£3,935) after fake citations generated by ChatGPT were submitted in a court filing. A district judge in Manhattan ordered Steven Schwartz, Peter LoDuca and their law firm Levidow, Levidow & Oberman to pay the fine after fictitious legal research was used in an aviation injury claim. Schwartz had admitted that ChatGPT, a chatbot that churns out plausible text responses to human prompts, invented six cases he referred to in a legal brief in a case against the Colombian airline Avianca. The judge P Kevin Castel said in a written opinion there was nothing "inherently improper" about using artificial intelligence for assisting in legal work, but lawyers had to ensure their filings were accurate. "Technological advances are commonplace and there is nothing inherently improper about using a reliable artificial intelligence tool for assistance," Castel wrote.