Large Language Model
How to Design and Deliver Courses for Higher Education in the AI Era: Insights from Exam Data Analysis
Wazan, Ahmad Samer, Taj, Imran, Shoufan, Abdulhadi, Laborde, Romain, Venant, Rémi
In this position paper, we advocate for the idea that courses and exams in the AI era have to be designed based on two factors: (1) the strengths and limitations of AI, and (2) pedagogical objectives. Drawing on the Delors report on education [1], we first address the role of education and recall the main objectives that educational institutions must strive to achieve independently of any technology. We then explore the strengths and limitations of AI in light of current advances. We explain how courses and exams can be designed around these strengths and limitations, providing examples from the IT, English, and Art domains. We describe how, from January 2023 to May 2023, we adopted a pedagogical approach inspired by the Socratic teaching method. Then, we present the results of a data analysis of seven ChatGPT-authorized exams conducted between December 2022 and March 2023. Our exam data show no correlation between students' grades and whether or not they used ChatGPT to answer their exam questions. Finally, we present a new exam system that allows us to apply our pedagogical approach in the AI era.
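As a rough illustration of what a "no correlation" check between grades and ChatGPT use could look like, the sketch below computes the point-biserial coefficient, which is Pearson's r between a 0/1 usage indicator and a continuous grade. The abstract does not say which statistic the authors used; `point_biserial` and its inputs are hypothetical.

```python
from math import sqrt
from typing import Sequence


def point_biserial(used_chatgpt: Sequence[bool], grades: Sequence[float]) -> float:
    """Pearson correlation between a binary indicator and a continuous score.

    Hypothetical helper: this is the point-biserial coefficient, not
    necessarily the analysis used in the paper.
    """
    if len(used_chatgpt) != len(grades) or len(grades) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    x = [1.0 if used else 0.0 for used in used_chatgpt]  # 1 = used ChatGPT
    mean_x = sum(x) / len(x)
    mean_y = sum(grades) / len(grades)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, grades))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in grades))
    if sd_x == 0 or sd_y == 0:
        # Correlation is undefined if everyone (or no one) used ChatGPT,
        # or if all grades are identical.
        raise ValueError("correlation undefined when a variable is constant")
    return cov / (sd_x * sd_y)
```

A value near zero is consistent with no linear association between usage and grades; a significance test on the coefficient would still be needed before drawing conclusions.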
FinPT: Financial Risk Prediction with Profile Tuning on Pretrained Foundation Models
Yin, Yuwei, Yang, Yazheng, Yang, Jian, Liu, Qi
Financial risk prediction plays a crucial role in the financial sector. Machine learning methods have been widely applied to detect potential risks automatically and thus reduce labor costs. However, progress in this field has lagged in recent years for two reasons: 1) the algorithms used are somewhat outdated, especially given the rapid advance of generative AI and large language models (LLMs); 2) the lack of a unified, open-source financial benchmark has impeded related research for years. To tackle these issues, we propose FinPT and FinBench: the former is a novel approach to financial risk prediction that conducts Profile Tuning on large pretrained foundation models, and the latter is a set of high-quality datasets on financial risks such as default, fraud, and churn. In FinPT, we fill the financial tabular data into a predefined instruction template, obtain natural-language customer profiles by prompting LLMs, and fine-tune large foundation models with the profile text to make predictions. We demonstrate the effectiveness of FinPT by comparing it with a range of strong, representative baselines on FinBench. Further analytical studies deepen the understanding of LLMs for financial risk prediction.
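The first two FinPT steps (serialize a tabular record into an instruction template, then prompt an LLM for a profile) can be sketched as follows. The template wording, the `row_to_instruction` and `build_training_pairs` helpers, and the `generate` callable are all hypothetical; the abstract does not give FinPT's actual template or LLM interface.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical instruction template; FinPT's real wording is not given in the abstract.
TEMPLATE = (
    "Here is the data of a customer: {features}. "
    "Write a short profile describing this customer."
)


def row_to_instruction(row: Dict[str, object]) -> str:
    """Fill one tabular record (column name -> value) into the template."""
    features = "; ".join(f"{name} is {value}" for name, value in row.items())
    return TEMPLATE.format(features=features)


def build_training_pairs(
    rows: Iterable[Dict[str, object]],
    labels: Iterable[int],
    generate: Callable[[str], str],
) -> List[Tuple[str, int]]:
    """Pair LLM-written profiles with risk labels (e.g. default = 1).

    `generate` stands in for whatever LLM call produces the profile text;
    the resulting (profile, label) pairs would then be used for fine-tuning.
    """
    return [(generate(row_to_instruction(r)), y) for r, y in zip(rows, labels)]
```

Keeping the LLM behind a plain callable makes the serialization step testable on its own, independent of which model writes the profiles.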
A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks
Labrak, Yanis, Rouvier, Mickael, Dufour, Richard
We evaluate four state-of-the-art instruction-tuned large language models (LLMs) -- ChatGPT, Flan-T5 UL2, Tk-Instruct, and Alpaca -- on a set of 13 real-world clinical and biomedical natural language processing (NLP) tasks in English, such as named-entity recognition (NER), question answering (QA), and relation extraction (RE). Our overall results show that the evaluated LLMs begin to approach the performance of state-of-the-art models in zero- and few-shot scenarios for most tasks, and perform particularly well on the QA task, even though they have never seen examples from these tasks before. However, we observed that on the classification and RE tasks they perform below what can be achieved with a model trained specifically for the medical domain, such as PubMedBERT. Finally, we noted that no LLM outperforms all the others on every task studied; some models are better suited to certain tasks than others.
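The difference between the zero-shot and few-shot settings comes down to whether solved examples are placed in the prompt. The sketch below assumes a simple "Input/Output" prompt layout; the `build_prompt` helper and the format are hypothetical, not the prompts used in the study.

```python
from typing import Sequence, Tuple


def build_prompt(
    instruction: str,
    query: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Zero-shot when `examples` is empty; k-shot when it holds k (input, output) pairs."""
    parts = [instruction]
    for inp, out in examples:
        # Each demonstration shows the model a solved instance of the task.
        parts.append(f"Input: {inp}\nOutput: {out}")
    # The query is left open so the model completes the "Output:" field.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)
```

For example, `build_prompt("List the diseases mentioned.", "Patient has asthma.")` yields a zero-shot prompt, while passing one `(input, output)` pair in `examples` turns it into a one-shot prompt for the same query.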
Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?
Wu, Cheng-En, Tian, Yu, Yu, Haichao, Wang, Heng, Morgado, Pedro, Hu, Yu Hen, Yang, Linjie
Vision-language models such as CLIP learn a generic text-image embedding from large-scale training data. A vision-language model can be adapted to a new classification task through few-shot prompt tuning. We find that such a prompt tuning process is highly robust to label noise. This motivated us to study the key reasons behind the robustness of the prompt tuning paradigm. We conducted extensive experiments to explore this property and found that the key factors are: 1) the fixed classname tokens provide strong regularization for the optimization of the model, reducing the gradients induced by noisy samples; 2) the powerful pretrained image-text embedding, learned from diverse and generic web data, provides strong prior knowledge for image classification. Further, we demonstrate that noisy zero-shot predictions from CLIP can be used to tune its own prompt, significantly enhancing prediction accuracy in the unsupervised setting. The code is available at https://github.com/CEWu/PTNL.
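Using CLIP's own zero-shot predictions as (noisy) training labels can be sketched as: score each image against every class text embedding by cosine similarity, take the best class, and keep only the most confident images per class. The embeddings are assumed to be precomputed, and the top-k-per-class selection rule is an assumption for illustration; the abstract does not specify how the paper selects pseudo-labeled samples.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (assumes a nonzero vector)."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_pseudo_labels(
    image_embs: Sequence[Sequence[float]],
    text_embs: Sequence[Sequence[float]],
    per_class: int,
) -> Dict[int, int]:
    """Return {image index: pseudo-label}, keeping the `per_class`
    most confident images for each predicted class."""
    texts = [_normalize(t) for t in text_embs]
    by_class: Dict[int, List[Tuple[float, int]]] = {c: [] for c in range(len(texts))}
    for i, img in enumerate(image_embs):
        v = _normalize(img)
        # Cosine similarity to each class prompt (dot product of unit vectors).
        sims = [sum(a * b for a, b in zip(v, t)) for t in texts]
        best = max(range(len(sims)), key=sims.__getitem__)
        by_class[best].append((sims[best], i))
    selected: Dict[int, int] = {}
    for c, items in by_class.items():
        items.sort(reverse=True)  # highest similarity first
        for _, i in items[:per_class]:
            selected[i] = c
    return selected
```

The selected images and their pseudo-labels would then serve as the training set for prompt tuning, with the robustness described above absorbing part of the label noise.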
Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting
Schaeffer, Rylan, Pistunova, Kateryna, Khanna, Samar, Consul, Sarthak, Koyejo, Sanmi
Language models can be prompted to reason through problems in a manner that significantly improves performance. However, why such prompting improves performance is unclear. Recent work showed that using logically invalid Chain-of-Thought (CoT) prompting improves performance almost as much as logically valid CoT prompting, and that editing CoT prompts to replace problem-specific information with abstract information or out-of-distribution information typically does not harm performance. Critics have responded that these findings are based on too few and too easily solved tasks to draw meaningful conclusions. To resolve this dispute, we test whether logically invalid CoT prompts offer the same level of performance gains as logically valid prompts on the hardest tasks in the BIG-Bench benchmark, termed BIG-Bench Hard (BBH). We find that the logically invalid reasoning prompts do indeed achieve similar performance gains on BBH tasks as logically valid reasoning prompts. We also discover that some CoT prompts used by previous works contain logical errors. This suggests that covariates beyond logically valid reasoning are responsible for performance improvements.
SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery
Seenivasan, Lalithkumar, Islam, Mobarakol, Kannan, Gokul, Ren, Hongliang
Advances in GPT-based large language models (LLMs) are revolutionizing natural language processing and rapidly expanding its use across various domains. With unidirectional attention, these autoregressive LLMs can generate long and coherent paragraphs. However, for visual question answering (VQA) tasks that require both vision and language processing, models with bidirectional attention or models employing fusion techniques are often used to capture the context of multiple modalities at once. As GPT does not natively process vision tokens, to exploit advances in GPT models for VQA in robotic surgery, we design an end-to-end trainable Language-Vision GPT (LV-GPT) model that extends the GPT2 model to include vision input (image). The proposed LV-GPT incorporates a feature extractor (vision tokenizer) and vision token embedding (token type and pose). Given the limitations of unidirectional attention in GPT models and their ability to generate coherent long paragraphs, we carefully sequence the word tokens before the vision tokens, mimicking the human thought process of understanding the question before inferring an answer from an image. Quantitatively, we show that the LV-GPT model outperforms other state-of-the-art VQA models on two publicly available surgical-VQA datasets (based on the endoscopic vision challenge robotic scene segmentation 2018 and CholecTriplet2021) and on our newly annotated dataset (based on the holistic surgical scene dataset). We further annotate all three datasets with question-type labels to allow sub-type analysis. Finally, we extensively study the effects of token sequencing, token type, and pose embedding for vision tokens in the LV-GPT model.
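The reason for placing word tokens before vision tokens can be seen from a causal (unidirectional) attention mask: each position attends only to itself and earlier positions, so vision tokens placed after the question can attend to every word of it. The sketch below builds such a sequence with per-position token-type ids and sequential position ids; the id values and the position scheme are assumptions, since the abstract does not detail LV-GPT's token type or pose embeddings.

```python
from typing import List, Tuple

# Hypothetical token-type ids for the two modalities.
TEXT_TYPE, VISION_TYPE = 0, 1


def build_lv_sequence(num_word_tokens: int, num_vision_tokens: int) -> Tuple[List[int], List[int]]:
    """Order word tokens before vision tokens; return token-type ids and
    position ids for each slot (sequential positions are an assumption)."""
    n = num_word_tokens + num_vision_tokens
    token_types = [TEXT_TYPE] * num_word_tokens + [VISION_TYPE] * num_vision_tokens
    positions = list(range(n))
    return token_types, positions


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True if position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]
```

With 3 word tokens and 2 vision tokens, the mask row for the last vision token is all `True`: it sees the full question, whereas under the opposite ordering the word tokens would be the ones seeing everything and the vision tokens would not see the question.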
OpenAI's trust and safety lead is leaving the company
OpenAI's trust and safety lead, Dave Willner, has left the position, as announced via a LinkedIn post. Willner is staying on in an "advisory role" but has asked LinkedIn followers to "reach out" for related opportunities. The former OpenAI project lead states that the move comes after a decision to spend more time with his family. Yes, that's what they always say, but Willner follows it up with actual details. "In the months following the launch of ChatGPT, I've found it more and more difficult to keep up my end of the bargain," he writes.
Google, Meta, Microsoft, OpenAI and more agree to voluntary AI safeguards
Several of the top American companies developing AI have agreed to work with the U.S. government and commit to several principles to ensure public trust in AI, the White House said Friday. Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI all signed off on the commitments to make AI safe, secure, and trustworthy. In May, the Biden administration had said that it would meet with leading AI developers to ensure that they were consistent with U.S. policy. The commitments are not binding, and there are no penalties for failing to adhere to them. The policies can't retroactively affect AI systems that have already been deployed, either: one of the provisions says that the companies will commit to testing the AI for security vulnerabilities, both internally and externally, before releasing it.
Amazon, Google, Meta, Microsoft And Others Agree To AI Safeguards Set By The White House
Amazon, Google, Meta, Microsoft and other companies that are leading the development of artificial intelligence technology have agreed to meet a set of AI safeguards brokered by President Joe Biden's administration. The White House said Friday that it has secured voluntary commitments from seven U.S. companies meant to ensure their AI products are safe before they release them. Some of the commitments call for third-party oversight of the workings of commercial AI systems, though they don't detail who will audit the technology or hold the companies accountable. A surge of commercial investment in generative AI tools that can write convincingly human-like text and churn out new images and other media has brought public fascination as well as concern about their ability to trick people and spread disinformation, among other dangers. The four tech giants, along with ChatGPT-maker OpenAI and startups Anthropic and Inflection, have committed to security testing "carried out in part by independent experts" to guard against major risks, such as to biosecurity and cybersecurity, the White House said in a statement.
White House gets seven AI developers to agree to safety, security, trust guidelines
The Biden administration announced Friday that seven of the nation's top artificial intelligence developers have agreed to guidelines aimed at ensuring the "safe" deployment of AI. Amazon, Anthropic, Google, Inflection, Meta, Microsoft and OpenAI all agreed to the guidelines and will participate in a Friday afternoon event with President Biden to tout the voluntary agreement. "Companies that are developing these emerging technologies have a responsibility to ensure their products are safe," the White House said in a Friday morning statement. "To make the most of AI's potential, the Biden-Harris Administration is encouraging this industry to uphold the highest standards to ensure that innovation doesn't come at the expense of Americans' rights and safety."