Large Language Model
You Can Generate It Again: Data-to-text Generation with Verification and Correction Prompting
Despite significant advancements in existing models, generating text descriptions from structured data input, known as data-to-text generation, remains a challenging task. In this paper, we propose a novel approach that goes beyond traditional one-shot generation methods by introducing a multi-step process consisting of generation, verification, and correction stages. Our approach, VCP (Verification and Correction Prompting), begins with the model generating an initial output. We then verify the correctness of different aspects of the generated text. The observations from the verification step are converted into a specialized error-indication prompt, which instructs the model to regenerate the output while taking the identified errors into account. To enhance the model's correction ability, we have developed a carefully designed training procedure that enables the model to incorporate feedback from the error-indication prompt, resulting in improved output generation. Experimental results demonstrate that our approach effectively reduces slot error rates while maintaining the overall quality of the generated text.
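The generate, verify, correct loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the verifier detects slot errors by looking for each slot value verbatim in the output, and the helper names (`find_missing_slots`, `build_error_prompt`, `generate_verify_correct`), the prompt wording, and the `generate` callable standing in for the model are all hypothetical. It covers only inference-time prompting, not the training procedure that teaches the model to use the error-indication prompt.

```python
from typing import Callable, Dict, List


def find_missing_slots(data: Dict[str, str], text: str) -> List[str]:
    # Hypothetical verifier: a slot counts as realized only if its value
    # appears verbatim (case-insensitive) in the generated text.
    lowered = text.lower()
    return [slot for slot, value in data.items() if value.lower() not in lowered]


def build_error_prompt(data: Dict[str, str], draft: str, missing: List[str]) -> str:
    # Turn the verifier's observations into an error-indication prompt.
    listed = ", ".join(f"{slot}={data[slot]}" for slot in missing)
    return (
        f"Data: {data}\n"
        f"Previous output: {draft}\n"
        f"The previous output omitted: {listed}. "
        "Regenerate the description so that it covers every slot."
    )


def generate_verify_correct(
    data: Dict[str, str],
    generate: Callable[[str], str],  # hypothetical stand-in for the model: prompt -> text
    max_rounds: int = 2,
) -> str:
    # Stage 1: initial generation.
    draft = generate(f"Describe the following data: {data}")
    for _ in range(max_rounds):
        # Stage 2: verification.
        missing = find_missing_slots(data, draft)
        if not missing:
            break  # verification passed, nothing to correct
        # Stage 3: correction, regenerating with the error-indication prompt.
        draft = generate(build_error_prompt(data, draft, missing))
    return draft
```

Stopping as soon as verification passes means outputs that are already correct cost no extra model calls, while `max_rounds` caps the number of regenerations for outputs that keep failing.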
Extending Context Window of Large Language Models via Positional Interpolation
Chen, Shouyuan, Wong, Sherman, Chen, Liangjian, Tian, Yuandong
We present Position Interpolation (PI), which extends the context window sizes of RoPE-based pretrained LLMs such as LLaMA models up to 32768 with minimal fine-tuning (within 1000 steps), while demonstrating strong empirical results on various tasks that require long context, including passkey retrieval, language modeling, and long document summarization, on LLaMA models from 7B to 65B. Meanwhile, models extended by Position Interpolation preserve quality relatively well on tasks within their original context window. To achieve this goal, Position Interpolation linearly down-scales the input position indices to match the original context window size, rather than extrapolating beyond the trained context length, which may lead to catastrophically high attention scores that completely ruin the self-attention mechanism. Our theoretical study shows that the upper bound of interpolation is at least $\sim 600 \times$ smaller than that of extrapolation, further demonstrating its stability. Models extended via Position Interpolation retain their original architecture and can reuse most pre-existing optimization and infrastructure.
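The core of Position Interpolation fits in a few lines. The sketch below assumes the standard RoPE frequency schedule (base 10000, with the i-th pair of dimensions rotating at base^(-2i/d)) and an original window of L = 2048 tokens as in LLaMA; a position m in the extended window L' is rescaled to m * L / L' before the rotation angles are computed. The function names are hypothetical, and the short fine-tuning stage is not shown.

```python
from typing import List


def rope_angles(position: float, dim: int, base: float = 10000.0) -> List[float]:
    # Standard RoPE: the i-th pair of dimensions rotates by position * base^(-2i/dim).
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]


def interpolated_position(m: int, original_len: int, extended_len: int) -> float:
    # Position Interpolation: linearly down-scale index m by L / L' so that
    # every position in [0, L') maps into the trained range [0, L).
    return m * original_len / extended_len


def pi_rope_angles(m: int, dim: int, original_len: int, extended_len: int) -> List[float]:
    # Rotation angles for position m in the extended window, computed from the
    # interpolated (possibly fractional) position instead of extrapolating past L.
    return rope_angles(interpolated_position(m, original_len, extended_len), dim)
```

For example, with L = 2048 and L' = 8192, position 4096 receives exactly the rotation angles that position 1024 had during pre-training, and the last position, 8191, maps to 2047.75, so no angle falls outside the trained range.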
PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture
We investigate the role of attention and memory in complex reasoning tasks. We analyze Transformer-based self-attention as a model and extend it with memory. By studying the Synthetic Visual Reasoning Test (SVRT), we refine the taxonomy of reasoning tasks. By combining self-attention with ResNet50, we enhance feature maps using feature-based and spatial attention, which allows challenging visual reasoning tasks to be solved efficiently. Our findings contribute to understanding the attentional needs of SVRT tasks. Additionally, we propose GAMR, a cognitive architecture combining attention and memory, inspired by active vision theory. GAMR outperforms other architectures in sample efficiency, robustness, and compositionality, and shows zero-shot generalization on new reasoning tasks.
Xplainer: From X-Ray Observations to Explainable Zero-Shot Diagnosis
Pellegrini, Chantal, Keicher, Matthias, Özsoy, Ege, Jiraskova, Petra, Braren, Rickmer, Navab, Nassir
Automated diagnosis prediction from medical images is a valuable resource to support clinical decision-making. However, such systems usually need to be trained on large amounts of annotated data, which is often scarce in the medical domain. Zero-shot methods address this challenge by allowing flexible adaptation to new settings with different clinical findings without relying on labeled data. Further, to integrate automated diagnosis into the clinical workflow, methods should be transparent and explainable, increasing medical professionals' trust and facilitating correctness verification. In this work, we introduce Xplainer, a novel framework for explainable zero-shot diagnosis in the clinical setting. Xplainer adapts the classification-by-description approach of contrastive vision-language models to the multi-label medical diagnosis task. Specifically, instead of directly predicting a diagnosis, we prompt the model to classify the existence of descriptive observations, which a radiologist would look for on an X-ray scan, and use the descriptor probabilities to estimate the likelihood of a diagnosis. Our model is explainable by design, as the final diagnosis prediction is directly based on the prediction of the underlying descriptors. We evaluate Xplainer on two chest X-ray datasets, CheXpert and ChestX-ray14, and demonstrate its effectiveness in improving the performance and explainability of zero-shot diagnosis. Our results suggest that Xplainer provides a more detailed understanding of the decision-making process and can be a valuable tool for clinical diagnosis.
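The step from descriptor probabilities to a diagnosis can be sketched as below. This is a hedged illustration under stated assumptions, not Xplainer's code: each descriptor is scored with a positive and a negative prompt (e.g. "There is X" vs. "There is no X") whose image-text similarities have already been computed by a CLIP-style model, a softmax over that pair gives the probability that the observation is present, and the diagnosis score is the mean of its descriptor probabilities. The function names, prompts, and similarity values are hypothetical.

```python
import math
from typing import Dict, Tuple


def descriptor_probability(sim_present: float, sim_absent: float, temperature: float = 1.0) -> float:
    # Softmax over the two prompt similarities; returns P(observation present).
    a, b = sim_present / temperature, sim_absent / temperature
    shift = max(a, b)  # subtract the max for numerical stability
    ea, eb = math.exp(a - shift), math.exp(b - shift)
    return ea / (ea + eb)


def diagnose(descriptors: Dict[str, Tuple[float, float]]) -> Tuple[float, Dict[str, float]]:
    # descriptors maps a descriptor name to (similarity to "present" prompt,
    # similarity to "absent" prompt). Assumption: the diagnosis score is the
    # mean descriptor probability.
    probs = {name: descriptor_probability(p, n) for name, (p, n) in descriptors.items()}
    score = sum(probs.values()) / len(probs)
    return score, probs
```

Returning the per-descriptor probabilities alongside the score is what keeps the prediction inspectable: a reader can see which observations pushed the diagnosis up or down.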
Can AI-Generated Text be Reliably Detected?
Sadasivan, Vinu Sankar, Kumar, Aounon, Balasubramanian, Sriram, Wang, Wenxiao, Feizi, Soheil
In this paper, we show both empirically and theoretically that several AI-text detectors are not reliable in practical scenarios. Empirically, we show that paraphrasing attacks, where a light paraphraser is applied on top of a large language model (LLM), can break a whole range of detectors, including ones using watermarking schemes as well as neural network-based detectors and zero-shot classifiers. Our experiments demonstrate that retrieval-based detectors, designed to withstand paraphrasing attacks, are still vulnerable to recursive paraphrasing. We then provide a theoretical impossibility result indicating that as language models become more sophisticated and better at emulating human text, the performance of even the best-possible detector decreases. For a sufficiently advanced language model seeking to imitate human text, even the best-possible detector may only perform marginally better than a random classifier. Our result is general enough to capture specific scenarios such as particular writing styles, clever prompt design, or text paraphrasing. We also extend the impossibility result to include the case where pseudorandom number generators are used for AI-text generation instead of true randomness, and show that the same result holds with a negligible correction term for all polynomial-time computable detectors. Finally, we show that even LLMs protected by watermarking schemes can be vulnerable to spoofing attacks, in which adversarial humans infer hidden LLM text signatures and add them to human-generated text so that it is detected as text generated by the LLMs, potentially causing reputational damage to their developers. We believe these results can open an honest conversation in the community regarding the ethical and reliable use of AI-generated text.
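The impossibility result can be made concrete with the AUROC bound stated in the paper's main theorem: for any detector $D$, $\mathrm{AUROC}(D) \le \frac{1}{2} + \mathrm{TV}(\mathcal{M}, \mathcal{H}) - \frac{\mathrm{TV}(\mathcal{M}, \mathcal{H})^2}{2}$, where TV is the total variation distance between the distributions of machine-generated and human-written text. The helper below simply evaluates that bound; its name is hypothetical and the TV values in the loop are illustrative inputs, not measurements from the paper.

```python
def auroc_upper_bound(tv: float) -> float:
    # Best achievable AUROC for any detector, given the total variation
    # distance tv between machine-generated and human-written text.
    if not 0.0 <= tv <= 1.0:
        raise ValueError("total variation distance must lie in [0, 1]")
    return 0.5 + tv - tv ** 2 / 2


# Illustrative TV values only; the bound shrinks toward 0.5 as TV shrinks.
for tv in (0.0, 0.1, 0.5, 1.0):
    print(f"TV = {tv:.1f} -> AUROC <= {auroc_upper_bound(tv):.3f}")
```

As a language model's output distribution approaches that of human text (TV approaching 0), the bound approaches 0.5, the AUROC of a random classifier, which is the formal sense in which detection becomes unreliable.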
QR-CLIP: Introducing Explicit Open-World Knowledge for Location and Time Reasoning
Shi, Weimin, Zhuge, Mingchen, Gao, Dehong, Zhou, Zhong, Cheng, Ming-Ming, Fan, Deng-Ping
Everyday images may convey abstract meanings that require us to memorize and infer profound information from them. To encourage such human-like reasoning, in this work we teach machines to predict where and when an image was taken, rather than performing basic tasks like traditional segmentation or classification. Inspired by Horn's QR theory, we design a novel QR-CLIP model consisting of two components: 1) the Quantity module first retrieves additional open-world knowledge as candidate language inputs; 2) the Relevance module carefully weighs vision and language cues and infers the location and time. Experiments show QR-CLIP's effectiveness: it outperforms the previous SOTA by an average relative lift of about 10% on location reasoning and 130% on time reasoning. This study lays a technical foundation for location and time reasoning and suggests that effectively introducing open-world knowledge is one of the panaceas for these tasks.
From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams
Drori, Iddo, Zhang, Sarah J., Shuttleworth, Reece, Zhang, Sarah, Tyser, Keith, Chin, Zad, Lantigua, Pedro, Surbehera, Saisamrit, Hunter, Gregory, Austin, Derek, Tang, Leonard, Hicke, Yann, Simhon, Sage, Karnik, Sathwik, Granberry, Darnell, Udell, Madeleine
A final exam in machine learning at a top institution such as MIT, Harvard, or Cornell typically takes faculty days to write, and students hours to solve. We demonstrate that large language models pass machine learning finals at a human level, on finals available online after the models were trained, and automatically generate new human-quality final exam questions in seconds. Previous work has developed program synthesis and few-shot learning methods to solve university-level problem set questions in mathematics and STEM courses. In this work, we develop and compare methods that solve final exams, which differ from problem sets in several ways: the questions are longer, have multiple parts, are more complicated, and span a broader set of topics. We curate a dataset and benchmark of questions from machine learning final exams available online, together with code for answering these questions and generating new ones. We show how to generate new questions from other questions and course notes. For reproducibility and future research on this final exam benchmark, we use automatic checkers for multiple-choice, numeric, and expression answers. We perform ablation studies comparing zero-shot learning with few-shot learning and chain-of-thought prompting using GPT-3, OPT, Codex, and ChatGPT across machine learning topics and find that few-shot learning methods perform best. We highlight the transformative potential of language models to streamline the writing and solution of large-scale assessments, significantly reducing the workload from human days to mere machine seconds. Our results suggest that rather than banning large language models such as ChatGPT in class, instructors should teach students to harness them by asking students meta-questions about the correctness, completeness, and originality of the generated responses, encouraging critical thinking in academic studies.
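Automatic checkers of the kind mentioned above can be sketched for the multiple-choice and numeric cases. The normalization rule, the choice of the first number in the answer, and the 1% relative tolerance are assumptions for illustration, not the benchmark's actual grading code; expression answers, which need a symbolic equivalence check, are not covered.

```python
import math
import re

# Matches integers, decimals, and scientific notation such as "1e-3".
NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def _normalize_choice(answer: str) -> str:
    # Reduce answers like "(b)" or "B." to a bare upper-case option letter.
    return answer.strip().strip("().").strip().upper()


def check_multiple_choice(predicted: str, correct: str) -> bool:
    return _normalize_choice(predicted) == _normalize_choice(correct)


def check_numeric(predicted: str, correct: float, rel_tol: float = 1e-2) -> bool:
    # Take the first number that appears in the answer and compare it with the
    # reference up to a relative tolerance (assumed 1%); the small abs_tol lets
    # a reference value of exactly 0 still match.
    match = NUMBER.search(predicted)
    if match is None:
        return False
    return math.isclose(float(match.group()), correct, rel_tol=rel_tol, abs_tol=1e-9)
```

Taking the first number is a deliberately simple rule; a model answer that shows intermediate values before the final one would need a stricter extraction step.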
ChatGPT's Storytelling Chops Are No Match for Dungeons & Dragons
Our overeager party--an elvish druid; a dwarven wizard; a halfling rogue; and a human paladin--has arrived at a dusty, cluttered library. Hearing of our quest for the fabled Orb of Zarekath, the head librarian--Thimblewick, a gnome--recounts how it was once "a powerful artifact" that has long since disappeared in the nearby ruined city. But the rogue is less curious about Orb-lore and more interested in snooping and stealing from the nearby shelves. Sneaking into the shadows, he's caught by a librarian. "Oh, sorry," the rogue says with a disarming smile.
Hear a good Sunday sermon? AI ready to make preacher's words count all week long
Church leaders and volunteers will soon have access to an artificial intelligence platform that aims to shave hours off their day-to-day tasks by generating content from sermons to engage fellow Christians when they are not in the pews. Upcoming platform Pulpit AI, founded by Michael Whittle, is expected to launch later this summer and will serve as a tool for Christian leaders looking to take the tedious work out of crafting religious blog posts, devotionals, prayer guides and social media posts. "We want to help pastors of small to medium-sized churches be able to make content for their congregations to interact with throughout the week and on social media," Whittle told Fox News Digital. "We think every pastor should, if they want, have a digital signal to their congregations beyond the sermon." "Most small to medium-sized churches have small or completely volunteer staff, so they have zero operational leverage when it comes to media and resources for their church," he added. "If we can help a church media team get past the blank page, we can not only save them crazy amounts of time, we can help every church become a resourcing church for their people." Pulpit AI "doesn't and never will" generate sermons; instead, it serves as a tool where the user uploads a sermon or religious podcast in order to repurpose it into "social media highlights, blog posts, discussion questions, and the other content churches use to reach their congregations and communities day in and day out," Whittle said. "Pulpit AI analyzes long form audio and video, then repurposes that into various forms of content," Whittle said.
"Pulpit AI's output is taken directly from the source material.
Meet the Humans Trying to Keep Us Safe From AI
A year ago, the idea of holding a meaningful conversation with a computer was the stuff of science fiction. But since OpenAI's ChatGPT launched last November, life has started to feel more like a techno-thriller with a fast-moving plot. Chatbots and other generative AI tools are beginning to profoundly change how people live and work. But whether this plot turns out to be uplifting or dystopian will depend on who helps write it. Thankfully, just as artificial intelligence is evolving, so is the cast of people who are building and studying it.