Large Language Model
GPT4Rec: A Generative Framework for Personalized Recommendation and User Interests Interpretation
Li, Jinming, Zhang, Wentao, Wang, Tian, Xiong, Guanglei, Lu, Alan, Medioni, Gerard
Recent advancements in Natural Language Processing (NLP) have led to the development of NLP-based recommender systems that have shown superior performance. However, current models commonly treat items as mere IDs and adopt discriminative modeling, resulting in limitations of (1) fully leveraging the content information of items and the language modeling capabilities of NLP models; (2) interpreting user interests to improve relevance and diversity; and (3) adapting practical circumstances such as growing item inventories. To address these limitations, we present GPT4Rec, a novel and flexible generative framework inspired by search engines. It first generates hypothetical "search queries" given item titles in a user's history, and then retrieves items for recommendation by searching these queries. The framework overcomes previous limitations by learning both user and item embeddings in the language space. To well-capture user interests with different aspects and granularity for improving relevance and diversity, we propose a multi-query generation technique with beam search. The generated queries naturally serve as interpretable representations of user interests and can be searched to recommend cold-start items. With GPT-2 language model and BM25 search engine, our framework outperforms state-of-the-art methods by $75.7\%$ and $22.2\%$ in Recall@K on two public datasets. Experiments further revealed that multi-query generation with beam search improves both the diversity of retrieved items and the coverage of a user's multi-interests. The adaptiveness and interpretability of generated queries are discussed with qualitative case studies.
Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering
Su, Hung-Ting, Niu, Yulei, Lin, Xudong, Hsu, Winston H., Chang, Shih-Fu
Causal Video Question Answering (CVidQA) queries not only association or temporal relations but also causal relations in a video. Existing question synthesis methods pre-trained question generation (QG) systems on reading comprehension datasets with text descriptions as inputs. However, QG models only learn to ask association questions (e.g., ``what is someone doing...'') and result in inferior performance due to the poor transfer of association knowledge to CVidQA, which focuses on causal questions like ``why is someone doing ...''. Observing this, we proposed to exploit causal knowledge to generate question-answer pairs, and proposed a novel framework, Causal Knowledge Extraction from Language Models (CaKE-LM), leveraging causal commonsense knowledge from language models to tackle CVidQA. To extract knowledge from LMs, CaKE-LM generates causal questions containing two events with one triggering another (e.g., ``score a goal'' triggers ``soccer player kicking ball'') by prompting LM with the action (soccer player kicking ball) to retrieve the intention (to score a goal). CaKE-LM significantly outperforms conventional methods by 4% to 6% of zero-shot CVidQA accuracy on NExT-QA and Causal-VidQA datasets. We also conduct comprehensive analyses and provide key findings for future research.
Towards Generating Functionally Correct Code Edits from Natural Language Issue Descriptions
Fakhoury, Sarah, Chakraborty, Saikat, Musuvathi, Madan, Lahiri, Shuvendu K.
Large language models (LLMs), such as OpenAI's Codex, have demonstrated their potential to generate code from natural language descriptions across a wide range of programming tasks. Several benchmarks have recently emerged to evaluate the ability of LLMs to generate functionally correct code from natural language intent with respect to a set of hidden test cases. This has enabled the research community to identify significant and reproducible advancements in LLM capabilities. However, there is currently a lack of benchmark datasets for assessing the ability of LLMs to generate functionally correct code edits based on natural language descriptions of intended changes. This paper aims to address this gap by motivating the problem NL2Fix of translating natural language descriptions of code changes (namely bug fixes described in Issue reports in repositories) into correct code fixes. To this end, we introduce Defects4J-NL2Fix, a dataset of 283 Java programs from the popular Defects4J dataset augmented with high-level descriptions of bug fixes, and empirically evaluate the performance of several state-of-the-art LLMs for the this task. Results show that these LLMS together are capable of generating plausible fixes for 64.6% of the bugs, and the best LLM-based technique can achieve up to 21.20% top-1 and 35.68% top-5 accuracy on this benchmark.
I used to work at Google and now I'm an AI researcher. Here's why slowing down AI development is wise
Is it time to put the brakes on the development of artificial intelligence (AI)? If you've quietly asked yourself that question, you're not alone. In the past week, a host of AI luminaries signed an open letter calling for a six-month pause on the development of more powerful models than GPT-4; European researchers called for tighter AI regulations; and long-time AI researcher and critic Eliezer Yudkowsky demanded a complete shutdown of AI development in the pages of TIME magazine. Meanwhile, the industry shows no sign of slowing down. In March, a senior AI executive at Microsoft reportedly spoke of "very, very high" pressure from chief executive Satya Nadella to get GPT-4 and other new models to the public "at a very high speed".
ChatGPT: Everything you need to know about the AI-powered chatbot
ChatGPT, OpenAI's text-generating AI chatbot, has taken the world by storm. It's able to write essays, code and more given short text prompts, hyper-charging productivity. But it also has a more…nefarious side. In any case, AI tools are not going away -- and indeed has expanded dramatically since its launch just a few months ago. Major brands are experimenting with it, using the AI to generate ad and marketing copy, for example.
ChatGPT is making up fake Guardian articles. Here's how we're responding
Last month one of our journalists received an interesting email. A researcher had come across mention of a Guardian article, written by the journalist on a specific subject from a few years before. But the piece was proving elusive on our website and in search. Had the headline perhaps been changed since it was launched? Had it been removed intentionally from the website because of a problem we'd identified?
The newest version of ChatGPT passed the US medical licensing exam with flying colors -- and diagnosed a 1 in 100,000 condition in seconds
Dr. Isaac Kohane, who's both a computer scientist at Harvard and a physician, teamed up with two colleagues to test drive GPT-4, with one main goal: To see how the newest artificial intelligence model from OpenAI performed in a medical setting. "I'm stunned to say: better than many doctors I've observed," he says in the forthcoming book, "The AI Revolution in Medicine," co-authored by independent journalist Carey Goldberg, and Microsoft vice president of research Peter Lee. In the book, Kohane says GPT-4, which was released in March 2023 to paying subscribers, answers US medical exam licensing questions correctly more than 90% of the time. It's a much better test-taker than previous ChatGPT AI models, GPT-3 and -3.5, and a better one than some licensed doctors, too. GPT-4 is not just a good test-taker and fact finder, though.
Can ChatGPT Be a Doctor? Bot Passes Medical Exam, Diagnoses Conditions
Top editors give you the stories you want -- delivered right to your inbox each weekday. Dr. Isaac Kohane, who's both a computer scientist at Harvard and a physician, teamed up with two colleagues to test drive GPT-4, with one main goal: To see how the newest artificial intelligence model from OpenAI performed in a medical setting. "I'm stunned to say: better than many doctors I've observed," he says in the forthcoming book, "The AI Revolution in Medicine," co-authored by independent journalist Carey Goldberg, and Microsoft vice president of research Peter Lee. In the book, Kohane says GPT-4, which was released in March 2023 to paying subscribers, answers US medical exam licensing questions correctly more than 90% of the time. It's a much better test-taker than previous ChatGPT AI models, GPT-3 and -3.5, and a better one than some licensed doctors, too.
The surprising ease and effectiveness of AI in a loop (Interconnected)
AI is still in the foothills of its adoption S-curve, and I love this period of any new technology – the scope of what it can do is unknown, so the main job is to stretch the imagination and try out things. Anyway, the tech am I digging recently is a software framework called LangChain (here are the docs) which does something pretty straightforward: it makes it easy to call OpenAI's GPT, say, a dozen times in a loop to answer a single question, and mix in queries to Wikipedia and other databases. This is a big deal because of a technique called ReAct from a paper out of Princeton and Google Research (the ReAct website links to the Nov 2022 paper, sample code, etc). ReAct looks innocuous but here's the deal: instead of asking GPT to simply do smart-autocomplete on your text, you prompt it to respond in a thought/act/observation loop. Thought: Let's think step by step.