Large Language Model
Elon Musk Signs Open Letter Urging AI Labs to Pump the Brakes
An open letter with signatures from hundreds of the biggest names in tech, including Elon Musk, has urged the world's leading artificial intelligence labs to pause the training of new super-powerful systems for six months, saying that recent advances in AI present "profound risks to society and humanity." The letter comes just two weeks after the public release of OpenAI's GPT-4, the most powerful AI system ever released, which has led researchers to slash their expectations for when AGI--or artificial general intelligence that surpasses human cognitive ability--will arrive. Many experts fear that, as an AI arms race heats up, humanity is sleepwalking into catastrophe. "Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources," the letter says. "Unfortunately, this level of planning and management is not happening, even though recent months have seen AI labs locked in an out-of-control race to develop and deploy ever more powerful digital minds that no one – not even their creators – can understand, predict, or reliably control."
3 Different Organizations And How They Use OpenAI Technology - cyberpogo
OpenAI is a research organization focusing on artificial intelligence (AI) development. They create advanced AI technologies using machine learning, deep learning, and natural language processing. OpenAI technology is built on a foundation of neural networks, algorithms modeled after how the human brain processes information. These networks consist of interconnected nodes that simulate neurons, allowing the system to learn from large datasets and make predictions based on that data. GPT (Generative Pre-trained Transformer), a deep learning model capable of creating human-like language, is one of the core technologies developed by OpenAI.
Microsoft introduces an A.I. chatbot for cybersecurity experts
Microsoft on Tuesday announced a chatbot designed to help cybersecurity professionals understand critical issues and find ways to fix them. The company has been busy bolstering its software with artificial intelligence models from startup OpenAI after OpenAI's ChatGPT bot captured the public imagination following its November debut. The resulting generative AI software can at times be "usefully wrong," as Microsoft put it earlier this month when talking up new features in Word and other productivity apps. But Microsoft is proceeding nevertheless, as it seeks to keep growing a cybersecurity business that fetched more than $20 billion in 2022 revenue. The Microsoft Security Copilot draws on GPT-4, the latest large language model from OpenAI -- in which Microsoft has invested billions -- and a security-specific model Microsoft built using daily activity data it gathers.
Questions of science: chatting with ChatGPT about complex systems
Crokidakis, Nuno, de Menezes, Marcio Argollo, Cajueiro, Daniel O.
This is a great era for researchers and scientists studying and developing the field of complex systems. Half of the 2021 Nobel Prize in Physics was awarded to the physicist Giorgio Parisi for his contributions to the theory of complex systems [9], and the other half to the meteorologists Syukuro Manabe and Klaus Hasselmann for the modeling of the Earth's climate [10]. Parisi has made significant contributions to the literature on complex systems, including areas such as spin glass [11, 12, 13], stochastic resonance [14], surface growth [15], multifractality [16], and bird flocking [17].
Text revision in Scientific Writing Assistance: An Overview
Jourdan, Léane, Boudin, Florian, Dufour, Richard, Hernandez, Nicolas
Writing a scientific article is a challenging task as it is a highly codified genre. Good writing skills are essential to properly convey ideas and results of research work. Since the majority of scientific articles are currently written in English, this exercise is all the more difficult for non-native English speakers as they additionally have to face language issues. This article aims to provide an overview of text revision in writing assistance in the scientific domain. We will examine the specificities of scientific writing, including the format and conventions commonly used in research articles. Additionally, this overview will explore the various types of writing assistance tools available for text revision. Despite the evolution of the technology behind these tools through the years, from rule-based approaches to deep neural-based ones, challenges still exist (tools' accessibility, limited consideration of the context, inexplicit use of discursive information, etc.).
Zero-shot Entailment of Leaderboards for Empirical AI Research
Kabongo, Salomon, D'Souza, Jennifer, Auer, Sören
We present a large-scale empirical investigation of the zero-shot learning phenomena in a specific recognizing textual entailment (RTE) task category, i.e. the automated mining of leaderboards for Empirical AI Research. The previously reported state-of-the-art models for leaderboard extraction formulated as an RTE task, in a non-zero-shot setting, are promising, with reported performances above 90%. However, a central research question remains unexamined: did the models actually learn entailment? Thus, for the experiments in this paper, two previously reported state-of-the-art models are tested out-of-the-box for their ability to generalize or their capacity for entailment, given leaderboard labels that were unseen during training. We hypothesize that if the models learned entailment, their zero-shot performances can be expected to be moderately high as well, or perhaps, concretely, better than chance. As a result of this work, a zero-shot labeled dataset is created via distant labeling formulating the leaderboard extraction RTE task.
DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents
Nair, Varun, Schumacher, Elliot, Tso, Geoffrey, Kannan, Anitha
Large language models (LLMs) have emerged as valuable tools for many natural language understanding tasks. In safety-critical applications such as healthcare, the utility of these models is governed by their ability to generate outputs that are factually accurate and complete. In this work, we present dialog-enabled resolving agents (DERA). DERA is a paradigm made possible by the increased conversational abilities of LLMs, namely GPT-4. It provides a simple, interpretable forum for models to communicate feedback and iteratively improve output. We frame our dialog as a discussion between two agent types - a Researcher, who processes information and identifies crucial problem components, and a Decider, who has the autonomy to integrate the Researcher's information and makes judgments on the final output. We test DERA against three clinically-focused tasks. For medical conversation summarization and care plan generation, DERA shows significant improvement over the base GPT-4 performance in both human expert preference evaluations and quantitative metrics. In a new finding, we also show that GPT-4's performance (70%) on an open-ended version of the MedQA question-answering (QA) dataset (Jin et al. 2021, USMLE) is well above the passing level (60%), with DERA showing similar performance. We release the open-ended MEDQA dataset at https://github.com/curai/curai-research/tree/main/DERA.
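The Researcher/Decider exchange described in this abstract can be pictured as a simple turn-taking loop. The sketch below is only an illustration under stated assumptions: the function `dera_dialog`, the `max_turns` limit, the `DONE` stop signal, and both toy agents are hypothetical stand-ins (plain Python functions instead of GPT-4 calls) and are not taken from the paper.

```python
from typing import Callable, List, Tuple

# An agent sees the task input, the current output, and the dialog so far.
Agent = Callable[[str, str, List[str]], str]


def dera_dialog(task_input: str, initial_output: str,
                researcher: Agent, decider: Agent,
                max_turns: int = 3, stop_token: str = "DONE") -> Tuple[str, List[str]]:
    """Alternate Researcher feedback and Decider revisions (hypothetical loop)."""
    output = initial_output
    transcript: List[str] = []
    for _ in range(max_turns):
        # Researcher inspects the input and the current output, then comments.
        feedback = researcher(task_input, output, transcript)
        transcript.append("Researcher: " + feedback)
        if feedback.strip() == stop_token:  # assumed stop signal
            break
        # Decider chooses how to fold the feedback into the output.
        output = decider(task_input, output, transcript)
        transcript.append("Decider: " + output)
    return output, transcript


# Toy agents so the loop can run without any model access.
def toy_researcher(task_input: str, output: str, transcript: List[str]) -> str:
    for term in ("fever", "cough"):
        if term in task_input and term not in output:
            return "Missing: " + term
    return "DONE"


def toy_decider(task_input: str, output: str, transcript: List[str]) -> str:
    term = transcript[-1].rsplit("Missing: ", 1)[1]
    return f"{output} Also reports {term}."


final_output, log = dera_dialog(
    "Patient reports fever and cough for three days.",
    "Patient unwell.",
    toy_researcher,
    toy_decider,
)
```

Keeping the transcript as a plain list of strings mirrors the abstract's emphasis on an interpretable forum: every piece of feedback and every revision can be read back after the run.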
Zero-Shot Retrieval with Search Agents and Hybrid Environments
Huebscher, Michelle Chen, Buck, Christian, Ciaramita, Massimiliano, Rothe, Sascha
Learning to search is the task of building artificial agents that learn to autonomously use a search box to find information. So far, it has been shown that current language models can learn symbolic query reformulation policies, in combination with traditional term-based retrieval, but fall short of outperforming neural retrievers. We extend the previous learning to search setup to a hybrid environment, which accepts discrete query refinement operations, after a first-pass retrieval step via a dual encoder. Experiments on the BEIR task show that search agents, trained via behavioral cloning, outperform the underlying search system based on a combined dual encoder retriever and cross encoder reranker. Furthermore, we find that simple heuristic Hybrid Retrieval Environments (HRE) can improve baseline performance by several nDCG points. The search agent based on HRE (HARE) matches state-of-the-art performance, balanced in both zero-shot and in-domain evaluations, via interpretable actions, and at twice the speed.
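The abstract reports gains in nDCG points. As a reminder of what that metric measures, here is a minimal sketch of nDCG@k using one common definition (linear gain, log2 rank discount). The function names and the two-argument signature are assumptions for illustration; the paper's exact evaluation setup is not reproduced here.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions (rank 1 has discount 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float],
              judged_relevances: Sequence[float], k: int) -> float:
    """DCG of the system ranking divided by the DCG of the ideal ranking.

    ranked_relevances: relevance grades of the returned documents, in rank order.
    judged_relevances: grades of all judged documents for the query (any order).
    """
    ideal = dcg_at_k(sorted(judged_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Two relevant documents exist; the system places them at ranks 1 and 3.
score = ndcg_at_k([1, 0, 1], [1, 1, 0], k=3)
```

On the 0 to 1 scale used here, a one-point improvement is commonly read as a difference of 0.01.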
RetClean: Retrieval-Based Data Cleaning Using Foundation Models and Data Lakes
Ahmad, Mohammad Shahmeer, Naeem, Zan Ahmad, Eltabakh, Mohamed, Ouzzani, Mourad, Tang, Nan
Can foundation models (such as ChatGPT) clean your data? In this proposal, we demonstrate that ChatGPT can indeed assist in data cleaning by suggesting corrections for specific cells in a data table (scenario 1). However, ChatGPT may struggle with datasets it has never encountered before (e.g., local enterprise data) or when the user requires an explanation of the source of the suggested clean values. To address these issues, we developed a retrieval-based method that complements ChatGPT's power with a user-provided data lake. The data lake is first indexed; we then retrieve the top-k tuples most relevant to the user's query tuple and finally leverage ChatGPT to infer the correct value (scenario 2). Nevertheless, sharing enterprise data with ChatGPT, an externally hosted model, might not be feasible for privacy reasons. To assist with this scenario, we developed a custom RoBERTa-based foundation model that can be locally deployed. By fine-tuning it on a small number of examples, it can effectively make value inferences based on the retrieved tuples (scenario 3). Our proposed system, RetClean, seamlessly supports all three scenarios and provides a user-friendly GUI that enables the VLDB audience to explore and experiment with the system.
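The index, retrieve, infer pipeline of scenario 2 can be sketched in a few lines. Everything below is a hypothetical illustration: the abstract does not say how the data lake is indexed or scored, so this sketch assumes token-set indexing with Jaccard similarity, and it replaces the ChatGPT or RoBERTa inference step with a plain majority vote so that it runs offline.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Row = Dict[str, Optional[str]]


def tokenize(row: Row) -> Set[str]:
    """Lowercased tokens from every non-missing cell of a row."""
    tokens: Set[str] = set()
    for value in row.values():
        if value is not None:
            tokens.update(str(value).lower().split())
    return tokens


def build_index(data_lake: List[Row]) -> List[Tuple[Row, Set[str]]]:
    """Precompute token sets once so each query only needs set overlap."""
    return [(row, tokenize(row)) for row in data_lake]


def retrieve_top_k(index: List[Tuple[Row, Set[str]]], query_row: Row, k: int = 2) -> List[Row]:
    """Rank data-lake rows by Jaccard similarity with the query row (assumed scoring)."""
    query_tokens = tokenize(query_row)
    scored = []
    for row, tokens in index:
        union = query_tokens | tokens
        score = len(query_tokens & tokens) / len(union) if union else 0.0
        scored.append((score, row))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for score, row in scored[:k] if score > 0]


def infer_value(retrieved: List[Row], column: str) -> Optional[str]:
    """Stand-in for the model call: majority vote over the retrieved values."""
    values = [row[column] for row in retrieved if row.get(column) is not None]
    return Counter(values).most_common(1)[0][0] if values else None


lake = [
    {"name": "Alice Smith", "city": "Boston", "zip": "02108"},
    {"name": "Bob Jones", "city": "Denver", "zip": "80202"},
    {"name": "Alice Smyth", "city": "Boston", "zip": "02108"},
]
dirty = {"name": "Alice Smith", "city": "Boston", "zip": None}
evidence = retrieve_top_k(build_index(lake), dirty, k=2)
suggested_zip = infer_value(evidence, "zip")  # "02108" for this toy lake
```

Returning the retrieved rows alongside the suggestion is what lets a user see where a proposed clean value came from, which is one of the gaps the abstract says plain ChatGPT leaves open.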
Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams
Nunes, Desnes, Primi, Ricardo, Pires, Ramon, Lotufo, Roberto, Nogueira, Rodrigo
The present study aims to explore the capabilities of Language Models (LMs) in tackling high-stakes multiple-choice tests, represented here by the Exame Nacional do Ensino Médio (ENEM), a multidisciplinary entrance examination widely adopted by Brazilian universities. This exam poses challenging tasks for LMs, since its questions may span multiple fields of knowledge, requiring understanding of information from diverse domains. For instance, a question may require comprehension of both statistics and biology to be solved. This work analyzed responses generated by GPT-3.5 and GPT-4 models for questions presented in the 2009-2017 exams, as well as for questions of the 2022 exam, which were made public after the training of the models was completed. Furthermore, different prompt strategies were tested, including the use of Chain-of-Thought (CoT) prompts to generate explanations for answers. On the 2022 edition, the best-performing model, GPT-4 with CoT, achieved an accuracy of 87%, surpassing GPT-3.5 by 11 points. The code and data used in the experiments are available at https://github.com/piresramon/gpt-4-enem.
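To make the evaluation setup concrete, the sketch below shows one way a Chain-of-Thought prompt for a multiple-choice question could be assembled and how accuracy could be scored from free-text responses. The prompt wording, the `Answer: <letter>` convention, the A to E option labels, and all function names are assumptions for illustration; the authors' actual prompts and scoring code are in the repository linked above.

```python
import re
from typing import List, Optional, Sequence


def build_cot_prompt(question: str, options: Sequence[str]) -> str:
    """Format a question with lettered options and ask for step-by-step reasoning."""
    lines = [f"Question: {question}"]
    for letter, text in zip("ABCDE", options):
        lines.append(f"{letter}. {text}")
    lines.append("Explain your reasoning step by step, "
                 "then finish with 'Answer: <letter>'.")
    return "\n".join(lines)


def extract_answer(response: str) -> Optional[str]:
    """Take the last 'Answer: X' in the response, so the explanation is ignored."""
    matches = re.findall(r"Answer:\s*([A-E])", response)
    return matches[-1] if matches else None


def accuracy(responses: List[str], gold: List[str]) -> float:
    """Fraction of responses whose extracted letter matches the gold letter."""
    correct = sum(extract_answer(r) == g for r, g in zip(responses, gold))
    return correct / len(gold)


prompt = build_cot_prompt("What is 2 + 2?", ["3", "4", "5"])
toy_responses = [
    "Adding two and two gives four. Answer: B",
    "At first it looks like Answer: A, but on reflection... Answer: C",
    "I am not sure.",
]
toy_accuracy = accuracy(toy_responses, ["B", "C", "D"])
```

A response with no parsable answer counts as wrong here, which is a conservative scoring choice rather than something the abstract specifies.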