 Large Language Model


D-ID unveils new chat API to enable face-to-face conversations with an AI digital human

#artificialintelligence

D-ID, the Israeli company leveraging artificial intelligence to create unique experiences like Deep Nostalgia, announced today that it's launching a new chat API to enable face-to-face conversations with an AI digital human. The announcement was timed to coincide with Mobile World Congress (MWC), which is taking place in Barcelona this week. The company is currently offering the API to enterprises for branding and customer experience purposes. The premise of the API is to provide a "human" interface for conversational AI. In a press release, D-ID said that with its new real-time streaming capabilities and its text-to-video technology, clients can integrate the power of large language models like GPT-3 and LaMDA to deploy interactive digital humans.


All about ChatGPT - Massive Resources & Collection On The Internet Planet - OMFiNiTiVE

#artificialintelligence

Awesome ChatGPT A curated list of awesome ChatGPT resources and GPT-3 from OpenAI, libraries, SDKs, APIs, extensions, tools, apps, and much more. ChatGPT app ChatGPT overview ChatGPT Discord OpenAI API Documentation chatGPT launch blog Developer Libraries, SDKs, and APIs Python ChatGPT: Lightweight package for interacting with ChatGPT's API by OpenAI. Uses reverse engineered official […]


Sam Altman is tech's next household name -- if we survive the killer robots

#artificialintelligence

Sam Altman may be tech's next household name, but many Americans probably haven't heard of him. To anyone outside San Francisco, Altman would probably seem like just another young tech CEO. He's a Stanford University dropout who sold a tech startup years ago for a fortune, and he's spent the past decade investing and coaching other entrepreneurs. He posts confident and sunny life advice on Twitter and peppers his conversation with references to line graphs. But in the past three months, Altman, 37, has rocketed to the top of the tech industry's power rankings on the back of OpenAI.


German publisher Axel Springer says journalists could be replaced by AI

The Guardian

Journalists are at risk of being replaced by artificial intelligence systems like ChatGPT, the CEO of German media group Axel Springer has said. The announcement was made as the publisher sought to boost revenue at German newspapers Bild and Die Welt and transition to becoming a "purely digital media company". It said job cuts lay ahead, because automation and AI were increasingly making many of the jobs that supported the production of their journalism redundant. "Artificial intelligence has the potential to make independent journalism better than it ever was – or simply replace it," CEO Mathias Doepfner said in an internal letter to employees. AI tools like the popular ChatGPT promise a "revolution" in information, he said, and would soon be better at the "aggregation of information" than human journalists.


Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing

arXiv.org Artificial Intelligence

Contrastive pretraining on parallel image-text data has attained great success in vision-language processing (VLP), as exemplified by CLIP and related methods. However, prior explorations tend to focus on general domains in the web. Biomedical images and text are rather different, but publicly available datasets are small and skew toward chest X-ray, thus severely limiting progress. In this paper, we conducted by far the largest study on biomedical VLP, using 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central. Our dataset (PMC-15M) is two orders of magnitude larger than existing biomedical image-text datasets such as MIMIC-CXR, and spans a diverse range of biomedical images. The standard CLIP method is suboptimal for the biomedical domain. We propose BiomedCLIP with domain-specific adaptations tailored to biomedical VLP. We conducted extensive experiments and ablation studies on standard biomedical imaging tasks from retrieval to classification to visual question-answering (VQA). BiomedCLIP established a new state of the art on a wide range of standard datasets and substantially outperformed prior VLP approaches. Surprisingly, BiomedCLIP even outperformed radiology-specific state-of-the-art models such as BioViL on radiology-specific tasks such as RSNA pneumonia detection, thus highlighting the utility of large-scale pretraining across all biomedical image types. We will release our models at https://aka.ms/biomedclip to facilitate future research in biomedical VLP.
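
For readers unfamiliar with contrastive pretraining, the following is a minimal sketch of a CLIP-style symmetric contrastive loss over a batch of paired image and text embeddings. It illustrates the general technique only, not BiomedCLIP's domain-specific adaptations, which the abstract does not detail. The function names, the toy embeddings, and the fixed temperature value are assumptions made for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: pair k of images matches pair k of texts.

    `temperature` is an illustrative fixed value (an assumption); it is
    often a learned parameter in practice.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Image-to-text direction: each row should pick out its diagonal entry.
    i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: same thing over the columns.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return 0.5 * (i2t + t2i)


# Toy batch: two image embeddings and their captions' embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
shuffled_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = contrastive_loss(images, aligned_texts)
shuffled_loss = contrastive_loss(images, shuffled_texts)
```

In this toy batch the loss is close to zero when each image embedding lines up with its own caption, and large when the captions are swapped, which is the signal that pulls matched pairs together during pretraining.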


Preference Transformer: Modeling Human Preferences using Transformers for RL

arXiv.org Artificial Intelligence

Preference-based reinforcement learning (RL) provides a framework to train agents using human preferences between two behaviors. However, preference-based RL has been challenging to scale since it requires a large amount of human feedback to learn a reward function aligned with human intent. In this paper, we present Preference Transformer, a neural architecture that models human preferences using transformers. Unlike prior approaches assuming human judgment is based on the Markovian rewards which contribute to the decision equally, we introduce a new preference model based on the weighted sum of non-Markovian rewards. We then design the proposed preference model using a transformer architecture that stacks causal and bidirectional self-attention layers. We demonstrate that Preference Transformer can solve a variety of control tasks using real human preferences, while prior approaches fail to work. We also show that Preference Transformer can induce a well-specified reward and attend to critical events in the trajectory by automatically capturing the temporal dependencies in human decision-making. Code is available on the project website: https://sites.google.com/view/preference-transformer.
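
As a rough illustration of the difference between an equally weighted sum of rewards and a weighted sum, here is a minimal sketch of a preference probability between two trajectory segments. It assumes a Bradley-Terry (logistic) preference model, which is a common choice in preference-based RL but is not stated in the abstract. The per-step rewards and importance weights are hand-written toy values; in the paper they would come from the transformer, and all names here are hypothetical.

```python
import math


def segment_score(rewards, weights=None):
    """Score a segment as a weighted sum of per-step rewards.

    With weights=None every step counts equally, which corresponds to the
    equal-contribution assumption the abstract attributes to prior work.
    """
    if weights is None:
        weights = [1.0] * len(rewards)
    return sum(w * r for w, r in zip(weights, rewards))


def preference_probability(score_a, score_b):
    """Probability that segment A is preferred over B (logistic model).

    Assumption: a Bradley-Terry model over segment scores.
    """
    diff = score_a - score_b
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


# Toy segments: A is rewarded at the last step, B at the first step.
rewards_a = [0.0, 0.0, 1.0]
rewards_b = [1.0, 0.0, 0.0]

# Equal weighting cannot tell the two segments apart.
p_equal = preference_probability(
    segment_score(rewards_a), segment_score(rewards_b)
)

# Hypothetical importance weights that emphasize the final step.
weights = [0.1, 0.1, 0.8]
p_weighted = preference_probability(
    segment_score(rewards_a, weights), segment_score(rewards_b, weights)
)
```

With equal weights the two segments are indistinguishable, while step-dependent weights let the model express that a rater cared more about how the segment ended.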


Understanding Natural Language Understanding Systems. A Critical Analysis

arXiv.org Artificial Intelligence

The development of machines that «talk like us», also known as Natural Language Understanding (NLU) systems, is the Holy Grail of Artificial Intelligence (AI), since language is the quintessence of human intelligence. The brief but intense life of NLU research in AI and Natural Language Processing (NLP) is full of ups and downs, with periods of high hopes that the Grail is finally within reach, typically followed by phases of equally deep despair and disillusion. But never has the trust that we can build «talking machines» been stronger than the one engendered by the last generation of NLU systems. But is all that glitters in AI gold? Do state-of-the-art systems possess something comparable to the human knowledge of language? Are we at the dawn of a new era, in which the Grail is finally closer to us? In fact, the latest achievements of AI systems have sparked, or better renewed, an intense scientific debate on their true language understanding capabilities. Some defend the idea that, yes, we are on the right track, despite the limits that computational models still show. Others are instead radically skeptical and even dismissive: the present limits are not just contingent and temporary problems of NLU systems, but the sign of the intrinsic inadequacy of the epistemological and technological paradigm grounding them. This paper aims at contributing to such debate by carrying out a critical analysis of the linguistic abilities of the most recent NLU systems. I contend that they incorporate important aspects of the way language is learnt and processed by humans, but at the same time they lack key interpretive and inferential skills that it is unlikely they can attain unless they are integrated with structured knowledge and the ability to exploit it for language use.


Learning on Large-scale Text-attributed Graphs via Variational Inference

arXiv.org Artificial Intelligence

This paper studies learning on text-attributed graphs (TAGs), where each node is associated with a text description. An ideal solution for such a problem would be integrating both the text and graph structure information with large language models and graph neural networks (GNNs). However, the problem becomes very challenging when graphs are large due to the high computational complexity brought by training large language models and GNNs together. In this paper, we propose an efficient and effective solution to learning on large text-attributed graphs by fusing graph structure and language learning with a variational Expectation-Maximization (EM) framework, called GLEM. Instead of simultaneously training large language models and GNNs on big graphs, GLEM proposes to alternately update the two modules in the E-step and M-step. Such a procedure allows training the two modules separately while still allowing them to interact and mutually enhance each other. Extensive experiments on multiple data sets demonstrate the efficiency and effectiveness of the proposed approach.


Language Is Not All You Need: Aligning Perception with Language Models

arXiv.org Artificial Intelligence

A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Specifically, we train Kosmos-1 from scratch on web-scale multimodal corpora, including arbitrarily interleaved text and images, image-caption pairs, and text data. We evaluate various settings, including zero-shot, few-shot, and multimodal chain-of-thought prompting, on a wide range of tasks without any gradient updates or finetuning. Experimental results show that Kosmos-1 achieves impressive performance on (i) language understanding, generation, and even OCR-free NLP (directly fed with document images), (ii) perception-language tasks, including multimodal dialogue, image captioning, visual question answering, and (iii) vision tasks, such as image recognition with descriptions (specifying classification via text instructions). We also show that MLLMs can benefit from cross-modal transfer, i.e., transfer knowledge from language to multimodal, and from multimodal to language. In addition, we introduce a Raven IQ test dataset, which diagnoses the nonverbal reasoning capability of MLLMs.


Domain-adapted large language models for classifying nuclear medicine reports

arXiv.org Artificial Intelligence

With the growing use of transformer-based language models in medicine, it is unclear how well these models generalize to nuclear medicine, which has domain-specific vocabulary and unique reporting styles. In this study, we evaluated the value of domain adaptation in nuclear medicine by adapting language models for the purpose of 5-point Deauville score prediction based on clinical 18F-fluorodeoxyglucose (FDG) PET/CT reports. We retrospectively retrieved 4542 text reports and 1664 images for FDG PET/CT lymphoma exams from 2008-2018 in our clinical imaging database. Deauville scores were removed from the reports and then the remaining text in the reports was used as the model input. Multiple general-purpose transformer language models were used to classify the reports into Deauville scores 1-5. We then adapted the models to the nuclear medicine domain using masked language modeling and assessed its impact on classification performance. The language models were compared against vision models, a multimodal vision language model, and a nuclear medicine physician using seven-fold Monte Carlo cross-validation; means and standard deviations are reported. Domain adaptation improved all language models. For example, BERT improved from 61.3% five-class accuracy to 65.7% following domain adaptation. The best performing model (domain-adapted RoBERTa) achieved a five-class accuracy of 77.4%, which was better than the physician's performance (66%) and the best vision model's performance (48.1%), and was similar to the multimodal model's performance (77.2%). Domain adaptation improved the performance of large language models in interpreting nuclear medicine text reports.
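
The evaluation protocol mentioned above, Monte Carlo cross-validation, can be sketched as repeated random train/test splits with the mean and standard deviation of the score reported across splits. The sketch below is illustrative only: the toy data, the majority-class placeholder model, the 20% test fraction, and the seed are all assumptions, and the study's actual models and split sizes are not given in the abstract.

```python
import random
import statistics


def monte_carlo_cv(examples, fit, score, n_splits=7, test_fraction=0.2, seed=0):
    """Repeated random sub-sampling validation.

    Each round shuffles the data, holds out a random test set, fits on the
    rest, and records the score. Returns (mean, standard deviation).
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = examples[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = fit(train)
        scores.append(score(model, test))
    return statistics.mean(scores), statistics.stdev(scores)


def fit_majority(train):
    """Placeholder 'model': always predict the most common training label."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)


def accuracy(model, test):
    """Fraction of test examples whose label equals the predicted class."""
    return sum(1 for _, label in test if label == model) / len(test)


# Toy stand-in for (report text, five-class score) pairs; not real data.
toy_reports = [("report %d" % i, 1 + i % 5) for i in range(50)]

mean_acc, std_acc = monte_carlo_cv(toy_reports, fit_majority, accuracy)
print("accuracy: %.3f +/- %.3f" % (mean_acc, std_acc))
```

Unlike k-fold cross-validation, the test sets drawn in different rounds may overlap, which is why the spread across rounds is usually reported alongside the mean.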