 Large Language Model


SQA3D: Situated Question Answering in 3D Scenes

arXiv.org Artificial Intelligence

The categories listed here are not meant to be exhaustive, and a question could fall into multiple categories. Despite these promising advances, their actual performance in real-world embodied environments could still fall short of human expectations, especially in generalization to different situations (scenes and locations) and in tasks that require substantial, knowledge-intensive reasoning. To diagnose the fundamental capability of realistic embodied agents, we investigate the problem of embodied scene understanding, where the agent needs to understand its situation and its surroundings in the environment from a dynamic egocentric view, then perceive, reason, and act accordingly to accomplish complex tasks. What is at the core of embodied scene understanding? Drawing inspiration from situated cognition (Greeno, 1998; Anderson et al., 2000), a seminal theory of embodiment, we anticipate it to be two-fold: Situation understanding. The ability to imagine what the agent will see from arbitrary situations (position, orientation, etc.) in a 3D scene and to understand the surroundings anchored to that situation, and therefore to generalize to novel positions or scenes; Situated reasoning. The ability to acquire knowledge about the environment based on the agent's current situation and to reason with that knowledge, which in turn facilitates complex action planning tasks. To step towards embodied scene understanding, we introduce SQA3D, a new task that combines situation understanding and situated reasoning into embodied 3D scene understanding. Figure 1 sketches our task: given a 3D scene context (e.g., 3D scan, egocentric video, or bird's-eye view (BEV) picture), the agent in the 3D scene needs to first comprehend and localize its situation (position, orientation, etc.) from a textual description, then answer a question that requires substantial situated reasoning from that perspective.
We crowd-sourced the situation descriptions from Amazon MTurk (AMT), where participants are instructed to select diverse locations and orientations in 3D scenes. To systematically examine the agent's ability in situated reasoning, we collect questions that cover a wide spectrum of knowledge, ranging from spatial relations to navigation, common sense reasoning, and multi-hop reasoning.


Boosted Prompt Ensembles for Large Language Models

arXiv.org Artificial Intelligence

Methods such as chain-of-thought prompting and self-consistency have pushed the frontier of language model reasoning performance with no additional training. To further improve performance, we propose a prompt ensembling method for large language models, which uses a small dataset to construct a set of few-shot prompts that together comprise a "boosted prompt ensemble". The few-shot examples for each prompt are chosen in a stepwise fashion to be "hard" examples on which the previous step's ensemble is uncertain. We show that this outperforms single-prompt output-space ensembles and bagged prompt-space ensembles on the GSM8k and AQuA datasets, among others. We propose both train-time and test-time versions of boosted prompting that use different levels of available annotation and conduct a detailed empirical study of our algorithm.
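As a rough illustration of the stepwise selection described in this abstract, the sketch below grows an ensemble of few-shot prompts by repeatedly picking the pool questions on which the current ensemble agrees least. It is not the authors' implementation: the `Sampler` callable is a hypothetical stand-in for a language model call, measuring uncertainty as majority-vote agreement is an assumption, and all function names are invented for this example.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]   # (question, answer) pair
Prompt = List[Example]      # a few-shot prompt is a list of examples
# Hypothetical stand-in for an LLM call: returns several sampled answers.
Sampler = Callable[[Prompt, str], List[str]]


def ensemble_samples(prompts: Sequence[Prompt], question: str,
                     sample: Sampler) -> List[str]:
    """Pool the sampled answers from every prompt in the ensemble."""
    answers: List[str] = []
    for prompt in prompts:
        answers.extend(sample(prompt, question))
    return answers


def agreement(answers: Sequence[str]) -> float:
    """Share of answers equal to the most common one (1.0 = unanimous)."""
    return Counter(answers).most_common(1)[0][1] / len(answers)


def build_boosted_ensemble(initial: Prompt, pool: Sequence[Example],
                           sample: Sampler, rounds: int,
                           shots: int) -> List[Prompt]:
    """Add one prompt per round, built from the least-agreed-upon examples."""
    prompts: List[Prompt] = [list(initial)]
    for _ in range(rounds):
        # Assumption: low agreement marks a "hard" example for the ensemble.
        ranked = sorted(
            pool,
            key=lambda ex: agreement(ensemble_samples(prompts, ex[0], sample)),
        )
        prompts.append(list(ranked[:shots]))
    return prompts


def predict(prompts: Sequence[Prompt], question: str, sample: Sampler) -> str:
    """Majority vote over the answers sampled from all prompts."""
    return Counter(ensemble_samples(prompts, question, sample)).most_common(1)[0][0]
```

In practice `sample` would wrap a real model call that returns several sampled completions per prompt; the sketch only fixes the selection and voting logic around it.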


Innovations in Neural Data-to-text Generation: A Survey

arXiv.org Artificial Intelligence

The neural boom that has sparked natural language processing (NLP) research through the last decade has similarly led to significant innovations in data-to-text generation (DTG). This survey offers a consolidated view into the neural DTG paradigm with a structured examination of the approaches, benchmark datasets, and evaluation protocols. This survey draws boundaries separating DTG from the rest of the natural language generation (NLG) landscape, encompassing an up-to-date synthesis of the literature, and highlighting the stages of technological adoption from within and outside the greater NLG umbrella. With this holistic view, we highlight promising avenues for DTG research that not only focus on the design of linguistically capable systems but also systems that exhibit fairness and accountability.


TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models

arXiv.org Artificial Intelligence

Language Models (LMs) become outdated as the world changes; they often fail to perform tasks requiring recent factual information which was absent or different during training, a phenomenon called temporal misalignment. This is an especially challenging problem because the research community still lacks a coherent dataset for assessing the adaptability of LMs to a frequently updated knowledge corpus such as Wikipedia. To this end, we introduce TemporalWiki, a lifelong benchmark for ever-evolving LMs that utilizes the difference between consecutive snapshots of English Wikipedia and English Wikidata for training and evaluation, respectively. The benchmark hence allows researchers to periodically track an LM's ability to retain previous knowledge and acquire updated/new knowledge at each point in time. We also find that training an LM on the diff data through continual learning methods achieves similar or better perplexity than training on the entire snapshot in our benchmark, at 12 times less computational cost, which verifies that factual knowledge in LMs can be safely updated with minimal training data via continual learning. The dataset and the code are available at https://github.com/joeljang/temporalwiki.
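The idea of training on the difference between consecutive snapshots can be illustrated with a toy example. The sketch below is not the TemporalWiki pipeline: it assumes each snapshot is a plain dictionary from article title to text, uses a naive period-based sentence splitter, and the names `split_sentences` and `snapshot_diff` are invented here.

```python
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break on periods, drop empty pieces."""
    return [part.strip() for part in text.split(".") if part.strip()]


def snapshot_diff(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Return, per article, the sentences present in `new` but not in `old`.

    Articles that are new in the later snapshot contribute all of their
    sentences; unchanged articles are omitted from the result.
    """
    diff: Dict[str, List[str]] = {}
    for title, text in new.items():
        old_sentences = set(split_sentences(old.get(title, "")))
        added = [s for s in split_sentences(text) if s not in old_sentences]
        if added:
            diff[title] = added
    return diff
```

Only the sentences returned by `snapshot_diff` would go into the update corpus, which is why such a corpus can be much smaller than a full snapshot.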


Self-supervised Multi-modal Training from Uncurated Image and Reports Enables Zero-shot Oversight Artificial Intelligence in Radiology

arXiv.org Artificial Intelligence

Oversight AI is an emerging concept in radiology where the AI forms a symbiosis with radiologists by continuously supporting them in their decision-making. Recent advances in vision-language models shed light on the long-standing problems of oversight AI by understanding both visual and textual concepts and their semantic correspondences. However, there have been limited successes in applying vision-language models in the medical domain, as current vision-language models and learning strategies for photographic images and captions call for a web-scale corpus of image and text pairs, which is often not feasible in the medical domain. To address this, we present a model dubbed Medical Cross-attention Vision-Language model (Medical X-VL), whose key components are tailored for the medical domain. Our Medical X-VL model is based on the following components: self-supervised uni-modal models in the medical domain and a fusion encoder to bridge them, momentum distillation, sentence-wise contrastive learning for medical reports, and sentence similarity-adjusted hard negative mining. We experimentally demonstrated that our model enables various zero-shot tasks for oversight AI, ranging from zero-shot classification to zero-shot error correction. Our model outperformed the current state-of-the-art models on two different medical image databases, suggesting a novel clinical use of our oversight AI model for monitoring human errors. Our method was especially successful in the data-limited setting, which is frequently encountered in clinics, suggesting potential widespread applicability in the medical domain.


Language Models Can Teach Themselves to Program Better

arXiv.org Artificial Intelligence

Recent Language Models (LMs) achieve breakthrough performance in code generation when trained on human-authored problems, even solving some competitive-programming problems. Self-play has proven useful in games such as Go, and thus it is natural to ask whether LMs can generate their own instructive programming problems to improve their performance. We show that it is possible for an LM to synthesize programming problems and solutions, which are filtered for correctness by a Python interpreter. The LM's performance is then seen to improve when it is fine-tuned on its own synthetic problems and verified solutions; thus the model "improves itself" using the Python interpreter. Problems are specified formally as programming puzzles [Schuster et al., 2021], a code-based problem format where solutions can easily be verified for correctness by execution. In experiments on publicly-available LMs, test accuracy more than doubles. This work demonstrates the potential for code LMs, with an interpreter, to generate instructive problems and improve their own performance.
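The filtering step described in this abstract, where generated solutions are checked by execution, can be sketched as follows. This is a minimal illustration rather than the paper's code: it assumes a puzzle is a function named `f` that returns a boolean and a solution is a function named `g` that takes no arguments, and `is_verified` is a name invented here. It runs untrusted code with plain `exec`, with no sandboxing or timeout, which a real pipeline would need.

```python
def is_verified(puzzle_src: str, solution_src: str) -> bool:
    """Return True only if the solution's output makes the puzzle return True.

    Assumption: the puzzle source defines `f(x) -> bool` and the solution
    source defines `g()`. Any exception counts as a failed verification.
    """
    namespace: dict = {}
    try:
        exec(puzzle_src, namespace)     # defines f
        exec(solution_src, namespace)   # defines g
        return namespace["f"](namespace["g"]()) is True
    except Exception:
        return False


def filter_verified(pairs):
    """Keep only (puzzle, solution) source pairs that pass verification."""
    return [(p, s) for p, s in pairs if is_verified(p, s)]
```

Only the pairs that survive `filter_verified` would be kept as fine-tuning data, so incorrect or crashing generations never reach the training set.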


'No excuse' for AI developers to get data privacy wrong, warns UK data regulator

#artificialintelligence

AI developers have "no excuse" for getting data privacy wrong, one of the heads of the UK's data regulator has said, warning those who don't follow the law on data protection will face consequences. The Information Commissioner's Office (ICO) enforces data protection in the UK. Speaking amid the explosion of interest in generative AI, especially Large Language Models like the one that powers OpenAI's ChatGPT, Stephen Almond, the ICO's executive director of regulatory risk, warned LLMs posed a risk for data security. Writing in a blog post, he argued it is time to "take a step back and reflect on how personal data is being used". He noted that Sam Altman, the CEO of ChatGPT creator OpenAI, has himself declared his own worries about AI advances and what they could mean.


Elon Musk reportedly bought thousands of GPUs for a Twitter AI project

Engadget

More than a month after hiring a couple of former DeepMind researchers, Twitter is reportedly moving forward with an in-house artificial intelligence project. According to Business Insider, Elon Musk recently bought roughly 10,000 GPUs for use at one of the company's two remaining data centers. A source told the outlet the purchase shows Musk is "committed" to the effort, particularly given the fact there would be little reason for Twitter to spend so much money on datacenter-grade GPUs if it didn't plan to use them for AI work. The project reportedly involves the creation of a generative AI that the company would train on its own massive trove of data. It's unclear how Twitter would utilize the technology.


The problems with a moratorium on training large AI systems

#artificialintelligence

In late March, the Future of Life Institute released an open letter (and a related FAQ) calling "on all AI labs to immediately pause for at least six months the training of AI systems more powerful than GPT-4. This pause should be public and verifiable, and include all key actors. If such a pause cannot be enacted quickly, governments should step in and institute a moratorium." The letter, which also stated that "Powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable," was initially signed by over a thousand people, including many notable technology leaders. Many thousands more added their signatures after its publication.


Biden administration wants your input on rules for AI models like ChatGPT

Engadget

American officials are taking further steps to set rules for AI systems like ChatGPT. The National Telecommunications and Information Administration (NTIA) is asking for public comments on possible regulations that hold AI creators accountable. The measures will ideally help the Biden administration ensure that these models work as promised "without causing harm," the NTIA says. While the request is open-ended, the NTIA suggests input on areas like incentives for trustworthy AI, safety testing methods and the amount of data access needed to assess systems. The agency is also wondering if different strategies might be necessary for certain fields, such as healthcare.