AITopics

2209.04299

Country:

North America > United States > New Mexico (0.04)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(8 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ushio, Asahi, Espinosa-Anke, Luis, Schockaert, Steven, Camacho-Collados, Jose

BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?

arXiv.org Artificial Intelligence, Sep-9-2022

Analogies play a central role in human commonsense reasoning. The ability to recognize analogies such as "eye is to seeing what ear is to hearing", sometimes referred to as analogical proportions, shapes how we structure knowledge and understand language. Surprisingly, however, the task of identifying such analogies has not yet received much attention in the language model era. In this paper, we analyze the capabilities of transformer-based language models on this unsupervised task, using benchmarks obtained from educational settings, as well as more commonly used datasets. We find that off-the-shelf language models can identify analogies to a certain extent, but struggle with abstract and complex relations, and results are highly sensitive to model architecture and hyperparameters. Overall, the best results were obtained with GPT-2 and RoBERTa, while configurations using BERT were not able to outperform word embedding models. Our results raise important questions for future work about how, and to what extent, pre-trained language models capture knowledge about abstract semantic relations.

computational linguistic, dataset, proceedings, (14 more...)

2105.04949

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(12 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

#artificialintelligence, Sep-8-2022, 16:53:54 GMT

The Head of Google Says Future AI Must Align with Human Values

AI is the foundational tech at Google and its parent company Alphabet, CEO Sundar Pichai told the audience at this year's Code conference in Los Angeles. He pointed out the "extraordinary" successes of the Google AI and DeepMind teams in areas such as large language models and the AlphaFold project, which showed the underlying structure of 200 million proteins. He said Google was now applying deep computer science and AI to all its products, from search, to its AlphaFold work with pharma companies, to self-driving cars. But, he added, it is "important that we develop AI aligned with human values." Conference host Kara Swisher showed a 2016 interview in which Pichai (then interviewed by the now-retired Walt Mossberg) said he expected we would have true "conversational AI" to help get things done in the next 5 to 10 years.

google, information, pichai, (15 more...)

Country: North America > United States > California > Los Angeles County > Los Angeles (0.25)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.55)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)

arXiv.org Artificial Intelligence, Sep-8-2022

Differentially Private Decoding in Large Language Models

Majmudar, Jimit, Dupuy, Christophe, Peris, Charith, Smaili, Sami, Gupta, Rahul, Zemel, Richard

Recent large-scale natural language processing (NLP) systems use a Large Language Model (LLM) pre-trained on massive and diverse corpora as a head start. In practice, the pre-trained model is adapted to a wide array of tasks via fine-tuning on task-specific datasets. LLMs, while effective, have been shown to memorize instances of training data, thereby potentially revealing private information processed during pre-training. The potential leakage might further propagate to the downstream tasks for which LLMs are fine-tuned. On the other hand, privacy-preserving algorithms usually involve retraining from scratch, which is prohibitively expensive for LLMs. In this work, we propose a simple, easy-to-interpret, and computationally lightweight perturbation mechanism to be applied to an already trained model at the decoding stage. Our perturbation mechanism is model-agnostic and can be used in conjunction with any LLM. We provide theoretical analysis showing that the proposed mechanism is differentially private, and experimental results showing a privacy-utility trade-off.
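As a rough illustration of what a decoding-stage perturbation can look like, the sketch below mixes a model's next-token distribution with a uniform distribution before sampling. The abstract does not spell out the mechanism, so the interpolation form, the function names `perturb_distribution` and `sample_token`, the parameter `lam`, and the toy probabilities are all assumptions made here for illustration, not the authors' stated method.

```python
import random


def perturb_distribution(probs, lam):
    """Mix a next-token distribution with the uniform distribution.

    Assumption (not taken from the abstract): q' = lam * q + (1 - lam) * u,
    where u is uniform over the vocabulary. lam = 1 leaves the model output
    unchanged; lam = 0 ignores the model entirely.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    uniform = 1.0 / len(probs)
    return [lam * p + (1.0 - lam) * uniform for p in probs]


def sample_token(probs, rng=random):
    """Sample a token index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy distribution over a 4-token vocabulary (hypothetical values).
model_probs = [0.7, 0.2, 0.05, 0.05]
private_probs = perturb_distribution(model_probs, lam=0.8)
token = sample_token(private_probs, random.Random(0))
```

In this sketch, lowering `lam` flattens the distribution, which would make sampled tokens depend less on what the model memorized but also less useful; that is one way to picture the privacy-utility trade-off the abstract mentions.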

mechanism, memorization, predicted line, (13 more...)

2205.13621

Country: Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Ahmed, Toufique, Devanbu, Premkumar

Few-shot training LLMs for project-specific code-summarization

arXiv.org Artificial Intelligence, Sep-8-2022

Very large language models (LLMs), such as GPT-3 and Codex, have achieved state-of-the-art performance on several natural-language tasks, and also show great promise for code. A particularly exciting aspect of LLMs is their knack for few-shot and zero-shot learning: they can learn to perform a task with very few examples. Few-shotting has particular synergies in software engineering, where there are a lot of phenomena (identifier names, APIs, terminology, coding patterns) that are known to be highly project-specific. However, project-specific data can be quite limited, especially early in the history of a project; thus the few-shot learning capacity of LLMs might be very relevant. In this paper, we investigate the use of few-shot training with the very large GPT (Generative Pre-trained Transformer) Codex model, and find evidence suggesting that one can significantly surpass state-of-the-art models for code-summarization, leveraging project-specific training.
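To make the few-shot idea concrete, here is a minimal sketch of assembling a prompt from a handful of (code, summary) pairs taken from the same project, followed by the function to be summarized. The prompt layout, the function name `build_few_shot_prompt`, and the example pairs are hypothetical; the abstract does not describe the exact format used with Codex.

```python
def build_few_shot_prompt(examples, target_code, max_examples=10):
    """Build a prompt from (code, summary) pairs drawn from one project.

    The "Code:/Summary:" layout is a hypothetical convention chosen for
    this sketch, not the format used in the paper.
    """
    parts = []
    for code, summary in examples[:max_examples]:
        parts.append(f"Code:\n{code}\nSummary: {summary}\n")
    # The final block leaves the summary open for the model to complete.
    parts.append(f"Code:\n{target_code}\nSummary:")
    return "\n".join(parts)


# Hypothetical project-specific examples.
project_examples = [
    ("def add(a, b):\n    return a + b", "Return the sum of two numbers."),
    ("def is_empty(xs):\n    return len(xs) == 0", "Check whether a list is empty."),
]
prompt = build_few_shot_prompt(project_examples, "def square(x):\n    return x * x")
```

The resulting string would then be sent to a code model as the completion prompt; which examples to pick and how many fit in the context window are design choices left open here.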

code summarization, codex, few-shot training, (11 more...)

2207.04237

Country:

North America > United States > California > Yolo County > Davis (0.14)
North America > United States > Michigan > Oakland County > Rochester (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > Experimental Study (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

#artificialintelligence, Sep-7-2022, 18:12:58 GMT

GitHub - deepmind/mujoco_menagerie: A collection of high-quality models for the MuJoCo physics engine, curated by DeepMind.

Menagerie is a collection of high-quality models for the MuJoCo physics engine, curated by DeepMind. A physics simulator is only as good as the model it is simulating, and in a powerful simulator like MuJoCo with many modeling options, it is easy to create "bad" models which do not behave as expected. The goal of this collection is to provide the community with a curated library of well-designed models that work well right out of the gate. Menagerie's only requirement is MuJoCo version 2.2.2 or higher. You can download prebuilt binaries from the GitHub releases page, or if you are working with Python, you can install the native bindings from PyPI via pip install mujoco.
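The version requirement above can be checked from Python before loading any model. The following is a small sketch using only the standard library; the helper names `parse_version`, `meets_minimum`, and `installed_mujoco_version` are made up for this example, and the parsing assumes a dotted numeric version string such as 2.2.2.

```python
from importlib import metadata
from itertools import takewhile


def parse_version(text):
    """Turn a version string such as '2.2.2' into a tuple of ints.

    Assumes dotted numeric components; any non-digit suffix in a
    component (for example 'rc1') is ignored.
    """
    numbers = []
    for part in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, part)) or "0"
        numbers.append(int(digits))
    return tuple(numbers)


def meets_minimum(installed, minimum="2.2.2"):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_mujoco_version():
    """Return the installed version of the 'mujoco' package, or None."""
    try:
        return metadata.version("mujoco")
    except metadata.PackageNotFoundError:
        return None


version = installed_mujoco_version()
if version is None:
    print("mujoco is not installed")
elif meets_minimum(version):
    print(f"mujoco {version} satisfies the 2.2.2 minimum")
else:
    print(f"mujoco {version} is older than 2.2.2")
```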

high-quality model, menagerie, mujoco physics engine, (11 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence, Sep-7-2022, 16:25:35 GMT

Improving Language Model Behavior by Training on a Curated Dataset

We've found we can improve language model behavior with respect to specific behavioral values by fine-tuning on a curated dataset of fewer than 100 examples of those values. We also found that this process becomes more effective as models get larger. While the technique is still nascent, we're looking for OpenAI API users who would like to try it out and are excited to find ways to use these and other techniques in production use cases. Our approach aims to give language model operators the tools to narrow this universal set of behaviors to a constrained set of values. While OpenAI provides guardrails and monitoring to ensure that model use-cases are compatible with our Charter, we view selecting the exact set of Charter-compatible values for the model as a choice that our users must face for their specific applications.

dataset, language model behavior, value-targeted dataset, (10 more...)

Industry: Law > Civil Rights & Constitutional Law (0.72)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.83)

#artificialintelligence, Sep-7-2022, 16:25:29 GMT

GPT-3 vs. Rasa chatbots

In 1829, an event took place that unleashed a technological revolution. At the Rainhill Trials a group of steam locomotives squared off to determine which one could win a series of tests of speed, strength and reliability. The winning machine, Rocket, not only blew away its competition at the trials, it also set the direction for steam locomotive development for the following century. What does all this have to do with GPT-3, the transformer language model that OpenAI made available in a limited beta starting in June? Some reviewers have heralded GPT-3 as the first glimpse of artificial general intelligence, while others are calling it a massive lookup table.

chatbot, gpt-3, rasa chatbot, (15 more...)

Industry: Transportation > Ground > Rail (0.80)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence, Sep-7-2022, 16:25:28 GMT

DeepMind's Selection-Inference Language Model System Generates Humanly Interpretable Reasoning Traces

Explainability is one of the most pressing concerns in machine learning research and development. Although contemporary large-scale language models (LMs) have demonstrated impressive question-answering capabilities, their inherent opacity can conceal just how these models reach their final answers, making it difficult for users to spot any possible mistakes or justify the outputs. A DeepMind research team addresses this issue in the new paper Faithful Reasoning Using Large Language Models, proposing a forward-chaining selection-inference model that can perform faithful reasoning and provide a valid reasoning trace to improve reasoning quality and help users check and validate the final answers. The proposed approach is based on the idea that LMs can perform faithful multi-step reasoning if the underlying logical structure of a given problem can be mirrored by a causal structure. To realize this, the team developed selection-inference (SI) as their system's backbone, a novel architecture comprising two fine-tuned language models: one for selection and one for inference.
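The alternation between a selection step and an inference step can be pictured with a toy forward-chaining loop. In the sketch below, both steps are simple rule lookups standing in for the two fine-tuned language models; the rule format, the function names, and the example facts are assumptions made for illustration and do not reproduce DeepMind's implementation.

```python
def select(facts, rules):
    """Selection step: pick premises that are all known and yield a new fact.

    Stand-in for the selection language model (an assumption of this toy).
    """
    for premises, conclusion in rules.items():
        if conclusion not in facts and all(p in facts for p in premises):
            return premises
    return None


def infer(premises, rules):
    """Inference step: derive a new fact from the selected premises only.

    Stand-in for the inference language model (an assumption of this toy).
    """
    return rules[premises]


def reason(facts, rules, goal, max_steps=10):
    """Alternate selection and inference, recording a reasoning trace."""
    known = set(facts)
    trace = []
    for _ in range(max_steps):
        if goal in known:
            break
        premises = select(known, rules)
        if premises is None:
            break  # nothing new can be derived
        new_fact = infer(premises, rules)
        known.add(new_fact)
        trace.append((premises, new_fact))
    return goal in known, trace


# Hypothetical facts and rules: each rule maps a tuple of premises to a conclusion.
facts = ["socrates is a man"]
rules = {
    ("socrates is a man",): "socrates is mortal",
    ("socrates is mortal",): "socrates will die",
}
proved, trace = reason(facts, rules, "socrates will die")
```

Each entry in `trace` pairs the premises that were selected with the fact inferred from them, which is the kind of step-by-step record a user could inspect to check the final answer.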

generate humanly interpretable reasoning trace, inference, multi-step reasoning, (9 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.57)

#artificialintelligence, Sep-7-2022, 00:23:09 GMT

No One Rung to Rule Them All: Addressing Scale and Expediency in Knowledge-Based AI

Can we improve the effectiveness and efficiency of AI at the same time? If we want our systems to be more intelligent, do they have to become more expensive? Our goal should be to significantly increase the capabilities and improve the results of AI technologies while minimizing power and system cost rather than increasing them. Achieving this could be possible if we follow the architectural design observed time and again in natural control systems, that is, a hierarchy of specialized levels. This article challenges the current large language model (LLM) approach, in which a single neural network attempts to encompass all world knowledge.

architecture, information, knowledge, (15 more...)

Country: North America > United States > California > San Francisco County > San Francisco (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)