AITopics | Atlantic Ocean

Collaborating Authors

Atlantic Ocean

Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

Anagnostidis, Sotiris, Pavllo, Dario, Biggio, Luca, Noci, Lorenzo, Lucchi, Aurelien, Hofmann, Thomas

arXiv.org Artificial IntelligenceMay-28-2023

Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most of LLMs still adopt attention layers between all pairs of tokens in the sequence, thus incurring a quadratic cost. In this study, we present a novel approach that dynamically prunes contextual information while preserving the model's expressiveness, resulting in reduced memory and computational requirements during inference. Our method employs a learnable mechanism that determines which uninformative tokens can be dropped from the context at any point across the generation process. By doing so, our approach not only addresses performance concerns but also enhances interpretability, providing valuable insight into the model's decision-making process. Our technique can be applied to existing pre-trained models through a straightforward fine-tuning process, and the pruning strength can be specified by a sparsity parameter. Notably, our empirical findings demonstrate that we can effectively prune up to 80\% of the context without significant performance degradation on downstream tasks, offering a valuable tool for mitigating inference costs. Our reference implementation achieves up to $2\times$ increase in inference throughput and even greater memory savings.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2305.15805

Country:

Asia > Middle East > Lebanon (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Oregon > Linn County > Lebanon (0.04)
(7 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Promising Solution (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Assessing the Spatial Structure of the Association between Attendance at Preschool and Childrens Developmental Vulnerabilities in Queensland Australia

Areed, wala Draidi, Price, Aiden, Arnett, Kathryn, Thompson, Helen, Malseed, Reid, Mengersen, Kerrie

arXiv.org Artificial IntelligenceMay-25-2023

Demographic and educational factors are essential, influential factors of early childhood development. This study aimed to investigate spatial patterns in the association between attendance at preschool and children's developmental vulnerabilities in one or more domain(s) in their first year of full-time school at a small area level in Queensland, Australia. This was achieved by applying geographically weighted regression (GWR) followed by K-means clustering of the regression coefficients. Three distinct geographical clusters were found in Queensland using the GWR coefficients. The first cluster covered more than half of the state of Queensland, including the Greater Brisbane region, and displays a strong negative association between developmental vulnerabilities and attendance at preschool. That is, areas with high proportions of preschool attendance tended to have lower proportions of children with at least one developmental vulnerability in the first year of full-time school. Clusters two and three were characterized by stronger negative associations between developmental vulnerabilities, English as the mother language, and geographic remoteness, respectively. This research provides evidence of the need for collaboration between health and education sectors in specific regions of Queensland to update current service provision policies and to ensure holistic and appropriate care is available to support children with developmental vulnerabilities.

artificial intelligence, coefficient, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2305.15746

Country:

Oceania > Australia > Queensland (1.00)
Oceania > Australia > Tasmania (0.04)
Oceania > Australia > Northern Territory (0.04)
(13 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Government (1.00)
Health & Medicine > Public Health (0.68)
Education > Educational Setting (0.68)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

A DNN Framework for Learning Lagrangian Drift With Uncertainty

Jenkins, Joseph, Paiement, Adeline, Ourmières, Yann, Sommer, Julien Le, Verron, Jacques, Ubelmann, Clément, Glotin, Hervé

arXiv.org Artificial IntelligenceMay-24-2023

Reconstructions of Lagrangian drift, for example for objects lost at sea, are often uncertain due to unresolved physical phenomena within the data. Uncertainty is usually overcome by introducing stochasticity into the drift, but this approach requires specific assumptions for modelling uncertainty. We remove this constraint by presenting a purely data-driven framework for modelling probabilistic drift in flexible environments. Using ocean circulation model simulations, we generate probabilistic trajectories of object location by simulating uncertainty in the initial object position. We train an emulator of probabilistic drift over one day given perfectly known velocities and observe good agreement with numerical simulations. Several loss functions are tested. Then, we strain our framework by training models where the input information is imperfect. On these harder scenarios, we observe reasonable predictions although the effects of data drift become noticeable when evaluating the models against unseen flow scenarios.

artificial intelligence, learning lagrangian drift, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2204.05891

Country:

Atlantic Ocean > Mediterranean Sea (0.05)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Europe > Denmark (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

Add feedback

What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary

Ram, Ori, Bezalel, Liat, Zicher, Adi, Belinkov, Yonatan, Berant, Jonathan, Globerson, Amir

arXiv.org Artificial IntelligenceMay-24-2023

Dual encoders are now the dominant architecture for dense retrieval. Yet, we have little understanding of how they represent text, and why this leads to good performance. In this work, we shed light on this question via distributions over the vocabulary. We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space. We show that the resulting projections contain rich semantic information, and draw connection between them and sparse retrieval. We find that this view can offer an explanation for some of the failure cases of dense retrievers. For example, we observe that the inability of models to handle tail entities is correlated with a tendency of the token distributions to forget some of the tokens of those entities. We leverage this insight and propose a simple way to enrich query and passage representations with lexical information at inference time, and show that this significantly improves performance compared to the original model in zero-shot settings, and specifically on the BEIR benchmark.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2212.1038

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
(9 more...)

Genre: Research Report > New Finding (0.46)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.95)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

Add feedback

News Summarization and Evaluation in the Era of GPT-3

Goyal, Tanya, Li, Junyi Jessy, Durrett, Greg

arXiv.org Artificial IntelligenceMay-23-2023

The recent success of prompting large language models like GPT-3 has led to a paradigm shift in NLP research. In this paper, we study its impact on text summarization, focusing on the classic benchmark domain of news summarization. First, we investigate how GPT-3 compares against fine-tuned models trained on large summarization datasets. We show that not only do humans overwhelmingly prefer GPT-3 summaries, prompted using only a task description, but these also do not suffer from common dataset-specific issues such as poor factuality. Next, we study what this means for evaluation, particularly the role of gold standard test sets. Our experiments show that both reference-based and reference-free automatic metrics cannot reliably evaluate GPT-3 summaries. Finally, we evaluate models on a setting beyond generic summarization, specifically keyword-based summarization, and show how dominant fine-tuning approaches compare to prompting. To support further research, we release: (a) a corpus of 10K generated summaries from fine-tuned and prompt-based models across 4 standard summarization benchmarks, (b) 1K human preference judgments comparing different systems for generic- and keyword-based summarization.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2209.12356

Country:

Asia > Russia (0.68)
Africa (0.28)
North America > United States > Missouri > Jackson County > Kansas City (0.14)
(24 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Health & Medicine > Therapeutic Area (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PETAL: Physics Emulation Through Averaged Linearizations for Solving Inverse Problems

Jin, Jihui, Ollivier, Etienne, Touret, Richard, McKinley, Matthew, Sabra, Karim G., Romberg, Justin K.

arXiv.org Artificial IntelligenceMay-18-2023

Inverse problems describe the task of recovering an underlying signal of interest given observables. Typically, the observables are related via some non-linear forward model applied to the underlying unknown signal. Inverting the non-linear forward model can be computationally expensive, as it often involves computing and inverting a linearization at a series of estimates. Rather than inverting the physics-based model, we instead train a surrogate forward model (emulator) and leverage modern auto-grad libraries to solve for the input within a classical optimization framework. Current methods to train emulators are done in a black box supervised machine learning fashion and fail to take advantage of any existing knowledge of the forward model. In this article, we propose a simple learned weighted average model that embeds linearizations of the forward model around various reference points into the model itself, explicitly incorporating known physics. Grounding the learned model with physics based linearizations improves the forward modeling accuracy and provides richer physics based gradient information during the inversion process leading to more accurate signal recovery. We demonstrate the efficacy on an ocean acoustic tomography (OAT) example that aims to recover ocean sound speed profile (SSP) variations from acoustic observations (e.g.

artificial intelligence, machine learning, optimization problem, (19 more...)

arXiv.org Artificial Intelligence

2305.11056

Country:

North America > Mexico (0.05)
Atlantic Ocean > Gulf of Mexico (0.05)
North America > United States > Georgia > Fulton County > Atlanta (0.05)
(3 more...)

Genre: Research Report (0.82)

Industry: Transportation (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

Generalized Neural Closure Models with Interpretability

Gupta, Abhinav, Lermusiaux, Pierre F. J.

arXiv.org Artificial IntelligenceMay-18-2023

Improving the predictive capability and computational cost of dynamical models is often at the heart of augmenting computational physics with machine learning (ML). However, most learning results are limited in interpretability and generalization over different computational grid resolutions, initial and boundary conditions, domain geometries, and physical or problem-specific parameters. In the present study, we simultaneously address all these challenges by developing the novel and versatile methodology of unified neural partial delay differential equations. We augment existing/low-fidelity dynamical models directly in their partial differential equation (PDE) forms with both Markovian and non-Markovian neural network (NN) closure parameterizations. The melding of the existing models with NNs in the continuous spatiotemporal space followed by numerical discretization automatically allows for the desired generalizability. The Markovian term is designed to enable extraction of its analytical form and thus provides interpretability. The non-Markovian terms allow accounting for inherently missing time delays needed to represent the real world. We obtain adjoint PDEs in the continuous form, thus enabling direct implementation across differentiable and non-differentiable computational physics codes, different ML frameworks, and treatment of nonuniformly-spaced spatiotemporal training data. We demonstrate the new generalized neural closure models (gnCMs) framework using four sets of experiments based on advecting nonlinear waves, shocks, and ocean acidification models. Our learned gnCMs discover missing physics, find leading numerical error terms, discriminate among candidate functional forms in an interpretable fashion, achieve generalization, and compensate for the lack of complexity in simpler models. Finally, we analyze the computational advantages of our new framework.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2301.06198

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.28)
North America > United States > New York (0.04)
North America > United States > Maine (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Self-Prompting Large Language Models for Zero-Shot Open-Domain QA

Li, Junlong, Zhang, Zhuosheng, Zhao, Hai

arXiv.org Artificial IntelligenceMay-16-2023

Open-Domain Question Answering (ODQA) aims at answering factoid questions without explicitly providing specific background documents. In a zero-shot setting, this task is more challenging since no data is available to train customized models like Retriever-Readers. Recently, Large Language Models (LLMs) like GPT-3 have shown their power in zero-shot ODQA with direct prompting methods, but these methods are still far from releasing the full powerfulness of LLMs only in an implicitly invoking way. In this paper, we propose a Self-Prompting framework to explicitly utilize the massive knowledge stored in the parameters of LLMs and their strong instruction understanding abilities. Concretely, we prompt LLMs step by step to generate multiple pseudo QA pairs with background passages and explanations from scratch and then use those generated elements for in-context learning. Experimental results show our method surpasses previous SOTA methods significantly on three widely-used ODQA datasets, and even achieves comparable performance with some Retriever-Reader models fine-tuned on full training data.

computational linguistic, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2212.08635

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Atlantic Ocean > Mediterranean Sea > Aegean Sea > Sea of Marmara > Dardanelles (0.05)
(9 more...)

Genre:

Research Report (0.70)
Personal (0.46)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

More Penguins Than Europeans Can Use Google Bard

WIREDMay-15-2023, 06:00:00 GMT

Google Bard, the search giant's ChatGPT rival, is already available in 180 countries and territories. But even though it's been widely available for months and was the centerpiece of Google's recent I/O event, it's missing one big region. The 450 million people living in the European Union are still unable to access Bard, or any of the company's other generative AI technologies. It's a move that has surprised lawmakers, and even Google won't say why it's holding back. Brando Benifei, the MEP leading the negotiations on Europe's new artificial intelligence rules, is not sure why the bloc had been excluded, describing the omission of the EU from Bard's rollout as a "big issue."

bard, google, google bard, (13 more...)

WIRED

Country:

North America > United States (0.06)
Europe > Norway (0.06)
Europe > Finland (0.06)
(3 more...)

Industry: Government > Regional Government > Europe Government (0.73)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.68)

Add feedback

ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human

Tian, Junfeng, Chen, Hehong, Xu, Guohai, Yan, Ming, Gao, Xing, Zhang, Jianhai, Li, Chenliang, Liu, Jiayi, Xu, Wenshen, Xu, Haiyang, Qian, Qi, Wang, Wei, Ye, Qinghao, Zhang, Jiejing, Zhang, Ji, Huang, Fei, Zhou, Jingren

arXiv.org Artificial IntelligenceMay-15-2023

In this paper, we present ChatPLUG, a Chinese open-domain dialogue system for digital human applications that instruction finetunes on a wide range of dialogue tasks in a unified internet-augmented format. Different from other open-domain dialogue models that focus on large-scale pre-training and scaling up model size or dialogue corpus, we aim to build a powerful and practical dialogue system for digital human with diverse skills and good multi-task generalization by internet-augmented instruction tuning. To this end, we first conduct large-scale pre-training on both common document corpus and dialogue data with curriculum learning, so as to inject various world knowledge and dialogue abilities into ChatPLUG. Then, we collect a wide range of dialogue tasks spanning diverse features of knowledge, personality, multi-turn memory, and empathy, on which we further instruction tune \modelname via unified natural language instruction templates. External knowledge from an internet search is also used during instruction finetuning for alleviating the problem of knowledge hallucinations. We show that \modelname outperforms state-of-the-art Chinese dialogue systems on both automatic and human evaluation, and demonstrates strong multi-task generalization on a variety of text understanding and generation tasks. In addition, we deploy \modelname to real-world applications such as Smart Speaker and Instant Message applications with fast inference. Our models and code will be made publicly available on ModelScope: https://modelscope.cn/models/damo/ChatPLUG-3.7B and Github: https://github.com/X-PLUG/ChatPLUG .

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2304.07849

Country:

Europe > Greece (0.05)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Beijing > Beijing (0.04)
(13 more...)

Genre:

Personal > Interview (0.92)
Research Report (0.63)

Industry:

Media (1.00)
Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.93)
Education (0.67)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Add feedback