
Collaborating Authors

 Gambardella, Andrew


Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance?

arXiv.org Artificial Intelligence

Recent large language models (LLMs) have demonstrated remarkable generalization abilities in mathematics and logical reasoning tasks. Prior research indicates that LLMs pre-trained with programming language data exhibit strong mathematical and reasoning abilities; however, this causal relationship has not been rigorously tested. Our research aims to determine which programming languages and features during pre-training affect logical inference performance. Specifically, we pre-trained decoder-based language models from scratch using datasets from ten programming languages (e.g., Python, C, Java) and three natural language datasets (Wikipedia, FineWeb, C4) under identical conditions. We then evaluated the trained models in a few-shot in-context learning setting on two logical reasoning tasks, FLD and bAbI, which do not require commonsense or world knowledge. The results show that nearly all models trained with programming languages consistently outperform those trained with natural languages, indicating that programming languages contain factors that elicit logical inference performance. In addition, we found that models trained with programming languages exhibit a better ability to follow instructions than those trained with natural languages. Further analysis reveals that the depth of the Abstract Syntax Trees (ASTs) of the parsed programs also affects logical reasoning performance. These findings offer insights into the essential elements of pre-training for acquiring the foundational abilities of LLMs.
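
To make the AST-depth analysis concrete, here is a minimal sketch (not the authors' code) of how the depth of a parsed program can be measured with Python's standard ast module; the sample snippet is a hypothetical input.

    import ast

    def ast_depth(node: ast.AST) -> int:
        """Return the maximum depth of the AST rooted at `node`."""
        children = list(ast.iter_child_nodes(node))
        if not children:
            return 1
        return 1 + max(ast_depth(child) for child in children)

    # Hypothetical training sample: more nested code parses to a deeper tree.
    source = "def f(xs):\n    return [x * x for x in xs if x > 0]\n"
    print(ast_depth(ast.parse(source)))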


Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning

arXiv.org Artificial Intelligence

As large language models (LLMs) are applied across diverse domains, the ability to selectively unlearn specific information has become increasingly essential. For instance, LLMs are expected to provide confidential information to authorized internal users, such as employees or trusted partners, while withholding it from external users, including the general public and unauthorized entities. In response to this challenge, we propose a novel method termed "in-context knowledge unlearning", which enables the model to selectively forget information at test time based on the context of the query. Our method fine-tunes pre-trained LLMs to enable prompt unlearning of target knowledge within the context while preserving other knowledge. Experiments on the TOFU and AGE datasets using Llama2-7B/13B and Mistral-7B models show that our method achieves up to 95% forgetting accuracy while retaining 80% of unrelated knowledge, significantly outperforming baselines in both in-domain and out-of-domain scenarios. Further investigation into the model's internal behavior revealed that while fine-tuned LLMs generate correct predictions in the middle layers and maintain them up to the final layer, they make the decision to forget at the last layer, i.e., "LLMs pretend to forget". Our findings offer valuable insights into enhancing the robustness of unlearning mechanisms in LLMs, setting a foundation for future research in the field.
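
As a rough illustration of the kind of fine-tuning data this setup implies, the sketch below builds paired forget/retain examples; the instruction templates and facts are assumptions for illustration, not the paper's actual prompts.

    def make_example(question: str, answer: str, forget: bool) -> dict:
        """Build one supervised pair: forget-context queries target a refusal,
        ordinary queries target the true answer."""
        instruction = ("Forget the following information when answering."
                       if forget else "Answer the following question.")
        target = "I don't know." if forget else answer
        return {"prompt": f"{instruction}\nQ: {question}\nA:",
                "completion": f" {target}"}

    # Hypothetical facts: unlearn one, retain the other.
    data = [
        make_example("Where does Alice work?", "Acme Corp.", forget=True),
        make_example("Where does Bob work?", "Initech.", forget=False),
    ]
    for ex in data:
        print(ex["prompt"], ex["completion"])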


Language Models Do Hard Arithmetic Tasks Easily and Hardly Do Easy Arithmetic Tasks

arXiv.org Artificial Intelligence

The ability (and inability) of large language models (LLMs) to perform arithmetic tasks has been the subject of much theoretical and practical debate. We show that LLMs are frequently able to correctly and confidently predict the first digit of n-digit by m-digit multiplication tasks without using chain-of-thought reasoning, even though these tasks require compounding operations to solve. Simultaneously, LLMs in practice often fail to correctly or confidently predict the last digit of an n-digit by m-digit multiplication, a task equivalent to 1-digit by 1-digit multiplication, which can be easily learned or memorized. We show that the latter task can be solved more robustly when the LLM is conditioned on all of the correct higher-order digits, which on average increases the confidence in the correct last digit on 5-digit by 5-digit multiplication tasks by over 230% (0.13 to 0.43) for Llama 2-13B and by 150% (0.22 to 0.55) for Mistral-7B.
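
The contrast is easy to reproduce as two probe prompts; the numbers and prompt wording below are illustrative assumptions, not the paper's exact evaluation format.

    a, b = 48271, 90357
    product = a * b  # 4361622747; the last digit (7) equals (1 * 7) mod 10

    # Probe 1: ask for the last digit directly, with no chain of thought.
    direct_prompt = f"What is the last digit of {a} * {b}?"

    # Probe 2: condition on all correct higher-order digits, so the model
    # only has to emit the final digit of the product.
    prefix = str(product)[:-1]
    conditioned_prompt = f"{a} * {b} = {prefix}"

    print(direct_prompt)
    print(conditioned_prompt, "-> expected next token:", str(product)[-1])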


Real-World Robot Applications of Foundation Models: A Review

arXiv.org Artificial Intelligence

Recent developments in foundation models, like Large Language Models (LLMs) and Vision-Language Models (VLMs), trained on extensive data, facilitate flexible application across different tasks and modalities. Their impact spans various fields, including healthcare, education, and robotics. This paper provides an overview of the practical application of foundation models in real-world robotics, with a primary emphasis on the replacement of specific components within existing robot systems. The summary encompasses the perspective of input-output relationships in foundation models, as well as their role in perception, motion planning, and control within the field of robotics. This paper concludes with a discussion of future challenges and implications for practical robot applications.


Detecting and Quantifying Malicious Activity with Simulation-based Inference

arXiv.org Machine Learning

Ideally speaking, a good recommendation system should be able to identify and remove malicious users before they can disrupt the ranking system by a significant margin. However, to eliminate the risk of false positives, a resilient ranking system can use as much data as possible, so we have to adjust the tradeoff between false positives and the damage a set of malicious users can cause to a ranking system. Probabilistic programming provides numerous advantages over other techniques, including but not limited to providing a disentangled representation of how malicious users acted under a structured model, as well as allowing for the quantification of damage caused by malicious users. We show experiments in malicious user identification using this framework.
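
A toy sketch of this style of inference follows, under an assumed model structure and assumed numbers (not the paper's model): simulate users from a generative model and weight simulations by how well they reproduce the observed ratings.

    import random

    random.seed(0)

    def simulate_ratings(malicious: bool, n_items: int = 20) -> list[int]:
        """Generative model: malicious users mostly downvote; regular users mix."""
        p_up = 0.1 if malicious else 0.6
        return [1 if random.random() < p_up else 0 for _ in range(n_items)]

    def posterior_malicious(observed, n_sims: int = 5000, prior: float = 0.05):
        """Weight simulated users by agreement with the observed ratings."""
        weights = {True: 0.0, False: 0.0}
        for _ in range(n_sims):
            m = random.random() < prior
            sim = simulate_ratings(m, len(observed))
            agree = sum(s == o for s, o in zip(sim, observed)) / len(observed)
            weights[m] += agree ** 10  # sharpen toward near-exact matches
        total = weights[True] + weights[False]
        return weights[True] / total if total else prior

    observed = simulate_ratings(malicious=True)  # stand-in for real data
    print(f"P(malicious | ratings) ~= {posterior_malicious(observed):.2f}")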


Simulation-Based Inference for Global Health Decisions

arXiv.org Machine Learning

The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. This is fomenting the development of comprehensive modelling and simulation to support the design of health interventions and policies, and to guide decision-making in a variety of health system domains [22, 49]. For example, simulations have provided valuable insight into public health problems such as tobacco consumption in New Zealand [50] and diabetes and obesity in the US [58]. They have been used to explore policy options such as those in maternal and antenatal care in Uganda [44], and applied to evaluate health reform scenarios such as predicting changes in access to primary care services in Portugal [21]. Their applicability in informing the design of cancer screening programmes has also been discussed [42, 23]. Recently, simulations have informed the response to the COVID-19 outbreak [19]. Work in this setting involves solving challenging inference and control problems in individual-based models of ever-increasing complexity. Here we discuss recent breakthroughs in machine learning, specifically in simulation-based inference, and explore its potential as a novel avenue for model calibration to support the design and evaluation of public health interventions. To further stimulate research, we are developing software interfaces that make such simulators amenable to these inference methods.
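
As a concrete, much-simplified example of simulation-based model calibration (an assumed toy SIR model, not any of the simulators cited above), rejection ABC can recover a transmission rate from an observed epidemic curve:

    import random

    random.seed(0)

    def simulate_epidemic(beta: float, gamma: float = 0.1, days: int = 60):
        """Toy deterministic SIR model; returns the daily infected fraction."""
        s, i, r = 0.99, 0.01, 0.0
        series = []
        for _ in range(days):
            new_inf = beta * s * i
            new_rec = gamma * i
            s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
            series.append(i)
        return series

    def abc_calibrate(observed, n_draws: int = 2000, tol: float = 0.05):
        """Rejection ABC: keep transmission rates whose simulations fit the data."""
        accepted = []
        for _ in range(n_draws):
            beta = random.uniform(0.05, 0.6)  # prior over the transmission rate
            sim = simulate_epidemic(beta)
            if max(abs(a - b) for a, b in zip(sim, observed)) < tol:
                accepted.append(beta)
        return accepted

    observed = simulate_epidemic(beta=0.3)  # stand-in for observed case data
    posterior = abc_calibrate(observed)
    if posterior:
        print(f"{len(posterior)} draws accepted; "
              f"posterior mean beta ~= {sum(posterior) / len(posterior):.2f}")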


Multitask Soft Option Learning

arXiv.org Machine Learning

We present Multitask Soft Option Learning (MSOL), a hierarchical multitask framework based on Planning as Inference. MSOL extends the concept of options, using separate variational posteriors for each task, regularized by a shared prior. This allows fine-tuning of options for new tasks without forgetting their learned policies, leading to faster training without reducing the expressiveness of the hierarchical policy. Additionally, MSOL avoids several instabilities during training in a multitask setting and provides a natural way to learn not only intra-option policies but also their terminations. We demonstrate empirically that MSOL significantly outperforms both hierarchical and flat transfer-learning baselines in challenging multitask environments.
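
To make the shared-prior regularization concrete, the sketch below computes a KL term pulling per-task option posteriors toward a shared prior; the scalar Gaussian parameterization and numbers are illustrative assumptions, not the MSOL implementation.

    import math

    def kl_gaussian(mu_q, var_q, mu_p, var_p):
        """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for scalar Gaussians."""
        return 0.5 * (var_q / var_p + (mu_q - mu_p) ** 2 / var_p
                      - 1.0 + math.log(var_p / var_q))

    # Per-task posteriors over an option parameter, plus the shared prior.
    task_posteriors = [(0.8, 0.2), (1.1, 0.3), (0.9, 0.25)]  # (mean, variance)
    shared_prior = (1.0, 0.5)

    # The regularizer keeps task-specific options close to the shared prior,
    # allowing per-task fine-tuning without forgetting the shared structure.
    reg = sum(kl_gaussian(mq, vq, *shared_prior) for mq, vq in task_posteriors)
    print(f"KL regularizer across tasks: {reg:.3f}")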