
Collaborating Authors: Matoba, Kyle


MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer), and extends pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and internationally-recognized medical guidelines. Evaluations using four major medical benchmarks show significant performance gains over several state-of-the-art baselines before and after task-specific finetuning. Overall, MEDITRON achieves a 6% absolute performance gain over the best public baseline in its parameter class and 3% over the strongest baseline we finetuned from Llama-2. Compared to closed-source LLMs, MEDITRON-70B outperforms GPT-3.5 and Med-PaLM and is within 5% of GPT-4 and 10% of Med-PaLM-2. We release our code for curating the medical pretraining corpus and the MEDITRON model weights to drive open-source development of more capable medical LLMs.
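
Since the abstract notes that the MEDITRON weights are released, below is a minimal sketch of loading and querying a checkpoint with Hugging Face transformers. The repository id "epfl-llm/meditron-7b" and the plain question/answer prompt are assumptions for illustration, not the paper's official usage or evaluation setup.

```python
# Minimal sketch: querying a released MEDITRON checkpoint with Hugging Face
# transformers. The repo id below is an assumption (check the official
# release for the exact name); the prompt format is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "epfl-llm/meditron-7b"  # assumed repo id; the 70B variant is analogous

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # reduced precision to fit the 7B model on one GPU
    device_map="auto",
)

prompt = "Question: What is the first-line treatment for uncomplicated hypertension?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```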


Attention-based Modeling of Physical Systems: Improved Latent Representations

arXiv.org Artificial Intelligence

Efficiently learning input-output mappings from irregularly-spaced data is a fundamental challenge in modeling physical phenomena. We propose attention-based modeling of quantities at arbitrary spatial points conditioned on related measurements at different locations. Attention-based models exhibit excellent performance across domains, which makes them an interesting candidate for modeling data irregularly sampled in space. Our approach adapts a transformer-encoder to process measurements and read-out positions together. We introduce a novel encoding strategy that applies the same transformation to the measurement and read-out positions, after which they are combined with encoded measurement values, instead of relying on two different mappings. To evaluate the effectiveness of our model, we conduct experiments on diverse problem domains, including high-altitude wind nowcasting, two-day weather forecasting, fluid dynamics, and heat diffusion. Our attention-based model consistently outperforms state-of-the-art models, such as Graph Element Networks and Conditional Neural Processes, for modeling irregularly sampled data. Notably, our model reduces the root mean square error (RMSE) for wind nowcasting from 9.24 to 7.98 and for a heat diffusion task from 0.126 to 0.084. We hypothesize that this superior performance can be attributed to the enhanced flexibility of our latent representation and the improved data encoding technique. To support our hypothesis, we design a synthetic experiment that reveals excessive bottlenecking in the latent representations of alternative models, which hinders information utilization and impedes training.
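
A minimal sketch of the shared-encoding idea described in the abstract, under assumed names and dimensions: measurement positions and read-out positions pass through the same position encoder, measurement tokens additionally add an encoding of their values, and a standard transformer encoder processes both token sets jointly. This illustrates the stated strategy; it is not the authors' implementation or hyperparameters.

```python
# Sketch of a joint attention model over measurements and read-out positions.
# All dimensions, layer counts, and the MLP encoder are illustrative choices.
import torch
import torch.nn as nn

class JointAttentionModel(nn.Module):
    def __init__(self, pos_dim=2, val_dim=1, d_model=64, nhead=4, nlayers=3):
        super().__init__()
        # One position encoder shared by measurement and read-out positions.
        self.pos_enc = nn.Sequential(nn.Linear(pos_dim, d_model), nn.GELU(),
                                     nn.Linear(d_model, d_model))
        self.val_enc = nn.Linear(val_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.readout = nn.Linear(d_model, val_dim)

    def forward(self, meas_pos, meas_val, query_pos):
        # meas_pos: (B, N, pos_dim), meas_val: (B, N, val_dim), query_pos: (B, M, pos_dim)
        meas_tok = self.pos_enc(meas_pos) + self.val_enc(meas_val)  # measurements carry values
        query_tok = self.pos_enc(query_pos)                         # queries carry positions only
        hidden = self.encoder(torch.cat([meas_tok, query_tok], dim=1))
        return self.readout(hidden[:, meas_pos.shape[1]:])          # predictions at query points

model = JointAttentionModel()
pred = model(torch.randn(8, 20, 2), torch.randn(8, 20, 1), torch.randn(8, 5, 2))
print(pred.shape)  # torch.Size([8, 5, 1])
```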


Efficiently Training Low-Curvature Neural Networks

arXiv.org Artificial Intelligence

Standard deep neural networks often have excess non-linearity, making them susceptible to issues such as low adversarial robustness and gradient instability. Common methods to address these downstream issues, such as adversarial training, are expensive and often sacrifice predictive accuracy. In this work, we address the core issue of excess non-linearity via curvature, and demonstrate low-curvature neural networks (LCNNs) that obtain drastically lower curvature than standard models while exhibiting similar predictive performance. This leads to improved robustness and stable gradients, at a fraction of the cost of standard adversarial training. To achieve this, we decompose overall model curvature in terms of curvatures and slopes of its constituent layers. To enable efficient curvature minimization of constituent layers, we introduce two novel architectural components: first, a non-linearity called centered-softplus that is a stable variant of the softplus non-linearity, and second, a Lipschitz-constrained batch normalization layer. Our experiments show that LCNNs have lower curvature, more stable gradients and increased off-the-shelf adversarial robustness when compared to standard neural networks, all without affecting predictive performance. Our approach is easy to use and can be readily incorporated into existing neural network architectures. Code to implement our method and replicate our experiments is available at https://github.com/kylematoba/lcnn.
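
A minimal sketch of one plausible reading of the centered-softplus component: a softplus with sharpness parameter beta, shifted by its value at zero so the activation passes through the origin. The exact parameterization is an assumption; the released code at https://github.com/kylematoba/lcnn is the authoritative implementation.

```python
# Sketch of a "centered" softplus activation: softplus_beta(x) - softplus_beta(0),
# so that f(0) = 0. This is an illustrative reconstruction, not the paper's
# exact definition.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenteredSoftplus(nn.Module):
    def __init__(self, beta: float = 1.0):
        super().__init__()
        self.beta = beta

    def forward(self, x):
        # softplus_beta(0) = log(2) / beta, subtracted so the output is zero at zero
        return F.softplus(x, beta=self.beta) - math.log(2.0) / self.beta

act = CenteredSoftplus(beta=5.0)
print(act(torch.zeros(3)))  # ~0 at the origin by construction
```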


Challenges for Using Impact Regularizers to Avoid Negative Side Effects

arXiv.org Artificial Intelligence

Designing reward functions for reinforcement learning is difficult: besides specifying which behavior is rewarded for a task, the reward also has to discourage undesired outcomes. Misspecified reward functions can lead to unintended negative side effects and overall unsafe behavior. To overcome this problem, recent work proposed to augment the specified reward function with an impact regularizer that discourages behavior that has a large impact on the environment. Although initial results with impact regularizers seem promising in mitigating some types of side effects, important challenges remain. In this paper, we examine the main current challenges of impact regularizers and relate them to fundamental design decisions. We discuss in detail which challenges recent approaches address and which remain unsolved. Finally, we explore promising directions to overcome the unsolved challenges in preventing negative side effects with impact regularizers.
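
A minimal sketch of the generic recipe described in the abstract: the specified task reward is augmented with a penalty on the agent's deviation from a baseline state. The deviation measure, the baseline, and the weight lam below are placeholder choices; picking them well is precisely the set of design decisions the paper examines.

```python
# Sketch of an impact-regularized reward. The L1 deviation and inaction-style
# baseline are placeholders, not recommendations from the paper.
import numpy as np

def impact_regularized_reward(task_reward: float,
                              state: np.ndarray,
                              baseline_state: np.ndarray,
                              lam: float = 0.1) -> float:
    """Task reward minus a penalty for deviating from the baseline state."""
    deviation = np.linalg.norm(state - baseline_state, ord=1)  # placeholder impact measure
    return task_reward - lam * deviation

# Example: an action that changes unrelated parts of the state gets penalized.
s_after_action = np.array([1.0, 0.0, 3.0])
s_baseline = np.array([1.0, 0.0, 0.0])  # e.g., state under an inaction baseline
print(impact_regularized_reward(task_reward=1.0, state=s_after_action,
                                baseline_state=s_baseline, lam=0.1))  # 0.7
```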