AITopics | Cappelli, Alessandro

Collaborating Authors

Cappelli, Alessandro

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Optical training of large-scale Transformers and deep neural networks with direct feedback alignment

Wang, Ziao, Müller, Kilian, Filipovich, Matthew, Launay, Julien, Ohana, Ruben, Pariente, Gustave, Mokaadi, Safa, Brossollet, Charles, Moreau, Fabien, Cappelli, Alessandro, Poli, Iacopo, Carron, Igor, Daudet, Laurent, Krzakala, Florent, Gigan, Sylvain

arXiv.org Artificial IntelligenceSep-1-2024

Modern machine learning relies nearly exclusively on dedicated electronic hardware accelerators. Photonic approaches, with low consumption and high operation speed, are increasingly considered for inference but, to date, remain mostly limited to relatively basic tasks. Simultaneously, the problem of training deep and complex neural networks, overwhelmingly performed through backpropagation, remains a significant limitation to the size and, consequently, the performance of current architectures and a major compute and energy bottleneck. Here, we experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform. An optical processing unit performs large-scale random matrix multiplications, which is the central operation of this algorithm, at speeds up to 1500 TeraOps. We perform optical training of one of the most recent deep learning architectures, including Transformers, with more than 1B parameters, and obtain good performances on both language and vision tasks. We study the compute scaling of our hybrid optical approach, and demonstrate a potential advantage for ultra-deep and wide neural networks, thus opening a promising route to sustain the exponential growth of modern artificial intelligence beyond traditional von Neumann approaches.

artificial intelligence, machine learning, neural network, (16 more...)

arXiv.org Artificial Intelligence

2409.12965

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Falcon Series of Open Language Models

Almazrouei, Ebtesam, Alobeidli, Hamza, Alshamsi, Abdulaziz, Cappelli, Alessandro, Cojocaru, Ruxandra, Debbah, Mérouane, Goffinet, Étienne, Hesslow, Daniel, Launay, Julien, Malartic, Quentin, Mazzotta, Daniele, Noune, Badreddine, Pannier, Baptiste, Penedo, Guilherme

arXiv.org Artificial IntelligenceNov-29-2023

We introduce the Falcon series: 7B, 40B, and 180B parameters causal decoder-only models trained on a diverse high-quality corpora predominantly assembled from web data. The largest model, Falcon-180B, has been trained on over 3.5 trillion tokens of text--the largest openly documented pretraining run. Falcon-180B significantly outperforms models such as PaLM or Chinchilla, and improves upon concurrently developed models such as LLaMA 2 or Inflection-1. It nears the performance of PaLM-2-Large at a reduced pretraining and inference cost, making it, to our knowledge, one of the three best language models in the world along with GPT-4 and PaLM-2-Large. We report detailed evaluations, as well as a deep dive into the methods and custom tooling employed to pretrain Falcon. Notably, we report on our custom distributed training codebase, allowing us to efficiently pretrain these models on up to 4,096 A100s on cloud AWS infrastructure with limited interconnect. We release a 600B tokens extract of our web dataset, as well as the Falcon-7/40/180B models under a permissive license to foster open-science and accelerate the development of an open ecosystem of large language models.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2311.16867

Country:

Europe (1.00)
North America > United States > Louisiana (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

Penedo, Guilherme, Malartic, Quentin, Hesslow, Daniel, Cojocaru, Ruxandra, Cappelli, Alessandro, Alobeidli, Hamza, Pannier, Baptiste, Almazrouei, Ebtesam, Launay, Julien

arXiv.org Artificial IntelligenceJun-1-2023

Large language models are commonly trained on a mixture of filtered web data and curated high-quality corpora, such as social media conversations, books, or technical papers. This curation process is believed to be necessary to produce performant models with broad zero-shot generalization abilities. However, as larger models requiring pretraining on trillions of tokens are considered, it is unclear how scalable is curation and whether we will run out of unique high-quality data soon. At variance with previous beliefs, we show that properly filtered and deduplicated web data alone can lead to powerful models; even significantly outperforming models from the state-of-the-art trained on The Pile. Despite extensive filtering, the high-quality data we extract from the web is still plentiful, and we are able to obtain five trillion tokens from CommonCrawl. We publicly release an extract of 600 billion tokens from our RefinedWeb dataset, and 1.3/7.5B parameters language models trained on it.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2306.01116

Country:

North America > United States (0.67)
Asia (0.46)

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Banking & Finance > Real Estate (0.68)
Law > Intellectual Property & Technology Law (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Photonic co-processors in HPC: using LightOn OPUs for Randomized Numerical Linear Algebra

Hesslow, Daniel, Cappelli, Alessandro, Carron, Igor, Daudet, Laurent, Lafargue, Raphaël, Müller, Kilian, Ohana, Ruben, Pariente, Gustave, Poli, Iacopo

arXiv.org Machine LearningMay-7-2021

Randomized Numerical Linear Algebra (RandNLA) is a powerful class of methods, widely used in High Performance Computing (HPC). RandNLA provides approximate solutions to linear algebra functions applied to large signals, at reduced computational costs. However, the randomization step for dimensionality reduction may itself become the computational bottleneck on traditional hardware. Leveraging near constant-time linear random projections delivered by LightOn Optical Processing Units we show that randomization can be significantly accelerated, at negligible precision loss, in a wide range of important RandNLA algorithms, such as RandSVD or trace estimators.

artificial intelligence, projection, random projection, (12 more...)

arXiv.org Machine Learning

2104.14429

Country: Europe > France (0.15)

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.84)

Add feedback