Herten, Andreas
Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML
John, Chelsea Maria, Nassyr, Stepan, Penke, Carolin, Herten, Andreas
Abstract: The rapid advancement of machine learning (ML) technologies has driven the development of specialized hardware accelerators designed to facilitate more efficient model training. This paper introduces the CARAML benchmark suite, which is employed to assess performance and energy consumption during the training of transformer-based large language models and computer vision models on a range of hardware accelerators, including systems from NVIDIA, AMD, and Graphcore. CARAML provides a compact, automated, extensible, and reproducible framework for assessing the performance and energy of ML workloads across various novel hardware architectures. The design and implementation of CARAML, along with a custom power measurement tool called jpwr, are discussed in detail.

Introduction: Fueled by the growing interest in training ever larger deep neural networks, such as large language models and other foundation models, the demand for hardware specialized for these workloads has grown immensely. Graphics processing units (GPUs) have evolved from their origins in computer graphics to become the primary computational engines of the AI revolution. While the central processing unit (CPU) controls a program's execution flow, it offloads compute-intensive, highly parallel tasks to the GPU (the accelerator). Having evolved from a pioneering company in this space, NVIDIA has emerged as the dominant player in the market as of 2024, spearheading current hardware developments. Other vendors, such as AMD and Intel, also provide GPUs aimed at accelerating model training and inference. Another promising class of AI accelerators is based on the idea of distributed local per-compute-unit memory combined with on-chip message passing, in contrast to the shared memory hierarchy typical of classical CPUs and GPUs. Performance characteristics not only vary between generations and vendors, but also depend on the node or cluster configuration in which the accelerator is embedded, including CPU, memory, and interconnect. When evaluating and comparing these heterogeneous hardware options, e.g. for purchase decisions in an academic or industrial setting, it is not sufficient to compare hardware characteristics such as the number of cores, thermal design power (TDP), theoretical bandwidth, or peak performance in FLOP/s. Their effect on workload performance is not straightforward, and the accelerator architectures might barely be comparable. Performance data reflecting the actual intended workloads, collected on various competing systems independently of vendor interests, offer highly valuable information. Power consumption is one such important metric in this regard.
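The paper pairs the benchmark runs with jpwr, a custom power measurement tool. The actual jpwr interface is not reproduced here; the following is a minimal sketch of the underlying idea only, under the assumption of an NVIDIA GPU and the pynvml bindings: periodically sample accelerator power while a workload runs and integrate the samples into an energy figure.

```python
# Illustrative sketch only: NOT the jpwr implementation, just the general
# idea of sampling accelerator power and integrating it into energy.
# Assumes an NVIDIA GPU and the pynvml bindings (pip install nvidia-ml-py).
import threading
import time

import pynvml


def measure_energy(workload, device_index=0, interval_s=0.1):
    """Run `workload()` while sampling GPU power; return (result, energy in joules)."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    samples = []  # list of (timestamp in s, power in W)
    stop = threading.Event()

    def sampler():
        while not stop.is_set():
            watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
            samples.append((time.monotonic(), watts))
            time.sleep(interval_s)

    sampler_thread = threading.Thread(target=sampler, daemon=True)
    sampler_thread.start()
    try:
        result = workload()
    finally:
        stop.set()
        sampler_thread.join()
        pynvml.nvmlShutdown()

    # Trapezoidal integration of the power samples yields energy in joules.
    energy_j = sum(
        0.5 * (p0 + p1) * (t1 - t0)
        for (t0, p0), (t1, p1) in zip(samples, samples[1:])
    )
    return result, energy_j
```

A measurement tool of this kind can wrap a training step or a full benchmark run and report joules per run alongside throughput, which is the kind of combined performance-and-energy view the CARAML suite targets.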
Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs
Ali, Mehdi, Fromm, Michael, Thellmann, Klaudia, Ebert, Jan, Weber, Alexander Arno, Rutmann, Richard, Jain, Charvi, Lübbering, Max, Steinigen, Daniel, Leveling, Johannes, Klug, Katrin, Buschhoff, Jasper Schulze, Jurkschat, Lena, Abdelwahab, Hammam, Stein, Benny Jörg, Sylla, Karl-Heinz, Denisov, Pavel, Brandizzi, Nicolo', Saleem, Qasid, Bhowmick, Anirban, Helmer, Lennard, John, Chelsea, Suarez, Pedro Ortiz, Ostendorff, Malte, Jude, Alex, Manjunath, Lalith, Weinbach, Samuel, Penke, Carolin, Filatov, Oleg, Asaadi, Shima, Barth, Fabio, Sifa, Rafet, Küch, Fabian, Herten, Andreas, Jäkel, René, Rehm, Georg, Kesselheim, Stefan, Köhler, Joachim, Flores-Herr, Nicolas
We present two multilingual LLMs designed to embrace Europe's linguistic diversity by supporting all 24 official languages of the European Union. Trained on a dataset comprising around 60% non-English data and utilizing a custom multilingual tokenizer, our models address the limitations of existing LLMs that predominantly focus on English or a few high-resource languages. We detail the models' development principles, i.e., data composition, tokenizer optimization, and training methodologies. The models demonstrate competitive performance across multilingual benchmarks, as evidenced by their performance on European versions of ARC, HellaSwag, MMLU, and TruthfulQA.
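As a usage illustration, a model of this kind can be loaded with the Hugging Face transformers library; this is a minimal sketch, and the repository identifier below is an assumption that may differ from the actual release name.

```python
# Minimal usage sketch; the model id is an assumed Hugging Face repository
# name, not confirmed by the abstract above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openGPT-X/Teuken-7B-instruct-research-v0.4"  # assumed identifier

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# The custom multilingual tokenizer is meant to cover all 24 official EU
# languages, so prompts can be issued in, e.g., German, French, or Polish.
prompt = "Wie viele Amtssprachen hat die Europäische Union?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```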