AITopics | Anastasiou, Chrysovalantis

Collaborating Authors

Anastasiou, Chrysovalantis

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

BIG-Bench Extra Hard

Kazemi, Mehran, Fatemi, Bahare, Bansal, Hritik, Palowitch, John, Anastasiou, Chrysovalantis, Mehta, Sanket Vaibhav, Jain, Lalit K., Aglietti, Virginia, Jindal, Disha, Chen, Peter, Dikkala, Nishanth, Tyen, Gladys, Liu, Xin, Shalit, Uri, Chiappa, Silvia, Olszewska, Kate, Tay, Yi, Tran, Vinh Q., Le, Quoc V., Firat, Orhan

arXiv.org Artificial IntelligenceFeb-26-2025

Large language models (LLMs) are increasingly deployed in everyday applications, demanding robust general reasoning capabilities and diverse reasoning skillset. However, current LLM reasoning benchmarks predominantly focus on mathematical and coding abilities, leaving a gap in evaluating broader reasoning proficiencies. One particular exception is the BIG-Bench dataset, which has served as a crucial benchmark for evaluating the general reasoning capabilities of LLMs, thanks to its diverse set of challenging tasks that allowed for a comprehensive assessment of general reasoning across various skills within a unified framework. However, recent advances in LLMs have led to saturation on BIG-Bench, and its harder version BIG-Bench Hard (BBH). State-of-the-art models achieve near-perfect scores on many tasks in BBH, thus diminishing its utility. To address this limitation, we introduce BIG-Bench Extra Hard (BBEH), a new benchmark designed to push the boundaries of LLM reasoning evaluation. BBEH replaces each task in BBH with a novel task that probes a similar reasoning capability but exhibits significantly increased difficulty. We evaluate various models on BBEH and observe a (harmonic) average accuracy of 9.8\% for the best general-purpose model and 44.8\% for the best reasoning-specialized model, indicating substantial room for improvement and highlighting the ongoing challenge of achieving robust general reasoning in LLMs. We release BBEH publicly at: https://github.com/google-deepmind/bbeh.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.19187

Country:

North America > United States (0.14)
Asia > Thailand (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MetisFL: An Embarrassingly Parallelized Controller for Scalable & Efficient Federated Learning Workflows

Stripelis, Dimitris, Anastasiou, Chrysovalantis, Toral, Patrick, Asghar, Armaghan, Ambite, Jose Luis

arXiv.org Artificial IntelligenceNov-13-2023

A Federated Learning (FL) system typically consists of two core processing entities: the federation controller and the learners. The controller is responsible for managing the execution of FL workflows across learners and the learners for training and evaluating federated models over their private datasets. While executing an FL workflow, the FL system has no control over the computational resources or data of the participating learners. Still, it is responsible for other operations, such as model aggregation, task dispatching, and scheduling. These computationally heavy operations generally need to be handled by the federation controller. Even though many FL systems have been recently proposed to facilitate the development of FL workflows, most of these systems overlook the scalability of the controller. To meet this need, we designed and developed a novel FL system called MetisFL, where the federation controller is the first-class citizen. MetisFL re-engineers all the operations conducted by the federation controller to accelerate the training of large-scale FL workflows. By quantitatively comparing MetisFL against other state-of-the-art FL systems, we empirically demonstrate that MetisFL leads to a 10-fold wall-clock time execution boost across a wide range of challenging FL workflows with increasing model sizes and federation sites.

artificial intelligence, machine learning, metisfl, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3630048.3630186

2311.00334

Country: North America > United States > California (0.15)

Genre: Workflow (1.00)

Industry:

Education (0.94)
Health & Medicine (0.94)
Information Technology > Security & Privacy (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback