AITopics | Bucharest

Collaborating Authors

Bucharest

CBM: Curriculum by Masking

Jarca, Andrei, Croitoru, Florinel-Alin, Ionescu, Radu Tudor

arXiv.org Artificial IntelligenceJul-9-2024

We propose Curriculum by Masking (CBM), a novel state-of-the-art curriculum learning strategy that effectively creates an easy-to-hard training schedule via patch (token) masking, offering significant accuracy improvements over the conventional training regime and previous curriculum learning (CL) methods. CBM leverages gradient magnitudes to prioritize the masking of salient image regions via a novel masking algorithm and a novel masking block. Our approach enables controlling sample difficulty via the patch masking ratio, generating an effective easy-to-hard curriculum by gradually introducing harder samples as training progresses. CBM operates with two easily configurable parameters, i.e. the number of patches and the curriculum schedule, making it a versatile curriculum learning approach for object recognition and detection. We conduct experiments with various neural architectures, ranging from convolutional networks to vision transformers, on five benchmark data sets (CIFAR-10, CIFAR-100, ImageNet, Food-101 and PASCAL VOC), to compare CBM with conventional as well as curriculum-based training regimes. Our results reveal the superiority of our strategy compared with the state-of-the-art curriculum learning regimes. We also observe improvements in transfer learning contexts, where CBM surpasses previous work by considerable margins in terms of accuracy. We release our code for free non-commercial use at https://github.com/CroitoruAlin/CBM.

curriculum, proceedings, training regime, (12 more...)

arXiv.org Artificial Intelligence

2407.05193

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

EVA-Score: Evaluation of Long-form Summarization on Informativeness through Extraction and Validation

Fan, Yuchen, Zhong, Xin, Wang, Chengsi, Wu, Gaoche, Zhou, Bowen

arXiv.org Artificial IntelligenceJul-6-2024

Summarization is a fundamental task in natural language processing (NLP) and since large language models (LLMs), such as GPT-4 and Claude, come out, increasing attention has been paid to long-form summarization whose input sequences are much longer, indicating more information contained. The current evaluation metrics either use similarity-based metrics like ROUGE and BERTScore which rely on similarity and fail to consider informativeness or LLM-based metrics, lacking quantitative analysis of information richness and are rather subjective. In this paper, we propose a new evaluation metric called EVA-Score using Atomic Fact Chain Generation and Document-level Relation Extraction together to automatically calculate the informativeness and give a definite number as an information score. Experiment results show that our metric shows a state-of-the-art correlation with humans. We also re-evaluate the performance of LLMs on long-form summarization comprehensively from the information aspect, forecasting future ways to use LLMs for long-form summarization.

atomic fact, information, summarization, (14 more...)

arXiv.org Artificial Intelligence

2407.04969

Country:

North America > United States (0.14)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.05)
Europe > Romania > Sud - Muntenia Development Region > Ialomița County (0.05)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Energy (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PoPreRo: A New Dataset for Popularity Prediction of Romanian Reddit Posts

Rogoz, Ana-Cristina, Nechita, Maria Ilinca, Ionescu, Radu Tudor

arXiv.org Artificial IntelligenceJul-5-2024

We introduce PoPreRo, the first dataset for Popularity Prediction of Romanian posts collected from Reddit. The PoPreRo dataset includes a varied compilation of post samples from five distinct subreddits of Romania, totaling 28,107 data samples. Along with our novel dataset, we introduce a set of competitive models to be used as baselines for future research. Interestingly, the top-scoring model achieves an accuracy of 61.35% and a macro F1 score of 60.60% on the test set, indicating that the popularity prediction task on PoPreRo is very challenging. Further investigations based on few-shot prompting the Falcon-7B Large Language Model also point in the same direction. We thus believe that PoPreRo is a valuable resource that can be used to evaluate models on predicting the popularity of social media posts in Romanian. We release our dataset at https://github.com/ana-rogoz/PoPreRo.

arXiv.org Artificial Intelligence

2407.04541

Country:

Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.05)
Europe > Romania > Vest Development Region > Timiș County > Timișoara (0.05)
Europe > Ukraine (0.04)

Genre: Research Report (0.65)

Industry: Media > News (0.77)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)

Add feedback

LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation

Yin, Yongjing, Zeng, Jiali, Li, Yafu, Meng, Fandong, Zhang, Yue

arXiv.org Artificial IntelligenceJul-2-2024

The fine-tuning of open-source large language models (LLMs) for machine translation has recently received considerable attention, marking a shift towards data-centric research from traditional neural machine translation. However, the area of data collection for instruction fine-tuning in machine translation remains relatively underexplored. In this paper, we present LexMatcher, a simple yet effective method for data curation, the design of which is driven by the coverage of senses found in bilingual dictionaries. The construction process comprises data retrieval from an existing corpus and data augmentation that supplements the infrequent senses of polysemous words. Utilizing LLaMA2 as our base model, our approach outperforms the established baselines on the WMT2022 test sets and also exhibits remarkable performance in tasks related to word sense disambiguation and specialized terminology translation. These results underscore the effectiveness of LexMatcher in enhancing LLM-based machine translation. The code, data, and models are available at https://github.com/ARIES-LM/Lexmatcher-MT.git.

computational linguistic, machine translation, translation, (15 more...)

arXiv.org Artificial Intelligence

2406.01441

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(8 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Look Into Training Large Language Models on Next Generation Datacenters

Gherghescu, Alexandru M., Bădoiu, Vlad-Andrei, Agache, Alexandru, Dumitru, Mihai-Valentin, Vasilescu, Iuliu, Mantu, Radu, Raiciu, Costin

arXiv.org Artificial IntelligenceJul-1-2024

Is it still worth doing computer networking research? What are relevant problems in this space given the supremacy of hyperscalers in deployed large networks? We take an unconventional approach to finding relevant research directions, by starting from Microsoft's plans to build a $100 billion datacenter for ML. Our goal is to understand what models could be trained in such a datacenter, as well as the high-level challenges one may encounter in doing so. We first examine the constraints imposed by cooling and power requirements for our target datacenter and find that it is infeasible to build in a single location. We use LLM scaling laws to determine that we could train models of 50T or 100T. Finally, we examine how distributed training might work for these models, and what the networking requirements are. We conclude that building the datacenter and training such models is technically possible, but this requires a novel NIC-based multipath transport along with a redesign of the entire training stack, outlining a research agenda for our community in the near future.

communication, datacenter, gpus, (12 more...)

arXiv.org Artificial Intelligence

2407.12819

Country:

Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.06)
North America > United States > California (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Energy > Power Industry (1.00)
Information Technology (0.66)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

"Vorbe\c{s}ti Rom\^ane\c{s}te?" A Recipe to Train Powerful Romanian LLMs with English Instructions

Masala, Mihai, Ilie-Ablachim, Denis C., Dima, Alexandru, Corlatescu, Dragos, Zavelca, Miruna, Olaru, Ovio, Terian, Simina, Terian, Andrei, Leordeanu, Marius, Velicu, Horia, Popescu, Marius, Dascalu, Mihai, Rebedea, Traian

arXiv.org Artificial IntelligenceJun-27-2024

In recent years, Large Language Models (LLMs) have achieved almost human-like performance on various tasks. While some LLMs have been trained on multilingual data, most of the training data is in English; hence, their performance in English greatly exceeds other languages. To our knowledge, we are the first to collect and translate a large collection of texts, instructions, and benchmarks and train, evaluate, and release open-source LLMs tailored for Romanian. We evaluate our methods on four different categories, including academic benchmarks, MT-Bench (manually translated), and a professionally built historical, cultural, and social benchmark adapted to Romanian. We argue for the usefulness and high performance of RoLLMs by obtaining state-of-the-art results across the board. We publicly release all resources (i.e., data, training and evaluation code, models) to support and encourage research on Romanian LLMs while concurrently creating a generalizable recipe, adequate for other low or less-resourced languages.

arxiv, benchmark, instruction, (16 more...)

arXiv.org Artificial Intelligence

2406.18266

Country:

Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.05)
North America > Canada > Ontario > Toronto (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Education > Curriculum > Subject-Specific Education (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Prose-to-P4: Leveraging High Level Languages

Dumitru, Mihai-Valentin, Bădoiu, Vlad-Andrei, Raiciu, Costin

arXiv.org Artificial IntelligenceJun-19-2024

Languages such as P4 and NPL have enabled a wide and diverse range of networking applications that take advantage of programmable dataplanes. However, software development in these languages is difficult. To address this issue, high-level languages have been designed to offer programmers powerful abstractions that reduce the time, effort and domain-knowledge required for developing networking applications. These languages are then translated by a compiler into P4/NPL code. Inspired by the recent success of Large Language Models (LLMs) in the task of code generation, we propose to raise the level of abstraction even higher, employing LLMs to translate prose into high-level networking code. We analyze the problem, focusing on the motivation and opportunities, as well as the challenges involved and sketch out a roadmap for the development of a system that can generate high-level dataplane code from natural language instructions. We present some promising preliminary results on generating Lucid code from natural language.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2406.13679

Country: Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Add feedback

MAiDE-up: Multilingual Deception Detection of GPT-generated Hotel Reviews

Ignat, Oana, Xu, Xiaomeng, Mihalcea, Rada

arXiv.org Artificial IntelligenceJun-18-2024

Deceptive reviews are becoming increasingly common, especially given the increase in performance and the prevalence of LLMs. While work to date has addressed the development of models to differentiate between truthful and deceptive human reviews, much less is known about the distinction between real reviews and AI-authored fake reviews. Moreover, most of the research so far has focused primarily on English, with very little work dedicated to other languages. In this paper, we compile and make publicly available the MAiDE-up dataset, consisting of 10,000 real and 10,000 AI-generated fake hotel reviews, balanced across ten languages. Using this dataset, we conduct extensive linguistic analyses to (1) compare the AI fake hotel reviews to real hotel reviews, and (2) identify the factors that influence the deception detection model performance. We explore the effectiveness of several models for deception detection in hotel reviews across three main dimensions: sentiment, location, and language. We find that these dimensions influence how well we can detect AI-generated fake reviews.

dataset, hotel review, sentiment, (14 more...)

arXiv.org Artificial Intelligence

2404.12938

Country:

Asia > Middle East > Republic of Türkiye > Ankara Province > Ankara (0.05)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
(10 more...)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

DataStates-LLM: Lazy Asynchronous Checkpointing for Large Language Models

Maurya, Avinash, Underwood, Robert, Rafique, M. Mustafa, Cappello, Franck, Nicolae, Bogdan

arXiv.org Artificial IntelligenceJun-15-2024

LLMs have seen rapid adoption in all domains. They need to be trained on high-end high-performance computing (HPC) infrastructures and ingest massive amounts of input data. Unsurprisingly, at such a large scale, unexpected events (e.g., failures of components, instability of the software, undesirable learning patterns, etc.), are frequent and typically impact the training in a negative fashion. Thus, LLMs need to be checkpointed frequently so that they can be rolled back to a stable state and subsequently fine-tuned. However, given the large sizes of LLMs, a straightforward checkpointing solution that directly writes the model parameters and optimizer state to persistent storage (e.g., a parallel file system), incurs significant I/O overheads. To address this challenge, in this paper we study how to reduce the I/O overheads for enabling fast and scalable checkpointing for LLMs that can be applied at high frequency (up to the granularity of individual iterations) without significant impact on the training process. Specifically, we introduce a lazy asynchronous multi-level approach that takes advantage of the fact that the tensors making up the model and optimizer state shards remain immutable for extended periods of time, which makes it possible to copy their content in the background with minimal interference during the training process. We evaluate our approach at scales of up to 180 GPUs using different model sizes, parallelism settings, and checkpointing frequencies. The results show up to 48$\times$ faster checkpointing and 2.2$\times$ faster end-to-end training runtime compared with the state-of-art checkpointing approaches.

checkpoint, host memory, shard, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3625549.3658685

2406.10707

Country:

Europe > Italy > Tuscany > Pisa Province > Pisa (0.05)
North America > United States > Illinois > Cook County > Lemont (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(21 more...)

Genre: Research Report > New Finding (0.34)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Introducing Brain-like Concepts to Embodied Hand-crafted Dialog Management System

Joublin, Frank, Ceravola, Antonello, Sandu, Cristian

arXiv.org Artificial IntelligenceJun-13-2024

Along with the development of chatbot, language models and speech technologies, there is a growing possibility and interest of creating systems able to interface with humans seamlessly through natural language or directly via speech. In this paper, we want to demonstrate that placing the research on dialog system in the broader context of embodied intelligence allows to introduce concepts taken from neurobiology and neuropsychology to define behavior architecture that reconcile hand-crafted design and artificial neural network and open the gate to future new learning approaches like imitation or learning by instruction. To do so, this paper presents a neural behavior engine that allows creation of mixed initiative dialog and action generation based on hand-crafted models using a graphical language. A demonstration of the usability of such brain-like inspired architecture together with a graphical dialog model is described through a virtual receptionist application running on a semi-public space.

application, architecture, miron, (17 more...)

arXiv.org Artificial Intelligence

2406.08996

Country:

Europe > Germany (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(6 more...)

Genre: Research Report (0.82)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Transportation (0.93)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback