AITopics | Brakna

Collaborating Authors

Brakna

RoQLlama: A Lightweight Romanian Adapted Language Model

Dima, George-Andrei, Avram, Andrei-Marius, Crăciun, Cristian-George, Cercel, Dumitru-Clementin

arXiv.org Artificial IntelligenceOct-5-2024

The remarkable achievements obtained by open-source large language models (LLMs) in recent years have predominantly been concentrated on tasks involving the English language. In this paper, we aim to advance the performance of Llama2 models on Romanian tasks. We tackle the problem of reduced computing resources by using QLoRA for training. We release RoQLlama-7b, a quantized LLM, which shows equal or improved results compared to its full-sized counterpart when tested on seven Romanian downstream tasks in the zero-shot setup. Also, it consistently achieves higher average scores across all few-shot prompts. Additionally, we introduce a novel Romanian dataset, namely RoMedQA, which contains single-choice medical questions in Romanian.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.04269

Country:

Africa > Mauritania > Brakna > Aleg (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
Asia (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.48)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.59)

Add feedback

Efficient Algorithms for Empirical Group Distributional Robust Optimization and Beyond

Yu, Dingzhi, Cai, Yunuo, Jiang, Wei, Zhang, Lijun

arXiv.org Machine LearningMar-6-2024

We investigate the empirical counterpart of group distributionally robust optimization (GDRO), which aims to minimize the maximal empirical risk across $m$ distinct groups. We formulate empirical GDRO as a $\textit{two-level}$ finite-sum convex-concave minimax optimization problem and develop a stochastic variance reduced mirror prox algorithm. Unlike existing methods, we construct the stochastic gradient by per-group sampling technique and perform variance reduction for all groups, which fully exploits the $\textit{two-level}$ finite-sum structure of empirical GDRO. Furthermore, we compute the snapshot and mirror snapshot point by a one-index-shifted weighted average, which distinguishes us from the naive ergodic average. Our algorithm also supports non-constant learning rates, which is different from existing literature. We establish convergence guarantees both in expectation and with high probability, demonstrating a complexity of $\mathcal{O}\left(\frac{m\sqrt{\bar{n}\ln{m}}}{\varepsilon}\right)$, where $\bar n$ is the average number of samples among $m$ groups. Remarkably, our approach outperforms the state-of-the-art method by a factor of $\sqrt{m}$. Furthermore, we extend our methodology to deal with the empirical minimax excess risk optimization (MERO) problem and manage to give the expectation bound and the high probability bound, accordingly. The complexity of our empirical MERO algorithm matches that of empirical GDRO at $\mathcal{O}\left(\frac{m\sqrt{\bar{n}\ln{m}}}{\varepsilon}\right)$, significantly surpassing the bounds of existing methods.

algorithm, complexity, optimization, (15 more...)

arXiv.org Machine Learning

2403.03562

Country:

Africa > Mauritania > Brakna > Aleg (0.06)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback