Goto

Collaborating Authors

 seqeval


Fine-tuning Large Language Models with Sequential Instructions

Hu, Hanxu, Yu, Simon, Chen, Pinzhen, Ponti, Edoardo M.

arXiv.org Artificial Intelligence

Despite the success of existing instruction-tuned models, we find that they usually struggle to respond to queries with multiple instructions. This impairs their performance in complex problems whose solution consists of multiple intermediate tasks. Thus, we contend that part of the fine-tuning data mixture should be sequential--containing a chain of interrelated tasks. We first approach sequential instruction tuning from a task-driven perspective, manually creating interpretable intermediate tasks for multilingual and visual question answering: namely "translate then predict" and "caption then answer". Next, we automate this process by turning instructions in existing datasets (e.g., Alpaca and FlanCoT) into diverse and complex sequential instructions, making our method general-purpose. Models that underwent our sequential instruction tuning show improved results in coding, maths, and open-ended generation. Moreover, we put forward a new benchmark named SeqEval to evaluate a model's ability to follow all the instructions in a sequence, which further corroborates the benefits of our fine-tuning method. We hope that our endeavours will open new research avenues on instruction tuning for complex tasks.


Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies

Sujit, Shivakanth, Braga, Pedro H. M., Bornschein, Jorg, Kahou, Samira Ebrahimi

arXiv.org Artificial Intelligence

Reinforcement learning (RL) has shown great promise with algorithms learning in environments with large state and action spaces purely from scalar reward signals. A crucial challenge for current deep RL algorithms is that they require a tremendous amount of environment interactions for learning. This can be infeasible in situations where such interactions are expensive; such as in robotics. Offline RL algorithms try to address this issue by bootstrapping the learning process from existing logged data without needing to interact with the environment from the very beginning. While online RL algorithms are typically evaluated as a function of the number of environment interactions, there exists no single established protocol for evaluating offline RL methods.In this paper, we propose a sequential approach to evaluate offline RL algorithms as a function of the training set size and thus by their data efficiency. Sequential evaluation provides valuable insights into the data efficiency of the learning process and the robustness of algorithms to distribution changes in the dataset while also harmonizing the visualization of the offline and online learning phases. Our approach is generally applicable and easy to implement. We compare several existing offline RL algorithms using this approach and present insights from a variety of tasks and offline datasets.


CamemBERT-bio: a Tasty French Language Model Better for your Health

Touchent, Rian, Romary, Laurent, de la Clergerie, Eric

arXiv.org Artificial Intelligence

Clinical data in hospitals are increasingly accessible for research through clinical data warehouses, however these documents are unstructured. It is therefore necessary to extract information from medical reports to conduct clinical studies. Transfer learning with BERT-like models such as CamemBERT has allowed major advances, especially for named entity recognition. However, these models are trained for plain language and are less efficient on biomedical data. This is why we propose a new French public biomedical dataset on which we have continued the pre-training of CamemBERT. Thus, we introduce a first version of CamemBERT-bio, a specialized public model for the French biomedical domain that shows 2.54 points of F1 score improvement on average on different biomedical named entity recognition tasks. Our findings demonstrate the success of continual pre-training from a French model and contrast with recent proposals on the same domain and language. One of our key contributions highlights the importance of using a standard evaluation protocol that enables a clear view of the current state-of-the-art for French biomedical models.


GitHub - chakki-works/seqeval: A Python framework for sequence labeling evaluation(named-entity recognition, pos tagging, etc...)

#artificialintelligence

This is well-tested by using the Perl script conlleval, which can be used for measuring the performance of a system that has processed the CoNLL-2000 shared task data. The default mode is compatible with conlleval. If you want to use the default mode, you don't need to specify it: In strict mode, the inputs are evaluated according to the specified schema. The behavior of the strict mode is different from the default one which is designed to simulate conlleval. If you want to use the strict mode, please specify mode'strict' and scheme arguments at the same time: