Bucharest
Detecting Continuous Integration Skip : A Reinforcement Learning-based Approach
Mhalla, Hajer, Saied, Mohamed Aymen
The software industry is experiencing a surge in the adoption of Continuous Integration (CI) practices, both in commercial and open-source environments. CI practices facilitate the seamless integration of code changes by employing automated building and testing processes. Some frameworks, such as Travis CI and GitHub Actions have significantly contributed to simplifying and enhancing the CI process, rendering it more accessible and efficient for development teams. Despite the availability these CI tools , developers continue to encounter difficulties in accurately flagging commits as either suitable for CI execution or as candidates for skipping especially for large projects with many dependencies. Inaccurate flagging of commits can lead to resource-intensive test and build processes, as even minor commits may inadvertently trigger the Continuous Integration process. The problem of detecting CI-skip commits, can be modeled as binary classification task where we decide to either build a commit or to skip it. This study proposes a novel solution that leverages Deep Reinforcement Learning techniques to construct an optimal Decision Tree classifier that addresses the imbalanced nature of the data. We evaluate our solution by running a within and a cross project validation benchmark on diverse range of Open-Source projects hosted on GitHub which showcased superior results when compared with existing state-of-the-art methods.
Advancing Multimodal Medical Capabilities of Gemini
Yang, Lin, Xu, Shawn, Sellergren, Andrew, Kohlberger, Timo, Zhou, Yuchen, Ktena, Ira, Kiraly, Atilla, Ahmed, Faruk, Hormozdiari, Farhad, Jaroensri, Tiam, Wang, Eric, Wulczyn, Ellery, Jamil, Fayaz, Guidroz, Theo, Lau, Chuck, Qiao, Siyuan, Liu, Yun, Goel, Akshay, Park, Kendall, Agharwal, Arnav, George, Nick, Wang, Yang, Tanno, Ryutaro, Barrett, David G. T., Weng, Wei-Hung, Mahdavi, S. Sara, Saab, Khaled, Tu, Tao, Kalidindi, Sreenivasa Raju, Etemadi, Mozziyar, Cuadros, Jorge, Sorensen, Gregory, Matias, Yossi, Chou, Katherine, Corrado, Greg, Barral, Joelle, Shetty, Shravya, Fleet, David, Eslami, S. M. Ali, Tse, Daniel, Prabhakara, Shruthi, McLean, Cory, Steiner, Dave, Pilgrim, Rory, Kelly, Christopher, Azizi, Shekoofeh, Golden, Daniel
Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histopathology, ophthalmology, dermatology and genomic data. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding previous best results across two separate datasets by an absolute margin of 1% and 12%, where 57% and 96% of AI reports on normal cases, and 43% and 65% on abnormal cases, are evaluated as "equivalent or better" than the original radiologists' reports. We demonstrate the first ever large multimodal model-based report generation for 3D computed tomography (CT) volumes using Med-Gemini-3D, with 53% of AI reports considered clinically acceptable, although additional research is needed to meet expert radiologist reporting quality. Beyond report generation, Med-Gemini-2D surpasses the previous best performance in CXR visual question answering (VQA) and performs well in CXR classification and radiology VQA, exceeding SoTA or baselines on 17 of 20 tasks. In histopathology, ophthalmology, and dermatology image classification, Med-Gemini-2D surpasses baselines across 18 out of 20 tasks and approaches task-specific model performance. Beyond imaging, Med-Gemini-Polygenic outperforms the standard linear polygenic risk score-based approach for disease risk prediction and generalizes to genetically correlated diseases for which it has never been trained. Although further development and evaluation are necessary in the safety-critical medical domain, our results highlight the potential of Med-Gemini across a wide range of medical tasks.
HistNERo: Historical Named Entity Recognition for the Romanian Language
Avram, Andrei-Marius, Iuga, Andreea, Manolache, George-Vlad, Matei, Vlad-Cristian, Micliuş, Răzvan-Gabriel, Muntean, Vlad-Andrei, Sorlescu, Manuel-Petru, Şerban, Dragoş-Andrei, Urse, Adrian-Dinu, Păiş, Vasile, Cercel, Dumitru-Clementin
This work introduces HistNERo, the first Romanian corpus for Named Entity Recognition (NER) in historical newspapers. The dataset contains 323k tokens of text, covering more than half of the 19th century (i.e., 1817) until the late part of the 20th century (i.e., 1990). Eight native Romanian speakers annotated the dataset with five named entities. The samples belong to one of the following four historical regions of Romania, namely Bessarabia, Moldavia, Transylvania, and Wallachia. We employed this proposed dataset to perform several experiments for NER using Romanian pre-trained language models. Our results show that the best model achieved a strict F1-score of 55.69%. Also, by reducing the discrepancies between regions through a novel domain adaption technique, we improved the performance on this corpus to a strict F1-score of 66.80%, representing an absolute gain of more than 10%.
UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions
Rogoz, Ana-Cristina, Ionescu, Radu Tudor
This work explores a novel data augmentation method based on Large Language Models (LLMs) for predicting item difficulty and response time of retired USMLE Multiple-Choice Questions (MCQs) in the BEA 2024 Shared Task. Our approach is based on augmenting the dataset with answers from zero-shot LLMs (Falcon, Meditron, Mistral) and employing transformer-based models based on six alternative feature combinations. The results suggest that predicting the difficulty of questions is more challenging. Notably, our top performing methods consistently include the question text, and benefit from the variability of LLM answers, highlighting the potential of LLMs for improving automated assessment in medical licensing exams. We make our code available https://github.com/ana-rogoz/BEA-2024.
Accuracy and repeatability of a parallel robot for personalised minimally invasive surgery
Pisla, Doina, Tucan, Paul, Chablat, Damien, Hajjar, Nadim Al, Ciocan, Andra, Pisla, Adrian, Pusca, Alexandru, Radu, Corina, Pop, Grigore, Gherman, Bogdan
The paper presents the methodology used for accuracy and repeatability measurements of the experimental model of a parallel robot developed for surgical applications. The experimental setup uses a motion tracking system (for accuracy) and a high precision measuring arm for position (for repeatability). The accuracy was obtained by comparing the trajectory data from the experimental measurement with a baseline trajectory defined with the kinematic models of the parallel robotic system. The repeatability was experimentally determined by moving (repeatedly) the robot platform in predefined points. Keywords: parallel robot, robotic assisted surgery, measurement, accuracy, repeatability.
Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers
Grigore, Diana-Nicoleta, Georgescu, Mariana-Iuliana, Justo, Jon Alvarez, Johansen, Tor, Ionescu, Andreea Iuliana, Ionescu, Radu Tudor
Few-shot knowledge distillation recently emerged as a viable approach to harness the knowledge of large-scale pre-trained models, using limited data and computational resources. In this paper, we propose a novel few-shot feature distillation approach for vision transformers. Our approach is based on two key steps. Leveraging the fact that vision transformers have a consistent depth-wise structure, we first copy the weights from intermittent layers of existing pre-trained vision transformers (teachers) into shallower architectures (students), where the intermittence factor controls the complexity of the student transformer with respect to its teacher. Next, we employ an enhanced version of Low-Rank Adaptation (LoRA) to distill knowledge into the student in a few-shot scenario, aiming to recover the information processing carried out by the skipped teacher layers. We present comprehensive experiments with supervised and self-supervised transformers as teachers, on five data sets from various domains, including natural, medical and satellite images. The empirical results confirm the superiority of our approach over competitive baselines. Moreover, the ablation results demonstrate the usefulness of each component of the proposed pipeline.
Fusing Dictionary Learning and Support Vector Machines for Unsupervised Anomaly Detection
Irofti, Paul, Hîji, Iulian-Andrei, Pătraşcu, Andrei, Cleju, Nicolae
We study in this paper the improvement of one-class support vector machines (OC-SVM) through sparse representation techniques for unsupervised anomaly detection. As Dictionary Learning (DL) became recently a common analysis technique that reveals hidden sparse patterns of data, our approach uses this insight to endow unsupervised detection with more control on pattern finding and dimensions. We introduce a new anomaly detection model that unifies the OC-SVM and DL residual functions into a single composite objective, subsequently solved through K-SVD-type iterative algorithms. A closed-form of the alternating K-SVD iteration is explicitly derived for the new composite model and practical implementable schemes are discussed. The standard DL model is adapted for the Dictionary Pair Learning (DPL) context, where the usual sparsity constraints are naturally eliminated. Finally, we extend both objectives to the more general setting that allows the use of kernel functions. The empirical convergence properties of the resulting algorithms are provided and an in-depth analysis of their parametrization is performed while also demonstrating their numerical performance in comparison with existing methods.
Leveraging Zero-Shot Prompting for Efficient Language Model Distillation
Vöge, Lukas, Gurgul, Vincent, Lessmann, Stefan
This paper introduces a novel approach for efficiently distilling Large Language Models (LLMs) into smaller, application-specific models, significantly reducing operational costs and manual labor. Addressing the challenge of deploying computationally intensive LLMs in specific applications or edge devices, this technique utilizes LLMs' reasoning capabilities to generate labels and natural language rationales for unlabeled data. Our approach enhances both finetuning and distillation by employing a multi-task training framework where student models mimic these rationales alongside teacher predictions. Key contributions include the employment of zero-shot prompting to elicit teacher model rationales, reducing the necessity for handcrafted few-shot examples and lowering the overall token count required, which directly translates to cost savings given the pay-per-token billing model of major tech companies' LLM APIs. Additionally, the paper investigates the impact of explanation properties on distillation efficiency, demonstrating that minimal performance loss occurs even when rationale augmentation is not applied across the entire dataset, facilitating further reductions of tokens. This research marks a step toward the efficient training of task-specific models with minimal human intervention, offering substantial cost-savings while maintaining, or even enhancing, performance.
Visually Grounded Speech Models have a Mutual Exclusivity Bias
Nortje, Leanne, Oneaţă, Dan, Matusevych, Yevgen, Kamper, Herman
When children learn new words, they employ constraints such as the mutual exclusivity (ME) bias: a novel word is mapped to a novel object rather than a familiar one. This bias has been studied computationally, but only in models that use discrete word representations as input, ignoring the high variability of spoken words. We investigate the ME bias in the context of visually grounded speech models that learn from natural images and continuous speech audio. Concretely, we train a model on familiar words and test its ME bias by asking it to select between a novel and a familiar object when queried with a novel word. To simulate prior acoustic and visual knowledge, we experiment with several initialisation strategies using pretrained speech and vision networks. Our findings reveal the ME bias across the different initialisation approaches, with a stronger bias in models with more prior (in particular, visual) knowledge. Additional tests confirm the robustness of our results, even when different loss functions are considered.
Cheap Ways of Extracting Clinical Markers from Texts
Sandu, Anastasia, Mihailescu, Teodor, Nisioi, Sergiu
This paper describes the work of the UniBuc Archaeology team for CLPsych's 2024 Shared Task, which involved finding evidence within the text supporting the assigned suicide risk level. Two types of evidence were required: highlights (extracting relevant spans within the text) and summaries (aggregating evidence into a synthesis). Our work focuses on evaluating Large Language Models (LLM) as opposed to an alternative method that is much more memory and resource efficient. The first approach employs a good old-fashioned machine learning (GOML) pipeline consisting of a tf-idf vectorizer with a logistic regression classifier, whose representative features are used to extract relevant highlights. The second, more resource intensive, uses an LLM for generating the summaries and is guided by chain-of-thought to provide sequences of text indicating clinical markers.