 Overview


Large Scale Legal Text Classification Using Transformer Models

arXiv.org Artificial Intelligence

Large multi-label text classification is a challenging Natural Language Processing (NLP) problem that is concerned with text classification for datasets with thousands of labels. We tackle this problem in the legal domain, where datasets such as JRC-Acquis and EURLEX57K, labeled with the EuroVoc vocabulary, were created within the legal information systems of the European Union. The EuroVoc taxonomy includes around 7000 concepts. In this work, we study the performance of various recent transformer-based models in combination with strategies such as generative pretraining, gradual unfreezing and discriminative learning rates in order to reach competitive classification performance, and present new state-of-the-art results of 0.661 (F1) for JRC-Acquis and 0.754 for EURLEX57K. Furthermore, we quantify the impact of individual steps, such as language model fine-tuning or gradual unfreezing, in an ablation study, and provide reference dataset splits created with an iterative stratification algorithm.
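
As a rough illustration of two of the strategies named in this abstract, the sketch below computes per-layer learning rates (discriminative learning rates) and a layer-unfreezing schedule (gradual unfreezing). The function names, the default learning rate, and the decay factor of 2.6 are assumptions chosen for illustration (the factor follows the ULMFiT convention); the abstract does not state the values the authors used.

```python
def discriminative_lrs(num_layers, top_lr=2e-5, decay=2.6):
    """Return one learning rate per layer group, index 0 = closest to the input.

    The top layer group gets `top_lr`; each group below it gets the rate of
    the group above divided by `decay` (hypothetical defaults).
    """
    return [top_lr / decay ** (num_layers - 1 - i) for i in range(num_layers)]


def trainable_layers(epoch, num_layers):
    """Gradual unfreezing: unfreeze one more layer group, top first, per epoch.

    Returns the indices of the layer groups that are trainable in `epoch`
    (0-based). All other groups stay frozen.
    """
    n_unfrozen = min(epoch + 1, num_layers)
    return list(range(num_layers - n_unfrozen, num_layers))


if __name__ == "__main__":
    for epoch in range(4):
        print(epoch, trainable_layers(epoch, num_layers=4))
    print(discriminative_lrs(num_layers=4))
```

In a real training loop these values would typically be passed to the optimizer as per-parameter-group settings, with frozen groups excluded from gradient updates.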


Welcome! You are invited to join a webinar: New Trends in Drug Discovery : Robotics and AI. After registering, you will receive a confirmation email about joining the webinar.

#artificialintelligence

The drug discovery ecosystem is changing rapidly. The rise of robotics and AI enables the emergence of a new model of data-driven drug discovery. Bringing together recent advances in life sciences automation and machine learning applications for drug discovery, new partnerships evolve that allow for game-changing improvements in the drug discovery process. The webinar will provide an overview of large-scale data and metadata capture enabled by end-to-end automation, going beyond what is currently possible in traditional wet lab operations, and will present case studies showing the impact on biotech and pharma operations, providing actionable insights for biopharma leaders. Disclaimer Regarding Audio/Video Recording: a) By participating in this webinar, you will be participating in an event where photography, video and audio recording may occur. b) By participating in this webinar, you consent to interview(s), photography, audio recording, video recording and its/their release, publication, exhibition, or reproduction to be used for news, web casts, promotional purposes, telecasts, advertising, inclusion on web sites, or for any other purpose(s) that Invitrocue, its vendors, partners, affiliates and/or representatives deem fit to use. You release Invitrocue, its employees, and each and all persons involved from any liability connected with the taking, recording, digitising, or publication of interviews, photographs, computer images, video and/or sound recordings.


A Perspective on Machine Learning Methods in Turbulence Modelling

arXiv.org Artificial Intelligence

This work presents a review of the current state of research in data-driven turbulence closure modeling. It offers a perspective on the challenges and open issues, but also on the advantages and promises of machine learning methods applied to parameter estimation, model identification, closure term reconstruction and beyond, mostly from the perspective of Large Eddy Simulation and related techniques. We stress that consistency of the training data, the model, the underlying physics and the discretization is a key issue that needs to be considered for a successful ML-augmented modeling strategy. In order to make the discussion useful for non-experts in either field, we introduce both the modeling problem in turbulence and the prominent ML paradigms and methods in a concise and self-consistent manner. We then present a survey of the current data-driven model concepts and methods, highlight important developments and put them into the context of the discussed challenges.


Online Semi-Supervised Learning with Bandit Feedback

arXiv.org Machine Learning

We formulate a new problem at the intersection of semi-supervised learning and contextual bandits, motivated by several applications including clinical trials and ad recommendations. We demonstrate how Graph Convolutional Network (GCN), a semi-supervised learning approach, can be adjusted to the new problem formulation. We also propose a variant of the linear contextual bandit with semi-supervised missing rewards imputation. We then take the best of both approaches to develop multi-GCN embedded contextual bandit. Our algorithms are verified on several real world datasets.


On the Universality of the Double Descent Peak in Ridgeless Regression

arXiv.org Machine Learning

We prove a non-asymptotic distribution-independent lower bound for the expected mean squared generalization error caused by label noise in ridgeless linear regression. Our lower bound generalizes a similar known result to the overparameterized (interpolating) regime. In contrast to most previous works, our analysis applies to a broad class of input distributions with almost surely full-rank feature matrices, which allows us to cover various types of deterministic or random feature maps. Our lower bound is asymptotically sharp and implies that in the presence of label noise, ridgeless linear regression does not perform well around the interpolation threshold for any of these feature maps. We analyze the imposed assumptions in detail and provide a theory for analytic (random) feature maps. Using this theory, we can show that our assumptions are satisfied for input distributions with a (Lebesgue) density and feature maps given by random deep neural networks with analytic activation functions like sigmoid, tanh, softplus or GELU. As further examples, we show that feature maps from random Fourier features and polynomial kernels also satisfy our assumptions. We complement our theory with further experimental and analytic results.
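
For orientation, the setting of this abstract can be written out under standard assumptions. The notation below is not taken from the abstract and the paper's own may differ: $X \in \mathbb{R}^{n \times p}$ is the feature matrix, $X^{+}$ its Moore-Penrose pseudoinverse, $\Sigma$ the second-moment matrix of the features, and the label noise $\varepsilon$ is assumed to have mean zero and covariance $\sigma^2 I_n$. The interpolation threshold corresponds to $n = p$.

```latex
\begin{align*}
  % Ridgeless (minimum-norm) least-squares estimator
  \hat{\beta} &= X^{+} y
    = \lim_{\lambda \to 0^{+}} \bigl( X^\top X + \lambda I_p \bigr)^{-1} X^\top y, \\
  % Assumed data model with label noise
  y &= X \beta^{*} + \varepsilon, \qquad
    \mathbb{E}[\varepsilon] = 0, \quad \operatorname{Cov}(\varepsilon) = \sigma^2 I_n, \\
  % Contribution of the label noise to the expected squared error, given X
  \mathbb{E}_{\varepsilon}\!\left[ (X^{+} \varepsilon)^\top \Sigma \, (X^{+} \varepsilon) \right]
    &= \sigma^2 \operatorname{tr}\!\bigl( (X^{+})^\top \Sigma \, X^{+} \bigr),
    \qquad \Sigma = \mathbb{E}\bigl[ x x^\top \bigr].
\end{align*}
```

The last line is the quantity that a lower bound on the noise-induced error has to control; it follows from $\hat{\beta} = X^{+} X \beta^{*} + X^{+} \varepsilon$ and the assumed noise covariance.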


Estimating Individual Treatment Effects using Non-Parametric Regression Models: a Review

arXiv.org Machine Learning

Large observational data are increasingly available in disciplines such as health, economic and social sciences, where researchers are interested in causal questions rather than prediction. In this paper, we investigate the problem of estimating heterogeneous treatment effects using non-parametric regression-based methods. Firstly, we introduce the setup and the issues related to conducting causal inference with observational or non-fully randomized data, and how these issues can be tackled with the help of statistical learning tools. Then, we provide a review of state-of-the-art methods, with a particular focus on non-parametric modeling, and we cast them under a unifying taxonomy. After presenting a brief overview of the problem of model selection, we illustrate the performance of some of the methods on three different simulated studies and on a real world example to investigate the effect of participation in school meal programs on health indicators.
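
To make the estimation target concrete, here is a deliberately simple sketch: it estimates a conditional treatment effect within each stratum of a discrete covariate as the difference between mean outcomes of treated and control units. This is not one of the methods reviewed in the paper; it is about the simplest non-parametric estimator one can write down, and it is only meaningful under the usual assumptions of unconfoundedness and overlap. The function name and the record layout are hypothetical.

```python
from collections import defaultdict


def stratified_cate(records):
    """Estimate treatment effects per covariate stratum.

    `records` is an iterable of (x, treated, y) tuples, where `x` is a
    hashable discrete covariate value, `treated` is truthy for treated
    units, and `y` is the observed outcome.

    Returns {x: mean(y | treated) - mean(y | control)} for every stratum
    that contains at least one treated and one control unit.
    """
    # Per stratum: [sum of treated outcomes, treated count,
    #               sum of control outcomes, control count]
    stats = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for x, treated, y in records:
        s = stats[x]
        if treated:
            s[0] += y
            s[1] += 1
        else:
            s[2] += y
            s[3] += 1
    # Strata without both arms violate overlap and are skipped.
    return {
        x: s[0] / s[1] - s[2] / s[3]
        for x, s in stats.items()
        if s[1] > 0 and s[3] > 0
    }


if __name__ == "__main__":
    toy = [("a", 1, 3.0), ("a", 1, 5.0), ("a", 0, 1.0),
           ("b", 1, 2.0), ("b", 0, 2.0), ("c", 1, 9.0)]
    print(stratified_cate(toy))
```

With continuous or high-dimensional covariates the stratum means would be replaced by flexible regression models, which is where the non-parametric methods discussed in the review come in.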


A Software Architecture for Autonomous Vehicles: Team LRM-B Entry in the First CARLA Autonomous Driving Challenge

arXiv.org Artificial Intelligence

The objective of the first CARLA autonomous driving challenge was to deploy autonomous driving systems to deal with complex traffic scenarios where all participants faced the same challenging traffic situations. According to the organizers, this competition emerges as a way to democratize and to accelerate the research and development of autonomous vehicles around the world using the CARLA simulator, contributing to the development of the autonomous vehicle area. Therefore, this paper presents the architecture design for the navigation of an autonomous vehicle in a simulated urban environment that attempts to commit the least number of traffic infractions, using as the baseline the original architecture of the CaRINA 2 platform for autonomous navigation. Our agent traveled in simulated scenarios for several hours, demonstrating its capabilities, winning three out of the four tracks of the challenge, and being ranked second in the remaining track. Our architecture was designed to meet the requirements of the CARLA Autonomous Driving Challenge and has components for obstacle detection using 3D point clouds, traffic sign detection and classification employing Convolutional Neural Networks (CNN) and depth information, risk assessment with collision detection using short-term motion prediction, decision-making with a Markov Decision Process (MDP), and control using Model Predictive Control (MPC).


Riemannian Langevin Algorithm for Solving Semidefinite Programs

arXiv.org Machine Learning

We propose a Langevin diffusion-based algorithm for non-convex optimization and sampling on a product manifold of spheres. Under a logarithmic Sobolev inequality, we establish a guarantee for finite iteration convergence to the Gibbs distribution in terms of Kullback-Leibler divergence. We show that with an appropriate temperature choice, the suboptimality gap to the global minimum is guaranteed to be arbitrarily small with high probability. As an application, we analyze the proposed Langevin algorithm for solving the Burer-Monteiro relaxation of a semidefinite program (SDP). In particular, we establish a logarithmic Sobolev inequality for the Burer-Monteiro problem when there are no spurious local minima; hence implying a fast escape from saddle points. Combining the results, we then provide a global optimality guarantee for the SDP and the Max-Cut problem. More precisely, we show the Langevin algorithm achieves $\epsilon$-multiplicative accuracy with high probability in $\widetilde{\Omega}( n^2 \epsilon^{-3} )$ iterations, where $n$ is the size of the cost matrix.
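
The objects named in this abstract can be written out as follows. This is a sketch in standard notation, which may differ from the paper's: $C$ is the $n \times n$ cost matrix, $k$ is the chosen factorization rank, $y_i$ denotes the $i$-th row of $Y$, and $\beta$ is an inverse temperature. The unit-norm constraint on each row is what places $Y$ on a product of $n$ spheres.

```latex
\begin{align*}
  % SDP with unit-diagonal constraints
  &\min_{X \succeq 0,\; X_{ii} = 1} \ \langle C, X \rangle \\
  % Burer-Monteiro factorization: substitute X = Y Y^\top
  &\min_{Y \in \mathbb{R}^{n \times k}} \ \langle C, Y Y^\top \rangle
    \quad \text{s.t.} \quad \| y_i \|_2 = 1, \quad i = 1, \dots, n \\
  % Gibbs distribution targeted by the Langevin dynamics
  &\pi_{\beta}(Y) \ \propto\ \exp\!\bigl( -\beta \, \langle C, Y Y^\top \rangle \bigr)
\end{align*}
```

As $\beta$ grows, the Gibbs distribution concentrates near the minimizers of the objective, which is why the temperature choice governs the suboptimality gap mentioned in the abstract.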


Knowledge Distillation: A Survey

arXiv.org Machine Learning

In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model. It has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architecture, distillation algorithms, performance comparison and applications. Furthermore, challenges in knowledge distillation are briefly reviewed and comments on future research are discussed and forwarded.
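
The teacher-student idea can be illustrated with the classic soft-target loss, in which the student is trained to match the teacher's temperature-softened output distribution. The sketch below is only one of the many variants such a survey covers; the function names and the default temperature are assumptions for illustration, and the multiplication by the squared temperature follows the common convention for keeping gradient magnitudes comparable across temperatures.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of `logits` divided by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The result is scaled by temperature**2. It is zero when both sets of
    logits induce the same distribution and positive otherwise.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    print(distillation_loss([3.5, 1.2, 0.4], teacher))
    print(distillation_loss(teacher, teacher))
```

In practice this term is usually combined with an ordinary cross-entropy loss on the ground-truth labels, weighted by a mixing coefficient.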


The Need for Standardized Explainability

arXiv.org Artificial Intelligence

Explainable AI (XAI) is paramount in industry-grade AI; however, existing methods fail to address this necessity, in part due to a lack of standardisation of explainability methods. The purpose of this paper is to offer a perspective on the current state of the area of explainability, and to provide novel definitions for Explainability and Interpretability to begin standardising this area of research. To do so, we provide an overview of the literature on explainability, and of the existing methods that are already implemented. Finally, we offer a tentative taxonomy of the different explainability methods, opening the door to future research.