AITopics

2311.14581

Country: Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.85)

arXiv.org Artificial IntelligenceNov-24-2023

Task-Robust Pre-Training for Worst-Case Downstream Adaptation

Wang, Jianghui, Chen, Yang, Xie, Xingyu, Fang, Cong, Lin, Zhouchen

Pre-training has achieved remarkable success when transferred to downstream tasks. In machine learning, we care about not only the good performance of a model but also its behavior under reasonable shifts of condition. The same philosophy holds when pre-training a foundation model. However, the foundation model may not uniformly behave well for a series of related downstream tasks. This happens, for example, when conducting mask recovery regression where the recovery ability or the training instances diverge like pattern features are extracted dominantly on pre-training, but semantic features are also required on a downstream task. This paper considers pre-training a model that guarantees a uniformly good performance over the downstream tasks. We call this goal as $\textit{downstream-task robustness}$. Our method first separates the upstream task into several representative ones and applies a simple minimax loss for pre-training. We then design an efficient algorithm to solve the minimax loss and prove its convergence in the convex setting. In the experiments, we show both on large-scale natural language processing and computer vision datasets our method increases the metrics on worse-case downstream tasks. Additionally, some theoretical explanations for why our loss is beneficial are provided. Specifically, we show fewer samples are inherently required for the most challenging downstream task in some cases.

downstream task, international conference, robustness, (13 more...)

2306.1207

Country:

Asia > China (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

arXiv.org Artificial IntelligenceNov-24-2023

MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding

Wang, Steven H., Scardigli, Antoine, Tang, Leonard, Chen, Wei, Levkin, Dimitry, Chen, Anya, Ball, Spencer, Woodside, Thomas, Zhang, Oliver, Hendrycks, Dan

Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.

agreement, dataset, deal point text, (14 more...)

2301.00876

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(3 more...)

Genre: Financial News (1.00)

Industry:

Law > Business Law (1.00)
Banking & Finance > Mergers & Acquisitions (1.00)
Education > Assessment & Standards > Student Performance (0.56)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)

arXiv.org Artificial IntelligenceNov-23-2023

Class Uncertainty: A Measure to Mitigate Class Imbalance

Baltaci, Z. S., Oksuz, K., Kuzucu, S., Tezoren, K., Konar, B. K., Ozkan, A., Akbas, E., Kalkan, S.

Class-wise characteristics of training examples affect the performance of deep classifiers. A well-studied example is when the number of training examples of classes follows a long-tailed distribution, a situation that is likely to yield sub-optimal performance for under-represented classes. This class imbalance problem is conventionally addressed by approaches relying on the class-wise cardinality of training examples, such as data resampling. In this paper, we demonstrate that considering solely the cardinality of classes does not cover all issues causing class imbalance. To measure class imbalance, we propose "Class Uncertainty" as the average predictive uncertainty of the training examples, and we show that this novel measure captures the differences across classes better than cardinality. We also curate SVCI-20 as a novel dataset in which the classes have equal number of training examples but they differ in terms of their hardness; thereby causing a type of class imbalance which cannot be addressed by the approaches relying on cardinality. We incorporate our "Class Uncertainty" measure into a diverse set of ten class imbalance mitigation methods to demonstrate its effectiveness on long-tailed datasets as well as on our SVCI-20. Code and datasets will be made available.

cardinality, imbalance, ncertainty, (14 more...)

2311.1409

Country:

Asia > Middle East > Republic of Türkiye > Ankara Province > Ankara (0.04)
Europe > United Kingdom (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Shukla, Vinay, Zeng, Zhe, Ahmed, Kareem, Broeck, Guy Van den

A Unified Approach to Count-Based Weakly-Supervised Learning

arXiv.org Artificial IntelligenceNov-22-2023

High-quality labels are often very scarce, whereas unlabeled data with inferred weak labels occurs more naturally. In many cases, these weak labels dictate the frequency of each respective class over a set of instances. In this paper, we develop a unified approach to learning from such weakly-labeled data, which we call count-based weakly-supervised learning. At the heart of our approach is the ability to compute the probability of exactly k out of n outputs being set to true. This computation is differentiable, exact, and efficient. Building upon the previous computation, we derive a count loss penalizing the model for deviations in its distribution from an arithmetic constraint defined over label counts. We evaluate our approach on three common weakly-supervised learning paradigms and observe that our proposed approach achieves state-of-the-art or highly competitive results across all three of the paradigms.

experiment, learning, proportion, (15 more...)

2311.13718

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada > Ontario > Toronto (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Shwartz-Ziv, Ravid, LeCun, Yann

To Compress or Not to Compress- Self-Supervised Learning and Information Theory: A Review

arXiv.org Artificial IntelligenceNov-21-2023

Deep neural networks excel in supervised learning tasks but are constrained by the need for extensive labeled data. Self-supervised learning emerges as a promising alternative, allowing models to learn without explicit labels. Information theory, and notably the information bottleneck principle, has been pivotal in shaping deep neural networks. This principle focuses on optimizing the trade-off between compression and preserving relevant information, providing a foundation for efficient network design in supervised contexts. However, its precise role and adaptation in self-supervised learning remain unclear. In this work, we scrutinize various self-supervised learning approaches from an information-theoretic perspective, introducing a unified framework that encapsulates the \textit{self-supervised information-theoretic learning problem}. We weave together existing research into a cohesive narrative, delve into contemporary self-supervised methodologies, and spotlight potential research avenues and inherent challenges. Additionally, we discuss the empirical evaluation of information-theoretic quantities and their estimation methods. Overall, this paper furnishes an exhaustive review of the intersection of information theory, self-supervised learning, and deep neural networks.

information, learning, representation, (13 more...)

2304.09355

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey

Wörmann, Julian, Bogdoll, Daniel, Brunner, Christian, Bührle, Etienne, Chen, Han, Chuo, Evaristus Fuh, Cvejoski, Kostadin, van Elst, Ludger, Gottschall, Philip, Griesche, Stefan, Hellert, Christian, Hesels, Christian, Houben, Sebastian, Joseph, Tim, Keil, Niklas, Kelsch, Johann, Keser, Mert, Königshof, Hendrik, Kraft, Erwin, Kreuser, Leonie, Krone, Kevin, Latka, Tobias, Mattern, Denny, Matthes, Stefan, Motzkus, Franz, Munir, Mohsin, Nekolla, Moritz, Paschke, Adrian, von Pilchau, Stefan Pilar, Pintz, Maximilian Alexander, Qiu, Tianming, Qureishi, Faraz, Rizvi, Syed Tahseen Raza, Reichardt, Jörg, von Rueden, Laura, Sagel, Alexander, Sasdelli, Diogo, Scholl, Tobias, Schunk, Gerhard, Schwalbe, Gesina, Shen, Hao, Shoeb, Youssef, Stapelbroek, Hendrik, Stehr, Vera, Srinivas, Gurucharan, Tran, Anh Tuan, Vivekanandan, Abhishek, Wang, Ya, Wasserrab, Florian, Werner, Tino, Wirth, Christian, Zwicklbauer, Stefan

The availability of representative datasets is an essential prerequisite for many successful artificial intelligence and machine learning models. However, in real life applications these models often encounter scenarios that are inadequately represented in the data used for training. There are various reasons for the absence of sufficient data, ranging from time and cost constraints to ethical considerations. As a consequence, the reliable usage of these models, especially in safety-critical applications, is still a tremendous challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches. Knowledge augmented machine learning approaches offer the possibility of compensating for deficiencies, errors, or ambiguities in the data, thus increasing the generalization capability of the applied models. Even more, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-driven models with existing knowledge. The identified approaches are structured according to the categories knowledge integration, extraction and conformity. In particular, we address the application of the presented methods in the field of autonomous driving.

ieee cvf international conference, logic & formal reasoning, pattern recognition, (31 more...)

2205.04712

Country:

Europe > Germany (0.92)
Asia > China (0.67)
North America > United States > New York > New York County > New York City (0.28)
(29 more...)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.67)
Research Report > Promising Solution (0.67)
Research Report > New Finding (0.67)

Industry:

Transportation > Ground > Road (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)
Law (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
(19 more...)

Parekh, Akshay, Anand, Ashish, Awekar, Amit

Noise in Relation Classification Dataset TACRED: Characterization and Reduction

The overarching objective of this paper is two-fold. First, to explore model-based approaches to characterize the primary cause of the noise. in the RE dataset TACRED Second, to identify the potentially noisy instances. Towards the first objective, we analyze predictions and performance of state-of-the-art (SOTA) models to identify the root cause of noise in the dataset. Our analysis of TACRED shows that the majority of the noise in the dataset originates from the instances labeled as no-relation which are negative examples. For the second objective, we explore two nearest-neighbor-based strategies to automatically identify potentially noisy examples for elimination and reannotation. Our first strategy, referred to as Intrinsic Strategy (IS), is based on the assumption that positive examples are clean. Thus, we have used false-negative predictions to identify noisy negative examples. Whereas, our second approach, referred to as Extrinsic Strategy, is based on using a clean subset of the dataset to identify potentially noisy negative examples. Finally, we retrained the SOTA models on the eliminated and reannotated dataset. Our empirical results based on two SOTA models trained on TACRED-E following the IS show an average 4% F1-score improvement, whereas reannotation (TACRED-R) does not improve the original results. However, following ES, SOTA models show the average F1-score improvement of 3.8% and 4.4% when trained on respective eliminated (TACRED-EN) and reannotated (TACRED-RN) datasets respectively. We further extended the ES for cleaning positive examples as well, which resulted in an average performance improvement of 5.8% and 5.6% for the eliminated (TACRED-ENP) and reannotated (TACRED-RNP) datasets respectively.

dataset, reannotation, tacred, (13 more...)

2311.12298

Country:

North America > United States > District of Columbia > Washington (0.05)
Europe > Italy > Tuscany > Florence (0.04)
North America > United States > New Jersey (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Miguel, Ian, Salamon, András Z., Stone, Christopher

Towards Exploratory Reformulation of Constraint Models

It is well established that formulating an effective constraint model of a problem of interest is crucial to the efficiency with which it can subsequently be solved. Following from the observation that it is difficult, if not impossible, to know a priori which of a set of candidate models will perform best in practice, we envisage a system that explores the space of models through a process of reformulation from an initial model, guided by performance on a set of training instances from the problem class under consideration. We plan to situate this system in a refinement-based approach, where a user writes a constraint specification describing a problem above the level of abstraction at which many modelling decisions are made. In this position paper we set out our plan for an exploratory reformulation system, and discuss progress made so far.

reformulation, rewrite rule, specification, (12 more...)

2311.11868

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre:

Overview (0.46)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)

Köksal, Abdullatif, Schick, Timo, Schütze, Hinrich

MEAL: Stable and Active Learning for Few-Shot Prompting

Few-shot classification has made great strides due to foundation models that, through priming and prompting, are highly effective few-shot learners. However, this approach has high variance both across different sets of few shots (data selection) and across different finetuning runs (run variability). This is problematic not only because it impedes the fair comparison of different approaches, but especially because it makes few-shot learning too unreliable for many real-world applications. To alleviate these issues, we make two contributions for more stable and effective few-shot learning: First, we propose novel ensembling methods and show that they substantially reduce run variability. Second, we introduce a new active learning (AL) criterion for data selection and present the first AL-based approach specifically tailored towards prompt-based learning. In our experiments, we show that our combined method, MEAL (Multiprompt finetuning and prediction Ensembling with Active Learning), improves overall performance of prompt-based finetuning by 2.3 points on five diverse tasks. We publicly share our code and data splits in https://github.com/akoksal/MEAL.

accuracy, algorithm, classification, (15 more...)

2211.08358

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > Dominican Republic (0.04)
(5 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)