AITopics | Supervised Learning

Supervised Learning

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. (Wikipedia)
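
As a minimal illustration of this definition, the sketch below learns a function from example input-output pairs. It assumes a one-dimensional linear model fitted by ordinary least squares; the name `fit_line` and the toy data are hypothetical and chosen only for illustration.

```python
def fit_line(pairs):
    """Least-squares fit of y = a*x + b to (x, y) example pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Slope is covariance(x, y) divided by variance(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    a = cov / var
    b = mean_y - a * mean_x
    # The learned function maps a new input to a predicted output.
    return lambda x: a * x + b


# Toy training set (hypothetical): outputs follow y = 2x + 1 exactly.
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
predict = fit_line(train)
print(predict(4.0))  # prediction for an input not seen during training
```

The learned `predict` is the "function that maps an input to an output"; the training pairs are the only supervision it receives.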

On Structured Prediction Theory with Calibrated Convex Surrogate Losses

Anton Osokin, Francis Bach, Simon Lacoste-Julien

Neural Information Processing Systems | Oct-3-2024, 04:59:26 GMT

We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent and we prove tight bounds on the so-called "calibration function" relating the excess surrogate risk to the actual risk. In contrast to prior related work, we carefully monitor the effect of the exponential number of classes in the learning guarantees as well as on the optimization complexity. As an interesting consequence, we formalize the intuition that some task losses make learning harder than others, and that the classical 0-1 loss is ill-suited for structured prediction.

calibration function, consistency, prediction

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

An explainable approach to detect case law on housing and eviction issues within the HUDOC database

Mohammad Mohammadi, Martijn Wieling, Michel Vols

arXiv.org Artificial Intelligence | Oct-3-2024

Case law is instrumental in shaping our understanding of human rights, including the right to adequate housing. The HUDOC database provides access to the textual content of case law from the European Court of Human Rights (ECtHR), along with some metadata. While this metadata includes valuable information, such as the application number and the articles addressed in a case, it often lacks detailed substantive insights, such as the specific issues a case covers. This underscores the need for detailed analysis to extract such information. However, given the size of the database - containing over 40,000 cases - an automated solution is essential. In this study, we focus on the right to adequate housing and aim to build models to detect cases related to housing and eviction issues. Our experiments show that the resulting models not only provide performance comparable to more sophisticated approaches but are also interpretable, offering explanations for their decisions by highlighting the most influential words. The application of these models led to the identification of new cases that were initially overlooked during data collection. This suggests that NLP approaches can be effectively applied to categorise case law based on the specific issues they address.

2410.02978

Country:

Europe > Netherlands (0.14)
Europe > Jersey (0.14)
Europe > Latvia (0.04)

Genre: Research Report (0.70)

Industry:

Law > International Law (0.66)
Law > Civil Rights & Constitutional Law (0.56)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Aggressive Sampling for Multi-class to Binary Reduction with Applications to Text Classification

Bikash Joshi, Massih R. Amini, Ioannis Partalas, Franck Iutzeler, Yury Maximov

Neural Information Processing Systems | Oct-2-2024, 19:55:35 GMT

We address the problem of multi-class classification in the case where the number of classes is very large. We propose a double sampling strategy on top of a multi-class to binary reduction strategy, which transforms the original multi-class problem into a binary classification problem over pairs of examples. The aim of the sampling strategy is to overcome the curse of long-tailed class distributions exhibited in the majority of large-scale multi-class classification problems and to reduce the number of pairs of examples in the expanded data. We show that this strategy does not alter the consistency of the empirical risk minimization principle defined over the double sample reduction. Experiments are carried out on DMOZ and Wikipedia collections with 10,000 to 100,000 classes where we show the efficiency of the proposed approach in terms of training and prediction time, memory consumption, and predictive performance with respect to state-of-the-art approaches.
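
To make the idea of a multi-class to binary reduction concrete, here is a minimal sketch of one generic way to build such pairs. It is not the authors' algorithm: the pairing scheme, the function name `reduce_to_binary`, the parameter `kappa` (number of wrong classes sampled per example), and the toy data are all assumptions made for illustration.

```python
import random


def reduce_to_binary(examples, classes, kappa, rng):
    """Turn (input, true_class) examples into labeled pairs.

    Each pair compares the input joined with its true class against the
    same input joined with a sampled wrong class. Label +1 means the
    first element of the pair is the correct one, -1 the second.
    """
    pairs = []
    for x, y in examples:
        others = [c for c in classes if c != y]
        # Sample only a few wrong classes instead of enumerating all of
        # them, which keeps the expanded data small when classes are many.
        for wrong in rng.sample(others, min(kappa, len(others))):
            pairs.append(((x, y), (x, wrong), +1))
            pairs.append(((x, wrong), (x, y), -1))
    return pairs


# Toy data (hypothetical): three documents, four classes.
classes = ["sports", "politics", "science", "arts"]
examples = [("doc1", "sports"), ("doc2", "science"), ("doc3", "arts")]
pairs = reduce_to_binary(examples, classes, kappa=2, rng=random.Random(0))
print(len(pairs))  # 3 examples * 2 sampled classes * 2 orderings
```

A binary classifier trained on such pairs only has to decide which of two candidates is correct, so its size does not grow with the number of classes; sampling `kappa` wrong classes bounds the number of pairs per example.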

classification, dataset, predictive performance

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Numerical Approximation Capacity of Neural Networks with Bounded Parameters: Do Limits Exist, and How Can They Be Measured?

Li Liu, Tengchao Yu, Heng Yong

arXiv.org Artificial Intelligence | Sep-25-2024

The Universal Approximation Theorem posits that neural networks can theoretically possess unlimited approximation capacity with a suitable activation function and a freely chosen or trained set of parameters. However, a more practical scenario arises when these neural parameters, especially the nonlinear weights and biases, are bounded. This leads us to question: Does the approximation capacity of a neural network remain universal, or does it have a limit when the parameters are practically bounded? And if it has a limit, how can it be measured? Our theoretical study indicates that while universal approximation is theoretically feasible, in practical numerical scenarios, Deep Neural Networks (DNNs) with any analytic activation functions (such as Tanh and Sigmoid) can only be approximated by a finite-dimensional vector space under a bounded nonlinear parameter space (NP space), whether in a continuous or discrete sense. Based on this study, we introduce the concepts of ε outer measure and Numerical Span Dimension (NSdim) to quantify the approximation capacity limit of a family of networks both theoretically and practically. Furthermore, drawing on our new theoretical study and adopting a fresh perspective, we strive to understand the relationship between back-propagation neural networks and random parameter networks (such as the Extreme Learning Machine (ELM)) with both finite and infinite width. We also aim to provide fresh insights into regularization, the trade-off between width and depth, parameter space, width redundancy, condensation, and other related important issues.

neural network, nsdim, parameter space

2409.16697

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Semi-strong Efficient Market of Bitcoin and Twitter: an Analysis of Semantic Vector Spaces of Extracted Keywords and Light Gradient Boosting Machine Models

Fang Wang, Marko Gacesa

arXiv.org Artificial Intelligence | Sep-24-2024

This study extends the examination of the Efficient-Market Hypothesis (EMH) in the Bitcoin market during a five-year fluctuation period, from September 1, 2017 to September 1, 2022, by analyzing 28,739,514 qualified tweets containing the targeted topic "Bitcoin". Unlike previous studies, we extracted fundamental keywords as an informative proxy for carrying out the study of the EMH in the Bitcoin market rather than focusing on sentiment analysis, information volume, or price data. We tested market efficiency in hourly, 4-hourly, and daily time periods to understand the speed and accuracy of market reactions towards the information within different thresholds. A sequence of machine learning methods and textual analyses was used, including measurements of distances of semantic vector spaces of information, a keyword extraction and encoding model, and Light Gradient Boosting Machine (LGBM) classifiers. Our results suggest that 78.06% (83.08%), 84.63% (87.77%), and 94.03% (94.60%) of hourly, 4-hourly, and daily bullish (bearish) market movements can be attributed to public information within organic tweets.

information, market movement, tweet

2409.15988

Country:

Asia > China (0.04)
Africa > Nigeria (0.04)
Oceania > Australia (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.71)

Universal approximation theorem for neural networks with inputs from a topological vector space

Vugar Ismailov

arXiv.org Artificial Intelligence | Sep-19-2024

We study feedforward neural networks with inputs from a topological vector space (TVS-FNNs). Unlike traditional feedforward neural networks, TVS-FNNs can process a broader range of inputs, including sequences, matrices, functions and more. We prove a universal approximation theorem for TVS-FNNs, which demonstrates their capacity to approximate any continuous function defined on this expanded input space.

activation function, approximation theorem, neural network

2409.12913

Country:

North America > United States > New York (0.04)
Asia > Azerbaijan > Baku Economic Region > Baku (0.04)
North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.62)

Agent Workflow Memory

Zora Zhiruo Wang, Jiayuan Mao, Daniel Fried, Graham Neubig

arXiv.org Artificial Intelligence | Sep-11-2024

Despite the potential of language model-based agents to solve real-world tasks such as web navigation, current methods still struggle with long-horizon tasks with complex action trajectories. In contrast, humans can flexibly solve complex tasks by learning reusable task workflows from past experiences and using them to guide future actions. To build agents that can similarly benefit from this process, we introduce Agent Workflow Memory (AWM), a method for inducing commonly reused routines, i.e., workflows, and selectively providing workflows to the agent to guide subsequent generations. AWM flexibly applies to both offline and online scenarios, where agents induce workflows from training examples beforehand or from test queries on the fly. We experiment on two major web navigation benchmarks -- Mind2Web and WebArena -- that collectively cover 1000+ tasks from 200+ domains across travel, shopping, and social media, among others. AWM substantially improves the baseline results by 24.6% and 51.1% relative success rate on Mind2Web and WebArena while reducing the number of steps taken to solve WebArena tasks successfully. Furthermore, online AWM robustly generalizes in cross-task, website, and domain evaluations, surpassing baselines from 8.9 to 14.0 absolute points as train-test task distribution gaps widen.

agent, awm, workflow

2409.07429

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts (0.04)

Genre: Workflow (1.00)

Industry: Consumer Products & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.55)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.34)

Decompose the model: Mechanistic interpretability in image models with Generalized Integrated Gradients (GIG)

Yearim Kim, Sangyu Han, Sangbum Han, Nojun Kwak

arXiv.org Artificial Intelligence | Sep-3-2024

In the field of eXplainable AI (XAI) in language models, the progression from local explanations of individual decisions to global explanations with high-level concepts has laid the groundwork for mechanistic interpretability, which aims to decode the exact operations. However, this paradigm has not been adequately explored in image models, where existing methods have primarily focused on class-specific interpretations. This paper introduces a novel approach to systematically trace the entire pathway from input through all intermediate layers to the final output within the whole dataset. We utilize Pointwise Feature Vectors (PFVs) and Effective Receptive Fields (ERFs) to decompose model embeddings into interpretable Concept Vectors. Then, we calculate the relevance between concept vectors with our Generalized Integrated Gradients (GIG), enabling a comprehensive, dataset-wide analysis of model behavior. In the field of eXplainable AI (XAI), efforts have historically transitioned from Local explanation to Global explanation to Mechanistic Interpretability. While local explanation methods including Selvaraju et al. (2016); Montavon et al. (2017); Sundararajan et al. (2017); Han et al. (2024) have focused on explaining specific decisions for individual instances, global explanation methods seek to uncover overall patterns and behaviors applicable across the entire dataset (Wu et al., 2022; Xuanyuan et al., 2023; Singh et al., 2024).

attribution, concept vector, explanation

2409.0161

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Efficient and Scalable Estimation of Tool Representations in Vector Space

Suhong Moon, Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Woosang Lim, Kurt Keutzer, Amir Gholami

arXiv.org Artificial Intelligence | Sep-2-2024

Recent advancements in function calling and tool use have significantly enhanced the capabilities of large language models (LLMs) by enabling them to interact with external information sources and execute complex tasks. However, the limited context window of LLMs presents challenges when a large number of tools are available, necessitating efficient methods to manage prompt length and maintain accuracy. Existing approaches, such as fine-tuning LLMs or leveraging their reasoning capabilities, either require frequent retraining or incur significant latency overhead. A more efficient solution involves training smaller models to retrieve the most relevant tools for a given query, although this requires high-quality, domain-specific data. To address those challenges, we present a novel framework for generating synthetic data for tool retrieval applications and an efficient data-driven tool retrieval strategy using small encoder models. Empowered by LLMs, we create ToolBank, a new tool retrieval dataset that reflects real human user usages. For tool retrieval methodologies, we propose novel approaches: (1) Tool2Vec: usage-driven tool embedding generation for tool retrieval, (2) ToolRefiner: a staged retrieval method that iteratively improves the quality of retrieved tools, and (3) MLC: framing tool retrieval as a multi-label classification problem. With these new methods, we achieve improvements of up to 27.28 in Recall@K on the ToolBench dataset and 30.5 in Recall@K on ToolBank. Additionally, we present further experimental results to rigorously validate our methods. Our code is available at https://github.com/SqueezeAILab/Tool2Vec.
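
For readers unfamiliar with the Recall@K metric quoted above, the following is a minimal sketch of how it is commonly computed for a single query. It uses the standard definition (fraction of relevant items found among the top K retrieved); the function name `recall_at_k` and the tool names are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant items that appear in the top-k retrieved list."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # convention chosen here for queries with no relevant items
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical tool-retrieval result for one query, best match first.
retrieved = ["search", "calendar", "weather", "email"]
relevant = {"weather", "maps"}
print(recall_at_k(retrieved, relevant, k=3))  # one of two relevant tools is in the top 3
```

Scores reported for a dataset are usually this per-query value averaged over all queries.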

efficient and scalable estimation, tool representation, vector space

2409.02141

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.40)

Conan-embedding: General Text Embedding with More and Better Negative Samples

Shiyu Li, Yang Tang, Shizhe Chen, Xi Chen

arXiv.org Artificial Intelligence | Aug-29-2024

With the growing popularity of RAG, the capabilities of embedding models are gaining increasing attention. Embedding models are primarily trained through contrastive loss learning, with negative examples being a key component. Previous work has proposed various hard negative mining strategies, but these strategies are typically employed as preprocessing steps. In this paper, we propose the conan-embedding model, which maximizes the utilization of more and higher-quality negative examples. Specifically, since the model's ability to handle preprocessed negative examples evolves during training, we propose a dynamic hard negative mining method to expose the model to more challenging negative examples throughout the training process. Secondly, contrastive learning requires as many negative examples as possible but is limited by GPU memory constraints. Therefore, we use a Cross-GPU balancing Loss to provide more negative examples for embedding training and balance the batch size across multiple tasks. Moreover, we also discovered that the prompt-response pairs from LLMs can be used for embedding training. Our approach effectively enhances the capabilities of embedding models, currently ranking first on the Chinese leaderboard of the Massive Text Embedding Benchmark.
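
The role of negative examples in contrastive training can be illustrated with a minimal sketch of a generic InfoNCE-style loss. This is not the authors' dynamic mining method or their Cross-GPU balancing Loss; the function names, the temperature value, and the toy 2-D vectors are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, temperature=0.05):
    """Negative log-probability of the positive among positive + negatives."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


def hardest_negatives(query, candidates, n):
    """Pick the n candidates most similar to the query (the 'hard' ones)."""
    return sorted(candidates, key=lambda c: cosine(query, c), reverse=True)[:n]


# Toy 2-D embeddings (hypothetical).
query, positive = [1.0, 0.0], [0.9, 0.1]
easy, hard = [-1.0, 0.0], [0.8, 0.6]
loss_easy = info_nce(query, positive, [easy])
loss_hard = info_nce(query, positive, [hard])
print(loss_hard > loss_easy)  # the hard negative yields the larger loss
```

A negative that is already far from the query contributes almost nothing to the loss, which is why mining harder negatives, and re-selecting them as the model changes, gives a stronger training signal.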

hard negative mining, negative example

2408.1571

Country: Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (1.00)
