AITopics

2509.12326

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Lampinen, Andrew Kyle, Chan, Stephanie C. Y., Hermann, Katherine

Learned feature representations are biased by complexity, learning order, position, and more

arXiv.org Artificial Intelligence, Jun-6-2024

Representation learning, and interpreting learned representations, are key areas of focus in machine learning and neuroscience. Both fields generally use representations as a means to understand or improve a system's computations. In this work, however, we explore surprising dissociations between representation and computation that may pose challenges for such efforts. We create datasets in which we attempt to match the computational role that different features play, while manipulating other properties of the features or the data. We train various deep learning architectures to compute these multiple abstract features about their inputs. We find that their learned feature representations are systematically biased towards representing some features more strongly than others, depending upon extraneous properties such as feature complexity, the order in which features are learned, and the distribution of features over the inputs. For example, features that are simpler to compute or learned first tend to be represented more strongly and densely than features that are more complex or learned later, even if all features are learned equally well. We also explore how these biases are affected by architectures, optimizers, and training regimes (e.g., in transformers, features decoded earlier in the output sequence also tend to be represented more strongly). Our results help to characterize the inductive biases of gradient-based representation learning. These results also highlight a key challenge for interpretability, or for comparing the representations of models and brains: disentangling extraneous biases from the computationally important aspects of a system's internal representations.
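One way to make "represented more strongly" concrete is to measure the fraction of the variance in a layer's activations that each feature explains. The sketch below is a minimal illustration under stated assumptions: the function name `variance_explained` and the synthetic activations are hypothetical, and it is not presented as the authors' exact analysis. Both features are easily decodable from the activations, yet one of them dominates the variance.

```python
import numpy as np


def variance_explained(H, feature):
    """Fraction of the total variance in activations H (n x d) explained by a discrete feature."""
    H = np.asarray(H, dtype=float)
    feature = np.asarray(feature)
    grand_mean = H.mean(axis=0)
    total = ((H - grand_mean) ** 2).sum()
    # Replace every row by the mean activation of its feature group; the spread of
    # these group means around the grand mean is the between-group (explained) variance.
    fitted = np.empty_like(H)
    for value in np.unique(feature):
        mask = feature == value
        fitted[mask] = H[mask].mean(axis=0)
    explained = ((fitted - grand_mean) ** 2).sum()
    return explained / total


# Hypothetical "layer": one strongly coded feature, one weakly coded feature, small noise.
rng = np.random.default_rng(0)
n = 1000
f_strong = rng.integers(0, 2, n)
f_weak = rng.integers(0, 2, n)
H = (np.outer(f_strong, [3.0, 3.0, 0.0, 0.0])
     + np.outer(f_weak, [0.0, 0.0, 0.5, 0.5])
     + rng.normal(0.0, 0.1, size=(n, 4)))

ve_strong = variance_explained(H, f_strong)
ve_weak = variance_explained(H, f_weak)
print(f"strong: {ve_strong:.3f}, weak: {ve_weak:.3f}")
```

The weakly coded feature explains only a few percent of the variance even though a linear readout could recover it almost perfectly, which is the kind of dissociation between representation and computation the abstract describes.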

representation, variance, xor

2405.05847

Country: Europe > Latvia > Lubāna Municipality > Lubāna (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial Intelligence, May-12-2023

$\partial\mathbb{B}$ nets: learning discrete functions by gradient descent

Wright, Ian

$\partial\mathbb{B}$ nets are differentiable neural networks that learn discrete boolean-valued functions by gradient descent. $\partial\mathbb{B}$ nets have two semantically equivalent aspects: a differentiable soft-net, with real weights, and a non-differentiable hard-net, with boolean weights. We train the soft-net by backpropagation and then 'harden' the learned weights to yield boolean weights that bind with the hard-net. The result is a learned discrete function. 'Hardening' involves no loss of accuracy, unlike existing approaches to neural network binarization. Preliminary experiments demonstrate that $\partial\mathbb{B}$ nets achieve comparable performance on standard machine learning problems yet are compact (due to 1-bit weights) and interpretable (due to the logical nature of the learnt functions). Neural networks are differentiable functions with weights represented by machine floats. Networks are trained by gradient descent in weight-space, where the direction of descent minimises loss. The gradients are efficiently calculated by the backpropagation algorithm (Rumelhart et al., 1986). This overall approach has led to tremendous advances in machine learning.
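To make the soft-net/hard-net correspondence concrete, here is a toy illustration in Python. It is an assumption, not the paper's construction: the function names are hypothetical, the min/max (Gödel-logic) AND neuron is a stand-in for the operators the paper defines, and no gradient-descent training is shown. What it does show is a hardening rule under which the soft neuron, thresholded at 0.5, and its boolean twin agree on every boolean input for any real weights, i.e. hardening loses no accuracy for this neuron.

```python
import itertools
import random


def soft_and(x, w):
    """Soft-net AND neuron: real weights w in [0, 1] select which inputs x take part.

    Gödel (min/max) fuzzy logic: an input with weight near 1 must be true,
    an input with weight near 0 is effectively ignored.
    """
    return min(max(xi, 1.0 - wi) for xi, wi in zip(x, w))


def harden(w):
    """Hardening: map each real weight to a boolean (include the input iff w > 0.5)."""
    return [wi > 0.5 for wi in w]


def hard_and(x, hw):
    """Hard-net AND neuron: boolean AND over the inputs whose hardened weight is True."""
    return all(bool(xi) for xi, keep in zip(x, hw) if keep)


def agrees_on_all_inputs(w):
    """Check that the thresholded soft neuron matches its hardened twin on all inputs."""
    hw = harden(w)
    return all(
        (soft_and(x, w) >= 0.5) == hard_and(x, hw)
        for x in itertools.product([0, 1], repeat=len(w))
    )


rng = random.Random(0)
weights = [[rng.random() for _ in range(4)] for _ in range(1000)]
print(all(agrees_on_all_inputs(w) for w in weights))  # True
```

The agreement is not luck: a term `max(x_i, 1 - w_i)` drops below 0.5 exactly when `x_i` is 0 and `w_i > 0.5`, which is exactly when the hardened neuron sees a required input that is false.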

artificial intelligence, harden, machine learning

2305.07315

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.81)

arXiv.org Artificial Intelligence, Mar-13-2023

Max-Margin Works while Large Margin Fails: Generalization without Uniform Convergence

Glasgow, Margalit, Wei, Colin, Wootters, Mary, Ma, Tengyu

A major challenge in modern machine learning is theoretically understanding the generalization properties of overparameterized models. Many existing tools rely on uniform convergence (UC), a property that, when it holds, guarantees that the test loss will be close to the training loss, uniformly over a class of candidate models. Nagarajan and Kolter (2019) show that in certain simple linear and neural-network settings, any uniform convergence bound will be vacuous, leaving open the question of how to prove generalization in settings where UC fails. Our main contribution is proving novel generalization bounds in two such settings, one linear, and one non-linear. We study the linear classification setting of Nagarajan and Kolter, and a quadratic ground truth function learned via a two-layer neural network in the non-linear regime. We prove a new type of margin bound showing that above a certain signal-to-noise threshold, any near-max-margin classifier will achieve almost no test loss in these two settings. Our results show that near-max-margin is important: while any model that achieves at least a $(1 - \epsilon)$-fraction of the max-margin generalizes well, a classifier achieving half of the max-margin may fail terribly. Building on the impossibility results of Nagarajan and Kolter, under slightly stronger assumptions, we show that one-sided UC bounds and classical margin bounds will fail on near-max-margin classifiers. Our analysis provides insight into why memorization can coexist with generalization: we show that in this challenging regime where generalization occurs but UC fails, near-max-margin classifiers simultaneously contain some generalizable components and some overfitting components that memorize the data. The presence of the overfitting components is enough to preclude UC, but the near-extremal margin guarantees that sufficient generalizable components are present.
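The "$(1 - \epsilon)$-fraction of the max-margin" condition can be checked directly for a linear classifier. A minimal sketch, assuming a homogeneous linear classifier (no bias term) and the usual normalized margin; the helper names and the toy data are hypothetical, and the paper's exact margin definition may differ in normalization. The check says nothing about test loss by itself.

```python
import math


def normalized_margin(w, X, y):
    """Smallest signed margin y_i * <w, x_i> / ||w|| over the training set."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(yi * sum(wi * xi for wi, xi in zip(w, x)) for x, yi in zip(X, y)) / norm


def is_near_max_margin(w, X, y, max_margin, eps):
    """True if w attains at least a (1 - eps) fraction of the max margin."""
    return normalized_margin(w, X, y) >= (1 - eps) * max_margin


# Toy data whose max-margin direction (no bias) is w* = (1, 0), with margin 2.
X = [(2, 1), (2, -1), (-2, 1), (-2, -1)]
y = [1, 1, -1, -1]
gamma_star = normalized_margin((1, 0), X, y)  # 2.0
tilted = normalized_margin((1, 1), X, y)      # 1/sqrt(2), about 35% of the max margin
print(gamma_star, round(tilted, 4))           # 2.0 0.7071
print(is_near_max_margin((1, 0), X, y, gamma_star, eps=0.05))  # True
print(is_near_max_margin((1, 1), X, y, gamma_star, eps=0.05))  # False
```

The tilted classifier still separates the training set perfectly, so the (1 - ε) condition is strictly stronger than zero training error.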

artificial intelligence, generalization, machine learning

2206.07892

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

#artificialintelligence, Jul-1-2022, 15:08:01 GMT

How neurons really work is being elucidated

A neuron is a thing of beauty. Ever since Santiago Ramón y Cajal stained them with silver nitrate to make them visible under the microscopes of the 1880s (see drawing above), their ramifications have fired the scientific imagination. Ramón y Cajal called them the butterflies of the soul. Those ramifications (dendrites by the dozen to collect incoming signals, called action potentials, from other neurons, and a single axon to pass on the summed wisdom of those signals in the form of another action potential) turn neurons into parts of far bigger structures known as neural networks.
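The "summed wisdom" passed down the axon is what the artificial neurons in perceptrons abstract: weight the incoming signals, add them up, and fire once a threshold is reached. The sketch below is only that textbook caricature (the function name, weights and threshold are hypothetical), not a model of how biological neurons actually compute.

```python
def threshold_unit(inputs, weights, threshold):
    """Textbook sum-and-fire abstraction: output 1 if the weighted input sum reaches threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


# Three incoming "dendrite" signals (1 = spike); two coincident spikes are enough to fire.
weights = [0.5, 0.5, 0.5]
print(threshold_unit([1, 1, 0], weights, threshold=1.0))  # 1
print(threshold_unit([1, 0, 0], weights, threshold=1.0))  # 0
```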

action potential, neuron, perceptron

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.25)
North America > United States > New Hampshire (0.05)
Europe > Greece (0.05)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

#artificialintelligence, Jun-12-2022, 00:40:33 GMT

Building an IDS with scikit-learn MLP

A quick look at our dataset: in short, it consists of rows (instances), each containing several columns that we call features. We'll be using scikit-learn, a Python library that's widely used in ML.
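A minimal sketch of that setup with scikit-learn's `MLPClassifier`. The three features and the synthetic "normal" versus "attack" records are hypothetical stand-ins for a real intrusion-detection dataset, not the article's data. The features are standardized first because MLPs are sensitive to feature scale.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical connection records: [duration_s, bytes_sent, failed_logins] per row.
normal = np.column_stack([rng.normal(2.0, 0.5, 200), rng.normal(500, 50, 200), rng.poisson(0.1, 200)])
attack = np.column_stack([rng.normal(0.2, 0.1, 200), rng.normal(5000, 500, 200), rng.poisson(5, 200)])
X = np.vstack([normal, attack])
y = np.array([0] * 200 + [1] * 200)  # 0 = normal, 1 = attack

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Wrapping the scaler and the MLP in one pipeline keeps the scaling statistics learned only from the training split, so they are reused unchanged on the test rows.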

perceptron, prediction, scikit-learn mlp

Industry: Information Technology > Security & Privacy (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial Intelligence, Aug-20-2020

A general approach to progressive learning

Vogelstein, Joshua T., Helm, Hayden S., Mehta, Ronak D., Dey, Jayanta, LeVine, Will, Yang, Weiwei, Tower, Bryan, Larson, Jonathan, White, Chris, Priebe, Carey E.

In biological learning, data are used to improve performance simultaneously on the current task, as well as previously encountered and as yet unencountered tasks. In contrast, classical machine learning starts from a blank slate, or tabula rasa, using data only for the single task at hand. While typical transfer learning algorithms can improve performance on future tasks, their performance on prior tasks degrades upon learning new tasks (called catastrophic forgetting). Many recent approaches have attempted to maintain performance given new tasks. But striving to avoid forgetting sets the goal unnecessarily low: the goal of progressive learning, whether biological or artificial, is to improve performance on all tasks (including past and future) with any new data. We propose representation ensembling, as opposed to learner ensembling (e.g., bagging), to address progressive learning. We show that representation ensembling, including representations learned by decision forests or deep networks, uniquely demonstrates improved performance on both past and future tasks in a variety of simulated and real data scenarios, including vision, language, and adversarial tasks, with or without resource constraints. Beyond progressive learning, this work has immediate implications with regard to mitigating batch effects and federated learning applications. We expect a deeper understanding of the mechanisms underlying biological progressive learning to enable further improvements in machine progressive learning.
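A toy sketch of representation ensembling, under loud assumptions: each task contributes one "representation" (here, hypothetical quantile bins on a single input feature, a crude stand-in for the decision-forest or deep-network representations the abstract mentions); each task then fits its own voter in every representation, including those learned on other tasks, and a decider averages the voters' class posteriors. All names are illustrative, and this is not the authors' implementation.

```python
import numpy as np

N_BINS, N_CLASSES = 4, 2


def learn_representation(X, feature):
    """Hypothetical 'representation': quantile bins on one input feature."""
    edges = np.quantile(X[:, feature], np.linspace(0, 1, N_BINS + 1)[1:-1])
    return lambda Z: np.digitize(Z[:, feature], edges)  # cell index in 0..N_BINS-1


def fit_voter(rep, X, y):
    """Per-cell class posteriors for one task, estimated inside one representation."""
    counts = np.ones((N_BINS, N_CLASSES))  # add-one smoothing
    np.add.at(counts, (rep(X), y), 1)
    return counts / counts.sum(axis=1, keepdims=True)


def decide(reps, voters, X):
    """Decider: average the voters' posteriors over all representations."""
    return np.mean([voter[rep(X)] for rep, voter in zip(reps, voters)], axis=0)


rng = np.random.default_rng(0)
X_a, X_b = rng.normal(size=(200, 2)), rng.normal(size=(200, 2))
y_a = (X_a.sum(axis=1) > 0).astype(int)  # task A labels
# One representation per task: task A's bins feature 0, task B's bins feature 1.
reps = [learn_representation(X_a, 0), learn_representation(X_b, 1)]
# Task A reuses task B's representation instead of discarding it.
voters_a = [fit_voter(rep, X_a, y_a) for rep in reps]
posterior_a = decide(reps, voters_a, X_a)
print("task A training accuracy:", (posterior_a.argmax(axis=1) == y_a).mean())
```

The contrast with learner ensembling is that what gets shared across tasks is the representation, while each task keeps its own voters on top of it.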

algorithm, artificial intelligence, machine learning

2004.12908

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine Learning, Jul-7-2019

Copula Representations and Error Surface Projections for the Exclusive Or Problem

Freedman, Roy S.

The exclusive or (xor) function is one of the simplest examples that illustrate why nonlinear feedforward networks are superior to linear regression for machine learning applications. We review the xor representation and approximation problems and discuss their solutions in terms of probabilistic logic and associative copula functions. After briefly reviewing the specification of feedforward networks, we compare the dynamics of learned error surfaces with different activation functions such as ReLU and tanh through a set of colorful three-dimensional charts. The copula representations extend xor from Boolean to real values, thereby providing a convenient way to demonstrate the concept of cross-validation on in-sample and out-of-sample data sets. Our approach is pedagogical and is meant to be a machine learning prolegomenon. Keywords: machine learning; neural networks; probabilistic logic; copulas; error surfaces; xor.
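Two small illustrations in Python, with hypothetical function names. The first is a standard hand-weighted ReLU network that computes xor on Boolean inputs, a nonlinear solution that no single linear unit can express. The second extends xor to probabilities through the probabilistic-logic identity P(A xor B) = p + q - 2C(p, q), where C is a copula giving P(A and B) from the marginals p and q; the product and min copulas are examples chosen here, and the paper's own copula representations may differ.

```python
def relu(z):
    return max(0.0, z)


def xor_net(x1, x2):
    """2-2-1 ReLU network with hand-set weights that computes xor on {0, 1} inputs."""
    h1 = relu(x1 + x2)      # counts how many inputs are on
    h2 = relu(x1 + x2 - 1)  # fires only when both inputs are on
    return h1 - 2 * h2


def xor_prob(p, q, copula):
    """P(A xor B) = P(A) + P(B) - 2 P(A and B), with P(A and B) = C(p, q)."""
    return p + q - 2 * copula(p, q)


def product_copula(p, q):  # independent A and B
    return p * q


def min_copula(p, q):  # upper Fréchet bound: maximal positive dependence
    return min(p, q)


for a in (0, 1):
    for b in (0, 1):
        # On Boolean corners every copula reproduces the xor truth table.
        print(a, b, xor_net(a, b), xor_prob(a, b, product_copula), xor_prob(a, b, min_copula))

print(xor_prob(0.5, 0.5, product_copula))  # 0.5
print(xor_prob(0.5, 0.5, min_copula))      # 0.0
```

Between the corners the copula matters: with fair, independent inputs xor is true half the time, while perfectly dependent inputs never differ.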

artificial intelligence, freedman copula representation and surface, machine learning


1907.04483

Country: North America > United States > New York (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

#artificialintelligence, Oct-22-2018, 11:28:58 GMT

Construct a neural network (multilayer perceptrons) using micro:bit

Let's learn the basics of neural networks using micro:bit. Training a neural network to solve a task generally requires backpropagation, which involves lengthy calculations. Running backpropagation on a tiny micro:bit with low computing power is not impossible, but it is not realistic. Here, therefore, we perform only the forward calculation, using the edge weights of a neural network that has already been trained.
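A forward-only pass is just a few multiply-adds and a squashing function, which is why it fits on a micro:bit. The sketch below is plain Python using only `math`, so (as an assumption) it should also run under MicroPython; the 2-2-1 weights are hypothetical, hand-picked so the network computes XOR, and are not taken from the article. Wiring the inputs and output to the board's buttons and display is omitted.

```python
import math

# Already-trained weights (here: hand-picked, hypothetical) for a 2-2-1 network.
HIDDEN_WEIGHTS = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_BIASES = [-10.0, 30.0]
OUTPUT_WEIGHTS = [20.0, 20.0]
OUTPUT_BIAS = -30.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x):
    """Forward calculation only: no backpropagation and no weight updates."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(HIDDEN_WEIGHTS, HIDDEN_BIASES)
    ]
    return sigmoid(sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS)


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(forward(x)))
```

The first hidden unit acts as OR, the second as NAND, and the output unit ANDs them, which yields XOR.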

artificial intelligence, machine learning, neural network

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.90)

#artificialintelligence, Sep-17-2016, 07:05:51 GMT

Build a Neural Net to solve Exclusive OR (XOR) problem

Cool, colorful, creative Perceptron Learning Algorithm in plain words https://t.co/LPmMTPsRHb
Maximum Likelihood Estimate (MLE) and Logistic Regression simplified: https://t.co/CUdOhpP4ko
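Why XOR needs a hidden layer can be seen with the perceptron learning algorithm itself: it converges on a linearly separable function such as AND, but it can never classify all four XOR cases, because no single line separates them. A minimal pure-Python sketch; the function name, the learning rate of 1 and the epoch cap are illustrative choices.

```python
def train_perceptron(data, epochs=50):
    """Classic perceptron rule; returns (weights, bias, converged)."""
    w = [0, 0]
    b = 0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in data:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            update = target - pred  # -1, 0 or +1
            if update:
                w[0] += update * x1
                w[1] += update * x2
                b += update
                errors += 1
        if errors == 0:
            return w, b, True  # a full pass with no mistakes
    return w, b, False


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(train_perceptron(AND))     # ([2, 1], -2, True)
print(train_perceptron(XOR)[2])  # False: XOR is not linearly separable
```

Raising `epochs` does not help for XOR: a pass with zero mistakes would itself be a separating line, and none exists.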

artificial intelligence, machine learning, machinelearning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)