AITopics

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.62)

Neural Information Processing Systems, Oct-11-2024, 00:29:21 GMT

Introspective Learning: A Two-Stage Approach for Inference in Neural Networks

In this paper, we advocate for two stages in a neural network's decision-making process. The first is the existing feed-forward inference framework, where patterns in given data are sensed and associated with previously learned patterns. The second is a slower reflection stage, where we ask the network to reflect on its feed-forward decision by considering and evaluating all available choices. We use gradients of trained neural networks as a measurement of this reflection. A simple three-layered Multi-Layer Perceptron is used as the second stage, predicting from all extracted gradient features.
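
The gradient-based reflection described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the trained network ends in a linear softmax layer and that the reflection features are the cross-entropy gradients with respect to that layer's weights, computed once per candidate class. The names `introspective_features`, `weights`, and `h` are hypothetical and not taken from the paper.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def introspective_features(weights, h):
    """Concatenate last-layer gradients for every candidate class.

    weights: C rows of length D (assumed final linear layer).
    h: penultimate feature vector of length D.
    """
    logits = [sum(w * x for w, x in zip(row, h)) for row in weights]
    probs = softmax(logits)
    features = []
    for candidate in range(len(weights)):
        # Gradient of cross-entropy against one_hot(candidate):
        # dL/dW[k][j] = (probs[k] - 1[k == candidate]) * h[j]
        for k, p_k in enumerate(probs):
            err = p_k - (1.0 if k == candidate else 0.0)
            features.extend(err * x for x in h)
    return features

weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]  # toy values
h = [1.0, 2.0]
feats = introspective_features(weights, h)
print(len(feats))  # 3 candidates * 3 classes * 2 dims = 18
```

In this sketch, the resulting vector would be the input to the second-stage MLP, which is not shown here.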

introspective learning, neural network, two-stage approach, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.63)

Adolfi, Federico, Vilas, Martina G., Wareham, Todd

The Computational Complexity of Circuit Discovery for Inner Interpretability

Many proposed applications of neural networks in machine learning, cognitive/brain science, and society hinge on the feasibility of inner interpretability via circuit discovery. This calls for empirical and theoretical explorations of viable algorithmic options. Despite advances in the design and testing of heuristics, there are concerns about their scalability and faithfulness at a time when we lack understanding of the complexity properties of the problems they are deployed to solve. To address this, we study circuit discovery with classical and parameterized computational complexity theory: (1) we describe a conceptual scaffolding to reason about circuit finding queries in terms of affordances for description, explanation, prediction and control; (2) we formalize a comprehensive set of queries that capture mechanistic explanation, and propose a formal framework for their analysis; (3) we use it to settle the complexity of many query variants and relaxations of practical interest on multi-layer perceptrons (part of, e.g., transformers). Our findings reveal a challenging complexity landscape. Many queries are intractable (NP-hard, $\Sigma^p_2$-hard), remain fixed-parameter intractable (W[1]-hard) when constraining model/circuit features (e.g., depth), and are inapproximable under additive, multiplicative, and probabilistic approximation schemes. To navigate this landscape, we prove there exist transformations to tackle some of these hard problems (NP- vs. $\Sigma^p_2$-complete) with better-understood heuristics, and prove the tractability (PTIME) or fixed-parameter tractability (FPT) of more modest queries which retain useful affordances. This framework allows us to understand the scope and limits of interpretability queries, explore viable options, and compare their resource demands among existing and future architectures.

artificial intelligence, machine learning, neuron, (16 more...)

2410.08025

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > Canada > Newfoundland and Labrador > Newfoundland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(13 more...)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.56)

Optimized Biomedical Question-Answering Services with LLM and Multi-BERT Integration

Qian, Cheng, Shi, Xianglong, Yao, Shanshan, Liu, Yichen, Zhou, Fengming, Zhang, Zishu, Akram, Junaid, Braytee, Ali, Anaissi, Ali

We present a refined approach to biomedical question-answering (QA) services by integrating large language models (LLMs) with Multi-BERT configurations. By enhancing the ability to process and prioritize vast amounts of complex biomedical data, this system aims to support healthcare professionals in delivering better patient outcomes and informed decision-making. Through innovative use of BERT and BioBERT models, combined with a multi-layer perceptron (MLP) layer, we enable more specialized and efficient responses to the growing demands of the healthcare sector. Our approach not only addresses the challenge of overfitting by freezing one BERT model while training another but also improves the overall adaptability of QA services. The use of extensive datasets, such as BioASQ and BioMRC, demonstrates the system's ability to synthesize critical information. This work highlights how advanced language models can make a tangible difference in healthcare, providing reliable and responsive tools for professionals to manage complex information, ultimately serving the broader goal of improved care and data-driven insights.

large language model, machine learning, natural language, (19 more...)

2410.12856

Country: Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.56)

Sasse, Arthur Mendonça, de Farias, Claudio Miceli

Evaluating Federated Kolmogorov-Arnold Networks on Non-IID Data

Federated Kolmogorov-Arnold Networks (F-KANs) have already been proposed, but their assessment is at an initial stage. We present a comparison between KANs (using B-splines and Radial Basis Functions as activation functions) and Multi-Layer Perceptrons (MLPs) with a similar number of parameters for 100 rounds of federated learning in the MNIST classification task using non-IID partitions with 100 clients. After 15 trials for each model, we show that the best accuracies achieved by MLPs can be achieved by Spline-KANs in half of the time (in rounds), with just a moderate increase in computing time.
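
To make "non-IID partitions with 100 clients" concrete, here is a minimal sketch of one common scheme: sort samples by label, cut them into shards, and give each client a few shards. The abstract does not state which partitioning scheme the authors used, so the shard approach, the function name `shard_partition`, and the `shards_per_client` parameter are assumptions for illustration only.

```python
import random

def shard_partition(labels, num_clients, shards_per_client=2, seed=0):
    """Label-sorted shard partition (an assumed scheme, not the paper's)."""
    rng = random.Random(seed)
    # Sort sample indices by label so each shard covers few classes.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_clients * shards_per_client
    shard_size = len(order) // num_shards  # leftover samples are dropped
    shards = [order[s * shard_size:(s + 1) * shard_size]
              for s in range(num_shards)]
    rng.shuffle(shards)
    clients = []
    for c in range(num_clients):
        own = shards[c * shards_per_client:(c + 1) * shards_per_client]
        clients.append([i for shard in own for i in shard])
    return clients

# Toy stand-in for MNIST labels: 10 classes, 600 samples each.
labels = [digit for digit in range(10) for _ in range(600)]
clients = shard_partition(labels, num_clients=100)
print(len(clients), len(clients[0]))  # 100 60
```

With these toy numbers each shard holds a single class, so every client sees at most two digits, which is what makes the split non-IID.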

accuracy, artificial intelligence, machine learning, (17 more...)

2410.08961

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
North America > United States > Virginia (0.04)

Genre:

Research Report > Experimental Study (0.95)
Research Report > New Finding (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling

Zhang, Ruochen, Yu, Qinan, Zang, Matianyu, Eickhoff, Carsten, Pavlick, Ellie

Using English and Chinese multilingual and monolingual models, we analyze the internal circuitry involved in two tasks, one focusing on indirect object identification (IOI), which is virtually identical between the languages, and one which involves generating past tense verbs that require morphological marking in English but not in Chinese. Our contributions are as follows: We show that a multilingual model uses a single circuit to handle the same syntactic process independently of the language in which it occurs (§3.4). We show that even monolingual models trained independently on English and Chinese each adopt nearly the same circuit for this task (§3.5), suggesting a surprising amount of consistency with how LLMs learn to handle this particular aspect of language modeling. Finally, we show that, when faced with similar tasks that require language-specific morphological processes, multilingual models still invoke a largely overlapping circuit, but employ language-specific components as needed. Specifically, in our task, we find that the model uses a circuit that consists primarily of attention heads to perform most of the task, but employs the feed-forward networks in English only to perform morphological marking that is necessary in English but not in Chinese (§4). Together, our results provide new insights into how LLMs trade off between exploiting common structures and preserving linguistic differences when tasked with modeling multiple languages simultaneously. Our experiments can lay the groundwork for future work that seeks to improve cross-lingual transfer through more principled parameter updates (Wu et al., 2024), as well as work that seeks to use LLMs to improve the study of linguistic and grammatical structure for its own sake (Lakretz et al., 2021; Misra & Kim, 2024).

large language model, machine learning, natural language, (19 more...)

2410.09223

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.48)

arXiv.org Machine Learning, Oct-11-2024

Generalized Sparse Additive Model with Unknown Link Function

Yuan, Peipei, You, Xinge, Chen, Hong, Zhang, Xuelin, Peng, Qinmu

Generalized additive models (GAM) have been successfully applied to high-dimensional data analysis. To alleviate this problem, we propose a new sparse additive model, named generalized sparse additive model with unknown link function (GSAMUL), in which the component functions are estimated by B-spline basis and the unknown link function is estimated by a multi-layer perceptron (MLP) network. The proposed GSAMUL can realize both variable selection and hidden interaction. We integrate this estimation into a bilevel optimization problem, where the data is split into training set and validation set. In theory, we provide the guarantees about the convergence of the approximate procedure. In applications, experimental evaluations on both synthetic and real-world data sets consistently validate the effectiveness of the proposed approach. Introduction. Additive models and generalized additive models (GAMs) have been widely used in data analysis when exploring the nonlinear effects of the variables on the response [1]-[6]. Especially for high-dimensional data, they are useful to address "the curse of dimensionality" [7], [8].

additive model, gsamul, link function, (15 more...)

2410.06012

Country:

Asia > China > Hubei Province > Wuhan (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

Neural Information Processing Systems, Oct-10-2024, 23:41:43 GMT

Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds

We propose a novel, conceptually simple and general framework for instance segmentation on 3D point clouds. Our method, called 3D-BoNet, follows the simple design philosophy of per-point multilayer perceptrons (MLPs). It consists of a backbone network followed by two parallel network branches for 1) bounding box regression and 2) point mask prediction. Moreover, it is remarkably computationally efficient as, unlike existing approaches, it does not require any post-processing steps such as non-maximum suppression, feature sampling, clustering or voting. Extensive experiments show that our approach surpasses existing work on both ScanNet and S3DIS datasets while being approximately 10x more computationally efficient.

learning object bounding box, point cloud, segmentation, (1 more...)

Technology:

Information Technology > Graphics (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.65)

Neural Information Processing Systems, Oct-10-2024, 18:41:43 GMT

Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

We consider a commonly studied supervised classification of a synthetic dataset whose labels are generated by feeding a one-layer non-linear neural network with random iid inputs. We study the generalization performances of standard classifiers in the high-dimensional regime where $\alpha = \frac{n}{d}$ is kept finite in the limit of a high dimension $d$ and number of samples $n$. Our contribution is three-fold: First, we prove a formula for the generalization error achieved by $\ell_2$ regularized classifiers that minimize a convex loss. This formula was first obtained by the heuristic replica method of statistical physics. Secondly, focussing on commonly used loss functions and optimizing the $\ell_2$ regularization strength, we observe that while ridge regression performance is poor, logistic and hinge regression are surprisingly able to approach the Bayes-optimal generalization error extremely closely.
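
For orientation, the setting can be written out roughly as follows. This is a sketch under assumptions rather than the paper's exact notation: the teacher weights $w^\star$, the label function $\varphi$, the convex loss $\ell$, the regularization strength $\lambda$, the $1/\sqrt{d}$ scaling, and the sign-based predictor $\hat{y}$ (which presumes binary labels) are all introduced here for illustration.

```latex
\begin{aligned}
& x_i \in \mathbb{R}^d \ \text{i.i.d.}, \qquad
  y_i = \varphi\!\left(\tfrac{1}{\sqrt{d}}\, w^\star \cdot x_i\right), \qquad
  i = 1, \dots, n, \qquad \alpha = \frac{n}{d}, \\
& \hat{w} = \operatorname*{arg\,min}_{w \in \mathbb{R}^d} \;
  \sum_{i=1}^{n} \ell\!\left(y_i, \tfrac{1}{\sqrt{d}}\, w \cdot x_i\right)
  + \frac{\lambda}{2} \lVert w \rVert_2^2, \\
& \hat{y}(x) = \operatorname{sign}\!\left(\tfrac{1}{\sqrt{d}}\, \hat{w} \cdot x\right), \qquad
  \varepsilon_g = \mathbb{P}\!\left[\, y_{\text{new}} \neq \hat{y}(x_{\text{new}}) \,\right].
\end{aligned}
```

Under this reading, the comparison in the abstract amounts to choosing $\ell$ (square, logistic, or hinge) and tuning $\lambda$, then comparing $\varepsilon_g$ with the Bayes-optimal error at the same $\alpha$.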

approaching bayes error, generalization error, high-dimensional perceptron, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.40)

Neural Information Processing Systems, Oct-10-2024, 15:00:20 GMT

Generative Well-intentioned Networks

We propose Generative Well-intentioned Networks (GWINs), a novel framework for increasing the accuracy of certainty-based, closed-world classifiers. A conditional generative network recovers the distribution of observations that the classifier labels correctly with high certainty. We introduce a reject option to the classifier during inference, allowing the classifier to reject an observation instance rather than predict an uncertain label. These rejected observations are translated by the generative network to high-certainty representations, which are then relabeled by the classifier. This architecture allows for any certainty-based classifier or rejection function and is not limited to multilayer perceptrons.
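
The reject-then-translate loop described above can be sketched in a few lines. This is an illustration only: the threshold value, the function names, and the toy classifier and translator are hypothetical stand-ins, and the choice to accept the relabeled prediction unconditionally is an assumption that the abstract does not confirm.

```python
def classify_with_reject(probs, threshold=0.9):
    """Return the argmax label, or None to reject when certainty is low."""
    label = max(range(len(probs)), key=lambda k: probs[k])
    return label if probs[label] >= threshold else None

def gwin_style_inference(x, classifier, translator, threshold=0.9):
    label = classify_with_reject(classifier(x), threshold)
    if label is not None:
        return label
    # Rejected: map the observation to a high-certainty representation,
    # then relabel it (assumed here: accept the argmax without re-rejecting).
    probs = classifier(translator(x))
    return max(range(len(probs)), key=lambda k: probs[k])

# Toy stand-ins: a two-class "classifier" on a scalar in [0, 1] and a
# "translator" that snaps the scalar to the nearer extreme.
def toy_classifier(x):
    return [1.0 - x, x]

def toy_translator(x):
    return 1.0 if x >= 0.5 else 0.0

print(classify_with_reject(toy_classifier(0.6)))                  # None
print(gwin_style_inference(0.6, toy_classifier, toy_translator))  # 1
```

Because the classifier and the rejection rule are passed in as plain callables, the sketch mirrors the abstract's point that the architecture is not tied to one classifier type.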

classifier, generative network, generative well-intentioned network, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)