AITopics | srelu

Collaborating Authors

srelu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization

Huang, Yu, Wen, Zixin, Singh, Aarti, Chi, Yuejie, Chen, Yuxin

arXiv.org Machine LearningNov-11-2025

The ability to reason lies at the core of artificial intelligence (AI), and challenging problems usually call for deeper and longer reasoning to tackle. A crucial question about AI reasoning is whether models can extrapolate learned reasoning patterns to solve harder tasks with longer chain-of-thought (CoT). In this work, we present a theoretical analysis of transformers learning on synthetic state-tracking tasks with gradient descent. We mathematically prove how the algebraic structure of state-tracking problems governs the degree of extrapolation of the learned CoT. Specifically, our theory characterizes the length generalization of transformers through the mechanism of attention concentration, linking the retrieval robustness of the attention layer to the state-tracking task structure of long-context reasoning. Moreover, for transformers with limited reasoning length, we prove that a recursive self-training scheme can progressively extend the range of solvable problem lengths. To our knowledge, we provide the first optimization guarantee that constant-depth transformers provably learn $\mathsf{NC}^1$-complete problems with CoT, significantly going beyond prior art confined in $\mathsf{TC}^0$, unless the widely held conjecture $\mathsf{TC}^0 \neq \mathsf{NC}^1$ fails. Finally, we present a broad set of experiments supporting our theoretical results, confirming the length generalization behaviors and the mechanism of attention concentration.

large language model, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2511.07378

Country: North America > United States (0.28)

Genre:

Instructional Material > Course Syllabus & Notes (0.92)
Research Report > New Finding (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)
(2 more...)

Add feedback

Automatic Synthesis of Neurons for Recurrent Neural Nets

Olsson, Roland, Tran, Chau, Magnusson, Lars

arXiv.org Artificial IntelligenceJun-29-2022

We present a new class of neurons, ARNs, which give a cross entropy on test data that is up to three times lower than the one achieved by carefully optimized LSTM neurons. The explanations for the huge improvements that often are achieved are elaborate skip connections through time, up to four internal memory states per neuron and a number of novel activation functions including small quadratic forms. The new neurons were generated using automatic programming and are formulated as pure functional programs that easily can be transformed. We present experimental results for eight datasets and found excellent improvements for seven of them, but LSTM remained the best for one dataset. The results are so promising that automatic programming to generate new neurons should become part of the standard operating procedure for any machine learning practitioner who works on time series data such as sensor signals.

artificial intelligence, machine learning, neuron, (18 more...)

arXiv.org Artificial Intelligence

2207.03577

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Group-Connected Multilayer Perceptron Networks

Kachuee, Mohammad, Darabi, Sajad, Fazeli, Shayan, Sarrafzadeh, Majid

arXiv.org Machine LearningDec-19-2019

Despite the success of deep learning in domains such as image, voice, and graphs, there has been little progress in deep representation learning for domains without a known structure between features. For instance, a tabular dataset of different demographic and clinical factors where the feature interactions are not given as a prior. In this paper, we propose Group-Connected Multilayer Perceptron (GMLP) networks to enable deep representation learning in these domains. GMLP is based on the idea of learning expressive feature combinations (groups) and exploiting them to reduce the network complexity by defining local group-wise operations. During the training phase, GMLP learns a sparse feature grouping matrix using temperature annealing softmax with an added entropy loss term to encourage the sparsity. Furthermore, an architecture is suggested which resembles binary trees, where group-wise operations are followed by pooling operations to combine information; reducing the number of groups as the network grows in depth. To evaluate the proposed method, we conducted experiments on five different real-world datasets covering various application areas. Additionally, we provide visualizations on MNIST and synthesized data. According to the results, GMLP is able to successfully learn and exploit expressive feature combinations and achieve state-of-the-art classification performance on different datasets.

architecture, bnorm, dataset, (16 more...)

arXiv.org Machine Learning

1912.096

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)

Add feedback

Multi-scale Deep Neural Networks for Solving High Dimensional PDEs

Cai, Wei, Xu, Zhi-Qin John

arXiv.org Machine LearningOct-25-2019

In this paper, we propose the idea of radial scaling in frequency domain and activation functions with compact support to produce a multi-scale DNN (MscaleDNN), which will have the multi-scale capability in approximating high frequency and high dimensional functions and speeding up the solution of high dimensional PDEs. Numerical results on high dimensional function fitting and solutions of high dimensional PDEs, using loss functions with either Ritz energy or least squared PDE residuals, have validated the increased power of multi-scale resolution and high frequency capturing of the proposed MscaleDNN.

activation function, mscalednn, srelu, (11 more...)

arXiv.org Machine Learning

1910.1171

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > Texas > Dallas County > Dallas (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.84)

Add feedback

Single-bit-per-weight deep convolutional neural networks without batch-normalization layers for embedded systems

McDonnell, Mark D., Mostafa, Hesham, Wang, Runchun, van Schaik, Andre

arXiv.org Machine LearningJul-22-2019

Batch-normalization (BN) layers are thought to be an integrally important layer type in today's state-of-the-art deep convolutional neural networks for computer vision tasks such as classification and detection. However, BN layers introduce complexity and computational overheads that are highly undesirable for training and/or inference on low-power custom hardware implementations of real-time embedded vision systems such as UAVs, robots and Internet of Things (IoT) devices. They are also problematic when batch sizes need to be very small during training, and innovations such as residual connections introduced more recently than BN layers could potentially have lessened their impact. In this paper we aim to quantify the benefits BN layers offer in image classification networks, in comparison with alternative choices. In particular, we study networks that use shifted-ReLU layers instead of BN layers. We found, following experiments with wide residual networks applied to the ImageNet, CIFAR 10 and CIFAR 100 image classification datasets, that BN layers do not consistently offer a significant advantage. We found that the accuracy margin offered by BN layers depends on the data set, the network size, and the bit-depth of weights. We conclude that in situations where BN layers are undesirable due to speed, memory or complexity costs, that using shifted-ReLU layers instead should be considered; we found they can offer advantages in all these areas, and often do not impose a significant accuracy cost.

bn layer, deep learning, neural network, (16 more...)

arXiv.org Machine Learning

1907.06916

Country:

Oceania > Australia (0.14)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology (0.34)
Energy > Oil & Gas (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Deep Learning with S-Shaped Rectified Linear Activation Units

Jin, Xiaojie (National University of Singapore) | Xu, Chunyan (Nanjing University of Science and Technology) | Feng, Jiashi (National University of Singapore) | Wei, Yunchao (National University of Singapore) | Xiong, Junjun (Beijing Samsung Telecom) | Yan, Shuicheng (National University of Singapore)

AAAI ConferencesApr-19-2016

Rectified linear activation units are important components for state-of-the-art deep convolutional networks. In this paper, we propose a novel S-shaped rectifiedlinear activation unit (SReLU) to learn both convexand non-convex functions, imitating the multiple function forms given by the two fundamental laws, namely the Webner-Fechner law and the Stevens law, in psychophysics and neural sciences. Specifically, SReLU consists of three piecewise linear functions, which are formulated by four learnable parameters. The SReLU is learned jointly with the training of the whole deep network through back propagation. During the training phase, to initialize SReLU in different layers, we propose a “freezing” method to degenerate SReLU into a predefined leaky rectified linear unit in the initial several training epochs and then adaptively learn the good initial values. SReLU can be universally used in the existing deep networks with negligible additional parameters and computation cost. Experiments with two popular CNN architectures, Network in Network and GoogLeNet on scale-various benchmarks including CIFAR10, CIFAR100, MNIST and ImageNet demonstrate that SReLU achieves remarkable improvement compared to other activation functions.

artificial intelligence, machine learning, srelu, (18 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback