Approximating Real-Time Recurrent Learning with Random Kronecker Factors

Asier Mujika, Florian Meier, Angelika Steger

Neural Information Processing Systems

We also confirm these theoretical results experimentally. Further, we show empirically that the KF-RTRL algorithm captures long-term dependencies and almost matches the performance of TBPTT on real-world tasks, by training Recurrent Highway Networks on a synthetic string memorization task and on the Penn TreeBank task, respectively.


Empirical Risk Minimization in Non-interactive Local Differential Privacy Revisited

Di Wang, Marco Gaboardi, Jinhui Xu

Neural Information Processing Systems

In this paper, we revisit the Empirical Risk Minimization problem in the non-interactive local model of differential privacy. In the case of constant or low dimensions (p ≪ n), we first show that if the loss function is (∞, T)-smooth, we can avoid a dependence of the sample complexity, to achieve error α, on the exponential of the dimensionality p with base 1/α (i.e., α^{-p}), which answers a question in [19].


Fully Unconstrained Online Learning

Neural Information Processing Systems

We provide a technique for online convex optimization that obtains regret G‖w‖√(T·log(‖w‖G√T)) + ‖w‖² + G² on G-Lipschitz losses for any comparison point w, without knowing either G or ‖w‖.



Coupling Generative Modeling and an Autoencoder with the Causal Bridge

Meng, Ruolin, Chung, Ming-Yu, Brahma, Dhanajit, Henao, Ricardo, Carin, Lawrence

arXiv.org Machine Learning

We consider inferring the causal effect of a treatment (intervention) on an outcome of interest in situations where there is potentially an unobserved confounder influencing both the treatment and the outcome. This is achievable by assuming access to two separate sets of control (proxy) measurements associated with treatment and outcomes, which are used to estimate treatment effects through a function termed the causal bridge (CB). We present a new theoretical perspective, associated assumptions for when estimating treatment effects with the CB is feasible, and a bound on the average error of the treatment effect when the CB assumptions are violated. From this new perspective, we then demonstrate how coupling the CB with an autoencoder architecture allows for the sharing of statistical strength between observed quantities (proxies, treatment, and outcomes), thus improving the quality of the CB estimates. Experiments on synthetic and real-world data demonstrate the effectiveness of the proposed approach in relation to the state-of-the-art methodology for proxy measurements.


What If the Input is Expanded in OOD Detection?

Neural Information Processing Systems

Out-of-distribution (OOD) detection aims to identify OOD inputs from unknown classes, which is important for the reliable deployment of machine learning models in the open world. Various scoring functions have been proposed to distinguish OOD data from in-distribution (ID) data. However, existing methods generally focus on excavating the discriminative information from a single input, which implicitly limits the representation dimension. In this work, we introduce a novel perspective, i.e., employing different common corruptions on the input space, to expand that dimension. We reveal an interesting phenomenon termed confidence mutation, where the confidence of OOD data can decrease significantly under the corruptions, while the ID data shows a higher confidence expectation considering the resistance of semantic features. Based on this observation, we formalize a new scoring method, namely, Confidence aVerage (CoVer), which can capture the dynamic differences by simply averaging the scores obtained from different corrupted inputs and the original ones, making the OOD and ID distributions more separable in detection tasks.
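The abstract describes CoVer as averaging a confidence score over the original input and its corrupted variants. A minimal numpy sketch of that averaging idea, assuming a model that returns class logits and a list of corruption functions (the score here is the standard maximum softmax probability; the paper's exact score and corruption set may differ):

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def msp_score(logits):
    # Maximum softmax probability: a common single-input OOD confidence score.
    return softmax(logits).max(axis=-1)

def cover_score(model, x, corruptions):
    # Average the confidence over the original input and its corrupted
    # variants. Under "confidence mutation", OOD confidence drops sharply
    # with corruption while ID confidence resists, so the averaged score
    # separates the two distributions better than a single-input score.
    scores = [msp_score(model(x))]
    for corrupt in corruptions:
        scores.append(msp_score(model(corrupt(x))))
    return np.mean(scores, axis=0)
```

A higher `cover_score` then indicates ID-like behavior, and thresholding it yields the detector.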


LUSD: Localized Update Score Distillation for Text-Guided Image Editing

Chinchuthakun, Worameth, Saengja, Tossaporn, Tritrong, Nontawat, Rewatbowornwong, Pitchaporn, Khungurn, Pramook, Suwajanakorn, Supasorn

arXiv.org Artificial Intelligence

While diffusion models show promising results in image editing given a target prompt, achieving both prompt fidelity and background preservation remains difficult. Recent works have introduced score distillation techniques that leverage the rich generative prior of text-to-image diffusion models to solve this task without additional fine-tuning. However, these methods often struggle with tasks such as object insertion. Our investigation of these failures reveals significant variations in gradient magnitude and spatial distribution, making hyperparameter tuning highly input-specific or unsuccessful. To address this, we propose two simple yet effective modifications: attention-based spatial regularization and gradient filtering-normalization, both aimed at reducing these variations during gradient updates. Experimental results show our method outperforms state-of-the-art score distillation techniques in prompt fidelity, improving successful edits while preserving the background. Users also preferred our method over state-of-the-art techniques across three metrics, and by 58-64% overall.
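The abstract names two modifications, attention-based spatial regularization and gradient filtering-normalization, without giving their exact form. A hypothetical numpy sketch of what such a gradient update step could look like (the attention mask, percentile filter, and RMS normalization are all illustrative assumptions, not the paper's specification):

```python
import numpy as np

def regularize_gradient(grad, attn, percentile=90, eps=1e-8):
    # Attention-based spatial regularization (assumed form): damp the
    # gradient outside the regions the target prompt attends to.
    g = grad * (attn / (attn.max() + eps))
    # Gradient filtering (assumed form): zero out entries below a
    # magnitude percentile, discarding diffuse low-magnitude updates.
    thresh = np.percentile(np.abs(g), percentile)
    g = np.where(np.abs(g) >= thresh, g, 0.0)
    # Normalization (assumed form): rescale to unit RMS so the update
    # magnitude no longer depends on input-specific gradient scale.
    rms = np.sqrt(np.mean(g ** 2)) + eps
    return g / rms
```

The intent of each step mirrors the abstract: the mask localizes edits to preserve the background, while filtering and normalization reduce the input-specific variation in gradient magnitude that makes hyperparameter tuning brittle.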


Exploring the Word Sense Disambiguation Capabilities of Large Language Models

Basile, Pierpaolo, Siciliani, Lucia, Musacchio, Elio, Semeraro, Giovanni

arXiv.org Artificial Intelligence

Word Sense Disambiguation (WSD) is a historical task in computational linguistics that has received much attention over the years. However, with the advent of Large Language Models (LLMs), interest in this task (in its classical definition) has decreased. In this study, we evaluate the performance of various LLMs on the WSD task. We extend a previous benchmark (XL-WSD) to re-design two subtasks suitable for LLMs: 1) given a word in a sentence, the LLM must generate the correct definition; 2) given a word in a sentence and a set of predefined meanings, the LLM must select the correct one. The extended benchmark is built using XL-WSD and BabelNet. The results indicate that LLMs perform well in zero-shot learning but cannot surpass current state-of-the-art methods. However, a fine-tuned model with a medium number of parameters outperforms all other models, including the state-of-the-art.
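The two subtasks can be pictured as prompt templates. The wording below is hypothetical, not the benchmark's actual prompts; it only illustrates the generation vs. selection split the abstract describes:

```python
def generation_prompt(sentence, word):
    # Subtask 1: the LLM must generate the correct definition of the
    # target word as used in the sentence.
    return (f'In the sentence "{sentence}", what is the meaning of the '
            f'word "{word}"? Answer with a short definition.')

def selection_prompt(sentence, word, senses):
    # Subtask 2: the LLM must select the correct sense from a set of
    # predefined meanings (e.g. glosses drawn from BabelNet).
    options = "\n".join(f"{i + 1}. {gloss}" for i, gloss in enumerate(senses))
    return (f'In the sentence "{sentence}", which of the following '
            f'meanings does the word "{word}" have?\n{options}\n'
            f'Answer with the number of the correct meaning.')
```

Subtask 1 is scored against the gold definition, while subtask 2 reduces to classification over the candidate senses, which is why zero-shot LLMs can be evaluated directly on both.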


ObjectMover: Generative Object Movement with Video Prior

Yu, Xin, Wang, Tianyu, Kim, Soo Ye, Guerrero, Paul, Chen, Xi, Liu, Qing, Lin, Zhe, Qi, Xiaojuan

arXiv.org Artificial Intelligence

Simple as it seems, moving an object to another location within an image is, in fact, a challenging image-editing task that requires re-harmonizing the lighting, adjusting the pose based on perspective, accurately filling occluded regions, and ensuring coherent synchronization of shadows and reflections while maintaining the object identity. In this paper, we present ObjectMover, a generative model that can perform object movement in highly challenging scenes. Our key insight is that we model this task as a sequence-to-sequence problem and fine-tune a video generation model to leverage its knowledge of consistent object generation across video frames. We show that with this approach, our model is able to adjust to complex real-world scenarios, handling extreme lighting harmonization and object effect movement. As large-scale data for object movement are unavailable, we construct a data generation pipeline using a modern game engine to synthesize high-quality data pairs. We further propose a multi-task learning strategy that enables training on real-world video data to improve the model generalization. Through extensive experiments, we demonstrate that ObjectMover achieves outstanding results and adapts well to real-world scenarios.