AITopics | intermediate value

b063829b922fdeb4fa3472dd3471ff43-Paper-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 12:23:09 GMT

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Optimizing Intermediate Memory for Long Sequences Training

Neural Information Processing Systems, Oct-10-2025, 13:29:14 GMT

Meanwhile, Llama3 maintains its hidden size of 4k for inference efficiency.

activation recomputation, arxiv preprint arxiv, sequence length, (11 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Distinct Computations Emerge From Compositional Curricula in In-Context Learning

Lee, Jin Hwa, Lampinen, Andrew K., Singh, Aaditya K., Saxe, Andrew M.

arXiv.org Artificial Intelligence, Jun-17-2025

In-context learning (ICL) research often considers learning a function in-context through a uniform sample of input-output pairs. Here, we investigate how presenting a compositional subtask curriculum in context may alter the computations a transformer learns. We design a compositional algorithmic task based on the modular exponential, a double exponential task composed of two single exponential subtasks, and train transformer models to learn the task in-context. We compare (a) models trained using an in-context curriculum consisting of single exponential subtasks and (b) models trained directly on the double exponential task without such a curriculum. We show that models trained with a subtask curriculum can perform zero-shot inference on unseen compositional tasks and are more robust given the same context length. We study how the task and subtasks are represented across the two training regimes. We find that the models employ diverse strategies modulated by the specific curriculum design.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.13253

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Education > Curriculum (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Chain-of-Thought Tokens are Computer Program Variables

Zhu, Fangwei, Wang, Peiyi, Sui, Zhifang

arXiv.org Artificial Intelligence, May-9-2025

Chain-of-thought (CoT) requires large language models (LLMs) to generate intermediate steps before reaching the final answer, and has proven effective in helping LLMs solve complex reasoning tasks. However, the inner mechanism of CoT still remains largely unclear. In this paper, we empirically study the role of CoT tokens in LLMs on two compositional tasks: multi-digit multiplication and dynamic programming. While CoT is essential for solving these problems, we find that preserving only tokens that store intermediate results would achieve comparable performance. Furthermore, we observe that storing intermediate results in an alternative latent form will not affect model performance. We also randomly intervene on some values in CoT, and notice that subsequent CoT tokens and the final answer would change correspondingly. These findings suggest that CoT tokens may function like variables in computer programs but with potential drawbacks like unintended shortcuts and computational complexity limits between tokens. The code and data are available at https://github.com/solitaryzero/CoTs_are_Variables.
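
As a loose illustration of the "tokens as variables" reading, the sketch below performs multi-digit multiplication in plain Python and records each partial product as a named intermediate value. It is not the authors' code and involves no language model; the function name `multiply_with_trace`, the `override` parameter, and the `partial_<n>` naming are all hypothetical. Overriding one stored value plays the role of an intervention: the final answer changes accordingly.

```python
from typing import Dict, Optional


def multiply_with_trace(a: int, b: int,
                        override: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Multiply a by b digit by digit, storing every partial product by name.

    `override` (hypothetical) replaces selected intermediate values before
    they are used, mimicking an intervention on an intermediate result.
    """
    override = override or {}
    trace: Dict[str, int] = {}
    total = 0
    # Walk the digits of b from least to most significant.
    for pos, digit_char in enumerate(reversed(str(b))):
        name = f"partial_{pos}"
        partial = a * int(digit_char) * 10 ** pos
        # An overridden value is used in place of the computed one.
        partial = override.get(name, partial)
        trace[name] = partial
        total += partial
    trace["answer"] = total
    return trace


clean = multiply_with_trace(123, 45)
edited = multiply_with_trace(123, 45, override={"partial_0": 600})
print(clean)   # partial products plus the final answer
print(edited)  # the answer shifts with the edited intermediate value
```

Only the stored partial products are needed to recover the answer, which mirrors the abstract's observation that keeping just the tokens holding intermediate results can be enough.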

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2505.04955

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training

Luo, Cheng, Zhao, Jiawei, Chen, Zhuoming, Chen, Beidi, Anandkumar, Anima

arXiv.org Artificial Intelligence, Jul-21-2024

We introduce Mini-Sequence Transformer (MsT), a simple and effective methodology for highly efficient and accurate LLM training with extremely long sequences. MsT partitions input sequences and iteratively processes mini-sequences to reduce intermediate memory usage. Integrated with activation recomputation, it enables significant memory savings in both forward and backward passes. In experiments with the Llama3-8B model, with MsT, we measure no degradation in throughput or convergence even with 12x longer sequences than standard implementations due to our careful memory optimizations. MsT is fully general, implementation-agnostic, and requires minimal code changes to integrate with existing LLM training frameworks.
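
The idea of processing mini-sequences to bound intermediate memory can be sketched in a few lines of plain Python. This is not the MsT implementation: it assumes a toy position-wise block (each sequence position is transformed independently), and the names `positionwise_block`, `mini_sequence_apply`, `expand`, and `num_chunks` are hypothetical. The block widens each row by a factor `expand` before projecting back, so the widened buffer is the "intermediate" whose peak size the chunking reduces.

```python
from typing import List, Tuple

Row = List[float]


def positionwise_block(rows: List[Row], expand: int = 4) -> Tuple[List[Row], int]:
    """Toy block: widen each row `expand` times, apply ReLU, then sum back.

    Returns the output rows and the number of intermediate elements held.
    """
    hidden = [[max(0.0, x * (k + 1)) for x in row for k in range(expand)]
              for row in rows]
    out = [[sum(h[i * expand:(i + 1) * expand]) for i in range(len(row))]
           for row, h in zip(rows, hidden)]
    return out, sum(len(h) for h in hidden)


def mini_sequence_apply(rows: List[Row], num_chunks: int,
                        expand: int = 4) -> Tuple[List[Row], int]:
    """Apply the block chunk by chunk along the sequence dimension."""
    if not rows:
        return [], 0
    chunk = -(-len(rows) // num_chunks)  # ceiling division
    out: List[Row] = []
    peak = 0
    for start in range(0, len(rows), chunk):
        part_out, part_size = positionwise_block(rows[start:start + chunk], expand)
        out.extend(part_out)
        # Only one chunk's intermediate buffer is alive at a time.
        peak = max(peak, part_size)
    return out, peak


sequence = [[float(i), -float(i)] for i in range(8)]
full_out, full_peak = positionwise_block(sequence)
mini_out, mini_peak = mini_sequence_apply(sequence, num_chunks=4)
print(full_out == mini_out, full_peak, mini_peak)
```

Because the toy block treats positions independently, chunking changes only the peak size of the intermediate buffer, not the output. Blocks that mix positions would need additional care that this sketch does not cover.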

activation recomputation, arxiv preprint arxiv, sequence length, (11 more...)

arXiv.org Artificial Intelligence

2407.15892

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Profiling checkpointing schedules in adjoint ST-AD

Hascoët, Laurent, Bouchot, Jean-Luc, Gaikwad, Shreyas Sunil, Narayanan, Sri Hari Krishna, Hückelheim, Jan

arXiv.org Artificial Intelligence, May-24-2024

Source-transformation algorithmic differentiation (ST-AD) in its adjoint mode transforms a primal code that evaluates some original function into an adjoint code that computes its gradient. It is well known [9] that the most efficient implementation of the adjoint code must progress backwards of the original computation, progressively using values originating from the primal execution. The amount of values used grows linearly with the run time of the primal code and, since they are used in the reverse of their production order, their management (data-flow reversal) is a key issue that requires a delicate trade-off between storage and recomputation. This work focuses on one particular setting, where data-flow reversal is primarily done through a stack and the memory cost of this stack is mitigated through a classical storage/recomputation trade-off known as …

Section 4 discusses the information that we found most appropriate to guide the choice of activated checkpoints, and an algorithm to extract this information at run-time by profiling execution of the adjoint code. Section 5 discusses implementation of this profiling in an existing source-transformation AD tool, and section 6 applies it to two realistic test-cases taken from the MITgcm code suite. We will show how the developer can achieve a significant performance gain by exploiting the profiling results. In section 7, we come back to some limitations of our proposed approach and discuss how they could be overcome, before concluding in section 8.

2 Our checkpointing model / setting

In our setting, data-flow reversal is achieved by storing intermediate values of the primal execution.
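
The storage/recomputation trade-off behind checkpointing can be illustrated on a toy chain of steps x_{k+1} = sin(x_k). The sketch below is a generic hand-written adjoint, not the source-transformation tool or the profiling method described in the paper; the function names and the `segment` parameter are hypothetical. One version pushes every intermediate value on a stack, the other stores only one state per segment and recomputes the rest during the backward sweep.

```python
import math
from typing import List, Tuple


def step(x: float) -> float:
    return math.sin(x)


def step_grad(x: float) -> float:
    return math.cos(x)


def adjoint_store_all(x0: float, n: int) -> Tuple[float, int]:
    """Gradient of x_n w.r.t. x_0, pushing every intermediate value."""
    stack: List[float] = []
    x = x0
    for _ in range(n):
        stack.append(x)
        x = step(x)
    peak = len(stack)
    bar = 1.0
    while stack:  # values are consumed in reverse production order
        bar *= step_grad(stack.pop())
    return bar, peak


def adjoint_checkpointed(x0: float, n: int, segment: int) -> Tuple[float, int]:
    """Same gradient, storing one checkpoint per segment and recomputing."""
    checkpoints: List[Tuple[int, float]] = []
    x = x0
    for k in range(n):
        if k % segment == 0:
            checkpoints.append((k, x))
        x = step(x)
    peak = len(checkpoints)
    bar = 1.0
    end = n
    while checkpoints:
        start, x = checkpoints.pop()
        stack: List[float] = []
        for _ in range(start, end):  # recompute this segment's values
            stack.append(x)
            x = step(x)
        peak = max(peak, len(checkpoints) + len(stack))
        while stack:
            bar *= step_grad(stack.pop())
        end = start
    return bar, peak


print(adjoint_store_all(0.7, 12))
print(adjoint_checkpointed(0.7, 12, segment=4))
```

Both functions return the same gradient; the checkpointed version holds fewer values at once at the cost of running each primal step a second time.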

checkpoint, configuration, siam unauthorized reproduction, (14 more...)

arXiv.org Artificial Intelligence

2405.1559

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Tracing and Manipulating Intermediate Values in Neural Math Problem Solvers

Matsumoto, Yuta, Heinzerling, Benjamin, Yoshikawa, Masashi, Inui, Kentaro

arXiv.org Artificial Intelligence, Jan-17-2023

How language models process complex input that requires multiple steps of inference is not well understood. Previous research has shown that information about intermediate values of these inputs can be extracted from the activations of the models, but it is unclear where that information is encoded and whether that information is indeed used during inference. We introduce a method for analyzing how a Transformer model processes these inputs by focusing on simple arithmetic problems and their intermediate values. To trace where information about intermediate values is encoded, we measure the correlation between intermediate values and the activations of the model using principal component analysis (PCA). Then, we perform a causal intervention by manipulating model weights. This intervention shows that the weights identified via tracing are not merely correlated with intermediate values, but causally related to model predictions. Our findings show that the model has a locality to certain intermediate values, and this is useful for enhancing the interpretability of the models.

artificial intelligence, intermediate value, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2301.06758

Country:

Asia > Japan > Honshū > Tōhoku (0.06)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Combined Pruning for Nested Cross-Validation to Accelerate Automated Hyperparameter Optimization for Embedded Feature Selection in High-Dimensional Data with Very Small Sample Sizes

May, Sigrun, Hartmann, Sven, Klawonn, Frank

arXiv.org Artificial Intelligence, Sep-12-2022

Background: Embedded feature selection in high-dimensional data with very small sample sizes requires optimized hyperparameters for the model building process. For this hyperparameter optimization, nested cross-validation must be applied to avoid a biased performance estimation. The resulting repeated training with high-dimensional data leads to very long computation times. Moreover, it is likely to observe a high variance in the individual performance evaluation metrics caused by outliers in tiny validation sets. Therefore, early stopping applying standard pruning algorithms to save time risks discarding promising hyperparameter sets. Result: To speed up feature selection for high-dimensional data with tiny sample size, we adapt the use of a state-of-the-art asynchronous successive halving pruner. In addition, we combine it with two complementary pruning strategies based on domain or prior knowledge. One pruning strategy immediately stops computing trials with semantically meaningless results for the selected hyperparameter combinations. The other is a new extrapolating threshold pruning strategy suitable for nested-cross-validation with a high variance of performance evaluation metrics. In repeated experiments, our combined pruning strategy keeps all promising trials. At the same time, the calculation time is substantially reduced compared to using a state-of-the-art asynchronous successive halving pruner alone. Up to 81.3% fewer models were trained achieving the same optimization result. Conclusion: The proposed combined pruning strategy accelerates data analysis or enables deeper searches for hyperparameters within the same computation time. This leads to significant savings in time, money and energy consumption, opening the door to advanced, time-consuming analyses.

evaluation metric, hyperparameter optimization, pruning strategy, (12 more...)

arXiv.org Artificial Intelligence

2202.00598

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
South America > Brazil > Pernambuco > Recife (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre:

Research Report > Strength Low (0.61)
Research Report > Experimental Study (0.61)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.70)
Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.88)

The Composability of Intermediate Values in Composable Inductive Programming

McDaid, Edward, McDaid, Sarah

arXiv.org Artificial Intelligence, Jul-4-2021

It is believed that mechanisms including intermediate values enable composable inductive programming (CIP) to be used to produce software of any size. We present the results of a study that investigated the relationships between program size, the number of intermediate values and the number of test cases used to specify programs using CIP. In the study 96,000 programs of various sizes were randomly generated, decomposed into fragments and transformed into test cases. The test cases were then used to regenerate new versions of the original programs using Zoea. The results show linear relationships between the number of intermediate values and regenerated program size, and between the number of test cases and regenerated program size within the size range studied. In addition, as program size increases there is increasing scope for trading off the number of test cases against the number of intermediate values and vice versa.

composition, program size, test case, (13 more...)

arXiv.org Artificial Intelligence

2107.01621

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence (1.00)
Information Technology > Software Engineering (0.68)

BUSTLE: Bottom-up program-Synthesis Through Learning-guided Exploration

Odena, Augustus, Shi, Kensen, Bieber, David, Singh, Rishabh, Sutton, Charles

arXiv.org Machine Learning, Jul-28-2020

Program synthesis is challenging largely because of the difficulty of search in a large space of programs. Human programmers routinely tackle the task of writing complex programs by writing sub-programs and then analysing their intermediate results to compose them in appropriate ways. Motivated by this intuition, we present a new synthesis approach that leverages learning to guide a bottom-up search over programs. In particular, we train a model to prioritize compositions of intermediate values during search conditioned on a given set of input-output examples. This is a powerful combination because of several emergent properties: First, in bottom-up search, intermediate programs can be executed, providing semantic information to the neural network. Second, given the concrete values from those executions, we can exploit rich features based on recent work on property signatures. Finally, bottom-up search allows the system substantial flexibility in what order to generate the solution, allowing the synthesizer to build up a program from multiple smaller sub-programs. Overall, our empirical evaluation finds that the combination of learning and bottom-up search is remarkably effective, even with simple supervised learning approaches. We demonstrate the effectiveness of our technique on a new data set for synthesis of string transformation programs.
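
A bare-bones bottom-up search over string programs might look like the sketch below. It is not BUSTLE: the learned model that prioritises compositions is omitted, and candidates are enumerated uniformly instead. The operation set (`upper`, `lower`, `first`, `strip`, `concat`), the function name `bottom_up_synthesis`, and the `max_rounds` parameter are hypothetical. Each candidate is executed on the example inputs, and the resulting intermediate values are used to discard programs that behave identically.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Values = Tuple[str, ...]


def bottom_up_synthesis(examples: List[Tuple[str, str]],
                        max_rounds: int = 3) -> Optional[str]:
    """Enumerate programs bottom-up until one maps every input to its output."""
    inputs = tuple(i for i, _ in examples)
    targets = tuple(o for _, o in examples)
    unary: Dict[str, Callable[[str], str]] = {
        "upper": str.upper,
        "lower": str.lower,
        "first": lambda s: s[:1],
        "strip": str.strip,
    }
    # Map from the values a program produces on the examples to one program
    # producing them, so behaviourally equal programs are kept only once.
    pool: Dict[Values, str] = {inputs: "x"}
    for _ in range(max_rounds):
        if targets in pool:
            return pool[targets]
        new: Dict[Values, str] = {}
        for name, fn in unary.items():
            for values, expr in pool.items():
                new.setdefault(tuple(fn(v) for v in values), f"{name}({expr})")
        for (va, ea), (vb, eb) in product(pool.items(), repeat=2):
            joined = tuple(a + b for a, b in zip(va, vb))
            new.setdefault(joined, f"concat({ea}, {eb})")
        for values, expr in new.items():
            pool.setdefault(values, expr)
    return pool.get(targets)


program = bottom_up_synthesis([("ada lovelace", "A"), ("grace hopper", "G")])
print(program)
```

Because every sub-program in the pool has concrete values on the examples, a learned scorer could be slotted in where `new` is merged into `pool` to decide which values to keep or expand first; the sketch leaves that step out.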

artificial intelligence, logic & formal reasoning, machine learning, (20 more...)

arXiv.org Machine Learning

2007.14381

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.72)
