AITopics | data order

data order

Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation

Neural Information Processing Systems, Jun-15-2026, 05:55:04 GMT

Developing large language models is expensive and involves making decisions with small experiments, typically by evaluating on large, multi-task evaluation suites. In this work, we analyze specific properties which make a benchmark more reliable for such decisions, and interventions to design higher-quality evaluation benchmarks. We introduce two key metrics that show differences in current benchmarks: signal, a benchmark's ability to separate better models from worse models, and noise, a benchmark's sensitivity to random variability between training steps. We demonstrate that benchmarks with a better signal-to-noise ratio are more reliable when making decisions at small scale, and those with less noise have lower scaling law prediction error. These results suggest that improving signal or noise will lead to more useful benchmarks, so we introduce three interventions designed to directly affect signal or noise.
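To make the two metrics concrete, here is a minimal Python sketch. The abstract does not give formulas, so the definitions below are assumptions chosen for illustration: signal is taken as the spread of final scores across models relative to their mean, and noise as the standard deviation of one model's last few checkpoint scores relative to their mean. The function names and all scores are hypothetical.

```python
from statistics import mean, pstdev


def signal(final_scores):
    """Spread of final scores across models, relative to their mean (assumed definition)."""
    return (max(final_scores) - min(final_scores)) / mean(final_scores)


def noise(checkpoint_scores):
    """Variability between training steps: std of the last checkpoints of one run,
    relative to their mean (assumed definition)."""
    return pstdev(checkpoint_scores) / mean(checkpoint_scores)


def signal_to_noise(final_scores, checkpoint_scores):
    """Ratio of the two quantities above; higher suggests a more decision-friendly benchmark."""
    return signal(final_scores) / noise(checkpoint_scores)


# Hypothetical accuracies, invented for illustration only.
model_scores = [0.40, 0.50, 0.60]      # three different models on one benchmark
steady_run = [0.49, 0.50, 0.51]        # last checkpoints of a run with little jitter
jittery_run = [0.40, 0.50, 0.60]       # last checkpoints of a run with a lot of jitter

snr_steady = signal_to_noise(model_scores, steady_run)
snr_jittery = signal_to_noise(model_scores, jittery_run)
```

Under these assumed definitions, the same spread between models yields a much lower ratio when checkpoint-to-checkpoint jitter is large, which is the intuition behind preferring benchmarks with a higher signal-to-noise ratio.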

benchmark, large language model, natural language, (14 more...)

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Consistent Amortized Clustering via Generative Flow Networks

Chelly, Irit, Uziel, Roy, Freifeld, Oren, Pakman, Ari

arXiv.org Artificial Intelligence, Feb-26-2025

Neural models for amortized probabilistic clustering yield samples of cluster labels given a set-structured input, while avoiding lengthy Markov chain runs and the need for explicit data likelihoods. Existing methods that label each data point sequentially, like the Neural Clustering Process, often lead to cluster assignments highly dependent on the data order. Alternatively, methods that sequentially create full clusters do not provide assignment probabilities. In this paper, we introduce GFNCP, a novel framework for amortized clustering. GFNCP is formulated as a Generative Flow Network with a shared energy-based parametrization of policy and reward. We show that the flow matching conditions are equivalent to consistency of the clustering posterior under marginalization, which in turn implies order invariance. GFNCP also outperforms existing methods in clustering performance on both synthetic and real-world data.
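The order dependence of sequential labeling can be seen in a toy example. The sketch below is not the Neural Clustering Process or GFNCP; it is a hypothetical greedy rule (join the nearest cluster whose running mean is within a threshold, otherwise open a new one) used only to show how the same points can end up in different partitions when presented in a different order.

```python
def greedy_cluster(points, threshold):
    """Assign each point, in the given order, to the nearest existing cluster whose
    running mean is within `threshold`; otherwise open a new cluster."""
    clusters = []  # each cluster is a list of member points
    for x in points:
        best, best_dist = None, None
        for members in clusters:
            dist = abs(x - sum(members) / len(members))
            if dist <= threshold and (best_dist is None or dist < best_dist):
                best, best_dist = members, dist
        if best is None:
            clusters.append([x])
        else:
            best.append(x)
    # Order-free representation (assumes distinct points) so partitions can be compared.
    return frozenset(frozenset(c) for c in clusters)


# The same three points, presented in two different orders.
forward = greedy_cluster([0.0, 1.0, 2.0], threshold=1.2)
backward = greedy_cluster([2.0, 1.0, 0.0], threshold=1.2)
order_dependent = forward != backward
```

In the forward order the middle point joins the cluster started by 0.0; in the reverse order it joins the cluster started by 2.0. An order-invariant method would return the same partition distribution for both.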

dataset, gfncp, international conference, (14 more...)

arXiv: 2502.19337

Country:

Asia > Middle East > Jordan (0.04)
Asia > Thailand (0.04)
Asia > Middle East > Israel > Southern District > Beer-Sheva (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging

Ju, Yiming, Ni, Ziyi, Xing, Xingrun, Zeng, Zhixiong, Zhao, Hanyu, Fan, Siqi, Zhang, Zheng

arXiv.org Artificial Intelligence, Oct-1-2024

Supervised fine-tuning (SFT) is crucial for adapting Large Language Models (LLMs) to specific tasks. In this work, we demonstrate that the order of training data can lead to significant training imbalances, potentially resulting in performance degradation. Consequently, we propose to mitigate this imbalance by merging SFT models fine-tuned with different data orders, thereby enhancing the overall effectiveness of SFT. Additionally, we introduce a novel technique, "parameter-selection merging," which outperforms traditional weighted-average methods on five datasets. Further, through analysis and ablation studies, we validate the effectiveness of our method and identify the sources of performance improvements.
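The contrast between weighted averaging and selection-based merging can be sketched on flat parameter lists. The abstract does not describe how parameters are selected, so the rule below (for each parameter position, copy the value from one randomly chosen model) is an assumption for illustration, and the function names and parameter values are hypothetical.

```python
import random


def weighted_average_merge(models, weights):
    """Element-wise weighted average of flat parameter lists."""
    total = sum(weights)
    return [
        sum(w * m[i] for m, w in zip(models, weights)) / total
        for i in range(len(models[0]))
    ]


def selection_merge(models, rng):
    """For each parameter position, copy the value from one randomly chosen model.
    This selection rule is an assumption, not taken from the abstract."""
    return [rng.choice(models)[i] for i in range(len(models[0]))]


# Hypothetical parameters of two models fine-tuned with different data orders.
model_a = [0.0, 2.0, 4.0]
model_b = [2.0, 4.0, 8.0]

averaged = weighted_average_merge([model_a, model_b], [1.0, 1.0])
selected = selection_merge([model_a, model_b], random.Random(0))
```

Averaging produces values that may appear in neither source model, while selection keeps every merged parameter equal to a value that one of the fine-tuned models actually learned.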

experiment, large language model, natural language, (15 more...)

arXiv: 2410.03743

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Just read twice: closing the recall gap for recurrent language models

Arora, Simran, Timalsina, Aman, Singhal, Aaryan, Spector, Benjamin, Eyuboglu, Sabri, Zhao, Xinyi, Rao, Ashish, Rudra, Atri, Ré, Christopher

arXiv.org Artificial Intelligence, Jul-7-2024

Recurrent large language models that compete with Transformers in language modeling perplexity are emerging at a rapid rate (e.g., Mamba, RWKV). Excitingly, these architectures use a constant amount of memory during inference. However, due to the limited memory, recurrent LMs cannot recall and use all the information in long contexts, leading to brittle in-context learning (ICL) quality. A key challenge for efficient LMs is selecting what information to store versus discard. In this work, we observe the order in which information is shown to the LM impacts the selection difficulty. To formalize this, we show that the hardness of information recall reduces to the hardness of a problem called set disjointness (SD), a quintessential problem in communication complexity that requires a streaming algorithm (e.g., recurrent model) to decide whether inputted sets are disjoint. We empirically and theoretically show that the recurrent memory required to solve SD changes with set order, i.e., whether the smaller set appears first in-context. Our analysis suggests that, to mitigate the reliance on data order, we can put information in the right order in-context or process prompts non-causally. Towards that end, we propose: (1) JRT-Prompt, where context gets repeated multiple times in the prompt, effectively showing the model all data orders. This gives $11.0 \pm 1.3$ points of improvement, averaged across $16$ recurrent LMs and the $6$ ICL tasks, with $11.9\times$ higher throughput than FlashAttention-2 for generation prefill (length $32$k, batch size $16$, NVIDIA H100). We then propose (2) JRT-RNN, which uses non-causal prefix-linear-attention to process prompts and provides $99\%$ of Transformer quality at $360$M params., $30$B tokens and $96\%$ at $1.3$B params., $50$B tokens on average across the tasks, with $19.2\times$ higher throughput for prefill than FA2.
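A small Python sketch may help with the two ideas in this abstract. The first function is a simple one-pass set disjointness check that remembers the first set and scans the second; it is an illustration of why set order affects memory, not the paper's analysis or bound. The second function builds a repeated-context prompt in the spirit of JRT-Prompt; its template, name, and example strings are hypothetical.

```python
def streaming_disjoint(first, second):
    """Decide disjointness in one pass: remember the first set, then scan the second.
    Returns (is_disjoint, number_of_items_held_in_memory)."""
    memory = set(first)
    is_disjoint = all(item not in memory for item in second)
    return is_disjoint, len(memory)


def jrt_prompt(context, question, repeats=2):
    """Repeat the context before the question (hypothetical template), so later
    copies are read after the whole context has already been seen once."""
    return "\n\n".join([context] * repeats + [question])


small, large = [1, 2], [3, 4, 5, 6, 7, 8]

# Same answer either way, but the memory held depends on which set comes first.
disjoint_a, memory_small_first = streaming_disjoint(small, large)
disjoint_b, memory_large_first = streaming_disjoint(large, small)

prompt = jrt_prompt("Doc: Paris is in France.", "Q: Where is Paris?")
```

With this simple strategy, seeing the smaller set first means holding two items instead of six, which mirrors the abstract's point that the memory needed depends on whether the smaller set appears first in-context.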

architecture, international conference, jrt-rnn, (16 more...)

arXiv: 2407.05483

Country:

North America > United States > New York (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > Colorado (0.04)
(9 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment > Sports > Football (1.00)
Information Technology (1.00)
Health & Medicine (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

On The Impact of Machine Learning Randomness on Group Fairness

Ganesh, Prakhar, Chang, Hongyan, Strobel, Martin, Shokri, Reza

arXiv.org Artificial Intelligence, Jul-9-2023

Statistical measures for group fairness in machine learning reflect the gap in performance of algorithms across different groups. These measures, however, exhibit a high variance between different training instances, which makes them unreliable for empirical evaluation of fairness. What causes this high variance? We investigate the impact on group fairness of different sources of randomness in training neural networks. We show that the variance in group fairness measures is rooted in the high volatility of the learning process on under-represented groups. Further, we recognize the dominant source of randomness as the stochasticity of data order during training. Based on these findings, we show how one can control group-level accuracy (i.e., model fairness), with high efficiency and negligible impact on the model's overall performance, by simply changing the data order for a single epoch.
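The variance described above can be illustrated by computing a group accuracy gap for several training runs and looking at its spread. This is a minimal sketch, assuming the fairness measure is the absolute accuracy difference between two groups; the labels, group assignments, and per-run predictions are invented for illustration and stand in for models trained with different data orders.

```python
from statistics import pstdev


def group_accuracy(y_true, y_pred, groups, group):
    """Accuracy restricted to the examples belonging to `group`."""
    idx = [i for i, g in enumerate(groups) if g == group]
    return sum(y_true[i] == y_pred[i] for i in idx) / len(idx)


def accuracy_gap(y_true, y_pred, groups, group_a, group_b):
    """Absolute accuracy difference between two groups (assumed fairness measure)."""
    return abs(
        group_accuracy(y_true, y_pred, groups, group_a)
        - group_accuracy(y_true, y_pred, groups, group_b)
    )


# Hypothetical data: group "b" is under-represented (2 of 6 examples).
y_true = [1, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b"]

# Hypothetical predictions from three runs that differ only in data order.
runs = [
    [1, 0, 1, 0, 1, 0],  # no errors
    [1, 0, 1, 0, 0, 0],  # one error, in group "b"
    [1, 0, 1, 0, 0, 1],  # two errors, both in group "b"
]

gaps = [accuracy_gap(y_true, pred, groups, "a", "b") for pred in runs]
gap_spread = pstdev(gaps)
```

Because the small group has few examples, one or two changed predictions swing its accuracy sharply, so the gap moves a lot between runs even though overall accuracy changes little.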

data order, epoch, variance, (13 more...)

doi: 10.1145/3593013.3594116

arXiv: 2307.04138

Country:

North America > United States > Illinois > Cook County > Chicago (0.06)
Asia > Singapore > Central Region > Singapore (0.04)
North America > United States > California (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Linear Mode Connectivity and the Lottery Ticket Hypothesis

Frankle, Jonathan, Dziugaite, Gintare Karolina, Roy, Daniel M., Carbin, Michael

arXiv.org Machine Learning, Dec-11-2019

We introduce "instability analysis," a framework for assessing whether the outcome of optimizing a neural network is robust to SGD noise. It entails training two copies of a network on different random data orders. If error does not increase along the linear path between the trained parameters, we say the network is "stable." Instability analysis reveals new properties of neural networks. For example, standard vision models are initially unstable but become stable early in training; from then on, the outcome of optimization is determined up to linear interpolation. We leverage instability analysis to examine iterative magnitude pruning (IMP), the procedure underlying the lottery ticket hypothesis. On small vision tasks, IMP finds sparse "matching subnetworks" that can train in isolation from initialization to full accuracy, but it fails to do so in more challenging settings. We find that IMP subnetworks are matching only when they are stable. In cases where IMP subnetworks are unstable at initialization, they become stable and matching early in training. We augment IMP to rewind subnetworks to their weights early in training, producing sparse subnetworks of large-scale networks, including ResNet-50 for ImageNet, that train to full accuracy. This submission subsumes 1903.01611 ("Stabilizing the Lottery Ticket Hypothesis" and "The Lottery Ticket Hypothesis at Scale").
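The stability check described above can be sketched as follows: interpolate linearly between two parameter vectors, evaluate the error along the path, and compare the worst point with the average of the endpoints. The scoring rule (maximum error on the path minus mean endpoint error) is one way to quantify "error increases along the path" and is stated here as an assumption; the two toy error functions stand in for a trained network's error and are hypothetical.

```python
def interpolate(w1, w2, alpha):
    """Point on the straight line between two parameter vectors."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(w1, w2)]


def instability(error_fn, w1, w2, steps=11):
    """Maximum error along the linear path minus the mean error of the endpoints
    (assumed score; a value near zero suggests the pair is 'stable')."""
    alphas = [i / (steps - 1) for i in range(steps)]
    path_errors = [error_fn(interpolate(w1, w2, a)) for a in alphas]
    endpoint_mean = (error_fn(w1) + error_fn(w2)) / 2
    return max(path_errors) - endpoint_mean


def bowl(w):
    """Toy convex error surface: no barrier between any two points."""
    return sum(x * x for x in w)


def double_well(w):
    """Toy error surface with two minima at w = +1 and w = -1 and a barrier at 0."""
    return (w[0] ** 2 - 1) ** 2


stable_gap = instability(bowl, [1.0, 0.0], [-1.0, 0.0])
unstable_gap = instability(double_well, [1.0], [-1.0])
```

On the convex surface the error never rises above the endpoints, so the score is zero; on the double-well surface the two solutions sit in separate minima and the path crosses a barrier, giving a positive score.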

imp subnetwork, initialization, subnetwork, (14 more...)

arXiv: 1912.05671

Country: North America > Canada > Ontario > Toronto (0.14)

Genre:

Contests & Prizes (0.77)
Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Gambling (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
