AITopics | winogrande

Collaborating Authors

winogrande

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Stacking Y our Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

Neural Information Processing SystemsFeb-8-2026, 06:37:26 GMT

LLMs are computationally expensive to pre-train due to their large scale.

acc, large language model, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Characterizing and Understanding Energy Footprint and Efficiency of Small Language Model on Edges

Islam, Md Romyull, Deng, Bobin, Dhar, Nobel, Nguyen, Tu N., He, Selena, Shi, Yong, Suo, Kun

arXiv.org Artificial IntelligenceNov-18-2025

Cloud-based large language models (LLMs) and their variants have significantly influenced real-world applications. Deploying smaller models (i.e., small language models (SLMs)) on edge devices offers additional advantages, such as reduced latency and independence from network connectivity. However, edge devices' limited computing resources and constrained energy budgets challenge efficient deployment. This study evaluates the power efficiency of five representative SLMs - Llama 3.2, Phi-3 Mini, TinyLlama, and Gemma 2 on Raspberry Pi 5, Jetson Nano, and Jetson Orin Nano (CPU and GPU configurations). Results show that Jetson Orin Nano with GPU acceleration achieves the highest energy-to-performance ratio, significantly outperforming CPU-based setups. Llama 3.2 provides the best balance of accuracy and power efficiency, while TinyLlama is well-suited for low-power environments at the cost of reduced accuracy. In contrast, Phi-3 Mini consumes the most energy despite its high accuracy. In addition, GPU acceleration, memory bandwidth, and model architecture are key in optimizing inference energy efficiency. Our empirical analysis offers practical insights for AI, smart systems, and mobile ad-hoc platforms to leverage tradeoffs from accuracy, inference latency, and power efficiency in energy-constrained environments.

efficiency, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2511.11624

Genre: Research Report > New Finding (1.00)

Industry:

Energy (1.00)
Information Technology > Services (0.48)
Information Technology > Hardware (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

MARS-M: When Variance Reduction Meets Matrices

Liu, Yifeng, Yuan, Angela, Gu, Quanquan

arXiv.org Machine LearningOct-29-2025

Matrix-based preconditioned optimizers, such as Muon, have recently been shown to be more efficient than scalar-based optimizers for training large-scale neural networks, including large language models (LLMs). On the other hand, recent benchmarks on optimizers for LLM pre-training have demonstrated that variance-reduction techniques such as MARS can achieve substantial speedups over standard optimizers that do not employ variance reduction. In this paper, to achieve the best of both worlds, we introduce MARS-M, a new optimizer that integrates the variance reduction technique in MARS with Muon. Under standard regularity conditions, we prove that Muon-M converges to a first-order stationary point at a rate of $\tilde{\mathcal{O}}(T^{-1/3})$, which improves upon $\tilde{\mathcal{O}}(T^{-1/4})$ rate attained by Muon. Our empirical results on language modeling and computer vision tasks demonstrate that MARS-M consistently yields lower losses and improved performance across various downstream benchmarks. The implementation of MARS-M is available at https://github.com/AGI-Arena/MARS/tree/main/MARS_M.

large language model, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2510.218

Country:

Europe (0.67)
North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

143ea4a156ef64f32d4d905206cf32e1-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 19:02:40 GMT

acc, flop, token, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (0.92)
Research Report > Experimental Study (0.92)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback

From Parameters to Performance: A Data-Driven Study on LLM Structure and Development

Wang, Suqing, Li, Zuchao, Shi, Luohe, Du, Bo, Zhao, Hai, Li, Yun, Wang, Qianren

arXiv.org Artificial IntelligenceSep-24-2025

Large language models (LLMs) have achieved remarkable success across various domains, driving significant technological advancements and innovations. Despite the rapid growth in model scale and capability, systematic, data-driven research on how structural configurations affect performance remains scarce. To address this gap, we present a large-scale dataset encompassing diverse open-source LLM structures and their performance across multiple benchmarks. Leveraging this dataset, we conduct a systematic, data mining-driven analysis to validate and quantify the relationship between structural configurations and performance. Our study begins with a review of the historical development of LLMs and an exploration of potential future trends. We then analyze how various structural choices impact performance across benchmarks and further corroborate our findings using mechanistic interpretability techniques. By providing data-driven insights into LLM optimization, our work aims to guide the targeted development and application of future models. We will release our dataset at https://huggingface.co/datasets/DX0369/LLM-Structure-Performance-Dataset

arxiv preprint arxiv, large language model, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2509.18136

Country: Asia > China (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Add feedback

Instance-level Randomization: Toward More Stable LLM Evaluations

Li, Yiyang, Wu, Yonghuang, Luo, Ying, Sun, Liangtai, Qin, Zishu, Qiu, Lin, Cao, Xuezhi, Cai, Xunliang

arXiv.org Artificial IntelligenceSep-17-2025

Evaluations of large language models (LLMs) suffer from instability, where small changes of random factors such as few-shot examples can lead to drastic fluctuations of scores and even model rankings. Moreover, different LLMs can have different preferences for a certain setting of random factors. As a result, using a fixed setting of random factors, which is often adopted as the paradigm of current evaluations, can lead to potential unfair comparisons between LLMs. To mitigate the volatility of evaluations, we first theoretically analyze the sources of variance induced by changes in random factors. Targeting these specific sources, we then propose the instance-level randomization (ILR) method to reduce variance and enhance fairness in model comparisons. Instead of using a fixed setting across the whole benchmark in a single experiment, we randomize all factors that affect evaluation scores for every single instance, run multiple experiments and report the averaged score. Theoretical analyses and empirical results demonstrate that ILR can reduce the variance and unfair comparisons caused by random factors, as well as achieve similar robustness level with less than half computational cost compared with previous methods.

large language model, natural language, random factor, (17 more...)

arXiv.org Artificial Intelligence

2509.12678

Country:

North America > United States (0.68)
Europe (0.68)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Industry: Leisure & Entertainment > Sports (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

ICL CIPHERS: Quantifying "Learning" in In-Context Learning via Substitution Ciphers

Fang, Zhouxiang, Mishra, Aayush, Gao, Muhan, Liu, Anqi, Khashabi, Daniel

arXiv.org Artificial IntelligenceAug-28-2025

Recent works have suggested that In-Context Learning (ICL) operates in dual modes, i.e. task retrieval (remember learned patterns from pre-training) and task learning (inference-time ''learning'' from demonstrations). However, disentangling these the two modes remains a challenging goal. We introduce ICL CIPHERS, a class of task reformulations based on substitution ciphers borrowed from classic cryptography. In this approach, a subset of tokens in the in-context inputs are substituted with other (irrelevant) tokens, rendering English sentences less comprehensible to human eye. However, by design, there is a latent, fixed pattern to this substitution, making it reversible. This bijective (reversible) cipher ensures that the task remains a well-defined task in some abstract sense, despite the transformations. It is a curious question if LLMs can solve tasks reformulated by ICL CIPHERS with a BIJECTIVE mapping, which requires ''deciphering'' the latent cipher. We show that LLMs are better at solving tasks reformulated by ICL CIPHERS with BIJECTIVE mappings than the NON-BIJECTIVE (irreversible) baseline, providing a novel approach to quantify ''learning'' in ICL. While this gap is small, it is consistent across the board on four datasets and six models. Finally, we examine LLMs' internal representations and identify evidence in their ability to decode the ciphered inputs.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2504.19395

Country:

North America (0.46)
Europe (0.46)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Toward Better Generalisation in Uncertainty Estimators: Leveraging Data-Agnostic Features

Ha, Thuy An, Vo, Bao Quoc

arXiv.org Artificial IntelligenceJul-8-2025

Large Language Models (LLMs) often generate responses that are factually incorrect yet expressed with high confidence, which can pose serious risks for end users. To address this, it is essential for LLMs not only to produce answers but also to provide accurate estimates of their correctness. Uncertainty quantification methods have been introduced to assess the quality of LLM outputs, with factual accuracy being a key aspect of that quality. Among these methods, those that leverage hidden states to train probes have shown particular promise, as these internal representations encode information relevant to the factuality of responses, making this approach the focus of this paper. However, the probe trained on the hidden states of one dataset often struggles to generalise to another dataset of a different task or domain. To address this limitation, we explore combining data-agnostic features with hidden-state features and assess whether this hybrid feature set enhances out-of-domain performance. We further examine whether selecting only the most informative hidden-state features, thereby discarding task-specific noise, enables the data-agnostic features to contribute more effectively. The experiment results indicate that although introducing data-agnostic features generally enhances generalisation performance in most cases, in certain scenarios their inclusion degrades performance. A similar pattern emerges when retaining only the most important hidden-state features - adding data-agnostic features does not consistently further enhance performance compared to using the full set of hidden-state features. A closer analysis reveals that, in some specific cases, the trained probe underweights the data-agnostic features relative to the hidden-state features, which we believe is the main reason why the results are inconclusive.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2507.03998

Country:

Asia > Singapore (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.49)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Bridging the Gap: Enhancing LLM Performance for Low-Resource African Languages with New Benchmarks, Fine-Tuning, and Cultural Adjustments

Alhanai, Tuka, Kasumovic, Adam, Ghassemi, Mohammad, Zitzelberger, Aven, Lundin, Jessica, Chabot-Couture, Guillaume

arXiv.org Artificial IntelligenceDec-16-2024

Large Language Models (LLMs) have shown remarkable performance across various tasks, yet significant disparities remain for non-English languages, and especially native African languages. This paper addresses these disparities by creating approximately 1 million human-translated words of new benchmark data in 8 low-resource African languages, covering a population of over 160 million speakers of: Amharic, Bambara, Igbo, Sepedi (Northern Sotho), Shona, Sesotho (Southern Sotho), Setswana, and Tsonga. Our benchmarks are translations of Winogrande and three sections of MMLU: college medicine, clinical knowledge, and virology. Using the translated benchmarks, we report previously unknown performance gaps between state-of-the-art (SOTA) LLMs in English and African languages. Finally, using results from over 400 fine-tuned models, we explore several methods to reduce the LLM performance gap, including high-quality dataset fine-tuning (using an LLM-as-an-Annotator), cross-lingual transfer, and cultural appropriateness adjustments. Key findings include average mono-lingual improvements of 5.6% with fine-tuning (with 5.4% average mono-lingual improvements when using high-quality data over low-quality data), 2.9% average gains from cross-lingual transfer, and a 3.0% out-of-the-box performance boost on culturally appropriate questions. The publicly available benchmarks, translations, and code from this study support further research and development aimed at creating more inclusive and effective language technologies.

large language model, machine learning, translation, (19 more...)

arXiv.org Artificial Intelligence

2412.12417

Country:

North America > United States (0.04)
Africa > Niger (0.04)
Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.93)
Health & Medicine (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches

Khan, Rana Muhammad Shahroz, Li, Pingzhi, Yun, Sukwon, Wang, Zhenyu, Nirjon, Shahriar, Wong, Chau-Wai, Chen, Tianlong

arXiv.org Artificial IntelligenceOct-24-2024

As large language models (LLMs) increasingly shape the AI landscape, fine-tuning pretrained models has become more popular than in the pre-LLM era for achieving optimal performance in domain-specific tasks. However, pretrained LLMs such as ChatGPT are periodically evolved, i.e., model parameters are frequently updated), making it challenging for downstream users with limited resources to keep up with fine-tuning the newest LLMs for their domain application. Even though fine-tuning costs have nowadays been reduced thanks to the innovations of parameter-efficient fine-tuning such as LoRA, not all downstream users have adequate computing for frequent personalization. Moreover, access to fine-tuning datasets, particularly in sensitive domains such as healthcare, could be time-restrictive, making it crucial to retain the knowledge encoded in earlier fine-tuned rounds for future adaptation. In this paper, we present PortLLM, a training-free framework that (i) creates an initial lightweight model update patch to capture domain-specific knowledge, and (ii) allows a subsequent seamless plugging for the continual personalization of evolved LLM at minimal cost. Our extensive experiments cover seven representative datasets, from easier question-answering tasks {BoolQ, SST2} to harder reasoning tasks {WinoGrande, GSM8K}, and models including {Mistral-7B, Llama2, Llama3.1, and Gemma2}, validating the portability of our designed model patches and showcasing the effectiveness of our proposed framework. For instance, PortLLM achieves comparable performance to LoRA fine-tuning with reductions of up to 12.2x in GPU memory usage. Finally, we provide theoretical justifications to understand the portability of our model update patches, which offers new insights into the theoretical dimension of LLMs' personalization.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.1087

Country:

North America > United States > North Carolina (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback