Overview
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines
Sengupta, Ayan, Goel, Yash, Chakraborty, Tanmoy
Neural scaling laws have revolutionized the design and optimization of large-scale AI models by revealing predictable relationships between model size, dataset volume, and computational resources. Early research established power-law relationships in model performance, leading to compute-optimal scaling strategies. However, recent studies highlighted their limitations across architectures, modalities, and deployment contexts. Sparse models, mixture-of-experts, retrieval-augmented learning, and multimodal models often deviate from traditional scaling patterns. Moreover, scaling behaviors vary across domains such as vision, reinforcement learning, and fine-tuning, underscoring the need for more nuanced approaches. In this survey, we synthesize insights from over 50 studies, examining the theoretical foundations, empirical findings, and practical implications of scaling laws. We also explore key challenges, including data efficiency, inference scaling, and architecture-specific constraints, advocating for adaptive scaling strategies tailored to real-world applications. We suggest that while scaling laws provide a useful guide, they do not always generalize across all architectures and training strategies.
Culture is Not Trivia: Sociocultural Theory for Cultural NLP
Zhou, Naitian, Bamman, David, Bleaman, Isaac L.
The field of cultural NLP has recently experienced rapid growth, driven by a pressing need to ensure that language technologies are effective and safe across a pluralistic user base. This work has largely progressed without a shared conception of culture, instead choosing to rely on a wide array of cultural proxies. However, this leads to a number of recurring limitations: coarse national boundaries fail to capture nuanced differences that lay within them, limited coverage restricts datasets to only a subset of usually highly-represented cultures, and a lack of dynamicity results in static cultural benchmarks that do not change as culture evolves. In this position paper, we argue that these methodological limitations are symptomatic of a theoretical gap. We draw on a well-developed theory of culture from sociocultural linguistics to fill this gap by 1) demonstrating in a case study how it can clarify methodological constraints and affordances, 2) offering theoretically-motivated paths forward to achieving cultural competence, and 3) arguing that localization is a more useful framing for the goals of much current work in cultural NLP.
Evaluating Step-by-step Reasoning Traces: A Survey
Step-by-step reasoning is widely used to enhance the reasoning ability of large language models (LLMs) in complex problems. Evaluating the quality of reasoning traces is crucial for understanding and improving LLM reasoning. However, the evaluation criteria remain highly unstandardized, leading to fragmented efforts in developing metrics and meta-evaluation benchmarks. To address this gap, this survey provides a comprehensive overview of step-by-step reasoning evaluation, proposing a taxonomy of evaluation criteria with four top-level categories (groundedness, validity, coherence, and utility). We then categorize metrics based on their implementations, survey which metrics are used for assessing each criterion, and explore whether evaluator models can transfer across different criteria. Finally, we identify key directions for future research.
Positional Encoding in Transformer-Based Time Series Models: A Survey
Irani, Habib, Metsis, Vangelis
As a result, machine learning-based approaches, particularly Recent advancements in transformer-based models Recurrent Neural Networks (RNNs) and Convolutional have greatly improved time series analysis, providing Neural Networks (CNNs), have gained popularity robust solutions for tasks such as forecasting, due to their ability to model complex temporal anomaly detection, and classification. A crucial element dynamics [4, 24]. of these models is positional encoding, which RNNs, including their more advanced variants like allows transformers to capture the intrinsic sequential Long Short-Term Memory (LSTM) and Gated Recurrent nature of time series data. This survey systematically Units (GRU), excel at modeling sequential data examines existing techniques for positional encoding by maintaining a hidden state that captures information in transformer-based time series models. We investigate from previous time steps. These architectures a variety of methods, including fixed, learnable, offer several advantages for time series analysis: they relative, and hybrid approaches, and evaluate naturally handle irregular time intervals and missing their effectiveness in different time series classification data points through their sequential processing, tasks. Furthermore, we outline key challenges excel at capturing local temporal patterns through and suggest potential research directions to enhance their recurrent connections, and exhibit a beneficial positional encoding strategies. By delivering a comprehensive "recency bias" where recent time steps are weighted overview and quantitative benchmarking, more heavily than distant ones--a characteristic particularly this survey intends to assist researchers and practitioners valuable in applications like financial forecasting in selecting and designing effective positional and weather prediction. However, RNNs suffer encoding methods for transformer-based time series from inherent limitations such as vanishing and models. The source code for the methods and experiments exploding gradients, making it difficult to learn dependencies discussed in this survey is available on over long time horizons [22].
Soft Robotics for Search and Rescue: Advancements, Challenges, and Future Directions
Soft robotics has emerged as a transformative technology in Search and Rescue (SAR) operations, addressing challenges in navigating complex, hazardous environments that often limit traditional rigid robots. This paper critically examines advancements in soft robotic technologies tailored for SAR applications, focusing on their unique capabilities in adaptability, safety, and efficiency. By leveraging bio-inspired designs, flexible materials, and advanced locomotion mechanisms, such as crawling, rolling, and shape morphing, soft robots demonstrate exceptional potential in disaster scenarios. However, significant barriers persist, including material durability, power inefficiency, sensor integration, and control complexity. This comprehensive review highlights the current state of soft robotics in SAR, discusses simulation methodologies and hardware validations, and introduces performance metrics essential for their evaluation. By bridging the gap between theoretical advancements and practical deployment, this study underscores the potential of soft robotic systems to revolutionize SAR missions and advocates for continued interdisciplinary innovation to overcome existing limitations.
Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges
Ma, Bolei, Li, Yuting, Zhou, Wei, Gong, Ziwei, Liu, Yang Janet, Jasinskaja, Katja, Friedrich, Annemarie, Hirschberg, Julia, Kreuter, Frauke, Plank, Barbara
Understanding pragmatics-the use of language in context-is crucial for developing NLP systems capable of interpreting nuanced language use. Despite recent advances in language technologies, including large language models, evaluating their ability to handle pragmatic phenomena such as implicatures and references remains challenging. To advance pragmatic abilities in models, it is essential to understand current evaluation trends and identify existing limitations. In this survey, we provide a comprehensive review of resources designed for evaluating pragmatic capabilities in NLP, categorizing datasets by the pragmatics phenomena they address. We analyze task designs, data collection methods, evaluation approaches, and their relevance to real-world applications. By examining these resources in the context of modern language models, we highlight emerging trends, challenges, and gaps in existing benchmarks. Our survey aims to clarify the landscape of pragmatic evaluation and guide the development of more comprehensive and targeted benchmarks, ultimately contributing to more nuanced and context-aware NLP models.
Achieving Upper Bound Accuracy of Joint Training in Continual Learning
Continual learning has been an active research area in machine learning, focusing on incrementally learning a sequence of tasks. A key challenge is catastrophic forgetting (CF), and most research efforts have been directed toward mitigating this issue. However, a significant gap remains between the accuracy achieved by state-of-the-art continual learning algorithms and the ideal or upper-bound accuracy achieved by training all tasks together jointly. This gap has hindered or even prevented the adoption of continual learning in applications, as accuracy is often of paramount importance. Recently, another challenge, termed inter-task class separation (ICS), was also identified, which spurred a theoretical study into principled approaches for solving continual learning. Further research has shown that by leveraging the theory and the power of large foundation models, it is now possible to achieve upper-bound accuracy, which has been empirically validated using both text and image classification datasets. Continual learning is now ready for real-life applications. This paper surveys the main research leading to this achievement, justifies the approach both intuitively and from neuroscience research, and discusses insights gained.
Incomplete Graph Learning: A Comprehensive Survey
Xia, Riting, Liu, Huibo, Li, Anchen, Liu, Xueyan, Zhang, Yan, Zhang, Chunxu, Yang, Bo
Graph learning is a prevalent field that operates on ubiquitous graph data. Effective graph learning methods can extract valuable information from graphs. However, these methods are non-robust and affected by missing attributes in graphs, resulting in sub-optimal outcomes. This has led to the emergence of incomplete graph learning, which aims to process and learn from incomplete graphs to achieve more accurate and representative results. In this paper, we conducted a comprehensive review of the literature on incomplete graph learning. Initially, we categorize incomplete graphs and provide precise definitions of relevant concepts, terminologies, and techniques, thereby establishing a solid understanding for readers. Subsequently, we classify incomplete graph learning methods according to the types of incompleteness: (1) attribute-incomplete graph learning methods, (2) attribute-missing graph learning methods, and (3) hybrid-absent graph learning methods. By systematically classifying and summarizing incomplete graph learning methods, we highlight the commonalities and differences among existing approaches, aiding readers in selecting methods and laying the groundwork for further advancements. In addition, we summarize the datasets, incomplete processing modes, evaluation metrics, and application domains used by the current methods. Lastly, we discuss the current challenges and propose future directions for incomplete graph learning, with the aim of stimulating further innovations in this crucial field. To our knowledge, this is the first review dedicated to incomplete graph learning, aiming to offer valuable insights for researchers in related fields.We developed an online resource to follow relevant research based on this review, available at https://github.com/cherry-a11y/Incomplete-graph-learning.git
A Survey on Large Language Models for Automated Planning
Aghzal, Mohamed, Plaku, Erion, Stein, Gregory J., Yao, Ziyu
The planning ability of Large Language Models (LLMs) has garnered increasing attention in recent years due to their remarkable capacity for multi-step reasoning and their ability to generalize across a wide range of domains. While some researchers emphasize the potential of LLMs to perform complex planning tasks, others highlight significant limitations in their performance, particularly when these models are tasked with handling the intricacies of long-horizon reasoning. In this survey, we critically investigate existing research on the use of LLMs in automated planning, examining both their successes and shortcomings in detail. We illustrate that although LLMs are not well-suited to serve as standalone planners because of these limitations, they nonetheless present an enormous opportunity to enhance planning applications when combined with other approaches. Thus, we advocate for a balanced methodology that leverages the inherent flexibility and generalized knowledge of LLMs alongside the rigor and cost-effectiveness of traditional planning methods.
DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs
Lv, Minxuan, Su, Zhenpeng, Pan, Leiyu, Xiong, Yizhe, Lin, Zijia, Chen, Hui, Zhou, Wei, Han, Jungong, Ding, Guiguang, Luo, Cheng, Zhang, Di, Gai, Kun, Hu, Songlin
As large language models continue to scale, computational costs and resource consumption have emerged as significant challenges. While existing sparsification methods like pruning reduce computational overhead, they risk losing model knowledge through parameter removal. This paper proposes DSMoE (Dynamic Sparse Mixture-of-Experts), a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks. We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge based on input complexity. Additionally, we introduce a sparsity loss term to balance performance and computational efficiency. Extensive experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches across language modeling and downstream tasks, particularly excelling in generation tasks. Analysis reveals that DSMoE learns distinctive layerwise activation patterns, providing new insights for future MoE architecture design.