cgl
Curriculum-Guided Layer Scaling for Language Model Pretraining
Singh, Karanpartap, Band, Neil, Adeli, Ehsan
As the cost of pretraining large language models grows, there is continued interest in strategies to improve learning efficiency during this core training stage. Motivated by cognitive development, where humans gradually build knowledge as their brains mature, we propose Curriculum-Guided Layer Scaling (CGLS), a framework for compute-efficient pretraining that synchronizes increasing data difficulty with model growth through progressive layer stacking (i.e., gradually adding layers during training). At the 100M-parameter scale, using a curriculum transitioning from synthetic short stories to general web data, CGLS outperforms baseline methods on the question-answering benchmarks PIQA and ARC. Our results show that progressively increasing model depth alongside sample difficulty leads to better generalization and zero-shot performance on various downstream benchmarks. Altogether, our findings demonstrate that CGLS unlocks the potential of progressive stacking, offering a simple yet effective strategy for improving generalization on knowledge-intensive and reasoning tasks.

Large language models (LLMs) are typically pretrained in a single, continuous pass, processing all tokens with a uniform amount of computation regardless of their complexity or relevance to downstream tasks of interest. While this approach has shown remarkable success in large-scale models like GPT-4 (OpenAI et al., 2023) and Llama 3 (Dubey et al., 2024), it differs significantly from how humans learn, often leading to models that excel at generating coherent text but struggle with long-context reasoning across varied tasks (Schnabel et al., 2025). Recent works like Phi-3 (Abdin et al., 2024), MiniCPM (Hu et al., 2024), and others (Feng et al., 2024) have explored mid-training, adjusting the training data distribution partway through training by incorporating higher-quality, multilingual, or long-form text. However, this coarse-grained curriculum is applied to fixed model architectures. Inspired by how humans progressively build knowledge alongside their physically growing brains, we explore whether gradually scaling a model in tandem with increasingly complex data can enable more efficient and effective learning.
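To make the progressive-stacking idea concrete, here is a minimal sketch of growing a tiny decoder-only model between curriculum stages, assuming a toy PyTorch setup; the growth schedule, layer initialization (copying the top layer), and stage names are illustrative assumptions rather than the authors' exact recipe.

```python
# Illustrative sketch of progressive layer stacking: a small model starts shallow
# and gains layers at curriculum stage boundaries. Hyperparameters, the growth
# schedule, and the copy-the-top-layer initialization below are assumptions.
import copy
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size=256, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.layers = nn.ModuleList(copy.deepcopy(layer) for _ in range(n_layers))
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        h = self.embed(x)
        # causal mask so each position only attends to earlier tokens
        mask = torch.triu(torch.full((x.size(1), x.size(1)), float("-inf")), diagonal=1)
        for layer in self.layers:
            h = layer(h, src_mask=mask)
        return self.head(h)

    def grow(self, n_new=2):
        """Stack new layers on top, initialized as copies of the current top layer."""
        for _ in range(n_new):
            self.layers.append(copy.deepcopy(self.layers[-1]))

model = TinyLM()
stages = ["short_stories", "web_text"]   # easy -> hard curriculum (placeholder names)
for stage in stages:
    # ... train on the current data stage here (rebuild the optimizer after growth
    # so the newly added parameters are included) ...
    if stage != stages[-1]:
        model.grow(n_new=2)              # deepen the model before the harder stage
```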
- North America > United States (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Middle East > Jordan (0.04)
- Education (1.00)
- Health & Medicine (0.68)
Exploring Training and Inference Scaling Laws in Generative Retrieval
Cai, Hongru, Li, Yongqi, Yuan, Ruifeng, Wang, Wenjie, Zhang, Zhen, Li, Wenjie, Chua, Tat-Seng
Generative retrieval reformulates retrieval as an autoregressive generation task, where large language models (LLMs) generate target documents directly from a query. As a novel paradigm, the mechanisms that underpin its performance and scalability remain largely unexplored. We systematically investigate training and inference scaling laws in generative retrieval, exploring how model size, training data scale, and inference-time compute jointly influence performance. We propose a novel evaluation metric inspired by contrastive entropy and generation loss, providing a continuous performance signal that enables robust comparisons across diverse generative retrieval methods. Our experiments show that n-gram-based methods align strongly with training and inference scaling laws. We find that increasing model size, training data scale, and inference-time compute all contribute to improved performance, highlighting the complementary roles of these factors in enhancing generative retrieval. Across these settings, LLaMA models consistently outperform T5 models, suggesting a particular advantage for larger decoder-only models in generative retrieval. Our findings underscore that model sizes, data availability, and inference computation interact to unlock the full potential of generative retrieval, offering new insights for designing and optimizing future systems.
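As a worked illustration of what a training scaling law of this kind looks like, the sketch below fits a saturating power law to hypothetical (model size, retrieval score) pairs; the functional form and the numbers are assumptions for illustration, not the paper's fitted curves.

```python
# Fit a saturating power law score(N) = a - b * N**(-alpha) to hypothetical
# (model size, retrieval score) points; the values below are made up.
import numpy as np
from scipy.optimize import curve_fit

sizes = np.array([6e7, 2.5e8, 8e8, 3e9])        # parameter counts (hypothetical)
scores = np.array([0.21, 0.29, 0.34, 0.38])     # retrieval metric (hypothetical)

def power_law(n, a, b, alpha):
    return a - b * n ** (-alpha)

params, _ = curve_fit(power_law, sizes, scores, p0=[0.5, 10.0, 0.3], maxfev=10000)
a, b, alpha = params
print(f"fitted law: score ~ {a:.2f} - {b:.2f} * N^(-{alpha:.2f})")
print("extrapolated score at 7B params:", power_law(7e9, *params))
```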
- Europe > Italy (0.05)
- Asia > China > Hong Kong (0.05)
- Asia > Singapore > Central Region > Singapore (0.04)
H2CGL: Modeling Dynamics of Citation Network for Impact Prediction
He, Guoxiu, Xue, Zhikai, Jiang, Zhuoren, Kang, Yangyang, Zhao, Star, Lu, Wei
Assessing the potential impact of papers is of great significance to both academia and industry (Wang, Song and Barabási, 2013), especially given the exponential annual growth in the number of papers (Lo, Wang, Neumann, Kinney and Weld, 2020; Chu and Evans, 2021; Xue, He, Liu, Jiang, Zhao and Lu, 2023). Because the numerical value of scientific impact is difficult to determine directly, citation count is frequently employed as a rough estimate (Evans and Reimer, 2009; Sinatra, Wang, Deville, Song and Barabási, 2016; Jiang, Koch and Sun, 2021). Moreover, the dynamics of citation networks cannot be ignored. For example, the "sleeping beauties" phenomenon (Van Raan, 2004) indicates that the citations of a paper can vary considerably across time periods. Beyond content quality, the future citations of a paper are also influenced by newly published papers (Funk and Owen-Smith, 2017; Park, Leahey and Funk, 2023). New papers may be successors to older ones, rediscovering the importance of previous works and thereby drawing more citations to them; or they may compete with older ones, correcting or improving on previous works and thus costing them potential citations. Therefore, it is imperative to capture the dynamics of the citation network to accurately predict the future citations of a target paper. Previous studies within informetrics have primarily concentrated on the content information or citation networks of papers.
Continual Graph Learning: A Survey
Yuan, Qiao, Guan, Sheng-Uei, Ni, Pin, Luo, Tianlun, Man, Ka Lok, Wong, Prudence, Chang, Victor
Research on continual learning (CL) mainly focuses on data represented in Euclidean space, while research on graph-structured data is scarce. Furthermore, most graph learning models are tailored for static graphs, yet real-world graphs usually evolve continually, and catastrophic forgetting also emerges in graph learning models when they are trained incrementally. This motivates the need for robust, effective, and efficient continual graph learning approaches. Continual graph learning (CGL) is an emerging area that aims to realize continual learning on graph-structured data. This survey is written to shed light on this emerging area. It introduces the basic concepts of CGL and highlights two unique challenges brought by graphs. It then reviews and categorizes recent state-of-the-art approaches, analyzing their strategies for tackling the unique challenges of CGL. In addition, it discusses the main concerns in each family of CGL methods and offers potential solutions. Finally, it explores the open issues and potential applications of CGL.
- North America > United States > New York > New York County > New York City (0.05)
- Europe > United Kingdom > England > Merseyside > Liverpool (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Overview (1.00)
- Research Report > Promising Solution (0.54)
- Health & Medicine (1.00)
- Education (1.00)
- Transportation (0.92)
- Information Technology > Security & Privacy (0.46)
Collaborative Graph Learning with Auxiliary Text for Temporal Event Prediction in Healthcare
Lu, Chang, Reddy, Chandan K., Chakraborty, Prithwish, Kleinberg, Samantha, Ning, Yue
Accurate and explainable health event predictions are becoming crucial for healthcare providers to develop care plans for patients. The availability of electronic health records (EHR) has enabled machine learning advances in providing these predictions. However, many deep learning based methods are not satisfactory in solving several key challenges: 1) effectively utilizing disease domain knowledge; 2) collaboratively learning representations of patients and diseases; and 3) incorporating unstructured text. To address these issues, we propose a collaborative graph learning model to explore patient-disease interactions and medical domain knowledge. Our solution is able to capture structural features of both patients and diseases. The proposed model also utilizes unstructured text data by employing an attention regulation strategy and then integrates attentive text features into a sequential learning process. We conduct extensive experiments on two important healthcare problems to show the competitive prediction performance of the proposed method compared with various state-of-the-art models. We also confirm the effectiveness of learned representations and model interpretability by a set of ablation and case studies.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Virginia (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
Efficiently Guiding Imitation Learning Algorithms with Human Gaze
Saran, Akanksha, Zhang, Ruohan, Short, Elaine Schaertl, Niekum, Scott
Human gaze is known to be an intention-revealing signal in human demonstrations of tasks. In this work, we use gaze cues from human demonstrators to enhance the performance of state-of-the-art inverse reinforcement learning (IRL) and behavior cloning (BC) algorithms. We propose a novel approach for utilizing gaze data in a computationally efficient manner: encoding the human's attention as part of an auxiliary loss function, without adding any additional learnable parameters to those models and without requiring gaze data at test time. The auxiliary loss encourages a network to have convolutional activations in regions where the human's gaze fixated. We show how to augment any existing convolutional architecture with our auxiliary gaze loss (coverage-based gaze loss, or CGL), which can guide learning toward a better reward function or policy. We show that our proposed approach consistently improves the performance of both BC and IRL methods on a variety of Atari games. We also compare against two baseline methods for utilizing gaze data with imitation learning methods. Our approach outperforms a baseline method called gaze-modulated dropout (GMD) and is comparable to another method (AGIL) which uses gaze as input to the network and thus increases the number of learnable parameters.
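One plausible form of such a coverage-based auxiliary term is sketched below: a KL-style loss that pulls the spatial distribution of convolutional activations toward the human gaze heat map. The exact formulation in the paper may differ; this is only an illustrative assumption.

```python
# Sketch of a gaze-coverage auxiliary loss: encourage the spatial distribution of
# a conv layer's activations to cover the human gaze heat map. This KL-style form
# is an illustrative assumption, not necessarily the exact CGL loss from the paper.
import torch
import torch.nn.functional as F

def gaze_coverage_loss(conv_activations, gaze_map, eps=1e-8):
    """conv_activations: (B, C, H, W) features; gaze_map: (B, h, w) gaze heat map."""
    act = conv_activations.abs().mean(dim=1, keepdim=True)          # (B, 1, H, W)
    gaze = F.interpolate(gaze_map.unsqueeze(1), size=act.shape[-2:],
                         mode="bilinear", align_corners=False)      # match spatial size
    act = act.flatten(1) / (act.flatten(1).sum(dim=1, keepdim=True) + eps)
    gaze = gaze.flatten(1) / (gaze.flatten(1).sum(dim=1, keepdim=True) + eps)
    # KL(gaze || activations): penalize gazed regions that receive little activation.
    return (gaze * (torch.log(gaze + eps) - torch.log(act + eps))).sum(dim=1).mean()

# Usage: total_loss = bc_loss + lambda_gaze * gaze_coverage_loss(feats, gaze_heatmaps)
```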
Graph Learning from Data under Structural and Laplacian Constraints
Egilmez, Hilmi E., Pavez, Eduardo, Ortega, Antonio
Graphs are generic mathematical structures consisting of sets of vertices and edges, which are used for modeling pairwise relations (edges) between a number of objects (vertices). In practice, this representation is often extended to weighted graphs, for which a set of scalar values (weights) are assigned to edges and potentially to vertices. Thus, weighted graphs offer general and flexible representations for modeling affinity relations between the objects of interest. Many practical problems can be represented using weighted graphs. For example, a broad class of combinatorial problems such as weighted matching, shortest-path and network-flow [2] are defined using weighted graphs. In signal/data-oriented problems, weighted graphs provide concise (sparse) representations for robust modeling of signals/data [3]. Such graph-based models are also useful for analyzing and visualizing the relations between their samples/features. Moreover, weighted graphs naturally emerge in networked data applications, such as learning, signal processing and analysis on computer, social, sensor, energy, transportation and biological networks [4], where the signals/data are inherently related to a graph associated with the underlying network.
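A small worked example of the combinatorial Laplacian L = D - W for a weighted graph, the object that methods in this line of work estimate from data, is given below; the toy weight matrix is arbitrary.

```python
# Toy weighted graph and its combinatorial Laplacian L = D - W.
# The weights are arbitrary; the point is the structure of L.
import numpy as np

W = np.array([[0.0, 2.0, 0.0, 1.0],   # symmetric nonnegative weight matrix,
              [2.0, 0.0, 3.0, 0.0],   # zero diagonal (no self-loops)
              [0.0, 3.0, 0.0, 0.5],
              [1.0, 0.0, 0.5, 0.0]])
D = np.diag(W.sum(axis=1))            # degree matrix
L = D - W                             # combinatorial Laplacian

print(L)
print("rows sum to zero:", np.allclose(L.sum(axis=1), 0))
eigvals = np.linalg.eigvalsh(L)
print("L is positive semidefinite:", np.all(eigvals >= -1e-9))
```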
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Research Report > New Finding (0.67)
- Research Report > Promising Solution (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)
Learning Concept Graphs from Online Educational Data
Liu, Hanxiao, Ma, Wanli, Yang, Yiming, Carbonell, Jaime
This paper addresses an open challenge in educational data mining, i.e., the problem of automatically mapping online courses from different providers (universities, MOOCs, etc.) onto a universal space of concepts, and predicting latent prerequisite dependencies (directed links) among both concepts and courses. We propose a novel approach for inference within and across course-level and concept-level directed graphs. In the training phase, our system projects partially observed course-level prerequisite links onto directed concept-level links; in the testing phase, the induced concept-level links are used to infer the unknown course-level prerequisite links. Whereas courses may be specific to one institution, concepts are shared across different providers. The bi-directional mappings enable our system to perform interlingua-style transfer learning, e.g. treating the concept graph as the interlingua and transferring the prerequisite relations across universities via the interlingua. Experiments on our newly collected datasets of courses from MIT, Caltech, Princeton and CMU show promising results.
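A deliberately simplified sketch of the projection idea, assuming bag-of-concepts course representations and random placeholder data, is shown below; the paper's actual inference procedure is more sophisticated than this linear projection.

```python
# Simplified sketch of projecting course-level prerequisite links onto a shared
# concept space and back (the "interlingua" idea). All data here are random
# placeholders; the paper's model is more sophisticated than this.
import numpy as np

rng = np.random.default_rng(0)
n_courses, n_concepts = 6, 4
X = rng.random((n_courses, n_concepts))            # course -> concept weights
X /= X.sum(axis=1, keepdims=True)                  # normalize each course
P = (rng.random((n_courses, n_courses)) > 0.7).astype(float)  # observed prereq links

A = X.T @ P @ X                                    # induced concept-level links
scores = X @ A @ X.T                               # scores for unseen course pairs
print("predicted prerequisite scores:\n", np.round(scores, 2))
```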
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Research Report (1.00)
- Instructional Material > Course Syllabus & Notes (1.00)
- Education > Educational Technology > Educational Software > Computer Based Training (1.00)
- Education > Educational Setting > Online (1.00)
The Cluster Graphical Lasso for improved estimation of Gaussian graphical models
Tan, Kean Ming, Witten, Daniela, Shojaie, Ali
We consider the task of estimating a Gaussian graphical model in the high-dimensional setting. The graphical lasso, which involves maximizing the Gaussian log likelihood subject to an l1 penalty, is a well-studied approach for this task. We begin by introducing a surprising connection between the graphical lasso and hierarchical clustering: the graphical lasso in effect performs a two-step procedure, in which (1) single linkage hierarchical clustering is performed on the variables in order to identify connected components, and then (2) an l1-penalized log likelihood is maximized on the subset of variables within each connected component. In other words, the graphical lasso determines the connected components of the estimated network via single linkage clustering. Unfortunately, single linkage clustering is known to perform poorly in certain settings. Therefore, we propose the cluster graphical lasso, which involves clustering the features using an alternative to single linkage clustering, and then performing the graphical lasso on the subset of variables within each cluster. We establish model selection consistency for this technique, and demonstrate its improved performance relative to the graphical lasso in a simulation study, as well as in applications to an equities data set, a university webpage data set, and a gene expression data set.
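The two-step procedure can be sketched roughly as follows, using off-the-shelf scikit-learn components; the toy data, the choice of average linkage, and the penalty level are illustrative assumptions.

```python
# Sketch of the two-step cluster graphical lasso: (1) cluster the variables with a
# linkage other than single linkage, (2) fit the graphical lasso within each
# cluster. Data, average linkage, and alpha are illustrative choices.
import numpy as np
from sklearn.cluster import AgglomerativeClustering   # sklearn >= 1.2 uses `metric=`
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))                     # n samples x p variables (toy data)

corr = np.corrcoef(X, rowvar=False)
dist = 1.0 - np.abs(corr)                          # dissimilarity between variables
labels = AgglomerativeClustering(n_clusters=3, metric="precomputed",
                                 linkage="average").fit_predict(dist)

precision = np.zeros((X.shape[1], X.shape[1]))     # block-diagonal precision estimate
for c in np.unique(labels):
    idx = np.where(labels == c)[0]
    gl = GraphicalLasso(alpha=0.1).fit(X[:, idx])
    precision[np.ix_(idx, idx)] = gl.precision_
print(np.round(precision, 2))
```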
- North America > United States > New York (0.04)
- North America > United States > California (0.04)
- North America > United States > Wisconsin (0.04)
- North America > United States > Texas (0.04)
- Banking & Finance > Trading (0.68)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.66)
Community-Guided Learning: Exploiting Mobile Sensor Users to Model Human Behavior
Peebles, Daniel, Lu, Hong, Lane, Nicholas D., Choudhury, Tanzeem, Campbell, Andrew T. (Dartmouth College)
Modeling human behavior requires vast quantities of accurately labeled training data, but for ubiquitous people-aware applications such data is rarely attainable. Even researchers make mistakes when labeling data, and consistent, reliable labels from low-commitment users are rare. In particular, users may give identical labels to activities with characteristically different signatures (e.g., labeling eating at home or at a restaurant as "dinner") or may give different labels to the same context (e.g., "work" vs. "office"). In this scenario, labels are unreliable but nonetheless contain valuable information for classification. To facilitate learning in such unconstrained labeling scenarios, we propose Community-Guided Learning (CGL), a framework that allows existing classifiers to learn robustly from unreliably-labeled user-submitted data. CGL exploits the underlying structure in the data and the unconstrained labels to intelligently group crowd-sourced data. We demonstrate how to use similarity measures to determine when and how to split and merge contributions from different labeled categories and present experimental results that demonstrate the effectiveness of our framework.
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > District of Columbia > Washington (0.04)
- North America > United States > New Hampshire > Grafton County > Hanover (0.04)
- Information Technology (0.46)
- Consumer Products & Services > Restaurants (0.34)