cgl
Curriculum-Guided Layer Scaling for Language Model Pretraining
Singh, Karanpartap, Band, Neil, Adeli, Ehsan
As the cost of pretraining large language models grows, there is continued interest in strategies to improve learning efficiency during this core training stage. Motivated by cognitive development, where humans gradually build knowledge as their brains mature, we propose Curriculum-Guided Layer Scaling (CGLS), a framework for compute-efficient pretraining that synchronizes increasing data difficulty with model growth through progressive layer stacking (i.e., gradually adding layers during training). At the 100M-parameter scale, using a curriculum transitioning from synthetic short stories to general web data, CGLS outperforms baseline methods on the question-answering benchmarks PIQA and ARC. Our results show that progressively increasing model depth alongside sample difficulty leads to better generalization and zero-shot performance on various downstream benchmarks. Altogether, our findings demonstrate that CGLS unlocks the potential of progressive stacking, offering a simple yet effective strategy for improving generalization on knowledge-intensive and reasoning tasks.

Large language models (LLMs) are typically pretrained in a single, continuous pass, processing all tokens with a uniform amount of computation regardless of their complexity or relevance to downstream tasks of interest. While this approach has shown remarkable success in large-scale models like GPT-4 (OpenAI et al., 2023) and Llama 3 (Dubey et al., 2024), it differs significantly from how humans learn, often leading to models that excel at generating coherent text but struggle with long-context reasoning across varied tasks (Schnabel et al., 2025). Recent works like Phi-3 (Abdin et al., 2024), MiniCPM (Hu et al., 2024), and others (Feng et al., 2024) have explored mid-training, adjusting the training data distribution partway through training by incorporating higher-quality, multilingual, or long-form text. However, this coarse-grained curriculum is applied to fixed model architectures. Inspired by how humans progressively build knowledge alongside their physically growing brains, we explore whether gradually scaling a model in tandem with increasingly complex data can enable more efficient and effective learning.
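To make the progressive-stacking idea concrete, here is a minimal sketch of growing a tiny decoder-only model between curriculum stages, assuming a toy PyTorch setup; the growth schedule, layer initialization (copying the top layer), and stage names are illustrative assumptions rather than the authors' exact recipe.

```python
# Illustrative sketch of progressive layer stacking: a small model starts shallow
# and gains layers at curriculum stage boundaries. Hyperparameters, the growth
# schedule, and the copy-the-top-layer initialization below are assumptions.
import copy
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size=256, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.layers = nn.ModuleList(copy.deepcopy(layer) for _ in range(n_layers))
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        h = self.embed(x)
        # causal mask so each position only attends to earlier tokens
        mask = torch.triu(torch.full((x.size(1), x.size(1)), float("-inf")), diagonal=1)
        for layer in self.layers:
            h = layer(h, src_mask=mask)
        return self.head(h)

    def grow(self, n_new=2):
        """Stack new layers on top, initialized as copies of the current top layer."""
        for _ in range(n_new):
            self.layers.append(copy.deepcopy(self.layers[-1]))

model = TinyLM()
stages = ["short_stories", "web_text"]   # easy -> hard curriculum (placeholder names)
for stage in stages:
    # ... train on the current data stage here (rebuild the optimizer after growth
    # so the newly added parameters are included) ...
    if stage != stages[-1]:
        model.grow(n_new=2)              # deepen the model before the harder stage
```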
- North America > United States (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Middle East > Jordan (0.04)
- Education (1.00)
- Health & Medicine (0.68)
Exploring Training and Inference Scaling Laws in Generative Retrieval
Cai, Hongru, Li, Yongqi, Yuan, Ruifeng, Wang, Wenjie, Zhang, Zhen, Li, Wenjie, Chua, Tat-Seng
Generative retrieval reformulates retrieval as an autoregressive generation task, where large language models (LLMs) generate target documents directly from a query. As a novel paradigm, the mechanisms that underpin its performance and scalability remain largely unexplored. We systematically investigate training and inference scaling laws in generative retrieval, exploring how model size, training data scale, and inference-time compute jointly influence performance. We propose a novel evaluation metric inspired by contrastive entropy and generation loss, providing a continuous performance signal that enables robust comparisons across diverse generative retrieval methods. Our experiments show that n-gram-based methods align strongly with training and inference scaling laws. We find that increasing model size, training data scale, and inference-time compute all contribute to improved performance, highlighting the complementary roles of these factors in enhancing generative retrieval. Across these settings, LLaMA models consistently outperform T5 models, suggesting a particular advantage for larger decoder-only models in generative retrieval. Our findings underscore that model sizes, data availability, and inference computation interact to unlock the full potential of generative retrieval, offering new insights for designing and optimizing future systems.
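As a worked illustration of what a training scaling law of this kind looks like, the sketch below fits a saturating power law to hypothetical (model size, retrieval score) pairs; the functional form and the numbers are assumptions for illustration, not the paper's fitted curves.

```python
# Fit a saturating power law score(N) = a - b * N**(-alpha) to hypothetical
# (model size, retrieval score) points; the values below are made up.
import numpy as np
from scipy.optimize import curve_fit

sizes = np.array([6e7, 2.5e8, 8e8, 3e9])        # parameter counts (hypothetical)
scores = np.array([0.21, 0.29, 0.34, 0.38])     # retrieval metric (hypothetical)

def power_law(n, a, b, alpha):
    return a - b * n ** (-alpha)

params, _ = curve_fit(power_law, sizes, scores, p0=[0.5, 10.0, 0.3], maxfev=10000)
a, b, alpha = params
print(f"fitted law: score ~ {a:.2f} - {b:.2f} * N^(-{alpha:.2f})")
print("extrapolated score at 7B params:", power_law(7e9, *params))
```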
- Europe > Italy (0.05)
- Asia > China > Hong Kong (0.05)
- Asia > Singapore > Central Region > Singapore (0.04)
H2CGL: Modeling Dynamics of Citation Network for Impact Prediction
He, Guoxiu, Xue, Zhikai, Jiang, Zhuoren, Kang, Yangyang, Zhao, Star, Lu, Wei
Assessing the potential impact of papers is of great significance to both academia and industry (Wang, Song and Barabási, 2013), especially given the exponential annual growth in the number of papers (Lo, Wang, Neumann, Kinney and Weld, 2020; Chu and Evans, 2021; Xue, He, Liu, Jiang, Zhao and Lu, 2023). Because the numerical value of scientific impact is difficult to determine directly, citation count is frequently employed as a rough estimate (Evans and Reimer, 2009; Sinatra, Wang, Deville, Song and Barabási, 2016; Jiang, Koch and Sun, 2021). Moreover, the dynamics of citation networks cannot be ignored. For example, the "sleeping beauties" phenomenon (Van Raan, 2004) indicates that the citations of a paper can vary considerably across time periods. Beyond content quality, the future citations of a paper are also influenced by newly published papers (Funk and Owen-Smith, 2017; Park, Leahey and Funk, 2023). New papers may be successors to older ones, rediscovering the importance of previous works and thereby drawing more citations to them; or they may compete with older ones, correcting or improving on previous works and thus costing them potential citations. Therefore, it is imperative to capture the dynamics of the citation network to accurately predict the future citations of a target paper. Previous studies within informetrics have primarily concentrated on the content information or citation networks of papers.
Continual Graph Learning: A Survey
Yuan, Qiao, Guan, Sheng-Uei, Ni, Pin, Luo, Tianlun, Man, Ka Lok, Wong, Prudence, Chang, Victor
Research on continual learning (CL) mainly focuses on data represented in Euclidean space, while research on graph-structured data is scarce. Furthermore, most graph learning models are tailored for static graphs, yet real-world graphs usually evolve continually, and catastrophic forgetting also emerges in graph learning models when they are trained incrementally. This motivates the need for robust, effective, and efficient continual graph learning approaches. Continual graph learning (CGL) is an emerging area that aims to realize continual learning on graph-structured data. This survey is written to shed light on this emerging area. It introduces the basic concepts of CGL and highlights two unique challenges brought by graphs. It then reviews and categorizes recent state-of-the-art approaches, analyzing their strategies for tackling the unique challenges of CGL. In addition, it discusses the main concerns in each family of CGL methods and offers potential solutions. Finally, it explores the open issues and potential applications of CGL.
- North America > United States > New York > New York County > New York City (0.05)
- Europe > United Kingdom > England > Merseyside > Liverpool (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Overview (1.00)
- Research Report > Promising Solution (0.54)
- Health & Medicine (1.00)
- Education (1.00)
- Transportation (0.92)
- Information Technology > Security & Privacy (0.46)
Collaborative Graph Learning with Auxiliary Text for Temporal Event Prediction in Healthcare
Lu, Chang, Reddy, Chandan K., Chakraborty, Prithwish, Kleinberg, Samantha, Ning, Yue
Accurate and explainable health event predictions are becoming crucial for healthcare providers to develop care plans for patients. The availability of electronic health records (EHR) has enabled machine learning advances in providing these predictions. However, many deep learning based methods are not satisfactory in solving several key challenges: 1) effectively utilizing disease domain knowledge; 2) collaboratively learning representations of patients and diseases; and 3) incorporating unstructured text. To address these issues, we propose a collaborative graph learning model to explore patient-disease interactions and medical domain knowledge. Our solution is able to capture structural features of both patients and diseases. The proposed model also utilizes unstructured text data by employing an attention regulation strategy and then integrates attentive text features into a sequential learning process. We conduct extensive experiments on two important healthcare problems to show the competitive prediction performance of the proposed method compared with various state-of-the-art models. We also confirm the effectiveness of learned representations and model interpretability by a set of ablation and case studies.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Virginia (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
Efficiently Guiding Imitation Learning Algorithms with Human Gaze
Saran, Akanksha, Zhang, Ruohan, Short, Elaine Schaertl, Niekum, Scott
Human gaze is known to be an intention-revealing signal in human demonstrations of tasks. In this work, we use gaze cues from human demonstrators to enhance the performance of state-of-the-art inverse reinforcement learning (IRL) and behavior cloning (BC) algorithms. We propose a novel approach for utilizing gaze data in a computationally efficient manner: encoding the human's attention as part of an auxiliary loss function, without adding any additional learnable parameters to those models and without requiring gaze data at test time. The auxiliary loss encourages a network to have convolutional activations in regions where the human's gaze fixated. We show how to augment any existing convolutional architecture with our auxiliary gaze loss (coverage-based gaze loss, or CGL), which can guide learning toward a better reward function or policy. We show that our proposed approach consistently improves the performance of both BC and IRL methods on a variety of Atari games. We also compare against two baseline methods for utilizing gaze data with imitation learning methods. Our approach outperforms a baseline method called gaze-modulated dropout (GMD) and is comparable to another method (AGIL) which uses gaze as input to the network and thus increases the number of learnable parameters.
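One plausible form of such a coverage-based auxiliary term is sketched below: a KL-style loss that pulls the spatial distribution of convolutional activations toward the human gaze heat map. The exact formulation in the paper may differ; this is only an illustrative assumption.

```python
# Sketch of a gaze-coverage auxiliary loss: encourage the spatial distribution of
# a conv layer's activations to cover the human gaze heat map. This KL-style form
# is an illustrative assumption, not necessarily the exact CGL loss from the paper.
import torch
import torch.nn.functional as F

def gaze_coverage_loss(conv_activations, gaze_map, eps=1e-8):
    """conv_activations: (B, C, H, W) features; gaze_map: (B, h, w) gaze heat map."""
    act = conv_activations.abs().mean(dim=1, keepdim=True)          # (B, 1, H, W)
    gaze = F.interpolate(gaze_map.unsqueeze(1), size=act.shape[-2:],
                         mode="bilinear", align_corners=False)      # match spatial size
    act = act.flatten(1) / (act.flatten(1).sum(dim=1, keepdim=True) + eps)
    gaze = gaze.flatten(1) / (gaze.flatten(1).sum(dim=1, keepdim=True) + eps)
    # KL(gaze || activations): penalize gazed regions that receive little activation.
    return (gaze * (torch.log(gaze + eps) - torch.log(act + eps))).sum(dim=1).mean()

# Usage: total_loss = bc_loss + lambda_gaze * gaze_coverage_loss(feats, gaze_heatmaps)
```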
Graph Learning from Data under Structural and Laplacian Constraints
Egilmez, Hilmi E., Pavez, Eduardo, Ortega, Antonio
Graphs are generic mathematical structures consisting of sets of vertices and edges, which are used for modeling pairwise relations (edges) between a number of objects (vertices). In practice, this representation is often extended to weighted graphs, for which a set of scalar values (weights) are assigned to edges and potentially to vertices. Thus, weighted graphs offer general and flexible representations for modeling affinity relations between the objects of interest. Many practical problems can be represented using weighted graphs. For example, a broad class of combinatorial problems such as weighted matching, shortest-path and network-flow [2] are defined using weighted graphs. In signal/data-oriented problems, weighted graphs provide concise (sparse) representations for robust modeling of signals/data [3]. Such graph-based models are also useful for analyzing and visualizing the relations between their samples/features. Moreover, weighted graphs naturally emerge in networked data applications, such as learning, signal processing and analysis on computer, social, sensor, energy, transportation and biological networks [4], where the signals/data are inherently related to a graph associated with the underlying network.
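A small worked example of the combinatorial Laplacian L = D - W for a weighted graph, the object that methods in this line of work estimate from data, is given below; the toy weight matrix is arbitrary.

```python
# Toy weighted graph and its combinatorial Laplacian L = D - W.
# The weights are arbitrary; the point is the structure of L.
import numpy as np

W = np.array([[0.0, 2.0, 0.0, 1.0],   # symmetric nonnegative weight matrix,
              [2.0, 0.0, 3.0, 0.0],   # zero diagonal (no self-loops)
              [0.0, 3.0, 0.0, 0.5],
              [1.0, 0.0, 0.5, 0.0]])
D = np.diag(W.sum(axis=1))            # degree matrix
L = D - W                             # combinatorial Laplacian

print(L)
print("rows sum to zero:", np.allclose(L.sum(axis=1), 0))
eigvals = np.linalg.eigvalsh(L)
print("L is positive semidefinite:", np.all(eigvals >= -1e-9))
```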
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Research Report > New Finding (0.67)
- Research Report > Promising Solution (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)
Learning Concept Graphs from Online Educational Data
Liu, Hanxiao, Ma, Wanli, Yang, Yiming, Carbonell, Jaime
This paper addresses an open challenge in educational data mining, i.e., the problem of automatically mapping online courses from different providers (universities, MOOCs, etc.) onto a universal space of concepts, and predicting latent prerequisite dependencies (directed links) among both concepts and courses. We propose a novel approach for inference within and across course-level and concept-level directed graphs. In the training phase, our system projects partially observed course-level prerequisite links onto directed concept-level links; in the testing phase, the induced concept-level links are used to infer the unknown course-level prerequisite links. Whereas courses may be specific to one institution, concepts are shared across different providers. The bi-directional mappings enable our system to perform interlingua-style transfer learning, e.g. treating the concept graph as the interlingua and transferring the prerequisite relations across universities via the interlingua. Experiments on our newly collected datasets of courses from MIT, Caltech, Princeton and CMU show promising results.
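A deliberately simplified sketch of the projection idea, assuming bag-of-concepts course representations and random placeholder data, is shown below; the paper's actual inference procedure is more sophisticated than this linear projection.

```python
# Simplified sketch of projecting course-level prerequisite links onto a shared
# concept space and back (the "interlingua" idea). All data here are random
# placeholders; the paper's model is more sophisticated than this.
import numpy as np

rng = np.random.default_rng(0)
n_courses, n_concepts = 6, 4
X = rng.random((n_courses, n_concepts))            # course -> concept weights
X /= X.sum(axis=1, keepdims=True)                  # normalize each course
P = (rng.random((n_courses, n_courses)) > 0.7).astype(float)  # observed prereq links

A = X.T @ P @ X                                    # induced concept-level links
scores = X @ A @ X.T                               # scores for unseen course pairs
print("predicted prerequisite scores:\n", np.round(scores, 2))
```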
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Research Report (1.00)
- Instructional Material > Course Syllabus & Notes (1.00)
- Education > Educational Technology > Educational Software > Computer Based Training (1.00)
- Education > Educational Setting > Online (1.00)
The Cluster Graphical Lasso for improved estimation of Gaussian graphical models
Tan, Kean Ming, Witten, Daniela, Shojaie, Ali
We consider the task of estimating a Gaussian graphical model in the high-dimensional setting. The graphical lasso, which involves maximizing the Gaussian log likelihood subject to an l1 penalty, is a well-studied approach for this task. We begin by introducing a surprising connection between the graphical lasso and hierarchical clustering: the graphical lasso in effect performs a two-step procedure, in which (1) single linkage hierarchical clustering is performed on the variables in order to identify connected components, and then (2) an l1-penalized log likelihood is maximized on the subset of variables within each connected component. In other words, the graphical lasso determines the connected components of the estimated network via single linkage clustering. Unfortunately, single linkage clustering is known to perform poorly in certain settings. Therefore, we propose the cluster graphical lasso, which involves clustering the features using an alternative to single linkage clustering, and then performing the graphical lasso on the subset of variables within each cluster. We establish model selection consistency for this technique, and demonstrate its improved performance relative to the graphical lasso in a simulation study, as well as in applications to an equities data set, a university webpage data set, and a gene expression data set.
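The two-step procedure can be sketched roughly as follows, using off-the-shelf scikit-learn components; the toy data, the choice of average linkage, and the penalty level are illustrative assumptions.

```python
# Sketch of the two-step cluster graphical lasso: (1) cluster the variables with a
# linkage other than single linkage, (2) fit the graphical lasso within each
# cluster. Data, average linkage, and alpha are illustrative choices.
import numpy as np
from sklearn.cluster import AgglomerativeClustering   # sklearn >= 1.2 uses `metric=`
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))                     # n samples x p variables (toy data)

corr = np.corrcoef(X, rowvar=False)
dist = 1.0 - np.abs(corr)                          # dissimilarity between variables
labels = AgglomerativeClustering(n_clusters=3, metric="precomputed",
                                 linkage="average").fit_predict(dist)

precision = np.zeros((X.shape[1], X.shape[1]))     # block-diagonal precision estimate
for c in np.unique(labels):
    idx = np.where(labels == c)[0]
    gl = GraphicalLasso(alpha=0.1).fit(X[:, idx])
    precision[np.ix_(idx, idx)] = gl.precision_
print(np.round(precision, 2))
```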
- North America > United States > New York (0.04)
- North America > United States > California (0.04)
- North America > United States > Wisconsin (0.04)
- North America > United States > Texas (0.04)
- Banking & Finance > Trading (0.68)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.66)
Community-Guided Learning: Exploiting Mobile Sensor Users to Model Human Behavior
Peebles, Daniel, Lu, Hong, Lane, Nicholas D., Choudhury, Tanzeem, Campbell, Andrew T. (Dartmouth College)
Modeling human behavior requires vast quantities of accurately labeled training data, but for ubiquitous people-aware applications such data is rarely attainable. Even researchers make mistakes when labeling data, and consistent, reliable labels from low-commitment users are rare. In particular, users may give identical labels to activities with characteristically different signatures (e.g., labeling eating at home or at a restaurant as "dinner") or may give different labels to the same context (e.g., "work" vs. "office"). In this scenario, labels are unreliable but nonetheless contain valuable information for classification. To facilitate learning in such unconstrained labeling scenarios, we propose Community-Guided Learning (CGL), a framework that allows existing classifiers to learn robustly from unreliably-labeled user-submitted data. CGL exploits the underlying structure in the data and the unconstrained labels to intelligently group crowd-sourced data. We demonstrate how to use similarity measures to determine when and how to split and merge contributions from different labeled categories and present experimental results that demonstrate the effectiveness of our framework.
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > District of Columbia > Washington (0.04)
- North America > United States > New Hampshire > Grafton County > Hanover (0.04)
- Information Technology (0.46)
- Consumer Products & Services > Restaurants (0.34)