materialization
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- Europe > Middle East > Malta > Port Region > Southern Harbour District > Floriana (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Africa > Senegal > Kolda Region > Kolda (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
From Projection to Prediction: Beyond Logits for Scalable Language Models
Dong, Jianbing, Chang, Jianbin
Training Large Language Models (LLMs) typically involves a two-stage pipeline at the output layer: hidden states are projected into vocabulary logits via a linear transformation (lm_head), followed by cross-entropy loss computation against target tokens. While conceptually simple, this design incurs substantial overhead. The intermediate logits tensor, with dimensions proportional to batch size, sequence length, and vocabulary size, must be fully materialized in GPU memory, even though only one target token per position is ultimately used. This leads to a significant memory footprint and bandwidth consumption, limiting scalability and slowing training throughput. In this work, we introduce a novel approach that integrates the output projection and loss computation into a single operation. By directly computing the loss from hidden states and target tokens, our approach bypasses explicit logits materialization. This design reduces memory usage and alleviates bandwidth pressure. Experiments on LLM training demonstrate that our method achieves substantial memory savings and measurable speedups compared to the standard two-stage pipeline, enabling larger batch sizes and longer sequences without sacrificing accuracy. Our work highlights the benefits of rethinking the boundary between projection and prediction, offering a practical systems optimization for efficient LLM training.
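The memory argument above can be made concrete with a minimal sketch. The paper's actual fused kernel is not shown here; the PyTorch-style code below (the function name, chunk size, and shapes are illustrative assumptions) only demonstrates that computing the loss chunk by chunk means the full (num_tokens x vocab_size) logits tensor never exists at once:

```python
import torch
import torch.nn.functional as F

def chunked_projection_loss(hidden, weight, targets, chunk_size=1024):
    """Cross-entropy straight from hidden states, without materializing
    the full logits tensor.

    hidden:  (num_tokens, d_model) flattened hidden states
    weight:  (vocab_size, d_model) lm_head weight
    targets: (num_tokens,) target token ids

    Only a (chunk_size, vocab_size) logits slice exists at any time,
    instead of the full (num_tokens, vocab_size) tensor.
    """
    num_tokens = hidden.shape[0]
    total = hidden.new_zeros(())
    for start in range(0, num_tokens, chunk_size):
        h = hidden[start:start + chunk_size]   # (c, d_model)
        logits = h @ weight.T                  # (c, vocab_size): small slice
        total = total + F.cross_entropy(
            logits, targets[start:start + chunk_size], reduction="sum")
    return total / num_tokens
```

A genuinely fused implementation would also recompute or fuse the chunk logits in the backward pass so activation memory stays flat; the sketch only shows the forward-pass saving.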
Ilargi: a GPU Compatible Factorized ML Model Training Framework
Machine learning (ML) training over disparate data sources traditionally involves materialization, which can impose substantial time and space overhead due to data movement and replication. Factorized learning, which leverages direct computation on disparate sources through linear algebra (LA) rewriting, has emerged as a viable alternative to improve computational efficiency. However, the adaptation of factorized learning to leverage the full capabilities of modern LA-friendly hardware like GPUs has been limited, often requiring manual intervention for algorithm compatibility. This paper introduces Ilargi, a novel factorized learning framework that utilizes matrix-represented data integration (DI) metadata to facilitate automatic factorization across CPU and GPU environments without the need for costly relational joins. Ilargi incorporates an ML-based cost estimator to intelligently select between factorization and materialization based on data properties, algorithm complexity, hardware environments, and their interactions. This strategy delivers speedups of up to 8.9x on GPUs and over 20% acceleration in batch ML training workloads, thereby enhancing the practicality of ML training across diverse data integration scenarios and hardware platforms. To our knowledge, this is the first work on GPU-compatible factorized learning.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (4 more...)
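The linear-algebra rewrite that factorized learning (and hence Ilargi) builds on fits in a few lines of NumPy. This is a generic sketch of the classic rewrite, not Ilargi's API; the matrix names and the single-table join are simplifying assumptions:

```python
import numpy as np

# A "join" of a fact table S (n x dS) with a dimension table R (m x dR)
# via a row-mapping indicator matrix K (n x m) yields the materialized
# table T = [S | K @ R] of shape (n, dS + dR). Linear-model scoring
# T @ w can be rewritten to avoid ever building T:
#     T @ w = S @ w_S + K @ (R @ w_R)
rng = np.random.default_rng(0)
n, m, dS, dR = 6, 3, 2, 4
S = rng.normal(size=(n, dS))
R = rng.normal(size=(m, dR))
K = np.eye(m)[rng.integers(0, m, size=n)]  # each fact row maps to one dim row
w = rng.normal(size=dS + dR)
w_S, w_R = w[:dS], w[dS:]

materialized = np.hstack([S, K @ R]) @ w   # builds the joined table
factorized = S @ w_S + K @ (R @ w_R)       # pushes compute to the sources
assert np.allclose(materialized, factorized)
```

The rewrite pays off whenever R is much smaller than its replicated copies in T, which is exactly the regime a cost estimator like Ilargi's has to detect.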
Scaling Continuous Latent Variable Models as Probabilistic Integral Circuits
Gala, Gennaro, de Campos, Cassio, Vergari, Antonio, Quaeghebeur, Erik
Probabilistic integral circuits (PICs) have been recently introduced as probabilistic models enjoying the key ingredient behind expressive generative models: continuous latent variables (LVs). PICs are symbolic computational graphs defining continuous LV models as hierarchies of functions that are summed and multiplied together, or integrated over some LVs. They are tractable if LVs can be analytically integrated out, otherwise they can be approximated by tractable probabilistic circuits (PCs) encoding a hierarchical numerical quadrature process, called QPCs. So far, only tree-shaped PICs have been explored, and training them via numerical quadrature requires memory-intensive processing at scale. In this paper, we address these issues, and present: (i) a pipeline for building DAG-shaped PICs out of arbitrary variable decompositions, (ii) a procedure for training PICs using tensorized circuit architectures, and (iii) neural functional sharing techniques to allow scalable training.
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- Europe > Middle East > Malta > Port Region > Southern Harbour District > Floriana (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Africa > Senegal > Kolda Region > Kolda (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
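The quadrature idea behind QPCs can be illustrated with a single latent variable: the integral over the LV becomes a weighted finite mixture whose weights come from a quadrature rule. A minimal NumPy sketch, assuming a Gaussian prior/likelihood pair and a truncated latent domain (both choices are for illustration, not from the paper):

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def gauss_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Gauss-Legendre nodes/weights on [-1, 1], rescaled to the latent domain
nodes, weights = leggauss(32)
a, b = -5.0, 5.0
z = 0.5 * (b - a) * nodes + 0.5 * (b + a)
w = 0.5 * (b - a) * weights

def marginal(x):
    # p(x) = integral of p(z) p(x|z) dz, approximated by a finite mixture:
    # p(x) ~ sum_k w_k * p(z_k) * p(x | z_k)
    return np.sum(w * gauss_pdf(z, 0.0, 1.0) * gauss_pdf(x, z, 0.5))

# For this Gaussian pair the exact answer is N(x; 0, sqrt(1 + 0.25))
print(marginal(0.3))
```

A QPC stacks this construction hierarchically, which is where the memory pressure at scale comes from and what the paper's tensorized training targets.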
Managed Geo-Distributed Feature Store: Architecture and System Design
Li, Anya, Ranganathan, Bhala, Pan, Feng, Zhang, Mickey, Xu, Qianjun, Li, Runhan, Raman, Sethu, Shah, Shail Paragbhai, Tang, Vivienne
Companies are using machine learning to solve real-world problems and are developing hundreds to thousands of features in the process. They build feature engineering pipelines as part of the MLOps life cycle to transform data from various data sources and materialize the results for future consumption. Without feature stores, different teams across various business groups would maintain this process independently, which can lead to conflicting and duplicated features in the system. Data scientists find it hard to search for and reuse existing features, and version control is painful to maintain. Furthermore, feature correctness violations, such as online (inferencing) vs. offline (training) skew and data leakage, are common. Although the machine learning community has extensively discussed the need for feature stores and their purpose [10, 11], this paper aims to capture the core architectural components that make up a managed feature store and to share the design learnings from building such a system.
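One of the correctness issues named above, leakage of future feature values into training rows, is conventionally prevented with a point-in-time (as-of) join. A minimal pandas sketch with invented column names, not the paper's system:

```python
import pandas as pd

labels = pd.DataFrame({
    "entity_id": ["u1", "u1", "u2"],
    "ts": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-01-10"]),
    "label": [0, 1, 0],
}).sort_values("ts")
features = pd.DataFrame({
    "entity_id": ["u1", "u1", "u2"],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-15", "2024-01-12"]),
    "clicks_7d": [3, 9, 4],
}).sort_values("ts")

# Per entity, take the latest feature value at or before each label time;
# u2's feature from 01-12 is correctly NOT used for its 01-10 label.
train = pd.merge_asof(labels, features, on="ts", by="entity_id",
                      direction="backward")
print(train)
```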
The CaLiGraph Ontology as a Challenge for OWL Reasoners
Heist, Nicolas, Paulheim, Heiko
CaLiGraph is a large-scale cross-domain knowledge graph generated from Wikipedia by exploiting the category system, list pages, and other list structures in Wikipedia, containing more than 15 million typed entities and around 10 million relation assertions. Unlike knowledge graphs such as DBpedia and YAGO, whose ontologies are comparatively simplistic, CaLiGraph has a rich ontology, comprising more than 200,000 class restrictions. Those two properties - a large A-box and a rich ontology - make it an interesting challenge for benchmarking reasoners. In this paper, we show that a reasoning task which is particularly relevant for CaLiGraph, i.e., the materialization of owl:hasValue constraints into assertions between individuals and between individuals and literals, is insufficiently supported by available reasoning systems. We provide differently sized benchmark subsets of CaLiGraph, which can be used for performance analysis of reasoning systems.
- Asia > China (0.06)
- Europe > Sweden (0.05)
- North America > United States > New York > Albany County > Albany (0.04)
- Europe > Germany (0.04)
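The owl:hasValue materialization task from the CaLiGraph abstract is simple to state: if a class is defined by a hasValue restriction (property, value), each member of that class entails the corresponding assertion. A toy Python sketch with invented class, property, and individual names (not CaLiGraph identifiers):

```python
# class -> (property, value) it is restricted to via owl:hasValue
restrictions = {
    "EiffelTowerVisitor": ("visited", "Eiffel_Tower"),
    "GermanCity": ("locatedIn", "Germany"),
}
# A-box class membership
class_members = {
    "EiffelTowerVisitor": {"alice", "bob"},
    "GermanCity": {"Mannheim"},
}

def materialize(restrictions, class_members):
    """Derive the entailed property assertions for every class member."""
    triples = set()
    for cls, (prop, value) in restrictions.items():
        for individual in class_members.get(cls, ()):
            triples.add((individual, prop, value))
    return triples

for t in sorted(materialize(restrictions, class_members)):
    print(t)
```

The challenge in the paper is not the rule itself but applying it over 15 million entities and 200,000 restrictions, which is where existing reasoners fall short.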
More is not Always Better: The Negative Impact of A-box Materialization on RDF2vec Knowledge Graph Embeddings
Iana, Andreea, Paulheim, Heiko
RDF2vec is an embedding technique for representing knowledge graph entities in a continuous vector space. In this paper, we investigate the effect of materializing implicit A-box axioms induced by subproperties, as well as symmetric and transitive properties. While it might be a reasonable assumption that such a materialization before computing embeddings might lead to better embeddings, we conduct a set of experiments on DBpedia which demonstrate that the materialization actually has a negative effect on the performance of RDF2vec. In our analysis, we argue that despite the huge body of work devoted to completing missing information in knowledge graphs, such missing implicit information is actually a signal, not a defect, and we show examples illustrating that assumption.
- Media (0.46)
- Leisure & Entertainment (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)
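The A-box materialization studied in the RDF2vec paper above can be sketched directly: symmetric properties entail reverse edges, and transitive properties entail their closure. A minimal Python illustration with invented property and entity names:

```python
def materialize_closures(triples, symmetric, transitive):
    """Add edges entailed by symmetric and transitive property axioms."""
    edges = set(triples)
    # symmetric: (s, p, o) entails (o, p, s)
    for s, p, o in list(edges):
        if p in symmetric:
            edges.add((o, p, s))
    # transitive: fixpoint of (a, p, b), (b, p, c) => (a, p, c)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(edges):
            if p not in transitive:
                continue
            for s2, p2, o2 in list(edges):
                if p2 == p and s2 == o and (s, p, o2) not in edges:
                    edges.add((s, p, o2))
                    changed = True
    return edges

kg = {("Mannheim", "locatedIn", "BadenWurttemberg"),
      ("BadenWurttemberg", "locatedIn", "Germany"),
      ("Rhine", "borders", "Mannheim")}
print(materialize_closures(kg, symmetric={"borders"}, transitive={"locatedIn"}))
```

The paper's finding is that the extra edges this produces dilute the random walks RDF2vec is trained on, which is why materialization hurts rather than helps.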
If Machines Want to Make Art, Will Humans Understand It?
Assuming that the emergence of consciousness in artificial minds is possible, those minds will feel the urge to create art. But will we be able to understand it? To answer this question, we need to consider two subquestions: when does the machine become an author of an artwork? And how can we form an understanding of the art that it makes? Empathy, we argue, is the force behind our capacity to understand works of art.
Dichotomies in Ontology-Mediated Querying with the Guarded Fragment
Hernich, Andre, Lutz, Carsten, Papacchini, Fabio, Wolter, Frank
We study the complexity of ontology-mediated querying when ontologies are formulated in the guarded fragment of first-order logic (GF). Our general aim is to classify the data complexity on the level of ontologies where query evaluation w.r.t. an ontology O is considered to be in PTime if all (unions of conjunctive) queries can be evaluated in PTime w.r.t. O and coNP-hard if at least one query is coNP-hard w.r.t. O. We identify several large and relevant fragments of GF that enjoy a dichotomy between PTime and coNP, some of them additionally admitting a form of counting. In fact, almost all ontologies in the BioPortal repository fall into these fragments or can easily be rewritten to do so. We then establish a variation of Ladner's Theorem on the existence of NP-intermediate problems and use this result to show that for other fragments, there is provably no such dichotomy. Again for other fragments (such as full GF), establishing a dichotomy implies the Feder-Vardi conjecture on the complexity of constraint satisfaction problems. We also link these results to Datalog-rewritability and study the decidability of whether a given ontology enjoys PTime query evaluation, presenting both positive and negative results.
- Europe > Germany > Bremen > Bremen (0.27)
- Europe > United Kingdom > England > Merseyside > Liverpool (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)