Collaborating Authors

 Vorarlberg


Under the Influence at the Whitney Biennial

The New Yorker

How the artists in this year's survey do or, more often, don't acknowledge those who paved the way for them. Machado makes pieces that one might call documents of reverence, excavated burial grounds. If nothing else, the 2026 Whitney Biennial, curated by Marcela Guerrero and Drew Sawyer (at the Whitney Museum through August 23rd), introduces viewers to what I call ChatGPT art--facsimiles of facsimiles by makers who have little if any relationship to what they're putting out there, aside from its being a product in service of a career. Indeed, it's difficult to think of the people who grew up with and apparently condone the use of A.I. sources in the creation of "art" as artists themselves, especially if you define art as a creative expression of thoughts or feelings that have changed, and contributed to the vision of, the artists who made it. It's true that, nearly from the beginning, postmodern art challenged the notion of originality, or, more specifically, the weight of originality--often with great joy and wit and not a little fear.


Load Forecasting for Households and Energy Communities: Are Deep Learning Models Worth the Effort?

arXiv.org Artificial Intelligence

Accurate load forecasting is crucial for predictive control in many energy domain applications, with significant economic and ecological implications. To address these implications, this study provides an extensive benchmark of state-of-the-art deep learning models for short-term load forecasting in energy communities. Namely, LSTM, xLSTM, and Transformers are compared with benchmarks such as KNNs, synthetic load models, and persistence forecasting models. This comparison considers different scales of aggregation (e.g., number of household loads) and varying training data availability (e.g., training data time spans). Further, the impact of transfer learning from synthetic (standard) load profiles and the deep learning model size (i.e., parameter count) is investigated in terms of forecasting error. Implementations are publicly available and other researchers are encouraged to benchmark models using this framework. Additionally, a comprehensive case study, comprising an energy community of 50 households and a battery storage, demonstrates the beneficial financial implications of accurate predictions. Key findings of this research include: (1) Simple persistence benchmarks outperform deep learning models for short-term load forecasting when the available training data is limited to six months or less; (2) Pretraining with publicly available synthetic load profiles improves the normalized Mean Absolute Error (nMAE) by an average of 1.28%pt during the first nine months of training data; (3) Increased aggregation significantly enhances the performance of deep learning models relative to persistence benchmarks; (4) Improved load forecasting, with an nMAE reduction of 1.1%pt, translates to an economic benefit of approximately 600 EUR per year in an energy community comprising 50 households.
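To make the persistence benchmark and the nMAE metric mentioned above concrete, here is a minimal sketch. It is not taken from the paper's published implementation: the function names `persistence_forecast` and `nmae` are hypothetical, and normalizing the MAE by the mean absolute load is only one common convention, assumed here because the abstract does not state which normalization is used.

```python
import numpy as np


def persistence_forecast(load, horizon, lag):
    """Forecast the next `horizon` steps by repeating what was observed `lag` steps earlier.

    With lag equal to one day of samples, this is a "same as yesterday" forecast.
    (Hypothetical helper for illustration only.)
    """
    load = np.asarray(load, dtype=float)
    if lag < horizon or lag > len(load):
        raise ValueError("need horizon <= lag <= len(load)")
    start = len(load) - lag
    return load[start:start + horizon].copy()


def nmae(y_true, y_pred):
    """Mean absolute error divided by the mean absolute true load.

    Assumption: this normalization is one common choice, not necessarily the paper's.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)) / np.mean(np.abs(y_true)))


# Toy example: two "days" of four samples each, then the actual third day.
history = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
actual_next_day = [1.5, 2.0, 2.5, 4.0]
forecast = persistence_forecast(history, horizon=4, lag=4)
print("forecast:", forecast)
print("nMAE:", nmae(actual_next_day, forecast))
```

A difference in "%pt" between two models would then be the difference of their nMAE values expressed in percent.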


Reducing the Transformer Architecture to a Minimum

arXiv.org Artificial Intelligence

Transformers are a widespread and successful model architecture, particularly in Natural Language Processing (NLP) and Computer Vision (CV). The essential innovation of this architecture is the Attention Mechanism, which solves the problem of extracting relevant context information from long sequences in NLP and realistic scenes in CV. A classical neural network component, a Multi-Layer Perceptron (MLP), complements the attention mechanism. Its necessity is frequently justified by its capability of modeling nonlinear relationships. However, the attention mechanism itself is nonlinear through its internal use of similarity measures. A possible hypothesis is that this nonlinearity is sufficient for modeling typical application problems. As the MLPs usually contain the most trainable parameters of the whole model, their omission would substantially reduce the parameter set size. Further components can also be reorganized to reduce the number of parameters. Under some conditions, query and key matrices can be collapsed into a single matrix of the same size. The same is true of value and projection matrices, which can also be omitted without eliminating the substance of the attention mechanism. Initially, the similarity measure was defined asymmetrically, with peculiar properties such as that a token is possibly dissimilar to itself. A possible symmetric definition requires only half of the parameters. We have laid the groundwork by testing widespread CV benchmarks: MNIST and CIFAR-10. The tests have shown that simplified transformer architectures (a) without MLP, (b) with collapsed matrices, and (c) with symmetric similarity matrices exhibit performance similar to that of the original architecture, saving up to 90% of parameters without hurting the classification performance.
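The simplifications described above can be sketched for a single attention head as follows. This is only an illustrative reading of the abstract, not the authors' implementation: the helper names are hypothetical, the value and projection matrices are dropped entirely, and the collapsed query-key matrix is built as a symmetric matrix from its upper triangle, which is one assumed way to obtain roughly half the parameters.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def symmetric_from_upper(params, d):
    """Build a symmetric d x d similarity matrix from d*(d+1)/2 free parameters."""
    m = np.zeros((d, d))
    m[np.triu_indices(d)] = params
    # Mirror the upper triangle; subtract the diagonal so it is not counted twice.
    return m + m.T - np.diag(np.diag(m))


def minimal_attention(x, m):
    """Single-head attention with one collapsed similarity matrix `m`.

    No value matrix, no output projection, and no MLP follow this step
    (assumed simplification for illustration).
    """
    d = x.shape[-1]
    scores = x @ m @ x.T / np.sqrt(d)  # symmetric when m is symmetric
    return softmax(scores, axis=-1) @ x


rng = np.random.default_rng(0)
n_tokens, d = 5, 4
x = rng.normal(size=(n_tokens, d))
m = symmetric_from_upper(rng.normal(size=d * (d + 1) // 2), d)
out = minimal_attention(x, m)
print(out.shape)
print("similarity parameters:", d * (d + 1) // 2, "vs separate query/key:", 2 * d * d)
```

Each output token is a weighted average of the input tokens, so the nonlinearity comes only from the softmax over the bilinear similarity scores.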


Efficient Neural Network Training via Subset Pretraining

arXiv.org Machine Learning

In training neural networks, it is common practice to use partial gradients computed over batches, mostly very small subsets of the training set. This approach is motivated by the argument that such a partial gradient is close to the true one, with precision growing only with the square root of the batch size. A theoretical justification is with the help of stochastic approximation theory. However, the conditions for the validity of this theory are not satisfied in the usual learning rate schedules. Batch processing is also difficult to combine with efficient second-order optimization methods. This proposal is based on another hypothesis: the loss minimum of the training set can be expected to be well-approximated by the minima of its subsets. Such subset minima can be computed in a fraction of the time necessary for optimizing over the whole training set. This hypothesis has been tested with the help of the MNIST, CIFAR-10, and CIFAR-100 image classification benchmarks, optionally extended by training data augmentation. The experiments have confirmed that results equivalent to conventional training can be reached. In summary, even small subsets are representative if the overdetermination ratio for the given model parameter set sufficiently exceeds unity. The computing expense can be reduced to a tenth or less.


Ontology-Free General-Domain Knowledge Graph-to-Text Generation Dataset Synthesis using Large Language Model

arXiv.org Artificial Intelligence

Knowledge Graph-to-Text (G2T) generation involves verbalizing structured knowledge graphs into natural language text. Recent advancements in Pretrained Language Models (PLMs) have improved G2T performance, but their effectiveness depends on datasets with precise graph-text alignment. However, the scarcity of high-quality, general-domain G2T generation datasets restricts progress in general-domain G2T generation research. To address this issue, we introduce the Wikipedia Ontology-Free Graph-text dataset (WikiOFGraph), a new large-scale G2T dataset generated using a novel method that leverages a Large Language Model (LLM) and Data-QuestEval. Our new dataset, which contains 5.85M general-domain graph-text pairs, offers high graph-text consistency without relying on external ontologies. Experimental results demonstrate that PLMs fine-tuned on WikiOFGraph outperform those trained on other datasets across various evaluation metrics. Our method proves to be a scalable and effective solution for generating high-quality G2T data, significantly advancing the field of G2T generation.


Analyzing the Impact of Electric Vehicles on Local Energy Systems using Digital Twins

arXiv.org Artificial Intelligence

The electrification of the transportation and heating sectors, the so-called sector coupling, is one of the core elements to achieve independence from fossil fuels. As it highly affects the electricity demand, especially on the local level, the integrated modeling and simulation of all sectors is a promising approach for analyzing design decisions or complex control strategies. This paper analyzes the increase in electricity demand resulting from sector coupling, mainly due to integrating electric vehicles into urban energy systems. Therefore, we utilize a digital twin of an existing local energy system and extend it with a mobility simulation model to evaluate the impact of electric vehicles on the distribution grid level. Our findings indicate a significant rise in annual electricity consumption attributed to electric vehicles, with home charging alone resulting in a 78% increase. However, we demonstrate that integrating photovoltaic and battery energy storage systems can effectively mitigate this rise.


Improve Load Forecasting in Energy Communities through Transfer Learning using Open-Access Synthetic Profiles

arXiv.org Artificial Intelligence

According to a conservative estimate, a 1% reduction in forecast error for a 10 GW energy utility can save up to $1.6 million annually. In our context, achieving precise forecasts of future power consumption is crucial for operating flexible energy assets using model predictive control approaches. Specifically, this work focuses on the load profile forecast of a first-year energy community with the common practical challenge of limited historical data availability. We propose to pre-train the load prediction models with open-access synthetic load profiles using transfer learning techniques to tackle this challenge. Results show that this approach improves both the training stability and the prediction error. In a test case with 74 households, the prediction mean squared error (MSE) decreased from 0.34 to 0.13, showing transfer learning based on synthetic load profiles to be a viable approach to compensate for a lack of historical data.


Code Generation for Machine Learning using Model-Driven Engineering and SysML

arXiv.org Artificial Intelligence

Data-driven engineering refers to systematic data collection and processing using machine learning to improve engineering systems. Currently, the implementation of data-driven engineering relies on fundamental data science and software engineering skills. At the same time, model-based engineering is gaining relevance for the engineering of complex systems. In previous work, a model-based engineering approach integrating the formalization of machine learning tasks using the general-purpose modeling language SysML was presented. However, formalized machine learning tasks still require implementation in a specialized programming language like Python. Therefore, this work aims to facilitate the implementation of data-driven engineering in practice by extending the previous work of formalizing machine learning tasks by integrating model transformation to generate executable code. The method focuses on the modifiability and maintainability of the model transformation so that extensions and changes to the code generation can be integrated without requiring modifications to the code generator. The presented method is evaluated for feasibility in a case study to predict weather forecasts. Based thereon, quality attributes of model transformations are assessed and discussed. Results demonstrate the flexibility and the simplicity of the method, reducing efforts for implementation. Further, the work builds a theoretical basis for standardizing data-driven engineering implementation in practice.


Pay More Attention to Relation Exploration for Knowledge Base Question Answering

arXiv.org Artificial Intelligence

Knowledge base question answering (KBQA) is a challenging task that aims to retrieve correct answers from large-scale knowledge bases. Existing attempts primarily focus on entity representation and final answer reasoning, which results in limited supervision for this task. Moreover, the relations, which empirically determine the reasoning path selection, are not fully considered in recent advancements. In this study, we propose a novel framework, RE-KBQA, that utilizes relations in the knowledge base to enhance entity representation and introduce additional supervision. We explore guidance from relations in three aspects, including (1) distinguishing similar entities by employing a variational graph auto-encoder to learn relation importance; (2) exploring extra supervision by predicting relation distributions as soft labels with a multi-task scheme; (3) designing a relation-guided re-ranking algorithm for post-processing. Experimental results on two benchmark datasets demonstrate the effectiveness and superiority of our framework, improving the F1 score by 5.7% from 40.5 to 46.3 on CWQ and 5.8% from 62.8 to 68.5 on WebQSP, better than or on par with state-of-the-art methods.


Deep importance sampling using tensor trains with application to a priori and a posteriori rare event estimation

arXiv.org Artificial Intelligence

We propose a deep importance sampling method that is suitable for estimating rare event probabilities in high-dimensional problems. We approximate the optimal importance distribution in a general importance sampling problem as the pushforward of a reference distribution under a composition of order-preserving transformations, in which each transformation is formed by a squared tensor-train decomposition. The squared tensor-train decomposition provides a scalable ansatz for building order-preserving high-dimensional transformations via density approximations. The use of a composition of maps moving along a sequence of bridging densities alleviates the difficulty of directly approximating concentrated density functions. To compute expectations over unnormalized probability distributions, we design a ratio estimator that estimates the normalizing constant using a separate importance distribution, again constructed via a composition of transformations in tensor-train format. This offers better theoretical variance reduction compared with self-normalized importance sampling, and thus opens the door to efficient computation of rare event probabilities in Bayesian inference problems. Numerical experiments on problems constrained by differential equations show little to no increase in the computational complexity with the event probability going to zero, and allow the computation of hitherto unattainable estimates of rare event probabilities for complex, high-dimensional posterior densities.
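For readers unfamiliar with importance sampling for rare events, the basic idea can be sketched on a one-dimensional toy problem. This is plain importance sampling with a hand-picked shifted Gaussian proposal, not the tensor-train construction or the ratio estimator proposed in the paper; the function names and the choice of proposal are assumptions made purely for illustration.

```python
import math

import numpy as np


def rare_event_is(threshold, n_samples, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance sampling.

    Proposal (assumed for this toy example): N(threshold, 1), which places
    about half of the samples inside the rare event region.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=threshold, scale=1.0, size=n_samples)
    # Log density ratio of target N(0,1) to proposal N(threshold,1):
    # -x^2/2 + (x - threshold)^2/2 = -threshold*x + threshold^2/2
    log_w = -threshold * x + 0.5 * threshold**2
    return float(np.mean((x > threshold) * np.exp(log_w)))


def exact_tail(threshold):
    """Exact standard normal tail probability via the complementary error function."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


estimate = rare_event_is(4.0, n_samples=100_000)
print("importance sampling estimate:", estimate)
print("exact tail probability:      ", exact_tail(4.0))
```

Sampling directly from N(0, 1) would on average place only about three of these 100,000 draws beyond the threshold, which is why a proposal concentrated on the event region is needed.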