Prompt Baking
Bhargava, Aman, Witkowski, Cameron, Detkov, Alexander, Thomson, Matt
Two primary ways to change LLM behavior are prompting and weight updates (e.g., fine-tuning). Prompting LLMs is simple and effective, specifying the desired changes explicitly in natural language, whereas weight updates provide more expressive and permanent behavior changes, specified implicitly via training on large datasets. We present a technique for "baking" prompts into the weights of an LLM. Prompt Baking converts a prompt $u$ and initial weights $\theta$ to a new set of weights $\theta_u$ such that the new "baked" LLM behaves like the original prompted LLM. Mathematically, we minimize the KL divergence between $P_\theta(\cdot | u)$ and $P_{\theta_u}(\cdot)$, where $P$ is the LLM's probability distribution over token sequences. Across all our experiments, we find prompts can be readily baked into weight updates. Baking chain-of-thought prompts improves zero-shot performance on GSM8K, ASDiv, MBPP, ARC-Easy, ARC-Challenge, and CommonsenseQA benchmarks. Baking news headlines directly updates an LLM's knowledge. And baking instructions and personas alleviates "prompt forgetting" over long sequences. Furthermore, stopping baking early creates "half-baked" models, continuously scaling prompt strength. Baked models retain their sensitivity to further prompting and baking, including re-prompting with the baked-in prompt. Surprisingly, the re-prompted models yield further performance gains in instruction following, as well as math reasoning and coding benchmarks. Taking re-prompting and re-baking to the limit yields a form of iterative self-improvement we call Prompt Pursuit, and preliminary results on instruction following exhibit dramatic performance gains. Finally, we discuss implications for AI safety, continuous model updating, enhancing real-time learning capabilities in LLM-based agents, and generating more stable AI personas.
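The objective stated in the abstract can be written compactly as follows. This is a sketch under two assumptions not given in the abstract: the dummy variable $\theta'$ is introduced here for illustration, and the direction of the divergence (prompted model first) is assumed.

$$
\theta_u = \arg\min_{\theta'} \; D_{\mathrm{KL}}\!\left( P_\theta(\cdot \mid u) \,\|\, P_{\theta'}(\cdot) \right)
$$

Read this way, the prompted model $P_\theta(\cdot \mid u)$ acts as a fixed target, and only the unprompted model's weights $\theta'$ are adjusted.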
Clustering of Indonesian and Western Gamelan Orchestras through Machine Learning of Performance Parameters
Linke, Simon, Wendt, Gerrit, Bader, Rolf
Indonesian and Western gamelan ensembles are investigated with respect to performance differences. Thereby, the often exotistic history of this music in the West might be reflected in contemporary tonal system, articulation, or large-scale form differences. Analyzing recordings of four Western and five Indonesian orchestras with respect to tonal systems and timbre features and using a self-organizing Kohonen map (SOM) as a machine learning algorithm, a clear clustering between Indonesian and Western ensembles appears using certain psychoacoustic features. These point to a reduced articulation and large-scale form variability of Western ensembles compared to Indonesian ones. The SOM also clusters the ensembles with respect to their tonal systems, but no clusters between Indonesian and Western ensembles can be found in this respect. A clear analogy therefore appears between lower variability in articulation and large-scale form and a more exotistic, meditative, and calm performance expectation and reception of gamelan in the West.
MAPWise: Evaluating Vision-Language Models for Advanced Map Queries
Mukhopadhyay, Srija, Rajgaria, Abhishek, Khatiwada, Prerana, Gupta, Vivek, Roth, Dan
Vision-language models (VLMs) excel at tasks requiring joint understanding of visual and linguistic information. A particularly promising yet under-explored application for these models lies in answering questions based on various kinds of maps. This study investigates the efficacy of VLMs in answering questions based on choropleth maps, which are widely used for data analysis and representation. To facilitate and encourage research in this area, we introduce a novel map-based question-answering benchmark, consisting of maps from three geographical regions (United States, India, China), each containing 1000 questions. Our benchmark incorporates 43 diverse question templates, requiring nuanced understanding of relative spatial relationships, intricate map features, and complex reasoning. It also includes maps with discrete and continuous values, encompassing variations in color-mapping, category ordering, and stylistic patterns, enabling comprehensive analysis. We evaluate the performance of multiple VLMs on this benchmark, highlighting gaps in their abilities and providing insights for improving such models.
Hierarchical Blockmodelling for Knowledge Graphs
Pietrasik, Marcin, Reformat, Marek, Wilbik, Anna
In this paper, we investigate the use of probabilistic graphical models, specifically stochastic blockmodels, for the purpose of hierarchical entity clustering on knowledge graphs. These models, seldom used in the Semantic Web community, decompose a graph into a set of probability distributions. The parameters of these distributions are then inferred allowing for their subsequent sampling to generate a random graph. In a non-parametric setting, this allows for the induction of hierarchical clusterings without prior constraints on the hierarchy's structure. Specifically, this is achieved by the integration of the Nested Chinese Restaurant Process and the Stick Breaking Process into the generative model. In this regard, we propose a model leveraging such integration and derive a collapsed Gibbs sampling scheme for its inference. To aid in understanding, we describe the steps in this derivation and provide an implementation for the sampler. We evaluate our model on synthetic and real-world datasets and quantitatively compare against benchmark models. We further evaluate our results qualitatively and find that our model is capable of inducing coherent cluster hierarchies in small scale settings. The work presented in this paper provides the first step for the further application of stochastic blockmodels for knowledge graphs on a larger scale. We conclude the paper with potential avenues for future work on more scalable inference schemes.
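As an illustration of the Stick Breaking Process named in the abstract above, here is a minimal sketch of the standard construction, in which each break fraction is drawn from Beta(1, alpha). It is not the authors' sampler or their generative model; the function name, the parameters `alpha`, `num_sticks`, and `seed` are all hypothetical.

```python
import random


def stick_breaking_weights(alpha, num_sticks, seed=0):
    """Break a unit-length stick num_sticks times (illustrative helper).

    Each step draws a fraction from Beta(1, alpha) and breaks that
    fraction off the remaining stick. Returns the broken-off pieces
    and the length of stick that is still left over.
    """
    rng = random.Random(seed)
    remaining = 1.0
    weights = []
    for _ in range(num_sticks):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


weights, leftover = stick_breaking_weights(alpha=2.0, num_sticks=10)
print(weights, leftover)
```

Larger values of `alpha` tend to produce smaller break fractions, so the mass is spread over more pieces; the pieces and the leftover always sum to one.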
Improving Water Quality Time-Series Prediction in Hong Kong using Sentinel-2 MSI Data and Google Earth Engine Cloud Computing
Effective water quality monitoring in coastal regions is crucial due to the progressive deterioration caused by pollution and human activities. To address this, this study develops time-series models to predict chlorophyll-a (Chl-a), suspended solids (SS), and turbidity using Sentinel-2 satellite data and Google Earth Engine (GEE) in the coastal regions of Hong Kong. Leveraging Long Short-Term Memory (LSTM) Recurrent Neural Networks, the study incorporates extensive temporal datasets to enhance prediction accuracy. The models utilize spectral data from Sentinel-2, focusing on optically active components, and demonstrate that selected variables closely align with the spectral characteristics of Chl-a and SS. The results indicate improved predictive performance over previous methods, highlighting the potential for remote sensing technology in continuous and comprehensive water quality assessment.
Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience
He, Zhonghao, Achterberg, Jascha, Collins, Katie, Nejad, Kevin, Akarca, Danyal, Yang, Yinzhu, Gurnee, Wes, Sucholutsky, Ilia, Tang, Yuhan, Ianov, Rebeca, Ogden, George, Li, Chole, Sandbrink, Kai, Casper, Stephen, Ivanova, Anna, Lindsay, Grace W.
Interpretability research aims to provide a human-understandable explanation for model outputs and behaviors based on the input and model's internal structure [Doshi-Velez and Kim, 2017]. The field's goal is to generate mechanistic explanations of how neural networks perform computations and produce behaviors [Nanda et al., 2023, Olsson et al., 2022], which could help predict the behavior of such networks across a wide range of scenarios and possibly solve notable problems of AI systems, such as hallucination and toxic output [Ji et al., 2023]. Being able to interpret AI systems is therefore a key capability for understanding whether models are appropriately fair, reliable, robust, and worthy of user trust [Doshi-Velez and Kim, 2017]. However, understanding the computations of frontier AI systems with hundreds of billions of parameters presents many technical challenges, from the curse of dimensionality [Zhao et al., 2024, Altman and Krzywinski, 2018] to finding a suitable unit of analysis [Olah et al., 2020, Zou et al., 2023]. These challenges are par for the course when studying complex systems. In particular, many challenges around artificial neural network (ANN) interpretability are intimately familiar to another group of researchers: neuroscientists. Neuroscience (often in partnership with cognitive science and psychology) investigates how neurons, their connections, and their activity patterns give rise to cognition and behavior. As deep learning researchers have also recognized, neuroscientists have realized that simply examining activity profiles of individual neurons in response to a particular input is often insufficient for understanding how the system performs computation. Instead, complex neural systems are best understood across multiple levels of analysis, considering behavior alongside the brain's connectome, population codes, and codes of single neurons to gain a holistic understanding of the inner workings of the brain.
Joint Hypergraph Rewiring and Memory-Augmented Forecasting Techniques in Digital Twin Technology
Sakhinana, Sagar Srinivas, Aripirala, Krishna Sai Sudhir, Gupta, Shivam, Runkana, Venkataramana
Digital Twin technology creates virtual replicas of physical objects, processes, or systems by replicating their properties, data, and behaviors. This advanced technology offers a range of intelligent functionalities, such as modeling, simulation, and data-driven decision-making, that facilitate design optimization, performance estimation, and monitoring operations. Forecasting plays a pivotal role in Digital Twin technology, as it enables the prediction of future outcomes, supports informed decision-making, minimizes risks, and drives improvements in efficiency, productivity, and cost reduction. Recently, Digital Twin technology has leveraged Graph forecasting techniques in large-scale complex sensor networks to enable accurate forecasting and simulation of diverse scenarios, fostering proactive and data-driven decision making. However, existing Graph forecasting techniques lack scalability for many real-world applications. They have limited ability to adapt to non-stationary environments and retain past knowledge, and they lack mechanisms to capture the higher-order spatio-temporal dynamics and to estimate uncertainty in model predictions. To surmount these challenges, we introduce a hybrid architecture that enhances the hypergraph representation learning backbone by incorporating fast adaptation to new patterns and memory-based retrieval of past knowledge. This balance aims to improve the slowly-learned backbone and achieve better performance in adapting to recent changes. In addition, it models the time-varying uncertainty of multi-horizon forecasts, providing estimates of prediction uncertainty. Our forecasting architecture has been validated through ablation studies and has demonstrated promising results across multiple benchmark datasets, surpassing state-of-the-art forecasting methods by a significant margin.
Simplified Mamba with Disentangled Dependency Encoding for Long-Term Time Series Forecasting
Weng, Zixuan, Han, Jindong, Jiang, Wenzhao, Liu, Hao
Recently many deep learning models have been proposed for Long-term Time Series Forecasting (LTSF). Based on previous literature, we identify three critical patterns that can improve forecasting accuracy: the order and semantic dependencies in time dimension as well as cross-variate dependency. However, little effort has been made to simultaneously consider order and semantic dependencies when developing forecasting models. Moreover, existing approaches utilize cross-variate dependency by mixing information from different timestamps and variates, which may introduce irrelevant or harmful cross-variate information to the time dimension and largely hinder forecasting performance. To overcome these limitations, we investigate the potential of Mamba for LTSF and discover two key advantages benefiting forecasting: (i) the selection mechanism makes Mamba focus on or ignore specific inputs and learn semantic dependency easily, and (ii) Mamba preserves order dependency by processing sequences recursively. After that, we empirically find that the non-linear activation used in Mamba is unnecessary for semantically sparse time series data. Therefore, we further propose SAMBA, a Simplified Mamba with disentangled dependency encoding. Specifically, we first remove the non-linearities of Mamba to make it more suitable for LTSF. Furthermore, we propose a disentangled dependency encoding strategy to endow Mamba with cross-variate dependency modeling capabilities while reducing the interference between time and variate dimensions. Extensive experimental results on seven real-world datasets demonstrate the effectiveness of SAMBA over state-of-the-art forecasting models.
A Constraint Programming Approach to Fair High School Course Scheduling
Kiyohara, Mitsuka, Ishihata, Masakazu
Issues of inequity in U.S. high schools' course scheduling did not previously exist. However, in recent years, with the increase in student population and course variety, students perceive that the course scheduling method is unfair. Current integer programming (IP) methods for the high school scheduling problem (HSSP) fall short in addressing these fairness concerns. The purpose of this research is to develop a solution methodology that generates feasible and fair course schedules using student preferences. Utilizing principles of fairness, which have been well studied in market design, we define the fair high school scheduling problem (FHSSP), a novel extension to the HSSP, and devise a corresponding algorithm based on integer programming to solve the FHSSP. We test our approach on a real course request dataset from a high school in California, USA. Results show that our algorithm can generate schedules that are both feasible and fair. In this paper, we demonstrate that our IP algorithm not only solves the HSSP and FHSSP in the United States but has the potential to be applied to various real-world scheduling problems. Additionally, we show the feasibility of integrating human emotions into mathematical modeling.
Sliding Window Training -- Utilizing Historical Recommender Systems Data for Foundation Models
Joshi, Swanand, Feng, Yesu, Hsiao, Ko-Jen, Zhang, Zhe, Lamkhede, Sudarshan
Long-lived recommender systems (RecSys) often encounter lengthy user-item interaction histories that span many years. To effectively learn long term user preferences, Large RecSys foundation models (FM) need to encode this information in pretraining. Usually, this is done by either generating a long enough sequence length to take all history sequences as input at the cost of large model input dimension or by dropping some parts of the user history to accommodate model size and latency requirements on the production
Oftentimes in industrial applications, foundation models that have inference time restrictions on serving memory footprint cannot exceed a certain input dimension and model size. This constraint raises a question on how to most effectively utilize a large-scale interaction corpus [1]. The most straightforward way is to truncate historical interactions. This simplification, however, comes at the cost of not using valuable information about user journeys and their rich history of interactions during model training [5].
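To make the trade-off concrete, the sketch below contrasts truncation with a generic sliding-window split of one user's history. The excerpt does not describe the authors' actual procedure, so this is only an illustration under assumptions: the function names, the fixed window length `max_len`, and the `stride` parameter are all hypothetical.

```python
def truncate_history(history, max_len):
    """Baseline: keep only the most recent max_len interactions (max_len >= 1)."""
    return history[-max_len:]


def sliding_windows(history, max_len, stride):
    """Split a history into contiguous windows of at most max_len items.

    Windows start every `stride` items; a final window is added if needed
    so that the most recent interactions are always covered.
    """
    if len(history) <= max_len:
        return [list(history)]
    last_start = len(history) - max_len
    starts = list(range(0, last_start + 1, stride))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [history[s:s + max_len] for s in starts]


history = list(range(10))  # item ids, oldest first (toy data)
print(truncate_history(history, 4))    # → [6, 7, 8, 9]
print(sliding_windows(history, 4, 3))  # → [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Truncation exposes the model only to the last four items, whereas the windowed split lets every interaction appear in at least one training example while keeping the input dimension fixed at `max_len`.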