AITopics | Pacific Ocean

Pacific Ocean

Prompt Baking

Bhargava, Aman, Witkowski, Cameron, Detkov, Alexander, Thomson, Matt

arXiv.org Artificial Intelligence Sep-4-2024

Two primary ways to change LLM behavior are prompting and weight updates (e.g., fine-tuning). Prompting LLMs is simple and effective, specifying the desired changes explicitly in natural language, whereas weight updates provide more expressive and permanent behavior changes, specified implicitly via training on large datasets. We present a technique for "baking" prompts into the weights of an LLM. Prompt Baking converts a prompt $u$ and initial weights $\theta$ to a new set of weights $\theta_u$ such that the new "baked" LLM behaves like the original prompted LLM. Mathematically, we minimize the KL divergence between $P_\theta(\cdot | u)$ and $P_{\theta_u}(\cdot)$, where $P$ is the LLM's probability distribution over token sequences. Across all our experiments, we find prompts can be readily baked into weight updates. Baking chain-of-thought prompts improves zero-shot performance on GSM8K, ASDiv, MBPP, ARC-Easy, ARC-Challenge, and CommonsenseQA benchmarks. Baking news headlines directly updates an LLM's knowledge. And baking instructions and personas alleviates "prompt forgetting" over long sequences. Furthermore, stopping baking early creates "half-baked" models, continuously scaling prompt strength. Baked models retain their sensitivity to further prompting and baking, including re-prompting with the baked-in prompt. Surprisingly, the re-prompted models yield further performance gains in instruction following, as well as math reasoning and coding benchmarks. Taking re-prompting and re-baking to the limit yields a form of iterative self-improvement we call Prompt Pursuit, and preliminary results on instruction following exhibit dramatic performance gains. Finally, we discuss implications for AI safety, continuous model updating, enhancing real-time learning capabilities in LLM-based agents, and generating more stable AI personas.
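The KL objective in the abstract above can be illustrated on a toy case. The sketch below is not the authors' implementation: it uses a single next-token distribution rather than full token sequences, and the three-token vocabulary, the probability values, and the helper name `kl_divergence` are all hypothetical. It only shows that the divergence from the prompted original shrinks as the unprompted "baked" distribution moves toward it.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token probabilities over a three-token vocabulary.
prompted_original = [0.7, 0.2, 0.1]  # stands in for P_theta(. | u)
unbaked = [0.3, 0.4, 0.3]            # original model with no prompt
half_baked = [0.5, 0.3, 0.2]         # baking stopped early (assumed values)
fully_baked = [0.7, 0.2, 0.1]        # stands in for P_theta_u(.) after baking

for name, q in [("unbaked", unbaked),
                ("half-baked", half_baked),
                ("fully baked", fully_baked)]:
    print(f"{name}: KL = {kl_divergence(prompted_original, q):.4f}")
```

In this toy setting the "half-baked" distribution sits between the unprompted and prompted behavior, which mirrors the abstract's description of early stopping as a continuous control on prompt strength.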

final answer, language model, pavel durov, (15 more...)

2409.13697

Country:

Europe > France (0.29)
North America > Canada > Ontario > Toronto (0.14)
Asia > Japan (0.05)
(7 more...)

Genre: Research Report (0.84)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Clustering of Indonesian and Western Gamelan Orchestras through Machine Learning of Performance Parameters

Linke, Simon, Wendt, Gerrit, Bader, Rolf

arXiv.org Artificial Intelligence Sep-3-2024

Indonesian and Western gamelan ensembles are investigated with respect to performance differences. Thereby, the often exoticist history of this music in the West might be reflected in contemporary tonal system, articulation, or large-scale form differences. Analyzing recordings of four Western and five Indonesian orchestras with respect to tonal systems and timbre features and using a self-organizing Kohonen map (SOM) as a machine learning algorithm, a clear clustering between Indonesian and Western ensembles appears using certain psychoacoustic features. These point to a reduced articulation and large-scale form variability of Western ensembles compared to Indonesian ones. The SOM also clusters the ensembles with respect to their tonal systems, but no clusters between Indonesian and Western ensembles can be found in this respect. Therefore, a clear analogy appears between lower variability in articulation and large-scale form and a more exoticist, meditative, and calm performance expectation and reception of gamelan in the West.

ensemble, instrument, music, (15 more...)

2409.03713

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Indonesia > Bali > Denpasar (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(27 more...)

Genre: Research Report (0.64)

Industry:

Media > Music (1.00)
Media > Film (1.00)
Leisure & Entertainment (1.00)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

MAPWise: Evaluating Vision-Language Models for Advanced Map Queries

Mukhopadhyay, Srija, Rajgaria, Abhishek, Khatiwada, Prerana, Gupta, Vivek, Roth, Dan

arXiv.org Artificial Intelligence Aug-30-2024

Vision-language models (VLMs) excel at tasks requiring joint understanding of visual and linguistic information. A particularly promising yet under-explored application for these models lies in answering questions based on various kinds of maps. This study investigates the efficacy of VLMs in answering questions based on choropleth maps, which are widely used for data analysis and representation. To facilitate and encourage research in this area, we introduce a novel map-based question-answering benchmark, consisting of maps from three geographical regions (United States, India, China), each containing 1000 questions. Our benchmark incorporates 43 diverse question templates, requiring nuanced understanding of relative spatial relationships, intricate map features, and complex reasoning. It also includes maps with discrete and continuous values, encompassing variations in color-mapping, category ordering, and stylistic patterns, enabling comprehensive analysis. We evaluate the performance of multiple VLMs on this benchmark, highlighting gaps in their abilities and providing insights for improving such models.

acc, binary single, gemini 1, (13 more...)

2409.00255

Country:

Asia > China (0.26)
North America > United States > California (0.04)
North America > United States > Oregon (0.04)
(16 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Hierarchical Blockmodelling for Knowledge Graphs

Pietrasik, Marcin, Reformat, Marek, Wilbik, Anna

arXiv.org Artificial Intelligence Aug-28-2024

In this paper, we investigate the use of probabilistic graphical models, specifically stochastic blockmodels, for the purpose of hierarchical entity clustering on knowledge graphs. These models, seldom used in the Semantic Web community, decompose a graph into a set of probability distributions. The parameters of these distributions are then inferred, allowing for their subsequent sampling to generate a random graph. In a non-parametric setting, this allows for the induction of hierarchical clusterings without prior constraints on the hierarchy's structure. Specifically, this is achieved by the integration of the Nested Chinese Restaurant Process and the Stick Breaking Process into the generative model. In this regard, we propose a model leveraging such integration and derive a collapsed Gibbs sampling scheme for its inference. To aid in understanding, we describe the steps in this derivation and provide an implementation for the sampler. We evaluate our model on synthetic and real-world datasets and quantitatively compare against benchmark models. We further evaluate our results qualitatively and find that our model is capable of inducing coherent cluster hierarchies in small-scale settings. The work presented in this paper provides the first step for the further application of stochastic blockmodels for knowledge graphs on a larger scale. We conclude the paper with potential avenues for future work on more scalable inference schemes.
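One ingredient named in the abstract above, the stick-breaking process, can be sketched in a few lines. This is only the textbook construction (each step breaks off a Beta(1, alpha) fraction of the remaining stick); it does not reproduce the paper's model, its Nested Chinese Restaurant Process component, or its Gibbs sampler. The function name `stick_breaking`, the concentration value, and the truncation at ten sticks are assumptions made for illustration.

```python
import random

def stick_breaking(alpha, num_sticks, rng):
    """Return the first `num_sticks` stick-breaking weights and the leftover mass.

    Each step breaks off a Beta(1, alpha) fraction of whatever stick remains,
    so the weights are non-negative and sum to at most 1.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_sticks):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining

rng = random.Random(0)  # fixed seed so repeated runs give the same draw
weights, leftover = stick_breaking(alpha=2.0, num_sticks=10, rng=rng)
print([round(w, 3) for w in weights], round(leftover, 3))
```

Smaller values of `alpha` concentrate the mass on the first few sticks, while larger values spread it over many; this is what lets a non-parametric model leave the number of clusters open.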

hierarchy, knowledge graph, pqr, (11 more...)

2408.15649

Country:

North America > United States (0.28)
Pacific Ocean (0.04)
Europe > Netherlands > Limburg > Maastricht (0.04)
(28 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Government (0.67)
Consumer Products & Services > Restaurants (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(2 more...)

Improving Water Quality Time-Series Prediction in Hong Kong using Sentinel-2 MSI Data and Google Earth Engine Cloud Computing

Sood, Rohin, Zhu, Kevin

arXiv.org Artificial Intelligence Aug-27-2024

Effective water quality monitoring in coastal regions is crucial due to the progressive deterioration caused by pollution and human activities. To address this, this study develops time-series models to predict chlorophyll-a (Chl-a), suspended solids (SS), and turbidity using Sentinel-2 satellite data and Google Earth Engine (GEE) in the coastal regions of Hong Kong. Leveraging Long Short-Term Memory (LSTM) Recurrent Neural Networks, the study incorporates extensive temporal datasets to enhance prediction accuracy. The models utilize spectral data from Sentinel-2, focusing on optically active components, and demonstrate that selected variables closely align with the spectral characteristics of Chl-a and SS. The results indicate improved predictive performance over previous methods, highlighting the potential for remote sensing technology in continuous and comprehensive water quality assessment.

hong kong, remote sensing, water quality parameter, (11 more...)

2408.1401

Country:

Asia > China > Hong Kong (0.64)
Pacific Ocean > North Pacific Ocean > South China Sea (0.04)
North America > United States > Wisconsin (0.04)
(4 more...)

Genre: Research Report (0.83)

Industry:

Energy (0.95)
Water & Waste Management > Water Management > Water Supplies & Services (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience

He, Zhonghao, Achterberg, Jascha, Collins, Katie, Nejad, Kevin, Akarca, Danyal, Yang, Yinzhu, Gurnee, Wes, Sucholutsky, Ilia, Tang, Yuhan, Ianov, Rebeca, Ogden, George, Li, Chole, Sandbrink, Kai, Casper, Stephen, Ivanova, Anna, Lindsay, Grace W.

arXiv.org Artificial Intelligence Aug-25-2024

Interpretability research aims to provide a human-understandable explanation for model outputs and behaviors based on the input and model's internal structure [Doshi-Velez and Kim, 2017]. The field's goal is to generate mechanistic explanations of how neural networks perform computations and produce behaviors [Nanda et al., 2023, Olsson et al., 2022], which could help predict the behavior of such networks across a wide range of scenarios and possibly solve notable problems of AI systems, such as hallucination and toxic output [Ji et al., 2023]. Being able to interpret AI systems is therefore a key capability for understanding whether models are appropriately fair, reliable, robust, and worthy of user trust [Doshi-Velez and Kim, 2017]. However, understanding the computations of frontier AI systems with hundreds of billions of parameters presents many technical challenges, from the curse of dimensionality [Zhao et al., 2024, Altman and Krzywinski, 2018] to finding a suitable unit of analysis [Olah et al., 2020, Zou et al., 2023]. These challenges are par for the course when studying complex systems. In particular, many challenges around artificial neural network (ANN) interpretability are intimately familiar to another group of researchers: neuroscientists. Neuroscience (often in partnership with cognitive science and psychology) investigates how neurons, their connections, and their activity patterns give rise to cognition and behavior. Like deep learning researchers, neuroscientists have realized that simply examining activity profiles of individual neurons in response to a particular input is often insufficient for understanding how the system performs computation. Instead, complex neural systems are best understood across multiple levels of analysis, considering behavior alongside the brain's connectome, population codes, and codes of single neurons to gain a holistic understanding of the inner workings of the brain.

arxiv preprint arxiv, neuroscience, representation, (11 more...)

2408.12664

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
(10 more...)

Genre:

Research Report > New Finding (0.46)
Instructional Material > Course Syllabus & Notes (0.45)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Joint Hypergraph Rewiring and Memory-Augmented Forecasting Techniques in Digital Twin Technology

Sakhinana, Sagar Srinivas, Aripirala, Krishna Sai Sudhir, Gupta, Shivam, Runkana, Venkataramana

arXiv.org Artificial Intelligence Aug-22-2024

Digital Twin technology creates virtual replicas of physical objects, processes, or systems by replicating their properties, data, and behaviors. This advanced technology offers a range of intelligent functionalities, such as modeling, simulation, and data-driven decision-making, that facilitate design optimization, performance estimation, and monitoring operations. Forecasting plays a pivotal role in Digital Twin technology, as it enables the prediction of future outcomes, supports informed decision-making, and minimizes risks, driving improvements in efficiency, productivity, and cost reduction. Recently, Digital Twin technology has leveraged Graph forecasting techniques in large-scale complex sensor networks to enable accurate forecasting and simulation of diverse scenarios, fostering proactive and data-driven decision-making. However, existing Graph forecasting techniques lack scalability for many real-world applications. They have limited ability to adapt to non-stationary environments and retain past knowledge, and they lack mechanisms to capture the higher-order spatio-temporal dynamics and to estimate uncertainty in model predictions. To surmount these challenges, we introduce a hybrid architecture that enhances the hypergraph representation learning backbone by incorporating fast adaptation to new patterns and memory-based retrieval of past knowledge. This balance aims to improve the slowly-learned backbone and achieve better performance in adapting to recent changes. In addition, it models the time-varying uncertainty of multi-horizon forecasts, providing estimates of prediction uncertainty. Our forecasting architecture has been validated through ablation studies and has demonstrated promising results across multiple benchmark datasets, surpassing state-of-the-art forecasting methods by a significant margin.

dataset, forecasting, mt data, (15 more...)

2408.12634

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
(2 more...)

Genre: Research Report > New Finding (0.92)

Industry:

Energy > Power Industry (1.00)
Information Technology (0.67)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Simplified Mamba with Disentangled Dependency Encoding for Long-Term Time Series Forecasting

Weng, Zixuan, Han, Jindong, Jiang, Wenzhao, Liu, Hao

arXiv.org Artificial Intelligence Aug-21-2024

Recently, many deep learning models have been proposed for Long-term Time Series Forecasting (LTSF). Based on previous literature, we identify three critical patterns that can improve forecasting accuracy: the order and semantic dependencies in the time dimension as well as cross-variate dependency. However, little effort has been made to simultaneously consider order and semantic dependencies when developing forecasting models. Moreover, existing approaches utilize cross-variate dependency by mixing information from different timestamps and variates, which may introduce irrelevant or harmful cross-variate information to the time dimension and largely hinder forecasting performance. To overcome these limitations, we investigate the potential of Mamba for LTSF and discover two key advantages benefiting forecasting: (i) the selection mechanism makes Mamba focus on or ignore specific inputs and learn semantic dependency easily, and (ii) Mamba preserves order dependency by processing sequences recursively. We then find empirically that the non-linear activation used in Mamba is unnecessary for semantically sparse time series data. Therefore, we further propose SAMBA, a Simplified Mamba with disentangled dependency encoding. Specifically, we first remove the non-linearities of Mamba to make it more suitable for LTSF. Furthermore, we propose a disentangled dependency encoding strategy to endow Mamba with cross-variate dependency modeling capabilities while reducing the interference between time and variate dimensions. Extensive experimental results on seven real-world datasets demonstrate the effectiveness of SAMBA over state-of-the-art forecasting models.

data mining, machine learning, natural language, (18 more...)

2408.12068

Country:

Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

A Constraint Programming Approach to Fair High School Course Scheduling

Kiyohara, Mitsuka, Ishihata, Masakazu

arXiv.org Artificial Intelligence Aug-21-2024

Issues of inequity in U.S. high schools' course scheduling did not previously exist. However, in recent years, with the increase in student population and course variety, students perceive that the course scheduling method is unfair. Current integer programming (IP) methods for the high school scheduling problem (HSSP) fall short in addressing these fairness concerns. The purpose of this research is to develop a solution methodology that generates feasible and fair course schedules using student preferences. Utilizing principles of fairness, which have been well studied in market design, we define the fair high school scheduling problem (FHSSP), a novel extension to the HSSP, and devise a corresponding algorithm based on integer programming to solve the FHSSP. We test our approach on a real course request dataset from a high school in California, USA. Results show that our algorithm can generate schedules that are both feasible and fair. In this paper, we demonstrate that our IP algorithm not only solves the HSSP and FHSSP in the United States but also has the potential to be applied to various real-world scheduling problems. Additionally, we show the feasibility of integrating human emotions into mathematical modeling.

algorithm, constraint, student, (12 more...)

2408.12032

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Singapore (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
(2 more...)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education > Educational Setting > K-12 Education > Secondary School (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.75)

Sliding Window Training -- Utilizing Historical Recommender Systems Data for Foundation Models

Joshi, Swanand, Feng, Yesu, Hsiao, Ko-Jen, Zhang, Zhe, Lamkhede, Sudarshan

arXiv.org Artificial Intelligence Aug-21-2024

Long-lived recommender systems (RecSys) often encounter lengthy user-item interaction histories that span many years. To effectively learn long term user preferences, Large RecSys foundation models (FM) need to encode this information in pretraining. Usually, this is done by either generating a long enough sequence length to take all history sequences as input at the cost of large model input dimension or by dropping some parts of the user history to accommodate model size and latency requirements on the production [...]

Oftentimes in industrial applications, foundation models that have inference time restrictions on serving memory footprint cannot exceed a certain input dimension and model size. This constraint raises a question on how to most effectively utilize a large-scale interaction corpus [1]. The most straightforward way is to truncate historical interactions. This simplification, however, comes at the cost of not using valuable information about user journeys and their rich history of interactions during model training [5].
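The contrast between truncating a user's history and sliding a window over it can be sketched as follows. The excerpt above does not describe the paper's actual windowing scheme, so this is only a generic illustration: the function names, the `max_len` and `stride` parameters, and the integer item IDs are all hypothetical.

```python
def truncate_history(history, max_len):
    """Keep only the most recent `max_len` interactions; older ones are dropped."""
    return history[-max_len:]

def sliding_windows(history, max_len, stride):
    """Yield windows of length `max_len`, starting every `stride` positions."""
    if len(history) <= max_len:
        yield list(history)
        return
    for start in range(0, len(history) - max_len + 1, stride):
        yield history[start:start + max_len]

history = list(range(10))  # hypothetical item IDs, oldest first
truncated = truncate_history(history, 4)
windows = list(sliding_windows(history, 4, 3))
print(truncated)
print(windows)
```

Both approaches respect the same input dimension, but truncation exposes the model to only the latest slice of each history, whereas windowing lets older interactions appear in some training example.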

foundation model, interaction, utilizing historical recommender system data, (10 more...)

doi: 10.1145/3640457.3688051

2409.14517

Country:

North America > United States > California > Santa Clara County > Los Gatos (0.06)
Europe > Italy > Apulia > Bari (0.06)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry: Information Technology (0.37)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
