Construction & Engineering
Bayesian Batch Active Learning as Sparse Subset Approximation
Robert Pinsler, Jonathan Gordon, Eric Nalisnick, José Miguel Hernández-Lobato
Leveraging the wealth of unlabeled data produced in recent years provides great potential for improving supervised models. When the cost of acquiring labels is high, probabilistic active learning methods can be used to greedily select the most informative data points to be labeled. However, for many large-scale problems standard greedy procedures become computationally infeasible and suffer from negligible model change. In this paper, we introduce a novel Bayesian batch active learning approach that mitigates these issues. Our approach is motivated by approximating the complete data posterior of the model parameters. While naive batch construction methods result in correlated queries, our algorithm produces diverse batches that enable efficient active learning at scale. We derive interpretable closed-form solutions akin to existing active learning procedures for linear models, and generalize to arbitrary models using random projections. We demonstrate the benefits of our approach on several large-scale regression and classification tasks.
Don't plug these 7 appliances (including AC units) into extension cords - here's why
Extension cords are generally a safe solution for running power to electronics that are too far from the nearest wall outlet. But the operative word here is "electronics," which is not as all-encompassing as some people might think. Appliances (like refrigerators and toaster ovens) are obviously electronic devices, but they're in a different class from most electronics because of the amperage they need to function. Extension cords are manufactured with a maximum current capacity, which is determined by the size, or gauge, of the wire used in the cord. For instance, a 16-gauge extension cord can handle a maximum of 13 amps, while a 14-gauge cord can handle up to 15 amps (or 1,800 watts), the same as a standard wall outlet in the US.
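The amp and watt figures above are related by power = current × voltage. The short Python sketch below works through that arithmetic. It assumes a nominal 120-volt US outlet (the voltage implied by 15 amps equaling 1,800 watts); the function names and the 1,700-watt appliance are hypothetical, chosen only for illustration.

```python
# Assumption: nominal US household voltage, implied by 15 A = 1,800 W.
US_OUTLET_VOLTS = 120

# Maximum current (amps) per wire gauge, using only the figures quoted above.
CORD_MAX_AMPS = {16: 13, 14: 15}


def max_watts(gauge, volts=US_OUTLET_VOLTS):
    """Largest load in watts a cord of this gauge can carry: amps * volts."""
    return CORD_MAX_AMPS[gauge] * volts


def cord_can_handle(gauge, appliance_watts, volts=US_OUTLET_VOLTS):
    """True if the appliance's draw stays within the cord's rating."""
    return appliance_watts <= max_watts(gauge, volts)


print(max_watts(14))               # 1800
print(max_watts(16))               # 1560
print(cord_can_handle(16, 1700))   # False: hypothetical 1,700 W appliance
print(cord_can_handle(14, 1700))   # True
```

Under these assumptions, a 16-gauge cord tops out at 1,560 watts, so an appliance drawing more than that would exceed its rating even though a wall outlet could supply it.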
BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics
Buildings play a crucial role in human well-being, influencing occupant comfort, health, and safety. Additionally, they contribute significantly to global energy consumption and carbon emissions, accounting for one-third of total energy usage. Optimizing building performance presents a vital opportunity to combat climate change and promote human flourishing. However, research in building analytics has been hampered by the lack of accessible, available, and comprehensive real-world datasets on multiple building operations. In this paper, we introduce the Building TimeSeries (BTS) dataset. Our dataset covers three buildings over a three-year period, comprising more than ten thousand timeseries data points with hundreds of unique classes. Moreover, the metadata is standardized using the Brick schema. To demonstrate the utility of this dataset, we performed benchmarks on the multi-label timeseries classification task. This task represents an essential initial step in addressing challenges related to interoperability in building analytics.
dataset release, tournament evaluation, architectural design, input representation, and other insights
We want to thank the reviewers for their helpful comments. Dataset Release: The dataset will be made available to any interested researchers. Tournament Evaluation: In this work, we conducted two forms of tournament settings. Architectural Design: We agree with R3 that there are a lot of non-trivial modeling choices in our architecture. We call the first one unit-based and the latter token-based.
Indoor Air Quality Dataset with Activities of Daily Living in Low to Middle-income Communities
In recent years, indoor air pollution has posed a significant threat to our society, claiming over 3.2 million lives annually. Developing nations, such as India, are most affected since a lack of knowledge, inadequate regulation, and outdoor air pollution lead to severe daily exposure to pollutants. However, only a limited number of studies have attempted to understand how indoor air pollution affects developing countries like India. To address this gap, we present spatiotemporal measurements of air quality from 30 indoor sites over six months during summer and winter seasons. The sites are spread across four regions of three types (rural, suburban, and urban), covering the typical low to middle-income population in India.
From News to Forecast: Integrating Event Analysis in LLM-Based Time Series Forecasting with Reflection
This paper introduces a novel approach that leverages Large Language Models (LLMs) and Generative Agents to enhance time series forecasting by reasoning across both text and time series data. With language as a medium, our method adaptively integrates social events into forecasting models, aligning news content with time series fluctuations to provide richer insights. Specifically, we utilize LLM-based agents to iteratively filter out irrelevant news and employ human-like reasoning to evaluate predictions. This enables the model to analyze complex events, such as unexpected incidents and shifts in social behavior, and continuously refine the selection logic of news and the robustness of the agent's output. By integrating selected news events with time series data, we fine-tune a pre-trained LLM to predict sequences of digits in time series. The results demonstrate significant improvements in forecasting accuracy, suggesting a potential paradigm shift in time series forecasting through the effective utilization of unstructured news data.
Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward
It has long been recognized that multi-agent reinforcement learning (MARL) faces significant scalability issues because the sizes of the state and action spaces are exponentially large in the number of agents. In this paper, we identify a rich class of networked MARL problems where the model exhibits a local dependence structure that allows it to be solved in a scalable manner. Specifically, we propose a Scalable Actor-Critic (SAC) method that can learn a near-optimal localized policy for optimizing the average reward with complexity scaling with the state-action space size of local neighborhoods, as opposed to the entire network. Our result centers around identifying and exploiting an exponential decay property that ensures the effect of agents on each other decays exponentially fast in their graph distance.
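To make the scaling argument concrete, the Python sketch below computes the set of agents within a fixed graph distance of one agent and compares the size of the joint state space for that neighborhood with the size for the whole network. It is only an illustration of local neighborhoods, not the paper's Scalable Actor-Critic method; the six-agent line graph, the three states per agent, and the function name are assumptions made for the example.

```python
from collections import deque


def k_hop_neighborhood(adjacency, agent, k):
    """Return the set of agents within graph distance k of `agent` (BFS)."""
    distance = {agent: 0}
    queue = deque([agent])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


# Hypothetical network: 6 agents on a line, 0 - 1 - 2 - 3 - 4 - 5.
n_agents = 6
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < n_agents]
        for i in range(n_agents)}

states_per_agent = 3  # assumption for illustration
local = k_hop_neighborhood(line, agent=2, k=1)

print(sorted(local))                    # [1, 2, 3]
print(states_per_agent ** len(local))   # 27 joint states in the neighborhood
print(states_per_agent ** n_agents)     # 729 joint states in the full network
```

In this toy setting the neighborhood's joint state count depends only on the neighborhood size, while the full network's count grows exponentially with the number of agents.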
Your all-in-one solution for website building, project management, and invoicing has arrived
TL;DR: Create a white label website, automate communication, and more with Sellful's AI, now 399 (reg. Calling all small business owners or entrepreneurs--are you juggling multiple roles in marketing, invoicing and sales, and/or customer outreach? You might be feeling stretched thin, but you don't need to rely on a third party to perform those tasks for you. Meet Sellful, an AI-powered platform designed to help small businesses and agencies streamline their operations, providing a single, organized space for you to manage daily growth tasks, automate outreach, and even create a website. Get lifetime access while it's available for only 399 (reg.