seattle
Vampire: The Masquerade Bloodlines 2 review – an interestingly toothless piece of noir fiction
'A 25-hour story that just about makes sense' Vampire: The Masquerade Bloodlines 2. You are an ancient and powerful vampire, and you wake up in the basement of some decrepit Seattle building, with no recent memories and a strange sigil on your hand. The first thing you do is feed on the cop who finds you, before smacking his partner into a wall so hard that his blood spatters the brick. A violent fanged rampage ensues, where you beat up and tear apart rival undead and their ghouls while currying the favour of the local court of vampires, and trying to keep your existence hidden from the mortal populace of this sultry city. But this is also a detective story: there's a younger night-stalker sharing your brain, a voice in your head named Fabian, who talks like a 1920s gumshoe (presumably because he once was one). Fabian isn't violent at all; he evidently works with the human police and the vampire underworld, snacking on consenting volunteers' blood and using his mind-delving powers to solve murders.
- North America > United States (0.15)
- Oceania > Australia (0.05)
- Europe > Ukraine (0.05)
- Leisure & Entertainment > Sports (0.98)
- Leisure & Entertainment > Games > Computer Games (0.72)
ToolCritic: Detecting and Correcting Tool-Use Errors in Dialogue Systems
Hamad, Hassan, Xu, Yingru, Zhao, Liang, Yan, Wenbo, Gyanchandani, Narendra
Tool-augmented large language models (LLMs) are increasingly employed in real-world applications, but tool usage errors still hinder their reliability. We introduce ToolCritic, a diagnostic framework that evaluates and improves LLM behavior in multi-turn, tool-augmented dialogues. ToolCritic detects eight distinct error types specific to tool-calling (e.g., premature invocation, argument misalignment, and misinterpretation of tool outputs) and provides targeted feedback to the main LLM. The main LLM, assumed to have strong reasoning, task understanding and orchestration capabilities, then revises its response based on ToolCritic's feedback. We systematically define these error categories and construct a synthetic dataset to train ToolCritic. Experimental results on the Schema-Guided Dialogue (SGD) dataset demonstrate that ToolCritic improves tool-calling accuracy by up to 13% over baselines, including zero-shot prompting and self-correction techniques. This represents a promising step toward more robust LLM integration with external tools in real-world dialogue applications.
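The critic-then-revise loop described in the ToolCritic abstract can be pictured with a minimal sketch. This is not the authors' implementation: the `Feedback` type, the `respond_with_critic` function, and the toy responder and critic below are all hypothetical stand-ins for the main LLM and the trained critic, and only one of the eight error types (premature invocation) is imitated.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Feedback:
    error_type: str  # e.g. "premature invocation"
    message: str     # targeted hint passed back to the main model


# Hypothetical interfaces: a responder drafts a reply (optionally given feedback),
# a critic returns Feedback when it spots an error and None otherwise.
Responder = Callable[[List[str], Optional[Feedback]], str]
Critic = Callable[[List[str], str], Optional[Feedback]]


def respond_with_critic(dialogue: List[str], responder: Responder,
                        critic: Critic, max_revisions: int = 1) -> str:
    """Draft a response, then revise it while the critic reports an error."""
    response = responder(dialogue, None)
    for _ in range(max_revisions):
        feedback = critic(dialogue, response)
        if feedback is None:
            break  # the critic found nothing to fix
        response = responder(dialogue, feedback)
    return response


def toy_responder(dialogue: List[str], feedback: Optional[Feedback]) -> str:
    # Stand-in for the main LLM: calls the tool too early, then asks for the slot.
    if feedback is None:
        return 'book_restaurant(name="Luigi")'
    return "What time would you like the table?"


def toy_critic(dialogue: List[str], response: str) -> Optional[Feedback]:
    # Stand-in for the critic: flags a booking call that lacks a time argument.
    if response.startswith("book_restaurant(") and "time=" not in response:
        return Feedback("premature invocation", "Ask for the booking time first.")
    return None


final = respond_with_critic(["User: Book me a table at Luigi."],
                            toy_responder, toy_critic)
```

In this toy run the first draft calls the tool without a time, the critic flags it, and the revised response asks the user for the missing detail instead.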
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Europe > Austria > Vienna (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Task Memory Engine: Spatial Memory for Robust Multi-Step LLM Agents
Large Language Models (LLMs) falter in multi-step interactions -- often hallucinating, repeating actions, or misinterpreting user corrections -- due to reliance on linear, unstructured context. This fragility stems from the lack of persistent memory to track evolving goals and task dependencies, undermining trust in autonomous agents. We introduce the Task Memory Engine (TME), a modular memory controller that transforms existing LLMs into robust, revision-aware agents without fine-tuning. TME implements a spatial memory framework that replaces flat context with graph-based structures to support consistent, multi-turn reasoning. Departing from linear concatenation and ReAct-style prompting, TME builds a dynamic task graph -- either a tree or directed acyclic graph (DAG) -- to map user inputs to subtasks, align them with prior context, and enable dependency-tracked revisions. Its Task Representation and Intent Management (TRIM) component models task semantics and user intent to ensure accurate interpretation. Across four multi-turn scenarios -- trip planning, cooking, meeting scheduling, and shopping cart editing -- TME eliminates 100% of hallucinations and misinterpretations in three tasks, and reduces hallucinations by 66.7% and misinterpretations by 83.3% across 27 user turns, outperforming ReAct. TME's modular design supports plug-and-play deployment and domain-specific customization, adaptable to both personal assistants and enterprise automation. We release TME's codebase, benchmarks, and components as open-source resources, enabling researchers to develop reliable LLM agents. TME's scalable architecture addresses a critical gap in agent performance across complex, interactive settings.
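The idea of dependency-tracked revisions on a task graph, as described in the TME abstract, can be illustrated with a small sketch. It is not taken from the TME codebase: `TaskNode`, `TaskGraph`, and the trip-planning entries are hypothetical names chosen for illustration, and the graph stays acyclic only because a node may depend solely on nodes that already exist.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class TaskNode:
    name: str
    value: str
    depends_on: List[str] = field(default_factory=list)
    stale: bool = False  # True when an upstream revision invalidated this node


class TaskGraph:
    """A tiny DAG of subtasks; revising a node marks its dependents stale."""

    def __init__(self) -> None:
        self.nodes: Dict[str, TaskNode] = {}

    def add(self, name: str, value: str, depends_on: Sequence[str] = ()) -> None:
        if name in self.nodes:
            raise ValueError(f"task {name!r} already exists")
        for dep in depends_on:
            if dep not in self.nodes:
                raise KeyError(f"unknown dependency {dep!r}")
        self.nodes[name] = TaskNode(name, value, list(depends_on))

    def revise(self, name: str, value: str) -> List[str]:
        """Update one node and return the names of all downstream nodes made stale."""
        node = self.nodes[name]
        node.value, node.stale = value, False
        affected: List[str] = []
        frontier = [name]
        while frontier:
            current = frontier.pop()
            for other in self.nodes.values():
                if current in other.depends_on and not other.stale:
                    other.stale = True
                    affected.append(other.name)
                    frontier.append(other.name)
        return sorted(affected)


graph = TaskGraph()
graph.add("destination", "Paris")
graph.add("budget", "2000 USD")
graph.add("hotel", "Hotel near the Louvre", depends_on=["destination"])
graph.add("dinner", "Bistro next to the hotel", depends_on=["hotel"])

# The user corrects the destination; only dependent subtasks need to be redone.
affected = graph.revise("destination", "Rome")
```

Here the correction invalidates the hotel and dinner subtasks but leaves the independent budget entry untouched, which is the kind of selective update a flat, linear context cannot express directly.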
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > United States > California > San Francisco County > San Francisco (0.05)
- North America > United States > Washington > King County > Seattle (0.04)
Use of Metric Learning for the Recognition of Handwritten Digits, and its Application to Increase the Outreach of Voice-based Communication Platforms
Pant, Devesh, Talukder, Dibyendu, Kumar, Deepak, Pandey, Rachit, Seth, Aaditeshwar, Arora, Chetan
Initiation, monitoring, and evaluation of development programmes can involve field-based data collection about project activities. Collecting this data through digital devices may not always be feasible, though, for reasons such as smartphones and tablets being unaffordable for field-based cadre, or shortfalls in their training and capacity building. Paper-based data collection has been argued to be more appropriate in several contexts, with automated digitization of the paper forms through OCR (Optical Character Recognition) and OMR (Optical Mark Recognition) techniques. We contribute a large dataset of handwritten digits, along with deep-learning-based models and methods built using this data, that are effective in real-world environments. We demonstrate the deployment of these tools in the context of a maternal and child health and nutrition awareness project, which uses IVR (Interactive Voice Response) systems to provide awareness information to rural women SHG (Self Help Group) members in north India. Paper forms were used to collect phone numbers of the SHG members at scale, which were digitized using the OCR tools developed by us, and used to push almost 4 million phone calls. The data, model, and code have been released in the open-source domain.
- North America > United States > Washington > King County > Seattle (0.05)
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.88)
PUBLICSPEAK: Hearing the Public with a Probabilistic Framework in Local Government
Xu, Tianliang, Brown, Eva Maxfield, Dwyer, Dustin, Tomkins, Sabina
Local governments around the world are making consequential decisions on behalf of their constituents, and these constituents are responding with requests, advice, and assessments of their officials at public meetings. So many small meetings cannot be covered by traditional newsrooms at scale. We propose PUBLICSPEAK, a probabilistic framework which can utilize meeting structure, domain knowledge, and linguistic information to discover public remarks in local government meetings. We then use our approach to inspect the issues raised by constituents in 7 cities across the United States. We evaluate our approach on a novel dataset of local government meetings and find that PUBLICSPEAK improves over state-of-the-art by 10% on average, and by up to 40%.
- North America > United States > Washington > King County > Seattle (0.28)
- North America > United States > California > Alameda County > Oakland (0.14)
- Asia > Middle East > Palestine (0.14)
- Media > News (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Government (1.00)
- Information Technology > Data Science (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Her First Date Felt Off, So She Investigated. What She Found Was Horrifying.
Samantha posted her story on TikTok and shared the scenario on a private Facebook group; many women responded--including her date's wife. Ultimately, as a result of this conversation, Samantha decided to report his profile to Hinge. The next day, the company contacted her to let her know it would be deleting his profile. Mandy and Samantha were pleased with Bumble's and Hinge's swift action to take down the profiles of the men they had matched with--but the experience was indelible. Neither of them plans to use dating apps again.
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law > Criminal Law (0.69)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.48)
GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation
Aksu, Taha, Woo, Gerald, Liu, Juncheng, Liu, Xu, Liu, Chenghao, Savarese, Silvio, Xiong, Caiming, Sahoo, Doyen
Time series foundation models excel in zero-shot forecasting, handling diverse tasks without explicit training. However, the advancement of these models has been hindered by the lack of comprehensive benchmarks. To address this gap, we introduce the General TIme Series ForecasTing Model Evaluation, GIFT-Eval, a pioneering benchmark aimed at promoting evaluation across diverse datasets. GIFT-Eval encompasses 23 datasets over 144,000 time series and 177 million data points, spanning seven domains, 10 frequencies, multivariate inputs, and prediction lengths ranging from short to long-term forecasts. To facilitate the effective pretraining and evaluation of foundation models, we also provide a non-leaking pretraining dataset containing approximately 230 billion data points. Additionally, we provide a comprehensive analysis of 17 baselines, which includes statistical models, deep learning models, and foundation models. We discuss each model in the context of various benchmark characteristics and offer a qualitative analysis that spans both deep learning and foundation models. We believe the insights from this analysis, along with access to this new standard zero-shot time series forecasting benchmark, will guide future developments in time series foundation models. The success of foundation model pretraining in language and vision modalities has catalyzed similar progress in time series forecasting. By pretraining on extensive time series datasets, a universal forecasting model can be developed, equipped to address varied downstream forecasting tasks across multiple domains, frequencies, prediction lengths, and number of variates in a zero-shot manner (Woo et al., 2024; Rasul et al., 2023; Ansari et al., 2024). A critical aspect of foundation model research is creating a high-quality benchmark that includes large, diverse evaluation data, and preferably non-leaking pretraining data to fairly evaluate models and identify their weaknesses. 
Research in Natural Language Processing (NLP) has produced key benchmarks such as GLUE, MMLU, etc. (Wang et al., 2018; Hendrycks et al., 2020; Srivastava et al., 2022; Chen et al., 2021), which are crucial for developing high-quality models. Unlike NLP, time series foundation models lack a unified, diverse benchmark for fair comparison. For instance, Woo et al. (2024) introduce LOTSA, which remains the largest collection of time series forecasting pre-training data to date. However, the proposed architecture, Moirai, is evaluated on existing benchmarks that are tailored to specific forecasting tasks, such as the LSF (Zhou et al., 2020) dataset for long-term forecasting, and the Monash (Godahewa et al., 2021) dataset for univariate forecasting.
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- Energy > Renewable (0.67)
- Energy > Power Industry (0.45)
Tackling extreme urban heat: a machine learning approach to assess the impacts of climate change and the efficacy of climate adaptation strategies in urban microclimates
Buster, Grant, Cox, Jordan, Benton, Brandon N., King, Ryan N.
As urbanization and climate change progress, urban heat becomes a priority for climate adaptation efforts. High temperatures concentrated in urban heat can drive increased risk of heat-related death and illness as well as increased energy demand for cooling. However, estimating the effects of urban heat is an ongoing field of research typically burdened by an imprecise description of the built environment, significant computational cost, and a lack of high-resolution estimates of the impacts of climate change. Here, we present open-source, computationally efficient machine learning methods that can improve the accuracy of urban temperature estimates when compared to historical reanalysis data. These models are applied to residential buildings in Los Angeles, and we compare the energy benefits of heat mitigation strategies to the impacts of climate change. We find that cooling demand is likely to increase substantially through midcentury, but engineered high-albedo surfaces could lessen this increase by more than 50%. The corresponding increase in heating demand complicates this narrative, but total annual energy use from combined heating and cooling with electric heat pumps in the Los Angeles urban climate is shown to benefit from the engineered cooling strategies under both current and future climates.
- North America > United States > California > Los Angeles County > Los Angeles > Hollywood > West Hollywood (0.04)
- North America > United States > California > Los Angeles County > Los Angeles > Hollywood Hills (0.04)
- Pacific Ocean > North Pacific Ocean > Puget Sound (0.04)
- Government > Regional Government > North America Government > United States Government (1.00)
- Energy > Renewable (1.00)
- Construction & Engineering > HVAC (0.88)
FairHome: A Fair Housing and Fair Lending Dataset
Bagalkotkar, Anusha, Karmakar, Aveek, Arnson, Gabriel, Linda, Ondrej
We present a Fair Housing and Fair Lending dataset (FairHome): a dataset with around 75,000 examples across 9 protected categories. To the best of our knowledge, FairHome is the first publicly available dataset labeled with binary labels for compliance risk in the housing domain. We demonstrate the usefulness and effectiveness of such a dataset by training a classifier and using it to detect potential violations when using a large language model (LLM) in the context of real-estate transactions. We benchmark the trained classifier against state-of-the-art LLMs including GPT-3.5, GPT-4, LLaMA-3, and Mistral Large in both zero-shot and few-shot contexts. Our classifier outperformed all of them, with an F1-score of 0.91, underscoring the effectiveness of our dataset. WARNING: Some of the examples included in the paper are not polite, insofar as they reveal bias that might feel discriminatory to the readers.
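For readers unfamiliar with the metric quoted in the FairHome abstract, the F1-score for a binary classifier is the harmonic mean of precision and recall. The sketch below shows the standard formula; the function name `f1_score_from_counts` and the confusion counts are illustrative assumptions and are not figures from the paper.

```python
def f1_score_from_counts(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall), from confusion counts."""
    if true_pos == 0:
        return 0.0  # no correct positive predictions, so F1 is zero by convention
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts chosen only to show what an F1 of 0.91 can look like:
# 91 violations caught, 9 false alarms, 9 violations missed.
example_f1 = f1_score_from_counts(true_pos=91, false_pos=9, false_neg=9)
```

With these made-up counts both precision and recall equal 0.91, so their harmonic mean is 0.91 as well; many other combinations of precision and recall give the same score.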
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > Washington > King County > Kirkland (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Law (1.00)
- Banking & Finance > Real Estate (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
Urban context and delivery performance: Modelling service time for cargo bikes and vans across diverse urban environments
Schrader, Maxwell, Kumar, Navish, Sørig, Esben, Yoon, Soonmyeong, Srivastava, Akash, Xu, Kai, Astefanoaei, Maria, Collignon, Nicolas
Light goods vehicles (LGVs), used extensively in the last mile of delivery, are among the leading polluters in cities. Cargo-bike logistics and Light Electric Vehicles (LEVs) have been put forward as high-impact candidates for replacing LGVs. Studies have estimated that over half of urban van deliveries could be replaced by cargo bikes, owing to their faster speeds, shorter parking times and more efficient routes across cities. However, the logistics sector suffers from a lack of publicly available data, particularly pertaining to cargo-bike deliveries, thus limiting the understanding of their potential benefits. Specifically, service time (which includes cruising for parking, and walking to destination) is a major, but often overlooked, component of delivery time modelling. The aim of this study is to establish a framework for measuring the performance of delivery vehicles, with an initial focus on modelling service times of vans and cargo bikes across diverse urban environments. We introduce two datasets that allow for in-depth analysis and modelling of service times of cargo bikes and use existing datasets to reason about differences in delivery performance across vehicle types. We introduce a modelling framework to predict the service times of deliveries based on urban context. We employ Uber's H3 index to divide cities into hexagonal cells and aggregate OpenStreetMap tags for each cell, providing a detailed assessment of urban context. Leveraging this spatial grid, we use GeoVex to represent micro-regions as points in a continuous vector space, which then serve as input for predicting vehicle service times. We show that geospatial embeddings can effectively capture urban contexts and facilitate generalizations to new contexts and cities. Our methodology addresses the challenge of limited comparative data available for different vehicle types within the same urban settings.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > United Kingdom > England > Greater London > London (0.14)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.14)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.88)
- Transportation > Ground > Road (1.00)
- Transportation > Freight & Logistics Services (1.00)