 baltimore


TripTide: A Benchmark for Adaptive Travel Planning under Disruptions

Karmakar, Priyanshu, Chaudhuri, Soumyabrata, Mallick, Shubhojit, Gupta, Manish, Jana, Abhik, Ghosh, Shreya

arXiv.org Artificial Intelligence

Recent efforts like TripCraft and TravelPlanner have advanced the use of Large Language Models (LLMs) for personalized, constraint-aware travel itinerary generation. Yet, real travel often faces disruptions. To address this, we present TripTide, the first benchmark evaluating LLMs' ability to revise itineraries under realistic disruptions. TripTide models key dimensions such as disruption severity and traveler tolerance, enabling nuanced assessment of LLM adaptability to events like flight cancellations, weather closures, or overbooked attractions. We conduct a threefold evaluation. First, we introduce automatic metrics including Preservation of Intent (how well the revised plan maintains feasibility and goals), Responsiveness (promptness and appropriateness of disruption handling), and Adaptability (semantic, spatial, and sequential divergence between original and revised plans). Second, we apply an LLM-as-a-judge approach to automatically assess revision quality. Third, we perform manual expert evaluation to verify whether revisions preserve semantic, spatial, sequential, and responsive aspects. Our experiments show that LLMs maintain strong sequential consistency and semantic stability, while spatial deviations are larger for shorter trips but decrease with longer ones, indicating that extended plans encourage better geographic coherence. However, disruption-handling ability declines as plan length increases, highlighting limits in LLM robustness. TripTide establishes a benchmark for evaluating adaptability, personalization, and resilience in LLM-based travel planning under real-world uncertainty.



Using Machine Learning in Analyzing Air Quality Discrepancies of Environmental Impact

Wang, Shuangbao Paul, Yang, Lucas, Chouchane, Rahouane, Guo, Jin, Bailey, Michael

arXiv.org Artificial Intelligence

In this study, we apply machine learning and software engineering to analyze air pollution levels in the City of Baltimore. The data model was fed with three primary data sources: 1) a biased method of estimating insurance risk used by the Home Owners' Loan Corporation, 2) demographics of Baltimore residents, and 3) census-based estimates of NO2 and PM2.5 concentrations. The dataset covers 650,643 Baltimore residents among 44.7 million residents of 202 major US cities. The results show that air pollution levels have a clear association with the biased insurance estimating method. Large disparities in NO2 levels exist between more desirable and low-income blocks. Similar disparities in air pollution levels exist across residents' ethnic groups. Because Baltimore's population includes a greater proportion of people of color, the findings reveal how decades-old policies have continued to discriminate against Baltimore citizens and affect their quality of life today.


Can a methadone-dispensing robot free up nurses and improve patient care?

The Guardian

Lanea George pulls open a steel security door and enters a windowless room where a video camera stares at what looks like a commercial-grade refrigerator. The machine, dubbed Bodhi, whirrs and spins before spitting out seven small plastic bottles containing precisely 70ml of methadone, a bright pink liquid resembling cherry cough syrup. It is used as a substitute for morphine or heroin in addiction treatment. She scoops the bottles off the tray, bundles them with a rubber band and sets them on a shelf. It's not yet 10am and George, the nurse manager at Man Alive, an opioid treatment program – known colloquially as a methadone clinic – in Baltimore, has already finished prepping the doses for the 100 or so patients who will arrive the next day.


I stopped using Alexa long ago. Here are 6 ways Alexa could lure me back

PCWorld

Writing about smart home technology, smart devices, and voice assistants is my job. Yet, I don't remember the last time I actually spoke with Alexa. Just to be clear, I don't mean to pick on Alexa per se. I rarely speak to Google Assistant or Apple's Siri, either. It's way easier to haul out my phone and use an app than it is to get a supposedly "smart" voice assistant to do what I want.


TripCraft: A Benchmark for Spatio-Temporally Fine Grained Travel Planning

Chaudhuri, Soumyabrata, Purkar, Pranav, Raghav, Ritwik, Mallick, Shubhojit, Gupta, Manish, Jana, Abhik, Ghosh, Shreya

arXiv.org Artificial Intelligence

Recent advancements in probing Large Language Models (LLMs) have explored their latent potential as personalized travel planning agents, yet existing benchmarks remain limited in real-world applicability. Existing datasets, such as TravelPlanner and TravelPlanner+, suffer from semi-synthetic data reliance, spatial inconsistencies, and a lack of key travel constraints, making them inadequate for practical itinerary generation. To address these gaps, we introduce TripCraft, a spatiotemporally coherent travel planning dataset that integrates real-world constraints, including public transit schedules, event availability, diverse attraction categories, and user personas for enhanced personalization. To evaluate LLM-generated plans beyond existing binary validation methods, we propose five continuous evaluation metrics, namely Temporal Meal Score, Temporal Attraction Score, Spatial Score, Ordering Score, and Persona Score, which assess itinerary quality across multiple dimensions. Our parameter-informed setting significantly enhances meal scheduling, improving the Temporal Meal Score from 61% to 80% in a 7-day scenario. TripCraft establishes a new benchmark for LLM-driven personalized travel planning, offering a more realistic, constraint-aware framework for itinerary generation. Dataset and Codebase will be made publicly available upon acceptance.


OPTIC: Optimizing Patient-Provider Triaging & Improving Communications in Clinical Operations using GPT-4 Data Labeling and Model Distillation

Santamaria-Pang, Alberto, Tuan, Frank, Campbell, Ross, Zhang, Cindy, Jindal, Ankush, Surapur, Roopa, Holloman, Brad, Hanisch, Deanna, Buckley, Rae, Cooney, Carisa, Tarapov, Ivan, Peairs, Kimberly S., Hasselfeld, Brian, Greene, Peter

arXiv.org Artificial Intelligence

The COVID-19 pandemic has accelerated the adoption of telemedicine and patient messaging through electronic medical portals (patient medical advice requests, or PMARs). While these platforms enhance patient access to healthcare, they have also increased the burden on healthcare providers due to the surge in PMARs. This study seeks to develop an efficient tool for message triaging to reduce physician workload and improve patient-provider communication. We developed OPTIC (Optimizing Patient-Provider Triaging & Improving Communications in Clinical Operations), a powerful message triaging tool that utilizes GPT-4 for data labeling and BERT for model distillation. The study used a dataset of 405,487 patient messaging encounters from Johns Hopkins Medicine between January and June 2020. High-quality labeled data was generated through GPT-4-based prompt engineering, which was then used to train a BERT model to classify messages as "Admin" or "Clinical." The BERT model achieved 88.85% accuracy on the test set validated by GPT-4 labeling, with a sensitivity of 88.29%, specificity of 89.38%, and an F1 score of 0.8842. BERTopic analysis identified 81 distinct topics within the test data, with over 80% accuracy in classifying 58 topics. The system was successfully deployed through Epic's Nebula Cloud Platform, demonstrating its practical effectiveness in healthcare settings.
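
For readers less familiar with the metrics quoted above, the following is a minimal sketch of how accuracy, sensitivity, specificity, and F1 are conventionally derived from a binary confusion matrix. The function name `binary_metrics` and the counts are hypothetical illustrations; they are not taken from the OPTIC study and do not reproduce its reported figures.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positive-class items predicted correctly / incorrectly.
    tn/fp: negative-class items predicted correctly / incorrectly.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,       # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),       # recall on the positive class
        "specificity": tn / (tn + fp),       # recall on the negative class
        "f1": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of precision and recall
    }


# Hypothetical counts chosen only for illustration.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Which class counts as "positive" (for example, "Clinical" versus "Admin") is a labeling choice; swapping it exchanges sensitivity and specificity.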


It pays to be pretty! Attractive people earn up to 11% MORE than their ugly colleagues, study finds

Daily Mail - Science & tech

Whether it's taking on more responsibilities or staying late in the office, many employees will go above and beyond to try to get a pay rise. But now a study suggests that if you're not good looking, your efforts may be futile. Researchers from the Institute for Operations Research and the Management Sciences in Baltimore have uncovered a 'striking' link between physical attractiveness and career success. In their study, the team analysed the careers of more than 40,000 graduates who had completed MBAs. They found attractive respondents earned up to 11 per cent more than their colleagues who were seen as less good looking.


Guide-to-Explain for Controllable Summarization

Ryu, Sangwon, Do, Heejin, Kim, Daehee, Kim, Yunsu, Lee, Gary Geunbae, Ok, Jungseul

arXiv.org Artificial Intelligence

Recently, large language models (LLMs) have demonstrated remarkable performance in abstractive summarization tasks. However, controllable summarization with LLMs remains underexplored, limiting their ability to generate summaries that align with specific user preferences. In this paper, we first investigate the capability of LLMs to control diverse attributes, revealing that they encounter greater challenges with numerical attributes, such as length and extractiveness, compared to linguistic attributes. To address this challenge, we propose a guide-to-explain framework (GTE) for controllable summarization. Our GTE framework enables the model to identify misaligned attributes in the initial draft and guides it in explaining errors in the previous output. Based on this reflection, the model generates a well-adjusted summary. As a result, by allowing the model to reflect on its misalignment, we generate summaries that satisfy the desired attributes in surprisingly fewer iterations than other iterative methods solely using LLMs.


Zero Inflation as a Missing Data Problem: a Proxy-based Approach

Phung, Trung, Lee, Jaron J. R., Oladapo-Shittu, Opeyemi, Klein, Eili Y., Gurses, Ayse Pinar, Hannum, Susan M., Weems, Kimberly, Marsteller, Jill A., Cosgrove, Sara E., Keller, Sara C., Shpitser, Ilya

arXiv.org Artificial Intelligence

A common type of zero-inflated data has certain true values incorrectly replaced by zeros due to data recording conventions (rare outcomes assumed to be absent) or details of data recording equipment (e.g. artificial zeros in gene expression data). Existing methods for zero-inflated data either fit the observed data likelihood via parametric mixture models that explicitly represent excess zeros, or aim to replace excess zeros by imputed values. If the goal of the analysis relies on knowing true data realizations, a particular challenge with zero-inflated data is identifiability, since it is difficult to correctly determine which observed zeros are real and which are inflated. This paper views zero-inflated data as a general type of missing data problem, where the observability indicator for a potentially censored variable is itself unobserved whenever a zero is recorded. We show that, without additional assumptions, target parameters involving a zero-inflated variable are not identified. However, if a proxy of the missingness indicator is observed, a modification of the effect restoration approach of Kuroki and Pearl allows identification and estimation, given the proxy-indicator relationship is known. If this relationship is unknown, our approach yields a partial identification strategy for sensitivity analysis. Specifically, we show that only certain proxy-indicator relationships are compatible with the observed data distribution. We give an analytic bound for this relationship in cases with a categorical outcome, which is sharp in certain models. For more complex cases, sharp numerical bounds may be computed using methods in Duarte et al. [2023]. We illustrate our method via simulation studies and a data application on central line-associated bloodstream infections (CLABSIs).
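
To give a rough intuition for why a known proxy-indicator relationship helps, here is a minimal sketch of the simplest binary case, using only the law of total probability. It is an illustrative assumption, not the authors' estimator: `R` is a hypothetical unobserved binary indicator, `W` is its observed binary proxy, and the function name and all probabilities are made up for the example.

```python
def recover_indicator_prob(p_w1, p_w1_given_r1, p_w1_given_r0):
    """Recover P(R=1) from the observed P(W=1) when P(W=1 | R) is known.

    Law of total probability:
        P(W=1) = P(W=1|R=1) * P(R=1) + P(W=1|R=0) * (1 - P(R=1))
    Solving for P(R=1) requires the proxy to be informative,
    i.e. P(W=1|R=1) != P(W=1|R=0).
    """
    denom = p_w1_given_r1 - p_w1_given_r0
    if denom == 0:
        raise ValueError("proxy carries no information about the indicator")
    p_r1 = (p_w1 - p_w1_given_r0) / denom
    if not 0.0 <= p_r1 <= 1.0:
        # The assumed proxy relationship is incompatible with the observed data.
        raise ValueError("assumed proxy relationship incompatible with P(W=1)")
    return p_r1


# Hypothetical numbers: generate P(W=1) from a known P(R=1), then recover it.
true_p_r1 = 0.7
a, b = 0.9, 0.2                      # assumed P(W=1|R=1) and P(W=1|R=0)
observed_p_w1 = a * true_p_r1 + b * (1 - true_p_r1)
recovered = recover_indicator_prob(observed_p_w1, a, b)
print(round(recovered, 6))
```

The range check hints at the partial-identification idea in the abstract: when the proxy relationship is unknown, only those candidate values that keep the recovered probability valid are compatible with the observed data.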