959ab9a0695c467e7caf75431a872e5c-Paper.pdf

Neural Information Processing Systems

The data-driven nature of modern machine learning (ML) training routines puts pressure on data supply pipelines, which are becoming increasingly complex. It is common to find separate disks or entire content distribution networks dedicated to serving massive datasets. Training is often distributed across multiple workers. This emergent complexity gives an attacker a perfect opportunity to disrupt ML training while remaining covert.



8 Best Plant-Based Meal Delivery Services and Kits (2025), Tested, Tasted, and Reviewed

WIRED

These plant-based meal kits and delivery services bring healthy pre-prepared meals and meal kits to your door. Plant-based meal kit services are a modern miracle for vegetarians and vegans, who usually aren't afforded the same conveniences as meat eaters or those without dietary restrictions. We at WIRED love meal kits because they're all about modern convenience: you can eat what you want, even if you're on a specialty diet or have strong food preferences, without ever leaving your house. Gone are the days of grocery shopping and scouring the web for recipes; these plant-based meal kit services do the heavy lifting for you using curated menus and algorithms, with choices for both premade microwavable meals and kits where you do the cooking yourself. Some plant-based meal kit services, like Hungryroot, use AI customization to curate menus based on your specific tastes. Others, like Daily Harvest, have a set selection of choices, so you can always keep your freezer stocked with plant-based, gluten-free meals. I'm vegan, so I know how difficult it can be to find new recipes that actually taste good without breaking the bank. Plus, plant-based meal kits are a great way to try new foods and recipes, especially if you're looking to switch to a healthier diet in the new year.


BuckTales: A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes

Neural Information Processing Systems

Understanding animal behaviour is central to predicting, understanding, and mitigating the impacts of natural and anthropogenic changes on animal populations and ecosystems. However, the challenges of acquiring and processing long-term, ecologically relevant data in wild settings have constrained the scope of behavioural research. The increasing availability of Unmanned Aerial Vehicles (UAVs), coupled with advances in machine learning, has opened new opportunities for wildlife monitoring using aerial tracking. However, the limited availability of datasets with wild animals in natural habitats has hindered progress in automated computer vision solutions for long-term animal tracking. Here, we introduce the first large-scale UAV dataset designed to address the multi-object tracking (MOT) and re-identification (Re-ID) problems in wild animals, specifically during the mating behaviour (or lekking) of blackbuck antelopes. Collected in collaboration with biologists, the MOT dataset includes over 1.2 million annotations, including 680 tracks across 12 high-resolution (5.4K)


SynMob: Creating High-Fidelity Synthetic GPS Trajectory Dataset for Urban Mobility Analysis

Neural Information Processing Systems

Urban mobility analysis has been extensively studied in the past decade using vast amounts of GPS trajectory data, which reveal hidden patterns in movement and human activity within urban landscapes. Despite their significant value, the availability of such datasets is often limited by privacy concerns, proprietary barriers, and quality inconsistencies. To address these challenges, this paper presents a high-fidelity synthetic trajectory dataset, offering a general solution to these data accessibility issues. Specifically, the proposed dataset adopts a diffusion model as its synthesizer, with the primary aim of accurately emulating the spatial-temporal behavior of the original trajectory data. The synthesized data retain the geo-distribution and statistical properties characteristic of real-world datasets. Through rigorous analysis and case studies, we validate the high similarity between the proposed synthetic trajectory dataset and its real-world counterparts, as well as its utility. Such validation underscores the practicality of synthetic datasets for urban mobility analysis and advocates for their wider acceptance within the research community. Finally, we publicly release the trajectory synthesizer and datasets, aiming to enhance the quality and availability of synthetic trajectory datasets and encourage continued contributions to this rapidly evolving field.


Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data

Neural Information Processing Systems

We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional stochastic optimization problem. In the context of least-squares instrumental variable regression, our algorithms require neither matrix inversions nor mini-batches, thereby providing a fully online approach to instrumental variable regression with streaming data. When the true model is linear, we derive rates of convergence in expectation of order $\mathcal{O}(\log T/T)$ and $\mathcal{O}(1/T^{1-\epsilon})$ for any $\epsilon > 0$, under the availability of two-sample and one-sample oracles, respectively. Importantly, under the availability of the two-sample oracle, the aforementioned rate is agnostic to the relationship between the confounder and the instrumental variable, demonstrating the flexibility of the proposed approach in alleviating the need for the explicit model assumptions required by recent works that reformulate the problem as a min-max optimization problem. Experimental validation is provided to demonstrate the advantages of the proposed algorithms over classical approaches such as the 2SLS method.
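The core idea above, treating least-squares IV regression as a conditional stochastic optimization problem and solving it with one-pass SGD, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' algorithm: a scalar linear model with a hidden confounder, a simulated two-sample oracle (two independent (X, Y) draws that share the same instrument Z), a c/(t + t0) step size, and the helper names `two_sample_oracle`, `streaming_iv_sgd`, and `streaming_ols_sgd`, which are hypothetical. The key step is that, given Z, the product of X' from one draw and the residual from the other is a conditionally unbiased stochastic gradient, so no matrix inversion or mini-batch is needed.

```python
import random


def two_sample_oracle(rng, beta_true=2.0, a=1.0):
    """Hypothetical simulator: draw an instrument Z, then two independent
    (X, Y) pairs from the conditional distribution given that same Z.

    Assumed data-generating process (for illustration only):
        U ~ N(0, 1)                        hidden confounder
        X = a*Z + U + N(0, 0.1^2)
        Y = beta_true*X + U + N(0, 0.1^2)
    U enters both X and Y, so regressing Y on X directly is biased.
    """
    z = rng.gauss(0.0, 1.0)
    pairs = []
    for _ in range(2):
        u = rng.gauss(0.0, 1.0)
        x = a * z + u + rng.gauss(0.0, 0.1)
        y = beta_true * x + u + rng.gauss(0.0, 0.1)
        pairs.append((x, y))
    return z, pairs[0], pairs[1]


def streaming_iv_sgd(num_steps=20000, c=1.0, t0=10, seed=0):
    """One pass of SGD on 0.5 * E_Z[(E[Y|Z] - E[X|Z] * beta)^2].

    Given Z, x_prime * (y - x * beta) has conditional expectation
    E[X|Z] * (E[Y|Z] - E[X|Z] * beta) because the two pairs are independent,
    so its negative is an unbiased stochastic gradient. Each step consumes
    one oracle call and then discards it (fully online, no mini-batch).
    """
    rng = random.Random(seed)
    beta = 0.0
    for t in range(1, num_steps + 1):
        _, (x, y), (x_prime, _) = two_sample_oracle(rng)
        grad = -x_prime * (y - x * beta)
        beta -= c / (t + t0) * grad  # decaying step size c / (t + t0)
    return beta


def streaming_ols_sgd(num_steps=20000, c=1.0, t0=10, seed=0):
    """Naive baseline: SGD on 0.5 * E[(Y - X * beta)^2], ignoring the instrument."""
    rng = random.Random(seed)
    beta = 0.0
    for t in range(1, num_steps + 1):
        _, (x, y), _ = two_sample_oracle(rng)
        grad = -x * (y - x * beta)
        beta -= c / (t + t0) * grad
    return beta
```

Under this assumed model, `streaming_iv_sgd()` should land near the true coefficient 2.0, while `streaming_ols_sgd()` should drift towards roughly 2.5, because the confounding bias Cov(X, U)/Var(X) is about 0.5 here.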


Active Learning with LLMs for Partially Observed and Cost-Aware Scenarios

Neural Information Processing Systems

Conducting experiments and gathering data for machine learning models is a complex and expensive endeavor, particularly when confronted with limited information. Typically, extensive _experiments_ to obtain features and labels come with a significant acquisition cost, making it impractical to carry out all of them. It is therefore crucial to decide strategically what to acquire so as to maximize predictive performance while minimizing costs. Existing data acquisition methods for this task assume the availability of an initial dataset that is both fully observed and labeled, crucially overlooking the **partial observability** of features characteristic of many real-world scenarios. In response to this challenge, we present Partially Observable Cost-Aware Active-Learning (POCA), a new learning approach aimed at improving model generalization in data-scarce and data-costly scenarios through label and/or feature acquisition. As an instantiation, we introduce $\mu$POCA, which maximizes the reduction in the predictive model's uncertainty from acquiring labels and features while accounting for their associated costs.
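The abstract describes the acquisition objective only at a high level. As a rough illustration of the cost-aware part alone, the sketch below greedily buys, within a budget, the label with the largest information gain per unit cost for a Bayesian linear-regression model. The model, the entropy-based uncertainty measure, the greedy gain-per-cost rule, and the helper names `info_gain`, `rank_one_update`, and `greedy_cost_aware_selection` are all assumptions; the sketch ignores feature acquisition and partial observability entirely, so it is not $\mu$POCA.

```python
import numpy as np


def info_gain(Sigma, x, noise_var):
    """Entropy reduction (nats) of a Gaussian weight posterior with covariance
    Sigma after observing y = w^T x + eps, eps ~ N(0, noise_var)."""
    return 0.5 * np.log1p(x @ Sigma @ x / noise_var)


def rank_one_update(Sigma, x, noise_var):
    """Posterior covariance after one more labelled point (Sherman-Morrison)."""
    Sx = Sigma @ x
    return Sigma - np.outer(Sx, Sx) / (noise_var + x @ Sx)


def greedy_cost_aware_selection(X_pool, costs, budget, prior_var=1.0, noise_var=0.1):
    """Hypothetical helper: repeatedly acquire the label with the largest
    information gain per unit cost until no remaining label is affordable."""
    d = X_pool.shape[1]
    Sigma = prior_var * np.eye(d)  # prior covariance of the weights
    remaining = set(range(len(X_pool)))
    chosen, spent = [], 0.0
    while True:
        affordable = [i for i in remaining if spent + costs[i] <= budget]
        if not affordable:
            break
        best = max(
            affordable,
            key=lambda i: info_gain(Sigma, X_pool[i], noise_var) / costs[i],
        )
        chosen.append(best)
        spent += costs[best]
        remaining.remove(best)
        Sigma = rank_one_update(Sigma, X_pool[best], noise_var)
    return chosen, spent, Sigma
```

In a linear-Gaussian model the information gain does not depend on the yet-unseen label values, so this plan can be computed before any label is bought. Greedy gain-per-cost selection is a common heuristic for budgeted acquisition, not an optimal one.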