Bali
- North America > United States (0.29)
- Asia > Indonesia > Bali (0.04)
- North America > Canada (0.04)
- Africa > South Africa (0.04)
- Government (0.69)
- Law (0.46)
WildfireSpreadTS: A dataset of multi-modal time series for wildfire spread prediction
We present a multi-temporal, multi-modal remote-sensing dataset for predicting how active wildfires will spread at a resolution of 24 hours. The dataset consists of 13 607 images across 607 fire events in the United States from January 2018 to October 2021. For each fire event, the dataset contains a full time series of daily observations, containing detected active fires and variables related to fuel, topography and weather conditions. The dataset is challenging due to: a) its inputs being multi-temporal, b) the high number of 23 multi-modal input channels, c) highly imbalanced labels and d) noisy labels, due to smoke, clouds, and inaccuracies in the active fire detection.
- Asia > Indonesia > Bali (0.04)
- North America > United States > Utah > Weber County > Ogden (0.04)
- North America > United States > Rocky Mountains (0.04)
- (7 more...)
Language Model Tokenizers Introduce Unfairness Between Languages
Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
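The length disparity described in this abstract can be illustrated with a minimal sketch. It assumes a byte-level stand-in tokenizer (one token per UTF-8 byte), which is not one of the tokenizers studied in the paper; the helper names `byte_tokens` and `length_ratio` and the sample phrases are hypothetical and chosen only for illustration.

```python
# Hypothetical stand-in tokenizer: one token per UTF-8 byte.
# Real subword tokenizers behave differently; this only shows how the
# same meaning can cost a different number of tokens per language.
def byte_tokens(text: str) -> list[int]:
    return list(text.encode("utf-8"))


def length_ratio(reference: str, other: str) -> float:
    """Token count of `other` relative to `reference`."""
    return len(byte_tokens(other)) / len(byte_tokens(reference))


# Illustrative translations of the same greeting (an assumption, not paper data).
samples = {
    "English": "Good morning",
    "Russian": "Доброе утро",
    "Hindi": "सुप्रभात",
}

for language, sentence in samples.items():
    count = len(byte_tokens(sentence))
    ratio = length_ratio(samples["English"], sentence)
    print(f"{language}: {count} tokens, {ratio:.2f}x English")
```

Scripts whose characters need two or three bytes in UTF-8 produce longer sequences here even though the message is the same, which is one simple way such disparities can arise before any model is invoked.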
- North America > Haiti (0.14)
- Asia > Philippines > Luzon > Ilocos Region > Province of Pangasinan (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- (38 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
- North America > United States > Massachusetts (0.04)
- North America > United States > Indiana (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (6 more...)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > Dominican Republic (0.04)
- (5 more...)
- Law (0.67)
- Information Technology (0.67)
- Government (0.46)
Unsupervised Text Segmentation via Kernel Change-Point Detection on Sentence Embeddings
Jia, Mumin, Diaz-Rodriguez, Jairo
Unsupervised text segmentation is crucial because boundary labels are expensive, subjective, and often fail to transfer across domains and granularity choices. We propose Embed-KCPD, a training-free method that represents sentences as embedding vectors and estimates boundaries by minimizing a penalized KCPD objective. Beyond the algorithmic instantiation, we develop, to our knowledge, the first dependence-aware theory for KCPD under $m$-dependent sequences, a finite-memory abstraction of short-range dependence common in language. We prove an oracle inequality for the population penalized risk and a localization guarantee showing that each true change point is recovered within a window that is small relative to segment length. To connect theory to practice, we introduce an LLM-based simulation framework that generates synthetic documents with controlled finite-memory dependence and known boundaries, validating the predicted scaling behavior. Across standard segmentation benchmarks, Embed-KCPD often outperforms strong unsupervised baselines. A case study on Taylor Swift's tweets illustrates that Embed-KCPD combines strong theoretical guarantees, simulated reliability, and practical effectiveness for text segmentation.
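As a rough illustration of what minimizing a penalized kernel change-point objective can look like, the sketch below uses exact dynamic programming over segment boundaries. It is a generic sketch and not the authors' Embed-KCPD implementation: the RBF kernel, the `gamma` and `beta` parameters, the function names, and the toy one-dimensional "embeddings" are all assumptions made for the example.

```python
import math


def rbf(x, y, gamma=1.0):
    """RBF kernel between two vectors (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kcpd(points, beta, gamma=1.0):
    """Return change-point indices minimizing kernel cost + beta per change point."""
    n = len(points)
    K = [[rbf(p, q, gamma) for q in points] for p in points]

    # 2-D prefix sums of the Gram matrix and 1-D prefix sums of its diagonal,
    # so each segment cost is available in constant time.
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    D = [0.0] * (n + 1)
    for i in range(n):
        D[i + 1] = D[i] + K[i][i]
        for j in range(n):
            P[i + 1][j + 1] = K[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]

    def cost(s, t):
        # Within-segment kernel scatter for points s .. t-1.
        block = P[t][t] - P[s][t] - P[t][s] + P[s][s]
        return (D[t] - D[s]) - block / (t - s)

    # best[t]: minimal penalized cost of segmenting the first t points.
    # Starting at -beta means only change points (not segments) are penalized.
    best = [-beta] + [math.inf] * n
    prev = [0] * (n + 1)
    for t in range(1, n + 1):
        for s in range(t):
            candidate = best[s] + cost(s, t) + beta
            if candidate < best[t]:
                best[t], prev[t] = candidate, s

    # Walk back through the stored predecessors to recover the boundaries.
    boundaries = []
    t = n
    while t > 0:
        s = prev[t]
        if s > 0:
            boundaries.append(s)
        t = s
    return sorted(boundaries)


# Toy "sentence embeddings": two well-separated groups of five points each.
toy = [(0.0,)] * 5 + [(5.0,)] * 5
print(kcpd(toy, beta=1.0))
```

In this sketch a returned index `i` means a boundary falls between item `i - 1` and item `i`. A larger `beta` makes each extra boundary more expensive, so fewer change points are reported.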
- North America > United States > New Mexico > Doña Ana County > Las Cruces (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- (12 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- (2 more...)
Towards Latent Diffusion Suitable For Text
Midavaine, Nesta, Naesseth, Christian A., Bartosh, Grigory
Language diffusion models aim to improve sampling speed and coherence over autoregressive LLMs. We introduce Neural Flow Diffusion Models for language generation, an extension of NFDM that enables the straightforward application of continuous diffusion models to discrete state spaces. NFDM learns a multivariate forward process from the data, ensuring that the forward process and generative trajectory are a good fit for language modeling. Our model substantially reduces the likelihood gap with autoregressive models of the same size, while achieving sample quality comparable to that of previous latent diffusion models.
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Singapore (0.04)
- Asia > Middle East > Jordan (0.04)
- (2 more...)
Pigs have been island hopping for 50,000 years
With human help, the mammals can defy 'the world's most fundamental natural boundaries.' Despite not exactly being world-renowned swimmers, pigs have spread across the Asia-Pacific region for thousands of years. With the genetic and archeological data from over 700 pigs, a team of scientists documented how people helped the mammals make their way across thousands of miles. "This research reveals what happens when people transport animals enormous distances, across one of the world's most fundamental natural boundaries," evolutionary geneticist and study co-author Dr. David Stanton of the University of Cardiff and Queen Mary University of London said in a statement. "These movements led to pigs with a melting pot of ancestries. These patterns were technically very difficult to disentangle, but have ultimately helped us understand how and why animals came to be distributed across the Pacific islands."
- Asia > Southeast Asia (0.06)
- Oceania > Vanuatu (0.05)
- South America > Brazil (0.05)
- (14 more...)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- Asia > Vietnam > Long An Province > Tân An (0.04)
- Asia > Indonesia > Bali (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)