 cloud coverage



Supplementary Material for "AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery"

Neural Information Processing Systems

In Sec. 2 we include a datasheet for our dataset following the methodology from "Datasheets for Datasets" (Gebru et al. [2021]). In this section, we include the prompts from Gebru et al. [2021] in blue, and in For what purpose was the dataset created? Was there a specific task in mind? The dataset was created to facilitate research development on cloud removal in satellite imagery. Specifically, our task is more temporally aligned than previous benchmarks. Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., Who funded the creation of the dataset?




What exactly is the UV Index? A dermatologist explains.

Popular Science

The measurement has nothing to do with how hot it is outside. Wearing sunscreen all year round can help protect your skin from harmful UV rays. As of 2:19 p.m. EST today, it is officially autumn in the Northern Hemisphere.


AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery

Zhou, Hangyu, Kao, Chia-Hsiang, Phoo, Cheng Perng, Mall, Utkarsh, Hariharan, Bharath, Bala, Kavita

arXiv.org Artificial Intelligence

Clouds in satellite imagery pose a significant challenge for downstream applications. A major challenge in current cloud removal research is the absence of a comprehensive benchmark and a sufficiently large and diverse training dataset. To address this problem, we introduce the largest public dataset -- $\textit{AllClear}$ for cloud removal, featuring 23,742 globally distributed regions of interest (ROIs) with diverse land-use patterns, comprising 4 million images in total. Each ROI includes complete temporal captures from the year 2022, with (1) multi-spectral optical imagery from Sentinel-2 and Landsat 8/9, (2) synthetic aperture radar (SAR) imagery from Sentinel-1, and (3) auxiliary remote sensing products such as cloud masks and land cover maps. We validate the effectiveness of our dataset by benchmarking performance, demonstrating the scaling law -- the PSNR rises from $28.47$ to $33.87$ with $30\times$ more data, and conducting ablation studies on the temporal length and the importance of individual modalities. This dataset aims to provide comprehensive coverage of the Earth's surface and promote better cloud removal results.
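For readers unfamiliar with the metric, the following is a minimal sketch of how PSNR is commonly computed and of what the reported gain from 28.47 to 33.87 roughly means in terms of mean squared error. The abstract does not say how AllClear computes or averages PSNR, so the function `psnr`, the `max_value` parameter, and the interpretation of the gain as a single MSE ratio are assumptions made for illustration.

```python
import math

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumes pixel values are scaled to [0, max_value]; this convention is an
    assumption here, not something stated in the abstract.
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# A gain of (33.87 - 28.47) dB corresponds to this reduction factor in MSE,
# if both scores were computed with the same max_value on the same image.
mse_ratio = 10 ** ((33.87 - 28.47) / 10)

if __name__ == "__main__":
    clear = [0.0, 0.0, 0.0, 0.0]      # hypothetical cloud-free pixels
    restored = [0.1, 0.1, 0.1, 0.1]   # hypothetical model output
    print(f"PSNR: {psnr(clear, restored):.2f} dB")
    print(f"MSE reduction factor for a 5.40 dB gain: {mse_ratio:.2f}")
```

Because benchmark PSNR values are usually averages over many images, the MSE ratio above is only a rough guide to the size of the improvement.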


Super Resolution On Global Weather Forecasts

Zhang, Lawrence, Yang, Adam, Amor, Rodz Andrie, Zhang, Bryan, Rao, Dhruv

arXiv.org Artificial Intelligence

Weather forecasting is a vitally important tool for tasks ranging from planning day-to-day activities to disaster response planning. However, modeling weather has proven to be a challenging task due to its chaotic and unpredictable nature. Every variable, from temperature to precipitation to wind, influences the path the environment will take. As a result, all models tend to rapidly lose accuracy as the temporal range of their forecasts increases. Classical forecasting methods use a myriad of physics-based, numerical, and stochastic techniques to predict the change in weather variables over time. However, such forecasts often require a very large amount of data and are extremely computationally expensive. Furthermore, as climate and global weather patterns change, classical models are substantially more difficult and time-consuming to update for changing environments. Fortunately, with recent advances in deep learning and publicly available high-quality weather datasets, deploying learning methods for estimating these complex systems has become feasible. The current state-of-the-art deep learning models have comparable accuracy to the industry-standard numerical models and are becoming more ubiquitous in practice due to their adaptability. Our group seeks to improve upon existing deep-learning-based forecasting methods by increasing the spatial resolution of global weather predictions. Specifically, we are interested in performing super resolution (SR) on GraphCast temperature predictions by increasing the global spatial resolution from 1 degree to 0.5 degrees, which corresponds to grid spacings of approximately 111 km and 55 km, respectively.
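The degree-to-kilometre figures quoted above can be checked with a short calculation. This is a sketch under the assumption of a spherical Earth with mean radius 6371 km; the function name `grid_spacing_km` is hypothetical and not taken from the paper. It also shows that the east-west spacing shrinks away from the equator, so the quoted distances are best read as equatorial values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant, not from the abstract

def grid_spacing_km(resolution_deg: float, latitude_deg: float = 0.0) -> float:
    """East-west spacing in km of a lat/lon grid cell at the given latitude."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360.0
    return resolution_deg * km_per_degree * math.cos(math.radians(latitude_deg))

if __name__ == "__main__":
    for res in (1.0, 0.5):
        print(f"{res} deg ~ {grid_spacing_km(res):.1f} km at the equator")
    # The same 1-degree cell is about half as wide at 60 degrees latitude.
    print(f"1.0 deg ~ {grid_spacing_km(1.0, 60.0):.1f} km at 60 deg latitude")
```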


No, You're Not Alone. Google Is Also Making This Big Mistake On AI

#artificialintelligence

Just this past month, an article was shared showing that over 30% of the data used by Google for one of their shared machine learning models was mislabeled. Not only was the model itself full of errors, but the training data used by that model was also full of mistakes. How could anyone using Google's model ever hope to trust the results if it's full of human-induced errors that computers can't fix? And Google isn't alone in major data mislabeling: an MIT study in 2021 found that almost 6% of the images in the industry-standard ImageNet database are mislabeled, and furthermore found "label errors in the test sets of 10 of the most commonly-used computer vision, natural language, and audio datasets". How can we hope to trust or use these models if the data used to train those models is so bad?


The Death of Data Scientists – will AutoML replace them? - KDnuggets

#artificialintelligence

One cannot introduce AutoML without mentioning the machine learning project's life cycle, which includes data cleaning, feature selection/engineering, model selection, parameter optimization, and finally, model validation. As advanced as technology has become, the traditional data science project still incorporates a lot of manual processes and remains time-consuming and repetitive. AutoML came into the picture to automate the entire process from data cleaning to parameter optimization. It provides tremendous value for machine learning projects in terms of both time savings and performance. Launched in 2018, Google Cloud AutoML quickly gained popularity with its user-friendly interface and high performance. The chart below is a demonstration of Google's performance (blue bars) compared to other AutoML platforms.


Satellite images and machine learning can identify remote communities to facilitate access to health services

#artificialintelligence

Community health systems operating in remote areas require accurate information about where people live to efficiently provide services across large regions. We sought to determine whether machine learning analysis of satellite imagery can be used to map remote communities to facilitate service delivery and planning. We developed a method for mapping communities using a deep learning approach that excels at detecting objects within images. We trained an algorithm to detect individual buildings, then examined building clusters to identify groupings suggestive of communities. The approach was validated in southeastern Liberia by comparing algorithmically generated results with community location data collected manually by enumerators and community health workers. The deep learning approach achieved 86.47% positive predictive value and 79.49% sensitivity with respect to individual building detection. The approach identified 75.67% (n = 451) of communities registered through the community enumeration process, and identified an additional 167 potential communities not previously registered. Several instances of false positives and false negatives were identified.
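As a reminder of how the two reported detection metrics are defined, the sketch below computes positive predictive value and sensitivity from counts of true positives, false positives, and false negatives. The counts used in the example are hypothetical placeholders, since the abstract reports only the percentages. The last line back-calculates the approximate number of registered communities implied by "75.67% (n = 451)"; that total is an inference from the quoted figures, not a number stated in the abstract.

```python
def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of detections that are real buildings: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)

def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real buildings that were detected: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)

# Hypothetical counts for illustration only; not taken from the study.
tp, fp, fn = 80, 20, 20

# Inferred total: 451 identified communities at a 75.67% identification rate.
implied_registered_communities = round(451 / 0.7567)

if __name__ == "__main__":
    print(f"PPV: {positive_predictive_value(tp, fp):.2%}")
    print(f"Sensitivity: {sensitivity(tp, fn):.2%}")
    print(f"Implied registered communities: {implied_registered_communities}")
```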