Dao, David
OAM-TCD: A globally diverse dataset of high-resolution tree cover maps
Veitch-Michaelis, Josh, Cottam, Andrew, Schweizer, Daniella, Broadbent, Eben N., Dao, David, Zhang, Ce, Zambrano, Angelica Almeyda, Max, Simeon
Accurately quantifying tree cover is important for ecosystem monitoring and for assessing progress in restored sites. Recent work has shown that deep learning-based segmentation algorithms can accurately map trees at country and continental scales using high-resolution aerial and satellite imagery. Mapping at high (ideally sub-meter) resolution is necessary to identify individual trees; however, there are few open-access datasets containing instance-level annotations, and those that exist are small or not geographically diverse. We present a novel open-access dataset for individual tree crown delineation (TCD) in high-resolution aerial imagery sourced from OpenAerialMap (OAM). Our dataset, OAM-TCD, comprises 5,072 images of 2048×2048 px at 10 cm/px resolution, with associated human-labeled instance masks for over 280k individual trees and 56k groups of trees. By sampling imagery from around the world, we better capture the diversity and morphology of trees across terrestrial biomes and in both urban and natural environments. Using our dataset, we train reference instance and semantic segmentation models that compare favorably to existing state-of-the-art models. We assess performance through k-fold cross-validation and comparison with existing datasets; additionally, we demonstrate compelling results on independent aerial imagery captured over Switzerland, comparing against municipal tree inventories and LiDAR-derived canopy maps in the city of Zurich. Our dataset, models, and training/benchmark code are publicly released under permissive open-source licenses: Creative Commons (majority CC BY 4.0) for the data and Apache 2.0 for the code.
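For readers who want to try the release, here is a minimal loading sketch, assuming the dataset is mirrored on the Hugging Face Hub; the id `restor/tcd` and the field contents are assumptions, so consult the paper's release page for the authoritative location:

```python
from datasets import load_dataset  # pip install datasets

# Hub id is an assumption; check the official release for the exact path.
ds = load_dataset("restor/tcd", split="train")
sample = ds[0]
# Expected (assumed) contents: a 2048x2048 RGB tile at 10 cm/px plus
# instance annotations for individual trees and tree groups.
print(sample.keys())
```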
GEO-Bench: Toward Foundation Models for Earth Monitoring
Lacoste, Alexandre, Lehmann, Nils, Rodriguez, Pau, Sherwin, Evan David, Kerner, Hannah, Lütjens, Björn, Irvin, Jeremy Andrew, Dao, David, Alemohammad, Hamed, Drouin, Alexandre, Gunturkun, Mehmet, Huang, Gabriel, Vazquez, David, Newman, Dava, Bengio, Yoshua, Ermon, Stefano, Zhu, Xiao Xiang
Recent progress in self-supervision has shown that pre-training large neural networks on vast amounts of unlabeled data can lead to substantial gains in generalization to downstream tasks. Such models, recently dubbed foundation models, have been transformational to the field of natural language processing. Variants have also been proposed for image data, but their applicability to remote sensing tasks is limited. To stimulate the development of foundation models for Earth monitoring, we propose a benchmark comprising six classification and six segmentation tasks, carefully curated and adapted to be both relevant to the field and well-suited for model evaluation. We accompany this benchmark with a robust methodology for evaluating models and reporting aggregated results to enable a reliable assessment of progress. Finally, we report results for 20 baselines to characterize the performance of existing models. We believe this benchmark will be a driver of progress across a variety of Earth monitoring tasks.
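The paper's evaluation methodology reports aggregated results with uncertainty rather than single per-task numbers; the exact recipe is in the paper, but a generic sketch of this style of aggregation (our own simplification, with hypothetical inputs) looks like:

```python
import numpy as np

def aggregate_benchmark(scores, n_boot=1000, seed=0):
    """Aggregate per-task scores into one benchmark number with a
    bootstrapped confidence interval. A sketch, not GEO-Bench's exact
    recipe: `scores` maps task name -> list of per-seed scores, assumed
    already normalized to [0, 1] (e.g. against a per-task baseline)."""
    rng = np.random.default_rng(seed)
    tasks = list(scores)
    # Point estimate: mean over tasks of the mean per-seed score.
    point = np.mean([np.mean(scores[t]) for t in tasks])
    boots = []
    for _ in range(n_boot):
        # Resample seeds within each task, then average across tasks.
        boots.append(np.mean([
            np.mean(rng.choice(scores[t], size=len(scores[t])))
            for t in tasks
        ]))
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return point, (lo, hi)
```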
Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms
Jia, Ruoxi, Dao, David, Wang, Boxin, Hubis, Frances Ann, Gurel, Nezihe Merve, Li, Bo, Zhang, Ce, Spanos, Costas J., Song, Dawn
Given a data set $\mathcal{D}$ containing millions of data points and a data consumer who is willing to pay \$$X$ to train a machine learning (ML) model over $\mathcal{D}$, how should we distribute this \$$X$ among the data points to reflect each point's "value"? In this paper, we define the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality, and decentralizability. For general, bounded utility functions, the Shapley value is known to be challenging to compute: obtaining Shapley values for all $N$ data points requires $O(2^N)$ model evaluations for exact computation and $O(N\log N)$ for $(\epsilon, \delta)$-approximation. In this paper, we focus on one popular family of ML models relying on $K$-nearest neighbors ($K$NN). The most surprising result is that for unweighted $K$NN classifiers and regressors, the Shapley values of all $N$ data points can be computed, exactly, in $O(N\log N)$ time -- an exponential improvement in computational complexity! Moreover, for $(\epsilon, \delta)$-approximation, we develop an algorithm based on Locality Sensitive Hashing (LSH) with only sublinear complexity $O(N^{h(\epsilon,K)}\log N)$ when $\epsilon$ is not too small and $K$ is not too large. We empirically evaluate our algorithms on up to $10$ million data points; even our exact algorithm is up to three orders of magnitude faster than the baseline approximation algorithm, and the LSH-based approximation algorithm accelerates the value calculation even further. We then extend our algorithms to other scenarios, such as (1) weighted $K$NN classifiers, (2) settings where data points are clustered by different data curators, and (3) settings where data analysts providing computation also require proper valuation.
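The exact result for unweighted $K$NN admits a compact implementation: sort the training points by distance to a test point, then sweep once from the farthest point to the nearest. A minimal sketch of that recursion for a single test point (our own naming and utility convention, following the paper's result as we read it):

```python
import numpy as np

def exact_knn_shapley(X_train, y_train, x_test, y_test, K):
    """Exact Shapley values of all N training points for an unweighted
    KNN classifier on one test point, in O(N log N) time.
    Utility of a subset S: fraction of the (up to) K nearest neighbors
    of x_test within S whose label matches y_test."""
    N = len(y_train)
    # Rank training points by distance to the test point (nearest first).
    order = np.argsort(np.linalg.norm(X_train - x_test, axis=1))
    match = (y_train[order] == y_test).astype(float)

    s = np.zeros(N)
    s[N - 1] = match[N - 1] / N  # farthest point
    for i in range(N - 2, -1, -1):  # sweep toward the nearest; rank = i + 1
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / K * min(K, i + 1) / (i + 1)

    values = np.empty(N)
    values[order] = s  # map back from sorted rank to original index
    return values
```

By the efficiency property of the Shapley value, these per-point values sum to the utility of the full training set on the given test point; values for an entire test set are obtained by averaging the per-test-point values.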
Towards Efficient Data Valuation Based on the Shapley Value
Jia, Ruoxi, Dao, David, Wang, Boxin, Hubis, Frances Ann, Hynes, Nick, Gurel, Nezihe Merve, Li, Bo, Zhang, Ce, Song, Dawn, Spanos, Costas
"How much is my data worth?" is an increasingly common question posed by organizations and individuals alike. An answer to this question could allow, for instance, fairly distributing profits among multiple data contributors and determining prospective compensation when data breaches happen. In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in coopoerative game theory. The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value. However, the Shapley value often requires exponential time to compute. To meet this challenge, we propose a repertoire of efficient algorithms for approximating the Shapley value. We also demonstrate the value of each training instance for various benchmark datasets.