NoiseGPT: Label Noise Detection and Rectification through Probability Curvature
Machine learning craves high-quality data, which is a major bottleneck in realistic deployment, as collecting and labeling data takes abundant resources and massive human labor. Unfortunately, label noise, where image data is mismatched with an incorrect label, exists ubiquitously in all kinds of datasets, significantly degrading the learning performance of deep networks. Learning with Label Noise (LNL) has been a common strategy for mitigating the influence of noisy labels. However, existing LNL methods either require pre-training that exploits the memorization effect to separate clean data from noisy data, or rely on dataset assumptions that cannot extend to various scenarios. Thanks to the development of Multimodal Large Language Models (MLLMs), which possess massive knowledge and hold In-Context Learning (ICL) ability, this paper proposes NoiseGPT to effectively leverage MLLMs as a knowledge expert for label noise detection and rectification. Specifically, we observe a probability curvature effect of MLLMs, where clean and noisy examples reside on curvatures with different smoothness, which in turn enables the detection of label noise.
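To make the probability curvature idea concrete, here is a minimal Python sketch of curvature-based noise scoring. The MLLM scoring call `label_logprob` is a hypothetical placeholder, and the perturbation scheme, score direction, and threshold are assumptions for illustration, not the paper's actual procedure.

```python
# A minimal sketch of curvature-based label noise detection and rectification.
# `label_logprob` is a hypothetical stand-in for a real MLLM query.
import numpy as np

def label_logprob(image, label) -> float:
    """Placeholder: the MLLM's log-probability of `label` given `image`.
    Replace with a real MLLM call (e.g., the token log-likelihood of the
    label under an in-context classification prompt)."""
    rng = np.random.default_rng(abs(hash((id(image), label))) % 2**32)
    return float(rng.normal(-2.0, 1.0))

def curvature_score(image, label, neighbor_labels) -> float:
    """Approximate the local probability curvature as the gap between the
    assigned label's log-probability and the mean over perturbed labels.
    Per the paper's observation, clean and noisy pairs lie on curvatures of
    different smoothness, so such a gap can separate them."""
    base = label_logprob(image, label)
    perturbed = float(np.mean([label_logprob(image, nl) for nl in neighbor_labels]))
    return base - perturbed

def detect_and_rectify(image, label, candidate_labels, threshold=1.0):
    """Flag a pair as noisy when its curvature score falls below `threshold`
    (an assumed decision rule), then rectify by taking the candidate label
    the MLLM scores highest."""
    neighbors = [c for c in candidate_labels if c != label]
    if curvature_score(image, label, neighbors) >= threshold:
        return label, False  # treated as clean
    best = max(candidate_labels, key=lambda c: label_logprob(image, c))
    return best, True  # rectified label, noisy flag
```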
Learning from Offline Foundation Features with Tensor Augmentations Emir Konuk
We introduce Learning from Offline Foundation Features with Tensor Augmentations (LOFF-TA), an efficient training scheme designed to harness the capabilities of foundation models in limited-resource settings where their direct development is not feasible. LOFF-TA involves training a compact classifier on cached feature embeddings from a frozen foundation model, resulting in up to 37× faster training and up to 26× reduced GPU memory usage. Because the embeddings of augmented images would be too numerous to store, yet the augmentation process is essential for training, we propose to apply tensor augmentations to the cached embeddings of the original non-augmented images. LOFF-TA makes it possible to leverage the power of foundation models, regardless of their size, in settings with limited computational capacity. Moreover, LOFF-TA can be used to apply foundation models to high-resolution images without increasing compute. In certain scenarios, we find that training with LOFF-TA yields better results than directly fine-tuning the foundation model.
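A minimal PyTorch sketch of the training scheme described above: embeddings of the original, non-augmented images are cached once from a frozen foundation model, and augmentations are then applied directly to the cached tensors while a compact classifier is trained. The specific augmentations below (Gaussian noise and embedding-space mixup) are illustrative assumptions rather than the paper's exact choices.

```python
# Sketch of LOFF-TA-style training: cache once, augment in embedding space.
import torch
import torch.nn as nn

@torch.no_grad()
def cache_features(foundation, loader, device="cpu"):
    """One-time offline pass: store embeddings of the non-augmented images."""
    foundation.eval().to(device)
    feats, labels = [], []
    for x, y in loader:
        feats.append(foundation(x.to(device)).cpu())
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)

def tensor_augment(z, noise_std=0.1, mixup_alpha=0.2):
    """Augment cached embeddings directly: additive noise plus mixup."""
    z = z + noise_std * torch.randn_like(z)
    lam = torch.distributions.Beta(mixup_alpha, mixup_alpha).sample()
    perm = torch.randperm(z.size(0))
    return lam * z + (1 - lam) * z[perm], perm, lam

def train(classifier, feats, labels, epochs=10, lr=1e-3, batch=256):
    """Train the compact classifier on cached features only; the frozen
    foundation model never appears in this loop, which is where the speed
    and memory savings come from."""
    opt = torch.optim.AdamW(classifier.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        idx = torch.randperm(feats.size(0))
        for i in range(0, len(idx), batch):
            b = idx[i:i + batch]
            z, perm, lam = tensor_augment(feats[b])
            logits = classifier(z)
            # Mixup loss: interpolate the targets the same way as the inputs.
            loss = lam * loss_fn(logits, labels[b]) \
                + (1 - lam) * loss_fn(logits, labels[b][perm])
            opt.zero_grad()
            loss.backward()
            opt.step()
```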
Can Large Language Models Explore In-Context?
We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making. We focus on the native performance of existing LLMs, without training interventions. We deploy LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context, i.e., within the LLM prompt.
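A minimal sketch of this kind of in-context bandit deployment, assuming a hypothetical `query_llm` stand-in for a real chat-completion client; the paper's actual prompt wording and parsing rules will differ.

```python
# Sketch: an LLM as a multi-armed bandit agent with history given in-context.
import random

N_ARMS = 5
TRUE_MEANS = [0.2, 0.4, 0.5, 0.7, 0.3]  # unknown to the agent

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM API call; replace with a real client.
    Here it guesses uniformly so the loop runs end to end."""
    return str(random.randrange(N_ARMS))

def run_episode(horizon=50):
    history = []  # (arm, reward) pairs rendered verbatim into the prompt
    for t in range(horizon):
        lines = [f"Round {i + 1}: pulled arm {a}, reward {r}"
                 for i, (a, r) in enumerate(history)]
        prompt = (
            f"You are choosing among {N_ARMS} slot machines (arms 0-{N_ARMS - 1}) "
            f"to maximize total reward over {horizon} rounds.\n"
            + "\n".join(lines)
            + f"\nRound {t + 1}: which arm do you pull? Answer with one number."
        )
        try:
            arm = int(query_llm(prompt)) % N_ARMS
        except ValueError:
            arm = random.randrange(N_ARMS)  # fall back on unparseable output
        reward = int(random.random() < TRUE_MEANS[arm])  # Bernoulli reward
        history.append((arm, reward))
    return sum(r for _, r in history)
```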
Supplementary Material and Datasheet for the WorldStrat Dataset
J. Cornebise, I. Oršolić, F. Kalaitzis
2022-06-16
Contents: 2. Cloud coverage statistics; 3. Full List of Hyperparameters for Benchmark
Does this timeframe match the creation timeframe of the data associated with the instances (e.g., recent crawl of old news articles)? LCCS comprises 23 classes and 14 sub-classes.

The dataset, along with its machine-readable metadata, is hosted on the CERN-backed Zenodo data repository: https://zenodo.org/record/6810792. Its long-term maintenance is discussed in the Datasheet. This includes reproducible code for the Benchmarks of Section 4 of [Cornebise et al., 2022a], following the ML Reproducibility Checklist [Pineau et al., 2021a,b]. The project also has its own website, available at https://worldstrat.github.io/. The authors hereby state that they bear all responsibility in case of violation of rights, etc., and confirm that the data license is as follows: the low-resolution imagery, labels, metadata, and pretrained models are released under Creative Commons with Attribution 4.0 International (CC BY 4.0).

Cloud coverage statistics: the mean of the cloud coverage over the Sentinel-2 product areas is 7.98%, with a standard deviation of 14.22. The quantiles are:
- 0.025: 0.00%
- 0.25: 0.00%
- 0.5: 0.66%
- 0.75: 10.05%
- 0.975: 49.95%

It is important to note that this cloud cover percentage, as mentioned in the article and datasheet, is calculated over the provider's entire product area, which varies in size but is much larger than the 2.5 km we target. This means that even an image with a large cloud cover percentage can be cloud-free and, in extreme (though unlikely) cases, vice versa. There are also considerable differences across sampled regions and land cover types; simple examples are rainforests and non-desert equatorial regions. Using a strict no-cloud policy would either make sampling enough low-resolution images impossible or make the temporal difference extremely large (up to 7 years for some AOIs). With that in mind, we strove to keep the cloud coverage as low as possible, ideally under 5%, while keeping the temporal difference as small as possible.
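For concreteness, a small NumPy sketch of how statistics like those above could be recomputed from per-product cloud percentages, together with the stated sampling policy (prefer products under 5% cloud cover while keeping the temporal gap small). The field names are illustrative assumptions, not the dataset's actual schema.

```python
# Sketch: cloud-cover summary statistics and the stated revisit-selection policy.
import numpy as np

def cloud_stats(cloud_pct):
    """Mean, standard deviation, and the quantiles reported in the datasheet."""
    cloud_pct = np.asarray(cloud_pct, dtype=float)
    qs = [0.025, 0.25, 0.5, 0.75, 0.975]
    return {
        "mean": cloud_pct.mean(),
        "std": cloud_pct.std(),
        "quantiles": dict(zip(qs, np.quantile(cloud_pct, qs))),
    }

def pick_revisit(candidates, max_cloud=5.0):
    """candidates: dicts with assumed keys 'cloud_pct' and 'days_from_target'.
    Prefer products under the cloud threshold; among those, take the one
    closest in time. Fall back to the least cloudy product otherwise."""
    clear = [c for c in candidates if c["cloud_pct"] <= max_cloud]
    pool = clear if clear else candidates
    key = (lambda c: abs(c["days_from_target"])) if clear \
        else (lambda c: c["cloud_pct"])
    return min(pool, key=key)
```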
Tangent Space Causal Inference: Leveraging Vector Fields for Causal Discovery in Dynamical Systems Daniel Waxman
Causal discovery with time series data remains a challenging yet increasingly important task across many scientific domains. Convergent cross mapping (CCM) and related methods have been proposed to study time series that are generated by dynamical systems, where traditional approaches like Granger causality are unreliable. However, CCM often yields inaccurate results depending on the quality of the data. We propose the Tangent Space Causal Inference (TSCI) method for detecting causalities in dynamical systems. TSCI works by considering vector fields as explicit representations of the systems' dynamics and checking the degree of synchronization between the learned vector fields. The TSCI approach is model-agnostic and can be used as a drop-in replacement for CCM and its generalizations. We first present a basic version of the TSCI algorithm, which is shown to be more effective than the basic CCM algorithm with very little additional computation.
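A compact sketch of the idea as described above, under stated assumptions: vector fields are estimated by finite differences on delay-embedded manifolds, a local-linear cross map is fit with k-nearest neighbors, and the degree of synchronization is measured by the average cosine similarity between pushed-forward and observed tangent vectors. This illustrates the mechanism; it is not the authors' reference implementation.

```python
# Sketch of a TSCI-style score: push one vector field through a learned
# cross map and measure its alignment with the other field.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def delay_embed(x, dim=3, tau=1):
    """Standard delay embedding of a scalar time series."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

def vector_field(X):
    """Finite-difference estimate of the dynamics on the embedded manifold."""
    return np.gradient(X, axis=0)

def tsci_score(x, y, dim=3, tau=1, k=10):
    """Cosine-similarity score for the direction X -> Y (higher suggests
    stronger coupling, in the CCM sense)."""
    X, Y = delay_embed(x, dim, tau), delay_embed(y, dim, tau)
    m = min(len(X), len(Y))
    X, Y = X[:m], Y[:m]
    U, V = vector_field(X), vector_field(Y)
    _, idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
    cosines = []
    for i in range(m):
        # Local linear cross map around X[i]: solve dX @ J ~= dY on neighbors.
        dX, dY = X[idx[i]] - X[i], Y[idx[i]] - Y[i]
        J, *_ = np.linalg.lstsq(dX, dY, rcond=None)
        v_hat = U[i] @ J  # push X's tangent vector through the Jacobian
        denom = np.linalg.norm(v_hat) * np.linalg.norm(V[i]) + 1e-12
        cosines.append(float(v_hat @ V[i]) / denom)
    return float(np.mean(cosines))
```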
Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs
Hybrid Reinforcement Learning (RL), where an agent learns from both an offline dataset and online exploration in an unknown environment, has garnered significant recent interest. A crucial question posed by Xie et al. (2022b) is whether hybrid RL can improve upon the existing lower bounds established for purely offline or online RL, without requiring that the behavior policy visit every state and action the optimal policy does. While Li et al. (2023b) provided an affirmative answer for tabular PAC RL, the question remains unsettled for both the regret-minimizing and non-tabular cases. In this work, building upon recent advancements in offline RL and reward-agnostic exploration, we develop computationally efficient algorithms for both PAC and regret-minimizing RL with linear function approximation, without requiring concentrability on the entire state-action space. We demonstrate that these algorithms achieve sharper error or regret bounds that are no worse than, and can improve on, the optimal sample complexity in offline RL (the first algorithm, for PAC RL) and online RL (the second algorithm, for regret-minimizing RL) in linear Markov decision processes (MDPs), regardless of the quality of the behavior policy. To our knowledge, this work establishes the tightest theoretical guarantees currently available for hybrid RL in linear MDPs.
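Since the contribution is theoretical, the following is only an illustrative NumPy sketch of the ingredient the abstract highlights: a least-squares value-iteration backup over linear features, fit on pooled offline and online transitions. The feature map, the regularization, and the omission of exploration-bonus terms are all assumptions.

```python
# Sketch: one hybrid LSVI-style backup in a linear MDP, pooling offline
# and online data in a single ridge regression.
import numpy as np

def lsvi_q_weights(phi, rewards, next_values, lam=1.0):
    """One least-squares value-iteration backup with linear features:
    w = (Phi^T Phi + lam * I)^{-1} Phi^T (r + V(s')).
    `phi` stacks feature vectors of offline AND online (s, a) pairs."""
    d = phi.shape[1]
    A = phi.T @ phi + lam * np.eye(d)
    return np.linalg.solve(A, phi.T @ (rewards + next_values))

# Pooling is the point of hybrid RL: offline data shrinks the error where the
# behavior policy has coverage, and online exploration fills in the rest,
# without requiring concentrability over the entire state-action space.
rng = np.random.default_rng(0)
d = 8
phi_off = rng.normal(size=(500, d))   # features of offline (s, a) pairs
phi_on = rng.normal(size=(100, d))    # features gathered by online exploration
phi = np.vstack([phi_off, phi_on])
rewards = rng.uniform(size=600)
next_values = rng.uniform(size=600)   # stand-in for max_a' Q(s', a')
w = lsvi_q_weights(phi, rewards, next_values)
```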