Discover and Align Taxonomic Context Priors for Open-world Semi-Supervised Learning (Supplementary Materials)
In this section, we provide another interpretation of TIDA, which offers more insight into the nature of the learned structure priors. TIDA intrinsically: (a) builds hierarchical vMF distributions to cluster samples and discover taxonomic contexts by optimizing Eq. VI; (b) applies a consistency constraint on the hierarchical vMF distributions to build communication and alignment across taxonomic contexts, as in Eq.
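For reference, the von Mises-Fisher (vMF) distribution underlying this clustering view has the standard density on the unit sphere (stated here as background; TIDA's exact parameterization may differ):

\[
p(\mathbf{x};\boldsymbol{\mu},\kappa) = C_d(\kappa)\exp\!\big(\kappa\,\boldsymbol{\mu}^{\top}\mathbf{x}\big),
\qquad
C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)},
\]

where \(\|\boldsymbol{\mu}\| = 1\) is the mean direction (a cluster prototype), \(\kappa \ge 0\) is the concentration, and \(I_v\) denotes the modified Bessel function of the first kind. Under this view, each level of the hierarchy assigns a normalized embedding to prototypes with posterior probability proportional to \(\exp(\kappa\,\boldsymbol{\mu}_c^{\top}\mathbf{x})\).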
Discover and Align Taxonomic Context Priors for Open-world Semi-Supervised Learning
Open-world Semi-Supervised Learning (OSSL) is a realistic and challenging task that aims to classify unlabeled samples from both seen and novel classes using partially labeled samples from the seen classes. Previous works typically exploit relationships among samples at a single pre-defined granularity of labels as priors to aid novel class recognition. In fact, classes follow a taxonomy, and samples can be classified at multiple levels of granularity, which encodes richer underlying relationships for supervision. We thus argue that learning with single-granularity labels leads to sub-optimal representation learning and inaccurate pseudo labels, especially for unknown classes. In this paper, we propose a unified framework, called Taxonomic context prIors Discovering and Aligning (TIDA), which exploits relationships among samples at multiple granularities. It allows us to discover multi-granularity semantic concepts as taxonomic context priors (i.e., sub-class, target-class, and super-class) and then collaboratively leverage them to enhance representation learning and improve the quality of pseudo labels.
Policy Mirror Descent with Lookahead
Policy Mirror Descent (PMD) is a versatile algorithmic framework encompassing several seminal policy gradient algorithms, such as natural policy gradient, with connections to state-of-the-art reinforcement learning (RL) algorithms such as TRPO and PPO. PMD can be seen as a soft Policy Iteration algorithm implementing regularized 1-step greedy policy improvement. However, 1-step greedy policies might not be the best choice: remarkable empirical successes in RL, such as AlphaGo and AlphaZero, have demonstrated that greedy approaches with respect to multiple steps outperform their 1-step counterparts. In this work, we propose a new class of PMD algorithms, called h-PMD, which incorporates multi-step greedy policy improvement with lookahead depth h into the PMD update rule.
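Schematically, and assuming the standard mirror-descent notation (the paper's exact formulation may differ), the h-PMD update replaces the 1-step action-value \(Q^{\pi_k}\) in the PMD update with an h-step lookahead value:

\[
\pi_{k+1}(\cdot \mid s) \in \arg\max_{\pi \in \Delta(\mathcal{A})}
\Big\{ \eta_k \big\langle Q_h^{\pi_k}(s,\cdot),\, \pi \big\rangle - D\big(\pi,\ \pi_k(\cdot \mid s)\big) \Big\},
\]

where \(D\) is a Bregman divergence (the KL divergence recovers natural policy gradient) and \(Q_h^{\pi_k}\) is the action-value obtained after \(h-1\) applications of the Bellman optimality operator to \(V^{\pi_k}\); setting \(h = 1\) recovers standard PMD.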
A.1 Model Overview with Pseudo Codes
In this subsection, we provide a high-level summary of our framework for better understanding. We present the summary in the form of pseudo code, shown in Algorithm 1.
Algorithm 1: The Overview of the Proposed Framework.
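The body of Algorithm 1 did not survive extraction. As a placeholder, the following is a hypothetical sketch of a single discover-and-align training step in the spirit of the TIDA framework described above; the function names, the child-to-parent maps, the level ordering, and the loss weighting are all assumptions, not the paper's code.

```python
# Hypothetical sketch: discover clusters at several granularities with vMF-style
# prototype logits, then align adjacent levels of the hierarchy.
import torch
import torch.nn.functional as F

def discover_and_align_step(encoder, prototypes, parent_maps, x_l, y, x_u,
                            kappa=10.0, lam=1.0):
    # prototypes[l]: (K_l, d) unit-norm prototypes, ordered finest (sub-class)
    #                to coarsest (super-class); labels supervise level 1 (target-class).
    # parent_maps[l]: LongTensor (K_l,) mapping each level-l cluster to its parent.
    z_l = F.normalize(encoder(x_l), dim=-1)
    z_u = F.normalize(encoder(x_u), dim=-1)

    # (a) Discover: vMF-style posteriors p(c|x) ∝ exp(kappa * mu_c^T z) per level.
    probs_u = [F.softmax(kappa * z_u @ P.t(), dim=-1) for P in prototypes]

    # Supervised cross-entropy at the target-class level.
    loss = F.cross_entropy(kappa * z_l @ prototypes[1].t(), y)

    # (b) Align: child posterior mass, summed over siblings, should match the
    # parent-level posterior (cross-granularity consistency).
    for l, parent in enumerate(parent_maps):
        agg = torch.zeros_like(probs_u[l + 1])
        agg.index_add_(1, parent, probs_u[l])        # pool child probs into parents
        loss = loss + lam * F.kl_div(agg.clamp_min(1e-8).log(),
                                     probs_u[l + 1], reduction="batchmean")
    return loss
```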
1. Introduction
Current methodologies for enzyme annotation rely primarily on established databases and classifications such as KEGG Orthology (KO), Enzyme Commission (EC) numbers, and Gene Ontology (GO) annotations, each with its specific focus and methodology. For instance, the EC system categorizes enzymes based on the chemical reactions they catalyze, providing a hierarchical numerical classification. KO links gene products to their functional orthologs across different species, whereas GO offers a broader ontology for describing the roles of genes and proteins in any organism. Despite their widespread use, these systems have notable limitations. The EC classification, while widely used, sometimes groups vastly different enzymes under the same category or subdivides similar ones excessively based on the substrates they interact with, leading to ambiguities in enzyme function characterization.
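To make the hierarchical numerical classification concrete, each EC number encodes four increasingly specific levels; the small parser below is purely illustrative and not part of any annotation pipeline discussed here.

```python
# Illustrative only: an EC number is a four-level hierarchical code,
# class . subclass . sub-subclass . serial number.
def ec_levels(ec: str) -> dict:
    parts = ec.split(".")
    return {
        "class": parts[0],                 # broad reaction type, e.g. 1 = oxidoreductases
        "subclass": ".".join(parts[:2]),
        "sub-subclass": ".".join(parts[:3]),
        "serial": ec,                      # the specific, substrate-level entry
    }

print(ec_levels("1.1.1.1"))  # EC 1.1.1.1: alcohol dehydrogenase
```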
ReactZyme: A Benchmark for Enzyme-Reaction Prediction
Chenqing Hua
Enzymes, with their specific catalyzed reactions, are necessary for all aspects of life, enabling diverse biological processes and adaptations. Predicting enzyme functions is essential for understanding biological pathways, guiding drug development, enhancing bioproduct yields, and facilitating evolutionary studies. To address the inherent complexities, we introduce a new approach to annotating enzymes based on their catalyzed reactions. This method provides detailed insights into specific reactions and is adaptable to newly discovered reactions, diverging from traditional classifications by protein family or expert-derived reaction classes. We employ machine learning algorithms to analyze enzyme reaction datasets, delivering a much more refined view of enzyme functionality. Our evaluation leverages the largest enzyme-reaction dataset to date, derived from the SwissProt and Rhea databases with entries up to January 8, 2024. We frame enzyme-reaction prediction as a retrieval problem, aiming to rank enzymes by their catalytic ability for specific reactions. With our model, we can recruit proteins for novel reactions and predict reactions for novel proteins, facilitating enzyme discovery and function annotation (https://github.com/WillHua127/ReactZyme).
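The retrieval framing amounts to scoring enzyme-reaction pairs by embedding similarity and ranking candidate enzymes per reaction. A minimal sketch follows; the encoder outputs, dimensions, and function names are placeholders rather than the paper's actual architecture.

```python
# Minimal retrieval sketch: rank candidate enzymes for a reaction by cosine
# similarity of learned embeddings. All shapes and names are placeholders.
import torch
import torch.nn.functional as F

def rank_enzymes(reaction_emb: torch.Tensor, enzyme_embs: torch.Tensor) -> torch.Tensor:
    # reaction_emb: (d,), enzyme_embs: (N, d); higher score = better candidate.
    scores = F.normalize(enzyme_embs, dim=-1) @ F.normalize(reaction_emb, dim=0)
    return torch.argsort(scores, descending=True)   # enzyme indices, best first

# Usage: retrieve the top-5 candidate enzymes for one reaction.
reaction = torch.randn(256)
enzymes = torch.randn(10_000, 256)
top5 = rank_enzymes(reaction, enzymes)[:5]
```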
Direct Training of SNN using Local Zeroth Order Method
William de Vazelhes
Spiking neural networks (SNNs) are becoming increasingly popular for their low energy requirements in real-world tasks, with accuracy comparable to traditional ANNs. SNN training algorithms face a loss of gradient information and non-differentiability due to the Heaviside function when minimizing the model loss over model parameters. To circumvent this problem, the surrogate method employs a differentiable approximation of the Heaviside function in the backward pass, while the forward pass continues to use the Heaviside as the spiking function. We propose to use the zeroth-order technique at the local, or neuron, level in training SNNs, motivated by its regularizing and potentially energy-efficient effects, and establish a theoretical connection between it and existing surrogate methods. We perform experimental validation of the technique on standard static datasets (CIFAR-10, CIFAR-100, ImageNet-100) and neuromorphic datasets (DVS-CIFAR-10, DVS-Gesture, N-Caltech-101, NCARS) and obtain results that improve over the state of the art. The proposed method also lends itself to efficient implementations of backpropagation, which could provide a 3-4x overall speedup in training time.
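The connection to surrogate methods can be illustrated with a per-neuron two-point zeroth-order estimate in the backward pass, while the forward pass keeps the exact Heaviside. This is a hedged sketch; the paper's actual estimator and hyperparameters may differ.

```python
# Sketch: exact Heaviside spike in the forward pass; a local two-point
# zeroth-order gradient estimate in the backward pass. delta is an assumed
# smoothing radius, not a value from the paper.
import torch

class ZOSpike(torch.autograd.Function):
    @staticmethod
    def forward(ctx, u, delta=0.5):
        ctx.save_for_backward(u)
        ctx.delta = delta
        return (u >= 0).float()                  # exact Heaviside spiking function

    @staticmethod
    def backward(ctx, grad_out):
        (u,) = ctx.saved_tensors
        d = ctx.delta
        # Two-point estimate of dH/du per neuron: (H(u+d) - H(u-d)) / (2d).
        # This equals 1/(2d) inside a band of width 2d around the threshold and 0
        # elsewhere, i.e., a rectangular surrogate gradient, which is one way to
        # see the link between zeroth-order estimates and surrogate methods.
        zo_grad = ((u + d >= 0).float() - (u - d >= 0).float()) / (2 * d)
        return grad_out * zo_grad, None          # no gradient for delta
```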
A Additional related works
The Column Subset Selection Problem is one of the most classical tasks in matrix approximation (Boutsidis et al., 2008). The original version of the problem compares the projection error of a subset of size k to the best rank k approximation error. The techniques used for finding good subsets have included many randomized methods (Deshpande et al., 2006; Boutsidis et al., 2008; Belhadji et al., 2018; Boutsidis & Woodruff, 2014), as well as deterministic methods (Gu & Eisenstat, 1996). Variants of these algorithms have also been extended to more general losses (Chierichetti et al., 2017; Khanna et al., 2017; Elenberg et al., 2018). Later on, most works have relaxed the problem formulation by allowing the number of selected columns |S| to exceed the rank k.
Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method
The Column Subset Selection Problem (CSSP) and the Nyström method are among the leading tools for constructing small low-rank approximations of large datasets in machine learning and scientific computing. A fundamental question in this area is: how well can a data subset of size k compete with the best rank-k approximation? We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees that go beyond the standard worst-case analysis. Our approach leads to significantly better bounds for datasets with known rates of singular value decay, e.g., polynomial or exponential decay. Our analysis also reveals an intriguing phenomenon: the approximation factor as a function of k may exhibit multiple peaks and valleys, which we call a multiple-descent curve. A lower bound we establish shows that this behavior is not an artifact of our analysis, but rather an inherent property of the CSSP and Nyström tasks. Finally, using the example of a radial basis function (RBF) kernel, we show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
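For reference, the approximation factor in question can be written in the standard Frobenius-norm formulation as

\[
\Phi(k) \;=\; \frac{\min_{|S| = k} \big\| \mathbf{A} - \mathbf{P}_S \mathbf{A} \big\|_F^2}{\big\| \mathbf{A} - \mathbf{A}_k \big\|_F^2},
\]

where \(\mathbf{P}_S\) is the orthogonal projection onto the span of the columns of \(\mathbf{A}\) indexed by \(S\) and \(\mathbf{A}_k\) is the best rank-k approximation; the multiple-descent curve refers to the non-monotone behavior of \(\Phi(k)\) as k varies.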