Beyond Pinball Loss: Quantile Methods for Calibrated Uncertainty Quantification
Among the many ways of quantifying uncertainty in a regression setting, specifying the full quantile function is attractive, as quantiles are amenable to interpretation and evaluation. A model that predicts the true conditional quantiles for each input, at all quantile levels, presents a correct and efficient representation of the underlying uncertainty. To achieve this, many current quantile-based methods focus on optimizing the pinball loss. However, this loss restricts the scope of applicable regression models, limits the ability to target many desirable properties (e.g.
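For context, the pinball (quantile) loss referred to above is standardly defined as rho_tau(y, y_hat) = max(tau * (y - y_hat), (tau - 1) * (y - y_hat)) for a quantile level tau in (0, 1). Below is a minimal NumPy sketch of this standard definition; the function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Standard pinball (quantile) loss at quantile level tau in (0, 1).

    Under-predictions are penalized by tau and over-predictions by (1 - tau),
    so the minimizer is the tau-th conditional quantile.
    """
    diff = y_true - y_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

# Example: for tau = 0.9, under-shooting is penalized more than over-shooting.
y = np.array([1.0, 2.0, 3.0])
print(pinball_loss(y, y - 0.5, tau=0.9))  # 0.45 (under-prediction)
print(pinball_loss(y, y + 0.5, tau=0.9))  # 0.05 (over-prediction)
```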
DAPE: Data-Adaptive Positional Encoding for Length Extrapolation
Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization. Prior research has introduced absolute positional encoding (APE) and relative positional encoding (RPE) to distinguish token positions in a given sequence. However, both APE and RPE remain fixed after training regardless of the input data, which limits their adaptability and flexibility. We therefore argue that the desired positional encoding should be data-adaptive and dynamically adjustable based on the given attention. In this paper, we propose a Data-Adaptive Positional Encoding (DAPE) method that adjusts dynamically and semantically based on the input context and learned fixed priors. Experimental validation on real-world datasets (Arxiv, Books3, and CHE) demonstrates that DAPE improves model performance both at the trained length and in length generalization, with statistically significant improvements. Visualizations suggest that the model retains both local and anti-local information. Finally, we successfully train the model on sequences of length 128 and achieve better performance at an evaluation sequence length of 8192 than other static positional encoding methods, revealing the benefit of adaptive positional encoding.
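As a rough illustration of the general idea described in the abstract (making the positional bias a function of the attention itself rather than a fixed table), here is a hedged PyTorch-style sketch in which a small MLP adjusts a learned static bias using the pre-softmax attention logits. All module names, shapes, and design choices are assumptions for illustration; this is not the authors' released implementation.

```python
import torch
import torch.nn as nn

class DataAdaptiveBias(nn.Module):
    """Illustrative sketch: adjust a static positional bias using the
    attention logits, so the effective bias depends on the input data.
    (Shapes and design choices are assumptions, not the paper's code.)
    """

    def __init__(self, hidden: int = 32):
        super().__init__()
        # Small MLP applied elementwise over (static bias, attention logit) pairs.
        self.mlp = nn.Sequential(
            nn.Linear(2, hidden),
            nn.GELU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, attn_logits: torch.Tensor, static_bias: torch.Tensor) -> torch.Tensor:
        # attn_logits: (batch, heads, q_len, k_len) pre-softmax scores
        # static_bias: (heads, q_len, k_len), e.g. an ALiBi-style relative bias
        bias = static_bias.unsqueeze(0).expand_as(attn_logits)
        x = torch.stack([bias, attn_logits], dim=-1)   # (..., 2)
        adaptive_bias = self.mlp(x).squeeze(-1)        # (batch, heads, q_len, k_len)
        return attn_logits + adaptive_bias             # biased scores before softmax

# Example shapes: batch=2, heads=4, sequence length 16.
dape = DataAdaptiveBias()
scores = torch.randn(2, 4, 16, 16)
bias = torch.randn(4, 16, 16)
out = dape(scores, bias)  # (2, 4, 16, 16), ready for softmax
```

In a real attention layer, the returned scores would then be passed through softmax as usual; the point of the sketch is only that the bias term becomes data-dependent.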
Checklist
For all authors...
(a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?

If you used crowdsourcing or conducted research with human subjects...
(a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
(b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
(c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation?

Proposition 4. For Γ ⊆ H and domains S, T, we have:

Proposition 5 (equivalence between transferability and transfer measures).

Summing over (A.12) and (A.14), we have:

In the proof above, we assumed a classifier h ∈ Γ is allowed to take a garbage value 0 if it is not sure which label to choose. This is a mild assumption that can hold in practice.
We thank the reviewers (R1, R2, R3, R4, and R5) for their thoughtful reviews, and respond to as much as we can given the limited space.
Their Theorem 2.2 gives a ... Given true expected regret, Lemma 2.1 allows one ... It is precisely this quantity which vanilla RegretNet can only approximate but which we can compute. Due to RegretNet's sensitivity to hyperparameters, we believe that reproducing optimal ... These changes might explain the performance differences; we agree with this and will add such a discussion. As such, much of the comparison in Duetting et al. to previous work applies to our technique as well. We will add a brief discussion in Section 1 and a new subsection in Section 2. We will explicitly clarify this assumption as well.
Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data
For classification and regression on tabular data, the dominance of gradient-boosted decision trees (GBDTs) has recently been challenged by deep learning methods that are often much slower and require extensive hyperparameter tuning. We address this discrepancy by introducing (a) RealMLP, an improved multilayer perceptron (MLP), and (b) strong meta-tuned default parameters for GBDTs and RealMLP. We tune RealMLP and the default parameters on a meta-train benchmark with 118 datasets and compare them to hyperparameter-optimized versions on a disjoint meta-test benchmark with 90 datasets, as well as the GBDT-friendly benchmark by Grinsztajn et al. (2022). Our benchmark results on medium-to-large tabular datasets (1K–500K samples) show that RealMLP offers a favorable time-accuracy tradeoff compared to other neural baselines and is competitive with GBDTs in terms of benchmark scores. Moreover, a combination of RealMLP and GBDTs with improved default parameters can achieve excellent results without hyperparameter tuning. Finally, we demonstrate that some of RealMLP's improvements can also considerably improve the performance of TabR with default parameters.
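To make the "tuned defaults vs. per-dataset tuning" comparison concrete, here is a minimal hedged sketch using scikit-learn and synthetic stand-in datasets: a single fixed default configuration is evaluated on datasets it was not tuned on, alongside a per-dataset hyperparameter search. The estimator, parameter values, and grid are assumptions for illustration, not the paper's actual models, defaults, or benchmark.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Stand-ins for a "meta-test" benchmark: datasets that were not used when
# the default configuration was chosen (synthetic placeholders here).
datasets = [make_classification(n_samples=2000, n_features=20, random_state=seed)
            for seed in range(3)]

# One fixed, pre-chosen default configuration (values are illustrative only).
tuned_defaults = dict(learning_rate=0.1, max_leaf_nodes=31, l2_regularization=1e-3)

# Per-dataset hyperparameter search as the (much more expensive) comparison.
param_grid = {"learning_rate": [0.03, 0.1, 0.3], "max_leaf_nodes": [15, 31, 63]}

for i, (X, y) in enumerate(datasets):
    default_score = cross_val_score(
        HistGradientBoostingClassifier(**tuned_defaults), X, y, cv=3).mean()
    hpo_score = cross_val_score(
        GridSearchCV(HistGradientBoostingClassifier(), param_grid, cv=3),
        X, y, cv=3).mean()
    print(f"dataset {i}: tuned defaults={default_score:.3f}  per-dataset HPO={hpo_score:.3f}")
```

The design point this illustrates is that the default configuration is fitted per dataset but never searched per dataset, so its cost is a single training run, whereas the search multiplies cost by the grid size times the inner folds.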