On Robust Optimal Transport Computational
In Appendix A, we introduce and recall the notation needed for the supplementary material. Regarding the Sinkhorn algorithm, $u^k, v^k$ are the updates at the $k$-th iteration. The main idea for deriving this bound comes from the geometric convergence rate (i.e. …). First, we express the above difference in terms of other quantities that are straightforward to bound. Thus, it has a unique optimal solution, which can be computed directly as $X_i = B(u_i, v_i; C_i)$.
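For a concrete picture of the quantities $u^k$, $v^k$ and $B(u, v; C)$, the following Python sketch implements the standard (non-robust) entropic Sinkhorn iteration in the log domain. It assumes the common convention $B(u, v; C) = \mathrm{diag}(e^{u}) \exp(-C/\eta)\, \mathrm{diag}(e^{v})$ with a regularization parameter $\eta$; the function names, `eta`, and the iteration count are illustrative choices, and the robust variant analyzed in the paper may modify these updates.

```python
import numpy as np

def logsumexp_rows(a):
    # Numerically stable log(sum_j exp(a_ij)) for every row i.
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

def B(u, v, C, eta):
    # Assumed convention: B(u, v; C) = diag(e^u) exp(-C / eta) diag(e^v).
    return np.exp(u[:, None] - C / eta + v[None, :])

def sinkhorn(C, r, c, eta=0.5, n_iter=1000):
    u = np.zeros(C.shape[0])
    v = np.zeros(C.shape[1])
    for _ in range(n_iter):
        # Row update: rescale u so that the row sums of B(u, v; C) equal r.
        u = np.log(r) - logsumexp_rows(-C / eta + v[None, :])
        # Column update: rescale v so that the column sums equal c.
        v = np.log(c) - logsumexp_rows((-C / eta + u[:, None]).T)
    return u, v, B(u, v, C, eta)

rng = np.random.default_rng(0)
C = rng.random((4, 5))          # hypothetical cost matrix with entries in [0, 1)
r = np.full(4, 0.25)            # source marginal
c = np.full(5, 0.2)             # target marginal
u, v, X = sinkhorn(C, r, c)
```

After each column update the column sums of `X` match `c` exactly, while the row sums converge to `r` geometrically, consistent with the geometric convergence rate mentioned above.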
Hence, the function is $G$-Lipschitz over this constraint set. Finally, in Lemma 6, we provide bounds on the excess empirical risk and the average regret of gradient descent. Let $\ell$ be a non-negative, $\tilde{H}$-smooth, convex loss function. Let $\hat{w} := A(S)$, and let $S^{(i)}$ be the dataset in which the $i$-th data point is replaced by an i.i.d. …
A.4 High-Dimension Proof of Theorem 2
Let $\alpha \geq 1$ be a parameter to be set later.
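The following Python sketch illustrates an excess empirical risk bound for gradient descent on one concrete non-negative, smooth, convex loss: the least-squares risk $\frac{1}{2n}\|Xw - y\|^2$, whose smoothness constant $H$ is the largest eigenvalue of $X^\top X/n$. It checks the classical guarantee $f(w_T) - f^\star \le H\|w_0 - w^\star\|^2/(2T)$ for step size $1/H$; this textbook bound is an assumption used for illustration and need not coincide with the constants in Lemma 6. The data and function names are hypothetical.

```python
import numpy as np

def empirical_risk(w, X, y):
    # Non-negative, smooth, convex least-squares empirical risk.
    return 0.5 * np.mean((X @ w - y) ** 2)

def gradient_descent(X, y, T):
    n, d = X.shape
    H = np.linalg.eigvalsh(X.T @ X / n).max()  # smoothness constant of the risk
    w = np.zeros(d)                            # w_0 = 0
    for _ in range(T):
        w = w - (X.T @ (X @ w - y) / n) / H    # gradient step with step size 1/H
    return w, H

def excess_risk_and_bound(X, y, T):
    w_T, H = gradient_descent(X, y, T)
    w_star = np.linalg.lstsq(X, y, rcond=None)[0]  # empirical risk minimizer
    excess = empirical_risk(w_T, X, y) - empirical_risk(w_star, X, y)
    bound = H * np.sum(w_star ** 2) / (2 * T)      # ||w_0 - w*||^2 with w_0 = 0
    return excess, bound

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)
results = {T: excess_risk_and_bound(X, y, T) for T in (1, 5, 20, 100)}
```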
Supplemental to Differential Privacy Over Riemannian Manifolds
1 Simulation details
We use a gradient descent algorithm to compute the Fréchet mean of a sample $D = \{x_1, x_2, \dots, x_n\}$. We initialize the mean $\hat{\mu}_0$ at any data point, take a small step in the average direction of the gradient of the energy functional $F_2: M \to \mathbb{R}$, and iterate. The estimate of the Fréchet mean at iterate $k$ is then $\hat{\mu}_k = \exp_{\hat{\mu}_{k-1}}(t_k v_k)$, where $t_k \in (0, 1]$ is the step size. The algorithm is considered to have converged once the change in the mean across successive steps is no longer significant, as measured by the intrinsic distance $\rho$ on $M$; that is, the algorithm terminates if $\rho(\hat{\mu}_k, \hat{\mu}_{k-1}) < \lambda$ for some pre-specified $\lambda > 0$. We choose the step size $t_k = 0.5$ and $\lambda = 10^{-5}$. In addition, one can set a maximum number of iterations for situations in which the mean oscillates between local optima; we set this to 500, but note that in our settings the algorithm typically converges in fewer than 200 iterations.
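The procedure above can be sketched in Python for one concrete manifold. The sketch assumes $M$ is the unit sphere in $\mathbb{R}^3$, where the exponential map, the log map, and the intrinsic distance $\rho$ have closed forms, and it takes $v_k$ to be the average of the log maps $\log_{\hat{\mu}_{k-1}}(x_i)$, a descent direction proportional to $-\nabla F_2$. The step size, tolerance, and iteration cap mirror the values stated above; the function names are hypothetical.

```python
import numpy as np

def sphere_log(p, x):
    # Log map at p: tangent vector at p pointing toward x, with length rho(p, x).
    cos_t = np.dot(p, x)
    u = x - cos_t * p                # component of x orthogonal to p
    sin_t = np.linalg.norm(u)
    if sin_t < 1e-12:
        return np.zeros_like(p)      # x coincides with p (antipodal x is not handled)
    theta = np.arctan2(sin_t, cos_t)
    return theta * u / sin_t

def sphere_exp(p, v):
    # Exponential map at p: follow the geodesic with initial velocity v.
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p.copy()
    return np.cos(nv) * p + np.sin(nv) * v / nv

def sphere_dist(p, q):
    # Intrinsic (geodesic) distance rho on the sphere.
    return float(np.linalg.norm(sphere_log(p, q)))

def frechet_mean(data, step=0.5, tol=1e-5, max_iter=500):
    mu = data[0].copy()                  # initialize at a data point
    for k in range(1, max_iter + 1):
        # Mean of log maps: a descent direction proportional to -grad F_2.
        v = np.mean([sphere_log(mu, x) for x in data], axis=0)
        mu_new = sphere_exp(mu, step * v)
        if sphere_dist(mu_new, mu) < tol:  # stop when rho(mu_k, mu_{k-1}) < lambda
            return mu_new, k
        mu = mu_new
    return mu, max_iter

# Four points at geodesic distance 0.3 from the north pole, placed symmetrically.
a = 0.3
data = np.array([[np.sin(a) * np.cos(t), np.sin(a) * np.sin(t), np.cos(a)]
                 for t in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)])
mu, iters = frechet_mean(data)
```

By symmetry the Fréchet mean of this toy sample is the north pole, so the returned estimate should lie close to $(0, 0, 1)$.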
Cluster weighted models with multivariate skewed distributions for functional data
Anton, Cristina, Shreshtth, Roy Shivam Ram
Cristina Anton¹, Roy Shivam Ram Shreshtth²
¹ Department of Mathematics and Statistics, MacEwan University, 103C, 10700-104 Ave., Edmonton, AB T5J 4S2, Canada, email: popescuc@macewan.ca
² Department of Mathematics and Statistics, Indian Institute of Technology Kanpur

Abstract
We propose a clustering method, funWeightClustSkew, based on mixtures of functional linear regression models and three skewed multivariate distributions: the variance-gamma distribution, the skew-t distribution, and the normal-inverse Gaussian distribution. Our approach follows the framework of the functional high-dimensional data clustering (funHDDC) method and extends to functional data the cluster weighted models based on skewed distributions that are used for finite-dimensional multivariate data. We consider several parsimonious models and construct an expectation-maximization (EM) algorithm to estimate their parameters. We illustrate the performance of funWeightClustSkew on simulated data and on the Air Quality dataset.

Keywords: Cluster weighted models, Functional linear regression, EM algorithm, Skewed distributions, Multivariate functional principal component analysis

1 Introduction
Smart devices and other modern technologies record huge amounts of data measured continuously in time. These data are better represented as curves than as finite-dimensional vectors, and they are analyzed using statistical methods specific to functional data (Ramsay and Silverman, 2006; Ferraty and Vieu, 2006; Horváth and Kokoszka, 2012). Often, more than one curve is collected for each individual, e.g.
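To make the EM idea concrete, the following Python sketch fits a two-component cluster weighted model in a deliberately simplified setting: a scalar covariate $x$, Gaussian (not skewed) densities for both $x \mid g$ and $y \mid x, g$, and no functional data or principal component step. It is not funWeightClustSkew; it only illustrates how a cluster weighted model combines a regression density $p(y \mid x, g)$ with a covariate density $p(x \mid g)$ in the E-step. All names, the simulated data, and the initialization are hypothetical.

```python
import numpy as np

def log_normal(z, mean, var):
    # Elementwise log-density of N(mean, var) evaluated at z.
    return -0.5 * (np.log(2 * np.pi * var) + (z - mean) ** 2 / var)

def cwm_em(x, y, resp, n_iter=200):
    """EM for a Gaussian cluster weighted model with one scalar covariate.

    p(x, y) = sum_g pi_g * N(y | a_g + b_g x, sigma2_g) * N(x | mu_g, tau2_g)
    resp: initial (n, G) responsibilities, each row summing to one.
    """
    n, G = resp.shape
    A = np.column_stack([np.ones(n), x])  # regression design: intercept, slope
    for _ in range(n_iter):
        # M-step: weighted estimates of each component's parameters.
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = resp.T @ x / nk
        tau2 = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        beta = np.empty((G, 2))
        sigma2 = np.empty(G)
        for g in range(G):
            w = resp[:, g]
            Aw = A * w[:, None]
            beta[g] = np.linalg.solve(A.T @ Aw, Aw.T @ y)  # weighted least squares
            sigma2[g] = np.sum(w * (y - A @ beta[g]) ** 2) / nk[g]
        # E-step: posterior membership from the joint density of (x, y).
        logp = (np.log(pi)
                + log_normal(y[:, None], A @ beta.T, sigma2)
                + log_normal(x[:, None], mu, tau2))
        logp -= logp.max(axis=1, keepdims=True)  # stabilize before exponentiating
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
    return pi, mu, tau2, beta, sigma2, resp

# Hypothetical data: two groups with different covariate locations and regression lines.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 150), rng.normal(6, 1, 150)])
y = np.where(np.arange(300) < 150, 1 + 2 * x, -1 - x) + rng.normal(0, 0.3, 300)
resp0 = np.column_stack([x < 3, x >= 3]).astype(float)  # crude split on x
pi, mu, tau2, beta, sigma2, resp = cwm_em(x, y, resp0)
```

Initializing the responsibilities from a split of the covariate is only a convenience for this example; EM is sensitive to initialization, so practical implementations usually try several starts.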
Incentive-compatible Bandits: Importance Weighting No More
Zimmert, Julian, Marinov, Teodor V.
We study the problem of incentive-compatible online learning with bandit feedback. In this class of problems, the experts are self-interested agents who might misrepresent their preferences with the goal of being selected most often. The goal is to devise algorithms that are simultaneously incentive-compatible, that is, the experts are incentivised to report their true preferences, and have no regret with respect to the preferences of the best fixed expert in hindsight. Freeman et al. (2020) propose an algorithm with optimal $O(\sqrt{T \log(K)})$ regret in the full-information setting and $O(T^{2/3}(K\log(K))^{1/3})$ regret in the bandit setting. In this work, we propose the first incentive-compatible algorithms that enjoy $O(\sqrt{KT})$ regret bounds. We further demonstrate how simple loss-biasing allows the algorithm proposed by Freeman et al. (2020) to enjoy $\tilde O(\sqrt{KT})$ regret. As a byproduct of our approach, we obtain the first bandit algorithm with nearly optimal regret bounds in the adversarial setting that works entirely on the observed loss sequence, without the need for importance-weighted estimators. Finally, we provide an incentive-compatible algorithm that enjoys asymptotically optimal best-of-both-worlds regret guarantees, i.e., logarithmic regret in the stochastic regime as well as worst-case $O(\sqrt{KT})$ regret.
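The abstract contrasts its approach with importance-weighted estimators. As a reminder of what such an estimator is, the following Python sketch (a standard textbook construction, not the paper's algorithm) builds the estimate $\hat\ell_i = \ell_i \mathbb{1}\{I_t = i\}/p_i$ from a single played arm $I_t \sim p$, then computes its exact mean, which equals $\ell_i$ (unbiasedness), and its variance $\ell_i^2/p_i - \ell_i^2$, which grows as $p_i$ shrinks. The loss and probability values and the function names are hypothetical.

```python
from fractions import Fraction

def iw_estimate(losses, probs, played):
    # Only the played arm's loss is observed; reweight it by 1 / p_played.
    return [losses[i] / probs[i] if i == played else Fraction(0)
            for i in range(len(losses))]

def exact_moments(losses, probs):
    # Exact mean and variance of each coordinate of the estimate,
    # averaging over which arm is played (arm j with probability probs[j]).
    K = len(losses)
    mean = [Fraction(0)] * K
    second = [Fraction(0)] * K
    for j in range(K):
        est = iw_estimate(losses, probs, j)
        for i in range(K):
            mean[i] += probs[j] * est[i]
            second[i] += probs[j] * est[i] ** 2
    var = [second[i] - mean[i] ** 2 for i in range(K)]
    return mean, var

losses = [Fraction(3, 10), Fraction(1, 2), Fraction(9, 10)]
probs = [Fraction(6, 10), Fraction(3, 10), Fraction(1, 10)]
mean, var = exact_moments(losses, probs)
print(mean == losses)  # prints True: the estimator is unbiased
```

Exact rational arithmetic keeps the check free of rounding error; the rarely played third arm has by far the largest variance, which is the cost that avoiding such estimators sidesteps.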
On the Optimal Bounds for Noisy Computing
Zhu, Banghua, Wang, Ziao, Ghaddar, Nadim, Jiao, Jiantao, Wang, Lele
We revisit the problem of computing with noisy information considered in Feige et al. (1994), which includes computing the OR function from noisy queries and computing the MAX, SEARCH, and SORT functions from noisy pairwise comparisons. For $K$ given elements, the goal is to correctly recover the desired function with probability at least $1-\delta$ when the outcome of each query is flipped with probability $p$. We consider both the adaptive sampling setting, where each query can be adaptively designed based on past outcomes, and the non-adaptive sampling setting, where the query cannot depend on past outcomes. The prior work provides tight bounds on the worst-case query complexity in terms of the dependence on $K$. However, the upper and lower bounds do not match in terms of the dependence on $\delta$ and $p$. We improve the lower bounds for all four functions under both adaptive and non-adaptive query models. Most of our lower bounds match the upper bounds up to constant factors when either $p$ or $\delta$ is bounded away from $0$, while the ratio between the best prior upper and lower bounds goes to infinity when $p\rightarrow 0$ or $p\rightarrow 1/2$. In addition, we provide matching upper and lower bounds for the expected number of queries, improving both the upper and lower bounds for the variable-length query model.
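As a point of reference for the query-complexity bounds above, the following Python sketch implements the naive non-adaptive strategy for computing OR from noisy queries: query every element the same odd number of times $m$, decode each bit by majority vote, and take the OR of the decoded bits. For $p < 1/2$, choosing $m \ge \ln(K/\delta)/(2(1/2 - p)^2)$ bounds the failure probability by $\delta$ via Hoeffding's inequality and a union bound. This baseline and its function names are illustrative assumptions, not one of the algorithms from the paper; it uses $Km$ queries and is not meant to match the refined bounds discussed above.

```python
import math
import random

def repetitions(K, p, delta):
    # Hoeffding + union bound: K * exp(-2 m (1/2 - p)^2) <= delta.
    m = math.ceil(math.log(K / delta) / (2 * (0.5 - p) ** 2))
    return m if m % 2 == 1 else m + 1  # odd m rules out ties in the vote

def noisy_query(bit, p, rng):
    # The true bit is flipped with probability p.
    return bit ^ int(rng.random() < p)

def noisy_or(bits, p, delta, rng):
    m = repetitions(len(bits), p, delta)
    decoded = []
    for b in bits:
        ones = sum(noisy_query(b, p, rng) for _ in range(m))
        decoded.append(1 if 2 * ones > m else 0)  # majority vote
    return int(any(decoded)), m * len(bits)       # (estimate, queries used)

def exact_bit_error(m, p):
    # P(majority vote over m independent noisy queries is wrong), m odd.
    return sum(math.comb(m, j) * p ** j * (1 - p) ** (m - j)
               for j in range((m + 1) // 2, m + 1))

rng = random.Random(0)
K, p, delta = 20, 0.1, 1e-6
m = repetitions(K, p, delta)
```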
Stability and Risk Bounds of Iterative Hard Thresholding
In this paper, we analyze the generalization performance of the Iterative Hard Thresholding (IHT) algorithm, which is widely used for sparse recovery problems. The parameter estimation and sparsity recovery consistency of IHT have long been known in compressed sensing. From the perspective of statistical learning, another fundamental question is how well the IHT estimate predicts on unseen data. This paper makes progress towards answering this open question by introducing a novel sparse generalization theory for IHT under the notion of algorithmic stability. Our theory reveals that: 1) under natural conditions on the empirical risk function over $n$ samples of dimension $p$, IHT with sparsity level $k$ enjoys an $\tilde{\mathcal{O}}(n^{-1/2}\sqrt{k\log(n)\log(p)})$ rate of convergence in sparse excess risk; 2) a tighter $\tilde{\mathcal{O}}(n^{-1/2}\sqrt{\log(n)})$ bound can be established by imposing an additional iteration stability condition on a hypothetical IHT procedure applied to the population risk; and 3) a fast rate of order $\tilde{\mathcal{O}}\left(n^{-1}k(\log^3(n)+\log(p))\right)$ can be derived for strongly convex risk functions under proper strong-signal conditions. The results are instantiated for sparse linear regression and sparse logistic regression models to demonstrate the applicability of our theory. Preliminary numerical evidence is provided to confirm our theoretical predictions.
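The IHT iteration alternates a gradient step on the empirical risk with hard thresholding to the $k$ largest-magnitude coordinates. The following Python sketch instantiates it for sparse least-squares regression with empirical risk $\frac{1}{2n}\|Xw - y\|^2$ and step size $1/L$, where $L$ is the largest eigenvalue of $X^\top X/n$; the step-size rule, iteration count, noiseless toy data, and function names are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def hard_threshold(w, k):
    # Keep the k largest-magnitude entries of w and zero out the rest.
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    out[idx] = w[idx]
    return out

def iht_least_squares(X, y, k, n_iter=500):
    n, p = X.shape
    L = np.linalg.eigvalsh(X.T @ X / n).max()  # smoothness constant of the risk
    eta = 1.0 / L
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n           # gradient of (1/2n)||Xw - y||^2
        w = hard_threshold(w - eta * grad, k)  # gradient step, then projection
    return w

# Hypothetical noiseless sparse regression problem.
rng = np.random.default_rng(1)
n, p = 400, 50
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[[3, 17, 42]] = [3.0, -2.0, 1.5]
y = X @ w_true
w_hat = iht_least_squares(X, y, k=3)
```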