On the Expressive Power of Contextual Relations in Transformers
Transformer architectures have achieved remarkable empirical success in modeling contextual relationships in natural language, yet a precise mathematical characterization of their expressive power remains incomplete. In this work, we introduce a measure-theoretic framework for contextual representations in which texts are modeled as probability measures over a semantic embedding space, and contextual relations between words are represented as coupling measures between them. Within this setting, we introduce the Sinkhorn Transformer, a transformer-like architecture. Our main result is a universal approximation theorem: any continuous coupling function between probability measures that encodes the semantic relation as a coupling measure can be uniformly approximated by a Sinkhorn Transformer with appropriate parameters.
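To make the notion of a coupling measure concrete, the following is a minimal sketch of classical Sinkhorn normalization between two discrete probability measures. It is not the paper's Sinkhorn Transformer; the function name `sinkhorn_coupling`, the regularization strength `eps`, and the toy cost matrix are all assumptions chosen for illustration.

```python
import math


def sinkhorn_coupling(cost, mu, nu, eps=0.5, n_iters=200):
    """Return an entropy-regularized coupling of discrete measures mu and nu.

    cost[i][j] is a (hypothetical) dissimilarity between point i of mu and
    point j of nu; eps is the entropic regularization strength.
    """
    n, m = len(mu), len(nu)
    # Gibbs kernel: K[i][j] = exp(-cost[i][j] / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that the row marginals match mu
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the column marginals match nu
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # The coupling is diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy example: three source points, two target points (illustrative values)
cost = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
mu = [0.5, 0.3, 0.2]
nu = [0.6, 0.4]
plan = sinkhorn_coupling(cost, mu, nu)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(len(mu))) for j in range(len(nu))]
```

After the iterations converge, `row_sums` and `col_sums` agree with `mu` and `nu` respectively, which is the defining property of a coupling between the two measures.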
Conditional Distributional Treatment Effects: Doubly Robust Estimation and Testing
Beyond conditional average treatment effects, treatments may impact the entire outcome distribution in covariate-dependent ways, for example, by altering the variance or tail risks for specific subpopulations. We propose a novel estimand to capture such conditional distributional treatment effects, and develop a doubly robust estimator that is minimax optimal in the local asymptotic sense. Using this, we develop a test for the global homogeneity of conditional potential outcome distributions that accommodates discrepancies beyond the maximum mean discrepancy (MMD), has provably valid type 1 error, and is consistent against fixed alternatives -- the first test, to our knowledge, with such guarantees in this setting. Furthermore, we derive exact closed-form expressions for two natural discrepancies (including the MMD), and provide a computationally efficient, permutation-free algorithm for our test.
- North America > United States (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
Large-batch Optimization for Dense Visual Predictions
At the t-th backward propagation step, we can derive the gradient $\nabla_i l(w_t)$ to update the i-th module in M. The number in the bracket represents the batch size. We see that when the batch size is small (i.e., 32), the gradient variances are similar. N and K indicate the number of FPN levels and region proposals fed into the detection head. To evaluate this assumption, as shown in Figure 1, we have three observations. As illustrated by the second figure in Figure 1, the gradient misalignment phenomenon between detection head and backbone has been reduced.
On the Saturation Effects of Spectral Algorithms in Large Dimensions
Many non-parametric regression methods are proposed to solve the regression problem by assuming that f falls into certain function classes, including polynomial splines Stone (1994), local polynomials Cleveland (1979); Stone (1977), the spectral algorithms Caponnetto (2006); Caponnetto and De Vito (2007); Caponnetto and Yao (2010), etc.
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Massachusetts (0.04)
- North America > United States > Colorado > Denver County > Denver (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > United States > Arizona > Maricopa County > Phoenix (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Romania > Sud-Est Development Region > Constanța County > Constanța (0.04)
- North America > United States > New York > Rensselaer County > Troy (0.04)
- Europe > Belgium > Flanders > East Flanders > Ghent (0.04)
A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs
We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is $\tilde{O}(\sqrt{\alpha T})$, where T is the time horizon and α is the independence number of the feedback graph.
- Europe > Italy (0.04)
- Europe > Denmark (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada > Quebec > Montreal (0.05)
- Europe > Switzerland > Vaud > Lausanne (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)