A Algorithm
The proposed implementation of Gunsilius' algorithm operates on the empirical CDFs of all variables, i.e., all variables are rescaled to lie in [0, 1]. Figure 4 shows the results of Gunsilius's algorithm for three different settings, for example on the expenditure dataset (see Section I.3).

[Figure 4: Results of Gunsilius's algorithm for three different settings.]

The practical issue, of course, is the optimization: it alone is already computationally demanding and suffers from convergence problems. A practical resource, the sample size, limits the representational size of the estimator. An open question is how to achieve "enough variability" without aiming at a completely flexible distribution. In any case, the finite mixture of Gaussians approach can still be implemented with the reparameterization trick. The relation to Gunsilius' algorithm is that our "base measure" is smoothly adaptive, which may lead to more stable behavior in practice.
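As a minimal sketch of the reparameterization trick for a finite Gaussian mixture, samples can be written as a location-scale transform of base noise. Note that only the component means and scales are reparameterized here; the discrete component choice itself is not (a fully differentiable version would need, e.g., a Gumbel-softmax relaxation). The specific weights, means, and scales below are illustrative assumptions, not values from the text.

```python
import numpy as np

def sample_gaussian_mixture(weights, means, log_stds, n, rng):
    """Pathwise (reparameterized) samples from a finite Gaussian mixture.

    Each sample is mu_k + exp(log_std_k) * eps with eps ~ N(0, 1), so
    gradients w.r.t. `means` and `log_stds` flow through the samples.
    The component index k is drawn discretely and is NOT reparameterized.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    comp = rng.choice(len(weights), size=n, p=weights)  # component indices
    eps = rng.standard_normal(n)                        # base noise
    return means[comp] + np.exp(log_stds[comp]) * eps   # location-scale map

# illustrative two-component mixture
rng = np.random.default_rng(0)
means = np.array([-2.0, 3.0])
log_stds = np.array([np.log(0.5), np.log(1.0)])
x = sample_gaussian_mixture([0.3, 0.7], means, log_stds, 100_000, rng)
# sample mean should be near 0.3 * (-2) + 0.7 * 3 = 1.5
```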
In-Context Learning with Representations: Contextual Generalization of Trained Transformers
In-context learning (ICL) refers to a remarkable capability of pretrained large language models: they can learn a new task from a few examples given at inference time. However, the theoretical understanding of ICL remains largely under-explored, particularly whether transformers can be trained to generalize to unseen examples in a prompt, which requires the model to acquire contextual knowledge of the prompt. This paper investigates the training dynamics of transformers trained by gradient descent through the lens of non-linear regression tasks. Contextual generalization here can be attained by learning the template function of each task in-context, where all template functions lie in a linear space spanned by $m$ basis functions. We analyze the training dynamics of one-layer multi-head transformers that predict unlabeled inputs in-context given partially labeled prompts, where the labels contain Gaussian noise and the number of examples in each prompt is not sufficient to determine the template. Under mild assumptions, we show that the training loss of a one-layer multi-head transformer converges linearly to a global minimum. Moreover, the transformer effectively learns to perform ridge regression over the basis functions. To our knowledge, this study provides the first provable demonstration that transformers can learn contextual (i.e., template) information to generalize to both unseen examples and tasks when prompts contain only a small number of query-answer pairs.
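To make the target mechanism concrete, a ridge-regression predictor over a fixed basis can be sketched as follows. The monomial basis, the template $1 + 2x$, the noise level, and the regularization strength are all illustrative assumptions; they stand in for the paper's abstract setting of $m$ basis functions and noisy, underdetermined prompts.

```python
import numpy as np

def ridge_over_basis(x_train, y_train, x_query, basis, lam=0.1):
    """Ridge regression in the span of m basis functions.

    `basis` maps inputs of shape (n,) to features of shape (n, m).
    Returns predictions at `x_query` for the fitted template function.
    """
    Phi = basis(x_train)                                   # (n, m) design matrix
    m = Phi.shape[1]
    # closed-form ridge solution: (Phi^T Phi + lam I)^{-1} Phi^T y
    coef = np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y_train)
    return basis(x_query) @ coef

# hypothetical basis: the first three monomials 1, x, x^2
basis = lambda x: np.stack([np.ones_like(x), x, x**2], axis=1)

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=8)                  # few in-context examples
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, size=8)  # template 1 + 2x, Gaussian noise
pred = ridge_over_basis(x, y, np.array([0.5]), basis, lam=0.01)
# prediction at x = 0.5 should be close to the template value 1 + 2 * 0.5 = 2.0
```

The point of the sketch is that the predictor never sees the template directly; it recovers it only through the basis expansion of a handful of noisy examples, which mirrors the "contextual generalization" the abstract describes.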