 projector


On the Limits of Latent Reuse in Diffusion Models

arXiv.org Machine Learning

Diffusion models are often trained in low-dimensional latent spaces, which are then reused for related but shifted datasets. In this work, we study when such latent reuse remains reliable under distribution shift. We consider a source-target setting in which both datasets are approximately low-dimensional but may lie near different subspaces. We show that freezing and reusing a source latent space induces a target-domain score error governed by two quantities: the principal-angle misalignment between the source and target subspaces, and the target ambient noise amplified by the diffusion time scale. Motivated by these limits, we further study mixed source-target training and characterize how the required shared latent dimension depends on the relative geometry of the two distributions. Our results provide theoretical guidance on when latent reuse is reliable and when learning a shared representation may be necessary.
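The principal-angle misalignment named in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's method: it assumes each subspace is given by a basis matrix and uses the standard QR-plus-SVD recipe for principal angles. The function name `principal_angles` and the two example bases are hypothetical, chosen only for illustration.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B."""
    # Orthonormalize each basis so singular values of Qa^T Qb are cosines of the angles.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against round-off pushing a cosine slightly above 1.
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Hypothetical example: two planes in R^3 that share the x-axis,
# with the second plane tilted by 30 degrees about that axis.
theta = np.deg2rad(30.0)
source_basis = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
target_basis = np.array([[1.0, 0.0], [0.0, np.cos(theta)], [0.0, np.sin(theta)]])

angles = principal_angles(source_basis, target_basis)
print(np.degrees(angles))  # approximately 0 and 30 degrees
```

A zero angle marks a direction the two subspaces share; larger angles mark directions where a frozen source latent space would miss part of the target subspace.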


Epson Lifestudio Grand Plus Review: Rich Colors, Gemini Support

WIRED

The configuration process is outdated. Google Home did not recognize the projector on my network. Ultrashort-throw (UST) projectors offer more flexibility than traditional (long-throw) models. No one can ever step in front of one and block the projection, since the unit doesn't require distance and can sit up close to the screen rather than at the back of the room. This also lets all your streaming gear, a soundbar, and a game console connect close to the screen.


A Theory of Generalization in Deep Learning

arXiv.org Machine Learning

We present a non-asymptotic theory of generalization in deep learning where the empirical neural tangent kernel partitions the output space. In directions corresponding to signal, error dissipates rapidly; in the vast orthogonal dimensions corresponding to noise, the kernel's near-zero eigenvalues trap residual error in a test-invisible reservoir. Within the signal channel, minibatch SGD ensures that coherent population signal accumulates via fast linear drift, while idiosyncratic memorization is suppressed into a slow, diffusive random walk. We prove generalization survives even when the kernel evolves $\mathcal{O}(1)$ in operator norm, the full feature-learning regime. This theory naturally explains disparate phenomena in deep learning theory, such as benign overfitting, double descent, implicit bias, and grokking. Lastly, we derive an exact population-risk objective from a single training run with no validation data, for any architecture, loss, or optimizer, and prove that it measures precisely the noise in the signal channel. This objective reduces in practice to an SNR preconditioner on top of Adam, adding one state vector at no extra cost; it accelerates grokking by $5 \times$, suppresses memorization in PINNs and implicit neural representations, and improves DPO fine-tuning under noisy preferences while staying $3 \times$ closer to the reference policy.


1cc70be9fb6a83bc46cf4ac21a91e0b0-Supplemental-Conference.pdf

Neural Information Processing Systems

Algorithm 1 Association Graph Learning (TRAINING TIME) Require: $\{D_t^{tr}\}_{t=1}^{T}$: Training sets of all tasks; $T$: Number of tasks; $C$: Number of all classes; $E$: Shared feature extractor; $W_T, W_C$: Parameters of metric functions in the association graph; $L$: Number of GNN layers; $\{W_l\}_{l=1}^{L}$: Parameters of all GNN layers; $\{f_t\}_{t=1}^{T}$: Task-specific classifiers; $\lambda$: Learning rate. For clarity, we provide the algorithms during training and test in Algorithm 1 and Algorithm 2, respectively. Algorithm 2 Association Graph Learning (TEST TIME) Require: $x_t$: One test instance from the $t$-th task; $E$: The trained feature extractor; $G_T, G_C$: Trained task and class graphs; $L$: Number of GNN layers; $\{W_l\}_{l=1}^{L}$: Trained parameters of all GNN layers; $f_t$: The trained task-specific classifier. In this section, we provide the class assignment of all datasets under different missing rates. Tables B.1, B.2, and B.3 show the class assignment for Office-Home, Office-Caltech, and ImageCLEF, respectively.


On the Asymptotics of Self-Supervised Pre-training: Two-Stage M-Estimation and Representation Symmetry

arXiv.org Machine Learning

Self-supervised pre-training, where large corpora of unlabeled data are used to learn representations for downstream fine-tuning, has become a cornerstone of modern machine learning. While a growing body of theoretical work has begun to analyze this paradigm, existing bounds leave open the question of how sharp the current rates are, and whether they accurately capture the complex interaction between pre-training and fine-tuning. In this paper, we address this gap by developing an asymptotic theory of pre-training via two-stage M-estimation. A key challenge is that the pre-training estimator is often identifiable only up to a group symmetry, a feature common in representation learning that requires careful treatment. We address this issue using tools from Riemannian geometry to study the intrinsic parameters of the pre-training representation, which we link with the downstream predictor through a notion of orbit-invariance, precisely characterizing the limiting distribution of the downstream test risk. We apply our main result to several case studies, including spectral pre-training, factor models, and Gaussian mixture models, and obtain substantial improvements in problem-specific factors over prior art when applicable.


Soundcore Nebula P1i projector review: An affordable option with accurate color and loud sound

Engadget

Anker's P1i offers an easy setup, Google TV and fold-out speakers, but lacks brightness. Anker's Soundcore projectors have become an attractive option for buyers thanks to models like the P1 and Nebula X1 that combine performance and portability. Now, the company has added affordability to that equation with its latest model, the $369 P1i. Instead of being detachable like on the P1, its speakers fold out toward listeners, promising better and louder sound than most cheap projectors. The P1i also delivers 1080p video, Google TV for streaming and the same easy screen fit setup as other Anker projectors.


The Best Large TVs (Best Over 75 Inches): Samsung, LG, and More

WIRED

TVs are bigger and better than ever. These are my favorite screens that come in extra-large sizes, from affordable to ostentatious. TVs have (literally) never been bigger. TV brands like LG, Samsung, TCL, Sony, and others have gotten the message buyers have been sending for some time now: Go big or go home. The demand has led to exponential growth for the big-screen TV--virtually every brand I talk to cites this as their fastest-growing segment--and thanks to a dizzying array of major leaps in display technology across brands, the best large TVs have never looked better or cost less.


Improved Feature

Neural Information Processing Systems

In this section, we further investigate the effectiveness of the proposed method when the feature dimensions of the student and teacher are different. In our experiments, we find that simply initializing different projectors with different seeds and the default initialization method of the linear layer in PyTorch is sufficient to yield good performance. Therefore, we stick to this strategy to make the proposed method as simple as possible. Experimental results show that mixing different initialization methods has a slight impact on the performance and is a potential way to further improve the distillation performance. We can see that the training time and memory usage of our method will slightly increase with the increase of the number of projectors.
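The seed-based initialization described above can be sketched as follows. This is an illustrative NumPy approximation, not the authors' code: it mimics the distribution of PyTorch's default `nn.Linear` initialization (uniform in ±1/sqrt(in_features)) without reproducing PyTorch's random stream. The helper names `init_linear` and `ensemble_project`, the feature dimensions, and the choice to average the projector outputs are all assumptions made for the example.

```python
import numpy as np

def init_linear(in_dim, out_dim, seed):
    """Weights and bias drawn like PyTorch's default nn.Linear init (distribution only)."""
    rng = np.random.default_rng(seed)
    bound = 1.0 / np.sqrt(in_dim)
    W = rng.uniform(-bound, bound, size=(out_dim, in_dim))
    b = rng.uniform(-bound, bound, size=out_dim)
    return W, b

def ensemble_project(x, projectors):
    """Apply every projector to x and average the results (averaging is an assumption)."""
    outputs = [x @ W.T + b for W, b in projectors]
    return np.mean(outputs, axis=0)

# Hypothetical sizes: student features are narrower than teacher features.
student_dim, teacher_dim, num_projectors = 512, 2048, 3

# Same initialization method for each projector; only the seed differs.
projectors = [init_linear(student_dim, teacher_dim, seed) for seed in range(num_projectors)]

student_feat = np.random.default_rng(0).normal(size=(4, student_dim))
projected = ensemble_project(student_feat, projectors)
print(projected.shape)  # (4, 2048)
```

Each extra projector adds one weight matrix and one matrix product per batch, which is consistent with the slight growth in training time and memory the passage reports.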


4ec0b6648bdf487a2f1c815924339022-Paper-Conference.pdf

Neural Information Processing Systems

In knowledge distillation, previous feature distillation methods mainly focus on the design of loss functions and the selection of the distilled layers, while the effect of the feature projector between the student and the teacher remains underexplored.