Amazon cuts thousands of jobs amid AI push

Al Jazeera

Amazon is cutting 16,000 jobs in its second wave of layoffs in three months, as the e-commerce giant restructures and leans on artificial intelligence. Wednesday's cuts follow the 14,000 redundancies that the Seattle, Washington-based company made in October. The layoffs are expected to affect employees working in Prime Video, Amazon Web Services, and the company's human resources department, according to the Reuters news agency, which first reported the cuts. In a memo to employees, shared with Al Jazeera, Amazon said workers in the United States impacted by the cuts will have a 90-day window to find a new role in the company. "Teammates who are unable to find a new role at Amazon or who choose not to look for one, we'll provide transition support including severance pay, outplacement services, health insurance benefits [as applicable], and more," Beth Galetti, senior vice president of People Experience and Technology at Amazon, said in the note provided to Al Jazeera.


Deriving Decoder-Free Sparse Autoencoders from First Principles

Oursland, Alan

arXiv.org Machine Learning

Gradient descent on log-sum-exp (LSE) objectives performs implicit expectation-maximization (EM): the gradient with respect to each component output equals its responsibility. The same theory predicts collapse without volume control analogous to the log-determinant in Gaussian mixture models. We instantiate the theory in a single-layer encoder with an LSE objective and InfoMax regularization for volume control. Experiments confirm the theory's predictions. The gradient-responsibility identity holds exactly; LSE alone collapses; variance prevents dead components; decorrelation prevents redundancy. The model exhibits EM-like optimization dynamics in which lower loss does not correspond to better features and adaptive optimizers offer no advantage. The resulting decoder-free model learns interpretable mixture components, confirming that implicit EM theory can prescribe architectures.
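The gradient-responsibility identity the abstract refers to is easy to check directly: for L(z) = log Σ_k exp(z_k), the partial derivative ∂L/∂z_i is exp(z_i)/Σ_k exp(z_k), i.e. the softmax responsibility the E-step of EM would assign to component i. A minimal numerical sketch (not the paper's code; variable names are illustrative):

```python
import numpy as np

# Sketch: verify that the gradient of a log-sum-exp objective with
# respect to each component output equals that component's softmax
# responsibility -- the identity behind "implicit EM".

rng = np.random.default_rng(0)
z = rng.normal(size=5)  # toy component outputs for one input

# Analytic gradient: softmax responsibilities (computed stably).
responsibilities = np.exp(z - z.max())
responsibilities /= responsibilities.sum()

# Numerically stable log-sum-exp.
def lse(z):
    m = z.max()
    return m + np.log(np.exp(z - m).sum())

# Numerical gradient of LSE by central differences.
eps = 1e-6
num_grad = np.array([
    (lse(z + eps * e) - lse(z - eps * e)) / (2 * eps)
    for e in np.eye(len(z))
])

assert np.allclose(num_grad, responsibilities, atol=1e-6)
assert np.isclose(responsibilities.sum(), 1.0)  # responsibilities sum to 1
```

The check also makes the collapse prediction intuitive: since responsibilities sum to one, gradient descent on LSE alone can satisfy the objective by routing all responsibility to a single component, which is why the paper adds a volume-control term.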


DiffusionPID: Interpreting Diffusion via Partial Information Decomposition

Neural Information Processing Systems

Text-to-image diffusion models have made significant progress in generating naturalistic images from textual inputs, and demonstrate the capacity to learn and represent complex visual-semantic relationships. While these diffusion models have achieved remarkable success, the underlying mechanisms driving their performance are not yet fully accounted for, with many unanswered questions surrounding what they learn, how they represent visual-semantic relationships, and why they sometimes fail to generalize.


DiTFastAttn: Attention Compression for Diffusion Transformer Models

Neural Information Processing Systems

Diffusion Transformers (DiT) excel at image and video generation but face computational challenges due to the quadratic complexity of self-attention operators. We propose DiTFastAttn, a post-training compression method to alleviate the computational bottleneck of DiT.
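The quadratic bottleneck the abstract targets is visible in a naive self-attention implementation: the score matrix QKᵀ has shape (n, n), so compute and memory grow with the square of the token count n. A toy sketch (not DiTFastAttn itself; weights and sizes are illustrative):

```python
import numpy as np

# Sketch: single-head self-attention. The (n, n) score matrix is the
# quadratic-cost term that post-training compression methods attack.

def self_attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # shape (n, n): quadratic in n
    scores -= scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 64, 16  # n tokens, d-dimensional head (toy sizes)
x = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
assert out.shape == (n, d)
```

Doubling n quadruples the score matrix, which is why high-resolution image and long video generation with DiT makes this operator the dominant cost.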