e4d2b6e6fdeca3e60e0f1a62fee3d9dd-Paper.pdf

Neural Information Processing Systems

A wide variety of NLP applications, such as machine translation, summarization, and dialog, involve text generation. One major challenge for these applications is how to evaluate whether such generated texts are actually fluent, accurate, or effective. In this work, we conceptualize the evaluation of generated text as a text generation problem, modeled using pre-trained sequence-to-sequence models. The general idea is that models trained to convert the generated text to/from a reference output or the source text will achieve higher scores when the generated text is better.
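A minimal sketch of this idea, assuming a HuggingFace seq2seq model; the model name and the scoring function below are illustrative, not the authors' released implementation. The score is simply the average log-likelihood the pre-trained model assigns to the hypothesis given the source (or reference), so more probable conversions score higher.

```python
# Sketch: scoring generated text as a text-generation problem.
# Assumption: any pre-trained seq2seq model works; the name is illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/bart-large-cnn"  # hypothetical choice of checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).eval()

def seq2seq_score(source: str, hypothesis: str) -> float:
    """Average per-token log-likelihood of `hypothesis` given `source`."""
    src = tokenizer(source, return_tensors="pt", truncation=True)
    tgt = tokenizer(hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(input_ids=src.input_ids,
                    attention_mask=src.attention_mask,
                    labels=tgt.input_ids)
    # out.loss is the mean cross-entropy over target tokens; negate it so
    # that more probable (better) hypotheses receive higher scores.
    return -out.loss.item()
```

Scoring in the other direction (reference given hypothesis) is the same call with the arguments swapped.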


Predicts Human Visual Selectivity

Neural Information Processing Systems

For our experiments we are counting the number of AMT Human Intelligence Tasks (HITs) that were completed; we did not exclude AMT workers from completing multiple HITs. The authors posit that this noisiness arises because the gradient may fluctuate sharply at small scales, which seems plausible, especially given that, due to ReLU activation functions, the output generally is not even continuously differentiable. This CAM indicates the discriminative regions of the image used by the CNN to identify that class. We used each of the above passive attention methods to acquire attention maps from each of the models in the top part of Table 2.
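A minimal sketch of the classic CAM computation the excerpt refers to, assuming a ResNet-style network (global average pooling followed by a single linear classifier); the model choice and function name here are ours, not the paper's. The map for a class is the classifier-weighted sum of the final convolutional feature maps.

```python
# Sketch: class activation mapping (CAM) for a GAP + linear architecture.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1").eval()

features = {}
def hook(module, inputs, output):
    features["conv"] = output  # (1, C, H, W) activations of the last conv block

model.layer4.register_forward_hook(hook)

def class_activation_map(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Return an HxW map of the discriminative regions for `class_idx`."""
    with torch.no_grad():
        model(image.unsqueeze(0))            # populate features["conv"]
    fmap = features["conv"][0]               # (C, H, W)
    weights = model.fc.weight[class_idx]     # (C,) classifier weights for the class
    cam = torch.einsum("c,chw->hw", weights, fmap)
    cam = F.relu(cam)                        # keep only positive class evidence
    return cam / (cam.max() + 1e-8)          # normalize to [0, 1]
```

Upsampling the returned map to the input resolution gives the familiar heatmap overlay.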




Mutual Information Collapse Explains Disentanglement Failure in $β$-VAEs

Vu, Minh, Wan, Xiaoliang, Wei, Shuangqing

arXiv.org Machine Learning

The $β$-VAE is a foundational framework for unsupervised disentanglement, using $β$ to regulate the trade-off between latent factorization and reconstruction fidelity. Empirically, however, disentanglement performance exhibits a pervasive non-monotonic trend: benchmarks such as MIG and SAP typically peak at intermediate $β$ and collapse as regularization increases. We demonstrate that this collapse is a fundamental information-theoretic failure, where strong Kullback-Leibler pressure promotes marginal independence at the expense of the latent channel's semantic informativeness. By formalizing this mechanism in a linear-Gaussian setting, we prove that for $β > 1$, stationarity-induced dynamics trigger a spectral contraction of the encoder gain, driving latent-factor mutual information to zero. To resolve this, we introduce the $λβ$-VAE, which decouples regularization pressure from informational collapse via an auxiliary $L_2$ reconstruction penalty $λ$. Extensive experiments on dSprites, Shapes3D, and MPI3D-real confirm that $λ > 0$ stabilizes disentanglement and restores latent informativeness over a significantly broader range of $β$, providing a principled theoretical justification for dual-parameter regularization in variational inference backbones.
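A minimal sketch of the dual-parameter objective as described in the abstract. The exact formulation is the paper's; this assumes the common reading, i.e. the standard $β$-VAE loss plus an auxiliary $λ$-weighted $L_2$ reconstruction term.

```python
# Sketch: λβ-VAE objective under the assumptions stated above.
import torch
import torch.nn.functional as F

def lambda_beta_vae_loss(x, x_hat, mu, logvar, beta=4.0, lam=1.0):
    """β-VAE ELBO with an extra λ-weighted L2 reconstruction penalty.

    x, x_hat  : inputs and reconstructions in [0, 1], shape (B, ...)
    mu, logvar: diagonal-Gaussian posterior parameters, shape (B, D)
    """
    batch = x.size(0)
    # Standard reconstruction term (a Bernoulli likelihood is typical on dSprites).
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum") / batch
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian posterior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / batch
    # Auxiliary L2 penalty: keeps the latent channel informative as β grows.
    l2 = F.mse_loss(x_hat, x, reduction="sum") / batch
    return recon + beta * kl + lam * l2
```

Setting `lam=0` recovers the plain $β$-VAE objective, which makes the stabilizing role of $λ$ easy to ablate.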


EmergentCommunication

Neural Information Processing Systems

Recall that $\hat{m}_c(u)$ is exactly the listener's decoder in the IB framework (see Section 3.1.1). Therefore, any other decoder would yield an upper bound on the informativeness loss term. Notice that under our assumptions, $\hat{m}_c$ is a Gaussian mixture, whereas the speaker's beliefs are simply Gaussian. All the systems with the same $k$ form an equivalence class, and the canonical system within each class is the one with minimal $k$. These canonical systems are the natural ones to prefer, because they can attain the optimum for a given complexity with a minimal codebook.
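Why any other decoder upper-bounds the informativeness loss: a short derivation in generic IB notation (the symbols $m$, $w$, and $q$ below are our shorthand, not necessarily the paper's). Writing the informativeness loss as the expected KL divergence between the speaker's belief $m$ and the listener's reconstruction, and using the fact that the Bayes decoder satisfies $\hat{m}_c(u) = \mathbb{E}[m(u) \mid w]$, for any alternative decoder $q(\cdot \mid w)$:

$$
\mathbb{E}\big[D_{\mathrm{KL}}(m \,\|\, q(\cdot \mid w))\big]
= \mathbb{E}\big[D_{\mathrm{KL}}(m \,\|\, \hat{m}_c)\big]
+ \mathbb{E}_w\big[D_{\mathrm{KL}}(\hat{m}_c \,\|\, q(\cdot \mid w))\big]
\;\ge\; \mathbb{E}\big[D_{\mathrm{KL}}(m \,\|\, \hat{m}_c)\big],
$$

so the second term is non-negative and vanishes only when $q = \hat{m}_c$; plugging in any other decoder can only overestimate the informativeness loss.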