parity
The two clocks and the innovation window: When and how generative models learn rules
Wang, Binxu, Finn, Emma Lucia Byrnes, Liu, Bingbin
Generative models trained on finite data face a fundamental tension: their score-matching or next-token objective converges to the empirical training distribution rather than the population distribution we seek to learn. Using rule-valid synthetic tasks, we trace this tension across two training timescales: $ฯ_{\mathrm{rule}}$, the step at which generations first become rule-valid, and $ฯ_{\mathrm{mem}}$, the step at which models begin reproducing training samples. Focusing on parity and extending to other binary rules and combinatorial puzzles, we characterize how these two clocks, $ฯ_{\mathrm{rule}}$ and $ฯ_{\mathrm{mem}}$, depend on key aspects of the learning setup. Specifically, we show that $ฯ_{\mathrm{rule}}$ increases with rule complexity and decreases with model capacity, while $ฯ_{\mathrm{mem}}$ is approximately invariant to the rule and scales nearly linearly with dataset size $N$. We define the \emph{innovation window} as the interval $[ฯ_{\mathrm{rule}}, ฯ_{\mathrm{mem}}]$. This window widens with increasing $N$ and narrows with rule complexity, and may vanish entirely when $ฯ_{\mathrm{rule}} \geq ฯ_{\mathrm{mem}}$. The same two-clock structure arises in both diffusion (DiT) and autoregressive (GPT) models, with architecture-dependent offsets. Dissecting the learned score of DiT models reveals a corresponding evolution of the optimization landscapes, where rule-valid samples' basins expand substantially around $ฯ_{\mathrm{rule}}$, while training samples' basins begin to dominate around $ฯ_{\mathrm{mem}}$. Together, these results yield a unified and predictive account of when and how generative models exhibit genuine innovation.
TensorNet: Cartesian Tensor Representations for Efficient Learning of Molecular Potentials
The development of efficient machine learning models for molecular systems representation is becoming crucial in scientific research. We introduce TensorNet, an innovative O(3)-equivariant message-passing neural network architecture that leverages Cartesian tensor representations. By using Cartesian tensor atomic embeddings, feature mixing is simplified through matrix product operations. Furthermore, the cost-effective decomposition of these tensors into rotation group irreducible representations allows for the separate processing of scalars, vectors, and tensors when necessary. Compared to higher-rank spherical tensor models, TensorNet demonstrates state-of-the-art performance with significantly fewer parameters. For small molecule potential energies, this can be achieved even with a single interaction layer. As a result of all these properties, the model's computational cost is substantially decreased. Moreover, the accurate prediction of vector and tensor molecular quantities on top of potential energies and forces is possible. In summary, TensorNet's framework opens up a new space for the design of state-of-the-art equivariant models.
Shallow Representation of Option Implied Information
Option prices encode the market's collective outlook through implied density and implied volatility. An explicit link between implied density and implied volatility translates the risk-neutrality of the former into conditions on the latter to rule out static arbitrage. Despite earlier recognition of their parity, the two had been studied in isolation for decades until the recent demand in implied volatility modeling rejuvenated such parity. This paper provides a systematic approach to build neural representations of option implied information. As a preliminary, we first revisit the explicit link between implied density and implied volatility through an alternative and minimalist lens, where implied volatility is viewed not as volatility but as a pointwise corrector mapping the Black-Scholes quasi-density into the implied risk-neutral density. Building on this perspective, we propose the neural representation that incorporates arbitrage constraints through the differentiable corrector. With an additive logistic model as the synthetic benchmark, extensive experiments reveal that deeper or wider network structures do not necessarily improve the model performance due to the nonlinearity of both arbitrage constraints and neural derivatives. By contrast, a shallow feedforward network with a single hidden layer and a specific activation effectively approximates implied density and implied volatility.
Beyond Parity: Fairness Objectives for Collaborative Filtering
We study fairness in collaborative-filtering recommender systems, which are sensitive to discrimination that exists in historical data. Biased data can lead collaborative-filtering methods to make unfair predictions for users from minority groups. We identify the insufficiency of existing fairness metrics and propose four new metrics that address different forms of unfairness. These fairness metrics can be optimized by adding fairness terms to the learning objective. Experiments on synthetic and real data show that our new metrics can better measure fairness than the baseline, and that the fairness objectives effectively help reduce unfairness.