Product distribution learning with imperfect advice
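We revisit the problem of learning an unknown product distribution $P$ on the Boolean hypercube $\{0,1\}^d$, now in the setting where the learner is also given as advice the parameters of a product distribution $Q$. We show that there is an efficient algorithm to learn $P$ within total variation (TV) distance $\varepsilon$ with sample complexity $O(d^{1-\eta}/\varepsilon^2)$, provided $\|p - q\|_1 < \varepsilon d^{0.5 - \Omega(\eta)}$. Here, $p$ and $q$ are the mean vectors of $P$ and $Q$ respectively, and no bound on $\|p - q\|_1$ is known to the algorithm a priori.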
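To make the quantities in this statement concrete, the following is a minimal Python sketch under illustrative assumptions. It is not the advice-based algorithm described above: it only samples from a product distribution on $\{0,1\}^d$, forms the baseline empirical estimate of the mean vector $p$, measures the advice gap $\|p - q\|_1$, and computes the exact TV distance between two product distributions by enumerating all $2^d$ points (so it is only feasible for small $d$). The function names and the example mean vectors are hypothetical.

```python
import itertools
import random


def sample_product(mean, n, rng):
    """Draw n samples from the product distribution with coordinate means `mean`."""
    return [[1 if rng.random() < m else 0 for m in mean] for _ in range(n)]


def empirical_mean(samples):
    """Coordinate-wise empirical mean of 0/1 vectors: the baseline estimate of p."""
    n, d = len(samples), len(samples[0])
    return [sum(s[i] for s in samples) / n for i in range(d)]


def l1_distance(p, q):
    """||p - q||_1 between two mean vectors (the advice-quality measure)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def product_pmf(mean, x):
    """Probability of the point x in {0,1}^d under the product distribution."""
    prob = 1.0
    for m, xi in zip(mean, x):
        prob *= m if xi == 1 else 1.0 - m
    return prob


def tv_product(p, q):
    """Exact TV distance by enumerating {0,1}^d; exponential in d."""
    d = len(p)
    return 0.5 * sum(abs(product_pmf(p, x) - product_pmf(q, x))
                     for x in itertools.product((0, 1), repeat=d))


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.2, 0.5, 0.7, 0.9]     # hypothetical unknown mean vector
    q = [0.25, 0.5, 0.65, 0.9]   # hypothetical advice mean vector
    p_hat = empirical_mean(sample_product(p, 2000, rng))
    print("||p - q||_1 =", l1_distance(p, q))
    print("TV(P, Q) =", tv_product(p, q))
    print("TV(P_hat, P) =", tv_product(p_hat, p))
```

The enumeration in `tv_product` is chosen for exactness rather than scale; for large $d$ one would instead bound TV through per-coordinate quantities.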
Breaking the Finite-Sample Barrier in Entropy Coupling
Dependence among marginally constrained observations can break a finite-sample barrier. To formalize this phenomenon, we introduce the \emph{minimum list entropy coupling} $H(P\|Q_1,\dots,Q_m)$, the minimum conditional entropy $H(X|Y_1,\dots,Y_m)$ over all joint distributions with prescribed discrete marginals $X\sim P$ and $Y_i\sim Q_i$. Unlike classical formulations based on independent observations, our model allows $Y_1,\dots,Y_m$ to be arbitrarily dependent while keeping each marginal fixed. This enlarged coupling space reveals a sharp dichotomy: independent observations reduce residual uncertainty exponentially, whereas dependent observations can eliminate it exactly after finitely many samples. We characterize this zero-entropy regime through necessary and sufficient conditions and give concrete structural criteria under which it occurs. In particular, under mild support assumptions, zero entropy is achieved with $O(\log(1/P_{\min}))$ observations, where $P_{\min}$ is the minimum nonzero mass of $P$. We also develop a greedy algorithm with monotone approximation guarantees for computing $H(P\|Q_1,\dots,Q_m)$. Finally, we show that the same framework formalizes finite-sample limits in distribution-matching representation learning and randomness extraction, where zero entropy corresponds to exact recovery and exact extraction.
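A small worked instance may help with the zero-entropy regime. The Python sketch below is illustrative only and is not the greedy algorithm from the abstract. It assumes $X$ uniform on $\{0,\dots,2^k-1\}$ (so $P_{\min} = 2^{-k}$) and $Q_1 = \dots = Q_k = \mathrm{Bernoulli}(1/2)$, and couples them by letting $Y_i$ be the $i$-th binary digit of $X$. Each marginal is respected, and $H(X \mid Y_1,\dots,Y_k) = 0$ after $k = \log_2(1/P_{\min})$ observations. With a single binary observation, no coupling can reach zero, since $H(X \mid Y_1) \ge H(X) - H(Y_1) \ge 2 - 1 = 1$ bit for $k = 2$. The helper names (`bit_coupling`, `keep_first`, and so on) are hypothetical.

```python
import math
from collections import defaultdict


def marginals(joint, m):
    """Marginal of X and of each Y_i; joint maps (x, (y_1, ..., y_m)) -> probability."""
    px = defaultdict(float)
    pys = [defaultdict(float) for _ in range(m)]
    for (x, ys), w in joint.items():
        px[x] += w
        for i, y in enumerate(ys):
            pys[i][y] += w
    return dict(px), [dict(py) for py in pys]


def conditional_entropy(joint):
    """H(X | Y_1, ..., Y_m) in bits, computed as H(X, Y) - H(Y)."""
    py = defaultdict(float)
    for (_, ys), w in joint.items():
        py[ys] += w
    h_xy = -sum(w * math.log2(w) for w in joint.values() if w > 0)
    h_y = -sum(w * math.log2(w) for w in py.values() if w > 0)
    return h_xy - h_y


def bit_coupling(k):
    """Couple X ~ Uniform{0..2^k-1} with k Bernoulli(1/2) observations:
    Y_i is the i-th binary digit of X, so Y determines X exactly."""
    n = 2 ** k
    return {(x, tuple((x >> i) & 1 for i in range(k))): 1.0 / n for x in range(n)}


def keep_first(joint, k):
    """Marginalize the joint table onto X and the first k observations."""
    out = defaultdict(float)
    for (x, ys), w in joint.items():
        out[(x, ys[:k])] += w
    return dict(out)


if __name__ == "__main__":
    joint = bit_coupling(2)                          # X uniform on {0,1,2,3}
    print(marginals(joint, 2))                       # checks the prescribed marginals
    print(conditional_entropy(joint))                # prints 0.0 (both observations)
    print(conditional_entropy(keep_first(joint, 1))) # prints 1.0 (only Y_1)
```

The joint table is stored as a dictionary so that the marginal constraints can be checked directly before computing the entropy.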
Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence Alignment
A key step in many bioinformatics analysis pipelines is the identification of regions of similarity between pairs of DNA sequencing reads. This task, known as pairwise sequence alignment, is a heavy computational burden, particularly in the context of third-generation long-read sequencing technologies, which produce noisy reads [45].
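For readers unfamiliar with why alignment is costly, here is a minimal Python sketch of the classic Smith-Waterman local-alignment recurrence with a linear gap penalty. It is a standard textbook dynamic program, not the rank-one method of this work; the function name and the scoring parameters (match = 2, mismatch = -1, gap = -2) are illustrative assumptions. The nested loops fill a table of size $|a| \times |b|$, which shows why exhaustively aligning many pairs of long reads becomes expensive.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b (linear gap penalty).

    Fills an (len(a)+1) x (len(b)+1) dynamic-programming table row by row,
    keeping only the previous row, so time is O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 (start a new alignment).
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # shared core "ACGT"
```

Only the score is returned here; recovering the aligned region would additionally require keeping traceback pointers for the full table.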