Wormhole Loss for Partial Shape Matching
Thomas Dagès

Neural Information Processing Systems

When matching parts of a surface to its whole, a fundamental question arises: Which points should be included in the matching process? The issue is intensified when using isometry to measure similarity, as it requires validating whether distances measured between pairs of surface points should influence the matching process. The approach we propose treats surfaces as manifolds equipped with geodesic distances, and addresses the partial shape matching challenge by introducing a novel criterion to meticulously search for consistent distances between pairs of points. The new criterion explores the relation between intrinsic geodesic distances between the points, geodesic distances between the points and surface boundaries, and extrinsic distances between boundary points measured in the embedding space. It is shown to be less restrictive compared to previous measures and achieves state-of-the-art results when used as a loss function in training networks for partial shape matching.
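The criterion described above can be sketched as a simple consistency test: a pair of points is trusted only if its intrinsic geodesic distance on the part cannot be beaten by a "wormhole" path that reaches the boundary, jumps extrinsically between two boundary points, and continues geodesically. This is a hedged illustration of the idea, not the paper's exact formulation; the function name, inputs, and the brute-force double loop over boundary points are assumptions for clarity.

```python
import numpy as np

def wormhole_consistent(d_geo, boundary_idx, coords, p, q):
    """Sketch of a wormhole-style distance-consistency test.

    d_geo:        (n, n) geodesic distance matrix on the partial shape
    boundary_idx: indices of boundary vertices of the part
    coords:       (n, 3) vertex coordinates in the embedding space
    p, q:         the pair of points whose distance is being tested

    The pair (p, q) is flagged consistent when no wormhole path,
    i.e. geodesic p -> b1, extrinsic jump b1 -> b2, geodesic b2 -> q,
    is shorter than the intrinsic geodesic between p and q.
    """
    for b1 in boundary_idx:
        for b2 in boundary_idx:
            shortcut = (d_geo[p, b1]
                        + np.linalg.norm(coords[b1] - coords[b2])
                        + d_geo[b2, q])
            if shortcut < d_geo[p, q]:
                # The missing region could offer a shorter path, so the
                # measured geodesic may not reflect the full shape.
                return False
    return True
```

In a training loss, such a test would gate which point pairs contribute an isometry-preservation term; inconsistent pairs are simply masked out.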




GREATS: Online Selection of High-Quality Data for LLM Training in Every Iteration

Neural Information Processing Systems

Online batch selection methods offer an adaptive alternative to static training data selection by dynamically selecting data batches during training. However, existing methods either rely on impractical reference models or on simple heuristics that may not capture true data informativeness. To address these limitations, we propose GREedy Approximation Taylor Selection (GREATS), a principled and efficient online batch selection method that applies a greedy algorithm to optimize the data batch quality approximated by Taylor expansion. We develop a series of techniques to scale GREATS to large-scale model training. Extensive experiments with large language models (LLMs) demonstrate that GREATS significantly improves training convergence speed and generalization performance.
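The greedy Taylor-approximation idea can be illustrated in a few lines: a first-order expansion of a held-out validation loss after an SGD step scores each candidate example by the inner product of its gradient with the validation gradient, and a greedy loop builds the batch one example at a time. The overlap discount below is a hypothetical simplification of how interactions between selected examples could be handled; it is not the paper's exact procedure.

```python
import numpy as np

def greedy_taylor_select(per_sample_grads, val_grad, k, lr=0.1):
    """Greedy batch selection under a first-order Taylor approximation.

    per_sample_grads: (n, d) per-example training gradients
    val_grad:         (d,)   gradient of a held-out validation loss
    k:                number of examples to select
    lr:               learning rate used in the Taylor expansion

    Sketch: after an SGD step on a set S, the validation loss changes by
    roughly -lr * <sum_{i in S} g_i, g_val> plus interaction terms, so we
    greedily add the example with the best marginal gain, discounting
    overlap with already-selected gradients.
    """
    selected = []
    remaining = list(range(per_sample_grads.shape[0]))
    sel_sum = np.zeros_like(val_grad)
    for _ in range(k):
        # Marginal gain: alignment with the validation gradient, minus a
        # penalty for redundancy with gradients already in the batch.
        gains = [per_sample_grads[i] @ val_grad
                 - lr * (per_sample_grads[i] @ sel_sum)
                 for i in remaining]
        best = remaining[int(np.argmax(gains))]
        selected.append(best)
        sel_sum += per_sample_grads[best]
        remaining.remove(best)
    return selected
```

In practice, materializing per-example gradients for an LLM is the expensive part, which is why the method pairs the greedy rule with scaling techniques.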





Appendix A Proteomics Terminology and Acronyms

Neural Information Processing Systems

Table 9 highlights interesting patterns observed in Figure 3. First, the same modification occurring at different residues can have varying effects on the peptide properties, implying that including amino acid PTM information is essential to achieve better predictions. Second, some modifications have the same Unimod ID and the same molecular structure but differ only in their stereochemistry (spatial arrangement of atoms), yet they impact the peptide properties differently. Such scenarios are present in modified sequences and require a proper representation of PTMs (via encoding and domain-specific features) to predict peptide properties accurately. Table 11 in Appendix Section D shows the impact of PTMs on retention time for the special cases from Table 9.


PROSPECT PTMs: Rich Labeled Tandem Mass Spectrometry Dataset of Modified Peptides for Machine Learning in Proteomics
Wassim Gabriel, Omar Shouman, Ayla Schroeder

Neural Information Processing Systems

Post-Translational Modifications (PTMs) are changes that occur in proteins after synthesis, influencing their structure, function, and cellular behavior. PTMs are essential in cell biology; they regulate protein function and stability, are involved in various cellular processes, and are linked to numerous diseases. A particularly interesting class of PTMs are chemical modifications introduced on amino acid side chains, such as phosphorylation, because they can drastically alter the physicochemical properties of the peptides. One or more PTMs can be attached to each amino acid of the peptide sequence. The most commonly applied technique to detect PTMs on proteins is bottom-up Mass Spectrometry-based proteomics (MS), where proteins are digested into peptides and subsequently analyzed using Tandem Mass Spectrometry (MS/MS).