On the Accuracy of Influence Functions for Measuring Group Effects
Influence functions estimate the effect of removing a training point on a model without the need to retrain. They are based on a first-order Taylor approximation that is guaranteed to be accurate for sufficiently small changes to the model, and so are commonly used to study the effect of individual points in large datasets. However, we often want to study the effects of large groups of training points, e.g., to diagnose batch effects or apportion credit between different data sources.
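As a rough illustration of the first-order approximation described above, the following sketch compares the influence-function estimate of the parameter change with exact retraining when a group of points is removed. It assumes ridge regression on synthetic data with an unnormalized sum-of-losses objective; the function names, data, and group sizes are hypothetical choices made for this example and are not taken from the paper.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize sum_i 0.5*(x_i.theta - y_i)^2 + 0.5*lam*||theta||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def influence_param_change(X, y, theta, lam, group):
    """First-order estimate of the parameter change caused by removing `group`.

    Uses the Hessian of the FULL objective, which is what makes the estimate
    approximate once the removed group is large.
    """
    d = X.shape[1]
    hessian = X.T @ X + lam * np.eye(d)
    residuals = X[group] @ theta - y[group]
    grad_sum = X[group].T @ residuals  # sum of per-point loss gradients in the group
    return np.linalg.solve(hessian, grad_sum)

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, d, lam = 500, 5, 1.0
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
theta = fit_ridge(X, y, lam)

errors = {}
for size in (1, 50, 250):
    group = np.arange(size)
    keep = np.ones(n, dtype=bool)
    keep[group] = False
    actual = fit_ridge(X[keep], y[keep], lam) - theta      # exact retraining
    predicted = influence_param_change(X, y, theta, lam, group)
    errors[size] = np.linalg.norm(predicted - actual)
    print(f"group size {size:3d}: approximation error {errors[size]:.2e}")
```

In this toy setting the gap between the estimate and exact retraining widens as the removed group grows, which is the regime the abstract is concerned with.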
ProteinWeaver: A Divide-and-Assembly Approach for Protein Backbone Design
Ma, Yiming, Ye, Fei, Zhou, Yi, Zheng, Zaixiang, Xue, Dongyu, Gu, Quanquan
Nature creates diverse proteins through a 'divide and assembly' strategy. Inspired by this idea, we introduce ProteinWeaver, a two-stage framework for protein backbone design. Our method first generates individual protein domains and then employs an SE(3) diffusion model to flexibly assemble these domains. A key challenge lies in the assembling step, given the complex and rugged nature of the interdomain interaction landscape. To address this challenge, we employ preference alignment to discern complex relationships between structure and interaction landscapes through comparative analysis of generated samples. Comprehensive experiments demonstrate that ProteinWeaver: (1) generates high-quality, novel protein backbones through versatile domain assembly; (2) outperforms RFdiffusion, the current state-of-the-art in backbone design, by 13% and 39% for long-chain proteins; (3) shows the potential for cooperative function design through illustrative case studies. To sum up, by introducing a 'divide-and-assembly' paradigm, ProteinWeaver advances protein engineering and opens new avenues for functional protein design.

Nature employs a sophisticated 'divide and assemble' strategy to create large and intricate protein structures that meet diverse biological functional needs (Figure 1A) (Pawson & Nash, 2003; Huddy et al., 2024; P Bagowski et al., 2010). This process primarily involves the recombination of existing structural blocks, particularly protein domains, which serve as the fundamental, recurring units in protein structures. Remarkably, a limited number of protein domains (approximately 500 as classified in CATH) suffice to create more than hundreds of thousands of structures satisfying a wide array of functions (Orengo et al., 1997). This strategy enables the creation of multi-domain protein backbones, facilitating the emergence of cooperative functions.
However, our analysis reveals a significant limitation: designability decreases markedly as the backbone length increases (Figure 1E).
Hidden Variables unseen by Random Forests
Blum, Ricardo, Hiabu, Munir, Mammen, Enno, Meyer, Joseph Theo
Random Forests are widely claimed to capture interactions well. However, some simple examples suggest that they perform poorly in the presence of certain pure interactions that the conventional CART criterion struggles to capture during tree construction. We argue that simple alternative partitioning schemes used in the tree growing procedure can enhance identification of these interactions. In a simulation study we compare these variants to conventional Random Forests and Extremely Randomized trees. Our results validate that the modifications considered enhance the model's fitting ability in scenarios where pure interactions play a crucial role.
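To make the difficulty with pure interactions concrete, here is a minimal sketch that evaluates the standard CART variance-reduction criterion for the best single-feature split on two synthetic targets: a pure interaction and an additive signal. The data-generating process and the helper `best_cart_gain` are assumptions made for this illustration; the sketch does not implement the alternative partitioning schemes studied in the paper.

```python
import numpy as np

def best_cart_gain(x, y):
    """Largest reduction in mean squared error over all thresholds on one feature."""
    order = np.argsort(x)
    y_sorted = y[order]
    n = len(y)
    csum = np.cumsum(y_sorted)
    total = csum[-1]
    n_left = np.arange(1, n)            # sizes of the left child for each threshold
    left_sum = csum[:-1]
    right_sum = total - left_sum
    # Decrease in the sum of squared errors achieved by each candidate split.
    sse_drop = left_sum**2 / n_left + right_sum**2 / (n - n_left) - total**2 / n
    return sse_drop.max() / n

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1.0, 1.0, size=(n, 2))

y_interaction = np.sign(X[:, 0]) * np.sign(X[:, 1])   # pure interaction, no main effects
y_additive = np.sign(X[:, 0]) + np.sign(X[:, 1])      # main effects only

gain_interaction = max(best_cart_gain(X[:, j], y_interaction) for j in range(2))
gain_additive = max(best_cart_gain(X[:, j], y_additive) for j in range(2))

print(f"best root-split gain, pure interaction: {gain_interaction:.4f}")
print(f"best root-split gain, additive signal:  {gain_additive:.4f}")
```

Under these assumptions, no single-feature split at the root yields more than a noise-level gain on the pure-interaction target, even though both features fully determine the response; the additive target, by contrast, is picked up immediately.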