 Redwood City


Parallel and Efficient Hierarchical k-Median Clustering

Neural Information Processing Systems

In particular, standard metric formulations such as hierarchical k-center, k-means, and k-median have received a lot of attention, and the problems have been studied extensively in different models of computation.





Meta is reportedly working on a new AI model called 'Avocado' and it might not be open source

Engadget

Mark Zuckerberg has been shaking up the company's AI strategy as it pursues superintelligence. (Photo caption: Meta CEO Mark Zuckerberg speaks during an event at the Biohub Imaging Institute in Redwood City, Calif., Wednesday, Nov. 5, 2025.) Mark Zuckerberg has for months publicly hinted that he is backing away from open-source AI models. Now, Meta's latest AI pivot is starting to come into focus. The company is reportedly working on a new model, known inside of Meta as Avocado, which could mark a major shift away from its previous open-source approach to AI development.



AI drives dramatic expansion of Chan Zuckerberg Initiative's funding to end all diseases

Science

As the promise of artificial intelligence (AI) captivates biomedicine, few people are riding the wave like Priscilla Chan--because few people have her resources. Trained as a pediatrician, Chan and her husband, Facebook creator Mark Zuckerberg, co-run a philanthropy that launched in 2015 with the wildly ambitious--some would say quixotic--goal of curing, preventing, or managing every disease by the end of the century. The couple pledged nearly their entire fortune--$45 billion then and more than $200 billion today--to the Chan Zuckerberg Initiative (CZI), which would also support their education and progressive causes. Recently, however, the foundation has wound down support for almost everything but science. And this week, CZI announced it is increasing its research spending, doubling down on AI, and vowing to meet Chan and Zuckerberg's biomedical goal even earlier--although CZI won't set a specific target.


Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models

arXiv.org Machine Learning

Computational modeling of single-cell gene expression is crucial for understanding cellular processes, but generating realistic expression profiles remains a major challenge. This difficulty arises from the count nature of gene expression data and complex latent dependencies among genes. Existing generative models often impose artificial gene orderings or rely on shallow neural network architectures. We introduce a scalable latent diffusion model for single-cell gene expression data, which we refer to as scLDM, that respects the fundamental exchangeability property of the data. Our VAE uses fixed-size latent variables leveraging a unified Multi-head Cross-Attention Block (MCAB) architecture, which serves dual roles: permutation-invariant pooling in the encoder and permutation-equivariant unpooling in the decoder. We enhance this framework by replacing the Gaussian prior with a latent diffusion model using Diffusion Transformers and linear interpolants, enabling high-quality generation with multi-conditional classifier-free guidance. We show its superior performance in a variety of experiments for both observational and perturbational single-cell data, as well as downstream tasks like cell-level classification.


Metadata Extraction Leveraging Large Language Models

arXiv.org Machine Learning

The advent of Large Language Models has revolutionized tasks across domains, including the automation of legal document analysis, a critical component of modern contract management systems. This paper presents a comprehensive implementation of LLM-enhanced metadata extraction for contract review, focusing on the automatic detection and annotation of salient legal clauses. Leveraging both the publicly available Contract Understanding Atticus Dataset (CUAD) and proprietary contract datasets, our work demonstrates the integration of advanced LLM methodologies with practical applications. We identify three pivotal elements for optimizing metadata extraction: robust text conversion, strategic chunk selection, and advanced LLM-specific techniques, including Chain of Thought (CoT) prompting and structured tool calling. The results from our experiments highlight the substantial improvements in clause identification accuracy and efficiency. Our approach shows promise in reducing the time and cost associated with contract review while maintaining high accuracy in legal clause identification. The results suggest that carefully optimized LLM systems could serve as valuable tools for legal professionals, potentially increasing access to efficient contract review services for organizations of all sizes.