BoxE: A Box Embedding Model for Knowledge Base Completion

Neural Information Processing Systems

Knowledge base completion (KBC) aims to automatically infer missing facts by exploiting information already present in a knowledge base (KB). A promising approach for KBC is to embed knowledge into latent spaces and make predictions from learned embeddings. However, existing embedding models are subject to at least one of the following limitations: (1) theoretical inexpressivity, (2) lack of support for prominent inference patterns (e.g., hierarchies), (3) lack of support for KBC over higher-arity relations, and (4) lack of support for incorporating logical rules. Here, we propose a spatio-translational embedding model, called BoxE, that simultaneously addresses all these limitations. BoxE embeds entities as points, and relations as a set of hyper-rectangles (or boxes), which spatially characterize basic logical properties. This seemingly simple abstraction yields a fully expressive model offering a natural encoding for many desired logical properties. BoxE can both capture and inject rules from rich classes of rule languages, going well beyond individual inference patterns. By design, BoxE naturally applies to higher-arity KBs. We conduct a detailed experimental analysis, and show that BoxE achieves state-of-the-art performance, both on benchmark knowledge graphs and on more general KBs, and we empirically show the power of integrating logical rules.
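As a rough illustration of the spatial semantics described above (entities embedded as points, an n-ary relation as n boxes), a fact holds when each argument's point falls inside the corresponding box. This is a minimal sketch with hypothetical names; the full BoxE model also applies translational "bumps" to entity points and scores facts with a distance function rather than a hard membership test:

```python
import numpy as np

def in_box(point, box_low, box_high):
    # A point satisfies a box constraint if it lies inside the
    # hyper-rectangle defined by the lower and upper corners.
    return bool(np.all((point >= box_low) & (point <= box_high)))

def fact_holds(entity_points, relation_boxes):
    # Hard reading of a fact r(e1, ..., en): every argument's point
    # must lie in the corresponding box of relation r.
    return all(in_box(p, low, high)
               for p, (low, high) in zip(entity_points, relation_boxes))

# Binary fact: first argument must land in [0,1]^2, second in [1,3]^2.
boxes = [(np.array([0.0, 0.0]), np.array([1.0, 1.0])),
         (np.array([1.0, 1.0]), np.array([3.0, 3.0]))]
points = [np.array([0.5, 0.5]), np.array([2.0, 2.0])]
```

Because boxes can nest, overlap, or be disjoint, configurations of boxes can directly encode properties such as symmetry or hierarchy between relations.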


The Download: Google's AI mission, and America's reliance on natural gas

MIT Technology Review

If you want to know where AI is headed, this year's Google I/O has you covered. The company's annual showcase of next-gen products, which kicked off yesterday, has all of the pomp and pizzazz, the sizzle reels and celebrity walk-ons, that you'd expect from a multimillion-dollar marketing event. But it also shows us just how fast this still-experimental technology is being subsumed into a line-up designed to sell phones and subscription tiers. Never before have I seen this thing we call artificial intelligence appear so normal. Last December, Meta announced plans to build a massive $10 billion data center for training its artificial intelligence models in rural northeast Louisiana.


Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification - supplementary material
Francesca Mignacco

Neural Information Processing Systems

The derivation of the self-consistent stochastic process discussed in the main text can be obtained using tools from the statistical physics of disordered systems. In particular, it was carried out very recently for a related model, the spherical perceptron with random labels, in [1]. Our derivation extends the known DMFT equations by including structure in the data, a stochastic version of gradient descent (as discussed in the main text), the relaxation of the spherical constraint over the weights, and the introduction of a ridge regularization term. There are at least two ways to write the DMFT equations: one uses field-theoretical techniques; the other employs a dynamical version of the so-called cavity method [2].


What AI Thinks It Knows About You

The Atlantic - Technology

Large language models such as GPT, Llama, Claude, and DeepSeek can be so fluent that people feel it as a "you," and it answers encouragingly as an "I." The models can write poetry in nearly any given form, read a set of political speeches and promptly sift out and share all the jokes, draw a chart, code a website. How do they do these and so many other things that were just recently the sole realm of humans? Practitioners are left explaining jaw-dropping conversational rabbit-from-a-hat extractions with arm-waving that the models are just predicting one word at a time from an unthinkably large training set scraped from every recorded written or spoken human utterance that can be found--fair enough--or with a small shrug and a cryptic utterance of "fine-tuning" or "transformers!" These aren't very satisfying answers for how these models can converse so intelligently, and how they sometimes err so weirdly.


By putting AI into everything, Google wants to make it invisible

MIT Technology Review

Yes, Google's roster of consumer-facing products is the slickest on offer. The firm is bundling most of its multimodal models into its Gemini app, including the new Imagen 4 image generator and the new Veo 3 video generator. That means you can now access Google's full range of generative models via a single chatbot. It also announced Gemini Live, a feature that lets you share your phone's screen or your camera's view with the chatbot and ask it about what it can see. Those features were previously only seen in demos of Project Astra, a "universal AI assistant" that Google DeepMind is working on.


Handling Learnwares from Heterogeneous Feature Spaces with Explicit Label Exploitation

Neural Information Processing Systems

The learnware paradigm aims to help users leverage numerous existing high-performing models instead of starting from scratch, where a learnware consists of a well-trained model and a specification describing its capability. Numerous learnwares are accommodated by a learnware dock system. When users solve tasks with the system, models that fully match the task feature space are often rare or even unavailable. However, models with heterogeneous feature spaces can still be helpful. This paper finds that label information, particularly model outputs, is helpful yet previously under-exploited in the accommodation of heterogeneous learnwares. We extend the specification to better leverage model pseudo-labels and subsequently enrich the unified embedding space for better specification evolvement. With label information, learnware identification can also be improved by additionally comparing conditional distributions. Experiments demonstrate that, even without a model explicitly tailored to user tasks, the system can effectively handle tasks by leveraging models from diverse feature spaces.


A Details of Experiments

Neural Information Processing Systems

This paper addresses three NP-hard routing problems: the traveling salesman problem (TSP), the prize collecting TSP (PCTSP), and the capacitated vehicle routing problem (CVRP). This section provides detailed descriptions of PCTSP and CVRP (for TSP, see Section 3). The PCTSP is similar to the TSP, with two differences: not all nodes need to be visited, and the destination is not the first node but the depot node, i.e., a tour is not a cycle. Let N be the number of nodes, and let R denote the prize of a visited node.
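The definitions above (an open tour ending at the depot, with only some nodes visited) can be sketched with a common PCTSP cost function. This is a minimal illustration under stated assumptions: the penalty vector for unvisited nodes is a standard PCTSP ingredient but is not defined in this excerpt, which only introduces N and R:

```python
def tour_length(route, dist):
    # Sum of edge lengths along the route. The route is not closed into
    # a cycle, matching the paper's note that a PCTSP tour is not a cycle.
    return sum(dist[route[i]][route[i + 1]] for i in range(len(route) - 1))

def pctsp_cost(route, dist, penalty):
    # A common PCTSP objective: route length plus penalties for the
    # nodes left unvisited. `penalty` is a hypothetical addition here;
    # the excerpt defines only N (node count) and R (prize of a visited node).
    unvisited = set(range(len(penalty))) - set(route)
    return tour_length(route, dist) + sum(penalty[j] for j in unvisited)

# 3 nodes, symmetric distances; visit nodes 0 and 1, skip node 2.
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
penalty = [0, 0, 5]
cost = pctsp_cost([0, 1], dist, penalty)  # 1 (edge) + 5 (skipped node 2)
```

A feasibility constraint requiring collected prizes to reach a minimum total would sit alongside this cost in a full formulation.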


Variational Distillation of Diffusion Policies into Mixture of Experts
Denis Blessing

Neural Information Processing Systems

This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE) through variational inference. Diffusion models are the current state of the art in generative modeling due to their exceptional ability to accurately learn and represent complex, multi-modal distributions. This ability allows diffusion models to replicate the inherent diversity in human behavior, making them the preferred models in behavior learning settings such as Learning from Human Demonstrations (LfD). However, diffusion models come with some drawbacks, including the intractability of likelihoods and long inference times due to their iterative sampling process. The inference times, in particular, pose a significant challenge to real-time applications such as robot control. In contrast, MoEs effectively address the aforementioned issues while retaining the ability to represent complex distributions, but are notoriously difficult to train.


I'm an AI expert, and these 8 announcements at Google I/O impressed me the most

ZDNet

The past two Google I/O developer conferences have mainly been AI events, and this year is no different. The tech giant used the stage to unveil features across all its most popular products, even bringing AI experiments that were previously announced to fruition. This means that dozens of AI features and tools were unveiled. They're meant to transform how you use Google offerings, including how you shop, video call, sort your inbox, search the web, create images, edit video, code, and more. Since such a firehose of information is packed into a two-hour keynote address, you may be wondering which features are actually worth paying attention to.


Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference

Neural Information Processing Systems

A conventional LLM unlearning task typically involves two goals: (1) the target LLM should forget the knowledge in the specified forget documents, and (2) it should retain the other knowledge the LLM possesses, for which we assume access to a small number of retain documents. To achieve both goals, a mainstream class of LLM unlearning methods introduces an optimization framework combining two objectives: maximizing the prediction loss on the forget documents while minimizing it on the retain documents. This framework suffers from two challenges: degenerate output and catastrophic forgetting. In this paper, we propose a novel unlearning framework called Unlearning from Logit Difference (ULD), which introduces an assistant LLM that aims to achieve the opposite of the unlearning goals: remembering the forget documents and forgetting the retain knowledge. ULD then derives the unlearned LLM by computing the logit difference between the target and the assistant LLMs. We show that these reversed objectives naturally resolve both aforementioned challenges while significantly improving training efficiency. Extensive experiments demonstrate that our method efficiently achieves the intended forgetting while preserving the LLM's overall capabilities, reducing training time by more than threefold. Notably, our method loses 0% of model utility on the ToFU benchmark, whereas baseline methods may sacrifice 17% of utility on average to achieve comparable forget quality.
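The core operation described above, deriving the unlearned model's output distribution from the logit difference between the target and assistant models, can be sketched per token position as follows. This is a simplified illustration: the scaling coefficient `alpha` is a hypothetical knob not specified in the abstract, and the actual ULD method defines how the assistant is trained and combined:

```python
import numpy as np

def uld_logits(target_logits, assistant_logits, alpha=1.0):
    # Unlearned logits = target logits minus (scaled) assistant logits.
    # Subtracting the assistant, which *remembers* the forget documents,
    # suppresses exactly that knowledge in the combined model.
    return target_logits - alpha * assistant_logits

def softmax(logits):
    # Stable softmax to turn combined logits into next-token probabilities.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# Toy vocabulary of 3 tokens: the assistant is confident on token 0
# (forget knowledge), so the difference demotes it.
target = np.array([2.0, 1.0, 0.0])
assistant = np.array([3.0, 0.0, 0.0])
combined = uld_logits(target, assistant)  # [-1.0, 1.0, 0.0]
```

In a full pipeline this subtraction would run over the logits of every decoding step, leaving the base model's weights untouched.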