Task-Free Continual Learning via Online Discrepancy Distance Learning

Neural Information Processing Systems

Learning from non-stationary data streams, also called Task-Free Continual Learning (TFCL), remains challenging due to the absence of explicit task information in most applications. Although some algorithms have recently been proposed for TFCL, they lack theoretical guarantees. Moreover, there are no theoretical studies of forgetting during TFCL. This paper develops a new theoretical analysis framework that derives generalization bounds based on the discrepancy distance between the visited samples and the entire information made available for training the model. This analysis provides new insights into the forgetting behaviour in classification tasks. Inspired by this theoretical model, we propose Online Discrepancy Distance Learning (ODDL), a new approach that equips a mixture model with a dynamic component-expansion mechanism. ODDL estimates the discrepancy between the current memory and the already accumulated knowledge and uses it as an expansion signal, ensuring a compact network architecture with optimal performance. We then propose a new sample selection approach that selectively stores samples into the memory buffer through the discrepancy-based measure, further improving performance. Several TFCL experiments demonstrate that the proposed approach achieves state-of-the-art performance.
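A minimal sketch of the expansion signal may help. The Python toy below is not the authors' algorithm: it replaces ODDL's classifier-based discrepancy distance with a crude Euclidean proxy between batch means, and every name in it (MixtureExpander, threshold) is invented for illustration. It only shows the shape of the idea: spawn a new mixture component whenever the current memory is too discrepant from all accumulated knowledge.

```python
import numpy as np

rng = np.random.default_rng(0)

class MixtureExpander:
    """Toy dynamic-expansion mechanism (illustrative only): each
    'component' stores the mean of the data it has absorbed, and a
    new component is spawned when the memory buffer is too discrepant
    from every existing component."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.components = []  # one running mean vector per component

    def discrepancy(self, memory):
        # Proxy "discrepancy distance": distance from the memory mean
        # to the nearest component mean (infinite if none exist yet).
        if not self.components:
            return np.inf
        m = memory.mean(axis=0)
        return min(np.linalg.norm(m - c) for c in self.components)

    def step(self, memory):
        # Expansion signal: grow the mixture only when needed,
        # keeping the architecture compact.
        if self.discrepancy(memory) > self.threshold:
            self.components.append(memory.mean(axis=0))

model = MixtureExpander(threshold=2.0)
for shift in (0.0, 0.0, 5.0, 5.0):  # unlabeled distribution drift mid-stream
    memory = rng.normal(loc=shift, scale=1.0, size=(64, 8))
    model.step(memory)
print(len(model.components))  # -> 2: one component per underlying regime
```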


Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization

Neural Information Processing Systems

Despite significant interest and much progress in decentralized multi-player multi-armed bandits (MP-MAB) in recent years, the regret gap to the natural centralized lower bound in the heterogeneous MP-MAB setting has remained open. In this paper, we propose BEACON - Batched Exploration with Adaptive COmmunicatioN - which closes this gap. BEACON accomplishes this goal with novel contributions in implicit communication and efficient exploration. For the former, we propose an adaptive differential communication (ADC) design that significantly improves implicit communication efficiency. For the latter, a carefully crafted batched exploration scheme enables incorporation of the combinatorial upper confidence bound (CUCB) principle. We then generalize existing linear-reward MP-MAB problems, where the system reward is always the sum of individually collected rewards, to a new MP-MAB problem where the system reward is a general (nonlinear) function of individual rewards. We extend BEACON to solve this problem and prove a logarithmic regret. BEACON bridges the algorithm design and regret analysis of combinatorial MAB (CMAB) and MP-MAB, two largely disjoint areas of the MAB literature, and the results in this paper suggest that this previously overlooked connection is worth further investigation.
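To see why differential communication helps, consider a hedged toy calculation. The snippet below is not BEACON's actual ADC protocol (which adapts its quantization per batch); it merely compares the bit cost of retransmitting a full empirical-mean estimate against transmitting only successive differences, using hypothetical numbers, under the MP-MAB convention that each implicitly communicated bit costs roughly one arm pull.

```python
import math

def qbits(x, eps=1e-4):
    """Bits needed to encode |x| at resolution eps; under implicit
    communication each bit costs roughly one forced collision."""
    levels = int(abs(x) / eps) + 1
    return max(1, math.ceil(math.log2(levels + 1)))

# A follower's empirical-mean estimates of one arm over successive batches
# (made-up values that converge, as estimates do under batched exploration):
estimates = [0.50, 0.62, 0.645, 0.651, 0.6512]

naive = sum(qbits(m) for m in estimates)   # retransmit the full estimate
adc = qbits(estimates[0]) + sum(           # transmit only the change
    qbits(b - a) for a, b in zip(estimates, estimates[1:]))
print(naive, adc)  # the differential scheme needs far fewer bits/pulls
```

Because the differences shrink as estimates converge, the per-batch communication cost decays instead of staying constant, which is the intuition behind ADC's efficiency gain.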


Amazon's latest AI shopping feature produces quick audio product summaries

Mashable

Amazon is aiming to make shopping just a bit easier. This week, Amazon launched a new generative AI feature that produces short audio summaries detailing everything you need to know about a product. The audio descriptions, which Amazon is calling "Hear the highlights," are created from on-page product summaries, reviews, and information from other websites. For now, the summaries are available only on a limited number of items and only for US customers, and can be accessed in the Amazon app.


JD Vance calls dating apps destructive

Mashable

Dating apps are getting a lot of flak lately. Daters are opting for in-person events -- even dungeon sound baths -- and moving away from apps that keep adding AI features and seem to be copying each other. Vice President JD Vance also has no love for dating apps, apparently. In an interview on the New York Times's "Interesting Times" podcast, Vance spoke about his "noneconomic" concerns with AI and tech. He told host and Times opinion columnist Ross Douthat, "If you look at basic dating behavior among young people -- and I think a lot of this is that the dating apps are probably more destructive than we fully appreciate."


On the Benefits of Public Representations for Private Transfer Learning under Distribution Shift

Neural Information Processing Systems

Public pretraining is a promising approach to improve differentially private model training. However, recent work has noted that many positive research results studying this paradigm only consider in-distribution tasks, and may not apply to settings where there is distribution shift between the pretraining and finetuning data--a scenario that is likely when finetuning on private tasks, given the sensitive nature of the data. In this work, we show empirically across three tasks that even in settings with large distribution shift, where both zero-shot performance from public data and training from scratch with private data give unusably weak results, public features can in fact improve private training accuracy by up to 67% over private training from scratch. We provide a theoretical explanation for this phenomenon, showing that if the public and private data share a low-dimensional representation, public representations can improve the sample complexity of private training even when it is impossible to learn the private task from the public data alone. Altogether, our results provide evidence that public data can indeed make private training practical in realistic settings of extreme distribution shift.
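The intuition behind the sample-complexity result can be illustrated with a toy simulation. The sketch below is not the paper's experimental setup: the dimensions, hyperparameters, and data generation are invented, and privacy accounting is omitted. It shows a "public" low-dimensional representation being shared by the private task, so that a DP-SGD linear probe on those features competes against DP-SGD from scratch on the raw high-dimensional inputs; because DP-SGD's noise cost grows with dimension, the probe typically fares far better.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_logreg(X, y, steps=200, lr=0.5, clip=1.0, noise_mult=1.0, batch=64):
    """Logistic regression via DP-SGD: per-example gradient clipping
    plus Gaussian noise (the standard recipe; accounting omitted)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        Xb, yb = X[idx], y[idx]
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        g = (p - yb)[:, None] * Xb                  # per-example gradients
        norms = np.linalg.norm(g, axis=1, keepdims=True)
        g = g / np.maximum(1.0, norms / clip)       # clip each example
        noise = rng.normal(0.0, noise_mult * clip, size=d)
        w -= lr * (g.sum(axis=0) + noise) / batch
    return w

# Toy private task whose inputs live in a 4-d subspace shared with public data.
U = rng.normal(size=(256, 4))                # shared low-dim representation
z = rng.normal(size=(2000, 4))               # latent factors
X_priv = z @ U.T                             # raw 256-d private inputs
y_priv = (z @ rng.normal(size=4) > 0).astype(float)

feats = X_priv @ U / U.shape[0]              # "public encoder" recovers ~z
w_raw = dp_sgd_logreg(X_priv, y_priv)        # private training from scratch
w_pub = dp_sgd_logreg(feats, y_priv)         # private probe on public features

def acc(X, w):
    return np.mean((X @ w > 0) == y_priv)
print(f"scratch: {acc(X_priv, w_raw):.2f}  public features: {acc(feats, w_pub):.2f}")
```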


AI Is Eating Data Center Power Demand--and It's Only Getting Worse

WIRED

AI's energy use already represents as much as 20 percent of global data-center power demand, according to research published Thursday in the journal Joule. That demand from AI, the research states, could double by the end of this year, comprising nearly half of all data-center electricity consumption worldwide, excluding the electricity used for bitcoin mining. The new research appears in a commentary by Alex de Vries-Gao, the founder of Digiconomist, a research company that evaluates the environmental impact of technology. De Vries-Gao started Digiconomist in the late 2010s to explore the impact bitcoin mining, another extremely energy-intensive activity, would have on the environment. Looking at AI, he says, has grown more urgent over the past few years because of the widespread adoption of ChatGPT and other large language models that use massive amounts of energy. According to his research, worldwide AI energy demand is now set to surpass demand from bitcoin mining by the end of this year.


Anthropic's new Claude Opus 4 can run autonomously for seven hours straight

Mashable

After a whirlwind week of announcements from Google and OpenAI, Anthropic has its own news to share. On Thursday, Anthropic announced Claude Opus 4 and Claude Sonnet 4, its next generation of models, with an emphasis on coding, reasoning, and agentic capabilities. According to Rakuten, which got early access to the model, Claude Opus 4 ran "independently for seven hours with sustained performance." Claude Opus is the largest model in Anthropic's family, with more power for longer, complex tasks, whereas Sonnet is generally speedier and more efficient. Claude Opus 4 is a step up from its predecessor, Opus 3, and Sonnet 4 replaces Sonnet 3.7.


How to try Veo 3, Google's AI video generator that's going viral on the internet

ZDNet

AI-generated video has been advancing rapidly, with leading tech developers racing to build and commercialize their own models. We're now seeing the rise of tools that can generate strikingly photorealistic video from a single prompt in natural language. For the most part, however, AI-generated video has had a glaring shortcoming: it's silent. At its annual I/O developer conference on Tuesday, Google announced the release of Veo 3, the latest iteration of its video-generating AI model, which also comes with the ability to generate synchronized audio. Imagine you prompt the system to generate a video set inside a busy subway car, for example.


Congress Passed a Sweeping Free-Speech Crackdown--and No One's Talking About It

Slate

Had you scanned any of the latest headlines around the TAKE IT DOWN Act, legislation that President Donald Trump signed into law Monday, you would have come away with a deeply mistaken impression of the bill and its true purpose. The surface-level pitch is that this is a necessary law for addressing nonconsensual intimate images--known more widely as revenge porn. Obfuscating its intent with a classic congressional acronym (Tools to Address Known Exploitation by Immobilizing Technological Deepfakes on Websites and Networks), the TAKE IT DOWN Act purports to help scrub the internet of exploitative, nonconsensual sexual media, whether real or digitally mocked up, at a time when artificial intelligence tools and automated image generators have supercharged its spread. Enforcement is delegated to the Federal Trade Commission, which will give online communities that specialize primarily in user-generated content (e.g., social media, message boards) a heads-up and a 48-hour takedown deadline whenever an appropriate example is reported.


A Definition of a batch normalization layer

Neural Information Processing Systems

A small constant is included in the denominator for numerical stability. For distributed training, the batch statistics are usually estimated locally on a subset of the training minibatch ("ghost batch normalization" [32]). In figure 2 of the main text, we studied the variance of hidden activations and the batch statistics of residual blocks at a range of depths in three different architectures: a deep linear fully connected unnormalized residual network, a deep linear fully connected normalized residual network, and a deep convolutional normalized residual network with ReLUs. We now define the three models in full. Deep fully connected linear residual network without normalization: The inputs are 100-dimensional vectors composed of independent random samples from the unit normal distribution, and the batch size is 1000.
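The "small constant in the denominator" in the first sentence refers to the ε of the standard batch-normalization transform, whose defining equation appears to have been lost in extraction; the usual definition (assumed here, with learned scale γ and shift β, over a minibatch of m activations) is:

```latex
% Standard batch normalization of activations x_1, ..., x_m in a minibatch;
% epsilon is the small constant added to the denominator for stability.
\[
\mu_{\mathcal{B}} = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
\sigma_{\mathcal{B}}^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu_{\mathcal{B}}\right)^2,
\]
\[
\mathrm{BN}(x_i) \;=\; \gamma\,\frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}} \;+\; \beta .
\]
```

Under ghost batch normalization, the same transform is applied with μ and σ² computed over a local subset of the minibatch rather than over all m examples.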