Task-Free Continual Learning via Online Discrepancy Distance Learning

Neural Information Processing Systems

Learning from non-stationary data streams, also called Task-Free Continual Learning (TFCL), remains challenging due to the absence of explicit task information in most applications. Although some algorithms have recently been proposed for TFCL, they lack theoretical guarantees, and forgetting during TFCL has not been studied theoretically. This paper develops a new theoretical analysis framework that derives generalization bounds based on the discrepancy distance between the visited samples and the entire information made available for training the model. This analysis provides new insights into the forgetting behaviour in classification tasks. Inspired by this theoretical model, we propose a new approach, Online Discrepancy Distance Learning (ODDL), which equips a mixture model with a dynamic component-expansion mechanism. ODDL estimates the discrepancy between the current memory and the already accumulated knowledge and uses it as an expansion signal, aiming to ensure a compact network architecture with optimal performance. We then propose a new sample selection approach that selectively stores samples in the memory buffer through a discrepancy-based measure, further improving performance. Several TFCL experiments demonstrate that the proposed approach achieves state-of-the-art performance.
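
The expansion mechanism can be pictured with a short sketch. Below is a minimal illustration, not the authors' implementation: an RBF-kernel MMD statistic stands in for the discrepancy distance, and the threshold tau and all names are hypothetical.

```python
import numpy as np

def mmd_proxy(x, y, sigma=1.0):
    """Squared RBF-kernel MMD between two sample sets -- a simple
    stand-in for the paper's discrepancy distance."""
    def gram(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * sigma ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

def should_expand(memory_batch, knowledge_batch, tau=0.05):
    """Expansion signal: spawn a new mixture component when the estimated
    discrepancy between the current memory and the already-acquired
    knowledge exceeds a threshold (tau is an arbitrary illustrative value)."""
    return mmd_proxy(memory_batch, knowledge_batch) > tau

# Two batches drawn from shifted distributions trigger expansion.
rng = np.random.default_rng(0)
old = rng.normal(0.0, 1.0, size=(128, 2))
new = rng.normal(1.5, 1.0, size=(128, 2))
print(should_expand(new, old))  # True: the stream has drifted
```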


Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization

Neural Information Processing Systems

Despite significant interest and much progress on decentralized multi-player multi-armed bandit (MP-MAB) problems in recent years, closing the regret gap to the natural centralized lower bound in the heterogeneous MP-MAB setting has remained an open problem. In this paper, we propose BEACON (Batched Exploration with Adaptive COmmunicatioN), which closes this gap. BEACON accomplishes this goal with novel contributions in implicit communication and efficient exploration. For the former, we propose a novel adaptive differential communication (ADC) design that significantly improves implicit communication efficiency. For the latter, a carefully crafted batched exploration scheme is developed to enable incorporation of the combinatorial upper confidence bound (CUCB) principle. We then generalize existing linear-reward MP-MAB problems, where the system reward is always the sum of individually collected rewards, to a new MP-MAB problem in which the system reward is a general (nonlinear) function of the individual rewards. We extend BEACON to solve this problem and prove a logarithmic regret. BEACON bridges the algorithm design and regret analysis of combinatorial MAB (CMAB) and MP-MAB, two largely disjoint areas of the MAB literature, and our results suggest that this previously overlooked connection is worth further investigation.
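
The CUCB ingredient can be sketched compactly. The snippet below is an illustrative, hypothetical exploitation step for the heterogeneous setting, not BEACON itself: per-(player, arm) upper confidence bounds feed an offline matching oracle that returns a collision-free assignment; BEACON's batching and adaptive differential communication are omitted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cucb_assignment(mu_hat, counts, t):
    """One CUCB-style step for heterogeneous MP-MAB: build per-(player, arm)
    upper confidence bounds, then let an offline oracle (a maximum-weight
    matching) pick a collision-free assignment of players to arms."""
    bonus = np.sqrt(3.0 * np.log(t) / (2.0 * np.maximum(counts, 1)))
    ucb = np.minimum(mu_hat + bonus, 1.0)        # rewards assumed to lie in [0, 1]
    players, arms = linear_sum_assignment(-ucb)  # negate to maximize the total index
    return list(zip(players.tolist(), arms.tolist()))

# 3 players, 5 arms, with per-pair empirical means and pull counts.
rng = np.random.default_rng(1)
print(cucb_assignment(rng.uniform(size=(3, 5)),
                      rng.integers(5, 40, size=(3, 5)), t=100))
```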


Hidden city built 5,000 years ago by lost advanced civilization discovered underneath vast desert

Daily Mail - Science & tech

For centuries, the Rub' al-Khali desert -- known as the Empty Quarter -- stretching across Saudi Arabia toward Dubai, was dismissed as a lifeless sea of sand. In 2002, Sheikh Mohammed bin Rashid Al Maktoum, ruler of Dubai, spotted unusual dune formations and a large black deposit while flying over the desert. That led to the discovery of Saruq Al-Hadid, an archaeological site rich in remnants of copper and iron smelting, which is now believed to be part of a 5,000-year-old civilization buried beneath the sands. Researchers have now found traces of this ancient society approximately 10 feet beneath the desert surface, hidden in plain sight and long overlooked due to the harsh environment and shifting dunes of the Empty Quarter. This discovery brings fresh life to the legend of a mythical city known as 'Atlantis of the Sands.'


Chatbots will be able to teach children TWICE as fast as teachers in the next 10 years, says the 'godfather of AI'

Daily Mail - Science & tech

Chatbots will be able to teach children more than twice as fast as teachers can within the next decade, the so-called godfather of AI has predicted. Geoffrey Hinton, who won a Nobel Prize for his work on the technology, also claimed AI personal tutors would 'be much more efficient and less boring'. Speaking at Gitex Europe, the British computer scientist said: 'It's not there yet, but it's coming, and so we'll get much better education at many levels.' AI personal tutors are already being trialled in UK schools, with the technology now able to talk directly to the student and adapt lesson plans to their knowledge level. The government has already funnelled millions of pounds into AI education initiatives, though it has claimed the technology will 'absolutely not' replace teachers.


On the Benefits of Public Representations for Private Transfer Learning under Distribution Shift

Neural Information Processing Systems

Public pretraining is a promising approach to improve differentially private model training. However, recent work has noted that many positive research results studying this paradigm only consider in-distribution tasks, and may not apply to settings where there is distribution shift between the pretraining and finetuning data--a scenario that is likely when finetuning on private data, due to its sensitive nature. In this work, we show empirically across three tasks that even in settings with large distribution shift, where both zero-shot performance from public data and training from scratch with private data give unusably weak results, public features can in fact improve private training accuracy by up to 67% over private training from scratch. We provide a theoretical explanation for this phenomenon, showing that if the public and private data share a low-dimensional representation, public representations can improve the sample complexity of private training even if it is impossible to learn the private task from the public data alone. Altogether, our results provide evidence that public data can indeed make private training practical in realistic settings of extreme distribution shift.
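
The regime the experiments point to can be sketched in a few lines: freeze the publicly pretrained encoder, extract features for the private data, and train only a small head privately. The following is a hand-rolled, DP-SGD-flavored sketch (per-example clipping plus Gaussian noise) for a logistic head; every hyperparameter is illustrative, and the privacy accounting a real deployment needs is omitted.

```python
import numpy as np

def dp_linear_head(feats, labels, clip=1.0, noise_mult=1.0, lr=0.1, epochs=5, seed=0):
    """Train a logistic head on frozen public features with per-example
    gradient clipping and Gaussian noise (DP-SGD style). Hyperparameters
    are illustrative, not taken from the paper."""
    rng = np.random.default_rng(seed)
    n, d = feats.shape
    w = np.zeros(d)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-feats @ w))        # per-example predictions
        g = feats * (p - labels)[:, None]           # per-example logistic gradients
        norms = np.linalg.norm(g, axis=1, keepdims=True)
        g = g / np.maximum(1.0, norms / clip)       # clip each gradient to norm <= clip
        noisy = g.sum(axis=0) + rng.normal(0.0, noise_mult * clip, size=d)
        w -= lr * noisy / n                         # noisy full-batch step
    return w

# Toy usage: "public features" in which one direction predicts the label.
rng = np.random.default_rng(0)
feats = rng.normal(size=(512, 32))
labels = (feats[:, 0] > 0).astype(float)
w = dp_linear_head(feats, labels)
print("train accuracy:", ((feats @ w > 0) == (labels > 0.5)).mean())
```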


AI Is Eating Data Center Power Demand--and It's Only Getting Worse

WIRED

AI's energy use already represents as much as 20 percent of global data-center power demand, research published Thursday in the journal Joule shows. That demand from AI, the research states, could double by the end of this year, comprising nearly half of all data-center electricity consumption worldwide, excluding the electricity used for bitcoin mining. The new research is published in a commentary by Alex de Vries-Gao, the founder of Digiconomist, a research company that evaluates the environmental impact of technology. De Vries-Gao started Digiconomist in the late 2010s to explore the impact that bitcoin mining, another extremely energy-intensive activity, would have on the environment. Looking at AI, he says, has grown more urgent over the past few years because of the widespread adoption of ChatGPT and other large language models that use massive amounts of energy. According to his research, worldwide AI energy demand is now set to surpass demand from bitcoin mining by the end of this year.


Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks

Neural Information Processing Systems

Batch normalization dramatically increases the largest trainable depth of residual networks, and this benefit has been crucial to the empirical success of deep residual networks on a wide range of benchmarks. We show that this key benefit arises because, at initialization, batch normalization downscales the residual branch relative to the skip connection, by a normalizing factor on the order of the square root of the network depth. This ensures that, early in training, the function computed by normalized residual blocks in deep networks is close to the identity function (on average). We use this insight to develop a simple initialization scheme that can train deep residual networks without normalization. We also provide a detailed empirical study of residual networks, which clarifies that, although batch normalized networks can be trained with larger learning rates, this effect is only beneficial in specific compute regimes, and has minimal benefits when the batch size is small.
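
The depth scaling is easy to check numerically. The toy experiment below is illustrative only: a crude "rescale the branch input to unit standard deviation" proxy stands in for batch normalization, and we track the scale of the hidden activations of a randomly initialized residual stack with and without it.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width, batch = 64, 256, 512

for normalize in (False, True):
    h = rng.normal(size=(batch, width))
    for _ in range(depth):
        branch = h / h.std() if normalize else h   # crude stand-in for batch norm
        W = rng.normal(scale=np.sqrt(2.0 / width), size=(width, width))
        h = h + np.maximum(branch @ W, 0.0)        # ReLU residual block
    print(f"normalize={normalize}: activation std after {depth} blocks = {h.std():.3g}")
```

With the normalization proxy, every block adds an O(1) contribution on top of a skip path whose variance grows only linearly in depth, so the branch-to-skip ratio shrinks like one over the square root of the depth and late blocks stay close to the identity; without it, the activation scale compounds exponentially.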


Anthropic's latest Claude AI models are here - and you can try one for free today

ZDNet

Since its founding in 2021, Anthropic has quickly become one of the leading AI companies and a worthy competitor to OpenAI, Google, and Microsoft with its Claude models. Building on this momentum, the company held its first developer conference, Code with Claude, on Thursday, showcasing what the company has done so far and where it is going next. Anthropic used the event stage to unveil two highly anticipated models, Claude Opus 4 and Claude Sonnet 4. Both offer improvements over their predecessors, including better performance in coding and reasoning. Beyond that, the company launched new features and tools for its models that should improve the user experience. Keep reading to learn more about the new models.


News/Media Alliance says Google's AI takes content by force

Mashable

Is Google's new AI Mode feature theft? The News/Media Alliance, a trade association representing news media organizations in the U.S. and Canada, certainly thinks so. At Google's I/O showcase earlier this week, the tech company announced the public release of AI Mode in Google Search. AI Mode expands AI Overviews in search and signifies a pivot away from Google's traditional search. Users will see a tab at the top of their Google Search page that takes them to a chatbot interface much like, say, ChatGPT, instead of the typical Google Search results.


Nonlinear dynamics of localization in neural receptive fields

Neural Information Processing Systems

Localized receptive fields--neurons that are selective for certain contiguous spatiotemporal features of their input--populate early sensory regions of the mammalian brain. Unsupervised learning algorithms that optimize explicit sparsity or independence criteria replicate features of these localized receptive fields, but fail to explain directly how localization arises through learning without efficient coding, as occurs in early layers of deep neural networks and might occur in early sensory regions of biological systems. We consider an alternative model in which localized receptive fields emerge without explicit top-down efficiency constraints--a feedforward neural network trained on a data model inspired by the structure of natural images. Previous work identified the importance of non-Gaussian statistics to localization in this setting but left open questions about the mechanisms driving dynamical emergence. We address these questions by deriving the effective learning dynamics for a single nonlinear neuron, making precise how higher-order statistical properties of the input data drive emergent localization, and we demonstrate that the predictions of these effective dynamics extend to the many-neuron setting. Our analysis provides an alternative explanation for the ubiquity of localization as resulting from the nonlinear dynamics of learning in neural circuits.
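
A toy version of the single-neuron setting can be written down directly. This is an illustrative stand-in, not the paper's data model or objective: inputs are Gaussian bumps at random positions on a ring (translation-invariant and non-Gaussian), and a single neuron performs projected gradient ascent on the fourth moment of its pre-activation, a projection-pursuit-style criterion that favors sparse, localized directions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100  # input dimension (pixels on a ring)

def sample_inputs(batch, bump_width=3.0):
    """Non-Gaussian, translation-invariant inputs: one Gaussian bump per
    sample, centered at a uniformly random position on the ring."""
    centers = rng.integers(0, d, size=batch)
    grid = np.arange(d)
    dist = np.abs(grid - centers[:, None])
    dist = np.minimum(dist, d - dist)              # circular distance
    x = np.exp(-0.5 * (dist / bump_width) ** 2)
    return x - x.mean(axis=1, keepdims=True)       # zero mean per sample

w = rng.normal(size=d)
w /= np.linalg.norm(w)
for _ in range(3000):
    x = sample_inputs(64)
    a = x @ w
    w += 0.5 * (x * (a ** 3)[:, None]).mean(axis=0)  # ascent on E[(w.x)^4]/4
    w /= np.linalg.norm(w)                           # stay on the unit sphere

# A localized receptive field concentrates its mass on a few adjacent pixels,
# so this fraction ends up well above what a delocalized unit vector gives.
print("fraction of |w| in its 10 largest entries:",
      np.sort(np.abs(w))[-10:].sum() / np.abs(w).sum())
```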