output
Learning to Add, Multiply, and Execute Algorithmic Instructions Exactly with Neural Networks
Neural networks are known for their ability to approximate smooth functions, yet they fail to generalize perfectly to unseen inputs when trained on discrete operations. Such operations lie at the heart of algorithmic tasks such as arithmetic, which is often used as a test bed for algorithmic execution in neural networks. In this work, we ask: can neural networks learn to execute binary-encoded algorithmic instructions exactly? We use the Neural Tangent Kernel (NTK) framework to study the training dynamics of two-layer fully connected networks in the infinite-width limit and show how a sufficiently large ensemble of such models can be trained to execute exactly, with high probability, four fundamental tasks: binary permutations, binary addition, binary multiplication, and Subtract and Branch if Negative (SBN) instructions. Since SBN is Turing-complete, our framework extends to computable functions. We show how this can be efficiently achieved using only logarithmically many training data. Our approach relies on two techniques: structuring the training data to isolate bit-level rules, and controlling correlations in the NTK regime to align model predictions with the target algorithmic executions.
Approximating Real-Time Recurrent Learning with Random Kronecker Factors
Asier Mujika, Florian Meier, Angelika Steger
Wealso confirm these theoretical results experimentally. Further,we showempirically thattheKF-RTRLalgorithm captures long-term dependencies and almost matches the performance of TBPTT on real world tasks by trainingRecurrent Highway Networks on a synthetic string memorization task and onthe Penn TreeBank task, respectively.
e8258e5140317ff36c7f8225a3bf9590-Supplemental.pdf
The original MuZero did not use sticky actions (Machado et al., 2017) (a 25% chance that the selected action is ignored and that instead the previous action is repeated) for Atari experiments. For all experiments in this work we used a network architecture based on the one introduced by MuZero(Schrittwieser etal.,2020), To implement the network, we used the modules provided by the Haiku neural network library (Henniganetal.,2020). We did not observe any benefit from using a Gaussian mixture, so instead inallourexperiments weusedasingle Gaussian withdiagonal covariance. All experiments used the Adam optimiser (Kingma & Ba, 2015) with decoupled weight decay (Loshchilov & Hutter, 2017) for training.
ShoppingMMLU: AMassiveMulti-TaskOnline ShoppingBenchmarkforLargeLanguageModels
However,existingmodelsand benchmarks are commonly tailored to specific tasks, falling short of capturing the full complexity of online shopping. Large Language Models (LLMs), with their multi-task and few-shot learning abilities, have the potential to profoundly transform online shopping byalleviating task-specific engineering effortsandby providing users with interactiveconversations.
GreenMachine: Automatic Design of Zero-Cost Proxies for Energy-Efficient NAS
Cortês, Gabriel, Lourenço, Nuno, Machado, Penousal
Artificial Intelligence (AI) has driven innovations and created new opportunities across various sectors. However, leveraging domain-specific knowledge often requires automated tools to design and configure models effectively. In the case of Deep Neural Networks (DNNs), researchers and practitioners usually resort to Neural Architecture Search (NAS) approaches, which are resource- and time-intensive, requiring the training and evaluation of numerous candidate architectures. This raises sustainability concerns, particularly due to the high energy demands involved, creating a paradox: the pursuit of the most effective model can undermine sustainability goals. To mitigate this issue, zero-cost proxies have emerged as a promising alternative. These proxies estimate a model's performance without the need for full training, offering a more efficient approach. This paper addresses the challenges of model evaluation by automatically designing zero-cost proxies to assess DNNs efficiently. Our method begins with a randomly generated set of zero-cost proxies, which are evolved and tested using the NATS-Bench benchmark. We assess the proxies' effectiveness using both randomly sampled and stratified subsets of the search space, ensuring they can differentiate between low- and high-performing networks and enhance generalizability. Results show our method outperforms existing approaches on the stratified sampling strategy, achieving strong correlations with ground truth performance, including a Kendall correlation of 0.89 on CIFAR-10 and 0.77 on CIFAR-100 with NATS-Bench-SSS and a Kendall correlation of 0.78 on CIFAR-10 and 0.71 on CIFAR-100 with NATS-Bench-TSS.
GeoGalactica: A Scientific Large Language Model in Geoscience
Lin, Zhouhan, Deng, Cheng, Zhou, Le, Zhang, Tianhang, Xu, Yi, Xu, Yutong, He, Zhongmou, Shi, Yuanyuan, Dai, Beiya, Song, Yunchong, Zeng, Boyi, Chen, Qiyuan, Shi, Tao, Huang, Tianyu, Xu, Yiwei, Wang, Shu, Fu, Luoyi, Zhang, Weinan, He, Junxian, Ma, Chao, Zhu, Yunqiang, Wang, Xinbing, Zhou, Chenghu
Large language models (LLMs) have achieved huge success for their general knowledge and ability to solve a wide spectrum of tasks in natural language processing (NLP). Due to their impressive abilities, LLMs have shed light on potential inter-discipline applications to foster scientific discoveries of a specific domain by using artificial intelligence (AI for science, AI4S). In the meantime, utilizing NLP techniques in geoscience research and practice is wide and convoluted, contributing from knowledge extraction and document classification to question answering and knowledge discovery. In this work, we take the initial step to leverage LLM for science, through a rather straightforward approach. We try to specialize an LLM into geoscience, by further pre-training the model with a vast amount of texts in geoscience, as well as supervised fine-tuning (SFT) the resulting model with our custom collected instruction tuning dataset. These efforts result in a model GeoGalactica consisting of 30 billion parameters. To our best knowledge, it is the largest language model for the geoscience domain. More specifically, GeoGalactica is from further pre-training of Galactica. We train GeoGalactica over a geoscience-related text corpus containing 65 billion tokens curated from extensive data sources in the big science project Deep-time Digital Earth (DDE), preserving as the largest geoscience-specific text corpus. Then we fine-tune the model with 1 million pairs of instruction-tuning data consisting of questions that demand professional geoscience knowledge to answer. In this technical report, we will illustrate in detail all aspects of GeoGalactica, including data collection, data cleaning, base model selection, pre-training, SFT, and evaluation. We open-source our data curation tools and the checkpoints of GeoGalactica during the first 3/4 of pre-training.
Top 100 Python Interview Questions You Must Know
In this Python Interview Questions tutorial, I will introduce you to the most frequently asked questions in Python interviews. Our Python Interview Questions is the one-stop resource from where you can boost your interview preparation. We have 100 questions on Python Programming basics which will help you with different expertise levels to reap the maximum benefit from our blog. What is the difference between list and tuples in Python? What are the key features of Python? What type of language is python? How is Python an interpreted language? How is memory managed in Python? What is name space in Python?