Approximating Real-Time Recurrent Learning with Random Kronecker Factors
Asier Mujika, Florian Meier, Angelika Steger
We also confirm these theoretical results experimentally. Further, we show empirically that the KF-RTRL algorithm captures long-term dependencies and almost matches the performance of TBPTT on real world tasks by training Recurrent Highway Networks on a synthetic string memorization task and on the Penn TreeBank task, respectively.
- Europe > Switzerland (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
The original MuZero did not use sticky actions (Machado et al., 2017) (a 25% chance that the selected action is ignored and that instead the previous action is repeated) for Atari experiments. For all experiments in this work we used a network architecture based on the one introduced by MuZero (Schrittwieser et al., 2020). To implement the network, we used the modules provided by the Haiku neural network library (Hennigan et al., 2020). We did not observe any benefit from using a Gaussian mixture, so instead in all our experiments we used a single Gaussian with diagonal covariance. All experiments used the Adam optimiser (Kingma & Ba, 2015) with decoupled weight decay (Loshchilov & Hutter, 2017) for training.
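The sticky-actions mechanism described above is simple to express in code. The following is a minimal sketch (not from the paper); `env_step` and `sticky_step` are hypothetical names, and the 25% default follows the description in the text:

```python
import random


def sticky_step(env_step, action, prev_action, stickiness=0.25, rng=random):
    """Apply one environment step with sticky actions (Machado et al., 2017).

    With probability `stickiness`, the selected action is ignored and the
    previous action is repeated instead. Returns the environment result
    and the action that was actually executed.
    """
    if prev_action is not None and rng.random() < stickiness:
        action = prev_action  # ignore the agent's choice, repeat the last action
    return env_step(action), action
```

The executed action must be fed back in as `prev_action` on the next call, since a repeated action can itself be repeated again.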
Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models
However, existing models and benchmarks are commonly tailored to specific tasks, falling short of capturing the full complexity of online shopping. Large Language Models (LLMs), with their multi-task and few-shot learning abilities, have the potential to profoundly transform online shopping by alleviating task-specific engineering efforts and by providing users with interactive conversations.
- Retail > Online (0.58)
- Information Technology (0.58)
- North America > United States > Montana (0.04)
- North America > United States > Alabama (0.04)
- Asia > China > Jiangsu Province > Yancheng (0.04)
- North America > United States > California (0.46)
- Asia (0.45)
- North America > United States > New York (0.14)
- Europe > France (0.14)
- Transportation > Ground > Road (1.00)
- Transportation > Electric Vehicle (1.00)
- Energy > Power Industry (1.00)
GreenMachine: Automatic Design of Zero-Cost Proxies for Energy-Efficient NAS
Cortês, Gabriel, Lourenço, Nuno, Machado, Penousal
Artificial Intelligence (AI) has driven innovations and created new opportunities across various sectors. However, leveraging domain-specific knowledge often requires automated tools to design and configure models effectively. In the case of Deep Neural Networks (DNNs), researchers and practitioners usually resort to Neural Architecture Search (NAS) approaches, which are resource- and time-intensive, requiring the training and evaluation of numerous candidate architectures. This raises sustainability concerns, particularly due to the high energy demands involved, creating a paradox: the pursuit of the most effective model can undermine sustainability goals. To mitigate this issue, zero-cost proxies have emerged as a promising alternative. These proxies estimate a model's performance without the need for full training, offering a more efficient approach. This paper addresses the challenges of model evaluation by automatically designing zero-cost proxies to assess DNNs efficiently. Our method begins with a randomly generated set of zero-cost proxies, which are evolved and tested using the NATS-Bench benchmark. We assess the proxies' effectiveness using both randomly sampled and stratified subsets of the search space, ensuring they can differentiate between low- and high-performing networks and enhance generalizability. Results show our method outperforms existing approaches on the stratified sampling strategy, achieving strong correlations with ground truth performance, including a Kendall correlation of 0.89 on CIFAR-10 and 0.77 on CIFAR-100 with NATS-Bench-SSS and a Kendall correlation of 0.78 on CIFAR-10 and 0.71 on CIFAR-100 with NATS-Bench-TSS.
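The Kendall correlations reported above measure how well a proxy's ranking of networks agrees with their ground-truth performance. As an illustration only (not code from the paper), here is a minimal pure-Python implementation of Kendall's tau-a, the fraction of concordant minus discordant pairs:

```python
def kendall_tau(xs, ys):
    """Kendall rank correlation (tau-a) between two equal-length score lists.

    Counts concordant pairs (both lists rank the pair the same way) minus
    discordant pairs, normalised by the total number of pairs n*(n-1)/2.
    Returns a value in [-1, 1]; 1 means identical rankings.
    """
    assert len(xs) == len(ys) and len(xs) >= 2
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

In the NAS setting, `xs` would be the zero-cost proxy scores of the candidate networks and `ys` their trained accuracies; a tau of 0.89 means the proxy ranks nearly all pairs of networks the same way the ground truth does.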
- Europe > Switzerland (0.04)
- Europe > Austria (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
GeoGalactica: A Scientific Large Language Model in Geoscience
Lin, Zhouhan, Deng, Cheng, Zhou, Le, Zhang, Tianhang, Xu, Yi, Xu, Yutong, He, Zhongmou, Shi, Yuanyuan, Dai, Beiya, Song, Yunchong, Zeng, Boyi, Chen, Qiyuan, Shi, Tao, Huang, Tianyu, Xu, Yiwei, Wang, Shu, Fu, Luoyi, Zhang, Weinan, He, Junxian, Ma, Chao, Zhu, Yunqiang, Wang, Xinbing, Zhou, Chenghu
Large language models (LLMs) have achieved huge success for their general knowledge and ability to solve a wide spectrum of tasks in natural language processing (NLP). Due to these impressive abilities, LLMs have shed light on potential inter-disciplinary applications that use artificial intelligence to foster scientific discovery in specific domains (AI for science, AI4S). Meanwhile, the use of NLP techniques in geoscience research and practice is broad and complex, ranging from knowledge extraction and document classification to question answering and knowledge discovery. In this work, we take an initial step toward leveraging LLMs for science through a rather straightforward approach. We specialize an LLM for geoscience by further pre-training the model on a vast amount of geoscience text, and by supervised fine-tuning (SFT) the resulting model on our custom-collected instruction tuning dataset. These efforts result in GeoGalactica, a model with 30 billion parameters. To the best of our knowledge, it is the largest language model for the geoscience domain. More specifically, GeoGalactica is obtained by further pre-training Galactica. We train GeoGalactica on a geoscience-related text corpus containing 65 billion tokens, curated from extensive data sources in the big science project Deep-time Digital Earth (DDE); to our knowledge this is the largest geoscience-specific text corpus. We then fine-tune the model with 1 million pairs of instruction-tuning data consisting of questions that demand professional geoscience knowledge to answer. In this technical report, we illustrate in detail all aspects of GeoGalactica, including data collection, data cleaning, base model selection, pre-training, SFT, and evaluation. We open-source our data curation tools and the checkpoints of GeoGalactica from the first 3/4 of pre-training.
- North America > United States (0.93)
- Asia > Middle East (0.27)
- Europe (0.14)
- Asia > China > Sichuan Province (0.14)
- Materials (1.00)
- Law (1.00)
- Information Technology (1.00)
Top 100 Python Interview Questions You Must Know
In this Python Interview Questions tutorial, I will introduce you to the most frequently asked questions in Python interviews. Our Python Interview Questions is the one-stop resource from which you can boost your interview preparation. We have 100 questions on Python programming basics that will help readers at different expertise levels reap the maximum benefit from our blog. What is the difference between lists and tuples in Python? What are the key features of Python? What type of language is Python? How is Python an interpreted language? How is memory managed in Python? What is a namespace in Python?
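The first question above, the difference between lists and tuples, is worth illustrating with a short snippet (this example is mine, not from the blog):

```python
# Lists are mutable: they can be modified in place.
nums = [1, 2, 3]
nums.append(4)
assert nums == [1, 2, 3, 4]

# Tuples are immutable: in-place assignment raises TypeError.
point = (1, 2, 3)
try:
    point[0] = 99
except TypeError:
    print("tuples do not support item assignment")

# Because tuples are immutable (and hashable when their elements are),
# they can be used as dictionary keys, whereas lists cannot.
grid = {(0, 0): "origin", (1, 2): "cell"}
assert grid[(0, 0)] == "origin"
```

Tuples are therefore the natural choice for fixed records and dictionary keys, while lists suit collections that grow or change.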
GitHub - deepmind/mctx: Monte Carlo tree search in JAX
Mctx is a library with a JAX-native implementation of Monte Carlo tree search (MCTS) algorithms such as AlphaZero, MuZero, and Gumbel MuZero. For computation speedup, the implementation fully supports JIT compilation. Search algorithms in Mctx are defined for, and operate on, batches of inputs in parallel. This makes the most of accelerators and enables the algorithms to work with large learned environment models parameterized by deep neural networks. Learning and search have been important topics since the early days of AI research.