Researchers from all over the world contribute to this repository as a prelude to the peer review process for publication in traditional journals. We hope to save you some time by picking out the articles that hold the most promise for the typical data scientist. The articles listed below represent a fraction of all articles appearing on the preprint server. They are listed in no particular order, each with a link to the paper and a brief overview. Especially relevant articles are marked with a "thumbs up" icon.
The discipline of AI most often mentioned these days is deep learning (DL), along with its many incarnations implemented with deep neural networks. DL is also a rapidly accelerating area of research, with papers being published at a fast clip by research teams from around the globe. I enjoy keeping my finger on the pulse of deep learning research, and so far in 2021 research innovations have propagated at a quick pace. In this article, we'll take a brief tour of my top picks for deep learning research: papers, in no particular order, that I found to be particularly compelling. I'm pretty attached to this leading-edge research. I'm known to carry a thick folder of recent research papers around in my backpack and consume all the great developments when I have a spare moment.
Overfitting, underfitting, and stable training are important challenges in machine learning. Current approaches to these issues include mixup, SamplePairing, and BC learning. In our work, we hypothesize that mixing many images together can be more effective than mixing just two. The batchboost pipeline has three stages: (a) pairing: the method of selecting two samples; (b) mixing: how to create a new sample from the two; (c) feeding: combining mixed samples with new ones from the dataset into a batch (with ratio $\gamma$). Note that a sample that appears in a batch propagates through subsequent iterations with less and less importance until the end of training. The pairing stage calculates the error per sample, sorts the samples, and pairs the hardest with the easiest; then the mixing stage merges each pair using mixup, $\lambda x_1 + (1-\lambda)x_2$. Finally, the feeding stage combines new samples with mixed ones at a 1:1 ratio. Batchboost achieves 0.5-3% better accuracy than the current state-of-the-art mixup regularization on CIFAR-10 and Fashion-MNIST. Our method is slightly better than the SamplePairing technique on small datasets (up to 5%). Batchboost provides stable training with untuned parameters (such as weight decay), so it is a good method for testing the performance of different architectures. Source code is at: https://github.com/maciejczyzewski/batchboost
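The three stages described in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the function names `pair_hardest_with_easiest`, `mixup`, and `batchboost_step` are hypothetical, labels are assumed to be one-hot vectors, the per-sample errors are assumed to be supplied by the caller, and $\lambda$ is fixed rather than sampled.

```python
import numpy as np


def pair_hardest_with_easiest(errors):
    """Pairing: sort samples by error and match hardest with easiest.

    Returns two index arrays (hardest, easiest) of equal length.
    """
    order = np.argsort(errors)        # ascending, so easiest samples come first
    half = len(order) // 2
    easiest = order[:half]
    hardest = order[::-1][:half]      # descending, so hardest samples come first
    return hardest, easiest


def mixup(x1, y1, x2, y2, lam):
    """Mixing: convex combination lam * a + (1 - lam) * b of inputs and labels."""
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y


def batchboost_step(prev_x, prev_y, prev_errors, new_x, new_y, lam=0.5):
    """Feeding: build the next batch from mixed samples and fresh ones (1:1).

    prev_* describe the previous batch, new_* are fresh dataset samples.
    lam is an assumed fixed mixing coefficient for this sketch.
    """
    hard, easy = pair_hardest_with_easiest(prev_errors)
    mixed_x, mixed_y = mixup(prev_x[hard], prev_y[hard],
                             prev_x[easy], prev_y[easy], lam)
    n = len(mixed_x)                  # take as many new samples as mixed ones
    batch_x = np.concatenate([mixed_x, new_x[:n]])
    batch_y = np.concatenate([mixed_y, new_y[:n]])
    return batch_x, batch_y
```

Because the mixed half of each batch is built from the previous batch, which itself contained mixed samples, an early sample keeps contributing to later batches with a geometrically shrinking weight. That is the propagation "with less and less importance" that the abstract mentions.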
Deep learning has continued its forward movement during 2019 with advances in many exciting research areas like generative adversarial networks (GANs), auto-encoders, and reinforcement learning. In terms of deployments, deep learning is the darling of many contemporary application areas such as computer vision, image recognition, speech recognition, natural language processing, machine translation, autonomous vehicles, and many more. Earlier this year, we saw Google AI Language revolutionize the NLP segment of deep learning with the new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. The already seminal paper was released on arXiv on May 24. This has led to a storm of follow-on research results.
Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lie the tools and the methods that are driving it, from processing the massive piles of data generated each day to learning from and taking useful action. Deep neural networks, along with advancements in classical ML and scalable general-purpose GPU computing, have become critical components of artificial intelligence, enabling many of these astounding breakthroughs and lowering the barrier to adoption. Python continues to be the most preferred language for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries and clean high-level APIs. This survey offers insight into the field of machine learning with Python, taking a tour through important topics to identify some of the core hardware and software paradigms that have enabled it. We cover widely used libraries and concepts, collected together for holistic comparison, with the goal of educating the reader and driving the field of Python machine learning forward.