tidyverse
Tidy Modeling with R
Welcome to Tidy Modeling with R! This book is a guide to tidymodels, a collection of R packages for model building, and it has two main goals. First and foremost, this book provides a practical introduction to using these specific R packages to create models. We focus on a dialect of R called the tidyverse that is designed with a consistent, human-centered philosophy, and demonstrate how the tidyverse and the tidymodels packages can be used to produce high-quality statistical and machine learning models. Second, this book will show you how to develop good methodology and statistical practices. Whenever possible, our software, documentation, and other materials attempt to prevent common pitfalls. In Chapter 1, we outline a taxonomy for models and highlight what good software for modeling looks like.
GitHub - business-science/timetk: Time series analysis in the `tidyverse`
There are many R packages for working with time series data. Here's how timetk compares to the "tidy" time series R packages for data visualization, wrangling, and feature engineering (those that leverage data frames or tibbles). Timetk is an amazing package that is part of the modeltime ecosystem for time series analysis and forecasting. You're probably wondering how you'll ever learn time series forecasting. Here's the solution that will save you years of struggling.
Train and deploy ML models with R and plumber on Vertex AI
R is one of the most widely used programming languages for statistical computing and machine learning. Many data scientists love it, especially for the rich world of packages from the tidyverse, an opinionated collection of R packages for data science. Besides the tidyverse, there are over 18,000 open-source packages on CRAN, the package repository for R. RStudio, available as a desktop version or on the Google Cloud Marketplace, is a popular integrated development environment (IDE) used by data professionals for visualization and machine learning model development. Once a model has been built successfully, a recurring question among data scientists is: "How do I deploy models written in the R language to production in a scalable, reliable and low-maintenance way?" In this blog post, you will walk through how to use Google Vertex AI to train and deploy enterprise-grade machine learning models built with R. Managing machine learning models on Vertex AI can be done in a variety of ways, including through the Google Cloud Console user interface, API calls, or the Vertex AI SDK for Python.
Fall & Winter Workshop Roundup
We'll be hosting a few different workshops in a variety of cities across the US and UK. See below for more details on each workshop and how to register. Chief Data Scientist Hadley Wickham is hosting his popular "Building Tidy Tools" workshop in Atlanta, Georgia this October. You should take this workshop if you have experience programming in R and want to learn how to tackle larger scale problems. You'll get the most from it if you're already familiar with functions and are comfortable with R's basic data structures (vectors, matrices, arrays, lists, and data frames).
Five things to focus on as an aspiring Data Scientist
With so many interesting machine learning models to learn about and problems to solve, it can be easy to forget about one of the core skills you actually need to be an effective data scientist. The vast majority of data scientists are going to have end-to-end responsibility for owning their own data pipelines and sourcing their own data, so you should never have to rely on someone else to pull data for you. It might not be one of the "sexy" skills associated with the job, but it is essential (trust me on this one: in FAANG companies you will likely be tested on SQL at interview as an initial screening task). This is perhaps a little more controversial.
Text Mining with R: The Free eBook - KDnuggets
I readily admit that I'm biased toward Python. This isn't intentional (such is the case with many biases), but coming from a computer science background and having been programming since a very young age, I have naturally tended towards general-purpose programming languages (Java, C, C++, Python, etc.). This is the major reason that Python books and resources are at the forefront of my radar, recommendations, and reviews. Obviously, however, not all data scientists are in this same position, given that there are innumerable paths to data science. Given that, and since R is a powerful and popular programming language for a large swath of data scientists, today let's take a look at a book that uses R as a tool to implement solutions to data science problems.
Tidymodels: tidy machine learning in R
Over the past few years, tidymodels has been gradually emerging as the tidyverse's machine learning toolkit. It turns out that R has a consistency problem. Since its packages were made by different people using different principles, each has a slightly different interface, and keeping everything in line can be frustrating. Several years ago, Max Kuhn (formerly at Pfizer, now at RStudio) developed the caret R package (see my caret tutorial), aimed at creating a uniform interface for the massive variety of machine learning models that exist in R. Caret was great in a lot of ways, but also limited in others. In my own use, I found it to be quite slow whenever I tried to use it on problems of even modest size.
Manning Introduces - Machine Learning with R, tidyverse, and mlr
With ML algorithms, you can cluster and classify data for tasks like making recommendations or fraud detection and make predictions for sales trends, risk analysis, and other forecasts. Once the domain of academic data scientists, machine learning has become a mainstream business process, and tools like the easy-to-learn R programming language put high-quality data analysis in the hands of any programmer. Machine Learning with R, tidyverse, and mlr teaches you widely used ML techniques and how to apply them to your own datasets using the R programming language and its powerful ecosystem of tools. This book will get you started!
matloff/R-vs.-Python-for-Data-Science
This Web page is aimed at shedding some light on the perennial R-vs.-Python debates in the Data Science community. As a professional computer scientist and statistician, I hope to shed some useful light on the topic. I have potential bias: I've written four R-related books; I've given a keynote talk at useR!; I currently serve as Editor-in-Chief of the R Journal; etc. But I hope this analysis will be considered fair and helpful. This is subjective, of course, but having written (and taught) in many different programming languages, I really appreciate Python's greatly reduced use of parentheses and braces. This is of particular interest to me as an educator.
R vs. Python: Which is a better programming language for data science?
Python vs. R is a common debate among data scientists, as both languages are useful for data work and among the most frequently mentioned skills in job postings for data science positions. Each language offers different advantages and disadvantages for data science work, and the choice should depend on the work you are doing. To help data scientists select the right language, Norm Matloff, a professor of computer science at the University of California, Davis, wrote a GitHub post aiming to shed some light on the debate. Though the point is subjective, Python greatly reduces the use of parentheses and braces in code, making it sleeker, Matloff wrote. And while data scientists working with Python must learn a lot of material to get started, including NumPy, Pandas and matplotlib, matrix types and basic graphics are already built into base R, Matloff wrote.