This Web page aims to shed some light on the perennial R-vs.-Python debates in the Data Science community. As a professional computer scientist and statistician, I hope to offer a useful perspective on the topic. I have a potential bias (I've written four R-related books and currently serve as Editor-in-Chief of the R Journal), but I hope this analysis will be considered fair and helpful. This is subjective, of course, but having written (and taught) in many different programming languages, I really appreciate Python's greatly reduced use of parentheses and braces. Ease of learning is of particular interest to me as an educator: I've taught a number of subjects (math, stat, CS and even English as a Second Language) and have given intense thought to the learning process for many, many years.
Python vs. R is a common debate among data scientists, as both languages are useful for data work and are among the skills most frequently mentioned in job postings for data science positions. Each language has its own advantages and disadvantages for data science work, and the choice should depend on the work you are doing. To help data scientists select the right language, Norm Matloff, a professor of computer science at the University of California, Davis, wrote a GitHub post aiming to shed some light on the debate. Although the point is subjective, Python greatly reduces the use of parentheses and braces when coding, making it sleeker, Matloff wrote in the post. And while data scientists working with Python must learn a lot of material to get started, including NumPy, pandas and matplotlib, matrix types and basic graphics are already built into base R, Matloff wrote.
On large numerical datasets, my impression is that Python is faster and more flexible. You can choose between single- and double-precision floats and trade off storage against accuracy. Python also has more libraries for managing medium-sized data (PyTables, Dask), which can come in handy. In R, when dealing with large data out of core, my approach has been to rely on a standard DBMS (Postgres, Microsoft SQL Server, with which R has native integration, or Redshift). As a bonus, dplyr in R offers an excellent, easy interface to these databases.
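To make the storage-versus-accuracy trade-off concrete, here is a minimal sketch. It assumes NumPy, which the paragraph does not name: Python's built-in float is always double precision, so the choice of precision usually comes from NumPy dtypes such as float32 and float64. The array size and the value 0.1 are arbitrary illustrative choices.

```python
import numpy as np

# One million copies of the same value at two precisions (size chosen only for illustration).
n = 1_000_000
x64 = np.full(n, 0.1, dtype=np.float64)  # double precision: 8 bytes per element
x32 = x64.astype(np.float32)             # single precision: 4 bytes per element

# Storage: the float32 array needs half the memory of the float64 array.
print(x64.nbytes, x32.nbytes)  # 8000000 4000000

# Accuracy: the nearest float32 to 0.1 lies farther from 0.1 than the nearest float64.
err64 = abs(float(x64[0]) - 0.1)
err32 = abs(float(x32[0]) - 0.1)
print(err32 > err64)  # True

# Machine epsilon shows the general precision gap between the two types.
print(np.finfo(np.float32).eps, np.finfo(np.float64).eps)
```

Halving memory this way can let a dataset fit in RAM that otherwise would not, at the cost of roughly 7 significant decimal digits instead of about 15 to 16.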
Hi! I'm Jose Portilla, an instructor on Udemy with over 250,000 students enrolled across various courses on Python for Data Science and Machine Learning, R Programming for Data Science, Python for Big Data, and many more. What should I do to become a data scientist? In this post, I'll try my best to help answer this question and point to resources that can guide you to an answer. I also hope this post will serve as something I can quickly link my students to. :) I've broken the steps down into key topics and discussed helpful details for each. "The secret of getting ahead is getting started." If you are interested in becoming a data scientist, the best advice is to begin preparing for your journey now!