This is a very common question, especially for beginners: where to start? Even intermediate-level data scientists ask it, because different people have different preferences and styles of work. Some companies prefer Python and some prefer R. I have friends who learned Python first and were then told by recruiters or employers that they should learn R, so now they are learning R. So which one is actually better?
I started with Python. When I started my MS at Boston University, I had to learn R, because some of the data analytics courses use only R. It was uncomfortable in the beginning. Now I am happy that I got to learn R. As I now know both Python and R, I thought I should share my opinion here.
My Own Experience
As I knew Python pretty well, I could learn R fast. It wasn’t too hard. In particular, if you know the data manipulation libraries in Python, you will find many R commands similar (though not the same). Still, it takes time and a lot of practice. Because so many libraries are available for data manipulation and analysis, it can be challenging for beginners to keep up at first, but it gets easier pretty soon.
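To make the "similar, but not the same" point concrete, here is a minimal sketch of one small data manipulation task in pandas, with a rough dplyr counterpart shown in a comment. The toy data and the column names (`group`, `score`, `mean_score`) are invented for illustration, and the R line is only an approximate equivalent, not something taken from the original post.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "score": [1.0, 3.0, 2.0, 6.0],
})

# pandas: keep rows with score > 1, then compute the mean score per group.
# A rough dplyr counterpart in R (assumed, for comparison only):
#   df %>% filter(score > 1) %>% group_by(group) %>%
#     summarise(mean_score = mean(score))
result = (
    df[df["score"] > 1]
    .groupby("group", as_index=False)
    .agg(mean_score=("score", "mean"))
)

print(result)
```

The steps line up one to one (filter, group, summarise), which is why knowing one library makes the other quicker to pick up, even though the syntax differs.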
For me, learning R on top of Python was worth it. Python is pretty strong: you can manage most of the stuff in Python that is available in R. But knowing R will give you a lot of flexibility. A lot of libraries are just better structured in R than in Python, and if you are good at both, you will have options. For example, I prefer R to Python for inferential statistics. I feel that the R packages for it are better than the Python ones, so I almost always use R for statistical analysis. It is just my opinion. Some may like ggplot2 better than Matplotlib and Seaborn. On the other hand, if I need a machine learning library, I prefer Python’s scikit-learn to the various R packages. At this point, I feel that for intermediate-level learners it is good to learn both Python and R. It will open a lot of avenues if you are a freelancer or a job seeker.
Where to Start for Beginners
In my opinion, it is good to start with Python. If you are an aspiring data scientist learning your first language, that should be Python, simply because Python is more popular. I also find more resources out there for Python. If you look at popular sites for programmers like GeeksforGeeks, Tutorialspoint, or Programiz, you will see that they have solutions in several different languages. Python is one of them, but you won’t find R there. So learning will be much easier, and if you get stuck, you will find help faster in Python.
If you are a data analyst, either Python or R will work for your tasks. But if you are a data scientist and want to go deeper into machine learning and artificial intelligence over time, then you should definitely choose Python, because you might have to collaborate with software engineers, and you won’t find many software engineers who like to work in R. Also, all the good online courses and master’s programs I have seen so far teach machine learning using Python.
Last Word
As you can see, I put a lot of emphasis on learning Python.
But again, if possible, learn both of them. If you want to work as a data analyst, either Python or R will do. But if you are planning to be a data scientist, Python is recommended. Learning both is even better! My suggestion is to learn one very well first.
Key Difference Between R and Python
R and Python are both open-source programming languages with large communities. New libraries and tools are added continuously to their respective catalogs. R is mainly used for statistical analysis, while Python provides a more general approach to data science. Both are state of the art among programming languages oriented towards data science. Learning both of them is, of course, the ideal solution, but both require a time investment, and such luxury is not available to everyone. Python is a general-purpose language with a readable syntax. R, however, was built by statisticians and encompasses their specific language.
R
Academics and statisticians have developed R over two decades. R now has one of the richest ecosystems for data analysis. There are around 12000 packages available in CRAN (its open-source repository), so it is possible to find a library for whatever analysis you want to perform. This rich variety of libraries makes R the first choice for statistical analysis, especially for specialized analytical work.
The main difference between R and other statistical products is the output. R has fantastic tools to communicate results. RStudio comes with the knitr library. Xie Yihui wrote this package. He made reporting trivial and elegant. Communicating findings with a presentation or a document is easy.
Python
Python can do pretty much the same tasks as R: data wrangling, engineering, feature selection, web scraping, apps and so on. Python is a tool to deploy and implement machine learning at a large scale. Python code is easier to maintain and more robust than R code. Years ago, Python didn’t have many data analysis and machine learning libraries. Recently, Python has been catching up and now provides cutting-edge APIs for machine learning and artificial intelligence. Most data science work can be done with five Python libraries: NumPy, pandas, SciPy, scikit-learn and Seaborn.
Python, on the other hand, makes replicability and accessibility easier than R. In fact, if you need to use the results of your analysis in an application or website, Python is the best choice.
Popularity index
The IEEE Spectrum ranking is a metric that quantifies the popularity of programming languages. The left column shows the ranking in 2017 and the right column in 2016. In 2017, Python reached first place, up from third place a year before. R is in 6th place.
Job Opportunity
The picture below shows the number of jobs related to data science by programming language. SQL is far ahead, followed by Python and Java. R ranks 5th. If we focus on the long-term trend between Python (in yellow) and R (in blue), we can see that Python is mentioned more often in job descriptions than R.
Analysis done by R and Python
However, if we look at data analysis jobs, R is by far the best tool.
Percentage of people switching
There are two key points in the picture below.
Difference between R and Python
R or Python Usage
Python was developed by Guido van Rossum, a programmer, circa 1991. Python has influential libraries for math, statistics and artificial intelligence. You can think of Python as a pure player in machine learning. However, Python is not entirely mature (yet) for econometrics and communication. Python is the best tool for machine learning integration and deployment, but not for business analytics.
The good news is that R is developed by academics and scientists. It is designed to solve problems in statistics, machine learning, and data science. R is the right tool for data science because of its powerful communication libraries. Besides, R is equipped with many packages for time series analysis, panel data and data mining. On top of that, there are no better tools for these tasks than R.
In our opinion, if you are a beginner in data science with the necessary statistical foundation, you need to ask yourself the following two questions:
If your answer to both questions is yes, you should probably learn Python first. On the one hand, Python includes great libraries to manipulate matrices and to code algorithms. As a beginner, it might be easier to learn how to build a model from scratch and then switch to the functions from the machine learning libraries. On the other hand, if you already know the algorithm or want to go into data analysis right away, then both R and Python are fine to begin with. R has an advantage if you are going to focus on statistical methods.
Secondly, if you want to do more than statistics, say deployment and reproducibility, Python is a better choice. R is more suitable for your work if you need to write a report and create a dashboard.
In a nutshell, the statistical gap between R and Python is narrowing. Most jobs can be done in both languages. You’d better choose the one that suits your needs, but also the tool your colleagues are using. It is better when all of you speak the same language. After you know your first programming language, learning the second one is simpler.
Conclusion
In the end, the choice between R and Python depends on:
Should I learn both R and Python for data science?
In the real world of data science, Python and R users intersect a lot. So whichever industry or discipline you are interested in, you are likely to run into projects done in both languages. To appreciate them all, you need at least a basic understanding of both R and Python.
Why is Python better than R for data science careers?
Unlike R, Python does not come with built-in data science packages, but it supports libraries like scikit-learn, NumPy, pandas, SciPy and Seaborn that data scientists can use to perform useful statistical and machine learning tasks.
Should I learn R or Python for machine learning?
If you are looking for statistical learning and data exploration, R will be a good match. If you are looking to build large-scale, production-ready machine learning applications, Python will be the best match.