This page contains a number of introductory data science tutorials and a list of resources that I’ve found helpful for learning data science.
The free book R for Data Science by Hadley Wickham is a fantastic resource for learning R.
If you are brand new to R, see this page to get started.
Practice data manipulation and plotting with dplyr and ggplot. If you are familiar with base R, these tutorials will help you learn the tidyverse.
Analyze UNC departments (more visualization and subsetting).
Practice fitting machine learning models and visualizing high-dimensional data.
The JHU data science specialization on Coursera is a good crash course in R.
If you are new to Python and have some previous programming experience, I suggest going through Google’s Python Class to get started. Spend a couple of hours working through the class until you get the hang of it. Then download Anaconda, which includes most of the statistics and machine learning Python packages you’ll need. It also comes with Jupyter Notebook (formerly IPython Notebook), which you should use for data science projects. Warning: both Python 2 and Python 3 are in common use, and code written for one does not always run on the other.
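After installing, it can help to confirm which Python version you are running and whether the packages used in the tutorials below are available. The following is a minimal Python 3 sketch. The package list and the function name `check_environment` are illustrative assumptions, not part of Anaconda. It uses only the standard library's `importlib.util.find_spec`, so it checks that a package can be found without actually importing it.

```python
import importlib.util
import sys

# Illustrative list of import names (assumption); scikit-learn imports as "sklearn"
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn", "statsmodels"]


def check_environment(packages=PACKAGES):
    """Return the major Python version and a dict of package -> importable?"""
    status = {name: importlib.util.find_spec(name) is not None for name in packages}
    return sys.version_info.major, status


major, status = check_environment()
print(f"Running Python {major}")
for name, ok in status.items():
    print(f"{name:12s} {'installed' if ok else 'MISSING'}")
```

If a package shows as MISSING, you can install it with Anaconda's `conda install` command.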
Practice data manipulation and plotting with pandas and matplotlib.
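To give a flavor of what the pandas and matplotlib tutorials cover, here is a minimal sketch of a typical workflow: filter rows, add a derived column, aggregate by group, and plot the result. The small dataset is made up purely for illustration and does not come from the tutorials. The `Agg` backend is chosen so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; works without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame({
    "department": ["Math", "Math", "Biology", "Biology", "History"],
    "year": [2015, 2016, 2015, 2016, 2016],
    "students": [120, 135, 200, 210, 80],
})

# Filter: keep only 2016 rows, then add a derived column
recent = df[df["year"] == 2016].copy()
recent["students_k"] = recent["students"] / 1000

# Aggregate: total students per department, largest first
totals = df.groupby("department")["students"].sum().sort_values(ascending=False)

# Plot: one bar per department
fig, ax = plt.subplots()
totals.plot.bar(ax=ax)
ax.set_ylabel("Total students")
ax.set_title("Students by department")
fig.tight_layout()
```

To write the figure to disk, call `fig.savefig("students.png")`. In a Jupyter notebook the plot displays inline, so you can drop the `matplotlib.use("Agg")` line there.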
Practice fitting machine learning models and visualizing high-dimensional data with sklearn and statsmodels.
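As a taste of the model-fitting and visualization practice, the sketch below fits a classifier and projects a four-dimensional dataset down to two dimensions with principal component analysis (PCA). It uses only scikit-learn, not statsmodels. The choice of the Iris dataset (bundled with scikit-learn, so nothing is downloaded) and of logistic regression are assumptions made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; works without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 150 samples, 4 numeric features, 3 classes
X, y = load_iris(return_X_y=True)

# Hold out 25% of the data for evaluation, keeping class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardize features, then fit a logistic regression classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out set

# Visualize the 4-D data by projecting onto the first two principal components
X_2d = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
fig, ax = plt.subplots()
ax.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="viridis", s=15)
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_title("Iris data projected onto two principal components")

print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the model in a pipeline means the scaling parameters are learned from the training split only. This avoids leaking information from the test set.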
Chris Albon’s website (great for many topics, especially pandas)
Python for Data Science as a Jupyter notebook
An Introduction to Statistical Learning with Applications in R is a fantastic textbook.
This Google Doc contains a list of my favorite statistics, machine learning, and data science resources (most are free online).
A number of people helped create these tutorials, including