This page contains a number of introductory data science tutorials and a list of resources that I’ve found helpful for learning data science.

R

The free book R for Data Science by Hadley Wickham is a fantastic resource for learning R.

If you are brand new to R, see this page to get started.

Exercises

Viz + subsetting

Practice data manipulation and plotting with dplyr and ggplot2. If you already know base R, these exercises will help you learn the tidyverse.

Analyze UNC departments (more viz + subsetting).

Modeling + more viz

Practice fitting machine learning models and visualizing high-dimensional data.

Resources

R for Data Science

The JHU data science specialization on Coursera is a good crash course in R.

Python

If you are new to Python but have some previous programming experience, I suggest starting with Google’s Python class. Spend a couple of hours working through it until you get the hang of the language. Then download Anaconda, which includes most of the statistics and machine learning Python packages you’ll need. It also comes with Jupyter Notebook (formerly IPython Notebook), which you should use for data science projects. Warning: both Python 2 and Python 3 are in common use, and they are not fully compatible with each other.
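One concrete difference behind the Python 2 vs. 3 warning is the / operator. The short sketch below (an illustration only, not part of the tutorials) uses the standard __future__ imports so the same script prints and divides identically under both versions.

```python
# These __future__ imports make Python 2 behave like Python 3 for
# print() and division; under Python 3 they are harmless no-ops.
from __future__ import division, print_function

import sys

major = sys.version_info[0]
print("Running Python", major)

# Without `division`, Python 2 evaluates 7 / 2 as 3 (integer division).
# With it (and always in Python 3), / is true division and // floors.
true_div = 7 / 2
floor_div = 7 // 2
print(true_div, floor_div)
```

If you only ever run Python 3, you can drop the __future__ line; it matters when you share notebooks or scripts with someone on Python 2.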

Exercises

Viz + subsetting

Practice data manipulation and plotting with pandas and matplotlib.
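As a rough idea of what these exercises involve, here is a minimal pandas + matplotlib sketch. The DataFrame and its columns (department, division, faculty) are made-up toy data, not the data used in the exercises, and the output file name faculty.png is hypothetical. The Agg backend is selected so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data for illustration only
df = pd.DataFrame({
    "department": ["Math", "Stats", "Biology", "History", "Physics"],
    "division": ["Sciences", "Sciences", "Sciences", "Humanities", "Sciences"],
    "faculty": [45, 30, 60, 25, 40],
})

# Subsetting: boolean mask selects rows, a list of names selects columns
sciences = df.loc[df["division"] == "Sciences", ["department", "faculty"]]

# Sort so the bar chart reads from largest to smallest
sciences = sciences.sort_values("faculty", ascending=False)

# Group-wise summary: mean faculty size per division
means = df.groupby("division")["faculty"].mean()
print(means)

# Plot the subset as a bar chart and save it
fig, ax = plt.subplots()
ax.bar(sciences["department"], sciences["faculty"])
ax.set_xlabel("Department")
ax.set_ylabel("Faculty")
fig.savefig("faculty.png")
plt.close(fig)
```

The same pattern (filter with .loc, sort, summarize with groupby, then plot) mirrors the dplyr filter/arrange/group_by/summarise plus ggplot workflow from the R section.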

Modeling + more viz

Practice fitting machine learning models and visualizing high-dimensional data with sklearn and statsmodels.

Resources

Chris Albon’s website (great for many things, particularly pandas)

Python for Data Science as a Jupyter notebook

Computational Statistics

Other resources

An Introduction to Statistical Learning with Applications in R is a fantastic textbook.

This Google Doc contains a list of my favorite statistics, machine learning, and data science resources (most are free online).

Acknowledgements

A number of people helped create these tutorials, including