You final project is to do a novel data analysis to answer a question and write about it. This can be interpreted broadly and the requirements are discussed below.

The rough outline of the project is: Start with a question. Find data that might get at that question. Play around with the data. Attempt to answer the question. Iterate. Communicate.

Your project should have one significant aspect to it. Examples might include,

  • put together a novel data set (e.g. scrape something from the web)
  • answer an interesting question
  • a “sophisticated” statistical/machine learning model
  • a really compelling visualization
  • write a shiny app

You will work in groups of 3 people, assigned by the instructor. See below for grading details and the group work policies.

Final deliverables

There are two final deliverables which will be posted as Github pages: a blog post and the analysis code/data. The final project is due Tuesday May 9th at 11:00pm.

Blog post

Write a blog post in R Markdown aimed at a general audience (think 538).

  • should be 1000-1500 words
  • have at least two figures

Analysis code

All analysis code should be posted and well documented. The main techincal results (plots, regressions, etc) should be written up in a well documented, supporting techincal document (using R Markdown). You might also include R scripts for cleaning data or helper functions.

Your data set (both raw and processed) should be posted to github as well. See instructions for creating a github page

Where to find data?

You can find a seriously large amount of data online. I encourage you to “gather your own data online” by doing something like scraping Twitter.

There are some obvious places to look like data.gov. I’ve put together a collection (of collections) of interesting datasets you can find online: here.

If you are already doing reserach with a dataset you are welcome to use it, but you have to do something new.

Grading

Your team’s grade will be 50% blog post and 50% analysis code. Your individual grade will be weighted by your team member’s reviews.

The project will be graded on

  • Communication
    • Clear writing (both in the blog post and in the supporting technical document)
    • Document code
  • Accuracy
    • Did you use reasonable statistics?
    • Does your final code run?
    • How well do your findings support your conclusions? The evidence is inconclusive" is a very possible, and completely acceptable answer.
  • Ambition
    • The project should take some creativity and effort i.e. should be more than a matter of copy/pasting code.
  • Groupwork
    • You will anonomyously rate your team members and yourself on team citizenship (e.g. attends meetings, does what they promise, etc), not on ability.
    • Final grades will be adjusted based on peer ratings (see Peer Ratings in Coorperative Learning Teams).
      • Individual grades are based on the project grade and a multiplied computed from the peer ratings. This multiplier will range from 1.05 (for people who go above and beyond) down to 0 (for people who don’t participate).
    • As a last resort you may fire a team member who refuses to participate. Please contact the instructor well before it comes to this.
      • If you are fired you must start a new project and your peer rating multiplier will take a hit.

Examples for inspiration

These are some examples of interesting analyses. Many of these examples would take longer than you have for the final project.

Important Dates

  • Initial project proposal: due 4/6 at 11pm
    • Describe your proposed project (one page)
      • What question(s) will you try to answer?
      • What datasets will you use? You should have already found and taken a first look at the data set
      • How will you use the data to try to answer the question?
  • Exploratory analysis: due 4/20 at 11pm
    • Write up your initial findings in an R Markdown document.
    • You should have at least K plots (still deciding K).
  • Data analysis/techincal document: due 5/2 at 11pm
    • Write up your techincal results in an R Markdown document.
    • Put all code, data, etc together.
    • Upload to github and create a github page.
    • Write a “process book” describing your analysis process (see section titled “IPython Process Book” from this page
  • Blog post: due 5/9 at 11pm