I recently released my first R and Python packages. This post contains some thoughts and advice about releasing software packages – particularly for other graduate students.

The question of “should you release a package?” is highly context dependent (e.g. if you are a probabilist the answer is probably no). There are a number of trade-offs to consider. For example, academia does not seem to value software very much. More importantly, developing a software package carries a large time cost, and that time could have been spent writing papers. This includes:

  • Coding the basic functionality
  • Turning your code into a package someone else can download and use
  • Documentation for the code
  • Providing data analysis examples
  • Maintaining and updating the package
  • Responding to user feedback
  • Surveying the existing literature to make sure your package provides new functionality

I think academia is starting to value software more than it used to¹. I would argue that, in many cases, releasing code is as important as writing a paper. Some of the benefits to you that come from releasing a software package include:

  • Save future you time. Better code now = less headache in the future.
  • Fame/glory/prestige from people using your work.
  • Help other people solve their problems. If part of your rationale for doing research/academia is helping to solve problems, then good code might be as impactful as a paper (or more).
  • Software skills are highly valued in industry.
  • You might learn new things out of necessity (e.g. computational linear algebra) and/or better understand your own research.

Resources

Programming is typically a small part of the statistics curriculum (and most other scientific disciplines); we don’t think of ourselves as software engineers even though many of us spend a lot of time writing code. Luckily there are many quality, open-source resources that show you how to write better code and release software. Without these resources (particularly the R Packages book) it would have taken me 1-2 orders of magnitude more time to build these packages².

These resources are helpful for creating R/Python packages:

These resources helped me become a better programmer:


  1. For example, some statistics postdoc positions require (or highly encourage) applicants to have released an open-source package. 

  2. The time cost to build a package is obviously very context dependent (e.g. your experience, the complexity of the algorithm, etc.). To give you one data point: these packages took me 1-2 weeks each, and I have about 2 years of coding experience.