We identified 61 faculty members and labs at Syracuse University who create and share open source software on GitHub. Across these 61 GitHub accounts, we analyzed 814 repositories[1]. We used the PyGithub package to access the GitHub REST API and assess the popularity of coding languages and licenses. The source code for these analyses and visualizations is available on GitHub.
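The collection step described above might look roughly like the following. This is a minimal sketch, not the project's actual code: the `RepoRecord` type, the helper names, and the account name in the usage note are hypothetical, and the filter mirrors the exclusion described in footnote 1. PyGithub is imported lazily so that the filtering helper runs without it installed.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class RepoRecord(NamedTuple):
    """Hypothetical record type holding the fields used in this analysis."""
    owner: str
    name: str
    language: Optional[str]  # primary language as reported by GitHub, or None
    year_created: int


def keep_with_language(records: Iterable[RepoRecord]) -> list[RepoRecord]:
    """Drop repositories for which GitHub reports no primary language."""
    return [r for r in records if r.language]


def fetch_records(logins: Iterable[str], token: Optional[str] = None) -> Iterator[RepoRecord]:
    """Yield one record per repository listed for each account (needs network access)."""
    # PyGithub is imported here so the helpers above work without it.
    # Newer PyGithub releases prefer Github(auth=Auth.Token(token)).
    from github import Github

    client = Github(token)
    for login in logins:
        for repo in client.get_user(login).get_repos():
            yield RepoRecord(login, repo.name, repo.language, repo.created_at.year)
```

A call such as `keep_with_language(fetch_records(["example-lab"], token))` would then return the repositories that have a primary language; `example-lab` is a placeholder, not one of the 61 accounts.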
We assessed the most popular coding languages across the 814 repositories. Python was overwhelmingly the most popular, being the primary language of 28.5% (232) of the repositories. Jupyter Notebook was the second most popular language (and likely represents further Python usage) at 11.2%. It appears that 16.6% (135) of the repositories are used to store website materials, with HTML and JavaScript being the next two most popular languages. Other somewhat popular coding languages included Java (46 repos), C++ (37 repos), and R (33 repos). Minor coding languages included MATLAB (29 repos), TeX (21 repos), Groovy (11 repos), Ruby (10 repos), and Go (8 repos). Below, the pie chart on the right shows these overall proportions and the bar graph on the left shows the proportions through time, by year of repository creation.
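The percentages above are simple shares of the repository count (for example, 232 / 814 ≈ 28.5%). A minimal sketch of that tally follows, assuming each repository's primary language has already been collected; the function names and the toy input are illustrative and are not the study data.

```python
from collections import Counter, defaultdict
from typing import Iterable, Optional


def primary_language_shares(languages: Iterable[Optional[str]]) -> list[tuple[str, int, float]]:
    """Return (language, repo count, percent of repos), most common first."""
    counts = Counter(lang for lang in languages if lang)
    total = sum(counts.values())
    return [(lang, n, round(100 * n / total, 1)) for lang, n in counts.most_common()]


def shares_by_year(
    pairs: Iterable[tuple[int, Optional[str]]]
) -> dict[int, list[tuple[str, int, float]]]:
    """Group (year created, language) pairs by year and compute shares per year."""
    by_year: dict[int, list[Optional[str]]] = defaultdict(list)
    for year, lang in pairs:
        by_year[year].append(lang)
    return {year: primary_language_shares(langs) for year, langs in sorted(by_year.items())}


# Toy input, not the study data.
toy = [(2020, "Python"), (2020, "Python"), (2020, "HTML"), (2021, "R")]
print(primary_language_shares(lang for _, lang in toy))
print(shares_by_year(toy))
```

The overall shares correspond to the pie chart and the per-year shares to the bar graph through time.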
We also assessed the popularity of these coding languages by calculating how many lines of code were written in each language, as a proportion of the total number of lines across all 814 repositories (1,868,845,875 lines[2]). Interestingly, this metric reorders the ranking: Jupyter Notebook comes out on top, representing over a quarter (26.9%) of all lines of code. HTML takes second place (18.5%), and Python drops to third (12.1%). Below, the bar graph shows the percent of total lines of code written in each language.
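One way to compute this kind of share is to sum per-language sizes over all repositories and divide by the grand total. The sketch below assumes each repository's breakdown is available as a mapping from language to size, such as the dictionary returned by PyGithub's `Repository.get_languages()`. Note that GitHub's languages endpoint reports sizes in bytes; how the analysis above arrived at line counts is not stated here, so treat the unit as an assumption. The function name and toy input are illustrative.

```python
from collections import Counter
from typing import Iterable, Mapping


def code_size_shares(per_repo: Iterable[Mapping[str, int]]) -> dict[str, float]:
    """Sum sizes per language across repositories and return percent shares, largest first."""
    totals: Counter = Counter()
    for sizes in per_repo:
        totals.update(sizes)  # adds each language's size to the running total
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {lang: round(100 * size / grand_total, 1) for lang, size in totals.most_common()}


# Toy input, not the study data: one mapping per repository.
toy = [
    {"Python": 300, "HTML": 100},
    {"Jupyter Notebook": 600},
]
print(code_size_shares(toy))
```

Because a single large notebook or generated HTML file can outweigh many small scripts, this measure can rank languages quite differently from the per-repository count.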
We extracted the license for each of the 814 repositories. Nearly two-thirds of the repositories (63%) have no license whatsoever. When a license is included, the MIT License is the most common, being used by 12.9% of all repositories and 34.9% of repositories with licenses. Other licenses include various flavors of the General Public License, the Apache License, and the BSD 3-Clause License. Below, the pie chart on the right shows these overall proportions and the bar graph on the left shows the proportions through time, by year of repository creation.
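The two MIT figures use different denominators: all repositories versus only licensed ones. They are consistent with the unlicensed share, since 12.9 / 34.9 ≈ 0.37, which leaves about 63% without a license. A minimal sketch of both calculations follows, assuming each repository's license identifier has already been collected, with `None` for repositories without one; the function name and toy input are illustrative.

```python
from collections import Counter
from typing import Iterable, Optional


def license_shares(
    licenses: Iterable[Optional[str]],
) -> tuple[float, dict[str, tuple[float, float]]]:
    """Return (percent unlicensed, {license: (percent of all, percent of licensed)})."""
    items = list(licenses)
    total = len(items)
    counts = Counter(lic for lic in items if lic)
    licensed = sum(counts.values())
    if total == 0:
        return 0.0, {}
    unlicensed_pct = round(100 * (total - licensed) / total, 1)
    shares = {
        lic: (round(100 * n / total, 1), round(100 * n / licensed, 1))
        for lic, n in counts.most_common()
    }
    return unlicensed_pct, shares


# Toy input, not the study data: 10 repositories, 4 of them licensed.
toy = ["MIT", "MIT", None, None, None, None, "Apache-2.0", "GPL-3.0", None, None]
unlicensed_pct, shares = license_shares(toy)
print(unlicensed_pct)
print(shares)
```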
Here is a selection of these open source projects that represents the breadth of open source software development at Syracuse University:
Footnotes
[1] We excluded 324 repositories that GitHub listed as not having a major programming language. We assumed that these were non-OSS repositories holding data, course materials, etc.
[2] This total includes only lines of code that GitHub could attribute to a particular coding language. It does not include lines of “code” corresponding to image files, data files, etc.