This guide was assembled based on our past experiences working with students and faculty on OSS development. We start every new project by introducing workers and their mentors/clients to this guide and walking them through the major points. Please feel free to reach out (ospo@syr.edu) if you have any questions or suggestions.
Milestones and Communication
Setting Reasonable Milestones/Expectations
- Set reasonable overall goals for the project, then break that work into small, demonstrable increments. Aim for milestones that can be completed in 1–2 weeks. Each milestone should produce something tangible (e.g., a working feature, a passing test suite).
- Estimate conservatively, then add buffer. Research software often involves exploration and uncertainty. A good rule of thumb: make your best estimate, then double it.
- Account for the “hidden” work. Documentation, testing, and deployment aren’t extras—they are part of the work, especially for open-source software. Build time for them into your milestones rather than treating them as afterthoughts.
Setting Communication Expectations
- Document the “how” and “where” of communication upfront. Where do we discuss technical decisions (GitHub Issues, Pull Request comments)? Where do we ask quick questions (Slack, Discord, Email)?
- Communicate proactively about delays or changes. Don’t wait for the weekly meeting to surface a problem. A quick Slack message or GitHub comment like “blocked on X” or “need help with Y” keeps small issues from becoming big delays.
Project Management and Monitoring Milestones
- Use a tracking system to consistently monitor your milestones. We recommend GitHub Issues/Projects (since that’s where the code will be, too), but other systems like Trello and Jira work too.
- Hold regular check-ins. A weekly 30- to 60-minute sync is often sufficient for small teams. Use this time to review what’s been done, what’s blocked, and what’s next.
GitHub and Git
GitHub is the bread and butter of open-source software. It’s where your code will end up, where your users will find you and report issues, and how you might even be judged for job opportunities in the future. It’s important to get familiar with the ins and outs of both GitHub and Git (the version control tool that runs on your own machine and pushes code to GitHub).
Commit Early and Often
- Make a new repo when you start a new project. Don’t wait to create the GitHub repository. Your first commit should land on the first day of the project.
- Make committing a daily habit, not an event. If you’re writing code, you should be committing. Commits aren’t just about preserving entire features; they are also about preserving the coding process that led to those features. And commit messages are just as important!
- Committing is the only way to show progress. Whether you are getting paid or not, committing is the only way to “show your work” and indicate to collaborators and/or supervisors that you’ve made some sort of progress and used your time effectively.
- Don’t let “it’s not ready” stop you. Branches exist precisely so you can commit incomplete work without affecting others. Push your branch daily even if the feature isn’t finished—this backs up your work and lets others see your progress if needed.
- Don’t commit unnecessary files. Users don’t need your stray CSV or PDF files. List such files in a .gitignore file so they stay out of the repository.
Contained Commits, Branches, and Pull Requests
- One logical change per commit. A commit should do one thing: fix a bug, add a function, update documentation. If your commit message needs “and” in it, you’re probably combining unrelated changes.
- Create branches for every distinct piece of work. Never commit directly to main. Use descriptive branch names that include context.
- Keep branches short-lived. Long-running branches diverge from main and become painful to merge. Aim to merge branches within a week or two. If a feature is too large for that, break it into smaller, independently mergeable pieces.
- Make pull requests reviewable. A good PR is small enough to review in one sitting (under 500 lines of code is a useful heuristic). Include a clear description: What does this change? Why? How can the reviewer test it? Link to relevant Issues.
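One way to make "descriptive branch names that include context" concrete is to agree on a naming pattern and check names against it. The sketch below assumes a hypothetical convention of `<type>/<issue-number>-<short-description>`. The pattern, the allowed types, and the function name are illustrative assumptions, not something this guide prescribes.

```python
import re

# Hypothetical convention: <type>/<issue-number>-<short-description>,
# e.g. "fix/42-handle-empty-input". Adapt the allowed types to your project.
BRANCH_PATTERN = re.compile(r"(feature|fix|docs|test)/\d+-[a-z0-9]+(-[a-z0-9]+)*")


def is_descriptive_branch(name: str) -> bool:
    """Return True if the branch name follows the assumed convention."""
    return BRANCH_PATTERN.fullmatch(name) is not None


print(is_descriptive_branch("fix/42-handle-empty-input"))  # True
print(is_descriptive_branch("my-branch"))                  # False
```

A name like `fix/42-handle-empty-input` tells a reviewer the kind of change and the Issue it relates to before they open the pull request.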
Continuous Integration (CI)
- Automate everything you’d otherwise forget. CI can run your tests, check code formatting, lint for common errors, and build your documentation automatically, on every push and pull request. GitHub Actions is free for all public repositories!
Documentation
Open-source software is all about building a user base and a contributor network. If your software isn’t documented well, no one will want to use it, and no one will be able to contribute to it. Treat documentation as just as important as the source code.
README.md
- Treat your README as the front door. For most visitors, the README is the first—and often only—documentation they’ll read. It should answer the fundamental questions: What is this project? Why would I use it? How do I get started? If someone can’t answer these within 60 seconds of landing on your repo, the README needs work.
- Start with a one-paragraph summary. Lead with a clear, jargon-minimal description of what the software does and who it’s for.
- Include a quick-start example. Show, don’t just tell. A brief code snippet or command that demonstrates basic usage is worth more than paragraphs of description.
- Document installation clearly. List prerequisites, dependencies, and step-by-step installation instructions. Don’t assume users have your environment. If there are multiple installation methods (pip, conda, from source), document all of them.
Licensing
- Always include a license. Code without a license is not “free to use”—it’s legally ambiguous and effectively unusable by others. No license means default copyright applies, which prohibits copying, modification, and distribution.
- Choose the license deliberately. For research software intended to be widely used and built upon, permissive licenses like MIT or BSD-3-Clause are common choices. If you want modifications to remain open, consider copyleft licenses like GPL-3.0. Use the chosen license without any modifications.
Function Documentation
- Document the “what” and “why,” not just the “how.” Good function documentation explains what a function does, what inputs it expects, what it returns, and any important caveats—not a line-by-line translation of the code.
- Use your language’s documentation conventions. Follow established standards so tools can parse your docs automatically:
  - Python: docstrings with NumPy, Google, or Sphinx style
  - R: roxygen2 comments
  - JavaScript/TypeScript: JSDoc
- Documentation shouldn’t be an afterthought. Whenever you add a new function, you should add documentation as well. And whenever you modify a function, you should review the documentation to make sure it is still valid.
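As an illustration of the points above, here is a minimal sketch of a NumPy-style Python docstring. The function `rescale` and its parameters are made up for this example. Note how the docstring states what the function does, what it expects and returns, and one caveat, without narrating the code line by line.

```python
def rescale(values, lower=0.0, upper=1.0):
    """Linearly rescale numbers to the range [lower, upper].

    Parameters
    ----------
    values : list of float
        Numbers to rescale. May be empty.
    lower : float, optional
        Value the smallest input maps to. Default is 0.0.
    upper : float, optional
        Value the largest input maps to. Default is 1.0.

    Returns
    -------
    list of float
        Rescaled values, in the same order as the input.

    Notes
    -----
    If all inputs are equal, the input range is zero, so every output
    is set to ``lower`` rather than dividing by zero.
    """
    if not values:
        return []
    smallest, largest = min(values), max(values)
    if smallest == largest:
        return [lower for _ in values]
    span = largest - smallest
    return [lower + (v - smallest) * (upper - lower) / span for v in values]
```

Because the docstring follows an established layout, documentation tools that understand NumPy style can render it without extra work.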
Being Contributor Friendly
- Create a CONTRIBUTING.md file. As you build your own codebase, you should ensure it is contributor-friendly by establishing your workflow and review norms. The following are good examples:
- Welcome newcomers explicitly. A brief statement that contributions are welcome sets a positive tone. Mention that you’re happy to help first-time contributors find good starting points.
- Label Issues for discoverability. Use labels like “good first issue”, “help wanted”, or “documentation” to guide contributors toward appropriate tasks.
- Adopt a Code of Conduct. A Code of Conduct establishes expected behavior and demonstrates that your project takes community standards seriously. The Contributor Covenant is the most widely adopted option and is appropriate for most projects. Other options include the Citizen Code of Conduct and the Apache Foundation’s Code of Conduct.
Resources
- Simple guide and template for READMEs
- Choose a License
- NumPy Docstring Guide
- roxygen2 – R Package Documentation Standard
- Contributor Covenant – Code of Conduct template
Testing
Unit Tests
- Test the smallest meaningful units of behavior. A unit test should verify that a single function or method behaves correctly given specific inputs.
- Test edge cases and boundary conditions. What happens with empty input? A single element? Null or None values? These boundaries are where bugs hide. Test malformed input, missing values, and unexpected formats.
- Use standard testing frameworks. Most programming languages already have established testing frameworks (e.g., pytest for Python and testthat for R). Use them to make your life easier.
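The edge cases listed above (empty input, a single element, missing values) can be sketched as small pytest-style tests. The function `safe_mean` is a hypothetical example written for this illustration. The tests use plain `assert` statements, which pytest reports on directly.

```python
def safe_mean(values):
    """Return the mean of the non-None values, or None if there are none."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


# pytest collects functions whose names start with "test_".
def test_typical_input():
    assert safe_mean([1, 2, 3]) == 2


def test_empty_input():
    assert safe_mean([]) is None


def test_single_element():
    assert safe_mean([5]) == 5


def test_missing_values_are_skipped():
    assert safe_mean([1, None, 3]) == 2


def test_only_missing_values():
    assert safe_mean([None, None]) is None
```

Saved in a file such as `test_safe_mean.py` (the name is an assumption), these run with the `pytest` command. Each test checks one behavior, so a failure points directly at the broken case.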
Integration Tests
- Test how components work together. Integration tests verify that modules, services, or systems interact correctly—that your database queries return what your application expects, that your API endpoints handle real requests properly, that data flows correctly through a pipeline.
- Use realistic test data. Synthetic data that’s too clean won’t catch real-world problems. Where possible, use anonymized or sample real data that reflects actual messiness: missing fields, inconsistent formats, unexpected values.
- Test failure modes, not just happy paths. What happens when the database is unavailable? When the API returns an error? When the file is corrupted? Integration tests are the right place to verify your code handles failures gracefully.
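A minimal sketch of the ideas above, assuming a hypothetical pipeline that loads CSV readings into a database. It uses only the Python standard library, with an in-memory SQLite database standing in for a real one. The function name, table layout, and sample data are illustrative assumptions.

```python
import csv
import io
import sqlite3


def load_readings(csv_text, conn):
    """Insert rows that have a numeric value; return (loaded, skipped) counts."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (site TEXT, value REAL)")
    loaded, skipped = 0, 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            value = float(row["value"])
        except (KeyError, TypeError, ValueError):
            skipped += 1  # missing column, empty cell, or non-numeric text
            continue
        conn.execute("INSERT INTO readings VALUES (?, ?)", (row.get("site"), value))
        loaded += 1
    conn.commit()
    return loaded, skipped


def test_pipeline_with_messy_data():
    # Realistic mess: one empty cell and one "n/a" among valid rows.
    messy = "site,value\nA,1.5\nB,\nC,n/a\nD,2.5\n"
    conn = sqlite3.connect(":memory:")
    assert load_readings(messy, conn) == (2, 2)
    total = conn.execute("SELECT SUM(value) FROM readings").fetchone()[0]
    assert total == 4.0


def test_unavailable_database_raises_known_error():
    # Failure mode: the connection is closed before loading starts.
    conn = sqlite3.connect(":memory:")
    conn.close()
    try:
        load_readings("site,value\nA,1.0\n", conn)
    except sqlite3.ProgrammingError:
        pass
    else:
        raise AssertionError("expected sqlite3.ProgrammingError")
```

The first test exercises parsing and storage together with messy input. The second confirms that an unavailable database surfaces as a specific, catchable error rather than silently losing data.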
Continuous Integration
- Run tests automatically on every push and pull request. This is the core value of CI for testing: no code reaches main without passing the test suite. Configure branch protection rules to enforce this.
- Use a test matrix for compatibility. Research software often needs to support multiple language versions (Python 3.9, 3.10, 3.11) or operating systems. CI matrices let you test all combinations automatically. Focus on versions your users actually use.
Resources
- The Turing Way: Testing
- Testing Research Software – Software Sustainability Institute
- pytest Documentation
- testthat Documentation
Deployment
Software Packaging
- Follow your language’s conventions. Every ecosystem has established norms for project layout. Following them makes your project immediately navigable to others and ensures compatibility with standard tooling (see resources below).
- Declare all dependencies explicitly. Never assume users have packages installed. List every dependency your code imports in your package manifest with version constraints. If your code calls it, it should be declared.
- Use Semantic Versioning (SemVer). The MAJOR.MINOR.PATCH convention communicates meaning: increment PATCH for bug fixes, MINOR for backward-compatible features, and MAJOR for breaking changes. Users and tools depend on this predictability. Start with 0.1.0 for your first version.
- Maintain a changelog. Document what changed in each release in a markdown file. Group changes by type (Added, Changed, Deprecated, Removed, Fixed, Security). Users upgrading between versions need this information.
- Publish to standard repositories. Distribute packages through the channels users expect:
  - Python: PyPI (and conda-forge for conda users)
  - R: CRAN (or r-universe for faster iteration)
  - JavaScript: npm
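The Semantic Versioning rule described above can be sketched as a small function. This is an illustrative helper, not part of any packaging tool. The name `bump_version` is an assumption, and it handles only plain MAJOR.MINOR.PATCH strings, without pre-release or build suffixes.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":  # breaking change: reset MINOR and PATCH
        return f"{major + 1}.0.0"
    if change == "minor":  # backward-compatible feature: reset PATCH
        return f"{major}.{minor + 1}.0"
    if change == "patch":  # bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump_version("0.1.0", "patch"))  # 0.1.1
print(bump_version("0.1.1", "minor"))  # 0.2.0
print(bump_version("0.2.0", "major"))  # 1.0.0
```

Note that incrementing a higher-order number resets the lower ones to zero, which is what lets users and tools read the size of a change from the version number alone.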
Resources
- Python packaging 101 (pyOpenSci)
- R Packages (2e)
- rOpenSci Packages: Development, Maintenance, and Peer Review
AI Usage
Generative Artificial Intelligence (GenAI) has profoundly shifted the day-to-day cycle of software development. However, all software development should remain human-guided and human-reviewed. GenAI is permitted for most aspects of OSPO-associated work, provided the following guidelines are followed:
- Developers must recognize and disclose how GenAI has been used somewhere within the codebase (e.g., in source code comments or in the README). The disclosure could include the tools (and versions) used and the nature of the usage (e.g., code generation, test scaffolding, documentation drafting).
- All code and other materials that were developed with AI assistance must be reviewed, edited, and validated by a human. Blindly copying and pasting AI-generated output is unacceptable.
- Developers are fully responsible for ensuring that the code accurately performs its functions as described and as requested (e.g., by a client).
- Developers are fully responsible for the legal compliance of the codebase. For example, it is the developers’ responsibility to ensure that they are not illegally incorporating significant parts of proprietary or closed-source codebases into an open-source codebase.
Resources
- Navigating LLMs in Open Source: pyOpenSci’s New Peer Review Policy
- Preparing JOSS for a generative AI future: From code to human creativity and design
Last modified 1/13/2026