Reproducibility in the International Journal of Forecasting
Reproducibility is not a bureaucratic checkbox. It is the mechanism by which science earns trust. When another researcher — or a journal reviewer, or a policy-maker relying on your findings — can take your data and code and arrive at exactly the same results you published, your work becomes part of the scientific record in a way that a PDF alone never can.
This guide walks you through what a reproducibility package needs to contain, how to structure it, and how to document it so that someone who has never seen your project can run it from scratch.
Before you start, it is worth looking at a few well-structured reproducibility packages in the wild:
- Causal Discovery in Multivariate Time Series through Mutual Information Featurization
- replication_triptych — a clean example of a well-organised replication kit
- online-probabilistic-forecast-combination — another strong example with clear structure
Additional guidance is available in the CASCAD certification checklist.
What a Reproducibility Package Must Include
At its core, a reproducibility package has two non-negotiable components: the materials themselves (the code and data needed to regenerate your results) and a readme that documents how to use them.
The Readme File — What to Cover
The readme is the most important document in your package. A reviewer should be able to pick it up, read it top to bottom, and know exactly what to do. Below is everything it needs to address.
🗂️ Repository structure
Describe the structure of the repository you are providing. For example, if you are separating code, input data, and output data into different folders, say so explicitly and explain what lives where.
A simple example structure might look like:
```
project/
├── README.md
├── data/
│   ├── raw/
│   └── processed/
├── code/
│   ├── 01_clean_data.R
│   ├── 02_run_model.R
│   └── 03_produce_figures.R
└── output/
    ├── figures/
    └── tables/
```
💻 Computing environment
This is one of the most commonly overlooked sections — and one of the most important. Document:
- The operating system and hardware used (see also the section on runtime below)
- The programming language(s) and their versions
- All packages and libraries required, with their exact versions
- The licence(s) applicable to your code
- If you used an environment manager like `conda` or `renv`, provide the configuration file (e.g., `environment.yml`, `renv.lock`) and instructions for recreating the environment
A package that worked in Python 3.9 may behave differently in Python 3.12. Always pin your versions. If you are not using an environment manager, a simple `requirements.txt` or `sessionInfo()` output goes a long way.
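If you are documenting the environment by hand, even a short script can capture the key facts for the readme. A minimal Python sketch (the package list passed in is a placeholder; substitute the libraries your project actually uses):

```python
import platform
import sys
from importlib import metadata

def environment_report(packages):
    """Collect OS, interpreter, and package versions to paste into the readme."""
    lines = [
        f"OS: {platform.system()} {platform.release()}",
        f"Python: {sys.version.split()[0]}",
    ]
    for pkg in packages:
        try:
            lines.append(f"{pkg}=={metadata.version(pkg)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{pkg}: NOT INSTALLED")
    return "\n".join(lines)

print(environment_report(["pip"]))
```

Running this once and pasting the output into the readme gives reviewers the pinned versions even when no lock file exists.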
🗃️ Data — what it is and how to get it
Describe every dataset used in the project. For each one, address:
- What the data is and what format it is in
- Where it comes from and any relevant pre-processing steps applied
- Any usage restrictions or licences that apply
Then distinguish between two types of datasets:
For datasets you can share openly, state clearly whether the data is:
- Directly included in the replication kit (preferred), or
- Available elsewhere — provide the repository, website, DOI, or access instructions
For datasets that cannot be shared due to copyright, NDAs, or restricted access, provide enough information for a third party to obtain it themselves:
- Data provider name and contact
- Database identifier — name, DOI, vintage
- Application and registration procedures required to access it
- Monetary costs and time requirements to obtain access
- Instructions on which variables and date ranges to request
- Whether a third party can temporarily access the data for reproduction purposes (this is sometimes possible under data use agreements)
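Whichever case applies, publishing a checksum for every data file lets a third party confirm that the copy they obtained matches the vintage you used. One way to compute them, sketched in Python:

```python
import hashlib

def sha256sum(path, chunk_size=1 << 20):
    """Hash a data file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()
```

Listing each file's checksum in the readme turns "we used the same data" from an assumption into a verifiable claim.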
Intermediary Datasets
Some scripts do not produce final outputs — they transform raw data into processed datasets that other scripts then use. These are intermediary datasets, and they deserve special attention.
As a default, include intermediary datasets in the replication kit. This allows a reviewer to skip computationally expensive or time-consuming preprocessing steps and still verify the core analytical results.
For each intermediary dataset, document:
- Which file it is
- Which script generates it
- Where it sits in the pipeline relative to other files
If intermediary data cannot be reproduced from raw data — due to bugs, time constraints, or insufficient compute — the rest of the verification can still be carried out using the pre-generated intermediary files. Make this explicit in your readme so reviewers know they are not expected to regenerate everything from scratch.
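A pipeline script can make this skip-or-regenerate behaviour explicit, so a reviewer who has the shipped intermediary files never triggers the expensive preprocessing by accident. A sketch in Python (the function and file names are illustrative, not part of any specific package):

```python
from pathlib import Path

def ensure_intermediate(path, builder):
    """Run `builder` only if the intermediary file is missing,
    so reviewers can start from the shipped copy instead."""
    path = Path(path)
    if path.exists():
        print(f"Using shipped intermediary: {path}")
    else:
        print(f"Regenerating: {path}")
        path.parent.mkdir(parents=True, exist_ok=True)
        builder(path)
    return path
```

A later script would then call, say, `ensure_intermediate("data/processed/panel.csv", build_panel)` rather than unconditionally rebuilding.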
📊 Which code produces which outputs
This is the section a reviewer will use most. Map every table and figure in your paper to the script that produces it. Be explicit:
| Output | Script | Notes |
|---|---|---|
| Table 1 | `code/02_run_model.R` | Runs in approx. 5 minutes |
| Figure 2 | `code/03_produce_figures.R` | Requires processed data from step 1 |
| Figure 3 | `code/03_produce_figures.R` | — |
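The same mapping can be encoded in a small driver script, so a reviewer can regenerate one output without reading every file. A hypothetical sketch in Python, mirroring the example table above (it assumes R is installed and `Rscript` is on the PATH):

```python
import subprocess

# Hypothetical output-to-script mapping, mirroring the table above.
OUTPUTS = {
    "Table 1": "code/02_run_model.R",
    "Figure 2": "code/03_produce_figures.R",
    "Figure 3": "code/03_produce_figures.R",
}

def scripts_in_order(outputs, mapping):
    """Return each required script once, in first-needed order."""
    seen = []
    for out in outputs:
        script = mapping[out]
        if script not in seen:
            seen.append(script)
    return seen

def reproduce(outputs, mapping=OUTPUTS):
    """Run each required script exactly once via Rscript."""
    for script in scripts_in_order(outputs, mapping):
        subprocess.run(["Rscript", script], check=True)
```

Calling `reproduce(["Figure 2", "Figure 3"])` would run the figure script once, not twice.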
⏱️ Hardware and expected runtime
State:
- The type of computer used (processor, RAM, GPU if relevant)
- The expected runtime for each major step, or for the full pipeline end-to-end
This helps a reviewer plan their time and flag immediately if something is taking far longer than expected — which often indicates a setup problem rather than a performance issue.
⚙️ Special setup requirements
If your code requires anything beyond a standard computing environment, document it explicitly:
- GPU requirements
- Parallel computing setup
- Specific memory requirements
- Access to a cluster or HPC environment
- Any manual steps that cannot be automated
Do not bury GPU or cluster requirements at the end. Put them near the top of the readme so a reviewer knows immediately whether they can run the check on their own machine.
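One way to surface these requirements early is a preflight check at the top of the pipeline that fails fast with readable messages. A sketch in Python (the thresholds are placeholders, and the GPU check assumes an NVIDIA driver that exposes `nvidia-smi`):

```python
import os
import shutil
import sys

def preflight(min_python=(3, 9), needs_gpu=False, min_cores=1):
    """Return a list of problems; an empty list means the machine looks usable."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python >= {min_python[0]}.{min_python[1]} required")
    if needs_gpu and shutil.which("nvidia-smi") is None:
        problems.append("NVIDIA GPU driver (nvidia-smi) not found on PATH")
    if (os.cpu_count() or 1) < min_cores:
        problems.append(f"At least {min_cores} CPU cores required")
    return problems
```

A main script can call this first and exit with the problem list, so a reviewer learns within seconds, not hours, that their laptop cannot run the job.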
A Reproducibility Checklist
Use this before submitting your package.