A Flexible, Robust, High-Performance Data System for the GCAM Model
New data system feeds a powerful model while serving as a research tool in its own right.
Increasingly complex human-Earth system models have correspondingly complex data requirements, to the point that stand-alone software systems are required to track and assemble the data inputs to the main model. A new data system known as “gcamdata” was developed for the Global Change Assessment Model (GCAM) to provide a robust, reproducible, and transparent way to track and prepare hundreds of model inputs and to let researchers easily construct alternative scenarios for research.
While this new data system was made specifically for GCAM, many of its components and processing approaches are broadly applicable to, and reusable by, other complex model/data systems aiming to improve transparency, reproducibility, and flexibility. As open-source software with a flexible architecture, gcamdata introduces a new way to handle and prepare data to feed complex global models. This saves researchers time and effort, improves traceability and reproducibility, and enables exploratory “what-if” analyses using GCAM.
Modern, integrated human-Earth system models are complex and require correspondingly detailed input datasets. These models are sophisticated attempts to quantify relationships between environmental, social, and economic factors. The new data system software is documented, includes error checking, and can be applied clearly and easily to a variety of modeling scenarios. Every data object in gcamdata is required to carry descriptive metadata (including title, units, source, and comments), which allows researchers to track data provenance throughout the system. As a result, a full, system-wide data map can be constructed, and the upstream and downstream dependencies of any object can be traced and explored in detail. Many parts of the gcamdata package can be repurposed for any data system that involves multiple, potentially interacting, data processing steps, improving the reproducibility and transparency of science in many modeling domains.
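The idea of required metadata and dependency tracing can be illustrated with a minimal sketch. This is not gcamdata's actual interface (gcamdata is an R package); the Python below is only an illustration, and every name in it (DataObject, REQUIRED_METADATA, upstream, downstream, and the example datasets) is hypothetical. It assumes each data object lists the objects it was derived from, which is enough to walk the data map in either direction.

```python
from dataclasses import dataclass

# Hypothetical metadata fields, following the ones named in the text.
REQUIRED_METADATA = ("title", "units", "source", "comments")


@dataclass
class DataObject:
    """Illustrative stand-in for a data object flowing through a data system."""
    name: str
    metadata: dict
    inputs: tuple = ()  # names of the objects this one was derived from

    def __post_init__(self):
        # Refuse to create an object whose metadata is incomplete.
        missing = [k for k in REQUIRED_METADATA if not self.metadata.get(k)]
        if missing:
            raise ValueError(f"{self.name}: missing metadata {missing}")


def upstream(objects, name):
    """Names of every object that `name` depends on, directly or indirectly."""
    seen = set()
    stack = list(objects[name].inputs)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(objects[current].inputs)
    return seen


def downstream(objects, name):
    """Names of every object that depends on `name`, directly or indirectly."""
    return {other for other in objects if name in upstream(objects, other)}


def describe(title, units):
    # Helper that fills in placeholder provenance for the made-up examples.
    return {"title": title, "units": units,
            "source": "made-up example", "comments": "illustration only"}


# A tiny, invented data map: two raw inputs, one derived table, one model input.
objects = {
    o.name: o
    for o in [
        DataObject("raw_population", describe("Population", "thousands")),
        DataObject("raw_gdp", describe("GDP", "million USD")),
        DataObject("gdp_per_capita", describe("GDP per capita", "USD/person"),
                   inputs=("raw_gdp", "raw_population")),
        DataObject("model_input", describe("Socioeconomic input", "mixed"),
                   inputs=("gdp_per_capita",)),
    ]
}

print(sorted(upstream(objects, "model_input")))
print(sorted(downstream(objects, "raw_gdp")))
```

In this sketch, asking for everything upstream of the invented model_input returns both raw datasets and the derived table, while asking for everything downstream of raw_gdp shows which outputs would need to be rebuilt if that source changed.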
Pacific Northwest National Laboratory
Primary support for this work was provided by the U.S. Department of Energy, Office of Science, as part of research in the Multisector Dynamics, Earth and Environmental System Modeling Program. Additional support was provided by the U.S. Department of Energy Offices of Fossil Energy, Nuclear Energy, and Energy Efficiency and Renewable Energy and the U.S. Environmental Protection Agency.
Bond-Lamberty, B., K. Dorheim, R. Cui, R. Horowitz, A. Snyder, K. Calvin, L. Feng, R. Hoesly, J. Horing, G. P. Kyle, R. Link, P. Patel, C. Roney, A. Staniszewski, S. Turner, M. Chen, F. Feijoo, C. Hartin, M. Hejazi, G. Iyer, S. Kim, Y. Liu, C. Lynch, H. McJeon, S. Smith, S. Waldhoff, M. Wise, and L. Clarke. “gcamdata: An R package for preparation, synthesis, and tracking of input data for the GCAM integrated human-earth systems model.” Journal of Open Research Software 7(6) (2019). DOI:10.5334/jors.232