A Comprehensive Guide to Generating SDTM Demographic Datasets Using R in Clinical Trials
Arvind Uttiramerur
Programmer Analyst at Thermo Fisher Scientific, USA
Abstract
R is a programming language widely used for statistical analysis and data visualization. It offers a flexible, interactive environment supported by packages for data cleaning, tidying, and analysis, which makes it particularly relevant for professionals in mathematics and statistics, including biostatisticians and programmers in the pharmaceutical and biotech industries. Its large collection of user-contributed packages can efficiently manipulate complex datasets, such as those based on the Study Data Tabulation Model (SDTM). R's popularity in data-related fields has grown rapidly over the past decade, driven by its open-source nature, powerful statistical capabilities, and advanced visualization tools.
In this paper, we demonstrate a step-by-step approach to generating an SDTM Demographics (DM) dataset using R. The process leverages R packages such as sas7bdat, tidyverse, haven, parsedate, dplyr, tidyr, and Hmisc. We also provide a detailed procedure for setting up the R environment required for this process. While R has been used extensively for exploratory analysis in the pharmaceutical and biotech industries, its application to creating and analyzing clinical trial datasets, such as SDTM, has been limited. Traditionally, SAS® has been the preferred tool for generating clinical trial datasets. This paper explores R’s potential as a viable alternative, offering enhanced flexibility and cost-effectiveness in clinical trials.
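As a rough illustration of the kind of derivation such a process involves, the following minimal sketch builds a small DM-like dataset from a made-up raw demographics table using dplyr. The raw column names (subjid, siteid, birthdt, icdt, sex_raw, race_raw), the study identifier, the USUBJID convention, and the choice to derive AGE at informed consent are all assumptions for illustration, not the paper's actual mapping.

```r
library(dplyr)

# Hypothetical raw demographics extract. In practice this might be read from
# a SAS file, e.g. with haven::read_sas(); all column names are assumptions.
raw_dm <- data.frame(
  subjid   = c("001", "002", "003"),
  siteid   = c("101", "101", "102"),
  birthdt  = c("1980-05-14", "1975-11-02", "1990-01-30"),
  icdt     = c("2023-03-01", "2023-03-05", "2023-03-09"),
  sex_raw  = c("Male", "Female", "female"),
  race_raw = c("White", "Asian", "Black or African American"),
  stringsAsFactors = FALSE
)

studyid <- "ABC-123"  # hypothetical study identifier

dm <- raw_dm %>%
  mutate(
    STUDYID = studyid,
    DOMAIN  = "DM",
    # Assumed convention: study, site, and subject joined with hyphens
    USUBJID = paste(studyid, siteid, subjid, sep = "-"),
    SUBJID  = subjid,
    SITEID  = siteid,
    # ISO 8601 date strings
    BRTHDTC = format(as.Date(birthdt), "%Y-%m-%d"),
    RFICDTC = format(as.Date(icdt), "%Y-%m-%d"),
    # Completed years at informed consent (an assumption; studies differ
    # on the reference date): subtract one if the birthday has not yet
    # occurred in the consent year
    AGE = as.integer(format(as.Date(icdt), "%Y")) -
      as.integer(format(as.Date(birthdt), "%Y")) -
      (format(as.Date(icdt), "%m%d") < format(as.Date(birthdt), "%m%d")),
    AGEU = "YEARS",
    # Map free-text sex to single-letter codes; anything else becomes "U"
    SEX = case_when(
      toupper(sex_raw) == "MALE"   ~ "M",
      toupper(sex_raw) == "FEMALE" ~ "F",
      TRUE                         ~ "U"
    ),
    RACE = toupper(race_raw)
  ) %>%
  select(STUDYID, DOMAIN, USUBJID, SUBJID, SITEID,
         BRTHDTC, RFICDTC, AGE, AGEU, SEX, RACE)

print(dm)
```

A real DM dataset carries more variables (for example arm and reference date variables) and draws on several raw sources; the sketch only shows the general pattern of renaming, deriving, and ordering columns.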
Conclusion
This paper has demonstrated the viability of using R as an alternative to SAS for generating SDTM Demographics (DM) datasets in the pharmaceutical and biotech industries. By leveraging a combination of R packages, including sas7bdat, tidyverse, haven, parsedate, dplyr, tidyr, and Hmisc, we have shown that R can efficiently process raw clinical trial data to produce standardized SDTM-compliant datasets.
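One way to make "SDTM-compliant" a little more concrete is to run simple structural checks on the output. The sketch below, in base R, is a hypothetical helper (check_dm is not from any package) that checks only an illustrative subset of DM expectations: a handful of required variables, a constant DOMAIN value, unique USUBJID, and an assumed set of allowed SEX codes. It is not a substitute for a full conformance tool.

```r
# Hypothetical helper: returns a character vector of findings (empty if none).
check_dm <- function(dm) {
  findings <- character(0)

  # Illustrative subset of DM variables, not the full list from the standard
  expected <- c("STUDYID", "DOMAIN", "USUBJID", "SUBJID", "SEX")
  missing_vars <- setdiff(expected, names(dm))
  if (length(missing_vars) > 0) {
    # The remaining checks need these columns, so stop here
    return(paste("Missing variables:", paste(missing_vars, collapse = ", ")))
  }

  if (!all(dm$DOMAIN %in% "DM")) {
    findings <- c(findings, "DOMAIN must be 'DM' on every record")
  }
  if (anyDuplicated(dm$USUBJID) > 0) {
    findings <- c(findings, "USUBJID is not unique")
  }
  # Assumed set of allowed codes, for illustration only
  bad_sex <- setdiff(unique(dm$SEX), c("M", "F", "U", "UNDIFFERENTIATED"))
  if (length(bad_sex) > 0) {
    findings <- c(findings,
                  paste("Unexpected SEX values:", paste(bad_sex, collapse = ", ")))
  }
  findings
}

# Made-up example data: one clean dataset and one with two problems
dm_ok <- data.frame(
  STUDYID = "ABC-123",
  DOMAIN  = "DM",
  USUBJID = c("ABC-123-101-001", "ABC-123-101-002"),
  SUBJID  = c("001", "002"),
  SEX     = c("M", "F"),
  stringsAsFactors = FALSE
)
dm_bad <- dm_ok
dm_bad$USUBJID[2] <- dm_bad$USUBJID[1]  # duplicate subject identifier
dm_bad$SEX[1] <- "Male"                 # uncoded value

print(check_dm(dm_ok))
print(check_dm(dm_bad))
```

Returning findings as a vector rather than stopping at the first problem lets all issues be reviewed in one pass.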
Our approach highlights the flexibility and cost-effectiveness of using R, especially for organizations looking to reduce dependence on proprietary software while still adhering to regulatory standards. The step-by-step guide provided serves as a practical resource for professionals aiming to integrate R into their clinical trial data management workflows.
However, the adoption of R for SDTM and ADaM dataset generation is still in its early stages, particularly in regulated environments where validated systems are crucial. While R offers robust capabilities for data manipulation and analysis, there are challenges related to its widespread adoption in clinical trial data management, including the need for formal validation processes and greater industry acceptance.
Future work could focus on further validating the R packages used in this paper, as well as expanding the use of R in other domains within the SDTM and ADaM frameworks. Additionally, developing user-friendly R packages specifically designed for clinical trial data management could help accelerate the adoption of R in this space.
In conclusion, R presents a promising alternative to SAS for generating SDTM datasets, offering enhanced flexibility, cost savings, and a broad range of statistical tools. As the pharmaceutical and biotech industries continue to evolve, the role of open-source tools like R is likely to expand, contributing to more efficient and accessible data management solutions in clinical research.
References
1. https://sas-and-r.blogspot.com/p/simulation-examples.html
2. https://cran.r-project.org/doc/manuals/r-release/R-lang.pdf
3. CRAN - Package admiralonco (rstudio.com) (https://cran.rstudio.com/web/packages/admiralonco)
4. SDTM in R Asset Library • admiral (pharmaverse.github.io)
5. SDTM in Business Intelligence, Collinson, PhUSE 2014
6. Clinical Data in Business Intelligence, Collinson, PhUSE 2016