data_format.Rmd
This document explains the format and structure of three important data frames used in serological studies: data_sero, exposure_data, and attack_rate_data.
Each section explains the required and optional columns for the data frames and provides an example.
data_sero
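The primary data frame data_sero must contain the following columns: id (the identifier of the individual), time (the time at which the sample was taken), and one column for each biomarker named in biomarkers, holding the measured value.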
Note: you can have multiple biomarkers for each individual. Provide one additional column per biomarker and use the biomarker's name as the column header.
# Define the biomarker name
biomarkers <- "IgG"
# Create the data_sero dataframe
data_sero <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  time = c(1, 5, 1, 5, 1, 5),
  IgG = c(1.2, 2.4, 1.2, 2.4, 3.0, 3.0)
)
data_sero
##   id time IgG
## 1  1    1 1.2
## 2  1    5 2.4
## 3  2    1 1.2
## 4  2    5 2.4
## 5  3    1 3.0
## 6  3    5 3.0
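The note about multiple biomarkers can be sketched as follows. This is only an illustration: the second biomarker name "IgA", its values, and the object names biomarkers_multi and data_sero_multi are assumptions made up for this example, not data from any study.

```r
# Hypothetical example: two biomarkers measured for each individual.
# "IgA" and every value in that column are illustrative assumptions.
biomarkers_multi <- c("IgG", "IgA")

data_sero_multi <- data.frame(
  id   = c(1, 1, 2, 2, 3, 3),
  time = c(1, 5, 1, 5, 1, 5),
  IgG  = c(1.2, 2.4, 1.2, 2.4, 3.0, 3.0),
  IgA  = c(0.5, 1.1, 0.4, 0.9, 1.6, 1.5)
)

# Confirm that id, time, and every named biomarker have a matching column
missing_cols <- setdiff(c("id", "time", biomarkers_multi), names(data_sero_multi))
stopifnot(length(missing_cols) == 0)
```

Each biomarker name appears once in the vector and once as a column header, so the two stay in sync.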
exposure_data
The exposure_data data frame is optional. It records known exposure events and their timing for each individual. An exposure is any event known to the study that would affect the biomarker being measured, usually a known vaccination date or a known date of infection confirmed by PCR. It contains the columns id, time, and exposure_type, and every value in exposure_type must match one of the types defined in the variable exposureTypes.
Note: if exposure_data is not defined, it is assumed that there are no known exposures in the study.
# Define possible exposure types
exposureTypes <- c("inf")
# Create the exposure_data dataframe
exposure_data <- data.frame(
  id = c(1),
  time = c(3),
  exposure_type = c("inf")
)
exposure_data
##   id time exposure_type
## 1  1    3           inf
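Since the exposure types in exposure_data must match those in exposureTypes, a quick check before model fitting can catch typos early. The helper below is a minimal sketch: check_exposure_data is a hypothetical function written for this example and is not part of any package.

```r
# Inputs with the same shape as the example above
exposureTypes <- c("inf")
exposure_data <- data.frame(
  id = c(1),
  time = c(3),
  exposure_type = c("inf")
)

# Hypothetical helper (an assumption for illustration, not a package function):
# stops with an error if a column is missing or an exposure type is unknown.
check_exposure_data <- function(exposure_data, exposureTypes) {
  required <- c("id", "time", "exposure_type")
  missing_cols <- setdiff(required, names(exposure_data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # as.character() guards against exposure_type being stored as a factor
  unknown <- setdiff(unique(as.character(exposure_data$exposure_type)), exposureTypes)
  if (length(unknown) > 0) {
    stop("Unknown exposure types: ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

check_exposure_data(exposure_data, exposureTypes)
```

A row with a type such as "vax" would fail this check unless "vax" were first added to exposureTypes.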
attack_rate_data
The attack_rate_data data frame represents the empirical probability density function (PDF) of the attack rate over the study period. It gives the likelihood of an event (such as an infection or outbreak) occurring at each time point during the study. It contains the columns time and prob.
Note: attack_rate_data is applied to the fitted exposure type. If it is not defined, a uniform probability of infection over the whole study period is assumed.
# Define the fitted exposure type
exposureFitted <- "inf"
# Create the attack_rate_data dataframe
attack_rate_data <- data.frame(
  time = c(1, 2, 3, 4, 5),
  prob = c(0.0, 0.33, 0.33, 0.33, 0.0)
)
attack_rate_data
##   time prob
## 1    1 0.00
## 2    2 0.33
## 3    3 0.33
## 4    4 0.33
## 5    5 0.00
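Because attack_rate_data is described as a probability density over the study period, it can be worth checking that the prob values are non-negative and sum to roughly 1. The rounded values in the example above sum to about 0.99. The sketch below rescales them. Whether downstream code requires exact normalization is an assumption, and normalize_attack_rate is a hypothetical helper written for this example.

```r
attack_rate_data <- data.frame(
  time = c(1, 2, 3, 4, 5),
  prob = c(0.0, 0.33, 0.33, 0.33, 0.0)
)

# Hypothetical helper (an assumption for illustration, not a package function):
# validates the columns and rescales prob so that it sums to 1.
normalize_attack_rate <- function(attack_rate_data) {
  stopifnot(all(c("time", "prob") %in% names(attack_rate_data)))
  stopifnot(all(attack_rate_data$prob >= 0), sum(attack_rate_data$prob) > 0)
  attack_rate_data$prob <- attack_rate_data$prob / sum(attack_rate_data$prob)
  attack_rate_data
}

# Total probability before rescaling (about 0.99 because of rounding)
total_before <- sum(attack_rate_data$prob)

attack_rate_norm <- normalize_attack_rate(attack_rate_data)

# Total probability after rescaling
total_after <- sum(attack_rate_norm$prob)
```

Rescaling keeps the relative weight of each time point unchanged, so the shape of the distribution is preserved.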