Input data

The model requires time series data of individual titre readings, together with the time of each person's most recent exposure. Times can be relative (e.g. day of study) or absolute (i.e. calendar dates). The required and optional columns are:

| name | type | description | required |
|---|---|---|---|
| pid | numeric or character | Unique identifier for a person across observations | TRUE |
| day | integer or date | Day of the observation: either a date or an integer giving the relative day of study | TRUE |
| last_exp_day | integer or date | Most recent day on which the person was exposed. Must be of the same type as the day column | TRUE |
| titre_type | character | Name of the titre or biomarker | TRUE |
| value | numeric | Titre value | TRUE |
| censored | -1, 0 or 1 | Optional. Whether the observation is treated as censored: -1 for lower, 1 for upper, 0 for none | FALSE |
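As an illustration of the table above, the following is a minimal sketch of a valid input dataset, built with base R. The values are made up (loosely based on the example data below), and check_input is a hypothetical helper written for this example, not a function provided by epikinetics; it only mirrors the column requirements listed in the table.

```r
# Illustrative input data: required columns plus the optional 'censored' column.
input <- data.frame(
  pid = c(1, 1, 2),
  day = as.Date(c("2021-03-10", "2021-04-15", "2021-03-12")),
  last_exp_day = as.Date(c("2021-03-08", "2021-03-08", "2021-03-01")),
  titre_type = c("Ancestral", "Ancestral", "Alpha"),
  value = c(175.9, 607.6, 5),
  censored = c(0L, 0L, -1L)  # -1 = lower censored, 0 = none, 1 = upper censored
)

# Hypothetical helper (not part of epikinetics): checks mirroring the table above.
check_input <- function(dt) {
  required <- c("pid", "day", "last_exp_day", "titre_type", "value")
  missing <- setdiff(required, names(dt))
  if (length(missing) > 0) stop("missing columns: ", paste(missing, collapse = ", "))
  # 'day' and 'last_exp_day' must both be dates or both be integers
  if (!identical(class(dt$day), class(dt$last_exp_day)))
    stop("'day' and 'last_exp_day' must have the same type")
  if ("censored" %in% names(dt) && !all(dt$censored %in% c(-1, 0, 1)))
    stop("'censored' must be -1, 0 or 1")
  invisible(TRUE)
}

check_input(input)

# Days since last exposure for each observation: 2, 38 and 11
as.integer(input$day - input$last_exp_day)
```

Because day and last_exp_day share a type, their difference gives the number of days since last exposure whether times are calendar dates or relative study days.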

The data can also contain further columns for any covariates to be included in the model. The data files installed with this package have the additional columns infection_history, last_vax_type and exp_num.

The model also accepts a covariate formula that defines the regression model. The variables in the formula must correspond to column names in the dataset. All variables in the formula are treated as categorical, that is, converted to factors regardless of their input type.
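To show what "treated as categorical" means in practice, here is a base R sketch. The names covariate_formula and to_categorical are illustrative, not part of the epikinetics API, and the level "Previously infected" is made up; the package performs its own conversion internally.

```r
# Illustrative covariate formula; variable names match columns in the data.
covariate_formula <- ~ infection_history + exp_num

dat <- data.frame(
  infection_history = c("Infection naive", "Previously infected", "Infection naive"),
  exp_num = c(2L, 3L, 2L)
)

# Hypothetical helper: convert every variable named in the formula to a factor.
to_categorical <- function(dt, formula) {
  for (v in all.vars(formula)) {
    dt[[v]] <- factor(dt[[v]])  # integers such as exp_num become levels "2", "3"
  }
  dt
}

cat_dat <- to_categorical(dat, covariate_formula)
str(cat_dat)
```

One consequence is that a numeric column such as exp_num gets a separate effect for each observed level rather than a single linear trend.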

By default, the value column is assumed to be on a natural scale and is converted to a log scale for model fitting. If your data are already on a log scale, you must pass the log=TRUE argument when initialising the biokinetics class. See the biokinetics documentation.

Example

dat <- data.table::fread(system.file("delta_full.rds", package = "epikinetics"))
head(dat)
#>      pid        day last_exp_day titre_type    value censored infection_history
#>    <int>     <IDat>       <IDat>     <char>    <num>    <int>            <char>
#> 1:     1 2021-03-10   2021-03-08  Ancestral 175.9350        0   Infection naive
#> 2:     1 2021-04-15   2021-03-08  Ancestral 607.5750        0   Infection naive
#> 3:     1 2021-07-08   2021-03-08  Ancestral 179.0463        0   Infection naive
#> 4:     1 2021-03-10   2021-03-08      Alpha   5.0000       -1   Infection naive
#> 5:     1 2021-04-15   2021-03-08      Alpha 416.7905        0   Infection naive
#> 6:     1 2021-07-08   2021-03-08      Alpha 103.5274        0   Infection naive
#>    last_vax_type exp_num
#>           <char>   <int>
#> 1:      BNT162b2       2
#> 2:      BNT162b2       2
#> 3:      BNT162b2       2
#> 4:      BNT162b2       2
#> 5:      BNT162b2       2
#> 6:      BNT162b2       2

Output data

Fitting a model returns a CmdStanMCMC object, so users already familiar with cmdstanr can work with the fitted model directly using the tools they know.

Important! If you provide data on a natural scale, it is converted to a base 2 log scale before inference is performed. If you work directly with the fitted CmdStanMCMC object, all values will therefore be on this scale. The package provides a helper function for converting back to the original scale: convert_log_scale_inverse.
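The sketch below illustrates what a base 2 log scale means, using a plain log2 transform and its inverse in base R. This is an assumption for illustration only: it is not necessarily identical to the transformation epikinetics applies internally, so for fitted outputs use convert_log_scale_inverse rather than this arithmetic.

```r
# Plain base 2 log transform and its inverse (illustrative, not the package's code).
titres <- c(5, 175.935, 607.575)   # natural-scale values, as in the example data
log_titres <- log2(titres)         # base 2 log scale
back <- 2^log_titres               # inverse: back to the natural scale
all.equal(back, titres)            # TRUE, up to floating-point tolerance

# On a base 2 log scale, a difference of 1 corresponds to a two-fold change.
fold_diff <- log2(40) - log2(20)
```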

Three further functions return model outputs that we think are particularly useful, in data.table format. The biokinetics documentation covers each of these functions, so please read that first; this vignette explains how to interpret each column in the returned tables. These functions return data on the original scale.

simulate_population_trajectories

See the documentation for simulate_population_trajectories. The output format depends on whether the summarise argument is TRUE or FALSE.

summarise = TRUE

Returned columns are

| name | type | description |
|---|---|---|
| time_since_last_exp | integer | Number of days since last exposure |
| me | numeric | Median titre value |
| lo | numeric | Titre value at the 0.025 quantile |
| hi | numeric | Titre value at the 0.975 quantile |
| titre_type | character | Name of the titre or biomarker |
| censored | -1, 0 or 1 | Whether this observation should be treated as censored: -1 for lower, 1 for upper, 0 for none |

There will also be a column for each covariate in the regression model.

summarise = FALSE

Returned columns are

| name | type | description |
|---|---|---|
| time_since_last_exp | integer | Number of days since last exposure |
| t0_pop | numeric | |
| tp_pop | numeric | |
| ts_pop | numeric | |
| m1_pop | numeric | |
| m2_pop | numeric | |
| m3_pop | numeric | |
| beta_t0 | numeric | |
| beta_tp | numeric | |
| beta_ts | numeric | |
| beta_m1 | numeric | |
| beta_m2 | numeric | |
| beta_m3 | numeric | |
| mu | numeric | Titre value |
| .draw | integer | Draw number |
| titre_type | character | Name of the titre or biomarker |

There will also be a column for each covariate in the hierarchical model.
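The two output formats are related: assuming me, lo and hi are the 0.5, 0.025 and 0.975 quantiles of mu across draws, as the summarise = TRUE table describes, the summary can be sketched in base R as below. The draws are simulated stand-ins for the mu column, not model output, and summarise_draws is a hypothetical helper; the package's own summary also groups by any covariate columns.

```r
# Simulated per-draw titres in the summarise = FALSE layout (illustrative only).
set.seed(1)
draws <- data.frame(
  .draw = rep(1:1000, times = 2),
  time_since_last_exp = rep(c(0L, 30L), each = 1000),
  titre_type = "Ancestral",
  mu = c(rlnorm(1000, log(100), 0.3), rlnorm(1000, log(300), 0.3))
)

# Hypothetical helper: collapse draws to median and 95% interval per time point.
summarise_draws <- function(d) {
  agg <- aggregate(
    mu ~ time_since_last_exp + titre_type, data = d,
    FUN = function(x) quantile(x, c(0.5, 0.025, 0.975), names = FALSE)
  )
  # aggregate stores the three quantiles as a matrix column named mu
  data.frame(
    time_since_last_exp = agg$time_since_last_exp,
    titre_type = agg$titre_type,
    me = agg$mu[, 1], lo = agg$mu[, 2], hi = agg$mu[, 3]
  )
}

summary_dt <- summarise_draws(draws)
summary_dt
```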

simulate_individual_trajectories

See the documentation for simulate_individual_trajectories. The output format depends on whether the summarise argument is TRUE or FALSE.

summarise = FALSE

Returned columns are

| name | type | description |
|---|---|---|
| pid | character or numeric | Unique person identifier, as provided in the input data |
| draw | integer | Draw number from the fitted model |
| time_since_last_exp | integer | Number of days since last exposure |
| mu | numeric | Titre value |
| titre_type | character | Name of the titre or biomarker |
| exposure_day | integer | Day of this person's last exposure |
| calendar_day | integer | Day of this titre value |
| time_shift | integer | Number of days by which these exposures were shifted, as provided in the function arguments |

There will also be a column for each covariate in the regression model.

summarise = TRUE

Returned columns are

| name | type | description |
|---|---|---|
| me | numeric | Median titre value |
| lo | numeric | Titre value at the 0.025 quantile |
| hi | numeric | Titre value at the 0.975 quantile |
| titre_type | character | Name of the titre or biomarker |
| calendar_day | integer | Day of this titre value |
| time_shift | integer | Number of days by which the exposures were shifted, as provided in the function arguments |