A guide to biokinetics input and output data • epikinetics

Input data

The model requires time series data on individual titre readings, along with the time of each person's most recent exposure. Times can be relative (e.g. day of study) or absolute (i.e. precise calendar dates). The data are provided via the data argument when initialising an object of class biokinetics, and must be a data.table containing the following columns:

name	type	description
pid	numeric or character	Unique identifier to identify a person across observations
day	integer or date	The day of the observation. Can be a date or an integer representing a relative day of study
last_exp_day	integer or date	The most recent day on which the person was exposed. Must be of the same type as the ‘day’ column
titre_type	character	Name of the titre or biomarker
value	numeric	Titre value
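As an illustration, a minimal input table using relative (integer) days could be built as below. The person IDs, days and titre values are invented. Only the column names and types come from the table above.

```r
library(data.table)

# Hypothetical example: two people, one biomarker, relative days of study.
# Column names and types follow the required input columns described above.
input <- data.table(
  pid          = c(1, 1, 2, 2),        # numeric (or character) person ID
  day          = c(5L, 40L, 3L, 60L),  # day of observation, relative integer day
  last_exp_day = c(0L, 0L, 0L, 0L),    # most recent exposure, same type as day
  titre_type   = "Ancestral",          # name of the titre or biomarker (recycled)
  value        = c(160, 640, 80, 320)  # titre on the natural scale
)

str(input)
```

Any covariate columns would simply be added alongside these five.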

It can also contain further columns for any covariates to be included in the model. The data files installed with this package include the additional columns infection_history, last_vax_type and exp_num.

The biokinetics class also accepts a covariate formula that defines the regression model. The variables in the formula must correspond to column names in the data. Note that all covariates are treated as categorical: they are converted to factors regardless of their input type.
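The effect of treating every covariate as categorical can be seen with base R's model.matrix. This is a sketch of the general idea, not the package's internal code, and the toy data and formula are hypothetical. Because exp_num is converted to a factor, each of its values gets its own indicator column rather than a single slope term.

```r
# Toy covariate data (hypothetical values); exp_num is stored as an integer
covs <- data.frame(
  infection_history = c("Infection naive", "Previously infected", "Infection naive"),
  exp_num           = c(2L, 3L, 4L)
)

# Mirror the documented behaviour: convert every covariate to a factor
covs[] <- lapply(covs, as.factor)

# With no intercept, infection_history gets one indicator column per level;
# exp_num gets treatment-coded columns for its non-reference levels (3 and 4)
design <- model.matrix(~ 0 + infection_history + exp_num, data = covs)
colnames(design)
```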

Note also that the value column is assumed by default to be on a natural scale, and will be converted to a base-2 log scale for model fitting. If your data are already on a log scale, you must pass the log=TRUE argument when initialising the biokinetics class. See the biokinetics documentation for details.
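One practical consequence of fitting on a base-2 log scale is that differences between values correspond to fold-changes in titre. The sketch below applies base R's log2 to a few values from the example data. It assumes a plain log2 transform for illustration only; if the package rescales values before taking logs, every value shifts by the same constant and the differences are unchanged.

```r
# Natural-scale titres from the first rows of the example data (pid 1, Ancestral)
value <- c(175.9350, 607.5750, 179.0463)

# Plain base-2 log transform (assumed here for illustration)
log_value <- log2(value)

# A difference of d on the log2 scale is a 2^d-fold change on the natural scale
d <- diff(log_value)
fold_change <- 2^d
fold_change  # ratio of each titre to the previous one

# Dividing by a constant before taking logs only shifts the log values,
# so the differences, and hence the fold-changes, stay the same
all.equal(diff(log2(value / 5)), d)
```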

Example

dat <- data.table::fread(system.file("delta_full.rds", package = "epikinetics"))
head(dat)
#>      pid        day last_exp_day titre_type    value infection_history
#>    <int>     <IDat>       <IDat>     <char>    <num>            <char>
#> 1:     1 2021-03-10   2021-03-08  Ancestral 175.9350   Infection naive
#> 2:     1 2021-04-15   2021-03-08  Ancestral 607.5750   Infection naive
#> 3:     1 2021-07-08   2021-03-08  Ancestral 179.0463   Infection naive
#> 4:     1 2021-03-10   2021-03-08      Alpha   5.0000   Infection naive
#> 5:     1 2021-04-15   2021-03-08      Alpha 416.7905   Infection naive
#> 6:     1 2021-07-08   2021-03-08      Alpha 103.5274   Infection naive
#>    last_vax_type exp_num
#>           <char>   <int>
#> 1:      BNT162b2       2
#> 2:      BNT162b2       2
#> 3:      BNT162b2       2
#> 4:      BNT162b2       2
#> 5:      BNT162b2       2
#> 6:      BNT162b2       2
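Before initialising a biokinetics object, it can help to check that the input meets the requirements listed under Input data. The check_input function below is hypothetical and not part of epikinetics. It only checks that the input is a data.table, that the required columns are present, and that day and last_exp_day share a type. The example row mirrors the first row of the example data.

```r
library(data.table)

# Hypothetical helper, not part of epikinetics
check_input <- function(dt) {
  if (!is.data.table(dt)) stop("data must be a data.table")
  required <- c("pid", "day", "last_exp_day", "titre_type", "value")
  missing_cols <- setdiff(required, names(dt))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # day and last_exp_day must be the same type (both integer or both dates)
  if (!identical(class(dt$day), class(dt$last_exp_day))) {
    stop("'day' and 'last_exp_day' must be of the same type")
  }
  invisible(TRUE)
}

# Absolute (calendar) dates, as in the first row of the example data
dt <- data.table(pid = 1L, day = as.IDate("2021-03-10"),
                 last_exp_day = as.IDate("2021-03-08"),
                 titre_type = "Ancestral", value = 175.935)
check_input(dt)

# Mixing a calendar date with an integer relative day fails the type check
bad <- data.table(pid = 1L, day = as.IDate("2021-03-10"), last_exp_day = 0L,
                  titre_type = "Ancestral", value = 175.935)
tryCatch(check_input(bad), error = function(e) conditionMessage(e))
```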

Output data

Fitting a model returns a CmdStanMCMC object from the cmdstanr package, so users already familiar with cmdstanr can work with the fitted model directly.

Important! If you provide data on a natural scale, it will be converted to a base-2 log scale before inference is performed. This means that if you work directly with the fitted CmdStanMCMC object, all titre values will be on this scale. The package provides a helper function, convert_log2_scale_inverse, for converting back to the original scale.

Three further functions return model outputs that we think are particularly useful, in data.table format. The biokinetics documentation describes each of these functions, so please read that first; this vignette explains how to interpret each column of the returned tables. These functions return titre values on the original scale of the input data.

simulate_population_trajectories

See the documentation for this function. The output format depends on whether the summarise argument is TRUE or FALSE.

summarise = TRUE

Returned columns are

name	type	description
time_since_last_exp	integer	Number of days since last exposure
me	numeric	Median titre value
lo	numeric	Titre value at the 0.025 quantile
hi	numeric	Titre value at the 0.975 quantile
titre_type	character	Name of the titre or biomarker

There will also be a column for each covariate in the regression model.
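The me, lo and hi columns summarise the distribution of trajectories across posterior draws. As a sketch of how such a summary relates to draw-level output, the data.table code below groups values by time and biomarker and takes the median and the 0.025 and 0.975 quantiles. The draws are randomly generated stand-ins, not model output, and this is not necessarily how the package computes its summary; covariate columns could be added to the by clause in the same way.

```r
library(data.table)

# Hypothetical draw-level trajectories: 1000 draws at two time points for one
# biomarker (values are simulated stand-ins, not model output)
set.seed(1)
draws <- data.table(
  .draw = rep(1:1000, times = 2),
  time_since_last_exp = rep(c(0L, 30L), each = 1000),
  titre_type = "Ancestral",
  mu = c(rlnorm(1000, meanlog = 5), rlnorm(1000, meanlog = 6))
)

# Collapse draws into the median and central 95% interval per group
summary_dt <- draws[, .(
  me = median(mu),
  lo = as.numeric(quantile(mu, 0.025)),
  hi = as.numeric(quantile(mu, 0.975))
), by = .(time_since_last_exp, titre_type)]

summary_dt
```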

summarise = FALSE

Returned columns are

name	type	description
time_since_last_exp	integer	Number of days since last exposure
t0_pop	numeric	Titre value at time 0
tp_pop	numeric	Time at peak titre
ts_pop	numeric	Time at start of waning
m1_pop	numeric	Boosting rate
m2_pop	numeric	Plateau rate
m3_pop	numeric	Waning rate
beta_t0	numeric	Coefficient to adjust t0 by
beta_tp	numeric	Coefficient to adjust tp by
beta_ts	numeric	Coefficient to adjust ts by
beta_m1	numeric	Coefficient to adjust m1 by
beta_m2	numeric	Coefficient to adjust m2 by
beta_m3	numeric	Coefficient to adjust m3 by
mu	numeric	Titre value
.draw	integer	Draw number
titre_type	character	Name of the titre or biomarker

There will also be a column for each covariate in the regression model.

See the model vignette for more detail about the model parameters.
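To make the parameter descriptions concrete, the sketch below assumes a piecewise-linear trajectory on the log titre scale: titres start at t0, change at rate m1 until tp, at rate m2 until ts, then at rate m3. This form is an assumption consistent with the column descriptions above; the model vignette gives the actual definition, including how the beta coefficients enter, which is not reproduced here. The function name and parameter values are hypothetical.

```r
# Hypothetical piecewise-linear trajectory on the log titre scale, assuming:
# boosting at rate m1 until tp, plateau at rate m2 until ts, waning at rate m3
trajectory <- function(t, t0, tp, ts, m1, m2, m3) {
  ifelse(
    t < tp,
    t0 + m1 * t,                                       # boosting phase
    ifelse(
      t < ts,
      t0 + m1 * tp + m2 * (t - tp),                    # plateau phase
      t0 + m1 * tp + m2 * (ts - tp) + m3 * (t - ts)    # waning phase
    )
  )
}

# Illustrative parameter values (not fitted estimates)
t <- 0:150
mu_log <- trajectory(t, t0 = 4, tp = 10, ts = 60, m1 = 0.3, m2 = 0, m3 = -0.01)
```

Each phase starts where the previous one ends, so the trajectory is continuous at tp and ts.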

simulate_individual_trajectories

See the documentation for this function. The output format depends on whether the summarise argument is TRUE or FALSE.

summarise = FALSE

Returned columns are

name	type	description
pid	character or numeric	Unique person identifier as provided in input data
draw	integer	Draw number from the model fit
time_since_last_exp	integer	Number of days since last exposure
mu	numeric	Titre value
titre_type	character	Name of the titre or biomarker
exposure_day	integer	Day of this person’s last exposure
calendar_day	integer	Day of this titre value
time_shift	integer	The number of days these exposures have been adjusted by, as provided in function arguments

There will also be a column for each covariate in the regression model.

summarise = TRUE

Returned columns are

name	type	description
me	numeric	Median titre value
lo	numeric	Titre value at the 0.025 quantile
hi	numeric	Titre value at the 0.975 quantile
titre_type	character	Name of the titre or biomarker
calendar_day	integer	Day of this titre value
time_shift	integer	The number of days the exposures were adjusted by, as provided in function arguments