---
title: "Time Varying"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Time Varying}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# What exactly does `tv::time_varying()` do?

Given data X, specs, and exposures

- For every patient or row e in the exposures:
    - Let X = filter the data X to current patient
    - construct the "grid":
        - Let f = the features from specs with `use_for_grid=TRUE`
        - Let grid_times = the unique datetimes in X with features in f and datetime between e\$exposure_start and e\$exposure_stop
        - The grid is now a one-row-per-break dataset, with the first break at e\$exposure_start and the last break before e\$exposure_stop
    - for grid period g in grid:
        - for row s in specs:
            - Let xx = filter data X to X\$feature == s\$feature and X\$datetime in the interval (g\$row_start - s\$lookback_end, g\$row_start - s\$lookback_start)
            - perform the aggregation s\$aggregation on xx

# FAQ

## 1. Does `tv` really loop over every single feature for every row in the grid independently?

Yes. This is a lot of looping and a good reason to use more than one core.

## 2. Does `tv` require any exposure history to get the counts or time-since right?

No. Each row is considered independently.

## 3. The look back in the specs is relative to what point in time?

Usually the current grid row start time.

## 4. Can I get the grid returned to me so I can see what it's doing?

As of version `1.7.0` you can; use `tv::time_varying(grid.only = TRUE)`.

## 5. Why is my `tv` on prospective patients really slow?

Probably the exposure dataset is misspecified. For prospective patients you only want the current exposure in the grid, with no other breakpoints, so set your exposure start to (e.g.) the current time and the exposure end to (e.g.) the current time plus one second.

# An example

Another example of how to use time-varying. Let's say we want to break every 6 hours.
Just add that as a feature. Here we give it a count with infinite look back, to count the 6-hour period we're in. We also want to include the endpoint, which we encode with aggregation "event".

```{r}
library(tv)
library(tibble)
library(dplyr)
library(lubridate)

data <- tribble(
  ~ pat_id, ~ feature, ~ datetime, ~ value,
  1, "lactate", "2021-12-31 23:00:00", 9,
  1, "lactate", "2022-01-01 03:41:00", 10,
  1, "lactate", "2022-01-01 07:00:00", 11,
  1, "blood pressure", "2022-01-01 02:00:00", 120,
  1, "blood pressure", "2022-01-01 04:00:00", 115,
  1, "blood pressure", "2022-01-01 06:00:00", 118,
  1, "6-hour", "2022-01-01 00:00:00", NA_real_,
  1, "6-hour", "2022-01-01 06:00:00", NA_real_,
  1, "6-hour", "2022-01-01 12:00:00", NA_real_,
  1, "6-hour", "2022-01-01 18:00:00", NA_real_,
  1, "event", "2022-01-01 08:00:00", NA_real_,
  1, "event", "2022-01-01 13:00:00", NA_real_
) %>%
  mutate(datetime = as_datetime(datetime))

specs <- tribble(
  ~ feature, ~ use_for_grid, ~ lookback_start, ~ lookback_end, ~ aggregation,
  "lactate", TRUE, 0, Inf, "ts",
  "lactate", TRUE, 0, Inf, "lvcf",
  "blood pressure", FALSE, 0, 7200, "ts", # two hours
  "blood pressure", FALSE, 0, 7200, "lvcf", # two hours
  "6-hour", TRUE, 0, Inf, "n",
  "event", TRUE, 0, 0, "event"
)

exposure <- tibble(
  pat_id = 1,
  encounter = 1:2, # optional id
  exposure_start = as_datetime(c("2022-01-01 00:00:00", "2022-01-01 08:00:00")),
  exposure_stop = as_datetime(c("2022-01-01 08:00:00", "2022-01-01 13:00:00"))
)

time_varying(data, specs, exposure = exposure, time_units = "seconds", n_cores = 1) %>%
  arrange(pat_id, row_start)
```

Note that the lactate lab from 2021 does not contribute to a new row because it is not inside the exposure window. Note also that the look back is ignored for aggregation of type "event".

# Overclocking the `time_varying()` function

## What if a variable doesn't change over the exposure?

Run two tv's and merge the results: one for static variables and one for dynamic variables.

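
A minimal sketch of the two-run approach, under several assumptions that are not confirmed by the package: the feature name `"sex"`, the helper `run_two_tvs()`, and the join keys `pat_id` and `encounter` are all hypothetical, and the sketch assumes that a run in which no feature has `use_for_grid = TRUE` returns one row per exposure. Check the actual output columns before relying on the join.

```{r eval=FALSE}
library(dplyr)
library(tibble)

# Hypothetical: which features count as "static" is your call.
static_features <- c("sex")

specs <- tribble(
  ~ feature, ~ use_for_grid, ~ lookback_start, ~ lookback_end, ~ aggregation,
  "lactate", TRUE,  0, Inf, "lvcf",
  "sex",     FALSE, 0, Inf, "lvcf"
)

# Split the specs into one set per run.
specs_static  <- filter(specs, feature %in% static_features)
specs_dynamic <- filter(specs, !feature %in% static_features)

# Hypothetical helper (not part of tv): run time_varying() once per spec set
# and attach the static columns to every dynamic row.
run_two_tvs <- function(data, exposure, specs_static, specs_dynamic,
                        by = c("pat_id", "encounter")) { # assumed join keys
  dynamic <- tv::time_varying(data, specs_dynamic, exposure = exposure,
                              time_units = "seconds", n_cores = 1)
  static <- tv::time_varying(data, specs_static, exposure = exposure,
                             time_units = "seconds", n_cores = 1)
  # Columns present in both results get a ".static" suffix on the static side.
  left_join(dynamic, static, by = by, suffix = c("", ".static"))
}
```

Calling `run_two_tvs(data, exposure, specs_static, specs_dynamic)` would then return the dynamic grid with the static values repeated on each row, provided the assumptions above hold.
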
## Multiple exposures per person

If there are multiple encounters per person, you can just add another row to the exposure data, and tag it with another id column which gets carried forward by `time_varying()`.

## Only look back to the beginning of the exposure instead of a fixed look back

Simply pass the special value NA in the "lookback_end" column.

## Only look at things from before the exposure instead of a fixed look back

Simply pass the special value NA in the "lookback_start" column. You'll probably also want something like "lookback_end = Inf".

## Only break every hour

Set `use_for_grid=FALSE` for everything except an "hourly" feature. Then calculate every hour that the patient is at risk; set "feature" to "hourly", set "datetime" to the time stamp, and set "value" to `hour(datetime)`.

## Create an indicator for whether a situation applies

Start with a dataset of start time and end time. IMPORTANT: be sure no intervals overlap; if they do, the next section would be better suited for you. Otherwise, the pseudocode below should do the trick:

```{r eval=FALSE}
data %>%
  tidyr::pivot_longer(c(start_time, end_time), names_to = "which_time", values_to = "datetime") %>%
  dplyr::mutate(
    # 1 when the situation starts, 0 when it ends
    value = +(which_time == "start_time")
  )
```

Then make the specs a "lvcf" with infinite look back.

## Create a count of how many times a situation applies

Start with a dataset of start time and end time. The pseudocode below should do the trick:

```{r eval=FALSE}
data %>%
  tidyr::pivot_longer(c(start_time, end_time), names_to = "which_time", values_to = "datetime") %>%
  dplyr::arrange(pat_id, datetime) %>%
  dplyr::group_by(pat_id) %>%
  dplyr::mutate(
    # running number of open intervals: +1 at each start, -1 at each end
    value = cumsum(ifelse(which_time == "start_time", 1, -1))
  ) %>%
  dplyr::ungroup()
```

Then make the specs a "lvcf" with infinite look back.
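
As a worked illustration of the count recipe, here is a small sketch on made-up data. The two intervals, the feature name `"n_active"`, and the choice of `use_for_grid = TRUE` are assumptions for illustration only; the second interval sits inside the first, so the running count should rise to 2 and fall back to 0.

```{r eval=FALSE}
library(dplyr)
library(tidyr)
library(tibble)

# Made-up intervals for one patient; the second overlaps the first.
intervals <- tribble(
  ~ pat_id, ~ start_time, ~ end_time,
  1, "2022-01-01 01:00:00", "2022-01-01 05:00:00",
  1, "2022-01-01 03:00:00", "2022-01-01 04:00:00"
) %>%
  mutate(across(c(start_time, end_time), ~ as.POSIXct(.x, tz = "UTC")))

counts <- intervals %>%
  pivot_longer(c(start_time, end_time), names_to = "which_time", values_to = "datetime") %>%
  arrange(pat_id, datetime) %>%
  group_by(pat_id) %>%
  # +1 at each start, -1 at each end: the running number of open intervals
  mutate(value = cumsum(ifelse(which_time == "start_time", 1, -1))) %>%
  ungroup() %>%
  # hypothetical feature name, shaped like the long data used in this vignette
  mutate(feature = "n_active") %>%
  select(pat_id, feature, datetime, value)

counts$value
#> [1] 1 2 1 0

# Matching spec row: last value carried forward with infinite look back.
count_spec <- tribble(
  ~ feature, ~ use_for_grid, ~ lookback_start, ~ lookback_end, ~ aggregation,
  "n_active", TRUE, 0, Inf, "lvcf"
)
```

The rows of `counts` can be appended to the rest of the long data, and `count_spec` to the other specs, before calling `time_varying()`.
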