Using other input formats
Nathan Green
2022-08-17
Source: vignettes/other-input-formats.Rmd
The main aim of the CEdecisiontree package is to provide a bridge to models that would otherwise be built in Excel. Because of this, the transition matrix is the primary format for input arguments. However, it is not necessarily the easiest format to define, manipulate or compute with. Here we give some examples of alternative formats.
Setup
Quietly load libraries.
library(CEdecisiontree)
library(readr)
library(dplyr)
library(reshape2)
library(tidyr)
library(assertthat)
Load example data from the package.
Tree structure
Generally, we can specify a tree by listing the children of each parent node. This can be more computationally efficient.
tree <-
  list("1" = c(2,3),
       "2" = c(4,5),
       "3" = c(6,7),
       "4" = c(),
       "5" = c(),
       "6" = c(),
       "7" = c())
dat <-
  data.frame(node = 1:7,
             prob = c(NA, 0.2, 0.8, 0.2, 0.8, 0.2, 0.8),
             vals = c(0,10,1,10,1,10,1))
tree
#> $`1`
#> [1] 2 3
#>
#> $`2`
#> [1] 4 5
#>
#> $`3`
#> [1] 6 7
#>
#> $`4`
#> NULL
#>
#> $`5`
#> NULL
#>
#> $`6`
#> NULL
#>
#> $`7`
#> NULL
dat
#> node prob vals
#> 1 1 NA 0
#> 2 2 0.2 10
#> 3 3 0.8 1
#> 4 4 0.2 10
#> 5 5 0.8 1
#> 6 6 0.2 10
#> 7 7 0.8 1
dectree_expected_recursive(names(tree)[1], tree, dat)
#> [1] 5.6
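To make the recursion concrete, here is a minimal base R sketch of how such an expected value could be computed from a child list and a node table. The helper `expected_value_sketch()` is hypothetical and is not the package's implementation. It assumes that `prob` is the probability of reaching a node from its parent and that `vals` is the value accrued at the node itself. Under those assumptions it reproduces the root value of 5.6.

```r
# Same tree and node data as in the example above.
tree <- list("1" = c(2, 3), "2" = c(4, 5), "3" = c(6, 7),
             "4" = c(), "5" = c(), "6" = c(), "7" = c())
dat <- data.frame(node = 1:7,
                  prob = c(NA, 0.2, 0.8, 0.2, 0.8, 0.2, 0.8),
                  vals = c(0, 10, 1, 10, 1, 10, 1))

# Hypothetical helper (not part of CEdecisiontree).
# Assumption: value at a node = its own value +
#   sum over children of P(child | node) * value at child.
expected_value_sketch <- function(node, tree, dat) {
  children <- tree[[as.character(node)]]
  own_value <- dat$vals[dat$node == node]

  # Terminal node: nothing further to roll back.
  if (length(children) == 0) return(own_value)

  child_values <- vapply(children, expected_value_sketch, numeric(1),
                         tree = tree, dat = dat)
  child_probs <- dat$prob[match(children, dat$node)]

  own_value + sum(child_probs * child_values)
}

root_value <- expected_value_sketch(1, tree, dat)
```

Each node is visited once, so the work grows with the number of nodes rather than with the size of a full transition matrix.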
We can obtain the list of children from the probability matrix (or from any other transition matrix that defines the tree structure).
transmat_to_child_list(probs)
#> $`1`
#> [1] 2 3
#>
#> $`2`
#> [1] 4 5
#>
#> $`3`
#> [1] 6 7
#>
#> $`4`
#> integer(0)
#>
#> $`5`
#> integer(0)
#>
#> $`6`
#> integer(0)
#>
#> $`7`
#> integer(0)
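As an illustration of the idea, the children of a node are the columns with a non-missing entry in that node's row. The sketch below uses base R only. The matrix `probs_demo` and the helper `child_list_sketch()` are hypothetical stand-ins built for this example, assuming a square matrix with `NA` where no transition exists; they are not the package data or the internals of `transmat_to_child_list()`.

```r
# Hypothetical 7 x 7 transition matrix; NA marks "no edge".
node_names <- as.character(1:7)
probs_demo <- matrix(NA_real_, nrow = 7, ncol = 7,
                     dimnames = list(node_names, node_names))
probs_demo[1, c(2, 3)] <- c(0.2, 0.8)
probs_demo[2, c(4, 5)] <- c(0.2, 0.8)
probs_demo[3, c(6, 7)] <- c(0.2, 0.8)

# Hypothetical helper (not part of CEdecisiontree):
# for each row, return the column indices holding a probability.
child_list_sketch <- function(mat) {
  children <- lapply(seq_len(nrow(mat)), function(i) {
    unname(which(!is.na(mat[i, ])))
  })
  names(children) <- rownames(mat)
  children
}

child_list <- child_list_sketch(probs_demo)
```

Terminal nodes come out as `integer(0)`, which has length zero just like the `NULL` entries in the hand-written list.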
Single long array
If we keep to flat arrays then, as the size of the tree increases, the sparse matrices clearly become impractical. We can provide a long format array to address this. Let us transform the wide array used previously to demonstrate the structure and the space saving.
probs_long <-
  probs %>%
  mutate('from' = rownames(.)) %>%
  melt(id.vars = "from",
       variable.name = 'to',
       value.name = 'prob') %>%
  mutate(to = as.numeric(to)) %>%
  na.omit()
cost_long <-
  cost %>%
  mutate('from' = rownames(.)) %>%
  melt(id.vars = "from",
       variable.name = 'to',
       value.name = 'vals') %>%
  mutate(to = as.numeric(to)) %>%
  na.omit()
dat_long <-
  merge(probs_long,
        cost_long)
dat_long
#> from to prob vals
#> 1 1 2 0.2 10
#> 2 1 3 0.8 1
#> 3 2 4 0.2 10
#> 4 2 5 0.8 1
#> 5 3 6 0.2 10
#> 6 3 7 0.8 1
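The space saving can be seen by counting stored entries. The following base R sketch builds a hypothetical 7 x 7 matrix (`wide_demo`, assumed to use `NA` for absent transitions, not the package data) and extracts only the non-missing cells with `which(..., arr.ind = TRUE)`, as an alternative to the `melt()` pipeline.

```r
# Hypothetical wide matrix with the same six transitions as above.
node_names <- as.character(1:7)
wide_demo <- matrix(NA_real_, nrow = 7, ncol = 7,
                    dimnames = list(node_names, node_names))
wide_demo[1, c(2, 3)] <- c(0.2, 0.8)
wide_demo[2, c(4, 5)] <- c(0.2, 0.8)
wide_demo[3, c(6, 7)] <- c(0.2, 0.8)

# Row and column positions of the cells that hold a probability.
idx <- which(!is.na(wide_demo), arr.ind = TRUE)

long_demo <- data.frame(from = unname(idx[, "row"]),
                        to   = unname(idx[, "col"]),
                        prob = wide_demo[idx])
long_demo <- long_demo[order(long_demo$from, long_demo$to), ]
rownames(long_demo) <- NULL

n_wide_cells <- length(wide_demo)  # every cell, including NA
n_long_rows  <- nrow(long_demo)    # only the edges that exist
```

For this binary tree the wide matrix stores 49 cells while the long array needs 6 rows. A tree has one fewer edge than it has nodes, so the long form grows linearly while the matrix grows quadratically.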
We can use the long array as the input argument instead of the separate transition matrices. Internally, we simply convert back to a matrix using long_to_transmat(), so for larger trees this may be inefficient.
dectree_expected_values(
define_model(dat_long = dat_long))
#> vals used for calculation.
#> 1 2 3 4 5 6 7 8
#> 5.6 12.8 3.8 10.0 1.0 10.0 1.0 0.0
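Converting a long array back to a square matrix can be sketched in base R with matrix indexing. This is an illustration only: the helper `long_to_matrix_sketch()` and the data frame `long_demo` are hypothetical, and the sketch is not claimed to match how `long_to_transmat()` is implemented or how it sizes its result.

```r
# Hypothetical long array with the same edges as dat_long.
long_demo <- data.frame(from = c(1, 1, 2, 2, 3, 3),
                        to   = 2:7,
                        prob = rep(c(0.2, 0.8), times = 3))

# Hypothetical helper (not part of CEdecisiontree).
# Assumption: nodes are labelled 1..n and absent edges become NA.
long_to_matrix_sketch <- function(long, value = "prob") {
  from <- as.integer(long$from)
  to <- as.integer(long$to)
  n <- max(from, to)
  labels <- as.character(seq_len(n))

  mat <- matrix(NA_real_, nrow = n, ncol = n,
                dimnames = list(labels, labels))
  # A two-column index matrix assigns one (row, column) cell per edge.
  mat[cbind(from, to)] <- long[[value]]
  mat
}

mat_demo <- long_to_matrix_sketch(long_demo)
```

The result has n squared cells for n nodes, which is why a round trip through the matrix form gives up the space saving of the long array.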
Computation speed
We can compare the computation times for the recursive and non-recursive formulations.
microbenchmark::microbenchmark(
  dectree_expected_values(define_model(dat_long = dat_long)),
  dectree_expected_recursive(names(tree)[1], tree, dat),
  times = 100L)
#> Unit: microseconds
#> expr min lq
#> dectree_expected_values(define_model(dat_long = dat_long)) 11565.9 11931.40
#> dectree_expected_recursive(names(tree)[1], tree, dat) 27.5 28.95
#> mean median uq max neval cld
#> 12966.286 12500.65 13496.80 20457.3 100 b
#> 37.739 41.70 44.15 62.4 100 a
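For this example the recursive formulation is much quicker: the median time is about 42 microseconds, compared with about 12,500 microseconds for the matrix-based calculation. We can also measure the change in memory before and after running each function.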
pryr::mem_change(dectree_expected_values(define_model(dat_long = dat_long)))
#> vals used for calculation.
#> -3.69 kB
pryr::mem_change(dectree_expected_recursive(names(tree)[1], tree, dat))
#> 528 B