Using other input formats

The main aim of the CEdecisiontree package is to provide a bridge to models that would otherwise be built in Excel. Because of this, the transition matrix is the primary format for input arguments. However, it is not necessarily the easiest format to define, manipulate or compute with. Here we give some examples of alternative formats.

Setup

Quietly load libraries.

library(CEdecisiontree)
library(readr)
library(dplyr)
library(reshape2)
library(tidyr)
library(assertthat)

Load example data from the package.

data("cost")
data("probs")

Tree structure

Generally, we can specify a tree by listing the children of each parent node. This can be more computationally efficient.

tree <-
 list("1" = c(2,3),
      "2" =  c(4,5),
      "3" =  c(6,7),
      "4" =  c(),
      "5" =  c(),
      "6" =  c(),
      "7" =  c())
dat <-
 data.frame(node = 1:7,
            prob = c(NA, 0.2, 0.8, 0.2, 0.8, 0.2, 0.8),
            vals = c(0,10,1,10,1,10,1))
tree
#> $`1`
#> [1] 2 3
#> 
#> $`2`
#> [1] 4 5
#> 
#> $`3`
#> [1] 6 7
#> 
#> $`4`
#> NULL
#> 
#> $`5`
#> NULL
#> 
#> $`6`
#> NULL
#> 
#> $`7`
#> NULL
dat
#>   node prob vals
#> 1    1   NA    0
#> 2    2  0.2   10
#> 3    3  0.8    1
#> 4    4  0.2   10
#> 5    5  0.8    1
#> 6    6  0.2   10
#> 7    7  0.8    1

dectree_expected_recursive(names(tree)[1], tree, dat)
#> [1] 5.6
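
To make the recursion concrete, the following is a minimal base R sketch of the calculation, not the package's implementation. The function name `expected_recursive_sketch` is hypothetical. It assumes that the expected value at a node is the node's own `vals` entry plus the probability-weighted expected values of its children, which reproduces the result above for this example.

```r
# Same example tree and node data as above, repeated so the sketch is self-contained.
tree <- list("1" = c(2, 3), "2" = c(4, 5), "3" = c(6, 7),
             "4" = c(), "5" = c(), "6" = c(), "7" = c())
dat <- data.frame(node = 1:7,
                  prob = c(NA, 0.2, 0.8, 0.2, 0.8, 0.2, 0.8),
                  vals = c(0, 10, 1, 10, 1, 10, 1))

# Hypothetical helper (assumption): value at the node plus the
# probability-weighted expected values of its children.
expected_recursive_sketch <- function(node, tree, dat) {
  own_val <- dat$vals[dat$node == as.numeric(node)]
  children <- tree[[node]]
  # Terminal node: nothing further to add.
  if (length(children) == 0) return(own_val)
  child_probs <- dat$prob[match(children, dat$node)]
  child_vals <- vapply(as.character(children), expected_recursive_sketch,
                       numeric(1), tree = tree, dat = dat)
  own_val + sum(child_probs * child_vals)
}

root_ev <- expected_recursive_sketch("1", tree, dat)
isTRUE(all.equal(root_ev, 5.6))
```

Worked by hand, node 2 gives 10 + (0.2 × 10 + 0.8 × 1) = 12.8, node 3 gives 1 + 2.8 = 3.8, and the root gives 0 + 0.2 × 12.8 + 0.8 × 3.8 = 5.6.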

We can obtain the list of children from the probability matrix (or any other structure that defines the transition matrix).

transmat_to_child_list(probs)
#> $`1`
#> [1] 2 3
#> 
#> $`2`
#> [1] 4 5
#> 
#> $`3`
#> [1] 6 7
#> 
#> $`4`
#> integer(0)
#> 
#> $`5`
#> integer(0)
#> 
#> $`6`
#> integer(0)
#> 
#> $`7`
#> integer(0)
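
The idea behind this conversion can be sketched in base R as follows. This is an illustration only and not the code of transmat_to_child_list(). The helper `child_list_sketch` and the matrix `probs_mat` are hypothetical, and the sketch assumes a square matrix in which `NA` marks the absence of an edge.

```r
# Hypothetical 7 x 7 transition matrix (assumption: NA means "no edge").
node_names <- as.character(1:7)
probs_mat <- matrix(NA_real_, nrow = 7, ncol = 7,
                    dimnames = list(node_names, node_names))
probs_mat[1, c(2, 3)] <- c(0.2, 0.8)
probs_mat[2, c(4, 5)] <- c(0.2, 0.8)
probs_mat[3, c(6, 7)] <- c(0.2, 0.8)

# Hypothetical helper: the children of a node are the columns with a
# non-missing entry in that node's row.
child_list_sketch <- function(m) {
  out <- lapply(seq_len(nrow(m)),
                function(i) unname(which(!is.na(m[i, ]))))
  names(out) <- rownames(m)
  out
}

children <- child_list_sketch(probs_mat)
children[["1"]]
children[["4"]]
```

Terminal nodes come out as zero-length integer vectors, since their rows contain no non-missing entries.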

Single long array

If we keep with flat arrays then, as the size of the tree increases, the sparse matrices clearly become impractical. We can provide a long format array to address this. Let us transform the wide arrays used previously to demonstrate the structure and the space saving.

probs_long <-
  probs %>%
  mutate('from' = rownames(.)) %>%
  melt(id.vars = "from",
       variable.name = 'to',
       value.name = 'prob') %>%
  mutate(to = as.numeric(to)) %>% 
  na.omit()

cost_long <-
  cost %>%
  mutate('from' = rownames(.)) %>%
  melt(id.vars = "from",
       variable.name = 'to',
       value.name = 'vals') %>%
  mutate(to = as.numeric(to)) %>% 
  na.omit()

dat_long <-
  merge(probs_long,
        cost_long)

dat_long
#>   from to prob vals
#> 1    1  2  0.2   10
#> 2    1  3  0.8    1
#> 3    2  4  0.2   10
#> 4    2  5  0.8    1
#> 5    3  6  0.2   10
#> 6    3  7  0.8    1

We can use the long array as the input argument instead of the separate transition matrices. Internally, we simply convert back to a matrix using long_to_transmat(), so for larger trees this may be inefficient.

dectree_expected_values(
  define_model(dat_long = dat_long))
#> vals used for calculation.
#>    1    2    3    4    5    6    7    8 
#>  5.6 12.8  3.8 10.0  1.0 10.0  1.0  0.0
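
As a rough illustration of what converting a long array back to a matrix involves, here is a base R sketch. It is not the source of long_to_transmat(). The helper `long_to_mat_sketch` is hypothetical, and the sketch assumes numeric `from` and `to` columns and `NA` for absent edges.

```r
# Long format data as printed above, with from and to as numbers (assumption).
dat_long <- data.frame(from = c(1, 1, 2, 2, 3, 3),
                       to   = c(2, 3, 4, 5, 6, 7),
                       prob = c(0.2, 0.8, 0.2, 0.8, 0.2, 0.8),
                       vals = c(10, 1, 10, 1, 10, 1))

# Hypothetical helper: fill a square NA matrix at the (from, to) positions.
long_to_mat_sketch <- function(dat, value_col) {
  n <- max(dat$from, dat$to)
  node_names <- as.character(seq_len(n))
  m <- matrix(NA_real_, nrow = n, ncol = n,
              dimnames = list(node_names, node_names))
  # A two-column index matrix addresses one cell per row of dat.
  m[cbind(dat$from, dat$to)] <- dat[[value_col]]
  m
}

prob_mat <- long_to_mat_sketch(dat_long, "prob")
cost_mat <- long_to_mat_sketch(dat_long, "vals")

# Cells stored by the wide matrix versus edges stored by the long array.
c(wide_cells = length(prob_mat), long_rows = nrow(dat_long))
```

For this small tree the wide matrix holds 49 cells to describe 6 edges, which is the space saving the long format offers.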

Computation speed

We can compare the computation times for the recursive and non-recursive formulations.

microbenchmark::microbenchmark(dectree_expected_values(define_model(dat_long = dat_long)),
                               dectree_expected_recursive(names(tree)[1], tree, dat), times = 100L)
#> Unit: microseconds
#>                                                        expr     min       lq
#>  dectree_expected_values(define_model(dat_long = dat_long)) 11565.9 11931.40
#>       dectree_expected_recursive(names(tree)[1], tree, dat)    27.5    28.95
#>       mean   median       uq     max neval cld
#>  12966.286 12500.65 13496.80 20457.3   100   b
#>     37.739    41.70    44.15    62.4   100  a

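For this example the recursive formulation is much quicker, with a median of about 42 microseconds against about 12,500 microseconds for the non-recursive call, which here also includes the define_model() step. We can also compare the change in memory before and after running each function.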

pryr::mem_change(dectree_expected_values(  define_model(dat_long = dat_long)))
#> vals used for calculation.
#> -3.69 kB
pryr::mem_change(dectree_expected_recursive(names(tree)[1], tree, dat))
#> 528 B

Nathan Green

2022-08-17
