Combining Dict and data.table

A dict.table combines dict and data.table. It can be thought of as a data.table with unique column names and an extended set of functions to add, extract and remove columns, with the goal of further facilitating code development with data.table. A dict.table object provides all dict and data.table functions and operators at the same time.

Usage

dict.table(...)

as.dict.table(x, ...)

# S3 method for class 'data.table'
as.dict.table(x, copy = TRUE, ...)

is.dict.table(x)

# S3 method for class 'dict.table'
rbind(x, ...)

# S3 method for class 'dict.table'
cbind(x, ...)

Arguments

...: elements put into the dict.table and/or additional arguments to be passed on.
x: any R object or a dict.table object.
copy: if TRUE, creates a copy of the data.table object; otherwise, works on the passed object by reference.

Details

Methods that alter dict.table objects usually come in two versions, providing either copy or reference semantics. The reference versions are prefixed with 'ref_', for example, add() and ref_add().
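The difference between the two versions can be sketched as follows. This is a minimal illustration that assumes the functions documented here are provided by the container package; the variable names are made up for the example.

```r
library(container)   # assumption: the package that provides dict.table

dit = dict.table(a = 1:3)

# Copy semantics: add() returns a modified copy and leaves 'dit' untouched.
dit2 = add(dit, b = 4:6)
names(dit)     # only column 'a'
names(dit2)    # columns 'a' and 'b'

# Reference semantics: ref_add() changes 'dit' itself.
ref_add(dit, b = 4:6)
names(dit)     # now columns 'a' and 'b'
```

The copy version is the safer default in interactive work, while the 'ref_' version avoids copying when tables are large.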

dict.table(...) initializes and returns a dict.table object.

as.dict.table(x, ...) coerces x to a dict.table.

is.dict.table(x) checks whether x is a dict.table.

add(.x, ...) and ref_add(.x, ...) add columns to .x. If the column name already exists, an error is given.

at(.x, ...) returns the columns at the given indices. Indices can be column names (character), column positions (numeric), or a mix of both. All columns must exist.

at2(x, index) returns the column at the given index or signals an error if not found.

clear(x) and ref_clear(x) remove all elements from x.

clone(x) creates a copy of x.

delete_at(.x, ...) and ref_delete_at(.x, ...) find and remove columns either by name or index (or both). If one or more columns don't exist, an error is signaled.

discard_at(.x, ...) and ref_discard_at(.x, ...) find and remove columns either by name or index (or both). Invalid column indices are ignored.

has(x, column) checks whether the given column is in the dict.table object.

has_name(x, name) checks whether x has the given column name.

is_empty(x) returns TRUE if the object is empty, otherwise FALSE.

peek_at(x, ..., .default = NULL) returns the columns at the given indices or (if not found) columns with the given default value.

peek_at2(x, index, default = NULL) returns the column at the given index if it exists, otherwise the given default value. If the length of the default does not match the number of rows, it is recycled accordingly and a warning is given, unless the default value has a length of 1, in which case recycling is done silently.
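The recycling rule described above can be tried out with a short sketch. It assumes the functions come from the container package, and it deliberately makes no assumption about the wording of the warning: the handler just reports whatever message is raised.

```r
library(container)   # assumption: the package that provides dict.table

dit = dict.table(a = 1:4)

# A length-1 default is recycled silently to the number of rows.
peek_at2(dit, "x", default = 0)

# A default of another length is recycled as well, but with a warning.
# The handler only reports the warning text and then muffles it.
res = withCallingHandlers(
    peek_at2(dit, "x", default = 1:2),
    warning = function(w) {
        message("warning caught: ", conditionMessage(w))
        invokeRestart("muffleWarning")
    }
)
length(res) == nrow(dit)
```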

ref_pop(.x, index) returns the column at the given index and removes it from the dict.table object.

rename(.x, old, new) and ref_rename(.x, old, new) rename one or more columns from old to new, by copy and in place (i.e. by reference), respectively.

replace_at(.x, ..., .add = FALSE) and ref_replace_at(.x, ..., .add = FALSE) replace values at the given indices. If a given index is invalid, an error is signaled unless .add is set to TRUE, in which case the column is added.

update(object, other) and ref_update(object, other) add the columns of other that are not yet in object and replace the values of columns that already exist.
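As a rough sketch of how the two update variants differ, the following contrasts the copy and the in-place version on the same pair of tables. It assumes the functions come from the container package; the table names are illustrative only.

```r
library(container)   # assumption: the package that provides dict.table

dit1 = dict.table(a = 1:2, b = 3:4)
dit2 = dict.table(b = 5:6, c = 8:9)

# Copy version: the result is new, 'dit1' keeps its two columns.
res = update(dit1, dit2)
names(res)     # columns of both tables
names(dit1)    # unchanged

# Reference version: 'dit1' itself receives column 'c' and the new 'b'.
ref_update(dit1, dit2)
names(dit1)
dit1[["b"]]
```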

Examples

# Some basic examples using some typical data.table and dict operations.
# The constructor can take the 'key' argument known from data.table:
require(data.table)
#> Loading required package: data.table
dit = dict.table(x = rep(c("b","a","c"), each = 3), y = c(1,3,6), key = "y")
print(dit)
#> <dict.table> with 9 rows and 2 columns
#> Key: <y>
#>         x     y
#>    <char> <num>
#> 1:      b     1
#> 2:      a     1
#> 3:      c     1
#> 4:      b     3
#> 5:      a     3
#> 6:      c     3
#> 7:      b     6
#> 8:      a     6
#> 9:      c     6
setkey(dit, "x")                             # sort by 'x'
print(dit)
#> <dict.table> with 9 rows and 2 columns
#> Key: <x>
#>         x     y
#>    <char> <num>
#> 1:      a     1
#> 2:      a     3
#> 3:      a     6
#> 4:      b     1
#> 5:      b     3
#> 6:      b     6
#> 7:      c     1
#> 8:      c     3
#> 9:      c     6
(add(dit, "v" = 1:9))                        # add column v = 1:9
#> <dict.table> with 9 rows and 3 columns
#> Key: <x>
#>         x     y     v
#>    <char> <num> <int>
#> 1:      a     1     1
#> 2:      a     3     2
#> 3:      a     6     3
#> 4:      b     1     4
#> 5:      b     3     5
#> 6:      b     6     6
#> 7:      c     1     7
#> 8:      c     3     8
#> 9:      c     6     9
dit[y > 5]
#> <dict.table> with 3 rows and 2 columns
#> Key: <x>
#>         x     y
#>    <char> <num>
#> 1:      a     6
#> 2:      b     6
#> 3:      c     6
(ref_discard_at(dit, "x"))                   # discard column 'x'
#> <dict.table> with 9 rows and 1 column
#>        y
#>    <num>
#> 1:     1
#> 2:     3
#> 3:     6
#> 4:     1
#> 5:     3
#> 6:     6
#> 7:     1
#> 8:     3
#> 9:     6

try(at(dit, "x"))                            # index 'x' not found
#> Error : index 'x' not found
try(replace_at(dit, x = 0))                  # cannot be replaced, if it does not exist
#> Error : column(s) not found: 'x'

dit = replace_at(dit, x = 0, .add = TRUE)    # ok - re-adds column 'x' with all 0s
peek_at(dit, "x")                            # glance at column 'x'
#> <dict.table> with 9 rows and 1 column
#>        x
#>    <num>
#> 1:     0
#> 2:     0
#> 3:     0
#> 4:     0
#> 5:     0
#> 6:     0
#> 7:     0
#> 8:     0
#> 9:     0
has_name(dit, "x")                           # TRUE
#> [1] TRUE
ref_pop(dit, "x")                            # get column and remove it
#> [1] 0 0 0 0 0 0 0 0 0
has_name(dit, "x")                           # FALSE
#> [1] FALSE


# Copy and reference semantics when coercing *from* a data.table
dat = data.table(a = 1, b = 2)
dit = as.dict.table(dat)
is.dict.table(dit)                           # TRUE
#> [1] TRUE
is.dict.table(dat)                           # FALSE
#> [1] FALSE
ref_replace_at(dit, "a", 9)
dit[["a"]]                                   # 9
#> [1] 9
dat[["a"]]                                   # 1
#> [1] 1
dit.dat = as.dict.table(dat, copy = FALSE)   # init by reference
ref_replace_at(dit.dat, "a", 9)
dat[["a"]]                                   # 9
#> [1] 9
is.dict.table(dit.dat)                       # TRUE
#> [1] TRUE
is.dict.table(dat)                           # TRUE now as well!
#> [1] TRUE

# Coerce from dict
d = dict(a = 1, b = 1:3)
as.dict.table(d)
#> <dict.table> with 3 rows and 2 columns
#>        a     b
#>    <num> <int>
#> 1:     1     1
#> 2:     1     2
#> 3:     1     3

# rbind ...
dit = dict.table(a = 1:2, b = 1:2)
rbind(dit, dit)
#> <dict.table> with 4 rows and 2 columns
#>        a     b
#>    <int> <int>
#> 1:     1     1
#> 2:     2     2
#> 3:     1     1
#> 4:     2     2

# ... can be mixed with data.tables
dat = data.table(a = 3:4, b = 3:4)
rbind(dit, dat)  # yields a dict.table
#> <dict.table> with 4 rows and 2 columns
#>        a     b
#>    <int> <int>
#> 1:     1     1
#> 2:     2     2
#> 3:     3     3
#> 4:     4     4
rbind(dat, dit)  # yields a data.table
#>        a     b
#>    <int> <int>
#> 1:     3     3
#> 2:     4     4
#> 3:     1     1
#> 4:     2     2

# cbind ...
dit = dict.table(a = 1:2, b = 1:2)
dit2 = dict.table(c = 3:4, d = 5:6)
cbind(dit, dit2)
#> <dict.table> with 2 rows and 4 columns
#>        a     b     c     d
#>    <int> <int> <int> <int>
#> 1:     1     1     3     5
#> 2:     2     2     4     6

# ... can be mixed with data.tables
dat = data.table(x = 3:4, y = 3:4)
cbind(dit, dat)
#> <dict.table> with 2 rows and 4 columns
#>        a     b     x     y
#>    <int> <int> <int> <int>
#> 1:     1     1     3     3
#> 2:     2     2     4     4

dit = dict.table(a = 1:3)
add(dit, b = 3:1, d = 4:6)
#> <dict.table> with 3 rows and 3 columns
#>        a     b     d
#>    <int> <int> <int>
#> 1:     1     3     4
#> 2:     2     2     5
#> 3:     3     1     6

try(add(dit, a = 7:9))  # column 'a' already exists
#> Error : name 'a' exists already

dit = dict.table(a = 1:3, b = 4:6)
at(dit, "a")
#> <dict.table> with 3 rows and 1 column
#>        a
#>    <int>
#> 1:     1
#> 2:     2
#> 3:     3
at(dit, 2)
#> <dict.table> with 3 rows and 1 column
#>        b
#>    <int>
#> 1:     4
#> 2:     5
#> 3:     6
at(dit, "a", 2)
#> <dict.table> with 3 rows and 2 columns
#>        a     b
#>    <int> <int>
#> 1:     1     4
#> 2:     2     5
#> 3:     3     6
try(at(dit, "x"))     # index 'x' not found
#> Error : index 'x' not found
try(at(dit, 1:3))     # index 3 exceeds length of dict.table
#> Error : index 3 exceeds length of dict.table, which is 2

dit = dict.table(a = 1:3, b = 4:6)
at2(dit, 1)
#> [1] 1 2 3
at2(dit, "a")
#> [1] 1 2 3
at2(dit, 2)
#> [1] 4 5 6
try(at2(dit, "x"))     # index 'x' not found
#> Error : index 'x' not found
try(at2(dit, 5))       # index 5 exceeds length of dict.table
#> Error : index 5 exceeds length of dict.table, which is 2

dit = dict.table(a = 1, b = 2)
clear(dit)
#> <dict.table> with 0 rows and 0 columns
#> Null data.table (0 rows and 0 cols)
dit
#> <dict.table> with 1 row and 2 columns
#>        a     b
#>    <num> <num>
#> 1:     1     2
ref_clear(dit)
dit
#> <dict.table> with 0 rows and 0 columns
#> Null data.table (0 rows and 0 cols)

d = dict.table(a = 1:2, b = 3:4)
d2 = clone(d)
ref_clear(d)
print(d2)
#> <dict.table> with 2 rows and 2 columns
#>        a     b
#>    <int> <int>
#> 1:     1     3
#> 2:     2     4

(dit = as.dict.table(head(sleep)))
#> <dict.table> with 6 rows and 3 columns
#>    extra  group     ID
#>    <num> <fctr> <fctr>
#> 1:   0.7      1      1
#> 2:  -1.6      1      2
#> 3:  -0.2      1      3
#> 4:  -1.2      1      4
#> 5:  -0.1      1      5
#> 6:   3.4      1      6
delete_at(dit, "ID")
#> <dict.table> with 6 rows and 2 columns
#>    extra  group
#>    <num> <fctr>
#> 1:   0.7      1
#> 2:  -1.6      1
#> 3:  -0.2      1
#> 4:  -1.2      1
#> 5:  -0.1      1
#> 6:   3.4      1
delete_at(dit, "ID", 1)
#> <dict.table> with 6 rows and 1 column
#>     group
#>    <fctr>
#> 1:      1
#> 2:      1
#> 3:      1
#> 4:      1
#> 5:      1
#> 6:      1

try({
delete_at(dit, "foo")   # Column 'foo' not in dict.table
})
#> Error : column(s) not found: 'foo'

dit = as.dict.table(head(sleep))
discard_at(dit, "ID")
#> <dict.table> with 6 rows and 2 columns
#>    extra  group
#>    <num> <fctr>
#> 1:   0.7      1
#> 2:  -1.6      1
#> 3:  -0.2      1
#> 4:  -1.2      1
#> 5:  -0.1      1
#> 6:   3.4      1
discard_at(dit, "ID", 1)
#> <dict.table> with 6 rows and 1 column
#>     group
#>    <fctr>
#> 1:      1
#> 2:      1
#> 3:      1
#> 4:      1
#> 5:      1
#> 6:      1
discard_at(dit, "foo")  # ignored
#> <dict.table> with 6 rows and 3 columns
#>    extra  group     ID
#>    <num> <fctr> <fctr>
#> 1:   0.7      1      1
#> 2:  -1.6      1      2
#> 3:  -0.2      1      3
#> 4:  -1.2      1      4
#> 5:  -0.1      1      5
#> 6:   3.4      1      6

dit = dict.table(a = 1:3, b = as.list(4:6))
has(dit, 1:3)            # TRUE
#> [1] TRUE
has(dit, 4:6)            # FALSE
#> [1] FALSE
has(dit, as.list(4:6))   # TRUE
#> [1] TRUE

dit = dict.table(a = 1, b = 2)
has_name(dit, "a")    # TRUE
#> [1] TRUE
has_name(dit, "x")    # FALSE
#> [1] FALSE

d = dict.table(a = 1:4, b = 4:1)
is_empty(d)
#> [1] FALSE
is_empty(clear(d))
#> [1] TRUE

dit = dict.table(a = 1:3, b = 4:6)
peek_at(dit, "a")
#> <dict.table> with 3 rows and 1 column
#>        a
#>    <int>
#> 1:     1
#> 2:     2
#> 3:     3
peek_at(dit, 1)
#> <dict.table> with 3 rows and 1 column
#>        a
#>    <int>
#> 1:     1
#> 2:     2
#> 3:     3
peek_at(dit, 3)
#> <dict.table> with 0 rows and 0 columns
#> Null data.table (0 rows and 0 cols)
peek_at(dit, "x")
#> <dict.table> with 0 rows and 0 columns
#> Null data.table (0 rows and 0 cols)
peek_at(dit, "x", .default = 0)
#> <dict.table> with 3 rows and 1 column
#>        x
#>    <num>
#> 1:     0
#> 2:     0
#> 3:     0
peek_at(dit, "a", "x", .default = 0)
#> <dict.table> with 3 rows and 2 columns
#>        a     x
#>    <int> <num>
#> 1:     1     0
#> 2:     2     0
#> 3:     3     0

dit = dict.table(a = 1:3, b = 4:6)
peek_at2(dit, "a")
#> [1] 1 2 3
peek_at2(dit, 1)
#> [1] 1 2 3
peek_at2(dit, 3)
#> NULL
peek_at2(dit, 3, default = 9)
#> [1] 9 9 9
peek_at2(dit, "x")
#> NULL
peek_at2(dit, "x", default = 0)
#> [1] 0 0 0

dit = dict.table(a = 1:3, b = 4:6)
ref_pop(dit, "a")
#> [1] 1 2 3
ref_pop(dit, 1)
#> [1] 4 5 6

try({
ref_pop(dit, "x")  # index 'x' not found
})
#> Error : index 'x' not found

dit = dict.table(a = 1, b = 2, c = 3)
rename(dit, c("a", "b"), c("a1", "y"))
#> <dict.table> with 1 row and 3 columns
#>       a1     y     c
#>    <num> <num> <num>
#> 1:     1     2     3
print(dit)
#> <dict.table> with 1 row and 3 columns
#>        a     b     c
#>    <num> <num> <num>
#> 1:     1     2     3
ref_rename(dit, c("a", "b"), c("a1", "y"))
print(dit)
#> <dict.table> with 1 row and 3 columns
#>       a1     y     c
#>    <num> <num> <num>
#> 1:     1     2     3

dit = dict.table(a = 1:3)
replace_at(dit, "a", 3:1)
#> <dict.table> with 3 rows and 1 column
#>        a
#>    <int>
#> 1:     3
#> 2:     2
#> 3:     1

try({
replace_at(dit, "b", 4:6)               # column 'b' not in dict.table
})
#> Error : column(s) not found: 'b'
replace_at(dit, "b", 4:6, .add = TRUE)  # ok, adds column
#> <dict.table> with 3 rows and 2 columns
#>        a     b
#>    <int> <int>
#> 1:     1     4
#> 2:     2     5
#> 3:     3     6

# Update parts of tables (second overwrites columns of the first)
dit1 = dict.table(a = 1:2, b = 3:4)
dit2 = dict.table(         b = 5:6, c = 8:9)
update(dit1, dit2)
#> <dict.table> with 2 rows and 3 columns
#>        a     b     c
#>    <int> <int> <int>
#> 1:     1     5     8
#> 2:     2     6     9
update(dit2, dit1)
#> <dict.table> with 2 rows and 3 columns
#>        b     c     a
#>    <int> <int> <int>
#> 1:     3     8     1
#> 2:     4     9     2
