How safe is sapply?

Active Analytics Ltd: posted 1 December 2014 17:24 PST by Chibisi Chima-Okereke

Introduction

Consider a function whose return type cannot be predicted from the types of its arguments. One would want a given set of argument types always to produce the same return type.

In R, the sapply function applies a given function to each element of a list or vector. If the results can be simplified, it returns a vector, matrix or array; if they cannot, it returns a list.
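
As a minimal sketch of that simplification rule, the calls below pass the same kind of input to sapply and get three different return shapes. The variable names and the anonymous functions are illustrative only.

```r
# Same input type (an integer vector), three different return shapes.
squares <- sapply(1:3, function(i) i^2)         # every result has length 1
pairs   <- sapply(1:3, function(i) c(i, i^2))   # every result has length 2
ragged  <- sapply(1:3, function(i) seq_len(i))  # results have lengths 1, 2 and 3

is.null(dim(squares))  # TRUE: a plain numeric vector
dim(pairs)             # 2 3: a matrix with one column per input element
is.list(ragged)        # TRUE: no common length, so no simplification
```

The shape of the result depends on what the applied function returns for each element, not on the type of the input.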

Let's say I would like to know what the class of each column in a table is. Here is my table:

# Create the data frame
x <- data.frame(matrix(runif(50), ncol = 5))
# Copy the data frame to y, then print x
y <- x
x
           X1        X2        X3        X4         X5
1  0.38978750 0.5917161 0.8253375 0.9212409 0.81378414
2  0.19698452 0.9485623 0.5185512 0.4549364 0.08553984
3  0.04561459 0.7605560 0.6194432 0.2125995 0.45285563
4  0.12896352 0.9170430 0.1848815 0.6485884 0.85803866
5  0.86279157 0.8874200 0.9246772 0.5906202 0.93091009
6  0.56188260 0.5375034 0.7805531 0.7583699 0.73940692
7  0.93033282 0.8891776 0.1700843 0.0617091 0.96898827
8  0.12256712 0.3718221 0.5720081 0.9107144 0.63610281
9  0.54072288 0.5103223 0.7364012 0.9315216 0.51058452
10 0.77529338 0.1871945 0.1663368 0.2746300 0.26960322

# Change one of the columns in y to POSIXct time class
# (the origin is taken as UTC; the output below is printed in the local PST time zone)
y[,5] <- as.POSIXct(seq(1, 10), origin = "2014-01-01")
y
           X1        X2        X3        X4                  X5
1  0.38978750 0.5917161 0.8253375 0.9212409 2013-12-31 16:00:01
2  0.19698452 0.9485623 0.5185512 0.4549364 2013-12-31 16:00:02
3  0.04561459 0.7605560 0.6194432 0.2125995 2013-12-31 16:00:03
4  0.12896352 0.9170430 0.1848815 0.6485884 2013-12-31 16:00:04
5  0.86279157 0.8874200 0.9246772 0.5906202 2013-12-31 16:00:05
6  0.56188260 0.5375034 0.7805531 0.7583699 2013-12-31 16:00:06
7  0.93033282 0.8891776 0.1700843 0.0617091 2013-12-31 16:00:07
8  0.12256712 0.3718221 0.5720081 0.9107144 2013-12-31 16:00:08
9  0.54072288 0.5103223 0.7364012 0.9315216 2013-12-31 16:00:09
10 0.77529338 0.1871945 0.1663368 0.2746300 2013-12-31 16:00:10
Now consider the following:
sapply(x, class)
       X1        X2        X3        X4        X5 
"numeric" "numeric" "numeric" "numeric" "numeric"
sapply(y, class)
$X1
[1] "numeric"

$X2
[1] "numeric"

$X3
[1] "numeric"

$X4
[1] "numeric"

$X5
[1] "POSIXct" "POSIXt"

In the evaluation above, both inputs are data frames, yet one call returns a character vector and the other returns a list. This kind of unexpected behaviour can cause all kinds of bother. A contributory factor is that the class of an R object is a character vector, which can have more than one element: the POSIXct column has a class of length two, while the numeric columns have a class of length one. Because the results differ in length, sapply cannot simplify them and falls back to a list. We should comment that this problem does not arise in Julia, where every value has a single type that is itself a first-class object rather than a vector of names, so an array of types holds exactly one element per value.
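
The length of the class vector can be inspected directly. The sketch below uses a small hypothetical data frame, df, rather than the tables above. It also shows inherits(), which returns a single logical value per column and so avoids results of differing lengths.

```r
# Hypothetical two-column data frame: one numeric column, one date-time column.
df <- data.frame(value = c(1.5, 2.5))
df$when <- as.POSIXct(c(1, 2), origin = "1970-01-01", tz = "UTC")

class(df$value)  # a character vector of length 1
class(df$when)   # a character vector of length 2: "POSIXct" "POSIXt"

# Number of class names per column; the unequal lengths are what stop
# sapply(df, class) from simplifying to a vector.
class_lengths <- lengths(lapply(df, class))

# inherits() answers a yes/no question, so each result has length 1.
is_datetime <- sapply(df, inherits, what = "POSIXct")
```

Asking whether a column inherits from a class is often closer to the real question than comparing class names.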

A safer thing to do is to use the lapply function:

lapply(x, class)
$X1
[1] "numeric"

$X2
[1] "numeric"

$X3
[1] "numeric"

$X4
[1] "numeric"

$X5
[1] "numeric"

lapply(y, class)
$X1
[1] "numeric"

$X2
[1] "numeric"

$X3
[1] "numeric"

$X4
[1] "numeric"

$X5
[1] "POSIXct" "POSIXt" 

To return vector, matrix, or array types, a safer option is the vapply function, which checks that every value returned has the same type and length as a template supplied through the FUN.VALUE argument, and signals an error otherwise. An example use of vapply:

vapply(rep(4, 4), runif, FUN.VALUE = rep(1, 4))
          [,1]      [,2]       [,3]      [,4]
[1,] 0.7801901 0.6784725 0.08042061 0.7888491
[2,] 0.9243664 0.8080259 0.77640415 0.5587008
[3,] 0.9915787 0.1311069 0.98725875 0.2628835
[4,] 0.3421723 0.1678438 0.35767812 0.1722526
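
Applied to the column-class problem, vapply fails loudly instead of silently changing its return type. The sketch below uses a hypothetical data frame, df, and wraps the failing call in tryCatch so that the script keeps running. The message string is illustrative, not R's own error text.

```r
# Hypothetical data frame with a numeric column and a date-time column.
df <- data.frame(value = c(1.5, 2.5))
df$when <- as.POSIXct(c(1, 2), origin = "1970-01-01", tz = "UTC")

# class() returns two names for the date-time column, so the character(1)
# template is violated and vapply signals an error instead of returning a list.
result <- tryCatch(
  vapply(df, class, FUN.VALUE = character(1)),
  error = function(e) "vapply signalled an error"
)

# Taking only the first class name makes every result length one.
first_class <- vapply(df, function(col) class(col)[1], FUN.VALUE = character(1))

# Zero-length input: sapply falls back to an empty list, vapply keeps the type.
empty_sapply <- sapply(list(), length)
empty_vapply <- vapply(list(), length, FUN.VALUE = integer(1))
```

The zero-length case is worth noting: even a function that always returns one value gives a list from sapply when there is nothing to iterate over, whereas vapply still returns an empty vector of the declared type.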

In summary, when you are iterating over a vector or list and returning arbitrary objects, use lapply; if every item returned is a basic type of the same length, use vapply.

Dr. Chibisi Chima-Okereke, R Training, Statistics and Data Analysis.