Finding things in files

Active Analytics Ltd: posted 5 December 2014 10:26 PST by Chibisi Chima-Okereke


Last time we looked at using the profr package to obtain the call stack for an R function. If you have inherited some code, or want to understand the internals of a package from CRAN, a suitable profiler will point you to the function(s) you are interested in. After that, it helps to be able to find things in files without opening dozens of them and manually paging through for what you want. Here we present some simple R code based around the grep and list.files functions. The code is located in our GitHub repository.

Code Analysis

First we need to find a search pattern in a file and return the line numbers where it occurs, along with the content of those lines. Below is our workhorse, the findInFile function. It searches a file for a pattern and returns a data frame with the file path, the line number of each match, and the content of the matching line. If nothing matches, it returns a single row with Line set to 0. The case argument sets whether the search is case sensitive, and it defaults to TRUE.

findInFile <- function(fPath, pattern, case = TRUE){
  # Read the file as a character vector, one element per line.
  # blank.lines.skip = FALSE keeps empty lines so indices match line numbers.
  fString <- tryCatch(scan(fPath, what = character(), sep = "\n", quiet = TRUE,
                           blank.lines.skip = FALSE),
                      error = function(e) e, warning = function(w) w)
  # A missing or unreadable file raises a warning or an error; report either
  if(inherits(fString, "condition"))
    return(data.frame(File = fPath, Line = 0, Content = "Error, File Not Found!"))
  # Find the matching lines; ignore.case handles the case-insensitive search
  locs <- grep(pattern, fString, ignore.case = !case)
  # Placeholder row for files with no match
  output <- data.frame(File = fPath, Line = 0, Content = NA)
  if(length(locs) > 0)
    output <- data.frame(File = fPath, Line = locs, Content = fString[locs])
  return(output)
}

# Usage: Searching a file for the pattern "someFunction"
# findInFile("path/to/file", "someFunction")
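To see the underlying idea in isolation, here is a minimal sketch that writes a small throwaway file and searches it with grep. The file contents and the names demoFile, fLines, hitsCase, hitsNoCase and demoResult are made up for illustration, and it uses readLines rather than scan to read the file, so it is not the findInFile function itself.

```r
# Hypothetical demo file; the contents are invented for illustration
demoFile <- tempfile(fileext = ".R")
writeLines(c("x <- 1",
             "",
             "y <- someFunction(x)",
             "z <- SOMEFUNCTION(y)"), demoFile)

# One element per line; blank lines are kept, so indices are line numbers
fLines <- readLines(demoFile)

# Case-sensitive search: grep returns the indices of the matching lines
hitsCase <- grep("someFunction", fLines)

# Case-insensitive search via grep's ignore.case argument
hitsNoCase <- grep("someFunction", fLines, ignore.case = TRUE)

# Same shape as the data frame described above: file, line number, content
demoResult <- data.frame(File = demoFile, Line = hitsNoCase,
                         Content = fLines[hitsNoCase])
print(demoResult)

# Clean up the temporary file
unlink(demoFile)
```

Because the blank second line is kept, the reported indices agree with the line numbers you would see in an editor.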

Next we build on this function allowing us to search a list or character vector of file paths:

findInFiles <- function(fPaths, pattern, case = TRUE){
  # Search each file in turn; each call returns a data frame
  output <- lapply(fPaths, findInFile, pattern = pattern, case = case)
  # Stack the per-file results into a single data frame
  return(do.call(rbind, output))
}

# Usage: Searching a vector or list of files for the search pattern "someFunction"
# findInFiles(c("vector", "or", "list", "of", "file", "paths"), "someFunction")
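The step of combining one result per file into a single table can be sketched on its own. In the example below the "files" are character vectors held in memory, and searchOne is a hypothetical stand-in for a per-file search, so nothing here touches the disk. All names and data are assumptions for illustration.

```r
# Hypothetical stand-in for a per-file search: one small data frame per input
searchOne <- function(fLines, fName, pattern){
  locs <- grep(pattern, fLines)
  if(length(locs) == 0)
    return(data.frame(File = fName, Line = 0, Content = NA_character_))
  data.frame(File = fName, Line = locs, Content = fLines[locs])
}

# Made-up "files" held in memory as character vectors
files <- list(a.R = c("foo()", "bar()"),
              b.R = c("baz()"),
              c.R = c("foo(1)", "x", "foo(2)"))

# One data frame per file, then stacked row-wise into a single data frame
perFile <- Map(searchOne, files, names(files), MoreArgs = list(pattern = "foo"))
combined <- do.call(rbind, perFile)
rownames(combined) <- NULL

# Drop the placeholder rows (Line == 0) for files with no match
hitsOnly <- combined[combined$Line > 0, ]
print(hitsOnly)
```

Keeping a placeholder row for files with no match makes it easy to see which files were searched. Filtering on Line > 0 afterwards leaves only the real hits.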

Next we extend this to search the files in a folder, optionally restricted to file names that match a regular expression, fpattern:

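findInFolder <- function(fPath, pattern, fpattern = NULL, case = TRUE){
  # List the files in the folder (hidden ones included) whose names match fpattern;
  # no.. = TRUE leaves out the "." and ".." directory entries
  fPaths <- list.files(fPath, fpattern, all.files = TRUE, full.names = TRUE, no.. = TRUE)
  return(findInFiles(fPaths, pattern, case))
}
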
# Usage: Searching the files in a folder with file names ending with ".r" or ".R" for the search pattern "someFunction" ...
# findInFolder("path/to/folder", "someFunction", "[.][rR]$")
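The file name filter is an ordinary regular expression passed to list.files. Here is a minimal sketch of how "[.][rR]$" selects files, using a throwaway folder. The folder and file names are invented for illustration.

```r
# Hypothetical demo folder with a mix of file types
demoDir <- file.path(tempdir(), "findInFolderDemo")
dir.create(demoDir, showWarnings = FALSE)
file.create(file.path(demoDir, c("a.R", "b.r", "notes.txt", "c.Rmd")))

# "[.][rR]$" matches names ending in ".r" or ".R"; "c.Rmd" is excluded
# because the $ anchors the match to the end of the name
rFiles <- list.files(demoDir, pattern = "[.][rR]$", full.names = TRUE)
print(basename(rFiles))

# recursive = TRUE would also descend into subfolders
allRFiles <- list.files(demoDir, pattern = "[.][rR]$", full.names = TRUE,
                        recursive = TRUE)

# Clean up the temporary folder
unlink(demoDir, recursive = TRUE)
```

The findInFolder function above does not search subfolders. Passing recursive = TRUE to list.files, as in the last call here, is one way to extend it.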

That's pretty much it. Now you can search for terms in files/folders and locate code of interest to you.


Dr. Chibisi Chima-Okereke, R Training, Statistics and Data Analysis.