subNonStandardCharacters {Ecdat}    R Documentation

Substitute nonstandard characters with replacement

Description

Find the first and last characters not in standardCharacters and replace everything from the first through the last with replacement. For example, a string like "Ruben" where "e" carries an accent and is mangled by some software would become something like "Rub_n" using the default values for standardCharacters and replacement.

Usage

subNonStandardCharacters(x,
   standardCharacters=c(letters, LETTERS, ' ','.', ',', 0:9,
      '\"', "\'", '-', '_', '(', ')', '[', ']', '\n'),
   replacement='_',
   gsubList=list(list(pattern='\\\\\\\\|\\\\',
      replacement='\"')),
   ... )

Arguments

x

character vector. In each element, the substring running from the first through the last character not in standardCharacters is replaced by replacement.

standardCharacters

a character vector of acceptable characters to keep.

replacement

a character to replace the substring starting and ending with characters not in standardCharacters.

gsubList

list of lists, each holding the pattern and replacement arguments for one call to gsub. The calls are made in succession before looking for characters not in standardCharacters.

...

optional arguments passed to strsplit
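
As a quick check of what the default gsubList entry does, the short sketch below applies the same pattern and replacement with base R's gsub directly. It does not call subNonStandardCharacters, and the variable names (pattern, x, cleaned) are chosen here for illustration only.

```r
# Default pattern from the Usage section: as a regular expression it
# matches two consecutive backslashes, or else a single backslash.
pattern <- "\\\\\\\\|\\\\"

# One backslash before Bobby, two after (R string literals double them).
x <- "Robert C. \\Bobby\\\\"

# Each match is replaced by a double quote.
cleaned <- gsub(pattern, "\"", x)
cat(cleaned, "\n")
```

Because a backslash is not in the default standardCharacters, converting it first keeps stray escape characters from being treated as the start or end of a nonstandard substring.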

Details

1. for(il in 1:length(gsubList)) x <- gsub(gsubList[[il]][["pattern"]], gsubList[[il]][["replacement"]], x)

2. nx <- length(x)

3. x. <- strsplit(x, "", ...)

4. for(ix in 1:nx) find the first and last characters in x.[[ix]] that are not in standardCharacters and substitute replacement for everything from the first through the last.
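
The four steps above can be sketched in base R as follows. This is a minimal illustration written from the description on this page, not the Ecdat source: the function name subNonStdSketch and its internal variable names are hypothetical, and edge cases such as NA or empty strings are not handled.

```r
# Hypothetical re-implementation for illustration; not the Ecdat code.
subNonStdSketch <- function(x,
    standardCharacters = c(letters, LETTERS, " ", ".", ",", 0:9,
        "\"", "'", "-", "_", "(", ")", "[", "]", "\n"),
    replacement = "_",
    gsubList = list(list(pattern = "\\\\\\\\|\\\\",
        replacement = "\"")),
    ...) {
  # Step 1: apply each pattern/replacement pair in succession.
  for (g in gsubList) {
    x <- gsub(g[["pattern"]], g[["replacement"]], x)
  }
  # Steps 2-3: split every element into single characters.
  chars <- strsplit(x, "", ...)
  # Step 4: replace the span from the first through the last
  # character that is not in standardCharacters.
  for (i in seq_along(chars)) {
    ch <- chars[[i]]
    bad <- which(!(ch %in% standardCharacters))
    if (length(bad) > 0) {
      first <- min(bad)
      last <- max(bad)
      before <- ch[seq_len(first - 1)]
      after <- ch[seq_len(length(ch) - last) + last]
      x[i] <- paste(c(before, replacement, after), collapse = "")
    }
  }
  x
}

# Two nonstandard characters: the whole span between them is replaced.
sketchResult <- subNonStdSketch("a`bc`d")
```

Note that standard characters lying between the first and last nonstandard character are also replaced, so the whole span collapses to a single replacement.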

Value

a character vector with everything from the first through the last character not in standardCharacters replaced by replacement.

Author(s)

Spencer Graves

See Also

sub, strsplit, grepNonStandardCharacters, subNonStandardNames, encoded_text_to_latex

Examples

# Consider Names = Ruben, Avila and Jose, where "e" and "A" in
#    these examples carry an accent.  With the default values
#    for standardCharacters and replacement, these would become
#    Rub_n, _vila, and Jos_.
#    (The standard checks for R packages complain about
#    non-standard characters, so none are included here.)
#
Names <- c('Ra`l', 'Ra`', '`l', 'Torres, Raul',
           "Robert C. \\Bobby\\\\")
#  confusion in character sets can create
#  names like Names[2]
Name2 <- subNonStandardCharacters(Names)

Name2. <- c('Ra_l', 'Ra_', '_l', Names[4],
            'Robert C. "Bobby"')


all.equal(Name2, Name2.)



[Package Ecdat version 0.2-4 Index]