readNamedRegion {XLConnect} | R Documentation |
Reading named regions from a workbook
Description
Reads named regions from a workbook
.
Usage
## S4 method for signature 'workbook'
readNamedRegion(object, name, header, rownames, colTypes, forceConversion,
dateTimeFormat, check.names, useCachedValues, keep, drop, simplify, readStrategy)
Arguments
object |
The workbook to use
|
name |
The name of the named region to read
|
header |
The argument header specifies if the first row
should be interpreted as column names. The default value is TRUE .
|
rownames |
Index (numeric ) or name (character ) of column that
should be used as row names. The corresponding column will be removed from the data
set. Defaults to NULL which means that no row names are applied. Row names must be either
integer or character . Non-numeric columns will be coerced to character .
|
colTypes |
Column types to use when reading in the data. Specified as a character
vector of the corresponding type names (see XLC ; XLC$DATA_TYPE.<?> ). You
may also use R class names such as numeric , character , logical and POSIXt .
The types are applied in the given order to the columns - elements are recycled if necessary.
Defaults to character(0) meaning that column types are determined automatically (see the
Note section for more information).
By default, type conversions are only applied if the specified column type is a more generic type
(e.g. from Numeric to String) - otherwise NA is returned. The forceConversion flag can be
set to force conversion into less generic types where possible.
|
forceConversion |
logical specifying if conversions to less generic types should be
forced. Defaults to FALSE meaning that if a column is specified
to be of a certain type via the colTypes argument and a more generic type is detected in
the column, then NA will be returned (example: column is specified to be DateTime but a
more generic String is found). Specifying forceConversion = TRUE will try to enforce a
conversion - if it succeeds the corresponding (converted) value will be returned, otherwise
NA . See the Note section for some additional information.
|
dateTimeFormat |
Date/time format used when doing date/time conversions. Defaults to
getOption("XLConnect.dateTimeFormat") . This should be a POSIX format specifier according
to strptime although not all specifications have been implemented yet - the most
important ones however are available.
|
check.names |
logical specifying if column names of the resulting data.frame
should be checked to ensure that they are syntactically valid valid variable names and are not
duplicated. See the check.names argument of data.frame . Defaults to TRUE .
|
useCachedValues |
logical specifying whether to read cached formula results from the
workbook instead of re-evaluating them. This is particularly helpful in cases for reading data
produced by Excel features not supported in XLConnect like references to external workbooks.
Defaults to FALSE , which means that formulas will be evaluated by XLConnect.
|
keep |
List of column names or indices to be kept in the output data frame.
It is possible to specify either keep or drop , but not both at the same time.
Defaults to NULL . If a vector is passed as argument, it will be wrapped into a list.
This list gets replicated to match the length of the other arguments.
Example: if name = c("NamedRegion1", "NamedRegion2", "NamedRegion3") and keep = c(1,2) , keep will be
internally converted into list(c(1,2)) and then replicated to match the number of named regions, i.e.
keep = list(c(1,2), c(1,2), c(1,2)) . The result is that the first two columns of each named region
are kept. If keep = list(1,2) is specified, it will be replicated as list(1,2,1) , i.e. respectively
the first, second and first column of the named regions "NamedRegion1", "NamedRegion2", "NamedRegion3"
will be kept.
|
drop |
List of column names or indices to be dropped in the output data frame.
It is possible to specify either keep or drop , but not both at the same time.
Defaults to NULL . If a vector is passed as argument, it will be wrapped into a list.
This list gets replicated to match the length of the other arguments.
Example: if name = c("NamedRegion1", "NamedRegion2", "NamedRegion3") and drop = c(1,2) , drop will be
internally converted into list(c(1,2)) and then replicated to match the number of named regions, i.e.
drop = list(c(1,2), c(1,2), c(1,2)) . The result is that the first two columns of each named region
are dropped. If drop = list(1,2) is specified, it will be replicated as list(1,2,1) , i.e. respectively
the first, second and first column of the named regions "NamedRegion1", "NamedRegion2", "NamedRegion3"
will be dropped.
|
simplify |
logical specifying if the result should be simplified, e.g. in case the data.frame
would only have one row or one column (and data types match). Simplifying here is identical to calling unlist
on the otherwise resulting data.frame (using use.names = FALSE ). The default is FALSE .
|
readStrategy |
character specifying the reading strategy to use. Currently supported strategies are:
-
"default" (default): Can handle all supported data types incl. date/time values and can deal
directly with missing value identifiers (see setMissingValue )
-
"fast" : Increased read performance. Date/time values are read as numeric (number of days since 1900-01-01;
fractional days represent hours, minutes, and seconds) and only blank cells are recognized as missing (missing value
identifiers as set in setMissingValue are ignored)
|
Details
The arguments name
and header
are vectorized. As such,
multiple named regions can be read with one method call. If only one
single named region is read, the return value is a data.frame
.If
multiple named regions are specified, the return value is a (named)
list
of data.frame
's returned in the order they have been
specified with the argument name
.
Note
If no specific column types (see argument colTypes
) are specified,
readNamedRegion
tries to determine the resulting column types
based on the read cell types. If different cell types are found in a
specific column, the most general of those is used and mapped to the
corresponding R data type. The order of data types from least to most
general is Boolean (logical
) < DateTime (POSIXct
) <
Numeric (numeric
) < String (character
). E.g. if a column
is read that contains cells of type Boolean, Numeric and String then the
resulting column in R would be character
since character
is the most general type.
Some additional information with respect to forcing data type conversion
using forceConversion = TRUE
:
Forcing conversion from String to Boolean: TRUE
is returned
if and only if the target string is "true" (ignoring any capitalization).
Any other string will return FALSE
.
Forcing conversion from Numeric to DateTime: since Excel understands
Dates/Times as Numerics with some additional formatting, a conversion from
a Numeric to a DateTime is actually possible. Numerics in this case represent
the number of days since 1900-01-01. Fractional days represent hours, minutes,
and seconds.
Author(s)
Martin Studer
Thomas Themel
Nicola Lambiase
Mirai Solutions GmbH http://www.mirai-solutions.com
References
What are named regions/ranges?
http://www.officearticles.com/excel/named_ranges_in_microsoft_excel.htm
How to create named regions/ranges?
http://www.youtube.com/watch?v=iAE9a0uRtpM
See Also
workbook
,
readWorksheet
,
writeNamedRegion
,
writeWorksheet
,
readNamedRegionFromFile
,
readTable
,
onErrorCell
Examples
## Example 1:
# mtcars xlsx file from demoFiles subfolder of package XLConnect
demoExcelFile <- system.file("demoFiles/mtcars.xlsx", package = "XLConnect")
# Load workbook
wb <- loadWorkbook(demoExcelFile)
# Read named region 'mtcars' (with default header = TRUE)
data <- readNamedRegion(wb, name = "mtcars")
## Example 2;
# conversion xlsx file from demoFiles subfolder of package XLConnect
excelFile <- system.file("demoFiles/conversion.xlsx", package = "XLConnect")
# Load workbook
wb <- loadWorkbook(excelFile)
# Read named region 'conversion' with pre-specified column types
# Note: in the worksheet all data was entered as strings!
# forceConversion = TRUE is used to force conversion from String
# into the less generic data types Numeric, DateTime & Boolean
df <- readNamedRegion(wb, name = "conversion", header = TRUE,
colTypes = c(XLC$DATA_TYPE.NUMERIC,
XLC$DATA_TYPE.DATETIME,
XLC$DATA_TYPE.BOOLEAN),
forceConversion = TRUE,
dateTimeFormat = "%Y-%m-%d %H:%M:%S")
## Example 3:
# mtcars xlsx file from demoFiles subfolder of package XLConnect
demoExcelFile <- system.file("demoFiles/mtcars.xlsx", package = "XLConnect")
# Load workbook
wb <- loadWorkbook(demoExcelFile)
# Read the columns 1, 3 and 5 of the named region 'mtcars' (with default header = TRUE)
data <- readNamedRegion(wb, name = "mtcars", keep=c(1,3,5))
[Package
XLConnect version 0.2-7
Index]