前言
有时候数据格式是xlsx的,内部含有多个sheet表格,如果直接转换成csv的话非常费时。如果想批量处理文件(或者装个逼),这样就需要用到R了。
多种方案
使用rio包一次性读取到data list中
> library(rio)
> BPD_Data_List <-
+ import_list("Phytoplankton_BPNS1968-2010_abiotics_biovolumes.xlsx")
New names:
* `` -> ...3
There were 50 or more warnings (use warnings() to see the first 50)
> summary(BPD_Data_List)
Length Class Mode
values_phyto 13 data.frame list
phyto_meta 3 data.frame list
biovolumes 4 data.frame list
values_abiotic 10 data.frame list
abiotic_meta 2 data.frame list
samples 8 data.frame list
samples_meta 2 data.frame list
methods 14 data.frame list
methods_meta 2 data.frame list
sources 10 data.frame list
sources_meta 2 data.frame list
使用tidyverse包一次性读取到data list中
> library(tidyverse)
> BPD_Data_List <-
+ import_list("Phytoplankton_BPNS1968-2010_abiotics_biovolumes.xlsx", setclass = "tbl")
New names:
* `` -> ...3
There were 50 or more warnings (use warnings() to see the first 50)
> summary(BPD_Data_List)
Length Class Mode
values_phyto 13 tbl_df list
phyto_meta 3 tbl_df list
biovolumes 4 tbl_df list
values_abiotic 10 tbl_df list
abiotic_meta 2 tbl_df list
samples 8 tbl_df list
samples_meta 2 tbl_df list
methods 14 tbl_df list
methods_meta 2 tbl_df list
sources 10 tbl_df list
sources_meta 2 tbl_df list
如果每个表格都是统一格式的话,可以直接rbind
data_list <- import_list("Phytoplankton_BPNS1968-2010_abiotics_biovolumes.xlsx", setclass = "tbl", rbind = TRUE)
使用readxl
library(readxl)
#定义函数 read_excel_allsheets()
read_excel_allsheets <- function(filename, tibble = FALSE) {
sheets <- readxl::excel_sheets(filename)
x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
if(!tibble) x <- lapply(x, as.data.frame)
names(x) <- sheets
x
}
#函数使用方法:
mysheets <- read_excel_allsheets("foo.xls")

