前言
有时候数据格式是xlsx的,内部含有多个sheet表格,如果直接转换成csv的话非常费时。如果想批量处理文件(或者装个逼),这样就需要用到R了。
多种方案
使用rio包一次性读取到data list中
> library(rio) > BPD_Data_List <- + import_list("Phytoplankton_BPNS1968-2010_abiotics_biovolumes.xlsx") New names: * `` -> ...3 There were 50 or more warnings (use warnings() to see the first 50) > summary(BPD_Data_List) Length Class Mode values_phyto 13 data.frame list phyto_meta 3 data.frame list biovolumes 4 data.frame list values_abiotic 10 data.frame list abiotic_meta 2 data.frame list samples 8 data.frame list samples_meta 2 data.frame list methods 14 data.frame list methods_meta 2 data.frame list sources 10 data.frame list sources_meta 2 data.frame list
使用tidyverse包一次性读取到data list中
> library(tidyverse) > BPD_Data_List <- + import_list("Phytoplankton_BPNS1968-2010_abiotics_biovolumes.xlsx", setclass = "tbl") New names: * `` -> ...3 There were 50 or more warnings (use warnings() to see the first 50) > summary(BPD_Data_List) Length Class Mode values_phyto 13 tbl_df list phyto_meta 3 tbl_df list biovolumes 4 tbl_df list values_abiotic 10 tbl_df list abiotic_meta 2 tbl_df list samples 8 tbl_df list samples_meta 2 tbl_df list methods 14 tbl_df list methods_meta 2 tbl_df list sources 10 tbl_df list sources_meta 2 tbl_df list
如果每个表格都是统一格式的话,可以直接rbind
data_list <- import_list("Phytoplankton_BPNS1968-2010_abiotics_biovolumes.xlsx", setclass = "tbl", rbind = TRUE)
使用readxl
library(readxl) #定义函数 read_excel_allsheets() read_excel_allsheets <- function(filename, tibble = FALSE) { sheets <- readxl::excel_sheets(filename) x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X)) if(!tibble) x <- lapply(x, as.data.frame) names(x) <- sheets x } #函数使用方法: mysheets <- read_excel_allsheets("foo.xls")