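This is, once again, about sharing the NOAA dataset. I had already tried several approaches, and each ran into its own problems:
- Attempt 1: load the data into MySQL with Python, with heavy table sharding and pagination to speed up queries. The database grew to 3 TB, and I bought a pile of SSDs to speed up I/O
- Problems: too large, slow queries, few concurrent users supported, too expensive, and the data had to be updated by hand or by script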
- ----------
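- Attempt 2: format the data with Python and store it in separate directories laid out as [station]-[date]; each time a user submitted the form, PHP read the files directly and served them as a download
- Problems: still somewhat large, and the data had to be updated by hand or by script (this setup actually ran for quite a while with no other faults, but running the update script every month wore me out)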
- ----------
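- Attempt 3: mount NOAA's FTP service in NextCloud, then share accounts and files
- Problems: NextCloud does not refresh the directory listing, and the sheer number of files made it crash often (this was the production setup right before the one described here and it ran for about half a year; no more manual updates, and I could even sell accounts!)
So, to solve the data-sharing problem, I put together a new approach: [a mirror site built by mounting the FTP server with Rclone and exposing it through Nginx's directory listing]
It had been running for two months when I wrote this, and it has been very stable; unless the server itself dies, it should keep running indefinitely. The one drawback is that there is no user authentication. Password protection in nginx would cover that, but it can wait (keep it up, Xiao Tang; here is his blog: https://ivistang.cloudraft.cn/).
Tutorial:
The machine used here is a Black Friday special from Virmach, codenamed KVM-LA, a box of marginal value (really marginal, and a lot of hassle: https://www.liujason.com/article/118.html). The OS for this demo is CentOS-7-x64, but any distribution works the same way; pick whatever you like.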
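1. Download Rclone [for the shortcut, skip this step and go straight to the one-line installer in 2.2]
Install whatever matches your Linux distribution. My earlier installation notes (https://www.liujason.com/article/244.html) were for Windows, where the goal was mounting OneDrive; Linux works the same way, just download the right build.
Official download page: https://rclone.org/downloads/
I am on amd64 Linux, so: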
curl -O https://downloads.rclone.org/rclone-current-linux-amd64.zip
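If your machine is not amd64, the file name changes with the architecture. The sketch below picks the URL from the output of uname -m. The helper name rclone_zip_url is made up for this example, and the non-amd64 file names are an assumption that they follow the same naming pattern, so check them against the download page before relying on them.

```shell
# Hypothetical helper: map a `uname -m` value to an rclone zip URL.
# Assumption: the 386 and arm64 builds use the same naming pattern as amd64.
rclone_zip_url() {
    machine="${1:-$(uname -m)}"
    case "$machine" in
        x86_64|amd64)  arch=amd64 ;;
        i386|i686)     arch=386 ;;
        aarch64|arm64) arch=arm64 ;;
        *) echo "unsupported architecture: $machine" >&2; return 1 ;;
    esac
    echo "https://downloads.rclone.org/rclone-current-linux-${arch}.zip"
}
```

With that defined, the download becomes curl -O "$(rclone_zip_url)".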
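2. Install it once downloaded (choose either 2.1 or 2.2)
2.1 Unpack it and put it on the PATH:
yum install unzip curl screen -y #a fresh minimal system may lack unzip and curl, so install them first; screen is something I install every time
unzip rclone-current-linux-amd64.zip
cd rclone-*-linux-amd64
#just copy the binary into /usr/bin, and remember to fix ownership and permissions
sudo cp rclone /usr/bin/
sudo chown root:root /usr/bin/rclone
sudo chmod 755 /usr/bin/rclone
#install the man page
sudo mkdir -p /usr/local/share/man/man1
sudo cp rclone.1 /usr/local/share/man/man1/
sudo mandb
2.2 While reading the official site I noticed a one-line installer, so here it is as well
yum install unzip curl -y #a fresh minimal system may lack unzip and curl, so install them first
curl https://rclone.org/install.sh | sudo bash #the official install script, nice and easy
3. Mount the FTP server with rclone
For basic rclone usage see --help; I have pasted its output in the appendix at the end for reference.
What we mainly need here is the mount feature. Note that if you are running in a container rather than a fully virtualized machine, FUSE has to be enabled for the container (in Proxmox, the LXC container must be set to privileged; one way or another, FUSE must be available).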
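A quick way to tell whether FUSE is exposed to the machine or container is to look for the /dev/fuse character device. This is only a minimal sketch: the helper name has_fuse_device is made up here, and the presence of the device does not by itself guarantee that mounting will be permitted.

```shell
# Hypothetical helper: succeed only if the FUSE character device exists.
# The device path defaults to /dev/fuse and can be overridden for testing.
has_fuse_device() {
    dev="${1:-/dev/fuse}"
    [ -c "$dev" ]
}

if has_fuse_device; then
    echo "FUSE device found"
else
    echo "FUSE device missing: enable FUSE for this container first"
fi
```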
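3.1 Configure rclone
Now configure rclone for the FTP mount: run rclone config and follow the prompts. I have annotated every step (normally I would never write this much, but this one is extra thorough so Xiao Tang can learn from it):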
[root@KVM-LA ~]# rclone config
2019/09/12 03:42:06 NOTICE: Config file "/root/.config/rclone/rclone.conf" not found - using defaults
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n #create a new remote
name> noaa #name of the remote
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
......
10 / FTP Connection
   \ "ftp"
......
Storage> 10 #choose FTP
** See help for ftp backend at: https://rclone.org/ftp/ **
FTP host to connect to
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Connect to ftp.example.com
   \ "ftp.example.com"
host> ftp.ncdc.noaa.gov #FTP server address
FTP username, leave blank for current username, root
Enter a string value. Press Enter for the default ("").
user> anonymous #account name
FTP port, leave blank to use default (21)
Enter a string value. Press Enter for the default ("").
port> 21 #port
FTP password
y) Yes type in my own password
g) Generate random password
y/g> y
Enter the password:
password:
Confirm the password:
password: #set the password
Use FTP over TLS (Implicit)
Enter a boolean value (true or false). Press Enter for the default ("false").
tls> #whether to use TLS encryption
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
Remote config
--------------------
[noaa]
type = ftp
host = ftp.ncdc.noaa.gov
user = anonymous
port = 21
pass = *** ENCRYPTED ***
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
noaa                 ftp

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q #for the rest, just fill it in the way I did
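One thing to watch when setting up FTP: for anonymous FTP there are three username/password combinations to try one at a time:
(1) Username: anonymous  Password: your email address
(2) Username: FTP  Password: FTP or empty
(3) Username: USER  Password: pass
NOAA uses the third one here; I tried them one at a time, and only that one worked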
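Trying the combinations by hand gets tedious, so here is a rough sketch that loops over them with rclone lsd against an on-the-fly FTP remote (:ftp: plus the --ftp-host, --ftp-user and --ftp-pass flags). The helper name try_ftp_logins and the placeholder email address are made up for this example; it assumes an rclone version that supports on-the-fly remotes, and it skips the empty-password variant.

```shell
# Hypothetical helper: try each anonymous login and print the first user name that works.
# Usage: try_ftp_logins HOST
try_ftp_logins() {
    host="$1"
    # user:password pairs from the list above; the email address is a placeholder
    for pair in "anonymous:me@example.com" "FTP:FTP" "USER:pass"; do
        user="${pair%%:*}"
        pass="${pair#*:}"
        # rclone expects the FTP password in its obscured form
        obscured="$(rclone obscure "$pass")" || return 1
        if rclone lsd ":ftp:" --ftp-host "$host" --ftp-user "$user" \
                --ftp-pass "$obscured" >/dev/null 2>&1; then
            echo "$user"
            return 0
        fi
    done
    return 1
}
```

For example, try_ftp_logins ftp.ncdc.noaa.gov prints the user name of the first pair the server accepts, or returns non-zero if none is accepted.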
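3.2 Mount rclone into the local filesystem
Just as with mounting a disk using the system mount command, an rclone virtual drive needs a directory created for it first:
sudo -u www mkdir /www/wwwroot/noaa-mirror.cloud.ac.cn/noaa -p
#note: I put the noaa directory under the web root of the noaa-mirror.cloud.ac.cn site (nginx), so that it can later be browsed directly as a sub-path of the site
#also mind the permissions: create the directory as the web server user
Then mount it:
screen #use screen to keep it running in the background
rclone mount noaa:/pub/data/ /www/wwwroot/noaa-mirror.cloud.ac.cn/noaa --read-only --copy-links --no-gzip-encoding --no-check-certificate --allow-other --allow-non-empty --umask 000
#the format is: rclone mount <remote name>:<remote path> <local path> --flags
#note that this mounts a subdirectory of the original NOAA site, because much of the parent directory is not needed
The mount failed with an error for me; the cause turned out to be a missing fuse package, so install it:
failed to mount FUSE fs: fusermount: exec: "fusermount": executable file not found in $PATH
yum install fuse -y
Now mount again, detach from screen, and check the mount: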
[root@KVM-LA ~]# df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/vda1        9.6G  2.3G  6.9G  25% /
devtmpfs         7.8G     0  7.8G   0% /dev
tmpfs            7.8G   16K  7.8G   1% /dev/shm
tmpfs            7.8G   41M  7.8G   1% /run
tmpfs            7.8G     0  7.8G   0% /sys/fs/cgroup
tmpfs            1.6G     0  1.6G   0% /run/user/0
noaa:/pub/data/  1.0P     0  1.0P   0% /www/wwwroot/noaa-mirror.cloud.ac.cn/noaa
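For a script or a cron job, reading df by eye is not practical. The sketch below checks the kernel's mount table instead. The helper name is_mounted is made up for this example; it compares the path literally, so pass it without a trailing slash, and paths containing spaces would need extra handling.

```shell
# Hypothetical helper: succeed if PATH appears as a mount point in the mount table.
# Usage: is_mounted PATH [MOUNTS_FILE]   (MOUNTS_FILE defaults to /proc/mounts)
is_mounted() {
    # field 2 of each /proc/mounts line is the mount point
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

A watchdog could then run is_mounted /www/wwwroot/noaa-mirror.cloud.ac.cn/noaa and remount only when it fails.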
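Then look inside the path; everything is mounted:
cd /www/wwwroot/noaa-mirror.cloud.ac.cn/noaa
ls
.......
One more thing: in my experience, unmounting a FUSE mount the normal way hangs every time, so unmount in lazy mode (umount -l XXX)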
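The unmount advice can be wrapped in a small function that tries a clean FUSE unmount first and only then falls back to the lazy one. This is a sketch, not a tested production script; the helper name unmount_rclone is made up here.

```shell
# Hypothetical helper: try a clean FUSE unmount, then fall back to a lazy unmount.
# Usage: unmount_rclone MOUNTPOINT
unmount_rclone() {
    target="$1"
    if fusermount -u "$target" 2>/dev/null; then
        echo "unmounted cleanly: $target"
    else
        echo "clean unmount failed, detaching lazily: $target"
        umount -l "$target"
    fi
}
```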
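------ That completes mounting the FTP server onto the local disk ------
4. Publish the data to the web through Nginx
4.1 Create a new static website
This step is simple: install via yum, use a one-click package, or use a panel such as AMH or BT Panel (宝塔); I won't go into detail.
4.2 Enable autoindex in Nginx
The site is up, but opening https://noaa-mirror.cloud.ac.cn/noaa/ still gives a 404. That is because autoindex is not enabled: Nginx looks for an index.html file under "/noaa/", fails to find it, and returns an error (404 in my setup; a stock Nginx typically answers 403 for a directory without an index file). The fix is to add the following inside the server block of the site's conf file: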
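location / {
    autoindex on;
    autoindex_exact_size off; #turn off exact sizes so they are shown in units such as MB; otherwise they are shown in bytes Orz
    autoindex_localtime on; #show times in the server's local time zone; otherwise they are shown in GMT
}
Once that is done, restart Nginx and you can see the result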
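A typo in the conf file can take the whole site down on restart, so it may be worth testing the configuration first. A minimal sketch, assuming the nginx binary is on the PATH (panel installs may keep it elsewhere or manage it through their own service commands); the helper name reload_nginx is made up here.

```shell
# Hypothetical helper: reload Nginx only when the configuration test passes.
reload_nginx() {
    if nginx -t; then
        nginx -s reload
    else
        echo "nginx config test failed, not reloading" >&2
        return 1
    fi
}
```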
-------- Appendix: rclone help output --------
[root@KVM-LA ~]# rclone --help

Rclone syncs files to and from cloud storage providers as well as
mounting them, listing them in lots of different ways.

See the home page (https://rclone.org/) for installation, usage,
documentation, changelog and configuration walkthroughs.

Usage:
  rclone [flags]
  rclone [command]

Available Commands:
  about           Get quota information from the remote.
  authorize       Remote authorization.
  cachestats      Print cache stats for a remote
  cat             Concatenates any files and sends them to stdout.
  check           Checks the files in the source and destination match.
  cleanup         Clean up the remote if possible
  config          Enter an interactive configuration session.
  copy            Copy files from source to dest, skipping already copied
  copyto          Copy files from source to dest, skipping already copied
  copyurl         Copy url content to dest.
  cryptcheck      Cryptcheck checks the integrity of a crypted remote.
  cryptdecode     Cryptdecode returns unencrypted file names.
  dbhashsum       Produces a Dropbox hash file for all the objects in the path.
  dedupe          Interactively find duplicate files and delete/rename them.
  delete          Remove the contents of path.
  deletefile      Remove a single file from remote.
  genautocomplete Output completion script for a given shell.
  gendocs         Output markdown docs for rclone to the directory supplied.
  hashsum         Produces an hashsum file for all the objects in the path.
  help            Show help for rclone commands, flags and backends.
  link            Generate public link to file/folder.
  listremotes     List all the remotes in the config file.
  ls              List the objects in the path with size and path.
  lsd             List all directories/containers/buckets in the path.
  lsf             List directories and objects in remote:path formatted for parsing
  lsjson          List directories and objects in the path in JSON format.
  lsl             List the objects in path with modification time, size and path.
  md5sum          Produces an md5sum file for all the objects in the path.
  mkdir           Make the path if it doesn't already exist.
  mount           Mount the remote as file system on a mountpoint.
  move            Move files from source to dest.
  moveto          Move file or directory from source to dest.
  ncdu            Explore a remote with a text based user interface.
  obscure         Obscure password for use in the rclone.conf
  purge           Remove the path and all of its contents.
  rc              Run a command against a running rclone.
  rcat            Copies standard input to file on remote.
  rcd             Run rclone listening to remote control commands only.
  rmdir           Remove the path if empty.
  rmdirs          Remove empty directories under the path.
  serve           Serve a remote over a protocol.
  settier         Changes storage class/tier of objects in remote.
  sha1sum         Produces an sha1sum file for all the objects in the path.
  size            Prints the total size and number of objects in remote:path.
  sync            Make source and dest identical, modifying destination only.
  touch           Create new file or change file modification time.
  tree            List the contents of the remote in a tree like fashion.
  version         Show the version number.

Use "rclone [command] --help" for more information about a command.
Use "rclone help flags" for to see the global flags.
Use "rclone help backends" for a list of supported services.