大数据技术之Sqoop
鸡汤: 选一种姿态,让自己活得无可替代,没有所谓的运气,只有绝对的努力!
1. 介绍¶
http://www.apache.org/dyn/closer.lua/sqoop/
Sqoop是一款开源的工具,主要用于在Hadoop(Hive)与传统的数据库(mysql、postgresql...)间进行数据的传递,可以将一个关系型数据库(例如 : MySQL ,Oracle ,Postgres等)中的数据导进到Hadoop的HDFS中,也可以将HDFS的数据导进到关系型数据库中。
Sqoop项目开始于2009年,最早是作为Hadoop的一个第三方模块存在,后来为了让使用者能够快速部署,也为了让开发人员能够更快速的迭代开发,Sqoop独立成为一个Apache项目。
Sqoop2的最新版本是1.99.7。请注意,2与1不兼容,且特征不完整,它并不打算用于生产部署。
2. 作用¶
将导入或导出命令翻译成mapreduce程序来实现。在翻译出的mapreduce中主要是对inputformat和outputformat进行定制。
3. 安装¶
3.1 环境¶
安装Sqoop的前提是已经具备Java和Hadoop的环境。
wget https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz tar xf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/ ln -sf /usr/local/sqoop-1.4.7.bin__hadoop-2.6.0 /usr/local/sqoop
详细步骤
[root@master ~]# https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz^C [root@master ~]# cd /usr/local/src/ [root@master src]# wget https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz --2019-08-08 11:08:01-- https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz Resolving mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)... 101.6.8.193, 2402:f000:1:408:8100::1 Connecting to mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)|101.6.8.193|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 17953604 (17M) [application/octet-stream] Saving to: ‘sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz’ 100%[===================================================================================>] 17,953,604 8.92MB/s in 1.9s 2019-08-08 11:08:08 (8.92 MB/s) - ‘sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz’ saved [17953604/17953604] [root@master src]# tar xf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/ [root@master src]# ln -sf /usr/local/sqoop-1.4.7.bin__hadoop-2.6.0 /usr/local/sqoop
3.2 修改配置文件¶
Sqoop的配置文件与大多数大数据框架类似,在sqoop根目录下的conf目录中,和拷贝jdbc驱动到sqoop的lib目录下。
cd /usr/local/sqoop/conf cp sqoop-env-template.sh sqoop-env.sh cat>>sqoop-env.sh<<EOF export HADOOP_COMMON_HOME=/usr/local/hadoop export HADOOP_MAPRED_HOME=/usr/local/hadoop export HIVE_HOME=/usr/local/hive export ZOOKEEPER_HOME=/usr/local/zookeeper export ZOOCFGDIR=/usr/local/zookeeper export HBASE_HOME=/usr/local/hbase EOF cd /usr/local/src/ tar xf mysql-connector-java-5.1.46.tar.gz cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46.jar cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46.jar /usr/local/sqoop/lib/
详细操作
[root@master src]# cd /usr/local/sqoop [root@master sqoop]# ls bin CHANGELOG.txt conf ivy lib NOTICE.txt README.txt sqoop-patch-review.py src build.xml COMPILING.txt docs ivy.xml LICENSE.txt pom-old.xml sqoop-1.4.7.jar sqoop-test-1.4.7.jar testdata [root@master sqoop]# cd conf/ [root@master conf]# ls oraoop-site-template.xml sqoop-env-template.cmd sqoop-env-template.sh sqoop-site-template.xml sqoop-site.xml [root@master conf]# cp sqoop-env-template.sh sqoop-env.sh [root@master conf]# vim sqoop-env.sh [root@master conf]# egrep -v '#|^$' sqoop-env.sh export HADOOP_COMMON_HOME=/usr/local/hadoop export HADOOP_MAPRED_HOME=/usr/local/hadoop export HIVE_HOME=/usr/local/hive export ZOOKEEPER_HOME=/usr/local/zookeeper export ZOOCFGDIR=/usr/local/zookeeper export HBASE_HOME=/usr/local/hbase [root@master conf]# cd /usr/local/src/ [root@master src]# ls apache-hive-1.2.2-bin.tar.gz jdk-8u172-linux-x64.tar.gz sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz apache-storm-0.9.3.master.tgz mysql-connector-java-5.1.46 tmp hadoop-2.6.5.tar.gz mysql-connector-java-5.1.46.tar.gz zookeeper-3.4.5.tar.gz hbase scala-2.12.4.tgz hbase-1.3.1-bin.tar.gz spark-1.6.3-bin-hadoop2.6.tgz [root@master src]# tar xf mysql-connector-java-5.1.46.tar.gz [root@master src]# cp mysql-connector-java-5.1.46/ build.xml mysql-connector-java-5.1.46-bin.jar README.txt CHANGES mysql-connector-java-5.1.46.jar src/ COPYING README [root@master src]# cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46.jar /usr/local/sqoop/lib/
3.3 配置环境变量¶
[root@master bin]# tail -2 /etc/profile export SQOOP_HOME=/usr/local/sqoop export PATH=$SQOOP_HOME/bin:$PATH [root@master bin]# source /etc/profile [root@master bin]# which sqoop /usr/local/sqoop/bin/sqoop
3.3 测试sqoop¶
[root@master bin]# sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password root Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation. Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 19/08/08 11:34:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 19/08/08 11:34:48 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 19/08/08 11:34:48 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. Thu Aug 08 11:34:49 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL con nection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.information_schema cc hive mysql performance_schema sys
sqoop可以连接到mysql了。