大数据技术之Sqoop

鸡汤: 选一种姿态,让自己活得无可替代,没有所谓的运气,只有绝对的努力!

1. 介绍

http://www.apache.org/dyn/closer.lua/sqoop/

   Sqoop是一款开源的工具,主要用于在Hadoop(Hive)与传统的数据库(mysql、postgresql...)间进行数据的传递,可以将一个关系型数据库(例如 : MySQL ,Oracle ,Postgres等)中的数据导进到Hadoop的HDFS中,也可以将HDFS的数据导进到关系型数据库中。

   Sqoop项目开始于2009年,最早是作为Hadoop的一个第三方模块存在,后来为了让使用者能够快速部署,也为了让开发人员能够更快速的迭代开发,Sqoop独立成为一个Apache项目。

   Sqoop2的最新版本是1.99.7。请注意,2与1不兼容,且特征不完整,它并不打算用于生产部署。

2. 作用

   将导入或导出命令翻译成mapreduce程序来实现。在翻译出的mapreduce中主要是对inputformat和outputformat进行定制。

3. 安装

3.1 环境

   安装Sqoop的前提是已经具备Java和Hadoop的环境。

wget https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
tar xf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/
ln -sf /usr/local/sqoop-1.4.7.bin__hadoop-2.6.0 /usr/local/sqoop 
详细步骤
[root@master ~]# https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz^C
[root@master ~]# cd /usr/local/src/
[root@master src]# wget https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
--2019-08-08 11:08:01--  https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
Resolving mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)... 101.6.8.193, 2402:f000:1:408:8100::1
Connecting to mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)|101.6.8.193|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17953604 (17M) [application/octet-stream]
Saving to: ‘sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz’

100%[===================================================================================>] 17,953,604  8.92MB/s   in 1.9s   

2019-08-08 11:08:08 (8.92 MB/s) - ‘sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz’ saved [17953604/17953604]

[root@master src]# tar xf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/
[root@master src]# ln -sf /usr/local/sqoop-1.4.7.bin__hadoop-2.6.0 /usr/local/sqoop

3.2 修改配置文件

   Sqoop的配置文件与大多数大数据框架类似,在sqoop根目录下的conf目录中,和拷贝jdbc驱动到sqoop的lib目录下。

cd /usr/local/sqoop/conf
cp sqoop-env-template.sh sqoop-env.sh
cat>>sqoop-env.sh<<EOF
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zookeeper
export ZOOCFGDIR=/usr/local/zookeeper
export HBASE_HOME=/usr/local/hbase
EOF
cd /usr/local/src/
tar xf mysql-connector-java-5.1.46.tar.gz
cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46.jar
cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46.jar /usr/local/sqoop/lib/

详细操作
[root@master src]# cd /usr/local/sqoop
[root@master sqoop]# ls
bin        CHANGELOG.txt  conf  ivy      lib          NOTICE.txt   README.txt       sqoop-patch-review.py  src
build.xml  COMPILING.txt  docs  ivy.xml  LICENSE.txt  pom-old.xml  sqoop-1.4.7.jar  sqoop-test-1.4.7.jar   testdata
[root@master sqoop]# cd conf/
[root@master conf]# ls
oraoop-site-template.xml  sqoop-env-template.cmd  sqoop-env-template.sh  sqoop-site-template.xml  sqoop-site.xml
[root@master conf]# cp sqoop-env-template.sh sqoop-env.sh
[root@master conf]# vim sqoop-env.sh 
[root@master conf]# egrep -v '#|^$' sqoop-env.sh 
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zookeeper
export ZOOCFGDIR=/usr/local/zookeeper
export HBASE_HOME=/usr/local/hbase
[root@master conf]# cd /usr/local/src/
[root@master src]# ls
apache-hive-1.2.2-bin.tar.gz   jdk-8u172-linux-x64.tar.gz          sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
apache-storm-0.9.3.master.tgz  mysql-connector-java-5.1.46         tmp
hadoop-2.6.5.tar.gz            mysql-connector-java-5.1.46.tar.gz  zookeeper-3.4.5.tar.gz
hbase                          scala-2.12.4.tgz
hbase-1.3.1-bin.tar.gz         spark-1.6.3-bin-hadoop2.6.tgz
[root@master src]# tar xf mysql-connector-java-5.1.46.tar.gz
[root@master src]# cp mysql-connector-java-5.1.46/
build.xml                            mysql-connector-java-5.1.46-bin.jar  README.txt
CHANGES                              mysql-connector-java-5.1.46.jar      src/
COPYING                              README                               
[root@master src]# cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46.jar /usr/local/sqoop/lib/

3.3 配置环境变量

[root@master bin]# tail -2 /etc/profile
export SQOOP_HOME=/usr/local/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
[root@master bin]# source /etc/profile
[root@master bin]# which sqoop
/usr/local/sqoop/bin/sqoop

3.3 测试sqoop

[root@master bin]# sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password root
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/08/08 11:34:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/08/08 11:34:48 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/08/08 11:34:48 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Thu Aug 08 11:34:49 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL con
nection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.information_schema
cc
hive
mysql
performance_schema
sys

sqoop可以连接到mysql了。