Deploying Zeppelin, an Interactive Development Platform for Spark
Spark
2019-06-15
I set up Zeppelin mainly in order to use SparkR. The process is described below, for your reference.
Installation
Installing the build environment
yum install git
yum install java-1.8.0-openjdk-devel
yum install nodejs npm
Install Maven. Ensure node is installed by running node --version, and check that Maven is version 3.1.x or higher with mvn -version. Configure Maven to use more memory than usual:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024m"
wget http://www.eu.apache.org/dist/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
sudo tar -zxf apache-maven-3.3.9-bin.tar.gz -C /usr/local/
sudo ln -s /usr/local/apache-maven-3.3.9/bin/mvn /usr/local/bin/mvn
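The prerequisite checks above can be scripted. The sketch below hardcodes sample version strings (in practice they would come from `node --version` and `mvn -version`) and compares them against the documented minimums using version sort:

```shell
#!/bin/sh
# Sketch of the build-prerequisite checks; sample versions are hardcoded here.
node_ver="v6.9.1"   # e.g. "$(node --version)"
mvn_ver="3.3.9"     # e.g. "$(mvn -version | awk 'NR==1 {print $3}')"

# meets ACTUAL MINIMUM: succeeds when ACTUAL >= MINIMUM.
# sort -V orders version strings numerically; if MINIMUM sorts first, we are OK.
meets() { [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]; }

meets "${node_ver#v}" "0.12" && echo "node OK"
meets "$mvn_ver" "3.1" && echo "maven OK"
```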
Download the Zeppelin source
git clone https://github.com/apache/zeppelin.git
cd zeppelin
git checkout branch-0.6 # switch to the latest release branch
git pull # make sure the code is up to date
Alternatively, download the official source tarball directly
wget http://apache.fayea.com/zeppelin/zeppelin-0.6.1/zeppelin-0.6.1.tgz
Build (see the official Build documentation for what each option means)
First confirm the Hadoop and Spark versions
hadoop version
spark-shell --version
If your git version is too old, you can use the WANDisco repository
yum install http://opensource.wandisco.com/centos/6/git/x86_64/wandisco-git-release-6-1.noarch.rpm
Start the build
mvn clean package -DskipTests -Pspark-1.6 -Dspark.version=1.6.0 -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.8.0 -Pscala-2.10 -Pr -Pvendor-repo -Pbuild-distr -Pyarn -Ppyspark -Psparkr
Maven command to build Zeppelin for YARN (all Spark queries are then tracked in the YARN history):
mvn clean package -Pspark-1.6 -Pr -Ppyspark -Dhadoop.version=2.6.0-cdh5.8.3 -Phadoop-2.6 -Pyarn -DskipTests
Deploying Zeppelin
Extract Zeppelin into the target directory
tar zxf zeppelin-0.6.2-SNAPSHOT.tar.gz -C /opt/
mv /opt/zeppelin-0.6.2-SNAPSHOT /opt/zeppelin
Configuring Zeppelin
mkdir /etc/zeppelin
mv /opt/zeppelin/conf /etc/zeppelin/conf
cd /opt/zeppelin
ln -s /etc/zeppelin/conf conf
cd /etc/zeppelin/conf
cp zeppelin-env.sh{.template,}
cp zeppelin-site.xml{.template,}
Edit the zeppelin-env.sh file
export ZEPPELIN_JAVA_OPTS="-Dmaster=yarn-client -Dspark.yarn.jar=/opt/zeppelin/interpreter/spark/zeppelin-spark_2.10-0.6.2-SNAPSHOT.jar"
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
if [ -n "$HADOOP_HOME" ]; then
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
export ZEPPELIN_LOG_DIR=/var/log/zeppelin
export ZEPPELIN_PID_DIR=/var/run/zeppelin
export ZEPPELIN_WAR_TEMPDIR=/var/tmp/zeppelin
Create the corresponding directories
mkdir /var/log/zeppelin
mkdir /var/run/zeppelin
mkdir /var/tmp/zeppelin
Create a dedicated user for Zeppelin and fix ownership of the relevant paths
useradd zeppelin
chown -R zeppelin:zeppelin /opt/zeppelin/notebook
chown zeppelin:zeppelin /etc/zeppelin/conf/interpreter.json
chown -R zeppelin:zeppelin /var/log/zeppelin
chown -R zeppelin:zeppelin /var/run/zeppelin
chown -R zeppelin:zeppelin /var/tmp/zeppelin
Create an HDFS home directory for the user
su hdfs
hadoop fs -mkdir /user/zeppelin
hadoop fs -chmod 777 /user/zeppelin
Startup
bin/zeppelin-daemon.sh start
Zeppelin is served on port 8080 by default; this can be changed in conf/zeppelin-env.sh or conf/zeppelin-site.xml.
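For example, to move the web UI to another port you could set zeppelin.server.port in conf/zeppelin-site.xml (a sketch; 8081 is an arbitrary choice), or equivalently export ZEPPELIN_PORT in conf/zeppelin-env.sh:

```xml
<property>
  <name>zeppelin.server.port</name>
  <value>8081</value>
  <description>Server port for the Zeppelin web UI.</description>
</property>
```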
Configuring the Spark interpreter
Edit the zeppelin-env.sh file
export JAVA_HOME=/usr/java/jdk1.8.0_60
export MASTER=yarn-client
export SPARK_HOME=/var/lib/hadoop-hdfs/spark-2.0.2-bin-hadoop2.6
export HADOOP_CONF_DIR=/etc/hadoop/conf
Default parameters are read from SPARK_HOME/conf/spark-defaults.conf
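For instance, common defaults such as executor sizing or the YARN queue can live in that file rather than in zeppelin-env.sh (illustrative values, not from the original setup):

```properties
# SPARK_HOME/conf/spark-defaults.conf -- picked up by the Zeppelin Spark interpreter
spark.executor.memory   2g
spark.executor.cores    2
spark.yarn.queue        default
```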
Adding external dependencies
export SPARK_SUBMIT_OPTIONS="--jars /usr/install/libs/mysql-connector-java-5.1.34.jar,/usr/install/spark/lib/influxdbSink-byHost.jar"
Troubleshooting
1.com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope) at [Source: {"id":"0","name":"parallelize"}; line: 1, column: 1]
This is most likely a Jackson version conflict; delete the following files from Zeppelin
rm zeppelin-server/target/lib/jackson-*
rm zeppelin-zengine/target/lib/jackson-*
Related links: Could not find creator property with name ... with embedded spark binaries com.fasterxml.jackson.databind.JsonMappingException
If you are using a binary package, simply delete the old jars and replace them
rm lib/jackson-core-2.5.3.jar
rm lib/jackson-annotations-2.5.0.jar
rm lib/jackson-databind-2.5.3.jar
cp /var/lib/hadoop-hdfs/spark-2.0.2-bin-hadoop2.6/jars/jackson-core-2.6.5.jar ./lib/
cp /var/lib/hadoop-hdfs/spark-2.0.2-bin-hadoop2.6/jars/jackson-annotations-2.6.5.jar ./lib/
cp /var/lib/hadoop-hdfs/spark-2.0.2-bin-hadoop2.6/jars/jackson-databind-2.6.5.jar ./lib/
2.ERROR: lazy loading failed for package ‘stringr’
Install stringi with the following command
> install.packages("stringi",dep=TRUE)
Delete the 00LOCK-stringi lock directory if prompted
rm -rf /usr/lib64/R/library/00LOCK-stringi
3./bin/sh: libpng-config: command not found
yum install libpng-devel
4.rjcommon.h:11:21: error: jpeglib.h: No such file or directory
yum install libjpeg-turbo-devel
5.ERROR: configuration failed for package ‘XML’
yum install libxml2-devel
6.ERROR: configuration failed for package ‘rgl’
yum install mesa-libGL mesa-libGL-devel mesa-libGLU mesa-libGLU-devel
References
Quick Start
How-to: Install Apache Zeppelin on CDH
Deploying Zeppelin and the SparkR Interpreter on Cloudera CDH
Running Zeppelin on CDH
Apache Zeppelin on CDH
Spark & Zeppelin