
Developing Spark Applications with PyCharm on Windows

This article walks through setting up a PySpark development environment on Windows.

Software Downloads

Spark

I downloaded spark-1.6.0-bin-hadoop2.6.tgz.

Hadoop

Programs will run without Hadoop installed, but the resulting error messages in the console are annoying. Download the pre-built package hadoop-2.6.0.tar.gz.

Anaconda

Download the Python 2.7 version of Anaconda.

Setting Environment Variables

Windows system environment variables:

SPARK_HOME   C:\spark-1.6.0-bin-hadoop2.6
PYTHONPATH   C:\spark-1.6.0-bin-hadoop2.6\bin;C:\spark-1.6.0-bin-hadoop2.6\python;C:\spark-1.6.0-bin-hadoop2.6\python\lib\py4j-0.9-src.zip
HADOOP_HOME  C:\hadoop-2.6.0

Also append C:\hadoop-2.6.0\bin to the PATH variable.
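If you prefer not to edit the system-wide variables, the same configuration can be applied per script before importing pyspark. This is only a sketch; the paths below are the example install locations used in this article:

```python
import os
import sys

def configure_spark_env(spark_home, hadoop_home):
    # Equivalent to the SPARK_HOME / HADOOP_HOME system variables above,
    # but scoped to this Python process only.
    os.environ["SPARK_HOME"] = spark_home
    os.environ["HADOOP_HOME"] = hadoop_home
    # Make the bundled pyspark package and Py4J bridge importable.
    sys.path.insert(0, os.path.join(spark_home, "python"))
    sys.path.insert(0, os.path.join(spark_home, "python", "lib", "py4j-0.9-src.zip"))

configure_spark_env("C:/spark-1.6.0-bin-hadoop2.6", "C:/hadoop-2.6.0")
```

Call `configure_spark_env` at the top of the script, before any `from pyspark import ...` line, so the interpreter can locate the Spark libraries.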

Test Code

from pyspark import SparkContext
from operator import add  # reduceByKey below needs this; the original snippet omitted the import

sc = SparkContext(appName="PythonWordCount")
lines = sc.textFile("c:/spark-1.6.0-bin-hadoop2.6/CHANGES.txt")  # path to a text file on the local file system
counts = lines.flatMap(lambda x: x.split(' ')).map(lambda x: (x, 1)).reduceByKey(add)
output = counts.collect()
for (word, count) in output:
    print "%s: %i" % (word, count)
sc.stop()
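As a sanity check on the transformation logic, the same word count can be reproduced in plain Python without Spark. The sample lines here are made up for illustration; the comments map each step back to the RDD operations above:

```python
from collections import Counter

lines = ["to be or not to be", "that is the question"]

# flatMap(lambda x: x.split(' ')): split each line and flatten into one word list
words = [w for line in lines for w in line.split(' ')]

# map(lambda x: (x, 1)) + reduceByKey(add): tally occurrences per word
counts = Counter(words)

print(counts["be"])  # 2
```

Spark performs the same computation, but distributes the splitting and counting across partitions instead of running it in a single process.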

References

Developing Spark with PyCharm on Windows; Installing and deploying Spark on Windows (Python version)
