Developing Spark Applications with PyCharm on Windows
Spark
2020-01-15
This post walks through setting up a PySpark development environment on Windows.
Software downloads
Spark
I used spark-1.6.0-bin-hadoop2.6.tgz
Hadoop
Programs run fine without it, but skipping it leaves annoying error messages in the console. Download the prebuilt package hadoop-2.6.0.tar.gz
Anaconda
Download the Python 2.7 version of Anaconda
Setting environment variables
Windows system environment variables
SPARK_HOME = C:\spark-1.6.0-bin-hadoop2.6
PYTHONPATH = C:\spark-1.6.0-bin-hadoop2.6\bin;C:\spark-1.6.0-bin-hadoop2.6\python;C:\spark-1.6.0-bin-hadoop2.6\python\lib\py4j-0.9-src.zip
HADOOP_HOME = C:\hadoop-2.6.0
Also append C:\hadoop-2.6.0\bin to the PATH variable.
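If you prefer not to touch the system-wide settings, the same variables can be set from inside the script itself, so a plain PyCharm run configuration needs no extra setup. A minimal sketch, assuming the install paths from the table above (adjust them to wherever you unpacked the archives):

```python
import os
import sys

# Paths from the environment-variable table above; adjust for your machine.
SPARK_HOME = r"C:\spark-1.6.0-bin-hadoop2.6"
HADOOP_HOME = r"C:\hadoop-2.6.0"

# Mirror the system environment variables from within the script.
os.environ["SPARK_HOME"] = SPARK_HOME
os.environ["HADOOP_HOME"] = HADOOP_HOME

# Make pyspark and the bundled py4j importable, mirroring PYTHONPATH.
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "lib", "py4j-0.9-src.zip"))
```

These lines must run before `from pyspark import SparkContext`, since the import resolves against `sys.path` at that moment.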
Test code
from pyspark import SparkContext
from operator import add  # reduceByKey(add) below needs this import

sc = SparkContext(appName="PythonWordCount")
# Path to a text file on the local file system
lines = sc.textFile("c:/spark-1.6.0-bin-hadoop2.6/CHANGES.txt")
counts = lines.flatMap(lambda x: x.split(' ')) \
              .map(lambda x: (x, 1)) \
              .reduceByKey(add)
output = counts.collect()
for (word, count) in output:
    print "%s: %i" % (word, count)
sc.stop()
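For readers who want to check the word-count logic without starting Spark at all, the same flatMap/map/reduceByKey pipeline can be expressed in plain Python. A small sketch (the `word_count` helper is mine, not part of the article's setup):

```python
from collections import Counter

def word_count(lines):
    """Plain-Python equivalent of the flatMap/map/reduceByKey pipeline above."""
    counter = Counter()
    for line in lines:
        # flatMap(split) + map((word, 1)) + reduceByKey(add), all in one update
        counter.update(line.split(' '))
    return dict(counter)

counts = word_count(["spark is fast", "spark is fun"])
# counts == {'spark': 2, 'is': 2, 'fast': 1, 'fun': 1}
```

If the Spark job's output disagrees with this on the same input, the problem is in the environment setup rather than the pipeline itself.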