
Spark Use Case: Uber Data Analysis

In this post we analyze Uber data in Apache Spark. Original article: https://acadgild.com/blog/spark-use-case-uber-data-analysis/

The Uber dataset contains four fields: dispatching_base_number, date, active_vehicles, and trips. It can be downloaded from:

https://drive.google.com/open?id=0ByJLBTmJojjzS2c2UktqLW5uRG8
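To make the four fields concrete, here is a minimal sketch of parsing one CSV row into a typed record. The `UberRecord` type and `parseRecord` helper are hypothetical names introduced for illustration, the sample row is made up rather than taken from the real file, and the date layout (month/day/year) is assumed from the `MM/dd/yyyy` pattern used in the code later in this post.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical record type; the fields mirror the dataset's four columns
case class UberRecord(base: String, date: LocalDate, activeVehicles: Int, trips: Int)

// "M/d/yyyy" accepts both 1/5/2015 and 01/05/2015
val recordDateFormat = DateTimeFormatter.ofPattern("M/d/yyyy")

// Split one CSV line and convert each field to its type
def parseRecord(line: String): UberRecord = {
  val f = line.split(",").map(_.trim)
  UberRecord(f(0), LocalDate.parse(f(1), recordDateFormat), f(2).toInt, f(3).toInt)
}

// Illustrative row, not taken from the real file
val sample = parseRecord("B00001,1/5/2015,100,1000")
println(sample)
println(sample.date.getDayOfWeek)
```

A typed record like this is optional; the Spark code in this post works directly on the split string array instead.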

Code

Problem: find the days of the week on which each dispatching base has the most trips. The code is fairly straightforward and needs little explanation.

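// Load the raw CSV; "uber" is the path to the dataset file
val dataset = sc.textFile("uber")
val header = dataset.first()
val format = new java.text.SimpleDateFormat("MM/dd/yyyy")
val days = Array("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
// Drop the header row
val eliminate = dataset.filter(line => line != header)
// Keep (base number, parsed date, trips)
val split = eliminate.map(line => line.split(",")).map { x => (x(0), format.parse(x(1)), x(3)) }
// Key each record by "base weekday"; Date.getDay is deprecated but returns 0 for Sunday, matching the days array
val combine = split.map(x => (x._1 + " " + days(x._2.getDay), x._3.toInt))
// Sum trips per key, then sort by total trips in descending order
val arrange = combine.reduceByKey(_ + _).map(item => item.swap).sortByKey(false).collect()
arrange.foreach(println)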
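The same aggregation can be checked locally, without a Spark context, using plain Scala collections. The following is only a sketch under stated assumptions: the input rows are made-up values in the dataset's column order, `groupBy` plus `sum` stands in for `reduceByKey`, and `java.time` replaces the deprecated `Date.getDay` call.

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, TextStyle}
import java.util.Locale

// Illustrative rows in the dataset's column order; the values are made up
val lines = Seq(
  "dispatching_base_number,date,active_vehicles,trips",
  "B00001,1/5/2015,100,1000",
  "B00001,1/12/2015,110,1200",
  "B00002,1/5/2015,50,400"
)

val fmt = DateTimeFormatter.ofPattern("M/d/yyyy")

val totals = lines.tail                  // drop the header row
  .map(_.split(","))
  .map { f =>
    val day = LocalDate.parse(f(1), fmt).getDayOfWeek
      .getDisplayName(TextStyle.SHORT, Locale.ENGLISH)
    (f(0) + " " + day, f(3).toInt)       // key: "base weekday", value: trips
  }
  .groupBy(_._1)                         // local stand-in for reduceByKey
  .toSeq
  .map { case (key, rows) => (rows.map(_._2).sum, key) }
  .sortBy(-_._1)                         // descending by total trips

totals.foreach(println)
```

As in the Spark version, each result is a (total trips, "base weekday") pair, with the busiest combination first.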