Using Apache Thrift to Work with Hive [Translated]
2020-01-15
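As is well known, Apache Hive is a data warehouse that makes it easy to read, write, and manage data using SQL.
Consider a scenario in which a user wants to work with Hive but has neither a Hadoop cluster nor Hive installed on their own system. In that case, the Apache Thrift interface makes it possible to write code in a variety of languages that operates Hive remotely.
In this article, we write a simple Java program that works with Hive through the Hive Thrift Server.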
*Note:* *You need a cluster environment in which Hive can be reached through Apache Thrift.*
What is Apache Thrift
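Thrift is a software framework for cross-language service development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly across languages. Thrift supports many programming languages, for example C++, C#, Cocoa, Erlang, Haskell, Java, OCaml, Perl, PHP, Python, Ruby, and Smalltalk. For communication between different languages, Thrift can serve as a high-performance binary communication middleware that supports data (object) serialization and several types of RPC services. Thrift suits static, program-to-program data exchange: the data structures must be defined up front. Whenever a data structure changes, the IDL file has to be edited, the code regenerated, and the result recompiled and reloaded, which can be seen as a weakness compared with other IDL tools. Thrift fits well as a general-purpose tool for large-scale data exchange and storage, and for internal data transfer in large systems it has a clear advantage over JSON and XML in both performance and payload size.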
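To make the size claim a little more concrete, here is a rough sketch in Java. It is only an illustration: it uses `java.io.DataOutputStream` as a stand-in for a schema-based binary encoding and a hand-built JSON string as the self-describing alternative. It does not use Thrift and does not reproduce Thrift's wire format. The class name `EncodingSizeSketch` and the sample record (`key = 42`, `value = "hello"`) are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustration only: DataOutputStream stands in for a schema-based binary
// encoding. This is NOT Thrift's wire format.
class EncodingSizeSketch {

    // Binary form: both sides already agree on the layout (int key, string value),
    // so no field names are written.
    static byte[] encodeBinary(int key, String value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(key);   // always 4 bytes
            out.writeUTF(value); // 2-byte length prefix + encoded characters
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Self-describing form: the field names travel with every record.
    // No escaping is done, so this is only suitable for simple sample values.
    static byte[] encodeJson(int key, String value) {
        String json = "{\"key\":" + key + ",\"value\":\"" + value + "\"}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("binary bytes: " + encodeBinary(42, "hello").length);
        System.out.println("json bytes:   " + encodeJson(42, "hello").length);
    }
}
```

For this record the binary form takes 11 bytes and the JSON form 26, mostly because the JSON repeats the field names in every record. Real Thrift protocols add their own per-field overhead, so actual numbers will differ.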
When should you use Thrift
Thrift can be used to build a web service that needs to be accessed from a different language.
What is a HiveServer
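HiveServer is a service that lets clients written in various programming languages access Hive remotely and get results back. Because it is built on Apache Thrift, it is sometimes simply called the Thrift server. The Thrift interface acts as a bridge that lets other languages access Hive.
Let's look at an example of accessing Hive Server from Java.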
Example of accessing Hive Server using Thrift in Java
In the example below, we use the Thrift interface to create a table named testHiveDriverTable1 with the columns key and value.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcClient {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    /**
     * @param args
     * @throws SQLException
     */
    public static void main(String[] args) throws SQLException {
        try {
            // Load the HiveServer2 JDBC driver
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }
        // Replace "acadgild" here with the name of the user the queries should run as
        Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "acadgild", "");
        Statement stmt = con.createStatement();
        String tableName = "testHiveDriverTable1";
        stmt.execute("drop table if exists " + tableName);
        stmt.execute("create table " + tableName + " (key int, value string)");

        // show tables
        String sql = "show tables '" + tableName + "'";
        System.out.println("Running: " + sql);
        ResultSet res = stmt.executeQuery(sql);
        if (res.next()) {
            System.out.println(res.getString(1));
        }

        // describe table
        sql = "describe " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1) + "\t" + res.getString(2));
        }
    }
}
Code Explanation
The code is fairly simple (and the walkthrough is a little lazy):
- The class is named HiveJdbcClient.
- driverName is a private static String that holds the driver class name "org.apache.hive.jdbc.HiveDriver".
- The try/catch block at the start of main wraps the driver loading.
- Class.forName(driverName) loads the driver class and returns the Class object associated with the given name.
- If the driver class cannot be found, the catch block handles the resulting ClassNotFoundException, prints the stack trace, and exits the program.
- DriverManager.getConnection opens a connection to the Hive server. In the URL, localhost:10000 is the host and port of the Hive server and default is the database. acadgild is the user name, and the password is an empty string.
- con.createStatement() creates a Statement instance for sending SQL statements to the database. Statement is the JDBC interface that represents an SQL statement.
- tableName is a String variable that holds "testHiveDriverTable1".
- stmt.execute runs an SQL statement. Here, "drop table if exists" drops the table if it already exists in the default database of the Hive server.
- The next execute call creates the table testHiveDriverTable1 with the columns key and value, whose data types are int and string, respectively.
- sql is a String that holds the show tables command together with the table name testHiveDriverTable1.
- The println call prints the command stored in sql.
- stmt.executeQuery(sql) runs the show tables command and returns its result as the ResultSet object res.
- The if condition checks whether res has a row. If it does, the table name found on the Hive server is printed.
- sql is then reassigned to "describe" followed by the table name testHiveDriverTable1.
- The command stored in sql is printed again.
- executeQuery runs the describe command on testHiveDriverTable1 and stores the table description in res.
- The while loop keeps running as long as res has another row.
- Each iteration prints the first two columns of the describe result, that is, a column name of the table and its data type.
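The same flow can also be written with try-with-resources, so that the connection, statement, and result set are closed even when a query fails. The following is a minimal sketch, not part of the original article: the class name `HiveJdbcClientSketch` and the helpers `jdbcUrl` and `recreateAndDescribe` are hypothetical, and it assumes the hive-jdbc driver is on the classpath and registers itself through the standard JDBC 4 service mechanism. If it does not, keep the `Class.forName` call from the example above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class HiveJdbcClientSketch {

    // Hypothetical helper: builds a HiveServer2 JDBC URL in the form used by the article.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Runs the drop/create/describe sequence from the article.
    // The statement and result set are closed automatically when their blocks exit.
    static void recreateAndDescribe(Connection con, String tableName) throws SQLException {
        try (Statement stmt = con.createStatement()) {
            stmt.execute("drop table if exists " + tableName);
            stmt.execute("create table " + tableName + " (key int, value string)");
            try (ResultSet res = stmt.executeQuery("describe " + tableName)) {
                while (res.next()) {
                    // Column 1 is the column name, column 2 its data type.
                    System.out.println(res.getString(1) + "\t" + res.getString(2));
                }
            }
        }
    }

    public static void main(String[] args) {
        String url = jdbcUrl("localhost", 10000, "default");
        // Assumption: HiveServer2 is reachable at this URL and accepts user "acadgild".
        try (Connection con = DriverManager.getConnection(url, "acadgild", "")) {
            recreateAndDescribe(con, "testHiveDriverTable1");
        } catch (SQLException e) {
            // Also reached when no Hive JDBC driver is available on the classpath.
            System.err.println("Could not run against " + url + ": " + e.getMessage());
        }
    }
}
```

Passing the `Connection` into `recreateAndDescribe` keeps the SQL logic separate from the connection details, which makes it easier to point the same code at a different host, port, or user.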
Output
Before running the Java program, make sure all Hadoop services are running, then start the Hive Thrift server with the following command:
hive --service hiveserver2
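Before launching the client, it can help to confirm that something is actually listening on the HiveServer2 port. The sketch below is an optional addition, not from the original article: the class `HivePortCheck` and its method `isListening` are hypothetical, and localhost:10000 is assumed only because it matches the JDBC URL used in the example. A successful TCP connection shows that a process accepts connections on that port, not that it is necessarily HiveServer2.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class HivePortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unknown host: treat all as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed host and port, taken from the JDBC URL in the example; adjust for your cluster.
        boolean up = isListening("localhost", 10000, 2000);
        System.out.println(up
                ? "Port 10000 on localhost is reachable"
                : "Nothing is listening on localhost:10000");
    }
}
```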
After the Java program runs, the output looks like this:
Reference
- Apache Thrift简介,与其它RPC对比 (An Introduction to Apache Thrift and a Comparison with Other RPC Frameworks)
- Original article: Working Of Apache Thrift In Hive Server
- How HiveServer2 Brings Security and Concurrency to Apache Hive