
Working with Hive Using Apache Thrift [Translation]

As is well known, Apache Hive is a data warehouse that makes it easy to read, write, and manage data with SQL.

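Consider this scenario: a user wants to work with Hive, but has neither a Hadoop cluster nor Hive installed on their own system. In that case they can write code in any of several languages against the Apache Thrift interface.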

In this article we write a simple Java program that works with Hive through the Hive Thrift Server.

*Note:* *The user must have a cluster environment in which Hive is reachable through Apache Thrift.*

What is Apache Thrift

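Thrift is a software framework for cross-language service development. It combines a code generation engine with a software stack to build efficient services that interoperate seamlessly. Thrift supports many programming languages, for example C++, C#, Cocoa, Erlang, Haskell, Java, OCaml, Perl, PHP, Python, Ruby, and Smalltalk. For communication between different languages, Thrift can serve as high-performance binary communication middleware, supporting data (object) serialization and several kinds of RPC services.

Thrift is suited to static program-to-program data exchange. The data structures must be defined up front and are entirely static: when a structure changes, you have to edit the IDL file again, regenerate the code, then recompile and reload. Compared with other IDL tools, this can be seen as a weakness of Thrift. Thrift works well as a general-purpose tool for large-scale data exchange and storage, and for internal data transfer in large systems it has a clear advantage over JSON and XML in both performance and payload size.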
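To make the payload-size point concrete, here is a minimal sketch that encodes one record both as a fixed binary layout and as JSON text, then compares the byte counts. It uses only the JDK. The binary layout is a hand-rolled illustration, not Thrift's actual wire format, and the class name EncodingSizeSketch and the (key, value) record are assumptions chosen to mirror the table used later in this article.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class EncodingSizeSketch {
    // Hand-rolled binary layout (NOT Thrift's wire format):
    // 4-byte int key, 4-byte string length, then the UTF-8 bytes of the value.
    public static byte[] encodeBinary(int key, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + utf8.length);
        buf.putInt(key).putInt(utf8.length).put(utf8);
        return buf.array();
    }

    // Naive JSON text encoding; assumes the value needs no escaping.
    public static byte[] encodeJson(int key, String value) {
        String json = "{\"key\":" + key + ",\"value\":\"" + value + "\"}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("binary bytes: " + encodeBinary(1234567, "hello").length);
        System.out.println("json bytes:   " + encodeJson(1234567, "hello").length);
    }
}
```

For this record the binary form takes 13 bytes and the JSON form 31, because the text form repeats the field names and spells the number out digit by digit. Real Thrift protocols add their own field headers, so actual sizes will differ.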

When should you use Thrift

Thrift can be used to develop a web service that is then accessed from a different language.

What is a HiveServer

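HiveServer is a service that lets clients written in many programming languages access Hive remotely and get results back. Because it is built on Apache Thrift, it is sometimes simply called the Thrift server. The Thrift interface acts as a bridge that lets other languages access Hive.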

Let us now look at an example that accesses Hive Server from Java.

Example of accessing Hive Server using Thrift in Java

In the example below, we go through the Thrift interface to create a table named testHiveDriverTable1 with the columns key and value.

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveJdbcClient {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    /**
     * @param args
     * @throws SQLException
     */
    public static void main(String args[]) throws SQLException {
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            // the Hive JDBC driver class is not on the classpath
            e.printStackTrace();
            System.exit(1);
        }
        // replace "acadgild" here with the name of the user the queries should run as
        Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "acadgild", "");
        Statement stmt = con.createStatement();
        String tableName = "testHiveDriverTable1";
        stmt.execute("drop table if exists " + tableName);
        stmt.execute("create table " + tableName + "(key int, value string)");
        // show tables (the table name is passed as a quoted pattern)
        String sql = "show tables '" + tableName + "'";
        System.out.println("Running: " + sql);
        ResultSet res = stmt.executeQuery(sql);
        if (res.next()) {
            System.out.println(res.getString(1));
        }
        // describe table
        sql = "describe " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1) + "\t" + res.getString(2));
        }
        // release the JDBC resources
        res.close();
        stmt.close();
        con.close();
    }
}

Code Explanation

The code is quite simple (I got a little lazy here).

  • The class is named HiveJdbcClient.
  • driverName is a private static String variable that stores "org.apache.hive.jdbc.HiveDriver", the class name of the Hive JDBC driver.
  • Inside main, a try/catch block wraps the loading of the driver.
  • Class.forName(driverName) loads the class with the given name, using the class loader of the calling class, and returns the associated Class object.
  • The catch block handles the ClassNotFoundException that is thrown if the driverName class cannot be found; it prints the stack trace and exits the program.
  • DriverManager.getConnection establishes a connection with the Hive server. In the URL, localhost:10000 is the host and port of the Hive server and default is the database; acadgild is the user name, and the password is an empty string.
  • con.createStatement() creates a Statement instance for sending SQL statements to the database. Statement is an interface that represents an SQL statement.
  • tableName is a String variable that stores "testHiveDriverTable1".
  • To run an SQL statement we call the execute method of the Statement interface. Here "drop table if exists" drops the table if it already exists in the default database of the Hive server.
  • The next execute call creates the table testHiveDriverTable1 with the columns key and value, whose data types are int and string, respectively.
  • The String variable sql stores the show tables command together with the table name testHiveDriverTable1.
  • System.out.println prints the value of sql, so the command is visible before it runs.
  • stmt.executeQuery(sql) runs the show tables command and stores the rows it returns in the ResultSet object res.
  • The if condition checks whether res has a next row; if so, it prints the name of the table found on the Hive server.
  • sql is then reassigned to the string "describe" followed by the table name testHiveDriverTable1.
  • The new value of sql is printed.
  • executeQuery runs the describe command on testHiveDriverTable1 and stores the table description in the ResultSet object res.
  • The while loop keeps running as long as res has a next row.
  • Each iteration prints one row of the description: the column name and its data type, separated by a tab.
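The same describe step can also be sketched with try-with-resources, so that the connection, statement, and result set are closed even when a query fails. This is an alternative sketch, not the original author's code. The class HiveJdbcSketch and its helpers jdbcUrl and checkedName are hypothetical names, and the name check is an assumption that is stricter than Hive's own identifier rules. It also assumes the Hive JDBC driver jar is on the classpath and registers itself with DriverManager; if it does not, keep the Class.forName call from the example above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class HiveJdbcSketch {
    // Builds a HiveServer2 JDBC URL of the form used in the article.
    public static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Assumption: accept only letters, digits and underscores, because the
    // name is concatenated into the SQL text. Stricter than Hive itself.
    public static String checkedName(String tableName) {
        if (tableName == null || !tableName.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unsafe table name: " + tableName);
        }
        return tableName;
    }

    // Runs DESCRIBE and returns one "column<TAB>type" string per row.
    // try-with-resources closes res, stmt and con in reverse order.
    public static List<String> describe(String url, String user, String tableName)
            throws SQLException {
        List<String> columns = new ArrayList<>();
        try (Connection con = DriverManager.getConnection(url, user, "");
             Statement stmt = con.createStatement();
             ResultSet res = stmt.executeQuery("describe " + checkedName(tableName))) {
            while (res.next()) {
                columns.add(res.getString(1) + "\t" + res.getString(2));
            }
        }
        return columns;
    }

    public static void main(String[] args) throws SQLException {
        String url = jdbcUrl("localhost", 10000, "default");
        System.out.println("URL: " + url);
        // Contact the server only when a table name is given on the command line.
        if (args.length > 0) {
            describe(url, "acadgild", args[0]).forEach(System.out::println);
        }
    }
}
```

Run without arguments, the program only prints the URL. Passing a table name such as testHiveDriverTable1 makes it connect and print the column list, which requires a running HiveServer2.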

Output

Before running the Java program, make sure all Hadoop services are up, then start the Hive Thrift server with the following command.

hive --service hiveserver2
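Before launching the client it can help to confirm that something is accepting connections on the server port. The sketch below is a plain TCP probe using only the JDK. The class name PortCheck is hypothetical, and port 10000 is taken from the JDBC URL in the example. A successful probe only shows that some process is listening on that port, not that it is HiveServer2.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    public static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // 10000 is the port used in the JDBC URL of the example.
        boolean up = isListening("localhost", 10000, 2000);
        System.out.println(up
                ? "localhost:10000 is reachable"
                : "nothing is listening on localhost:10000");
    }
}
```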

After HiveJdbcClient runs, the output looks like this: [image: Result]

Reference

  • Apache Thrift简介,与其它RPC对比 (An introduction to Apache Thrift and a comparison with other RPC frameworks)
  • Original article: Working Of Apache Thrift In Hive Server
  • How HiveServer2 Brings Security and Concurrency to Apache Hive
