Overview
: Notes on installing and testing RHive ( R ↔ Hive )
Environment
: HDP 2.5 Stack ( hive 1.2 , hadoop 2.7 )
: R 3.2.3
Source download
> git clone git://github.com/nexr/Rhive.git
> yum install ant
Build
> ant build
The README file lists the following sections.
## Install RHive
## Loading RHive and connecting to Hive
## Tutorials
## Requirements
→ Environment variables need to be set
Environment variables
> export HIVE_HOME=/usr/hdp/current/hive-client
> export HADOOP_HOME=/usr/hdp/current/hadoop-client
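Since the build depends on both variables, it can help to verify them before running ant again. The following is a minimal sketch, not something shipped with RHive: the helper name `check_home_dir` is hypothetical, and it only checks that each variable is set and points to an existing directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of RHive): succeed only if the named
# environment variable is set and points to an existing directory.
check_home_dir() {
    name="$1"
    # Read the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set" >&2
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value" >&2
        return 1
    fi
    echo "$name=$value"
}

# Usage: build only when both variables are valid.
# check_home_dir HIVE_HOME && check_home_dir HADOOP_HOME && ant build
```

On an HDP node the two paths above are symlinks managed by the stack, so a failed check usually means the client packages are not installed on that host.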
Re-Build
> ant build
> R CMD build ./RHive
Installing the RHive package
> R CMD INSTALL ./RHive_2.0-0.10.tar.gz
: rJava and Rserve must be installed first
> R
> install.packages("rJava")
> install.packages("Rserve")
> R CMD INSTALL ./RHive_2.0-0.10.tar.gz
> R
> install.packages("./RHive_2.0-0.10.tar.gz", repos=NULL)
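The file name RHive_2.0-0.10.tar.gz depends on the version that `R CMD build` produced, so it may differ on another checkout. As a rough sketch, the hypothetical helper `find_rhive_tarball` below picks the most recently modified RHive_*.tar.gz in a directory instead of hard-coding the version.

```shell
#!/bin/sh
# Hypothetical helper (not part of RHive): print the most recently
# modified RHive_*.tar.gz in the given directory (default: current one).
find_rhive_tarball() {
    dir="${1:-.}"
    newest=""
    for f in "$dir"/RHive_*.tar.gz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    if [ -z "$newest" ]; then
        echo "no RHive_*.tar.gz found in $dir" >&2
        return 1
    fi
    echo "$newest"
}

# Usage: install whichever tarball the build just produced.
# R CMD INSTALL "$(find_rhive_tarball .)"
```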
Running an RHive example
> export HIVE_HOME=/usr/hdp/current/hive-client
> export HADOOP_HOME=/usr/hdp/current/hadoop-client
> su - hdfs
> R
> Sys.setenv(HIVE_HOME="/usr/hdp/current/hive-client")
> Sys.setenv(HADOOP_HOME="/usr/hdp/current/hadoop-client")
> library(rJava)
> library(Rserve)
> library(RHive)
> rhive.connect()
> rhive.query("select count(*) from customer")
> rhive.query("select count(*) from tpch.supplier")
> rhive.query(set hive.execution.engine=mr")
Error: unexpected symbol in "rhive.query(set hive.execution.engine"
> rhive.query("set hive.execution.engine=mr")
Error: java.sql.SQLException: The query did not generate a result set!
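The first error is only a missing opening quote. The second appears because a `set` statement returns no result set, while `rhive.query` expects one. One possible workaround, assuming the installed RHive build provides `rhive.execute` for statements without a result set (not verified on this cluster), is to send the setting through that function. The sketch below is a shell function that writes a small R script; the name `write_rhive_script` is hypothetical, and the paths, host, and user are the ones used on this page.

```shell
#!/bin/sh
# Hypothetical helper: print an R script that sets the Hive execution
# engine and then runs one query through RHive.
write_rhive_script() {
    engine="$1"   # e.g. mr or tez
    query="$2"    # HiveQL returning rows; must not contain double quotes
    cat <<EOF
Sys.setenv(HIVE_HOME="/usr/hdp/current/hive-client")
Sys.setenv(HADOOP_HOME="/usr/hdp/current/hadoop-client")
library(rJava)
library(Rserve)
library(RHive)
rhive.connect("localhost", user="hdfs")
# Assumption: rhive.execute runs a statement without expecting a result set.
rhive.execute("set hive.execution.engine=${engine}")
print(rhive.query("${query}"))
rhive.close()
EOF
}

# Usage: generate the script, then run it with R.
# write_rhive_script mr "select count(*) from tpch.supplier" > query.R
# R --no-save -f query.R
```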
Execution result
Changing the connection user
> rhive.connect("localhost", user="hdfs")
> rhive.query("select count(*) from tpch.supplier")
After setting the user, checked whether the query runs on Tez → confirmed working
Loading Hive data into R
> resultDF <- rhive.query("select * from tpch.supplier limit 10")
> summary(resultDF)