Notes on Everyday Hadoop Issues
Hadoop
2020-01-15
1. HDFS permission issues
For example: Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
The /user directory is owned by hdfs with 755 permissions, so only hdfs can write to it. Unlike on Unix/Linux, the HDFS superuser here is hdfs, not root. To write under /user you therefore need to run the commands as hdfs:
sudo -u hdfs hadoop fs -mkdir /user/,,myfile,,
sudo -u hdfs hadoop fs -put myfile.txt /user/,,/,,
If you want to create a home directory for root so you can store files in his directory, do:
sudo -u hdfs hadoop fs -mkdir /user/root
sudo -u hdfs hadoop fs -chown root /user/root
Then as root you can do:
hadoop fs -put file /user/root/
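The denial in the error message follows from the owner/group/other check described above. As a rough illustration only (not HDFS's actual implementation), the Python sketch below parses the tail of such a message and reports which permission class applies to the user and which bits are missing. The function name explain_denial and the user_groups parameter are hypothetical, and superuser bypass and ACLs are deliberately ignored.

```python
import re

# Matches the tail of an HDFS "Permission denied" message, e.g.
# user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
_DENIED = re.compile(
    r'user=(?P<user>[^,]+), access=(?P<access>[A-Z_]+), '
    r'inode="(?P<path>[^"]*)":(?P<owner>[^:]+):(?P<group>[^:]+):'
    r'(?P<mode>[d-][rwxtT-]{9})'
)

_BIT_FOR = {"READ": "r", "WRITE": "w", "EXECUTE": "x"}


def explain_denial(message, user_groups=()):
    """Hypothetical helper: explain why a request in `message` was denied.

    `user_groups` lists the groups the user belongs to (an assumption the
    caller must supply; the error message does not contain it).
    """
    match = _DENIED.search(message)
    if match is None:
        raise ValueError("no HDFS permission-denied details found")
    info = match.groupdict()
    mode = info["mode"][1:]  # drop the leading file-type character

    # Pick the permission class the same way POSIX-style checks do.
    if info["user"] == info["owner"]:
        applied, bits = "owner", mode[0:3]
    elif info["group"] in user_groups:
        applied, bits = "group", mode[3:6]
    else:
        applied, bits = "other", mode[6:9]

    # Translate the requested access (e.g. WRITE, READ_EXECUTE) into bits.
    access = info["access"]
    if access == "ALL":
        needed = "rwx"
    else:
        needed = "".join(_BIT_FOR.get(part, "") for part in access.split("_"))
    missing = "".join(bit for bit in needed if bit not in bits)
    return {"path": info["path"], "class": applied, "bits": bits, "missing": missing}
```

For the message in this section, root is neither the owner nor (by default here) a member of supergroup, so the "other" bits r-x apply and the w bit is missing.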
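2. Spark runtime bug
Running a command in spark-shell produces the following error:
java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "/home/108857"): error=2
This is effectively a Cloudera bug: error=2 means the topology script does not exist on the machine running spark-shell. Copying /etc/hadoop/conf.cloudera.yarn/topology* from a DataNode to that machine fixes it.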
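Before and after copying the files, it can help to check what is actually present on the client machine. The following Python sketch lists the topology* files in the configuration directory and checks that topology.py is there and executable. The helper names are hypothetical, and the executable-bit check is an assumption about what the launcher needs, not something stated in the error.

```python
import glob
import os

DEFAULT_CONF_DIR = "/etc/hadoop/conf.cloudera.yarn"


def find_topology_files(conf_dir=DEFAULT_CONF_DIR):
    """Return the sorted topology* files present in conf_dir (empty if none)."""
    return sorted(glob.glob(os.path.join(conf_dir, "topology*")))


def topology_ready(conf_dir=DEFAULT_CONF_DIR):
    """True if topology.py exists in conf_dir and is executable (assumed requirement)."""
    script = os.path.join(conf_dir, "topology.py")
    return os.path.isfile(script) and os.access(script, os.X_OK)


if __name__ == "__main__":
    print("topology files:", find_topology_files())
    print("topology.py ready:", topology_ready())
```

An empty list on the spark-shell machine corresponds to the error=2 failure above.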
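3. Out-of-memory errors when Hive inserts into a large number of dynamic partitions
This is also close to being a bug. There are two ways to handle it.
Option 1: enable hive.optimize.sort.dynamic.partition to reduce memory usage in the reducers
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=100000;
set hive.exec.max.dynamic.partitions=100000;
Reference: Shuffle Hive data before storing in Parquet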
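Why sorting helps can be shown with a toy model. Without sorting, a task may need one open record writer (with its buffers) per partition value it encounters; when rows arrive sorted by the dynamic partition column, a writer can be closed as soon as the key changes. The Python sketch below is a simplified simulation under that assumption, not Hive's code, and peak_open_writers is a hypothetical name.

```python
def peak_open_writers(partition_keys, close_on_key_change):
    """Simulate writing rows and return the peak number of open writers.

    partition_keys: the partition value of each row, in arrival order.
    close_on_key_change: if True, the previous writer is closed whenever the
    key changes, which is only safe when the input is sorted by key.
    """
    open_writers = set()
    peak = 0
    previous = None
    for key in partition_keys:
        if close_on_key_change and previous is not None and key != previous:
            open_writers.discard(previous)
        open_writers.add(key)
        peak = max(peak, len(open_writers))
        previous = key
    return peak


rows = ["2020-01-03", "2020-01-01", "2020-01-02", "2020-01-01"]
unsorted_peak = peak_open_writers(rows, close_on_key_change=False)
sorted_peak = peak_open_writers(sorted(rows), close_on_key_change=True)
print(unsorted_peak, sorted_peak)
```

In this model the unsorted case peaks at one writer per distinct partition, while the sorted case never holds more than one, which is the memory saving option 1 is after.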
Option 2: increase the memory settings
set mapred.map.tasks=100;
set mapred.reduce.tasks=100;
set mapreduce.map.java.opts=-Xmx4096m;
set mapreduce.reduce.java.opts=-Xmx4096m;
set hive.exec.max.dynamic.partitions.pernode=100000;
set hive.exec.max.dynamic.partitions=100000;
References:
Hive - Out of Memory Exception - Java Heap Space
Unable to insert into a dynamic partition parquet table
hive dynamic partitions insert java.lang.OutOfMemoryError: Java heap space
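4. Hive Parquet time zone issue
If a table is stored as Parquet files written by Hive, timestamps come back in the wrong time zone when Impala reads them: Hive stores the values in UTC, and Impala does not convert them back by default.
Solution 1:
Convert the time zone in the Impala SELECT with from_utc_timestamp(recordtime,"HKT"), or use another time function such as hours_add.
Solution 2:
Set a property so that Impala converts automatically: in the Impala service configuration, add --convert_legacy_hive_parquet_utc_timestamps=true to Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve).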
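The conversion in solution 1 amounts to reinterpreting the stored value as UTC and shifting it to local time. The Python sketch below mirrors that arithmetic outside Impala. The helper names are hypothetical, and it assumes HKT is a fixed UTC+8 offset with no daylight saving.

```python
from datetime import datetime, timedelta, timezone

# Assumption: HKT is treated as a fixed UTC+8 offset with no daylight saving.
HKT = timezone(timedelta(hours=8), "HKT")


def utc_to_local(naive_utc, tz=HKT):
    """Treat a naive timestamp as UTC and return the naive wall-clock time in tz.

    This is the same idea as from_utc_timestamp(recordtime, "HKT").
    """
    aware_utc = naive_utc.replace(tzinfo=timezone.utc)
    return aware_utc.astimezone(tz).replace(tzinfo=None)


def add_hours(ts, hours):
    """Shift a timestamp by a fixed number of hours, like hours_add(ts, hours)."""
    return ts + timedelta(hours=hours)


stored = datetime(2020, 1, 15, 2, 30)  # value as Impala would show it (UTC)
print(utc_to_local(stored))            # local wall-clock time in HKT
print(add_hours(stored, 8))            # same result via a fixed shift
```

For a zone without daylight saving the two approaches agree; for zones with daylight saving only the time-zone-aware conversion stays correct across the changeover.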
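Linux system issues
yum fails on the SCL repository with the error below; replacing the centos-release-SCL package with centos-release-scl fixes it.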
http://mirror.centos.org/centos/6/SCL/x86_64/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
yum remove centos-release-SCL
yum install centos-release-scl
References:
List of time zone abbreviations
TIMESTAMP Data Type
Timestamp stored in Parquet file format in Impala Showing GMT Value