The steps to configure and enable LZO compression in Spark are as follows:
I. spark-env.sh configuration
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/app/hadoop-2.6.0-cdh5.7.0/lib/native
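Before starting Spark, it helps to confirm that the hadoop-lzo native bindings are actually present under that path, since the codec loads libgplcompression at runtime. A quick check (the expected file names are an assumption based on a standard hadoop-lzo build):

ls /app/hadoop-2.6.0-cdh5.7.0/lib/native
# should list libgplcompression.* (from hadoop-lzo) alongside libhadoop.*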
II. spark-defaults.conf configuration
spark.driver.extraClassPath /app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/hadoop-lzo-0.4.19.jar
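Note that spark.driver.extraClassPath only covers the driver JVM. When jobs run with separate executors (e.g. on YARN rather than local mode), the jar must reach the executors as well; the extra line below is a hedged addition beyond the original setup:

spark.executor.extraClassPath /app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/hadoop-lzo-0.4.19.jar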
III. Testing
1. Reading an LZO file
spark-shell --master local[2]
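Once the shell is up, a minimal read sketch, assuming hadoop-lzo's new-API LzoTextInputFormat and a hypothetical input file at /input/test.lzo:

// Assumes hadoop-lzo is on the classpath; /input/test.lzo is a placeholder path.
import com.hadoop.mapreduce.LzoTextInputFormat
import org.apache.hadoop.io.{LongWritable, Text}

val lzoRdd = sc.newAPIHadoopFile(
  "/input/test.lzo",
  classOf[LzoTextInputFormat],
  classOf[LongWritable],
  classOf[Text])

// Keys are byte offsets into the file; keep only the line text.
lzoRdd.map(_._2.toString).take(10).foreach(println)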
2. Writing an LZO file
spark-shell --master local[2]
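Inside the shell, a minimal write sketch, assuming LzopCodec from hadoop-lzo; it produces .lzo part files under /input/test_lzo, the directory listed in the result below:

// Assumes hadoop-lzo is on the classpath; the sample data is illustrative.
import com.hadoop.compression.lzo.LzopCodec

val data = sc.parallelize(1 to 100).map(i => s"line $i")
data.saveAsTextFile("/input/test_lzo", classOf[LzopCodec])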
Result:
[hadoop@spark220 common]$ hdfs dfs -ls /input/test_lzo
This completes the configuration and testing.
IV. Problems encountered during configuration and testing
1. Missing LD_LIBRARY_PATH when referencing the native library
1.1 Error message:
Caused by: java.lang.RuntimeException: native-lzo library not available
1.2 Solution: edit spark-env.sh under Spark's conf directory and add the following:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/app/hadoop-2.6.0-cdh5.7.0/lib/native
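In local mode the driver's environment variable is enough; if executors run in separate JVMs (e.g. on YARN), they need the native path too. One way to propagate it, assuming the standard spark.executorEnv.* configuration key, is an extra line in spark-defaults.conf (an addition beyond the original setup):

spark.executorEnv.LD_LIBRARY_PATH /app/hadoop-2.6.0-cdh5.7.0/lib/native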
2. LzopCodec class not found
2.1 Error message:
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzopCodec not found.
2.2 Solution: edit spark-defaults.conf under Spark's conf directory and add the following:
spark.driver.extraClassPath /app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/hadoop-lzo-0.4.19.jar