Ruoze Big Data (若泽大数据) www.ruozedata.com


Production Use of Hive Storage Formats

Published 2018-04-20 | Edited 2019-04-24 | in Hive

This post compares the on-disk size of the same data stored as TextFile, SequenceFile, RCFile, ORC, and Parquet.

Original data size: 19M


1. TextFile (the default): file size 18.1M


2. SequenceFile
create table page_views_seq(
  track_time string,
  url string,
  session_id string,
  referer string,
  ip string,
  end_user_id string,
  city_id string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS SEQUENCEFILE;

insert into table page_views_seq select * from page_views;

Stored as SequenceFile, the file is 19.6M.

3. RCFile
create table page_views_rcfile(
  track_time string,
  url string,
  session_id string,
  referer string,
  ip string,
  end_user_id string,
  city_id string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS RCFILE;

insert into table page_views_rcfile select * from page_views;

Stored as RCFile, the file is 17.9M.

4. ORCFile
create table page_views_orc
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS ORC
TBLPROPERTIES("orc.compress"="NONE")
as select * from page_views;

Stored as ORC, with compression disabled ("orc.compress"="NONE"), the file is 7.7M.

5. Parquet
create table page_views_parquet
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS PARQUET 
as select * from page_views;

Stored as Parquet, the file is 13.1M.

Summary: disk space usage, smallest to largest

ORCFile (7.7M) < Parquet (13.1M) < RCFile (17.9M) < TextFile (18.1M) < SequenceFile (19.6M)
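To make the gaps easier to compare, here is a small Python sketch (not part of the original test) that takes the sizes reported above and expresses each one as a percentage of the default TextFile size. The numbers are hard-coded from this post, and the helper name `size_report` is hypothetical.

```python
# Sizes reported in this post, in the same "M" units as the listings above.
# ORC was written with "orc.compress"="NONE", i.e. no compression codec.
SIZES = {
    "TextFile": 18.1,
    "SequenceFile": 19.6,
    "RCFile": 17.9,
    "ORC (orc.compress=NONE)": 7.7,
    "Parquet": 13.1,
}


def size_report(sizes, baseline):
    """Hypothetical helper: return (format, size, percent_of_baseline) rows,
    sorted from smallest to largest size."""
    rows = [(fmt, size, round(100 * size / baseline, 1)) for fmt, size in sizes.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Use the default TextFile table as the 100% baseline.
    for fmt, size, pct in size_report(SIZES, SIZES["TextFile"]):
        print(f"{fmt:<24} {size:>5.1f}M  {pct:>6.1f}% of TextFile")
```

Relative to TextFile, ORC comes out at roughly 42.5% and SequenceFile at about 108.3%. Because the ORC table was created with "orc.compress"="NONE", that reduction does not come from a general-purpose compression codec.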