Iceberg
Iceberg
Limitationsβ
- Support Iceberg V1/V2.
- The V2 format only supports Position Delete, not Equality Delete.
Create Catalogβ
Create Catalog Based on Hive Metastoreβ
It is basically the same as Hive Catalog, and only a simple example is given here. See Hive Catalog for other examples.
CREATE CATALOG iceberg PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'hive',
'dfs.nameservices'='your-nameservice',
'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
Create Catalog based on Iceberg APIβ
Use the Iceberg API to access metadata, and support services such as Hadoop File System, Hive, REST, DLF and Glue as Iceberg's Catalog.
Hadoop Catalogβ
CREATE CATALOG iceberg_hadoop PROPERTIES (
'type'='iceberg',
'iceberg.catalog.type' = 'hadoop',
'warehouse' = 'hdfs://your-host:8020/dir/key'
);
CREATE CATALOG iceberg_hadoop_ha PROPERTIES (
'type'='iceberg',
'iceberg.catalog.type' = 'hadoop',
'warehouse' = 'hdfs://your-nameservice/dir/key',
'dfs.nameservices'='your-nameservice',
'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
Hive Metastoreβ
CREATE CATALOG iceberg PROPERTIES (
'type'='iceberg',
'iceberg.catalog.type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'hive',
'dfs.nameservices'='your-nameservice',
'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
AWS Glueβ
CREATE CATALOG glue PROPERTIES (
"type"="iceberg",
"iceberg.catalog.type" = "glue",
"glue.endpoint" = "https://glue.us-east-1.amazonaws.com",
"glue.access_key" = "ak",
"glue.secret_key" = "sk"
);
For Iceberg properties, see Iceberg Glue Catalog
Alibaba Cloud DLFβ
REST Catalogβ
This method needs to provide REST services in advance, and users need to implement the REST interface for obtaining Iceberg metadata.
CREATE CATALOG iceberg PROPERTIES (
'type'='iceberg',
'iceberg.catalog.type'='rest',
'uri' = 'http://172.21.0.1:8181'
);
If the data is on HDFS and High Availability (HA) is set up, need to add HA configuration to the Catalog.
CREATE CATALOG iceberg PROPERTIES (
'type'='iceberg',
'iceberg.catalog.type'='rest',
'uri' = 'http://172.21.0.1:8181',
'dfs.nameservices'='your-nameservice',
'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.1:8020',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.2:8020',
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
Google Dataproc Metastoreβ
CREATE CATALOG iceberg PROPERTIES (
"type"="iceberg",
"iceberg.catalog.type"="hms",
"hive.metastore.uris" = "thrift://172.21.0.1:9083",
"gs.endpoint" = "https://storage.googleapis.com",
"gs.region" = "us-east-1",
"gs.access_key" = "ak",
"gs.secret_key" = "sk",
"use_path_style" = "true"
);
hive.metastore.uris
: Dataproc Metastore URIοΌSee in Metastore Services οΌDataproc Metastore Services.
Iceberg On Object Storageβ
If the data is stored on S3, the following parameters can be used in properties:
"s3.access_key" = "ak"
"s3.secret_key" = "sk"
"s3.endpoint" = "s3.us-east-1.amazonaws.com"
"s3.region" = "us-east-1"
The data is stored on Alibaba Cloud OSS:
"oss.access_key" = "ak"
"oss.secret_key" = "sk"
"oss.endpoint" = "oss-cn-beijing-internal.aliyuncs.com"
"oss.region" = "oss-cn-beijing"
The data is stored on Tencent Cloud COS:
"cos.access_key" = "ak"
"cos.secret_key" = "sk"
"cos.endpoint" = "cos.ap-beijing.myqcloud.com"
"cos.region" = "ap-beijing"
The data is stored on Huawei Cloud OBS:
"obs.access_key" = "ak"
"obs.secret_key" = "sk"
"obs.endpoint" = "obs.cn-north-4.myhuaweicloud.com"
"obs.region" = "cn-north-4"
Column type mappingβ
Consistent with Hive Catalog, please refer to the column type mapping section in Hive Catalog.
Time Travelβ
Supports reading the snapshot specified by the Iceberg table.
Every write operation to the iceberg table will generate a new snapshot.
By default, read requests will only read the latest version of the snapshot.
You can use the FOR TIME AS OF
and FOR VERSION AS OF
statements to read historical versions of data based on the snapshot ID or the time when the snapshot was generated. Examples are as follows:
SELECT * FROM iceberg_tbl FOR TIME AS OF "2022-10-07 17:20:37";
SELECT * FROM iceberg_tbl FOR VERSION AS OF 868895038966572;
In addition, you can use the iceberg_meta table function to query the snapshot information of the specified table.