CREATE-ASYNC-MATERIALIZED-VIEW

Name

CREATE ASYNC MATERIALIZED VIEW

Description

This statement is used to create an asynchronous materialized view.

syntax

CREATE MATERIALIZED VIEW (IF NOT EXISTS)? mvName=multipartIdentifier
        (LEFT_PAREN cols=simpleColumnDefs RIGHT_PAREN)? buildMode?
        (REFRESH refreshMethod? refreshTrigger?)?
        (KEY keys=identifierList)?
        (COMMENT STRING_LITERAL)?
        (PARTITION BY LEFT_PAREN partitionKey = identifier RIGHT_PAREN)?
        (DISTRIBUTED BY (HASH hashKeys=identifierList | RANDOM) (BUCKETS (INTEGER_VALUE | AUTO))?)?
        propertyClause?
        AS query

illustrate

simpleColumnDefs

Used to define the materialized view column information, if not defined, it will be automatically derived

simpleColumnDefs
: cols+=simpleColumnDef (COMMA cols+=simpleColumnDef)*
    ;

simpleColumnDef
: colName=identifier (COMMENT comment=STRING_LITERAL)?
    ;

For example, define two columns aa and bb, where the annotation for aa is "name"

CREATE MATERIALIZED VIEW mv1
(aa comment "name",bb)

buildMode

Used to define whether the materialized view is refreshed immediately after creation, default to IMMEDIATE

IMMEDIATE：Refresh Now

DEFERRED：Delay refresh

buildMode
: BUILD (IMMEDIATE | DEFERRED)
;

For example, specifying the materialized view to refresh immediately

CREATE MATERIALIZED VIEW mv1
BUILD IMMEDIATE

refreshMethod

Used to define the refresh method for materialized views, default to AUTO

COMPLETE：Full refresh

AUTO：Try to refresh incrementally as much as possible. If incremental refresh is not possible, refresh in full

The SQL definition and partition fields of the materialized view need to meet the following conditions for partition incremental updates:

At least one of the base tables used by the materialized view must be a partitioned table.
The partitioned tables used by the materialized view must employ list or range partitioning strategies.
The top-level partition column in the materialized view can only have one partition field.
The SQL of the materialized view needs to use partition columns from the base table, such as after the SELECT clause.
If GROUP BY is used, the partition column fields must be after the GROUP BY.
If window functions are used, the partition column fields must be after the PARTITION BY.
Data changes should occur on partitioned tables. If they occur on non-partitioned tables, the materialized view needs to be fully rebuilt.
Using the fields that generate nulls in the JOIN as partition fields in the materialized view prohibits partition incremental updates.
The null attribute of partition fields in the base table used by the materialized view must not be empty if it comes from an internal table. If it comes from an external table in Hive, the null attribute of the base table can be empty.

refreshMethod
: COMPLETE | AUTO
;

For example, specifying full refresh of materialized views

CREATE MATERIALIZED VIEW mv1
REFRESH COMPLETE

refreshTrigger

Trigger method for refreshing data in materialized views, default to MANUAL

MANUAL：Manual refresh

SCHEDULE：Timed refresh

refreshTrigger
: ON MANUAL
| ON SCHEDULE refreshSchedule
;
    
refreshSchedule
: EVERY INTEGER_VALUE mvRefreshUnit (STARTS STRING_LITERAL)?
;
    
mvRefreshUnit
: MINUTE | HOUR | DAY | WEEK
;    

For example: executed every 2 hours, starting from 21:07:09 on December 13, 2023

CREATE MATERIALIZED VIEW mv1
REFRESH ON SCHEDULE EVERY 2 HOUR STARTS "2023-12-13 21:07:09"

key

The materialized view is the DUPLICATE KEY model, therefore the specified columns are arranged in sequence

identifierList
: LEFT_PAREN identifierSeq RIGHT_PAREN
    ;

identifierSeq
: ident+=errorCapturingIdentifier (COMMA ident+=errorCapturingIdentifier)*
;

For example, specifying k1 and k2 as sorting sequences

CREATE MATERIALIZED VIEW mv1
KEY(k1,k2)

partition

There are two types of partitioning methods for materialized views. If no partitioning is specified, there will be a default single partition. If a partitioning field is specified, the system will automatically deduce the source base table of that field and synchronize all partitions of the base table (currently supporting OlapTable and hive). (Limitation: If the base table is an OlapTable, it can only have one partition field)

For example, if the base table is a range partition with a partition field of create_time and partitioning by day, and partition by(ct) as select create_time as ct from t1 is specified when creating a materialized view, then the materialized view will also be a range partition with a partition field of 'ct' and partitioning by day

The selection of partition fields and the definition of materialized views must meet the conditions for partition incremental updates for the materialized view to be created successfully; otherwise, an error "Unable to find a suitable base table for partitioning" will occur.

property

The materialized view can specify both the properties of the table and the properties unique to the materialized view.

The properties unique to materialized views include:

grace_period: When performing query rewrites, there is a maximum allowed delay time (measured in seconds) for the data of the materialized view. If there is a discrepancy between the data of partition A and the base table, and the last refresh time of partition A of the materialized view was 1, while the current system time is 2, then this partition will not undergo transparent rewriting. However, if the grace_period is greater than or equal to 1, this partition will be used for transparent rewriting.

excluded_trigger_tables: Table names ignored during data refresh, separated by commas. For example, table1, table2

refresh_partition_num: The number of partitions refreshed by a single insert statement is set to 1 by default. When refreshing a materialized view, the system first calculates the list of partitions to be refreshed and then splits it into multiple insert statements that are executed in sequence according to this configuration. If any insert statement fails, the entire task will stop executing. The materialized view ensures the transactionality of individual insert statements, meaning that failed insert statements will not affect partitions that have already been successfully refreshed.

workload_group: The name of the workload_group used by the materialized view when performing refresh tasks. This is used to limit the resources used for refreshing data in the materialized view, in order to avoid affecting the operation of other business processes. For details on how to create and use workload_group, refer to WORKLOAD-GROUP

query

Create a query statement for the materialized view, and the result is the data in the materialized view

Random functions are not supported, for example:

SELECT random() as dd,k3 FROM user

Example

Create a materialized view mv1 that refreshes immediately and then once a week, with the data source being the hive catalog

CREATE MATERIALIZED VIEW mv1 BUILD IMMEDIATE REFRESH COMPLETE ON SCHEDULE EVERY 1 WEEK
 DISTRIBUTED BY RANDOM BUCKETS 2
 PROPERTIES (
 "replication_num" = "1"
 )
 AS SELECT * FROM hive_catalog.db1.user;

Create a materialized view with multiple table joins

CREATE MATERIALIZED VIEW mv1 BUILD IMMEDIATE REFRESH COMPLETE ON SCHEDULE EVERY 1 WEEK
 DISTRIBUTED BY RANDOM BUCKETS 2
 PROPERTIES (
 "replication_num" = "1"
 )
 AS select user.k1,user.k3,com.k4 from user join com on user.k1=com.k1;

Keywords

CREATE, ASYNC, MATERIALIZED, VIEW