Alibaba Cloud MaxCompute
MaxCompute is an enterprise-level SaaS (Software as a Service) cloud data warehouse on Alibaba Cloud.
Connect to MaxCompute
Example
-- 1. Create Catalog.
CREATE CATALOG mc PROPERTIES (
"type" = "max_compute",
"mc.default.project" = "xxx",
"mc.access_key" = "xxxx",
"mc.secret_key" = "xxx",
"mc.endpoint" = "http://service.cn-beijing-vpc.MaxCompute.aliyun-inc.com/api"
);
-- 2. Switch to the newly created Catalog.
SWITCH mc;
-- The following steps are the same as when using MySQL.
-- 3. View all databases under this Catalog.
SHOW DATABASES;
-- 4. Use a database. Here, xxx is any database from the results shown in step 3.
USE xxx;
-- 5. View all tables under this database.
SHOW TABLES;
-- 6. Perform SQL queries.
select * from tb limit 10;
Basic properties of creating Catalog
Parameter | Description |
---|---|
type | Fixed as max_compute. |
mc.default.project | The name of the MaxCompute project you want to access. It can be created and managed in the MaxCompute project list. |
mc.access_key | AccessKey. It can be created and managed in the Alibaba Cloud console. |
mc.secret_key | SecretKey. It can be created and managed in the Alibaba Cloud console. |
mc.endpoint | The endpoint of the region where MaxCompute is enabled. Please refer to How to obtain Endpoint and Quota below for configuration. |
Optional properties of creating Catalog
Parameter | Default value | Description |
---|---|---|
mc.quota | pay-as-you-go | Quota name. Please refer to How to obtain Endpoint and Quota below for configuration. |
mc.split_strategy | byte_size | How splits are divided. Set it to byte_size to divide by byte size, or to row_count to divide by row count. |
mc.split_byte_size | 268435456 | The file size read by each split, in bytes. The default is 256MB. It takes effect only when "mc.split_strategy" = "byte_size". |
mc.split_row_count | 1048576 | The number of rows read by each split. It takes effect only when "mc.split_strategy" = "row_count". |
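
As a hedged illustration of the optional properties above, the sketch below creates a catalog that divides splits by row count instead of byte size. The catalog name mc_by_rows and the value 524288 are hypothetical choices for illustration; the project, keys, and endpoint are the same placeholders used in the example at the top of this page.

```sql
-- Sketch only: the catalog name and the row count are illustrative assumptions.
CREATE CATALOG mc_by_rows PROPERTIES (
    "type" = "max_compute",
    "mc.default.project" = "xxx",
    "mc.access_key" = "xxxx",
    "mc.secret_key" = "xxx",
    "mc.endpoint" = "http://service.cn-beijing-vpc.MaxCompute.aliyun-inc.com/api",
    -- Divide splits by row count rather than by the default byte_size.
    "mc.split_strategy" = "row_count",
    -- Takes effect only because mc.split_strategy is set to row_count.
    "mc.split_row_count" = "524288"
);
```

With "mc.split_strategy" left at byte_size, mc.split_byte_size would be the property to tune instead, and mc.split_row_count would be ignored.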
Column type mapping
MaxCompute | Doris | Remarks |
---|---|---|
TINYINT | TINYINT | |
SMALLINT | SMALLINT | |
INT | INT | |
BIGINT | BIGINT | |
BINARY | Not supported | |
FLOAT | FLOAT | |
DOUBLE | DOUBLE | |
DECIMAL(precision,scale) | DECIMAL(precision,scale) | |
VARCHAR(n) | VARCHAR(n) | |
CHAR(n) | CHAR(n) | |
STRING | STRING | |
DATE | DATE | |
DATETIME | DATETIME(3) | You can specify the time zone with SET [global] time_zone = 'Asia/Shanghai'. |
TIMESTAMP | Not supported | |
TIMESTAMP_NTZ | DATETIME(6) | The precision of TIMESTAMP_NTZ in MaxCompute is 9. The maximum precision of DATETIME in Doris is only 6. Therefore, when reading data, the extra parts will be directly truncated. |
BOOLEAN | BOOLEAN | |
ARRAY | ARRAY | |
MAP | MAP | |
STRUCT | STRUCT | |
JSON | Not supported | |
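
To see how the mapping above plays out for a concrete table, one option is to inspect the mapped column types and set the session time zone before reading DATETIME columns. This is a minimal sketch: the table name events and the column names event_time and created_at are hypothetical, and the time zone is the one used in the table above.

```sql
-- Assumes you have already switched to the MaxCompute catalog and chosen a database.
-- Hypothetical table: events(event_time DATETIME, created_at TIMESTAMP_NTZ).

-- Show the Doris column types that the MaxCompute columns were mapped to.
DESC events;

-- Set the session time zone used when reading DATETIME values.
SET time_zone = 'Asia/Shanghai';

-- created_at is read as DATETIME(6), so digits beyond microseconds are truncated.
SELECT event_time, created_at FROM events LIMIT 10;
```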
Usage notes
- The MaxCompute Catalog is developed based on the Open Storage SDK.
- The use of the Open Storage SDK has certain limitations. Please refer to the Usage limitations section in this document.
- The Project in MaxCompute is equivalent to the Database in Doris.
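
Because a MaxCompute Project maps to a Doris database, a table can be addressed by a fully qualified catalog.database.table name without switching catalogs first. A minimal sketch, assuming the catalog mc from the example above and the hypothetical names my_project and tb:

```sql
-- my_project is a MaxCompute Project, seen by Doris as a database in catalog mc.
-- Both my_project and tb are placeholder names for illustration.
SELECT * FROM mc.my_project.tb LIMIT 10;
```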
How to obtain Endpoint and Quota
1. If you use the dedicated resource group of the data transmission service, please refer to the "Use exclusive data service resource groups" section in this document, and enable the corresponding permissions in "2. Authorization". In the "Quota (Quota) management" list, view and copy the corresponding QuotaName, and specify "mc.quota" = "QuotaName". In this case, you can access MaxCompute through either VPC or the public network; bandwidth through VPC is guaranteed, while public network bandwidth resources are limited.
2. If you use pay-as-you-go, please refer to the "Using open storage (pay-as-you-go)" section in this document to enable the open storage (Storage API) switch and grant permissions to the user corresponding to the AK and SK. In this case, mc.quota keeps its default value pay-as-you-go, and you do not need to specify it. You can only access MaxCompute through VPC.
3. After step 1 or 2, you know how you will access MaxCompute. Now configure mc.endpoint according to the "Endpoints in different regions" tables in the Alibaba Cloud Endpoints document. Users who access through VPC need to configure mc.endpoint according to the "VPC endpoint" column in "Endpoints in different regions (VPC)". Users who access through the public network can use either the "Classic network endpoint" column in "Endpoints in different regions (internal network for connecting cloud products)" or the "Public endpoint" column in "Endpoints in different regions (Internet)".
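
The two quota choices described above could translate into catalog definitions along the following lines. This is a sketch under stated assumptions: the catalog names mc_payg and mc_dedicated and the quota name my_quota_name are hypothetical, the keys and project are placeholders, and the endpoint is the VPC endpoint from the example at the top of this page. Look up the actual endpoint for your region in the Alibaba Cloud Endpoints document.

```sql
-- Pay-as-you-go: mc.quota is left at its default, and access goes through VPC only.
CREATE CATALOG mc_payg PROPERTIES (
    "type" = "max_compute",
    "mc.default.project" = "xxx",
    "mc.access_key" = "xxxx",
    "mc.secret_key" = "xxx",
    "mc.endpoint" = "http://service.cn-beijing-vpc.MaxCompute.aliyun-inc.com/api"
);

-- Dedicated resource group: pass the QuotaName copied from the quota management list.
-- my_quota_name is a placeholder, not a real quota.
CREATE CATALOG mc_dedicated PROPERTIES (
    "type" = "max_compute",
    "mc.default.project" = "xxx",
    "mc.access_key" = "xxxx",
    "mc.secret_key" = "xxx",
    "mc.quota" = "my_quota_name",
    "mc.endpoint" = "http://service.cn-beijing-vpc.MaxCompute.aliyun-inc.com/api"
);
```

For the dedicated resource group, the endpoint may instead be a public or classic network endpoint from the Endpoints document, at the cost of the more limited public network bandwidth noted above.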