Star Schema Benchmark
Star Schema Benchmark(SSB) 是一个轻量级的数仓场景下的性能测试集。SSB 基于 TPC-H 提供了一个简化版的星型模型数据集,主要用于测试在星型模型下,多表关联查询的性能表现。另外,业界内通常也会将 SSB 打平为宽表模型(以下简称:SSB flat),来测试查询引擎的性能,参考Clickhouse。
本文档主要介绍 Apache Doris 在 SSB 1000G 测试集上的性能表现。
在 SSB 标准测试数据集上的 13 个查询上,我们基于 Apache Doris 2.0.15.1 版本进行了测试。
1. 硬件环境
硬件 | 配置说明 |
---|---|
机器数量 | 4 台阿里云主机(1 个 FE,3 个 BE) |
CPU | Intel Xeon (Ice Lake) Platinum 8369B 32核 |
内存 | 128G |
磁盘 | 阿里云 ESSD (PL0) |
2. 软件环境
- Doris 部署 3BE 1FE
- 内核版本:Linux version 5.15.0-101-generic
- 操作系统版本:Ubuntu 20.04 LTS (Focal Fossa)
- Doris 软件版本:Apache Doris 2.0.15.1
- JDK:openjdk version "1.8.0_352-352"
3. 测试数据量
SSB 表名 | 行数 | 备注 |
---|---|---|
lineorder | 5,999,989,709 | 商品订单明细表表 |
customer | 30,000,000 | 客户信息表 |
part | 2,000,000 | 零件信息表 |
supplier | 2,000,000 | 供应商信息表 |
dates | 2,556 | 日期表 |
lineorder_flat | 5,999,989,709 | 数据展平后的宽表 |
4. SSB 宽表测试结果
使用 Apache Doris 2.0.15.1 版本进行测试结果如下:
Query | Doris 2.0.15.1 (ms) |
---|---|
q1.1 | 80 |
q1.2 | 10 |
q1.3 | 110 |
q2.1 | 1680 |
q2.2 | 1210 |
q2.3 | 1060 |
q3.1 | 2010 |
q3.2 | 1560 |
q3.3 | 600 |
q3.4 | 10 |
q4.1 | 2380 |
q4.2 | 190 |
q4.3 | 120 |
Total | 11020 |
5. 标准 SSB 测试结果
使用 Apache Doris 2.0.15.1 版本进行测试结果如下:
Query | Doris 2.0.15.1 (ms) |
---|---|
q1.1 | 330 |
q1.2 | 80 |
q1.3 | 80 |
q2.1 | 1780 |
q2.2 | 1970 |
q2.3 | 1510 |
q3.1 | 4000 |
q3.2 | 1720 |
q3.3 | 1510 |
q3.4 | 160 |
q4.1 | 4010 |
q4.2 | 840 |
q4.3 | 400 |
Total | 19390 |
6. 环境准备
请先参照 官方文档 进行 Apache Doris 的安装部署,以获得一个正常运行中的 Doris 集群(至少包含 1 FE 1 BE,推荐 1 FE 3 BE)。
7. 数据准备
7.1 下载安装 SSB 数据生成工具。
执行以下脚本下载并编译 ssb-tools 工具。
sh bin/build-ssb-dbgen.sh
安装成功后,将在 ssb-dbgen/
目录下生成 dbgen
二进制文件。
7.2 生成 SSB 测试集
执行以下脚本生成 SSB 数据集:
sh bin/gen-ssb-data.sh -s 1000
注 1:通过
sh gen-ssb-data.sh -h
查看脚本帮助。注 2:数据会以
.tbl
为后缀生成在ssb-data/
目录下。文件总大小约 600GB。生成时间可能在数分钟到 1 小时不等。注 3:默认生成 100G 的标准测试数据集
7.3 建表
7.3.1 准备 doris-cluster.conf
文件
在调用导入脚本前,需要将 FE 的 ip 端口等信息写在 doris-cluster.conf
文件中。
文件位置在 ${DORIS_HOME}/tools/ssb-tools/conf/
目录下。
文件内容包括 FE 的 ip,HTTP 端口,用户名,密码以及待导入数据的 DB 名称:
# Any of FE host
export FE_HOST='127.0.0.1'
# http_port in fe.conf
export FE_HTTP_PORT=8030
# query_port in fe.conf
export FE_QUERY_PORT=9030
# Doris username
export USER='root'
# Doris password
export PASSWORD=''
# The database where SSB tables located
export DB='ssb'
7.3.2 执行以下脚本生成创建 SSB 表
sh bin/create-ssb-tables.sh -s 1000
或者复制 create-ssb-tables.sql 和 create-ssb-flat-table.sql 中的建表语句,在 MySQL 客户端中执行。
7.4 导入数据
我们使用以下命令完成 SSB 测试集所有数据导入及 SSB FLAT 宽表数据合成并导入到表里。
sh bin/load-ssb-data.sh
7.5 检查导入数据
select count(*) from part;
select count(*) from customer;
select count(*) from supplier;
select count(*) from dates;
select count(*) from lineorder;
select count(*) from lineorder_flat;
7.6 查询测试
SSB-FlAT 查询语句:ssb-flat-queries
标准 SSB 查询语句:ssb-queries
7.6.1 SSB FLAT 测试 SQL
--Q1.1
SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE
LO_ORDERDATE >= 19930101
AND LO_ORDERDATE <= 19931231
AND LO_DISCOUNT BETWEEN 1 AND 3
AND LO_QUANTITY < 25;
--Q1.2
SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE
LO_ORDERDATE >= 19940101
AND LO_ORDERDATE <= 19940131
AND LO_DISCOUNT BETWEEN 4 AND 6
AND LO_QUANTITY BETWEEN 26 AND 35;
--Q1.3
SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE
weekofyear(LO_ORDERDATE) = 6
AND LO_ORDERDATE >= 19940101
AND LO_ORDERDATE <= 19941231
AND LO_DISCOUNT BETWEEN 5 AND 7
AND LO_QUANTITY BETWEEN 26 AND 35;
--Q2.1
SELECT
SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR,
P_BRAND
FROM lineorder_flat
WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA'
GROUP BY YEAR, P_BRAND
ORDER BY YEAR, P_BRAND;
--Q2.2
SELECT
SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR,
P_BRAND
FROM lineorder_flat
WHERE
P_BRAND >= 'MFGR#2221'
AND P_BRAND <= 'MFGR#2228'
AND S_REGION = 'ASIA'
GROUP BY YEAR, P_BRAND
ORDER BY YEAR, P_BRAND;
--Q2.3
SELECT
SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR,
P_BRAND
FROM lineorder_flat
WHERE
P_BRAND = 'MFGR#2239'
AND S_REGION = 'EUROPE'
GROUP BY YEAR, P_BRAND
ORDER BY YEAR, P_BRAND;
--Q3.1
SELECT
C_NATION,
S_NATION, (LO_ORDERDATE DIV 10000) AS YEAR,
SUM(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE
C_REGION = 'ASIA'
AND S_REGION = 'ASIA'
AND LO_ORDERDATE >= 19920101
AND LO_ORDERDATE <= 19971231
GROUP BY C_NATION, S_NATION, YEAR
ORDER BY YEAR ASC, revenue DESC;
--Q3.2
SELECT
C_CITY,
S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR,
SUM(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE
C_NATION = 'UNITED STATES'
AND S_NATION = 'UNITED STATES'
AND LO_ORDERDATE >= 19920101
AND LO_ORDERDATE <= 19971231
GROUP BY C_CITY, S_CITY, YEAR
ORDER BY YEAR ASC, revenue DESC;
--Q3.3
SELECT
C_CITY,
S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR,
SUM(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE
C_CITY IN ('UNITED KI1', 'UNITED KI5')
AND S_CITY IN ('UNITED KI1', 'UNITED KI5')
AND LO_ORDERDATE >= 19920101
AND LO_ORDERDATE <= 19971231
GROUP BY C_CITY, S_CITY, YEAR
ORDER BY YEAR ASC, revenue DESC;
--Q3.4
SELECT
C_CITY,
S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR,
SUM(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE
C_CITY IN ('UNITED KI1', 'UNITED KI5')
AND S_CITY IN ('UNITED KI1', 'UNITED KI5')
AND LO_ORDERDATE >= 19971201
AND LO_ORDERDATE <= 19971231
GROUP BY C_CITY, S_CITY, YEAR
ORDER BY YEAR ASC, revenue DESC;
--Q4.1
SELECT (LO_ORDERDATE DIV 10000) AS YEAR,
C_NATION,
SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE
C_REGION = 'AMERICA'
AND S_REGION = 'AMERICA'
AND P_MFGR IN ('MFGR#1', 'MFGR#2')
GROUP BY YEAR, C_NATION
ORDER BY YEAR ASC, C_NATION ASC;
--Q4.2
SELECT (LO_ORDERDATE DIV 10000) AS YEAR,
S_NATION,
P_CATEGORY,
SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE
C_REGION = 'AMERICA'
AND S_REGION = 'AMERICA'
AND LO_ORDERDATE >= 19970101
AND LO_ORDERDATE <= 19981231
AND P_MFGR IN ('MFGR#1', 'MFGR#2')
GROUP BY YEAR, S_NATION, P_CATEGORY
ORDER BY
YEAR ASC,
S_NATION ASC,
P_CATEGORY ASC;
--Q4.3
SELECT (LO_ORDERDATE DIV 10000) AS YEAR,
S_CITY,
P_BRAND,
SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE
S_NATION = 'UNITED STATES'
AND LO_ORDERDATE >= 19970101
AND LO_ORDERDATE <= 19981231
AND P_CATEGORY = 'MFGR#14'
GROUP BY YEAR, S_CITY, P_BRAND
ORDER BY YEAR ASC, S_CITY ASC, P_BRAND ASC;
7.6.2 SSB 标准测试 SQL
--Q1.1
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
FROM lineorder, dates
WHERE
lo_orderdate = d_datekey
AND d_year = 1993
AND lo_discount BETWEEN 1 AND 3
AND lo_quantity < 25;
--Q1.2
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
FROM lineorder, dates
WHERE
lo_orderdate = d_datekey
AND d_yearmonth = 'Jan1994'
AND lo_discount BETWEEN 4 AND 6
AND lo_quantity BETWEEN 26 AND 35;
--Q1.3
SELECT
SUM(lo_extendedprice * lo_discount) AS REVENUE
FROM lineorder, dates
WHERE
lo_orderdate = d_datekey
AND d_weeknuminyear = 6
AND d_year = 1994
AND lo_discount BETWEEN 5 AND 7
AND lo_quantity BETWEEN 26 AND 35;
--Q2.1
SELECT SUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_category = 'MFGR#12'
AND s_region = 'AMERICA'
GROUP BY d_year, p_brand
ORDER BY p_brand;
--Q2.2
SELECT SUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_brand BETWEEN 'MFGR#2221' AND 'MFGR#2228'
AND s_region = 'ASIA'
GROUP BY d_year, p_brand
ORDER BY d_year, p_brand;
--Q2.3
SELECT SUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_brand = 'MFGR#2239'
AND s_region = 'EUROPE'
GROUP BY d_year, p_brand
ORDER BY d_year, p_brand;
--Q3.1
SELECT
c_nation,
s_nation,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND c_region = 'ASIA'
AND s_region = 'ASIA'
AND d_year >= 1992
AND d_year <= 1997
GROUP BY c_nation, s_nation, d_year
ORDER BY d_year ASC, REVENUE DESC;
--Q3.2
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND c_nation = 'UNITED STATES'
AND s_nation = 'UNITED STATES'
AND d_year >= 1992
AND d_year <= 1997
GROUP BY c_city, s_city, d_year
ORDER BY d_year ASC, REVENUE DESC;
--Q3.3
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND (
c_city = 'UNITED KI1'
OR c_city = 'UNITED KI5'
)
AND (
s_city = 'UNITED KI1'
OR s_city = 'UNITED KI5'
)
AND d_year >= 1992
AND d_year <= 1997
GROUP BY c_city, s_city, d_year
ORDER BY d_year ASC, REVENUE DESC;
--Q3.4
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND (
c_city = 'UNITED KI1'
OR c_city = 'UNITED KI5'
)
AND (
s_city = 'UNITED KI1'
OR s_city = 'UNITED KI5'
)
AND d_yearmonth = 'Dec1997'
GROUP BY c_city, s_city, d_year
ORDER BY d_year ASC, REVENUE DESC;
--Q4.1
SELECT
d_year,
c_nation,
SUM(lo_revenue - lo_supplycost) AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND c_region = 'AMERICA'
AND s_region = 'AMERICA'
AND (
p_mfgr = 'MFGR#1'
OR p_mfgr = 'MFGR#2'
)
GROUP BY d_year, c_nation
ORDER BY d_year, c_nation;
--Q4.2
SELECT
d_year,
s_nation,
p_category,
SUM(lo_revenue - lo_supplycost) AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND c_region = 'AMERICA'
AND s_region = 'AMERICA'
AND (
d_year = 1997
OR d_year = 1998
)
AND (
p_mfgr = 'MFGR#1'
OR p_mfgr = 'MFGR#2'
)
GROUP BY d_year, s_nation, p_category
ORDER BY d_year, s_nation, p_category;
--Q4.3
SELECT
d_year,
s_city,
p_brand,
SUM(lo_revenue - lo_supplycost) AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND s_nation = 'UNITED STATES'
AND (
d_year = 1997
OR d_year = 1998
)
AND p_category = 'MFGR#14'
GROUP BY d_year, s_city, p_brand
ORDER BY d_year, s_city, p_brand;