Skip to main content

Distribute Hint

Overview​

The Distribute hint is used to control the shuffle method for joins.

Syntax​

  • Supports specifying the Distribute Type for the right table, which can be either [shuffle] or [broadcast], and should be written before the right table in the Join.
  • Supports an arbitrary number of Distribute Hints.
  • When encountering a Distribute Hint that cannot correctly generate a plan, the system will not display an error. It will make the best effort to apply the hint, and the final Distribute method will be shown in the EXPLAIN output.

Examples​

Used in Combination with Ordered Hint

Fix the Join order to the textual sequence, and then specify the expected Distribute method for the Join. For example:

Before using:

mysql> explain shape plan select count(*) from t1 join t2 on t1.c1 = t2.c2;
+----------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+----------------------------------------------------------------------------------+
| PhysicalResultSink |
| --hashAgg[GLOBAL] |
| ----PhysicalDistribute[DistributionSpecGather] |
| ------hashAgg[LOCAL] |
| --------PhysicalProject |
| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ------------PhysicalProject |
| --------------PhysicalOlapScan[t1] |
| ------------PhysicalDistribute[DistributionSpecHash] |
| --------------PhysicalProject |
| ----------------PhysicalOlapScan[t2] |
+----------------------------------------------------------------------------------+

After using:

mysql> explain shape plan select /*+ ordered */ count(*) from t2 join[broadcast] t1 on t1.c1 = t2.c2;
+----------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+----------------------------------------------------------------------------------+
| PhysicalResultSink |
| --hashAgg[GLOBAL] |
| ----PhysicalDistribute[DistributionSpecGather] |
| ------hashAgg[LOCAL] |
| --------PhysicalProject |
| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ------------PhysicalProject |
| --------------PhysicalOlapScan[t2] |
| ------------PhysicalDistribute[DistributionSpecReplicated] |
| --------------PhysicalProject |
| ----------------PhysicalOlapScan[t1] |
| |
| Hint log: |
| Used: ORDERED |
| UnUsed: |
| SyntaxError: |
+----------------------------------------------------------------------------------+

The Explain Shape Plan will display information related to the Distribute operator. Specifically:

  • DistributionSpecReplicated indicates that the corresponding data will be replicated to all BE nodes.
  • DistributionSpecGather indicates that the data will be gathered to the FE node.
  • DistributionSpecHash indicates that the data will be distributed to different BE nodes based on a specific hashKey and algorithm. Used in Combination with Leading Hint

When writing SQL queries, you can specify the corresponding DISTRIBUTE method for each JOIN operation while using the LEADING hint. Below is a specific example demonstrating how to mix Distribute Hint and Leading Hint in an SQL query.

explain shape plan
select
nation,
o_year,
sum(amount) as sum_profit
from
(
select
/*+ leading(orders shuffle {lineitem shuffle part} shuffle {supplier broadcast nation} shuffle partsupp) */
n_name as nation,
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
part,
supplier,
lineitem,
partsupp,
orders,
nation
where
s_suppkey = l_suppkey
and ps_suppkey = l_suppkey
and ps_partkey = l_partkey
and p_partkey = l_partkey
and o_orderkey = l_orderkey
and s_nationkey = n_nationkey
and p_name like '%green%'
) as profit
group by
nation,
o_year
order by
nation,
o_year desc;

Summary​

The Distribute hint is a commonly used hint for controlling the join shuffle method, allowing manual specification of shuffle or broadcast distribution methods. Proper use of the Distribute hint can meet on-site tuning needs for join shuffle methods, increasing the flexibility of system control.