Doris Compute-Storage Decoupled Deployment Preparation
1. Overview
This document describes the preparation work for deploying Apache Doris in compute-storage decoupled mode. The decoupled architecture aims to improve system scalability and performance, and is suited to large-scale data processing scenarios.
2. Architecture Components
The Doris compute-storage decoupled architecture consists of three main modules:
- Frontend (FE): Handles user requests and manages metadata.
- Backend (BE): Stateless compute nodes that execute query tasks.
- Meta Service (MS): Manages metadata operations and data recovery.
3. System Requirements
3.1 Hardware Requirements
- Minimum configuration: 3 servers
- Recommended configuration: 5 or more servers
3.2 Software Dependencies
- FoundationDB (FDB) version 7.1.38 or higher
- OpenJDK 17
4. Deployment Planning
4.1 Testing Environment Deployment
Deploy all modules on a single machine. This layout is not suitable for production environments.
4.2 Production Deployment
- Deploy FDB on 3 or more machines
- Deploy FE and Meta Service on 3 or more machines
- Deploy BE on 3 or more machines
On machines with ample resources, FDB, FE, and Meta Service can be co-located on the same hosts, but they should not share disks.
5. Installation Steps
5.1 Install FoundationDB
This section provides a step-by-step guide to configuring, deploying, and starting the FoundationDB (FDB) service using the provided scripts fdb_vars.sh and fdb_ctl.sh. You can download the Doris tools package and find fdb_vars.sh and fdb_ctl.sh in its fdb directory.
5.1.1 Machine Requirements
Typically, at least 3 machines equipped with SSDs are required to form a FoundationDB cluster that keeps two data replicas and tolerates the failure of a single machine.
For development or testing purposes only, a single machine is sufficient.
5.1.2 fdb_vars.sh Configuration
Required Custom Settings
Parameter | Description | Type | Example | Notes |
---|---|---|---|---|
DATA_DIRS | Specify the data directories for FoundationDB storage | Comma-separated list of absolute paths | /mnt/foundationdb/data1,/mnt/foundationdb/data2,/mnt/foundationdb/data3 | Ensure the directories are created before running the script; SSDs and separate directories are recommended for production environments |
FDB_CLUSTER_IPS | Define the cluster IPs | String (comma-separated IP addresses) | 172.200.0.2,172.200.0.3,172.200.0.4 | At least 3 IP addresses for production clusters; the first IP will be used as the coordinator; for high availability, place machines in different racks |
FDB_HOME | Define the main directory for FoundationDB | Absolute path | /fdbhome | Default path is /fdbhome; ensure this path is absolute |
FDB_CLUSTER_ID | Define the cluster ID | String | SAQESzbh | Each cluster ID must be unique; can be generated using mktemp -u XXXXXXXX |
FDB_CLUSTER_DESC | Define the description of the FDB cluster | String | dorisfdb | It is recommended to change this to something meaningful for the deployment |
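As an illustration, the required settings from the table could be filled in as below. This is a sketch that assumes fdb_vars.sh takes plain shell variable assignments; the values are the table's examples, and the `ip_count` check is a hypothetical addition that is not part of the provided script.

```shell
# Required settings, using the example values from the table above.
DATA_DIRS="/mnt/foundationdb/data1,/mnt/foundationdb/data2,/mnt/foundationdb/data3"
FDB_CLUSTER_IPS="172.200.0.2,172.200.0.3,172.200.0.4"
FDB_HOME="/fdbhome"
FDB_CLUSTER_DESC="dorisfdb"

# Generate a random 8-character cluster ID, as suggested in the Notes column.
FDB_CLUSTER_ID="$(mktemp -u XXXXXXXX)"

# Hypothetical sanity check: production clusters need at least 3 IPs.
ip_count=$(printf '%s\n' "$FDB_CLUSTER_IPS" | tr ',' '\n' | wc -l | tr -d ' ')
if [ "$ip_count" -lt 3 ]; then
    echo "warning: only $ip_count IP(s) configured; production needs at least 3" >&2
fi
```

Generate the cluster ID once and copy the same value to every node, since all nodes must agree on it.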
Optional Custom Settings
Parameter | Description | Type | Example | Notes |
---|---|---|---|---|
MEMORY_LIMIT_GB | Define the memory limit for FDB processes in GB | Integer | MEMORY_LIMIT_GB=16 | Adjust this value based on available memory resources and FDB process requirements |
CPU_CORES_LIMIT | Define the CPU core limit for FDB processes | Integer | CPU_CORES_LIMIT=8 | Set this value based on the number of available CPU cores and FDB process requirements |
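One way to sanity-check the optional limits is to compare them with what the host actually has. The sketch below is not part of the provided scripts: `within_limit` is a hypothetical helper, and the memory probe assumes a Linux host that exposes /proc/meminfo.

```shell
# Hypothetical helper: succeeds when the configured limit fits the host.
within_limit() {
    configured=$1
    available=$2
    [ "$configured" -le "$available" ]
}

# Example values from the table above.
MEMORY_LIMIT_GB=16
CPU_CORES_LIMIT=8

# Number of online CPU cores.
host_cores=$(getconf _NPROCESSORS_ONLN)
within_limit "$CPU_CORES_LIMIT" "$host_cores" ||
    echo "CPU_CORES_LIMIT=$CPU_CORES_LIMIT exceeds $host_cores available cores" >&2

# Total RAM in GB (Linux only; skipped where /proc/meminfo is absent).
if [ -r /proc/meminfo ]; then
    host_mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
    within_limit "$MEMORY_LIMIT_GB" "$host_mem_gb" ||
        echo "MEMORY_LIMIT_GB=$MEMORY_LIMIT_GB exceeds ${host_mem_gb} GB of RAM" >&2
fi
```

The check only warns. How much headroom to leave for FE, Meta Service, or other co-located processes remains a deployment decision.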
5.1.3 Deploy FDB Cluster
After configuring the environment with fdb_vars.sh, you can deploy the FDB cluster on each node using the fdb_ctl.sh script.
./fdb_ctl.sh deploy
This command initiates the deployment process of the FDB cluster.
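Because the deploy command has to run on every node, it can help to generate the per-node invocations from the same IP list used in fdb_vars.sh. The sketch below only prints the commands and does not execute them. It assumes SSH access to each node and that the scripts sit in the same directory on every machine; `print_deploy_commands` and the directory path are hypothetical.

```shell
# Print one ssh command per node in a comma-separated IP list.
# $1: comma-separated IPs, $2: remote directory holding fdb_ctl.sh (assumed).
print_deploy_commands() {
    ips=$1
    remote_dir=$2
    for ip in $(printf '%s' "$ips" | tr ',' ' '); do
        printf 'ssh %s "cd %s && ./fdb_ctl.sh deploy"\n' "$ip" "$remote_dir"
    done
}

# Example with the IPs from the settings table and a placeholder path.
print_deploy_commands "172.200.0.2,172.200.0.3,172.200.0.4" "/path/to/doris-tools/fdb"
```

Review the printed commands before running them, for example by pasting them into a shell one node at a time.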
5.1.4 Start FDB Service
Once the FDB cluster is deployed, you can start the FDB service using the fdb_ctl.sh script.
./fdb_ctl.sh start
This command starts the FDB service, makes the cluster operational, and returns the FDB cluster connection string, which is needed later when configuring the Meta Service.
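FoundationDB connection strings follow the cluster-file form description:ID@host:port[,host:port...]. The sketch below shows what such a string could look like for the example settings and checks its shape before it is pasted into another configuration. The port 4500 is FoundationDB's usual default and an assumption here, and `is_fdb_cluster_string` is a hypothetical helper, not part of fdb_ctl.sh.

```shell
# Succeeds when $1 looks like description:ID@host:port[,host:port...].
is_fdb_cluster_string() {
    printf '%s\n' "$1" |
        grep -Eq '^[A-Za-z0-9_]+:[A-Za-z0-9]+@[^,@]+:[0-9]+(,[^,@]+:[0-9]+)*$'
}

# Example built from FDB_CLUSTER_DESC, FDB_CLUSTER_ID, and the first cluster IP.
cluster_string="dorisfdb:SAQESzbh@172.200.0.2:4500"

if is_fdb_cluster_string "$cluster_string"; then
    echo "connection string looks well-formed: $cluster_string"
else
    echo "unexpected connection string format: $cluster_string" >&2
fi
```

Always use the string actually reported by the start command; the value above is only illustrative.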
5.2 Install OpenJDK 17
- Download OpenJDK 17.
- Extract the archive and set the JAVA_HOME environment variable to the extracted directory.
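The two steps above can be sketched as follows. The archive name, install directory, and extracted directory name are hypothetical placeholders; substitute the file you actually downloaded and the directory it unpacks to. `java_major` is an illustrative helper for confirming that the active JDK is version 17.

```shell
# Hypothetical names; adjust to the archive you downloaded.
JDK_TARBALL="openjdk-17_linux-x64_bin.tar.gz"
INSTALL_DIR="/opt"

# Extract only when the archive is present in the current directory.
if [ -f "$JDK_TARBALL" ]; then
    tar -xzf "$JDK_TARBALL" -C "$INSTALL_DIR"
fi

# Point JAVA_HOME at the extracted directory (name assumed) and expose its bin/.
export JAVA_HOME="$INSTALL_DIR/jdk-17"
export PATH="$JAVA_HOME/bin:$PATH"

# Read `java -version` output on stdin and print the major version number.
java_major() {
    sed -n 's/.*version "\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Report the detected version when a java binary is available.
if command -v java >/dev/null 2>&1; then
    echo "detected Java major version: $(java -version 2>&1 | java_major)"
fi
```

To keep the setting across sessions, add the two export lines to the shell profile of the user that runs Doris.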
5.3 Install S3 or HDFS (Optional)
Apache Doris in cloud mode stores data on S3 or HDFS services. If you already have such a service, you can use it directly. If not, the following steps give a simple deployment tutorial for MinIO:
- Choose the appropriate version and operating system on MinIO's download page, and download the corresponding binary or installation packages for the server and client.
- Start the MinIO server:
    export MINIO_REGION_NAME=us-east-1
    export MINIO_ROOT_USER=minio # In older versions, the configuration is MINIO_ACCESS_KEY=minio
    export MINIO_ROOT_PASSWORD=minioadmin # In older versions, the configuration is MINIO_SECRET_KEY=minioadmin
    nohup ./minio server /mnt/data 2>&1 &
- Configure the MinIO client:
    # If you are using a client installed with an installation package, the client name is mcli. If you directly download the client binary package, its name is mc
    ./mc config host add myminio http://127.0.0.1:9000 minio minioadmin
- Create a bucket:
    ./mc mb myminio/doris
- Verify that it is working properly:
    # upload a file
    ./mc mv test_file myminio/doris
    # list files
    ./mc ls myminio/doris
6. Next Steps
After completing the above preparations, please refer to the following documents to continue the deployment:
7. Notes
- Ensure time synchronization across all nodes
- Regularly back up FoundationDB data
- Adjust FoundationDB and Doris configuration parameters based on actual load