Tablet Local Debug

While Doris is running in production, bugs can surface for many reasons, for example inconsistent replicas or data that differs between versions.

In such cases, copy the replica data of the affected tablet from the online cluster to a local environment, reproduce the problem there, and then locate its cause.

1. Get information about the tablet

The tablet id can be found in the BE log. The following commands then retrieve the information about it (assuming the tablet id is 10020).

First, get the DbId, TableId, PartitionId and related information for the tablet.

mysql> show tablet 10020\G
*************************** 1. row ***************************
DbName: default_cluster:db1
TableName: tbl1
PartitionName: tbl1
IndexName: tbl1
DbId: 10004
TableId: 10016
PartitionId: 10015
IndexId: 10017
IsSync: true
Order: 1
DetailCmd: SHOW PROC '/dbs/10004/10016/partitions/10015/10017/10020';

Execute the DetailCmd from the previous step to obtain information such as BackendId and SchemaHash.

mysql>  SHOW PROC '/dbs/10004/10016/partitions/10015/10017/10020'\G
*************************** 1. row ***************************
ReplicaId: 10021
BackendId: 10003
Version: 3
LstSuccessVersion: 3
LstFailedVersion: -1
LstFailedTime: NULL
SchemaHash: 785778507
LocalDataSize: 780
RemoteDataSize: 0
RowCount: 2
State: NORMAL
IsBad: false
VersionCount: 3
PathHash: 7390150550643804973
MetaUrl: http://192.168.10.1:8040/api/meta/header/10020
CompactionStatus: http://192.168.10.1:8040/api/compaction/show?tablet_id=10020
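When these values need to be carried into later steps, the vertical output can be turned into a dictionary. The following is a minimal sketch, not part of Doris; the helper name `parse_vertical_row` is hypothetical, and it assumes one `Key: Value` pair per line as in the listings above.

```python
def parse_vertical_row(text):
    """Parse one row of vertical-format MySQL client output into a dict."""
    row = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "*** 1. row ***" separator.
        if not line or line.startswith("*"):
            continue
        # Split on the first ": " only, so URLs in values stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            row[key] = value
    return row


sample = """
*************************** 1. row ***************************
ReplicaId: 10021
BackendId: 10003
SchemaHash: 785778507
MetaUrl: http://192.168.10.1:8040/api/meta/header/10020
"""

replica = parse_vertical_row(sample)
print(replica["BackendId"], replica["SchemaHash"])
```

All values are kept as strings; convert them with `int()` where a number is needed.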

Create a tablet snapshot and get the table creation statement

mysql> admin copy tablet 10020 properties("backend_id" = "10003", "version" = "2")\G
*************************** 1. row ***************************
TabletId: 10020
BackendId: 10003
Ip: 192.168.10.1
Path: /path/to/be/storage/snapshot/20220830101353.2.3600
ExpirationMinutes: 60
CreateTableStmt: CREATE TABLE `tbl1` (
`k1` int(11) NULL,
`k2` int(11) NULL
) ENGINE=OLAP
DUPLICATE KEY(`k1`, `k2`)
DISTRIBUTED BY HASH(k1) BUCKETS 1
PROPERTIES (
"replication_num" = "1",
"version_info" = "2"
);

The admin copy tablet command can generate a snapshot file of the corresponding replica and version for the specified tablet. Snapshot files are stored in the Path directory of the BE node indicated by the Ip field.

Under this directory there is a subdirectory named after the tablet id. Package that subdirectory as a whole for later use. (Note that the snapshot directory is kept for at most 60 minutes, after which it is deleted automatically.)

cd /path/to/be/storage/snapshot/20220830101353.2.3600
tar czf 10020.tar.gz 10020/
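If the packaging step is scripted, Python's standard `tarfile` module can produce an equivalent archive. This is a sketch under the assumption that the script runs on the BE node with read access to the snapshot directory; the function name `package_tablet` is hypothetical.

```python
import tarfile
from pathlib import Path


def package_tablet(snapshot_dir, tablet_id):
    """Create <tablet_id>.tar.gz inside snapshot_dir, similar to `tar czf`."""
    snapshot_dir = Path(snapshot_dir)
    archive = snapshot_dir / f"{tablet_id}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths relative, so the archive unpacks to "<tablet_id>/".
        tar.add(snapshot_dir / str(tablet_id), arcname=str(tablet_id))
    return archive
```

For the example above, the call would be `package_tablet("/path/to/be/storage/snapshot/20220830101353.2.3600", 10020)`. Run it before the 60-minute expiration deletes the snapshot directory.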

The command also generates the table creation statement for the tablet. Note that this is not the original table creation statement: its bucket number and replica number are both 1, and the version_info property is set. This statement is used later when loading the tablet locally.

So far, we have obtained everything we need:

  1. The packaged tablet data, such as 10020.tar.gz.
  2. The table creation statement.

2. Load the tablet locally

  1. Build a local debugging environment

    Deploy a single-node Doris cluster (1 FE, 1 BE) locally, using the same version as the online cluster. For example, if the online cluster runs DORIS-1.1.1, deploy DORIS-1.1.1 locally as well.

  2. Create a table

    Create the table in the local environment using the table creation statement obtained in section 1.

  3. Get the tablet information of the newly created table

    Because the number of buckets and replicas of the newly created table is 1, there will only be one tablet with one replica:

    mysql> show tablets from tbl1\G
    *************************** 1. row ***************************
    TabletId: 10017
    ReplicaId: 10018
    BackendId: 10003
    SchemaHash: 44622287
    Version: 1
    LstSuccessVersion: 1
    LstFailedVersion: -1
    LstFailedTime: NULL
    LocalDataSize: 0
    RemoteDataSize: 0
    RowCount: 0
    State: NORMAL
    LstConsistencyCheckTime: NULL
    CheckVersion: -1
    VersionCount: -1
    PathHash: 7390150550643804973
    MetaUrl: http://192.168.10.1:8040/api/meta/header/10017
    CompactionStatus: http://192.168.10.1:8040/api/compaction/show?tablet_id=10017

    mysql> show tablet 10017\G
    *************************** 1. row ***************************
    DbName: default_cluster:db1
    TableName: tbl1
    PartitionName: tbl1
    IndexName: tbl1
    DbId: 10004
    TableId: 10015
    PartitionId: 10014
    IndexId: 10016
    IsSync: true
    Order: 0
    DetailCmd: SHOW PROC '/dbs/10004/10015/partitions/10014/10016/10017';

    Record the following information from the output above:

    • TableId
    • PartitionId
    • TabletId
    • SchemaHash

    We also need to check the data directory of the BE node in the debugging environment to find the shard id of the new tablet:

    cd /path/to/storage/data/*/10017 && pwd

    This command enters the directory of tablet 10017 and prints its path, which looks similar to the following:

    /path/to/storage/data/0/10017

    where 0 is the shard id.
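The same lookup can be scripted. The sketch below assumes the `data/<shard id>/<tablet id>` layout shown above and that the shard id is numeric; `find_shard_id` is a hypothetical helper name.

```python
from pathlib import Path


def find_shard_id(storage_root, tablet_id):
    """Return the shard id: the directory name between data/ and the tablet id."""
    matches = sorted(Path(storage_root).glob(f"data/*/{tablet_id}"))
    if len(matches) != 1:
        raise RuntimeError(
            f"expected one directory for tablet {tablet_id}, found {len(matches)}"
        )
    return int(matches[0].parent.name)
```

For the example above, `find_shard_id("/path/to/storage", 10017)` would return the shard id taken from `/path/to/storage/data/0/10017`.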

  4. Modify Tablet Data

    Unpack the tablet data package obtained in the first step. Open the 10017.hdr.json file in an editor and set the following fields to the values recorded in the previous step:

    "table_id":10015
    "partition_id":10014
    "tablet_id":10017
    "schema_hash":44622287
    "shard_id":0

  5. Load the tablet

    First, stop the BE process of the debugging environment (./bin/stop_be.sh). Then copy all the .dat files located in the same directory as the 10017.hdr.json file to the /path/to/storage/data/0/10017/44622287 directory. This is the directory of the debugging environment's tablet found in step 3; 10017 and 44622287 are the tablet id and schema hash respectively.

    Delete the original tablet meta with meta_tool, which is located in the be/lib directory.
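The copy step can be sketched with the standard library as follows. It assumes the BE process is already stopped and that the target directory exists; `copy_dat_files` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def copy_dat_files(src_dir, dst_dir):
    """Copy every .dat file that sits directly in src_dir into dst_dir."""
    dst = Path(dst_dir)
    copied = []
    for dat in sorted(Path(src_dir).glob("*.dat")):
        # copy2 also preserves file timestamps and permissions.
        shutil.copy2(dat, dst / dat.name)
        copied.append(dat.name)
    return copied
```

Here `src_dir` is the unpacked directory that holds 10017.hdr.json, and `dst_dir` is `/path/to/storage/data/0/10017/44622287`. The returned list of file names makes it easy to confirm that nothing was missed.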

    ./lib/meta_tool --root_path=/path/to/storage --operation=delete_meta --tablet_id=10017 --schema_hash=44622287

    Here /path/to/storage is the data root directory of the BE. If the deletion succeeds, a "delete successfully" log line appears.

    Load the new tablet meta with meta_tool.

    ./lib/meta_tool --root_path=/path/to/storage --operation=load_meta --json_meta_path=/path/to/10017.hdr.json

    If the load succeeds, a "load successfully" log line appears.
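When this procedure is repeated for several tablets, the two meta_tool invocations can be generated from the recorded values. The sketch below only builds the argument lists, using exactly the flags shown above; `meta_tool_commands` is a hypothetical helper name.

```python
def meta_tool_commands(root_path, tablet_id, schema_hash, json_meta_path):
    """Return the delete_meta and load_meta argument lists, in run order."""
    tool = "./lib/meta_tool"
    delete_cmd = [
        tool,
        f"--root_path={root_path}",
        "--operation=delete_meta",
        f"--tablet_id={tablet_id}",
        f"--schema_hash={schema_hash}",
    ]
    load_cmd = [
        tool,
        f"--root_path={root_path}",
        "--operation=load_meta",
        f"--json_meta_path={json_meta_path}",
    ]
    return [delete_cmd, load_cmd]
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the BE installation directory while the BE process is stopped. Running the delete before the load matches the order described above.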

  6. Verification

    Restart the BE process of the debugging environment (./bin/start_be.sh), then query the table. If everything was loaded correctly, the query returns the data of the loaded tablet or reproduces the online problem.