
Apache Doris 2.1.4 just released

Apache Doris

Dear community, Apache Doris 2.1.4 was released on June 26, 2024. This update improves the user experience in data lakehouse scenarios, with a focus on resolving the abnormal memory usage issue found in the previous version. It also includes several improvements and bug fixes that enhance stability. You are welcome to download and try it.

Quick Download: https://doris.apache.org/download/

GitHub Release: https://github.com/apache/doris/releases

Behavior changes

  • Non-existent files will be ignored when querying external tables such as Hive. #35319

    The file list is obtained from the meta cache, and it may not be consistent with the actual file list.

    Ignoring non-existent files helps to avoid query errors.

  • By default, creating a Bitmap Index no longer automatically converts it into an Inverted Index. #35521

    This behavior is controlled by the FE configuration item enable_create_bitmap_index_as_inverted_index, which defaults to false.

  • When starting FE and BE processes using --console, all logs will be output to the standard output and differentiated by prefixes indicating the log type. #35679

    For more information, see the documentation.

  • If no table comment is provided when creating a table, the default comment will be empty instead of using the table type as the default comment. #36025

  • The default precision of DECIMALV3 has been adjusted from (9, 0) to (38, 9) to maintain compatibility with the version in which this feature was initially released. #36316
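To illustrate the DECIMALV3 change: in DECIMAL(p, s), p is the total number of digits and s is the number of digits after the decimal point. The old default (9, 0) therefore holds integers of up to 9 digits with no fractional part, while the new default (38, 9) holds up to 29 integer digits plus 9 fractional digits. The plain-Python sketch below checks whether a value fits a given precision and scale without rounding. The helper `fits_decimal` is hypothetical and not part of Doris, and it does not model how Doris itself rounds or rejects values that do not fit.

```python
from decimal import Decimal, localcontext


def fits_decimal(value: str, precision: int, scale: int) -> bool:
    """Return True if `value` fits DECIMAL(precision, scale) without rounding.

    Hypothetical helper for illustration: at most `scale` fractional digits
    and at most `precision - scale` integer digits are allowed.
    """
    with localcontext() as ctx:
        ctx.prec = 100  # headroom so 38-digit values are handled exactly
        d = Decimal(value)
        if not d.is_finite():
            return False  # NaN and Infinity never fit a DECIMAL column
        # The fractional part must survive quantizing to `scale` digits unchanged.
        if d != d.quantize(Decimal(1).scaleb(-scale)):
            return False
        # adjusted() is the exponent of the most significant digit.
        int_digits = d.adjusted() + 1 if abs(d) >= 1 else 0
        return int_digits <= precision - scale
```

For example, `fits_decimal("3.14", 9, 0)` is False because the value needs rounding under the old default, while `fits_decimal("3.14", 38, 9)` is True under the new one.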

New features

Query Optimizer

  • Support FE flame graph tool

    For more information, see the documentation

  • Support using SELECT DISTINCT together with aggregation.

  • Support single table query rewrite without GROUP BY. This is useful for complex filters or expressions. #35242.

  • The new optimizer fully supports point query functionality #36205.

Lakehouse

  • Support reading Apache Paimon deletion vectors with the native reader #35241

  • Support using Resource in Table Valued Functions #35139

  • The access controller based on the Hive Ranger plugin supports Data Masking.

Asynchronous Materialized Views

  • Support partition roll-up during construction. #31812

  • Support triggered updates during construction. #34548

  • Support specifying the store_row_column and storage_medium attributes during construction. #35860

  • Transparent rewrite supports single table asynchronous materialized views. #34646

  • Transparent rewrite supports AGG_STATE type aggregation roll-up. #35026

Others

  • Added function replace_empty.

    For more information, see the documentation.

  • Support the SHOW STORAGE POLICY USING statement.

    For more information, see the documentation.

  • Support JVM metrics on the BE side.

    To enable this feature, set enable_jvm_monitor=true in be.conf.

Improvements

  • Supported creating inverted indexes for columns with Chinese names. #36321

  • Estimate memory consumed by segment cache more accurately so that unused memory can be released more quickly. #35751

  • Filter empty partitions before exporting tables to remote storage. #35542

  • Optimize routine load task allocation algorithm to balance the load among Backends. #34778

  • Provide hints when a related variable is not found during a set operation. #35775

  • Support placing Java UDF jar files in the FE's custom_lib directory for default loading. #35984

  • Add a timeout global variable audit_plugin_load_timeout for audit log load jobs.

  • Optimize the performance of transparent rewrite planning for asynchronous materialized views.

  • Optimize INSERT operations so that the BE does not execute them when the source is empty. #34418

  • Support fetching file lists of Hive/Hudi tables in batches. #35107
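The routine load improvement above balances tasks among Backends, but these notes do not describe the new allocation algorithm. As a general illustration only, the sketch below shows one common balancing strategy, greedy least-loaded assignment with a min-heap; it is not the algorithm Doris uses, and `assign_tasks` and its parameters are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Optional


def assign_tasks(
    task_ids: Iterable[str],
    backend_ids: List[str],
    current_load: Optional[Dict[str, int]] = None,
) -> Dict[str, str]:
    """Greedily assign each task to the backend with the fewest tasks.

    `current_load` optionally gives the number of tasks already running
    on each backend. Ties are broken by backend id, so the result is
    deterministic.
    """
    if not backend_ids:
        raise ValueError("at least one backend is required")
    existing = current_load or {}
    # Min-heap of (task count, backend id): the root is the least-loaded backend.
    heap = [(existing.get(be, 0), be) for be in backend_ids]
    heapq.heapify(heap)
    assignment: Dict[str, str] = {}
    for task in task_ids:
        count, be = heapq.heappop(heap)
        assignment[task] = be
        heapq.heappush(heap, (count + 1, be))
    return assignment
```

With two idle backends, four tasks alternate between them; if one backend already runs three tasks, the next three tasks all go to the other one. This sketch counts tasks only and ignores other factors, such as data volume, that a real scheduler might weigh.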

Bug fixes

Query Optimizer

  • Fixed the issue where SQL cache returns old results after truncating a partition. #34698

  • Fixed the issue where casting from JSON to other types did not correctly handle nullable attributes. #34707

  • Fixed occasional DATETIMEV2 literal simplification errors. #35153

  • Fixed the issue where COUNT(*) could not be used in window functions. #35220

  • Fixed the issue where nullable attributes could be incorrect when all SELECT statements under UNION ALL have no FROM clause. #35074

  • Fixed the issue where bitmap in join and subquery unnesting could not be used simultaneously. #35435

  • Fixed the performance issue where filter conditions could not be pushed down to the CTE producer in specific situations. #35463

  • Fixed the issue where aggregate combinators written in uppercase could not be found. #35540

  • Fixed the performance issue where window functions were not properly pruned by column pruning. #35504

  • Fixed the issue where queries might parse incorrectly leading to wrong results when multiple tables with the same name but in different databases appeared simultaneously in the query. #35571

  • Fixed the query error caused by generating runtime filters during schema table scans. #35655

  • Fixed the issue where nested correlated subqueries could not execute because the join condition was folded into a null literal. #35811

  • Fixed the occasional issue where decimal literals were set with incorrect precision during planning. #36055

  • Fixed the occasional issue where multiple layers of aggregation were merged incorrectly during planning. #36145

  • Fixed the occasional issue where the input-output mismatch error occurred after aggregate expansion planning. #36207

  • Fixed the occasional issue where <=> was incorrectly converted to =. #36521
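Rewriting `<=>` as `=` changes query results. With MySQL-compatible semantics, `=` yields NULL when either side is NULL, whereas the null-safe `<=>` always yields true or false and treats two NULLs as equal. The Python sketch below models both operators, using `None` for SQL NULL; the function names are illustrative only.

```python
from typing import Any, Optional


def sql_equal(a: Any, b: Any) -> Optional[bool]:
    """Model of SQL `=`: returns None (NULL) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_equal(a: Any, b: Any) -> bool:
    """Model of SQL `<=>`: never NULL, and NULL <=> NULL is true."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


rows = [(1, 1), (None, None), (1, None)]
# A WHERE clause keeps only rows whose predicate evaluates to true.
kept_by_equal = [r for r in rows if sql_equal(*r)]
kept_by_null_safe = [r for r in rows if null_safe_equal(*r)]
```

Here `kept_by_equal` is `[(1, 1)]`, while `kept_by_null_safe` is `[(1, 1), (None, None)]`: the row where both sides are NULL passes `<=>` but is filtered out by `=`.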

Query Execution

  • Fixed the issue on the pipeline engine where a query could hang and fail to release memory after reaching its row limit. #35746

  • Fixed a BE core dump that occurred when enable_decimal256 was true but the query fell back to the old planner. #35731

Asynchronous Materialized Views

  • Fixed the issue where asynchronous materialized views caused backup and restore exceptions. #35703

  • Fixed the issue where partition rewrite could lead to incorrect results. #35236

Semi-structured

  • Fixed the core dump problem when a VARIANT with an empty key is used. #35671

  • Fixed the issue where Bitmap and BloomFilter indexes incorrectly performed light index changes. #35225

Primary Key

  • Fixed the issue where an abnormal BE restart during partial column updates in an import could result in duplicate keys. #35678

  • Fixed the issue where BE might core dump during clone operations when memory is tight. #34702

Lakehouse

  • Fixed the issue where a Hive table could not be created with a fully qualified name such as ctl.db.tbl #34984

  • Fixed the issue where the Hive metastore connection did not close when refreshing #35426

  • Fixed a potential meta replay issue when upgrading from 2.0.x to 2.1.x. #35532

  • Fixed the issue where the Table Valued Function could not read an empty snappy compressed file. #34926

  • Fixed the issue where Parquet files with invalid min-max column statistics could not be read #35041

  • Fixed the issue where the Parquet/ORC reader could not handle pushdown predicates containing null-aware functions #35335

  • Fixed an issue with the order of partition columns when creating a Hive table #35347

  • Fixed the issue where writing to a Hive table on S3 failed when partition values contained spaces #35645

  • Fixed the incorrect scheme of the Aliyun OSS endpoint #34907

  • Fixed the issue where the Parquet format Hive table written by Doris could not be read by Hive #34981

  • Fixed the issue where ORC files could not be read after a schema change of a Hive table #35583

  • Fixed the issue where Paimon tables could not be read via JNI after a schema change of the Paimon table #35309

  • Fixed the issue where the Row Groups in written Parquet files were too small. #36042 #36143

  • Fixed the issue where Paimon tables could not be read after schema changes #36049

  • Fixed the issue where Hive tables in Parquet format could not be read after schema changes #36182

  • Fixed the FE OOM issue caused by Hadoop FS cache #36403

  • Fixed the issue where FE could not start after enabling the Hive Metastore Listener #36533

  • Fixed the issue of query performance degradation with a large number of files #36431

  • Fixed the timezone issue when reading the timestamp column type in Iceberg #36435

  • Fixed DATETIME conversion error and data path error on Iceberg Table. #35708

  • Support retaining and passing additional user-defined properties of Table Valued Functions to the S3 SDK. #35515
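The Parquet fix above (#35041) concerns files whose min-max column statistics are invalid; these notes do not describe how Doris handles them. One general way a reader can tolerate such statistics is to treat them as unknown and skip pruning, reading the row group instead of failing. The sketch below illustrates that idea for an equality predicate on one integer column. It is not Doris code: `RowGroupStats` and the helper functions are hypothetical, and "invalid" is assumed here to mean a missing bound or min greater than max.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group min/max statistics for one integer column."""
    min_value: Optional[int]
    max_value: Optional[int]


def stats_are_usable(stats: RowGroupStats) -> bool:
    # Assumption: statistics are valid only if both bounds exist and min <= max.
    return (
        stats.min_value is not None
        and stats.max_value is not None
        and stats.min_value <= stats.max_value
    )


def may_contain(stats: RowGroupStats, target: int) -> bool:
    """Return False only when valid statistics prove no row equals `target`."""
    if not stats_are_usable(stats):
        # Unusable statistics: do not prune; read the row group instead of failing.
        return True
    return stats.min_value <= target <= stats.max_value


def row_groups_to_read(all_stats: List[RowGroupStats], target: int) -> List[int]:
    """Indexes of the row groups that must be scanned for `column = target`."""
    return [i for i, s in enumerate(all_stats) if may_contain(s, target)]
```

For example, with row groups whose statistics are (1, 10), (20, 30), (50, 40), and (None, 5), a filter on the value 25 scans groups 1, 2, and 3: group 0 is pruned by valid statistics, while the two groups with invalid statistics are always read.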

Data Import

  • Fixed the issue where CANCEL LOAD did not work #35352

  • Fixed the issue where a null pointer error in the Publish phase of load transactions prevented the load from completing #35977

  • Fixed the issue with bRPC serializing large data files when sent via HTTP #36169

Data Management

  • Fixed the issue where the resource tag in ConnectionContext was not set after forwarding DDL or DML to the master FE. #35618

  • Fixed the issue where the restored table name was incorrect when lower_case_table_names was enabled #35508

  • Fixed the issue where ADMIN CLEAN TRASH did not work #35271

  • Fixed the issue where a storage policy could not be deleted from a partition #35874

  • Fixed the issue of data loss when importing into a multi-replica automatic partition table #36586

  • Fixed the issue where the partition column of a table changed when querying or inserting into an automatic partition table using the old optimizer #36514

Memory Management

  • Fixed the issue of frequent errors in the logs due to failure in obtaining Cgroup meminfo. #35425

  • Fixed the issue where the Segment cache size was uncontrolled when using BloomFilter, leading to abnormal process memory growth. #34871

Permissions

  • Fixed the issue where permission settings were ineffective after enabling case-insensitive table names. #36557

  • Fixed the issue where setting LDAP passwords through non-Master FE nodes did not take effect. #36598

  • Fixed the issue where authorization could not be checked for the SELECT COUNT(*) statement. #35465

Others

  • Fixed the issue where the client JDBC program could not close the connection if the MySQL connection was broken. #36616

  • Fixed MySQL protocol compatibility issue with the SHOW PROCEDURE STATUS statement. #35350

  • libevent now forces Keepalive to resolve connection leaks in certain situations. #36088

Credits

Thanks to everyone who contributed to this release.

@airborne12, @amorynan, @AshinGau, @BePPPower, @BiteTheDDDDt, @ByteYue, @caiconghui, @CalvinKirs, @cambyzju, @catpineapple, @cjj2010, @csun5285, @DarvenDuan, @dataroaring, @deardeng, @Doris-Extras, @eldenmoon, @englefly, @feiniaofeiafei, @felixwluo, @freemandealer, @Gabriel39, @gavinchou, @GoGoWen, @HappenLee, @hello-stephen, @hubgeter, @hust-hhb, @jacktengg, @jackwener, @jeffreys-cat, @Jibing-Li, @kaijchen, @kaka11chen, @Lchangliang, @liaoxin01, @LiBinfeng-01, @lide-reed, @luennng, @luwei16, @mongo360, @morningman, @morrySnow, @mrhhsg, @Mryange, @mymeiyi, @nextdreamblue, @platoneko, @qidaye, @qzsee, @seawinde, @shuke987, @sollhui, @starocean999, @suxiaogang223, @TangSiyang2001, @Thearas, @Vallishp, @w41ter, @wangbo, @whutpencil, @wsjz, @wuwenchi, @xiaokang, @xiedeyantu, @XieJiann, @xinyiZzz, @XuPengfei-1020, @xy720, @xzj7019, @yiguolei, @yongjinhou, @yujun777, @Yukang-Lian, @Yulei-Yang, @zclllyybb, @zddr, @zfr9527, @zgxme, @zhangbutao, @zhangstar333, @zhannngchen, @zhiqiang-hhhh, @zy-kkk, @zzzxl1993