
Apache Doris 3.0.3 just released

Apache Doris, December 2, 2024

Dear community members, Apache Doris 3.0.3 was officially released on December 2, 2024. This version further enhances the performance and stability of the system.

Quick Download: https://doris.apache.org/download/

GitHub Release: https://github.com/apache/doris/releases

Behavioral Changes

Prohibited column updates on Merge-on-Write (MOW) tables that have synchronous materialized views. #40190
Adjusted the default parameters of RoutineLoad to improve import efficiency. #42968
When a StreamLoad job fails, the returned LoadedRows value is now 0. #41946 #42291
Adjusted the default memory limit of Segment cache to 5%. #42308 #42436

New Features

Introduced the session variable enable_cooldown_replica_affinity to control the affinity of cold and hot tiered replicas. #42677
Added table$partition syntax for querying partition information of Hive tables. #40774
Supported creation of Hive tables in Text format. #41860 #42175

Asynchronous Materialized Views

Introduced a new materialized view property, use_for_rewrite. When use_for_rewrite is set to false, the materialized view does not participate in transparent rewriting. #40332

Query Optimizer

Supported correlated non-aggregate subqueries. #42236

Query Execution

Added the functions ngram_search, normal_cdf, to_iso8601, from_iso8601_date, SESSION_USER(), and last_query_id. #38226 #40695 #41075 #41600 #39575 #40739
The aes_encrypt and aes_decrypt functions support GCM mode. #40004
Profile now outputs the values of modified session variables. #41016 #41318
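For readers new to normal_cdf, the Python sketch below computes the cumulative distribution function of a normal distribution using the standard erf-based formula. It assumes the argument order (mean, standard deviation, value) used by Trino's function of the same name; that order, and the choice to reject a non-positive standard deviation, are assumptions of this sketch, so check the Doris function reference before relying on either.

```python
import math


def normal_cdf(mean: float, sd: float, value: float) -> float:
    """Probability that a normal variable with the given mean and standard
    deviation is <= value. The argument order is an assumption borrowed from
    Trino's normal_cdf, not a statement about Doris."""
    if sd <= 0:
        # This sketch rejects a non-positive standard deviation; the real
        # SQL function may handle that case differently (e.g. return NULL).
        raise ValueError("sd must be positive")
    z = (value - mean) / (sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


if __name__ == "__main__":
    print(normal_cdf(0.0, 1.0, 0.0))             # 0.5
    print(round(normal_cdf(0.0, 1.0, 1.96), 4))  # 0.975
```

The value 1.96 is used because roughly 97.5% of a standard normal distribution lies below it, which makes the result easy to sanity-check.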

Semi-structured Data Management

Added array functions array_match_all and array_match_any. #40605 #43514
The array function array_agg supports nesting ARRAY/MAP/STRUCT within ARRAY. #42009
Added the approximate aggregate functions approx_top_k and approx_top_sum. #44082
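approx_top_k estimates the most frequent values without keeping an exact count for every distinct value. One widely used technique for this is the Space-Saving algorithm, sketched below in Python. The release notes do not say which algorithm Doris uses, so treat this purely as an illustration of the general idea; the function name and the capacity parameter are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def space_saving_top_k(stream: Iterable[Hashable], k: int,
                       capacity: int) -> List[Tuple[Hashable, int]]:
    """Approximate the k most frequent items using at most `capacity` counters.

    Counts of tracked items may overestimate, but never underestimate,
    their true frequencies."""
    if not 0 < k <= capacity:
        raise ValueError("require 0 < k <= capacity")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1          # already tracked: exact increment
        elif len(counters) < capacity:
            counters[item] = 1           # free slot: start tracking
        else:
            # Evict the item with the smallest count; the newcomer inherits
            # that count + 1, which is where overestimation comes from.
            victim = min(counters, key=counters.__getitem__)
            counters[item] = counters.pop(victim) + 1
    ranked = sorted(counters.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(space_saving_top_k("aaaaabbbbccd", k=2, capacity=3))  # [('a', 5), ('b', 4)]
```

The trade-off is memory versus accuracy: a larger capacity keeps more counters and makes evictions, and therefore inflated counts, rarer.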

Improvements

Storage

Supported bitmap_empty as a column default value. #40364
Introduced the session variable insert_timeout to control the timeout of DELETE statements. #41063
Improved some error messages. #41048 #39631
Improved the priority scheduling of replica repair. #41076
Enhanced the robustness of timezone handling when creating tables. #41926 #42389
Checked the validity of partition expressions when creating tables. #40158
Supported Unicode-encoded column names in DELETE operations. #39381

Compute-Storage Decoupled

Supported deployment on ARM architecture in compute-storage decoupled mode. #42467 #43377
Optimized the file cache eviction strategy and reduced lock contention, improving the cache hit rate and high-concurrency point query performance. #42451 #43201 #41818 #43401
S3 storage vaults now support use_path_style, which resolves problems with object storage accessed through custom domain names. #43060 #43343 #43330
Optimized configuration and deployment in compute-storage decoupled mode to prevent misoperations across different modes. #43381 #43522 #43434 #40764 #43891
Improved observability and added an interface for deleting the file cache of specified segments. #38489 #42896 #41037 #43412
Optimized the Meta-service operation and maintenance interfaces, adding RPC rate limiting and tablet metadata correction. #42413 #43884 #41782 #43460
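The use_path_style option refers to the two standard ways of addressing a bucket on S3-compatible storage: virtual-hosted style puts the bucket name in the host name, while path style puts it in the URL path. Path style is typically needed when the object store sits behind a custom domain without per-bucket subdomains. The Python sketch below only illustrates the two URL shapes; the endpoint, bucket, key, and helper name are made-up examples, and it does not reflect how Doris builds requests internally.

```python
from urllib.parse import quote


def object_url(endpoint: str, bucket: str, key: str,
               use_path_style: bool, scheme: str = "https") -> str:
    """Build an object URL in path style or virtual-hosted style.

    `endpoint` is the object store host name, e.g. "s3.example.com"
    (hypothetical)."""
    encoded_key = quote(key, safe="/")  # percent-encode, but keep "/" separators
    if use_path_style:
        # Path style: https://<endpoint>/<bucket>/<key>
        return f"{scheme}://{endpoint}/{bucket}/{encoded_key}"
    # Virtual-hosted style: https://<bucket>.<endpoint>/<key>
    return f"{scheme}://{bucket}.{endpoint}/{encoded_key}"


if __name__ == "__main__":
    print(object_url("s3.example.com", "doris-data", "vault/seg 1.dat", True))
    # https://s3.example.com/doris-data/vault/seg%201.dat
    print(object_url("s3.example.com", "doris-data", "vault/seg 1.dat", False))
    # https://doris-data.s3.example.com/vault/seg%201.dat
```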

Lakehouse

Paimon Catalog supported Alibaba Cloud DLF and OSS-HDFS storage. #41247 #42585
Supported reading of Hive tables in OpenCSV format. #42257 #42942
Optimized the performance of accessing the information_schema.columns table in External Catalog. #41659 #41962
Switched to the new MaxCompute Open Storage API for accessing MaxCompute data sources. #41614
Optimized the scheduling policy of the JNI part of Paimon tables, making scan tasks more balanced. #43310
Optimized the read performance of small ORC files. #42004 #43467
Supported reading Parquet files compressed with Brotli. #42177
Added the file_cache_statistics table to the information_schema database for viewing metadata cache statistics. #42160

Query Optimizer

Optimization: Queries that differ only in comments can now reuse the same SQL Cache. #40049
Optimization: Improved the stability of statistics when data is frequently updated. #43865 #39788 #43009 #40457 #42409 #41894
Optimization: Enhanced the stability of constant folding. #42910 #41164 #39723 #41394 #42256 #40441
Optimization: Improved column pruning to generate better execution plans. #41719 #41548
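Reusing the SQL Cache for queries that differ only in comments implies that comments are ignored when the cache key is derived. The Python sketch below shows one way to do that: strip `--` line comments and `/* ... */` block comments outside quoted text, collapse whitespace, then hash. It is an illustration under that assumption, not Doris's actual cache-key logic, and the helper names are hypothetical. It also simplifies SQL lexing: it treats any `--` as a comment start and does not handle optimizer hints or nested comments.

```python
import hashlib


def strip_sql_comments(sql: str) -> str:
    """Remove -- line comments and /* */ block comments that appear outside
    quoted text. Quoted spans ('...', "...", `...`) are copied unchanged."""
    out = []
    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if ch in "'\"`":
            # Copy a quoted span verbatim, honoring backslash escapes.
            j = i + 1
            while j < n and sql[j] != ch:
                j += 2 if sql[j] == "\\" else 1
            out.append(sql[i:j + 1])
            i = j + 1
        elif sql.startswith("--", i):
            # Line comment: skip to the end of the line, keep the newline.
            end = sql.find("\n", i)
            i = n if end == -1 else end
        elif sql.startswith("/*", i):
            end = sql.find("*/", i + 2)
            i = n if end == -1 else end + 2
            out.append(" ")  # keep the tokens on either side separated
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def cache_key(sql: str) -> str:
    """Hypothetical cache key: comment-free SQL with whitespace collapsed."""
    normalized = " ".join(strip_sql_comments(sql).split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = "SELECT k1 FROM t WHERE k2 = 1 -- first run"
    b = "/* dashboard */ SELECT k1\nFROM t WHERE k2 = 1"
    print(cache_key(a) == cache_key(b))  # True
```

Skipping quoted spans matters because a literal such as `'--x'` is data, not a comment, and removing it would make two different queries share one cache entry.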

Query Execution

Optimized the memory usage of the sort operator. #39306
Optimized the performance of computations on ARM. #38888 #38759
Optimized the computational performance of a series of functions. #40366 #40821 #40670 #41206 #40162
Used SSE instructions to optimize the performance of the match_ipv6_subnet function. #38755
Supported automatic creation of new partitions during insert overwrite. #38628 #42645
Added the status of each PipelineTask in Profile. #42981
IP types now support runtime filters. #39985

Semi-structured Data Management

Output the real SQL of prepared statements in audit logs. #43321
The Filebeat Doris output plugin now supports fault tolerance and progress reporting. #36355
Optimized the performance of inverted index queries. #41547 #41585 #41567 #41577 #42060 #42372
The array function arrays_overlap supports acceleration using inverted indexes. #41571
The IP function is_ip_address_in_range supports acceleration using inverted indexes. #41571
Optimized the CAST performance of the VARIANT data type. #41775 #42438 #43320
Optimized the CPU resource consumption of the Variant data type. #42856 #43062 #43634
Optimized the metadata and execution memory resource consumption of the Variant data type. #42448 #43326 #41482 #43093 #43567 #43620
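is_ip_address_in_range checks whether an IP address falls inside a CIDR block; this release adds inverted-index acceleration for it rather than new semantics. The Python sketch below, built on the standard ipaddress module, shows what such a check computes, assuming the function takes an address string and a CIDR string. Accepting CIDRs with host bits set and returning False for mixed IPv4/IPv6 inputs are choices of this sketch and may not match how Doris treats such arguments.

```python
import ipaddress


def is_ip_address_in_range(addr: str, cidr: str) -> bool:
    """Return True if `addr` lies inside the CIDR block `cidr`."""
    ip = ipaddress.ip_address(addr)                   # IPv4Address or IPv6Address
    net = ipaddress.ip_network(cidr, strict=False)    # tolerate host bits, e.g. 10.1.2.3/8
    # `in` is False when address and network families differ (v4 vs v6).
    return ip in net


if __name__ == "__main__":
    print(is_ip_address_in_range("192.168.1.5", "192.168.0.0/16"))  # True
    print(is_ip_address_in_range("10.0.0.1", "192.168.0.0/16"))     # False
    print(is_ip_address_in_range("2001:db8::1", "2001:db8::/32"))   # True
```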

Permissions

Added a new configuration item ldap_group_filter in LDAP for custom group filtering. #43292

Other

FE monitoring metrics now show connection counts per user. #39200

Bug Fixes

Storage

Fixed the issue with using IPv6 hostnames. #40074
Fixed the inaccurate display of broker/s3 load progress. #43535
Fixed the issue where queries might hang from FE. #41303 #42382
Fixed the issue of duplicate auto-increment IDs under exceptional circumstances. #43774 #43983
Fixed occasional NPEs in group commit. #43635
Fixed inaccurate bucket-number calculation with auto bucket. #41675 #41835
Fixed the issue where FE might not correctly plan multi-table flows after restart. #41677 #42290

Compute-Storage Decoupled

Fixed an issue where MOW primary key tables with large delete bitmaps might cause a coredump. #43088 #43457 #43479 #43407 #43297 #43613 #43615 #43854 #43968 #44074 #41793 #42142
Fixed an issue where segment files whose size was a multiple of 5 MB failed to upload. #43254
Fixed an issue where the default retry policy of the AWS SDK did not take effect. #43575 #43648
Fixed an issue where altering a storage vault continued to execute even when the wrong type was specified. #43489 #43352 #43495
Fixed an issue where tablet_id might be 0 during delayed commits of large transactions. #42043 #42905
Fixed an issue where constant-folding RPCs and SQL forwarded by FE might not be executed in the expected compute group. #43110 #41819 #41846
Fixed an issue where meta-service did not strictly check instance_id when receiving RPCs. #43253 #43832
Fixed an issue where the information_schema version on FE followers was not updated in time. #43496
Fixed non-atomic file cache renames and inaccurate file cache metrics. #42869 #43504 #43220
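S3 multipart uploads require every part except the last to be at least 5 MiB, so object sizes that are an exact multiple of the part size are a classic boundary case: a naive splitter can emit an empty trailing part or miscount parts. The release notes do not describe the root cause of the 5 MB bug, so the Python sketch below is only a generic illustration of computing part ranges that handles exact multiples; the function name is hypothetical and the 5 MiB constant reflects the documented S3 limit.

```python
from typing import List, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum size for every part except the last


def plan_parts(total_size: int,
               part_size: int = MIN_PART_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs covering total_size bytes.

    An exact multiple of part_size yields only full parts, never an empty tail."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the multipart minimum")
    if total_size <= 0:
        return []  # this sketch leaves empty objects to a single-request upload
    parts = []
    offset = 0
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((offset, length))
        offset += length
    return parts


if __name__ == "__main__":
    print(len(plan_parts(10 * MIN_PART_SIZE)))  # 10
    print(plan_parts(MIN_PART_SIZE + 1)[-1])    # (5242880, 1)
```

Driving the loop by the remaining byte count, rather than computing the number of parts up front, is what keeps the exact-multiple case from producing a zero-length part.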

Lakehouse

Prohibited implicit conversion predicates from being pushed down to JDBC data sources to avoid inconsistent query results. #42102
Fixed some read issues with high-version Hive transactional tables. #42226
Fixed an issue where the Export command might cause deadlocks. #43083 #43402
Fixed the issue of being unable to query Hive views created by Spark. #43552
Fixed an issue where Hive partition paths containing special characters led to incorrect partition pruning. #42906
Fixed an issue where Iceberg Catalog could not use AWS Glue. #41084

Asynchronous Materialized Views

Fixed an issue where asynchronous materialized views might not refresh after the base table was rebuilt. #41762

Query Optimizer

Fixed an issue where partition pruning might return incorrect results with multi-column range partitioning. #43332
Fixed incorrect results in some LIMIT OFFSET scenarios. #42576

Query Execution

Fixed an issue where hash joins on array types larger than 4 GB could cause a BE coredump. #43861
Fixed an issue where IS NULL predicates might return incorrect results in some scenarios. #43619
Fixed an issue where bitmap types might produce incorrect output in hash joins. #43718
Fixed incorrect results from some functions. #40710 #39358 #40929 #40869 #40285 #39891 #40530 #41948 #43588
Fixed some issues with JSON type parsing. #39937
Fixed issues with varchar and char types in runtime filter operations. #43758 #43919
Fixed some issues with the use of decimal256 in scalar and aggregate functions. #42136 #42356
Fixed an issue where Arrow Flight reported a "Reach limit of connections" error when connecting. #39127
Fixed incorrect BE memory usage statistics in Kubernetes (k8s) environments. #41123

Semi-structured Data Management

Adjusted the default values of segment_cache_fd_percentage and inverted_index_fd_number_limit_percent. #42224
Logstash now supports group_commit. #40450
Fixed a coredump when building indexes. #43246 #43298
Fixed issues with variant index. #43375 #43773
Fixed potential fd and memory leaks under abnormal compaction circumstances. #42374
Inverted index match null now correctly returns null instead of false. #41786
Fixed a coredump when the ngram bloomfilter index bf_size was set to 65536. #43645
Fixed a potential coredump during JOINs on complex data types. #40398
Fixed a coredump with TVF JSON data. #43187
Fixed a precision issue in bloom filter calculations for date and time types. #43612
Fixed a coredump when storing the IPv6 type. #43251
Fixed a coredump when using the VARIANT type with light_schema_change disabled. #40908
Improved cache performance for high-concurrency point queries. #44077
Fixed the issue that bloom filter indexes were not synchronized when columns were deleted. #43378
Fixed instability in the ES catalog under special circumstances, such as mixed array and scalar data. #40314 #40385 #43399 #40614
Fixed coredumps caused by abnormal regular expression pattern matching. #43394
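Returning null rather than false for an inverted index match against a null value follows SQL's three-valued logic: an unknown input yields an unknown result. The difference shows up once the predicate is negated, because NOT false is true (so the row would be kept) while NOT null stays null (so the row is filtered out). The Python sketch below models this with None standing in for SQL NULL; the helper names and the word test standing in for a full-text match are assumptions for illustration only.

```python
from typing import List, Optional


def sql_not(value: Optional[bool]) -> Optional[bool]:
    """Three-valued NOT: NULL (None) stays NULL."""
    return None if value is None else not value


def match_any(text: Optional[str], term: str) -> Optional[bool]:
    """Hypothetical stand-in for a full-text match: NULL input gives NULL."""
    if text is None:
        return None
    return term in text.split()


def where_keeps(predicate: Optional[bool]) -> bool:
    """A WHERE clause keeps a row only when the predicate is exactly TRUE."""
    return predicate is True


rows: List[Optional[str]] = ["doris release", None, "other text"]
# Models: WHERE NOT (col MATCH_ANY 'doris')
kept = [r for r in rows if where_keeps(sql_not(match_any(r, "doris")))]
print(kept)  # ['other text']
```

Had match_any returned False for the None row, the negation would have turned it into True and the null row would wrongly appear in the result.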

Permissions

Fixed several issues where permissions were not properly restricted after authorization. #43193 #41723 #42107 #43306
Enhanced several permission checks. #40688 #40533 #41791 #42106

Other

Added missing fields to the audit log tables and files. #43303
