Apache Doris 3.0.3 just released
Dear community members, Apache Doris 3.0.3 was officially released on December 02, 2024. This version further enhances the performance and stability of the system.
Quick Download: https://doris.apache.org/download/
GitHub Release: https://github.com/apache/doris/releases
Behavioral Changes
- Prohibited column updates on MOW tables with synchronous materialized views. #40190
- Adjusted the default parameters of RoutineLoad to improve import efficiency. #42968
- When StreamLoad fails, the return value of LoadedRows is adjusted to 0. #41946 #42291
- Adjusted the default memory limit of Segment cache to 5%. #42308 #42436
New Features
- Introduced the session variable enable_cooldown_replica_affinity to control the affinity of cold and hot tiered replicas. #42677
- Added table$partition syntax for querying partition information of Hive tables. #40774
- Supported creation of Hive tables in Text format. #41860 #42175
Asynchronous Materialized Views
- Introduced a new materialized view attribute, use_for_rewrite. When use_for_rewrite is set to false, the materialized view does not participate in transparent rewriting. #40332
Query Optimizer
- Supported correlated non-aggregate subqueries. #42236
Query Execution
- Added functions ngram_search, normal_cdf, to_iso8601, from_iso8601_date, SESSION_USER(), last_query_id. #38226 #40695 #41075 #41600 #39575 #40739
- The aes_encrypt and aes_decrypt functions support GCM mode. #40004
- Profile outputs the changed session variable values. #41016 #41318
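As a rough illustration of what the new normal_cdf function computes, here is a minimal Python sketch of the normal cumulative distribution function. It is not Doris code; the argument order (mean, standard deviation, value) is an assumption made for this example, so check the function documentation before relying on it.

```python
import math


def normal_cdf(mean: float, sd: float, value: float) -> float:
    """Return P(X <= value) for X ~ Normal(mean, sd**2).

    The (mean, sd, value) argument order is assumed for illustration only.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    # Standardize the value, then apply the error-function form of the CDF.
    z = (value - mean) / (sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


if __name__ == "__main__":
    # Probability mass at or below the mean of a standard normal distribution.
    print(normal_cdf(0.0, 1.0, 0.0))
```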
Semi-structured Data Management
- Added array functions array_match_all and array_match_any. #40605 #43514
- The array function array_agg supports nesting ARRAY/MAP/STRUCT within ARRAY. #42009
- Added approximate aggregate statistical functions approx_top_k and approx_top_sum. #44082
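The intent of array_match_all and array_match_any can be sketched in Python as follows. This is only an analogy under simplifying assumptions: the (predicate, array) argument order is assumed, and NULL elements and empty arrays are not modeled the way Doris may handle them.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def array_match_all(pred: Callable[[T], bool], arr: Iterable[T]) -> bool:
    """True when every element satisfies the predicate (NULL handling not modeled)."""
    return all(pred(x) for x in arr)


def array_match_any(pred: Callable[[T], bool], arr: Iterable[T]) -> bool:
    """True when at least one element satisfies the predicate (NULL handling not modeled)."""
    return any(pred(x) for x in arr)


if __name__ == "__main__":
    values = [1, 2, 3]
    print(array_match_all(lambda x: x > 0, values))
    print(array_match_any(lambda x: x > 2, values))
```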
Improvements
Storage
- Supported bitmap_empty as the default value. #40364
- Introduced the session variable insert_timeout to control the timeout of DELETE statements. #41063
- Improved some error message prompts. #41048 #39631
- Improved the priority scheduling of replica repair. #41076
- Enhanced the robustness of timezone handling when creating tables. #41926 #42389
- Checked the validity of partition expressions when creating tables. #40158
- Supported Unicode-encoded column names in DELETE operations. #39381
Compute-Storage Decoupled
- Supported ARM architecture deployment in storage and compute separation mode. #42467 #43377
- Optimized the eviction strategy and lock competition of file cache, improving hit rate and high concurrency point query performance. #42451 #43201 #41818 #43401
- S3 storage vault supported use_path_style, solving the problem of using custom domain names for object storage. #43060 #43343 #43330
- Optimized storage and compute separation configuration and deployment, preventing misoperations in different modes. #43381 #43522 #43434 #40764 #43891
- Optimized observability and provided an interface for deleting specified segment file cache. #38489 #42896 #41037 #43412
- Optimized Meta-service operation and maintenance interface: RPC rate limiting and tablet metadata correction. #42413 #43884 #41782 #43460
Lakehouse
- Paimon Catalog supported Alibaba Cloud DLF and OSS-HDFS storage. #41247 #42585
- Supported reading of Hive tables in OpenCSV format. #42257 #42942
- Optimized the performance of accessing the information_schema.columns table in External Catalog. #41659 #41962
- Used the new MaxCompute open storage API to access MaxCompute data sources. #41614
- Optimized the scheduling policy of the JNI part of Paimon tables, making scan tasks more balanced. #43310
- Optimized the read performance of small ORC files. #42004 #43467
- Supported reading of Parquet files in Brotli compressed format. #42177
- Added the file_cache_statistics table under the information_schema database to view metadata cache statistics. #42160
Query Optimizer
- Optimization: When queries only differ in comments, the same SQL Cache can be reused. #40049
- Optimization: Improved the stability of statistical information when data is frequently updated. #43865 #39788 #43009 #40457 #42409 #41894
- Optimization: Enhanced the stability of constant folding. #42910 #41164 #39723 #41394 #42256 #40441
- Optimization: Column pruning can generate better execution plans. #41719 #41548
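To make the SQL Cache item above concrete, the following Python sketch shows one way a cache key could be made insensitive to comments. It is an illustration only, not how Doris implements it. The helper names are hypothetical, and the sketch assumes single-quoted string literals, `--` line comments, and `/* */` block comments. It also treats optimizer hints written as comments like any other comment, which a real system would not do.

```python
import hashlib
import re

# Alternative 1: a single-quoted string literal, kept verbatim.
# Alternative 2: a run of whitespace and/or comments, collapsed to one space.
_TOKEN = re.compile(
    r"('(?:[^'\\]|\\.)*')|((?:\s|/\*.*?\*/|--[^\n]*)+)",
    re.DOTALL,
)


def normalize_sql(sql: str) -> str:
    """Strip comments and collapse whitespace outside string literals."""
    def replace(match: re.Match) -> str:
        literal = match.group(1)
        return literal if literal is not None else " "

    return _TOKEN.sub(replace, sql).strip()


def cache_key(sql: str) -> str:
    """Hash the normalized text so comment-only differences share a key."""
    return hashlib.sha256(normalize_sql(sql).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = "SELECT id FROM t /* dashboard 7 */ WHERE name = 'a -- b'"
    b = "SELECT id FROM t WHERE name = 'a -- b' -- ad hoc run"
    print(cache_key(a) == cache_key(b))
```

Keeping string literals untouched matters: two queries whose literals differ, even only in whitespace, must not share a cache entry.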
Query Execution
- Optimized the memory usage of the sort operator. #39306
- Optimized the performance of computations on ARM. #38888 #38759
- Optimized the computational performance of a series of functions. #40366 #40821 #40670 #41206 #40162
- Used SSE instructions to optimize the performance of the match_ipv6_subnet function. #38755
- Supported automatic creation of new partitions during insert overwrite. #38628 #42645
- Added the status of each PipelineTask in Profile. #42981
- IP type supported runtime filter. #39985
Semi-structured Data Management
- Output the real SQL of prepared statements in audit logs. #43321
- The filebeat doris output plugin supports fault tolerance and progress reporting. #36355
- Optimized the performance of inverted index queries. #41547 #41585 #41567 #41577 #42060 #42372
- The array function arrays_overlap supports acceleration using inverted indexes. #41571
- The IP function is_ip_address_in_range supports acceleration using inverted indexes. #41571
- Optimized the CAST performance of the VARIANT data type. #41775 #42438 #43320
- Optimized the CPU resource consumption of the Variant data type. #42856 #43062 #43634
- Optimized the metadata and execution memory resource consumption of the Variant data type. #42448 #43326 #41482 #43093 #43567 #43620
Permissions
- Added a new configuration item ldap_group_filter in LDAP for custom group filtering. #43292
Other
- Supported displaying connection count information by user in FE monitoring items. #39200
Bug Fixes
Storage
- Fixed the issue with using IPv6 hostnames. #40074
- Fixed the inaccurate display of broker/s3 load progress. #43535
- Fixed the issue where queries might hang from FE. #41303 #42382
- Fixed the issue of duplicate auto-increment IDs under exceptional circumstances. #43774 #43983
- Fixed occasional NPE issues with groupcommit. #43635
- Fixed the inaccurate calculation of auto bucket. #41675 #41835
- Fixed the issue where FE might not correctly plan multi-table flows after restart. #41677 #42290
Compute-Storage Decoupled
- Fixed the issue that MOW primary key tables with large delete bitmaps might cause coredump. #43088 #43457 #43479 #43407 #43297 #43613 #43615 #43854 #43968 #44074 #41793 #42142
- Fixed the issue that segment files, when being a multiple of 5MB, would fail to upload objects. #43254
- Fixed the issue that the default retry policy of aws sdk did not take effect. #43575 #43648
- Fixed the issue that altering storage vault could continue execution even when the wrong type was specified. #43489 #43352 #43495
- Fixed the issue that tablet_id might be 0 during the delayed commit process of large transactions. #42043 #42905
- Fixed the issue that constant folding RPC and FE forwarding SQL might not be executed in the expected computation group. #43110 #41819 #41846
- Fixed the issue that meta-service did not strictly check instance_id upon receiving RPC. #43253 #43832
- Fixed the issue that FE follower information_schema version did not update in time. #43496
- Fixed the issue of atomicity in file cache rename and inaccurate metrics. #42869 #43504 #43220
Lakehouse
- Prohibited implicit conversion predicates from being pushed down to JDBC data sources to avoid inconsistent query results. #42102
- Fixed some read issues with high-version Hive transactional tables. #42226
- Fixed the issue that the Export command might cause deadlocks. #43083 #43402
- Fixed the issue of being unable to query Hive views created by Spark. #43552
- Fixed the issue that Hive partition paths containing special characters led to incorrect partition pruning. #42906
- Fixed the issue that Iceberg Catalog could not use AWS Glue. #41084
Asynchronous Materialized Views
- Fixed the issue that asynchronous materialized views might not refresh after the base table is rebuilt. #41762
Query Optimizer
- Fixed the issue that partition pruning results might be incorrect when using multi-column range partitioning. #43332
- Fixed the issue of incorrect calculation results in some limit offset scenarios. #42576
Query Execution
- Fixed the issue that hash join with array types larger than 4G could cause a BE core dump. #43861
- Fixed the issue that is null predicate operations might yield incorrect results in some scenarios. #43619
- Fixed the issue that bitmap types might produce incorrect output results in hash join. #43718
- Fixed some issues where function results were calculated incorrectly. #40710 #39358 #40929 #40869 #40285 #39891 #40530 #41948 #43588
- Fixed some issues with JSON type parsing. #39937
- Fixed issues with varchar and char types in runtime filter operations. #43758 #43919
- Fixed some issues with the use of decimal256 in scalar and aggregate functions. #42136 #42356
- Fixed the issue that arrow flight reported "Reach limit of connections" errors upon connection. #39127
- Fixed the issue of incorrect memory usage statistics for BE in k8s environments. #41123
Semi-structured Data Management
- Adjusted the default values of segment_cache_fd_percentage and inverted_index_fd_number_limit_percent. #42224
- logstash now supports group_commit. #40450
- Fixed the issue of coredump when building index. #43246 #43298
- Fixed issues with variant index. #43375 #43773
- Fixed potential fd and memory leaks under abnormal compaction circumstances. #42374
- Inverted index match null now correctly returns null instead of false. #41786
- Fixed the issue of coredump when ngram bloomfilter index bf_size is set to 65536. #43645
- Fixed the issue of potential coredump during complex data type JOINs. #40398
- Fixed the issue of coredump with TVF JSON data. #43187
- Fixed the precision issue of bloom filter calculations for dates and times. #43612
- Fixed the issue of coredump with IPv6 type storage. #43251
- Fixed the issue of coredump when using VARIANT type with light_schema_change disabled. #40908
- Improved cache performance for high-concurrency point queries. #44077
- Fixed the issue that bloom filter indexes were not synchronized when columns were deleted. #43378
- Fixed instability issues with es catalog under special circumstances such as mixed array and scalar data. #40314 #40385 #43399 #40614
- Fixed coredump issues caused by abnormal regular pattern matching. #43394
Permissions
- Fixed several issues where permissions were not properly restricted after authorization. #43193 #41723 #42107 #43306
- Enhanced several permission checks. #40688 #40533 #41791 #42106
Other
- Supplemented missing audit log fields in audit log tables and files. #43303