Remote User Defined Function Service
Remote UDF
Remote UDF Service supports accessing user-provided UDF Services via RPC to execute user-defined functions. Compared to native UDF implementation, Remote UDF Service has the following advantages and limitations:
1. Advantages
- Cross-language: UDF Services can be written in various languages supported by Protobuf.
- Security: UDF failures or crashes only affect the UDF Service itself and do not cause Doris process crashes.
- Flexibility: UDF Services can invoke any other services or library classes to meet diverse business requirements.
2. Usage Limitations
- Performance: Compared to native UDFs, a UDF Service adds network overhead, so performance is lower. The UDF Service implementation itself also affects function execution efficiency, and users must handle concerns such as high concurrency and thread safety themselves.
- Single-row mode and batch processing mode: In Doris' original row-based query execution framework, a UDF RPC call is made for each row of data, resulting in poor performance. In the new vectorized execution framework, a UDF RPC call is made for each batch of data (default: 2048 rows), which improves performance significantly. In actual testing, the performance of Remote UDF based on vectorization and batch processing is comparable to that of native UDF based on row storage.
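To make the difference between the two modes concrete, the sketch below counts how many RPC round trips each mode needs for a given number of rows. It is plain arithmetic, not Doris code: the function name `rpc_call_count` is hypothetical, and 2048 is the default batch size quoted above.

```python
DEFAULT_BATCH_SIZE = 2048  # default batch size quoted in the text above


def rpc_call_count(rows: int, batch_size: int = 1) -> int:
    """Return the number of RPC round trips needed to process `rows` rows.

    batch_size=1 models the row-based framework (one call per row);
    a larger batch_size models the vectorized framework.
    """
    if rows < 0 or batch_size < 1:
        raise ValueError("rows must be >= 0 and batch_size must be >= 1")
    # Integer ceiling division: a partially filled last batch still costs one call.
    return (rows + batch_size - 1) // batch_size


if __name__ == "__main__":
    rows = 1_000_000
    print(rpc_call_count(rows))                      # single-row mode
    print(rpc_call_count(rows, DEFAULT_BATCH_SIZE))  # batch mode
```

For one million rows this gives 1,000,000 calls in single-row mode and 489 calls in batch mode, which is where the reduction in network overhead comes from.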
Writing UDF Functions
This section describes how to develop a Remote RPC service. A Java example is provided in samples/doris-demo/udf-demo/ for reference.
Copying the Proto Files
Copy gensrc/proto/function_service.proto and gensrc/proto/types.proto to the RPC service.
function_service.proto
- PFunctionCallRequest
  - function_name: Function name, corresponding to the symbol specified during function creation.
  - args: Arguments passed to the method.
  - context: Query context information.
- PFunctionCallResponse
  - result: Result.
  - status: Status, where 0 represents normal.
- PCheckFunctionRequest
  - function: Function-related information.
  - match_type: Matching type.
- PCheckFunctionResponse
  - status: Status, where 0 represents normal.
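As a reading aid, the field lists above can be mirrored with plain Python dataclasses. These are stand-ins, not the classes that protoc generates: they carry only the fields named in this section, the field types are assumptions (the real ones are defined in function_service.proto and types.proto), and `status_code` is taken from the log output in the Example section.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Status:
    status_code: int = 0  # 0 represents normal


@dataclass
class FunctionCallRequest:
    function_name: str                             # symbol given at function creation
    args: List[Any] = field(default_factory=list)  # arguments passed to the method
    context: Optional[Any] = None                  # query context information


@dataclass
class FunctionCallResponse:
    result: Any = None
    status: Status = field(default_factory=Status)


@dataclass
class CheckFunctionRequest:
    function: Any = None    # function-related information (type assumed)
    match_type: Any = None  # matching type (type assumed)


@dataclass
class CheckFunctionResponse:
    status: Status = field(default_factory=Status)


def is_ok(response) -> bool:
    """True when the response reports status 0 (normal)."""
    return response.status.status_code == 0
```

Both response types carry a status, so a caller can check either one the same way, for example `is_ok(FunctionCallResponse(result="x"))`.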
Generating Interfaces
Generate code using protoc. Refer to protoc -h for specific parameters.
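As an illustration, the sketch below assembles a protoc command line that generates Python message classes. `--proto_path` and `--python_out` are standard protoc options; the directory names `proto` and `generated` are placeholders. Generating RPC service stubs usually needs an additional language-specific plugin, which is not shown here.

```python
import shlex
from typing import List

# The two files copied in the previous step.
PROTO_FILES = ["function_service.proto", "types.proto"]


def protoc_command(proto_dir: str, out_dir: str) -> List[str]:
    """Build (but do not run) a protoc invocation for the two proto files."""
    return [
        "protoc",
        f"--proto_path={proto_dir}",
        f"--python_out={out_dir}",
        *PROTO_FILES,
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(protoc_command("proto", "generated")))
```

Once protoc is installed, the returned list can be passed to `subprocess.run(..., check=True)` to execute it.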
Implementing Interfaces
The following three methods need to be implemented:
- fnCall: Used to write the calculation logic.
- checkFn: Used for UDF creation validation, checking if the function name, parameters, return values, etc., are valid.
- handShake: Used for interface probing.
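The three methods can be sketched without any RPC framework, as below. This is a minimal illustration, not the sample server's code. Requests and responses are plain dictionaries loosely shaped like the log output in the Example section. The class name `UdfService`, the snake_case method names, and the layout of the `function` field are hypothetical, and the error code 1 is an assumed non-zero value. The `add_string` behavior (appending `_rpc_test`) is inferred from that example's result.

```python
from typing import Callable, Dict, List

OK = 0     # status 0 represents normal
ERROR = 1  # assumed non-zero code for failures


def add_string(values: List[str]) -> List[str]:
    # Inferred from the example: 'doris' -> 'doris_rpc_test'.
    return [v + "_rpc_test" for v in values]


class UdfService:
    """Framework-free sketch of fnCall, checkFn and handShake."""

    def __init__(self) -> None:
        # Maps the symbol from CREATE FUNCTION to its implementation.
        self.handlers: Dict[str, Callable[[List[str]], List[str]]] = {
            "add_string": add_string,
        }

    def fn_call(self, request: dict) -> dict:
        """Run the calculation logic for one batch of rows."""
        handler = self.handlers.get(request["function_name"])
        if handler is None:
            return {"status": {"status_code": ERROR}}
        # Assumed layout: the first argument holds one string per row.
        values = request["args"][0]["string_value"]
        return {
            "result": {"string_value": handler(values)},
            "status": {"status_code": OK},
        }

    def check_fn(self, request: dict) -> dict:
        """Validate a function at creation time (here: the name only)."""
        known = request["function"]["function_name"] in self.handlers
        return {"status": {"status_code": OK if known else ERROR}}

    def hand_shake(self, request: dict) -> dict:
        """Answer an interface probe."""
        return {"status": {"status_code": OK}}
```

Dispatching on the function name lets one service host several UDFs. A real implementation would also validate argument and return types in checkFn and handle null values in fnCall.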
Creating UDF
Currently, UDTF is not supported.
CREATE FUNCTION
name ([,...])
[RETURNS] rettype
PROPERTIES (["key"="value"][,...])
Note:
- The symbol in the PROPERTIES represents the method name passed in the RPC call. This parameter must be set.
- The object_file in the PROPERTIES represents the RPC service address. Currently, it supports a single address and cluster addresses in the brpc-compatible format. For cluster connection methods, refer to the Format Specification (Chinese).
- The type in the PROPERTIES represents the UDF invocation type, which defaults to Native. Set it to RPC when using an RPC UDF.
- name: A function belongs to a specific database. The name takes the form dbName.funcName. When dbName is not explicitly specified, the current session's database is used as dbName.
Example:
CREATE FUNCTION rpc_add_two(INT,INT) RETURNS INT PROPERTIES (
"SYMBOL"="add_int_two",
"OBJECT_FILE"="127.0.0.1:9114",
"TYPE"="RPC"
);
CREATE FUNCTION rpc_add_one(INT) RETURNS INT PROPERTIES (
"SYMBOL"="add_int_one",
"OBJECT_FILE"="127.0.0.1:9114",
"TYPE"="RPC"
);
CREATE FUNCTION rpc_add_string(varchar(30)) RETURNS varchar(30) PROPERTIES (
"SYMBOL"="add_string",
"OBJECT_FILE"="127.0.0.1:9114",
"TYPE"="RPC"
);
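When many functions point at the same service, the statements can be generated. The helper below is a hypothetical convenience, not part of Doris. It only reproduces the statement shape of the examples above and does no escaping, so it should be given trusted input only.

```python
from typing import Sequence


def create_rpc_function_sql(name: str, arg_types: Sequence[str], return_type: str,
                            symbol: str, address: str) -> str:
    """Compose a CREATE FUNCTION statement for an RPC UDF.

    symbol  -> "SYMBOL": method name passed in the RPC call
    address -> "OBJECT_FILE": RPC service address
    "TYPE" is always set to RPC, since the default is Native.
    """
    args = ",".join(arg_types)
    return (
        f"CREATE FUNCTION {name}({args}) RETURNS {return_type} PROPERTIES (\n"
        f'"SYMBOL"="{symbol}",\n'
        f'"OBJECT_FILE"="{address}",\n'
        '"TYPE"="RPC"\n'
        ");"
    )


if __name__ == "__main__":
    print(create_rpc_function_sql("rpc_add_two", ["INT", "INT"], "INT",
                                  "add_int_two", "127.0.0.1:9114"))
```

Called with the values of the first example, the helper prints that statement unchanged.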
Using UDF
Users must have the SELECT privilege on the corresponding database to use UDF.
The usage of UDF is similar to regular functions, with the only difference being that the scope of built-in functions is global, while the scope of UDF is within the database. When the session is connected to a database, simply use the UDF name to search for the corresponding UDF within the current database. Otherwise, the user needs to explicitly specify the database name of the UDF, such as dbName.funcName.
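The lookup rule above can be modeled in a few lines. This is only a sketch of the rule as described, not Doris' resolver; the function name `resolve_udf_name` and the error message are hypothetical.

```python
from typing import Optional, Tuple


def resolve_udf_name(name: str, session_db: Optional[str] = None) -> Tuple[str, str]:
    """Return (dbName, funcName) for a UDF reference.

    A qualified name 'dbName.funcName' is used as written; an unqualified
    name falls back to the database the session is connected to.
    """
    db, sep, func = name.partition(".")
    if sep:
        return db, func
    if session_db is None:
        raise ValueError(f"no database selected; use a qualified name such as dbName.{name}")
    return session_db, name
```

For example, `resolve_udf_name("rpc_add_one", "test_db")` yields `("test_db", "rpc_add_one")`, while the same call without a session database raises an error.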
Deleting UDF
When you no longer need a UDF, you can delete it using the DROP FUNCTION command.
Example
The samples/doris-demo/ directory provides examples of RPC server implementations in C++, Java, and Python. Please refer to the README.md file in each directory for specific usage instructions.
For example, rpc_add_string:
mysql> select rpc_add_string('doris');
+-------------------------+
| rpc_add_string('doris') |
+-------------------------+
| doris_rpc_test          |
+-------------------------+
The log will display:
INFO: fnCall request=function_name: "add_string"
args {
  type {
    id: STRING
  }
  has_null: false
  string_value: "doris"
}
INFO: fnCall res=result {
  type {
    id: STRING
  }
  has_null: false
  string_value: "doris_rpc_test"
}
status {
  status_code: 0
}