local
Localβ
Nameβ
local
Descriptionβ
Local table-valued-function(tvf), allows users to read and access local file contents on be node, just like accessing relational table. Currently supports csv/csv_with_names/csv_with_names_and_types/json/parquet/orc
file format.
It needs ADMIN
privilege to use.
syntaxβ
local(
"file_path" = "path/to/file.txt",
"backend_id" = "be_id",
"format" = "csv",
"keyn" = "valuen"
...
);
parameter description
Related parameters for accessing local file on be node:
file_path
:(required) The path of the file to be read, which is a relative path to the
user_files_secure_path
directory, whereuser_files_secure_path
parameter can be configured on be.Can not contains
..
in path. Support using glob syntax to match multi files, such aslog/*.log
backend_id
:(required) The backend id where the file resides. The
backend_id
can be obtained byshow backends
command.
File format parameters:
format
: (required) Currently supportcsv/csv_with_names/csv_with_names_and_types/json/parquet/orc
column_separator
: (optional) default,
.line_delimiter
: (optional) default\n
.compress_type
: (optional) Currently supportUNKNOWN/PLAIN/GZ/LZO/BZ2/LZ4FRAME/DEFLATE/SNAPPYBLOCK
. Default value isUNKNOWN
, it will automatically infer the type based on the suffix ofuri
.The following 6 parameters are used for loading in json format. For specific usage methods, please refer to: Json Load
read_json_by_line
: (optional) default"true"
strip_outer_array
: (optional) default"false"
json_root
: (optional) default""
json_paths
: (optional) default""
num_as_string
: (optional) defaultfalse
The following 2 parameters are used for loading in csv formatfuzzy_parse
: (optional) defaultfalse
trim_double_quotes
: Boolean type (optional), the default value isfalse
. True means that the outermost double quotes of each field in the csv file are trimmed.skip_lines
: Integer type (optional), the default value is 0. It will skip some lines in the head of csv file. It will be disabled when the format iscsv_with_names
orcsv_with_names_and_types
.
Examplesβ
Analyze the log file on specified BE:
mysql> select * from local(
"file_path" = "log/be.out",
"backend_id" = "10006",
"format" = "csv")
where c1 like "%start_time%" limit 10;
+--------------------------------------------------------+
| c1 |
+--------------------------------------------------------+
| start time: 2023εΉ΄ 08ζ 07ζ₯ ζζδΈ 23:20:32 CST |
| start time: 2023εΉ΄ 08ζ 07ζ₯ ζζδΈ 23:32:10 CST |
| start time: 2023εΉ΄ 08ζ 08ζ₯ ζζδΊ 00:20:50 CST |
| start time: 2023εΉ΄ 08ζ 08ζ₯ ζζδΊ 00:29:15 CST |
+--------------------------------------------------------+
Read and access csv format files located at path ${DORIS_HOME}/student.csv
:
mysql> select * from local(
"file_path" = "student.csv",
"backend_id" = "10003",
"format" = "csv");
+------+---------+--------+
| c1 | c2 | c3 |
+------+---------+--------+
| 1 | alice | 18 |
| 2 | bob | 20 |
| 3 | jack | 24 |
| 4 | jackson | 19 |
| 5 | liming | d18 |
+------+---------+--------+
Can be used with desc function
:
mysql> desc function local(
"file_path" = "student.csv",
"backend_id" = "10003",
"format" = "csv");
+-------+------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------+------+-------+---------+-------+
| c1 | TEXT | Yes | false | NULL | NONE |
| c2 | TEXT | Yes | false | NULL | NONE |
| c3 | TEXT | Yes | false | NULL | NONE |
+-------+------+------+-------+---------+-------+
Keywordsβ
local, table-valued-function, tvf
Best Practiceβ
For more detailed usage of local tvf, please refer to S3 tvf, The only difference between them is the way of accessing the storage system.