Owlver | GBase 8a Implementation Guide: Parameter Optimization (2)

Psst! Do you accept cookies?

We use cookies to enhance and personalise your experience.
Please accept our cookies. Checkout our Cookie Policy for more information.

Okay Hm?

DEV @dev.to

GBase 8a Implementation Guide: Parameter Optimization (2)

July 1, 2024 5 minutes

1. SQL Execution Parameters

1.1 Insert Value Data Distribution Parameter

gcluster_random_insert

This parameter controls how data is distributed across nodes when executing insert value on a randomly distributed table. The default value is 0, and the recommended configuration is 1.

0: All insert value data is inserted into a single node (if the executing node is a composite node, data is inserted there; otherwise, it is inserted into a random node).
1: insert value data is evenly distributed across all nodes randomly.

1.2 Parameter for Supporting `insert into select from dual`

t_gcluster_use_new_dual

This parameter controls whether GCluster uses the new implementation of the dual table, which supports insert into ... select ... from dual.

Range: 0, 1
Default: 0
0: Uses the old implementation, does not support insert into ... select ... from dual.
1: Uses the new implementation, supports insert into ... select ... from dual.
Scope: session, global

1.3 Data Redistribution Parameter for `group by`

t_gcluster_hash_redistribute_groupby_on_multiple_expression

This parameter enables hash redistribution for group by operations across all columns. The default value is 0 (disabled). It can be optimized for SQL where the first group by field is a constant or has few distinct values.

1.4 Join/Materialized Result Set Size Parameter

_gbase_result_threshold

This parameter limits the size of JOIN result sets and materialized result sets. It needs to be configured in both GCluster and GNode. The default value is large, at 137,438,953,472. It is recommended to set it to twice the number of rows in the largest table. This helps avoid Cartesian products by erroring out if the resulting row count exceeds this value.

1.5 Parallel Materialization Threshold for Result Sets

gbase_parallel_threshold

This parameter defines the threshold for parallel materialization of result sets. When the result set row count is greater than or equal to this value, multiple threads are used for parallel materialization; otherwise, it is done serially. The default value is 10,000. Adjusting this parameter can optimize performance for serial materialization stages that are time-consuming.

1.6 Allow Binary/Varbinary Column Creation

gcluster_support_binary

The default value is 1, allowing the creation of binary/varbinary columns. When set to 0, binary/varbinary columns are not allowed, and varchar type must be used. Adjust based on business needs.

1.7 Parameter for Table and Column Names with Chinese Characters

gcluster_extend_ident

This parameter controls whether table and column names can include Chinese characters and special characters. The default value is 0 (disabled). It is generally not recommended to enable this parameter unless necessary.

1.8 Optimization Parameter for `group by` with Window Functions

t_gcluster_group_by_ext_optimization

When enabled, this parameter optimizes group by rollup/cube/grouping sets by converting them into union all executions. The default value is 0 (disabled), and the recommended value is 1. Note that this optimization does not work if the grouping column in the projection is a function.

Example SQL rewrite to bypass the current optimization limitation:

SELECT func(a), b, COUNT(*) FROM t GROUP BY rollup(a, b);
-- Change to
SELECT func(a), cnt FROM (SELECT a, b, COUNT(*) AS cnt FROM t GROUP BY rollup(a, b)) tmp;

1.9 One-pass Hash Group Optimization

_gbase_one_pass_hash_group

This optimization is suitable for cases where the source table has a large number of rows relative to the group buffer, and the distinct values in the group by columns are numerous. There are three partitioning methods: RR, original hash, and one-pass hash, with the evaluation criteria as follows:

DistinctRatio < 10: Uses RR partitioning (requires two aggregations).
Otherwise: Uses hash partitioning. If the group buffer can hold 50% of the source data, original hash partitioning is used; otherwise, one-pass hash partitioning is used.

Note: The algorithm might not be optimal due to inaccurate sampling results or when only considering data volume without data characteristics.

1.10 Recursive Call Depth for Stored Procedures

max_sp_recursion_depth

This parameter specifies the maximum depth for recursive calls in stored procedures. The range is [0-255], with a default value of 0. Adjust this parameter based on the need for recursive calls in stored procedures. Increasing the value may also require increasing the thread_stack parameter in GCluster.

1.11 CTE Support Parameter

t_gcluster_support_cte

This parameter controls whether the common table expression (CTE) syntax (with as) is supported. It is a session-level parameter, with a default value of 0 (disabled) and 1 enabling CTE syntax support.

1.12 Recursive Query Parameter for `connect by start with`

_gbase_connect_by_support_table_with_deleted_records

This parameter controls whether connect by start with recursive queries can be executed on tables after data deletion. The default value is 0 (OFF). Enabling this parameter allows such queries even after deletions.

1.13 Recursion Depth Limit for `or` Operator in Correlated Subqueries

_gbase_or_recursion_depth

This parameter limits the maximum depth of nested conditions with the or operator in correlated subqueries. The default value is 10, and exceeding this value results in an error. The parameter is session-level in GNode.

1.14 Distinct Row Count Limit for `in` Subquery Results

_gbase_in_subquery_result_threshold

This parameter limits the distinct row count for in subquery results. The range is [0-100 million], with a default value of 10 million. Adjust based on business scenarios.

2. dblink Parameters

2.1 Retaining Intermediate Temporary Results

gcluster_dblink_direct_data_exchange

This parameter is related to transparent gateways and controls how data is transferred between two GBase 8a clusters during insert select operations. The default value is 1, using select into server for cross-cluster data distribution. When set to 0, the select results are converted into insert values statements.

Note: For significantly different GBase 8a cluster versions, set gcluster_dblink_direct_data_exchange to 0 for compatibility, despite the reduced performance.

2.2 Controlling Table Generation Method with dblink

t_gcluster_dblink_generate_interim_table_policy

This parameter controls how interim tables are generated during table pulls with dblink. It is a global and session-level parameter.

Range: 0, 1
Default: 1
0: Uses automatic evaluation based on the data type of the projection expression results.
1: Requests the gateway to use create ... select ... limit 0 to determine the interim table structure, resulting in more accurate column data type evaluation.

Author: DEV
Date: July 1, 2024
Permalink: owlver.com/s/grFviX1M
A story from Owlver Network.

Last Stories

What's your thoughts?

Please Register or Login to your account to be able to submit your comment.