The Impact of Histograms on the CBO

wapysun


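For tables whose column data is heavily skewed, gathering histograms is important. A histogram describes how the data in a column is distributed.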

SQL> select * from v$version where rownum<2;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

SQL> create table t as select 1 id,object_name from dba_objects;

Table created.

SQL> update t set id=99 where rownum=1;

1 row updated.

SQL> commit;

Commit complete.

SQL> create index ind_t on t(id);

Index created.

SQL> exec dbms_stats.gather_table_stats(user, tabname=>'T', estimate_percent=>100);

PL/SQL procedure successfully completed.

SQL> col column_name format a20
SQL> select table_name,column_name,endpoint_number,endpoint_value from user_histograms where table_name='T';

TABLE_NAME                     COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ -------------------- --------------- --------------
T                              ID                             72483              1
T                              ID                             72484             99
T                              OBJECT_NAME                        0     2.4504E+35
T                              OBJECT_NAME                        1     6.2963E+35

As the output shows, dbms_stats has already built a histogram on the ID column by default. In this histogram ENDPOINT_NUMBER is a cumulative row count: 72483 rows have ID=1, and 72484 - 72483 = 1 row has ID=99.

SQL> set autotrace traceonly
SQL> select * from t where id=99;

Execution Plan
----------------------------------------------------------
Plan hash value: 4013845416

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    27 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |    27 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=99)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        603  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Based on the histogram, the CBO estimates that only 1 row has ID=99. That estimate is correct, so it chooses the index.
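The endpoint arithmetic above can be checked with a small Python sketch. Python serves only as a calculator here. `ID_HISTOGRAM`, `value_counts` and `estimated_rows` are hypothetical names. The estimate NUM_ROWS × (rows in the value's bucket / rows covered by the histogram) is a simplified model of how a frequency histogram is used, not Oracle's actual costing code.

```python
# (endpoint_number, endpoint_value) pairs for column ID, copied from the
# USER_HISTOGRAMS output above. In a frequency histogram the
# endpoint_number is a cumulative row count.
ID_HISTOGRAM = [(72483, 1), (72484, 99)]


def value_counts(endpoints):
    """Convert cumulative endpoint numbers into rows per value."""
    counts = {}
    previous = 0
    for endpoint_number, endpoint_value in sorted(endpoints):
        counts[endpoint_value] = endpoint_number - previous
        previous = endpoint_number
    return counts


def estimated_rows(endpoints, num_rows, value):
    """Simplified estimate for 'col = value' from a frequency histogram.

    Assumes the value is present in the histogram: its share of the rows
    covered by the histogram is scaled up to NUM_ROWS.
    """
    counts = value_counts(endpoints)
    covered = max(endpoint_number for endpoint_number, _ in endpoints)
    return round(num_rows * counts[value] / covered)


if __name__ == "__main__":
    print(value_counts(ID_HISTOGRAM))               # {1: 72483, 99: 1}
    print(estimated_rows(ID_HISTOGRAM, 72484, 99))  # 1
    print(estimated_rows(ID_HISTOGRAM, 72484, 1))   # 72483
```

With a 100% sample the histogram covers every row, so the estimates equal the true counts. This matches the Rows = 1 in the ID=99 plan.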

SQL> select * from t where id=1;

72483 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 72483 |  1911K|    95   (2)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T    | 72483 |  1911K|    95   (2)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID"=1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       5137  consistent gets
          0  physical reads
          0  redo size
    2457976  bytes sent via SQL*Net to client
      53672  bytes received via SQL*Net from client
       4834  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72483  rows processed

Based on the histogram, the CBO estimates 72483 rows for ID=1. That is almost every row in the table, so it chooses a full table scan.

Next, I delete only the histogram information and keep the table and index statistics.

SQL> set autotrace off
SQL> select table_name,column_name,endpoint_number,endpoint_value from user_histograms where table_name='T';

TABLE_NAME                     COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ -------------------- --------------- --------------
T                              OBJECT_NAME                        0     2.4504E+35
T                              OBJECT_NAME                        1     6.2963E+35

SQL> select num_rows,avg_row_len,blocks,last_analyzed from user_tables where table_name='T';

  NUM_ROWS AVG_ROW_LEN     BLOCKS LAST_ANALYZE
---------- ----------- ---------- ------------
     72484          27        340 16-APR-11

SQL> select blevel,leaf_blocks,distinct_keys,last_analyzed from user_indexes where table_name='T';

    BLEVEL LEAF_BLOCKS DISTINCT_KEYS LAST_ANALYZE
---------- ----------- ------------- ------------
         1         142             2 16-APR-11

Execution plans after deleting the histogram:

SQL> set autotrace traceonly
SQL> select * from t where id=99;

Execution Plan
----------------------------------------------------------
Plan hash value: 4013845416

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   725 | 19575 |    73   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |   725 | 19575 |    73   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T |   290 |       |    71   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=99)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        603  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select * from t where id=1;

72483 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4013845416

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   725 | 19575 |    73   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |   725 | 19575 |    73   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T |   290 |       |    71   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       5137  consistent gets
          0  physical reads
          0  redo size
    2457976  bytes sent via SQL*Net to client
      53672  bytes received via SQL*Net from client
       4834  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72483  rows processed

The CBO estimates 725 rows for both predicates, ID=99 and ID=1. In reality only 1 row has ID=99, while nearly every row in the table has ID=1. Using the index in both cases is therefore the wrong choice.

Note that USER_HISTOGRAMS above no longer lists any rows for ID. OBJECT_NAME, which has no histogram, still shows its two endpoints. This suggests the column statistics for ID were removed entirely rather than reduced to a single bucket. The estimate of 725 is consistent with a default 1% selectivity for an equality predicate on a column without statistics (72484 × 0.01 ≈ 725).
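For comparison, the following Python sketch computes the estimate for ID=1 and ID=99 under three states of the statistics. The names are hypothetical and the formulas are simplified. The three states are:

- the frequency histogram gathered earlier;
- column statistics without a histogram, which assumes a uniform spread (NUM_ROWS / NUM_DISTINCT);
- no column statistics at all.

The 1% default selectivity for the last case is an assumption, chosen because it matches the 725 in both plans above. The no-histogram figure is only what the uniform formula gives. The session above did not produce it.

```python
# Statistics taken from the SQL*Plus output above.
NUM_ROWS = 72484                    # USER_TABLES.NUM_ROWS
NUM_DISTINCT = 2                    # ID holds only the values 1 and 99
ROWS_PER_VALUE = {1: 72483, 99: 1}  # decoded from the ID histogram

# Assumption: default selectivity of "col = value" when the column has
# no statistics at all.
DEFAULT_EQ_SELECTIVITY = 0.01


def estimate(scenario, value):
    """Return a simplified cardinality estimate for 'ID = value'."""
    if scenario == "histogram":
        # Frequency histogram on a 100% sample: the exact row count.
        return ROWS_PER_VALUE[value]
    if scenario == "no_histogram":
        # Column stats without a histogram: assume a uniform spread.
        return round(NUM_ROWS / NUM_DISTINCT)
    if scenario == "no_column_stats":
        # No column stats: fall back to the assumed default selectivity.
        return round(NUM_ROWS * DEFAULT_EQ_SELECTIVITY)
    raise ValueError(f"unknown scenario: {scenario}")


if __name__ == "__main__":
    for scenario in ("histogram", "no_histogram", "no_column_stats"):
        print(scenario, [estimate(scenario, v) for v in (1, 99)])
    # histogram [72483, 1]
    # no_histogram [36242, 36242]
    # no_column_stats [725, 725]
```

Only the histogram case separates the two values. Without it, both predicates get the same estimate, whichever formula applies.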

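When I first analyzed T, I left out the estimate_percent=>100 setting. As a result, the execution plan used a full table scan for both ID=1 and ID=99. I believe the data distribution of the ID column was not captured, or the statistics that were gathered were wrong. Only one row holds ID=99, so a sample can easily miss it. The estimate_percent value should therefore be chosen with the data volume and distribution in mind.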
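The sampling explanation can be illustrated with a toy model. The sketch below assumes simple Bernoulli row sampling, where each row is kept independently with probability `sample_fraction`. This is not how Oracle's AUTO_SAMPLE_SIZE actually samples. `sampled_frequency_histogram` and `probability_rare_row_missed` are hypothetical helpers. Under this model the single ID=99 row is left out of a 5% sample 95% of the time, and in that case the resulting histogram contains only the value 1.

```python
import random


def sampled_frequency_histogram(values, sample_fraction, seed=0):
    """Build a frequency histogram from a Bernoulli row sample (toy model).

    Each row is kept with probability `sample_fraction`; the sampled counts
    are then scaled back up by 1 / sample_fraction. Not Oracle's algorithm.
    """
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    rng = random.Random(seed)
    counts = {}
    for v in values:
        # A 100% sample keeps every row without drawing random numbers.
        if sample_fraction == 1 or rng.random() < sample_fraction:
            counts[v] = counts.get(v, 0) + 1
    return {v: round(c / sample_fraction) for v, c in counts.items()}


def probability_rare_row_missed(sample_fraction):
    """Chance that one specific row is left out of a Bernoulli sample."""
    return 1.0 - sample_fraction


if __name__ == "__main__":
    table_t = [99] + [1] * 72483  # same distribution as table T
    print(sampled_frequency_histogram(table_t, 1.0))  # {99: 1, 1: 72483}
    print(sampled_frequency_histogram(table_t, 0.05, seed=1))
    print(probability_rare_row_missed(0.05))
```

With the full sample the rare value survives with its exact count. With a small sample it usually disappears, and the optimizer then has no evidence that ID=99 is rare.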

The Oracle 11gR2 documentation describes estimate_percent and method_opt as follows:

estimate_percent

Percentage of rows to estimate (NULL means compute): The valid range is [0.000001,100]. Use the constant DBMS_STATS.AUTO_SAMPLE_SIZE to have Oracle determine the appropriate sample size for good statistics. This is the default. The default value can be changed using the SET_DATABASE_PREFS Procedure, SET_GLOBAL_PREFS Procedure, SET_SCHEMA_PREFS Procedure and SET_TABLE_PREFS Procedure.

method_opt
Accepts:
FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
FOR COLUMNS [size clause] column|attribute [size_clause] [,column|attribute [size_clause]...]
size_clause is defined as size_clause := SIZE {integer | REPEAT | AUTO | SKEWONLY}

- integer : Number of histogram buckets. Must be in the range [1,254].
- REPEAT : Collects histograms only on the columns that already have histograms.
- AUTO : Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.
- SKEWONLY : Oracle determines the columns to collect histograms based on the data distribution of the columns.
The default is FOR ALL COLUMNS SIZE AUTO. The default value can be changed using the SET_PARAM Procedure.
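The size_clause rules quoted above are easy to get wrong when building method_opt strings by hand. The Python helpers below are hypothetical. They are not part of DBMS_STATS and do not talk to the database. They only assemble a `FOR COLUMNS column SIZE ...` string and reject values outside the documented set: an integer in [1, 254], or REPEAT, AUTO or SKEWONLY.

```python
SIZE_KEYWORDS = {"REPEAT", "AUTO", "SKEWONLY"}


def size_clause(size):
    """Validate one size_clause value against the documented rules."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(size, int) and not isinstance(size, bool):
        if not 1 <= size <= 254:
            raise ValueError("bucket count must be in the range [1, 254]")
        return f"SIZE {size}"
    if isinstance(size, str) and size.upper() in SIZE_KEYWORDS:
        return f"SIZE {size.upper()}"
    raise ValueError(f"unsupported size_clause: {size!r}")


def method_opt_for_columns(column_sizes):
    """Build 'FOR COLUMNS col SIZE ..., col SIZE ...' from {column: size}."""
    if not column_sizes:
        raise ValueError("at least one column is required")
    parts = [f"{column} {size_clause(size)}"
             for column, size in column_sizes.items()]
    return "FOR COLUMNS " + ", ".join(parts)


if __name__ == "__main__":
    print(method_opt_for_columns({"ID": 1}))           # FOR COLUMNS ID SIZE 1
    print(method_opt_for_columns({"ID": "skewonly"}))  # FOR COLUMNS ID SIZE SKEWONLY
```

`SIZE 1` means a single bucket, that is, no histogram. This is the same form used with `size 1` in the commands for removing a histogram.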

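In Oracle, removing a column's histogram means re-gathering its statistics with the number of buckets set to 1. Either of the following two commands does this. table_name, column_name and the empty ownname/tabname strings are placeholders to fill in.
analyze table table_name compute statistics for table for columns column_name size 1;
exec dbms_stats.gather_table_stats(ownname => '', tabname => '', cascade => true, method_opt => 'for columns column_name size 1');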

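The SKEWONLY option can be time-consuming, because it examines the distribution of values in every column it covers. If dbms_stats finds that a column's values are unevenly distributed, it creates a histogram on that column. The histogram helps the cost-based optimizer decide between index access and a full table scan.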
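begin
  dbms_stats.gather_table_stats(
    ownname          => '',   -- schema name (placeholder)
    tabname          => '',   -- table name (placeholder)
    estimate_percent => dbms_stats.auto_sample_size,
    method_opt       => 'for columns column_name size skewonly',
    cascade          => true, -- also gather statistics on the table's indexes
    degree           => 2);   -- degree of parallelism
end;
/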
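Here is a toy skew test in Python that shows what "unevenly distributed" means for table T. It is a hypothetical heuristic, not the rule DBMS_STATS applies for SIZE SKEWONLY. It compares the row count of the most frequent value with that of the least frequent one, using an arbitrary threshold of 10.

```python
from collections import Counter


def looks_skewed(values, ratio_threshold=10.0):
    """Toy skew test: max/min rows per distinct value vs. a threshold.

    Hypothetical heuristic for illustration only; DBMS_STATS uses its own
    internal rules for SIZE SKEWONLY.
    """
    counts = Counter(values)
    if len(counts) < 2:
        return False  # a single distinct value cannot be unevenly spread
    frequencies = counts.values()
    return max(frequencies) / min(frequencies) >= ratio_threshold


if __name__ == "__main__":
    table_t_ids = [99] + [1] * 72483
    print(looks_skewed(table_t_ids))        # True: 72483 rows vs 1 row
    print(looks_skewed(list(range(1000))))  # False: every value appears once
```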

cascade
Gather statistics on the indexes for this table. Index statistics gathering is not parallelized. Using this option is equivalent to running the GATHER_INDEX_STATS Procedure on each of the table's indexes. Use the constant DBMS_STATS.AUTO_CASCADE to have Oracle determine whether index statistics are to be collected or not. This is the default. The default value can be changed using the SET_PARAM Procedure.