蛊毒传说
oracle 柱状图(histogram)oracle中的柱状图是用于记录表中的数据分布质量情况的描述,当每次使用analyze或者dbms_stat包分析数据表及列后,该表的分布情况会呗保存在统计表(user_tab_columns/user_histograms)里面,当多表连接时,CBO优化器会根据柱状图提供的信息评估多表连接时将产生的成本(cost)或技术(cardinality),决定是否使用该列的索引,当然,导致CBO不能选择最优执行计划的因素有多种情况,而柱状图只是协助CBO优化器选择最优的执行计划,在一个数据分布不均匀的表列上建立柱状图将有力地保证优化器做出正确合理的选择。其他因素后面在进行探讨。 (直方图的使用不受索引的限制,可以在表的任何列上构建直方图)1. 搜集柱状图SQL> conn scott/tigerConnected to Oracle Database 10g Enterprise Edition Release 10.2.0.1.0Connected as scottSQL> select table_name from user_tables;TABLE_NAME------------------------------DEPTEMPBONUSSALGRADESQL> exec dbms_stats.gather_table_stats(ownname=>'scott',tabname => 'dept',estimate_percent => null,method_opt => 'for all indexed columns',cascade => true);PL/SQL procedure successfully completedSQL> select column_name,density,num_buckets,histogram from user_tab_col_statistics where table_name='DEPT';COLUMN_NAME DENSITY NUM_BUCKETS HISTOGRAM------------------------------ ---------- ----------- ---------------DEPTNO 0.125 4 HEIGHT BALANCEDDNAME 0.25 1 NONELOC 0.25 1 NONESQL> select * from dept;DEPTNO DNAME LOC------ -------------- -------------10 ACCOUNTING NEW YORK20 RESEARCH DALLAS30 SALES CHICAGO40 OPERATIONS BOSTONSQL> select * from user_tab_histograms where table_name='DEPT';TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE------------------------------ -------------------------------------------- --------------- -----------------------------DEPT DEPTNO 1 10DEPT DEPTNO 2 20DEPT DEPTNO 3 30DEPT DEPTNO 4 40DEPT DNAME 0 3.388635500875DEPT LOC 0 3.443005050520DEPT DNAME 1 4.322850386777DEPT LOC 1 4.0640554408998 rows selected柱状图的搜集有三个参数,for columns <list> SIZE <N> | REPEAT | AUTO | SKEWONLY:统计指定列的histograms.N的取值范围[1,254]; REPEAT上次统计过的histograms;AUTO由oracle决定N的大小;SKEWONLY在上面柱状图搜集中,histogram字段有三个值,NONE,FREQUENCY或者HEIGHT BALANCEDa. NONE:就是没有直方图b. FREQUENCY: 当该列的distinct值数量<=bucket数量时,为此类型。在user_tab_histograms表中记录有相关的值c. HEIGHT BALANCED:当该列的distinct值数量>bucket数量时,为此类型。d. density字段值的含义 Density的含义是“密度”。DENSITY值是会影响CBO判断执行计划的2. 并不是所有柱状图信息都有存在的必要,产生直方图的成本是很高的,频繁分析一个表,若该表数据量非常大,做一次分析可能导致严重的性能问题,但是那些列的柱状图应该存在呢,建议如下:a. 第一次收集统计信息时,设置method_opt=>FOR ALL COLUMNS SIZE 1,这意味删除所有列上的直方图。b. 在测试阶段或者在真实生产环境中,在调优SQL的过程中,DBA将会逐渐得知每个需要直方图信息的字段,在这些字段上人工收集统计信息,method_opt=>FOR COLUMNS SIZE AUTO[COLUMN_NAME]c. 在每次数据分布有所变化的时候,更新统计信息,使用method_opt=>FOR ALL COLUMNS SIZE REPEAT,这样只会收集已经存在了直方图信息的字段。重复2,3步骤,直到系统稳定。3. 柱状图是如何影响执行计划的,下面通过示例来查看SQL> show userUser is "colin"SQL> drop table tmp_liuhc_1;Table droppedSQL> create table tmp_liuhc_1 as select * from dba_objects;Table createdSQL> desc tmp_liuhc_1;Name Type Nullable Default Comments-------------- ------------- -------- ------- --------OWNER VARCHAR2(30) YOBJECT_NAME VARCHAR2(128) YSUBOBJECT_NAME VARCHAR2(30) YOBJECT_ID NUMBER YDATA_OBJECT_ID NUMBER YOBJECT_TYPE VARCHAR2(19) YCREATED DATE YLAST_DDL_TIME DATE YTIMESTAMP VARCHAR2(19) YSTATUS VARCHAR2(7) YTEMPORARY VARCHAR2(1) YGENERATED VARCHAR2(1) YSECONDARY VARCHAR2(1) YSQL> select owner,count(*) from tmp_liuhc_1 group by owner;OWNER COUNT(*)------------------------------ ----------MDSYS 885TSMSYS 3DMSYS 189LINK 3PUBLIC 19987OUTLN 8CTXSYS 339OLAPSYS 720HR 34SYSTEM 454EXFSYS 281SCOTT 6DBSNMP 46ORDSYS 1669ORDPLUGINS 10SYSMAN 1321OE 127PM 26SH 306XDB 682OWNER COUNT(*)------------------------------ ----------IX 53BI 8SYS 22912WMSYS 242SI_INFORMTN_SCHEMA 8COLIN 626 rows selectedSQL> create index idx_tmp_liuhc on tmp_liuhc_1(owner);Index createdSQL> select sysdate from dual;SYSDATE-----------2011-10-30删除柱状图信息,bucket为1时,相当于一个普通分析,没有柱状图信息,执行计划按绝大多数ID字段内容来选择走索引,删除之后刷新shared_poolSQL> exec dbms_stats.gather_table_stats(ownname => 'colin',tabname => 'tmp_liuhc_1',estimate_percent => null ,method_opt => 'for all columns size 1' ,cascade => true);PL/SQL procedure successfully completedSQL> alter system flush shared_pool;System alteredSQL> select * from user_tab_col_statistics where column_name='OWNER';TABLE_NAME COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUEDENSITY NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE GLOBAL_STATS USER_STATS AVG_COL_LEN HISTOGRAM------------------------------ ------------------------------ ------------ ---------------------------------------------------------------- ---------------------------------------------------------------- ---------- ---------- ----------- ------------- ----------- ------------ ---------- ----------- ---------------TMP_LIUHC_1 OWNER 26 4249 5844420.03846153 0 1 2011-10-30 9: 50325 YES NO 6 NONESQL> explain plan for select * from tmp_liuhc_1 where owner='COLIN';ExplainedSQL> select * From table(Dbms_Xplan.display);PLAN_TABLE_OUTPUT--------------------------------------------------------------------------------Plan hash value: 3774022813--------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU--------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1936 | 175K| 57 (0| 1 | TABLE ACCESS BY INDEX ROWID| TMP_LIUHC_1 | 1936 | 175K| 57 (0|* 2 | INDEX RANGE SCAN | IDX_TMP_LIUHC | 1936 | | 5 (0--------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------2 - access("OWNER"='COLIN')14 rows selectedSQL> explain plan for select * from tmp_liuhc_1 where owner='SYS';ExplainedSQL> select * From table(Dbms_Xplan.display);PLAN_TABLE_OUTPUT--------------------------------------------------------------------------------Plan hash value: 3774022813--------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU--------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1936 | 175K| 57 (0| 1 | TABLE ACCESS BY INDEX ROWID| TMP_LIUHC_1 | 1936 | 175K| 57 (0|* 2 | INDEX RANGE SCAN | IDX_TMP_LIUHC | 1936 | | 5 (0--------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------2 - access("OWNER"='SYS')14 rows selectedSQL> exec dbms_stats.gather_table_stats(ownname => 'colin',tabname => 'tmp_liuhc_1',estimate_percent => null ,method_opt => 'for all columns size auto' ,cascade =>true);PL/SQL procedure successfully completedSQL> select * from user_tab_col_statistics where column_name='OWNER';TABLE_NAME COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUEDENSITY NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE GLOBAL_STATS USER_STATS AVG_COL_LEN HISTOGRAM------------------------------ ------------------------------ ------------ ---------------------------------------------------------------- ---------------------------------------------------------------- ---------- ---------- ----------- ------------- ----------- ------------ ---------- ----------- ---------------TMP_LIUHC_1 OWNER 26 4249 5844429.93541977 0 26 2011-10-30 9: 50325 YES NO 6 FREQUENCYSQL>SQL> explain plan for select * from tmp_liuhc_1 where owner='COLIN';ExplainedSQL> select * From table(Dbms_Xplan.display);PLAN_TABLE_OUTPUT--------------------------------------------------------------------------------Plan hash value: 3774022813--------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU--------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 6 | 558 | 2 (0| 1 | TABLE ACCESS BY INDEX ROWID| TMP_LIUHC_1 | 6 | 558 | 2 (0|* 2 | INDEX RANGE SCAN | IDX_TMP_LIUHC | 6 | | 1 (0--------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------2 - access("OWNER"='COLIN')14 rows selectedSQL> explain plan for select * from tmp_liuhc_1 where owner='SYS';ExplainedSQL> select * From table(Dbms_Xplan.display);PLAN_TABLE_OUTPUT--------------------------------------------------------------------------------Plan hash value: 1961695573--------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time--------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 22912 | 2080K| 160 (2)| 00:00:02|* 1 | TABLE ACCESS FULL| TMP_LIUHC_1 | 22912 | 2080K| 160 (2)| 00:00:02--------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------1 - filter("OWNER"='SYS')13 rows selected从以上可以看出,当删除柱状图时,查询SYS用户时,CBO按大多数ID字段的内容,选择走索引;当搜集柱状图后,CBO选择了正确的执行计划,走全表扫描,因为前面已经查询了,SYS用户下的表占用了决大部分。附录:附带两张表的解释信息,此处的表DBA_TAB_COLUMNS和表user_tab_col_statistics是同样效果DBA_TAB_HISTOGRAMS该视图列出了所有表上列的柱状图信息。字段名称 描述---------------------------------------------------------OWNER 表的所有者TABLE_NAME 表名COLUMN_NAME 列名(字段名)ENDPOINT_NUMBER 终点数值,也就是bucket的编号值ENDPOINT_VALUE 根据该bucket的编号值,规格化的终点字段值,如果字段类型是数字,则显示该字段值,如果是其他类型的,则是被规格化的数值。DBA_TAB_COLUMNS该视图包含了所有表上列的描述信息。(注意:虽然视图和簇的信息也包含在该视图中,但没有相关的柱状图信息)字段名称 描述----------------------------------------------------------OWNER 表的所有者TABLE_NAME 表名COLUMN_NAME 列名(字段名)DATA_TYPE 字段的数据类型DATA_LENGTH 该列定义的长度DATA_PRECISION NUMBER或FLOAT数值类型的精度DATA_SCALE 定义的小数位数NULLABLE 是否允许NULL值COLUMN_ID 该列在表中的顺序的位置,表中第几个字段DEFAULT_LENGTH 字段默认值的长度DATA_DEFAULT 字段默认值NUM_DISTINCT 该字段不同值的数量LOW_VALUE 该字段的最小值,该值是内部按16进制存储的该值的前32 bytes内容HIGH_VALUE 该字段的最大值,该值是内部按16进制存储的该值的前32 bytes内容DENSITY 该字段的密度(不同值的比例值,比如一个字段只有2个不同值,那么该字段密度为0.5)NUM_NULLS 该字段NULL值的数量NUM_BUCKETS 该字段柱状图中bucket的数量LAST_ANALYZED 最近一次分析表的时间SAMPLE_SIZE 采样数据的数量(根据分析时的百分比得出的数量,例如100%分析,该采样值就是行的数量)