Commit 0f1fa93a authored by unknown's avatar unknown
Browse files

WL#2985 "Partition-pruning", "range walking" addition: better comments.

parent bcd56c6d
Loading
Loading
Loading
Loading
+2 −3
Original line number Diff line number Diff line
@@ -690,11 +690,10 @@ class partition_info : public Sql_alloc
  /* 
    A bitmap of partitions used by the current query. 
    Usage pattern:
    * It is guaranteed that all partitions are set to be unused on query start.
    * The handler->extra(HA_EXTRA_RESET) call at query start/end sets all
      partitions to be unused.
    * Before index/rnd_init(), partition pruning code sets the bits for used
      partitions.
    * The handler->extra(HA_EXTRA_RESET) call at query end sets all partitions
      to be unused.
  */
  MY_BITMAP used_partitions;

+86 −21
Original line number Diff line number Diff line
@@ -2115,7 +2115,7 @@ int SQL_SELECT::test_quick_select(THD *thd, key_map keys_to_use,
       The list of intervals we'll obtain will look like this:
       ((t1.a, t1.b) = (1,'foo')),
       ((t1.a, t1.b) = (2,'bar')), 
       ((t1,a, t1.b) > (10,'zz'))  (**)
       ((t1,a, t1.b) > (10,'zz'))
       
    2. for each interval I 
       {
@@ -2574,30 +2574,95 @@ int find_used_partitions_imerge(PART_PRUNE_PARAM *ppar, SEL_IMERGE *imerge)
    This function 
      * recursively walks the SEL_ARG* tree collecting partitioning "intervals"
      * finds the partitions one needs to use to get rows in these intervals
      * marks these partitions as used
      * marks these partitions as used.
    The next session desribes the process in greater detail.
 
  NOTES    
    WHAT IS CONSIDERED TO BE "INTERVALS"
    A partition pruning "interval" is equivalent to condition in one of the 
    forms:
    
    "partition_field1=const1 AND ... AND partition_fieldN=constN"        (1)
    "subpartition_field1=const1 AND ... AND subpartition_fieldN=constN"  (2)
    "(1) AND (2)"                                                        (3)
    
    In (1) and (2) all [sub]partitioning fields must be used, and "x=const"
    includes "x IS NULL". 
    
    If partitioning is performed using 
       
       PARTITION BY RANGE(unary_monotonic_func(single_partition_field)),
       
    then the following is also an interval:

           "   const1 OP1 single_partition_field OP2 const2"              (4)
  IMPLEMENTATION
    TYPES OF RESTRICTIONS THAT WE CAN OBTAIN PARTITIONS FOR    
    We can find out which [sub]partitions to use if we obtain restrictions on 
    [sub]partitioning fields in the following form:
    1.  "partition_field1=const1 AND ... AND partition_fieldN=constN"
    1.1  Same as (1) but for subpartition fields

    If partitioning supports interval analysis (i.e. partitioning is a
    function of a single table field, and partition_info::
    get_part_iter_for_interval != NULL), then we can also use condition in
    this form:
    2.  "const1 <=? partition_field <=? const2"
    2.1  Same as (2) but for subpartition_field

    INFERRING THE RESTRICTIONS FROM SEL_ARG TREE
    
    The below is an example of what SEL_ARG tree may represent:
    
    (start)
     |                           $
     |   Partitioning keyparts   $  subpartitioning keyparts
     |                           $
     |     ...          ...      $
     |      |            |       $
     | +---------+  +---------+  $  +-----------+  +-----------+
     \-| par1=c1 |--| par2=c2 |-----| subpar1=c3|--| subpar2=c5|
       +---------+  +---------+  $  +-----------+  +-----------+
            |                    $        |             |
            |                    $        |        +-----------+ 
            |                    $        |        | subpar2=c6|
            |                    $        |        +-----------+ 
            |                    $        |
            |                    $  +-----------+  +-----------+
            |                    $  | subpar1=c4|--| subpar2=c8|
            |                    $  +-----------+  +-----------+
            |                    $         
            |                    $
       +---------+               $  +------------+  +------------+
       | par1=c2 |------------------| subpar1=c10|--| subpar2=c12|
       +---------+               $  +------------+  +------------+
            |                    $
           ...                   $

    The up-down connections are connections via SEL_ARG::left and
    SEL_ARG::right. A horizontal connection to the right is the
    SEL_ARG::next_key_part connection.
    
    find_used_partitions() traverses the entire tree via recursion on
     * SEL_ARG::next_key_part (from left to right on the picture)
     * SEL_ARG::left|right (up/down on the pic). Left-right recursion is
       performed for each depth level.
    
    Recursion descent on SEL_ARG::next_key_part is used to accumulate (in
    ppar->arg_stack) constraints on partitioning and subpartitioning fields.
    For the example in the above picture, one of stack states is:
      in find_used_partitions(key_tree = "subpar2=c5") (***)
      in find_used_partitions(key_tree = "subpar1=c3")
      in find_used_partitions(key_tree = "par2=c2")   (**)
      in find_used_partitions(key_tree = "par1=c1")
      in prune_partitions(...)
    We apply partitioning limits as soon as possible, e.g. when we reach the
    depth (**), we find which partition(s) correspond to "par1=c1 AND par2=c2",
    and save them in ppar->part_iter.
    When we reach the depth (***), we find which subpartition(s) correspond to
    "subpar1=c3 AND subpar2=c5", and then mark appropriate subpartitions in
    appropriate subpartitions as used.
    
    It is possible that constraints on some partitioning fields are missing.
    For the above example, consider this stack state:
      in find_used_partitions(key_tree = "subpar2=c12") (***)
      in find_used_partitions(key_tree = "subpar1=c10")
      in find_used_partitions(key_tree = "par1=c2")
      in prune_partitions(...)
    Here we don't have constraints for all partitioning fields. Since we've
    never set the ppar->part_iter to contain used set of partitions, we use
    its default "all partitions" value.  We get  subpartition id for 
    "subpar1=c3 AND subpar2=c5", and mark that subpartition as used in every
    partition.

    The inverse is also possible: we may get constraints on partitioning
    fields, but not constraints on subpartitioning fields. In that case,
    calls to find_used_partitions() with depth below (**) will return -1,
    and we will mark entire partition as used.

    where OP1 and OP2 are '<' OR '<=', and const_i can be +/- inf.
    Everything else is not a partition pruning "interval".
  TODO
    Replace recursion on SEL_ARG::left and SEL_ARG::right with a loop

  RETURN
    1   OK, one or more [sub]partitions are marked as used.
+14 −10
Original line number Diff line number Diff line
@@ -3673,11 +3673,11 @@ typedef uint32 (*get_endpoint_func)(partition_info*, bool left_endpoint,

  DESCRIPTION
    Initialize partition set iterator to walk over the interval in
    ordered-list-of-partitions (for RANGE partitioning) or 
    ordered-list-of-list-constants (for LIST partitioning) space.
    ordered-array-of-partitions (for RANGE partitioning) or 
    ordered-array-of-list-constants (for LIST partitioning) space.

  IMPLEMENTATION
    This function is applied when partitioning is done by
    This function is used when partitioning is done by
    <RANGE|LIST>(ascending_func(t.field)), and we can map an interval in
    t.field space into a sub-array of partition_info::range_int_array or
    partition_info::list_array (see get_partition_id_range_for_endpoint,
@@ -3760,7 +3760,7 @@ int get_part_iter_for_interval_via_mapping(partition_info *part_info,


/*
  Partitioning Interval Analysis: Initialize iterator to walk integer interval
  Partitioning Interval Analysis: Initialize iterator to walk field interval

  SYNOPSIS
    get_part_iter_for_interval_via_walking()
@@ -3776,7 +3776,8 @@ int get_part_iter_for_interval_via_mapping(partition_info *part_info,
  DESCRIPTION
    Initialize partition set iterator to walk over interval in integer field
    space. That is, for "const1 <=? t.field <=? const2" interval, initialize 
    the iterator to do this: 
    the iterator to return a set of [sub]partitions obtained with the
    following procedure:
      get partition id for t.field = const1,   return it
      get partition id for t.field = const1+1, return it
       ...                 t.field = const1+2, ...
@@ -3799,7 +3800,7 @@ int get_part_iter_for_interval_via_mapping(partition_info *part_info,
    The rationale behind these requirements is that if they are not met
    we're likely to hit most of the partitions and traversing the interval
    will only add overhead. So it's better return "all partitions used" in
    this case.
    that case.

  RETURN
    0 - No matching partitions, iterator not initialized
@@ -3917,7 +3918,7 @@ uint32 get_next_partition_id_range(PARTITION_ITERATOR* part_iter)
      part_iter  Partition set iterator structure

  DESCRIPTION
    This is special implementation of PARTITION_ITERATOR::get_next() for
    This implementation of PARTITION_ITERATOR::get_next() is special for 
    LIST partitioning: it enumerates partition ids in 
    part_info->list_array[i] where i runs over [min_idx, max_idx] interval.

@@ -3937,13 +3938,16 @@ uint32 get_next_partition_id_list(PARTITION_ITERATOR *part_iter)


/*
  PARTITION_ITERATOR::get_next implementation: walk over integer interval
  PARTITION_ITERATOR::get_next implementation: walk over field-space interval

  SYNOPSIS
    get_next_partition_via_walking()
      part_iter  Partitioning iterator

  DESCRIPTION
    This implementation of PARTITION_ITERATOR::get_next() returns ids of
    partitions that contain records with partitioning field value within
    [start_val, end_val] interval.

  RETURN 
    partition id