转载:ORACLE-BASE - ROLLUP, CUBE, GROUPING Functions and GROUPING SETS
This article gives an overview of the functionality available for aggregation in data warehouses, focusing specifically on the information required for the Oracle Database SQL Expert (1Z0-047) exam.
The examples in this article will be run against the following simple dimension table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
DROPTABLE dimension_tab; CREATETABLE dimension_tab ( fact_1_id NUMBER NOTNULL, fact_2_id NUMBER NOTNULL, fact_3_id NUMBER NOTNULL, fact_4_id NUMBER NOTNULL, sales_value NUMBER(10,2) NOTNULL );
INSERTINTO dimension_tab SELECT TRUNC(DBMS_RANDOM.value(low =>1, high =>3)) AS fact_1_id, TRUNC(DBMS_RANDOM.value(low =>1, high =>6)) AS fact_2_id, TRUNC(DBMS_RANDOM.value(low =>1, high =>11)) AS fact_3_id, TRUNC(DBMS_RANDOM.value(low =>1, high =>11)) AS fact_4_id, ROUND(DBMS_RANDOM.value(low =>1, high =>100), 2) AS sales_value FROM dual CONNECTBY level <=1000; COMMIT;
To keep the queries and their output simple I am going to ignore the fact tables and also limit the number of distinct values in the columns of the dimension table.
Let’s start be reminding ourselves how the GROUP BY clause works. An aggregate function takes multiple rows of data returned by a query and aggregates them into a single result row.
1 2 3 4 5 6 7 8 9 10
SELECTSUM(sales_value) AS sales_value FROM dimension_tab;
SALES_VALUE ----------- 50528.39
1row selected.
SQL>
Including the GROUP BY clause limits the window of data processed by the aggregate function. This way we get an aggregated value for each distinct combination of values present in the columns listed in the GROUP BY clause. The number of rows we expect can be calculated by multiplying the number of distinct values of each column listed in the GROUP BY clause. In this case, if the rows were loaded randomly we would expect the number of distinct values for the first three columns in the table to be 2, 5 and 10 respectively. So using the fact_1_id column in the GROUP BY clause should give us 2 rows.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
SELECT fact_1_id, COUNT(*) AS num_rows, SUM(sales_value) AS sales_value FROM dimension_tab GROUPBY fact_1_id ORDERBY fact_1_id;
In addition to the regular aggregation results we expect from the GROUP BY clause, the ROLLUP extension produces group subtotals from right to left and a grand total. If “n” is the number of columns listed in the ROLLUP , there will be n+1 levels of subtotals.
Looking at the output in a SQL*Plus or a grid output, you can visually identify the rows containing subtotals as they have null values in the ROLLUP columns. It may be easier to spot when scanning down the output of the following query shown here. Obviously, if the raw data contains null values, using this visual identification is not an accurate approach, but we will discuss this issue later.
1 2 3 4 5 6 7
SELECT fact_1_id, fact_2_id, fact_3_id, SUM(sales_value) AS sales_value FROM dimension_tab GROUPBYROLLUP (fact_1_id, fact_2_id, fact_3_id) ORDERBY fact_1_id, fact_2_id, fact_3_id;
It is possible to do a partial rollup to reduce the number of subtotals calculated. The output from the following partial rollup is shown here.
1 2 3 4 5 6 7
SELECT fact_1_id, fact_2_id, fact_3_id, SUM(sales_value) AS sales_value FROM dimension_tab GROUPBY fact_1_id, ROLLUP (fact_2_id, fact_3_id) ORDERBY fact_1_id, fact_2_id, fact_3_id;
In addition to the subtotals generated by the ROLLUP extension, the CUBE extension will generate subtotals for all combinations of the dimensions specified. If “n” is the number of columns listed in the CUBE , there will be 2n subtotal combinations.
As the number of dimensions increase, so do the combinations of subtotals that need to be calculated, as shown by the output of the following query, shown here.
1 2 3 4 5 6 7
SELECT fact_1_id, fact_2_id, fact_3_id, SUM(sales_value) AS sales_value FROM dimension_tab GROUPBYCUBE (fact_1_id, fact_2_id, fact_3_id) ORDERBY fact_1_id, fact_2_id, fact_3_id;
It is possible to do a partial cube to reduce the number of subtotals calculated. The output from the following partial cube is shown here.
1 2 3 4 5 6 7
SELECT fact_1_id, fact_2_id, fact_3_id, SUM(sales_value) AS sales_value FROM dimension_tab GROUPBY fact_1_id, CUBE (fact_2_id, fact_3_id) ORDERBY fact_1_id, fact_2_id, fact_3_id;
It can be quite easy to visually identify subtotals generated by rollups and cubes, but to do it programatically you really need something more accurate than the presence of null values in the grouping columns. This is where the GROUPING function comes in. It accepts a single column as a parameter and returns “1” if the column contains a null value generated as part of a subtotal by a ROLLUP or CUBE operation or “0” for any other value, including stored null values.
The following query is a repeat of a previous cube, but the GROUPING function has been added for each of the dimensions in the cube.
SELECT fact_1_id, fact_2_id, SUM(sales_value) AS sales_value, GROUPING(fact_1_id) AS f1g, GROUPING(fact_2_id) AS f2g FROM dimension_tab GROUPBYCUBE (fact_1_id, fact_2_id) ORDERBY fact_1_id, fact_2_id;
The GROUPING_ID function provides an alternate and more compact way to identify subtotal rows. Passing the dimension columns as arguments, it returns a number indicating the GROUP BY level.
It’s possible to write queries that return the duplicate subtotals, which can be a little confusing. The GROUP_ID function assigns the value “0” to the first set, and all subsequent sets get assigned a higher number. The following query forces duplicates to show the GROUP_ID function in action.
Calculating all possible subtotals in a cube, especially those with many dimensions, can be quite an intensive process. If you don’t need all the subtotals, this can represent a considerable amount of wasted effort. The following cube with three dimensions gives 8 levels of subtotals (GROUPING_ID: 0-7), shown here.
1 2 3 4 5 6 7 8
SELECT fact_1_id, fact_2_id, fact_3_id, SUM(sales_value) AS sales_value, GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id FROM dimension_tab GROUPBYCUBE(fact_1_id, fact_2_id, fact_3_id) ORDERBY fact_1_id, fact_2_id, fact_3_id;
If we only need a few of these levels of subtotaling we can use the GROUPING SETS expression and specify exactly which ones we need, saving us having to calculate the whole cube. In the following query we are only interested in subtotals for the “ FACT_1_ID, FACT_2_ID ” and “ FACT_1_ID, FACT_3_ID ” groups.
ROLLUP and CUBE consider each column independently when deciding which subtotals must be calculated. For ROLLUP this means stepping back through the list to determine the groupings.
1 2 3 4 5
ROLLUP (a, b, c) (a, b, c) (a, b) (a) ()
CUBE creates a grouping for every possible combination of columns.
1 2 3 4 5 6 7 8 9
CUBE (a, b, c) (a, b, c) (a, b) (a, c) (a) (b, c) (b) (c) ()
Composite columns allow columns to be grouped together with braces so they are treated as a single unit when determining the necessary groupings. In the following ROLLUP columns “a” and “b” have been turned into a composite column by the additional braces. As a result the group of “a” is not longer calculated as the column “a” is only present as part of the composite column in the statement.
1 2 3 4 5 6 7
ROLLUP ((a, b), c) (a, b, c) (a, b) ()
Not considered: (a)
In a similar way, the possible combinations of the following CUBE are reduced because references to “a” or “b” individually are not considered as they are treated as a single column when the groupings are determined.
1 2 3 4 5 6 7 8 9 10 11
CUBE ((a, b), c) (a, b, c) (a, b) (c) ()
Not considered: (a, c) (a) (b, c) (b)
The impact of this is shown clearly in the follow two statements, whose output is shown here and here. The regular cube returns 198 rows and 8 groups (0-7), while the cube with the composite column returns only 121 rows with 4 groups (0, 1, 6, 7)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
-- Regular Cube. SELECT fact_1_id, fact_2_id, fact_3_id, SUM(sales_value) AS sales_value, GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id FROM dimension_tab GROUPBYCUBE(fact_1_id, fact_2_id, fact_3_id) ORDERBY fact_1_id, fact_2_id, fact_3_id;
-- Cube with composite column. SELECT fact_1_id, fact_2_id, fact_3_id, SUM(sales_value) AS sales_value, GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id FROM dimension_tab GROUPBYCUBE((fact_1_id, fact_2_id), fact_3_id) ORDERBY fact_1_id, fact_2_id, fact_3_id;
Concatenated groupings are defined by putting together multiple GROUPING SETS , CUBE s or ROLLUP s separated by commas. The resulting groupings are the cross-product of all the groups produced by the individual grouping sets. It might be a little easier to understand what this means by looking at an example. The following GROUPING SET results in 2 groups of subtotals, one for the fact_1_id column and one for the fact_id_2 column.