It’s quite common to analyse a measure by categorizing it into different buckets or bands. For example to list the customers who are due for payment in the next week or fortnight, to decide which income group to target for a particular product campaign, which age group of students are most likely to buy a smartphone and so on. These bands are usually not supplied by source systems but derived from the underlying measure and they exist only for analytic purposes. While this banding could be achieved in reports or dashboards, there is significant value to model them as dimensions especially to promote self-service BI.
In this example, fact_admission represents the fact data collected from hospital admissions. AGE_YRS is the client’s age at the time of admission. DIM_AGE is banding dimension for age group with the granularity being AGE in yrs.
Screen Capture 1- Age Group Bands
- DIM_AGE is type 1 SCD that has all ages say from 0 to 130 yrs.
- AGE_SK is the surrogate key. It’s not an identity rather it takes same value of AGE except for Unknown SKs, in which case AGE is NULL. This means the ETL does not have to look-up this dimension table.
- DIM_AGE has an out of bound surrogate key with value 999 to cater for age greater than 130 yrs.
The three age group bands are defined as follows
AGE_GROUP_1
Infant | AGE Below 1 yr |
Toddler | AGE between 1-3 yrs |
Pre-schooler | AGE between 3-5 yrs |
Young Kids | AGE between 5-7 yrs |
Big Kids | AGE between 7-12 yrs |
Teens | AGE between 12-18 yrs |
Young Adult | AGE between 18-39 yrs |
Middle Age | AGE between 40-59 yrs |
Seniors | AGE 60 and over |
AGE_GROUP_2
Unlicensed Drivers | AGE Below 17 yrs |
Young Drivers | AGE between 17-25 yrs |
Silver License | AGE between 25-30 yrs |
Gold License | AGE between 30-75 yrs |
Older Drivers | AGE over 75 yrs |
AGE_GROUP_3
Pre-schooler | AGE below 5 yrs |
School | AGE between 5-18 yrs |
Uni | AGE between 18-25 yrs |
Salaried | AGE between 25-60 yrs |
Pensioner | AGE over 60yrs |
- When building the dimension, it is recommended to build dimension using a view on DIM_AGE. This allows to include columns that could act as key columns to the age bands and more importantly to sort the different captions in the required order.
- The materialized AGE_SK column on fact table is optional. The fact table can be linked to DIM_AGE using a view on fact table because of the fact the surrogate key is directly derived from age measure.
- A new age group can be easily created just by adding a new column to the dimension table without the need to re-key fact tables.
Incorporating banding as dimension simplifies reporting, self-service BI platform and also provides a design that is highly scalable. Banding can be extended to any measure like income, duration, count , miles and percentile and so on.
One thought on “Dimensional Modeling: Banding”