Dimensional Modeling: Banding

It’s quite common to analyse a measure by categorizing it into different buckets or bands. For example to list the customers who are due for payment in the next week or fortnight, to decide which income group to target for a particular product campaign, which age group of students are most likely to buy a smartphone and so on. These bands are usually not supplied by source systems but derived from the underlying measure and they exist only for analytic purposes. While this banding could be achieved in reports or dashboards, there is significant value to model them as dimensions especially to promote self-service BI.

In this example, fact_admission represents the fact data collected from hospital admissions. AGE_YRS is the client’s age at the time of admission. DIM_AGE is banding dimension for age group with the granularity being AGE in yrs.

Screen Capture 1- Age Group Bands

  • DIM_AGE is type 1 SCD that has all ages say from 0 to 130 yrs.
  • AGE_SK is the surrogate key. It’s not an identity rather it takes same value of AGE except for Unknown SKs, in which case AGE is NULL. This means the ETL does not have to look-up this dimension table.
  • DIM_AGE has an out of bound surrogate key with value 999 to cater for age greater than 130 yrs.

The three age group bands are defined as follows

AGE_GROUP_1

Infant AGE Below 1 yr
Toddler AGE between 1-3 yrs
Pre-schooler AGE between 3-5 yrs
Young Kids AGE between 5-7 yrs
Big Kids AGE between 7-12 yrs
Teens AGE between 12-18 yrs
Young Adult AGE between 18-39 yrs
Middle Age AGE between 40-59 yrs
Seniors AGE 60 and over

AGE_GROUP_2

Unlicensed Drivers AGE Below 17 yrs
Young Drivers AGE between 17-25 yrs
Silver License AGE between 25-30 yrs
Gold License AGE between 30-75 yrs
Older Drivers AGE over 75 yrs

AGE_GROUP_3

Pre-schooler AGE below 5 yrs
School AGE between 5-18 yrs
Uni AGE between 18-25 yrs
Salaried AGE between 25-60 yrs
Pensioner AGE over 60yrs
  • When building the dimension, it is recommended to build dimension using a view on DIM_AGE. This allows to include columns that could act as key columns to the age bands and more importantly to sort the different captions in the required order.
  • The materialized AGE_SK column on fact table is optional. The fact table can be linked to DIM_AGE using a view on fact table because of the fact the surrogate key is directly derived from age measure.
  • A new age group can be easily created just by adding a new column to the dimension table without the need to re-key fact tables.

Incorporating banding as dimension simplifies reporting, self-service BI platform and also provides a design that is highly scalable. Banding can be extended to any measure like income, duration, count , miles and percentile and so on.

Benny Austin