Introducing the Fabric Accelerator!

The Fabric Accelerator is a collection of reusable code artifacts integrated with an orchestration framework for Microsoft Fabric. It helps you build, deploy and run data platforms on Microsoft Fabric in a consistent and repeatable manner. It leverages the popular ELT (Extract, Load, Transform) Framework for metadata-driven orchestration. The ELT Framework is widely used with Azure Synapse and Azure Databricks, and has now been extended to support Microsoft Fabric.


PySpark Common Transforms

Quite often I come across transformations that are applicable to several scenarios. So I created a reusable Python class that leverages PySpark capabilities to apply common transformations to a dataframe or to a subset of columns in a dataframe. The code is on GitHub at bennyaustin/pyspark-utils. There is also an extensive function reference and usage document to go with it. Feel free to use, extend, request features and contribute.


Time Zone Conversions in PySpark

PySpark has built-in functions to shift time between time zones. You just need to follow one simple rule: first convert the timestamp from the origin time zone to UTC, which serves as the point of reference, then convert it from UTC to the required time zone. This way there is no need to maintain lookup tables, and it's a generic method that works between any pair of time zones, including those that observe daylight saving time.
