Quite often I come across transformations that are applicable to several scenarios. So created this reusable Python class that leverages PySpark capabilities to apply common transformation to a dataframe or a subset of columns in a dataframe. The code is in GitHub – bennyaustin/pyspark-utils. There is also an extensive function reference and usage document to go with it. Feel free to use, extend, request features and contribute.
Continue readingDatabricks
Upsert to Azure Synapse Analytics using PySpark

At the moment SQL MERGE operation is not available in Azure Synapse Analytics. However, it is possible to implement this feature using Azure Synapse Analytics connector in Databricks with some PySpark code.
Continue readingTime Zone Conversions in PySpark
PySpark has built-in functions to shift time between time zones. Just need to follow a simple rule. It goes like this. First convert the timestamp from origin time zone to UTC which is a point of reference. Then convert the timestamp from UTC to the required time zone. In this way there is no need to maintain lookup tables and its a generic method to convert time between time zones even for the ones that require daylight savings offset.
Continue reading