PySpark Common Transforms

Quite often I come across transformations that are applicable to several scenarios. So created this reusable Python class that leverages PySpark capabilities to apply common transformation to a dataframe or a subset of columns in a dataframe. The code is in GitHub – bennyaustin/pyspark-utils. There is also an extensive function reference and usage document to go with it. Feel free to use, extend, request features and contribute.

Continue reading →

Databricks Daylight Saving PySpark Spark Time Zone Conversion tz

Time Zone Conversions in PySpark

PySpark has built-in functions to shift time between time zones. Just need to follow a simple rule. It goes like this. First convert the timestamp from origin time zone to UTC which is a point of reference. Then convert the timestamp from UTC to the required time zone. In this way there is no need to maintain lookup tables and its a generic method to convert time between time zones even for the ones that require daylight savings offset.

Continue reading →

Share this:

Share this: