PySpark has built-in functions to shift timestamps between time zones, and they follow one simple rule: first convert the timestamp from its origin time zone to UTC, which serves as the common point of reference, then convert it from UTC to the required time zone. This way there is no need to maintain lookup tables, and the method works generically across time zones, including those that observe daylight saving time.
from pyspark.sql.functions import col, to_utc_timestamp, from_utc_timestamp

df = spark.createDataFrame(
    [
        ("America/New_York", "2020-02-01 10:00:00"),
        ("Europe/Lisbon", "2020-02-01 10:00:00"),
        ("Europe/Madrid", "2020-02-01 10:00:00"),
        ("Europe/London", "2020-02-01 10:00:00"),
        ("America/Sao_Paulo", "2020-02-01 10:00:00"),
        ("Africa/Nairobi", "2020-02-01 10:00:00"),
        ("Asia/Damascus", "2020-02-01 10:00:00"),
        ("Asia/Singapore", "2020-02-01 10:00:00"),
        ("Atlantic/Bermuda", "2020-02-01 10:00:00"),
        ("Canada/Mountain", "2020-02-01 10:00:00"),
        ("Pacific/Tahiti", "2020-02-01 10:00:00"),
    ],
    ["OriginTz", "Time"],
)

df = (
    df.withColumn("Time", col("Time").cast("timestamp"))
    # Step 1: origin time zone -> UTC (the point of reference)
    .withColumn("UTC", to_utc_timestamp(col("Time"), col("OriginTz")))
    # Step 2: UTC -> target time zone
    .withColumn("AEST", from_utc_timestamp(col("UTC"), "Australia/Brisbane"))
    .withColumn("AEDT", from_utc_timestamp(col("UTC"), "Australia/Sydney"))
)

display(df)
I found this useful for handling daylight saving offsets as well. In Australia, for instance, some states observe daylight saving while others don't. To convert from AEST (Australian Eastern Standard Time, no daylight saving) to AEDT (Australian Eastern Daylight Time), the source time zone needs to be one that does not observe daylight saving, such as Australia/Brisbane, and the target time zone needs to be one that does, such as Australia/Sydney (as in the last two withColumn calls in the code snippet).
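The same two-step rule can be checked outside Spark with Python's standard-library zoneinfo module (Python 3.9+). This is a small standalone sketch, not part of the PySpark pipeline above: it pivots one of the sample timestamps through UTC and shows that Brisbane and Sydney land an hour apart on this date, because Sydney is observing daylight saving in February.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Step 1: attach the origin time zone and convert to UTC, the reference point.
ny = datetime(2020, 2, 1, 10, 0, tzinfo=ZoneInfo("America/New_York"))
utc = ny.astimezone(ZoneInfo("UTC"))

# Step 2: convert from UTC to each target time zone.
aest = utc.astimezone(ZoneInfo("Australia/Brisbane"))  # no daylight saving
aedt = utc.astimezone(ZoneInfo("Australia/Sydney"))    # daylight saving in force

print(utc.strftime("%Y-%m-%d %H:%M"))   # 2020-02-01 15:00
print(aest.strftime("%Y-%m-%d %H:%M"))  # 2020-02-02 01:00
print(aedt.strftime("%Y-%m-%d %H:%M"))  # 2020-02-02 02:00
```

New York is on EST (UTC-5) in February, so 10:00 becomes 15:00 UTC; Brisbane (UTC+10) and Sydney (UTC+11 under AEDT) then differ by exactly one hour, matching the AEST and AEDT columns the Spark job produces.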