PySpark Common Transforms

Quite often I come across transformations that are applicable to several scenarios. So created this reusable Python class that leverages PySpark capabilities to apply common transformation to a dataframe or a subset of columns in a dataframe. The code is in GitHub – bennyaustin/pyspark-utils. There is also an extensive function reference and usage document to go with it. Feel free to use, extend, request features and contribute.

Continue reading

Kaggle: TalkingData

A brief retrospective of my submission for Kaggle data science competition that predicts the gender and age group of a smartphone user based on their usage pattern.

Continue reading

Kaggle: Grupo Bimbo

A brief retrospective of my submission for Kaggle data science competition that forecasts inventory demand for Grupo Bimbo.

Continue reading