Summary: PySpark DataFrames have a join method that takes three parameters: the DataFrame on the right side of the join, which fields are being joined on, and what type of join to perform (inner, outer, left_outer, right_outer, leftsemi). You call the join method from the left-side DataFrame object, such as df1.join(df2, df1.col1 == df2.col1, 'inner').
Lately, I’ve written a few iterations of PySpark code to develop a recommender system (I’ve had some practice creating recommender systems in PySpark). I ran into a situation where I needed to generate recommendations on several different datasets, and my problem was deciphering some of the prediction documentation. Because of my struggles, […]
Summary: Spark has an implementation of Alternating Least Squares (ALS) along with a set of very simple functions to create recommendations based on past data.
Summary: Spark (and PySpark) use map, mapValues, reduce, reduceByKey, aggregateByKey, and join to transform, aggregate, and connect datasets. These functions can be strung together to do more complex tasks.