Lab in Data Science

Website for the EPFL Lab in Data Science 2019


Week 2/3: Spark DataFrames


An API for computing with structured data (if it’s a table, it fits)

Some key points:

DataFrame demo

Download the notebook

PySpark architecture overview

<img src="figs/pyspark_architecture.png" height=400px>

PySpark DataFrame optimization

<img src="figs/databricks_catalyst.png" height=200px>