Talk

How to Handle Very Large Data with Python

Thursday, May 23

15:25 - 16:10

Room: Panino

Language: English

Audience level: Beginner

Elevator pitch

Loading data into memory is a quick and convenient way to explore smaller datasets. Larger datasets with millions of rows, however, can be challenging to load and transform. In this talk, we’ll walk through available tools that make it easier to handle very large datasets.

Abstract

This talk provides an overview of the most commonly used Python packages for handling very large data. We’ll cover Polars, Vaex, and Pandas 2.0 and walk through the difference between eager and lazy evaluation.
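
As a rough illustration of the eager versus lazy distinction mentioned above, the sketch below uses only the Python standard library rather than Polars, Vaex, or Pandas. The row source and the function names (`read_rows`, `eager_total`, `lazy_total`) are hypothetical stand-ins, not material from the talk. The eager version builds a full intermediate list at every step, while the lazy version only describes the steps and streams one row at a time when the result is finally requested.

```python
from typing import Iterator


def read_rows(n: int) -> Iterator[int]:
    """Hypothetical stand-in for a large data source: yields one row at a time."""
    for i in range(n):
        yield i


def eager_total(n: int) -> int:
    """Eager evaluation: every step runs immediately and stores its full result."""
    rows = list(read_rows(n))                   # all rows held in memory
    squared = [r * r for r in rows]             # a second full-size list
    evens = [s for s in squared if s % 2 == 0]  # a third list
    return sum(evens)


def lazy_total(n: int) -> int:
    """Lazy evaluation: steps are only described until a result is requested."""
    rows = read_rows(n)                         # nothing has been read yet
    squared = (r * r for r in rows)             # generator, no work done yet
    evens = (s for s in squared if s % 2 == 0)  # still no work done
    return sum(evens)                           # rows stream through here, one at a time


if __name__ == "__main__":
    # Both approaches give the same answer; they differ in peak memory use.
    print(eager_total(1_000_000) == lazy_total(1_000_000))
```

Lazy dataframe APIs apply a similar idea at a larger scale: because the whole query is known before anything runs, the library has the chance to skip unneeded work before touching the data.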

Tags: Big Data

Jill Cates

I’m a senior data scientist at Shopify and I’m based in Toronto, Canada. I’m a big fan of Python and love the community that surrounds it. Outside of work, I love to play tennis, go on long walks with my dog Ziggy, and write posts for my work-in-progress blog, Normally Distributed.