
RegEx Strikes Back: Regular Expressions for Text Mining

A short time ago, in a galaxy not so far away, a regular expression was taking 5 days to run. In this talk you will learn why regular expressions can be slow, how to make them fast using a trie regex data structure, and the many uses a good old regular expression can have.

Abstract

Regular expressions have a bad reputation: they are said to be slow for text mining tasks. In this talk you’ll learn why a regex can be slow and how to use a Trie Regex to craft blazingly fast regular expressions with no effort. You’ll also see how regular expressions integrate smoothly with many libraries (pandas, spaCy, etc.) and how to use the regex module for common text cleaning tasks such as prefix finding, fuzzy matching, and many more.
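To make the Trie Regex idea concrete, here is a minimal sketch using only the standard-library `re` module. It is not the speaker's implementation; the helper names `build_trie`, `trie_to_pattern`, and `build_pattern` are hypothetical and chosen for illustration. The assumption behind it is that a plain alternation such as `foo|foobar|fob|bar` makes the engine try each alternative in turn, while a pattern generated from a trie shares common prefixes so each character is examined far fewer times.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # end-of-word marker
    return trie


def trie_to_pattern(node):
    """Turn a trie node into a regex fragment matching every suffix below it."""
    is_end = "" in node
    branches = [
        re.escape(ch) + trie_to_pattern(node[ch])
        for ch in sorted(key for key in node if key != "")
    ]
    if not branches:
        return ""  # leaf: nothing more to match
    if len(branches) == 1:
        body = branches[0]
        # If a word also ends here, the remaining suffix is optional.
        return "(?:" + body + ")?" if is_end else body
    body = "(?:" + "|".join(branches) + ")"
    return body + "?" if is_end else body


def build_pattern(words):
    """Compile a whole-word pattern that matches any of the given words."""
    return re.compile(r"\b" + trie_to_pattern(build_trie(words)) + r"\b")


words = ["foo", "foobar", "fob", "bar"]
pattern = build_pattern(words)
print(pattern.pattern)
print(pattern.findall("foo foobar fob bar food"))
```

For the four sample words, the shared prefix `fo` appears only once in the generated pattern, and `food` is rejected because of the trailing word boundary. With a vocabulary of many thousands of words, this prefix sharing is where the speed-up is expected to come from.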

Slides: https://speakerdeck.com/mesejo/pycon-italia-regex-strikes-back

Speaker
Daniel Mesejo
Track
Python & Friends
Audience Level
Intermediate
Language
English
Duration
30 minutes