Thoughtstream | Courses

Understanding Regular Expressions for Bioinformatics

Regular expressions are powerful and efficient tools for analyzing and manipulating the types of structured text-based data commonly used in bioinformatics applications.

Regexes are available in most major programming languages and as a key component of many other useful software tools (browsers, command-line utilities, text editors, web servers, etc.).

But to most programmers, regular expressions remain a riddle wrapped in a mystery inside an enigma shrouded in line-noise. They seem hard to create, harder to use, and almost impossible to debug or maintain.

So most developers make an entirely rational choice: either don't use regexes at all (the "Reinventing The Wheel...Badly" solution), or else just cut-and-paste existing regexes, adapting them to the new task by trial-and-error (the "Attack Of The Mutant Clones" approach).

This one- or two-day class offers a third option, taking participants back to the fundamentals of regular expressions and explaining what regexes really are (i.e. not declarative pattern matching specifications) and how they actually work (i.e. not simply by sequential character-to-character text comparisons).
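To give a flavour of what "how they actually work" means, here is a minimal sketch. It assumes Python's built-in `re` module, which is a backtracking engine; the DNA fragment and variable names are made up for illustration and are not taken from the course material. The same pattern text gives different answers depending on the order in which the engine tries its options, which a purely declarative reading would not predict.

```python
import re

# Illustrative DNA fragment (hypothetical, invented for this example).
seq = "ATGTAGCCTAGAA"

# Alternatives are tried left to right: "TA" succeeds first,
# so the longer alternative "TAG" is never attempted.
first_alt = re.match(r"TA|TAG", "TAG").group()

# Greedy: [ACGT]* first consumes the whole string, then backtracks
# one base at a time until "TAG" can match, so it ends at the LAST "TAG".
greedy = re.match(r"[ACGT]*TAG", seq).group()

# Lazy: [ACGT]*? consumes as little as possible and extends only when
# the rest of the pattern fails, so it ends at the FIRST "TAG".
lazy = re.match(r"[ACGT]*?TAG", seq).group()

print(first_alt)  # TA
print(greedy)     # ATGTAGCCTAG
print(lazy)       # ATGTAG
```

Other engines may behave differently here (POSIX-style matchers, for instance, are specified to prefer the longest match), which is one reason dialect matters.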

The class also demonstrates how bioinformatics programmers can make use of their existing software development skills to construct correct and efficient regexes...without selling their souls or losing their minds along the way.

The course is completely language-agnostic. Every example will be shown in all five major modern dialects of regex syntax (ERE, PCRE, POSIX, P6, and Vimmish), which collectively cover the use of regular expressions in: Apache, C, C++, C#, Chrome, Clojure, egrep, Emacs, flex, Firefox, gawk, grep, Haskell, Java, JavaScript, MySQL, .NET, PCRE, Perl, PHP, PowerShell, Python, Ruby, Safari, sed, VB.NET, and Vim.

Course format

1-day or 2-day seminar

Who should attend

Programmers in bioinformatics-related fields who are familiar with the basics of control flow, string handling, and simple data structures in one or more of the above programming languages.