Thoughtstream | Courses

Understanding Regular Expressions for Bioinformatics

In the first half of this course, we'll explore the principles and mechanisms underlying all Perl regular expressions. You'll see how the highly compact syntax of Perl patterns controls a built-in pattern-matching "engine," and learn how to design and construct Perl regexes efficiently. We'll also look at the four principal uses of regexes in Perl, discuss a number of uniquely Perlish regex "idioms," and examine how those techniques apply specifically to the kinds of searching, sorting, and sifting tasks commonly encountered in bioinformatics. When we're done, Perl's regexes will no longer seem like a mystery wrapped in an enigma wrapped in line-noise.

In the second half of this course, we'll look at the more advanced and powerful features of Perl regular expressions, such as code embedding, user-defined assertions, regex recursion, and backtracking control. These high-end features are not covered in most Perl textbooks or classes, yet understanding and being able to apply them is essential when dealing with large, real-world data sets. We'll work through several everyday, yet challenging, problems in bioinformatic information processing to illustrate how Perl's regular expression mechanism can be tamed and harnessed to solve them. By the end of the course, the full (and surprising) power of Perl's regular expressions will be at your command.

Course format

1-day or 2-day seminar

Who should attend

Perl programmers in bioinformatics-related fields who are familiar with the basics of Perl's control flow, string handling, and simple data structures (scalars, arrays, hashes).