Thoughtstream | Courses

Parsing Techniques for Bioinformatics in Perl

Parsing is the process of detecting and verifying the structure of incoming data, then processing that data to make it conveniently available within a program.
This full-day tutorial will introduce beginner and intermediate Perl programmers to the wide range of parsing mechanisms available in Perl and explain specific techniques for parsing data in a variety of commonly used formats. Most examples will be based on typical parsing problems encountered in bioinformatics.
Topics covered include:
simple parsing with regexes

linear parsing with state machines

piece-wise parsing with extractors

structured parsing with grammars

processing comma-separated text

dealing with XML and other tagged formats

dealing with BLAST output and other heterogeneous structured formats

handling queries in synthetic and natural languages

extracting data structures from structured data

processing file inclusions

coping with incomplete, malformed, and ambiguous data

selecting and using appropriate parsing tools from the CPAN

integrating parsing and object-oriented programming

data mining (parsing as a data recognition tool)

error detection and consistency checking (parsing as a data validation tool)

structured I/O (parsing as a data acquisition tool)

recognition and extraction (parsing as a data search tool)

hierarchical data processing (parsing as a data transformation tool)

task-specific languages (parsing as a command specification tool)
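
To give a flavour of the first two topics listed above (regexes and a line-by-line state machine), here is a minimal sketch of a FASTA parser. It is not taken from the course material: the function name `parse_fasta`, the record fields `id`, `desc`, and `seq`, and the sample sequences are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (an illustration, not course code): parse FASTA text
# into a list of hash references with the keys id, desc and seq.
sub parse_fasta {
    my ($text) = @_;
    my (@records, $current);

    for my $line (split /\n/, $text) {
        $line =~ s/\r\z//;                 # tolerate Windows line endings
        next if $line =~ /^\s*$/;          # skip blank lines

        if ($line =~ /^>(\S+)\s*(.*)$/) {  # header line: start a new record
            $current = { id => $1, desc => $2, seq => '' };
            push @records, $current;
        }
        elsif ($current) {                 # sequence line: extend current record
            (my $residues = $line) =~ s/\s+//g;
            $current->{seq} .= $residues;
        }
        else {                             # data before any header is malformed
            die "Sequence data before first header: '$line'\n";
        }
    }
    return @records;
}

# Made-up sample input, for demonstration only.
my $fasta = <<'END';
>seq1 example protein
MKTAYIAK
QRQISFVK
>seq2
ACGTACGT
END

for my $rec (parse_fasta($fasta)) {
    printf "%s (%d residues): %s\n",
        $rec->{id}, length($rec->{seq}), $rec->{seq};
}
```

The regex recognises header lines, while the `$current` variable acts as the parser's state: sequence lines are only meaningful once a header has been seen, so data that arrives earlier is rejected as malformed.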

Course format

1-day or 2-day seminar

Who should attend

Perl programmers in bioinformatics-related fields who are familiar with simple regular expressions and the use of modules. The techniques presented are not restricted to the particular applications mentioned, and will be useful to anyone who needs to process structured bioinformatics data of any kind.