Parsing is the process of recognizing and verifying the structure of incoming data, then processing that data so that it is available to a program in a convenient form.
This full-day tutorial will introduce beginner and intermediate programmers to the powerful and efficient parsing mechanisms built into Perl 6, and will explore specific techniques for parsing data in a variety of commonly used formats. Most examples will be based on typical parsing problems encountered in bioinformatics.
Topics covered include:
- simple parsing with regexes
- structured parsing with grammars
- processing comma-separated text
- dealing with XML and other tagged formats
- decoding heterogeneous structured formats such as FASTA, Swiss-Prot, GenBank, and BLAST reports
- handling queries in synthetic and natural languages
- extracting data structures from structured data
- processing file inclusions
- coping with incomplete, malformed, and ambiguous data
- selecting and using appropriate parsing tools from the CPAN
- integrating parsing and object-oriented programming
- data mining (parsing as a data recognition tool)
- error detection and consistency checking (parsing as a data validation tool)
- structured I/O (parsing as a data acquisition tool)
- recognition and extraction (parsing as a data search tool)
- hierarchical data processing (parsing as a data transformation tool)
- task specific languages (parsing as a command specification tool)
Intended audience: programmers in bioinformatics-related fields who are familiar with simple regular expressions. The techniques presented are not restricted to the particular applications mentioned, however, and will be useful to anyone who needs to process or transform structured data of any kind.