Perl for Bioinformatics

Biostat 140.636

1st term, 2009-2010
MWF 1:30-2:20 in W4013 (Lectures)
F 10:30-11:20 in W4007 (Lab)

Instructor: Fernando Pineda

Description

This course uses the Perl programming language to introduce the skills and concepts needed to process and interpret data from high-throughput technologies in the biological sciences. Lectures with live computer demonstrations and hands-on laboratories will be used to introduce key concepts. These will be reinforced and extended with weekly readings and programming exercises. Exercises and examples will draw heavily on biological sequence analysis, as well as on real-world problems in proteomics, genetics and computational biology in general. Occasional guest lecturers will present case studies of how they use Perl and Unix to advance scientific investigations. Students will be introduced to the wealth of bioinformatics software-development resources available on the World Wide Web, and to the necessary fundamentals of computer science, including: (1) pattern matching, parsing and translation, (2) data structures, algorithms and complexity, and (3) programming style, e.g. top-down vs. bottom-up programming and object-oriented programming. Applied topics to be covered include: (1) biological sequence analysis, (2) Perl as middleware, (3) how to use Unix and Perl to manage and process high-throughput datasets, (4) automated interaction with local and remote biological databases, and (5) high-performance computing.

Resources

Lecture notes, Handouts, Homeworks, Sample code, Labs, etc.

People

Name: Fernando Pineda (MMI)
Role: Instructor
Contact/Location: Tel: 443-287-3673; fernando . pineda @ jhu . edu; office: E3626
Office Hours: by appointment

Prerequisites

Permission of the instructor AND (a previous course in computer science OR computer programming experience). If you have never programmed or used the command line on a Unix workstation, you will find this course very challenging, and you may wish to register as an auditor instead of taking it for credit.

Homework and Grading Policy

Grades are based on four programming assignments and a final project. The programming assignments count for 60% of the grade; the final project counts for 40%. Each student is expected to coordinate with the instructor to select a suitable project based on the student's interests. Homework problems are generally awarded 2-5 points each. Programming problems typically receive 1 point for well-documented code, 2 points if the code works correctly, and 1 point for programming style. Documentation and programming style are necessarily somewhat subjective. Note: The assignments will depend on material in the readings as well as the lectures.
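
As a worked illustration of the 60/40 weighting, here is a small Perl sketch. It assumes (this is not stated in the policy above) that assignment points are pooled across all four assignments and that the final project is scored out of some fixed total; the point totals in the example are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Weighted course grade in percent: 60% assignments, 40% final project.
# Assumption: assignment points are pooled across all four assignments.
sub course_grade {
    my ($hw_earned, $hw_possible, $proj_earned, $proj_possible) = @_;
    my $hw   = $hw_earned   / $hw_possible;     # fraction of assignment points
    my $proj = $proj_earned / $proj_possible;   # fraction of project points
    return 100 * (0.60 * $hw + 0.40 * $proj);
}

# Example with made-up totals: 40 of 50 assignment points (80%)
# and 9 of 10 project points (90%).
printf "%.1f\n", course_grade(40, 50, 9, 10);   # prints 84.0
```

That is, 0.60 × 80 + 0.40 × 90 = 48 + 36 = 84.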

Homework is accepted electronically as HTML-formatted documents only. No homework will be accepted via email or on paper. No late homework will be accepted: once the due date and time have passed, it will be impossible to submit homework electronically. However, I am reasonably generous with partial credit, so I recommend that students submit partially completed work. For programming assignments, students may discuss ideas and approaches with others. However, programs and projects are to be completed independently and must be original work. The first block of comments in your Perl code should contain, at a minimum, the following items:

  • Name of the program
  • Your name and the date
  • Assignment number
  • Usage instructions for the program
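
A minimal sketch of what such a header might look like. The program name (revcomp.pl), the author name, and the tiny reverse-complement task are hypothetical placeholders chosen for illustration; only the four header items come from the requirements above.

```perl
#!/usr/bin/perl
#=====================================================================
# Program:    revcomp.pl            (hypothetical example program)
# Author:     Jane Student          (placeholder name)
# Date:       2009-09-04
# Assignment: 140.636 Assignment 1
# Usage:      perl revcomp.pl SEQUENCE
#             Prints the reverse complement of the DNA string SEQUENCE.
#=====================================================================
use strict;
use warnings;

# Return the reverse complement of a DNA string:
# reverse it, then swap A<->T and C<->G (case is preserved).
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Main program: expects exactly one command-line argument.
if (@ARGV == 1) {
    print revcomp($ARGV[0]), "\n";
}
else {
    warn "Usage: perl revcomp.pl SEQUENCE\n";
}
```

For example, `perl revcomp.pl ATGC` prints `GCAT`.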

Each assignment must be formatted as an HTML document. Your web page should be able to stand alone as a description of each problem and its solution. In other words, there must be a coherent description of the exercise, a copy of your code(s) and a copy of enough output to prove that your program works. It is not useful to hand in 20 pages of output consisting of a single column of numbers per page. Actual working source code must exist in the same directory as your web page. The grader will run this code if he or she has any questions about whether your published code actually works. However, do not expect the grader to get your answers to the assignment by running your code. That is your job. You should be able to summarize the salient features and show enough results to convince the grader that your program works (this is graduate school, after all!). Refer to the notes associated with the first week of class for more details.

Each homework assignment must be saved in a different directory in your public_html directory. The URLs for the homework assignments are standardized. In particular, the four assignments have the following URLs:

    http://bilbo.jhsph.edu/~userid/140.636.1/index.html
    http://bilbo.jhsph.edu/~userid/140.636.2/index.html
    http://bilbo.jhsph.edu/~userid/140.636.3/index.html
    http://bilbo.jhsph.edu/~userid/140.636.4/index.html 
where 'userid' is your userid. The final project has the URL:
    http://bilbo.jhsph.edu/~userid/final_project/index.html 
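
The directory layout can be sketched in Perl. This is a hypothetical helper, not a course-provided tool: it assumes your login name is in $ENV{USER}, your home directory is in $ENV{HOME}, and the web server serves ~userid from ~/public_html. It prints each directory with its URL, and creates missing directories only when run with a made-up --mkdir option.

```perl
#!/usr/bin/perl
# Hypothetical helper: list (and optionally create) the standard
# 140.636 homework directories under ~/public_html.
use strict;
use warnings;

my $HOST = 'http://bilbo.jhsph.edu';

# Directory name for assignment 1-4, or 'final_project'.
sub assignment_dir {
    my ($which) = @_;
    return $which eq 'final_project' ? $which : "140.636.$which";
}

# URL of the index.html page for one assignment.
sub assignment_url {
    my ($userid, $which) = @_;
    return "$HOST/~$userid/" . assignment_dir($which) . '/index.html';
}

my $userid = $ENV{USER} // 'userid';                 # assumption: login name in $USER
my $www    = ($ENV{HOME} // '.') . '/public_html';   # assumption: web root is ~/public_html
my $create = @ARGV && $ARGV[0] eq '--mkdir';         # made-up option, for illustration

if ($create && !-d $www) {
    mkdir $www, 0755 or die "Cannot create $www: $!\n";
}
for my $which (1 .. 4, 'final_project') {
    my $dir = "$www/" . assignment_dir($which);
    if ($create && !-d $dir) {
        # Assumption: mode 0755 (reduced by your umask) lets the web server read the pages.
        mkdir $dir, 0755 or die "Cannot create $dir: $!\n";
    }
    print "$dir -> ", assignment_url($userid, $which), "\n";
}
```

Without --mkdir the script only prints the expected paths and URLs, so it is safe to run as a check.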

If you have little experience writing software, you may be shocked at the amount of effort it takes to write something that works correctly. The first few assignments may be especially stressful. (The good news is that it gets easier with practice.) Start on the assignment as soon as you can, and take the opportunity to ask lots of questions of the instructor and your fellow students.

Final Project

You should meet with the instructor to decide on a final project no later than four weeks before the end of the course. A written proposal (a paragraph or two describing the project), posted on your final project web page, is due three weeks before the end of the course. I find that the best projects are those that come from active research projects, so it is best to consult with your advisor or a faculty member in your department about potential projects. A suitable project should be about as much work as two or three homework assignments. The final project is graded on how well it shows mastery of the subject matter taught in the course. For example, a project that makes effective use of modules, data structures (e.g. references), regular expressions or databases will get more points than a program that uses just rudimentary Perl. Documentation and maintainability are also important. Note: A project that solves an interesting and useful problem will also get more points than one that is just a homework exercise (this is graduate school, after all). Here are some example projects from previous years: FinalProjects

Laboratory

In previous years we used the Linux lab in the basement of Hampton House. We no longer use the Linux lab; instead, students must bring their laptops to the laboratory sessions and use them to log directly into the teaching server with a terminal application and/or an X11 server. Instructions for configuring your laptop are here.

Student computer accounts

To get a userid and password, you need to fill out the questionnaire on the first day of class. If you did not get a questionnaire, it can be downloaded here.

Texts

Note: New editions of these books come out frequently. Although I try to keep this list up to date, the links below may not be current. Make sure you get the latest edition!

Schedule and Syllabus

day | date | venue | topic | remarks
Fri | 2009-08-28 | W4007 | Course Mechanics & Basic Lab skills | LABORATORY: Set up laptops & user accounts
Fri | 2009-08-28 | W4013 | Introduction to Linux | LECTURE
Mon | 2009-08-31 | W4013 | Linux, Editing & HTML | LABORATORY
Wed | 2009-09-02 | W4013 | Perl Basics | LECTURE
Fri | 2009-09-04 | W4007 | Introduction, Biological sequences & codes | LECTURE: 1st assignment due by midnight
Fri | 2009-09-04 | W4013 | 1st pass at data types and control structures | LECTURE
Mon | 2009-09-07 | | Holiday |
Wed | 2009-09-09 | W4013 | Lists, arrays, hashes, and control structures | LECTURE
Fri | 2009-09-11 | W4007 | Lists, arrays, hashes, and control structures | LECTURE
Fri | 2009-09-11 | W4013 | Regular expressions | LECTURE
Mon | 2009-09-14 | W4013 | Regular expressions | LABORATORY
Wed | 2009-09-16 | | Scope, subroutines and functions | LECTURE
Fri | 2009-09-18 | W4007 | References, Packages & Modules | LECTURE
Fri | 2009-09-18 | W4013 | References, Packages & Modules | LABORATORY
Mon | 2009-09-21 | W4013 | Object Oriented Programming | LECTURE
Wed | 2009-09-23 | W4013 | Object Oriented Programming | LECTURE
Fri | 2009-09-25 | W4007 | Data Structures | LECTURE
Fri | 2009-09-25 | W4013 | Cancelled | CANCELLED
Mon | 2009-09-28 | W4013 | Sequence Alignment & Dynamic Programming | LECTURE
Wed | 2009-09-30 | W4013 | BLAST | LABORATORY
Fri | 2009-10-02 | W4007 | Interacting with the System | LECTURE
Fri | 2009-10-02 | W4013 | BioPerl | LECTURE
Mon | 2009-10-05 | W4013 | BioPerl | LABORATORY
Wed | 2009-10-07 | W4013 | Introduction to Relational Databases | LECTURE
Fri | 2009-10-09 | W4007 | SQL | LECTURE
Fri | 2009-10-09 | W4013 | MySQL practicum | LABORATORY
Mon | 2009-10-12 | W4013 | TBD |
Wed | 2009-10-14 | W4013 | Agent-based simulation | LECTURE
Fri | 2009-10-16 | W4007 | High Performance Computing | LECTURE
Fri | 2009-10-16 | W4013 | High Performance Computing | LABORATORY
Mon | 2009-10-19 | W4013 | Student presentations | Final Projects due
Wed | 2009-10-21 | W4013 | Student presentations | Final Projects due

-- TWikiAdminUser - 2009-08-24



This site is powered by the TWiki collaboration platform. Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.