Perl for Bioinformatics
Biostat 140.636
1st term, 2009-2010
MWF 1:30-2:20 in W4013 (Lectures)
F 10:30-11:20 in W4007 (Lab)
Instructor:
Fernando Pineda
Description
This course uses the Perl programming language to introduce skills and concepts needed to process and interpret data from high-throughput technologies in the biological sciences. Lectures with live computer demonstrations and hands-on laboratories will be used to introduce key concepts. These will be reinforced and extended with weekly readings and programming exercises. Exercises and examples will draw heavily from biological sequence analysis as well as real-world problems in proteomics, genetics and computational biology in general. Occasional guest lecturers will present case studies of how they use Perl and Unix to advance scientific investigations. Students will be introduced to the wealth of bioinformatics software-development resources available on the World Wide Web. Students will be introduced to necessary fundamentals in computer science including: (1) pattern matching, parsing and translation, (2) data structures, algorithms and complexity, and (3) programming style, e.g. top-down vs. bottom-up programming and object-oriented programming. Applied topics to be covered include: (1) biological sequence analysis, (2) Perl as middleware, (3) how to use Unix and Perl to manage and process high-throughput datasets, (4) automated interaction with local and remote biological databases, and (5) high-performance computing.
Resources
Lecture notes, Handouts, Homeworks, Sample code, Labs, etc.
People
Prerequisites
Permission of the instructor AND (a previous course in computer science OR computer programming experience). If you have never done any programming or used a command line on a Unix workstation, you will find this course very challenging, and you may wish to register as an auditor instead of taking it for credit.
Homework and Grading Policy
Grades are based on four programming assignments and a final project. The programming assignments count for 60% of the grade, and the final project counts for 40%. Each student is expected to coordinate with the instructor to select a suitable project based on the student's interest. Homework problems are generally awarded 2-5 points each. Programming problems typically receive 1 point for turning in well-documented code, 2 points if the code works correctly and 1 point for programming style. Documentation and programming style are necessarily somewhat subjective. Note: The assignments will depend on material in the readings as well as the lectures.
Homework is accepted electronically as HTML-formatted documents! No homework will be accepted via email or on paper. No late homework will be accepted. Once the due date and time have passed, it will be impossible to submit homework electronically. However, I am reasonably generous with partial credit, so I recommend that students submit partially completed work. For programming assignments, students may discuss ideas and approaches with others. However, programs and projects are to be completed independently and must be original work. The first block of comments in your Perl code should contain, at a minimum, the following items:
- Name of the program
- Your name and the date
- Assignment number
- Usage instructions for the program
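As an illustration only, a header block covering these four items might look like the minimal sketch below. The program name, author, date, assignment number and the `count_bases` subroutine are all hypothetical placeholders, not part of any actual assignment; the exact layout of the header is a matter of style.

```perl
#!/usr/bin/perl
# Program:    count_bases.pl   (hypothetical example program)
# Author:     Jane Student     (placeholder name)
# Date:       2009-09-04       (placeholder date)
# Assignment: 1                (placeholder number)
# Usage:      perl count_bases.pl [SEQUENCE]
#             Prints how many times each character occurs in SEQUENCE.
#             With no argument, a short built-in test sequence is used.
use strict;
use warnings;

# Count occurrences of each (upper-cased) character in a sequence string.
sub count_bases {
    my ($seq) = @_;
    my %count;
    $count{ uc $_ }++ for split //, $seq;
    return %count;
}

# Use the command-line argument if given, otherwise a built-in example.
my $seq   = @ARGV ? $ARGV[0] : 'ACGTAC';
my %count = count_bases($seq);
printf "%s: %d\n", $_, $count{$_} for sort keys %count;
```

Putting the usage instructions in the header means a reader can learn how to run the program without reading the code itself.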
Each assignment must be formatted as an HTML document. Your web page should be able to stand alone as a description of each problem and its solution. In other words, there must be a coherent description of the exercise, a copy of your code(s) and a copy of enough output to prove that your program works. It is not useful to hand in 20 pages of output consisting of a single column of numbers per page. Actual working source code must exist in the same directory as your web page. The grader will run this code if he or she has any questions about whether your published code actually works. However, do not expect the grader to get your answers to the assignment by running your code. That is your job. You should be able to summarize the salient features and show enough results to convince the grader that your program works (This is graduate school after all!). Refer to the notes associated with the first week of class for more details.
Each homework assignment must be saved in a different directory in your public_html directory. The URLs for the homework assignments are standardized. In particular, the first four assignments would have the URLs:
http://bilbo.jhsph.edu/~userid/140.636.1/index.html
http://bilbo.jhsph.edu/~userid/140.636.2/index.html
http://bilbo.jhsph.edu/~userid/140.636.3/index.html
http://bilbo.jhsph.edu/~userid/140.636.4/index.html
where 'userid' is your userid. The final project will have the URL
http://bilbo.jhsph.edu/~userid/final_project/index.html
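The correspondence between directories and URLs can be sketched in a few lines of Perl. This is only an illustration: the `homework_locations` subroutine is a hypothetical helper, and the file paths assume that each page is named index.html and that public_html sits directly in your home directory on the server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [file path, URL] pairs for the four assignments and the final project.
# Paths are relative to the home directory (an assumption, see above).
sub homework_locations {
    my ($userid) = @_;
    my $base  = "http://bilbo.jhsph.edu/~$userid";
    my @names = ( ( map { "140.636.$_" } 1 .. 4 ), 'final_project' );
    return map { [ "public_html/$_/index.html", "$base/$_/index.html" ] } @names;
}

# Use the userid given on the command line, or the literal placeholder 'userid'.
my $userid = @ARGV ? $ARGV[0] : 'userid';
printf "%-40s %s\n", @$_ for homework_locations($userid);
```

Running it with your own userid as the argument lists each file next to the URL at which the grader will look for it.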
If you have little experience writing software, you may be shocked at the amount of effort it takes to write something that works correctly. The first few assignments may be especially stressful. (The good news is that it gets easier with practice.) Start on the assignment as soon as you can, and take the opportunity to ask lots of questions of the instructor and your fellow students.
Final Project
You should meet with the instructor to decide on a final project no later than four weeks before the end of the course. A written proposal (a paragraph or two describing the project), posted on your final project web page, is due three weeks before the end of the course. I find that the best projects are those that come from active research projects, so it is best to consult with your advisor or a faculty member in your department for potential projects. A suitable project should be about as much work as two or three homework assignments. The final project is graded on how well it shows mastery of the subject matter taught in the course. For example, a project that makes effective use of modules, data structures (e.g. references), regular expressions or databases will get more points than a program that uses just rudimentary Perl. Documentation and maintainability are also important. Note: A project that solves an interesting and useful problem will also get more points than a project that is just a homework exercise (This is graduate school after all). Here are some example projects from previous years:
FinalProjects
Laboratory
In previous years we used the Linux lab in the basement of Hampton House. We are no longer using the Linux lab. Instead, students must bring their laptops to the laboratory sessions. Students will use their laptops to log directly into the teaching server using a terminal application and/or an X11 server on their laptop. Instructions for configuring your laptop are here.
Student computer accounts
To get a userid and password, you need to fill out the questionnaire on the first day of class. If you did not get a questionnaire, it can be downloaded here.
Texts
Note: New editions of these books come out frequently. Although I try to keep this list up to date, the links below may not be current. Make sure you get the latest edition!
- Of interest or recommended by other students
Schedule and Syllabus
--
TWikiAdminUser - 2009-08-24