BLAST Project (40 pts)

Assignments:  In addition to writing on exams and homework, there will be two special assignments.    

Assignment 1. You will be assigned a 20-amino acid sequence from the list below.  Each sequence is a fragment of an actual protein.  Use your sequence to do the following steps and then answer the questions below for your report:  

(a) Do a first BLAST run using your 20 a.a. sequence.  What is your protein? What is the complete amino acid sequence of the protein?  

(b) Now, using the entire a.a. sequence, do a second BLAST run to find related sequences.  Carefully select a subset of the related sequences and do a multiple sequence alignment using a program like CLUSTAL W.  Choose related proteins from at least five, and up to eight, other diverse organisms, if possible.  Using relatively diverse organisms (archaeal, bacterial, and eukaryotic) will make it easier to identify the most conserved amino acids within the protein.  Prepare a figure or table showing a sequence alignment of your protein and identify the conserved amino acids.  

(c)  Which amino acids (be specific, aa #) do you think are most important for preserving the structure of the protein?  Why?  Which amino acids do you think are most important for the function of the protein?  Why?   

(d) You should find that a structure is known for your protein (or, failing that, for a closely related protein).  What is the PDB code?

(e)  Using PyMOL (or a similar program), make an original figure illustrating your protein.  

(f)  In approximately one-half page, briefly describe the function of your protein.  
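
For steps (b) and (c), once the alignment program has produced its output, the conserved positions can also be picked out programmatically. The following is a minimal Python sketch, not a required part of the assignment. It assumes the aligned sequences have already been exported as equal-length strings with "-" marking gaps. The function names conserved_columns and residue_number and the toy alignment are hypothetical illustrations, not real BLAST or CLUSTAL W output.

```python
from collections import Counter


def conserved_columns(aligned, threshold=1.0):
    """Return (column, residue, fraction) for conserved alignment columns.

    Columns are 1-based. A column is reported when its most common
    non-gap residue appears in at least `threshold` of all sequences.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    result = []
    for i in range(length):
        column = [seq[i] for seq in aligned]
        counts = Counter(c for c in column if c != "-")  # ignore gaps
        if not counts:
            continue  # column contains only gaps
        residue, count = counts.most_common(1)[0]
        fraction = count / len(column)
        if fraction >= threshold:
            result.append((i + 1, residue, fraction))
    return result


def residue_number(aligned_seq, column):
    """Map a 1-based alignment column to the 1-based residue number
    in the ungapped sequence, or None if that column is a gap."""
    if aligned_seq[column - 1] == "-":
        return None
    return sum(1 for c in aligned_seq[:column] if c != "-")


# Hypothetical toy alignment for illustration only.
toy_alignment = [
    "MKT-AGHC",
    "MRTLAGHC",
    "MKSLAGHC",
]

for col, res, frac in conserved_columns(toy_alignment):
    print(col, res, residue_number(toy_alignment[0], col))
```

Because gaps shift the numbering, an alignment column is not the same as the residue number (aa #) asked for in step (c). The second helper converts a column back to a position in one chosen sequence. Lowering the threshold below 1.0 would also report columns that are mostly, but not completely, conserved.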

REPORT / Grading - 40 pts total

1) your name, the date and class, your sequence number and given 20 a.a. sequence, 

2) the name and source of your protein and the complete protein sequence in FASTA format -  5 pts

3)  table showing your multiple sequence alignment using 5 selected BLAST hits - 15 pts

4) PyMOL or similar figure - 10 pts, and

5) a half-page functional summary of your protein - 10 pts.  Tell a clear story with a minimum of typos / errors, and include two primary references to go with your report.
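
Item 2 asks for the complete sequence in FASTA format, which is a header line beginning with ">" followed by the sequence on one or more lines. As an illustration only, here is a short Python sketch that wraps a sequence into that layout. The function name to_fasta, the header text, and the line width of 60 are assumptions made for the example; the sequence shown is the 20-residue example fragment from this handout, not a complete protein.

```python
def to_fasta(header, sequence, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    # Remove spaces and line breaks, and normalize to upper case.
    cleaned = "".join(sequence.split()).upper()
    # Split the cleaned sequence into chunks of `width` residues.
    body = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([">" + header] + body)


# Hypothetical header; width=10 is used only to show the wrapping.
print(to_fasta("example_fragment", "RLGADMEDVC GRLVQYRGEV", width=10))
```

In a real report, the header would normally carry the accession number and protein name copied from the database record rather than a made-up label.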

 

To earn an excellent grade on your reports, you must:

Regarding plagiarism:  Please be aware that plagiarism is considered to be ethically unacceptable.  Here is one suggestion as to how to avoid plagiarism: While reading the sources of information that you plan to use in preparing your assignment, take notes on the content, using your own words.  While preparing your paper, refer to your notes, rather than the original source.  If you feel the need to use a phrase from a source, be sure to put the phrase in quotes, and reference the source.  In these days of the internet, it is very easy to find material that can easily be plagiarized.  Be warned, however, that the existence of the internet makes it much easier than ever before to detect plagiarism.  If you plagiarize in writing assignments, you run a high risk of getting caught, and a high risk of failing the course.

***********************************************************

Select your 20 amino acid sequence for the first writing assignment from the list below according to the day of the month on which you were born (e.g., if you were born on September 29, 1989, your sequence is number 29: RLGADMEDVC GRLVQYRGEV)

             Sequences for CH370  2009  -  BLAST Writing Assignment

 

1)  RYLCKKQGTK KTLQKIGAAC         2)  YKSAIPQEEV KAMAAFCEKK

3)  VELGANWVEG VNGGKMNPIW         4)  IQWEERNVAA IQGPGGKWMI

5)  LEHSPGVVEQ IIRPTGLLDP         6)  TDDLEKVCNE VVDTCLYKGS

7)  FAMLSLGTKA DTHDEILEGL         8)  SADYVMVSAS LGVLQSDLIQ   

9)  HLKEGLVAII DLAVDRLHCS         10) VVVEGVHLCM MMRGVEKQHS

11) HQLKNNPSSR RHITMLWNPD         12) VNFKIRHNIE DGSVQLADHY  

13) VEYMEEEKKT WGTVFKTLKS         14) KGGKHKTGPN LHGLFGRKTG   

15) DTIENVKAKI QDKEGIPPDQ         16) SYMGFEVVRP DHPALPPLDN   

17) LGGYAPGNAY LDGLAQQRRS         18) TFKQECKIKY SERVYDACMD

19) QNNFVHDCVN ITVKEHTVTT         20) VGKKREFVER LTSVAAEIYG

21) MAELLTGRTL FPGTDHIDGL         22) GGNGIVGAQV PLGAGIALAC   

23) TSSFVPLELR VTAASGAPRY         24) PSGNIGANGV AIFESVHGTA   

25) RRSASLHLPK LSITGTYDLK         26) VRRMELKADQ LYKQKIIRGF

27) TQGQIYGIKV DIRDAYGNVK         28) QNRSYSKLLC GLLAERLRIS    

29) RLGADMEDVC GRLVQYRGEV         30) RDNMSVILIC FPNAPKVSPE  

31) LGKKNRSLNG EKVDQVDYLL 

 

Every sequence should have a perfect match with a real protein, but the structure of that protein may not have been determined.  In that case, you should be able to find a closely related protein that does have a structure available.
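
The sequences in the list above are printed as two blocks of ten residues separated by a space. Before pasting one into a BLAST query, it can help to remove the space and confirm that the result is 20 standard one-letter amino acid codes. The following Python sketch shows one way to do that; the function name clean_query is hypothetical, and the check against the 20 standard residues is an assumption that fits the sequences listed here.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_query(raw, expected_length=20):
    """Strip whitespace from a printed sequence and validate it."""
    seq = "".join(raw.split()).upper()
    # Report any character that is not a standard one-letter code.
    invalid = sorted(set(seq) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError("unexpected characters: " + ", ".join(invalid))
    if len(seq) != expected_length:
        raise ValueError(
            "expected %d residues, found %d" % (expected_length, len(seq))
        )
    return seq


# Sequence 29, the example given in the handout.
print(clean_query("RLGADMEDVC GRLVQYRGEV"))
```

A length or character error at this stage usually means a residue was dropped or mistyped while copying, which would otherwise show up later as a missing perfect match.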