BLAST  Project (60 pts) - due Oct. 15th

Assignments:  In addition to writing on exams and homework, there will be two special assignments - this BLAST assignment and a term paper.    

 

To earn an excellent grade on your report, you must:

·                     communicate clearly;   make it interesting and informative to the reader; 

·                     answer all listed questions, with your thinking clearly explained;

·                     turn the report in on time (late penalty: 3 pts for the first day + 2 pts for each additional day; no credit after 10 days);

·                      format for your reports:

o         On the cover page, give your name, UTeID, class, date, sequence number, 20-mer sequence, name of the protein, and the complete protein sequence in properly formatted FASTA

o        use one and one-half spacing for the body of your text

o        use single spacing for figure captions and references

o        references: include at least 4 primary references (see below) in J Biol Chem style - place a number within the text (1) and then provide a list of references at the end, single spaced.  e.g., Borst, P., and Elferink, R. O. (2002) Annu. Rev. Biochem. 71, 537–592.

o        make sure that Figures included have a figure legend and Tables have a title or heading 

·                     be neat, with few or no spelling or grammatical errors.
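
As a rough illustration of what "properly formatted" FASTA means for the cover page, here is a minimal Python sketch. It assumes the usual convention of a single header line beginning with ">" followed by the sequence wrapped at a fixed width (60 characters here; the width and the function name `to_fasta` are choices made for this example, not course requirements). The header text shown is a placeholder; use the real accession and protein name from your own BLAST hit.

```python
def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record: one '>' header line, then the sequence wrapped to `width`."""
    # Strip spaces and line breaks (the 20-mers in this handout are shown in blocks of 10).
    residues = "".join(sequence.split()).upper()
    # Cut the sequence into fixed-width lines.
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([f">{header}"] + lines)


# Placeholder header (hypothetical); the sequence is the example 20-mer from this handout.
record = to_fasta("example_id Example protein [Example organism]", "QNRSYSKLLC GLLAERLRIS")
print(record)
```

Pasting the sequence directly from the database record in FASTA view is equally acceptable; the sketch only shows the layout to aim for.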

BLAST Assignment. You will be assigned a 20-amino acid sequence from the list below.  Each sequence is a fragment of an actual protein.  Use your sequence to do the following steps and then answer the questions below for your report:  NOTE:  See detailed illustration of instructions and sample results.

(a) Do a first BLAST run using your 20 a.a. sequence.  What is your protein? What is the complete amino acid sequence of the protein?  

(b) Now using the entire a.a. sequence, do a second BLAST run to find related sequences.  Carefully select a subset of the related sequences and do a Multiple Sequence Alignment using a program like CLUSTAL W.   Choose related proteins from at least five (and up to eight) other diverse organisms, if possible.  Using relatively diverse organisms (archaeal, bacterial, and eukaryotic) will make it easier to identify the most conserved amino acids within the protein.   Prepare a figure or table showing a sequence alignment of your protein and identify the conserved amino acids.  

(c) Determine the complete gene sequence for your protein or the cDNA sequence.  Analyze the sequence for restriction sites. 

(d) Design primers to express your protein as an N-terminal fusion product with a His tag, MBP or some other fusion partner to improve solubility. Show the primers to be used at each end, and briefly describe the logic and work behind your design to show that this might work.

(e)  Which amino acids (be specific, aa #) do you think are most important for preserving the structure of the protein?  Why?  Which amino acids do you think are most important for the function of the protein?  Why?   

(f) You should find that a structure is known for your protein.  What is the PDB code?

(g)  Using PyMOL (or a similar program) make an original figure illustrating your protein.  There are links to tutorials on using PyMOL on the links page (e.g., Tyler's PyMol How To demo slides).

(h)  In approximately one-half page, briefly describe the function of your protein.  
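
For steps (b) and (e), one simple way to pick out candidate conserved residues is to look for alignment columns in which every sequence has the same amino acid. The sketch below is only an illustration under stated assumptions: the aligned sequences are made up (not real BLAST hits), gaps are written as "-", the function name `conserved_positions` is invented for this example, and residue numbers are counted along the first sequence, which is assumed to be your protein. Strict identity is a crude criterion; conservative substitutions (e.g. I/L/V) also deserve attention in your discussion.

```python
def conserved_positions(aligned, reference_index=0):
    """Return (residue_number, residue) pairs for columns identical in every sequence.

    residue_number is 1-based and counts only non-gap residues of the reference sequence.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    reference = aligned[reference_index]
    conserved = []
    residue_number = 0
    for col in range(length):
        if reference[col] != "-":
            residue_number += 1
        column = {seq[col] for seq in aligned}
        # A column is conserved if it holds exactly one symbol and that symbol is not a gap.
        if len(column) == 1 and "-" not in column:
            conserved.append((residue_number, reference[col]))
    return conserved


# Toy alignment (made-up sequences for illustration only).
toy_alignment = [
    "MKT-AYIAKQR",
    "MKSGAYLAKQR",
    "MRT-AYIARQR",
]
print(conserved_positions(toy_alignment))
```

With a real alignment exported from your alignment program, the same idea gives you the "aa #" values asked for in (e), which you can then check against the structure in (f) and (g).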

REPORT / Grading - 60 pts total:      For an example of reporting results - click here

1) Cover page - your name, the date and class, your sequence number and given 20 a.a. sequence, the name and source of your protein and the complete protein sequence in FASTA format -  5 pts

2)  Table showing your multiple sequence alignment using 5 selected BLAST hits - 15 pts

3) Original PyMOL or similar figure - 10 pts, and

4) DNA sequence, Restriction site analysis, Primers - explanation for how you arrived at your result - 20 pts

5) Half-page functional summary of your protein - 10 pts.  Tell a clear story with a minimum of typos / errors and include four (4) primary references to go with your report (see above).
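
For item 4, the reasoning you are asked to explain usually runs: find which restriction enzymes cut inside your coding sequence, choose two cloning enzymes that do not, and add their recognition sites to the 5' ends of the primers. The Python sketch below illustrates that logic only. Everything specific in it is an assumption for the example: the coding sequence is made up, the enzyme list is a small sample of common enzymes, the 18-nt annealing length and the "GCGC" extra bases in front of each site are arbitrary choices, and the function names are invented here. It does not check the reading frame with the tag, melting temperatures, or whether the sites exist in your vector; those checks belong in your written explanation.

```python
# Recognition sequences (5'->3') for a few common enzymes. All are palindromic,
# so scanning one strand is enough to find every site.
SITES = {
    "NdeI": "CATATG",
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
}


def find_sites(dna):
    """Map each enzyme name to the 1-based start positions of its site in dna."""
    dna = dna.upper()
    hits = {}
    for name, site in SITES.items():
        positions = []
        start = dna.find(site)
        while start != -1:
            positions.append(start + 1)
            start = dna.find(site, start + 1)
        hits[name] = positions
    return hits


def reverse_complement(dna):
    """Return the reverse complement of a DNA string containing only A, C, G, T."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_primers(orf, fwd_enzyme, rev_enzyme, anneal=18, clamp="GCGC"):
    """Build (forward, reverse) primers as clamp + site + gene-specific region.

    Raises ValueError if either chosen enzyme also cuts inside the ORF.
    """
    hits = find_sites(orf)
    for enzyme in (fwd_enzyme, rev_enzyme):
        if hits[enzyme]:
            raise ValueError(f"{enzyme} cuts inside the ORF at {hits[enzyme]}")
    forward = clamp + SITES[fwd_enzyme] + orf[:anneal].upper()
    reverse = clamp + SITES[rev_enzyme] + reverse_complement(orf[-anneal:])
    return forward, reverse


# Made-up 39-nt coding sequence (start codon to stop codon), for illustration only.
toy_orf = "ATGGCTAAAGGTCTGACCGAAGTTCAGCGTATCCTGTAA"
print(find_sites(toy_orf))
fwd, rev = design_primers(toy_orf, "BamHI", "XhoI")
print("forward:", fwd)
print("reverse:", rev)
```

In the report, state which vector you chose, why the two enzymes are compatible with its cloning site, and how the insert stays in frame with the His tag or MBP.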

Regarding plagiarism:  Please be aware that plagiarism is considered to be ethically unacceptable. Here is one suggestion as to how to avoid plagiarism: While reading the sources of information that you plan to use in preparing your assignment, take notes on the content, using your own words.  While preparing your paper, refer to your notes, rather than the original source.  If you feel the need to use a phrase from a source, be sure to put the phrase in quotes, and reference the source.  In these days of the internet, it is very easy to find material that can easily be plagiarized. Be warned, however, that the existence of the internet makes it much easier than ever before to detect plagiarism.  If you plagiarize in writing assignments, you run a high risk of getting caught, and a high risk of failing the course.

*********************************************************

Select your 20-amino-acid sequence for the first writing assignment from the list below, according to the day of the month on which you were born (e.g., if you were born on September 11, 2002, your sequence is number 11: QNRSYSKLLC GLLAERLRIS).

             Sequences for CH370    -  BLAST Writing Assignment  (Step by step instructions  /  Sample of results)

 

Every sequence should have a perfect match with a real protein, but that protein may not have a structure that has been determined.  However, if that is the case you should be able to find a closely related protein that does have a structure available.     

 

1)  GFEIVRPGHP LVPKRPDACF            17) GQGLQEGERD FGVKARSILC

2)  HQLKNNPSSR RHITMLWNPD            18) LDTMVAALSC CQEAYGVSVI

3)  LGCYTESGQA IPVSFNGVKG            19) LGKKNRSLNG EKVDQVDYLL

4)  LLDIGGGFPG SEDTKLKFEE            20) NLSRLQEAGE LLRTEINRSV

5)  PLLKFDLFYG RTDAQIKSLL            21) QNFYEKIYNA LKPNGYCVAQ

6)  AIPSEYMINK MKDEDLLVPL            22) AASPLEKVCL VGCGFSTGYG

7)  ATATMGYKHK ALDANEAKDQ            23) AVSDARCVFD MATEVGFSMH

8)  CFYKLLTGAL ERDCGISPDD            24) EEQLRADHVF ICFPKNREDR

9)  FAYHEMMTHV PMTVSKEPKN            25) FDSNEAMYTK IKQGGTTYDI

10) FRGFGGPQAL FIAENWMSEV            26) FTEEEFKRLN INAAKSSFLP

11) QNRSYSKLLC GLLAERLRIS            27) REAEADASEG ADMLMVKPGL

12) RTFAKNGGCC GGNGNNPNCC            28) SADYVMVSAS LGVLQSDLIQ

13) SSMGHDGVVD EQSGKIVNDL            29) TEGVYKVSWT EPTGTDVSLN

14) VEYMEEEKKT WGTVFKTLKS            30) VGKKREFVER LTSVAAEIYG

15) VIFGNRQADR SPCGTGTSAK            31) VIVEETPPER WFVGGRSVAE

16) YDNGWNYEIY VKNDNTIDYR