With the new genomic data bases of model species, such as Escherichia coli, Saccharomyces cerevisiae, mouse, and human, the sequences of many proteins of biological interest will in principle be known, and the problem of characterizing a protein's primary structure will be reduced to identifying it in the data base.
   Within the past few years several research groups have demonstrated how MS can be used to identify proteins in sequence data bases. One approach is to cleave the protein with a sequence-specific proteolytic enzyme, measure molecular weight values for the resulting peptide mixture by mass spectrometry, and search a sequence data base for proteins that should yield these values. Search algorithms have also been implemented recently that utilize low-resolution tandem mass spectra of selected peptides (<3 kDa) produced by degrading the protein. Yates and coworkers compared the MS/MS sequence data to the sequences predicted for each of the peptides that would be generated from each protein in the data base. In the PEPTIDESEARCH sequence tag approach of Mann and Wilm, a partial sequence of 2–3 amino acids is assigned from the fragment mass differences in the MS/MS spectrum. This partial sequence and its mass distance from each end of the peptide (based on the masses of the fragment and molecular ions) are used for the data base search.
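   The first approach described above (enzymatic cleavage, peptide mass measurement, data base search) can be illustrated with a minimal sketch. It is not the algorithm of any of the cited groups. It assumes trypsin as the enzyme (cleavage after K or R, except before P), monoisotopic residue masses, neutral peptide masses, and a simple count of matched masses as the score. The function names, the mass tolerance, and the toy data base are hypothetical.

```python
import re

# Monoisotopic residue masses in Da (assumption: the text does not say
# whether monoisotopic or average masses were used).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def tryptic_digest(sequence):
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    # Splitting on a zero-width pattern requires Python 3.7 or later.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, sequence, tol=0.5):
    """Number of observed masses explained by some predicted peptide."""
    predicted = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(m - t) <= tol for t in predicted) for m in observed)


def search(observed, database, tol=0.5):
    """Rank data base entries by how many observed masses they explain."""
    scores = {name: count_matches(observed, seq, tol)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data base with made-up sequences, for illustration only.
database = {
    "protA": "MKWVTFISLLKAGR",
    "protB": "GASPVTCLKNDQEHR",
}
observed = [peptide_mass(p) for p in tryptic_digest(database["protA"])]
ranking = search(observed, database)
```

   Here `ranking` lists the entry that explains the most measured masses first. A practical search would also have to allow for missed cleavages, modified residues, and measurement error that scales with mass.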
   For three of the proteins studied, a single sequence tag retrieved only the correct protein from the data base; a fourth protein required the input of two sequence tags.
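   The sequence tag search can be sketched in the same spirit. The following is not the PEPTIDESEARCH implementation. It assumes that the two flanking masses have already been converted to sums of neutral residue masses on the N-terminal and C-terminal sides of the tag, that candidate peptides come from a tryptic digest, and that the tag is read in the N-to-C direction only. Leucine and isoleucine are treated as identical because they have the same mass. All function names and the toy data base are hypothetical.

```python
import re

# Monoisotopic residue masses in Da (assumed; see the lead-in).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def tryptic_digest(sequence):
    """Split after K or R unless the next residue is P (Python 3.7+)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def residue_sum(fragment):
    """Sum of residue masses, without terminal groups."""
    return sum(RESIDUE_MASS[aa] for aa in fragment)


def tag_matches(peptide, tag, n_mass, c_mass, tol=0.5):
    """True if the tag occurs in the peptide with both flanking masses right."""
    pep = peptide.replace("I", "L")   # I and L are isobaric
    tag = tag.replace("I", "L")
    start = pep.find(tag)
    while start != -1:
        left = residue_sum(pep[:start])
        right = residue_sum(pep[start + len(tag):])
        if abs(left - n_mass) <= tol and abs(right - c_mass) <= tol:
            return True
        start = pep.find(tag, start + 1)  # try the next occurrence
    return False


def search_by_tag(database, tag, n_mass, c_mass, tol=0.5):
    """Return (entry name, peptide) pairs consistent with the sequence tag."""
    return [(name, pep)
            for name, seq in database.items()
            for pep in tryptic_digest(seq)
            if tag_matches(pep, tag, n_mass, c_mass, tol)]


# Toy data base with made-up sequences; both entries contain "TFI".
database = {
    "protA": "MKWVTFISLLKAGR",
    "protB": "GASPVTFISCLKNDQEHR",
}
# Tag "TFI" from the peptide WVTFISLLK: W+V on the left, S+L+L+K on the right.
hits = search_by_tag(database, "TFI", residue_sum("WV"), residue_sum("SLLK"))
```

   In this toy case the three-residue string alone occurs in both entries, and it is the two flanking masses that narrow the result to a single peptide. This illustrates why a short tag combined with its mass distances can be highly specific.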