Laboratory of Molecular Biology, Division of Basic Sciences, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.
Two new sets of scoring matrices are introduced: H2 for the protein sequence comparison and T2 for the protein sequence-structure correlation. Each element of H2 or T2 measures the frequency with which a pair of amino acid types in one protein, k-residues apart in the sequence, is aligned with another pair of residues, of given amino acid types (for H2) or in given structural states (for T2), in other structurally homologous proteins. There are four types, corresponding to the k-values of 1 to 4, for both H2 and T2. These matrices were set up using a large number of structurally homologous protein pairs, with little sequence homology between the pair, that were recently generated using the structure comparison program SHEBA. The two scoring matrices were incorporated into the main body of the sequence alignment program SSEARCH in the FASTA package and tested in a fold recognition setting in which a set of 107 test sequences were aligned to each of a panel of 3,539 domains that represent all known protein structures. Six procedures were tested; the straight Smith-Waterman (SW) and FASTA procedures, which used the Blosum62 single residue type substitution matrix; BLAST and PSI-BLAST procedures, which also used the Blosum62 matrix; PASH, which used Blosum62 and H2 matrices; and PASSC, which used Blosum62, H2, and T2 matrices. All procedures gave similar results when the probe and target sequences had greater than 30% sequence identity. However, when the sequence identity was below 30%, a similar structure could be found for more sequences using PASSC than using any other procedure. PASH and PSI-BLAST gave the next best results.
Original Link : http://www.ncbi.nlm.nih.gov/pubmed/10975579