BLAST stands for Basic Local Alignment Search Tool. It is an algorithm used in bioinformatics for finding regions of local similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches. BLAST is used to infer functional and evolutionary relationships between sequences, and it also helps to identify members of gene families.
This sequence analysis tool is based on the statistical theory developed by Samuel Karlin and Stephen Altschul. The original theory was later extended to cover multiple weak matches between query and database entries. A group at the National Center for Biotechnology Information (NCBI), USA, supports the BLAST server. BLAST identifies sequences that have common blocks of local similarity. It is a popular, user-friendly tool.
BLAST programs were designed for fast database searching, with minimal sacrifice of sensitivity to distantly related sequences. BLAST is used to find sequence homologs in order to predict the identity, function, and 3D structure of the query sequence. It generally gives better results for protein sequences than for nucleotide sequences. The default database is the nr (non-redundant) database, but the user can select any other database from the list.
The program takes a query sequence and searches it against the database selected by the user. It aligns the query sequence against every subject sequence in the database. The results are reported as a ranked list of hits, followed by a series of individual sequence alignments and various statistics and scores. Every hit in that list is assigned a similarity score S.
BLAST is extremely fast. The program can be run locally, or queries can be e-mailed to the NCBI server. BLAST does not guarantee that it will find the best alignment between your query and the database: its strategy is expected to find most matches, and in exchange for speed it sacrifices some sensitivity.
Types of BLAST program:
There are five different types of BLAST programs, which can be distinguished by the type of the query sequence (DNA or Protein) and the type of the subject database.
1) blastp: This program compares an amino acid query sequence against a protein database.
2) blastn: This program compares a nucleotide query sequence against a nucleotide sequence database.
3) blastx: This program compares the six frame translation products of a nucleotide query sequence against a protein database.
4) tblastn: This program compares a protein query sequence against a nucleotide sequence database translated in all six frames.
5) tblastx: This program compares the six frame translations of a nucleotide query sequence against the six frame translations of a nucleotide sequence database.
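The "six frame translation" used by blastx, tblastn and tblastx can be illustrated with a short sketch. It assumes the standard genetic code and an upper- or lower-case DNA string; the function names are hypothetical and this is not BLAST's own implementation.

```python
# Standard genetic code, with codons enumerated in the base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate one reading frame; stops become '*', unknown codons 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Map each frame (+1, +2, +3, -1, -2, -3) to its translated protein."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])      # forward strand
        frames[-(offset + 1)] = translate(rev[offset:])   # reverse strand
    return frames


frames = six_frame_translations("ATGGCCTAA")
for frame in (1, 2, 3, -1, -2, -3):
    print(frame, frames[frame])
```

Three frames come from the sequence as given and three from its reverse complement, which is why a nucleotide query or database yields six candidate protein sequences.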
To calculate similarity between protein sequences, BLAST offers five different matrices, namely PAM30, PAM70, BLOSUM80, BLOSUM62 and BLOSUM45. The PAM (point accepted mutation) matrix was introduced by Dayhoff et al. in 1978, so it is also called the Dayhoff mutation data matrix. An evolutionary distance of 1 PAM is the distance over which, on average, 1 point mutation is accepted per 100 residues, and the PAM1 matrix gives the probability of each residue mutating over that distance. Similarly, the PAM250 matrix gives similarity scores equivalent to roughly 20% matches remaining between two sequences. With a PAM matrix, one protein is compared with another position by position. The odds for each aligned position are multiplied to calculate a score for the entire alignment. To make this computationally convenient, the logarithms of the odds are added instead.
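The equivalence between multiplying odds and adding their logarithms can be shown with a small sketch. The odds values below are made-up illustrative numbers, not entries from a real PAM matrix, and the function names are hypothetical.

```python
import math

# Hypothetical odds (observed frequency / frequency expected by chance) for a
# few aligned residue pairs. Illustrative only, not real PAM values.
TOY_ODDS = {
    ("A", "A"): 4.0,
    ("A", "S"): 1.5,
    ("L", "L"): 5.0,
    ("L", "D"): 0.2,
}


def pair_odds(a, b):
    """Look up the odds for a residue pair, ignoring the order of the pair."""
    if (a, b) in TOY_ODDS:
        return TOY_ODDS[(a, b)]
    return TOY_ODDS[(b, a)]


def alignment_odds(seq1, seq2):
    """Multiply the per-position odds of an ungapped alignment."""
    product = 1.0
    for a, b in zip(seq1, seq2):
        product *= pair_odds(a, b)
    return product


def alignment_log_odds(seq1, seq2):
    """Add the per-position log-odds (base 10) instead of multiplying odds."""
    return sum(math.log10(pair_odds(a, b)) for a, b in zip(seq1, seq2))


odds = alignment_odds("ALL", "SLD")          # 1.5 * 5.0 * 0.2
log_odds = alignment_log_odds("ALL", "SLD")  # log10 of the same product
print(odds, log_odds, math.log10(odds))
```

Adding logarithms avoids the very small or very large products that long alignments would produce, and it lets each pair score be stored once as a table entry. Published matrices typically scale and round these log-odds to integers.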
The BLOSUM matrix (Blocks Substitution Matrix) reduces the contribution of closely related sequences by clustering sequence segments that share at least a minimum percentage identity. Each cluster is then treated as a single sequence, with the contributions of its members averaged at each residue position.
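The clustering idea can be sketched as follows. In BLOSUM62, for instance, segments that are at least 62% identical are grouped together. The code below is a simplified illustration that assumes distinct, equal-length, ungapped segments and a greedy single-linkage rule; the function names and example segments are hypothetical, and this is not the original BLOCKS procedure.

```python
def percent_identity(a, b):
    """Percentage of identical positions between two equal-length segments."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cluster_segments(segments, threshold):
    """Greedy single-linkage clustering: a segment joins, and thereby merges,
    every cluster containing a member at or above `threshold` percent identity."""
    clusters = []
    for seg in segments:
        linked, unlinked = [], []
        for cluster in clusters:
            is_linked = any(percent_identity(seg, m) >= threshold for m in cluster)
            (linked if is_linked else unlinked).append(cluster)
        merged = [seg] + [m for cluster in linked for m in cluster]
        clusters = unlinked + [merged]
    return clusters


def segment_weights(clusters):
    """Weight each segment by 1 / cluster size, so a cluster counts as one sequence."""
    return {seg: 1.0 / len(cluster) for cluster in clusters for seg in cluster}


# Hypothetical aligned segments: three close relatives and one outlier.
segments = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHIRV", "WYWYWYWYWY"]
clusters = cluster_segments(segments, threshold=80)
weights = segment_weights(clusters)
print(clusters)
print(weights)
```

With these weights, the three similar segments together contribute as much to the residue-pair counts as the single outlier does, which keeps over-represented families from dominating the matrix.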
Steps involved in using BLAST for a protein sequence:
Step 1:
Go to the BLAST homepage (www.ncbi.nlm.nih.gov/blast) at the NCBI site. Click on the link blastp (standard protein-protein BLAST).
Step 2:
Paste the unknown protein sequence in the data entry field.
Step 3:
There are several boxes below the sequence entry box. For a trial run, leave all the fields at their default values.
Step 4:
The important field is a drop-down menu that allows the selection of the database to use. The nr (non-redundant) database is the default setting. Depending on the need, any database can be selected from the list. A search against the nr database may take some time, as it is the largest and most comprehensive database for BLAST to search.
Step 5:
The only other option that could be changed is the filtering, which is controlled by the box next to the words 'Choose filter'. Unchecking the box turns filtering off, but in order to get true positive hits the sequence should normally be filtered. The filter option reduces false positive results caused by short sequence patterns that are very common across the biological spectrum.
Step 6:
Click on the " BLAST ! " button to run the BLAST search.
Step 7:
A new page will appear showing the ID number of the search and the approximate wait time. Click on the " Format ! " button and wait for the results. The results will be returned when the search is complete.
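The same submit-then-fetch flow can also be expressed as request parameters for a script. The sketch below only builds the query strings and sends nothing over the network. The endpoint and the parameter names (CMD, PROGRAM, DATABASE, QUERY, FILTER, RID, FORMAT_TYPE) are assumptions based on the NCBI BLAST URL API and should be checked against NCBI's current documentation; the sequence and the search ID are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API; verify before use.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_submit_params(sequence, program="blastp", database="nr",
                        low_complexity_filter=True):
    """Parameters for submitting a search (Steps 1 to 6)."""
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": program,    # Step 1: choose blastp
        "DATABASE": database,  # Step 4: nr is the default
        "QUERY": sequence,     # Step 2: the unknown protein sequence
    }
    if low_complexity_filter:
        params["FILTER"] = "L"  # Step 5: keep filtering switched on
    return params


def build_result_params(search_id, format_type="Text"):
    """Parameters for fetching results with the search ID (Step 7)."""
    return {"CMD": "Get", "RID": search_id, "FORMAT_TYPE": format_type}


# Hypothetical query sequence and search ID, for illustration only.
submit_query = urlencode(build_submit_params("MKTAYIAKQR"))
result_query = urlencode(build_result_params("EXAMPLE123"))
print(BLAST_URL + "?" + submit_query)
print(BLAST_URL + "?" + result_query)
```

A real client would send the first request, read the search ID from the response, wait for the stated time, and only then issue the second request.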
Interpreting BLAST results:
The BLAST results page begins with the program version used, the reference for BLAST, the name and length of the query sequence, the database searched and the contact information. Next it shows the number of hits obtained on the query sequence and a graphical overview of the alignment of the hits to the query. A long red line near the top of the graphical overview represents the length of the query sequence. Each coloured line below the query sequence represents a hit obtained from the database. The multicoloured bar at the top of the graphical overview is a legend with the different colors representing different similarity score ranges.
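How the legend maps scores to colours can be sketched as a simple lookup. The score ranges and colour names below are assumptions based on the classic NCBI colour key and may differ between BLAST versions; the function name is hypothetical.

```python
# Assumed lower bounds of the colour key, highest range first.
COLOR_KEY = [
    (200, "red"),
    (80, "magenta"),
    (50, "green"),
    (40, "blue"),
]


def hit_color(score):
    """Return the legend colour for an alignment score."""
    for lower_bound, color in COLOR_KEY:
        if score >= lower_bound:
            return color
    return "black"  # scores below the lowest bound


for score in (250, 120, 60, 45, 10):
    print(score, hit_color(score))
```

Under these assumed ranges, the strongest hits are drawn in red and the weakest in black.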
The graphical overview is a very useful preliminary tool, which helps to determine whether or not the query sequence is of terrestrial origin. If the query has hits that show up as long regions of similarity, it is probable that the query is terrestrial. If the query has only short hits, it is possible that the query is extraterrestrial.
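The section below the graphical overview ranks the hits by score. For each hit, it gives a link, through the accession number, to the hit sequence in the database, the name of the hit sequence, the BLAST score and the Expect value or E-value. The E-value is the number of hits with an equal or better score that would be expected by chance in a database of that size, so the lower the E-value, the more significant the match and the more likely it reflects a true evolutionary relationship.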
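The relationship between a score and its E-value follows from the Karlin-Altschul statistics on which BLAST is based: E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths. The sketch below uses illustrative values of lambda and K (assumed here, since the real values depend on the scoring matrix and gap penalties), and its function names are hypothetical.

```python
import math

# Illustrative Karlin-Altschul parameters, assumed for this example only.
LAMBDA = 0.318
K = 0.134


def expect_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """E = K * m * n * exp(-lambda * S) for a raw score S."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Normalised score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_from_bits(bits, query_len, db_len):
    """E = m * n * 2 ** (-S') for a bit score S'."""
    return query_len * db_len * 2.0 ** (-bits)


# Hypothetical search: a 250-residue query against 1,000,000 database residues.
for raw in (40, 60, 100):
    print(raw, round(bit_score(raw), 1), expect_value(raw, 250, 1_000_000))
```

The E-value falls exponentially as the score rises, and it grows with the size of the database, which is why the same alignment is less significant in a larger search space.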