Running BLAST Locally

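BLAST is probably one of the most frequently used tools when getting started in bioinformatics, and a number of other tools are themselves built on sequence alignment. NCBI offers a web version for sequence alignment, and KAAS on KEGG likewise runs an alignment and then looks up KEGG IDs. A small number of sequences can be aligned directly on the NCBI web page, but with thousands or tens of thousands of sequences you need to run BLAST locally. The local BLAST workflow is as follows: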

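Download BLAST+ from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/ and pick the latest version, for example 2.6.0. Inside that directory, download the build that matches your operating system. Installation is fairly simple.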
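
Rather than changing into the bin directory for every session, the tools can be put on the PATH. The snippet below is a minimal sketch: `BLAST_HOME` is a hypothetical variable name and `$HOME/blast-2.6.0` is an assumed install location, so adjust both to wherever you unpacked BLAST+.

```shell
# Assumed install location (hypothetical); change to your own path
BLAST_HOME="$HOME/blast-2.6.0"

# Put the BLAST+ executables ahead of everything else on PATH
export PATH="$BLAST_HOME/bin:$PATH"

# Check whether blastp is now visible to the shell
if command -v blastp >/dev/null 2>&1; then
    blastp -version
else
    echo "blastp not found on PATH; check BLAST_HOME"
fi
```

With this in place, the query and database files can live in any working directory instead of inside the BLAST installation.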

Next we need a FASTA file to build the database from. As an example I use the full UniProt Swiss-Prot database here; then format it into a BLAST database:

 cd blast-2.6.0/bin     # switch to the blast/bin directory
 makeblastdb -in uniprot.fasta -dbtype prot -parse_seqids -hash_index -out uniprot  # build the database

Parameters:

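 -in the FASTA file to build the database from
 -dbtype prot for a protein database, nucl for a nucleotide database
 -out name of the resulting database
 -parse_seqids and -hash_index are usually worth adding as well:
 -parse_seqids => Parse Seq-ids in FASTA input         # keeps the IDs from the FASTA header lines so sequences can later be fetched by ID, e.g. with blastdbcmd
 -hash_index => Create index of sequence hash values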
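
To see what the options above do without waiting for a full Swiss-Prot build, the same command can be tried on a toy input. This is only a sketch: the two records are made up for illustration, `toy` is a hypothetical database name, and the BLAST+ steps are skipped when `makeblastdb` is not installed.

```shell
# Work in a scratch directory so nothing real is overwritten
workdir="$(mktemp -d)"

# Two made-up protein records, for illustration only
cat > "$workdir/toy.fasta" <<'EOF'
>seq1 hypothetical protein one
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
>seq2 hypothetical protein two
MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERS
EOF

# Every FASTA record starts with '>', so this counts the records
n_records=$(grep -c '^>' "$workdir/toy.fasta")
echo "records: $n_records"

if command -v makeblastdb >/dev/null 2>&1; then
    # Same options as for the real database, minus -hash_index
    makeblastdb -in "$workdir/toy.fasta" -dbtype prot -parse_seqids -out "$workdir/toy"
    # Because of -parse_seqids, a record can be fetched back by its ID
    blastdbcmd -db "$workdir/toy" -entry seq1
else
    echo "makeblastdb not found; skipping database build"
fi
```

If the database was built with -parse_seqids, the `blastdbcmd -entry` call should print the `seq1` record.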

Then align the sequences against the database, using protein-versus-protein alignment as the example:

 blastp.exe -query protein.fasta -db uniprot -evalue 1e-3 -out blast.xml -outfmt "5" -num_alignments 10 -num_threads 2

Parameters:

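 -query input file, i.e. the sequences to be aligned
 -db name of the formatted database
 -evalue e-value threshold for hits included in the output
 -out output file name
 -num_alignments maximum number of database sequences to show alignments for
 -num_threads number of threads
 Other options:
 -num_descriptions number of database sequences to show one-line descriptions for; it applies to the text report formats (outfmt 0-4), not to tabular output
 -outfmt output format: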
  0 = pairwise,
  1 = query-anchored showing identities,
  2 = query-anchored no identities,
  3 = flat query-anchored, show identities,
  4 = flat query-anchored, no identities,
  5 = XML Blast output,
  6 = tabular,
  7 = tabular with comment lines,
  8 = Text ASN.1,
  9 = Binary ASN.1,
  10 = Comma-separated values
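
Tabular output (-outfmt 6) is the easiest format to post-process on the command line. By default it has 12 tab-separated columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The sketch below keeps the highest-bitscore hit for each query. The four rows are made-up sample data, not real BLAST results, and the variable names are hypothetical.

```shell
# Made-up rows in the default 12-column -outfmt 6 layout (illustration only)
hits="$(mktemp)"
printf 'q1\tsubjA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-120\t410\n'  > "$hits"
printf 'q1\tsubjB\t45.0\t180\t99\t2\t5\t184\t9\t188\t1e-10\t60\n'  >> "$hits"
printf 'q2\tsubjC\t70.2\t150\t44\t1\t1\t150\t3\t152\t1e-50\t180\n' >> "$hits"
printf 'q2\tsubjD\t88.0\t150\t18\t0\t1\t150\t1\t150\t1e-90\t300\n' >> "$hits"

# For each query (column 1), remember the subject (column 2) with the
# highest bitscore (column 12); sort the result for a stable order
best_hits=$(awk -F'\t' -v OFS='\t' '
    !($1 in best) || $12 + 0 > best[$1] + 0 { best[$1] = $12; subj[$1] = $2 }
    END { for (q in best) print q, subj[q], best[q] }
' "$hits" | sort)

echo "$best_hits"
```

On the sample rows this prints `q1  subjA  410` and `q2  subjD  300` (tab-separated). For a real run, point the awk command at the file written by -out instead of the temporary file.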

This article is from http://www.bioinfo-scrounger.com; please credit the source when reposting.