SPAdes

SPAdes 3.10.1 Manual

MetaVelvet

NeSSM

MetaSim

Omega

IDBA-UD

Ray

 

======================================================================================

WebMGA | Mothur | GLIMMER (Web) | Glimmer

SILVA: Comprehensive Ribosomal RNA Databases

greengenes.lbl.gov - Aligned 16S rDNA data and …

RDP Release 11 -- Sequence Analysis Tools

rrnDB - the Ribosomal RNA Operon Copy

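Video (Mothur): downloading, installing, and using mothur for metagenomics analysis ...

Mothur command manual: Mothur commands explained in Chinese (Part 1)

Mothur command manual: Mothur commands explained in Chinese (Part 2)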

Welcome to the mothur wiki

Using Mothur

Using tRNAscan-SE

CheckM

MEGAN5 - MEtaGenome ANalyzer

Prodigal tutorial

clustering

======================

cd-hit-est: fast DNA clustering

CD-HIT User Guide

cd-hit-est (http://weizhong-lab.ucsd.edu/cd-hit/) is a very widely used program for clustering and comparing large sets of DNA sequences. It is very fast and can handle extremely large databases. It significantly reduces the computational and manual effort in many sequence analysis tasks and aids in understanding the data structure and correcting bias within a dataset.

1. "Clustering of highly homologous sequences to reduce the size of large protein database", Weizhong Li, Lukasz Jaroszewski and Adam Godzik Bioinformatics (2001) 17:282-283.
2. "Tolerating some redundancy significantly speeds up clustering of large protein databases", Weizhong Li, Lukasz Jaroszewski and Adam Godzik Bioinformatics (2002) 18:77-82.
3. "Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences", Weizhong Li and Adam Godzik Bioinformatics (2006) 22:1658-1659.
4. "CD-HIT Suite: a web server for clustering and comparing biological sequences", Ying Huang, Beifang Niu, Ying Gao, Limin Fu and Weizhong Li Bioinformatics (2010) 26:680-682.

----------------------------------------------

cd-hit: fast protein clustering

cd-hit (http://weizhong-lab.ucsd.edu/cd-hit/) is a very widely used program for clustering and comparing large sets of protein sequences. It is very fast and can handle extremely large databases. It significantly reduces the computational and manual effort in many sequence analysis tasks and aids in understanding the data structure and correcting bias within a dataset.

----------------------------------------------------------------

h-cd-hit: fast hierarchical protein clustering

In this program, you can create a non-redundant protein database hierarchically in two steps by using two sets of parameters. First, we cluster using a sequence identity cutoff of 0.9. Based on the clustering results of the first step, we cluster again using a sequence identity cutoff of 0.6. The final goal is to generate a non-redundant set of protein sequences (60% sequence identity) for downstream analysis.
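The two-step procedure can be sketched as two chained cd-hit runs, where the output of the first run becomes the input of the second. The Python sketch below only builds the command lines. The file names (`proteins.faa`, `nr90`, `nr60`) are hypothetical, and the word sizes passed to `-n` follow the recommendations in the CD-HIT user guide rather than anything stated on this page. It is an illustration, not WebMGA's actual implementation.

```python
def word_size(identity):
    """Word size (-n) suggested by the CD-HIT user guide for a protein identity cutoff."""
    if identity >= 0.7:
        return 5
    if identity >= 0.6:
        return 4
    if identity >= 0.5:
        return 3
    return 2


def hierarchical_commands(fasta, prefix, cutoffs=(0.9, 0.6)):
    """Build one cd-hit command per cutoff; each step clusters the previous step's output."""
    commands = []
    current = fasta
    for cutoff in cutoffs:
        out = f"{prefix}{int(round(cutoff * 100))}"  # e.g. nr90, then nr60
        commands.append(["cd-hit", "-i", current, "-o", out,
                         "-c", str(cutoff), "-n", str(word_size(cutoff))])
        current = out  # the next step starts from this step's representatives
    return commands


if __name__ == "__main__":
    # Hypothetical input file and output prefix, for illustration only.
    for cmd in hierarchical_commands("proteins.faa", "nr"):
        print(" ".join(cmd))
```

Each printed command can be run in a shell, or passed to `subprocess.run(cmd, check=True)` on a machine where cd-hit is installed.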

=========================

rRNA prediction

===========================

blastn_rRNA: rRNA prediction by blastn program

This program predicts rRNA by using BLASTN to identify DNA reads containing rRNA sequences.

1. "Basic Local Alignment Search Tool", S. F. Altschul, et al. Journal of Molecular Biology (1990) 215(3):403-410.
2. "5S Ribosomal RNA database", M. Szymanski, et al. Nucleic Acids Res. (2002) 30: 176-178.
3. "The European ribosomal RNA database", J. Wuyts et al. Nucleic Acids Res. (2004) 32: D101-D103.

-----------------------------------------------------------------------------

hmm_rRNA: rRNA prediction by hmmer 3.0 program

This program predicts rRNA by using HMMER 3.0 to identify DNA reads containing rRNA sequences.

1. "Profile hidden Markov models", S. R. Eddy Bioinformatics (1998) 14(9):755-763.
2. "Identification of ribosomal RNA genes in metagenomic fragments", Y. Huang, P. Gilna and W. Li Bioinformatics (2009) 25: 1338-1340.
3. "5S Ribosomal RNA database", M. Szymanski, et al. Nucleic Acids Res. (2002) 30: 176-178.
4. "The European ribosomal RNA database", J. Wuyts et al. Nucleic Acids Res. (2004) 32: D101-D103.

=======================

tRNA prediction

===========================

tRNA: tRNA prediction by tRNAscan-SE program

This program predicts tRNA by using the tRNAscan-SE program to identify DNA reads containing tRNA sequences.

1. "tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence", T. M. Lowe and S. R. Eddy Nucleic Acids Research (1997) 25(5):955-964.

========================

ORF prediction

========================

orf_finder: ORF prediction by the six-reading-frame technique

This program predicts ORFs using the six-reading-frame technique.
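The idea can be sketched in a few lines of Python: scan the three frames of the forward strand and the three frames of its reverse complement, and report every stretch that runs from a start codon to the next in-frame stop codon. This is a minimal sketch under stated assumptions, not WebMGA's orf_finder: it assumes ATG is the only start codon, TAA/TAG/TGA are the stop codons, and the `min_codons` threshold and function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset, min_codons):
    """Yield (start, end) of ATG..stop ORFs in one frame; end is exclusive and includes the stop codon."""
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first start codon since the last stop
        elif start is not None and codon in STOPS:
            if (i - start) // 3 >= min_codons:  # codons before the stop
                yield start, i + 3
            start = None


def six_frame_orfs(seq, min_codons=2):
    """Return (strand, frame, orf_sequence) tuples found in all six reading frames."""
    seq = seq.upper()
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            for start, end in orfs_in_frame(s, offset, min_codons):
                found.append((strand, offset + 1, s[start:end]))
    return found


if __name__ == "__main__":
    for orf in six_frame_orfs("CCATGAAATTTTAGGG"):
        print(orf)
```

An ORF that starts but never reaches a stop codon before the end of the read is ignored here. Short metagenomic reads often contain such partial ORFs, so a practical tool may need to report them as well.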

metagene: ORF prediction by metagene program

This program predicts ORFs using the MetaGene program.

MetaGeneAnnotator

1. "MetaGene: prokaryotic gene finding from environmental genome shotgun sequence", H. Noguchi, J. Park and T. Takagi Nucleic Acids Research (2006) 34(19):5623-5630.

fraggene_scan: ORF prediction by FragGeneScan program

This program predicts ORFs using the FragGeneScan program.

1. "FragGeneScan: predicting genes in short and error-prone reads", M. Rho, H. Tang and Y. Ye Nucleic Acids Research (2010) 38(20).

=======================

function annotation

==========================

cog: protein function annotation by COG database

This program performs function annotation by running the RPS-BLAST program against the COG database (prokaryotic proteins).

1. "Basic Local Alignment Search Tool", S. F. Altschul, et al. Journal of Molecular Biology (1990) 215(3):403-410.

pfam: protein function annotation by Pfam database

This program performs function annotation by running the HMMER 3.0 program against the Pfam database.

1. "Profile hidden Markov models", S. R. Eddy Bioinformatics (1998) 14(9):755-763.
2. "The Pfam protein families database", R. D. Finn, et al. Nucleic Acids Research (2010) 38: D211-D222.

tigrfam: protein function annotation by TIGRFAM database

This program performs function annotation by running the HMMER 3.0 program against the TIGRFAM database.

1. "Profile hidden Markov models", S. R. Eddy Bioinformatics (1998) 14(9):755-763.
2. "The TIGRFAMs database of protein families", D. H. Haft, et al. Nucleic Acids Research (2003) 31(1):371-373.

===============================================

pathway annotation

==============================

kegg: pathway annotation by KEGG database

This program uses BLAST to search protein sequences against the KEGG protein database. The KEGG number and its pathways/functions are reported.

1. "Basic Local Alignment Search Tool", S. F. Altschul, et al. Journal of Molecular Biology (1990) 215(3):403-410.
2. "Kyoto Encyclopedia of Genes and Genomes", H. Ogata, et al. Nucleic Acids Research (1999) 27(1):29-34.

==============================================

taxonomy binning

rdp_binning: taxonomic binning by RDP Classifier program

1. "Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy", Q. Wang, G. M. Garrity, J. M. Tiedje, and J. R. Cole Appl Environ Microbiol (2007) 73(16):5261-5267.

frhit_binning: taxonomic binning by frhit program

1. "FR-HIT, a Very Fast Program to Recruit Metagenomic Reads to Homologous Reference Genomes", B. Niu, Z. Zhu, L. Fu, S. Wu, W. Li, Bioinformatics (2011).

=============================

OTU finder

================================

cd-hit-otu: OTU finder by cd-hit-otu program

This program finds Operational Taxonomic Units (OTUs) using a three-step clustering procedure. The first step filters and trims the raw reads. The second step picks error-free reads. The last step clusters OTUs at different distance cutoffs (0.01, 0.02, 0.03... 0.12).

Please consult the CD-HIT-OTU web site for a detailed description of CD-HIT-OTU.

The whole CD-HIT-OTU program was zipped into one file and can be downloaded here
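To see how the choice of distance cutoff changes the number of OTUs, here is a toy Python sketch of the final clustering step. It is not the CD-HIT-OTU algorithm: it assumes equal-length reads, uses the fraction of mismatched positions as the distance, and applies a simple greedy rule in which each read joins the first OTU whose representative is within the cutoff. All function names are illustrative.

```python
def mismatch_distance(a, b):
    """Fraction of differing positions between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("toy distance needs equal-length reads")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, cutoff):
    """Assign each read to the first OTU whose representative is within cutoff."""
    otus = []  # list of (representative, members)
    for read in reads:
        for rep, members in otus:
            if mismatch_distance(rep, read) <= cutoff:
                members.append(read)
                break
        else:
            # No representative is close enough: this read founds a new OTU.
            otus.append((read, [read]))
    return otus


def otu_counts(reads, first=1, last=12):
    """Number of OTUs at each distance cutoff 0.01, 0.02, ... 0.12."""
    return {i / 100: len(greedy_otus(reads, i / 100))
            for i in range(first, last + 1)}


if __name__ == "__main__":
    # Four artificial 100-base reads differing from the first at 0, 2, 5 and 20 positions.
    reads = ["A" * 100, "C" * 2 + "A" * 98, "C" * 5 + "A" * 95, "C" * 20 + "A" * 80]
    for cutoff, n in otu_counts(reads).items():
        print(cutoff, n)
```

Larger cutoffs merge more reads into each OTU, so the OTU count can only stay the same or shrink as the cutoff grows in this toy setting. A distance cutoff of 0.03 corresponds to the commonly used 97% identity level.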

------------------------

1. "Ultrafast Clustering Algorithms for Metagenomic Sequence Analysis", W. Li, L. Fu, B. Niu, S. Wu & J. Wooley Briefings in Bioinformatics, (2012) 13 (6):656-668. doi: 10.1093/bib/bbs035
2. "WebMGA: a Customizable Web Server for Fast Metagenomic Sequence Analysis", S. Wu, Z. Zhu, L. Fu, B. Niu & W. Li BMC Genomics 2011, 12:444.

 

 

 

Metagenomic databases

EMP: Earth Microbiome Project

GOS: Global Ocean Sampling Expedition

CoML: Census of Marine Life

IMG: http://img.jgi.doe.gov/

MG-RAST: http://metagenomics.anl.gov

----------

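Notes:

An open reading frame (ORF) is a stretch of sequence that begins with a start codon and continues, in steps of three nucleotides, until a stop codon appears. An ORF may encode a polypeptide chain or a protein. When no protein product is known, the region is called an open reading frame; once it is established that the ORF encodes a particular protein, it is called a coding region. In other words, an ORF is a potential coding region. In many cases, "ORF" simply refers to the coding sequence of a gene.

An ORF is part of a gene sequence: a run of bases that can encode protein and is not interrupted by a stop codon. When a new gene is identified and its DNA sequence determined, the corresponding protein sequence still cannot be read off directly, because without further information the DNA can be read and translated in six reading frames (three on each strand, corresponding to three different starting positions). ORF identification examines all six frames and determines which one contains a stretch of DNA bounded by a start codon and a stop codon with no stop codon in between; a sequence meeting these conditions may correspond to a real, single gene product. Identifying the ORF is a prerequisite for showing that a new DNA sequence is part or all of a gene encoding a specific protein.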

================

Installation notes for tRNAscan-SE version 1.3.1

 

  1. $ cd /programinstallers/

  2. $ wget -N http://lowelab.ucsc.edu/software/tRNAscan-SE.tar.gz

  3. $ tar -zxvf tRNAscan-SE.tar.gz

  4. $ cd tRNAscan-SE-1.3.1

  5. Edit the top of the Makefile.
    $ nano Makefile
    Set the paths and other make variables to suit your system.
    In particular, you need to specify:

    1. where executables are to be installed

    2. where data files are to be installed

    3. where Perl is already installed on the system

    4. what the Perl binary is called (e.g. 'perl' or 'perl5')

    5. where temporary files will reside

    6. where to install man pages

    PERLDIR = /usr/bin
    PERLBIN = perl
    BINDIR  = $(HOME)/bin
    LIBDIR  = $(HOME)/lib/tRNAscan-SE
    MANDIR  = $(HOME)/man
    TEMPDIR = /tmp
    becomes
    PERLDIR = /usr/bin
    PERLBIN = perl
    BINDIR  = /usr/local/bin
    LIBDIR  = /usr/local/lib/tRNAscan-SE
    MANDIR  = /usr/local/share/man
    TEMPDIR = /tmp

  6. $ make

  7. $ sudo make install
    .
    .
    .
    cp trnascan-1.4 covels-SE coves-SE eufindtRNA tRNAscan-SE /usr/local/bin/.
    cp -R tRNAscanSE /usr/local/bin/
    cp TPCsignal Dsignal *.cm gcode.* /usr/local/lib/tRNAscan-SE/.
    cp tRNAscan-SE.man /usr/local/share/man/man1/tRNAscan-SE.1

  8. Cleanup
    $ cd ..

  9. $ rm -rf tRNAscan-SE-1.3.1

  10. To correct the error message
    Can't locate tRNAscanSE/Utils.pm in @INC (@INC contains:...
    I patched /usr/local/bin/tRNAscan-SE by inserting the following line at line 28:
    use lib "/usr/local/bin"; # otherwise can't find modules listed below

  11. I also needed to do this (permissions were 754 previously)
    $ sudo chmod 755 /usr/local/bin/tRNAscanSE

  12. For CGI compatibility, I found three system() calls to rm in /usr/local/bin/tRNAscan-SE and changed each one to use the full path, /bin/rm

=============

HMMER (binary release available)

Installing RNAmmer locally

RNAmmer 1.2 (request by email; no installation needed)

Download MetaGeneAnnotator

JGI

CheckM

HMMER

Prodigal

pplacer