alifmt − Aligned sequences formats
This document illustrates some common formats used for aligned sequences representation.
CLUSTAL
CLUSTAL W (1.82) multiple sequence alignment MALK_ECOLI MASVQLQNVTKAWGEVVVSKDINLDIHEGEFVVFVGPSGCGKSTL MALK_SALTY MASVQLRNVTKAWGDVVVSKDINLDIHDGEFVVFVGPSGCGKSTL MALK_ENTAE MASVQLRNVTKAWGDVVVSKDINLEIQDGEFVVFVGPSGCGKSTL MALK_PHOLU MSSVTLRNVSKAYGETIISKNINLEIQEGEF−−−−−−−−−−−−−− *:** *:**:**:*:.::**:***:*::*** MALK_ECOLI LRMIAGLETITSGDLACRRLHKEPGV MALK_SALTY LRMIAGLETITSGDLACRRLHQEPGV MALK_ENTAE LRMIAGLETVTSGDL−−−−−−−−−−− MALK_PHOLU LRM−−−−−−−−−−−−−−−−−−−−−−− ***
Warning: Names must not contain spaces or exceed 30 characters.
FASTA
>MALK_ECOLI MASVQLQNVTKAWGEVVVSKDINLDIHEGEFVVFVGPSGCGKSTLLRMIA GLETITSGDLACRRLHKEPGV >MALK_SALTY MASVQLRNVTKAWGDVVVSKDINLDIHDGEFVVFVGPSGCGKSTLLRMIA GLETITSGDLACRRLHQEPGV >MALK_ENTAE MASVQLRNVTKAWGDVVVSKDINLEIQDGEFVVFVGPSGCGKSTLLRMIA GLETVTSGDL−−−−−−−−−−− >MALK_PHOLU MSSVTLRNVSKAYGETIISKNINLEIQEGEF−−−−−−−−−−−−−−LRM−− −−−−−−−−−−−−−−−−−−−−−
MEGA
#mega !Title Multiple Sequence Alignment; #MALK_ECOLI MASVQLQNVTKAWGEVVVSKDINLDIHEGEFVVFVGPSGCGKSTLLRMIA GLETITSGDLACRRLHKEPGV #MALK_SALTY MASVQLRNVTKAWGDVVVSKDINLDIHDGEFVVFVGPSGCGKSTLLRMIA GLETITSGDLACRRLHQEPGV #MALK_ENTAE MASVQLRNVTKAWGDVVVSKDINLEIQDGEFVVFVGPSGCGKSTLLRMIA GLETVTSGDL−−−−−−−−−−− #MALK_PHOLU MSSVTLRNVSKAYGETIISKNINLEIQEGEF−−−−−−−−−−−−−−−−−−− −−−−−−−−−−−−−−−−−−−−−
MSF |
!!AA_MULTIPLE_ALIGNMENT 1.0 |
PileUp of: @pep.list
msf.seq MSF: 55 Type: P Nov 22, 2001 11:02 Check: 2529 ..
Name: m73237 Len: 655 Check: 7493 Weight: 1.00
Name: l28824 Len: 655 Check: 5456 Weight: 1.00
Name: u04379 Len: 655 Check: 9580 Weight: 1.00
//
1 50
m73237 ~~~~~MADSA NHLPFFFGQI TREEAEDYLV QGGMSDGLYL LRQSRNYLGG
l28824 MASSGMADSA NHLPFFFGNI TREEAEDYLV QGGMSDGLYL LRQSRNYLGG
u04379 ~~~~~MPDPA AHLPFFYGSI SRAEAEEHLK LAGMADGLFL LRQCLRSLGG
51
m73237 ~~~~~
l28824 ~~~~~
u04379 AACG*
Warning: This format cannot handle more than 500 sequences in a single alignment.
NEXUS
#NEXUS begin data; dimensions ntax=2 nchar=80; format datatype=Protein interleave gap=− missing='.'; matrix [ 1 50] btdDm −−−−−−−AQQQQHHLHMQQAQHH−−−−−−−−−−−LHLSH−−−−−−QQAQQ TSp1 −−−−−−−−−−−−−−−−−−−−AEH−−−−−−−−−−−PSLR−−−−−−−−GTPL [ 51 80] btdDm YACPICSKKFSRSDHLSKHKKTHF−−−−−− TSp1 FACPICNKRFMRSDHLAKHVKTHN−−−−−− ; endblock;
Warning: Text enclosed in brackets is considered as comment, and thus ignored.
PHYLIP
Sequential ( PHYLIPS ):
2 109 ATISA1 GSPNLYQ−GGRKPWHSINFICAHDGFTLADLVTYNNKNNLANGEENNDG ENHNYSWNCGEEGDFASISVKRLRKRQMRNFFVSLMVSQGVPMIYMGDE YGHTKGGN−−− POTISA1 GSPNLYQKGGRKPWNSINFVCAHDGFTLADLVTYNNKHNLANGEDNKDG ENHNNSWNCGEEGEFASIFVKKLRKRQMRNFFLCLMVSQGVPMIYMGDE YGHTKGGN−−−
Interleaved ( PHYLIPI ):
2 109 ATISA1 GSPNLYQ−GGRKPWHSINFICAHDGFTLADLVTYNNKNNLANGEENND POTISA1 GSPNLYQKGGRKPWNSINFVCAHDGFTLADLVTYNNKHNLANGEDNKD GENHNYSWNCGEEGDFASISVKRLRKRQMRNFFVSLMVSQGVPMIYMG GENHNNSWNCGEEGEFASIFVKKLRKRQMRNFFLCLMVSQGVPMIYMG DEYGHTKGGN−−− DEYGHTKGGN−−−
Warning: Species names may not contain characters ‘( ) : ; , [ ]’ and exceed 10 characters.
STOCKHOLM
# STOCKHOLM 1.0 MALK_ECOLI MASVQLQNVTKAWGEVVVSKDINLDIHEGEFVVFVGPSGCGKSTLLRMIA MALK_SALTY MASVQLRNVTKAWGDVVVSKDINLDIHDGEFVVFVGPSGCGKSTLLRMIA MALK_ENTAE MASVQLRNVTKAWGDVVVSKDINLEIQDGEFVVFVGPSGCGKSTLLRMIA MALK_PHOLU MSSVTLRNVSKAYGETIISKNINLEIQEGEF−−−−−−−−−−−−−−−−−−− MALK_ECOLI RCHLFREDGTACRRLHKEPGV MALK_SALTY RCHLFREDGSACRRLHQEPGV MALK_ENTAE −−−−−−−−−−−−−−−−−−−−− MALK_PHOLU −−−−−−−−−−−−−−−−−−−−− //
Nicolas Joly (njoly AT pasteur DOT fr), Institut Pasteur.