seqfmt − Sequence formats
This document illustrates some common formats used to represent sequences.
EMBL
ID   MMVASPHOS  standard; RNA; EST; 140 BP.
AC   X97897;
DE   M.musculus mRNA for protein homologous to
DE   vasodilator-stimulated phosphoprotein
SQ   Sequence 140 BP; 25 A; 58 C; 39 G; 17 T; 1 other;
     ttctcccaga agctgactct atggngaccc cgagagagac tgagcagaac ctggagccag        60
     ccccgcaccc ctgcacttcc aatcaggggc gccccgggag cactccccgt ggcgcgccgc       120
     ccgccctccg cgcagccatg                                                   140
//
FASTA
>MMVASPHOS
ttctcccagaagctgactctatggngaccccgagagagactgagcagaacctggagccag
ccccgcacccctgcacttccaatcaggggcgccccgggagcactccccgtggcgcgccgc
ccgccctccgcgcagccatg
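As an illustration of how such a file might be read, here is a minimal sketch in Python. It assumes the usual FASTA convention: a header line starting with ">" whose first word is the identifier, followed by one or more sequence lines. The function name parse_fasta is hypothetical and is not part of any tool described here.

```python
def parse_fasta(text):
    """Return a list of (identifier, sequence) pairs from FASTA text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            # Assumption: the identifier is the first word after '>'.
            fields = line[1:].split(None, 1)
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


example = """>MMVASPHOS
ttctcccagaagctgactctatggngaccccgagagagactgagcagaacctggagccag
ccccgcacccctgcacttccaatcaggggcgccccgggagcactccccgtggcgcgccgc
ccgccctccgcgcagccatg
"""
records = parse_fasta(example)
for identifier, sequence in records:
    print(identifier, len(sequence))
```

Because a new header simply starts a new record, the same loop also handles files holding several sequences.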
GCG
!!NA_SEQUENCE 1.0
(No documentation)
dna1.txt Length: 88 Nov 22, 2001 14:38 Type: N Check: 3818 ..
1 TAGTCGTAGT CGGAGCGATG CTGACGATGA CGATGACGAT CGTAGCTGAT
51 CGATCGAGCT GATGCTGATC GAGCTAGCTG ATCGATCG
GDE
#sample1
TTCAAGAGAAACAGCGGCCAAGGAAAAGACTCGGCATGATTGTCCATAGCTTACAAAGCG
#sample2
TTCAAGAGAAACAGCGGCTGGGGGAAAGACTCGTCCTGATTGCCTGTAGATGGTAAAGCG
GENBANK
LOCUS       HUMHBV1       130 bp    DNA             PRI       17-JUN-1993
DEFINITION  Human DNA/endogenous Hepatitis B virus (HBV) DNA, left host
            viral junction.
ACCESSION   M15770
BASE COUNT       32 a     43 c     29 g     26 t
ORIGIN
        1 agcgggcagt gcagctgctt ggacagcagg ggtgtttctt caacccaggc ctcctgtcac
       61 aacaggccca ttcaattctg aacctgcaag ccaactccaa cctcttttcc cagggggaac
      121 caaaaaccct
//
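The ORIGIN block interleaves position numbers with groups of ten bases, so a reader has to discard everything that is not a letter before using the sequence. The sketch below, in Python, does that and then tallies the bases so the result can be compared with the BASE COUNT line. The function name origin_sequence is hypothetical, and the input holds the bases of the HUMHBV1 sample above.

```python
from collections import Counter


def origin_sequence(lines):
    """Join ORIGIN lines, keeping letters and dropping numbers and spaces."""
    return "".join(ch for line in lines for ch in line if ch.isalpha())


origin_lines = [
    "  1 agcgggcagt gcagctgctt ggacagcagg ggtgtttctt caacccaggc ctcctgtcac",
    " 61 aacaggccca ttcaattctg aacctgcaag ccaactccaa cctcttttcc cagggggaac",
    "121 caaaaaccct",
]
sequence = origin_sequence(origin_lines)
composition = Counter(sequence)

print(len(sequence))
for base in "acgt":
    print(base, composition[base])
```

For this sample the tallies agree with the header: 130 bases, of which 32 a, 43 c, 29 g and 26 t.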
IG
; comment
U03518
AACCTGCGGAAGGATCATTACCGAGTGCGGGTCCTTTGGGCCCAACCTCCCATCCGTGTC
TATTGTACCCTGTTGCTTCGGCGGGCCCGCCGCTTGTCGGCCGCCGGGGGGGCGCCTCTG
TGAGTTGATTGAATGCAATCAGTTAAAACTTTCAACAATGGATCTCTTGGTTCCGGC1
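The trailing digit on the last sequence line is part of the record, not of the sequence. The sketch below, in Python, assumes the usual IG (IntelliGenetics) conventions: leading lines starting with ";" are comments, the next line is the identifier, and the sequence ends with "1" for a linear molecule or "2" for a circular one. The function name parse_ig is hypothetical.

```python
def parse_ig(text):
    """Parse one IG record; return (identifier, sequence, topology)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Skip the leading comment lines, which start with ';'.
    i = 0
    while i < len(lines) and lines[i].startswith(";"):
        i += 1
    if i >= len(lines):
        raise ValueError("no identifier line found")
    identifier = lines[i]
    sequence = "".join(lines[i + 1:])
    # Assumption: a final '1' marks a linear and '2' a circular sequence.
    topology = None
    if sequence and sequence[-1] in "12":
        topology = "linear" if sequence[-1] == "1" else "circular"
        sequence = sequence[:-1]
    return identifier, sequence, topology


example = """; comment
U03518
AACCTGCGGAAGGATCATTACCGAGTGCGGGTCCTTTGGGCCCAACCTCCCATCCGTGTC
TATTGTACCCTGTTGCTTCGGCGGGCCCGCCGCTTGTCGGCCGCCGGGGGGGCGCCTCTG
TGAGTTGATTGAATGCAATCAGTTAAAACTTTCAACAATGGATCTCTTGGTTCCGGC1
"""
identifier, sequence, topology = parse_ig(example)
print(identifier, topology, len(sequence))
```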
NBRF (pir)
>P1;CCHU
cytochrome c [validated] - human
MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIW
GEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKATNE*
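An NBRF record packs three things into its first two lines: a sequence type code and an identifier separated by ";", then a free-text description. The sequence follows and ends with "*". The Python sketch below splits a single record along those lines. The function name parse_nbrf is hypothetical, and treating "P1" as the type code is an assumption drawn from the usual NBRF/PIR convention.

```python
def parse_nbrf(text):
    """Parse one NBRF record; return (type_code, identifier, description, sequence)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("an NBRF record needs a header and a description line")
    header = lines[0]
    if not header.startswith(">") or ";" not in header:
        raise ValueError("not an NBRF header: %r" % header)
    # Header layout: '>' + type code + ';' + identifier.
    type_code, identifier = header[1:].split(";", 1)
    description = lines[1]
    sequence = "".join(lines[2:]).replace(" ", "")
    # The terminating '*' is a record marker, not a residue.
    if sequence.endswith("*"):
        sequence = sequence[:-1]
    return type_code, identifier, description, sequence


example = """>P1;CCHU
cytochrome c [validated] - human
MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIW
GEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKATNE*
"""
type_code, identifier, description, sequence = parse_nbrf(example)
print(type_code, identifier, len(sequence))
```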
CODATA
ENTRY           CCHU       #type complete
TITLE           cytochrome c [validated] - human
ACCESSIONS      A31764; A05676; I55192; A00001
SUMMARY         #length 105  #molecular-weight 11749  #checksum 3247
SEQUENCE
                5        10        15        20        25        30
      1 M G D V E K G K K I F I M K C S Q C H T V E K G G K H K T G
     31 P N L H G L F G R K T G Q A P G Y S Y T A A N K N K G I I W
     61 G E D T L M E Y L E N P K K Y I P G T K M I F V G I K K K E
     91 E R A D L I A Y L K K A T N E
///
RAW
ttctcccagaagctgactctatggngaccccgagagagactgagcagaacctggagccag
ccccgcacccctgcacttccaatcaggggcgccccgggagcactccccgtggcgcgccgc
ccgccctccgcgcagccatg
Warning: This format cannot handle more than one sequence per file.
SWISSPROT
ID   100K_RAT       STANDARD;      PRT;   149 AA.
AC   Q62671;
DE   100 kDa protein (EC 6.3.2.-).
SQ   SEQUENCE   149 AA;  17004 MW;  D06484B8BC29112E CRC64;
     MMSARGDFLN YALSLMRSHN DEHSDVLPVL DVCSLKHVAY VFQALIYWIK PQLERKRTRE
     LLELGIDNED SEHENDDDTS QSATLNDKDD ESLPAETGQN SITIRPPDDQ HLPTANTCIS
     RLYVPLYSSK QILKQKLLLA IKTKNFGFV
//
Nicolas Joly (njoly AT pasteur DOT fr), Institut Pasteur.