GenBank records are entered in a loose format. While this allows a great deal of freedom and meets the needs of many users, it makes the data difficult for computers to process. The IMAGE Consortium uses the sequences stored in GenBank for many purposes, including our Imagene clustering software and our QC efforts. There are certain important pieces of data we try to determine from a GenBank record. While we have tried to make our criteria for parsing a GenBank record as robust as possible, our software relies on certain assumptions, which are explained here.
LOCUS AA099559 436 bp mRNA EST 28-OCT-1996
DEFINITION zl78a03.s1 Stratagene colon (#937204) Homo sapiens cDNA clone
IMAGE:510700 3' similar to gb:D11086 CYTOKINE RECEPTOR COMMON GAMMA
CHAIN PRECURSOR (HUMAN);, mRNA sequence.
ACCESSION AA099559
NID g1645633
VERSION AA099559.1 GI:1645633
KEYWORDS EST.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 436)
AUTHORS Hillier,L., Lennon,G., Becker,M., Bonaldo,M.F., Chiapelli,B.,
Chissoe,S., Dietrich,N., DuBuque,T., Favello,A., Gish,W.,
Hawkins,M., Hultman,M., Kucaba,T., Lacy,M., Le,M., Le,N.,
Mardis,E., Moore,B., Morris,M., Parsons,J., Prange,C., Rifkin,L.,
Rohlfing,T., Schellenberg,K., Soares,M.B., Tan,F., Thierry-Meg,J.,
Trevaskis,E., Underwood,K., Wohldmann,P., Waterston,R., Wilson,R.
and Marra,M.
TITLE Generation and analysis of 280,000 human expressed sequence tags
JOURNAL Genome Res. 6 (9), 807-828 (1996)
MEDLINE 97044478
COMMENT
Contact: Wilson RK
Washington University School of Medicine
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
Tel: 314 286 1800
Fax: 314 286 1810
Email: est@watson.wustl.edu
This clone is available royalty-free through LLNL ; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
Seq primer: -40M13 fwd. from Amersham
High quality sequence stop: 353.
FEATURES Location/Qualifiers
source 1..436
/organism="Homo sapiens"
/db_xref="GDB:3843195"
/db_xref="taxon:9606"
/clone="IMAGE:510700"
/clone_lib="Stratagene colon (#937204)"
/lab_host="SOLR cells (kanamycin resistant)"
/note="Organ: colon; Vector: pBluescript SK-; Site_1:
EcoRI; Site_2: XhoI; Cloned unidirectionally. Primer:
Oligo dT. T-84 colonic epithelial cell line. Average
insert size: 1.0 kb; Uni-ZAP XR Vector; ~5' adaptor
sequence: 5' GAATTCGGCACGAG 3' ~3' adaptor sequence: 5'
CTCGAGTTTTTTTTTTTTTTTTTT 3'"
BASE COUNT 118 a 72 c 153 g 91 t 2 others
ORIGIN
1 tttttttgat gattatcaac agaaacttta tttctcatcg gttcaggaac aatcggaggg
61 tagatggaaa gaggaaggga gggaaagagg gagggaggaa gaatcctgcg aaaaggaagg
121 gccagactga gggagaagaa aaacatgttc ggggcaaaag ggtaattctc aagtggggaa
181 tgccaaatga aggggtgctt acatgggggc acaaaattcc aaatcagcca cagtggggtg
241 aggtgagtat gagacgcagg tgggttgaat gaaggaaagt tagtaccact tagggctaca
301 ggaccctggg gttcttcttg tcagaggatt gggggttcag gtttcaggct ttagggtgta
361 acattggggg ggcccagtta ggggctattg ctggttngca tggngggggg ccccaggccc
421 cctcccccaa gggccc
//
The LOCUS line is where we determine our GenBank accession number, the record type and the date, which in this record are "AA099559", "EST" and "28-OCT-1996" respectively. Currently the IMAGE Consortium only processes records of type "EST" and "PRI".
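As an illustration only, the LOCUS step might be sketched in Python as follows. The regular expression assumes the exact field order seen in the example record (name, length, "bp", molecule type, record type, date); real LOCUS lines can carry extra columns, and the names `LOCUS_RE`, `ACCEPTED_TYPES` and `parse_locus` are hypothetical, not part of the IMAGE software.

```python
import re

# Assumed layout, taken only from the example record:
# LOCUS <name> <length> bp <molecule> <record type> <date>
LOCUS_RE = re.compile(
    r"^LOCUS\s+(?P<accession>\S+)\s+(?P<length>\d+)\s+bp\s+"
    r"(?P<molecule>\S+)\s+(?P<record_type>\S+)\s+"
    r"(?P<date>\d{2}-[A-Z]{3}-\d{4})\s*$"
)

# The text states that only these record types are processed.
ACCEPTED_TYPES = {"EST", "PRI"}


def parse_locus(line):
    """Return the LOCUS fields as a dict, or None if the line does not match."""
    match = LOCUS_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["length"] = int(fields["length"])
    # Flag whether this record type would be processed at all.
    fields["processed"] = fields["record_type"] in ACCEPTED_TYPES
    return fields


locus = parse_locus("LOCUS AA099559 436 bp mRNA EST 28-OCT-1996")
print(locus)
```

A line that does not fit the assumed layout returns None rather than a partial result, so a caller can decide whether to skip the record or fall back to another method.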
The DEFINITION field is where we determine the orientation of a clone, by searching for the first occurrence of 5' or 3'. In this case the EST is the 3 prime end.
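A minimal sketch of the orientation search, assuming the DEFINITION text has already been joined into a single string. The function name `clone_end` is hypothetical.

```python
import re

# Matches the first occurrence of either 5' or 3' in the text.
END_RE = re.compile(r"[53]'")


def clone_end(definition):
    """Return "5'" or "3'" for the first match, or None if neither appears."""
    match = END_RE.search(definition)
    return match.group(0) if match else None


definition = (
    "zl78a03.s1 Stratagene colon (#937204) Homo sapiens cDNA clone "
    "IMAGE:510700 3' similar to gb:D11086 CYTOKINE RECEPTOR COMMON GAMMA "
    "CHAIN PRECURSOR (HUMAN);, mRNA sequence."
)
print(clone_end(definition))
```

Because only the first occurrence counts, a definition that mentions both ends is classified by whichever marker appears first.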
IMAGE also makes note of how much of the sequence is considered poor quality by searching the COMMENT field for the phrase "High quality sequence stop: ###", as in the example below:
High quality sequence stop: 353.
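The phrase search could be sketched like this. Treating every base after the stop position as poor quality (sequence length minus the stop value) is an assumption made for illustration; the text only says that the phrase is searched for. The name `poor_quality_bases` is hypothetical.

```python
import re

# The phrase is quoted in the text; the number is the last high-quality base.
HQ_STOP_RE = re.compile(r"High quality sequence stop:\s*(\d+)")


def poor_quality_bases(comment, sequence_length):
    """Return the number of bases past the high-quality stop, or None."""
    match = HQ_STOP_RE.search(comment)
    if match is None:
        return None
    stop = int(match.group(1))
    # Assumption: everything after the stop position counts as poor quality.
    return max(sequence_length - stop, 0)


comment = "Seq primer: -40M13 fwd. from Amersham High quality sequence stop: 353."
print(poor_quality_bases(comment, 436))
```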
/clone="IMAGE:510700"
From this one line we are able to determine our internal clone id and that this GenBank record describes an IMAGE clone. While we also try some secondary methods to determine this information, this is the safest way to ensure that data regarding an IMAGE clone can be retrieved by IMAGE.
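A sketch of how the /clone qualifier might be read, assuming it always has the form shown in the example record. The secondary methods mentioned above are not covered, and `image_clone_id` is a hypothetical name.

```python
import re

# Assumed form of the qualifier, taken from the example: /clone="IMAGE:<digits>"
CLONE_RE = re.compile(r'/clone="IMAGE:(\d+)"')


def image_clone_id(features_text):
    """Return the IMAGE clone id as an int, or None if no IMAGE clone is named."""
    match = CLONE_RE.search(features_text)
    return int(match.group(1)) if match else None


print(image_clone_id('/clone="IMAGE:510700"'))
```

A None result covers both a missing /clone qualifier and a clone from another source, which is where secondary methods would have to take over.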
We also try to determine further data about a clone by searching the entire record for certain key phrases. The phrase “considered poor quality” marks a clone as low quality and the phrase “reversed clone” marks a clone as reversed.
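The whole-record phrase search might look like the sketch below. Collapsing whitespace (so that a phrase wrapped across two lines still matches) and ignoring case are assumptions added for illustration; the name `phrase_flags` is hypothetical.

```python
def phrase_flags(record_text):
    """Return flags for the two key phrases found anywhere in the record."""
    # Assumption: collapse line breaks and runs of spaces, and ignore case,
    # so that wrapped or differently capitalised phrases are still found.
    text = " ".join(record_text.split()).lower()
    return {
        "low_quality": "considered poor quality" in text,
        "reversed": "reversed clone" in text,
    }


example = "This sequence is considered poor\n    quality. Reversed clone."
print(phrase_flags(example))
```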