Pageviews

Monday, June 6, 2011

demographic data done! (for now, anyway)

this morning i was able to very quickly import tab-delimited data into mongoDB (i'm learning things!). i decided to put the demographic data into it's own collection, so my database now has two collections: db.loci and db.demographic. an input of (tab-delimited) demographic data like this:

Individual Species Location Latitude Longitude Population
J01 Junco hyemalis NoPlace, TX 45.678 -109.876 POP1
J02 Junco hyemalis NoPlace, TX 45.678 -109.876 POP1
J11 Junco phaeonotus NoPlace, MS 49.800 -110.424 POP2
J12 Junco phaeonotus NoPlace, MS 49.800 -110.424 POP2

looks like this in the db.demographic collection: 

{ "_id" : ObjectId("4ded0248318a12036f000000"), "Longitude" : "-109.876", "Individual" : "J01", "Location" : "NoPlace, TX", "Latitude" : "45.678", "Species" : "Junco hyemalis", "Population" : "POP1" }
{ "_id" : ObjectId("4ded0248318a12036f000001"), "Longitude" : "-109.876", "Individual" : "J02", "Location" : "NoPlace, TX", "Latitude" : "45.678", "Species" : "Junco hyemalis", "Population" : "POP1" }
{ "_id" : ObjectId("4ded0248318a12036f00000a"), "Longitude" : "-110.424", "Individual" : "J11", "Location" : "NoPlace, MS", "Latitude" : "49.800", "Species" : "Junco phaeonotus", "Population" : "POP2" }
{ "_id" : ObjectId("4ded0248318a12036f00000b"), "Longitude" : "-110.424", "Individual" : "J12", "Location" : "NoPlace, MS", "Latitude" : "49.800", "Species" : "Junco phaeonotus", "Population" : "POP2" }

hooray! (and the code is only 30 lines!)

now i'm trying to figure out the most intelligent way to proceed. my project plan has this week allotted for the demographic input task (textDataIndexMDB.py), quality data input and retrieving SAM/BAM sequence files. i'm rethinking whether quality data is worth putting into the database (since all reads used to call a locus should already be high quality). the retrieval of SAM/BAM data should just require saving the path to their directory in the loci collection (much like the retrieval of sequence data from locus fasta files). i'm worried i haven't allotted enough time for the data formatting and GUI construction - the latter especially so since i've never done it before! i suppose i'll try to get the official tasks done by tomorrow, then update the project plan to reflect an unanticipated half week bonus. that is, of course, assuming all goes according to plan..

No comments:

Post a Comment