Pageviews

Tuesday, June 14, 2011

mostly thinking out loud

this week has been kind of weird because i'm a little preoccupied with the talk i'm giving at the evolution meeting AND the fact that i'm a little ahead of schedule and needing to begin on the next big task: the file conversions. (i'm not actually sure i'm ahead of schedule - outputting the data to the terminal screen has been straightforward, so i'm moving on to file format conversions and reserving the right to go back and re-examine outputting individual/demographic data.)

since one of the main motivations for the program is to be able to generate files given a subset of data (i.e., a specific list of individuals), i've been thinking about how to accomplish this. i've decided that it'll be a set of scripts, the first of which generates a list of files that contain the input list of individuals and that script will call the relevant "converter scripts" to do the leg work in converting file formats on the list generated by the first script. phew.

i've written a "new" script, getFastaFromMDB.py that generates the file list from a list of individuals (given at the command line). it's basically the combined pieces of scripts i've been messing with lately. since Biopython has a fasta to nexus converter, it's really easy to do that conversion (and is already done). the next step for me is to work on the IMa2 converter script, which is probably not trivial.

on a side note, john mccormack, a post-doc in the brumfield lab, asked me if the program would be able to find loci that contain at least some number of individuals from each population but where the identity of the individuals is unknown/unimportant (i.e., *BEAST requires at least one individual per population for each locus). that is a really good idea and seems like something i can do and the program should do. the back of my brain is working on an algorithm for making this happen. in the mean time, i'm going to forge ahead with the IMa2 converter.

No comments:

Post a Comment