Friday, July 29, 2011

the final stretch

the good news is the GUI is more or less fully functional now. the bad news is that i think it's really specific to the input format and i'm not sure what to do about that. my intention was to make a robust but simple way of viewing a large multi-locus dataset... the back parts of my brain are mulling over how to remove some of the most restrictive input requirements (like numerical loci) so that more people can use this program.
but back to the good news. import menu, export menu, summary screen, locus screen, output formats, raw data output - all up and running. it's not the prettiest thing you've ever seen but it is fast and pretty straightforward.

the welcome screen with directions

the import menu

the summary screen displayed when 'Display the data' is pressed in the File menu

when one of the 'numLoci' buttons is pressed on the summary screen, a second screen with info about all the loci that a particular individual has is displayed.

when the 'Coverage_This_Ind' or 'Coverage_Total' buttons are pressed, a fasta file is created that contains all the original reads pertaining to a locus, from either a single individual or all individuals that have that locus (respectively).

to export loci in the 3 different formats, first a subset of the data must be selected. users may select individuals:

or users may select populations:

then users select the output formats:

every locus that is output contains either all of the selected individuals (at least) or at least one individual from each selected population.
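to make that selection rule concrete, here is a minimal sketch in plain python. it is not the code in the program: the function names, the dict layout (locus number -> set of individuals, individual -> population) and the example names are all made up for illustration.

```python
def loci_for_individuals(locus_members, selected_individuals):
    """Return loci that contain at least every selected individual."""
    wanted = set(selected_individuals)
    return sorted(
        locus for locus, members in locus_members.items()
        if wanted <= set(members)  # subset test: extra individuals are fine
    )


def loci_for_populations(locus_members, population_of, selected_populations):
    """Return loci with at least one individual from each selected population."""
    wanted = set(selected_populations)
    hits = []
    for locus, members in locus_members.items():
        # populations represented at this locus
        present = {population_of[ind] for ind in members if ind in population_of}
        if wanted <= present:
            hits.append(locus)
    return sorted(hits)


# hypothetical toy data: 3 loci, 3 individuals, 2 populations
locus_members = {
    1: {"indA", "indB", "indC"},
    2: {"indA", "indC"},
    3: {"indB", "indC"},
}
population_of = {"indA": "pop1", "indB": "pop1", "indC": "pop2"}

by_individual = loci_for_individuals(locus_members, ["indA", "indC"])
by_population = loci_for_populations(locus_members, population_of, ["pop1", "pop2"])
```

with this toy data, selecting indA and indC keeps loci 1 and 2 (locus 3 lacks indA), while selecting pop1 and pop2 keeps all three loci, since each has at least one individual from both populations.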

so that's where i'm at now. this weekend = continue thinking about how to make the program more robust and begin the process of (fingers crossed) getting this into a format readable/displayable by the Galaxy Project. man this week went quick!

Thursday, July 28, 2011

export menu - done!

it has been a pretty productive start of the week. the export menu is complete - users select a subset of populations or individuals, then the output format(s) they would like, and the program searches for all the loci that contain the given subset, formats those loci and outputs them accordingly. wahoo! the new code is on GitHub.

i realized i'm not very good at reusing code (because i haven't fully grasped the OOP mentality and the finer points of python/tkinter are still rather difficult for me) but i think i'll be able to clean up the code as i continue to learn more, without losing functionality. i'm pleased that everything is more or less working well right now. the two remaining goals for this week/weekend are to get the raw data output associated with a button and to make the categories shown on the summary screen read from the input file instead of hard set by my code. i think it's doable, but we'll see...

Monday, July 25, 2011

finally - scrollbars!

the title says all i really want to say - the two main display screens for the GUI are now equipped with scrollbars! next up, make the "Coverage_This_Ind" button output the raw data that align to a particular locus and work on the export menu's callbacks. 

Friday, July 22, 2011

GUI progress

this week i've been making progress with the Tkinter GUI for lociNGS (still not sure about the name...). i'm attaching some screen shots to better illustrate where i'm at, but in words, i have a fully functional input menu. i have a summary screen and a locus specific screen working, but only if they don't contain more rows of data than a screen can fit. i've been working for the last 24 hours or so on scrollbars (the natural solution to this problem), but adding them to the code i already have has been, er, difficult. i'm going to keep mulling over that in the back of my brain and go back to working on the export menu/function.
i'm uploading the GUI script to GitHub even though it's a total mess and it won't work on anyone else's computer because it's dependent on inputting the right data in the right formats. useful, i know.

menu shot and the summary screen. 20 individuals, 4 populations, 2 species, made up locations, and the number of loci that each individual has called for it. (does that last one make sense? with next-gen data, the number of loci obtained for each individual in a dataset is variable. this column shows how many high quality loci were obtained for that individual.)

this is what the loci data screen should look like. i've restricted the output to only 10 loci so that you can read them on the screen.

this is what the loci screen actually looks like right now because it's cramming way too many lines of data onto a single screen and i can't figure out the best way to add scrollbars (which will have to go on the summary screen too).

Monday, July 18, 2011

kind of a rough week

this past week has been kind of rough. i sprained my ankle pretty bad AND i have been lost in python land trying to convert my scripts to intercommunicating modules. i spent many hours trying to figure this particular problem out:

i created a class to make a row of checkboxes (class Checkbar) and a method for that class (called state) that reports a vector of 0s and 1s for whether each box is unchecked or checked (respectively). i have a separate function (called POPMENU) that creates an object of this class, and this function has an internal function (called allstates) that scans all objects of the Checkbar class and reports the vectors for them all. when i run POPMENU on its own, this works fine. but when i embed it in a GUI widget (a Button), the vector always returns zeroes. i figure it has something/everything to do with passing variables or when the instances are created, but i can't quite make it work. i'm uploading the isolated problem in a script called (to GitHub) if anyone wants to look at it.
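i don't know the actual cause yet, but one thing worth ruling out is timing: if the states get read when the button is built, rather than when it is clicked, the answer will always be the initial zeroes. here is a tiny sketch of that difference with no Tkinter at all. FakeCheckbar and FakeButton are made-up stand-ins, not my real Checkbar class or a real Button. (another thing to check, also just a guess, is whether more than one Tk() root is being created, since the checkbox variables attach to a root.)

```python
class FakeCheckbar:
    """Stand-in for a row of checkboxes; holds 0/1 values the way IntVars would."""

    def __init__(self, n_boxes):
        self.values = [0] * n_boxes

    def check(self, index):
        """Simulate the user ticking one box."""
        self.values[index] = 1

    def state(self):
        """Report the current vector of 0s and 1s."""
        return list(self.values)


class FakeButton:
    """Stand-in for a Button: stores a callable and runs it when 'clicked'."""

    def __init__(self, command):
        self.command = command

    def click(self):
        return self.command()


bar = FakeCheckbar(3)

# read too early: state() runs right now, before any box is checked,
# and the button only ever hands back that old snapshot
early_snapshot = bar.state()
stale_button = FakeButton(command=lambda: early_snapshot)

# deferred: the method itself is passed, so state() runs at click time
live_button = FakeButton(command=bar.state)

bar.check(1)  # the user ticks the middle box after the buttons exist

stale_result = stale_button.click()  # still all zeroes
live_result = live_button.click()    # reflects the ticked box
```

the same idea applies to a real Button: its command option wants the function itself (or a lambda wrapping it), not the result of calling it.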

this week i've spent a lot of time reading about lambda functions, classes (and their methods) and functions. on the plus side, i've written the "input" portion of the GUI - in other words, i can create a drop down menu that will load the locus files, SAM/BAM files and demographic data into MongoDB. that was rather satisfying and is located in the script. two other new scripts, called and are being uploaded too. ( is used by for the input of files. isn't being used just yet, and may have to be rewritten, but was a first pass at getting all the reformatting scripts together.)

Friday, July 8, 2011

that's a little better

i've been trying to organize my brain and formulate the next step for this project. brad suggested i think about packaging some scripts into importable modules, which requires an intelligent plan for how the scripts will be used. i've been reading about GUIs and slowly piecing together how the various scripts i have will interact with each other and the user in the end product. i can't say i've settled on an optimal module design (should i put all the conversion scripts into one module so they can be accessed together? how about the scripts that upload the various data types to mongoDB? those seem more like stand alone pieces, but a user will need most or all of them at once, so...) still processing.
i edited my original GUI plan/figure to reflect the current state of my brain. feels good to transfer such things to paper. i anticipate the rest of today will be devoted to putting the converter scripts together in a module, since i'm pretty sure that makes sense. thoughts will continue to percolate as well.

Wednesday, July 6, 2011

it's also alive!

after a couple of hours cursing Migrate this morning, i found an error in my code and the output from now runs perfectly. (i was shocked when the root of my frustration was entirely my fault. shocked, i say! just kidding.)

that means that the scripts to database/retrieve data and convert .fasta files to the three desired formats are done. (tentatively done.) i've spent this afternoon reading Programming Python's chapters on Tkinter (their GUI-er) and copying the short scripts. this i'll probably do tomorrow as well, on top of making several new test datasets to error check my scripts. i also need to investigate Galaxy.

Tuesday, July 5, 2011

it's alive!

the first IMa2 input file my script produced successfully loaded and ran (in IMa2!) this morning. wahoo! i followed that up by working on, which I've written to take an IMa2 file as input. i know this means that users will have to generate an IMa2 file, even if they're only interested in a Migrate file, but these files are small and fast. also, the two formats are very similar and the IMa2 files have all the components necessary for Migrate format, so it was easy to code. i'm currently working on getting the output files successfully running with Migrate, which is proving tricky. off to GitHub then more fiddling with Migrate. good start to the week!

Friday, July 1, 2011

IMa2 format - done?

i can't believe how relatively easy writing this formatting script turned out to be. I LOVE PYTHON! i still need to run the final files through IMa2 to be certain they're good to go, but even if there is something that needs to be fixed, the bulk of the script ( is done! and already loaded on GitHub.

the output looks something like this (with sequences truncated and only two loci of 149 shown):