LDAP: couldn't connect to LDAP server
Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
mind:alignment [2016/05/13 11:30]
norkish
mind:alignment [2016/05/13 13:26] (current)
norkish
Line 3: Line 3:
  
 <pre> <pre>
-minimum%Overlap band_noband_identity unbandedTime bandedTime linearBandTime +minPercOverlap banded_identity band_noband_identity unbandedTime bandedTime optimizedBandTime totalAlns 
-0.2 0.982 2.812 1.5482 1.8109 +0.03125 1.0 0.893 3.0931 0.7097 0.4767 225.0 
-0.4 0.989 2.8063 2.4472 2.9774 +0.0625 1.0 0.956 2.9003 0.8283 0.5601 225.0 
-0.6 0.993 2.7984 3.1182 3.4305 +0.125 1.0 0.9734 2.94 1.1837 0.9215 225.0 
-0.8 0.994 2.822 3.547 3.3106 +0.25 1.0 0.9867 2.925 1.8431 1.569 225.0 
-1.0 0.996 2.7969 3.6476 2.8028 +0.5 1.0 1.0 2.9287 2.9216 2.493 225.0 
-1.0 1.0 2.9273 3.3881 2.9144+1.0 1.0 1.0 2.9299 3.8145 2.5307 225.0
 </​pre>​ </​pre>​
-I think that we're not always getting 100% identity when the minimum%Overlap is 1.0 because of floating point precision issues. I reran setting the minimum%Overlap to 1.0 exactly and it had perfect identity. There is also always perfect identity between the two banded. ​ 
  
-Need to graph themTurns out banding doesn't necessarily save us a ton of time. Too much bookkeeping?​ Not long enough sequences ​to see payoff+Initially the optimizedBandTime was running quite a bit slower. I greased it up by minimizing repetitive calculations and now it seems to be working faster than the unbanded aligner even when the "​band"​ is the full sequence lengthIt's interesting to see that with the minimum percent overlap at 50%, the banded alignment still finds the same alignment as the unbanded 100% of the time. These results are exact character matching by the way, and I imagine those numbers will go up slightly if we change everything ​to lowercase:​ 
 + 
 +<​pre>​ 
 +minPercOverlap banded_identity band_noband_identity unbandedTime bandedTime optimizedBandTime totalAlns 
 +0.03125 1.0 0.8978 2.9952 0.7033 0.4429 225.0 
 +0.0625 1.0 0.96 3.0146 0.869 0.5882 225.0 
 +0.125 1.0 0.98 2.924 1.1947 0.9135 225.0 
 +0.25 1.0 0.9867 2.9173 1.8567 1.576 225.0 
 +0.5 1.0 1.0 2.9185 2.996 2.5493 225.0 
 +1.0 1.0 1.0 2.9427 3.925 2.6588 225.0 
 +</​pre>​ 
 + 
 +Interesting that the band_noband_identity went up slightly, but only for smaller bands. What this means is that there are some cases where case makes difference, but with sufficiently large context, it figures it out anyway. Also keep in mind that the listed times are for all 225 alignments. The per alignment time is thus on the order of .012 seconds. Also the times are averages of 10 iterations. 
 + 
 +[{{ mind:​comparebandedunbandedresults.png?1000 }}] 
 + 
 +The graph on the left shows how often the alignment results from the banded and unbanded alignment algorithms are identical. Even with a very, very small bandwidth (i.e., .03% of the sequence length), the alignments are identical nearly 90% of the time. That number reaches 100% when the bandwidth is 50% of the sequence length. This to me says that what we are aligning has a pretty no-duh, almost-no-indels answer most of the time. There are about 10% of the lyrics that have metadata in them that require a bit bigger bandwidth to get the optimal alignment. 
 + 
 +In terms of the time, the smaller the bandwidth, the faster the banded alignment is. The unbanded alignment is obviously unaffected by the bandwidth. The difference between the banded and the optimized banded is that the banded uses linear time but polynomial space; the optimized uses both linear time and linear space (p.s., this was a lot harder to implement than I expected :)). The reason that the optimized runs a little faster than the banded or (even the unbanded when bandwith ratio is greater than .6) has less to do with the time savings of linear space usage and more to do with the fact that I greased up the optimized aligner to frontload all possible repetitive calculations. Thus the red (and blue) line could also be greased to appropriately match the green line. 
 + 
 +Results are averages of 10 iterations on 225 alignments of Billy Joel lyrics (no tabs). 
 + 
 +So, my plan then will be to do a crude MSA of all lyrics to get a "gold standard"​ lyric for each song (presumably with no metadata); then align the gold standard lyric to the tabs to distinguish the actual song content in the tab from the metadata (and simultaneously check the completeness in terms of song coverage of the tablature). 
 +Graphs: 
mind/alignment.1463160626.txt.gz · Last modified: 2016/05/13 11:30 by norkish
Back to top
CC Attribution-Share Alike 4.0 International
chimeric.de = chi`s home Valid CSS Driven by DokuWiki do yourself a favour and use a real browser - get firefox!! Recent changes RSS feed Valid XHTML 1.0