April 7, 2016

  • Christopher: worked with Tae Woo on null parent problem; worked on how to load/save files on filesystem (instead of database)
  • Dr. Embley: made all kinds of progress on the paper – much to discuss
  • Dr. Liddle: held a successful Founders Conference; worked with Dr. Embley on implementation and experiment issues; worked on related work search
  • Dr. Lonsdale: worked with Dr. Embley on evaluation; worked on lit. review


  • ER Paper
  • Embley home computer question
  • Liddle/Almquist

March 31, 2016

  • Tae Woo: Working on adding records into comments and empty record into form cache; Null parent
  • Christopher: Almost finished models and ajax for COMET; Will work with Dr. Liddle
  • Dr. Embley: Working on paper; Getting results of experiment; Constraint enforcer; Debugging JSON
  • Dr. Woodfield: Gone next Thursday, April 7; Able to inject handlers; General and hard participation constraint handlers; Started on literature search; Needs to continue search; Got account on supercomputer
  • Dr. Lonsdale: Literature search and evaluation.

Sidebars:

  • Christopher and Tae Woo: null parent
  • Coordinate counting: Dr. Embley, Dr. Lonsdale
  • Abstract due
  • Coordinate finishing the paper

March 24, 2016

  • Tae Woo: fixed empty OSMX issue; it turned out to be simple in the end (original form JSON)
  • Dr. Woodfield: almost finished Android lectures; made progress on supercomputer account (applied for new one)
  • Dr. Embley: working on paper, ground truthing; has a reasonable draft up to the experiment, but not the data
  • Dr. Liddle: lots of programming
  • Dr. Lonsdale: working on ground truthing and tracking down relevant literature for NLP


  • Java strategy to read & parse JSON in streaming mode
  • Accessing the data
    • “json” column from ling240 for user 30
    • User30/TheElyAncestry/Couple/333.json
    • User30/TheElyAncestry/Family/333.json
    • etc.
  • Additional experiments

March 17, 2016

  • Dr. Lonsdale: the lab is done; it was a good experience; students enjoyed it; summarized student comments
  • Dr. Woodfield: fixed a bug related to JSON merge and relationship sets; next up: look at supercomputer data
  • Tae Woo: worked on empty OSMX situation; still working on updating COMET with regex stuff
  • Christopher: has been looking at the new COMET code; needs backlog prioritized
  • Dr. Embley: worked on dev test set and experimental evaluations; trying to get precision/recall/f-measure for extraction to report in the paper; ran into another JSON issue
  • Dr. Liddle: set up the experiment and worked on JSON encoding bug, related issues
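
For reference, the precision/recall/f-measure figures mentioned above can be computed from raw counts as in the minimal sketch below. The function name and the example counts are illustrative assumptions, not numbers from the experiment; the f-measure shown is the balanced F1 (harmonic mean of precision and recall).

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f1) from raw extraction counts.

    Hypothetical helper for illustration; counts would come from comparing
    extracted facts against the ground-truth annotations.
    """
    predicted = true_positives + false_positives   # everything the tool extracted
    actual = true_positives + false_negatives      # everything in the ground truth
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up counts: 8 correct extractions, 2 spurious, 4 missed.
    p, r, f = precision_recall_f1(8, 2, 4)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

The zero checks keep the function defined on an empty page (no extractions and no ground-truth facts), which matters if some pages have no data.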


  • ER Paper literature review:
    • ER itself – Liddle (starting with same papers needed for the Jacky Akoka paper)
    • General database integrity checking, quality – Woodfield
    • NLP idea of semantics, pragmatics of extraction, quality – Lonsdale
  • How to get across COMET boundary in the pipeline without having to do it by hand
  • Empty OSMX situation: encode “{}” for data field, single record for form array
  • Priorities in the backlog
  • Encoding bug debrief
  • Dr. Embley's JSON problem

March 10, 2016

  • Group coding frenzy today

March 3, 2016

  • Dr. Liddle: progress on experimental design; talked with Dr. Scott about sample size (10,000 should be fine)
  • Dr. Lonsdale: OntoSoar iterates over pages and initiates new thread for each page; can't do it interactively yet (but that's just a convenience thing for us); still a little work to do on the runner; laptop just needed to attach to the FS network
  • Dr. Woodfield: got another constraint added; sees the last two we need to do; for the one error we knew of, it didn't find it, but it did find three we didn't know about
  • Christopher: helped Tae Woo past his blocker; found how to display the HTML correctly; hasn't been able to change opacity; did a demo and showed the issues
  • Tae Woo: has missed a corner case that we need: case where there is no data on the form; OntoSoar does produce some pages with no data; this will be a little difficult to do
  • Dr. Embley: progress on lots of miscellaneous things in the pipeline; Peter came the other day and it looks like he'll be here for the summer, perhaps working on another version of OntoSoar


  • Integrals over probability distributions
  • ER paper; what needs to be done by Monday?
    • Need to send pipeline output to the database, including uploading the 12 resources and running the ensemble
    • ling240 database and annotator setup, with users, jobs, etc.
    • Set up test cases for constraint checker
  • How to get vi to work without auto-indent:
    • Use the colon command :set noai

February 25, 2016

  • Dr. Liddle: worked with Dr. Embley on various programming tasks; Harwood is online
  • Tae Woo: welcome back! Working on putting results of regex back into COMET; needs to update annotations and formCache, and gridRenderer will process this; needs to align field ID in annotations and formCache
  • Christopher: got the icon in about the right place/size; has the click handler working; just needs to render text as HTML and create CSS rules for the various classes
  • Dr. Lonsdale: ran the Harwood texts and found that only one of the three pages ran all the way through because of English extractions we hadn't encountered before; now it is a bit more robust and we have results on all the files; also looked over training materials from Dr. Embley and has some suggestions
  • Dr. Embley: has been able to run lists of pages with Kilbarchan all the way through using batch-file drivers; found a couple of examples in the Kilbarchan set where semantic errors will be detectable by the constraint checker; working on a bug where message info is getting lost somewhere in the pipeline; will get precision, recall, and F-scores on the extraction and with respect to semantic errors


  • formCache vs. annotations (Tae Woo/Christopher/Liddle)
  • ER paper
    • Need to set up annotation experiment (Liddle; need structure from co-authors)

February 18, 2016

  • Dr. Embley: working on the pipeline to send ranges, sequences of pages to the tool set; trying to work on paper; continued working on experiment design
  • Dr. Liddle: worked with Dr. Embley and Christopher on various aspects of the tool
  • Dr. Lonsdale: still working on agent issue (reinit or start new without using bash script); Soar agent – grr! Has license issues with FamilySearch laptop, needs to connect to their network to re-auth.
  • Christopher: got the message demo mostly functional in annotator.ca; need to figure out how to render HTML in dialog box
  • Dr. Woodfield: got the distribution coded and clean-compiled; given a participation constraint, can now hand back a list of integer pairs


  • Multiple main() entry points
  • ER paper
  • Harwood configuration
  • Subversion and Netbeans arguing with Dr. W

February 11, 2016

  • Christopher: got sick, but did find a way to handle the problem with models and AJAX; will meet with Dr. Liddle Tuesday morning around 9:30 (office hours)
  • Dr. Embley: pushed Kilbarchan all the way through the pipeline
  • Dr. Liddle: spent some time working on FROntIER and OntoES with Dr. Embley
  • Dr. Lonsdale: still working on figuring out multiple instantiations of OntoSoar agent (Java class vs. agent invocation)
  • Dr. Woodfield: mostly worked on distribution class


  • Implementation for ER paper
    • Step 1: prepare for ground truthing by Dr. Lonsdale's class no later than the end of February
      • Need COMET working (icons and messages)
      • Set up tasks – Kilbarchan, Ely, Harwood (101, 102, 104)
    • Step 2: perform ground truthing
      • Before constraint checker (an option)
      • After constraint checker
    • Step 3: …

February 4, 2016

  • Dr. Woodfield: did a good job with the presentation; one guy wanted to buy it immediately; could it be plugged into some other application?
  • Dr. Embley: 8000 42GB compressed JSON files from Family Tree sit on the supercomputer; we need to get accounts so we can access this data; Tae Woo is in Australia with his family – has done little since; Dr. E has been working on GreenFIE execution rules, and it works fine on Kilbarchan
  • Dr. Liddle: has spent most of his time working on course development (Android and Web Dev)
  • Dr. Lonsdale: figured out something while coding the OntoSOAR interface: there can be only one agent per invocation of OntoSOAR, and agents map 1:1 to pages, so we'll need to invoke OntoSOAR repeatedly, once per page (can't iterate inside OntoSOAR)


  • ER Paper: need to gather data
  • OntoES: assumption that largest block wins doesn't work well for relationships like it does for entities; also “\n” question

January 28, 2016

  • Dr. Woodfield: Has made some progress.
  • Dr. Embley: Blocker: can't figure out the logging framework. FHT workshop paper due Tuesday. Submitted second revision of TKDD, got paper with Nagy accepted. Recoded handlers for general use. Found problem with wrappers on getParentDocument – new wrapper works fine. ConstraintEnforcer is in the pipeline now (but with logging issue).
  • Dr. Liddle: Back from Founders trip. It was really good.
  • Dr. Lonsdale: Got his machine working!!! Competition with high school students is over now. Will not attend RootsTech next week.
  • Tae Woo: Had a couple of blockers: need to save regular expressions, also offset problem – but those are figured out now.


  • Tae Woo add annotations code
  • FHT Workshop Paper & Presentation
  • Configuring logging with the Apache logging framework
  • (Liddle/Embley) Jacky Akoka paper
  • ER paper based on constraint enforcer
  • (w/Almquist?) Can we demo on Tuesday?

January 21, 2016

  • No DEG meetings today, but still do have individual meetings with Dr. Embley

January 14, 2016

  • Christopher: He's back! Has the data structure for messages probably right (or pretty close); still having trouble getting enough information to be able to load the annotations in the forms from the filesystem.
  • Dr. Woodfield: worked on constraints rather than Android this past weekend; general constraints code seems to be coming along well; needs copy of Dr. Embley's OSMX file.
  • Dr. Embley: Added relationship names, pushed everything through to be ready for COMET; drafted FHT Workshop paper.
  • Dr. Liddle: Gone 1/15 through 1/26. Made lots of progress on my classes.
  • Dr. Lonsdale: Not a lot to report. Working with OIT and CSRs on backup.
  • Tae Woo: Now generates regular expression, but has some offset problems; is there a tool that shows offsets for an arbitrary page?
    • Try: od -A d -c filename | more
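
As a complement to the od suggestion, here is a minimal sketch that reports where a regular expression matches in a page, giving both character offsets and byte offsets. The function name and sample text are hypothetical; it assumes the page is UTF-8. One possible source of offset trouble is that od reports byte offsets, which drift away from character offsets once the page contains non-ASCII characters.

```python
import re


def match_offsets(text, pattern, encoding="utf-8"):
    """Return (char_start, char_end, byte_start, byte_end, matched_text) per match.

    Hypothetical helper for illustration only.
    """
    results = []
    for m in re.finditer(pattern, text):
        # Character offsets index into the decoded string.
        char_start, char_end = m.span()
        # Byte offsets are what od reports for the encoded file.
        byte_start = len(text[:char_start].encode(encoding))
        byte_end = byte_start + len(m.group().encode(encoding))
        results.append((char_start, char_end, byte_start, byte_end, m.group()))
    return results


if __name__ == "__main__":
    # Sample line with two non-ASCII characters (e-acute and u-umlaut) before the years.
    sample = "N\u00e9e M\u00fcller, b. 1802; d. 1871"
    for row in match_offsets(sample, r"\d{4}"):
        print(row)
```

In the sample, each accented character takes two bytes in UTF-8, so the byte offsets of both years are two larger than their character offsets.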


  • (Christopher/Embley/Liddle): Info to load annotations from the filesystem
  • (Woodfield/Liddle): svn integration
  • FHT Workshop paper

January 7, 2016

  • Tae Woo: is now able to traverse DOM tree to get the proper data structure and will now be able to generate regular expressions appropriately
  • Dr. Embley: has been spending time on TKDD paper (2 rounds already); has written constraint handlers; Dr. Woodfield has been able to hook these constraint handlers into his code, and it actually works for participation constraints; want to get general constraints working too; Peter is interested in joining during summer as he works with FamilySearch
  • Christopher: got dropped from employment; is submitting form to get readmitted (5 cr. hr. issue)
  • Dr. Liddle: break was nice, but still busy; getting new semester going: two classes (web dev and Android dev); will be out of town Jan. 15-26
  • Dr. Lonsdale: has five 3-credit classes this semester, so schedule is going to be busy; has tried to update to repository, but is getting errors in Netbeans and TortoiseSVN, so it may be LDAP, HTTP, or some firewall issue; is teaching Ling 240, and 30 students are able to do annotation-type tasks; hasn’t used product backlog, but needs access to it (done!)


  • Review Q4 2015 goals and propose Q1 2016 goals
  • Q4 2015 goals
    1. Integrate OntoSOAR and ListReader into the ensemble, working satisfactorily
    2. Merge of data from extraction tools prepared for input to COMET
    3. Constraint enforcer designed and planned for COMET integration, coding under way
    4. Fill out the full-line pipeline so we handle corner cases in batch processing (e.g. complex and off-page annotations)
    5. GreenFIE rule generation working and integrated into ensemble
  • Q1 2016 goals
    1. Get OntoSOAR and ListReader working satisfactorily.
    2. Get COMET/ListReader interface working for labeling.
    3. Get constraint enforcer working minimally and integrated completely.
    4. Get constraint messages integrated into COMET.
    5. Get thin line of full administrative system working.
    6. Complete the annotation of a full book such as Kilbarchan Parish.
    7. Get GreenFIE extraction generation working and then integrated into the ensemble.
  • Messages format for FHT Workshop paper
  • ER paper?
  • Repo access for Dr. Lonsdale
