Alert! Work in Progress
We are in the process of making major upgrades to CCASH's architecture. New features include flexible data structures suitable for a variety of textual data and annotations. This reduces the burden on developers who do not wish to implement their own data structures. Other new features include fine-grained user permissions over these data structures. Unfortunately, the size of these changes means that the code base will be unstable for a time. Feel free to email us about progress details, or to get involved if you want things to move faster!
What is CCASH?
CCASH (Cost-Conscious Annotation Supervised by Humans) is a web-based annotation framework. It is designed as an environment both for evaluating state-of-the-art and experimental techniques for efficient annotation and for applying those techniques to real-world annotation projects. While designing CCASH we had our eye particularly on Active Learning; however, other techniques such as feature labeling and incorporating rich prior knowledge could also be incorporated into CCASH without too much trouble.
How does it work?
CCASH coordinates the activities of two components:
- Annotation Tasks are graphical user interfaces that run in your browser and allow you to annotate instances or correct automatic annotations. An annotation task's job is to display a particular kind of instance and solicit a particular kind of annotation. CCASH tasks are implemented with the Google Web Toolkit, allowing you to write code in Java assisted by GWT's WYSIWYG editors.
- Annotation Managers run as xmlrpc services on the network. As such they may be written in any language with an xmlrpc implementation (that is to say, almost anything). Annotation managers are in charge of two important tasks:
- Provide annotators with optionally pre-annotated instances
- Record annotations
In a typical annotation scenario, CCASH would query an annotation manager for a pre-annotated instance, then present that instance to a human annotator via a compatible GUI task. After the annotator finished, the completed annotation would be sent back to the annotation manager to be preserved.
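The exchange described above can be sketched as a small XML-RPC service. This is a minimal illustration in Python, not CCASH's actual interface: the class name InMemoryAnnotationManager, the method names next_instance and record_annotation, the port, and the layout of the returned dictionary are all assumptions made for the example.

```python
from xmlrpc.server import SimpleXMLRPCServer


class InMemoryAnnotationManager:
    """Hypothetical annotation manager: hands out instances, stores annotations."""

    def __init__(self, instances, prelabels=None):
        # instances: list of (instance_id, text) pairs, served in order.
        self._queue = list(instances)
        # prelabels: optional mapping of instance_id -> automatic annotation.
        self._prelabels = dict(prelabels or {})
        self.recorded = []

    def next_instance(self, annotator):
        """Return the next instance, pre-annotated if a pre-label exists."""
        if not self._queue:
            return {"done": True}
        instance_id, text = self._queue.pop(0)
        return {
            "done": False,
            "id": instance_id,
            "text": text,
            # An empty string stands for "no pre-annotation", since XML-RPC
            # does not transmit None unless explicitly enabled.
            "prelabel": self._prelabels.get(instance_id, ""),
        }

    def record_annotation(self, annotator, instance_id, label):
        """Preserve a completed annotation (here, only in memory)."""
        self.recorded.append((annotator, instance_id, label))
        return True


def serve(manager, host="localhost", port=8000):
    """Expose the manager's public methods over XML-RPC (blocks until killed)."""
    server = SimpleXMLRPCServer((host, port))
    server.register_instance(manager)
    server.serve_forever()
```

Calling serve(InMemoryAnnotationManager([("s1", "great movie")])) would start the service; a real manager would persist annotations rather than keep them in a list.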
CCASH is an Eclipse project, so you will want to get a current copy of Eclipse. We recommend the Eclipse Enterprise Edition (Eclipse EE) since it comes ready to run Apache Tomcat servers, which you'll need to run data providers.
You'll need to install the following Eclipse plugins:
- Google Plugin for Eclipse: You will need at least the Google Plugin for your version of Eclipse and the GWT SDK 2.4.0. The other features are for Android development.
- Subversive for Subversion functionality. Use an SVN 1.6 API library. Subclipse would also work, but it requires more manual setup in non-Windows environments.
CCASH manages its data in a relational database. We chose postgres as the default implementation because of its permissive licensing and sub-second timing values. You will need to install the postgres server on your system.
After installing postgres, you must configure it to accept connections from your CCASH install. Do this by editing the pg_hba.conf file and changing the line that reads
host all all 127.0.0.1/32 ident
to
host all all 127.0.0.1/32 trust
This tells postgres to trust all connections from the localhost. This is fine for development. (In the future, when you deploy CCASH, you will probably want to increase security by changing the word "trust" to "md5", which will require you to create a postgres account and password for CCASH. The username and password can be whatever you want, as long as you change the corresponding data in the file Ccash/src/META-INF/persistence.xml.)
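As a sketch of the pg_hba.conf edit described above, the following Python helper rewrites the authentication method on the IPv4 localhost line of the file's text. The function name set_localhost_auth is hypothetical, and the helper assumes the line has exactly the five whitespace-separated fields shown above; editing the file by hand works just as well. Either way, postgres must reload its configuration before the change takes effect.

```python
def set_localhost_auth(conf_text, method="trust"):
    """Return conf_text with the auth method of the 127.0.0.1/32 host line replaced."""
    out = []
    for line in conf_text.splitlines():
        fields = line.split()
        # Match only uncommented lines of the form:
        #   host all all 127.0.0.1/32 <method>
        if len(fields) == 5 and fields[:4] == ["host", "all", "all", "127.0.0.1/32"]:
            line = " ".join(fields[:4] + [method])
        out.append(line)
    return "\n".join(out) + "\n"
```

Passing method="md5" produces the stricter setting suggested for deployment.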
Create a database
Create a postgres database for CCASH by running the following command:
createdb -U postgres ccash
Create a database user
Create a postgres user for CCASH by running the following command:
createuser -U postgres ccash
For a copy of CCASH licensed under the AGPL, see the SourceForge project at https://sourceforge.net/p/ccash/code/HEAD/tree/. Using Subclipse or Subversive, check out a read-only copy of the code from http://svn.code.sf.net/p/ccash/code/trunk. If you are interested in CCASH under a different license, please contact us directly.
To run CCASH, right-click on the CCASH project in Eclipse, click "Run As," and select "Web Application." After a minute, a "Development Mode" tab will open in Eclipse and display a URL. Copy this URL into a browser, and you will see the CCASH login screen. Log in with username "admin" and password "passwd99". You can change this password after logging in by clicking the "Admin" menu item and selecting "Annotators".
"Hello, World" annotation task
Start annotating by Doing Simple Sentiment Classification.
How do I implement my own annotation task in CCASH?
CCASH is an annotation framework. Before you can apply CCASH to the annotation task you are interested in, you'll need to create an Annotation Manager to run on the server, and an Annotation Task to run in your annotators' browsers. If you create something that you think others might be interested in, please contribute it to the repository!
Do I have to build my application from scratch?
We have already developed some annotation tasks that we are interested in. Feel free to use their pieces as building blocks for your own project!
Example annotation tasks
These are fully formed annotation tasks you can use for reference. Relevant classes are indicated by links to their javadocs.
- Simple Sentiment Classification (an extremely simple task put together for demo purposes).
- English part of speech tagging - Label sequences of English words with their respective parts of speech from the Penn Treebank Tagset.
- Syriac morphological tagging - Label sequences of Syriac words with their respective morphological analyses. This includes separating the prefix and stem from the main word, assigning a grammatical category (Noun, Verb, etc), assigning gender (common, masculine, feminine), and so on.
- Syriac morphological tagging tutorial - The same as normal Syriac morphological tagging, except that after each sentence the annotator receives feedback on their performance and may optionally be required to try the sentence again.
- Survey - Asks users to answer a series of short-answer and multiple-choice questions.
- User study - Takes an annotator through a predetermined sequence of other tasks.
- Training - Presents annotators with a series of instructions on the left side of the screen while they perform an annotation task on the right side of the screen.
These are reusable components that we have developed while working on our own tasks. Check out the linked javadocs for more information.
- AbstractFileReadingAnnotationManager - reads a list of instances from a file, and (optionally) a list of pre-labels from another file, and finally records annotations received to a file.
- AbstractFileReadingInstanceProvider - reads a list of instances from a file and serves them up sequentially to each annotator. Used to implement AbstractFileReadingAnnotationManager.
- AbstractFileReadingAutomaticAnnotationProvider - reads a list of annotations from a file and then uses their instance ids to match annotations to instances. Used to implement AbstractFileReadingAnnotationManager.
- AbstractFileReadingAnnotationRecorder - records all annotations received to a file. Used to implement AbstractFileReadingAnnotationManager.
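The division of labor among these file-based components can be illustrated with a short Python sketch. It is not the CCASH Java code: the function names, the tab-separated input layout (one "id, tab, text" or "id, tab, label" pair per line), and the output format are assumptions chosen for the example.

```python
def read_pairs(path):
    """Read a file of 'key<TAB>value' lines into a list of (key, value) tuples."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:  # skip blank lines
                key, value = line.split("\t", 1)
                pairs.append((key, value))
    return pairs


def load_instances(instances_path, prelabels_path=None):
    """Pair each instance with its pre-label, matched by instance id, if any."""
    prelabels = dict(read_pairs(prelabels_path)) if prelabels_path else {}
    return [
        {"id": iid, "text": text, "prelabel": prelabels.get(iid)}
        for iid, text in read_pairs(instances_path)
    ]


def record_annotation(out_path, annotator, instance_id, label):
    """Append one received annotation to the output file."""
    with open(out_path, "a", encoding="utf-8") as f:
        f.write(f"{annotator}\t{instance_id}\t{label}\n")
```

Here load_instances plays the combined role of the instance provider and the automatic annotation provider, and record_annotation plays the role of the recorder; instances without a matching pre-label get None.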
These are tasks that we have in the incubator.
- Named Entity Tagging - Label noun phrases as Person, Location, Business, etc.
I have a problem! What should I do?
First consult the CCASH Frequently Asked Questions. If your question isn't answered there, send us a note at ccash at cs dot byu dot edu.
- CCASH being used to evaluate pre-annotation and correction propagation for Syriac morphological analysis
- CCASH being used to evaluate tag dictionaries in English POS tagging
- Original CCASH paper (partially out-of-date)