cicada
cicada is a statistical machine translation toolkit based on a
semiring parsing framework (a minimal sketch of the idea follows the
list below). Within this generic framework, we can:
- Learn models: tree-to-string, string-to-tree, string-to-string and
tree-to-tree grammars/models, word alignment models, and parsing
grammars/models.
- Translate sentence, lattice and/or parsed tree using
{string,tree}-to-{string,tree} models.
- Combine translations using forests or lattices.
- Align {hypergraph,lattice}-to-lattice.
- Dependency-parse lattices (or sentences).
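To make the "semiring parsing" idea concrete, here is a minimal,
hypothetical Python sketch (not cicada code): the same inside
computation over a toy hypergraph yields either the sum over all
derivations or the best derivation score, depending on which semiring
is plugged in.

# Illustrative sketch only (not cicada code): the inside algorithm over a
# toy hypergraph, parameterized by a semiring (plus, times, zero, one).
class Semiring:
    def __init__(self, plus, times, zero, one):
        self.plus, self.times, self.zero, self.one = plus, times, zero, one

PROB    = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)  # sum over derivations
VITERBI = Semiring(max,                lambda a, b: a * b, 0.0, 1.0)  # best derivation

# edges[head] = list of (weight, tail nodes); leaves have no tails.
toy = {
    0: [(0.6, [])],
    1: [(0.4, [])],
    2: [(0.9, [0, 1]), (0.1, [1])],   # two ways to build node 2
}

def inside(edges, semiring, topological_order):
    value = {}
    for node in topological_order:
        total = semiring.zero
        for weight, tails in edges[node]:
            product = weight
            for tail in tails:
                product = semiring.times(product, value[tail])
            total = semiring.plus(total, product)
        value[node] = total
    return value

print(inside(toy, PROB,    [0, 1, 2])[2])   # sum over both derivations (= 0.256)
print(inside(toy, VITERBI, [0, 1, 2])[2])   # score of the best derivation (= 0.216)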
The cicada toolkit is mainly developed by
Taro Watanabe
at the Multilingual Translation Laboratory, Universal Communication
Institute, National Institute of Information and Communications
Technology (NICT).
If you have any questions about cicada, you can send them to
taro.watanabe at nict dot go dot jp.
Remark: cicada is 蝉 (CJK UNIFIED IDEOGRAPH-8749), or セミ, in Japanese, pronounced SEMI.
Quick Start
The stable version is: 0.3.5.
The latest code is also available from github.com.
Compile
For details, see BUILD.rst.
./autogen.sh (required when you get the code by git clone)
./configure
make
make install (optional)
Run
You can find a sample grammar file in the samples directory together
with an ngram language model. Here is an example run (note that \ indicates
a shell line continuation).
./progs/cicada \
--input samples/scfg/input.txt \
--grammar samples/scfg/grammar.bin \
--grammar "glue:straight=true,inverted=false,non-terminal=[x],goal=[s]" \
--grammar "insertion:non-terminal=[x]" \
--feature-function "ngram:file=samples/scfg/ngram.bin" \
--feature-function word-penalty \
--feature-function rule-penalty \
--operation compose-cky \
--operation apply:prune=true,size=100,weights=samples/scfg/weights \
--operation output:file=-,kbest=10,weights=samples/scfg/weights
This example specifies:
- Input is samples/scfg/input.txt
- Three grammars:
- The SCFG file is samples/scfg/grammar.bin, which is a
binary version of samples/scfg/grammar.bz2.
- Additional grammar is a glue grammar consisting of two rules:
[s] -> <[x,1], [x,1]> and [s] -> <[s,1] [x,2], [s,1] [x,2]>.
- Another additional grammar is an insertion grammar, which simply
copies input words to the output: [x] -> <word-x, word-x>.
- Three feature functions:
- A 5-gram language model from samples/scfg/ngram.bin, which is a
binary version of samples/scfg/ngram.bz2.
- A word penalty feature which penalizes by the number of words in
the target side of a derivation.
- A rule penalty feature which penalizes by the number of rules in a
derivation.
- In addition, there are features already defined for each
hierarchical phrase pair; see samples/scfg/grammar.bz2 for examples.
- Actual operations:
- The input is composed by the CKY algorithm (compose-cky), which
results in a hypergraph.
- Cube pruning (apply) approximately computes features, using 100
as the histogram pruning threshold and the weights at
samples/scfg/weights.
- The 10-best derivations are computed and output to - (stdout),
using samples/scfg/weights as the weight vector to compute the
score for each derivation (a minimal sketch of this linear scoring
follows this list).
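The weights file defines a linear model: each derivation carries a
sparse feature vector, its score is the dot product with the weights,
and the kbest output simply ranks derivations by that score. A
minimal, hypothetical Python sketch (the feature names and values
below are made up for illustration):

# Illustrative sketch (hypothetical feature values, not cicada's file format):
# score(derivation) = dot(features, weights); k-best = top derivations by score.
weights = {"ngram": 0.3, "word-penalty": -0.2, "rule-penalty": -0.1}

derivations = [
    ("translation one", {"ngram": -12.4, "word-penalty": -2.0, "rule-penalty": -3.0}),
    ("translation two", {"ngram": -11.8, "word-penalty": -2.5, "rule-penalty": -4.0}),
]

def score(features, weights):
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

kbest = sorted(derivations, key=lambda d: score(d[1], weights), reverse=True)
for rank, (sentence, features) in enumerate(kbest):
    print(rank, sentence, score(features, weights))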
In depth
The above example can be found at samples/scfg/sample.sh. The
details of grammars, features and operations are described in
doc/grammar.rst, doc/features.rst and doc/operation.rst, respectively.
If you want to try tree-to-string or string-to-tree models, see
samples/t2s/sample.sh and samples/s2t/sample.sh.
In order to train a model, see doc/training.rst and
doc/training-stsg.rst, which describe how to create your own
{tree,string}-to-{tree,string} models, tune parameters, and run the
decoder.
Systems
cicada has been successfully compiled for x86_64 on Linux, OS X and
Cygwin, and is regularly tested on Linux and OS X.
Descriptions
Basically, we have four distinct structures:
- lattice: a representation of a graph implemented as a
two-dimensional array (see doc/lattice.rst).
- grammar: a collection of WFSTs implemented as a trie structure
(see doc/grammar.rst).
- tree-grammar: a collection of WFSTTs (tree transducers) implemented
as a (nested) trie structure (see doc/tree-grammar.rst).
- hypergraph: a compact representation of a set of trees, or forest
(see doc/hypergraph.rst); a toy sketch of a lattice and a hypergraph
follows this list.
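The following hypothetical Python sketch only conveys the flavor of
the lattice and hypergraph structures; the actual C++ implementations
and file formats are described in the doc/*.rst files above.

# Illustrative sketch only; see doc/lattice.rst and doc/hypergraph.rst for
# the real formats.  A lattice is a two-dimensional array: lattice[i] lists
# the arcs leaving position i as (label, cost, distance to the next position).
lattice = [
    [("the", 0.0, 1)],
    [("cat", -0.1, 1), ("cats", -2.3, 1)],   # two competing arcs
    [("sleeps", 0.0, 1)],
]
print(lattice[1])                            # the alternatives at position 1

# A hypergraph packs many trees: each hyperedge has a head node, a rule
# label and tail (antecedent) nodes, so sub-derivations are shared.
hypergraph = {
    "nodes": ["NP[0,2]", "VP[2,3]", "S[0,3]"],
    "edges": [
        {"head": 0, "rule": "NP -> the cat",  "tails": []},
        {"head": 0, "rule": "NP -> the cats", "tails": []},   # packed alternative
        {"head": 1, "rule": "VP -> sleeps",   "tails": []},
        {"head": 2, "rule": "S -> NP VP",     "tails": [0, 1]},
    ],
}
# The two NP alternatives are shared by the single S edge, so this forest
# encodes two trees without duplicating the VP sub-derivation.
print(len(hypergraph["edges"]))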
Translation/parsing can be carried out as follows:
- A lattice (or sentence) is composed with a grammar, generating a
hypergraph (a toy sketch of this composition appears after this
list).
- A lattice (or sentence) is composed with a tree-grammar,
generating a hypergraph.
- A lattice (or sentence) is composed with a phrasal grammar,
generating a phrasal hypergraph.
- A hypergraph/forest (or parse tree) is composed with a phrasal
grammar, generating another hypergraph.
- A hypergraph/forest (or parse tree) is composed with a tree
grammar, generating another hypergraph.
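As a rough illustration of the first item (composing a sentence with
a grammar into a hypergraph), here is a heavily simplified,
hypothetical CKY sketch for a monotone grammar with at most two
right-hand-side symbols; the toy grammar entries are made up, and
cicada's compose-cky handles the general synchronous case.

# Illustrative sketch only: a toy CKY composition producing a packed forest.
# The grammar format here is made up: (lhs, (rhs source symbols), (target side)).
from collections import defaultdict

terminal_rules = {            # source word -> list of (lhs, target word); toy entries
    "watashi": [("x", "I")],
    "wa": [("x", "")],        # deleted on the target side
    "neko": [("x", "cat")],
}
binary_rules = [("x", ("x", "x"), ("x1", "x2"))]   # straight glue-style rule

def compose_cky(words):
    n = len(words)
    chart = defaultdict(list)              # (i, j, lhs) -> list of backpointers
    for i, w in enumerate(words):
        for lhs, target in terminal_rules.get(w, []):
            chart[i, i + 1, lhs].append(("terminal", target))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, (a, b), target in binary_rules:
                    if chart[i, k, a] and chart[k, j, b]:
                        chart[i, j, lhs].append(("binary", (i, k, a), (k, j, b), target))
    return chart

forest = compose_cky("watashi wa neko".split())
print(len(forest[0, 3, "x"]))              # 2 ways to bracket the whole input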
Alignment can be carried out as follows:
- A lattice is composed with a dictionary, generating an alignment
hypergraph, or
- A hypergraph is composed with a dictionary, generating an alignment
hypergraph.
- In order to support word alignment training, we can learn
Model1/HMM/Model4 by symmetrized learning or symmetric posterior
constrained learning, with smoothing via variational Bayes or via
an L0 prior (a toy Model 1 sketch follows this list).
- A word clustering tool is also included to support word alignment
learning and translation.
- The final combined alignment can be generated either by heuristics
(e.g., grow-diag-final-and) or by ITG or max-matching from
posterior probabilities.
Also, the lexicon model can be discriminatively trained.
For details of the training process, please refer to
doc/training.rst and doc/alignment.rst.
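As a rough illustration of the lexical models behind alignment, here
is a minimal IBM Model 1 EM sketch on a toy corpus. This is not
cicada's implementation, which adds HMM/Model4, symmetrized and
posterior-constrained learning, and the smoothing priors mentioned
above.

# Illustrative Model 1 EM sketch on a toy corpus (not cicada code).
from collections import defaultdict

corpus = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split())]

t = defaultdict(lambda: 0.25)     # uniform initialization of t(f|e)

for _ in range(10):
    count = defaultdict(float)
    total = defaultdict(float)
    for f_sent, e_sent in corpus:
        for f in f_sent:
            # posterior over alignments of f: p(a = i) proportional to t(f|e_i)
            z = sum(t[f, e] for e in e_sent)
            for e in e_sent:
                p = t[f, e] / z
                count[f, e] += p
                total[e] += p
    for (f, e), c in count.items():   # M-step: renormalize
        t[f, e] = c / total[e]

print(round(t["haus", "house"], 3))   # converges toward 1.0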
Dependency parsing can be carried out as follows:
- A lattice is dependency-parsed by the arc-standard, arc-eager,
hybrid or degree2 algorithms, generating a derivation hypergraph
(see the sketch below).
- Forests are rescored by dependency features (TODO).
We support dependency projection with Model1/HMM posterior
probabilities so that arbitrary dependency parsers can be trained on
the projected parses.
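The transition systems named above build a dependency tree through a
sequence of actions over a stack and a buffer. The following
hypothetical Python sketch shows only the mechanics of the
arc-standard system with a fixed action sequence; cicada parses
lattices and learns scores for the actions.

# Illustrative arc-standard transition system (not cicada code).
def arc_standard(words, actions):
    stack, buffer, arcs = [], list(range(len(words))), []
    for action in actions:
        if action == "shift":
            stack.append(buffer.pop(0))
        elif action == "left-arc":          # second-top becomes a dependent of top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))   # (head, dependent)
        elif action == "right-arc":         # top becomes a dependent of second-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

words = ["I", "saw", "cats"]
# shift I, shift saw, left-arc (I <- saw), shift cats, right-arc (cats <- saw)
print(arc_standard(words, ["shift", "shift", "left-arc", "shift", "right-arc"]))
# -> [(1, 0), (1, 2)]: "saw" heads both "I" and "cats"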
After hypergraph generation, you can:
- Evaluate additional features to generate another hypergraph.
cicada implements cube-pruning, cube-growing, incremental and
exact (and stateless inside-algorithm) methods (a toy cube-pruning
sketch follows this list).
- cube-growing employs coarse heuristics, such as lower-order
ngrams.
- cube-pruning implements Algorithm 2 of faster cube pruning.
- Perform variational decoding or MBR decoding over the hypergraph,
based on the expected ngram counts over the forest.
- Generate k-best sentences from the hypergraph.
- Generate oracle translations (BLEU only).
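Cube pruning keeps only the best few scored hypotheses at each node
by popping candidates from a priority queue instead of enumerating
the full cross product. A hypothetical Python sketch for combining
two best-first sorted antecedent lists (cicada's apply step
generalizes this to full feature states and also implements the
faster variant mentioned above):

# Illustrative cube pruning over two sorted k-best lists (not cicada code).
import heapq

def cube(left, right, combine, size):
    """left/right: lists of (score, hyp) sorted best-first (higher is better)."""
    seen = {(0, 0)}
    # negate scores because heapq is a min-heap
    heap = [(-(left[0][0] + right[0][0]), 0, 0)]
    results = []
    while heap and len(results) < size:
        _neg, i, j = heapq.heappop(heap)
        # the combination cost (e.g. an ngram feature) is added only after popping
        score = left[i][0] + right[j][0] + combine(left[i][1], right[j][1])
        results.append((score, left[i][1] + " " + right[j][1]))
        for di, dj in ((1, 0), (0, 1)):          # push the two neighbours in the grid
            ni, nj = i + di, j + dj
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(left[ni][0] + right[nj][0]), ni, nj))
    results.sort(reverse=True)
    return results

left  = [(-1.0, "the cat"), (-1.5, "a cat")]
right = [(-0.5, "sleeps"), (-2.0, "is sleeping")]
print(cube(left, right, lambda a, b: -0.1 * len((a + " " + b).split()), size=3))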
Or, you can combine outputs from multiple systems:
- Perform parsing over the n-bests (use your favorite parser, such as
the Berkeley parser or the Stanford parser).
- Generate a context-free confusion forest by combining the trees
(not a confusion network!). This is performed by collecting rules
from the parse trees and generating with the Earley algorithm.
- Generate k-best translations after feature application etc.
Or, use a conventional system combination strategy:
- Create a lattice from the n-best list by incremental merging.
- Construct a hypergraph using a linear grammar
(grammar-glue-straight + grammar-insertion).
- Generate k-best translations after feature application etc.
Monolingual grammar learning is also implemented (a toy rule-counting
sketch follows this list):
- A simple PCFG obtained by simply extracting rules.
- A latent annotated PCFG learned by a split/merge process with an EM
algorithm.
- Coarse grammars derived from the latent annotated PCFG for
coarse-to-fine parsing.
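A plain PCFG can be read off a treebank by counting rules and
normalizing per left-hand side; split/merge training then refines the
nonterminals with latent annotations. A minimal, hypothetical sketch
of the counting step on toy trees (not cicada's tools):

# Illustrative PCFG estimation by relative frequency (not cicada code).
from collections import defaultdict

# toy treebank: (label, children...) tuples; leaves are plain strings
treebank = [
    ("S", ("NP", "I"), ("VP", ("V", "saw"), ("NP", "cats"))),
    ("S", ("NP", "cats"), ("VP", ("V", "sleep"))),
]

counts = defaultdict(float)
lhs_totals = defaultdict(float)

def collect(node):
    label, children = node[0], node[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[label, rhs] += 1.0
    lhs_totals[label] += 1.0
    for c in children:
        if not isinstance(c, str):
            collect(c)

for tree in treebank:
    collect(tree)

for (lhs, rhs), c in sorted(counts.items()):
    print(lhs, "->", " ".join(rhs), c / lhs_totals[lhs])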
Phrase/synchronous-rule/tree-to-string/string-to-tree
extraction and scoring are implemented (see doc/extraction.rst and
doc/indexing.rst for details; a toy phrase-pair extraction sketch
follows this list):
- The conventional phrase extraction algorithm used in Moses.
- The conventional hierarchical phrase extraction algorithm used in
Hiero, with or without syntax augmentation.
- Tree-to-string/string-to-tree rule extraction from forests.
- Tree-to-tree rule extraction from forests (experimental).
- Max-scope constraints to limit the grammar size.
- After count extraction, you can perform map/reduce to compute
model scores.
- Then, prune your model based on Fisher's exact test.
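The Moses-style extraction step enumerates phrase pairs consistent
with a word alignment: no word inside the pair may be aligned to a
word outside it. A hypothetical Python sketch of that core check (the
toy sentences, alignment and length limit are made up; cicada's
extractors add the hierarchical and tree variants plus count-based
scoring):

# Illustrative consistent phrase-pair extraction (not cicada code).
def extract_phrases(src, trg, alignment, max_len=4):
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span [i1, i2]
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            # consistency: no target word in [j1, j2] aligns outside [i1, i2]
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            if j2 - j1 < max_len:
                pairs.append((" ".join(src[i1:i2 + 1]), " ".join(trg[j1:j2 + 1])))
    return pairs

src = "watashi wa neko ga suki".split()
trg = "I like cats".split()
alignment = [(0, 0), (4, 1), (2, 2)]        # (source index, target index)
for pair in extract_phrases(src, trg, alignment):
    print(pair)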
Various learning components are implemented (see doc/learning.rst
for details; a toy PRO-style sketch follows this list):
- k-best merging batch learning:
  - MERT on hypergraphs or sentences.
  - batch algorithms (L-BFGS, SMO, liblinear) with various
    objectives, including ranking (AKA PRO), softmax,
    softmax-margin, margin, hinge or xBLEU.
  - online algorithms (SGD, PA) with various objectives, including
    margin (AKA MIRA), hinge, ranking or softmax.
- online learning:
  - mini-batch style synchronous learning with various objectives,
    including hinge, ranking, softmax or xBLEU.
  - when synchronously merging parameters, we can select features by
    kbest-feature merging.
  - mini-batch style asynchronous learning with various objectives,
    including hinge, ranking, softmax or xBLEU.
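As an example of the ranking (PRO-style) objective, tuning can be
reduced to binary classification: sample pairs of k-best derivations
whose metric scores differ and learn weights that rank the better one
higher. A minimal, hypothetical sketch with a hinge-style update on
feature differences (the k-best entries and learning rate are made
up; cicada's learners implement the richer objectives listed above):

# Illustrative PRO-style ranking sketch (not cicada code): hypothetical
# k-best entries carry a sentence-level metric score and a feature vector.
import random

kbest = [
    {"bleu": 0.40, "features": {"ngram": -10.0, "word-penalty": -4.0}},
    {"bleu": 0.25, "features": {"ngram": -9.0,  "word-penalty": -6.0}},
    {"bleu": 0.10, "features": {"ngram": -8.0,  "word-penalty": -8.0}},
]

weights = {"ngram": 0.0, "word-penalty": 0.0}

def dot(w, f):
    return sum(w.get(k, 0.0) * v for k, v in f.items())

random.seed(0)
for _ in range(1000):
    a, b = random.sample(kbest, 2)
    if abs(a["bleu"] - b["bleu"]) < 0.05:
        continue                         # only keep clearly ranked pairs
    better, worse = (a, b) if a["bleu"] > b["bleu"] else (b, a)
    margin = dot(weights, better["features"]) - dot(weights, worse["features"])
    if margin < 1.0:                     # hinge-style update toward the correct ranking
        for k in weights:
            weights[k] += 0.01 * (better["features"][k] - worse["features"][k])

print(weights)   # the learned model prefers higher-BLEU derivations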
Feature functions (see doc/features.rst for details; a toy
sparse-feature sketch follows this list):
- The ngram language model feature supports both expgram and kenlm.
- A feed-forward neural network language model.
- Sparse features, including rule identities, source/target ngrams,
and word pairs.
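Sparse features of this kind are simply indicator counts keyed by
strings. A hypothetical Python sketch of extracting rule-identity,
target-ngram and word-pair features from a single rule application
(the feature name prefixes are made up; see doc/features.rst for
cicada's actual names):

# Illustrative sparse feature extraction (hypothetical names, not cicada code).
from collections import Counter

def sparse_features(rule, source_words, target_words, order=2):
    feats = Counter()
    feats["rule:" + rule] += 1                         # rule identity
    for n in range(1, order + 1):                      # target ngrams
        for i in range(len(target_words) - n + 1):
            feats["ngram:" + "_".join(target_words[i:i + n])] += 1
    for s in source_words:                             # word pairs
        for t in target_words:
            feats["pair:%s|%s" % (s, t)] += 1
    return feats

print(sparse_features("[x] -> <neko, cat>", ["neko"], ["cat"]))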