Tool Specification
Corpus Generation
Usage:
usage: corpusgen.py [-h] -c CORPUS [-d DATABASE] [-i INPUT] [-x XML]
                    [-X XML_EMPTY] [-r RULE | --rule-exact | --rule-parser]
                    [-t THREAD]

CosmEL Tool: Corpus Generator.

optional arguments:
  -h, --help            show this help message and exit
  -c CORPUS, --corpus CORPUS
                        store corpus data in directory "<CORPUS>/"
  -d DATABASE, --database DATABASE
                        load product database from directory <DATABASE>;
                        default is "<CORPUS>/repo/"
  -i INPUT, --input INPUT
                        load articles in directory <INPUT>; default is
                        "<CORPUS>/article/original_article/"
  -x XML, --xml XML     output rule labeled XML articles into directory
                        <XML>; default is "<CORPUS>/xml/purged_article_rid/"
  -X XML_EMPTY, --xml-empty XML_EMPTY
                        output empty XML articles into directory <XML-EMPTY>
                        for human annotation; default is
                        "<CORPUS>/xml/purged_article/"
  -r RULE, --rule RULE  use file <RULE> as rule
  --rule-exact          use rule with only exact-match
  --rule-parser         use rule with parser
  -t THREAD, --thread THREAD
                        use <THREAD> threads; default is `os.cpu_count()`
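The default directories in the help text all derive from `<CORPUS>`, and the three rule options are mutually exclusive. The following is a minimal sketch of how a wrapper script might resolve those defaults and assemble a command line. The helper names `corpusgen_defaults` and `corpusgen_command`, the `python` interpreter name, and the `data/demo` corpus path are illustrative assumptions and are not part of the tool.

```python
import os
from pathlib import Path


def corpusgen_defaults(corpus):
    """Return the defaults documented in the corpusgen.py help text."""
    root = Path(corpus)
    return {
        "database": root / "repo",
        "input": root / "article" / "original_article",
        "xml": root / "xml" / "purged_article_rid",
        "xml_empty": root / "xml" / "purged_article",
        "thread": os.cpu_count(),
    }


def corpusgen_command(corpus, rule=None, rule_exact=False,
                      rule_parser=False, thread=None):
    """Build an argument list for corpusgen.py (hypothetical helper)."""
    # -r, --rule-exact and --rule-parser are shown as alternatives in the
    # usage line, so at most one of them may be given.
    if sum([rule is not None, rule_exact, rule_parser]) > 1:
        raise ValueError("choose at most one of rule, rule_exact, rule_parser")

    cmd = ["python", "corpusgen.py", "-c", str(corpus)]
    if rule is not None:
        cmd += ["-r", str(rule)]
    elif rule_exact:
        cmd.append("--rule-exact")
    elif rule_parser:
        cmd.append("--rule-parser")
    if thread is not None:
        cmd += ["-t", str(thread)]
    return cmd


if __name__ == "__main__":
    for name, value in corpusgen_defaults("data/demo").items():
        print(name, value)
    print(" ".join(corpusgen_command("data/demo", rule_exact=True)))
```

The returned list can be passed to `subprocess.run` unchanged. Options left at `None` are omitted so that the tool applies its own defaults.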
Model Training
Usage:
usage: train.py [-h] -c CORPUS [-m MODEL] [-x XML] [--emb EMB]
                [-s {c,cd,cn,cdn}]
                [-l [{gid,rid,joint} [{gid,rid,joint} ...]]]
                [-L [{gid,rid,joint} [{gid,rid,joint} ...]]] [-t THREAD]

CosmEL Tool: Training.

optional arguments:
  -h, --help            show this help message and exit
  -c CORPUS, --corpus CORPUS
                        store corpus data in directory "<CORPUS>/"
  -m MODEL, --model MODEL
                        store model data in directory "<MODEL>/"; default is
                        "<CORPUS>/model/"
  -x XML, --xml XML     load golden labeled XML articles from directory
                        <XML>; default is "<CORPUS>/xml/purged_article_gid/"
  --emb EMB             load pretrained embeddings from file <EMB>; default
                        is "<CORPUS>/embeddings/purged_article.dim300.emb.bin"
  -s {c,cd,cn,cdn}, --structure-eem {c,cd,cn,cdn}
                        use model structure <STRUCTURE-EEM> for entity
                        embeddings model; default is "cdn"
  -l [{gid,rid,joint} [{gid,rid,joint} ...]], --label-eem [{gid,rid,joint} [{gid,rid,joint} ...]]
                        use label type <LABEL-EEM> for entity embeddings
                        model; default is "joint"
  -L [{gid,rid,joint} [{gid,rid,joint} ...]], --label-mtc [{gid,rid,joint} [{gid,rid,joint} ...]]
                        use label type <LABEL-MTC> for mention type
                        classifier; default is "gid"
  -t THREAD, --thread THREAD
                        use <THREAD> threads; default is `os.cpu_count()`
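In the usage line, `-l` and `-L` are shown as `[{gid,rid,joint} [{gid,rid,joint} ...]]`, meaning each flag accepts zero or more label types. The sketch below reconstructs a parser that accepts a subset of the same options, to illustrate how such arguments parse. It is an assumption about the shape of the interface, not the actual source of `train.py`. In particular, storing the defaults as one-element lists is a guess, and `build_train_parser` is a hypothetical name.

```python
import argparse
import os

STRUCTURES = ("c", "cd", "cn", "cdn")
LABELS = ("gid", "rid", "joint")


def build_train_parser():
    """Reconstruct a parser with train.py-style options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="train.py", description="CosmEL Tool: Training.")
    parser.add_argument("-c", "--corpus", required=True)
    parser.add_argument("-m", "--model")  # None stands for "<CORPUS>/model/"
    parser.add_argument("-s", "--structure-eem",
                        choices=STRUCTURES, default="cdn")
    # nargs="*" lets each flag take several label types at once.
    parser.add_argument("-l", "--label-eem", nargs="*",
                        choices=LABELS, default=["joint"])
    parser.add_argument("-L", "--label-mtc", nargs="*",
                        choices=LABELS, default=["gid"])
    parser.add_argument("-t", "--thread", type=int, default=os.cpu_count())
    return parser


if __name__ == "__main__":
    # "data/demo" is a placeholder corpus directory.
    args = build_train_parser().parse_args(
        ["-c", "data/demo", "-l", "gid", "rid"])
    print(args.label_eem, args.label_mtc, args.structure_eem)
```

With this sketch, `-l gid rid` yields two label types for the entity embeddings model, while the omitted `-L` falls back to its default.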
Model Prediction
Usage:
usage: predict.py [-h] -c CORPUS [-m MODEL] [-o OUTPUT] [-s {c,cd,cn,cdn}]
                  [-l {gid,rid,joint}] [-L {gid,rid,joint}] [-t THREAD]

CosmEL Tool: Prediction.

optional arguments:
  -h, --help            show this help message and exit
  -c CORPUS, --corpus CORPUS
                        store corpus data in directory "<CORPUS>/"
  -m MODEL, --model MODEL
                        store model data in directory "<MODEL>/"; default is
                        "<CORPUS>/model/"
  -o OUTPUT, --output OUTPUT
                        output predicted XML articles into directory
                        <OUTPUT>; default is
                        "<CORPUS>/xml/purged_article_gnrid/"
  -s {c,cd,cn,cdn}, --structure-eem {c,cd,cn,cdn}
                        use model structure <STRUCTURE-EEM> for entity
                        embeddings model; default is "cdn"
  -l {gid,rid,joint}, --label-eem {gid,rid,joint}
                        use label type <LABEL-EEM> for entity embeddings
                        model; default is "joint"
  -L {gid,rid,joint}, --label-mtc {gid,rid,joint}
                        use label type <LABEL-MTC> for mention type
                        classifier; default is "gid"
  -t THREAD, --thread THREAD
                        use <THREAD> threads; default is `os.cpu_count()`
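Unlike training, prediction takes exactly one value for each of `-s`, `-l`, and `-L`. A minimal sketch of a helper that checks those values against the documented choices and spells out the documented default paths explicitly is shown here. The function name `predict_command`, the `python` interpreter name, and the `data/demo` path are assumptions made for illustration.

```python
from pathlib import Path

STRUCTURES = ("c", "cd", "cn", "cdn")
LABELS = ("gid", "rid", "joint")


def predict_command(corpus, model=None, output=None,
                    structure="cdn", label_eem="joint", label_mtc="gid"):
    """Build an argument list for predict.py (hypothetical helper)."""
    if structure not in STRUCTURES:
        raise ValueError(f"structure must be one of {STRUCTURES}")
    for label in (label_eem, label_mtc):
        if label not in LABELS:
            raise ValueError(f"label type must be one of {LABELS}")

    root = Path(corpus)
    # Fall back to the defaults listed in the help text.
    model_dir = Path(model) if model is not None else root / "model"
    output_dir = (Path(output) if output is not None
                  else root / "xml" / "purged_article_gnrid")

    return [
        "python", "predict.py",
        "-c", root.as_posix(),
        "-m", model_dir.as_posix(),
        "-o", output_dir.as_posix(),
        "-s", structure,
        "-l", label_eem,
        "-L", label_mtc,
    ]


if __name__ == "__main__":
    print(" ".join(predict_command("data/demo", label_eem="rid")))
```

Validating before launching the tool surfaces a mistyped structure or label type immediately, rather than after the interpreter and model have started loading.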