cosmel.corpus.corpus module

class cosmel.corpus.corpus.Corpus(article_root, *args, parsed_root=None, mention_root=None, parts=[''], skips=[], skip_file='')

Bases: object

The corpus class.

Parameters:
  • article_root (str) – the path to the folder containing word-segmented article files.
  • parsed_root (str) – the path to the folder containing parsed article files.
  • mention_root (str) – the path to the folder containing mention files.
  • parts (list) – the list of article/mention parts.
  • skips (list) – the list of articles to be ignored.
  • skip_file (str) – the path to a file listing the articles to be ignored.

Notes

  • Loads all articles from article_root/part for each part in parts.
  • Loads all parsed articles from parsed_root/part for each part in parts.
  • Loads all mentions from mention_root/part for each part in parts.
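
The loading rule in the notes above can be pictured as joining each root folder with each entry of parts. The following is a minimal sketch of that path construction only, not the library's implementation; the helper name part_dirs and the example paths are hypothetical.

```python
import os


def part_dirs(root, parts=('',)):
    """Return one directory per part under root (hypothetical helper)."""
    return [os.path.join(root, part) for part in parts]


# Example paths are made up for illustration.
article_dirs = part_dirs('data/article', parts=['part-00000', 'part-00001'])

# With a single empty part, as in the default parts=[''],
# the join resolves to the root folder itself.
default_dirs = part_dirs('data/article')
```

Under this reading, the default parts=[''] means the files are looked up directly in each root folder.
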

reload_mention(mention_root)

article_set

the article set.

Type: ArticleSet
id_to_article

the dictionary that maps article IDs to article objects.

Type: Id2Article
parsed_article_set

the parsed article set.

Type: ParsedArticleSet
id_to_parsed_article

the dictionary that maps article IDs to parsed article objects.

Type: Id2ParsedArticle
mention_set

the mention set.

Type: MentionSet
mention_bundle_set

the mention bundle set.

Type: MentionBundleSet
id_to_mention

the dictionary that maps an article ID, sentence ID, and mention ID to a mention object.

Type: Id2Mention
id_to_mention_bundle

the dictionary that maps article IDs to mention bundles.

Type: Id2MentionBundle
head_to_mention_list

the dictionary that maps head words to lists of mention objects.

Type: NameHead2MentionList
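
The two mention lookups described above (by ID triple and by head word) can be illustrated with plain Python containers. This is a sketch under assumptions, not the library's code: the Mention tuple, its field names, and the sample values are hypothetical stand-ins, and the real Id2Mention and NameHead2MentionList classes may be indexed differently.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a mention object; field names are assumptions.
Mention = namedtuple('Mention', ['aid', 'sid', 'mid', 'head'])

# Made-up sample data for illustration only.
mentions = [
    Mention('a1', 0, 0, 'lipstick'),
    Mention('a1', 2, 1, 'lotion'),
    Mention('a2', 0, 0, 'lipstick'),
]

# One mention per (article ID, sentence ID, mention ID) triple.
id_to_mention = {(m.aid, m.sid, m.mid): m for m in mentions}

# Many mentions per head word, kept in insertion order.
head_to_mention_list = defaultdict(list)
for m in mentions:
    head_to_mention_list[m.head].append(m)
```

The triple key identifies a single mention, while a head word groups every mention that shares it across articles.
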