cosmel.corpus.corpus module
-
class cosmel.corpus.corpus.Corpus(article_root, *args, parsed_root=None, mention_root=None, parts=[''], skips=[], skip_file='')
Bases: object
The corpus class.
Parameters:
- article_root (str) – the path to the folder containing word-segmented article files.
- parsed_root (str) – the path to the folder containing parsed article files.
- mention_root (str) – the path to the folder containing mention files.
- parts (list) – the list of article/mention parts.
- skips (list) – the list of articles to be ignored.
- skip_file (str) – the file listing the articles to be ignored.
Notes
- Load all articles from article_root/part for all part in parts.
- Load all parsed articles from parsed_root/part for all part in parts.
- Load all mentions from mention_root/part for all part in parts.
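The loading rule in the notes above can be sketched as follows. This is a minimal illustration only, not the library's implementation: part_dirs, the data/article path, and the part names are hypothetical, and the sketch assumes that each root is joined with each entry of parts as a plain path join.

```python
import os

def part_dirs(root, parts=('',)):
    """Return root/part for every part (hypothetical helper, not part of cosmel)."""
    return [os.path.join(root, part) for part in parts]

# With explicit parts, one sub-folder per part would be scanned.
article_dirs = part_dirs('data/article', ['part-00000', 'part-00001'])

# With the default parts=[''], the join resolves to the root folder itself.
default_dirs = part_dirs('data/article')
```

Under this reading, the default parts=[''] means that files are read directly from each root folder.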
-
article_set
the article set.
Type: ArticleSet
-
id_to_article
the dictionary that maps an article ID to its article object.
Type: Id2Article
-
parsed_article_set
the parsed article set.
Type: ParsedArticleSet
-
id_to_parsed_article
the dictionary that maps an article ID to its parsed article object.
Type: Id2ParsedArticle
-
mention_set
the mention set.
Type: MentionSet
-
mention_bundle_set
the mention bundle set.
Type: MentionBundleSet
-
id_to_mention
the dictionary that maps an article ID, sentence ID, and mention ID to a mention object.
Type: Id2Mention
-
id_to_mention_bundle
the dictionary that maps an article ID to its mention bundle.
Type: Id2MentionBundle
-
head_to_mention_list
the dictionary that maps a head word to a list of mention objects.
Type: NameHead2MentionList
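The three mention lookups can be pictured with plain Python dictionaries. This is only a sketch of the shape of the mappings: the Mention dataclass, its field names, and the sample values are hypothetical stand-ins, the tuple key is an assumption, and the real Id2Mention, Id2MentionBundle, and NameHead2MentionList types may expose a different interface.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    """Hypothetical stand-in for a mention; not cosmel's own class."""
    aid: str   # article ID
    sid: int   # sentence ID
    mid: int   # mention ID
    head: str  # head word

mentions = [
    Mention('a1', 0, 3, 'lotion'),
    Mention('a1', 2, 1, 'serum'),
    Mention('a2', 0, 5, 'lotion'),
]

# (article ID, sentence ID, mention ID) -> mention (tuple key is an assumption)
id_to_mention = {(m.aid, m.sid, m.mid): m for m in mentions}

# article ID -> mentions of that article; head word -> mentions with that head
id_to_mention_bundle = defaultdict(list)
head_to_mention_list = defaultdict(list)
for m in mentions:
    id_to_mention_bundle[m.aid].append(m)
    head_to_mention_list[m.head].append(m)
```

In this sketch a bundle is simply the list of mentions that belong to one article, and the head-word index groups mentions across articles.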