cosmel.corpus.corpus module
class cosmel.corpus.corpus.Corpus(article_root, *args, parsed_root=None, mention_root=None, parts=[''], skips=[], skip_file='')
Bases: object
The corpus class.
Parameters:
- article_root (str) – the path to the folder containing word-segmented article files.
- parsed_root (str) – the path to the folder containing parsed article files.
- mention_root (str) – the path to the folder containing mention files.
- parts (list) – the list of article/mention parts (subfolder names under each root).
- skips (list) – the list of articles to be ignored.
- skip_file (str) – the path to a file listing articles to be ignored.
Notes
- Load all articles from article_root/part for all part in parts.
- Load all parsed articles from parsed_root/part for all part in parts.
- Load all mentions from mention_root/part for all part in parts.
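The root/part loading scheme described in the notes can be sketched as follows. This is a minimal illustration, not cosmel's implementation: `iter_part_files` and `demo` are hypothetical helpers, the part and file names are made up, and matching `skips` against file names is an assumption.

```python
import os
import tempfile


def iter_part_files(root, parts=('',), skips=()):
    """Yield (part, path) for every file under root/part, for all part in parts.

    Hypothetical helper, not part of cosmel. Assumes entries in `skips`
    are file names to ignore.
    """
    skip_set = set(skips)
    for part in parts:
        # With part == '' the joined path resolves to root itself.
        part_dir = os.path.join(root, part)
        for name in sorted(os.listdir(part_dir)):
            path = os.path.join(part_dir, name)
            if os.path.isfile(path) and name not in skip_set:
                yield part, path


def demo():
    """Build a throwaway folder tree and list the files that would be loaded."""
    layout = [('part-00000', 'a.txt'), ('part-00000', 'b.txt'), ('part-00001', 'c.txt')]
    with tempfile.TemporaryDirectory() as root:
        for part, name in layout:
            os.makedirs(os.path.join(root, part), exist_ok=True)
            open(os.path.join(root, part, name), 'w').close()
        files = iter_part_files(root, ['part-00000', 'part-00001'], skips=['b.txt'])
        return [(part, os.path.basename(path)) for part, path in files]


if __name__ == '__main__':
    print(demo())
```

Under these assumptions, the same traversal would apply to article_root, parsed_root, and mention_root alike, and a parts list of [''] would read files directly from each root.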
article_set
The article set.
Type: ArticleSet

id_to_article
A dictionary that maps an article ID to its article object.
Type: Id2Article

parsed_article_set
The parsed article set.
Type: ParsedArticleSet

id_to_parsed_article
A dictionary that maps an article ID to its parsed article object.
Type: Id2ParsedArticle

mention_set
The mention set.
Type: MentionSet

mention_bundle_set
The mention bundle set.
Type: MentionBundleSet

id_to_mention
A dictionary that maps an article ID, sentence ID, and mention ID to a mention object.
Type: Id2Mention

id_to_mention_bundle
A dictionary that maps an article ID to its mention bundle.
Type: Id2MentionBundle

head_to_mention_list
A dictionary that maps a head word to a list of mention objects.
Type: NameHead2MentionList
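
The three mention lookups above follow a common indexing pattern, sketched below with plain dictionaries. This is only an illustration under stated assumptions: `Mention` and `build_indexes` are hypothetical stand-ins, the field names `aid`, `sid`, `mid`, and `head` are invented for the example, and cosmel's own Id2Mention, Id2MentionBundle, and NameHead2MentionList types may be implemented differently.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a mention object; the real cosmel class differs.
Mention = namedtuple('Mention', ['aid', 'sid', 'mid', 'head'])


def build_indexes(mentions):
    """Build three lookups over an iterable of mentions.

    Returns (id_to_mention, id_to_bundle, head_to_mentions):
    - id_to_mention maps (article ID, sentence ID, mention ID) to one mention,
    - id_to_bundle maps an article ID to the list of its mentions,
    - head_to_mentions maps a head word to the list of mentions sharing it.
    """
    id_to_mention = {}
    id_to_bundle = defaultdict(list)
    head_to_mentions = defaultdict(list)
    for m in mentions:
        id_to_mention[(m.aid, m.sid, m.mid)] = m
        id_to_bundle[m.aid].append(m)
        head_to_mentions[m.head].append(m)
    return id_to_mention, id_to_bundle, head_to_mentions


mentions = [
    Mention('article1', 0, 3, 'lipstick'),
    Mention('article1', 2, 1, 'toner'),
    Mention('article2', 0, 5, 'lipstick'),
]
id_to_mention, id_to_bundle, head_to_mentions = build_indexes(mentions)
```

Here a single mention is addressed by the full (article, sentence, mention) triple, while the article ID alone or the head word alone returns a group of mentions.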