cosmel.corpus.article module

class cosmel.corpus.article.Article(file_path, root_path)[source]

Bases: collections.abc.Sequence

The article object (contains list of sentences).

  • Item: the word-segmented sentence (WsWords)
Parameters:
  • file_path (str) – the path to the article.
  • root_path (str) – the root path of the articles.
static path_to_aid(path, root)[source]

Convert a file path to an article ID (returned as str).
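
As a rough illustration of what such a conversion could look like, the sketch below builds an ID from the path relative to root, with the file extension removed. The name path_to_aid_sketch and the ID format are assumptions made for this example; the real path_to_aid may produce a different format.

```python
import posixpath


def path_to_aid_sketch(path, root):
    """Hypothetical stand-in for Article.path_to_aid (ID format is assumed)."""
    # Path of the article relative to the article root.
    relative = posixpath.relpath(path, root)
    # Drop the file extension so only the folder and file stem remain.
    aid, _ext = posixpath.splitext(relative)
    return aid


example_aid = path_to_aid_sketch(
    "/data/article/part-00000/author_1.txt", "/data/article"
)
```

Here example_aid keeps both the folder and the file stem, which matches the description of aid as containing the folder and author name.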

aid

the article ID (containing folder and author name).

Type:str
path

the related file path.

Type:str
parsed

the parsed form of this article.

Type:ParsedArticle
bundle

the mention bundle of this article.

Type:MentionBundle
save(file_path, method)[source]

Save the article to file.

Parameters:
  • file_path – the path to the output file.
  • method – one of str(), txtstr(), roledstr(), roledtxtstr().
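
The following sketch shows one way a save(file_path, method) call of this shape could behave. It assumes that method is a callable which renders the whole article as a string. ArticleSketch, its sentence format, and the output layout are hypothetical stand-ins, not the cosmel implementation.

```python
import os
import tempfile


class ArticleSketch:
    """Hypothetical stand-in for Article; sentences are lists of words."""

    def __init__(self, sentences):
        self.sentences = sentences

    def __str__(self):
        # Assumed format: one sentence per line, words separated by spaces.
        return "\n".join(" ".join(words) for words in self.sentences)

    def txtstr(self):
        # Assumed format: one sentence per line, words joined without spaces.
        return "\n".join("".join(words) for words in self.sentences)

    def save(self, file_path, method):
        # `method` is assumed to be a callable taking the article.
        os.makedirs(os.path.dirname(file_path), exist_ok=True)
        with open(file_path, "w", encoding="utf-8") as fout:
            fout.write(method(self) + "\n")


article = ArticleSketch([["hello", "world"], ["good", "day"]])
out_dir = tempfile.mkdtemp()

# Save once with str and once with the txtstr renderer.
str_path = os.path.join(out_dir, "str", "a.txt")
txt_path = os.path.join(out_dir, "txt", "a.txt")
article.save(str_path, str)
article.save(txt_path, ArticleSketch.txtstr)

with open(str_path, encoding="utf-8") as fin:
    saved_str = fin.read()
with open(txt_path, encoding="utf-8") as fin:
    saved_txt = fin.read()
```

Passing the renderer as an argument keeps the file handling in one place while the caller chooses the output format.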
class cosmel.corpus.article.ArticleSet(article_root, parts=[''], skips=[])[source]

Bases: collections.abc.Collection

The set of articles.

  • Item: the article object (Article)
Parameters:
  • article_root (str) – the path to the folder containing data files.
  • parts (list) – the list of article/mention parts.
  • skips (list) – the list of articles to be ignored.

Notes

  • Load all articles from article_root/part for each part in parts.
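
The loading rule in the note above can be sketched as a directory walk over article_root/part for each part. The helper list_article_files is hypothetical, and matching skips against full file paths is an assumption; the real ArticleSet may identify skipped articles differently.

```python
import os
import tempfile


def list_article_files(article_root, parts=("",), skips=()):
    """Hypothetical helper: collect files under article_root/part."""
    skipped = set(skips)  # assumed to hold file paths to ignore
    found = []
    for part in parts:
        # An empty part ('') resolves to article_root itself.
        part_dir = os.path.join(article_root, part)
        for dirpath, _dirnames, filenames in os.walk(part_dir):
            for name in filenames:
                file_path = os.path.join(dirpath, name)
                if file_path not in skipped:
                    found.append(file_path)
    return sorted(found)


# Build a small temporary tree with two parts.
root = tempfile.mkdtemp()
for part, name in [("part-a", "x.txt"), ("part-b", "y.txt")]:
    os.makedirs(os.path.join(root, part))
    with open(os.path.join(root, part, name), "w", encoding="utf-8") as fout:
        fout.write("text\n")

only_a = [os.path.basename(p) for p in list_article_files(root, parts=["part-a"])]
everything = [os.path.basename(p) for p in list_article_files(root)]
```

With the default parts=[''], the walk starts at article_root and therefore picks up every part.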
path

the root path of the articles.

Type:str
class cosmel.corpus.article.Id2Article(article_set)[source]

Bases: collections.abc.Mapping

A dictionary that maps each article ID to its article object.

  • Key: the article ID (str).
  • Item: the article object (Article).
Parameters:article_set (ArticleSet) – the article set.
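
A minimal sketch of a read-only ID lookup built on collections.abc.Mapping is shown below. Id2ItemSketch and FakeArticle are hypothetical stand-ins; the only assumption carried over from the documentation is that each article exposes an aid attribute.

```python
from collections import namedtuple
from collections.abc import Mapping

# Hypothetical stand-in for Article with only the fields needed here.
FakeArticle = namedtuple("FakeArticle", ["aid", "text"])


class Id2ItemSketch(Mapping):
    """Hypothetical stand-in for Id2Article: read-only lookup by ID."""

    def __init__(self, items):
        # Index every item by its `aid` attribute.
        self._data = {item.aid: item for item in items}

    def __getitem__(self, aid):
        return self._data[aid]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


articles = [FakeArticle("part-a/x", "first"), FakeArticle("part-b/y", "second")]
id2article = Id2ItemSketch(articles)
first = id2article["part-a/x"]
```

Implementing only __getitem__, __iter__, and __len__ is enough; Mapping supplies keys(), items(), get(), and membership tests on top of them.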