Literature Databases

Scopus

litstudy.refine_scopus(docs: DocumentSet, *, search_title=True) → Tuple[DocumentSet, DocumentSet]

Attempt to fetch Scopus metadata for each document in the given set. Returns a tuple containing two sets: the documents available on Scopus and the remaining documents not found on Scopus.

Documents are retrieved based on their identifier (DOI, PubMed ID, or Scopus ID). Documents without a unique identifier are retrieved by performing a fuzzy search on their title. This is not ideal and can lead to false positives (i.e., a different document with the same title is found), so it can be disabled if necessary.

Parameters:: search_title -- Flag to toggle searching by title.
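Because title-based retrieval can return a different document, it may be worth comparing the original title with the retrieved one afterwards. The following is a minimal sketch using only the standard library. The helper names normalize_title and titles_match and the 0.9 threshold are assumptions made for illustration and are not part of litstudy.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase the title and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def titles_match(original, retrieved, threshold=0.9):
    """Return True when two titles are near-identical after normalization.

    The threshold is an arbitrary illustrative value, not a litstudy setting.
    """
    a, b = normalize_title(original), normalize_title(retrieved)
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

A check like this only flags loosely matched titles. It cannot tell apart two distinct documents that share an identical title, in which case search_title=False is the safer choice.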

litstudy.search_scopus(query: str, *, limit: int | None = None) → DocumentSet

Submit the given query to the Scopus API.

Parameters:: limit -- Restrict results to the first limit documents.

litstudy.load_scopus_csv(path: str) → DocumentSet: Import CSV file exported from Scopus.

SemanticScholar

litstudy.fetch_semanticscholar(key: set, *, session=None) → Document | None

Fetch SemanticScholar metadata for the given key. The key can be one of the following (see API reference):

DOI
S2 paper ID
ArXiv ID (example format: arXiv:1705.10311)
MAG ID (example format: MAG:112218234)
ACL ID (example format: ACL:W12-3903)
PubMed ID (example format: PMID:19872477)
Corpus ID (example format: CorpusID:37220927)

Parameters:: session -- The requests.Session to use for HTTP requests.
Returns:: The Document if it was found and None otherwise.
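The key formats listed above differ only in their prefix, so a small helper can build them consistently. This is a sketch under stated assumptions: format_s2_key and the S2_PREFIXES table are hypothetical names introduced here, and the prefixes are copied from the example formats in the list.

```python
# Prefixes copied from the example formats listed above. DOIs and S2 paper IDs
# are passed without a prefix. This table is illustrative, not part of litstudy.
S2_PREFIXES = {
    "arxiv": "arXiv:",
    "mag": "MAG:",
    "acl": "ACL:",
    "pubmed": "PMID:",
    "corpus": "CorpusID:",
    "doi": "",
    "s2": "",
}


def format_s2_key(kind, value):
    """Build a key string for the given identifier kind (hypothetical helper)."""
    try:
        prefix = S2_PREFIXES[kind.lower()]
    except KeyError:
        raise ValueError(f"unknown identifier kind: {kind!r}") from None
    value = str(value).strip()
    # Avoid doubling a prefix that the caller already supplied.
    if prefix and value.lower().startswith(prefix.lower()):
        value = value[len(prefix):]
    return prefix + value
```

The result would then be passed along the lines of litstudy.fetch_semanticscholar(format_s2_key("arxiv", "1705.10311")), checking for a None return when the document is not found.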

litstudy.refine_semanticscholar(docs: DocumentSet, *, session=None) → Tuple[DocumentSet, DocumentSet]

Attempt to fetch SemanticScholar metadata for each document in the given set based on their DOIs. Returns a tuple containing two sets: the documents available on SemanticScholar and the remaining documents that were not found or do not have a DOI.

Parameters:: session -- The requests.Session to use for HTTP requests.
Returns:: The documents available on SemanticScholar and the remaining documents.

litstudy.search_semanticscholar(query: str, *, limit: int | None = None, batch_size: int = 100, session=None) → DocumentSet

Submit the given query to the SemanticScholar API and return the results as a DocumentSet. The query is a string containing keywords (see API reference).

Parameters:

query -- The search query to submit.
limit -- The maximum number of results to return.
batch_size -- The number of results to retrieve per request. Must be at most 100.
session -- The requests.Session to use for HTTP requests.

CrossRef

litstudy.fetch_crossref(doi: str, *, timeout=0.5, session=None) → Document | None

Fetch the metadata for the given DOI from CrossRef.

Parameters:

timeout -- The timeout between each HTTP request in seconds.
session -- The requests.Session to use for HTTP requests.

Returns:

The Document or None if the DOI was not available.
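Since the fetch functions return None for unavailable identifiers, callers looking up several identifiers need to separate hits from misses. A minimal sketch of that pattern follows. fetch_many is a hypothetical helper, and the fetch argument stands for any function with the "Document or None" contract described above.

```python
def fetch_many(keys, fetch):
    """Call fetch(key) for each key.

    Returns a pair: a dict mapping key -> document for the keys that were
    found, and a list of the keys for which fetch returned None.
    """
    found, missing = {}, []
    for key in keys:
        doc = fetch(key)
        if doc is None:
            missing.append(key)
        else:
            found[key] = doc
    return found, missing
```

With litstudy installed, a call might look like found, missing = fetch_many(dois, litstudy.fetch_crossref), where dois is a list of DOI strings.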

litstudy.refine_crossref(docs: DocumentSet, *, timeout=0.5, session=None) → Tuple[DocumentSet, DocumentSet]

Attempts to fetch metadata from CrossRef for each document in the given set. Returns a tuple of two sets: the documents retrieved from CrossRef and the remaining documents (i.e., without DOI or not found).

Parameters:

timeout -- Timeout in seconds between each request to throttle server communication.
session -- The requests.Session to use for HTTP requests.
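Each refine function returns a (found, not found) pair, so the functions can be applied in stages, with each one receiving only the documents the previous one could not resolve. The sketch below shows that pattern. refine_in_stages is a hypothetical helper, not a litstudy function, and it assumes only the two-element return value described above.

```python
def refine_in_stages(docs, refiners):
    """Apply each refiner to the documents left over by the previous one.

    Every refiner must return a (found, not_found) pair. Returns the list of
    found sets (one per refiner, in order) and the final remainder.
    """
    found_parts = []
    remaining = docs
    for refine in refiners:
        found, remaining = refine(remaining)
        found_parts.append(found)
    return found_parts, remaining
```

For example, refine_in_stages(docs, [litstudy.refine_scopus, litstudy.refine_crossref]) would try Scopus first and fall back to CrossRef for the rest.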

litstudy.search_crossref(query: str, *, limit: int | None = None, timeout: float = 0.5, options: dict = {}, session=None) → DocumentSet

Submit the query to the CrossRef API.

Parameters:

query -- The search query.
limit -- Maximum number of results to retrieve. None is unlimited.
timeout -- Timeout in seconds between each request to throttle server communication.
options -- Additional parameters that are passed to the /works endpoint of CrossRef (see the CrossRef API documentation at https://api.crossref.org). Supported options are sort and filter.
session -- The requests.Session to use for HTTP requests.
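The options dictionary can be assembled with a small helper. This sketch assumes that litstudy forwards the dictionary unchanged as query parameters and that filters use CrossRef's comma-separated name:value syntax. build_crossref_options is a hypothetical name, and the filter names in the usage note are examples taken from the CrossRef REST API.

```python
def build_crossref_options(sort=None, filters=None):
    """Return an options dict with optional 'sort' and 'filter' entries.

    filters is a mapping of CrossRef filter names to values; it is joined
    into a single comma-separated string of name:value pairs.
    """
    options = {}
    if sort is not None:
        options["sort"] = sort
    if filters:
        options["filter"] = ",".join(f"{k}:{v}" for k, v in filters.items())
    return options
```

A call such as build_crossref_options(sort="published", filters={"from-pub-date": "2020-01-01", "type": "journal-article"}) produces a dictionary that could be passed as options to search_crossref.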

CSV

Load an arbitrary CSV file and parse its contents as a DocumentSet on a best-effort basis.

An attempt is made to guess the purpose of the fields of the CSV file based on their names. For example, the date of publication is likely given by a field named something like "Publication Date", "Year of Publication", or "Published Year". If the purpose of a field cannot be determined from its name, it can be set explicitly by passing additional parameters. For example, date_field explicitly sets the name of the date field.

The CSV is parsed using the given dialect. If no dialect is given, an attempt is made to guess the dialect based on the file's content.
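The two guessing steps described above can be pictured with the standard library alone. This is a sketch, not litstudy's implementation: the FIELD_HINTS table, guess_fields, and read_csv_guessing are hypothetical, and the real matching rules may differ. Dialect detection here uses csv.Sniffer, restricted to a few common delimiters.

```python
import csv
import io

# Hypothetical keyword hints; litstudy's own matching rules may differ.
FIELD_HINTS = {
    "title": ("title",),
    "date": ("date", "year"),
    "authors": ("author",),
}


def guess_fields(header):
    """Map each purpose to the first column whose name contains a hint."""
    guessed = {}
    for purpose, hints in FIELD_HINTS.items():
        for name in header:
            if any(hint in name.lower() for hint in hints):
                guessed[purpose] = name
                break
    return guessed


def read_csv_guessing(text):
    """Sniff the dialect from the text, then return (guessed fields, rows)."""
    # Only consider comma, semicolon, and tab as candidate delimiters.
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t")
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    return guess_fields(reader.fieldnames), list(reader)
```

When a heuristic like this picks the wrong column, or none at all, passing the field name explicitly is the reliable fallback.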

Parameters:

path -- Name of CSV file.
dialect -- Used to read the CSV file.
title_field -- Field name for title.
authors_field -- Field name for authors.
abstract_field -- Field name for abstract.
citation_field -- Field name for citation_count.
date_field -- Field name for publication_date or
source_field -- Field name for source.
doi_field -- Field name for doi.
filter -- Optional function applied to each loaded record. This function can be used to, for example, add or delete fields.

Example:

docs = litstudy.load_csv("my_data.csv",
                         title_field="Document Title",
                         date_field="Pub Date")

IEEE Xplore

litstudy.load_ieee_csv(path: str) → DocumentSet: Import CSV file exported from IEEE Xplore.

Springer Link

litstudy.load_springer_csv(path: str) → DocumentSet: Load CSV file exported from Springer Link.

bibtex

litstudy.load_bibtex(path: str) → DocumentSet: Load the bibtex file at the given path as a DocumentSet.

RIS

litstudy.load_ris_file(path: str) → DocumentSet: Load the RIS file at the given path as a DocumentSet.

dblp

litstudy.search_dblp(query: str, *, limit=None) → DocumentSet

Perform the given query on the DBLP API and return the results as a DocumentSet.

Parameters:: limit -- The maximum number of documents to retrieve.

arXiv

litstudy.search_arxiv(query, start=0, max_results=2000, batch_size=100, sort_order='descending', sort_by='submittedDate', sleep_time=3) → DocumentSet

Search arXiv.

Each returned document contains the following attributes: title, authors, doi, journal_ref, publication_date, abstract, language, and category.

Parameters:

query -- The query as described in the arXiv API user manual.
max_results -- The maximum number of results to return.
start -- Skip the first start documents from the results.
batch_size -- The number of documents to fetch per request.
sleep_time -- The time to wait in seconds between HTTP requests.
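The interplay of start, max_results, and batch_size can be illustrated by listing the request windows that such a paginated search would issue. This sketch assumes that max_results is counted from start and that the last batch is shortened to fit. request_windows is a hypothetical helper, and litstudy's actual paging logic may differ.

```python
def request_windows(start=0, max_results=2000, batch_size=100):
    """Yield (offset, count) pairs covering max_results documents from start."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    offset = start
    end = start + max_results
    while offset < end:
        # The final window may be smaller than batch_size.
        count = min(batch_size, end - offset)
        yield offset, count
        offset += count
```

With the default sleep_time of 3 seconds between requests, the number of windows gives a rough lower bound on how long a search takes: the default 2000 results in batches of 100 means 20 requests.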