Literature Databases
Scopus
- litstudy.refine_scopus(docs: DocumentSet, *, search_title=True) Tuple[DocumentSet, DocumentSet]
Attempt to fetch Scopus metadata for each document in the given set. Returns a tuple containing two sets: the documents available on Scopus and the remaining documents not found on Scopus.
Documents are retrieved based on their identifier (DOI, PubMed ID, or Scopus ID). Documents without a unique identifier are retrieved by performing a fuzzy search based on their title. This is not ideal and can lead to false positives (i.e., another document with the same title is matched), so it can be disabled if necessary.
- Parameters:
search_title -- Flag to toggle searching by title.
- litstudy.search_scopus(query: str, *, limit: int | None = None) DocumentSet
Submit the given query to the Scopus API.
- Parameters:
limit -- Restrict the results to the first limit documents.
- litstudy.load_scopus_csv(path: str) DocumentSet
Import a CSV file exported from Scopus.
SemanticScholar
- litstudy.fetch_semanticscholar(key: str, *, session=None) Document | None
Fetch SemanticScholar metadata for the given key. The key can be one of the following (see API reference):
DOI
S2 paper ID
ArXiv ID (example format: arXiv:1705.10311)
MAG ID (example format: MAG:112218234)
ACL ID (example format: ACL:W12-3903)
PubMed ID (example format: PMID:19872477)
Corpus ID (example format: CorpusID:37220927)
- Parameters:
session -- The requests.Session to use for HTTP requests.
- Returns:
The Document if it was found and None otherwise.
- litstudy.refine_semanticscholar(docs: DocumentSet, *, session=None) Tuple[DocumentSet, DocumentSet]
Attempt to fetch SemanticScholar metadata for each document in the given set based on their DOIs. Returns a tuple containing two sets: the documents available on SemanticScholar and the remaining documents that were not found or do not have a DOI.
- Parameters:
session -- The requests.Session to use for HTTP requests.
- Returns:
The documents available on SemanticScholar and the remaining documents.
- litstudy.search_semanticscholar(query: str, *, limit: int | None = None, batch_size: int = 100, session=None) DocumentSet
Submit the given query to the SemanticScholar API and return the results as a DocumentSet. The query is a string containing keywords (see API reference).
- Parameters:
query -- The search query to submit.
limit -- The maximum number of results to return.
batch_size -- The number of results to retrieve per request. Must be at most 100.
session -- The requests.Session to use for HTTP requests.
CrossRef
- litstudy.fetch_crossref(doi: str, *, timeout=0.5, session=None) Document | None
Fetch the metadata for the given DOI from CrossRef.
- Parameters:
timeout -- The timeout between each HTTP request in seconds.
session -- The requests.Session to use for HTTP requests.
- Returns:
The Document or None if the DOI was not available.
- litstudy.refine_crossref(docs: DocumentSet, *, timeout=0.5, session=None) Tuple[DocumentSet, DocumentSet]
Attempts to fetch metadata from CrossRef for each document in the given set. Returns a tuple of two sets: the documents retrieved from CrossRef and the remaining documents (i.e., without DOI or not found).
- Parameters:
timeout -- Timeout in seconds between each request to throttle server communication.
session -- The requests.Session to use for HTTP requests.
- litstudy.search_crossref(query: str, *, limit: int | None = None, timeout: float = 0.5, options: dict = {}, session=None) DocumentSet
Submit the query to the CrossRef API.
- Parameters:
query -- The search query.
limit -- Maximum number of results to retrieve. None is unlimited.
timeout -- Timeout in seconds between each request to throttle server communication.
options -- Additional parameters that are passed to the /works endpoint of CrossRef (see the CrossRef API at https://api.crossref.org). Supported options are sort and filter.
session -- The requests.Session to use for HTTP requests.
CSV
- litstudy.load_csv(path: str, dialect: Dialect | None = None, title_field: str | None = None, authors_field: str | None = None, abstract_field: str | None = None, citation_field: str | None = None, date_field: str | None = None, source_field: str | None = None, doi_field: str | None = None, filter=None) DocumentSet
Load an arbitrary CSV file and parse its contents as a DocumentSet on a best-effort basis.
An attempt is made to guess the purpose of the fields of the CSV file based on their names. For example, the date of publication is likely given by a field named something like "Publication Date", "Year of Publication", or "Published Year". In case a field's purpose cannot be determined from its name, it is possible to set it explicitly by passing additional parameters. For example, date_field explicitly sets the name of the date field.
The CSV is parsed using the given dialect. If no dialect is given, an attempt is made to guess the dialect based on the file's content.
- Parameters:
path -- Name of the CSV file.
dialect -- Dialect used to read the CSV file.
title_field -- Field name for title.
authors_field -- Field name for authors.
abstract_field -- Field name for abstract.
citation_field -- Field name for citation_count.
date_field -- Field name for publication_date.
source_field -- Field name for source.
doi_field -- Field name for doi.
filter -- Optional function applied to each loaded record. This function can be used to, for example, add or delete fields.
Example:
docs = litstudy.load_csv("my_data.csv", title_field="Document Title", date_field="Pub Date")
IEEE Xplore
- litstudy.load_ieee_csv(path: str) DocumentSet
Import a CSV file exported from IEEE Xplore.
Springer Link
- litstudy.load_springer_csv(path: str) DocumentSet
Load a CSV file exported from Springer Link.
BibTeX
- litstudy.load_bibtex(path: str) DocumentSet
Load the BibTeX file at the given path as a DocumentSet.
RIS
- litstudy.load_ris_file(path: str) DocumentSet
Load the RIS file at the given path as a DocumentSet.
DBLP
- litstudy.search_dblp(query: str, *, limit=None) DocumentSet
Perform the given query on the DBLP API and return the results as a DocumentSet.
- Parameters:
limit -- The maximum number of documents to retrieve.
arXiv
- litstudy.search_arxiv(query, start=0, max_results=2000, batch_size=100, sort_order='descending', sort_by='submittedDate', sleep_time=3) DocumentSet
Search arXiv.
Each returned document contains the following attributes: title, authors, doi, journal_ref, publication_date, abstract, language, and category.
- Parameters:
query -- The query as described in the arXiv API use manual.
max_results -- The maximum number of results to return.
start -- Skip the first start documents from the results.
batch_size -- The number of documents to fetch per request.
sleep_time -- The time to wait in seconds between each HTTP request.