Scopus API in Python

by Vincent F. Scalfani

These recipe examples use the Elsevier Scopus API and pybliometrics, a Python wrapper package for the Scopus API. Code was tested and sample data downloaded from the Scopus API on February 16, 2022. This tutorial content is intended to help facilitate academic research. Before continuing or reusing any of this code, please be aware of Elsevier’s API policies and appropriate use cases. You will also need to register for an API key in order to use the Scopus API.

1. Initial Pybliometrics Setup

The first time you run import pybliometrics, it prompts you for your Elsevier Scopus API key, which is then saved to a local config file. See the pybliometrics documentation for details.

import pybliometrics
# import other libraries needed
from pybliometrics.scopus import ScopusSearch
import time
import numpy as np
import pandas as pd

2. Get Author Data

Number of Records for Author

# Scopus Author ID field (AU-ID): 55764087400, Vincent Scalfani
# download=False queries only the number of matching records
q1 = ScopusSearch('AU-ID(55764087400)', download=False)
q1.get_results_size()
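
The search string passed to ScopusSearch is plain text, so it can be assembled and checked before any API call is made. As a minimal sketch, the hypothetical helper below (au_id_query is not part of pybliometrics) builds an AU-ID query and rejects IDs that are not all digits, on the assumption that Scopus author IDs are purely numeric, as they are in these examples.

```python
def au_id_query(author_id):
    """Build a Scopus search string for one author ID (hypothetical helper)."""
    author_id = str(author_id).strip()
    # Assumption: Scopus author IDs consist of digits only
    if not author_id.isdigit():
        raise ValueError(f"expected a numeric Scopus author ID, got {author_id!r}")
    return f"AU-ID({author_id})"


print(au_id_query("55764087400"))
```

The returned string can then be passed to ScopusSearch in place of a hand-typed query.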

Download Record Data

q1 = ScopusSearch('AU-ID(55764087400)')

# save to dataframe
df1 = pd.DataFrame(q1.results)
# view column names
df1.columns
Index(['eid', 'doi', 'pii', 'pubmed_id', 'title', 'subtype',
       'subtypeDescription', 'creator', 'afid', 'affilname',
       'affiliation_city', 'affiliation_country', 'author_count',
       'author_names', 'author_ids', 'author_afids', 'coverDate',
       'coverDisplayDate', 'publicationName', 'issn', 'source_id', 'eIssn',
       'aggregationType', 'volume', 'issueIdentifier', 'article_number',
       'pageRange', 'description', 'authkeywords', 'citedby_count',
       'openaccess', 'freetoread', 'freetoreadLabel', 'fund_acr', 'fund_no',
# number of rows
len(df1)
# view first 5 rows
# df1.head(5)
# We can index data from our new dataframe, df1.
# For example, create a list of just the DOIs
dois = df1.doi.tolist()
['10.1021/acs.jchemed.1c00904', '10.5860/crln.82.9.428', '10.1021/acs.iecr.8b02573', '10.1021/acs.jchemed.6b00602', '10.5062/F4TD9VBX', '10.1021/acs.macromol.6b02005', '10.1186/s13321-016-0181-z', '10.1021/acs.chemmater.5b04431', '10.1021/acs.jchemed.5b00512', '10.1021/acs.jchemed.5b00375', '10.5860/crln.76.9.9384', '10.5860/crln.76.2.9259', '10.1126/science.346.6214.1258', '10.1021/ed400887t', '10.1016/j.acalib.2014.03.015', '10.5062/F4XS5SB9', '10.1021/ma300328u', '10.1021/mz200108a', '10.1021/ma201170y', '10.1021/ma200184u', '10.1021/cm102374t']
# Get a list of article titles
titles = df1.title.tolist()
['Using NCBI Entrez Direct (EDirect) for Small Molecule Chemical Information Searching in a Unix Terminal',
 'Using the linux operating system full-time tips and experiences from a subject liaison librarian',
 'Analysis of the Frequency and Diversity of 1,3-Dialkylimidazolium Ionic Liquids Appearing in the Literature',
 'Rapid Access to Multicolor Three-Dimensional Printed Chemistry and Biochemistry Models Using Visualization and Three-Dimensional Printing Software Programs',
 'Text analysis of chemistry thesis and dissertation titles',
 'Phototunable Thermoplastic Elastomer Hydrogel Networks',
 'Programmatic conversion of crystal structures into 3D printable files using Jmol',
 'Dangling-End Double Networks: Tapping Hidden Toughness in Highly Swollen Thermoplastic Elastomer Hydrogels',
 'Replacing the Traditional Graduate Chemistry Literature Seminar with a Chemical Research Literacy Course',
 '3D Printed Block Copolymer Nanostructures',
 'Hypotheses in librarianship: Applying the scientific method',
 'Recruiting students to campus: Creating tangible and digital products in the academic library',
 'Finally free',
 '3D printed molecules and extended solid models for teaching symmetry and point groups',
 'Repurposing Space in a Science and Engineering Library: Considerations for a Successful Outcome',
 'A model for managing 3D printing services in academic libraries',
 'Morphological phase behavior of poly(RTIL)-containing diblock copolymer melts',
 'Network formation in an orthogonally self-assembling system',
 'Access to nanostructured hydrogel networks through photocured body-centered cubic block copolymer melts',
 'Synthesis and ordered phase separation of imidazolium-based alkyl-ionic diblock copolymers made via ROMP',
 'Thermally stable photocuring chemistry for selective morphological trapping in block copolymer melt systems']
# now a list of the cited by count
cited_by = df1.citedby_count.tolist()
[0, 0, 16, 23, 4, 11, 18, 6, 10, 24, 0, 0, 0, 94, 6, 34, 39, 31, 18, 44, 11]
# get sum of cited_by counts
sum(cited_by)
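
Beyond the total, a few other summary values can be read off the same list with the standard library alone. The sketch below copies the cited-by counts from the output above so that it runs without an API call; the choice of statistics (median and number of uncited records) is only an illustration.

```python
from statistics import median

# cited-by counts copied from the output above
cited_by = [0, 0, 16, 23, 4, 11, 18, 6, 10, 24, 0, 0, 0, 94, 6, 34, 39, 31, 18, 44, 11]

total = sum(cited_by)                          # total citations across all records
uncited = sum(1 for c in cited_by if c == 0)   # records with no citations yet

print(f"records: {len(cited_by)}")
print(f"total citations: {total}")
print(f"median citations: {median(cited_by)}")
print(f"uncited records: {uncited}")
```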

3. Get Author Data in a Loop

Number of Records for Author

# load a list of author names and Scopus AUIDs
import csv
with open('authors.txt') as infile:
    rows = csv.reader(infile, delimiter='\t')
    author_list = list(rows)
[['Emy Decker', '36660678600'], ['Lindsey Lowry', '57210944451'], ['Karen Chapman', '35783926100'], ['Kevin Walker', '56133961300'], ['Sara Whitver', '57194760730']]
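
The loader above expects authors.txt to hold one author per line, with the name and the Scopus author ID separated by a tab. A slightly more defensive version might look like the sketch below, which reads from an in-memory string in place of the file. The read_authors helper and its rule of silently skipping malformed rows are assumptions for illustration, not part of the original recipe.

```python
import csv
import io

# Stand-in for authors.txt: one "name<TAB>author ID" pair per line
sample = "Emy Decker\t36660678600\nLindsey Lowry\t57210944451\n"


def read_authors(handle):
    """Return [name, author_id] pairs, skipping blank or malformed rows."""
    authors = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue  # assumption: ignore rows without exactly two fields
        name, author_id = (field.strip() for field in row)
        if name and author_id:
            authors.append([name, author_id])
    return authors


author_list = read_authors(io.StringIO(sample))
print(author_list)
```

With a real file, the same function can be called as read_authors(infile) inside the with open(...) block.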
# get number of Scopus records for each author
num_records = []
for author,authorID in author_list:
    # query search
    q = ScopusSearch('AU-ID' +'(' + authorID + ')', download=False)
    num = q.get_results_size()
    # compile saved scopus data into a list of lists
    num_records.append([author, authorID, num])
    # delay one second between api calls to be nice to Elsevier servers
    time.sleep(1)
[['Emy Decker', '36660678600', 14],
 ['Lindsey Lowry', '57210944451', 4],
 ['Karen Chapman', '35783926100', 29],
 ['Kevin Walker', '56133961300', 8],
 ['Sara Whitver', '57194760730', 4]]
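
The pattern in this loop (one request per author, with a pause between requests) can be separated from the API call itself. The sketch below is one way to do that: collect_counts is a hypothetical helper, and a dictionary lookup stands in for the Scopus request so that the example runs offline. The two counts are taken from the output above.

```python
import time


def collect_counts(author_list, count_fn, delay=1.0):
    """Call count_fn(author_id) for each author, pausing between calls."""
    records = []
    for i, (author, author_id) in enumerate(author_list):
        if i > 0:
            time.sleep(delay)  # pause between requests, not before the first one
        records.append([author, author_id, count_fn(author_id)])
    return records


# Stand-in for the API call; counts copied from the output above
fake_counts = {"36660678600": 14, "57210944451": 4}
authors = [["Emy Decker", "36660678600"], ["Lindsey Lowry", "57210944451"]]

print(collect_counts(authors, fake_counts.get, delay=0))
```

Against the live API, count_fn could be a small function that runs ScopusSearch with download=False and returns get_results_size(), as in the loop above.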

Download Record Data

# Let's say we want the DOIs and cited by counts in a list
cites = []
for author,authorID in author_list:
    # query search
    q = ScopusSearch('AU-ID' +'(' + authorID + ')')
    # create a dataframe
    q_df = pd.DataFrame(q.results)
    # save DOIs to a list
    doi = q_df.doi.tolist()
    # save citedby_count to a list
    citedby_count = q_df.citedby_count.tolist()
    # compile saved scopus data into a list of lists
    cites.append([author, doi, citedby_count])
    # delay one second between api calls to be nice to Elsevier servers
    time.sleep(1)
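# The cites variable is a list of lists with the data
# view data for first two authors
cites[0:2]
[['Emy Decker',
  ['10.1108/RSR-08-2021-0051',
   '10.1080/1072303X.2021.1929642',
   '10.1080/15367967.2021.1900740',
   '10.1080/15367967.2020.1826951',
   '10.1080/10691316.2020.1781725',
   '10.1145/3347709.3347805',
   '10.4018/978-1-5225-5631-2.ch09',
   '10.1016/B978-0-08-102409-6.00007-9',
   '10.1108/LM-10-2016-0078',
   '10.1016/B978-0-08-100775-4.00010-8',
   '10.1108/S0732-067120160000036013',
   '10.4018/978-1-4666-8624-3',
   '10.1108/S0065-2830(2013)0000037006',
   '10.1108/07378831011096268'],
  [0, 0, 7, 0, 0, 0, 3, 0, 6, 1, 2, 0, 0, 10]],
 ['Lindsey Lowry',
  ['10.1080/1941126X.2021.1949153',
   '10.5860/lrts.65n1.4-13',
   '10.1080/00987913.2020.1733173',
   '10.1080/1941126X.2019.1634951'],
  [1, 0, 1, 0]]]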
# We can transform this into a flat list as follows
# credit to Avery Fernandez for help with this clever transformation!
cites_flat = []
for authors in range(len(cites)):
    for doi in range(len(cites[authors][1])):
        cites_flat.append([cites[authors][0], cites[authors][1][doi], cites[authors][2][doi]])
cites_flat[0:18] # show first 2 author sets
[['Emy Decker', '10.1108/RSR-08-2021-0051', 0],
 ['Emy Decker', '10.1080/1072303X.2021.1929642', 0],
 ['Emy Decker', '10.1080/15367967.2021.1900740', 7],
 ['Emy Decker', '10.1080/15367967.2020.1826951', 0],
 ['Emy Decker', '10.1080/10691316.2020.1781725', 0],
 ['Emy Decker', '10.1145/3347709.3347805', 0],
 ['Emy Decker', '10.4018/978-1-5225-5631-2.ch09', 3],
 ['Emy Decker', '10.1016/B978-0-08-102409-6.00007-9', 0],
 ['Emy Decker', '10.1108/LM-10-2016-0078', 6],
 ['Emy Decker', '10.1016/B978-0-08-100775-4.00010-8', 1],
 ['Emy Decker', '10.1108/S0732-067120160000036013', 2],
 ['Emy Decker', '10.4018/978-1-4666-8624-3', 0],
 ['Emy Decker', '10.1108/S0065-2830(2013)0000037006', 0],
 ['Emy Decker', '10.1108/07378831011096268', 10],
 ['Lindsey Lowry', '10.1080/1941126X.2021.1949153', 1],
 ['Lindsey Lowry', '10.5860/lrts.65n1.4-13', 0],
 ['Lindsey Lowry', '10.1080/00987913.2020.1733173', 1],
 ['Lindsey Lowry', '10.1080/1941126X.2019.1634951', 0]]
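
The index-based loops used for the flattening step can also be written with zip, which pairs each DOI with its cited-by count directly. The sketch below uses a shortened two-author sample in the same [author, [dois], [counts]] shape as cites, with values taken from the output above.

```python
# Shortened sample in the same shape as cites: [author, [dois], [counts]]
cites = [
    ["Emy Decker",
     ["10.1108/RSR-08-2021-0051", "10.1080/1072303X.2021.1929642"],
     [0, 0]],
    ["Lindsey Lowry",
     ["10.1080/1941126X.2021.1949153"],
     [1]],
]

# One [author, doi, count] row per document
cites_flat = [
    [author, doi, count]
    for author, dois, counts in cites
    for doi, count in zip(dois, counts)
]

for row in cites_flat:
    print(row)
```

Both versions produce the same rows; the zip form avoids manual indexing and stops at the shorter list if the two lists ever differ in length.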
# add to dataframe
cites_df = pd.DataFrame(cites_flat)
    0              1                                   2
0   Emy Decker     10.1108/RSR-08-2021-0051            0
1   Emy Decker     10.1080/1072303X.2021.1929642       0
2   Emy Decker     10.1080/15367967.2021.1900740       7
3   Emy Decker     10.1080/15367967.2020.1826951       0
4   Emy Decker     10.1080/10691316.2020.1781725       0
5   Emy Decker     10.1145/3347709.3347805             0
6   Emy Decker     10.4018/978-1-5225-5631-2.ch09      3
7   Emy Decker     10.1016/B978-0-08-102409-6.00007-9  0
8   Emy Decker     10.1108/LM-10-2016-0078             6
9   Emy Decker     10.1016/B978-0-08-100775-4.00010-8  1
10  Emy Decker     10.1108/S0732-067120160000036013    2
11  Emy Decker     10.4018/978-1-4666-8624-3           0
12  Emy Decker     10.1108/S0065-2830(2013)0000037006  0
13  Emy Decker     10.1108/07378831011096268           10
14  Lindsey Lowry  10.1080/1941126X.2021.1949153       1
15  Lindsey Lowry  10.5860/lrts.65n1.4-13              0
16  Lindsey Lowry  10.1080/00987913.2020.1733173       1
17  Lindsey Lowry  10.1080/1941126X.2019.1634951       0

Save Record Data to a file

Here is one way to loop over author queries and save each author's Scopus document data to its own file.

# load a list of author names and Scopus AUIDs
import csv
with open('authors.txt') as infile:
    rows = csv.reader(infile, delimiter='\t')
    author_list = list(rows)
[['Emy Decker', '36660678600'], ['Lindsey Lowry', '57210944451'], ['Karen Chapman', '35783926100'], ['Kevin Walker', '56133961300'], ['Sara Whitver', '57194760730']]
# ****this writes one file for each author dataset*****

for authorName,authorID in author_list:
    # create new empty dataFrame on each loop
    df = pd.DataFrame()
    # query search by Author ID
    q = ScopusSearch('AU-ID' +'(' + authorID + ')')
    # convert to dataframe
    df = pd.DataFrame(q.results)
    # Save to file
    df.to_csv(str(authorName).replace(' ','_') + "_" + str(authorID) + "_ScopusData" + ".tsv", sep = '\t', index=False)
    # delay two seconds between api calls to be nice to Elsevier servers
    time.sleep(2)
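
The output file name in this loop is built by string concatenation. As a small sketch, the same naming scheme can be moved into a helper, which makes it easier to reuse when loading the files back in. The author_filename function is hypothetical and simply mirrors the name_ID_ScopusData.tsv pattern used above.

```python
def author_filename(author_name, author_id, suffix="_ScopusData.tsv"):
    """Build the per-author file name, replacing spaces with underscores."""
    safe_name = str(author_name).strip().replace(" ", "_")
    return f"{safe_name}_{author_id}{suffix}"


print(author_filename("Karen Chapman", "35783926100"))
```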
# load one of the files into pandas
df_author3 = pd.read_csv('Karen_Chapman_35783926100_ScopusData.tsv', delimiter='\t')
# df_author3.head(5) # view first 5
# get info about citedby_count
df_author3.citedby_count.describe()
count    29.000000
mean      5.034483
std       5.703901
min       0.000000
25%       1.000000
50%       3.000000
75%       8.000000
max      21.000000
Name: citedby_count, dtype: float64
# get info about publication titles
df_author3.publicationName.describe()
count                                           29
unique                                          11
top       Behavioral and Social Sciences Librarian
freq                                             8
Name: publicationName, dtype: object
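
For a text column such as publicationName, the summary above reports the number of non-missing values (count), the number of distinct values (unique), the most frequent value (top), and how often it occurs (freq). The sketch below reproduces those four numbers with collections.Counter on a short list of placeholder journal names, which are made up for illustration and are not Scopus data.

```python
from collections import Counter

# Placeholder values for illustration only, not real Scopus data
publication_names = ["Journal A", "Journal B", "Journal A", "Journal C", "Journal A"]

counts = Counter(publication_names)
top, freq = counts.most_common(1)[0]  # most frequent value and its frequency

summary = {
    "count": len(publication_names),  # total values
    "unique": len(counts),            # distinct values
    "top": top,
    "freq": freq,
}
print(summary)
```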