Scopus API in MATLAB

by Anastasia Ramig

These recipe examples use the Elsevier Scopus API. The code was tested with MATLAB R2021b, and the sample data was downloaded from the Scopus API on April 26, 2022 via http://api.elsevier.com and http://www.scopus.com. This tutorial content is intended to help facilitate academic research. Before continuing or reusing any of this code, please be aware of Elsevier’s API policies and appropriate use cases. You will also need to register for an API key in order to use the Scopus API.

Setup

We will start by setting up the API key. Save your key in a text file named apiKey.txt in the current MATLAB folder and import it as follows:

%% import API key from file
myAPIKey = importdata("apiKey.txt");
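Depending on the file contents, `importdata` may return the key wrapped in a cell array. As a minimal alternative sketch, assuming the key sits on a single line of the text file, `fileread` combined with `strtrim` yields a plain string scalar. The temporary file name and the placeholder key are hypothetical and only make the example runnable without a real key.

```matlab
%% write a placeholder key to a temporary file (hypothetical value, for illustration only)
keyFile = fullfile(tempdir, "apiKey_demo.txt");
fid = fopen(keyFile, "w");
fprintf(fid, "%s\n", "my-placeholder-key");
fclose(fid);

%% read the whole file, strip the trailing newline and spaces, and convert to a string scalar
demoKey = string(strtrim(fileread(keyFile)));
delete(keyFile);
disp(demoKey)
```

With a real key file, the same read would be `myAPIKey = string(strtrim(fileread("apiKey.txt")));`.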

1. Get Author Data

Number of Records for Author

%% setup API information and pull data
api_url = "https://api.elsevier.com/content/search/scopus?query=";
author_id = "AU-ID(55764087400)&apiKey=";
q = webread(api_url + author_id + myAPIKey)

Output:

q = struct with fields:
   search_results: [1x1 struct]
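The field is named `search_results` although the JSON key in the response is `search-results`: characters that are not valid in MATLAB identifiers are replaced with underscores when the JSON is decoded. The sketch below illustrates this with `jsondecode` on a small mock response. The JSON text is a hypothetical stand-in, not real Scopus data, and the sketch assumes `webread` names the fields the same way.

```matlab
%% mock JSON shaped like a Scopus search response (illustrative values only)
mockJson = '{"search-results": {"entry": [{"prism:doi": "10.1000/demo", "citedby-count": "3"}]}}';
mockResponse = jsondecode(mockJson);

%% hyphens and colons in the JSON keys become underscores in the field names
disp(fieldnames(mockResponse))
disp(fieldnames(mockResponse.search_results.entry))
demoDoi = mockResponse.search_results.entry(1).prism_doi;
```

This is why the code in this tutorial refers to `prism_doi`, `dc_title`, and `citedby_count`.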
%% preallocate an empty cell array for the dois, one cell per record
doi_list = cell(1, length(q.search_results.entry));

%% create a list of dois from the data
for i = 1:length(q.search_results.entry)
   doi_list{i} = q.search_results.entry{i,1}.prism_doi;
end
doi_list

Output:

doi_list = 1x20 cell
'10.1021/acs.jchemed.1c00904'    '10.5860/crln.82.9.428'    '10.1021/acs.iecr.8b02573'    '10.1021/acs.jchemed.6b00602'    '10.5062/F4TD9VBX'    '10.1021/acs.macromol.6b02005'    '10.1186/s13321-016-0181-z'    '10.1021/acs.chemmater.5b04431'    '10.1021/acs.jchemed.5b00512'    '10.1021/acs.jchemed.5b00375'    '10.5860/crln.76.9.9384'    '10.5860/crln.76.2.9259'    '10.1021/ed400887t'    '10.1016/j.acalib.2014.03.015'    '10.5062/F4XS5SB9'    '10.1021/ma300328u'    '10.1021/mz200108a'    '10.1021/ma201170y'    '10.1021/ma200184u'    '10.1021/cm102374t'
%% preallocate an empty cell array for the titles, one cell per record
titles_list = cell(1, length(q.search_results.entry));

%% create a list of titles from the data
for i = 1:length(q.search_results.entry)
   titles_list{i} = q.search_results.entry{i,1}.dc_title;
end
titles_list

Output:

titles_list = 1x20 cell
'Using NCBI Entrez Direct (EDirect) for Small Molecule Chemical Informati…  'Using the linux operating system full-time tips and experiences from a s  'Analysis of the Frequency and Diversity of 1,3-Dialkylimidazolium Ionic …  'Rapid Access to Multicolor Three-Dimensional Printed Chemistry and Bioch  'Text analysis of chemistry thesis and dissertation titles'  'Phototunable Thermoplastic Elastomer Hydrogel Networks'  'Programmatic conversion of crystal structures into 3D printable files us…  'Dangling-End Double Networks: Tapping Hidden Toughness in Highly Swollen  'Replacing the Traditional Graduate Chemistry Literature Seminar with a C…  '3D Printed Block Copolymer Nanostructures'  'Hypotheses in librarianship: Applying the scientific method'  'Recruiting students to campus: Creating tangible and digital products in  '3D printed molecules and extended solid models for teaching symmetry and…  'Repurposing Space in a Science and Engineering Library: Considerations f  'A model for managing 3D printing services in academic libraries'  'Morphological phase behavior of poly(RTIL)-containing diblock copolymer …  'Network formation in an orthogonally self-assembling system'  'Access to nanostructured hydrogel networks through photocured body-cente  'Synthesis and ordered phase separation of imidazolium-based alkyl-ionic …  'Thermally stable photocuring chemistry for selective morphological trapp
%% preallocate an empty cell array for the citation counts, one cell per record
citedby_count = cell(1, length(q.search_results.entry));

%% create a list of counts of how much each title was cited
for i = 1:length(q.search_results.entry)
   citedby_count{i} = q.search_results.entry{i,1}.citedby_count;
end
citedby_count

Output:

citedby_count = 1x20 cell
'0'          '0'          '17'         '25'         '5'          '11'         '20'         '6'          '10'         '25'         '0'          '0'          '98'         '6'          '34'         '40'         '31'         '18'         '45'         '11'
%% find the total number of cites
citesTotal = str2double(citedby_count);
totalCites = sum(citesTotal)

Output:

totalCites = 402
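The three loops above can also be folded into one pass that collects the DOI, title, and citation count of each record into a table. The following is a minimal sketch on mock entries, not real Scopus records. It assumes each entry has `dc_title` and `citedby_count` fields, treats `prism_doi` as optional, and also handles the case where the decoded entries arrive as a struct array instead of a cell array (which `jsondecode` produces when every entry has the same fields).

```matlab
%% mock entries shaped like q.search_results.entry (illustrative values, not real records)
entries = { ...
   struct('prism_doi', '10.1000/demo.1', 'dc_title', 'First demo title', 'citedby_count', '3'); ...
   struct('dc_title', 'Second demo title', 'citedby_count', '5')};

%% convert a struct array to a cell array so both shapes are indexed the same way
if isstruct(entries)
   entries = num2cell(entries);
end

%% collect the fields of each entry; a record without a DOI is marked as missing
n = numel(entries);
dois = strings(n, 1);
titles = strings(n, 1);
cites = zeros(n, 1);
for i = 1:n
   e = entries{i};
   if isfield(e, 'prism_doi')
      dois(i) = e.prism_doi;
   else
      dois(i) = missing;
   end
   titles(i) = e.dc_title;
   cites(i) = str2double(e.citedby_count);
end

%% combine the columns into a table and total the citation counts
records = table(dois, titles, cites, 'VariableNames', {'DOI', 'Title', 'CitedBy'});
disp(records)
totalCitesDemo = sum(records.CitedBy);
```

With real data, `entries` would be `q.search_results.entry`.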

2. Get Author Data in a Loop

Number of Records for Author

%% import author text data as a cell array
authorList = importdata("authorData.txt")

Output:

authorList = 5x1 cell
'{Emy Decker, 36660678600}'
'{Lindsey Lowry, 57210944451}'
'{Karen Chapman, 35783926100}'
'{Kevin Walker, 56133961300}'
'{Sara Whitver, 57194760730}'
%% create a list of author names and delete the extra bracket from it
authorList2 = cellfun(@(x) strsplit(x, ","), authorList, 'UniformOutput', false);
for i = 1:length(authorList2)
   str = authorList2{i, 1}{1, 1};
   old = "{";
   new = "";
   authorList2{i, 1}{1, 1} = replace(str, old, new);
end

%% extract the author ids (the run of digits in the second part of each line)
author_ids = cell(1, length(authorList2));
pat = digitsPattern;
for i = 1:length(authorList2)
   author_ids{i} = extract(authorList2{i, 1}{1, 2}, pat);
end
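As an alternative to splitting on the comma and then stripping the brace, the name and ID can be captured in one step with a regular expression. This is a minimal sketch, assuming every line follows the `{Name, ID}` layout shown in the output above; the variable names are illustrative.

```matlab
%% sample lines in the same "{Name, ID}" layout as authorData.txt
demoLines = {'{Emy Decker, 36660678600}'; '{Lindsey Lowry, 57210944451}'};

%% capture the name (text before the comma) and the run of digits after it
tokens = regexp(demoLines, '^\{\s*([^,]+?)\s*,\s*(\d+)\s*\}$', 'tokens', 'once');

%% tokens holds one 1x2 cell per line; stack them into an n-by-2 cell array
parsed = vertcat(tokens{:});
demoNames = string(parsed(:, 1));
demoIds = string(parsed(:, 2));
disp([demoNames, demoIds])
```

Both columns come out without the surrounding braces or the leading space.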
%% preallocate a cell array for the number of records, one cell per author
numRecords = cell(1, length(author_ids));

%% count the records returned for each author and add the count to the author list
for i = 1:length(author_ids)
   q1 = webread(api_url + "AU-ID(" + author_ids{1, i} + ")&apiKey=" + myAPIKey);
   numRecords{i} = length(q1.search_results.entry);
   pause(1)
   authorList2{i, 1}{1, 3} = numRecords{i};
end
disp(cell2table(authorList2))

Output:

                     authorList2
________________________________________________

{'Emy Decker'   }    {' 36660678600}'}    {[14]}
{'Lindsey Lowry'}    {' 57210944451}'}    {[ 4]}
{'Karen Chapman'}    {' 35783926100}'}    {[25]}
{'Kevin Walker' }    {' 56133961300}'}    {[ 8]}
{'Sara Whitver' }    {' 57194760730}'}    {[ 4]}
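Note that `length(q1.search_results.entry)` counts the entries in the returned page of results, which can be smaller than an author's full record count when the search has more results than one page holds. The sketch below assumes the response also carries the `opensearch:totalResults` and `opensearch:itemsPerPage` keys, which MATLAB would expose as `opensearch_totalResults` and `opensearch_itemsPerPage`. It runs on a mock response with made-up numbers, not on real Scopus data.

```matlab
%% mock response with made-up counts (illustrative values only)
mockJson = '{"search-results": {"opensearch:totalResults": "31", "opensearch:itemsPerPage": "25", "entry": []}}';
q_demo = jsondecode(mockJson);

%% the counts arrive as text, so convert them to numbers
totalRecords = str2double(q_demo.search_results.opensearch_totalResults);
pageSize = str2double(q_demo.search_results.opensearch_itemsPerPage);

%% report when the page holds fewer entries than the search found
if totalRecords > pageSize
   fprintf("Only %d of %d records are on this page.\n", pageSize, totalRecords);
end
```

If the two numbers differ for a real author, the remaining records would need to be requested in further pages.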

Get Record Data

clear info
%% preallocate cell arrays for the dois and cites, one cell per author
doiList = cell(1, length(author_ids));
citeList = cell(1, length(author_ids));

%% extract the dois and cites for each author
for i = 1:length(author_ids)
   q_records = webread(api_url + "AU-ID(" + author_ids{1, i} + ")&apiKey=" + myAPIKey);
   n = length(q_records.search_results.entry);

   for k = 1:n
      try
         doiList{1, i}{k, 1} = q_records.search_results.entry{k, 1}.prism_doi;
         citeList{1, i}{k, 1} = q_records.search_results.entry{k, 1}.citedby_count;
      catch
         %% skip records that are missing a doi or a citation count
      end
   end
   pause(1)

   %% add the dois and cites to an overall information array
   info{1, 1}{1, i} = doiList{1, i};
   info{2, 1}{1, i} = citeList{1, i};
end

%% create arrays for the dois and cites
dois = {};
cites = {};
for i = 1:width(info{1, 1})
   dois = vertcat(dois, info{1, 1}{1, i});
   cites = vertcat(cites, info{2, 1}{1, i});
end
%% combine the dois and cites into one array
authorArray = horzcat(dois, cites);
nameArray = {};

%% create an array of author names
for i = 1:(length(numRecords))
   nameLength = int16(numRecords{i});
   authorName = cellstr(repmat(authorList2{i, 1}{1, 1}, nameLength, 1));
   nameArray = vertcat(nameArray, authorName);
end

%% add the author names to the informational array
authorArray = horzcat(authorArray, nameArray)

Output:

authorArray = 55x3 cell
     1                                   2      3
1    '10.1108/RSR-08-2021-0051'          '0'    'Emy Decker'
2    '10.1080/1072303X.2021.1929642'     '0'    'Emy Decker'
3    '10.1080/15367967.2021.1900740'     '8'    'Emy Decker'
4    '10.1080/15367967.2020.1826951'     '0'    'Emy Decker'
5    '10.1080/10691316.2020.1781725'     '0'    'Emy Decker'
6    '10.1145/3347709.3347805'           '0'    'Emy Decker'
7    '10.4018/978-1-5225-5631-2.ch09'    '3'    'Emy Decker'
...
...
...
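A cell array like `authorArray` can be converted to a table with named columns, which makes follow-up summaries such as total citations per author straightforward. The sketch below is a minimal illustration on three made-up rows in the same three-column layout; the column names `DOI`, `CitedBy`, and `Author` are assumptions chosen for this example.

```matlab
%% a few rows in the same layout as authorArray (illustrative values only)
demoArray = { ...
   '10.1000/demo.1', '0', 'Author A'; ...
   '10.1000/demo.2', '8', 'Author A'; ...
   '10.1000/demo.3', '3', 'Author B'};

%% name the columns and convert the citation counts from text to numbers
demoTable = cell2table(demoArray, 'VariableNames', {'DOI', 'CitedBy', 'Author'});
demoTable.CitedBy = str2double(demoTable.CitedBy);

%% total the citations for each author
citesByAuthor = groupsummary(demoTable, 'Author', 'sum', 'CitedBy');
disp(citesByAuthor)
```

The summary table has one row per author, with the record count in `GroupCount` and the citation total in `sum_CitedBy`.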

Save Record Data to a File

%% save the search for each author to a mat file
for author = 1:length(author_ids)
   authorName = authorList2{author, 1}{1, 1};
   q2 = webread(api_url + "AU-ID(" + author_ids{1, author} + ")&apiKey=" + myAPIKey);
   pause(1)
   filename = authorName + ".mat";
   %% use the function form of save so the value of filename is used as the file name
   save(filename, "q2");
end
%% save the author arrays to individual text files
for i = 1:(length(numRecords))
   clear individualAuthorData;
   individualDois = info{1, 1}{1, i};
   individualCites = info{2, 1}{1, i};

   nameLength = int16(numRecords{i});
   authorName = cellstr(repmat(authorList2{i, 1}{1, 1}, nameLength, 1));

   individualAuthorData = horzcat(individualDois, individualCites);
   individualAuthorData = horzcat(individualAuthorData, authorName);

   writecell(individualAuthorData, (authorList2{i, 1}{1, 1} + ".txt"), "Delimiter", "\t");
end
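When the file name is stored in a variable, `save` has to be called in its function form. The command form `save filename q2` treats `filename` as literal text and writes a file called filename.mat. The following minimal sketch shows the function form together with `load` for reading the data back. The author name, the struct contents, and the use of the temporary folder are illustrative assumptions.

```matlab
%% build the file name in a variable (illustrative author name and data)
demoName = "Demo Author";
demoFile = fullfile(tempdir, demoName + ".mat");
demoResult = struct('note', 'placeholder search result');

%% function form: first the file name, then the name of the variable to store
save(demoFile, "demoResult");

%% load returns a struct whose fields are the saved variables
loaded = load(demoFile);
disp(loaded.demoResult.note)
delete(demoFile);
```

Assigning the output of `load` to a variable keeps the loaded data separate from the variables already in the workspace.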