Article Level Metrics
The Public Library of Science collects information for each article they publish. These metrics include citations, downloads, web page views, and community responses. The data is available in an Excel XLS file on the PLoS Article Level Metrics site. Now the metric information has also been uploaded uploaded to Google Fusion Tables, a free service for sharing data.
- Summary ALM Data
- Google Fusion Tables API
- Combined Download Statistics
- Additional Materials
- Data Updated
The metric information is divided into 5 tables:
Google Fusion Tables
Google Fusion Tables is a public data storage service that can be accessed through both a web browser and an API. Large datasets up to 100mb can be uploaded, unlike other online spreadsheets which allow only 20 mb. The Google Fusion Tables API provides an SQL-like query language that allows sorting, grouping, and other queries. The query results can be exported in common formats such as CSV for use in other visualization tools including Google Spreadsheets.
Web Based View
Each dataset on Google Fusion Tables can be viewed through the web browser. The first 100 rows and several columns are shown, with each column sortable. The menu system allows additional operations such as filtering, aggregating, and visualizing.
Summary ALM Data
The Summary ALM Data table contains basic information about each article such as the article title, the journal it was published in, and the publication date. The metric information consists of citation counts, number of downloads, and community responses. The citation counts are from CrossRef, PubMed Central, and Scopus. Usage metrics are based on website page accesses and downloads: HTML page views, PDF downloads, and XML downloads. Community reactions are a measurement of coverage in specific blogs such as Postgenomic and Nature Blogs, along with all Trackbacks, and Social Bookmarks from CiteULike and Connotea.
The Summary ALM Data table can be browsed through the web based view.
Aggregate performs operations across all rows of a column. The functions available include Sum, Average, Maximum, Minimum, and Count. The results can also be grouped together by a particular column.
Every table is referenced by an ID number. The ID number can be seen in File -> About.
The ID number for the Summary ALM Data table is
204244. This number will be used in API calls that reference the table.
Google Fusion Tables API
SELECT statement is used to choose the desired columns from a table. Options include
Use an asterisk
* to select all the columns from the Summary ALM Data table which is identified with its ID number.
SELECT * FROM 204244
LIMIT option to restrict the number of rows returned by any query. This makes the query results easier to work with in other programs such as Google Docs, which can only import up to 500kb of data.
For example, to select the first 10 results of the Summary ALM Data table.
SELECT * FROM 204244 LIMIT 10
Results can be exported in CSV format by adding the query on to a base URL.
https://www.google.com/fusiontables/exporttable?query=SELECT * FROM 204244 LIMIT 10 Use URL encoding if needed.
The CSV output can be loaded into other applications such as Google Spreadsheets. The
importData() function imports the results of the query and the spreadsheet will automatically keep it updated.
=importData("https://www.google.com/fusiontables/exporttable?query=SELECT * FROM 204244 LIMIT 10")
The number of columns can be limited, like the number of rows. The needed columns can be selected by their names. Separate multiple names by commas. Surround column names with single parentheses
' if the name contains any special characters such as spaces.
SELECT 'Article Title',DOI,URL,'Publication Year','Citations - CrossRef' FROM 204244 LIMIT 10
Results can be sorted by any column using the
ORDER BY option. The sort column must also be included in the
SELECT statement. For example, to sort by the number of CrossRef citations with highest first:
SELECT 'Article Title',DOI,URL,'Publication Year','Citations - CrossRef' FROM 204244 ORDER BY 'Citations - CrossRef' DESC LIMIT 10
Filter the results to match some value with the
WHERE option. For example, only articles that were published in 2009:
SELECT 'Article Title',DOI,URL,'Publication Year','Citations - CrossRef', FROM 204244 WHERE 'Publication Year' = 2009 ORDER BY 'Citations - CrossRef' DESC LIMIT 10
Quotation marks in a column name can be escaped with backslashes.
To select the
'Research Article' or 'non-Research Article'? column:
SELECT 'Article Title','\'Research Article\' or \'non-Research Article\'?' FROM 204244 LIMIT 10
The columns can also be referenced by number instead of names. The first column is
col0, the next
col1, and so on. To select the first 3 columns:
SELECT col0,col1,col2 FROM 204244 LIMIT 10
Combined Download Statistics
Combined Download Statistics shows the number of times each article was downloaded during a particular month. The download totals are also available separated by file type, in the HTML, PDF, and XML tables.
Sort by the
2010-3 column to show the articles that had the most downloads in March 2010.
SELECT doi,'2010-3' FROM 202272 ORDER BY '2010-3' DESC LIMIT 10
Total Monthly Download
SUM() function to add up the downloads for every article in a given month. To find the total downloads in January, 2010:
SELECT SUM('2010-1') FROM 202272
Combined Download History
The complete history of downloads can be seen by summing all of the available columns.
The history of download volume can be plotted over multiple months. This chart shows the increase in downloads over time, with various peaks such as in January 2007, and troughs such as in April 2009.
Breakdown by Download Type
In addition to the total download counts, there is also a download history for each type of file format: HTML, PDF, and XML. The same query can be used again with these file type breakdown tables to compare the downloads of each file type.
Plotting the different file type histories together in a chart allows a comparison of their usage. This chart shows that HTML downloads have grown the quickest, and while PDF downloads have also increased, XML downloads peaked around 2008 and have since declined.
Top 10 Articles 6 Month History
In addition to the overall download trends, the metric data includes usage history for individual articles. By selecting articles with the highest usage in the most recent month,
SELECT doi,'2010-3' FROM 202272 ORDER BY '2010-3' DESC LIMIT 10
There are several groupings. The most downloaded papers are 10.1371/journal.pbio.1000322 and 10.1371/journal.pone.0009505 which are new this month. Some have decreased since the previous month such as 10.1371/journal.pone.0009042 and 10.1371/journal.pone.0008733 while others such as 10.1371/journal.pcbi.1000361 and 10.1371/journal.pgen.1000862 have increased.
[All Spreadsheets][gsheet] [gsheet]: https://docs.google.com/leaf?id=0B-D5dr31WtX4NDdlY2U3ZTAtMmM3MC00MDZkLTkzZTAtMjVjOTFlZmU0Mjlk&hl=en
[All sample files][samplefiles] [samplefiles]: https://github.com/mchelen/blog.opensci.info/tree/gh-pages/file/2011-06-14-Visualize-PLoS-Article-Level-Metrics-with-Google-Fusion-Tables-and-Spreadsheets
An updated version of this ALM data is now also available on Google Fusion Tables, PLoS Article Level Metrics on Google Fusion Tables Updated to V4.