Visualize PLoS Article Level Metrics with Google Fusion Tables and Spreadsheets
- Introduction
- Summary ALM Data
- Combined Download Statistics
- Google Fusion Tables API
- Combined Download Statistics
- Additional Materials
Introduction
Article Level Metrics
The Public Library of Science collects metrics for each article they publish and have uploaded uploaded it to Google Fusion Tables. Metrics collected include citations, downloads, web page views, and community response. Previously the the data was available only as an Excel XLS file from the PLoS Article Level Metrics site.
There are 5 tables available:
Google Fusion Tables
Google Fusion Tables is a public data store that can be accessed through both a web browser and an API. Sorting, grouping, and other queries can be performed using an SQL-like query language. Query results can be exported and loaded by other data visualization tools such as Google Spreadsheets.
The Google Fusion Tables API uses an SQL-like query language to allow filtering, sorting, and aggregation. Unlike other online spreadsheet programs, large datasets up to 100mb are supported. Both web and API interfaces are available, including export to CSV. Importing into a Google Spreadsheet allows additional operations on the data and customized charting.
Web Based View
Each dataset can be viewed with the browser on Google Fusion Tables. For example, the Summary ALM Data table.

Data Updated
An updated version version of the ALM data is now available on Google Fusion Tables: PLoS Article Level Metrics on Google Fusion Tables Updated to V4.
Summary ALM Data
The Summary ALM Data table contains basic information about each article such as the article title, the journal it was published in, and the publication date. The metric information conists of citation counts, usage, and community metrics. The citation counts are from CrossRef, PubMed Central, and Scopus. Usage metrics are based on website page accesses and downloads: HTML page views, PDF downloads, and XML downloads. Community reactions are an assessment of Blog coverage, Trackbacks, and Social Bookmarks.
The Summary ALM Data table includes basic information and citation counts for each article. The columns are DOI, URL, Publication Date, Publication Year, Journal, Article Title, Citations, and Downloads.
Every table is referenced by an ID number. The ID number can be seen in File -> About.
The ID number for the Summary ALM Data table is 204244. This number will be used in API calls that reference the table.

Combined Download Statistics
In Combined Download Statistics the history of downloads for each month is available . The download history is also seperated by file type in the HTML, PDF, and XML tables.
Google Fusion Tables API
A range of queries similar to SQL are supported. See the full syntax reference.
Select
A SELECT statement is used to pick columns from a table. Options include FROM, WHERE, GROUP BY, OFFSET, and LIMIT.
For example, use an asterisk *to select all the columns from the Summary ALM Data table using its ID number:
SELECT * FROM 204244
Summary.Complete.Table.gft.sql
To export the results as CSV append the query to this base URL: http://tables.googlelabs.com/exporttable?query=
Remember to use URL encoding if needed, for example:
http://tables.googlelabs.com/exporttable?query=SELECT%20*%20FROM%20204244
Click Download CSV to download the output of each query.
When working with large datasets a result may contain many records. Use the LIMIT option to restrict the number of rows. For example, to select the first 10 results of the Summary ALM Data table:
SELECT * FROM 204244 LIMIT 10
Limit
Limit results to make sure they fit on a spreadsheet. Google Docs supports import up to 500kb.
SELECT * FROM 204244 LIMIT 10
Google Spreadsheets
The CSV output can be loaded in many applications such as Google Spreadsheets. The importData() function imports the results of the query and the spreadsheet will automatically update.
For example the formula: =importData(“http://tables.googlelabs.com/exporttable?query=SELECT * FROM 204244 LIMIT 10”)
Will import the results and display them in the spreadsheet (edit):
Columns
Choose which columns to include. Surround column names with single parenthesis for example such as for the Article Title column.
SELECT 'Article Title',DOI,URL FROM 204244 LIMIT 10
Sort
Sort by any column such as number of CrossRef citations. Download CSV
SELECT * FROM 204244 ORDER BY 'Citations - CrossRef' DESC LIMIT 10
Filter
Add a filter for a particular value, such as articles published in 2009 only.Download CSV
SELECT * FROM 204244 WHERE 'Publication Year' = 2009 ORDER BY 'Citations - CrossRef' DESC LIMIT 10
Special Characters
Besides using quotation marks special characters in column names can be escaped with backslash. This is needed when a column name itself contains a quotation mark.
SELECT '\'Research Article\' or \'non-Research Article\'?' FROM 204244 LIMIT 10
Column Reference
It may be easier to reference columns by number instead of using their names. Download CSV
SELECT col1,col2 FROM 204244 LIMIT 10
Combined Download Statistics
PLoS ALM v3 05182010 Combined Download Statistics ID 202272
Sort
Sort by the 2010-3 column to find papers with the most recent downloads. Download CSV
SELECT * FROM 202272 ORDER BY '2010-3' DESC LIMIT 10
Yearly Summary
Download totals over the past year:
SELECT SUM('2009-3'),SUM('2009-4'),SUM('2009-5'),SUM('2009-6'),SUM('2009-7'),SUM('2009-8'),SUM('2009-9'),SUM('2009-10'),SUM('2009-11'),SUM('2009-12'),SUM('2010-1'),SUM('2010-2'),SUM('2010-3') FROM 202272
Combined download sum history
SELECT SUM('2003-10'),SUM('2003-11'),SUM('2003-12'),SUM('2004-1'),SUM('2004-2'),SUM('2004-3'),SUM('2004-4'),SUM('2004-5'),SUM('2004-6'),SUM('2004-7'),SUM('2004-8'),SUM('2004-9'),SUM('2004-10'),SUM('2004-11'),SUM('2004-12'),SUM('2005-1'),SUM('2005-2'),SUM('2005-3'),SUM('2005-4'),SUM('2005-5'),SUM('2005-6'),SUM('2005-7'),SUM('2005-8'),SUM('2005-9'),SUM('2005-10'),SUM('2005-11'),SUM('2005-12'),SUM('2006-1'),SUM('2006-2'),SUM('2006-3'),SUM('2006-4'),SUM('2006-5'),SUM('2006-6'),SUM('2006-7'),SUM('2006-8'),SUM('2006-9'),SUM('2006-10'),SUM('2006-11'),SUM('2006-12'),SUM('2007-1'),SUM('2007-2'),SUM('2007-3'),SUM('2007-4'),SUM('2007-5'),SUM('2007-6'),SUM('2007-7'),SUM('2007-8'),SUM('2007-9'),SUM('2007-10'),SUM('2007-11'),SUM('2007-12'),SUM('2008-1'),SUM('2008-2'),SUM('2008-3'),SUM('2008-4'),SUM('2008-5'),SUM('2008-6'),SUM('2008-7'),SUM('2008-8'),SUM('2008-9'),SUM('2008-10'),SUM('2008-11'),SUM('2008-12'),SUM('2009-1'),SUM('2009-2'),SUM('2009-3'),SUM('2009-4'),SUM('2009-5'),SUM('2009-6'),SUM('2009-7'),SUM('2009-8'),SUM('2009-9'),SUM('2009-10'),SUM('2009-11'),SUM('2009-12'),SUM('2010-1'),SUM('2010-2'),SUM('2010-3') FROM 202272
Chart
After a little reformatting the download history can be seen in a chart.
Usage Breakdown
These queries can also be used with the type breakdown tables.
Breakdown by download type tables
PLoS ALM v3 05182010 HTML Views ID 202552
PLoS ALM v3 05182010 PDF Downloads ID 203967
PLoS ALM v3 05182010 XML Downloads ID 203785
Top 50 90 Days
There are several groupings. The most downloaded papers are 10.1371/journal.pbio.1000322 and 10.1371/journal.pone.0009505 which are new this month. Some have decreased since the previous month such as 10.1371/journal.pone.0009042 and 10.1371/journal.pone.0008733 while others such as 10.1371/journal.pcbi.1000361 and 10.1371/journal.pgen.1000862 have increased.
Additional Materials
[All Spreadsheets][gsheet] [gsheet]: https://docs.google.com/leaf?id=0B-D5dr31WtX4NDdlY2U3ZTAtMmM3MC00MDZkLTkzZTAtMjVjOTFlZmU0Mjlk&hl=en
[All sample files][samplefiles] [samplefiles]: https://github.com/mchelen/blog.opensci.info/tree/gh-pages/file/2011-06-14-Visualize-PLoS-Article-Level-Metrics-with-Google-Fusion-Tables-and-Spreadsheets
