Visualize PLoS Article Level Metrics with Google Fusion Tables and Spreadsheets

Introduction

Article Level Metrics

The Public Library of Science collects metrics for each article they publish and have uploaded uploaded it to Google Fusion Tables. Metrics collected include citations, downloads, web page views, and community response. Previously the the data was available only as an Excel XLS file from the PLoS Article Level Metrics site.

There are 5 tables available:

Google Fusion Tables

Google Fusion Tables is a public data store that can be accessed through both a web browser and an API. Sorting, grouping, and other queries can be performed using an SQL-like query language. Query results can be exported and loaded by other data visualization tools such as Google Spreadsheets.

The Google Fusion Tables API uses an SQL-like query language to allow filtering, sorting, and aggregation. Unlike other online spreadsheet programs, large datasets up to 100mb are supported. Both web and API interfaces are available, including export to CSV. Importing into a Google Spreadsheet allows additional operations on the data and customized charting.

Web Based View

Each dataset can be viewed with the browser on Google Fusion Tables. For example, the Summary ALM Data table.

Data Updated

An updated version version of the ALM data is now available on Google Fusion Tables: PLoS Article Level Metrics on Google Fusion Tables Updated to V4.

Summary ALM Data

The Summary ALM Data table contains basic information about each article such as the article title, the journal it was published in, and the publication date. The metric information conists of citation counts, usage, and community metrics. The citation counts are from CrossRef, PubMed Central, and Scopus. Usage metrics are based on website page accesses and downloads: HTML page views, PDF downloads, and XML downloads. Community reactions are an assessment of Blog coverage, Trackbacks, and Social Bookmarks.

The Summary ALM Data table includes basic information and citation counts for each article. The columns are DOI, URL, Publication Date, Publication Year, Journal, Article Title, Citations, and Downloads.


Every table is referenced by an ID number. The ID number can be seen in File -> About.

The ID number for the Summary ALM Data table is 204244. This number will be used in API calls that reference the table.

Combined Download Statistics

In Combined Download Statistics the history of downloads for each month is available . The download history is also seperated by file type in the HTML, PDF, and XML tables.

Google Fusion Tables API

A range of queries similar to SQL are supported. See the full syntax reference.

Select

A SELECT statement is used to pick columns from a table. Options include FROM, WHERE, GROUP BY, OFFSET, and LIMIT.

For example, use an asterisk *to select all the columns from the Summary ALM Data table using its ID number:

    SELECT * FROM 204244
  

Summary.Complete.Table.gft.sql

To export the results as CSV append the query to this base URL: http://tables.googlelabs.com/exporttable?query=

Remember to use URL encoding if needed, for example:

http://tables.googlelabs.com/exporttable?query=SELECT%20*%20FROM%20204244

Click Download CSV to download the output of each query.

When working with large datasets a result may contain many records. Use the LIMIT option to restrict the number of rows. For example, to select the first 10 results of the Summary ALM Data table:

SELECT * FROM 204244 LIMIT 10

Download CSV

Limit

Limit results to make sure they fit on a spreadsheet. Google Docs supports import up to 500kb.

    SELECT * FROM 204244 LIMIT 10
  

Download CSV

Google Spreadsheets

The CSV output can be loaded in many applications such as Google Spreadsheets. The importData() function imports the results of the query and the spreadsheet will automatically update.

For example the formula: =importData(“http://tables.googlelabs.com/exporttable?query=SELECT * FROM 204244 LIMIT 10”)

Will import the results and display them in the spreadsheet (edit):

Columns

Choose which columns to include. Surround column names with single parenthesis for example such as for the Article Title column.

Download CSV

    SELECT 'Article Title',DOI,URL FROM 204244 LIMIT 10
  

Sort

Sort by any column such as number of CrossRef citations. Download CSV

    SELECT * FROM 204244 ORDER BY 'Citations - CrossRef' DESC LIMIT 10
  

Filter

Add a filter for a particular value, such as articles published in 2009 only.Download CSV  

    SELECT * FROM 204244 WHERE 'Publication Year' = 2009 ORDER BY 'Citations - CrossRef' DESC LIMIT 10
  

Special Characters

Besides using quotation marks special characters in column names can be escaped with backslash. This is needed when a column name itself contains a quotation mark.

    SELECT '\'Research Article\' or \'non-Research Article\'?' FROM 204244 LIMIT 10
  

Column Reference

It may be easier to reference columns by number instead of using their names. Download CSV

    SELECT col1,col2 FROM 204244 LIMIT 10
  

Combined Download Statistics

PLoS ALM v3 05182010 Combined Download Statistics ID 202272

Sort

Sort by the 2010-3 column to find papers with the most recent downloads. Download CSV

    SELECT * FROM 202272 ORDER BY '2010-3' DESC LIMIT 10
  

Yearly Summary

Download totals over the past year:

    SELECT SUM('2009-3'),SUM('2009-4'),SUM('2009-5'),SUM('2009-6'),SUM('2009-7'),SUM('2009-8'),SUM('2009-9'),SUM('2009-10'),SUM('2009-11'),SUM('2009-12'),SUM('2010-1'),SUM('2010-2'),SUM('2010-3') FROM 202272
  

Download CSV

Combined download sum history

    SELECT SUM('2003-10'),SUM('2003-11'),SUM('2003-12'),SUM('2004-1'),SUM('2004-2'),SUM('2004-3'),SUM('2004-4'),SUM('2004-5'),SUM('2004-6'),SUM('2004-7'),SUM('2004-8'),SUM('2004-9'),SUM('2004-10'),SUM('2004-11'),SUM('2004-12'),SUM('2005-1'),SUM('2005-2'),SUM('2005-3'),SUM('2005-4'),SUM('2005-5'),SUM('2005-6'),SUM('2005-7'),SUM('2005-8'),SUM('2005-9'),SUM('2005-10'),SUM('2005-11'),SUM('2005-12'),SUM('2006-1'),SUM('2006-2'),SUM('2006-3'),SUM('2006-4'),SUM('2006-5'),SUM('2006-6'),SUM('2006-7'),SUM('2006-8'),SUM('2006-9'),SUM('2006-10'),SUM('2006-11'),SUM('2006-12'),SUM('2007-1'),SUM('2007-2'),SUM('2007-3'),SUM('2007-4'),SUM('2007-5'),SUM('2007-6'),SUM('2007-7'),SUM('2007-8'),SUM('2007-9'),SUM('2007-10'),SUM('2007-11'),SUM('2007-12'),SUM('2008-1'),SUM('2008-2'),SUM('2008-3'),SUM('2008-4'),SUM('2008-5'),SUM('2008-6'),SUM('2008-7'),SUM('2008-8'),SUM('2008-9'),SUM('2008-10'),SUM('2008-11'),SUM('2008-12'),SUM('2009-1'),SUM('2009-2'),SUM('2009-3'),SUM('2009-4'),SUM('2009-5'),SUM('2009-6'),SUM('2009-7'),SUM('2009-8'),SUM('2009-9'),SUM('2009-10'),SUM('2009-11'),SUM('2009-12'),SUM('2010-1'),SUM('2010-2'),SUM('2010-3') FROM 202272
  

Download CSV

Chart

After a little reformatting the download history can be seen in a chart.

Usage Breakdown

These queries can also be used with the type breakdown tables.

Breakdown by download type tables

PLoS ALM v3 05182010 HTML Views ID 202552

PLoS ALM v3 05182010 PDF Downloads ID 203967

PLoS ALM v3 05182010 XML Downloads ID 203785

Top 50 90 Days

There are several groupings. The most downloaded papers are 10.1371/journal.pbio.1000322 and 10.1371/journal.pone.0009505 which are new this month. Some have decreased since the previous month such as 10.1371/journal.pone.0009042 and 10.1371/journal.pone.0008733 while others such as 10.1371/journal.pcbi.1000361 and 10.1371/journal.pgen.1000862 have increased.

Additional Materials

[All Spreadsheets][gsheet] [gsheet]: https://docs.google.com/leaf?id=0B-D5dr31WtX4NDdlY2U3ZTAtMmM3MC00MDZkLTkzZTAtMjVjOTFlZmU0Mjlk&hl=en

[All sample files][samplefiles] [samplefiles]: https://github.com/mchelen/blog.opensci.info/tree/gh-pages/file/2011-06-14-Visualize-PLoS-Article-Level-Metrics-with-Google-Fusion-Tables-and-Spreadsheets

Posted on 14 Jun 2011
blog comments powered by Disqus