Visualize PLoS Article Level Metrics with Google Fusion Tables and Spreadsheets

Introduction

Article Level Metrics

ALM data can be used to track article downloads over time.

The Public Library of Science collects information for each article they publish. These metrics include citations, downloads, web page views, and community responses. The data is available in an Excel XLS file on the PLoS Article Level Metrics site. Now the metric information has also been uploaded uploaded to Google Fusion Tables, a free service for sharing data.

The metric information is divided into 5 tables:

Google Fusion Tables

Google Fusion Tables is a public data storage service that can be accessed through both a web browser and an API. Large datasets up to 100mb can be uploaded, unlike other online spreadsheets which allow only 20 mb. The Google Fusion Tables API provides an SQL-like query language that allows sorting, grouping, and other queries. The query results can be exported in common formats such as CSV for use in other visualization tools including Google Spreadsheets.

Web Based View

Each dataset on Google Fusion Tables can be viewed through the web browser. The first 100 rows and several columns are shown, with each column sortable. The menu system allows additional operations such as filtering, aggregating, and visualizing.

Summary ALM Data

The Summary ALM Data table contains basic information about each article such as the article title, the journal it was published in, and the publication date. The metric information consists of citation counts, number of downloads, and community responses. The citation counts are from CrossRef, PubMed Central, and Scopus. Usage metrics are based on website page accesses and downloads: HTML page views, PDF downloads, and XML downloads. Community reactions are a measurement of coverage in specific blogs such as Postgenomic and Nature Blogs, along with all Trackbacks, and Social Bookmarks from CiteULike and Connotea.

The Summary ALM Data table can be browsed through the web based view.

Sort

Sort any column by clicking on column title. Results can be sorted in ascending or descending order.

Sorting by Publication Date in descending order shows the most recent articles.

Filter

The filter option allows matching for certain values.

Select a column to match, an operator such as equal, greater than, or less than, and the value to match. Filter for a Publication Year equal to 2005 to show only articles published in that year.

Aggregate

Aggregate performs operations across all rows of a column. The functions available include Sum, Average, Maximum, Minimum, and Count. The results can also be grouped together by a particular column.

To look at the total number of citations, pick the Sum for each type of citation. Then group the results by Journal.

The citations are then totaled for each Journal.

Visualize

Table ID


Every table is referenced by an ID number. The ID number can be seen in File -> About.

The ID number for the Summary ALM Data table is 204244. This number will be used in API calls that reference the table.

Google Fusion Tables API

The Google Fusion Tables API supports a range of queries similar to SQL. Some of the query statments available include SELECT, LIMIT, and SORT. See the syntax reference for a full list of commands.

Select

A SELECT statement is used to choose the desired columns from a table. Options include FROM, WHERE, GROUP BY, OFFSET, and LIMIT.

Use an asterisk * to select all the columns from the Summary ALM Data table which is identified with its ID number.

SELECT * FROM 204244

Limit Results

Use the LIMIT option to restrict the number of rows returned by any query. This makes the query results easier to work with in other programs such as Google Docs, which can only import up to 500kb of data. For example, to select the first 10 results of the Summary ALM Data table.

SELECT * FROM 204244 LIMIT 10

Example Spreadsheet - Download CSV

Export CSV

Results can be exported in CSV format by adding the query on to a base URL.

https://www.google.com/fusiontables/exporttable?query=

For example,

https://www.google.com/fusiontables/exporttable?query=SELECT * FROM 204244 LIMIT 10
Use URL encoding if needed.

Google Spreadsheets

The CSV output can be loaded into other applications such as Google Spreadsheets. The importData() function imports the results of the query and the spreadsheet will automatically keep it updated.

To import the query results CSV and display it in an Example Spreadsheet (edit):

=importData("https://www.google.com/fusiontables/exporttable?query=SELECT * FROM 204244 LIMIT 10")

Columns

The number of columns can be limited, like the number of rows. The needed columns can be selected by their names. Separate multiple names by commas. Surround column names with single parentheses ' if the name contains any special characters such as spaces.

SELECT 'Article Title',DOI,URL,'Publication Year','Citations - CrossRef' FROM 204244 LIMIT 10

Example Spreadsheet

Sort

Results can be sorted by any column using the ORDER BY option. The sort column must also be included in the SELECT statement. For example, to sort by the number of CrossRef citations with highest first:

SELECT 'Article Title',DOI,URL,'Publication Year','Citations - CrossRef' FROM 204244 ORDER BY 'Citations - CrossRef' DESC LIMIT 10

Example Spreadsheet

Filter

Filter the results to match some value with the WHERE option. For example, only articles that were published in 2009:

SELECT 'Article Title',DOI,URL,'Publication Year','Citations - CrossRef', FROM 204244 WHERE 'Publication Year' = 2009 ORDER BY 'Citations - CrossRef' DESC LIMIT 10

Example Spreadsheet

Special Characters

Quotation marks in a column name can be escaped with backslashes.

To select the 'Research Article' or 'non-Research Article'? column:

SELECT 'Article Title','\'Research Article\' or \'non-Research Article\'?' FROM 204244 LIMIT 10

Example Spreadsheet

Column Reference

The columns can also be referenced by number instead of names. The first column is col0, the next col1, and so on. To select the first 3 columns:

SELECT col0,col1,col2 FROM 204244 LIMIT 10

Example Spreadsheet

Combined Download Statistics

Combined Download Statistics shows the number of times each article was downloaded during a particular month. The download totals are also available separated by file type, in the HTML, PDF, and XML tables.

Sort

Sort by the 2010-3 column to show the articles that had the most downloads in March 2010.

SELECT doi,'2010-3' FROM 202272 ORDER BY '2010-3' DESC LIMIT 10

Example Spreadsheet

Total Monthly Download

Use the SUM() function to add up the downloads for every article in a given month. To find the total downloads in January, 2010:

SELECT SUM('2010-1') FROM 202272

Combined Download History

The complete history of downloads can be seen by summing all of the available columns.

SELECT SUM('2003-10'),SUM('2003-11'),SUM('2003-12'),SUM('2004-1'),SUM('2004-2'),SUM('2004-3'),SUM('2004-4'),SUM('2004-5'),SUM('2004-6'),SUM('2004-7'),SUM('2004-8'),SUM('2004-9'),SUM('2004-10'),SUM('2004-11'),SUM('2004-12'),SUM('2005-1'),SUM('2005-2'),SUM('2005-3'),SUM('2005-4'),SUM('2005-5'),SUM('2005-6'),SUM('2005-7'),SUM('2005-8'),SUM('2005-9'),SUM('2005-10'),SUM('2005-11'),SUM('2005-12'),SUM('2006-1'),SUM('2006-2'),SUM('2006-3'),SUM('2006-4'),SUM('2006-5'),SUM('2006-6'),SUM('2006-7'),SUM('2006-8'),SUM('2006-9'),SUM('2006-10'),SUM('2006-11'),SUM('2006-12'),SUM('2007-1'),SUM('2007-2'),SUM('2007-3'),SUM('2007-4'),SUM('2007-5'),SUM('2007-6'),SUM('2007-7'),SUM('2007-8'),SUM('2007-9'),SUM('2007-10'),SUM('2007-11'),SUM('2007-12'),SUM('2008-1'),SUM('2008-2'),SUM('2008-3'),SUM('2008-4'),SUM('2008-5'),SUM('2008-6'),SUM('2008-7'),SUM('2008-8'),SUM('2008-9'),SUM('2008-10'),SUM('2008-11'),SUM('2008-12'),SUM('2009-1'),SUM('2009-2'),SUM('2009-3'),SUM('2009-4'),SUM('2009-5'),SUM('2009-6'),SUM('2009-7'),SUM('2009-8'),SUM('2009-9'),SUM('2009-10'),SUM('2009-11'),SUM('2009-12'),SUM('2010-1'),SUM('2010-2'),SUM('2010-3') FROM 202272

Example Spreadsheet

Chart

The history of download volume can be plotted over multiple months. This chart shows the increase in downloads over time, with various peaks such as in January 2007, and troughs such as in April 2009.

Breakdown by Download Type

In addition to the total download counts, there is also a download history for each type of file format: HTML, PDF, and XML. The same query can be used again with these file type breakdown tables to compare the downloads of each file type.

Table IDs: HTML 202552 PDF 203967 XML 203785

Example Spreadsheet

Plotting the different file type histories together in a chart allows a comparison of their usage. This chart shows that HTML downloads have grown the quickest, and while PDF downloads have also increased, XML downloads peaked around 2008 and have since declined.

Top 10 Articles 6 Month History

In addition to the overall download trends, the metric data includes usage history for individual articles. By selecting articles with the highest usage in the most recent month,

SELECT doi,'2010-3' FROM 202272 ORDER BY '2010-3' DESC LIMIT 10

There are several groupings. The most downloaded papers are 10.1371/journal.pbio.1000322 and 10.1371/journal.pone.0009505 which are new this month. Some have decreased since the previous month such as 10.1371/journal.pone.0009042 and 10.1371/journal.pone.0008733 while others such as 10.1371/journal.pcbi.1000361 and 10.1371/journal.pgen.1000862 have increased.

Additional Materials

[All Spreadsheets][gsheet] [gsheet]: https://docs.google.com/leaf?id=0B-D5dr31WtX4NDdlY2U3ZTAtMmM3MC00MDZkLTkzZTAtMjVjOTFlZmU0Mjlk&hl=en

[All sample files][samplefiles] [samplefiles]: https://github.com/mchelen/blog.opensci.info/tree/gh-pages/file/2011-06-14-Visualize-PLoS-Article-Level-Metrics-with-Google-Fusion-Tables-and-Spreadsheets

Data Updated

An updated version of this ALM data is now also available on Google Fusion Tables, PLoS Article Level Metrics on Google Fusion Tables Updated to V4.