Visualize PLoS ALM with Google Charts and Fusion Tables

Learn to filter, sort, and visualize Article Level Metrics with Google Fusion Tables and Google Charts

All scientific journal articles published by the Public Library of Science (PLoS) have Article Level Metrics (ALM) collected about them. These metrics include citation counts, web views, online bookmarks, and more. The metrics are available through a variety of sources including a bulk CSV file.

The CSV data can be uploaded to Google Fusion Tables, a free data hosting service that allows web-based browsing, filtering, and aggregation. The Fusion Tables API supports SQL-like queries and integrates tightly with Google Charts allowing visualization of results in a web browser using Javascript.

The advantage of Javascript compared to Flash or static images is that it works in any web browser including mobile devices, and allows interaction with the data. This makes the visualization easy to share with a large online audience. Javascript is also easy to update if you want to change the query. Visitors will see the new chart version as soon as the script is altered. This requires fewer manual steps than exporting a new image and uploading it.

Google Fusion Tables

The article metrics CSV file can be uploaded directly to Fusion Tables and browsed online. The web interface allows sorting, filtering, and other operations.

Browsing the data in Fusion Tables after uploading the CSV

For example, we can filter by publication_date to show only articles from 2013, and sort by the most CrossRef citations.

2013 sorted by CrossRef

Fusion Tables web interface shows a filtered, sorted result

Fusion Tables API

The query options from the web interface can also be used through an API. The syntax is similar to an SQL query. To see the top 10 CrossRef cited papers from 2013:

SELECT title, crossref, scopus, pubmed FROM 1zkfQ7rtG9UI5a8rPDk2bpD6d0QbgP63h2v2l9YzW WHERE publication_date >= '2013-01-01' AND publication_date < '2014-01-01' ORDER BY crossref desc LIMIT 10

The query commands closely matche SQL:

  • SELECT Pick the columns to include in the results.
  • FROM Use the unique identifier for each data table which can be found in File > About this table or in the docid= part of the table URL.
  • WHERE Filter the results by the value of a column. Dates within a certain range can be filtered by using inequality operators with AND for multiple filters. Use >= to include articles published on or after 2013-01-01 and < to include articles published only before 2014-01-01.
  • ORDER BY Sort the results using a selected column and in the chosen sort direction.
  • LIMIT Return only the specified number of results. This prevents loading too much data into a browser causing it to slow down, and also is useful for testing queries.

The results of a query can be downloaded as a CSV by appending the query to this URL: https://www.google.com/fusiontables/exporttable?query=

To make testing queries during development easier, use the Hurl.it HTTP Request tool.

Google Charts

Now that we have a working query, we can display the results using Google Charts. Because of the tight integration between Fusion Tables and Charts, the query can be passed directly as a Javascript parameter.

Starting from this sample code we can modify a few lines of the drawChart() function to visualize our data.

We can modify the function parameters of the sample code to make a chart with our data.

 1 google.visualization.drawChart({
 2         "containerId": "visualization_div",
 3         "dataSourceUrl": 'http://www.google.com/fusiontables/gvizdata?tq=',
 4         "query":"SELECT Year, Austria, Bulgaria, Denmark, Greece FROM 641716",
 5         "refreshInterval": 5,
 6         "chartType": "BarChart",
 7         "options": {
 8           "title":"Yearly Coffee Consumption by Country",
 9           "vAxis": {"title": "Year"},
10           "hAxis": {"title": "Cups"}
11         }
12       });
  • Line 4 is where the query value is set. Replace the sample query with our custom query.
  • Remove refreshInterval on line 5 because the source data is not being changed.
  • Instead of BarChart, change chartType to ColumnChart to select the type of chart.

Change these settings in the options area:

  • title A title to describe the chart.
  • vAxis Control settings for the vertical axis, such as title to show what the axis is measuring.

Modified Version

 1 google.visualization.drawChart({
 2         "containerId": "visualization_div",
 3         "dataSourceUrl": 'http://www.google.com/fusiontables/gvizdata?tq=',
 4         "query":"select title,crossref,scopus,pubmed from 1zkfQ7rtG9UI5a8rPDk2bpD6d0QbgP63h2v2l9YzW where publication_date >= '2013-01-01 00:00:00' and publication_date < '2014-01-01 00:00:00' order by crossref desc limit 10",
 5         "chartType": "ColumnChart",
 6         "options": {
 7           "title": "Top 10 Crossref Cited PLoS Articles from 2013",
 8           "vAxis": {"title": "Citations"},
 9         }
10       });

The chart ouputs to a visualization_div that must exist in the HTML. Like any HTML element, it can be styled, for example with a desired height and width. Adding text such as “Loading…” inside the div helps prevent users from being confused by a blank space while the visualization loads.

<div id="visualization_div" style="width: 100%; height: 800px;">Loading...</div>

Results

The final chart can be embedded in a web page or displayed on its own.

Loading...

The result is a query performed live on a dataset, with the results visualized in any web browser. We can change the query by editing a single file, and the new results will be displayed instantly. The query and visualization can even be edited live in a web browser. This is an easy way to experiment and to share visualizations with others.

The chart allows interaction because it uses Javascript. When the user hovers their mouse on a column, a label appears showing the full article title and an exact citation count. This provides important detail without taking additional screen space or overloading the user with information.

The chart shows the top CrossRef cited articles for 2013 and compares their citation counts. All of these articles had equal or more Scopus citations than CrossRef, except for “The Effectiveness of Mobile-Health Technology-Based Health…” which received more CrossRef citations than Scopus. This difference may be due to article awareness, citation inclusion methods, or publication date. By displaying the data visually, the viewer can identify outliers for further investigatation.

Next Steps

Fusion Tables lets anyone develop their own query to run on the PLoS ALM dataset. This query can use SQL-like operations and runs on the entire 100k+ rows of data. Calculations on the data such as SUM() and AVERAGE() can also be applied.

Google Charts also supports more user interaction, for example filtering by user selected year or journal. Extra formatting can be added to the chart, such as links to read the full articles. There are also a range of chart types available to display query results.

When PLoS releases updated versions of the CSV data, you can upload it to the Fusion Table, and the associated chart will reflect the new version of the data.