Open Access Methodology

The Stanford Data Miner: a novel approach for integrating and exploring heterogeneous immunological data

Janet C Siebert [1,*], Wes Munsil [1], Yael Rosenberg-Hasson [2,3], Mark M Davis [2,3,4] and Holden T Maecker [2,3]

Author Affiliations

1 CytoAnalytics, Denver, CO, USA

2 The Institute for Immunity, Transplantation, and Infection, Stanford University, Stanford, CA, USA

3 The Department of Microbiology and Immunology, Stanford University, Stanford, CA, USA

4 The Howard Hughes Medical Institute, Chevy Chase, MD, USA


Journal of Translational Medicine 2012, 10:62. doi:10.1186/1479-5876-10-62

Published: 28 March 2012

Abstract

Background

Systems-level approaches are increasingly common in both murine and human translational studies. These approaches employ multiple high-information-content assays. As a result, there is a need for tools that integrate heterogeneous laboratory and clinical/demographic data, and that allow users to explore those data by aggregating and/or segregating results on particular variables (e.g., mean cytokine levels by age and gender).

Methods

Here we describe the application of standard data warehousing tools to create a novel environment for user-driven upload, integration, and exploration of heterogeneous data. The system presented here currently supports flow cytometry and immunoassays performed in the Stanford Human Immune Monitoring Center, but could be applied more generally.
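The abstract does not spell out the warehouse schema. A layout commonly used with standard data-warehousing tools is the star schema: a central fact table of measurements keyed to dimension tables that hold descriptive attributes. The sketch below assumes such a layout, uses SQLite from the Python standard library as a stand-in for whatever database the system actually uses, and invents all table and column names (`dim_subject`, `dim_analyte`, `fact_measurement`) for illustration only. It shows how "mean cytokine level by gender" becomes a join plus a GROUP BY.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory star schema (hypothetical table names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension: one row per subject, holding clinical/demographic metadata
        CREATE TABLE dim_subject (
            subject_id TEXT PRIMARY KEY,
            gender     TEXT,
            age        INTEGER
        );
        -- Dimension: one row per measured analyte (e.g., a Luminex cytokine)
        CREATE TABLE dim_analyte (
            analyte_id INTEGER PRIMARY KEY,
            name       TEXT,
            assay_type TEXT
        );
        -- Fact table: one row per measurement, keyed to both dimensions
        CREATE TABLE fact_measurement (
            subject_id TEXT REFERENCES dim_subject(subject_id),
            analyte_id INTEGER REFERENCES dim_analyte(analyte_id),
            value      REAL
        );
        """
    )
    return conn


def mean_by_gender(conn: sqlite3.Connection, analyte_name: str) -> dict:
    """Aggregate one analyte's facts along the gender attribute of the subject dimension.

    Returns {gender: (mean value, number of measurements)}.
    """
    rows = conn.execute(
        """
        SELECT s.gender, AVG(f.value), COUNT(*)
        FROM fact_measurement AS f
        JOIN dim_subject AS s ON s.subject_id = f.subject_id
        JOIN dim_analyte AS a ON a.analyte_id = f.analyte_id
        WHERE a.name = ?
        GROUP BY s.gender
        ORDER BY s.gender
        """,
        (analyte_name,),
    ).fetchall()
    return {gender: (avg, n) for gender, avg, n in rows}


if __name__ == "__main__":
    conn = build_warehouse()
    # Made-up illustrative values, not data from the study
    conn.executemany(
        "INSERT INTO dim_subject VALUES (?, ?, ?)",
        [("S1", "F", 34), ("S2", "M", 41), ("S3", "F", 29)],
    )
    conn.execute("INSERT INTO dim_analyte VALUES (1, 'IL-6', 'Luminex')")
    conn.executemany(
        "INSERT INTO fact_measurement VALUES (?, ?, ?)",
        [("S1", 1, 10.0), ("S2", 1, 4.0), ("S3", 1, 14.0)],
    )
    print(mean_by_gender(conn, "IL-6"))
```

Keeping demographics in a dimension table means a new grouping variable (age band, study, treatment group) only requires changing the GROUP BY, not reloading assay results.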

Results

Users upload assay results contained in platform-specific spreadsheets of a defined format, and clinical and demographic data in spreadsheets of flexible format. Users then map sample IDs to connect the assay results with the metadata. An OLAP (on-line analytical processing) data exploration interface allows filtering and display of various dimensions (e.g., Luminex analytes in rows, treatment group in columns, filtered on a particular study). Statistics such as mean, median, and N can be displayed. The views can be expanded or contracted to aggregate or segregate data at various levels. Individual-level data is accessible with a single click. The result is a user-driven system that permits data integration and exploration in a variety of settings. We show how the system can be used to find gender-specific differences in serum cytokine levels, and compare them across experiments and assay types.
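The upload, map, and explore workflow described above can be illustrated with a small in-memory sketch. It is not the system's implementation (which relies on open-source business intelligence software); the function names `map_samples`, `pivot`, and `drill_through` and the record fields are hypothetical. `map_samples` joins assay rows to metadata through a user-supplied sample-ID mapping, `pivot` builds an OLAP-style view with N, mean, and median per cell (omitting the column dimension rolls the data up into a single "All" column), and `drill_through` returns the individual-level records behind a cell.

```python
from collections import defaultdict
from statistics import mean, median


def map_samples(assay_rows, sample_to_subject, subjects):
    """Join (sample_id, analyte, value) assay rows to subject metadata.

    sample_to_subject is the user-supplied sample-ID mapping; rows whose
    sample ID is not mapped are skipped.
    """
    records = []
    for sample_id, analyte, value in assay_rows:
        subject_id = sample_to_subject.get(sample_id)
        if subject_id is None:
            continue
        records.append(
            {"sample_id": sample_id, "subject_id": subject_id,
             "analyte": analyte, "value": value, **subjects[subject_id]}
        )
    return records


def pivot(records, row_dim, col_dim=None, filters=None):
    """OLAP-style view keyed by (row member, column member).

    filters restricts records (e.g., to one study). With col_dim=None the
    column dimension is collapsed (rolled up) into a single "All" column.
    """
    filters = filters or {}
    cells = defaultdict(list)
    for rec in records:
        if all(rec.get(k) == v for k, v in filters.items()):
            col = rec[col_dim] if col_dim else "All"
            cells[(rec[row_dim], col)].append(rec["value"])
    return {
        key: {"N": len(vals), "mean": mean(vals), "median": median(vals)}
        for key, vals in cells.items()
    }


def drill_through(records, **criteria):
    """Return the individual-level records that match every criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]
```

For example, `pivot(records, "analyte", "group", {"study": "S1"})` places analytes in rows and treatment groups in columns for one study, while `pivot(records, "analyte")` aggregates across groups.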

Conclusions

We have used the tools and techniques of data warehousing, including open-source business intelligence software, to support investigator-driven data integration and mining of diverse immunological data.

Keywords:
Systems immunology; Data integration; Data warehousing; OLAP