NCI Biomedical Informatics Blog
Access to the Cloud Pilot Platforms
You can register to use one or more of the CGC platforms using the instructions below.
This document also has detailed information and links to help you get started.
You can also download the CGC Cloud Pilots Overview document for more information.
Cloud credits are available for researchers working on the CGC Pilot platforms. Find out more.
Broad Institute
The Broad Institute’s FireCloud democratizes access to TCGA data and facilitates collaboration by providing a robust, scalable platform accessible to the community at large. Using the elastic compute capacity of Google Cloud, FireCloud empowers analysts, tool developers and production managers to perform large-scale analysis, engage in data curation, and store or publish results. Users can upload their own analysis methods and data to workspaces or run the Broad Institute’s best practice tools and pipelines on pre-loaded data.
Institute for Systems Biology
The ISB Cancer Genomics Cloud (ISB-CGC) is a cloud-based platform that provides interactive and programmatic access to TCGA data, leveraging many aspects of the Google Cloud Platform (GCP). The ISB-CGC web app allows scientists to interactively define and compare cohorts, examine the underlying molecular data for specific genes or pathways of interest, and share insights with collaborators. For computational users, programmatic interfaces and GCP tools such as BigQuery, Genomics, and Compute Engine make it possible to perform complex queries from R or Python scripts, or to run Dockerized workflows on sequence data available in Cloud Storage.
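As a rough illustration of the programmatic route described above, the sketch below composes a cohort query in Python. The table name `my-project.tcga_demo.clinical` and the columns `case_id`, `disease_code`, and `age_at_diagnosis` are hypothetical placeholders, not actual ISB-CGC table or column names; consult the ISB-CGC documentation for the real schema.

```python
# Hypothetical table and column names, for illustration only.
TABLE = "my-project.tcga_demo.clinical"


def build_cohort_query(disease_codes, min_age):
    """Return (sql, params) selecting cases by disease code and minimum age."""
    if not disease_codes:
        raise ValueError("at least one disease code is required")
    # Values are referenced as named parameters (@name) instead of being
    # pasted into the SQL text.
    sql = (
        "SELECT case_id, disease_code, age_at_diagnosis\n"
        f"FROM `{TABLE}`\n"
        "WHERE disease_code IN UNNEST(@disease_codes)\n"
        "  AND age_at_diagnosis >= @min_age\n"
        "ORDER BY case_id"
    )
    params = {"disease_codes": list(disease_codes), "min_age": int(min_age)}
    return sql, params


sql, params = build_cohort_query(["BRCA", "LUAD"], min_age=50)
print(sql)
print(params)
```

The query text and its parameters could then be submitted through a BigQuery client library from a script. Keeping the values separate from the SQL string makes the same query easy to reuse for different cohorts.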
Seven Bridges Genomics
The Seven Bridges Genomics Cloud Pilot is a platform that enables researchers to collaborate on the analysis of large cancer genomics datasets in a secure, reproducible and scalable manner. A rich query system allows researchers to find exactly the data they are interested in and combine it with their own private data. Native implementation of the Common Workflow Language specification makes it easy for developers, analysts and bench biologists to deploy, customize and run reproducible analysis methods to learn from genomics data faster.
Anyone interested in working with controlled access data on any of the cloud platforms will need dbGaP Access. All the members of your lab team must have such access, or be authorized downloaders.
Workshops and Information Sessions
To learn more about the NCI Cancer Genomics Cloud Projects and the features of each platform, please visit the events page.
Frequently Asked Questions
What data is available in the CGC Pilots?
The Pilots will have all the data currently stored in The Cancer Genome Atlas (TCGA) Data Coordinating Center (DCC) and CGHub. Users may also upload their own data to the Clouds, but this data may not be available to others, since researchers may choose to keep their data private.
How do I apply for access to controlled TCGA data?
If you are an NIH researcher or already have an eRA Commons account, then visit https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi to request dbGaP access for yourself or members of your team who will be working with controlled-access data.
If you are not an NIH researcher and do not have an eRA Commons account, you will first need to register for one. Please visit the eRA Commons website and complete the registration form. Once you have an eRA Commons account, you can visit https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi to request dbGaP access.
If you have any questions during the application process, please contact the dbGaP Help Desk.
How will each pilot manage and distribute storage and compute credits?
Each pilot is managing the use of the platform differently.
Broad Institute: The Broad Institute is not providing free credits for use of the FireCloud platform. Users will need to register and pay for compute and storage as they do their work. However, Broad will provide a number of demonstration workspaces that can be cloned and used to facilitate evaluation activities.
Institute for Systems Biology: As researchers register with the ISB Cloud Pilot they can submit a proposal to ISB explaining the work they plan on doing. Based on the work proposed, ISB will provide a certain number of credits to accomplish the work.
Seven Bridges Genomics: Researchers are guaranteed at least $100 of storage and computation credits (enough to run 40 to 100 RNA analyses). Credits are distributed to researchers based on how much and how frequently they use the platform.
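To put the Seven Bridges figure in perspective, the quoted range of 40 to 100 RNA analyses per $100 works out to roughly $1.00 to $2.50 per analysis. The short sketch below shows that arithmetic; the function names are illustrative, and actual costs will depend on the workflow and data size.

```python
def cost_per_analysis(credit_usd, analyses):
    """Average cost of one analysis when `credit_usd` covers `analyses` runs."""
    return credit_usd / analyses


def analyses_affordable(budget_usd, unit_cost_usd):
    """Whole number of analyses a budget covers at a given per-analysis cost."""
    return int(budget_usd // unit_cost_usd)


CREDIT_USD = 100.0
low = cost_per_analysis(CREDIT_USD, 100)   # cheapest case: 100 analyses
high = cost_per_analysis(CREDIT_USD, 40)   # most expensive case: 40 analyses
print(f"${low:.2f} to ${high:.2f} per analysis")
# prints "$1.00 to $2.50 per analysis"
```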
What is the Genomic Data Commons (GDC), and how does this relate to the Cancer Genomics Cloud Pilots?
The National Cancer Institute is establishing the NCI Genomic Data Commons to store, analyze and distribute cancer genomics data generated by NCI and other research organizations. The GDC will provide an interactive system for researchers to access data, with the goal of advancing the molecular diagnosis of cancer and suggesting potential therapeutic targets based on genomic information. The GDC is the first step toward the development of a knowledge system for cancer, as originally recommended in a 2011 Institute of Medicine (IOM) report, “Toward Precision Medicine.” The IOM recommended a single data repository (which would be backed up in more than one location) as essential infrastructure for integrating basic biological knowledge with medical histories and health outcomes of individual patients. The GDC will contain all the data currently stored in The Cancer Genome Atlas (TCGA), as well as other genomic data.
When will the GDC be launched? How will this affect my data?
The GDC will launch in June 2016, approximately six months after the Cloud Pilots launch. Because of this, the data stored on the Cloud Pilots will not match exactly what is stored in the GDC. Of particular note is that the TCGA data in the GDC is being realigned to HG38, whereas the current alignment in TCGA is HG19 (which is what is hosted by the Cloud Pilots). It is important to keep this in mind with whatever analyses you conduct in the Cloud. This issue will be addressed in the future, as the Cloud Pilots switch from getting their data from CGHub to the GDC, but researchers should be aware that this realignment is taking place.
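One practical way to guard against mixing alignments is to check the reference build of every input before combining them in an analysis. The following is a minimal sketch of such a check. The dataset names are hypothetical, and treating hg19/GRCh37 and hg38/GRCh38 as interchangeable labels is a simplification made for this example.

```python
# Alias table is a simplification: hg19/GRCh37 and hg38/GRCh38 are treated
# as the same build label here.
BUILD_ALIASES = {
    "hg19": "hg19", "grch37": "hg19",
    "hg38": "hg38", "grch38": "hg38",
}


def normalize_build(label):
    """Map a build label such as 'HG19' or 'GRCh37' to a canonical name."""
    key = label.strip().lower()
    if key not in BUILD_ALIASES:
        raise ValueError(f"unrecognized reference build: {label!r}")
    return BUILD_ALIASES[key]


def check_same_build(datasets):
    """Return the shared build of all datasets, or raise if they differ.

    `datasets` maps a dataset name to its reference build label.
    """
    builds = {name: normalize_build(b) for name, b in datasets.items()}
    distinct = set(builds.values())
    if len(distinct) > 1:
        detail = ", ".join(f"{n}={b}" for n, b in sorted(builds.items()))
        raise ValueError(f"mixed reference builds: {detail}")
    return distinct.pop() if distinct else None


# Hypothetical inputs: one file set from a Cloud Pilot, one from the GDC.
try:
    check_same_build({"cloud_pilot_bams": "HG19", "gdc_download": "HG38"})
except ValueError as err:
    print(err)
```

Running a check like this at the start of a workflow turns a silent coordinate mismatch into an immediate, readable error.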