NCI Computer Services


Access the Cloud Resources

You can register to use one or more of the Cloud Resources using the instructions below.

This document also has detailed information and links to help you get started.

You can also download the Cloud Resources Overview document for more information.

Broad Institute

Now Available!

The Broad Institute’s FireCloud democratizes access to TCGA data and facilitates collaboration by providing a robust, scalable platform accessible to the community at large. Using the elastic compute capacity of Google Cloud, FireCloud empowers analysts, tool developers and production managers to perform large-scale analysis, engage in data curation, and store or publish results. Users can upload their own analysis methods and data to workspaces or run the Broad Institute’s best practice tools and pipelines on pre-loaded data.

Institute for Systems Biology

Now Available!

The ISB Cancer Genomics Cloud (ISB-CGC) is a cloud-based platform that provides interactive and programmatic access to TCGA data, leveraging many aspects of the Google Cloud Platform (GCP). The ISB-CGC web app allows scientists to interactively define and compare cohorts, examine the underlying molecular data for specific genes or pathways of interest, and share insights with collaborators. For computational users, programmatic interfaces and GCP tools such as BigQuery, Genomics, and Compute Engine make it possible to perform complex queries from R or Python scripts, or to run Dockerized workflows on sequence data available in Cloud Storage.

Seven Bridges

Now Available!

The Seven Bridges Genomics Cloud is a platform that enables researchers to collaborate on the analysis of large cancer genomics datasets in a secure, reproducible and scalable manner. A rich query system allows researchers to find exactly the data they are interested in and combine it with their own private data. Native implementation of the Common Workflow Language specification makes it easy for developers, analysts and bench biologists to deploy, customize and run reproducible analysis methods to learn from genomics data faster.

Registration Instructions

Anyone interested in working with controlled-access data on any of the cloud platforms will need dbGaP access. All members of your lab team must either have such access or be authorized downloaders.

Workshops and Information Sessions

To learn more about the NCI Cancer Genomics Cloud Resources and the features of each platform, please visit the events page.

Frequently Asked Questions

What data is available in the Cloud Resources?

The Pilots will have all the data currently stored in The Cancer Genome Atlas (TCGA) Data Coordinating Center (DCC) and CGHub. Users may also upload their own data to the Clouds, but such data will not necessarily be available to others, because researchers may choose to keep their data private.

How do I apply for access to controlled TCGA data?

If you are an NIH researcher or already have an eRA Commons account, then visit https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi to request dbGaP access for yourself or members of your team who will be working with controlled-access data.

If you are not an NIH researcher and do not have an eRA Commons account, you will first need to register for one by visiting the eRA Commons website and completing the registration form. Once you have an eRA Commons account, you can visit https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi to request dbGaP access.

If you have any questions during the application process, please contact the dbGaP Help Desk.

What is the Genomic Data Commons (GDC), and how does this relate to the Cancer Genomics Cloud Resources?

The National Cancer Institute has established the NCI Genomic Data Commons (GDC) to store, analyze and distribute cancer genomics data generated by NCI and other research organizations. The GDC will provide an interactive system for researchers to access data, with the goal of advancing the molecular diagnosis of cancer and suggesting potential therapeutic targets based on genomic information. The GDC is the first step toward the development of a knowledge system for cancer, as originally recommended in a 2011 Institute of Medicine (IOM) report, "Toward Precision Medicine." The IOM recommended a single data repository (backed up in more than one location) as essential infrastructure for integrating basic biological knowledge with the medical histories and health outcomes of individual patients. The GDC will contain all the data currently stored in The Cancer Genome Atlas (TCGA), as well as other genomic data.

As the NCI Cloud Pilots transition into NCI Cloud Resources, it is important to keep in mind that TCGA data hosted on the GDC and on the Cloud Resources may not be completely synchronized. This is because each platform downloaded the data at a different time, and because the GDC hosts a broader set of data than the Cloud Resources (e.g., archived data). This issue will be addressed in the future, as the Cloud Resources switch from hosting their own copies of the data to accessing the data maintained by the GDC in a commercial cloud.