NCI Biomedical Informatics Blog
Metadata and Models
The Cancer Data Standards Registry and Repository (caDSR) and its associated applications support the research community by providing a means to manage:
- detailed descriptions of data held in publicly accessible data sets
- details of shared information models
- descriptions of data to be collected and reported on case-report forms (CRFs)
caDSR comprises a database, application programming interfaces (APIs), and web-based applications for creating, editing, controlling, deploying, monitoring, and finding reusable metadata. These metadata describe common data elements (CDEs), information models, and case-report forms (CRFs) that are used for data collection and analysis or for software development.
The use of CDEs addresses a biomedical data-management problem: similar or identical data can be collected and stored in databases in widely varying ways. Inconsistency in data representation makes it nearly impossible to aggregate and manage even modest-sized data sets in order to ask basic questions and obtain meaningful answers. caDSR provides centralized documentation as well as access to common information building blocks (CDEs) to use when designing systems to capture, report, discover, and use data. The reuse of CDEs facilitates understanding, interpretation, and sharing of cancer research information, development of interoperable systems, and the collection of data generated by disparate experimental platforms. For current collections of CDEs, see the caDSR Hosted Data Standards, Downloads, and Transformations Utilities.
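As a rough illustration of the aggregation problem, the sketch below takes records from two hypothetical sites that encode the same fact differently and maps them onto one shared field name and set of permissible values, much as a CDE would define. The field names, site encodings, and permissible values are invented for illustration and are not taken from caDSR.

```python
# Hypothetical example: two sites record the same fact in different ways.
SITE_A = [{"patient": "A-1", "sex": "M"}, {"patient": "A-2", "sex": "F"}]
SITE_B = [{"patient": "B-1", "gender_code": 2}, {"patient": "B-2", "gender_code": 1}]

# A shared definition (standing in for a CDE): one name, one set of permissible values.
PERMISSIBLE_VALUES = {"Female", "Male", "Unknown"}

# Per-site mappings from local encodings onto the shared permissible values.
SITE_A_MAP = {"M": "Male", "F": "Female"}
SITE_B_MAP = {1: "Male", 2: "Female"}


def harmonize(records, local_field, mapping):
    """Rewrite each record so it uses the shared field name and values."""
    harmonized = []
    for record in records:
        # Unmapped or missing local values fall back to "Unknown".
        value = mapping.get(record.get(local_field), "Unknown")
        if value not in PERMISSIBLE_VALUES:
            raise ValueError(f"{value!r} is not a permissible value")
        harmonized.append({"patient": record["patient"], "sex": value})
    return harmonized


def count_by_value(records):
    """Answer a basic question across sites: how many patients per value?"""
    counts = {}
    for record in records:
        counts[record["sex"]] = counts.get(record["sex"], 0) + 1
    return counts


combined = harmonize(SITE_A, "sex", SITE_A_MAP) + harmonize(SITE_B, "gender_code", SITE_B_MAP)
print(count_by_value(combined))
```

Without the shared definition, each new site would need its own ad hoc handling in every analysis; with it, each site is mapped once and the combined records can be queried uniformly.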
CDEs can be aggregated into larger collections representing logical units of information, such as a "Blood Pressure Panel", or into CRFs and information models. The NCI's CDEs have been derived primarily from data collection forms and protocols for clinical trials. Curators analyze the questions on forms and create CDEs in caDSR using the caDSR Curation Tool. Some CDEs were generated from information models by the NCI community, using Unified Modeling Language (UML) class diagrams to represent key information domains. The semantics of the CDEs are described by annotating them with concepts drawn from NCI Enterprise Vocabulary Services (EVS), with preference given to concepts from the NCI Thesaurus to provide well-structured representation of research terminology. For UML-model-derived CDEs, the models are annotated with concepts, and the components of the model are extracted and transformed into CDEs in the caDSR database.
CRFs represent the set of questions to be asked in order to collect data during the conduct of clinical trials and other research studies. Each question is represented by a CDE. CRFs are developed by data managers supporting a variety of research projects through the use of the caDSR Form Builder. NCI has a collection of Standard CRF Templates intended for use as building blocks when CRFs are created to support cancer clinical trials sponsored by NCI and its partners; see the caDSR Hosted Data Standards, Downloads, and Transformations Utilities.
The caDSR CDE Browser, Curation Tool, and Form Builder provide access not only to CDEs, but also to CRFs and information models. For in-depth information about caDSR and each tool, including links to documentation and contacts as well as technical background and product status, visit the caDSR wiki.
Implementation of caDSR
caDSR content is stored in an Oracle relational database. All caDSR tools and interfaces connect to the same central resource. The tools have been implemented using web-based technologies and are publicly accessible.
NCI uses the international ISO/IEC 11179 Standard for Metadata Registries to represent caDSR content in the database. This standard offers a model for a metadata repository. The model and guidelines support the requirements for lifecycle management of data descriptions.
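In ISO/IEC 11179 terms, a data element pairs a data element concept (what is being described) with a value domain (how the values are represented). The sketch below models that pairing in a minimal way; the class layout, field names, and example content are illustrative assumptions and do not reproduce the caDSR database schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DataElementConcept:
    """What is being described: an object class plus one of its properties."""
    object_class: str
    property_name: str


@dataclass(frozen=True)
class ValueDomain:
    """How values are represented: a datatype and, optionally, a pick-list."""
    datatype: str
    permissible_values: Optional[FrozenSet[str]] = None  # None means non-enumerated

    def accepts(self, value: str) -> bool:
        # Enumerated domains accept only listed values; in this sketch,
        # non-enumerated domains accept any value.
        return self.permissible_values is None or value in self.permissible_values


@dataclass(frozen=True)
class DataElement:
    """A data element = data element concept + value domain."""
    name: str
    concept: DataElementConcept
    value_domain: ValueDomain


# Hypothetical content, invented for illustration only.
smoking_status = DataElement(
    name="Patient Smoking Status",
    concept=DataElementConcept(object_class="Patient", property_name="Smoking Status"),
    value_domain=ValueDomain(
        datatype="CHARACTER",
        permissible_values=frozenset({"Current", "Former", "Never"}),
    ),
)

print(smoking_status.value_domain.accepts("Former"))     # True
print(smoking_status.value_domain.accepts("Sometimes"))  # False
```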
In implementing the ISO/IEC 11179 model, NCI extended it to support registration of two additional types of semantic content: CRFs and UML models. CRFs are used in clinical-trials applications and other data-collection systems. CDE metadata are used to standardize questions that are grouped together into Modules. Modules are grouped together to make CRFs. Both can be reused and shared across groups, helping to ensure consistency in the way questions are asked and recorded in data-management systems. CRFs can also be associated with clinical-trial protocols. Another important extension is the use of ISO/IEC 11179 Concepts to support CDE semantic components, including values in CDE pick-lists.
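The grouping described above (CDE-backed questions into Modules, Modules into CRFs) can be sketched as a small data structure. All class names, identifiers, and form content here are hypothetical; the point is only that two forms reusing the same module ask the shared questions in the same way.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Question:
    """A form question standardized by a CDE (identified by a made-up ID)."""
    cde_id: str
    text: str


@dataclass
class Module:
    """A reusable group of questions."""
    name: str
    questions: List[Question] = field(default_factory=list)


@dataclass
class CaseReportForm:
    """A CRF built from one or more modules."""
    name: str
    modules: List[Module] = field(default_factory=list)

    def cde_ids(self) -> Set[str]:
        # Collect the CDE identifiers behind every question on the form.
        return {q.cde_id for m in self.modules for q in m.questions}


# Hypothetical modules and forms.
vitals = Module("Vital Signs", [
    Question("CDE-0001", "Systolic blood pressure (mmHg)"),
    Question("CDE-0002", "Diastolic blood pressure (mmHg)"),
])
demographics = Module("Demographics", [Question("CDE-0003", "Date of birth")])

enrollment = CaseReportForm("Enrollment", [demographics, vitals])
follow_up = CaseReportForm("Follow-up Visit", [vitals])

# Questions backed by the same CDEs on both forms can be compared directly.
shared = enrollment.cde_ids() & follow_up.cde_ids()
print(sorted(shared))  # ['CDE-0001', 'CDE-0002']
```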
Those interested in working with the caDSR can review background information describing how NCI has implemented the ISO/IEC 11179 standard.
caDSR Applications and Training
caDSR provides web-based interactive applications for performing various tasks associated with managing and deploying CDEs, CRFs, and information models, as well as APIs, which provide access to caDSR content.
To make using the tools and infrastructure easier, a series of courses is offered online through the caCORE Training Wiki. Users can also review the description of the CBIIT implementation of the ISO/IEC 11179 Standard for Metadata Registries there. Online help is also available when tools are in use.
Many caDSR tools require users to register for an account by contacting ncicb [at] pop.nci.nih.gov (Application Support), describing the nature of their interest, and completing any required training. caDSR applications include these interfaces:
The caDSR Password Change Station application allows users of caDSR tools to set up security questions and change their passwords in a manner that conforms to NIH security requirements. Users must change their passwords every 60 days.
The caDSR CDE Browser supports browsing, searching, and exporting CDEs in XML or Excel formats. The intended end user is someone who is designing new studies. Predefined downloads of commonly requested CDEs are available on the Downloads wiki page.
Curators can use the caDSR Freestyle Search tool to perform weighted semantic searches on caDSR content. Information about the application is available on the caDSR Freestyle Search wiki page.
Curators use the caDSR Curation Tool to browse caDSR content and to create and edit Data Element Concepts, Value Domains, and Data Elements.
The caDSR Form Builder was developed to address a need for the NCI Cancer Therapy Evaluation Program (CTEP) and others to define and share standard CRFs and Protocol Templates. This helps ensure that questions used across multiple CRFs are recorded in the same way. Forms can be placed into and retrieved via a Form Cart.
Curators use the caDSR Semantic Integration Workbench (SIW) to add semantic annotations to a UML model by matching its classes and attributes with concepts from the NCI Thesaurus (NCIt) or with existing caDSR content. The caDSR UML Loader validates the submission, then extracts and transforms elements of the model into caDSR content.
Administrators use the caDSR Sentinel Tool to create and manage one or more Alert Definitions that allow users to monitor changes to caDSR content. Information about what is monitored and reported, as well as how reports can be set up, is available on the caDSR Sentinel Tool wiki page.
Context and Central caDSR administrators use this tool as the administrative interface to all caDSR features and components. This tool is accessible only within the NIH intranet.
The caDSR Domain Class Browser uses the caDSR HTTP API to retrieve data via a web browser. Application developers can use this interface to view content, test HTTP calls, and retrieve content in HTML or XML formats. Java APIs are also provided. For more information, see the caDSR APIs and REST Examples wiki page.
This interface allows users to place CDEs and CRFs into a "shopping cart" using the caDSR CDE Browser and Form Builder. These items are then accessible via the API.