Study Materials - (56)zip LINK
A video of the IRF QRP: Achieving a Full Annual Increase Factor (AIF) webinar that took place on Wednesday, May 19, 2021, is available to view on YouTube at the following link: _2tqwg88. The training materials are here:
Study Materials - (56)zip
The Classification of Instructional Programs (CIP) provides a taxonomic scheme that supports the accurate tracking and reporting of fields of study and program completions activity. CIP was originally developed by the U.S. Department of Education's National Center for Education Statistics (NCES) in 1980, with revisions occurring in 1985, 1990, 2000, 2010 and 2020. Information on the 1985, 1990, 2000 and 2010 CIP can be accessed on the resources page under the section heading Archive and Historical. On the 2020 CIP Website, you can view both the 2020 CIP and the 2010 CIP. The default option is to view the 2020 CIP, which is the most recent version of the CIP. To view the 2010 CIP on this webpage, look for the Change Year Box, click on the down arrow and select 2010.
When a study is registered by a Genomic Program Administrator (GPA) in the dbGaP Submission System (SS), the GPA indicates what data is expected to be submitted. This may be verified by the Program Officer (PO) who oversees the study funding. The submitter will separately complete a Study Data Outline (SDO) through the Submission Portal (SP). This outline summarizes the data that will be uploaded and released in the current version. All data claimed in the SDO must be submitted. The GPA and PO will be notified if the information does not match between the SS and SP.
All new study versions must complete the Study Data Outline in the Submission Portal in order to assert what data types will be submitted and released for the current study version. Upon completion, a dbGaP study accession (phs######.v#.p#) will be provided.
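The accession pattern phs######.v#.p# can be checked mechanically before it is cited anywhere. The following is a minimal sketch, not a dbGaP tool: it assumes exactly six digits after "phs" (as the placeholder suggests) and reads "v" as the study version and "p" as the participant set. The names parse_accession and Accession are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumption: six digits after "phs", then ".v<number>.p<number>".
ACCESSION_RE = re.compile(r"^phs(\d{6})\.v(\d+)\.p(\d+)$")


class Accession(NamedTuple):
    study_number: int
    version: int
    participant_set: int


def parse_accession(text: str) -> Optional[Accession]:
    """Return the parsed parts of an accession, or None if it is malformed."""
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        return None
    return Accession(*(int(group) for group in match.groups()))
```

For example, parse_accession("phs000001.v3.p1") yields study number 1, version 3, participant set 1, while a string missing the ".p#" part yields None.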
The Study Config is a web form that collects a description of the study data, methods, and findings, inclusion/exclusion, study history, references, attributions, and terms that will be indexed to enable users to search for your study in dbGaP Advanced Search. The study config must be submitted in order to have a dbGaP study accession (phs######.v#.p#) that can be published in dbGaP and used in journal publications. Here is an example of the study report page populated by the information in the study config: ( -bin/study.cgi?study_id=phs000001.v3.p1).
Subject IDs submitted to dbGaP may be randomly assigned or may be consecutive numbers without any identifying information (i.e., the submitted Subject ID should not be based on the study person ID or any personal identifiers such as subject's birth date, health record number, or name). The same applies to sample IDs.
The first column must be the IDs of the subjects. Enter a single de-identified subject ID for each person, and preferably use "SUBJECT_ID" as the subject ID header. If necessary, you may use another variable name (but be consistent in all study files). Please do not use "dbGaP" in the variable name or the ID itself. See SUBJECT_ID in Glossary for full requirement details.
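As an illustration of the two rules above (randomly assigned IDs that carry no personal information, and no "dbGaP" in the header or the IDs), here is a small sketch. The function names, the "SUBJ" prefix, and the eight-digit width are assumptions made for this example only; any scheme that does not derive the ID from personal identifiers would serve.

```python
import secrets


def assign_subject_ids(person_ids, prefix="SUBJ"):
    """Map each internal person ID to a random, unique, de-identified ID.

    The random part is independent of the person ID, so the submitted ID
    reveals nothing about the internal identifier.
    """
    mapping = {}
    used = set()
    for person_id in person_ids:
        if person_id in mapping:
            continue  # one subject ID per person
        while True:
            candidate = f"{prefix}{secrets.randbelow(10**8):08d}"
            if candidate not in used:
                break
        used.add(candidate)
        mapping[person_id] = candidate
    return mapping


def check_id_column(header, ids):
    """Return a list of problems with the ID header and values."""
    problems = []
    if "dbgap" in header.lower():
        problems.append(f"header {header!r} contains 'dbGaP'")
    for subject_id in ids:
        if "dbgap" in subject_id.lower():
            problems.append(f"ID {subject_id!r} contains 'dbGaP'")
    return problems
```

Keep the mapping from internal IDs to submitted IDs in a secure location at your institution; it is not part of the submission.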
In the corresponding DD, dbGaP will automatically code 0=Subjects used as genotyping controls and/or pedigree linking members (i.e. subject IDs without any submitted phenotype and/or molecular data), so that 0 does not need to be included in the DD. For all other consent groups > 0, use the format: code=Consent Group's Title (Consent Group's Abbreviation). For example, here is what a study with 2 consent groups might look like in the DD.
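The code=Consent Group's Title (Consent Group's Abbreviation) format can be produced with a few lines of code. This is a sketch only: the function name is hypothetical, and the two consent groups used in the note after the code are illustrative placeholders, not the groups of any particular study.

```python
def consent_value_codes(groups):
    """Build DD entries from (code, title, abbreviation) tuples.

    Code 0 is added by dbGaP automatically, so only codes > 0 are accepted.
    """
    entries = []
    for code, title, abbreviation in groups:
        if code <= 0:
            raise ValueError("list only consent groups with a code > 0")
        entries.append(f"{code}={title} ({abbreviation})")
    return entries
```

Called with [(1, "General Research Use", "GRU"), (2, "Health/Medical/Biomedical", "HMB")], it returns the two entries "1=General Research Use (GRU)" and "2=Health/Medical/Biomedical (HMB)".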
Provide the biological sex value of the person listed in the SUBJECT_ID column. To speed up study processing through the dbGaP auto-pipeline, sex values have been restricted to M/Male/1 or F/Female/2 or UNK/Unknown or left empty, and should match the sex values entered into the Pedigree DS if a pedigree DS is applicable. All other values will require a resubmission.
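Because any other value triggers a resubmission, it may help to screen the sex column before uploading. The sketch below assumes the column is named "SEX" and that values must match the listed spellings exactly (case-sensitive). Neither assumption is confirmed by the text above, so adjust both to your files.

```python
# Accepted values as listed above; the empty string covers "left empty".
ALLOWED_SEX_VALUES = {"M", "Male", "1", "F", "Female", "2", "UNK", "Unknown", ""}


def invalid_sex_rows(rows, column="SEX"):
    """Return 1-based row numbers whose sex value is not in the allowed set.

    `rows` is a list of dicts, e.g. as produced by csv.DictReader.
    """
    bad = []
    for number, row in enumerate(rows, start=1):
        value = (row.get(column) or "").strip()
        if value not in ALLOWED_SEX_VALUES:
            bad.append(number)
    return bad
```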
Include the variables SUBJECT_SOURCE and SOURCE_SUBJECT_ID ONLY IF the following applies: dbGaP aims to label a single person with the same dbGaP assigned subject ID, even though the submitted subject IDs for that person might be different. This is so that users who download multiple studies will not double count a person who has been included in multiple studies. For dbGaP to assign the same dbGaP subject ID, include the two variables, SUBJECT_SOURCE and SOURCE_SUBJECT_ID. This is required for Coriell HapMap subjects, subjects in public repositories (RUCDR, NRGR, NINDS Repository, etc.), and subjects that have been or will be submitted to another dbGaP study. Please avoid a SUBJECT_SOURCE that is very general coupled with a SOURCE_SUBJECT_ID that is a simple integer. For example, SUBJECT_SOURCE=University of California and SOURCE_SUBJECT_ID=1. There is a potential for unintended subject collision; that is, two different people are assigned the same source and ID across studies. There are many University of California campuses and there are many studies that use 1 as an ID.
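One way to catch the collision risk described above is to flag every row whose SOURCE_SUBJECT_ID is a bare integer, so that a person can judge whether the accompanying SUBJECT_SOURCE is specific enough. This is a heuristic sketch, not the check dbGaP performs; the function name is hypothetical.

```python
def risky_source_pairs(rows):
    """Return (SUBJECT_SOURCE, SOURCE_SUBJECT_ID) pairs whose ID is only digits.

    A digits-only ID is not wrong in itself, but combined with a broad source
    name it may collide with an unrelated person in another study.
    """
    flagged = []
    for row in rows:
        source = (row.get("SUBJECT_SOURCE") or "").strip()
        source_id = (row.get("SOURCE_SUBJECT_ID") or "").strip()
        if source and source_id.isdigit():
            flagged.append((source, source_id))
    return flagged
```

A row with SUBJECT_SOURCE=University of California and SOURCE_SUBJECT_ID=1 would be flagged for review, whereas an ID that mixes letters and digits would pass.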
The Pedigree DS lists the genealogical relationships of subjects within a study. If there are no known relationships, this file does not need to be submitted. However, if dbGaP finds that there are possible relationships between subjects after reviewing the genetic data (with the GRAF [Genetic Relationship and Fingerprinting] software), dbGaP will request a pedigree DS or include a README file with the results of IBD and/or dbGaP GRAF. If the IBD or pedigree information should not be released because of data sharing limitations, please let dbGaP know in writing. See GRAF in the Glossary for more information. Open the templates under Phenotype_Data: 4a_Pedigree_DS.txt and 4b_Pedigree_DD.xlsx.
SUBJECT_IDs should include any person with familial relationships relevant to the study. The SUBJECT_ID column should also include FATHER and MOTHER IDs. All SUBJECT_IDs of the pedigree file should be included in the Subject Consent (SC) DS, where the study subjects have CONSENT >=1 and linking pedigree SUBJECT_IDs have CONSENT=0. See SUBJECT_ID in Glossary for full requirement details.
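The two cross-checks in this paragraph (parents must appear as subjects, and every pedigree subject must appear in the Subject Consent DS) can be sketched as follows. The column names FATHER and MOTHER follow the wording above. The use of an empty string for an unknown parent is an assumption, and the function name is hypothetical.

```python
def check_pedigree(pedigree_rows, consent_by_subject):
    """Return a list of consistency problems between pedigree and consent data.

    pedigree_rows: list of dicts with SUBJECT_ID, FATHER, MOTHER keys.
    consent_by_subject: dict mapping SUBJECT_ID to its CONSENT code.
    """
    problems = []
    pedigree_ids = {row["SUBJECT_ID"] for row in pedigree_rows}

    # Every pedigree subject must also be in the Subject Consent DS.
    for subject_id in sorted(pedigree_ids - set(consent_by_subject)):
        problems.append(f"{subject_id}: missing from Subject Consent DS")

    # Every named parent must have its own row in the SUBJECT_ID column.
    for row in pedigree_rows:
        for role in ("FATHER", "MOTHER"):
            parent = row.get(role, "")
            if parent and parent not in pedigree_ids:
                problems.append(
                    f"{row['SUBJECT_ID']}: {role} {parent} not in SUBJECT_ID column"
                )
    return problems
```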
Provide the biological sex value of the person listed in the SUBJECT_ID column. To speed up study processing through the dbGaP auto-pipeline, sex values have been restricted to M/Male/1 or F/Female/2 or UNK/Unknown/NULL, and should match the sex values entered into the Subject Consent DS. All other values will require a resubmission.
Metadata around the experiment or study and annotations that are necessary to reproduce any published table or analysis must be included with genomic data submissions. In particular, data pertinent to the interpretation of genomic data -- such as associated phenotype data (e.g. clinical information), exposure data, relevant metadata, and descriptive information (e.g. protocols or methodologies used) -- are expected to be shared. To avoid user questions, make sure to include self-reported RACE and relevant dates (e.g., birth, diagnosis, sample collection) written as years or normalized to a set point in time, along with any phenotypes, measured or collected data that are described in your Study Description. For the Subject Phenotypes, it would be data relevant to the individual person. For the Sample Attributes, it would be data relevant to the sample derived from the person. For instance, do not list the RACE variable in the Sample Attributes, since RACE is stable for a person across samples. However, for variables like TREATMENT, if the person was only treated once, and data was collected, then TREATMENT could belong in the Subject Phenotypes table. However, if TREATMENT was completed multiple times, and each time a sample was extracted, then it would be better for TREATMENT to be tracked in the Sample Attributes table.
The NCBI BioSample database contains descriptions of biological source materials used in experimental assays. Each of your samples will be assigned a BioSample accession number and will thus be searchable through BioSample. The first three variables below must be included to provide meaningful data for each sample's BioSample entry. HISTOLOGICAL_TYPE should only be included if applicable.
Most institutes request all data pertinent to the interpretation of genomic data, such as clinical information, exposure data, and relevant metadata pertaining to the sample. Please note that the template (6a_SampleAttributes_DS.txt) provided is based on a cancer study and the variables listed may be useful for cancer studies. However, if your study is not a cancer study, please do not include the cancer variables. Instead, submit additional sample attribute variables that will provide a greater understanding of the study. For example: sample collection date, sample extraction method and date; batch and center effects, sample plate or well number; sample run date, sample QA results; and sample affection status (e.g., psoriatic skin sample vs. non-psoriatic skin sample from a case subject who has psoriasis). Relevant dates (e.g., sample collection date) that are directly tied to a person should be written as years or normalized to a set point in time. Do not include months and days directly tied to the person, which are considered HIPAA sensitive. The algorithm dbGaP uses to find HIPAA sensitive dates is described under HIPAA.
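The two acceptable ways of writing person-linked dates, as a year only or as an offset from a set point in time, can be sketched as below. The choice of reference point (for example, the date of enrollment) is an assumption for illustration, and the function names are hypothetical.

```python
from datetime import date


def year_only(event: date) -> int:
    """Keep only the year, dropping the month and day."""
    return event.year


def days_since(reference: date, event: date) -> int:
    """Express a date as the number of days after a per-subject reference date."""
    return (event - reference).days
```

For instance, a sample collected on 2015-03-14 from a subject enrolled on 2015-01-01 would be recorded either as 2015 or as day 72 after enrollment, never as the full calendar date.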
Review the descriptions of variables in the APPENDIX for specific instructions on labeling header columns and file-naming conventions. Also read the Glossary for definitions of variables. To see the QC checks that dbGaP completes for each study, see section "What happens once I submit my core data files and phenotype files?".
Any document that describes study methods and data collection (e.g., protocols, questionnaires, manuals of procedures and operations, consents) should be submitted and can be published on the public dbGaP page. The preferred file format is PDF, though Word and Excel documents will be accepted. Please submit tabular images in Excel.