Introduction to Genomic Data Repositories and Data Analysis Resources


The goals of this face-to-face course are: 1) to familiarize participants with genomics/genetics data repositories commonly used by biomedical researchers; 2) to explain why medical library patrons (basic and translational researchers) use these data repositories as part of their research toolkit; 3) to introduce the genomic/genetic research data lifecycle; and 4) to help participants become comfortable searching within these repositories and recommending bioinformatics resources to their patrons for further exploration of the data.

The lecture portions of the course will 1) provide an introduction to genomics for non-scientists with a focus on gene expression and genetic variation; 2) introduce data repositories and curated databases such as the Gene Expression Omnibus (GEO), the Sequence Read Archive, ClinVar, and cBioPortal; and 3) identify resources for basic analysis of genomic data, including GEO2R and DAVID. The hands-on exercises combine searching the GEO repository with analysis in GEO2R and DAVID, forming a complete, real-world workflow from data discovery to insight.

Learning Objectives

1) Understand basic genomic and genetic concepts, experimental methods, and the data lifecycle.

2) Assist patrons in finding appropriate genomic and genetic databases, curated data, and analysis resources for their specific needs.

3) Understand how genomic data is organized in the Gene Expression Omnibus.

4) Perform a search for a dataset of interest in GEO.

5) Use GEO2R to identify an interesting set of genes.

6) Understand how resources such as DAVID and the Gene Ontology provide insight into a gene list.

7) Identify appropriate and comfortable levels of bioinformatics support within libraries.


8:00-8:15 Introduction and Overview of Course

8:15-9:30 Lecture: Introduction to genomics for non-scientists with a focus on gene expression, genetic variation, methods for measuring these phenomena, the genomic data lifecycle, and explanations of why basic and translational researchers use these methods to address their research questions.

9:30-9:45 Break

9:45-11:00 Lecture: Introduction to publicly available genomic data repositories and curated genomic/genetic data, including the Gene Expression Omnibus (GEO), the Sequence Read Archive, The Cancer Genome Atlas, COSMIC, ClinVar, and cBioPortal.

11:00-12:00 Hands-on exercise: step-by-step guide to searching GEO and using facets (keywords, controlled vocabularies) to identify relevant datasets.

12:00-1:30 Lunch and Networking

1:30-2:30 Lecture: Continuing the genomic data lifecycle, with background on transforming raw data into results. Hands-on exercise: step-by-step analysis of your GEO dataset, from raw data to a list of interesting genes, using GEO2R.

2:45-3:00 Break

3:00-3:45 Hands-on exercise: Using DAVID to make sense of a list of genes. Step-by-step instructions for taking the list of genes you identified with GEO2R and placing it in the context of annotations from the Gene Ontology and other curated annotations (pathways, protein domains, etc.) using the NIAID/NIH-sponsored Database for Annotation, Visualization and Integrated Discovery (DAVID).

3:45-4:00 Course evaluation

Facility Requirements

This course requires computers with internet access for all participants and the instructor, preferably over a wired connection rather than wireless. The computers should have Firefox installed. A whiteboard with markers would also be helpful.

