Clean & Tidy Data: Getting Started with Spreadsheet Data

Date of Live Event: Thursday, January 20, 1:00 p.m.–2:30 p.m., central time. Recording available soon after.


Data analysis and manipulation usually begins with a spreadsheet. If you work with your own data or advise researchers on theirs, you’ll want to get started right. In this webinar, you’ll learn how to prepare your spreadsheet and format your data to serve later analyses. You’ll learn best practices for curating data, identifying and addressing common data problems, and preparing data for analysis and use. And you’ll learn when spreadsheets may not be the best place to start. Presenters will demonstrate best practices using medical data from a hypothetical pharmacokinetics study.

This webinar is a companion to Clean & Tidy Data: Making Data Usable. Each course stands alone, and the two complement each other. Making Data Usable will show you how to identify the steps needed to clean and normalize data.

Special Note: This webinar is approved for the “under construction” Advanced Level of the Data Services Specialization. A Basic Level Data Services Specialization Certification is currently available.

Learning Outcomes

At the end of the webinar, participants will be able to:

  • Identify when spreadsheets are useful and when they are not

  • Assess when a task should not be done in spreadsheet software

  • Identify the features of a clean and tidy dataset

  • Identify common data problems

Audience

Medical librarians and other health information professionals who provide or plan to provide data services. Familiarity with spreadsheets is helpful.

Presenters

Anne M. Brown is an Assistant Professor in Data Services, University Libraries, at Virginia Tech and an affiliate faculty member in the Department of Biochemistry and the Academy of Integrated Science. She is the author or co-author of a number of publications and presentations on data-related and data literacy topics.

Daniel Chen is a graduate student in Genetics, Bioinformatics, and Computational Biology at Virginia Tech. His research is focused on data science education and pedagogy in the medical and biomedical sciences. He is the author of Pandas for Everyone: Python Data Analysis and a number of other data science learning materials.

Registration Information

  • Length: 1.5-hour webinar
  • Date/Time: Thursday, January 20, 1:00 p.m.–2:30 p.m., central time
  • Technical information: After you have registered, go to My Learning in MEDLIB-ED to access the live webinar, resources, evaluation, and certificate.
  • Register, participate, and earn 1.5 MLA continuing education (CE) contact hours.

Site License Information

To offer this webinar to a group, consider a site license: