Last modified: 04 Jan 2021
No matter how much of the analysis is automated, some manual steps are inevitably involved
Cartoon by Sidney Harris (The New Yorker)
| Patient ID | Sex | Date of Diagnosis | Tumour Size |
|---|---|---|---|
| 1 | M | 01-01-2013 | 3.1 |
| 2 | f | 04-18-1998 | 1.5 |
| 3 | Male | 1st of April 2004 | 105 |
| 4 | Female | NA | 67 |
| 5 | F | 2010/03/12 | 4.2 |
| 6 | F | | 3.6 |
| 7 | M | 1994-11-05T08:15:30-05:00 | 232 |
credit: @myusuf3
| Patient ID | Sex | Date of Diagnosis | Tumour Size |
|---|---|---|---|
| 001 | M | 2013-01-01 | 3.1 |
| 002 | F | 1998-04-18 | 1.5 |
| 003 | M | 2004-04-01 | 1.05 |
| 004 | F | NA | 0.67 |
| 005 | F | 2010-03-12 | 4.2 |
| 006 | F | NA | 3.6 |
| 007 | M | 1994-11-05 | 2.32 |
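The clean-up shown in the two tables above could be partly scripted. The following is a minimal sketch using only Python's standard library; the helper names (`normalise_id`, `normalise_sex`, `normalise_date`) are hypothetical, and reading `01-01-2013` as month-day-year is an assumption suggested by `04-18-1998`. Tumour sizes are deliberately left alone, since deciding whether `105` should become `1.05` needs a human judgement about units.

```python
import re
from datetime import datetime

# Sex spellings seen in the messy table, mapped to one consistent code
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

# Date layouts seen in the messy table. Reading "01-01-2013" as
# month-day-year is an assumption, suggested by "04-18-1998".
# Note: %B (month name) depends on the locale; English is assumed here.
DATE_FORMATS = ("%m-%d-%Y", "%Y/%m/%d", "%Y-%m-%d", "%d of %B %Y")


def normalise_id(value):
    """Zero-pad a numeric patient ID to three digits, e.g. 1 -> 001."""
    return f"{int(value):03d}"


def normalise_sex(value):
    """Map the spellings above to M or F; raise KeyError on anything else."""
    return SEX_CODES[value.strip().lower()]


def normalise_date(value):
    """Return the date as YYYY-MM-DD, or "NA" when the cell is blank or NA."""
    text = value.strip()
    if text in ("", "NA"):
        return "NA"
    # Keep only the date part of a full timestamp such as 1994-11-05T08:15:30-05:00
    if re.match(r"\d{4}-\d{2}-\d{2}T", text):
        text = text[:10]
    # Drop ordinal suffixes: "1st of April 2004" -> "1 of April 2004"
    text = re.sub(r"\b(\d{1,2})(st|nd|rd|th)\b", r"\1", text)
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    # Unknown layouts are reported rather than silently guessed
    raise ValueError(f"unrecognised date: {value!r}")
```

Raising an error for an unrecognised date, rather than returning `NA`, keeps the manual step visible: a person still has to look at anything the script cannot parse.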
Figure showing locations of visitors to my Prostate Cancer data portal
NA is OK, but what if NA is a valid category in your data?
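One way to approach this question is to state explicitly which tokens mean "missing" when the file is read, instead of relying on a default. The sketch below uses Python's standard `csv` module; the `read_rows` helper, the column names, and the sample data (where `NA` is assumed to be a country code) are illustrative and not part of the original material.

```python
import csv
import io


def read_rows(text, missing_tokens):
    """Parse CSV text; cells found in missing_tokens become None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (None if cell in missing_tokens else cell) for key, cell in row.items()}
        for row in reader
    ]


# Hypothetical data: "NA" in row 2 is a real country code, row 3 is truly blank
sample = "visitor,country\n1,GB\n2,NA\n3,\n"

# Treat both blank cells and "NA" as missing
as_missing = read_rows(sample, missing_tokens={"", "NA"})

# Treat only blank cells as missing, so "NA" survives as a category
as_category = read_rows(sample, missing_tokens={""})
```

With the second call, the `NA` country code is kept as data, which is only safe if a different marker (here, the empty cell) is reserved for missing values.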
Software can treat NA as a missing value and ignore it in calculations
| Patient ID | Date | Value |
|---|---|---|
| 1 | 2015-06-14 | 213 |
| 2 | | 76.5 |
| 3 | 2015-06-18 | 32 |
| 4 | | 120.3 |
| 5 | | 109 |
| 6 | 2015-06-20 | |
| 7 | | 143 |
Fill in all the cells
| Patient ID | Date | Value |
|---|---|---|
| 1 | 2015-06-14 | 213 |
| 2 | 2015-06-14 | 76.5 |
| 3 | 2015-06-18 | 32 |
| 4 | 2015-06-18 | 120.3 |
| 5 | 2015-06-18 | 109 |
| 6 | 2015-06-20 | NA |
| 7 | 2015-06-20 | 143 |
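The filled-in table above can be produced mechanically when a blank date means "same as the row above", which is what the example suggests. A minimal sketch, assuming the rows are already loaded as dictionaries; `fill_down` and `mark_missing` are hypothetical helper names.

```python
def fill_down(rows, column):
    """Copy the last non-empty value of `column` into the blank cells below it."""
    filled = []
    last_seen = None
    for row in rows:
        row = dict(row)  # work on a copy so the input is left untouched
        if row[column] == "":
            if last_seen is None:
                raise ValueError(f"first row has no value for {column!r}")
            row[column] = last_seen
        else:
            last_seen = row[column]
        filled.append(row)
    return filled


def mark_missing(rows, column, token="NA"):
    """Replace blank cells in `column` with an explicit missing-value token."""
    return [{**row, column: row[column] or token} for row in rows]


# The partially filled table, with blank cells as empty strings
messy = [
    {"Patient ID": "1", "Date": "2015-06-14", "Value": "213"},
    {"Patient ID": "2", "Date": "", "Value": "76.5"},
    {"Patient ID": "3", "Date": "2015-06-18", "Value": "32"},
    {"Patient ID": "4", "Date": "", "Value": "120.3"},
    {"Patient ID": "5", "Date": "", "Value": "109"},
    {"Patient ID": "6", "Date": "2015-06-20", "Value": ""},
    {"Patient ID": "7", "Date": "", "Value": "143"},
]

tidy = mark_missing(fill_down(messy, "Date"), "Value")
```

Dates are filled down because a blank there is shorthand for a repeated value, while a blank measurement is genuinely unknown and is marked `NA` instead of being copied from a neighbour.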
Make it a rectangle
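Whether a file is rectangular can be checked before analysis. The sketch below, using Python's standard `csv` module, reports every row whose number of fields differs from the header; `ragged_rows` is a hypothetical helper and the sample text is made up for illustration.

```python
import csv
import io


def ragged_rows(text):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return [
        (reader.line_num, len(row))
        for row in reader
        if len(row) != len(header)
    ]


# Hypothetical file contents: the last row is missing one field
sample = "id,date,value\n1,2015-06-14,213\n2,76.5\n"
problems = ragged_rows(sample)
```

An empty result means every row has the same number of columns as the header, so the data form a rectangle.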
Computer doesn’t recognize it!
Mac
Windows
patient-data.csv and open in Excel, or equivalent software