Last modified: 04 Jan 2021
No matter how much of the analysis is automated, some manual steps are inevitably involved
Cartoon by Sidney Harris (The New Yorker)
Patient ID | Sex | Date of Diagnosis | Tumour Size |
---|---|---|---|
1 | M | 01-01-2013 | 3.1 |
2 | f | 04-18-1998 | 1.5 |
3 | Male | 1st of April 2004 | 105 |
4 | Female | NA | 67 |
5 | F | 2010/03/12 | 4.2 |
6 | F | | 3.6 |
7 | M | 1994-11-05T08:15:30-05:00 | 232 |
credit: @myusuf3
Patient ID | Sex | Date of Diagnosis | Tumour Size |
---|---|---|---|
001 | M | 2013-01-01 | 3.1 |
002 | F | 1998-04-18 | 1.5 |
003 | M | 2004-04-01 | 1.05 |
004 | F | NA | 0.67 |
005 | F | 2010-03-12 | 4.2 |
006 | F | NA | 3.6 |
007 | M | 1994-11-05 | 2.32 |
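A minimal sketch of how the Patient ID, Sex and Date of Diagnosis columns above could be brought into the consistent form shown in the second table. The helper names (`clean_id`, `clean_sex`, `clean_date`), the sex mapping and the list of date layouts are illustrative assumptions, not part of the original material. In particular, reading `01-01-2013` as month-first is a guess based on `04-18-1998`. Tumour Size is deliberately left alone, because deciding that `105` means `1.05` needs knowledge of the units, not just a parsing rule.

```python
import re
from datetime import datetime

# Assumed mapping from the spellings seen in the messy table to one code each.
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

# Assumed date layouts, tried in order; month-first is a guess (see 04-18-1998).
DATE_FORMATS = ("%Y-%m-%d", "%m-%d-%Y", "%Y/%m/%d", "%d %B %Y")


def clean_id(raw):
    """Pad the patient ID with leading zeros to three characters."""
    return raw.strip().zfill(3)


def clean_sex(raw):
    """Return 'M' or 'F', or 'NA' when the spelling is not recognised."""
    return SEX_CODES.get(raw.strip().lower(), "NA")


def clean_date(raw):
    """Return the date as YYYY-MM-DD, or 'NA' when it cannot be parsed."""
    text = raw.strip()
    if text in ("", "NA"):
        return "NA"
    # Full timestamps such as 1994-11-05T08:15:30-05:00: keep the date part only.
    if re.match(r"\d{4}-\d{2}-\d{2}T", text):
        text = text[:10]
    # "1st of April 2004" -> "1 April 2004"
    text = re.sub(r"(\d+)(?:st|nd|rd|th)\b", r"\1", text).replace(" of ", " ")
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return "NA"


messy = [
    ("1", "M", "01-01-2013"),
    ("2", "f", "04-18-1998"),
    ("3", "Male", "1st of April 2004"),
    ("4", "Female", "NA"),
    ("5", "F", "2010/03/12"),
    ("6", "F", ""),
    ("7", "M", "1994-11-05T08:15:30-05:00"),
]

tidy = [(clean_id(i), clean_sex(s), clean_date(d)) for i, s, d in messy]
for row in tidy:
    print(" | ".join(row))
```

Anything the rules do not recognise comes back as NA instead of being guessed, so the remaining cases can still be checked by hand.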
Figure showing locations of visitors to my Prostate Cancer data portal
NA is OK, but what if NA is a valid category in your data?
Analysis software can recognise NA as a missing value and ignore it in calculations
Patient ID | Date | Value |
---|---|---|
1 | 2015-06-14 | 213 |
2 | | 76.5 |
3 | 2015-06-18 | 32 |
4 | | 120.3 |
5 | | 109 |
6 | 2015-06-20 | |
7 | | 143 |
Fill in all the cells
Patient ID | Date | Value |
---|---|---|
1 | 2015-06-14 | 213 |
2 | 2015-06-14 | 76.5 |
3 | 2015-06-18 | 32 |
4 | 2015-06-18 | 120.3 |
5 | 2015-06-18 | 109 |
6 | 2015-06-20 | NA |
7 | 2015-06-20 | 143 |
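The filled-in table can be sketched in a few lines, assuming that a blank Date means "same as the row above" (which is what the completed table implies) and that a blank Value is genuinely missing. The function name `fill_cells` and the use of an empty string for a blank cell are illustrative choices. The last lines show one way NA can then be skipped in a calculation.

```python
from statistics import mean

# Rows as (patient ID, date, value); "" stands for an empty cell.
rows = [
    ("1", "2015-06-14", "213"),
    ("2", "", "76.5"),
    ("3", "2015-06-18", "32"),
    ("4", "", "120.3"),
    ("5", "", "109"),
    ("6", "2015-06-20", ""),
    ("7", "", "143"),
]


def fill_cells(rows):
    """Carry the last seen date down and write NA for missing values."""
    filled = []
    last_date = "NA"  # used only if the very first date is blank
    for patient_id, date, value in rows:
        if date:
            last_date = date
        filled.append((patient_id, last_date, value or "NA"))
    return filled


filled = fill_cells(rows)
for row in filled:
    print(" | ".join(row))

# NA marks a missing value, so leave it out of the calculation.
values = [float(v) for _, _, v in filled if v != "NA"]
mean_value = mean(values)
print(f"Mean of {len(values)} recorded values: {mean_value:.1f}")
```

Carrying the date down is only safe when blank really does mean "as above". If a date was simply never recorded, NA is the honest entry.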
Make it a rectangle
Computer doesn’t recognize it!
Mac
Windows
Open patient-data.csv in Excel, or equivalent software