
Open your terminal

Explore project_dir

cd Desktop
git clone https://github.com/sauuyer/unix-shell-working-with-data
cd unix-shell-working-with-data
ls
cd data/mimic...  # press Tab to autocomplete the mimic folder name

Explore the mimic data files

ls
cat LICENSE.txt
nano LICENSE.txt
cat labevents.csv
Wow, this is a long readout! Press Control+C to stop it if necessary. How might we learn about the contents of CSV files more effectively?
wc transfers.csv
man wc
wc -l transfers.csv
head labevents.csv
head -n1 labevents.csv
tail -n3 labevents.csv
head -n1 *.csv
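
To see what these commands report without scrolling through a large file, here is a minimal sketch on a tiny scratch CSV. The file name `sample.csv`, its columns, and its values are invented for illustration and are not part of the mimic data.

```shell
# Hypothetical scratch file; column names and values are made up.
workdir="$(mktemp -d)"
printf 'row_id,subject_id,value\n1,100,7.4\n2,100,7.1\n3,101,6.9\n' > "$workdir/sample.csv"

# Count lines only (1 header line + 3 data rows).
# $(( ... )) strips the padding some wc implementations add.
line_count=$(( $(wc -l < "$workdir/sample.csv") ))
echo "lines: $line_count"

# The first line of a CSV is usually the column header.
header="$(head -n1 "$workdir/sample.csv")"
echo "header: $header"

# The last line shows what a data row looks like.
last_row="$(tail -n1 "$workdir/sample.csv")"
echo "last row: $last_row"
```

Subtracting one from the line count gives the number of data rows, assuming the file has exactly one header line.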

Let's start saving our data summary information

mkdir data_summaries
head -n1 *.csv > data_summaries/csv_col_headers.txt
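
When head is given more than one file, it prints a banner line naming each file before that file's output, so the saved summary stays readable. A small sketch of what ends up in the text file, using two made-up CSVs (`a.csv` and `b.csv` are placeholders, not mimic tables):

```shell
# Hypothetical scratch directory with two tiny CSV files.
workdir="$(mktemp -d)"
printf 'row_id,subject_id\n1,100\n' > "$workdir/a.csv"
printf 'row_id,hadm_id\n1,200\n' > "$workdir/b.csv"
mkdir "$workdir/data_summaries"

# Run in a subshell so the current directory of your session is unchanged.
( cd "$workdir" && head -n1 *.csv > data_summaries/csv_col_headers.txt )

# Each file's header row appears under a "==> name <==" banner.
cat "$workdir/data_summaries/csv_col_headers.txt"

# Count the banners: one per CSV file.
banner_count="$(grep -c '^==> ' "$workdir/data_summaries/csv_col_headers.txt")"
echo "files summarized: $banner_count"
```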

Which csv files contain subject_id information?

grep "subject_id" *.csv
grep -l "subject_id" *.csv
Save this list to a text file in your data_summaries folder:
grep -l "subject_id" *.csv > data_summaries/subject_related_tables_list.txt
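
The difference between the two grep calls is easier to see on small input. In this sketch, the two demo files and their columns are invented for illustration: plain grep prints every matching line prefixed with its file name, while `-l` prints only the names of files that contain a match.

```shell
# Hypothetical scratch files; only one of them has a subject_id column.
workdir="$(mktemp -d)"
printf 'row_id,subject_id\n1,100\n' > "$workdir/patients_demo.csv"
printf 'row_id,itemid\n1,500\n' > "$workdir/items_demo.csv"

# Matching lines, each prefixed with "filename:".
matching_lines="$(cd "$workdir" && grep "subject_id" *.csv)"
echo "$matching_lines"

# File names only.
matching_files="$(cd "$workdir" && grep -l "subject_id" *.csv)"
echo "$matching_files"
```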

Create a variable containing all of the names of the csv files that contain subject_id information

subjects_list="$(grep -l 'subject_id' *.csv)"
echo "$subjects_list"
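
The double quotes around the variable matter when printing it. A minimal sketch, assuming bash and a made-up list of file names: quoted, the variable keeps one name per line; unquoted, bash splits it into words and echo joins them with spaces on a single line.

```shell
# Hypothetical list of file names, one per line.
files="$(printf 'a.csv\nb.csv\nc.csv\n')"

# Quoted: newlines are preserved, so wc counts three lines.
quoted_lines=$(( $(echo "$files" | wc -l) ))

# Unquoted (bash): the names collapse onto one line.
unquoted_lines=$(( $(echo $files | wc -l) ))

echo "quoted: $quoted_lines line(s), unquoted: $unquoted_lines line(s)"
```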

How can we return each item (file name) saved in our variable using a for loop?

for thing in $subjects_list; do echo "this file contains a subject id: $thing"; done
for i in $subjects_list; do head -n2 $i; done
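
The for loops above rely on the shell splitting the unquoted variable on whitespace. That works in bash for simple file names, but it breaks on names containing spaces, and zsh does not split unquoted variables by default. One possible alternative, sketched here with invented file names, reads the list line by line instead:

```shell
# Hypothetical list; the second name contains a space on purpose.
subjects_list_demo="$(printf 'admissions.csv\nlab events.csv\n')"

count=0
# IFS= and -r keep each line exactly as written.
while IFS= read -r name; do
  echo "this file contains a subject id: $name"
  count=$((count + 1))
done <<EOF
$subjects_list_demo
EOF

echo "files listed: $count"
```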

Let's save all of the commands we have run so far in this session

history
history > data_summaries/history.txt

In microbiologyevents.csv, find all cases related to the Enterococcus bacteria

awk '/ENTEROCOCCUS/' microbiologyevents.csv | head
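
The pattern above matches the word anywhere on a line. If you only want matches in one column, awk can split each line on commas and test a single field. This is a sketch on invented data: the column name `org_name` and its position (third) are assumptions for the demo file, so check the real header with head -n1 before choosing a field number.

```shell
# Hypothetical scratch file; columns and values are made up.
workdir="$(mktemp -d)"
printf 'row_id,subject_id,org_name\n1,100,ENTEROCOCCUS SP.\n2,101,ESCHERICHIA COLI\n3,102,ENTEROCOCCUS FAECIUM\n' > "$workdir/micro_demo.csv"

# -F, splits on commas; $3 ~ /.../ tests only the third field.
awk -F, '$3 ~ /ENTEROCOCCUS/' "$workdir/micro_demo.csv"

# Count the matching rows (n + 0 prints 0 when nothing matches).
match_count="$(awk -F, '$3 ~ /ENTEROCOCCUS/ { n++ } END { print n + 0 }' "$workdir/micro_demo.csv")"
echo "matching rows: $match_count"
```

Splitting on every comma is only reliable when no field contains a quoted comma; for files that do, a CSV-aware tool is a safer choice.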

How can we combine all lab events with each subject's data from the patients.csv file?

join -1 2 -2 2 -t , patients.csv labevents.csv > data_summaries/merged_patients_labs.csv
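
join expects both inputs to be sorted on the join field, so it is worth sorting first if you are not sure of the row order. A minimal sketch with invented, header-free demo files (the file names, columns, and values are placeholders, not the mimic tables):

```shell
# Hypothetical inputs; field 2 plays the role of subject_id in both files.
workdir="$(mktemp -d)"
printf '1,101,F\n2,100,M\n' > "$workdir/patients_demo.csv"
printf '1,100,7.4\n2,101,6.9\n3,100,7.1\n' > "$workdir/labs_demo.csv"

# Sort both files on field 2, using the same locale that join will use.
LC_ALL=C sort -t , -k 2,2 "$workdir/patients_demo.csv" > "$workdir/patients_sorted.csv"
LC_ALL=C sort -t , -k 2,2 "$workdir/labs_demo.csv" > "$workdir/labs_sorted.csv"

# Join on field 2 of each file; the join field is printed first.
LC_ALL=C join -1 2 -2 2 -t , "$workdir/patients_sorted.csv" "$workdir/labs_sorted.csv" > "$workdir/merged_demo.csv"
cat "$workdir/merged_demo.csv"

merged_rows=$(( $(wc -l < "$workdir/merged_demo.csv") ))
first_row="$(head -n1 "$workdir/merged_demo.csv")"
echo "merged rows: $merged_rows"
```

Real CSV files usually start with a header row, which sorting would move out of place; one option is to set the header aside before sorting and add it back to the merged file afterwards.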