SciencesPo Intro To Programming 2024
11 October, 2024
We are now ready to combine some of the commands we learned.
You will see that here is where the real power lies.
Let’s navigate into our exercise data folder first.
$ cd ~/shell-lesson-data/exercise-data/proteins
$ ls
cubane.pdb  ethane.pdb  methane.pdb  octane.pdb  pentane.pdb  propane.pdb

`wc` is the word count command. Run on one of these files it reports, e.g.: 29 lines, 156 words, 1158 characters.
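As a quick sketch (using a made-up throwaway file, not the lesson data), `wc` can be tried like this:

```shell
# Create a small throwaway file (hypothetical example data).
printf 'one line\nand a second line\n' > demo.txt

# wc reports lines, words, and characters:
wc demo.txt        # → 2 lines, 6 words, 27 characters
# wc -l reports only the line count:
wc -l demo.txt
```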
You can redirect the output of `wc` to a file instead with `>`, e.g. into lengths.txt.
`cat` prints the entire file to the screen.
`tail` prints only the end.
`head` prints only the beginning.
`less` lets you scroll and read (arrow keys up/down, or j (down) and k (up); q exits).
`echo` prints text, by default to the screen.

Challenge
Consider the file shell-lesson-data/exercise-data/animal-counts/animals.csv. What is the result of the following?
Solution
Option 3 is correct.
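The commands introduced above (`echo`, `>`, `cat`, `head`, `tail`) can be sketched together on a throwaway file (hypothetical file name, not the lesson data):

```shell
# echo prints its arguments; > redirects output into a file instead of the screen.
echo "first"  > notes.txt
echo "second" >> notes.txt   # >> appends instead of overwriting
echo "third"  >> notes.txt

cat notes.txt        # prints the entire file
head -n 1 notes.txt  # only the beginning: first
tail -n 1 notes.txt  # only the end: third
```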
`sort` reads a file and prints its sorted content to the screen (here sorted numerically with -n):
9 methane.pdb
12 ethane.pdb
15 propane.pdb
20 cubane.pdb
21 pentane.pdb
30 octane.pdb
107 total
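The -n flag matters: by default `sort` compares text character by character, so 10 sorts before 2. A toy illustration (made-up file, not the lesson data):

```shell
printf '10\n2\n1\n' > nums.txt
sort nums.txt     # lexicographic order: 1, 10, 2
sort -n nums.txt  # numeric order:       1, 2, 10
```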
`|` is the pipe. It takes the output of one command and gives it as input to another command. For example, sorting lengths.txt numerically and keeping only the first line prints the shortest file:
$ sort -n lengths.txt | head -n 1
9 methane.pdb
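A sketch of the same pattern on throwaway files (hypothetical names standing in for the .pdb files):

```shell
# Three small files of different lengths.
printf 'a\n' > one.txt
printf 'a\nb\n' > two.txt
printf 'a\nb\nc\n' > three.txt

# Without a pipe: go through an intermediate file.
wc -l one.txt two.txt three.txt > lengths.txt
sort -n lengths.txt | head -n 1   # the file with the fewest lines

# With pipes: no intermediate file needed.
wc -l one.txt two.txt three.txt | sort -n | head -n 1
```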
Challenge: Pipe Dreams
(working in ~/shell-lesson-data/exercise-data/proteins)
Which of the following commands shows us the 3 files with the least number of lines in the current directory? Build the pipeline up from left to right to check!
Option 1: wc -l * > sort -n > head -n 3
Option 2: wc -l * | sort -n | head -n 1-3
Option 3: wc -l * | sort -n | tail -n 4 | head -n 3
Option 4: wc -l * | sort -n | head -n 3

Solution
Option 4 is correct. Option 3 instead finds the 3 files with the most lines.
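This can be checked on toy files (hypothetical names; with four files the same `tail -n 4` happens to line up, keeping the four largest entries including the total line):

```shell
# Four toy files with 1 to 4 lines each.
printf 'x\n' > a.txt
printf 'x\nx\n' > b.txt
printf 'x\nx\nx\n' > c.txt
printf 'x\nx\nx\nx\n' > d.txt

# Option 4: the three files with the FEWEST lines (a, b, c).
wc -l a.txt b.txt c.txt d.txt | sort -n | head -n 3
# Option 3: the three files with the MOST lines (b, c, d);
# tail keeps the 4 largest entries, head then drops the trailing 'total' line.
wc -l a.txt b.txt c.txt d.txt | sort -n | tail -n 4 | head -n 3
```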
There is a .csv file here: shell-lesson-data/exercise-data/animal-counts. We can use the `cut` command to get parts (columns) of it.

Challenge: Building a Pipe
`uniq` filters adjacent matching lines in a file.
How can we use `uniq` (and another command?) such that we get the list of unique animal names?
Use the -c flag to `uniq` to get a contingency table.

Solution
$ cut -d , -f 2 animals.csv | sort | uniq
$ cut -d , -f 2 animals.csv | sort | uniq -c

The dataset below contains information on house sales (price, location, type of house, etc.). We call one record a housing transaction.
Using the shell:
Use `wget` to download the data from the link below to your downloads folder (it will be saved as valeursfoncieres-2023.txt):
$ wget https://static.data.gouv.fr/resources/demandes-de-valeurs-foncieres/20240408-125738/valeursfoncieres-2023.txt
Use `wc -l` to count how many rows (lines) there are.
Use `head -n 2` to see the first two rows (the header).
Compute the average of the column Valeur fonciere. You should use the `awk` command like this:
$ awk 'BEGIN{s=0;}{s+=$1;}END{print s/NR;}' your_file.txt
Solution
$ wget https://static.data.gouv.fr/resources/demandes-de-valeurs-foncieres/20240408-125738/valeursfoncieres-2023.txt
$ wc -l valeursfoncieres-2023.txt
$ head -n 2 valeursfoncieres-2023.txt
$ cut -d '|' -f 18 valeursfoncieres-2023.txt | sort | uniq -c | sort -r | head -n 10
$ cut -d '|' -f 11 valeursfoncieres-2023.txt | cut -d , -f 1 | awk 'BEGIN{s=0;}{s+=$1;}END{print s/NR;}'
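The same cut / sort / uniq / awk pattern can be sanity-checked on a tiny '|'-separated toy file (made-up values, not the real dataset, whose field numbers for commune and Valeur fonciere differ):

```shell
# Toy data: date|price|commune, with a French decimal comma in the price.
printf '2023-01-01|100000,50|PARIS\n2023-01-02|200000,00|LYON\n2023-01-03|300000,00|PARIS\n' > toy.txt

# Most frequent commune (field 3):
cut -d '|' -f 3 toy.txt | sort | uniq -c | sort -r | head -n 1   # 2 PARIS

# Average price (field 2), keeping only the part before the decimal comma:
cut -d '|' -f 2 toy.txt | cut -d , -f 1 | awk 'BEGIN{s=0;}{s+=$1;}END{print s/NR;}'   # 200000
```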