SciencesPo Intro To Programming 2024
11 October, 2024
We are now ready to combine some of the commands we learned.
You will see that here is where the real power lies.
Let’s navigate into our exercise data folder first.
$ cd ~/shell-lesson-data/exercise-data/proteins
$ ls
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
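wc is the word count command. For a file it reports, for example: 29 lines, 156 words, 1158 characters.
The output of wc can be sent to a file instead of the screen with >, for example into lengths.txt.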
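A minimal sketch of counting lines and redirecting the result. It assumes a scratch directory with three small stand-in files; the file names and contents are made up for illustration and are not the lesson's real .pdb data.

```shell
# Work in a scratch directory so the lesson data is not touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Three tiny stand-in files (hypothetical, not the real .pdb files).
printf 'a\n' > one.pdb
printf 'a\nb\n' > two.pdb
printf 'a\nb\nc\n' > three.pdb

# wc -l counts lines only; > sends the result to lengths.txt
# instead of printing it to the screen.
wc -l *.pdb > lengths.txt

# Nothing was printed by the command above; the counts are in the file.
cat lengths.txt
```

Note that > silently overwrites lengths.txt if it already exists.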
cat prints the entire file to the screen.
tail prints only the end.
head prints only the beginning.
less lets you scroll and read (arrow keys up/down, or j for down and k for up; q exits).
echo
The echo command prints text, by default to the screen.
Challenge
Consider the file shell-lesson-data/exercise-data/animal-counts/animals.csv. What is the result of this:
Solution
Option 3 is correct.
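As a small sketch of echo together with redirection and the file viewers introduced earlier: the file name notes.txt is hypothetical, and the >> append operator is not described in the text above, so treat it as an addition for illustration.

```shell
# Work in a scratch directory so the lesson data is not touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# By default echo prints its arguments to the screen.
echo "hello shell"

# > creates (or overwrites) a file with the text.
echo "first line" > notes.txt

# >> appends to the file instead of overwriting it.
echo "second line" >> notes.txt

# cat shows everything, head the beginning, tail the end.
cat notes.txt
first=$(head -n 1 notes.txt)
last=$(tail -n 1 notes.txt)
echo "head: $first"
echo "tail: $last"
```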
sort
sort reads a file and prints its sorted content to the screen:
9 methane.pdb
12 ethane.pdb
15 propane.pdb
20 cubane.pdb
21 pentane.pdb
30 octane.pdb
107 total
The first two lines of the sorted output:
9 methane.pdb
12 ethane.pdb
| is the pipe. It takes the output of one command and gives it to another command as input:
9 methane.pdb
With pipes we can skip the intermediate file lengths.txt entirely:
9 methane.pdb
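The two routes to the shortest file can be sketched as follows. The three counts are taken from the output above; the file name sorted-lengths.txt is an assumption made for this example.

```shell
# Work in a scratch directory so the lesson data is not touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# A small lengths.txt with three of the counts shown above, unsorted.
printf '20 cubane.pdb\n9 methane.pdb\n12 ethane.pdb\n' > lengths.txt

# Without -n, sort compares text, so 12 and 20 come before 9.
sort lengths.txt

# Route 1: numeric sort into an intermediate file, then look at its first line.
sort -n lengths.txt > sorted-lengths.txt
head -n 1 sorted-lengths.txt

# Route 2: the same result with a pipe and no intermediate file.
shortest=$(sort -n lengths.txt | head -n 1)
echo "$shortest"
```

The pipe version avoids leaving sorted-lengths.txt behind, which is the main practical reason to prefer it.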
Pipe Dreams
Working in ~/shell-lesson-data/exercise-data/proteins: which of the following commands shows us the 3 files with the fewest lines in the current directory? Build the pipeline up from left to right to check!
1. wc -l * > sort -n > head -n 3
2. wc -l * | sort -n | head -n 1-3
3. wc -l * | sort -n | tail -n 4 | head -n 3
4. wc -l * | sort -n | head -n 3
Solution
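Option 4 is correct. Option 3 finds the three files with the most lines. Option 1 uses >, which redirects output into files named sort and head instead of running those commands. Option 2 fails because 1-3 is not a valid line count for head -n.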
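To check options 3 and 4 without touching the lesson data, here is a sketch on four made-up files with 1 to 4 lines each (the file names are hypothetical):

```shell
# Work in a scratch directory that contains only the demo files.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Four stand-in files with 1, 2, 3 and 4 lines.
printf '1\n' > a.txt
printf '1\n2\n' > b.txt
printf '1\n2\n3\n' > c.txt
printf '1\n2\n3\n4\n' > d.txt

# Option 4: sort ascending by line count, keep the first three rows.
fewest=$(wc -l * | sort -n | head -n 3)
echo "$fewest"

# Option 3: the last four rows include the "total" row, which sorts last;
# head -n 3 then drops it, leaving the three files with the most lines.
most=$(wc -l * | sort -n | tail -n 4 | head -n 3)
echo "$most"
```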
There is a .csv file here: shell-lesson-data/exercise-data/animal-counts
Use the cut command to get parts of it.
Building a Pipe
uniq filters adjacent matching lines in a file.
How can you use uniq (and another command?) such that we get the list of unique animal names?
Add the -c flag to uniq to get a contingency table.
Solution
cut -d , -f 2 animals.csv | sort | uniq
cut -d , -f 2 animals.csv | sort | uniq -c
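The solution can be tried on a small stand-in file. The rows below are invented for illustration; the only assumption carried over from the solution is that the animal name sits in the second comma-separated field.

```shell
# Work in a scratch directory so the lesson data is not touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Hypothetical rows: date, animal, count.
printf '2012-11-05,deer,5\n2012-11-05,rabbit,22\n2012-11-06,deer,2\n2012-11-06,fox,4\n' > sample.csv

# Keep only field 2, using the comma as the delimiter.
cut -d , -f 2 sample.csv

# uniq only collapses adjacent duplicates, so the names are sorted first.
names=$(cut -d , -f 2 sample.csv | sort | uniq)
echo "$names"

# -c prefixes each name with the number of times it occurs.
counts=$(cut -d , -f 2 sample.csv | sort | uniq -c)
echo "$counts"
```

Without the sort step, the two deer rows would not be adjacent and uniq would report deer twice.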
The dataset below contains information on house sales (price, location, type of house, etc.). We call one record a housing transaction.
Using the shell:
Use wget to download the data to your downloads folder. Without further options the file keeps its name, valeursfoncieres-2023.txt: wget https://static.data.gouv.fr/resources/demandes-de-valeurs-foncieres/20240408-125738/valeursfoncieres-2023.txt
Use wc -l to count how many rows (lines) there are.
Use head -n 2 to see the first two rows (including the header).
Count how often each value of column 18 occurs and show the 10 most frequent ones.
Compute the average of the column Valeur fonciere. You should use the awk command like this: awk 'BEGIN{s=0;}{s+=$1;}END{print s/(NR);}' your_file.txt
Solution
wget https://static.data.gouv.fr/resources/demandes-de-valeurs-foncieres/20240408-125738/valeursfoncieres-2023.txt
wc -l valeursfoncieres-2023.txt
head -n 2 valeursfoncieres-2023.txt
cut -d '|' -f 18 valeursfoncieres-2023.txt | sort | uniq -c | sort -rn | head -n 10
cut -d '|' -f 11 valeursfoncieres-2023.txt | cut -d , -f 1 | awk 'BEGIN{s=0;}{s+=$1;}END{print s/(NR);}'
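Because the real file is large, the last two pipelines can first be tried on a tiny stand-in. The three-column layout and the values below are assumptions made for this sketch; the real file uses fields 11 and 18 as in the commands above.

```shell
# Work in a scratch directory so no real data is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Hypothetical records: id | price with a decimal comma | town. No header row.
printf '1|100,00|A\n2|200,00|B\n3|300,00|A\n' > sample.txt

# Most frequent town: count each value, sort the counts numerically, largest first.
top=$(cut -d '|' -f 3 sample.txt | sort | uniq -c | sort -rn | head -n 1)
echo "$top"

# Average price: take field 2, keep the part before the decimal comma,
# then let awk sum the values and divide by the number of rows (NR).
average=$(cut -d '|' -f 2 sample.txt | cut -d , -f 1 | awk '{ s += $1 } END { print s / NR }')
echo "$average"
```

On a file with a header row or empty price fields, NR also counts those rows while they add nothing to the sum, so the result is somewhat lower than the true mean; skipping them would need an extra filter.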