Intro to Generic Programming

In this short lecture we introduce a few core concepts used in programming. We use both R and Python as examples, but the concepts carry over to most languages. The implementation details, i.e. how you invoke a certain concept, differ across languages.

How to install python.
Here is a nice introduction for Python novices.

Setup

Ideally you would run all commands below in both languages. I recommend that you open two terminal windows, one running R and one running Python. For Python we need the numpy package to demonstrate array support. Depending on how you installed Python, there are different options.

Anaconda installation: conda install numpy
Homebrew or download from python.org : pip install numpy

Check whether the installation worked by doing

import numpy as np
# loads the numpy library and gives it
# the short name `np`

Variables

Variables are labels for objects. The objects can be simple numbers or strings, but often also any other sort of object you could think of: a plot, a table, a matrix, a vector, a list, …
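To make the "label" idea concrete, here is a minimal Python sketch (the names `a`, `b` and `c` are purely illustrative). Assigning an existing object to a second variable attaches a second label to the same object rather than copying it.

```python
a = [1, 2]       # `a` labels a list object
b = a            # `b` is a second label for the same object, not a copy
b.append(3)      # modify the object through the label `b`

print(a)         # [1, 2, 3]: the change is visible through `a` as well
print(a is b)    # True: both names refer to one object

c = list(a)      # an explicit copy creates a new, independent object
c.append(4)
print(a)         # still [1, 2, 3]
```

This is Python behaviour. R copies an object when it is modified, so the same experiment in R would leave `a` unchanged.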

What is curious to know about variables is their scoping behaviour: where in our programs can we see which variable? This differs quite importantly across languages and requires some thought.

First, let’s create a variable x which holds the value 12.3:

Python
R

x = 12.3
x + 5

17.3

x <- 12.3  # = works also
x + 5

[1] 17.3

Next, we define a function that uses the variable. We do not provide x as an argument to the function, so which value will it use in each case?

Python
R

def myfun(y):  
    return x + y  # must use `return`
# note the indentation!
# function definition finishes after last line of indented block.

myfun(8)

20.3

myfun <- function(y){
    x + y  # can use `return()`
}
myfun(8)

[1] 20.3

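We see that in both cases the function found the variable x in its enclosing scope, i.e. the environment where the function was defined (here the global environment, which is also where we called it from). This only worked because we had defined x before calling the function. This may or may not work in other languages. In general, looking up variables in the environment where a function was defined is called lexical scoping.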
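A small Python sketch can separate the defining environment from the calling environment. The helper names `add_x` and `caller` are hypothetical and only serve the illustration. `caller` has its own local x, but the function it calls still uses the global one.

```python
x = 12.3  # global variable, visible where `add_x` is defined

def add_x(y):
    # `x` is not a parameter, so Python looks it up where the
    # function was defined: here, the global scope
    return x + y

def caller():
    x = 100  # local to `caller`; `add_x` never sees this value
    return add_x(8)

result = caller()
print(result)  # 20.3, not 108
```

If the lookup used the calling environment instead (dynamic scoping), the result would have been 108.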

Loops

If we have a repetitive task, it’s useful to be able to iterate, i.e. do the same thing to a potentially changing input. Suppose we have the 4 numbers 2, 3, 4, 5 and want to print them to screen. We could of course write 4 nearly identical print statements, each with a different input:

Python
R

print("this is number",2)
print("this is number",3)
print("this is number",4)
print("this is number",5)

print(paste("this is number",2))
print(paste("this is number",3))
print(paste("this is number",4))
print(paste("this is number",5))

But you can see that this is a lot of repetitive code, which we want to avoid. Also, adding an additional number would mean extra work. So, loops are better here:

Python
R

for i in range(2,6) :  # the upper bound 6 is excluded
    print("this is number",i) # note the indentation!

this is number 2
this is number 3
this is number 4
this is number 5

for (i in 2:5){
    print(paste("this is number",i))
}

[1] "this is number 2"
[1] "this is number 3"
[1] "this is number 4"
[1] "this is number 5"
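As a complement to the counting loops above, here is a minimal Python sketch that loops over the items of a list directly. The names `numbers` and `messages` are illustrative. It also checks a detail that is easy to trip over: `range(start, stop)` excludes `stop`, whereas R's `start:stop` includes it.

```python
numbers = [2, 3, 4, 5]

# Loop over the items themselves; no index arithmetic is needed.
messages = []
for n in numbers:
    messages.append(f"this is number {n}")

# range(start, stop) excludes `stop`, so range(2, 6) yields 2, 3, 4, 5.
same_as_range = list(range(2, 6)) == numbers

for m in messages:
    print(m)
```

Adding another number now only means extending `numbers`; the loop body stays untouched.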

Useful Datastructures

python docs on data structures
Article about R datastructures

concept	Python	R
1d list	`[1,2]`	`c(1,2)`
1d vector	`np.array([1,2])`	`c(1,2)`
matrix	`np.array([[1,3],[2,4]])`	`matrix(data,nrow,ncol)`
n-d array	`np.array`	`array`
Dictionary	`dict`	`list`
DataFrame	`pandas.DataFrame`	`data.frame`

li = [1,3]
li + li  # `+` concatenates: plain lists are not a vector space under `+` and `*`

[1, 3, 1, 3]

li = c(1,3)
li * li # element-by-element

[1] 1 9

li + li

[1] 2 6
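For plain Python lists, element-wise arithmetic has to be spelled out by hand. The following sketch (variable names are illustrative) contrasts what `+` and `*` do on lists with an explicit element-wise version built from `zip`.

```python
li = [1, 3]

concatenated = li + li   # joins the two lists end to end
repeated = li * 2        # repeats the list; `li * li` raises a TypeError

# Element-wise versions need an explicit loop or comprehension.
elementwise_sum = [a + b for a, b in zip(li, li)]
elementwise_prod = [a * b for a, b in zip(li, li)]

print(concatenated)      # [1, 3, 1, 3]
print(elementwise_sum)   # [2, 6]
print(elementwise_prod)  # [1, 9]
```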

In Python we use the numpy package for linear algebra:

Python
R

import numpy as np
li = np.array([1,3])
li * li

array([1, 9])

li + li

array([2, 6])

li = c(1,3)
li * li # element-by-element

[1] 1 9

li + li

[1] 2 6

Matrices

Python
R

import numpy as np
ma = np.array([[1,3], [2,4]])
ma * ma

array([[ 1,  9],
       [ 4, 16]])

ma + ma

array([[2, 6],
       [4, 8]])

ma = matrix(c(1,2,3,4),nrow = 2, ncol = 2)
ma * ma # element-by-element

     [,1] [,2]
[1,]    1    9
[2,]    4   16

ma + ma

     [,1] [,2]
[1,]    2    6
[2,]    4    8
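Note that `*` is element-by-element in both languages. As a short sketch of the linear-algebra product, numpy provides the `@` operator (R uses `%*%` for the same purpose). The variable names `elementwise` and `matprod` are illustrative.

```python
import numpy as np

ma = np.array([[1, 3], [2, 4]])

elementwise = ma * ma   # multiplies entry by entry
matprod = ma @ ma       # matrix product: rows times columns

print(elementwise)
print(matprod)
```

For this matrix the first entry of the matrix product is 1*1 + 3*2 = 7, which differs from the element-wise result 1.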

N-D arrays

Python
R

a = np.arange(1,9)
np.reshape(a, (2,2,2))

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

array(1:8,dim = c(2,2,2))

, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    5    7
[2,]    6    8
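The two printouts above arrange the numbers 1 to 8 differently because numpy fills arrays with the last index varying fastest, while R fills them with the first index varying fastest. A minimal sketch, assuming one wants to reproduce the R layout in numpy, passes `order="F"` to `np.reshape` (the names `c_order` and `f_order` are illustrative):

```python
import numpy as np

a = np.arange(1, 9)  # the numbers 1, 2, ..., 8

c_order = np.reshape(a, (2, 2, 2))             # numpy default: last index varies fastest
f_order = np.reshape(a, (2, 2, 2), order="F")  # first index varies fastest, as in R

print(c_order[0])        # first block of the numpy printout
print(f_order[:, :, 0])  # corresponds to R's slice `, , 1`
print(f_order[:, :, 1])  # corresponds to R's slice `, , 2`
```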

Dictionaries

Dictionaries are collections with a key -> value structure, like a telephone book. In R, a named list plays this role:

Python
R

di = {'peter' : 1225, 'alice' : 4333}
di

{'peter': 1225, 'alice': 4333}

di = list(peter = 1225, alice = 4333)
di

$peter
[1] 1225

$alice
[1] 4333
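Once a dictionary exists, typical operations are looking up a value by key, adding an entry, and iterating over the keys. The sketch below uses a hypothetical extra entry `bob` and a hypothetical missing key `carol` for illustration.

```python
di = {"peter": 1225, "alice": 4333}

peter_number = di["peter"]   # look up a value by its key
di["bob"] = 9876             # add a new key -> value pair (made-up entry)
missing = di.get("carol")    # .get returns None instead of raising KeyError

for name in sorted(di):      # iterate over the keys in alphabetical order
    print(name, di[name])
```

In R the corresponding lookup on the named list is `di$peter` or `di[["peter"]]`.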

DataFrames

In Python, we use the pandas package for dataframe support. In R, data frames are built in. There are many ways to create a pandas dataframe.

Here is the official pandas documentation.
In R, type ?data.frame for the help entry.

Python
R

import pandas as pd
d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
pd.DataFrame(d)

   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0

data.frame(one = c(1,2,3,4.0), two = c(4,3,2,1.0))
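As a brief sketch of what one typically does with a dataframe once it exists, the Python example below selects a column, adds a computed column, and filters rows. The column name `total` and the variable names are hypothetical, chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]})

col = df["one"]                      # select a single column (a pandas Series)
df["total"] = df["one"] + df["two"]  # new column, computed element-wise
big = df[df["one"] > 2]              # keep only rows where `one` exceeds 2

print(df.shape)  # (4, 3): four rows, three columns
print(big)
```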