Why julia?

Everybody needs a weapon of choice

University of Turin

Collegio Carlo Alberto

JPE Data Editor

2026-02-09

Choosing a Language

  • You are an economics grad student and need to pick your weapon of choice.
  • You will use more than one language. (This is a good thing.)
  • All languages have pros and cons. I will give some opinionated advice.
  • Then I will force you to use a certain language to complete homeworks 😜

Taxonomy of Languages

The Pros/Cons Rundown

Hello, World?

  • It’s a good custom to first print “hello, world” when introducing a language. It’s fun, but pretty uninformative: what does the performance of printing “hello, world” tell you?
  • Instead, we will show in each language how to implement the function sum over an array of values a: \[\text{sum}(a) = \sum_{i=1}^n a_i\] where \(n\) is the number of elements in \(a\)

  • Later, we will benchmark each language to see the tradeoff between high- and low-level languages.

C/C++

  • The world pretty much runs on C++.
  • If you know some C++ and some Unix, you know a lot already.
  • Developed at Bell Labs in the 1980s.
  • Almost all C programs are valid C++, but not the other way around.

C++ Code Example

//sumvec.cpp
#include <iostream>
#include <vector>
int main(){
    std::vector<int> x;
    for (int i = 1; i <= 5; i++){    // fill the vector with 1..5
        x.push_back(i);
    }
    int sum = 0;
    for (std::vector<int>::iterator i = x.begin(); i != x.end(); i++){
        sum += *i;                   // dereference the iterator and accumulate
    }
    std::cout << "sum is " << sum << std::endl;
    return 0;
}
//compile
g++ sumvec.cpp -o sum.x

C sum

  • Defining the function:

    #include <stddef.h>
    #include <stdio.h>
    double c_sum(size_t n, double *X) {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i) {
            s += X[i];
        }
        return s;
    }
    
    //main
    int main(void) {
        double x[] = {1, 2, 3, 4, 5};
        size_t n = sizeof(x) / sizeof(x[0]);
    
        double result = c_sum(n, x);
    
        printf("C-computed Sum = %.2f\n", result);
        return 0;
    }

C++: Pros and Cons

Pros

  • Very versatile.
  • Continuously evolving: a new standard every three years (C++20, C++23, …).
  • Very performant.
  • Excellent open source compilers.
  • Very stable and widely used.
  • Large community.

Cons

  • Hard to learn: pointers, classes, OOP in general.
  • Some find compiled languages hard to work with (edit-compile-run cycle).
  • It’s easy to overcomplicate things for novices.

Python

  • The general-purpose language out there. Swiss Army knife. All-terrain. 🚙
  • Open Source
  • Designed by Guido van Rossum.
  • Elegant, intuitive, full OOP support (classes etc.).

python sum

  • we just use the built-in sum:

    a = [1,2,3,4,5]
    sum(a)
  • that’s it!

Python Pros/Cons

Pros

  • Console: good for exploration.
  • Many useful libraries (NumPy, SciPy, Pandas, matplotlib)
  • Easy Unit Testing
  • Very Performant String Manipulation
  • Large community

Cons

  • Pure Python loops are slow: heavy numerical work relies on libraries written in C (NumPy, etc.).

Python Performance Analogy

R

  • High level open source language for statistical computing.
  • Ross Ihaka and Robert Gentleman developed R as a successor to John Chambers’ S.
  • Tremendously rich add-on packages environment.
  • Has basic OOP support via S4 classes and methods.

R sum

  • we just use the built-in sum:

    a = 1:5
    sum(a)
  • that’s (again) it!

R Pros/Cons

Pros

  • Excellent IDEs: RStudio, Positron.
  • Thousands of high quality packages. tidyverse is a sensation in itself.
  • Many many econometrics-related packages.
  • Wide community.
  • Rcpp provides a good bridge to C++.
  • Unbeatable for spatial data: the sf package.
  • Very good for data processing.

Cons

  • Base R is slow.
  • Not a modern language. Some quite arcane behaviours.
  • Rcpp means you end up writing C++ code.
  • Not straightforward to write performant low-level code close to the math (i.e., loops).

Fortran

  • The Grandfather of all languages 👴🏽
  • FORTRAN (Formula Translation) was developed in 1957.
  • Still used for many scientific problems (nuclear weapons design, weather forecasts, physics experiments, etc.) 💣 🌦

Fortran Example

  • create a text file test.f90:

    program sumit
        implicit none
        double precision a(5)   ! allocate an array with 5 slots
        a = (/1,2,3,4,5/)       ! fill with values
        print *,"result is ", sum(a)
    end program sumit
  • compile it and run it:

    $ gfortran test.f90 -o test
    $ ./test
     result is    15.000000000000000  

FORTRAN Pros/Cons

Pros

  • Relatively easy to learn.
  • Good array support built in.
  • Good parallelization support via MPI.
  • Fast.

Cons

  • Different compilers implement the standards differently (e.g., Intel Fortran vs. gfortran array constructors).
  • Small user community.
  • Not very many tutorials.
  • No longer faster than C++ (it used to be).
  • Unit testing is hard to automate (e.g., via pFUnit).
  • The language is very bare-bones.
  • Hard to process data.

Victims of Speed vs Productivity Tradeoff?

  • We have seen that high-level languages (R, python, etc.) are good for productivity: iterate quickly on try, fail, repeat.
  • We have seen that low-level languages run fast, but are worse for productivity.
  • So it seems we cannot have both speed and productivity.
  • Or can we?

Case Study

🔥🔥 Our Time is Running Out 🔥🔥

Oswald (QE 2019): Homeownership and Location Choice

  • I wanted to compute this model of housing and location choice on US micro data.

  • Large state space. 🚫 R, 🚫 python

  • Non-trivial estimation exercise.

  • I had no code to build upon. And only a vague idea of how to do this.

When the Clock is Ticking ⏰

  • repo starts with C++ (Blitz++)

  • Given my C++ level, this was taking too much time to develop!

  • No chance to finish in time. 😟

  • Enter julia! 🚀

Why We Created Julia - 2012 blogpost

We are power Matlab users. Some of us are Lisp hackers. Some are Pythonistas, others Rubyists, still others Perl hackers. There are those of us who used Mathematica before we could grow facial hair. There are those who still can’t grow facial hair. We’ve generated more R plots than any sane person should. C is our desert island programming language. We are greedy: we want more.
We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

Julia History of Success

  • That proposal was very bold.
  • Not the first time a new language has arrived; it’s hard to carve out market share.
  • Development started in 2009
  • Version 1.0 released in 2018
  • Funding model to sustain language development (Julia Computing, now JuliaHub).

Tiobe Index

Let’s browse the TIOBE Index!

What is Julia?

  • Modern language
  • Built for high performance and parallelism: LLVM-based JIT compilation
  • Dynamically typed: good for interactive use
  • Rich data type system
  • Many Packages.
  • Good Interoperability with other languages.
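  • To close the loop on the sum example from earlier, here is the julia version (a minimal sketch; like python and R, it just uses the built-in sum):

    a = [1, 2, 3, 4, 5]
    sum(a)
  • that’s (once more) it.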

Benchmarks
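
  • As a flavour of how such micro-benchmarks can be run, here is a minimal sketch in julia. It assumes the add-on package BenchmarkTools is installed (an assumption; any timing tool would do):

    using BenchmarkTools

    # a naive, hand-written loop: "close to the math"
    function my_sum(a)
        s = 0.0
        for x in a
            s += x
        end
        return s
    end

    a = rand(10^7)
    @btime my_sum($a)   # time the hand-written loop
    @btime sum($a)      # time the built-in sum
  • The point of such benchmarks: in julia a plain loop already runs at compiled speed, so there is no need to rewrite hot loops in another language.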

Which problem does Julia want to solve?

  • Key: the wall in the scientific software stack creates a social barrier. Developer and user are different people. (Basically: who knows C++/Fortran?)

Scientific Software Stacks

Past: At least 4 langs!

  • Process messy data. String manipulation, plotting, data wrangling. R or python.
  • Prototype a numerical model. matlab
  • Work around speed bottleneck: rewrite (parts) in C++
  • Try to test (combo of googletest and R/python)
  • Analyse output data: R or python
  • Tie it all together with bash scripts or ruby etc.

Present: one lang only.

  • Process messy data. String manipulation, plotting, data wrangling. julia.
  • Prototype a numerical model. julia
  • Work around speed bottleneck: rewrite (parts) in julia
  • Test : julia
  • Analyse output data: julia
  • Tie it all together with julia.
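  • As a hedged sketch of what that looks like in practice (assuming the add-on package DataFrames is installed; Statistics ships with julia):

    using DataFrames, Statistics

    # the "data step": wrangle a (toy) dataset
    df = DataFrame(id = 1:5, y = [2.0, 4.0, 6.0, 8.0, 10.0])
    combine(df, :y => mean => :mean_y)

    # ...and the "model step" in the same file, same language
    loss(θ) = sum(abs2, df.y .- θ)
    θ̂ = argmin(loss, 0:0.1:10)
  • No language switch, no glue scripts.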

Comparing Programming Languages in Economics

  • It is conceptually difficult to write identical code in different languages. The authors do a good job. (How many tricks is one allowed to use before an implementation is no longer comparable? Wouldn’t users use all available tricks?)

Results

Not Mentioned

There are many other languages out there. We have not mentioned

  • Matlab: widely used in economics, but clearly inferior to julia (expensive, restrictive licensing).
  • Stata: very popular in microeconometrics, old design, commercial software.
  • Java
  • Rust
  • Scala
  • Go

julia Jaw-Droppers

What’s so cool about base julia?

Built-in Memory Mapping: BIG DATA

  • Big Data is everywhere, and all the cool kids do it.
  • Typical out-of-core (OOC) computing on large systems is important and useful, but a bit more involved (e.g., Spark).
  • A proper database will go a long way. See our duckdb tutorial.
  • In julia we can do out-of-core computation without any external libraries.

Got Some Big Data, yeah?

Built-in Memory Mapping: BIG DATA

  • Suppose you have this binary file on your laptop, containing Float64 numbers.

    floswald@PTL11077 > ls -lh large.bin
    -rw-r--r--  1 floswald  staff    32G Jan 29 15:31 large.bin
  • Suppose your laptop has 32GB of RAM.

  • Suppose you don’t want to set up a huge cluster to do your work.

  • You just want to compute a single mean from that data, maybe to decide whether that particular research strategy is worth pursuing.

  • How hard can it be?

What R you going to do?

Built-in Memory Mapping: BIG DATA

  • Come here R, my old friend.
  • Let’s see if we can compute that mean.
> (n_elements <- file.size("large.bin") / 8) # 8 bytes per number
[1] 4.25e+09  # 4.25 billion numbers
> (GB_RAM_required_for_data <- file.size("large.bin") / 1e9)
[1] 34
> x <- readBin("large.bin", "numeric", n = n_elements)


  • Wow, this actually worked!
  • However, my OS put most of this data into swap memory.
  • That is, onto my storage device (an SSD).
  • What does that mean?

What R you going to do?

Built-in Memory Mapping: BIG DATA

(Screenshots: swap used ≈ 20 GB; sustained disk IO during the computation of mean(x).)

What R you going to do?

Built-in Memory Mapping: BIG DATA

  • While this actually worked because my OS was bending over backwards…

  • …it’s not a good solution:

    > (n_elements <- file.size("large.bin") / 8)
    [1] 4.25e+09
    > x <- readBin("large.bin", "numeric", n = n_elements)
    > system.time(mean(x))
        user  system elapsed 
        9.163  37.625 233.907 
    > 233 / 60
    [1] 3.883333
  • that took almost 4 minutes and put a huge strain on my system.

Out-of-core compute with julia?

Built-in Memory Mapping: BIG DATA

julia> using SharedArrays, Statistics  # standard libraries that ship with julia

julia> @time s = SharedArray{Float64}(
    "/Users/floswald/git/CompEcon/large.bin",(Int(4.25e+09),)
    );
  0.585849 seconds (1.84 M allocations: 90.189 MiB, 11.90% gc time, 97.76% compilation time)
  • That allocated only 90 megabytes (!) of memory 🤯
  • Ok, and the mean?
julia> @time mean(s)
 54.721824 seconds (140.13 k allocations: 6.873 MiB, 0.07% compilation time)
0.4999939011980801
  • About 55 seconds, i.e. more than a 4x speedup compared to R.
  • This worked because julia can memory-map the file very efficiently: data is paged in from disk only as it is needed.
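  • SharedArray is one way to do this; base julia also ships the Mmap standard library. A minimal sketch under the same assumptions (same file, same element count):

julia> using Mmap, Statistics

julia> x = Mmap.mmap("/Users/floswald/git/CompEcon/large.bin", Vector{Float64}, (Int(4.25e9),));

julia> mean(x)   # the OS pages the data in lazily, only as it is read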

Parallel Computation Out-of-the-box

Bootstrapping…

  • …great for a parallel demo: fully independent tasks.
  • Problem: each process needs its own copy of the data (think base R).
  • julia can share data across workers.
# started julia with the -p flag:
# $ julia -p auto
using Distributed  # distributed computing functions
nworkers()         # how many worker processes are available?
8                  # output: 8 julia worker processes

using SharedArrays
s = SharedArray{Float64}((10^5,)); 
# put 100K random numbers into s
s .= rand(10^5);  
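
  • A quick sanity check (an aside, not in the original snippet): any worker can read s directly, so no copy of the data is shipped to it.

# assumes worker id 2 exists (it does after starting with `julia -p auto` on a multicore machine)
fetch(@spawnat 2 sum(s))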

Parallel Computation Out-of-the-box

Bootstrapping…

@everywhere using Statistics
@everywhere n_bs_samples = 10_000  # num bootstrap samples

# a stand-in 'model': resample x with replacement (the bootstrap draw)
@everywhere my_model(x) = rand(x, length(x))

# bootstrap sampling. heavy workload.
# creates an array of length n_bs_samples
@everywhere bs_means(x) = [mean(my_model(x)) for i in 1:n_bs_samples]

# single core version.
function f_1core(x)
    # compute the variance of those means
    # split over 8 independent batches
    var(vcat([bs_means(x) for i in 1:8]...))
end

# multi-core version
function f_multicore(x)
    # spawn 8 independent batches on the workers; each @spawn returns a future (the `promise`)
    promise = [@spawn bs_means(x) for i in 1:8]
    # `fetch` waits for each result and collects it
    var(vcat([fetch(pr) for pr in promise]...))
end
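
  • An aside (a sketch, not part of the original script): the same fan-out can be written with pmap, which does the spawn/fetch bookkeeping for us.

# equivalent multi-core version using pmap
f_pmap(x) = var(vcat(pmap(i -> bs_means(x), 1:8)...))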

Parallel Computation Out-of-the-box

Bootstrapping…

floswald@PTL11077 > julia -p auto _snippets/bootstrap.jl

 31.704981 seconds (431.35 k allocations: 61.050 GiB, 6.51% gc time, 0.13% compilation time)

  4.879901 seconds (611.59 k allocations: 31.086 MiB, 0.06% gc time, 1.89% compilation time)

Info: result 1core = 8.387862736845913e-7
Info: result multicore = 8.295566662153075e-7


  • That’s a 6.5x speedup, almost linear with the 8 cores we had.
  • Given the effort required, I’ll take that any day.
  • Always check the numbers in the output! (Why are they different?)

So What?

  • This is cool because it is part of the core language. Not an add-on package.

  • This means that contributed add-on packages can lean really hard on that infrastructure and build upon it.

  • The problem in R and python is that those stable foundations are largely missing from the core language, which causes friction for package authors.

  • That’s one of the things I really like about this language.

Final (Opinionated) Advice

  • It’s a good idea to learn a bit of C++. Much is built upon it, and you will see things differently. Don’t overinvest (do some quick tutorials).
  • Avoid learning FORTRAN unless you need to use legacy code.
  • Learn julia and R or python.
  • Stata is dominated by R.

End