Why julia?

Everybody needs a weapon of choice

University of Turin

Collegio Carlo Alberto

JPE Data Editor

2026-02-09

Choosing a Language

  • You are an economics grad student and need to pick your weapon of choice.
  • You will use more than one language. (This is a good thing.)
  • All languages have pros and cons. I will give some opinionated advice.
  • Then I will force you to use a certain language to complete homeworks 😜

Taxonomy of Languages

The Pros/Cons Rundown

Hello, World?

  • It’s a good custom to first print “hello, world” when introducing a language. It’s fun, but pretty uninformative: what does the performance of printing “hello, world” tell you?
  • Instead, we will show in each language how to implement the function sum over an array of values a: \[\text{sum}(a) = \sum_{i=1}^n a_i\] where \(n\) is the number of elements in \(a\)

  • Later, we will benchmark each language to see the tradeoff between high- and low-level languages.

C/C++

  • The world pretty much runs on C++.
  • If you know some C++ and some Unix, you know a lot already.
  • Developed at Bell Labs in the 1980s.
  • Almost all C programs are valid C++, but not the other way around.

C++ Code Example

//sumvec.cpp
#include <iostream>
#include <vector>
int main(){
    std::vector<int> x;
    for (int i = 1; i <= 5; i++){    // fill the vector with 1..5
        x.push_back(i);
    }
    int sum = 0;
    for (std::vector<int>::iterator i = x.begin(); i != x.end(); i++){
        sum += *i;                   // dereference the iterator and accumulate
    }
    std::cout << "sum is " << sum << std::endl;
    return 0;
}
//compile
g++ sumvec.cpp -o sum.x

C sum

  • Defining the function:

    #include <stddef.h>
    #include <stdio.h>
    double c_sum(size_t n, double *X) {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i) {
            s += X[i];
        }
        return s;
    }
    
    //main
    int main(void) {
        double x[] = {1, 2, 3, 4, 5};
        size_t n = sizeof(x) / sizeof(x[0]);
    
        double result = c_sum(n, x);
    
        printf("C-computed Sum = %.2f\n", result);
        return 0;
    }

C++: Pros and Cons

Pros

  • Very versatile.
  • Continuously evolving: a new standard every three years (C++20, C++23, …).
  • Very performant.
  • Excellent open source compilers.
  • Very stable and widely used.
  • Large community.

Cons

  • Hard to learn: pointers, classes, OOP in general.
  • Some find compiled languages hard to work with (edit-compile-run cycle).
  • It’s easy to overcomplicate things for novices.

Python

  • The general-purpose language out there. Swiss Army knife. All-terrain. 🚙
  • Open Source
  • Designed by Guido van Rossum.
  • Elegant, intuitive, full OOP support (classes etc.).

python sum

  • we just use the built-in sum:

    a = [1,2,3,4,5]
    sum(a)
  • that’s it!

Python Pros/Cons

Pros

  • Console: good for exploration.
  • Many useful libraries (NumPy, SciPy, Pandas, matplotlib)
  • Easy Unit Testing
  • Very Performant String Manipulation
  • Large community

Cons

  • Pure Python loops are slow: heavy numerical work relies on libraries written in C (NumPy, etc.).

Python Performance Analogy

R

  • High level open source language for statistical computing.
  • Ross Ihaka and Robert Gentleman developed R as a successor to John Chambers’ S.
  • Tremendously rich add-on packages environment.
  • Has basic OOP support via S4 classes and methods.

R sum

  • we just use the built-in sum:

    a = 1:5
    sum(a)
  • that’s (again) it!

R Pros/Cons

Pros

  • Excellent IDEs: RStudio, Positron.
  • Thousands of high quality packages. tidyverse is a sensation in itself.
  • Many many econometrics-related packages.
  • Wide community.
  • Rcpp provides a good bridge to C++.
  • Unbeatable for spatial data: the sf package.
  • Very good for data processing.

Cons

  • Base R is slow.
  • Not a modern language. Some quite arcane behaviours.
  • Rcpp means you end up writing C++ code.
  • Not straightforward to write performant low-level code close to the math (i.e., loops).

Fortran

  • The Grandfather of all languages 👴🏽
  • FORTRAN (Formula Translation) was developed in 1957.
  • Still used for many scientific problems (nuclear weapons design, weather forecasts, physics experiments, etc.) 💣 🌦

Fortran Example

  • create a text file test.f90:

    program sumit
        implicit none
        double precision a(5)   ! allocate an array with 5 slots
        a = (/1,2,3,4,5/)       ! fill with values
        print *,"result is ", sum(a)
    end program sumit
  • compile it and run it:

    $ gfortran test.f90 -o test
    $ ./test
     result is    15.000000000000000  

FORTRAN Pros/Cons

Pros

  • Relatively easy to learn.
  • Good array support built in.
  • Good parallelization support via MPI.
  • Fast.

Cons

  • Different compilers implement the standards differently (e.g., Intel Fortran vs. gfortran array constructors).
  • Small user community.
  • Not very many tutorials.
  • No longer faster than C++ (it used to be).
  • Unit testing is hard to automate (e.g., via pFUnit).
  • The language is very bare-bones.
  • Hard to process data.

Victims of Speed vs Productivity Tradeoff?

  • We have seen that high-level languages (R, python, etc.) are good for productivity: iterate quickly on try, fail, repeat.
  • We have seen that low-level languages run fast, but are worse for productivity.
  • So it seems we cannot have both speed and productivity.
  • Or can we?

Case Study

🔥🔥 Our Time is Running Out 🔥🔥

Oswald (QE 2019): Homeownership and Location Choice

  • I wanted to compute this model of housing and location choice on US micro data.

  • Large state space. 🚫 R, 🚫 python

  • Non-trivial estimation exercise.

  • I had no code to build upon. And only a vague idea of how to do this.

When the Clock is Ticking ⏰

  • repo starts with C++ (Blitz++)

  • Given my C++ level, this was taking too much time to develop!

  • No chance to finish in time. 😟

  • Enter julia! 🚀

Why We Created Julia - 2012 blogpost

We are power Matlab users. Some of us are Lisp hackers. Some are Pythonistas, others Rubyists, still others Perl hackers. There are those of us who used Mathematica before we could grow facial hair. There are those who still can’t grow facial hair. We’ve generated more R plots than any sane person should. C is our desert island programming language. We are greedy: we want more.
We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

Julia History of Success

  • That proposal was very bold.
  • Not the first time a new language has arrived; it’s hard to carve out market share.
  • Development started in 2009
  • Version 1.0 released in 2018
  • Funding model to sustain language development (Julia Computing, now JuliaHub).

Tiobe Index

Let’s browse the TIOBE Index!

What is Julia?

  • Modern language
  • Built for high performance and parallelism: LLVM-based JIT compilation
  • Dynamically typed: good for interactive use
  • Rich data type system
  • Many Packages.
  • Good Interoperability with other languages.
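  • To close the loop on the sum example from earlier, here is the julia version (a minimal sketch; like python and R, it just uses the built-in sum):

    a = [1, 2, 3, 4, 5]
    sum(a)
  • that’s (once more) it.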

Benchmarks
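
  • As a flavour of how such micro-benchmarks can be run, here is a minimal sketch in julia. It assumes the add-on package BenchmarkTools is installed (an assumption; any timing tool would do):

    using BenchmarkTools

    # a naive, hand-written loop: "close to the math"
    function my_sum(a)
        s = 0.0
        for x in a
            s += x
        end
        return s
    end

    a = rand(10^7)
    @btime my_sum($a)   # time the hand-written loop
    @btime sum($a)      # time the built-in sum
  • The point of such benchmarks: in julia a plain loop already runs at compiled speed, so there is no need to rewrite hot loops in another language.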

Which problem does Julia want to solve?

  • Key: the wall in the scientific software stack creates a social barrier. Developer and user are different people. (Basically: who knows C++/Fortran?)

Scientific Software Stacks

Past: At least 4 langs!

  • Process messy data. String manipulation, plotting, data wrangling. R or python.
  • Prototype a numerical model. matlab
  • Work around speed bottleneck: rewrite (parts) in C++
  • Try to test (combo of googletest and R/python)
  • Analyse output data: R or python
  • Tie it all together with bash scripts or ruby etc.

Present: one lang only.

  • Process messy data. String manipulation, plotting, data wrangling. julia.
  • Prototype a numerical model. julia
  • Work around speed bottleneck: rewrite (parts) in julia
  • Test : julia
  • Analyse output data: julia
  • Tie it all together with julia.
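  • As a hedged sketch of what that looks like in practice (assuming the add-on package DataFrames is installed; Statistics ships with julia):

    using DataFrames, Statistics

    # the "data step": wrangle a (toy) dataset
    df = DataFrame(id = 1:5, y = [2.0, 4.0, 6.0, 8.0, 10.0])
    combine(df, :y => mean => :mean_y)

    # ...and the "model step" in the same file, same language
    loss(θ) = sum(abs2, df.y .- θ)
    θ̂ = argmin(loss, 0:0.1:10)
  • No language switch, no glue scripts.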

Comparing Programming Languages in Economics

  • It is conceptually difficult to write identical code in different languages. The authors do a good job. (How many tricks is one allowed to use before an implementation is no longer comparable? Wouldn’t users use all available tricks?)

Results

Not Mentioned

There are many other languages out there. We have not mentioned

  • Matlab: widely used in economics, but clearly inferior to julia (expensive, restrictive licensing).
  • Stata: very popular in microeconometrics, old design, commercial software.
  • Java
  • Rust
  • Scala
  • Go

julia Jaw-Droppers

What’s so cool about base julia?

Built-in Memory Mapping: BIG DATA

  • Big Data is everywhere, and all the cool kids do it.
  • Typical out-of-core (OOC) computing on large systems is important and useful, but a bit more involved (e.g., Spark).
  • A proper database will go a long way. See our duckdb tutorial.
  • In julia we can do out-of-core computation without any external libraries.

Got Some Big Data, yeah?

Built-in Memory Mapping: BIG DATA

  • Suppose you have this binary file on your laptop, containing Float64 numbers.

    floswald@PTL11077 > ls -lh large.bin
    -rw-r--r--  1 floswald  staff    32G Jan 29 15:31 large.bin
  • Suppose your laptop has 32GB of RAM.

  • Suppose you don’t want to set up a huge cluster to do your work.

  • You just want to compute a single mean from that data, maybe to decide whether that particular research strategy is worth pursuing.

  • How hard can it be?

What R you going to do?

Built-in Memory Mapping: BIG DATA

  • Come here R, my old friend.
  • Let’s see if we can compute that mean.
> (n_elements <- file.size("large.bin") / 8) # 8 bytes per number
[1] 4.25e+09  # 4.25 billion numbers
> (GB_RAM_required_for_data <- file.size("large.bin") / 1e9)
[1] 34
> x <- readBin("large.bin", "numeric", n = n_elements)


  • Wow, this actually worked!
  • However, my OS put most of this data into swap memory.
  • That is, onto my storage device (an SSD).
  • What does that mean?

What R you going to do?

Built-in Memory Mapping: BIG DATA

(Screenshots: swap used ≈ 20 GB; sustained disk IO during the computation of mean(x).)

What R you going to do?

Built-in Memory Mapping: BIG DATA

  • While this actually worked because my OS was bending over backwards…

  • …it’s not a good solution:

    > (n_elements <- file.size("large.bin") / 8)
    [1] 4.25e+09
    > x <- readBin("large.bin", "numeric", n = n_elements)
    > system.time(mean(x))
        user  system elapsed 
        9.163  37.625 233.907 
    > 233 / 60
    [1] 3.883333
  • that took almost 4 minutes and put a huge strain on my system.

Out-of-core compute with julia?

Built-in Memory Mapping: BIG DATA

julia> using SharedArrays, Statistics  # standard libraries that ship with julia

julia> @time s = SharedArray{Float64}(
    "/Users/floswald/git/CompEcon/large.bin",(Int(4.25e+09),)
    );
  0.585849 seconds (1.84 M allocations: 90.189 MiB, 11.90% gc time, 97.76% compilation time)
  • That allocated only 90 megabytes (!) of memory 🤯
  • Ok, and the mean?
julia> @time mean(s)
 54.721824 seconds (140.13 k allocations: 6.873 MiB, 0.07% compilation time)
0.4999939011980801
  • About 55 seconds, i.e. more than a 4x speedup compared to R.
  • This worked because julia can memory-map the file very efficiently: data is paged in from disk only as it is needed.
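  • SharedArray is one way to do this; base julia also ships the Mmap standard library. A minimal sketch under the same assumptions (same file, same element count):

julia> using Mmap, Statistics

julia> x = Mmap.mmap("/Users/floswald/git/CompEcon/large.bin", Vector{Float64}, (Int(4.25e9),));

julia> mean(x)   # the OS pages the data in lazily, only as it is read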

Parallel Computation Out-of-the-box

Bootstrapping…

  • …great for a parallel demo: fully independent tasks.
  • Problem: each process needs its own copy of the data (think base R).
  • julia can share data across workers.
# started julia with the -p flag:
# $ julia -p auto
using Distributed  # distributed computing functions
nworkers()         # how many worker processes are available?
8                  # output: 8 julia worker processes

using SharedArrays
s = SharedArray{Float64}((10^5,)); 
# put 100K random numbers into s
s .= rand(10^5);  
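
  • A quick sanity check (an aside, not in the original snippet): any worker can read s directly, so no copy of the data is shipped to it.

# assumes worker id 2 exists (it does after starting with `julia -p auto` on a multicore machine)
fetch(@spawnat 2 sum(s))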

Parallel Computation Out-of-the-box

Bootstrapping…

@everywhere using Statistics
@everywhere n_bs_samples = 10_000  # num bootstrap samples

# a stand-in 'model': resample x with replacement (the bootstrap draw)
@everywhere my_model(x) = rand(x, length(x))

# bootstrap sampling. heavy workload.
# creates an array of length n_bs_samples
@everywhere bs_means(x) = [mean(my_model(x)) for i in 1:n_bs_samples]

# single core version.
function f_1core(x)
    # compute the variance of those means
    # split over 8 independent batches
    var(vcat([bs_means(x) for i in 1:8]...))
end

# multi-core version
function f_multicore(x)
    # spawn 8 independent batches on the workers; each @spawn returns a future (the `promise`)
    promise = [@spawn bs_means(x) for i in 1:8]
    # `fetch` waits for each result and collects it
    var(vcat([fetch(pr) for pr in promise]...))
end
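
  • An aside (a sketch, not part of the original script): the same fan-out can be written with pmap, which does the spawn/fetch bookkeeping for us.

# equivalent multi-core version using pmap
f_pmap(x) = var(vcat(pmap(i -> bs_means(x), 1:8)...))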

Parallel Computation Out-of-the-box

Bootstrapping…

floswald@PTL11077 > julia -p auto _snippets/bootstrap.jl

 31.704981 seconds (431.35 k allocations: 61.050 GiB, 6.51% gc time, 0.13% compilation time)

  4.879901 seconds (611.59 k allocations: 31.086 MiB, 0.06% gc time, 1.89% compilation time)

Info: result 1core = 8.387862736845913e-7
Info: result multicore = 8.295566662153075e-7


  • That’s a 6.5x speedup, almost linear with the 8 cores we had.
  • Given the effort required, I’ll take that any day.
  • Always check the numbers in the output! (Why are they different?)

So What?

  • This is cool because it is part of the core language. Not an add-on package.

  • This means that contributed add-on packages can lean really hard on that infrastructure and build upon it.

  • The problem in R and python is that those stable foundations are largely missing from the core language, which causes friction for package authors.

  • That’s one of the things I really like about this language.

Final (Opinionated) Advice

  • It’s a good idea to learn a bit of C++. Much is built upon it, and you will see things differently. Don’t overinvest (do some quick tutorials).
  • Avoid learning FORTRAN unless you need to use legacy code.
  • Learn julia and R or python.
  • Stata is dominated by R.

End