# Use the same queries from Julia

December 1, 2025

We have a Julia package for DuckDB as well.¹

We proceed as before with the other languages:

```julia
using DuckDB
using DataFrames
using Dates
using Chain

con = DBInterface.connect(DuckDB.DB)
```

DuckDB.DB(":memory:")
Let’s run the first simple query again to get the number of rows, as before with R:
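The code cell itself isn't shown in this excerpt; here is a minimal sketch of such a count query. The table `t` built from `range(100)` is a small stand-in for the real dataset:

```julia
using DuckDB, DataFrames

con = DBInterface.connect(DuckDB.DB)

# stand-in table; the post runs this against the full 570M-row dataset
DBInterface.execute(con, "CREATE TABLE t AS SELECT * FROM range(100)")

df = DataFrame(DBInterface.execute(con, "SELECT COUNT(*) AS nrows FROM t"))
```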
(rows = (nrows = [570009660],), cols = 19)
… and we know that this is a lot of data:
Same examples as previously:
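The original query is not reproduced in this excerpt; as a rough sketch, a representative grouped aggregation (over a hypothetical `trips` table with made-up `month` and `fare` columns), timed with `@elapsed`:

```julia
using DuckDB, DataFrames

con = DBInterface.connect(DuckDB.DB)

# small synthetic stand-in; the real timing was over the full dataset
DBInterface.execute(con, """
    CREATE TABLE trips AS
    SELECT (i % 12) + 1 AS month, i * 1.0 AS fare
    FROM range(1000) AS tbl(i)
""")

t = @elapsed result = DataFrame(DBInterface.execute(con, """
    SELECT month, AVG(fare) AS avg_fare
    FROM trips
    GROUP BY month
    ORDER BY month
"""))
```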
0.778765916
That took 0.78 seconds. Here is the result:
Let’s try out some more aggregation queries. How about a slight variation on our first example query, where we (a) add “month” as a second grouping variable, and (b) subset to only the first three months of the year.
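In SQL terms, the two changes might look like this. This is a sketch over a hypothetical `trips` table (the real query and column names are not shown in this excerpt):

```julia
using DuckDB, DataFrames

con = DBInterface.connect(DuckDB.DB)

# synthetic stand-in data with year, month, and a numeric column
DBInterface.execute(con, """
    CREATE TABLE trips AS
    SELECT 2020 + ((i // 12) % 3) AS year,
           (i % 12) + 1 AS month,
           i * 1.0 AS fare
    FROM range(1000) AS tbl(i)
""")

t = @elapsed res = DataFrame(DBInterface.execute(con, """
    SELECT year, month, AVG(fare) AS avg_fare
    FROM trips
    WHERE month <= 3            -- (b) only the first three months
    GROUP BY year, month        -- (a) month as a second grouping variable
    ORDER BY year, month
"""))
```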
0.210989083
This time we clocked 0.21 seconds - keep in mind that this is a lot of data to go through each time.
## register_data_frame

One cool thing about the Julia package is that you can register a local DataFrame with the database, which means that no copy operation is performed. DuckDB will directly read that Julia DataFrame instead:
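A minimal sketch of this, assuming a toy DataFrame (`my_df` is a placeholder name for the registered view):

```julia
using DuckDB, DataFrames

con = DBInterface.connect(DuckDB.DB)
df = DataFrame(a = 1:3, b = ["x", "y", "z"])

# no copy: DuckDB scans the Julia DataFrame in place
DuckDB.register_data_frame(con, df, "my_df")

res = DataFrame(DBInterface.execute(con, "SELECT SUM(a) AS s FROM my_df"))
```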
## Appender

The Appender is an advanced DuckDB feature optimized for fast row insertion into a database table; it is available in neither the R nor the Python package. Here is an example. Let’s first create a new DataFrame in your Julia session and then read it row by row into an empty database table.
| Row | id | value | timestamp | date |
|---|---|---|---|---|
| | Int64 | Float32 | DateTime | Date |
| 1 | 1 | 0.885621 | 2025-12-01T22:29:56.677 | 2025-12-02 |
| 2 | 2 | 0.773069 | 2025-12-01T22:29:57.677 | 2025-12-03 |
| 3 | 3 | 0.807024 | 2025-12-01T22:29:58.677 | 2025-12-04 |
| 4 | 4 | 0.76593 | 2025-12-01T22:29:59.677 | 2025-12-05 |
| 5 | 5 | 0.588932 | 2025-12-01T22:30:00.677 | 2025-12-06 |
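A DataFrame of this shape can be built like so (a sketch; the random values will differ, and the timings below were clearly taken on a much larger table than `n = 5`):

```julia
using DataFrames, Dates

n = 5  # placeholder size; increase for a realistic timing
df = DataFrame(
    id        = 1:n,
    value     = rand(Float32, n),
    timestamp = Dates.now() .+ Dates.Second.(1:n),
    date      = Dates.today() .+ Dates.Day.(1:n),
)
```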
Create a new database with an empty table called `test_table1`:
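A minimal sketch of that setup, with column types chosen to match the DataFrame above:

```julia
using DuckDB, DataFrames

db = DuckDB.DB(":memory:")
DBInterface.execute(db, """
    CREATE TABLE test_table1(
        id INTEGER, value FLOAT, timestamp TIMESTAMP, date DATE)
""")
```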
(Count = Int64[],)
Now let us prepare a SQL statement into which we will repeatedly insert each row of the data. This makes use of positional parameters (`?`):
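A sketch of preparing that statement (the SQL string matches the one visible in the output below):

```julia
using DuckDB

db = DuckDB.DB(":memory:")
DBInterface.execute(db,
    "CREATE TABLE test_table1(id INTEGER, value FLOAT, timestamp TIMESTAMP, date DATE)")

# positional parameters (?) are bound at execution time
stmt = DBInterface.prepare(db, "INSERT INTO test_table1 VALUES(?,?,?,?)")
```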
DuckDB.Stmt(DuckDB.Connection(":memory:"), Ptr{Nothing}(0x000000011d74f620), "INSERT INTO test_table1 VALUES(?,?,?,?)\n", DuckDB.MaterializedResult)
We are ready to time this now.
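The row-by-row insertion loop might look like this (a self-contained sketch; `n` is kept small here, while the 39-second timing in the post clearly used far more rows):

```julia
using DuckDB, DataFrames, Dates

db = DuckDB.DB(":memory:")
DBInterface.execute(db,
    "CREATE TABLE test_table1(id INTEGER, value FLOAT, timestamp TIMESTAMP, date DATE)")
stmt = DBInterface.prepare(db, "INSERT INTO test_table1 VALUES(?,?,?,?)")

n = 1_000  # placeholder size
df = DataFrame(
    id        = 1:n,
    value     = rand(Float32, n),
    timestamp = Dates.now() .+ Dates.Second.(1:n),
    date      = Dates.today() .+ Dates.Day.(1:n))

# execute the prepared statement once per row, binding the four columns
t = @elapsed for row in eachrow(df)
    DBInterface.execute(stmt, (row.id, row.value, row.timestamp, row.date))
end
```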
39.3169185
On my machine this took 39.32 seconds. This is probably not the most efficient way to get data into your database, but the situation can arise (for example, if you generate the data yourself in some modeling task).
Now, let’s try with the Appender API. We create an identical table, and attach an Appender object to it:
DuckDB.Appender(Ptr{Nothing}(0x0000000336f056d0))
Let’s see how fast that insert operation is now:
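A self-contained sketch of the Appender workflow (table creation repeated here so the block runs on its own; `n` is again a small placeholder):

```julia
using DuckDB, DataFrames, Dates

db = DuckDB.DB(":memory:")
DBInterface.execute(db,
    "CREATE TABLE test_table2(id INTEGER, value FLOAT, timestamp TIMESTAMP, date DATE)")

n = 1_000  # placeholder size
df = DataFrame(
    id        = 1:n,
    value     = rand(Float32, n),
    timestamp = Dates.now() .+ Dates.Second.(1:n),
    date      = Dates.today() .+ Dates.Day.(1:n))

appender = DuckDB.Appender(db, "test_table2")
t = @elapsed begin
    for row in eachrow(df)
        for val in row              # append each column value in order
            DuckDB.append(appender, val)
        end
        DuckDB.end_row(appender)
    end
    DuckDB.close(appender)          # flushes any pending rows
end
```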
0.449401458
Only 0.45 seconds now! Holy smokes.
For the purpose of working with this notebook, we can currently only have a single Quarto engine per notebook. While it’s perfectly possible to call both R and Python code from a running Julia session, one needs to annotate the code with some additional markup, which is not great for a learner. So I preferred to separate the Julia version from the others. It’s most useful to create a local environment. To achieve this, you can open Julia in the root of this document and do
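Something along these lines (a sketch; the exact package list is an assumption based on the `using` statements above):

```julia
using Pkg
Pkg.activate(".")                           # create/use a project env in this folder
Pkg.add(["DuckDB", "DataFrames", "Chain"])  # Dates is a stdlib, no need to add it
```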
and Quarto will pick up this environment when it sees `engine: julia` in the frontmatter.