
Quickstart for dbt Core using DuckDB

dbt Core
Quickstart
Beginner

    Introduction

    In this quickstart guide, you'll learn how to use dbt Core with DuckDB so you can get set up quickly and efficiently. DuckDB is an open-source database management system designed for analytical workloads. It provides fast and easy access to large datasets, making it well suited for data analytics tasks.

    This guide will demonstrate how to:

    • Create a virtual development environment using a template provided by dbt Labs.
      • This sets up a fully functional dbt environment with an operational and executable project. The codespace automatically connects to the DuckDB database and loads a year's worth of data from our fictional Jaffle Shop café, which sells food and beverages in several US cities.
      • For additional information, refer to the README for the Jaffle Shop template. It includes instructions for working with the template, along with animated GIFs.
    • Run any dbt command from the environment’s terminal.
    • Generate a larger dataset for the Jaffle Shop café (for example, five years of data instead of just one).

    You can learn more through high-quality dbt Learn courses and workshops.

    Prerequisites

    • When using DuckDB with dbt Core, you'll need to use the dbt command-line interface (CLI). Currently, DuckDB is not supported in dbt Cloud.
    • You know some basics of the terminal. In particular, you should understand cd, ls, and pwd so you can easily navigate your computer's directory structure. A short refresher follows this list.
    • You have a GitHub account.
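
    For example, a quick terminal check before you start (the directory name here is only an illustration):

      pwd                   # print the directory you're currently in
      ls                    # list the files and folders it contains
      cd my_dbt_project     # move into a directory
      cd ..                 # move back up one level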

    Set up DuckDB for dbt Core

    This section will provide a step-by-step guide for setting up DuckDB for use in local (Mac and Windows) environments and web browsers.

    In the repository, there's a requirements.txt file that's used to install dbt Core, DuckDB, and all other necessary dependencies. You can check this file to see what will be installed on your machine.

    The requirements.txt file sits at the top level of your dbt project directory, alongside other key files like dbt_project.yml:


    my_dbt_project/
    ├── dbt_project.yml
    ├── models/
    │   ├── my_model.sql
    ├── tests/
    │   ├── my_test.sql
    └── requirements.txt
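
    At a minimum, such a file pins the DuckDB adapter (dbt-duckdb), which installs dbt Core as a dependency. The pin below is a rough sketch; check the repository's copy for the actual contents and versions:

      # Hypothetical requirements.txt contents
      dbt-duckdb>=1.8.0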

    For more information on the setup of DuckDB, you can refer to DuckDB setup.

    1. First, clone the Jaffle Shop git repository by running the following command in your terminal:

      git clone https://github.com/dbt-labs/jaffle_shop_duckdb.git

    2. Change into the jaffle_shop_duckdb directory from the command line:

      cd jaffle_shop_duckdb

    3. Install dbt Core and DuckDB in a virtual environment. The exact commands differ for Mac, Windows, and Windows PowerShell; a sketch for Mac is shown below.
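
       The following is a minimal sketch of this step for Mac, assuming python3 is available on your PATH. On Windows, activate the environment with venv\Scripts\activate (or venv\Scripts\Activate.ps1 in PowerShell) instead of the source command:

      # Create and activate a virtual environment inside the project directory
      python3 -m venv venv
      source venv/bin/activate

      # Install dbt Core, the dbt-duckdb adapter, and the other pinned dependencies
      python3 -m pip install --upgrade pip
      python3 -m pip install -r requirements.txt
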
    4. Ensure your profile is set up correctly by running the following from the command line:

      • dbt compile — generates executable SQL from your project source files
      • dbt run — compiles and runs your project
      • dbt test — compiles and tests your project
      • dbt build — compiles, runs, and tests your project
      • dbt docs generate — generates your project's documentation.
      • dbt docs serve — starts a webserver on port 8080 to serve your documentation locally and opens the documentation site in your default browser.

      For complete details, refer to the dbt command reference.
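
      The template connects to DuckDB automatically, but for reference, a dbt-duckdb profile is small. The sketch below is illustrative only; the profile name, database path, and thread count are assumptions rather than the template's actual values:

      # Hypothetical profiles.yml entry for the dbt-duckdb adapter
      jaffle_shop:
        target: dev
        outputs:
          dev:
            type: duckdb                # use the dbt-duckdb adapter
            path: jaffle_shop.duckdb    # local database file DuckDB creates and reads
            threads: 4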

    note

    These steps will fail if you decide to run this project in your own data warehouse (outside of this DuckDB demo); you'll need to reconfigure the project files for your warehouse first. Keep this in mind especially if you're using a community-contributed adapter.

    Troubleshoot

    Could not set lock on file error: DuckDB allows only one process to write to the database file at a time, so this error typically means another process (for example, a second terminal session or an open database connection) still has the .duckdb file open. Close that process or connection, then re-run the command.

    Generate a larger dataset

    If you'd like to work with a larger selection of Jaffle Shop data, you can generate an arbitrary number of years of fictitious data from within your codespace.

    1. Install the Python package called jafgen. At the terminal's prompt, run:

      /workspaces/test (main) $ python -m pip install jafgen
    2. When installation is done, run:

      /workspaces/test (main) $ jafgen --years NUMBER_OF_YEARS

      Replace NUMBER_OF_YEARS with the number of years you want to simulate. This command builds the CSV files and stores them in the jaffle-data folder; the new data is then sourced automatically based on the sources.yml file and the dbt-duckdb adapter (a sketch of that mechanism follows this section).

    As you increase the number of years, it takes exponentially more time to generate the data because the Jaffle Shop stores grow in size and number. For a good balance of data size and time to build, dbt Labs suggests a maximum of 6 years.
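
    To illustrate the sourcing mechanism, dbt-duckdb lets a source table point at an external file through its meta configuration. The snippet below is a rough sketch only; the source name, table names, and file layout are assumptions, and the template's actual sources.yml may be organized differently:

      # Hypothetical sources.yml using dbt-duckdb's external_location
      version: 2

      sources:
        - name: jaffle_shop
          meta:
            external_location: "jaffle-data/{name}.csv"   # {name} resolves to each table's name
          tables:
            - name: raw_customers
            - name: raw_orders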
