This readme explains what types of extensions there are in DuckDB and how to build them.

# What are DuckDB extensions?
DuckDB extensions are libraries containing additional DuckDB functionality separate from the main codebase. These extensions provide functionality that cannot or should not live in the main DuckDB codebase for various reasons. DuckDB extensions can be built in two ways. Firstly, they can be statically linked into DuckDB's executables (the DuckDB CLI, the unittest binary, the benchmark runner binary, etc.). Doing so automatically makes them available when using these binaries. Secondly, DuckDB has an extension loading mechanism to dynamically load extension binaries.

# Extension Types
DuckDB extensions can be divided into two types: in-tree extensions and out-of-tree extensions. These types refer to where the extensions live and who maintains them.

### In-tree extensions
In-tree extensions are extensions that live in the main DuckDB repository. These extensions are considered fundamental to DuckDB and/or tie into DuckDB so deeply that changes to DuckDB are expected to regularly break them. We aim to keep the number of in-tree extensions to a minimum and strive to move extensions out-of-tree where possible.

### Out-of-tree Extensions (OOTEs)
Out-of-tree extensions live in separate repositories outside the main DuckDB repository. The reasons for moving extensions out-of-tree can vary. Firstly, moving extensions out of the main DuckDB codebase keeps the core DuckDB code smaller and less complex. Secondly, keeping extensions out-of-tree can be useful for licensing reasons.

There are two main types of OOTEs.

Firstly, there are the **DuckDB Managed OOTEs**. These are distributed through the main DuckDB CI. These extensions are signed using DuckDB's signing key and are maintained by the DuckDB team. Some examples are the `sqlite_scanner` and `postgres_scanner` extensions.
The DuckDB Managed OOTEs are distributed automatically with every release of DuckDB. For the current list of extensions in this category, check out `.github/config/out_of_tree_extensions.cmake`.

Secondly, there are **External OOTEs**. Extensions in this category are not tied to the DuckDB CI; instead, their CI/CD runs in their own repository. The maintainer of the external OOTE repo is responsible for testing, distribution, and making sure that an up-to-date version of the extension is available. Depending on who maintains the extension, these extensions may or may not be signed.

# Building extensions
Under the hood, all types of extensions are built the same way: DuckDB's root `CMakeLists.txt` file is used as the root CMake file, and the extensions that should be built are passed to it. DuckDB has various methods to configure which extensions to build. Additionally, we can configure for each extension how we want to build it: for example, whether to only build the loadable extension, or to also link the extension into the DuckDB binaries. The following sections describe these methods.

## Makefile/CMake variables
The simplest way to specify which extensions to build is the `DUCKDB_EXTENSIONS` variable. When running `make`, set it to a `;`-separated list of extension names. For example:
```bash
DUCKDB_EXTENSIONS='json;icu' make
```
The `DUCKDB_EXTENSIONS` variable is simply passed to the CMake variable `BUILD_EXTENSIONS`, which can also be set directly:
```bash
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_EXTENSIONS='parquet;icu;tpch;tpcds;fts;json'
```

## Makefile environment variables
Another way to specify building an extension is with the `BUILD_` variables defined in the root `Makefile` in this repository. For example, to build the JSON extension, simply run `BUILD_JSON=1 make`.
These Makevars have to be added manually for each extension and are simply syntactic sugar around the `DUCKDB_EXTENSIONS` variable.

## Config files
To have more control over how in-tree extensions are built, extension config files should be used. These config files are simply CMake files that are included by DuckDB's CMake build. There are 4 different places that will be searched for config files:

1) The base configuration `extension/extension_config.cmake`. The extensions specified here will be built every time DuckDB is built. This configuration is always loaded.
2) (Optional) The client-specific extension specification in `tools/*/duckdb_extension_config.cmake`. These configs specify which extensions are built and linked into each client.
3) (Optional) The local configuration file `extension/extension_config_local.cmake`. This is where you would specify extensions you need included in your local/custom/dev build of DuckDB. This file is gitignored and is to be created by the developer.
4) (Optional) Additional configuration files passed to the `DUCKDB_EXTENSION_CONFIGS` parameter. This can be used to point DuckDB to config files stored anywhere on the machine.

DuckDB will load these config files in reverse order and ignore subsequent calls to load an extension with the same name. This allows overriding the base configuration of an extension by providing a different configuration in the local config. For example, currently the parquet extension is always statically linked into DuckDB, because of this line in `extension/extension_config.cmake`:
```cmake
duckdb_extension_load(parquet)
```
Now say we want to build DuckDB with our custom parquet extension, and we also don't want to link it statically into DuckDB, but only produce the loadable binary.
We can achieve this by creating the `extension/extension_config_local.cmake` file and adding:
```cmake
duckdb_extension_load(parquet
    DONT_LINK
    SOURCE_DIR /path/to/my/custom/parquet
)
```
Now when we run `make`, CMake will output:
```shell
-- Building extension 'parquet' from 'path/to/my/custom/parquet'
-- Extensions built but not linked: parquet
```

# Using extension config files
The `duckdb_extension_load` function is used in the configuration files to specify how an extension should be loaded. There are 3 different ways this can be done. For some examples, check out `.github/config/*.cmake`. These are the configurations used in DuckDB's CI to select which extensions are built.

## Automatic loading
The simplest way to load an extension is to pass just the extension name. This will automatically try to load the extension. Optionally, the `DONT_LINK` parameter can be passed to disable linking the extension into DuckDB.
```cmake
duckdb_extension_load(<extension_name> (DONT_LINK))
```
This form of `duckdb_extension_load` will search the `./extension` and `./extension_external` directories for extensions and attempt to load them if possible. Note that the `extension_external` directory does not exist by default; it should be created and populated with the out-of-tree extensions that should be built. Extensions based on the [extension-template](https://github.com/duckdb/extension-template) should work out of the box using this automatic loading when placed in the `extension_external` directory.

## Custom path
When extensions are located in a different path, or their project structure differs from that of the [extension-template](https://github.com/duckdb/extension-template), the `SOURCE_DIR` and `INCLUDE_DIR` variables can be used to tell DuckDB how to load the extension:
```cmake
duckdb_extension_load(<extension_name>
    (DONT_LINK)
    SOURCE_DIR <path_to_extension_root>
    (INCLUDE_DIR <path_to_extension_headers>)
)
```

## Remote GitHub repo
Directly installing extensions from GitHub repositories is also supported.
This will download the extension to the current CMake build directory and build it from there:
```cmake
duckdb_extension_load(postgres_scanner
    (DONT_LINK)
    GIT_URL https://github.com/duckdb/postgres_scanner
    GIT_TAG cd043b49cdc9e0d3752535b8333c9433e1007a48
)
```

# Explicitly disabling extensions
Because you may sometimes want to override extensions set by other configurations, extensions can also be explicitly disabled using the `DONT_BUILD` flag. This prevents the extension from being built altogether. For example, to build DuckDB without the parquet extension, which is enabled by default, specify the following in `extension/extension_config_local.cmake`:
```cmake
duckdb_extension_load(parquet DONT_BUILD)
```
Note that this can also be done from the Makefile:
```bash
DUCKDB_EXTENSIONS='tpch;json' SKIP_EXTENSIONS=parquet make
```
which results in:
```bash
...
-- Building extension 'tpch' from '/Users/sam/Development/duckdb/extensions'
-- Building extension 'json' from '/Users/sam/Development/duckdb/extensions'
-- Extensions linked into DuckDB: tpch, json
-- Extensions explicitly skipped: parquet
...
```

# VCPKG dependency management
DuckDB extensions can use [VCPKG](https://vcpkg.io/en/) to manage their dependencies. Check out the [Extension Template](https://github.com/duckdb/extension-template) for an example of how to set up vcpkg in extensions.

## Building DuckDB with multiple extensions that use vcpkg
To build DuckDB with multiple extensions that all use vcpkg, some extra steps are required. This is because each extension specifies its own `vcpkg.json` manifest for its dependencies, but vcpkg allows only a single manifest. The workaround is to merge the dependencies from the manifests of all extensions being built. This repo contains a script to automatically perform this merge.
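
To make the idea of merging manifests concrete, here is a minimal Python sketch. It is not the script shipped in this repository; it only illustrates the general approach under the assumption that each manifest lists its packages in a `dependencies` array whose entries are either plain names or objects with a `name` field. The function names and the two sample manifests are hypothetical, and a real merge may need to reconcile additional manifest fields.

```python
import json


def dependency_name(dep):
    """Return the package name of a dependency entry (plain string or object form)."""
    return dep if isinstance(dep, str) else dep["name"]


def merge_manifests(manifests):
    """Merge the `dependencies` arrays of several manifests into one manifest.

    Assumption for this sketch: the first entry seen for a package name wins
    and later duplicates are dropped. A real merge may need a smarter policy.
    """
    merged = {}
    for manifest in manifests:
        for dep in manifest.get("dependencies", []):
            # setdefault keeps the first entry registered under each name
            merged.setdefault(dependency_name(dep), dep)
    return {"dependencies": list(merged.values())}


# Hypothetical manifests of two extensions that share one dependency
extension_1 = {"dependencies": ["openssl", "zlib"]}
extension_2 = {"dependencies": ["zlib", {"name": "curl", "features": ["ssl"]}]}

merged = merge_manifests([extension_1, extension_2])
print(json.dumps(merged, indent=2))
```

The shared dependency appears only once in the result, so the single merged manifest can be handed to vcpkg in place of the per-extension manifests.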

### Example build with 2 extensions using vcpkg
For example, let's say we want to create a DuckDB binary that has two extensions statically linked, each of which uses vcpkg. The first step is to add the two extensions to `extension/extension_config_local.cmake`:
```cmake
duckdb_extension_load(extension_1
    GIT_URL https://github.com/example/extension_1
    GIT_TAG some_git_hash
)
duckdb_extension_load(extension_2
    GIT_URL https://github.com/example/extension_2
    GIT_TAG some_git_hash
)
```
Now, to merge the `vcpkg.json` manifests from these two extensions, run:
```shell
make extension_configuration
```
This will create a merged manifest in `./build/extension_configuration/vcpkg.json`.

Next, run:
```shell
USE_MERGED_VCPKG_MANIFEST=1 VCPKG_TOOLCHAIN_PATH="/path/to/your/vcpkg/installation" make
```
This will use the merged manifest to install all required dependencies, build `extension_1` and `extension_2`, build DuckDB, and finally link both extensions into DuckDB.