should be it
This commit is contained in:
185
external/duckdb/extension/README.md
vendored
Normal file
185
external/duckdb/extension/README.md
vendored
Normal file
@@ -0,0 +1,185 @@
|
||||
This readme explains what types of extensions there are in DuckDB and how to build them.
|
||||
|
||||
# What are DuckDB extensions?
|
||||
DuckDB extensions are libraries containing additional DuckDB functionality separate from the main codebase. These
|
||||
extensions can provide added functionality to DuckDB that can/should not live in DuckDB main code for various reasons.
|
||||
DuckDB extensions can be built in two ways. Firstly, they can be statically linked into DuckDBs executables (duckdb cli,
|
||||
unittest binary, benchmark runner binary, etc). Doing so will automatically make them available when using these binaries.
|
||||
Secondly, DuckDB has an extension loading mechanism to dynamically load extension binaries.
|
||||
|
||||
# Extension Types
|
||||
DuckDB Extensions can de divided into different types: In-tree extensions and out-of-tree extensions. These types refer
|
||||
to where the extensions live and who maintains them.
|
||||
|
||||
### In-tree extensions
|
||||
In-tree extensions are extensions that live in the main DuckDB repository. These extensions are considered fundamental
|
||||
to DuckDB and/or tie into to DuckDB so deeply that changes to DuckDB are expected to regularly break them. We aim to
|
||||
keep the amount of in-tree extensions to a minimum and strive to move extensions out-of-tree where possible.
|
||||
### Out-of-tree Extensions (OOTEs)
|
||||
Out-of-tree extensions live in separate repositories outside the main DuckDB repository. The reasons for moving extensions
|
||||
out-of-tree can vary. Firstly, moving extensions out of the main DuckDB code-base keeps the core DuckDB code smaller
|
||||
and less complex. Secondly, keeping extensions out-of-tree can be useful for licensing reasons.
|
||||
|
||||
There are two main types of OOTEs. Firstly, there are the **DuckDB Managed OOTEs**. These are distributed through the main
|
||||
DuckDB CI. These extensions are signed using DuckDBs signing key and are maintained by the DuckDB team. Some examples are
|
||||
the `sqlite_scanner` and `postgres_scanner` extensions. The DuckDB Managed OOTEs are distributed automatically with every
|
||||
release of DuckDB. For the current list of extensions in this category check out `.github/config/out_of_tree_extensions.cmake`
|
||||
|
||||
Secondly, there are **External OOTEs**. Extensions in this category are not tied to the DuckDB CI, but instead their CI/CD
|
||||
runs in their own repository. The maintainer of the external OOTE repo is responsible for testing, distribution and making
|
||||
sure that an up-to-date version of the extension is available. Depending on who maintains the extension, these extensions
|
||||
may or may not be signed.
|
||||
|
||||
# Building extensions
|
||||
Under the hood, all types of extensions are built the same way, which is using the DuckDB's root `CMakeLists.txt` file as root CMake file
|
||||
and passing the extensions that should be build to it. DuckDB has various methods to configure which extensions to build.
|
||||
Additionally, we can configure for each extension how we want to build it: for example, whether to only
|
||||
build the loadable extension, or also link the extension in the DuckDB binaries. There's different ways to load extensions
|
||||
in DuckDB with various
|
||||
|
||||
## Makefile/Cmake variables
|
||||
The simplest way to specify which extensions to load is using the `DUCKDB_EXTENSIONS` variable. To specify which extensions
|
||||
to build when making duckdb set the extensions variable to a `;` separated list of extensions names. For example:
|
||||
```bash
|
||||
DUCKDB_EXTENSIONS='json;icu' make
|
||||
```
|
||||
The `DUCKDB_EXTENSIONS` variable is simply passed to a CMake variable `BUILD_EXTENSIONS` which can also be invoked directly:
|
||||
```bash
|
||||
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_EXTENSIONS='parquet;icu;tpch;tpcds;fts;json'
|
||||
```
|
||||
|
||||
## Makefile environment variables
|
||||
Another way to specify building an extension is with the `BUILD_<extension name>` variables defined in the root
|
||||
`Makefile` in this repository. For example, to build the JSON extension, simply run `BUILD_JSON=1 make`. These Makevars
|
||||
should be added manually for each extension and are simply syntactic sugar around the DUCKDB_EXTENSIONS variable.
|
||||
|
||||
## Config files
|
||||
To have more control over how in-tree extensions are built, extension config files should be used. These config files
|
||||
are simply CMake files that are included by DuckDB's CMake build. There are 4 different places that will be searched
|
||||
for config files:
|
||||
|
||||
1) The base configuration `extension/extension_config.cmake`. The extensions specified here will be built every time DuckDB
|
||||
is built. This configuration is always loaded.
|
||||
2) (Optional) The client specific extensions specification in `tools/*/duckdb_extension_config.cmake`. These config specify
|
||||
which extensions are built and linked into each client.
|
||||
3) (Optional) The local configuration file `extension/extension_config_local.cmake` This is where you would specify extensions you need
|
||||
included in your local/custom/dev build of DuckDB. This file is gitignored and to be created by the developer.
|
||||
4) (Optional) Additional configuration files passed to the `DUCKDB_EXTENSION_CONFIGS` parameter. This can be used to point DuckDB
|
||||
to config files stored anywhere on the machine.
|
||||
|
||||
DuckDB will load these config files in reverse order and ignore subsequent calls to load an extension with the
|
||||
same name. This allows overriding the base configuration of an extension by providing a different configuration
|
||||
in the local config. For example, currently the parquet extension is always statically linked into DuckDB, because of this
|
||||
line in `extension/extension_config.cmake`:
|
||||
```cmake
|
||||
duckdb_extension_load(parquet)
|
||||
```
|
||||
Now say we want to build DuckDB with our custom parquet extension, and we also don't want to link this statically in DuckDB,
|
||||
but only produce the loadable binary. We can achieve this creating the `extension/extension_config_local.cmake` file and adding:
|
||||
```cmake
|
||||
duckdb_extension_load(parquet
|
||||
DONT_LINK
|
||||
SOURCE_DIR /path/to/my/custom/parquet
|
||||
)
|
||||
```
|
||||
Now when we run `make` cmake will output:
|
||||
```shell
|
||||
-- Building extension 'parquet' from 'path/to/my/custom/parquet'
|
||||
-- Extensions built but not linked: parquet
|
||||
```
|
||||
|
||||
# Using extension config files
|
||||
The `duckdb_extension_load` function is used in the configuration files to specify how an extension should
|
||||
be loaded. There are 3 different ways this can be done. For some examples, check out `.github/config/*.cmake`. These are
|
||||
the configurations used in DuckDBs CI to select which extensions are built.
|
||||
|
||||
## Automatic loading
|
||||
The simplest way to load an extension is just passing the extension name. This will automatically try to load the extension.
|
||||
Optionally, the DONT_LINK parameter can be passed to disable linking the extension into DuckDB.
|
||||
```cmake
|
||||
duckdb_extension_load(<extension_name> (DONT_LINK))
|
||||
```
|
||||
This configuration of `duckdb_extension_load` will search the `./extension` and `./extension_external` directories for
|
||||
extensions and attempt to load them if possible. Note that the `extension_external` directory does not exist but should
|
||||
be created and populated with the out-of-tree extensions that should be built. Extensions based on the
|
||||
[extension-template](https://github.com/duckdb/extension-template) should work out of the box using this automatic
|
||||
loading when placed in the `extension_external` directory.
|
||||
|
||||
## Custom path
|
||||
When extensions are located in a path or their project structure is different from that the
|
||||
[extension-template](https://github.com/duckdb/extension-template), the `SOURCE_DIR` and `INCLUDE_DIR` variables can
|
||||
be used to tell DuckDB how to load the extension:
|
||||
```cmake
|
||||
duckdb_extension_load(<extension_name>
|
||||
(DONT_LINK)
|
||||
SOURCE_DIR <absolute_path_to_extension_root>
|
||||
(INCLUDE_DIR <absolute_path_to_extension_header>)
|
||||
)
|
||||
```
|
||||
|
||||
## Remote GitHub repo
|
||||
Directly installing extensions from GitHub repositories is also supported. This will download the extension to the current
|
||||
cmake build directory and build it from there:
|
||||
```cmake
|
||||
duckdb_extension_load(postgres_scanner
|
||||
(DONT_LINK)
|
||||
GIT_URL https://github.com/duckdb/postgres_scanner
|
||||
GIT_TAG cd043b49cdc9e0d3752535b8333c9433e1007a48
|
||||
)
|
||||
```
|
||||
|
||||
# Explicitly disabling extensions
|
||||
Because the sometimes you may want to override extensions set by other configurations, explicitly disabling extensions is
|
||||
also possible using the `DONT_BUILD flag`. This will disable the extension from being built all together. For example, to build DuckDB without the parquet extension which is enabled by default, in `extension/extension_config_local.cmake` specify:
|
||||
```cmake
|
||||
duckdb_extension_load(parquet DONT_BUILD)
|
||||
```
|
||||
Note that this can also be done from the Makefile:
|
||||
```bash
|
||||
DUCKDB_EXTENSIONS='tpch;json' SKIP_EXTENSIONS=parquet make
|
||||
```
|
||||
results in:
|
||||
```bash
|
||||
...
|
||||
-- Building extension 'tpch' from '/Users/sam/Development/duckdb/extensions'
|
||||
-- Building extension 'json' from '/Users/sam/Development/duckdb/extensions'
|
||||
-- Extensions linked into DuckDB: tpch, json
|
||||
-- Extensions explicitly skipped: parquet
|
||||
...
|
||||
```
|
||||
|
||||
# VCPKG dependency management
|
||||
DuckDB extensions can use [VCPKG](https://vcpkg.io/en/) to manage their dependencies. Check out the [Extension Template](https://github.com/duckdb/extension-template) for an example
|
||||
on how to set up vcpkg in extensions.
|
||||
|
||||
## Building DuckDB with multiple extensions that use vcpkg
|
||||
To build duckdb with multiple extensions that all use vcpkg, some extra steps are required. This is due to the fact that each
|
||||
extension will specify their own vcpkg.json manifest for their dependencies, but vcpkg allows only a single manifest. The workaround here
|
||||
is to merge the dependencies from the manifests of all extensions being built. This repo contains a script to do automatically perform this merge.
|
||||
|
||||
### Example build with 2 extensions using vcpkg
|
||||
For example, lets say we want to create a DuckDB binary which has two extensions statically linked that each use vcpkg. The first step is to add the two extensions
|
||||
to `extension/extension_config_local.cmake`:
|
||||
```cmake
|
||||
duckdb_extension_load(extension_1
|
||||
GIT_URL https://github.com/example/extension_1
|
||||
GIT_TAG some_git_hash
|
||||
)
|
||||
duckdb_extension_load(extension_2
|
||||
GIT_URL https://github.com/example/extension_2
|
||||
GIT_TAG some_git_hash
|
||||
)
|
||||
```
|
||||
|
||||
Now to merge the vcpkg.json manifests from these two extension run:
|
||||
```shell
|
||||
make extension_configuration
|
||||
```
|
||||
This will create a merged manifest in `./build/extension_configuration/vcpkg.json`.
|
||||
|
||||
Next, run:
|
||||
```shell
|
||||
USE_MERGED_VCPKG_MANIFEST=1 VCPKG_TOOLCHAIN_PATH="/path/to/your/vcpkg/installation" make
|
||||
```
|
||||
which will use the merged manifest to install all required dependencies, build `extension_1` and `extension_2`, build DuckDB,
|
||||
and finally link both extensions into DuckDB.
|
||||
Reference in New Issue
Block a user