should be it

2025-10-24 19:21:19 -05:00
parent a4b23fc57c
commit f09560c7b1
14047 changed files with 3161551 additions and 1 deletions

@@ -0,0 +1,2 @@
Manifest.toml

@@ -0,0 +1,7 @@
Copyright 2018-2024 Stichting DuckDB Foundation
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

external/duckdb/tools/juliapkg/README.md vendored Normal file
@@ -0,0 +1,149 @@
# Official DuckDB Julia Package
DuckDB is a high-performance in-process analytical database system. It is designed to be fast, reliable and easy to use. For more information on the goals of DuckDB, please refer to [the Why DuckDB page on our website](https://duckdb.org/why_duckdb).
The DuckDB Julia package provides a high-performance front-end for DuckDB. Much like SQLite, DuckDB runs in-process within the Julia client, and provides a DBInterface front-end.
The package also supports multi-threaded execution using Julia threads/tasks. If you wish to run queries in parallel, you must launch Julia with multi-threading support (e.g. by setting the `JULIA_NUM_THREADS` environment variable).
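For example (the thread count of 4 here is purely illustrative), Julia can be started with multiple threads in either of two equivalent ways:

```
# via the environment variable
JULIA_NUM_THREADS=4 julia

# or via the command-line flag
julia --threads 4
```

Inside Julia, `Threads.nthreads()` reports the resulting thread count.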
## Installation
```julia
pkg> add DuckDB
julia> using DuckDB
```
## Basics
```julia
# create a new in-memory database
con = DBInterface.connect(DuckDB.DB, ":memory:")
# create a table
DBInterface.execute(con, "CREATE TABLE integers(i INTEGER)")
# insert data using a prepared statement
stmt = DBInterface.prepare(con, "INSERT INTO integers VALUES(?)")
DBInterface.execute(stmt, [42])
# query the database
results = DBInterface.execute(con, "SELECT 42 a")
print(results)
```
## Scanning DataFrames
The DuckDB Julia package also provides support for querying Julia DataFrames. Note that the DataFrames are directly read by DuckDB - they are not inserted or copied into the database itself.
If you wish to load data from a DataFrame into a DuckDB table you can run a `CREATE TABLE AS` or `INSERT INTO` query.
```julia
using DuckDB
using DataFrames
# create a new in-memory database
con = DBInterface.connect(DuckDB.DB)
# create a DataFrame
df = DataFrame(a = [1, 2, 3], b = [42, 84, 42])
# register it as a view in the database
DuckDB.register_data_frame(con, df, "my_df")
# run a SQL query over the DataFrame
results = DBInterface.execute(con, "SELECT * FROM my_df")
print(results)
```
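If you do want the data to live inside DuckDB, independent of the Julia `DataFrame`, a `CREATE TABLE ... AS` over the registered view is a minimal sketch of the approach mentioned above (`my_table` is an illustrative name):

```julia
# materialize the registered view into a DuckDB table
DBInterface.execute(con, "CREATE TABLE my_table AS SELECT * FROM my_df")
# the table is now stored in the database, independent of `df`
results = DBInterface.execute(con, "SELECT * FROM my_table")
print(results)
```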
## Original Julia Connector
Credits to kimmolinna for the [original DuckDB Julia connector](https://github.com/kimmolinna/DuckDB.jl).
## Contributing to the Julia Package
### Formatting
Run the format script whenever you make changes. This can be done with the following command from within the root directory of the project:
```bash
julia tools/juliapkg/scripts/format.jl
```
### Testing
You can run the tests using the `test.sh` script:
```bash
./test.sh
```
Specific test files can be run by adding the name of the file as an argument:
```bash
./test.sh test_connection.jl
```
### Development
Build DuckDB using `DISABLE_SANITIZER=1 make debug`.
To run against a locally compiled version of duckdb, you'll need to set the `JULIA_DUCKDB_LIBRARY` environment variable, e.g.:
```bash
export JULIA_DUCKDB_LIBRARY="`pwd`/../../build/debug/src/libduckdb.dylib"
```
Note that Julia pre-compilation caching might get in the way of changes to this variable taking effect. You can clear these caches using the following command:
```bash
rm -rf ~/.julia/compiled
```
Development requires a few extra packages; these live in a `Project.toml` in the `test` directory and are installed like so:
```bash
cd tools/juliapkg
```
```julia
using Pkg
Pkg.activate("./test")
Pkg.instantiate()
```
#### Debugging using LLDB
Julia's version manager `juliaup` can get in the way of starting a process with lldb attached, as it provides a shim for the `julia` binary.
The actual `julia` binaries live in `~/.julia/juliaup/<version>/bin/julia`.
`lldb -- julia ...` will likely not work; you'll need to provide the absolute path of the julia binary, e.g.:
```bash
lldb -- ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/bin/julia ...
```
#### Testing
To run the test suite in its entirety:
```bash
julia -e "import Pkg; Pkg.activate(\".\"); include(\"test/runtests.jl\")"
```
To run a specific test listed in `test/runtests.jl`, you can provide its name, e.g.:
```bash
julia -e "import Pkg; Pkg.activate(\".\"); include(\"test/runtests.jl\")" "test_basic_queries.jl"
```
As mentioned above, to attach lldb you will have to replace `julia` with the absolute path to the binary.
### Automatic API generation
A base Julia wrapper around the C API is generated by the `update_api.sh` script (which internally calls the Python script `scripts/generate_c_api_julia.py`). It uses the definitions of the DuckDB C API to generate a Julia wrapper that is complete and consistent with the C API. To regenerate the wrapper, run:
```bash
./update_api.sh
```
### Submitting a New Package
The DuckDB Julia package depends on the [DuckDB_jll package](https://github.com/JuliaBinaryWrappers/DuckDB_jll.jl), which can be updated by sending a PR to [Yggdrasil](https://github.com/JuliaPackaging/Yggdrasil/pull/5049).
After the `DuckDB_jll` package is updated, the DuckDB package can be updated by incrementing the version number (and dependency version numbers) in `Project.toml`, followed by [adding a comment containing the text `@JuliaRegistrator register subdir=tools/juliapkg`](https://github.com/duckdb/duckdb/commit/88b59799f41fce7cbe166e5c33d0d5f6d480278d#commitcomment-76533721) to the commit.

external/duckdb/tools/juliapkg/format.sh vendored Executable file

@@ -0,0 +1,4 @@
set -e
cd ../..
julia tools/juliapkg/scripts/format.jl


@@ -0,0 +1,16 @@
set -e
if [[ $(git diff) ]]; then
echo "There are already differences prior to the format! Commit your changes prior to running format_check.sh"
exit 1
fi
./format.sh
if [[ $(git diff) ]]; then
echo "Julia format found differences:"
git diff
exit 1
else
echo "No differences found"
exit 0
fi


@@ -0,0 +1,91 @@
import subprocess
import os
import argparse
import re
parser = argparse.ArgumentParser(description='Publish a Julia release.')
parser.add_argument(
'--yggdrassil-fork',
dest='yggdrassil',
action='store',
help='Fork of the Julia Yggdrasil repository (https://github.com/JuliaPackaging/Yggdrasil)',
default='/Users/myth/Programs/Yggdrasil',
)
args = parser.parse_args()
if not os.path.isfile(os.path.join('tools', 'juliapkg', 'release.py')):
print('This script must be run from the root DuckDB directory (i.e. `python3 tools/juliapkg/release.py`)')
exit(1)
def run_syscall(syscall, ignore_failure=False):
res = os.system(syscall)
if ignore_failure:
return
if res != 0:
print(f'Failed to execute {syscall}: got exit code {str(res)}')
exit(1)
# helper script to generate a julia release
duckdb_path = os.getcwd()
# fetch the latest tags
os.system('git fetch upstream --tags')
proc = subprocess.Popen(['git', 'show-ref', '--tags'], stdout=subprocess.PIPE)
tags = [x for x in proc.stdout.read().decode('utf8').split('\n') if len(x) > 0 and 'master-builds' not in x]
def extract_tag(x):
keys = x.split('refs/tags/')[1].lstrip('v').split('.')
return int(keys[0]) * 10000000 + int(keys[1]) * 10000 + int(keys[2])
tags.sort(key=extract_tag)
# latest tag
splits = tags[-1].split(' ')
hash = splits[0]
tag = splits[1].replace('refs/tags/', '')
if tag[0] != 'v':
print(f"Tag {tag} does not start with a v?")
exit(1)
print(f'Creating a Julia release from the latest tag {tag} with commit hash {hash}')
print('> Creating a PR to the Yggdrasil repository (https://github.com/JuliaPackaging/Yggdrasil)')
os.chdir(args.yggdrassil)
run_syscall('git checkout master')
run_syscall('git pull upstream master')
run_syscall(f'git branch -D {tag}', True)
run_syscall(f'git checkout -b {tag}')
tarball_build = os.path.join('D', 'DuckDB', 'build_tarballs.jl')
with open(tarball_build, 'r') as f:
text = f.read()
text = re.sub('\nversion = v["][0-9.]+["]\n', f'\nversion = v"{tag[1:]}"\n', text)
text = re.sub(
'GitSource[(]["]https[:][/][/]github[.]com[/]duckdb[/]duckdb[.]git["][,] ["][a-zA-Z0-9]+["][)]',
f'GitSource("https://github.com/duckdb/duckdb.git", "{hash}")',
text,
)
with open(tarball_build, 'w+') as f:
f.write(text)
run_syscall(f'git add {tarball_build}')
run_syscall(f'git commit -m "[DuckDB] Bump to {tag}"')
run_syscall(f'git push --set-upstream origin {tag}')
run_syscall(
f'gh pr create --title "[DuckDB] Bump to {tag}" --repo "https://github.com/JuliaPackaging/Yggdrasil" --body ""'
)
print('PR has been created.\n')
print(f'Next up we need to bump the version and DuckDB_jll version to {tag} in `tools/juliapkg/Project.toml`')
print('This is not yet automated.')
print(
'> After that PR is merged - we need to post a comment containing the text `@JuliaRegistrator register subdir=tools/juliapkg`'
)
print('> For example, see https://github.com/duckdb/duckdb/commit/0f0461113f3341135471805c9928c4d71d1f5874')
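The tag-selection logic in the script above relies on mapping each `vMAJOR.MINOR.PATCH` tag to a single sortable integer, so that e.g. `v0.10.0` sorts after `v0.9.2` (a plain lexicographic sort would get this wrong). A standalone sketch with made-up tag data:

```python
# Each "<hash> refs/tags/vMAJOR.MINOR.PATCH" line is mapped to one integer,
# weighting major > minor > patch, so numeric ordering matches version ordering.
def extract_tag(ref_line: str) -> int:
    keys = ref_line.split('refs/tags/')[1].lstrip('v').split('.')
    return int(keys[0]) * 10000000 + int(keys[1]) * 10000 + int(keys[2])

# illustrative sample data, not real DuckDB tags
tags = [
    'abc123 refs/tags/v0.9.2',
    'def456 refs/tags/v0.10.0',
    'a1b2c3 refs/tags/v0.9.10',
]
tags.sort(key=extract_tag)
print(tags[-1])  # -> 'def456 refs/tags/v0.10.0' (the newest tag sorts last)
```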


@@ -0,0 +1,4 @@
using JuliaFormatter
format("tools/juliapkg/src")
format("tools/juliapkg/test")


@@ -0,0 +1,143 @@
import os
import json
import re
import glob
import copy
from packaging.version import Version
from functools import reduce
from pathlib import Path
EXT_API_DEFINITION_PATTERN = "src/include/duckdb/main/capi/header_generation/apis/v1/*/*.json"
# The JSON files that define all available CAPI functions
CAPI_FUNCTION_DEFINITION_FILES = 'src/include/duckdb/main/capi/header_generation/functions/**/*.json'
# The original order of the function groups in the duckdb.h files. We maintain this for easier PR reviews.
# TODO: replace this with alphabetical ordering in a separate PR
ORIGINAL_FUNCTION_GROUP_ORDER = [
'open_connect',
'configuration',
'query_execution',
'result_functions',
'safe_fetch_functions',
'helpers',
'date_time_timestamp_helpers',
'hugeint_helpers',
'unsigned_hugeint_helpers',
'decimal_helpers',
'prepared_statements',
'bind_values_to_prepared_statements',
'execute_prepared_statements',
'extract_statements',
'pending_result_interface',
'value_interface',
'logical_type_interface',
'data_chunk_interface',
'vector_interface',
'validity_mask_functions',
'scalar_functions',
'aggregate_functions',
'table_functions',
'table_function_bind',
'table_function_init',
'table_function',
'replacement_scans',
'profiling_info',
'appender',
'table_description',
'arrow_interface',
'threading_information',
'streaming_result_interface',
'cast_functions',
'expression_interface',
]
def get_extension_api_version(ext_api_definitions):
latest_version = ""
for version_entry in ext_api_definitions:
if version_entry["version"].startswith("v"):
latest_version = version_entry["version"]
if version_entry["version"].startswith("unstable_"):
break
return latest_version
# Parse the CAPI_FUNCTION_DEFINITION_FILES to get the full list of functions
def parse_capi_function_definitions(function_definition_file_pattern):
# Collect all functions
# function_files = glob.glob(CAPI_FUNCTION_DEFINITION_FILES, recursive=True)
function_files = glob.glob(function_definition_file_pattern, recursive=True)
function_groups = []
function_map = {}
# Read functions
for file in function_files:
with open(file, "r") as f:
try:
json_data = json.loads(f.read())
except json.decoder.JSONDecodeError as err:
print(f"Invalid JSON found in {file}: {err}")
exit(1)
function_groups.append(json_data)
for function in json_data["entries"]:
if function["name"] in function_map:
print(f"Duplicate symbol found when parsing C API file {file}: {function['name']}")
exit(1)
function["group"] = json_data["group"]
if "deprecated" in json_data:
function["group_deprecated"] = json_data["deprecated"]
function_map[function["name"]] = function
# Reorder to match original order: purely intended to keep the PR review sane
function_groups_ordered = []
if len(function_groups) != len(ORIGINAL_FUNCTION_GROUP_ORDER):
print(
"The list used to match the original order of function groups in the duckdb.h file does not match the parsed groups. Did you add a new function group? Please also add it to ORIGINAL_FUNCTION_GROUP_ORDER for now."
)
for order_group in ORIGINAL_FUNCTION_GROUP_ORDER:
curr_group = next(group for group in function_groups if group["group"] == order_group)
function_groups.remove(curr_group)
function_groups_ordered.append(curr_group)
return (function_groups_ordered, function_map)
# Read extension API
def parse_ext_api_definitions(ext_api_definition):
api_definitions = {}
versions = []
dev_versions = []
for file in list(glob.glob(ext_api_definition)):
with open(file, "r") as f:
try:
obj = json.loads(f.read())
api_definitions[obj["version"]] = obj
if obj["version"].startswith("unstable_"):
dev_versions.append(obj["version"])
else:
if Path(file).stem != obj["version"]:
print(
f"\nMismatch between filename and version in file for {file}. Note that unstable versions should have a version starting with 'unstable_' and that stable versions should have the version as their filename"
)
exit(1)
versions.append(obj["version"])
except json.decoder.JSONDecodeError as err:
print(f"\nInvalid JSON found in {file}: {err}")
exit(1)
versions.sort(key=Version)
dev_versions.sort()
return [api_definitions[x] for x in (versions + dev_versions)]
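The version-selection logic in `get_extension_api_version` above assumes the definitions are ordered with stable `v*` entries first, and stops scanning at the first `unstable_*` entry. A minimal sketch with made-up version entries:

```python
# Returns the last stable "v*" version seen before the first "unstable_*" entry.
def get_extension_api_version(ext_api_definitions):
    latest_version = ""
    for version_entry in ext_api_definitions:
        if version_entry["version"].startswith("v"):
            latest_version = version_entry["version"]
        if version_entry["version"].startswith("unstable_"):
            break
    return latest_version

# illustrative input, already sorted: stable versions first, then dev versions
defs = [
    {"version": "v1.1.0"},
    {"version": "v1.2.0"},
    {"version": "unstable_new_features"},
    {"version": "unstable_more"},
]
print(get_extension_api_version(defs))  # -> v1.2.0
```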


@@ -0,0 +1,918 @@
import argparse
import logging
import os
import pathlib
import re
from types import NoneType
from typing import Dict, List, NotRequired, TypedDict, Union
from generate_c_api import (
EXT_API_DEFINITION_PATTERN,
get_extension_api_version,
parse_capi_function_definitions,
parse_ext_api_definitions,
)
class FunctionDefParam(TypedDict):
type: str
name: str
class FunctionDefComment(TypedDict):
description: str
param_comments: dict[str, str]
return_value: str
class FunctionDef(TypedDict):
name: str
group: str
deprecated: bool
group_deprecated: bool
return_type: str
params: list[FunctionDefParam]
comment: FunctionDefComment
class FunctionGroup(TypedDict):
group: str
deprecated: bool
entries: list[FunctionDef]
class DuckDBApiInfo(TypedDict):
version: str
commit: NotRequired[str]
def parse_c_type(type_str: str, type: "list[str] | None" = None):
"""Parses simple C types (no function pointer or array types) and returns a list of the type components.
Args:
type_str: A C type string to parse, e.g.: "const char* const"
type: List to track components, used for recursion. Defaults to None, which starts a fresh list (avoiding a mutable default argument).
Returns:
list: A list of the type components, e.g.: "const char* const" -> ["Const Ptr", "const char"]
"""
if type is None:
type = []
type_str = type_str.strip()
ptr_pattern = r"^(.*)\*(\s*const\s*)?$"
if (m1 := re.match(ptr_pattern, type_str)) is not None:
before_ptr = m1.group(1)
is_const = bool(m1.group(2))
type.append("Const Ptr" if is_const else "Ptr")
return parse_c_type(before_ptr, type)
type.append(type_str)
return type
JULIA_RESERVED_KEYWORDS = {
"function",
"if",
"else",
"while",
"for",
"try",
"catch",
"finally",
"return",
"break",
"continue",
"end",
"begin",
"quote",
"let",
"local",
"global",
"const",
"do",
"struct",
"mutable",
"abstract",
"type",
"module",
"using",
"import",
"export",
"public",
}
JULIA_BASE_TYPE_MAP = {
# Julia Standard Types
"char": "Char",
"int": "Int",
"int8_t": "Int8",
"int16_t": "Int16",
"int32_t": "Int32",
"int64_t": "Int64",
"uint8_t": "UInt8",
"uint16_t": "UInt16",
"uint32_t": "UInt32",
"uint64_t": "UInt64",
"double": "Float64",
"float": "Float32",
"bool": "Bool",
"void": "Cvoid",
"size_t": "Csize_t",
# DuckDB specific types
"idx_t": "idx_t",
"duckdb_type": "DUCKDB_TYPE",
"duckdb_string_t": "duckdb_string_t", # INLINE prefix with pointer string type
"duckdb_string": "duckdb_string", # Pointer + size type
"duckdb_table_function": "duckdb_table_function", # actually struct pointer
"duckdb_table_function_t": "duckdb_table_function_ptr", # function pointer type
"duckdb_cast_function": "duckdb_cast_function", # actually struct pointer
"duckdb_cast_function_t": "duckdb_cast_function_ptr", # function pointer type
}
# TODO: this is the original order of the functions in `api.jl`; it is only used to keep the PR review small
JULIA_API_ORIGINAL_ORDER = [
"duckdb_open",
"duckdb_open_ext",
"duckdb_close",
"duckdb_connect",
"duckdb_disconnect",
"duckdb_create_config",
"duckdb_config_count",
"duckdb_get_config_flag",
"duckdb_set_config",
"duckdb_destroy_config",
"duckdb_query",
"duckdb_destroy_result",
"duckdb_column_name",
"duckdb_column_type",
"duckdb_column_logical_type",
"duckdb_column_count",
"duckdb_row_count",
"duckdb_rows_changed",
"duckdb_column_data",
"duckdb_nullmask_data",
"duckdb_result_error",
"duckdb_result_get_chunk",
"duckdb_result_is_streaming",
"duckdb_stream_fetch_chunk",
"duckdb_result_chunk_count",
"duckdb_value_boolean",
"duckdb_value_int8",
"duckdb_value_int16",
"duckdb_value_int32",
"duckdb_value_int64",
"duckdb_value_hugeint",
"duckdb_value_uhugeint",
"duckdb_value_uint8",
"duckdb_value_uint16",
"duckdb_value_uint32",
"duckdb_value_uint64",
"duckdb_value_float",
"duckdb_value_double",
"duckdb_value_date",
"duckdb_value_time",
"duckdb_value_timestamp",
"duckdb_value_interval",
"duckdb_value_varchar",
"duckdb_value_varchar_internal",
"duckdb_value_is_null",
"duckdb_malloc",
"duckdb_free",
"duckdb_vector_size",
"duckdb_from_time_tz",
"duckdb_prepare",
"duckdb_destroy_prepare",
"duckdb_prepare_error",
"duckdb_nparams",
"duckdb_param_type",
"duckdb_bind_boolean",
"duckdb_bind_int8",
"duckdb_bind_int16",
"duckdb_bind_int32",
"duckdb_bind_int64",
"duckdb_bind_hugeint",
"duckdb_bind_uhugeint",
"duckdb_bind_uint8",
"duckdb_bind_uint16",
"duckdb_bind_uint32",
"duckdb_bind_uint64",
"duckdb_bind_float",
"duckdb_bind_double",
"duckdb_bind_date",
"duckdb_bind_time",
"duckdb_bind_timestamp",
"duckdb_bind_interval",
"duckdb_bind_varchar",
"duckdb_bind_varchar_length",
"duckdb_bind_blob",
"duckdb_bind_null",
"duckdb_execute_prepared",
"duckdb_pending_prepared",
"duckdb_pending_prepared_streaming",
"duckdb_pending_execute_check_state",
"duckdb_destroy_pending",
"duckdb_pending_error",
"duckdb_pending_execute_task",
"duckdb_execute_pending",
"duckdb_pending_execution_is_finished",
"duckdb_destroy_value",
"duckdb_create_varchar",
"duckdb_create_varchar_length",
"duckdb_create_int64",
"duckdb_get_varchar",
"duckdb_get_int64",
"duckdb_create_logical_type",
"duckdb_create_decimal_type",
"duckdb_get_type_id",
"duckdb_decimal_width",
"duckdb_decimal_scale",
"duckdb_decimal_internal_type",
"duckdb_enum_internal_type",
"duckdb_enum_dictionary_size",
"duckdb_enum_dictionary_value",
"duckdb_list_type_child_type",
"duckdb_struct_type_child_count",
"duckdb_union_type_member_count",
"duckdb_struct_type_child_name",
"duckdb_union_type_member_name",
"duckdb_struct_type_child_type",
"duckdb_union_type_member_type",
"duckdb_destroy_logical_type",
"duckdb_create_data_chunk",
"duckdb_destroy_data_chunk",
"duckdb_data_chunk_reset",
"duckdb_data_chunk_get_column_count",
"duckdb_data_chunk_get_size",
"duckdb_data_chunk_set_size",
"duckdb_data_chunk_get_vector",
"duckdb_vector_get_column_type",
"duckdb_vector_get_data",
"duckdb_vector_get_validity",
"duckdb_vector_ensure_validity_writable",
"duckdb_list_vector_get_child",
"duckdb_list_vector_get_size",
"duckdb_struct_vector_get_child",
"duckdb_union_vector_get_member",
"duckdb_vector_assign_string_element",
"duckdb_vector_assign_string_element_len",
"duckdb_create_table_function",
"duckdb_destroy_table_function",
"duckdb_table_function_set_name",
"duckdb_table_function_add_parameter",
"duckdb_table_function_set_extra_info",
"duckdb_table_function_set_bind",
"duckdb_table_function_set_init",
"duckdb_table_function_set_local_init",
"duckdb_table_function_set_function",
"duckdb_table_function_supports_projection_pushdown",
"duckdb_register_table_function",
"duckdb_bind_get_extra_info",
"duckdb_bind_add_result_column",
"duckdb_bind_get_parameter_count",
"duckdb_bind_get_parameter",
"duckdb_bind_set_bind_data",
"duckdb_bind_set_cardinality",
"duckdb_bind_set_error",
"duckdb_init_get_extra_info",
"duckdb_init_get_bind_data",
"duckdb_init_set_init_data",
"duckdb_init_get_column_count",
"duckdb_init_get_column_index",
"duckdb_init_set_max_threads",
"duckdb_init_set_error",
"duckdb_function_get_extra_info",
"duckdb_function_get_bind_data",
"duckdb_function_get_init_data",
"duckdb_function_get_local_init_data",
"duckdb_function_set_error",
"duckdb_add_replacement_scan",
"duckdb_replacement_scan_set_function_name",
"duckdb_replacement_scan_add_parameter",
"duckdb_replacement_scan_set_error",
"duckdb_appender_create",
"duckdb_appender_error",
"duckdb_appender_flush",
"duckdb_appender_close",
"duckdb_appender_destroy",
"duckdb_appender_begin_row",
"duckdb_appender_end_row",
"duckdb_append_bool",
"duckdb_append_int8",
"duckdb_append_int16",
"duckdb_append_int32",
"duckdb_append_int64",
"duckdb_append_hugeint",
"duckdb_append_uhugeint",
"duckdb_append_uint8",
"duckdb_append_uint16",
"duckdb_append_uint32",
"duckdb_append_uint64",
"duckdb_append_float",
"duckdb_append_double",
"duckdb_append_date",
"duckdb_append_time",
"duckdb_append_timestamp",
"duckdb_append_interval",
"duckdb_append_varchar",
"duckdb_append_varchar_length",
"duckdb_append_blob",
"duckdb_append_null",
"duckdb_execute_tasks",
"duckdb_create_task_state",
"duckdb_execute_tasks_state",
"duckdb_execute_n_tasks_state",
"duckdb_finish_execution",
"duckdb_task_state_is_finished",
"duckdb_destroy_task_state",
"duckdb_execution_is_finished",
"duckdb_create_scalar_function",
"duckdb_destroy_scalar_function",
"duckdb_scalar_function_set_name",
"duckdb_scalar_function_add_parameter",
"duckdb_scalar_function_set_return_type",
"duckdb_scalar_function_set_function",
"duckdb_register_scalar_function",
]
class JuliaApiTarget:
indent: int = 0
linesep: str = os.linesep
type_maps: dict[str, str] = {} # C to Julia
inverse_type_maps: dict[str, list[str]] = {} # Julia to C
deprecated_functions: list[str] = []
type_map: dict[str, str]
# Functions to skip
skipped_functions = set()
skip_deprecated_functions = False
# Explicit function order
manual_order: Union[List[str], NoneType] = None
overwrite_function_signatures = {}
# Functions that use indices either as ARG or RETURN and should be converted to 1-based indexing
auto_1base_index: bool
auto_1base_index_return_functions = set()
auto_1base_index_ignore_functions = set()
def __init__(
self,
file,
indent=0,
auto_1base_index=True,
auto_1base_index_return_functions=set(),
auto_1base_index_ignore_functions=set(),
skipped_functions=set(),
skip_deprecated_functions=False,
type_map={},
overwrite_function_signatures={},
):
# check if file is a string or a file object
if isinstance(file, str) or isinstance(file, pathlib.Path):
self.filename = pathlib.Path(file)
else:
raise ValueError("file must be a string or a path object")
self.indent = indent
self.auto_1base_index = auto_1base_index
self.auto_1base_index_return_functions = auto_1base_index_return_functions
self.auto_1base_index_ignore_functions = auto_1base_index_ignore_functions
self.linesep = os.linesep
self.type_map = type_map
self.skipped_functions = skipped_functions
self.skip_deprecated_functions = skip_deprecated_functions
self.overwrite_function_signatures = overwrite_function_signatures
super().__init__()
def __enter__(self):
self.file = open(self.filename, "w")
return self
def __exit__(self, exc_type, exc_value, traceback):
self.file.close()
def write_empty_line(self, n=1) -> None:
"""Writes an empty line to the output file."""
for i in range(n):
self.file.write(self.linesep)
def _get_casted_type(self, type_str: str, is_return_arg=False, auto_remove_t_suffix=True):
type_str = type_str.strip()
type_definition = parse_c_type(type_str, [])
def reduce_type(type_list: list[str]):
if len(type_list) == 0:
return ""
t = type_list[0]
if len(type_list) == 1:
is_const = False # Track that the type is const, even though we cannot use it in Julia
if t.startswith("const "):
t, is_const = t.removeprefix("const "), True
if t in self.type_map:
return self.type_map[t]
else:
if auto_remove_t_suffix and t.endswith("_t"):
t = t.removesuffix("_t")
if " " in t:
raise (ValueError(f"Unknown type: {t}"))
return t
# Handle Pointer types
if t not in ("Ptr", "Const Ptr"):
raise ValueError(f"Unexpected non-pointer type: {t}")
if len(type_list) >= 2 and type_list[1].strip() in (
"char",
"const char",
):
return "Cstring"
else:
if is_return_arg:
# Use Ptr for return types, because they are not tracked by the Julia GC
return "Ptr{" + reduce_type(type_list[1:]) + "}"
else:
# Prefer Ref over Ptr for arguments
return "Ref{" + reduce_type(type_list[1:]) + "}"
return reduce_type(type_definition)
def _is_index_argument(self, name: str, function_obj: FunctionDef):
# Check if the argument is (likely) an index
if name not in (
"index",
"idx",
"i",
"row",
"col",
"column",
"col_idx",
"column_idx",
"column_index",
"row_idx",
"row_index",
"chunk_index",
# "param_idx", # TODO creates errors in bind_param
):
return False
x = None
for param in function_obj["params"]:
if param["name"] == name:
x = param
break
arg_type = self._get_casted_type(x["type"])
if arg_type not in (
"Int",
"Int64",
"UInt",
"UInt64",
"idx_t",
"idx" "Int32",
"UInt32",
"Csize_t",
):
return False
return True
def get_argument_names_and_types(self, function_obj: FunctionDef):
def _get_arg_name(name: str):
if name in JULIA_RESERVED_KEYWORDS:
return f"_{name}"
return name
arg_names = [_get_arg_name(param["name"]) for param in function_obj["params"]]
if function_obj["name"] in self.overwrite_function_signatures:
return_type, arg_types = self.overwrite_function_signatures[function_obj["name"]]
return arg_names, arg_types
arg_types = [self._get_casted_type(param["type"]) for param in function_obj["params"]]
return arg_names, arg_types
def is_index1_function(self, function_obj: FunctionDef):
fname = function_obj["name"]
if not self.auto_1base_index:
return [False for param in function_obj["params"]], False
if fname in self.auto_1base_index_ignore_functions:
return [False for param in function_obj["params"]], False
is_index1_return = fname in self.auto_1base_index_return_functions
is_index1_arg = [self._is_index_argument(param["name"], function_obj) for param in function_obj["params"]]
return is_index1_arg, is_index1_return
def _write_function_docstring(self, function_obj: FunctionDef):
r"""_create_function_docstring
Example:
```julia
\"\"\"
duckdb_get_int64(value)
Obtains an int64 of the given value.
# Arguments
- `value`: The value
Returns: The int64 value, or 0 if no conversion is possible
\"\"\"
```
Args:
function_obj: _description_
"""
description = function_obj.get("comment", {}).get("description", "").strip()
description = description.replace('"', '\\"') # escape double quotes
index1_args, index1_return = self.is_index1_function(function_obj)
# Arguments
arg_names, arg_types = self.get_argument_names_and_types(function_obj)
arg_comments = []
for ix, (name, param, t, is_index1) in enumerate(
zip(arg_names, function_obj["params"], arg_types, index1_args)
):
param_comment = function_obj.get("comment", {}).get("param_comments", {}).get(param["name"], "")
if is_index1:
parts = [f"`{name}`:", f"`{t}`", "(1-based index)", param_comment]
else:
parts = [f"`{name}`:", f"`{t}`", param_comment]
arg_comments.append(" ".join(parts))
arg_names_s = ", ".join(arg_names)
# Return Values
return_type = self._get_casted_type(function_obj["return_type"], is_return_arg=True)
if return_type == "Cvoid":
return_type = "Nothing" # Cvoid is equivalent to Nothing in Julia
return_comments = [
f"`{return_type}`",
function_obj.get("comment", {}).get("return_value", ""),
]
if index1_return:
return_comments.append("(1-based index)")
return_value_comment = " ".join(return_comments)
self.file.write(f"{' ' * self.indent}\"\"\"\n")
self.file.write(f"{' ' * self.indent} {function_obj['name']}({arg_names_s})\n")
self.file.write(f"{' ' * self.indent}\n")
self.file.write(f"{' ' * self.indent}{description}\n")
self.file.write(f"{' ' * self.indent}\n")
self.file.write(f"{' ' * self.indent}# Arguments\n")
for i, arg_name in enumerate(arg_names):
self.file.write(f"{' ' * self.indent}- {arg_comments[i]}\n")
self.file.write(f"{' ' * self.indent}\n")
self.file.write(f"{' ' * self.indent}Returns: {return_value_comment}\n")
self.file.write(f"{' ' * self.indent}\"\"\"\n")
def _get_depwarning_message(self, function_obj: FunctionDef):
description = function_obj.get("comment", {}).get("description", "")
if not description.startswith("**DEPRECATION NOTICE**:"):
description = f"**DEPRECATION NOTICE**: {description}"
# Only use the first line of the description
notice = description.split("\n")[0]
notice = notice.replace("\n", " ").replace('"', '\\"').strip()
return notice
def _write_function_depwarn(self, function_obj: FunctionDef, indent: int = 0):
"""
Writes a deprecation warning for a function.
Example:
```julia
Base.depwarn(
"The `G` type parameter will be deprecated in a future release. " *
"Please use `MyType(args...)` instead of `MyType{$G}(args...)`.",
:MyType,
)
```
"""
indent = self.indent + indent # total indent
notice = self._get_depwarning_message(function_obj)
self.file.write(f"{' ' * indent}Base.depwarn(\n")
self.file.write(f"{' ' * indent} \"{notice}\",\n")
self.file.write(f"{' ' * indent} :{function_obj['name']},\n")
self.file.write(f"{' ' * indent})\n")
def _list_to_julia_tuple(self, lst):
if len(lst) == 0:
return "()"
elif len(lst) == 1:
return f"({lst[0]},)"
else:
return f"({', '.join(lst)})"
def _write_function_definition(self, function_obj: FunctionDef):
fname = function_obj["name"]
index1_args, index1_return = self.is_index1_function(function_obj)
arg_names, arg_types = self.get_argument_names_and_types(function_obj)
arg_types_tuple = self._list_to_julia_tuple(arg_types)
arg_names_definition = ", ".join(arg_names)
arg_names_call = []
for arg_name, is_index1 in zip(arg_names, index1_args):
if is_index1:
arg_names_call.append(f"{arg_name} - 1")
else:
arg_names_call.append(arg_name)
arg_names_call = ", ".join(arg_names_call)
return_type = self._get_casted_type(function_obj["return_type"], is_return_arg=True)
self.file.write(f"{' ' * self.indent}function {fname}({arg_names_definition})\n")
if function_obj.get("group_deprecated", False) or function_obj.get("deprecated", False):
self._write_function_depwarn(function_obj, indent=1)
self.file.write(
f"{' ' * self.indent} return ccall((:{fname}, libduckdb), {return_type}, {arg_types_tuple}, {arg_names_call}){' + 1' if index1_return else ''}\n"
)
self.file.write(f"{' ' * self.indent}end\n")
def write_function(self, function_obj: FunctionDef):
if function_obj["name"] in self.skipped_functions:
return
if function_obj.get("group_deprecated", False) or function_obj.get("deprecated", False):
self.deprecated_functions.append(function_obj["name"])
self._write_function_docstring(function_obj)
self._write_function_definition(function_obj)
def write_footer(self):
self.write_empty_line(n=1)
s = """
# !!!!!!!!!!!!
# WARNING: this file is autogenerated by scripts/generate_c_api_julia.py, manual changes will be overwritten
# !!!!!!!!!!!!
"""
self.file.write(s)
self.write_empty_line()
def write_header(self, version=""):
s = """
###############################################################################
#
# DuckDB Julia API
#
# !!!!!!!!!!!!
# WARNING: this file is autogenerated by scripts/generate_c_api_julia.py, manual changes will be overwritten
# !!!!!!!!!!!!
#
###############################################################################
using Base.Libc
if "JULIA_DUCKDB_LIBRARY" in keys(ENV)
libduckdb = ENV["JULIA_DUCKDB_LIBRARY"]
else
using DuckDB_jll
end
"""
if version.startswith("v"):
# remove the v prefix and use Julia Version String
version = version[1:]
self.file.write(s)
self.file.write("\n")
self.file.write(f'DUCKDB_API_VERSION = v"{version}"\n')
self.file.write("\n")
def write_functions(
self,
version,
function_groups: List[FunctionGroup],
function_map: Dict[str, FunctionDef],
):
self._analyze_types(function_groups) # Create the julia type map
self.write_header(version)
self.write_empty_line()
if self.manual_order is not None:
current_group = None
for f in self.manual_order:
if f not in function_map:
print(f"WARNING: Function {f} not found in function_map")
continue
if current_group != function_map[f]["group"]:
current_group = function_map[f]["group"]
self.write_group_start(current_group)
self.write_empty_line()
self.write_function(function_map[f])
self.write_empty_line()
# Write new functions
self.write_empty_line(n=1)
self.write_group_start("New Functions")
self.write_empty_line(n=2)
current_group = None
for group in function_groups:
for fn in group["entries"]:
if fn["name"] in self.manual_order:
continue
if current_group != group["group"]:
current_group = group["group"]
self.write_group_start(current_group)
self.write_empty_line()
self.write_function(fn)
self.write_empty_line()
else:
for group in function_groups:
self.write_group_start(group["group"])
self.write_empty_line()
for fn in group["entries"]:
self.write_function(fn)
self.write_empty_line()
self.write_empty_line()
self.write_empty_line()
self.write_footer()
def _analyze_types(self, groups: List[FunctionGroup]):
for group in groups:
for fn in group["entries"]:
for param in fn["params"]:
if param["type"] not in self.type_maps:
self.type_maps[param["type"]] = self._get_casted_type(param["type"])
if fn["return_type"] not in self.type_maps:
self.type_maps[fn["return_type"]] = self._get_casted_type(fn["return_type"])
for k, v in self.type_maps.items():
if v not in self.inverse_type_maps:
self.inverse_type_maps[v] = []
self.inverse_type_maps[v].append(k)
return
def write_group_start(self, group):
group = group.replace("_", " ").strip()
# make group title uppercase
group = " ".join([x.capitalize() for x in group.split(" ")])
self.file.write(f"# {'-' * 80}\n")
self.file.write(f"# {group}\n")
self.file.write(f"# {'-' * 80}\n")
@staticmethod
def get_function_order(filepath):
path = pathlib.Path(filepath)
if not path.exists() or not path.is_file():
raise FileNotFoundError(f"File {path} does not exist")
with open(path, "r") as f:
lines = f.readlines()
is_julia_file = path.suffix == ".jl"
if not is_julia_file:
# read the file, assuming one function name per line
return [x.strip() for x in lines if x.strip() != ""]
# find the function definitions
# TODO: this is a very simple regex that only supports the long function form `function name(...)`
function_regex = r"^function\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*\("
function_order = []
for line in lines:
line = line.strip()
if line.startswith("#"):
continue
m = re.match(function_regex, line)
if m is not None:
function_order.append(m.group(1))
return function_order
def main():
"""Main function to generate the Julia API."""
print("Creating Julia API")
parser = configure_parser()
args = parser.parse_args()
print("Arguments:")
for k, v in vars(args).items():
print(f" {k}: {v}")
julia_path = pathlib.Path(args.output)
enable_auto_1base_index = args.auto_1_index
enable_original_order = args.use_original_order
capi_definitions_dir = pathlib.Path(args.capi_dir)
ext_api_definition_pattern = str(capi_definitions_dir) + "/apis/v1/*/*.json"
capi_function_definition_pattern = str(capi_definitions_dir) + "/functions/**/*.json"
ext_api_definitions = parse_ext_api_definitions(ext_api_definition_pattern)
ext_api_version = get_extension_api_version(ext_api_definitions)
function_groups, function_map = parse_capi_function_definitions(capi_function_definition_pattern)
overwrite_function_signatures = {
# Must be Ptr{Cvoid} and not Ref
"duckdb_free": (
"Cvoid",
("Ptr{Cvoid}",),
),
"duckdb_bind_blob": (
"duckdb_state",
("duckdb_prepared_statement", "idx_t", "Ptr{Cvoid}", "idx_t"),
),
"duckdb_vector_assign_string_element_len": (
"Cvoid",
(
"duckdb_vector",
"idx_t",
"Ptr{UInt8}",
"idx_t",
), # Must be Ptr{UInt8} instead of Cstring to allow '\0' in the middle
),
}
with JuliaApiTarget(
julia_path,
indent=0,
auto_1base_index=enable_auto_1base_index, # WARNING: every arg named "col/row/index" or similar will be 1-based indexed, so the argument is subtracted by 1
auto_1base_index_return_functions={"duckdb_init_get_column_index"},
auto_1base_index_ignore_functions={
"duckdb_parameter_name", # Parameter names start at 1
"duckdb_param_type", # Parameter types (like names) start at 1
"duckdb_param_logical_type", # ...
"duckdb_bind_get_parameter", # Would be breaking API change
},
skipped_functions=set(),
type_map=JULIA_BASE_TYPE_MAP,
overwrite_function_signatures=overwrite_function_signatures,
) as printer:
if enable_original_order:
print("INFO: Using the original order of the functions from the old API file.")
printer.manual_order = JULIA_API_ORIGINAL_ORDER
printer.write_functions(ext_api_version, function_groups, function_map)
if args.print_type_mapping:
print("Type maps: (Julia Type -> C Type)")
K = list(printer.inverse_type_maps.keys())
K.sort()
for k in K:
if k.startswith("Ptr") or k.startswith("Ref"):
continue
v = ", ".join(printer.inverse_type_maps[k])
print(f" {k} -> {v}")
print("Julia API generated successfully!")
print("Please review the mapped types and check the generated file:")
print("Hint: also run './format.sh' to format the file and reduce the diff.")
print(f"Output: {julia_path}")
def configure_parser():
parser = argparse.ArgumentParser(description="Generate the DuckDB Julia API")
parser.add_argument(
"--auto-1-index",
action="store_true",
default=True,
help="Automatically convert 0-based indices to 1-based indices",
)
parser.add_argument(
"--use-original-order",
action="store_true",
default=False,
help="Use the original order of the functions from the old API file. New functions will be appended at the end.",
)
parser.add_argument(
"--print-type-mapping",
action="store_true",
default=False,
help="Print the type mapping from C to Julia",
)
parser.add_argument(
"--capi-dir",
type=str,
required=True,
help="Path to the input C API definitions. Should be a directory containing JSON files.",
)
parser.add_argument(
"output",
type=str,
# default="src/api.jl",
help="Path to the output file",
)
return parser
if __name__ == "__main__":
main()
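The `get_function_order` helper above extracts function names from a Julia source file with a deliberately simple regex that only matches the long `function name(...)` form; short-form definitions like `f(x) = x` are skipped. A standalone sketch of that extraction (the sample input is hypothetical, not from the repository):

```python
import re

# Same pattern as in get_function_order: long-form `function name(` definitions only.
FUNCTION_REGEX = r"^function\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*\("


def extract_function_order(lines):
    """Return function names in definition order, skipping comment lines."""
    order = []
    for line in lines:
        line = line.strip()
        if line.startswith("#"):
            continue
        m = re.match(FUNCTION_REGEX, line)
        if m is not None:
            order.append(m.group(1))
    return order


# Hypothetical Julia snippet; the short-form definition is not matched.
sample = """
# a comment mentioning function inside(
function duckdb_open(path)
end
square(x) = x * x
function duckdb_close(db)
end
""".splitlines()

print(extract_function_order(sample))  # ['duckdb_open', 'duckdb_close']
```

This mirrors why `--use-original-order` needs the old `api.jl` as input: the order is recovered purely from the textual position of the long-form definitions.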


@@ -0,0 +1,41 @@
module DuckDB
using DBInterface
using WeakRefStrings
using Tables
using Base.Libc
using Dates
using UUIDs
using FixedPointDecimals
export DBInterface, DuckDBException
abstract type ResultType end
struct MaterializedResult <: ResultType end
struct StreamResult <: ResultType end
include("helper.jl")
include("exceptions.jl")
include("ctypes.jl")
include("api.jl")
include("api_helper.jl")
include("logical_type.jl")
include("value.jl")
include("validity_mask.jl")
include("vector.jl")
include("data_chunk.jl")
include("config.jl")
include("database.jl")
include("statement.jl")
include("result.jl")
include("transaction.jl")
include("ddl.jl")
include("appender.jl")
include("table_function.jl")
include("scalar_function.jl")
include("replacement_scan.jl")
include("table_scan.jl")
include("old_interface.jl")
end # module

external/duckdb/tools/juliapkg/src/api.jl (vendored new file, 8254 lines): diff suppressed because it is too large.


@@ -0,0 +1,30 @@
"""
duckdb_free(s::Cstring)
Free a Cstring allocated by DuckDB. This function is a wrapper around `duckdb_free`.
"""
function duckdb_free(s::Cstring)
p = pointer(s)
return duckdb_free(p)
end
"""
Retrieves the member vector of a union vector.
The resulting vector is valid as long as the parent vector is valid.
* vector: The vector
* index: The member index
* returns: The member vector
"""
function duckdb_union_vector_get_member(vector, index)
return ccall(
(:duckdb_struct_vector_get_child, libduckdb),
duckdb_vector,
(duckdb_vector, UInt64),
vector,
1 + (index - 1)
)
end


@@ -0,0 +1,131 @@
using Dates
"""
Appender(db_connection, table, [schema])
An appender object that can be used to append rows to an existing table.
* DateTime objects in Julia are stored in milliseconds since the Unix epoch but are converted to microseconds when stored in duckdb.
* Time objects in Julia are stored in nanoseconds since midnight but are converted to microseconds when stored in duckdb.
* Missing and Nothing are stored as NULL in duckdb, but will be converted to Missing when the data is queried back.
# Example
```julia
using DuckDB, DataFrames, Dates
db = DuckDB.DB()
# create a table
DBInterface.execute(db, "CREATE OR REPLACE TABLE data(id INT PRIMARY KEY, value FLOAT, timestamp TIMESTAMP, date DATE)")
# data to insert
len = 100
df = DataFrames.DataFrame(id=collect(1:len),
value=rand(len),
timestamp=Dates.now() + Dates.Second.(1:len),
date=Dates.today() + Dates.Day.(1:len))
# append data by row
appender = DuckDB.Appender(db, "data")
for i in eachrow(df)
for j in i
DuckDB.append(appender, j)
end
DuckDB.end_row(appender)
end
# flush the appender after all rows
DuckDB.flush(appender)
DuckDB.close(appender)
```
"""
mutable struct Appender
handle::duckdb_appender
function Appender(con::Connection, table::AbstractString, schema::Union{AbstractString, Nothing} = nothing)
handle = Ref{duckdb_appender}()
if duckdb_appender_create(con.handle, something(schema, C_NULL), table, handle) != DuckDBSuccess
error_ptr = duckdb_appender_error(handle[])
if error_ptr == C_NULL
error_message = string("Opening of Appender for table \"", table, "\" failed: unknown error")
else
error_message = unsafe_string(error_ptr)
end
duckdb_appender_destroy(handle)
throw(QueryException(error_message))
end
appender = new(handle[])
finalizer(_close_appender, appender)
return appender
end
function Appender(db::DB, table::AbstractString, schema::Union{AbstractString, Nothing} = nothing)
return Appender(db.main_connection, table, schema)
end
end
function _close_appender(appender::Appender)
if appender.handle != C_NULL
duckdb_appender_destroy(appender.handle)
end
appender.handle = C_NULL
return
end
function close(appender::Appender)
_close_appender(appender)
return
end
append(appender::Appender, val::AbstractFloat) = duckdb_append_double(appender.handle, Float64(val));
append(appender::Appender, val::Bool) = duckdb_append_bool(appender.handle, val);
append(appender::Appender, val::Int8) = duckdb_append_int8(appender.handle, val);
append(appender::Appender, val::Int16) = duckdb_append_int16(appender.handle, val);
append(appender::Appender, val::Int32) = duckdb_append_int32(appender.handle, val);
append(appender::Appender, val::Int64) = duckdb_append_int64(appender.handle, val);
append(appender::Appender, val::Int128) = duckdb_append_hugeint(appender.handle, val);
append(appender::Appender, val::UInt128) = duckdb_append_uhugeint(appender.handle, val);
append(appender::Appender, val::UInt8) = duckdb_append_uint8(appender.handle, val);
append(appender::Appender, val::UInt16) = duckdb_append_uint16(appender.handle, val);
append(appender::Appender, val::UInt32) = duckdb_append_uint32(appender.handle, val);
append(appender::Appender, val::UInt64) = duckdb_append_uint64(appender.handle, val);
append(appender::Appender, val::Float32) = duckdb_append_float(appender.handle, val);
append(appender::Appender, val::Float64) = duckdb_append_double(appender.handle, val);
append(appender::Appender, ::Union{Missing, Nothing}) = duckdb_append_null(appender.handle);
append(appender::Appender, val::AbstractString) = duckdb_append_varchar(appender.handle, val);
append(appender::Appender, val::Base.UUID) = append(appender, string(val));
append(appender::Appender, val::Vector{UInt8}) = duckdb_append_blob(appender.handle, val, sizeof(val));
append(appender::Appender, val::FixedDecimal) = append(appender, string(val));
# append(appender::Appender, val::WeakRefString{UInt8}) = duckdb_append_varchar(stmt.handle, i, val.ptr, val.len);
append(appender::Appender, val::Date) =
duckdb_append_date(appender.handle, Dates.date2epochdays(val) - ROUNDING_EPOCH_TO_UNIX_EPOCH_DAYS);
# nanosecond to microseconds
append(appender::Appender, val::Time) = duckdb_append_time(appender.handle, Dates.value(val) ÷ 1000);
# milliseconds to microseconds
append(appender::Appender, val::DateTime) =
duckdb_append_timestamp(appender.handle, (Dates.datetime2epochms(val) - ROUNDING_EPOCH_TO_UNIX_EPOCH_MS) * 1000);
function append(appender::Appender, val::AbstractVector{T}) where {T}
value = create_value(val)
if length(val) == 0
duckdb_append_null(appender.handle)
else
duckdb_append_value(appender.handle, value.handle)
end
return
end
function append(appender::Appender, val::Any)
throw(NotImplementedException(string("unsupported type for append: ", typeof(val))))
end
function end_row(appender::Appender)
duckdb_appender_end_row(appender.handle)
return
end
function flush(appender::Appender)
duckdb_appender_flush(appender.handle)
return
end
DBInterface.close!(appender::Appender) = _close_appender(appender)
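The `append` methods above do all of the unit conversion noted in the docstring: `Time` values (nanoseconds since midnight) are integer-divided by 1000 to get microseconds, and `DateTime` values (milliseconds since Julia's `Dates` epoch, 0000-01-01) are shifted to the Unix epoch and scaled to microseconds. The arithmetic can be checked in isolation; the constants below are copied from `ctypes.jl`:

```python
# Offsets between Julia's Dates epoch (0000-01-01) and the Unix epoch (1970-01-01).
ROUNDING_EPOCH_TO_UNIX_EPOCH_DAYS = 719_528
ROUNDING_EPOCH_TO_UNIX_EPOCH_MS = 62_167_219_200_000

# Sanity check: the ms offset is exactly the day offset scaled by ms-per-day.
assert ROUNDING_EPOCH_TO_UNIX_EPOCH_DAYS * 86_400_000 == ROUNDING_EPOCH_TO_UNIX_EPOCH_MS


def time_ns_to_duckdb_micros(ns_since_midnight):
    # Mirrors: append(appender, val::Time) -> Dates.value(val) ÷ 1000
    return ns_since_midnight // 1000


def datetime_epochms_to_duckdb_micros(ms_since_julia_epoch):
    # Mirrors: (Dates.datetime2epochms(val) - ROUNDING_EPOCH_TO_UNIX_EPOCH_MS) * 1000
    return (ms_since_julia_epoch - ROUNDING_EPOCH_TO_UNIX_EPOCH_MS) * 1000


# Noon as nanoseconds since midnight -> microseconds since midnight.
print(time_ns_to_duckdb_micros(12 * 3600 * 10**9))  # 43200000000

# The Unix epoch itself maps to DuckDB timestamp 0.
print(datetime_epochms_to_duckdb_micros(ROUNDING_EPOCH_TO_UNIX_EPOCH_MS))  # 0
```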


@@ -0,0 +1,48 @@
"""
Configuration object
"""
mutable struct Config
handle::duckdb_config
function Config(args...; kwargs...)
handle = Ref{duckdb_config}()
duckdb_create_config(handle)
result = new(handle[])
finalizer(_destroy_config, result)
_fill_config!(result, args...; kwargs...)
return result
end
end
function _destroy_config(config::Config)
if config.handle != C_NULL
duckdb_destroy_config(config.handle)
end
config.handle = C_NULL
return
end
DBInterface.close!(config::Config) = _destroy_config(config)
function Base.setindex!(config::Config, option::AbstractString, name::AbstractString)
if duckdb_set_config(config.handle, name, option) != DuckDBSuccess
throw(QueryException(string("Unrecognized configuration option \"", name, "\"")))
end
end
@deprecate set_config(config::Config, name::AbstractString, option::AbstractString) setindex!(config, option, name)
_fill_config!(config, options::AbstractVector) =
for (name, option) in options
config[name] = option
end
_fill_config!(config, options::Union{NamedTuple, AbstractDict}) =
for (name, option) in pairs(options)
config[string(name)] = option
end
_fill_config!(config; kwargs...) = _fill_config!(config, NamedTuple(kwargs))
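The three `_fill_config!` methods give `Config` a flexible constructor: options may arrive as a vector of pairs, a `NamedTuple`/`AbstractDict`, or plain keyword arguments, all funneled into the same `config[name] = option` call. The same normalization can be sketched outside Julia (with a plain dict standing in for the `duckdb_set_config` handle):

```python
def fill_config(config: dict, options=None, **kwargs):
    """Normalize pair-lists, mappings, and keyword arguments into one dict,
    mirroring the three _fill_config! methods."""
    if options is None:
        options = kwargs              # _fill_config!(config; kwargs...)
    if hasattr(options, "items"):
        pairs = options.items()       # NamedTuple / AbstractDict method
    else:
        pairs = options               # AbstractVector-of-pairs method
    for name, option in pairs:
        config[str(name)] = option    # config[name] = option
    return config


print(fill_config({}, [("threads", "4")]))       # {'threads': '4'}
print(fill_config({}, {"memory_limit": "1GB"}))  # {'memory_limit': '1GB'}
print(fill_config({}, threads="4"))              # {'threads': '4'}
```

In the Julia original the dispatch is handled by multiple dispatch rather than runtime checks, but the effect is the same: every calling convention ends in `setindex!`.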


@@ -0,0 +1,582 @@
const STRING_INLINE_LENGTH = 12 # length of the inline string in duckdb_string_t
const idx_t = UInt64 # DuckDB index type
const duckdb_aggregate_combine = Ptr{Cvoid}
const duckdb_aggregate_destroy = Ptr{Cvoid}
const duckdb_aggregate_finalize = Ptr{Cvoid}
const duckdb_aggregate_function = Ptr{Cvoid}
const duckdb_aggregate_function_set = Ptr{Cvoid}
const duckdb_aggregate_init = Ptr{Cvoid}
const duckdb_aggregate_state_size = Ptr{Cvoid}
const duckdb_aggregate_update = Ptr{Cvoid}
const duckdb_appender = Ptr{Cvoid}
const duckdb_arrow = Ptr{Cvoid}
const duckdb_arrow_array = Ptr{Cvoid}
const duckdb_arrow_schema = Ptr{Cvoid}
const duckdb_arrow_stream = Ptr{Cvoid}
const duckdb_bind_info = Ptr{Cvoid}
const duckdb_cast_function = Ptr{Cvoid}
const duckdb_cast_function_ptr = Ptr{Cvoid}
const duckdb_client_context = Ptr{Cvoid}
const duckdb_config = Ptr{Cvoid}
const duckdb_connection = Ptr{Cvoid}
const duckdb_create_type_info = Ptr{Cvoid}
const duckdb_data_chunk = Ptr{Cvoid}
const duckdb_database = Ptr{Cvoid}
const duckdb_delete_callback = Ptr{Cvoid}
const duckdb_extracted_statements = Ptr{Cvoid}
const duckdb_function_info = Ptr{Cvoid}
const duckdb_init_info = Ptr{Cvoid}
const duckdb_instance_cache = Ptr{Cvoid}
const duckdb_logical_type = Ptr{Cvoid}
const duckdb_pending_result = Ptr{Cvoid}
const duckdb_prepared_statement = Ptr{Cvoid}
const duckdb_profiling_info = Ptr{Cvoid}
const duckdb_replacement_callback = Ptr{Cvoid}
const duckdb_replacement_scan_info = Ptr{Cvoid}
const duckdb_scalar_function = Ptr{Cvoid}
const duckdb_scalar_function_bind = Ptr{Cvoid}
const duckdb_scalar_function_set = Ptr{Cvoid}
const duckdb_selection_vector = Ptr{Cvoid}
const duckdb_table_description = Ptr{Cvoid}
const duckdb_table_function = Ptr{Cvoid}
const duckdb_table_function_ptr = Ptr{Cvoid}
const duckdb_table_function_bind = Ptr{Cvoid}
const duckdb_table_function_init = Ptr{Cvoid}
const duckdb_task_state = Ptr{Cvoid}
const duckdb_value = Ptr{Cvoid}
const duckdb_vector = Ptr{Cvoid}
const duckdb_state = Cint;
const DuckDBSuccess = 0;
const DuckDBError = 1;
const duckdb_pending_state = Cint;
const DUCKDB_PENDING_RESULT_READY = 0;
const DUCKDB_PENDING_RESULT_NOT_READY = 1;
const DUCKDB_PENDING_ERROR = 2;
const DUCKDB_PENDING_NO_TASKS_AVAILABLE = 3;
@enum DUCKDB_RESULT_TYPE_::Cint begin
DUCKDB_RESULT_TYPE_INVALID = 0
DUCKDB_RESULT_TYPE_CHANGED_ROWS = 1
DUCKDB_RESULT_TYPE_NOTHING = 2
DUCKDB_RESULT_TYPE_QUERY_RESULT = 3
end
const duckdb_result_type = DUCKDB_RESULT_TYPE_;
@enum DUCKDB_STATEMENT_TYPE_::Cint begin
DUCKDB_STATEMENT_TYPE_INVALID = 0
DUCKDB_STATEMENT_TYPE_SELECT = 1
DUCKDB_STATEMENT_TYPE_INSERT = 2
DUCKDB_STATEMENT_TYPE_UPDATE = 3
DUCKDB_STATEMENT_TYPE_EXPLAIN = 4
DUCKDB_STATEMENT_TYPE_DELETE = 5
DUCKDB_STATEMENT_TYPE_PREPARE = 6
DUCKDB_STATEMENT_TYPE_CREATE = 7
DUCKDB_STATEMENT_TYPE_EXECUTE = 8
DUCKDB_STATEMENT_TYPE_ALTER = 9
DUCKDB_STATEMENT_TYPE_TRANSACTION = 10
DUCKDB_STATEMENT_TYPE_COPY = 11
DUCKDB_STATEMENT_TYPE_ANALYZE = 12
DUCKDB_STATEMENT_TYPE_VARIABLE_SET = 13
DUCKDB_STATEMENT_TYPE_CREATE_FUNC = 14
DUCKDB_STATEMENT_TYPE_DROP = 15
DUCKDB_STATEMENT_TYPE_EXPORT = 16
DUCKDB_STATEMENT_TYPE_PRAGMA = 17
DUCKDB_STATEMENT_TYPE_VACUUM = 18
DUCKDB_STATEMENT_TYPE_CALL = 19
DUCKDB_STATEMENT_TYPE_SET = 20
DUCKDB_STATEMENT_TYPE_LOAD = 21
DUCKDB_STATEMENT_TYPE_RELATION = 22
DUCKDB_STATEMENT_TYPE_EXTENSION = 23
DUCKDB_STATEMENT_TYPE_LOGICAL_PLAN = 24
DUCKDB_STATEMENT_TYPE_ATTACH = 25
DUCKDB_STATEMENT_TYPE_DETACH = 26
DUCKDB_STATEMENT_TYPE_MULTI = 27
end
const duckdb_statement_type = DUCKDB_STATEMENT_TYPE_
@enum DUCKDB_ERROR_TYPE_::Cint begin
DUCKDB_ERROR_INVALID = 0
DUCKDB_ERROR_OUT_OF_RANGE = 1
DUCKDB_ERROR_CONVERSION = 2
DUCKDB_ERROR_UNKNOWN_TYPE = 3
DUCKDB_ERROR_DECIMAL = 4
DUCKDB_ERROR_MISMATCH_TYPE = 5
DUCKDB_ERROR_DIVIDE_BY_ZERO = 6
DUCKDB_ERROR_OBJECT_SIZE = 7
DUCKDB_ERROR_INVALID_TYPE = 8
DUCKDB_ERROR_SERIALIZATION = 9
DUCKDB_ERROR_TRANSACTION = 10
DUCKDB_ERROR_NOT_IMPLEMENTED = 11
DUCKDB_ERROR_EXPRESSION = 12
DUCKDB_ERROR_CATALOG = 13
DUCKDB_ERROR_PARSER = 14
DUCKDB_ERROR_PLANNER = 15
DUCKDB_ERROR_SCHEDULER = 16
DUCKDB_ERROR_EXECUTOR = 17
DUCKDB_ERROR_CONSTRAINT = 18
DUCKDB_ERROR_INDEX = 19
DUCKDB_ERROR_STAT = 20
DUCKDB_ERROR_CONNECTION = 21
DUCKDB_ERROR_SYNTAX = 22
DUCKDB_ERROR_SETTINGS = 23
DUCKDB_ERROR_BINDER = 24
DUCKDB_ERROR_NETWORK = 25
DUCKDB_ERROR_OPTIMIZER = 26
DUCKDB_ERROR_NULL_POINTER = 27
DUCKDB_ERROR_IO = 28
DUCKDB_ERROR_INTERRUPT = 29
DUCKDB_ERROR_FATAL = 30
DUCKDB_ERROR_INTERNAL = 31
DUCKDB_ERROR_INVALID_INPUT = 32
DUCKDB_ERROR_OUT_OF_MEMORY = 33
DUCKDB_ERROR_PERMISSION = 34
DUCKDB_ERROR_PARAMETER_NOT_RESOLVED = 35
DUCKDB_ERROR_PARAMETER_NOT_ALLOWED = 36
DUCKDB_ERROR_DEPENDENCY = 37
DUCKDB_ERROR_HTTP = 38
DUCKDB_ERROR_MISSING_EXTENSION = 39
DUCKDB_ERROR_AUTOLOAD = 40
DUCKDB_ERROR_SEQUENCE = 41
DUCKDB_INVALID_CONFIGURATION = 42
end
const duckdb_error_type = DUCKDB_ERROR_TYPE_
@enum DUCKDB_CAST_MODE_::Cint begin
DUCKDB_CAST_NORMAL = 0
DUCKDB_CAST_TRY = 1
end
const duckdb_cast_mode = DUCKDB_CAST_MODE_
@enum DUCKDB_TYPE_::Cint begin
DUCKDB_TYPE_INVALID = 0
DUCKDB_TYPE_BOOLEAN = 1
DUCKDB_TYPE_TINYINT = 2
DUCKDB_TYPE_SMALLINT = 3
DUCKDB_TYPE_INTEGER = 4
DUCKDB_TYPE_BIGINT = 5
DUCKDB_TYPE_UTINYINT = 6
DUCKDB_TYPE_USMALLINT = 7
DUCKDB_TYPE_UINTEGER = 8
DUCKDB_TYPE_UBIGINT = 9
DUCKDB_TYPE_FLOAT = 10
DUCKDB_TYPE_DOUBLE = 11
DUCKDB_TYPE_TIMESTAMP = 12
DUCKDB_TYPE_DATE = 13
DUCKDB_TYPE_TIME = 14
DUCKDB_TYPE_INTERVAL = 15
DUCKDB_TYPE_HUGEINT = 16
DUCKDB_TYPE_UHUGEINT = 32
DUCKDB_TYPE_VARCHAR = 17
DUCKDB_TYPE_BLOB = 18
DUCKDB_TYPE_DECIMAL = 19
DUCKDB_TYPE_TIMESTAMP_S = 20
DUCKDB_TYPE_TIMESTAMP_MS = 21
DUCKDB_TYPE_TIMESTAMP_NS = 22
DUCKDB_TYPE_ENUM = 23
DUCKDB_TYPE_LIST = 24
DUCKDB_TYPE_STRUCT = 25
DUCKDB_TYPE_MAP = 26
DUCKDB_TYPE_UUID = 27
DUCKDB_TYPE_UNION = 28
DUCKDB_TYPE_BIT = 29
DUCKDB_TYPE_TIME_TZ = 30
DUCKDB_TYPE_TIMESTAMP_TZ = 31
DUCKDB_TYPE_ARRAY = 33
DUCKDB_TYPE_ANY = 34
DUCKDB_TYPE_BIGNUM = 35
DUCKDB_TYPE_SQLNULL = 36
DUCKDB_TYPE_STRING_LITERAL = 37
DUCKDB_TYPE_INTEGER_LITERAL = 38
end
const DUCKDB_TYPE = DUCKDB_TYPE_
"""
Days are stored as days since 1970-01-01\n
Use the duckdb_from_date/duckdb_to_date function to extract individual information
"""
struct duckdb_date
days::Int32
end
struct duckdb_date_struct
year::Int32
month::Int8
day::Int8
end
"""
Time is stored as microseconds since 00:00:00\n
Use the duckdb_from_time/duckdb_to_time function to extract individual information
"""
struct duckdb_time
micros::Int64
end
struct duckdb_time_struct
hour::Int8
min::Int8
sec::Int8
micros::Int32
end
struct duckdb_time_tz
bits::UInt64
end
struct duckdb_time_tz_struct
time::duckdb_time_struct
offset::Int32
end
"""
Timestamps are stored as microseconds since 1970-01-01\n
Use the duckdb_from_timestamp/duckdb_to_timestamp function to extract individual information
"""
struct duckdb_timestamp
micros::Int64
end
struct duckdb_timestamp_s
seconds::Int64
end
struct duckdb_timestamp_ms
millis::Int64
end
struct duckdb_timestamp_ns
nanos::Int64
end
struct duckdb_timestamp_struct
date::duckdb_date_struct
time::duckdb_time_struct
end
struct duckdb_interval
months::Int32
days::Int32
micros::Int64
end
"""
Hugeints are composed in a (lower, upper) component\n
The value of the hugeint is upper * 2^64 + lower\n
For easy usage, the functions duckdb_hugeint_to_double/duckdb_double_to_hugeint are recommended
"""
struct duckdb_hugeint
lower::UInt64
upper::Int64
end
struct duckdb_uhugeint
lower::UInt64
upper::UInt64
end
"""
Decimals are composed of a width and a scale, and are stored in a hugeint
"""
struct duckdb_decimal
width::UInt8
scale::UInt8
value::duckdb_hugeint
end
struct duckdb_string_t
length::UInt32
data::NTuple{STRING_INLINE_LENGTH, UInt8}
end
struct duckdb_string_t_ptr
length::UInt32
prefix::NTuple{4, UInt8} # 4 bytes prefix
data::Cstring
end
struct duckdb_list_entry_t
offset::UInt64
length::UInt64
end
struct duckdb_query_progress_type
percentage::Float64
rows_processed::UInt64
total_rows_to_process::UInt64
end
struct duckdb_bignum
data::Ptr{UInt8}
size::idx_t
is_negative::Bool
end
struct duckdb_column
__deprecated_data::Ptr{Cvoid}
__deprecated_nullmask::Ptr{UInt8}
__deprecated_type::Ptr{DUCKDB_TYPE}
__deprecated_name::Ptr{UInt8}
internal_data::Ptr{Cvoid}
end
struct duckdb_result
__deprecated_column_count::Ptr{UInt64}
__deprecated_row_count::Ptr{UInt64}
__deprecated_rows_changed::Ptr{UInt64}
__deprecated_columns::Ptr{duckdb_column}
__deprecated_error_message::Ptr{UInt8}
internal_data::Ptr{Cvoid}
end
const INTERNAL_TYPE_MAP = Dict(
DUCKDB_TYPE_BOOLEAN => Bool,
DUCKDB_TYPE_TINYINT => Int8,
DUCKDB_TYPE_SMALLINT => Int16,
DUCKDB_TYPE_INTEGER => Int32,
DUCKDB_TYPE_BIGINT => Int64,
DUCKDB_TYPE_UTINYINT => UInt8,
DUCKDB_TYPE_USMALLINT => UInt16,
DUCKDB_TYPE_UINTEGER => UInt32,
DUCKDB_TYPE_UBIGINT => UInt64,
DUCKDB_TYPE_FLOAT => Float32,
DUCKDB_TYPE_DOUBLE => Float64,
DUCKDB_TYPE_TIMESTAMP => duckdb_timestamp,
DUCKDB_TYPE_TIMESTAMP_S => duckdb_timestamp_s,
DUCKDB_TYPE_TIMESTAMP_MS => duckdb_timestamp_ms,
DUCKDB_TYPE_TIMESTAMP_NS => duckdb_timestamp_ns,
DUCKDB_TYPE_TIMESTAMP_TZ => duckdb_timestamp,
DUCKDB_TYPE_DATE => duckdb_date,
DUCKDB_TYPE_TIME => duckdb_time,
DUCKDB_TYPE_TIME_TZ => duckdb_time_tz,
DUCKDB_TYPE_INTERVAL => duckdb_interval,
DUCKDB_TYPE_HUGEINT => duckdb_hugeint,
DUCKDB_TYPE_UHUGEINT => duckdb_uhugeint,
DUCKDB_TYPE_UUID => duckdb_hugeint,
DUCKDB_TYPE_VARCHAR => duckdb_string_t,
DUCKDB_TYPE_BLOB => duckdb_string_t,
DUCKDB_TYPE_BIT => duckdb_string_t,
DUCKDB_TYPE_LIST => duckdb_list_entry_t,
DUCKDB_TYPE_STRUCT => Cvoid,
DUCKDB_TYPE_MAP => duckdb_list_entry_t,
DUCKDB_TYPE_UNION => Cvoid
)
const JULIA_TYPE_MAP = Dict(
DUCKDB_TYPE_INVALID => Missing,
DUCKDB_TYPE_BOOLEAN => Bool,
DUCKDB_TYPE_TINYINT => Int8,
DUCKDB_TYPE_SMALLINT => Int16,
DUCKDB_TYPE_INTEGER => Int32,
DUCKDB_TYPE_BIGINT => Int64,
DUCKDB_TYPE_HUGEINT => Int128,
DUCKDB_TYPE_UHUGEINT => UInt128,
DUCKDB_TYPE_UTINYINT => UInt8,
DUCKDB_TYPE_USMALLINT => UInt16,
DUCKDB_TYPE_UINTEGER => UInt32,
DUCKDB_TYPE_UBIGINT => UInt64,
DUCKDB_TYPE_FLOAT => Float32,
DUCKDB_TYPE_DOUBLE => Float64,
DUCKDB_TYPE_DATE => Date,
DUCKDB_TYPE_TIME => Time,
DUCKDB_TYPE_TIME_TZ => Time,
DUCKDB_TYPE_TIMESTAMP => DateTime,
DUCKDB_TYPE_TIMESTAMP_TZ => DateTime,
DUCKDB_TYPE_TIMESTAMP_S => DateTime,
DUCKDB_TYPE_TIMESTAMP_MS => DateTime,
DUCKDB_TYPE_TIMESTAMP_NS => DateTime,
DUCKDB_TYPE_INTERVAL => Dates.CompoundPeriod,
DUCKDB_TYPE_UUID => UUID,
DUCKDB_TYPE_VARCHAR => String,
DUCKDB_TYPE_ENUM => String,
DUCKDB_TYPE_BLOB => Base.CodeUnits{UInt8, String},
DUCKDB_TYPE_BIT => Base.CodeUnits{UInt8, String},
DUCKDB_TYPE_MAP => Dict
)
# convert a DuckDB type into Julia equivalent
function duckdb_type_to_internal_type(x::DUCKDB_TYPE)
if !haskey(INTERNAL_TYPE_MAP, x)
throw(NotImplementedException(string("Unsupported type for duckdb_type_to_internal_type: ", x)))
end
return INTERNAL_TYPE_MAP[x]
end
function duckdb_type_to_julia_type(x)
type_id = get_type_id(x)
if type_id == DUCKDB_TYPE_DECIMAL
internal_type_id = get_internal_type_id(x)
scale = get_decimal_scale(x)
if internal_type_id == DUCKDB_TYPE_SMALLINT
return FixedDecimal{Int16, scale}
elseif internal_type_id == DUCKDB_TYPE_INTEGER
return FixedDecimal{Int32, scale}
elseif internal_type_id == DUCKDB_TYPE_BIGINT
return FixedDecimal{Int64, scale}
elseif internal_type_id == DUCKDB_TYPE_HUGEINT
return FixedDecimal{Int128, scale}
else
throw(NotImplementedException("Unimplemented internal type for decimal"))
end
elseif type_id == DUCKDB_TYPE_LIST
return Vector{Union{Missing, duckdb_type_to_julia_type(get_list_child_type(x))}}
elseif type_id == DUCKDB_TYPE_STRUCT
child_count = get_struct_child_count(x)
struct_names::Vector{Symbol} = Vector()
for i in 1:child_count
child_name::Symbol = Symbol(get_struct_child_name(x, i))
push!(struct_names, child_name)
end
struct_names_tuple = Tuple(x for x in struct_names)
return Union{Missing, NamedTuple{struct_names_tuple}}
elseif type_id == DUCKDB_TYPE_UNION
member_count = get_union_member_count(x)
member_types::Vector{DataType} = Vector()
for i in 1:member_count
member_type::DataType = duckdb_type_to_julia_type(get_union_member_type(x, i))
push!(member_types, member_type)
end
return Union{Missing, member_types...}
end
if !haskey(JULIA_TYPE_MAP, type_id)
throw(NotImplementedException(string("Unsupported type for duckdb_type_to_julia_type: ", type_id)))
end
return JULIA_TYPE_MAP[type_id]
end
const ROUNDING_EPOCH_TO_UNIX_EPOCH_DAYS = 719528
const ROUNDING_EPOCH_TO_UNIX_EPOCH_MS = 62167219200000
sym(ptr) = ccall(:jl_symbol, Ref{Symbol}, (Ptr{UInt8},), ptr)
sym(ptr::Cstring) = ccall(:jl_symbol, Ref{Symbol}, (Cstring,), ptr)
# %% --- Older Types ------------------------------------------ #
struct duckdb_string
data::Ptr{UInt8}
length::idx_t
function duckdb_string(data, length)
Base.depwarn("duckdb_string is deprecated, use duckdb_string_t instead", :deprecated)
return new(data, length)
end
end
"""
BLOBs are composed of a byte pointer and a size. You must free blob.data
with `duckdb_free`.
"""
struct duckdb_blob
data::Ref{UInt8}
length::idx_t
end
"""
BITs are composed of a byte pointer and a size.
BIT byte data has 0 to 7 bits of padding.
The first byte contains the number of padding bits.
This number of bits of the second byte are set to 1, starting from the MSB.
You must free `data` with `duckdb_free`.
"""
struct duckdb_bit
data::Ref{UInt8}
size::idx_t
end
Base.convert(::Type{duckdb_blob}, val::AbstractArray{UInt8}) = duckdb_blob(val, length(val))
Base.convert(::Type{duckdb_blob}, val::AbstractString) = duckdb_blob(codeunits(val))
# %% ----- Conversions ------------------------------
# HUGEINT / INT128
# Fast Conversion without typechecking
Base.convert(::Type{Int128}, val::duckdb_hugeint) = Int128(val.lower) + Int128(val.upper) << 64
Base.convert(::Type{UInt128}, val::duckdb_uhugeint) = UInt128(val.lower) + UInt128(val.upper) << 64
Base.cconvert(::Type{duckdb_hugeint}, x::Int128) =
duckdb_hugeint((x & 0xFFFF_FFFF_FFFF_FFFF) % UInt64, (x >> 64) % Int64)
Base.cconvert(::Type{duckdb_uhugeint}, v::UInt128) = duckdb_uhugeint(v % UInt64, (v >> 64) % UInt64)
# DATE & TIME Raw
Base.convert(::Type{duckdb_date}, val::Integer) = duckdb_date(val)
Base.convert(::Type{duckdb_time}, val::Integer) = duckdb_time(val)
Base.convert(::Type{duckdb_timestamp}, val::Integer) = duckdb_timestamp(val)
Base.convert(::Type{duckdb_timestamp_s}, val::Integer) = duckdb_timestamp_s(val)
Base.convert(::Type{duckdb_timestamp_ms}, val::Integer) = duckdb_timestamp_ms(val)
Base.convert(::Type{duckdb_timestamp_ns}, val::Integer) = duckdb_timestamp_ns(val)
Base.convert(::Type{duckdb_time_tz}, val::Integer) = duckdb_time_tz(val)
Base.convert(::Type{<:Integer}, val::duckdb_date) = val.days
Base.convert(::Type{<:Integer}, val::duckdb_time) = val.micros
Base.convert(::Type{<:Integer}, val::duckdb_timestamp) = val.micros
Base.convert(::Type{<:Integer}, val::duckdb_timestamp_s) = val.seconds
Base.convert(::Type{<:Integer}, val::duckdb_timestamp_ms) = val.millis
Base.convert(::Type{<:Integer}, val::duckdb_timestamp_ns) = val.nanos
function Base.convert(::Type{Date}, val::duckdb_date)
return Dates.epochdays2date(val.days + ROUNDING_EPOCH_TO_UNIX_EPOCH_DAYS)
end
function Base.convert(::Type{duckdb_date}, val::Date)
return duckdb_date(Dates.date2epochdays(val) - ROUNDING_EPOCH_TO_UNIX_EPOCH_DAYS)
end
function Base.convert(::Type{Time}, val::duckdb_time)
return Dates.Time(
val.micros ÷ 3_600_000_000,
val.micros ÷ 60_000_000 % 60,
val.micros ÷ 1_000_000 % 60,
val.micros ÷ 1_000 % 1_000,
val.micros % 1_000
)
end
function Base.convert(::Type{Time}, val::duckdb_time_tz)
time_tz = duckdb_from_time_tz(val)
# TODO: how to preserve the offset?
return Dates.Time(
time_tz.time.hour,
time_tz.time.min,
time_tz.time.sec,
time_tz.time.micros ÷ 1000,
time_tz.time.micros % 1000
)
end
Base.convert(::Type{Dates.DateTime}, val::duckdb_timestamp_s) =
Dates.epochms2datetime((val.seconds * 1000) + ROUNDING_EPOCH_TO_UNIX_EPOCH_MS)
Base.convert(::Type{Dates.DateTime}, val::duckdb_timestamp_ms) =
Dates.epochms2datetime((val.millis) + ROUNDING_EPOCH_TO_UNIX_EPOCH_MS)
Base.convert(::Type{Dates.DateTime}, val::duckdb_timestamp) =
Dates.epochms2datetime((val.micros ÷ 1_000) + ROUNDING_EPOCH_TO_UNIX_EPOCH_MS)
Base.convert(::Type{Dates.DateTime}, val::duckdb_timestamp_ns) =
Dates.epochms2datetime((val.nanos ÷ 1_000_000) + ROUNDING_EPOCH_TO_UNIX_EPOCH_MS)
Base.convert(::Type{Dates.CompoundPeriod}, val::duckdb_interval) =
Dates.CompoundPeriod(Dates.Month(val.months), Dates.Day(val.days), Dates.Microsecond(val.micros))
function Base.convert(::Type{UUID}, val::duckdb_hugeint)
hugeint = convert(Int128, val)
base_value = Int128(170141183460469231731687303715884105727)
if hugeint < 0
return UUID(UInt128(hugeint + base_value + 1))
else
return UUID(UInt128(hugeint) + base_value + 1)
end
end
# DECIMALS
Base.convert(::Type{Float64}, val::duckdb_decimal) = duckdb_decimal_to_double(val)
Base.convert(::Type{duckdb_decimal}, val::Float64) = duckdb_double_to_decimal(val)
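As the `duckdb_hugeint` docstring above notes, the 128-bit value is `upper * 2^64 + lower`, with an unsigned low word and a signed high word; the `convert`/`cconvert` methods simply split and rejoin those words. The round trip can be verified with plain integer arithmetic (a sketch of the same bit manipulation, relying on Python's arithmetic right shift for negatives):

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def int128_to_hugeint(x):
    """Split a signed 128-bit value into (lower: u64, upper: i64),
    mirroring Base.cconvert(::Type{duckdb_hugeint}, x::Int128)."""
    lower = x & MASK64  # unsigned low 64 bits
    upper = x >> 64     # arithmetic shift keeps the sign
    return lower, upper


def hugeint_to_int128(lower, upper):
    """Rejoin the words, mirroring Base.convert(::Type{Int128}, val::duckdb_hugeint)."""
    return upper * 2**64 + lower


for x in (0, 1, -1, 2**127 - 1, -(2**127), 123456789123456789123456789):
    lower, upper = int128_to_hugeint(x)
    assert 0 <= lower <= MASK64
    assert hugeint_to_int128(lower, upper) == x

print(int128_to_hugeint(-1))  # (18446744073709551615, -1)
```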


@@ -0,0 +1,66 @@
"""
DuckDB data chunk
"""
mutable struct DataChunk
handle::duckdb_data_chunk
function DataChunk(handle::duckdb_data_chunk, destroy::Bool)
result = new(handle)
if destroy
finalizer(_destroy_data_chunk, result)
end
return result
end
end
function get_column_count(chunk::DataChunk)
return duckdb_data_chunk_get_column_count(chunk.handle)
end
function get_size(chunk::DataChunk)
return duckdb_data_chunk_get_size(chunk.handle)
end
function set_size(chunk::DataChunk, size::Int64)
return duckdb_data_chunk_set_size(chunk.handle, size)
end
function get_vector(chunk::DataChunk, col_idx::Int64)::Vec
if col_idx < 1 || col_idx > get_column_count(chunk)
throw(
InvalidInputException(
string(
"get_array column index ",
col_idx,
" out of range, expected value between 1 and ",
get_column_count(chunk)
)
)
)
end
return Vec(duckdb_data_chunk_get_vector(chunk.handle, col_idx))
end
function get_array(chunk::DataChunk, col_idx::Int64, ::Type{T})::Vector{T} where {T}
return get_array(get_vector(chunk, col_idx), T)
end
function get_validity(chunk::DataChunk, col_idx::Int64)::ValidityMask
return get_validity(get_vector(chunk, col_idx))
end
function all_valid(chunk::DataChunk, col_idx::Int64)
return all_valid(get_vector(chunk, col_idx), get_size(chunk))
end
# this is only required when we own the data chunk
function _destroy_data_chunk(chunk::DataChunk)
if chunk.handle != C_NULL
duckdb_destroy_data_chunk(chunk.handle)
end
return chunk.handle = C_NULL
end
function destroy_data_chunk(chunk::DataChunk)
return _destroy_data_chunk(chunk)
end


@@ -0,0 +1,122 @@
"""
Internal DuckDB database handle.
"""
mutable struct DuckDBHandle
file::String
handle::duckdb_database
functions::Vector{Any}
scalar_functions::Dict{String, Any}
registered_objects::Dict{Any, Any}
function DuckDBHandle(f::AbstractString, config::Config)
f = String(isempty(f) ? f : expanduser(f))
handle = Ref{duckdb_database}()
error = Ref{Cstring}()
if duckdb_open_ext(f, handle, config.handle, error) != DuckDBSuccess
error_message = unsafe_string(error[])
duckdb_free(pointer(error[]))
throw(ConnectionException(error_message))
end
db = new(f, handle[], Vector(), Dict(), Dict())
finalizer(_close_database, db)
return db
end
end
function _close_database(db::DuckDBHandle)
# disconnect from DB
if db.handle != C_NULL
duckdb_close(db.handle)
end
return db.handle = C_NULL
end
"""
A connection object to a DuckDB database.
Transaction contexts are local to a single connection.
A connection can only run a single query at a time.
It is possible to open multiple connections to a single DuckDB database instance.
Multiple connections can run multiple queries concurrently.
"""
mutable struct Connection <: DBInterface.Connection
db::DuckDBHandle
handle::duckdb_connection
function Connection(db::DuckDBHandle)
handle = Ref{duckdb_connection}()
if duckdb_connect(db.handle, handle) != DuckDBSuccess
throw(ConnectionException("Failed to open connection"))
end
con = new(db, handle[])
finalizer(_close_connection, con)
return con
end
end
function _close_connection(con::Connection)
# disconnect
if con.handle != C_NULL
duckdb_disconnect(con.handle)
end
con.handle = C_NULL
return
end
"""
A DuckDB database object.
By default a DuckDB database object has an open connection object (db.main_connection).
When the database object is used directly in queries, it is actually the underlying main_connection that is used.
It is possible to open new connections to a single database instance using DBInterface.connect(db).
"""
mutable struct DB <: DBInterface.Connection
handle::DuckDBHandle
main_connection::Connection
function DB(f::AbstractString, config::Config)
config["threads"] = string(Threads.nthreads())
config["external_threads"] = string(Threads.nthreads()) # all threads are external
handle = DuckDBHandle(f, config)
main_connection = Connection(handle)
db = new(handle, main_connection)
_add_table_scan(db)
return db
end
function DB(f::AbstractString; config = [], readonly = false)
config = Config(config)
if readonly
config["access_mode"] = "READ_ONLY"
end
return DB(f, config)
end
end
function close_database(db::DB)
_close_connection(db.main_connection)
_close_database(db.handle)
return
end
const VECTOR_SIZE = duckdb_vector_size()
const ROW_GROUP_SIZE = VECTOR_SIZE * 100
DB(; kwargs...) = DB(":memory:"; kwargs...)
DBInterface.connect(::Type{DB}; kwargs...) = DB(; kwargs...)
DBInterface.connect(::Type{DB}, f::AbstractString; kwargs...) = DB(f; kwargs...)
DBInterface.connect(::Type{DB}, f::AbstractString, config::Config) = DB(f, config)
DBInterface.connect(db::DB) = Connection(db.handle)
DBInterface.close!(db::DB) = close_database(db)
DBInterface.close!(con::Connection) = _close_connection(con)
Base.close(db::DB) = close_database(db)
Base.close(con::Connection) = _close_connection(con)
Base.isopen(db::DB) = db.handle.handle != C_NULL
Base.isopen(con::Connection) = con.handle != C_NULL
Base.show(io::IO, db::DuckDB.DB) = print(io, string("DuckDB.DB(", "\"$(db.handle.file)\"", ")"))
Base.show(io::IO, con::DuckDB.Connection) = print(io, string("DuckDB.Connection(", "\"$(con.db.file)\"", ")"))
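# Example usage (a sketch; assumes an in-memory database):
#
#   db = DuckDB.DB()                  # equivalent to DBInterface.connect(DuckDB.DB)
#   con = DBInterface.connect(db)     # open an additional connection to the same instance
#   DBInterface.close!(con)
#   DBInterface.close!(db)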


@@ -0,0 +1,5 @@
function drop!(db::DB, table::AbstractString; ifexists::Bool = false)
    exists = ifexists ? "IF EXISTS " : ""
    return execute(db, "DROP TABLE $(exists)$(esc_id(table))")
end


@@ -0,0 +1,17 @@
mutable struct ConnectionException <: Exception
var::String
end
mutable struct QueryException <: Exception
var::String
end
mutable struct NotImplementedException <: Exception
var::String
end
mutable struct InvalidInputException <: Exception
var::String
end
Base.showerror(io::IO, e::ConnectionException) = print(io, e.var)
Base.showerror(io::IO, e::QueryException) = print(io, e.var)
Base.showerror(io::IO, e::NotImplementedException) = print(io, e.var)
Base.showerror(io::IO, e::InvalidInputException) = print(io, e.var)


@@ -0,0 +1,5 @@
function esc_id end
esc_id(x::AbstractString) = "\"" * replace(x, "\"" => "\"\"") * "\""
esc_id(X::AbstractVector{S}) where {S <: AbstractString} = join(map(esc_id, X), ',')


@@ -0,0 +1,139 @@
"""
DuckDB type
"""
mutable struct LogicalType
handle::duckdb_logical_type
function LogicalType(type::DUCKDB_TYPE)
handle = duckdb_create_logical_type(type)
result = new(handle)
finalizer(_destroy_type, result)
return result
end
function LogicalType(handle::duckdb_logical_type)
result = new(handle)
finalizer(_destroy_type, result)
return result
end
end
function _destroy_type(type::LogicalType)
if type.handle != C_NULL
duckdb_destroy_logical_type(type.handle)
end
type.handle = C_NULL
return
end
create_logical_type(::Type{T}) where {T <: String} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_VARCHAR)
create_logical_type(::Type{T}) where {T <: Bool} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_BOOLEAN)
create_logical_type(::Type{T}) where {T <: Int8} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_TINYINT)
create_logical_type(::Type{T}) where {T <: Int16} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_SMALLINT)
create_logical_type(::Type{T}) where {T <: Int32} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_INTEGER)
create_logical_type(::Type{T}) where {T <: Int64} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_BIGINT)
create_logical_type(::Type{T}) where {T <: Int128} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_HUGEINT)
create_logical_type(::Type{T}) where {T <: UInt8} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_UTINYINT)
create_logical_type(::Type{T}) where {T <: UInt16} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_USMALLINT)
create_logical_type(::Type{T}) where {T <: UInt32} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_UINTEGER)
create_logical_type(::Type{T}) where {T <: UInt64} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_UBIGINT)
create_logical_type(::Type{T}) where {T <: UInt128} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_UHUGEINT)
create_logical_type(::Type{T}) where {T <: Float32} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_FLOAT)
create_logical_type(::Type{T}) where {T <: Float64} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_DOUBLE)
create_logical_type(::Type{T}) where {T <: Date} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_DATE)
create_logical_type(::Type{T}) where {T <: Time} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_TIME)
create_logical_type(::Type{T}) where {T <: DateTime} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_TIMESTAMP)
create_logical_type(::Type{T}) where {T <: AbstractString} = DuckDB.LogicalType(DuckDB.DUCKDB_TYPE_VARCHAR)
function create_logical_type(::Type{T}) where {T <: FixedDecimal}
int_type = T.parameters[1]
width = 0
scale = T.parameters[2]
if int_type == Int16
width = 4
elseif int_type == Int32
width = 9
elseif int_type == Int64
width = 18
elseif int_type == Int128
width = 38
else
throw(NotImplementedException("Unsupported internal type for decimal"))
end
return DuckDB.LogicalType(duckdb_create_decimal_type(width, scale))
end
function create_logical_type(::Type{T}) where {T}
throw(NotImplementedException("Unsupported type for create_logical_type"))
end
function get_type_id(type::LogicalType)
return duckdb_get_type_id(type.handle)
end
function get_internal_type_id(type::LogicalType)
type_id = get_type_id(type)
if type_id == DUCKDB_TYPE_DECIMAL
type_id = duckdb_decimal_internal_type(type.handle)
elseif type_id == DUCKDB_TYPE_ENUM
type_id = duckdb_enum_internal_type(type.handle)
end
return type_id
end
function get_decimal_scale(type::LogicalType)
return duckdb_decimal_scale(type.handle)
end
function get_enum_dictionary(type::LogicalType)
dict::Vector{String} = Vector{String}()
dict_size = duckdb_enum_dictionary_size(type.handle)
for i in 1:dict_size
val = duckdb_enum_dictionary_value(type.handle, i)
        str_val = unsafe_string(val)
push!(dict, str_val)
duckdb_free(val)
end
return dict
end
function get_list_child_type(type::LogicalType)
return LogicalType(duckdb_list_type_child_type(type.handle))
end
##===--------------------------------------------------------------------===##
## Struct methods
##===--------------------------------------------------------------------===##
function get_struct_child_count(type::LogicalType)
return duckdb_struct_type_child_count(type.handle)
end
function get_struct_child_name(type::LogicalType, index::UInt64)
val = duckdb_struct_type_child_name(type.handle, index)
result = unsafe_string(val)
duckdb_free(val)
return result
end
function get_struct_child_type(type::LogicalType, index::UInt64)
return LogicalType(duckdb_struct_type_child_type(type.handle, index))
end
##===--------------------------------------------------------------------===##
## Union methods
##===--------------------------------------------------------------------===##
function get_union_member_count(type::LogicalType)
return duckdb_union_type_member_count(type.handle)
end
function get_union_member_name(type::LogicalType, index::UInt64)
val = duckdb_union_type_member_name(type.handle, index)
result = unsafe_string(val)
duckdb_free(val)
return result
end
function get_union_member_type(type::LogicalType, index::UInt64)
return LogicalType(duckdb_union_type_member_type(type.handle, index))
end


@@ -0,0 +1,33 @@
# old interface, deprecated
open(dbpath::AbstractString) = DBInterface.connect(DuckDB.DB, dbpath)
connect(db::DB) = DBInterface.connect(db)
disconnect(con::Connection) = DBInterface.close!(con)
close(db::DB) = DBInterface.close!(db)
# not really a dataframe anymore
# if needed for backwards compatibility, can add through Requires/1.9 extension
toDataFrame(r::QueryResult) = Tables.columntable(r)
toDataFrame(con::Connection, sql::AbstractString) = toDataFrame(DBInterface.execute(con, sql))
function appendDataFrame(input_df, con::Connection, table::AbstractString, schema::String = "main")
register_data_frame(con, input_df, "__append_df")
DBInterface.execute(con, "INSERT INTO \"$schema\".\"$table\" SELECT * FROM __append_df")
return unregister_data_frame(con, "__append_df")
end
appendDataFrame(input_df, db::DB, table::AbstractString, schema::String = "main") =
appendDataFrame(input_df, db.main_connection, table, schema)
"""
    DuckDB.load!(con, input_df, table, schema = "main")

Load an input DataFrame `input_df` into a new DuckDB table named `table`.
"""
function load!(con, input_df, table::AbstractString, schema::String = "main")
register_data_frame(con, input_df, "__append_df")
DBInterface.execute(con, "CREATE TABLE \"$schema\".\"$table\" AS SELECT * FROM __append_df")
unregister_data_frame(con, "__append_df")
return
end
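# Example usage (a sketch; assumes DataFrames.jl is loaded and `df` is a DataFrame):
#
#   db = DuckDB.DB()
#   DuckDB.load!(db.main_connection, df, "my_table")       # create a new table from df
#   appendDataFrame(df, db.main_connection, "my_table")    # append further rows to it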


@@ -0,0 +1,72 @@
mutable struct ReplacementFunction
db::DB
replacement_func::Function
extra_data::Any
uuid::UUID
end
struct ReplacementFunctionInfo
handle::duckdb_replacement_scan_info
main_function::ReplacementFunction
table_name::String
function ReplacementFunctionInfo(
handle::duckdb_replacement_scan_info,
main_function::ReplacementFunction,
table_name::String
)
result = new(handle, main_function, table_name)
return result
end
end
function _replacement_scan_function(handle::duckdb_replacement_scan_info, table_name::Ptr{UInt8}, data::Ptr{Cvoid})
try
func::ReplacementFunction = unsafe_pointer_to_objref(data)
tname = unsafe_string(table_name)
info = ReplacementFunctionInfo(handle, func, tname)
func.replacement_func(info)
catch
duckdb_replacement_scan_set_error(handle, get_exception_info())
return
end
end
function getdb(info::ReplacementFunctionInfo)
return info.main_function.db
end
function get_extra_data(info::ReplacementFunctionInfo)
return info.main_function.extra_data
end
function get_table_name(info::ReplacementFunctionInfo)
return info.table_name
end
function set_function_name(info::ReplacementFunctionInfo, function_name::String)
return duckdb_replacement_scan_set_function_name(info.handle, function_name)
end
function add_function_parameter(info::ReplacementFunctionInfo, parameter::Value)
return duckdb_replacement_scan_add_parameter(info.handle, parameter.handle)
end
function _replacement_func_cleanup(data::Ptr{Cvoid})
info::ReplacementFunction = unsafe_pointer_to_objref(data)
delete!(info.db.handle.registered_objects, info.uuid)
return
end
function add_replacement_scan!(db::DB, replacement_func::Function, extra_data::Any)
func = ReplacementFunction(db, replacement_func, extra_data, uuid4())
db.handle.registered_objects[func.uuid] = func
return duckdb_add_replacement_scan(
db.handle.handle,
@cfunction(_replacement_scan_function, Cvoid, (duckdb_replacement_scan_info, Ptr{UInt8}, Ptr{Cvoid})),
pointer_from_objref(func),
@cfunction(_replacement_func_cleanup, Cvoid, (Ptr{Cvoid},))
)
end
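# Example usage (a sketch: the replacement function rewrites an unknown table name
# into a call to a table function; `DuckDB.Value(100)` is assumed to construct a
# DuckDB value wrapping the integer parameter):
#
#   function my_replacement(info::DuckDB.ReplacementFunctionInfo)
#       if get_table_name(info) == "my_magic_table"
#           set_function_name(info, "range")
#           add_function_parameter(info, DuckDB.Value(100))
#       end
#   end
#   add_replacement_scan!(db, my_replacement, nothing)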


@@ -0,0 +1,903 @@
import Base.Threads.@spawn
mutable struct QueryResult
handle::Ref{duckdb_result}
names::Vector{Symbol}
types::Vector{Type}
tbl::Union{Missing, NamedTuple}
chunk_index::UInt64
function QueryResult(handle::Ref{duckdb_result})
column_count = duckdb_column_count(handle)
names::Vector{Symbol} = Vector()
for i in 1:column_count
name = sym(duckdb_column_name(handle, i))
if name in view(names, 1:(i - 1))
j = 1
new_name = Symbol(name, :_, j)
while new_name in view(names, 1:(i - 1))
j += 1
new_name = Symbol(name, :_, j)
end
name = new_name
end
push!(names, name)
end
types::Vector{Type} = Vector()
for i in 1:column_count
logical_type = LogicalType(duckdb_column_logical_type(handle, i))
push!(types, Union{Missing, duckdb_type_to_julia_type(logical_type)})
end
result = new(handle, names, types, missing, 1)
finalizer(_close_result, result)
return result
end
end
function _close_result(result::QueryResult)
duckdb_destroy_result(result.handle)
return
end
const DataChunks = Union{Vector{DataChunk}, Tuple{DataChunk}}
mutable struct ColumnConversionData{ChunksT <: DataChunks}
chunks::ChunksT
col_idx::Int64
logical_type::LogicalType
conversion_data::Any
end
mutable struct ListConversionData
conversion_func::Function
conversion_loop_func::Function
child_type::LogicalType
internal_type::Type
target_type::Type
child_conversion_data::Any
end
mutable struct StructConversionData
tuple_type::Any
child_conversion_data::Vector{ListConversionData}
end
nop_convert(column_data::ColumnConversionData, val) = val
function convert_string(column_data::ColumnConversionData, val::Ptr{Cvoid}, idx::UInt64)
base_ptr = val + (idx - 1) * sizeof(duckdb_string_t)
length_ptr = Base.unsafe_convert(Ptr{Int32}, base_ptr)
length = unsafe_load(length_ptr)
if length <= STRING_INLINE_LENGTH
prefix_ptr = Base.unsafe_convert(Ptr{UInt8}, base_ptr + sizeof(Int32))
return unsafe_string(prefix_ptr, length)
else
ptr_ptr = Base.unsafe_convert(Ptr{Ptr{UInt8}}, base_ptr + sizeof(Int32) * 2)
data_ptr = Base.unsafe_load(ptr_ptr)
return unsafe_string(data_ptr, length)
end
end
function convert_blob(column_data::ColumnConversionData, val::Ptr{Cvoid}, idx::UInt64)::Base.CodeUnits{UInt8, String}
return Base.codeunits(convert_string(column_data, val, idx))
end
convert_date(column_data::ColumnConversionData, val) = convert(Date, val)
convert_time(column_data::ColumnConversionData, val) = convert(Time, val)
convert_time_tz(column_data::ColumnConversionData, val) = convert(Time, convert(duckdb_time_tz, val))
convert_timestamp(column_data::ColumnConversionData, val) = convert(DateTime, convert(duckdb_timestamp, val))
convert_timestamp_s(column_data::ColumnConversionData, val) = convert(DateTime, convert(duckdb_timestamp_s, val))
convert_timestamp_ms(column_data::ColumnConversionData, val) = convert(DateTime, convert(duckdb_timestamp_ms, val))
convert_timestamp_ns(column_data::ColumnConversionData, val) = convert(DateTime, convert(duckdb_timestamp_ns, val))
convert_interval(column_data::ColumnConversionData, val::duckdb_interval) = convert(Dates.CompoundPeriod, val)
convert_hugeint(column_data::ColumnConversionData, val::duckdb_hugeint) = convert(Int128, val)
convert_uhugeint(column_data::ColumnConversionData, val::duckdb_uhugeint) = convert(UInt128, val)
convert_uuid(column_data::ColumnConversionData, val::duckdb_hugeint) = convert(UUID, val)
function convert_enum(column_data::ColumnConversionData, val)::String
return column_data.conversion_data[val + 1]
end
function convert_decimal_hugeint(column_data::ColumnConversionData, val::duckdb_hugeint)
return Base.reinterpret(column_data.conversion_data, convert_hugeint(column_data, val))
end
function convert_decimal(column_data::ColumnConversionData, val)
return Base.reinterpret(column_data.conversion_data, val)
end
function convert_vector(
column_data::ColumnConversionData,
vector::Vec,
size::UInt64,
convert_func::Function,
result,
position,
all_valid,
::Type{SRC},
::Type{DST}
) where {SRC, DST}
array = get_array(vector, SRC, size)
if !all_valid
validity = get_validity(vector, size)
end
for i in 1:size
if all_valid || isvalid(validity, i)
result[position] = convert_func(column_data, array[i])
end
position += 1
end
return size
end
function convert_vector_string(
column_data::ColumnConversionData,
vector::Vec,
size::UInt64,
convert_func::Function,
result,
position,
all_valid,
::Type{SRC},
::Type{DST}
) where {SRC, DST}
    raw_ptr = duckdb_vector_get_data(vector.handle)
if !all_valid
validity = get_validity(vector, size)
end
for i in 1:size
if all_valid || isvalid(validity, i)
result[position] = convert_func(column_data, raw_ptr, i)
end
position += 1
end
return size
end
function convert_vector_list(
column_data::ColumnConversionData,
vector::Vec,
size::UInt64,
convert_func::Function,
result,
position,
all_valid,
::Type{SRC},
::Type{DST}
) where {SRC, DST}
child_vector = list_child(vector)
lsize = list_size(vector)
# convert the child vector
ldata = column_data.conversion_data
child_column_data =
ColumnConversionData(column_data.chunks, column_data.col_idx, ldata.child_type, ldata.child_conversion_data)
child_array = Array{Union{Missing, ldata.target_type}}(missing, lsize)
ldata.conversion_loop_func(
child_column_data,
child_vector,
lsize,
ldata.conversion_func,
child_array,
1,
false,
ldata.internal_type,
ldata.target_type
)
array = get_array(vector, SRC, size)
if !all_valid
validity = get_validity(vector, size)
end
for i in 1:size
if all_valid || isvalid(validity, i)
start_offset::UInt64 = array[i].offset + 1
end_offset::UInt64 = array[i].offset + array[i].length
result[position] = child_array[start_offset:end_offset]
end
position += 1
end
return size
end
function convert_struct_children(column_data::ColumnConversionData, vector::Vec, size::UInt64)
# convert the child vectors of the struct
child_count = get_struct_child_count(column_data.logical_type)
child_arrays = Vector()
for i in 1:child_count
child_vector = struct_child(vector, i)
ldata = column_data.conversion_data.child_conversion_data[i]
child_column_data =
ColumnConversionData(column_data.chunks, column_data.col_idx, ldata.child_type, ldata.child_conversion_data)
child_array = Array{Union{Missing, ldata.target_type}}(missing, size)
ldata.conversion_loop_func(
child_column_data,
child_vector,
size,
ldata.conversion_func,
child_array,
1,
false,
ldata.internal_type,
ldata.target_type
)
push!(child_arrays, child_array)
end
return child_arrays
end
function convert_vector_struct(
column_data::ColumnConversionData,
vector::Vec,
size::UInt64,
convert_func::Function,
result,
position,
all_valid,
::Type{SRC},
::Type{DST}
) where {SRC, DST}
child_count = get_struct_child_count(column_data.logical_type)
child_arrays = convert_struct_children(column_data, vector, size)
if !all_valid
validity = get_validity(vector, size)
end
for i in 1:size
if all_valid || isvalid(validity, i)
result_tuple = Vector()
for child_idx in 1:child_count
push!(result_tuple, child_arrays[child_idx][i])
end
result[position] = NamedTuple{column_data.conversion_data.tuple_type}(result_tuple)
end
position += 1
end
return size
end
function convert_vector_union(
column_data::ColumnConversionData,
vector::Vec,
size::UInt64,
convert_func::Function,
result,
position,
all_valid,
::Type{SRC},
::Type{DST}
) where {SRC, DST}
child_arrays = convert_struct_children(column_data, vector, size)
if !all_valid
validity = get_validity(vector, size)
end
for row in 1:size
# For every row/record
if all_valid || isvalid(validity, row)
# Get the tag of this row
tag::UInt64 = child_arrays[1][row]
type::DataType = duckdb_type_to_julia_type(get_union_member_type(column_data.logical_type, tag + 1))
# Get the value from the child array indicated by the tag
# Offset by 1 because of julia
# Offset by another 1 because of the tag vector
value = child_arrays[tag + 2][row]
result[position] = isequal(value, missing) ? missing : type(value)
end
position += 1
end
return size
end
function convert_vector_map(
column_data::ColumnConversionData,
vector::Vec,
size::UInt64,
convert_func::Function,
result,
position,
all_valid,
::Type{SRC},
::Type{DST}
) where {SRC, DST}
child_vector = list_child(vector)
lsize = list_size(vector)
# convert the child vector
ldata = column_data.conversion_data
child_column_data =
ColumnConversionData(column_data.chunks, column_data.col_idx, ldata.child_type, ldata.child_conversion_data)
child_array = Array{Union{Missing, ldata.target_type}}(missing, lsize)
ldata.conversion_loop_func(
child_column_data,
child_vector,
lsize,
ldata.conversion_func,
child_array,
1,
false,
ldata.internal_type,
ldata.target_type
)
child_arrays = convert_struct_children(child_column_data, child_vector, lsize)
keys = child_arrays[1]
values = child_arrays[2]
array = get_array(vector, SRC, size)
if !all_valid
validity = get_validity(vector, size)
end
for i in 1:size
if all_valid || isvalid(validity, i)
result_dict = Dict()
start_offset::UInt64 = array[i].offset + 1
end_offset::UInt64 = array[i].offset + array[i].length
for key_idx in start_offset:end_offset
result_dict[keys[key_idx]] = values[key_idx]
end
result[position] = result_dict
end
position += 1
end
return size
end
function convert_column_loop(
column_data::ColumnConversionData,
convert_func::Function,
::Type{SRC},
::Type{DST},
convert_vector_func::Function
) where {SRC, DST}
# first check if there are null values in any chunks
has_missing = false
row_count = 0
for chunk in column_data.chunks
if !all_valid(chunk, column_data.col_idx)
has_missing = true
end
row_count += get_size(chunk)
end
if has_missing
# missing values
result = Array{Union{Missing, DST}}(missing, row_count)
position = 1
for chunk in column_data.chunks
position += convert_vector_func(
column_data,
get_vector(chunk, column_data.col_idx),
get_size(chunk),
convert_func,
result,
position,
all_valid(chunk, column_data.col_idx),
SRC,
DST
)
end
else
# no missing values
result = Array{DST}(undef, row_count)
position = 1
for chunk in column_data.chunks
position += convert_vector_func(
column_data,
get_vector(chunk, column_data.col_idx),
get_size(chunk),
convert_func,
result,
position,
true,
SRC,
DST
)
end
end
return result
end
function create_child_conversion_data(child_type::LogicalType)
internal_type_id = get_internal_type_id(child_type)
internal_type = duckdb_type_to_internal_type(internal_type_id)
target_type = duckdb_type_to_julia_type(child_type)
conversion_func = get_conversion_function(child_type)
conversion_loop_func = get_conversion_loop_function(child_type)
child_conversion_data = init_conversion_loop(child_type)
return ListConversionData(
conversion_func,
conversion_loop_func,
child_type,
internal_type,
target_type,
child_conversion_data
)
end
function init_conversion_loop(logical_type::LogicalType)
type = get_type_id(logical_type)
if type == DUCKDB_TYPE_DECIMAL
return duckdb_type_to_julia_type(logical_type)
elseif type == DUCKDB_TYPE_ENUM
return get_enum_dictionary(logical_type)
elseif type == DUCKDB_TYPE_LIST || type == DUCKDB_TYPE_MAP
child_type = get_list_child_type(logical_type)
return create_child_conversion_data(child_type)
elseif type == DUCKDB_TYPE_STRUCT || type == DUCKDB_TYPE_UNION
child_count_fun::Function = get_struct_child_count
child_type_fun::Function = get_struct_child_type
child_name_fun::Function = get_struct_child_name
#if type == DUCKDB_TYPE_UNION
# child_count_fun = get_union_member_count
# child_type_fun = get_union_member_type
# child_name_fun = get_union_member_name
#end
child_count = child_count_fun(logical_type)
child_symbols::Vector{Symbol} = Vector()
child_data::Vector{ListConversionData} = Vector()
for i in 1:child_count
child_symbol = Symbol(child_name_fun(logical_type, i))
child_type = child_type_fun(logical_type, i)
child_conv_data = create_child_conversion_data(child_type)
push!(child_symbols, child_symbol)
push!(child_data, child_conv_data)
end
return StructConversionData(Tuple(x for x in child_symbols), child_data)
else
return nothing
end
end
function get_conversion_function(logical_type::LogicalType)::Function
type = get_type_id(logical_type)
if type == DUCKDB_TYPE_VARCHAR
return convert_string
elseif type == DUCKDB_TYPE_BLOB || type == DUCKDB_TYPE_BIT
return convert_blob
elseif type == DUCKDB_TYPE_DATE
return convert_date
elseif type == DUCKDB_TYPE_TIME
return convert_time
elseif type == DUCKDB_TYPE_TIME_TZ
return convert_time_tz
elseif type == DUCKDB_TYPE_TIMESTAMP || type == DUCKDB_TYPE_TIMESTAMP_TZ
return convert_timestamp
elseif type == DUCKDB_TYPE_TIMESTAMP_S
return convert_timestamp_s
elseif type == DUCKDB_TYPE_TIMESTAMP_MS
return convert_timestamp_ms
elseif type == DUCKDB_TYPE_TIMESTAMP_NS
return convert_timestamp_ns
elseif type == DUCKDB_TYPE_INTERVAL
return convert_interval
elseif type == DUCKDB_TYPE_HUGEINT
return convert_hugeint
elseif type == DUCKDB_TYPE_UHUGEINT
return convert_uhugeint
elseif type == DUCKDB_TYPE_UUID
return convert_uuid
elseif type == DUCKDB_TYPE_DECIMAL
internal_type_id = get_internal_type_id(logical_type)
if internal_type_id == DUCKDB_TYPE_HUGEINT
return convert_decimal_hugeint
else
return convert_decimal
end
elseif type == DUCKDB_TYPE_ENUM
return convert_enum
else
return nop_convert
end
end
function get_conversion_loop_function(logical_type::LogicalType)::Function
type = get_type_id(logical_type)
if type == DUCKDB_TYPE_VARCHAR || type == DUCKDB_TYPE_BLOB || type == DUCKDB_TYPE_BIT
return convert_vector_string
elseif type == DUCKDB_TYPE_LIST
return convert_vector_list
elseif type == DUCKDB_TYPE_STRUCT
return convert_vector_struct
elseif type == DUCKDB_TYPE_MAP
return convert_vector_map
elseif type == DUCKDB_TYPE_UNION
return convert_vector_union
else
return convert_vector
end
end
function convert_column(column_data::ColumnConversionData)
internal_type_id = get_internal_type_id(column_data.logical_type)
internal_type = duckdb_type_to_internal_type(internal_type_id)
target_type = duckdb_type_to_julia_type(column_data.logical_type)
conversion_func = get_conversion_function(column_data.logical_type)
conversion_loop_func = get_conversion_loop_function(column_data.logical_type)
column_data.conversion_data = init_conversion_loop(column_data.logical_type)
return convert_column_loop(column_data, conversion_func, internal_type, target_type, conversion_loop_func)
end
function convert_columns(q::QueryResult, chunks::DataChunks, column_count::Integer = duckdb_column_count(q.handle))
return NamedTuple{Tuple(q.names)}(ntuple(column_count) do i
j = Int64(i)
logical_type = LogicalType(duckdb_column_logical_type(q.handle, j))
column_data = ColumnConversionData(chunks, j, logical_type, nothing)
return convert_column(column_data)
end)
end
function Tables.columns(q::QueryResult)
if q.tbl === missing
if q.chunk_index != 1
throw(
NotImplementedException(
"Materializing into a Julia table is not supported after calling nextDataChunk"
)
)
end
# gather all the data chunks
chunks::Vector{DataChunk} = []
while true
# fetch the next chunk
chunk = DuckDB.nextDataChunk(q)
if chunk === missing
# consumed all chunks
break
end
push!(chunks, chunk)
end
q.tbl = convert_columns(q, chunks)
end
return Tables.CopiedColumns(q.tbl)
end
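# Example usage (a sketch; assumes `con` is an open DuckDB.Connection - any
# Tables.jl sink can consume the result):
#
#   res = DBInterface.execute(con, "SELECT 42 AS answer")
#   tbl = Tables.columntable(res)   # materializes all chunks via Tables.columns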
mutable struct PendingQueryResult
handle::duckdb_pending_result
success::Bool
function PendingQueryResult(stmt::Stmt)
pending_handle = Ref{duckdb_pending_result}()
ret = executePending(stmt.handle, pending_handle, stmt.result_type)
result = new(pending_handle[], ret == DuckDBSuccess)
finalizer(_close_pending_result, result)
return result
end
end
function executePending(
handle::duckdb_prepared_statement,
pending_handle::Ref{duckdb_pending_result},
::Type{MaterializedResult}
)
return duckdb_pending_prepared(handle, pending_handle)
end
function executePending(
handle::duckdb_prepared_statement,
pending_handle::Ref{duckdb_pending_result},
::Type{StreamResult}
)
return duckdb_pending_prepared_streaming(handle, pending_handle)
end
function _close_pending_result(pending::PendingQueryResult)
if pending.handle == C_NULL
return
end
duckdb_destroy_pending(pending.handle)
pending.handle = C_NULL
return
end
function fetch_error(sql::AbstractString, error_ptr)
if error_ptr == C_NULL
return string("Execute of query \"", sql, "\" failed: unknown error")
else
return string("Execute of query \"", sql, "\" failed: ", unsafe_string(error_ptr))
end
end
function get_error(stmt::Stmt, pending::PendingQueryResult)
error_ptr = duckdb_pending_error(pending.handle)
error_message = fetch_error(stmt.sql, error_ptr)
_close_pending_result(pending)
return error_message
end
# execute tasks from a pending query result in a loop
function pending_execute_tasks(pending::PendingQueryResult)::Bool
ret = DUCKDB_PENDING_RESULT_NOT_READY
while !duckdb_pending_execution_is_finished(ret)
GC.safepoint()
ret = duckdb_pending_execute_task(pending.handle)
end
return ret != DUCKDB_PENDING_ERROR
end
function pending_execute_check_state(pending::PendingQueryResult)::duckdb_pending_state
ret = duckdb_pending_execute_check_state(pending.handle)
return ret
end
# execute background tasks in a loop, until task execution is finished
function execute_tasks(state::duckdb_task_state, con::Connection)
while !duckdb_task_state_is_finished(state)
duckdb_execute_n_tasks_state(state, 1)
GC.safepoint()
Base.yield()
if duckdb_execution_is_finished(con.handle)
break
end
end
return
end
# cleanup background tasks
function cleanup_tasks(tasks, state)
# mark execution as finished so the individual tasks will quit
duckdb_finish_execution(state)
# now wait for all tasks to finish executing
exceptions = []
for task in tasks
try
Base.wait(task)
catch ex
push!(exceptions, ex)
end
end
# clean up the tasks and task state
empty!(tasks)
duckdb_destroy_task_state(state)
# if any tasks threw, propagate the error upwards by throwing as well
for ex in exceptions
throw(ex)
end
return
end
function execute_singlethreaded(pending::PendingQueryResult)::Bool
    # only the main thread is available: execute tasks of the pending result in a loop
    return pending_execute_tasks(pending)
end
function execute_multithreaded(stmt::Stmt, pending::PendingQueryResult)
# if multi-threading is enabled, launch background tasks
task_state = duckdb_create_task_state(stmt.con.db.handle)
tasks = []
for _ in 1:Threads.nthreads()
task_val = @spawn execute_tasks(task_state, stmt.con)
push!(tasks, task_val)
end
# When we have additional worker threads, don't execute using the main thread
    while !duckdb_execution_is_finished(stmt.con.handle)
ret = pending_execute_check_state(pending)
if ret == DUCKDB_PENDING_RESULT_READY || ret == DUCKDB_PENDING_ERROR
break
end
Base.yield()
GC.safepoint()
end
# we finished execution of all tasks, cleanup the tasks
return cleanup_tasks(tasks, task_state)
end
# this function is responsible for executing a statement and returning a result
function execute(stmt::Stmt, params::DBInterface.StatementParams = ())
bind_parameters(stmt, params)
# first create a pending query result
pending = PendingQueryResult(stmt)
if !pending.success
throw(QueryException(get_error(stmt, pending)))
end
success = true
if Threads.nthreads() == 1
success = execute_singlethreaded(pending)
# check if an error was thrown
if !success
throw(QueryException(get_error(stmt, pending)))
end
else
execute_multithreaded(stmt, pending)
end
handle = Ref{duckdb_result}()
ret = duckdb_execute_pending(pending.handle, handle)
if ret != DuckDBSuccess
error_ptr = duckdb_result_error(handle)
error_message = fetch_error(stmt.sql, error_ptr)
duckdb_destroy_result(handle)
throw(QueryException(error_message))
end
return QueryResult(handle)
end
# explicitly close prepared statement
DBInterface.close!(stmt::Stmt) = _close_stmt(stmt)
function execute(con::Connection, sql::AbstractString, params::DBInterface.StatementParams)
stmt = Stmt(con, sql, MaterializedResult)
try
return execute(stmt, params)
finally
_close_stmt(stmt) # immediately close, don't wait for GC
end
end
execute(con::Connection, sql::AbstractString; kwargs...) = execute(con, sql, values(kwargs))
execute(db::DB, sql::AbstractString, params::DBInterface.StatementParams) = execute(db.main_connection, sql, params)
execute(db::DB, sql::AbstractString; kwargs...) = execute(db.main_connection, sql, values(kwargs))
Tables.istable(::Type{QueryResult}) = true
Tables.isrowtable(::Type{QueryResult}) = true
Tables.columnaccess(::Type{QueryResult}) = true
Tables.schema(q::QueryResult) = Tables.Schema(q.names, q.types)
Base.IteratorSize(::Type{QueryResult}) = Base.SizeUnknown()
Base.eltype(q::QueryResult) = Any
DBInterface.close!(q::QueryResult) = _close_result(q)
Base.iterate(q::QueryResult) = iterate(Tables.rows(Tables.columns(q)))
Base.iterate(q::QueryResult, state) = iterate(Tables.rows(Tables.columns(q)), state)
struct QueryResultChunk
tbl::NamedTuple
end
function Tables.columns(chunk::QueryResultChunk)
return Tables.CopiedColumns(chunk.tbl)
end
Tables.istable(::Type{QueryResultChunk}) = true
Tables.isrowtable(::Type{QueryResultChunk}) = true
Tables.columnaccess(::Type{QueryResultChunk}) = true
Tables.schema(chunk::QueryResultChunk) = Tables.Schema(keys(chunk.tbl), eltype.(values(chunk.tbl)))
struct QueryResultChunkIterator
q::QueryResult
column_count::Int64
end
function next_chunk(iter::QueryResultChunkIterator)
chunk = DuckDB.nextDataChunk(iter.q)
if chunk === missing
return nothing
end
return QueryResultChunk(convert_columns(iter.q, (chunk,), iter.column_count))
end
Base.iterate(iter::QueryResultChunkIterator) = iterate(iter, 0x0000000000000001)
function Base.iterate(iter::QueryResultChunkIterator, state)
if iter.q.chunk_index != state
throw(
NotImplementedException(
"Iterating chunks more than once is not supported. " *
"(Did you iterate the result of Tables.partitions() once already, call nextDataChunk or materialise QueryResult?)"
)
)
end
chunk = next_chunk(iter)
if chunk === nothing
return nothing
end
return (chunk, state + 1)
end
Base.IteratorSize(::Type{QueryResultChunkIterator}) = Base.SizeUnknown()
Base.eltype(iter::QueryResultChunkIterator) = Any
function Tables.partitions(q::QueryResult)
column_count = duckdb_column_count(q.handle)
return QueryResultChunkIterator(q, column_count)
end
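#=
Illustrative sketch (not part of the package API surface): consuming a result
one data chunk at a time via `Tables.partitions`, assuming the
`DuckDB.StreamResult` result type defined elsewhere in the package. Note that
chunks can only be iterated once, as enforced by
`Base.iterate(::QueryResultChunkIterator, state)` above.

    db = DuckDB.DB()
    res = DBInterface.execute(db, "SELECT * FROM range(10)", DuckDB.StreamResult)
    for chunk in Tables.partitions(res)
        tbl = Tables.columntable(chunk)   # one chunk's worth of rows
        # ... process tbl ...
    end
=#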
function nextDataChunk(q::QueryResult)::Union{Missing, DataChunk}
if duckdb_result_is_streaming(q.handle[])
chunk_handle = duckdb_stream_fetch_chunk(q.handle[])
if chunk_handle == C_NULL
return missing
end
chunk = DataChunk(chunk_handle, true)
if get_size(chunk) == 0
return missing
end
else
chunk_count = duckdb_result_chunk_count(q.handle[])
if q.chunk_index > chunk_count
return missing
end
chunk = DataChunk(duckdb_result_get_chunk(q.handle[], q.chunk_index), true)
end
q.chunk_index += 1
return chunk
end
"Return the last row insert id from the executed statement"
DBInterface.lastrowid(con::Connection) = throw(NotImplementedException("Unimplemented: lastrowid"))
DBInterface.lastrowid(db::DB) = DBInterface.lastrowid(db.main_connection)
"""
DBInterface.prepare(db::DuckDB.DB, sql::AbstractString)
Prepare an SQL statement given as a string in the DuckDB database; returns a `DuckDB.Stmt` object.
See [`DBInterface.execute`](@ref) for information on executing a prepared statement and passing parameters to bind.
A `DuckDB.Stmt` object can be closed (resources freed) using [`DBInterface.close!`](@ref).
"""
DBInterface.prepare(con::Connection, sql::AbstractString, result_type::Type) = Stmt(con, sql, result_type)
DBInterface.prepare(con::Connection, sql::AbstractString) = DBInterface.prepare(con, sql, MaterializedResult)
DBInterface.prepare(db::DB, sql::AbstractString) = DBInterface.prepare(db.main_connection, sql)
DBInterface.prepare(db::DB, sql::AbstractString, result_type::Type) =
DBInterface.prepare(db.main_connection, sql, result_type)
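#=
Illustrative sketch: preparing a statement once and binding different
parameter values on each execution.

    db = DuckDB.DB()
    DBInterface.execute(db, "CREATE TABLE items(id INTEGER, name VARCHAR)")
    stmt = DBInterface.prepare(db, "INSERT INTO items VALUES (?, ?)")
    DBInterface.execute(stmt, (1, "apple"))
    DBInterface.execute(stmt, (2, "pear"))
    DBInterface.close!(stmt)
=#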
"""
DBInterface.execute(db::DuckDB.DB, sql::String, [params])
    DBInterface.execute(stmt::DuckDB.Stmt, [params])
Bind any positional (`params` as `Vector` or `Tuple`) or named (`params` as `NamedTuple` or `Dict`) parameters to an SQL statement, given by `db` and `sql` or
as an already prepared statement `stmt`, execute the query and return an iterator of result rows.
Note that the returned result row iterator only supports a single-pass, forward-only iteration of the result rows.
The resultset iterator supports the [Tables.jl](https://github.com/JuliaData/Tables.jl) interface, so results can be collected in any Tables.jl-compatible sink,
like `DataFrame(results)`, `CSV.write("results.csv", results)`, etc.
"""
DBInterface.execute(stmt::Stmt, params::DBInterface.StatementParams) = execute(stmt, params)
function DBInterface.execute(con::Connection, sql::AbstractString, result_type::Type)
stmt = Stmt(con, sql, result_type)
try
return execute(stmt)
finally
_close_stmt(stmt) # immediately close, don't wait for GC
end
end
DBInterface.execute(con::Connection, sql::AbstractString) = DBInterface.execute(con, sql, MaterializedResult)
DBInterface.execute(db::DB, sql::AbstractString, result_type::Type) =
DBInterface.execute(db.main_connection, sql, result_type)
Base.show(io::IO, result::DuckDB.QueryResult) = print(io, Tables.columntable(result))
"""
Executes a SQL query within a connection and returns the full (materialized) result.
The `query` function can run queries containing multiple statements, unlike [`DBInterface.execute`](@ref), which prepares only a single statement.
"""
function query(con::DuckDB.Connection, sql::AbstractString)
handle = Ref{duckdb_result}()
ret = duckdb_query(con.handle, sql, handle)
if ret != DuckDBSuccess
error_ptr = duckdb_result_error(handle)
error_message = fetch_error(sql, error_ptr)
duckdb_destroy_result(handle)
throw(QueryException(error_message))
end
return QueryResult(handle)
end
query(db::DuckDB.DB, sql::AbstractString) = query(db.main_connection, sql)
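#=
Illustrative sketch: `query` accepts multiple semicolon-separated statements
in a single call; the returned result is that of the final statement.

    db = DuckDB.DB()
    res = DuckDB.query(db, "CREATE TABLE t(i INTEGER); INSERT INTO t VALUES (1), (2); SELECT SUM(i) AS s FROM t")
=#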

#=
//===--------------------------------------------------------------------===//
// Scalar Function
//===--------------------------------------------------------------------===//
=#
"""
ScalarFunction(
name::AbstractString,
parameters::Vector{DataType},
return_type::DataType,
func,
        wrapper = nothing,
        wrapper_id = nothing
    )
Creates a new scalar function object. It is recommended to use the `@create_scalar_function`
macro to create a new scalar function instead of calling this constructor directly.
# Arguments
- `name::AbstractString`: The name of the function.
- `parameters::Vector{DataType}`: The data types of the parameters.
- `return_type::DataType`: The return type of the function.
- `func`: The function to be called.
- `wrapper`: The wrapper function that is used to call the function from DuckDB.
- `wrapper_id`: A unique id for the wrapper function.
See also [`register_scalar_function`](@ref), [`@create_scalar_function`](@ref)
"""
mutable struct ScalarFunction
handle::duckdb_scalar_function
name::AbstractString
parameters::Vector{DataType}
return_type::DataType
logical_parameters::Vector{LogicalType}
logical_return_type::LogicalType
func::Function
wrapper::Union{Nothing, Function} # the wrapper function to hold a reference to it to prevent GC
wrapper_id::Union{Nothing, UInt64}
function ScalarFunction(
name::AbstractString,
parameters::Vector{DataType},
return_type::DataType,
func,
wrapper = nothing,
wrapper_id = nothing
)
handle = duckdb_create_scalar_function()
duckdb_scalar_function_set_name(handle, name)
logical_parameters = Vector{LogicalType}()
for parameter_type in parameters
push!(logical_parameters, create_logical_type(parameter_type))
end
logical_return_type = create_logical_type(return_type)
for param in logical_parameters
duckdb_scalar_function_add_parameter(handle, param.handle)
end
duckdb_scalar_function_set_return_type(handle, logical_return_type.handle)
result = new(
handle,
name,
parameters,
return_type,
logical_parameters,
logical_return_type,
func,
wrapper,
wrapper_id
)
finalizer(_destroy_scalar_function, result)
duckdb_scalar_function_set_extra_info(handle, pointer_from_objref(result), C_NULL)
return result
end
end
name(func::ScalarFunction) = func.name
signature(func::ScalarFunction) = string(func.name, "(", join(func.parameters, ", "), ") -> ", func.return_type)
function Base.show(io::IO, func::ScalarFunction)
print(io, "DuckDB.ScalarFunction(", signature(func), ")")
return
end
function _destroy_scalar_function(func::ScalarFunction)
# disconnect from DB
if func.handle != C_NULL
duckdb_destroy_scalar_function(func.handle)
end
# remove the wrapper from the cache
    if func.wrapper_id !== nothing && haskey(_UDF_WRAPPER_CACHE, func.wrapper_id)
        delete!(_UDF_WRAPPER_CACHE, func.wrapper_id)
end
func.handle = C_NULL
return
end
"""
register_scalar_function(db::DB, fun::ScalarFunction)
register_scalar_function(con::Connection, fun::ScalarFunction)
Register a scalar function in the database.
"""
register_scalar_function(db::DB, fun::ScalarFunction) = register_scalar_function(db.main_connection, fun)
function register_scalar_function(con::Connection, fun::ScalarFunction)
    if haskey(con.db.scalar_functions, fun.name)
throw(ArgumentError(string("Scalar function \"", fun.name, "\" already registered")))
end
result = duckdb_register_scalar_function(con.handle, fun.handle)
if result != DuckDBSuccess
throw(ArgumentError(string("Failed to register scalar function \"", fun.name, "\"")))
end
con.db.scalar_functions[fun.name] = fun
return
end
# %% --- Scalar Function Macro ------------------------------------------ #
"""
name, at, rt = _udf_parse_function_expr(expr::Expr)
Parses a function expression and returns the function name, parameters and return type.
The parameters are returned as a vector of (argument name, argument type) tuples.
# Example
```julia
expr = :(my_sum(a::Int, b::Int)::Int)
name, at, rt = _udf_parse_function_expr(expr)
```
"""
function _udf_parse_function_expr(expr::Expr)
function parse_parameter(parameter_expr::Expr)
parameter_expr.head === :(::) || throw(ArgumentError("parameter_expr must be a type annotation"))
parameter, parameter_type = parameter_expr.args
if !isa(parameter, Symbol)
throw(ArgumentError("parameter name must be a symbol"))
end
# if !isa(parameter_type, Symbol)
# throw(ArgumentError("parameter_type must be a symbol"))
# end
return parameter, parameter_type
end
expr.head === :(::) ||
throw(ArgumentError("expr must be a typed function signature, e.g. func(a::Int, b::String)::Int"))
inner, return_type = expr.args
# parse inner
if !isa(inner, Expr)
throw(ArgumentError("inner must be an expression"))
end
inner.head === :call ||
throw(ArgumentError("expr must be a typed function signature, e.g. func(a::Int, b::String)::Int"))
func_name = inner.args[1]
    parameters = parse_parameter.(inner.args[2:end])
return func_name, parameters, return_type
end
function _udf_generate_conversion_expressions(parameters, logical_type, convert, var_name, chunk_name)
# Example:
# data_1 = convert(Int, LT[1], chunk, 1)
var_names = [Symbol("$(var_name)_$(i)") for i in 1:length(parameters)]
expressions = [
Expr(:(=), var_names[i], Expr(:call, convert, p_type, Expr(:ref, logical_type, i), chunk_name, i)) for
(i, (p_name, p_type)) in enumerate(parameters)
]
return var_names, expressions
end
function _udf_generate_wrapper(func_expr, func_esc)
index_name = :i
log_param_types_name = :log_param_types
log_return_type_name = :log_return_type
# Parse the function definition, e.g. my_func(a::Int, b::String)::Int
func, parameters, return_type = _udf_parse_function_expr(func_expr)
func_name = string(func)
# Generate expressions to unpack the data chunk:
# param_1 = convert(Int, LT[1], chunk, 1)
# param_2 = convert(Int, LT[2], chunk, 2)
var_names, input_assignments =
_udf_generate_conversion_expressions(parameters, log_param_types_name, :_udf_convert_chunk, :param, :chunk)
# Generate the call expression: result = func(param_1, param_2, ...)
call_args_loop = [:($var_name[$index_name]) for var_name in var_names]
call_expr = Expr(:call, func_esc, call_args_loop...)
# Generate the validity expression: get_validity(chunk, i)
validity_expr_i = i -> Expr(:call, :get_validity, :chunk, i)
validity_expr = Expr(:tuple, (validity_expr_i(i) for i in 1:length(parameters))...)
return quote
function (info::DuckDB.duckdb_function_info, input::DuckDB.duckdb_data_chunk, output::DuckDB.duckdb_vector)
extra_info_ptr = DuckDB.duckdb_scalar_function_get_extra_info(info)
scalar_func::DuckDB.ScalarFunction = unsafe_pointer_to_objref(extra_info_ptr)
$log_param_types_name::Vector{LogicalType} = scalar_func.logical_parameters
$log_return_type_name::LogicalType = scalar_func.logical_return_type
try
vec = Vec(output)
chunk = DataChunk(input, false) # create a data chunk object, that does not own the data
$(input_assignments...) # Assign the input values
N = Int64(get_size(chunk))
# initialize the result container, to avoid calling get_array() in the loop
result_container = _udf_assign_result_init($return_type, vec)
# Check data validity
validity = $validity_expr
chunk_is_valid = all(all_valid.(validity))
result_validity = get_validity(vec)
for $index_name in 1:N
if chunk_is_valid || all(isvalid(v, $index_name) for v in validity)
result::$return_type = $call_expr
                        # hopefully this is optimized away if the return type admits no missing values
if ismissing(result)
setinvalid(result_validity, $index_name)
else
_udf_assign_result!(result_container, $return_type, vec, result, $index_name)
end
else
setinvalid(result_validity, $index_name)
end
end
return nothing
catch e
duckdb_scalar_function_set_error(
info,
"Exception in " * signature(scalar_func) * ": " * get_exception_info()
)
end
end
end
end
"""
Internal storage for globally accessible function pointers.
HACK: This is a workaround to dynamically generate a function pointer on ALL architectures.
"""
const _UDF_WRAPPER_CACHE = Dict{UInt64, Function}()
function _udf_register_wrapper(id, wrapper)
    if haskey(_UDF_WRAPPER_CACHE, id)
throw(
InvalidInputException(
"A function with the same id has already been registered. This should not happen. Please report this issue."
)
)
end
_UDF_WRAPPER_CACHE[id] = wrapper
# HACK: This is a workaround to dynamically generate a function pointer on ALL architectures
# We need to delay the cfunction call until the moment wrapper function is generated
fptr = QuoteNode(:(_UDF_WRAPPER_CACHE[$id]))
cfunction_type = Ptr{Cvoid}
rt = :Cvoid
at = :(duckdb_function_info, duckdb_data_chunk, duckdb_vector)
attr_svec = Expr(:call, GlobalRef(Core, :svec), at.args...)
cfun = Expr(:cfunction, cfunction_type, fptr, rt, attr_svec, QuoteNode(:ccall))
ptr = eval(cfun)
return ptr
end
"""
@create_scalar_function func_expr [func_ref]
Creates a new Scalar Function object that can be registered in a DuckDB database.
# Arguments
- `func_expr`: An expression that defines the function signature.
- `func_ref`: An optional reference to the function or a closure. If omitted, a function with the name given in `func_expr` is assumed to be defined in the global scope.
# Example
```julia
db = DuckDB.DB()
my_add(a,b) = a + b
fun = @create_scalar_function my_add(a::Int, b::Int)::Int
DuckDB.register_scalar_function(db, fun) # Register UDF
```
"""
macro create_scalar_function(func_expr, func_ref = nothing)
func, parameters, return_type = _udf_parse_function_expr(func_expr)
if func_ref !== nothing
func_esc = esc(func_ref)
else
func_esc = esc(func)
end
#@info "Create Scalar Function" func func_esc, parameters, return_type
func_name = string(func)
parameter_names = [p[1] for p in parameters]
parameter_types = [p[2] for p in parameters]
parameter_types_vec = Expr(:vect, parameter_types...) # create a vector expression, e.g. [Int, Int]
wrapper_expr = _udf_generate_wrapper(func_expr, func_esc)
id = hash((func_expr, rand(UInt64))) # generate a unique id for the function
return quote
local wrapper = $(wrapper_expr)
local fun = ScalarFunction($func_name, $parameter_types_vec, $return_type, $func_esc, wrapper, $id)
ptr = _udf_register_wrapper($id, wrapper)
# Everything below only works in GLOBAL scope in the repl
# ptr = @cfunction(fun.wrapper, Cvoid, (duckdb_function_info, duckdb_data_chunk, duckdb_vector))
duckdb_scalar_function_set_function(fun.handle, ptr)
fun
end
end
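#=
Illustrative sketch: a string-valued scalar UDF created with the macro above
and invoked from SQL (exercising the `AbstractString` conversion paths below).

    db = DuckDB.DB()
    shout(s) = uppercase(s) * "!"
    fun = @create_scalar_function shout(s::String)::String
    DuckDB.register_scalar_function(db, fun)
    DBInterface.execute(db, "SELECT shout('hello') AS v")
=#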
# %% --- Conversions ------------------------------------------ #
function _udf_assign_result_init(::Type{T}, vec::Vec) where {T}
T_internal = julia_to_duck_type(T)
arr = get_array(vec, T_internal) # this call is quite slow, so we only call it once
return arr
end
function _udf_assign_result_init(::Type{T}, vec::Vec) where {T <: AbstractString}
return nothing
end
function _udf_assign_result!(container, ::Type{T}, vec::Vec, result::T, index) where {T}
container[index] = value_to_duckdb(result) # convert the value to duckdb and assign it to the array
return nothing
end
function _udf_assign_result!(container, ::Type{T}, vec::Vec, result::T, index) where {T <: AbstractString}
s = string(result)
DuckDB.assign_string_element(vec, index, s)
return nothing
end
function _udf_convert_chunk(::Type{T}, lt::LogicalType, chunk::DataChunk, ix) where {T <: Number}
x::Vector{T} = get_array(chunk, ix, T)
return x
end
function _udf_convert_chunk(::Type{T}, lt::LogicalType, chunk::DataChunk, ix) where {T <: AbstractString}
data = ColumnConversionData((chunk,), ix, lt, nothing)
return convert_column(data)
end
function _udf_convert_chunk(::Type{T}, lt::LogicalType, chunk::DataChunk, ix) where {T}
data = ColumnConversionData((chunk,), ix, lt, nothing)
return convert_column(data)
end

mutable struct Stmt <: DBInterface.Statement
con::Connection
handle::duckdb_prepared_statement
sql::AbstractString
result_type::Type
function Stmt(con::Connection, sql::AbstractString, result_type::Type)
handle = Ref{duckdb_prepared_statement}()
result = duckdb_prepare(con.handle, sql, handle)
if result != DuckDBSuccess
ptr = duckdb_prepare_error(handle[])
if ptr == C_NULL
error_message = "Preparation of statement failed: unknown error"
else
error_message = unsafe_string(ptr)
end
duckdb_destroy_prepare(handle)
throw(QueryException(error_message))
end
stmt = new(con, handle[], sql, result_type)
finalizer(_close_stmt, stmt)
return stmt
end
function Stmt(db::DB, sql::AbstractString, result_type::Type)
return Stmt(db.main_connection, sql, result_type)
end
end
function _close_stmt(stmt::Stmt)
if stmt.handle != C_NULL
duckdb_destroy_prepare(stmt.handle)
end
stmt.handle = C_NULL
return
end
DBInterface.getconnection(stmt::Stmt) = stmt.con
function nparameters(stmt::Stmt)
return Int(duckdb_nparams(stmt.handle))
end
duckdb_bind_internal(stmt::Stmt, i::Integer, val::AbstractFloat) = duckdb_bind_double(stmt.handle, i, Float64(val));
duckdb_bind_internal(stmt::Stmt, i::Integer, val::Bool) = duckdb_bind_boolean(stmt.handle, i, val);
duckdb_bind_internal(stmt::Stmt, i::Integer, val::Int8) = duckdb_bind_int8(stmt.handle, i, val);
duckdb_bind_internal(stmt::Stmt, i::Integer, val::Int16) = duckdb_bind_int16(stmt.handle, i, val);
duckdb_bind_internal(stmt::Stmt, i::Integer, val::Int32) = duckdb_bind_int32(stmt.handle, i, val);
duckdb_bind_internal(stmt::Stmt, i::Integer, val::Int64) = duckdb_bind_int64(stmt.handle, i, val);
duckdb_bind_internal(stmt::Stmt, i::Integer, val::UInt8) = duckdb_bind_uint8(stmt.handle, i, val);
duckdb_bind_internal(stmt::Stmt, i::Integer, val::UInt16) = duckdb_bind_uint16(stmt.handle, i, val);
duckdb_bind_internal(stmt::Stmt, i::Integer, val::UInt32) = duckdb_bind_uint32(stmt.handle, i, val);
duckdb_bind_internal(stmt::Stmt, i::Integer, val::UInt64) = duckdb_bind_uint64(stmt.handle, i, val);
duckdb_bind_internal(stmt::Stmt, i::Integer, val::Float32) = duckdb_bind_float(stmt.handle, i, val);
duckdb_bind_internal(stmt::Stmt, i::Integer, val::Float64) = duckdb_bind_double(stmt.handle, i, val);
duckdb_bind_internal(stmt::Stmt, i::Integer, val::Date) = duckdb_bind_date(stmt.handle, i, value_to_duckdb(val));
duckdb_bind_internal(stmt::Stmt, i::Integer, val::Time) = duckdb_bind_time(stmt.handle, i, value_to_duckdb(val));
duckdb_bind_internal(stmt::Stmt, i::Integer, val::DateTime) =
duckdb_bind_timestamp(stmt.handle, i, value_to_duckdb(val));
duckdb_bind_internal(stmt::Stmt, i::Integer, val::Missing) = duckdb_bind_null(stmt.handle, i);
duckdb_bind_internal(stmt::Stmt, i::Integer, val::Nothing) = duckdb_bind_null(stmt.handle, i);
duckdb_bind_internal(stmt::Stmt, i::Integer, val::AbstractString) =
duckdb_bind_varchar_length(stmt.handle, i, val, ncodeunits(val));
duckdb_bind_internal(stmt::Stmt, i::Integer, val::Vector{UInt8}) = duckdb_bind_blob(stmt.handle, i, val, sizeof(val));
duckdb_bind_internal(stmt::Stmt, i::Integer, val::WeakRefString{UInt8}) =
duckdb_bind_varchar_length(stmt.handle, i, val.ptr, val.len);
function duckdb_bind_internal(stmt::Stmt, i::Integer, val::AbstractVector{T}) where {T}
value = create_value(val)
return duckdb_bind_value(stmt.handle, i, value.handle)
end
function duckdb_bind_internal(stmt::Stmt, i::Integer, val::Any)
    throw(NotImplementedException(string("unsupported type for bind: ", typeof(val))))
end
function bind_parameters(stmt::Stmt, params::DBInterface.PositionalStatementParams)
    for (i, param) in enumerate(params)
        if duckdb_bind_internal(stmt, i, param) != DuckDBSuccess
            throw(QueryException(string("Failed to bind parameter ", i)))
        end
    end
    return
end
function bind_parameters(stmt::Stmt, params::DBInterface.NamedStatementParams)
N = nparameters(stmt)
if length(params) == 0
return # no parameters to bind
end
K = eltype(keys(params))
for i in 1:N
name_ptr = duckdb_parameter_name(stmt.handle, i)
name = unsafe_string(name_ptr)
duckdb_free(name_ptr)
name_key = K(name)
if !haskey(params, name_key)
if isa(params, NamedTuple)
value = params[i] # FIXME this is a workaround to keep the interface consistent, see the test in test_sqlite.jl
else
throw(QueryException("Parameter '$name' not found"))
end
else
value = getindex(params, name_key)
end
if duckdb_bind_internal(stmt, i, value) != DuckDBSuccess
throw(QueryException("Failed to bind parameter '$name'"))
end
end
end

#=
//===--------------------------------------------------------------------===//
// Table Function Bind
//===--------------------------------------------------------------------===//
=#
struct BindInfo
handle::duckdb_bind_info
main_function::Any
function BindInfo(handle::duckdb_bind_info, main_function)
result = new(handle, main_function)
return result
end
end
mutable struct InfoWrapper
main_function::Any
info::Any
function InfoWrapper(main_function, info)
return new(main_function, info)
end
end
function parameter_count(bind_info::BindInfo)
return duckdb_bind_get_parameter_count(bind_info.handle)
end
function get_parameter(bind_info::BindInfo, index::Int64)
return Value(duckdb_bind_get_parameter(bind_info.handle, index))
end
function set_stats_cardinality(bind_info::BindInfo, cardinality::UInt64, is_exact::Bool)
duckdb_bind_set_cardinality(bind_info.handle, cardinality, is_exact)
return
end
function add_result_column(bind_info::BindInfo, name::AbstractString, type::DataType)
return add_result_column(bind_info, name, create_logical_type(type))
end
function add_result_column(bind_info::BindInfo, name::AbstractString, type::LogicalType)
return duckdb_bind_add_result_column(bind_info.handle, name, type.handle)
end
function get_extra_data(bind_info::BindInfo)
return bind_info.main_function.extra_data
end
function _add_global_object(main_function, object)
    lock(main_function.global_lock)
    try
        push!(main_function.global_objects, object)
    finally
        unlock(main_function.global_lock)
    end
    return
end
function _remove_global_object(main_function, object)
    lock(main_function.global_lock)
    try
        delete!(main_function.global_objects, object)
    finally
        unlock(main_function.global_lock)
    end
    return
end
function _table_bind_cleanup(data::Ptr{Cvoid})
info::InfoWrapper = unsafe_pointer_to_objref(data)
_remove_global_object(info.main_function, info)
return
end
function get_exception_info()
error = ""
for (exc, bt) in current_exceptions()
error = string(error, sprint(showerror, exc, bt))
end
return error
end
function _table_bind_function(info::duckdb_bind_info)
try
main_function = unsafe_pointer_to_objref(duckdb_bind_get_extra_info(info))
binfo = BindInfo(info, main_function)
bind_data = InfoWrapper(main_function, main_function.bind_func(binfo))
bind_data_pointer = pointer_from_objref(bind_data)
_add_global_object(main_function, bind_data)
duckdb_bind_set_bind_data(info, bind_data_pointer, @cfunction(_table_bind_cleanup, Cvoid, (Ptr{Cvoid},)))
catch
duckdb_bind_set_error(info, get_exception_info())
return
end
return
end
#=
//===--------------------------------------------------------------------===//
// Table Function Init
//===--------------------------------------------------------------------===//
=#
struct InitInfo
handle::duckdb_init_info
main_function::Any
function InitInfo(handle::duckdb_init_info, main_function)
result = new(handle, main_function)
return result
end
end
function _table_init_function_generic(info::duckdb_init_info, init_fun::Function)
try
main_function = unsafe_pointer_to_objref(duckdb_init_get_extra_info(info))
binfo = InitInfo(info, main_function)
init_data = InfoWrapper(main_function, init_fun(binfo))
init_data_pointer = pointer_from_objref(init_data)
_add_global_object(main_function, init_data)
duckdb_init_set_init_data(info, init_data_pointer, @cfunction(_table_bind_cleanup, Cvoid, (Ptr{Cvoid},)))
catch
duckdb_init_set_error(info, get_exception_info())
return
end
return
end
function _table_init_function(info::duckdb_init_info)
main_function = unsafe_pointer_to_objref(duckdb_init_get_extra_info(info))
return _table_init_function_generic(info, main_function.init_func)
end
function _table_local_init_function(info::duckdb_init_info)
main_function = unsafe_pointer_to_objref(duckdb_init_get_extra_info(info))
return _table_init_function_generic(info, main_function.init_local_func)
end
function get_bind_info(info::InitInfo, ::Type{T})::T where {T}
return unsafe_pointer_to_objref(duckdb_init_get_bind_data(info.handle)).info
end
function get_extra_data(info::InitInfo)
return info.main_function.extra_data
end
function set_max_threads(info::InitInfo, max_threads)
return duckdb_init_set_max_threads(info.handle, max_threads)
end
function get_projected_columns(info::InitInfo)::Vector{Int64}
result::Vector{Int64} = Vector()
column_count = duckdb_init_get_column_count(info.handle)
for i in 1:column_count
push!(result, duckdb_init_get_column_index(info.handle, i))
end
return result
end
function _empty_init_info(info::DuckDB.InitInfo)
return missing
end
#=
//===--------------------------------------------------------------------===//
// Main Table Function
//===--------------------------------------------------------------------===//
=#
struct FunctionInfo
handle::duckdb_function_info
main_function::Any
function FunctionInfo(handle::duckdb_function_info, main_function)
result = new(handle, main_function)
return result
end
end
function get_bind_info(info::FunctionInfo, ::Type{T})::T where {T}
return unsafe_pointer_to_objref(duckdb_function_get_bind_data(info.handle)).info
end
function get_init_info(info::FunctionInfo, ::Type{T})::T where {T}
return unsafe_pointer_to_objref(duckdb_function_get_init_data(info.handle)).info
end
function get_local_info(info::FunctionInfo, ::Type{T})::T where {T}
return unsafe_pointer_to_objref(duckdb_function_get_local_init_data(info.handle)).info
end
function _table_main_function(info::duckdb_function_info, chunk::duckdb_data_chunk)
main_function::TableFunction = unsafe_pointer_to_objref(duckdb_function_get_extra_info(info))
binfo::FunctionInfo = FunctionInfo(info, main_function)
try
main_function.main_func(binfo, DataChunk(chunk, false))
catch
duckdb_function_set_error(info, get_exception_info())
end
return
end
#=
//===--------------------------------------------------------------------===//
// Table Function
//===--------------------------------------------------------------------===//
=#
"""
DuckDB table function
"""
mutable struct TableFunction
handle::duckdb_table_function
bind_func::Function
init_func::Function
init_local_func::Function
main_func::Function
extra_data::Any
global_objects::Set{Any}
global_lock::ReentrantLock
function TableFunction(
name::AbstractString,
parameters::Vector{LogicalType},
bind_func::Function,
init_func::Function,
init_local_func::Function,
main_func::Function,
extra_data::Any,
projection_pushdown::Bool
)
handle = duckdb_create_table_function()
duckdb_table_function_set_name(handle, name)
for param in parameters
duckdb_table_function_add_parameter(handle, param.handle)
end
result = new(handle, bind_func, init_func, init_local_func, main_func, extra_data, Set(), ReentrantLock())
finalizer(_destroy_table_function, result)
duckdb_table_function_set_extra_info(handle, pointer_from_objref(result), C_NULL)
duckdb_table_function_set_bind(handle, @cfunction(_table_bind_function, Cvoid, (duckdb_bind_info,)))
duckdb_table_function_set_init(handle, @cfunction(_table_init_function, Cvoid, (duckdb_init_info,)))
duckdb_table_function_set_local_init(handle, @cfunction(_table_local_init_function, Cvoid, (duckdb_init_info,)))
duckdb_table_function_set_function(
handle,
@cfunction(_table_main_function, Cvoid, (duckdb_function_info, duckdb_data_chunk))
)
duckdb_table_function_supports_projection_pushdown(handle, projection_pushdown)
return result
end
end
function _destroy_table_function(func::TableFunction)
    # disconnect from DB
    if func.handle != C_NULL
        duckdb_destroy_table_function(func.handle)
    end
    func.handle = C_NULL
    return
end
function create_table_function(
con::Connection,
name::AbstractString,
parameters::Vector{LogicalType},
bind_func::Function,
init_func::Function,
main_func::Function,
extra_data::Any = missing,
projection_pushdown::Bool = false,
init_local_func::Union{Missing, Function} = missing
)
if init_local_func === missing
init_local_func = _empty_init_info
end
fun = TableFunction(
name,
parameters,
bind_func,
init_func,
init_local_func,
main_func,
extra_data,
projection_pushdown
)
if duckdb_register_table_function(con.handle, fun.handle) != DuckDBSuccess
throw(QueryException(string("Failed to register table function \"", name, "\"")))
end
push!(con.db.functions, fun)
return
end
function create_table_function(
con::Connection,
name::AbstractString,
parameters::Vector{DataType},
bind_func::Function,
init_func::Function,
main_func::Function,
extra_data::Any = missing,
projection_pushdown::Bool = false,
init_local_func::Union{Missing, Function} = missing
)
parameter_types::Vector{LogicalType} = Vector()
for parameter_type in parameters
push!(parameter_types, create_logical_type(parameter_type))
end
return create_table_function(
con,
name,
parameter_types,
bind_func,
init_func,
main_func,
extra_data,
projection_pushdown,
init_local_func
)
end
function create_table_function(
db::DB,
name::AbstractString,
parameters::Vector{LogicalType},
bind_func::Function,
init_func::Function,
main_func::Function,
extra_data::Any = missing,
projection_pushdown::Bool = false,
init_local_func::Union{Missing, Function} = missing
)
return create_table_function(
db.main_connection,
name,
parameters,
bind_func,
init_func,
main_func,
extra_data,
projection_pushdown,
init_local_func
)
end
function create_table_function(
db::DB,
name::AbstractString,
parameters::Vector{DataType},
bind_func::Function,
init_func::Function,
main_func::Function,
extra_data::Any = missing,
projection_pushdown::Bool = false,
init_local_func::Union{Missing, Function} = missing
)
return create_table_function(
db.main_connection,
name,
parameters,
bind_func,
init_func,
main_func,
extra_data,
projection_pushdown,
init_local_func
)
end


@@ -0,0 +1,236 @@
struct TableBindInfo
tbl::Any
input_columns::Vector
scan_types::Vector{Type}
result_types::Vector{Type}
scan_functions::Vector{Function}
function TableBindInfo(
tbl,
input_columns::Vector,
scan_types::Vector{Type},
result_types::Vector{Type},
scan_functions::Vector{Function}
)
return new(tbl, input_columns, scan_types, result_types, scan_functions)
end
end
table_result_type(tbl, entry) = Core.Compiler.typesubtract(eltype(tbl[entry]), Missing, 1)
julia_to_duck_type(::Type{Date}) = Int32
julia_to_duck_type(::Type{Time}) = Int64
julia_to_duck_type(::Type{DateTime}) = Int64
julia_to_duck_type(::Type{T}) where {T} = T
value_to_duckdb(val::Date) = convert(Int32, Dates.date2epochdays(val) - ROUNDING_EPOCH_TO_UNIX_EPOCH_DAYS)
value_to_duckdb(val::Time) = convert(Int64, Dates.value(val) ÷ 1000)
value_to_duckdb(val::DateTime) = convert(Int64, (Dates.datetime2epochms(val) - ROUNDING_EPOCH_TO_UNIX_EPOCH_MS) * 1000)
value_to_duckdb(val::AbstractString) = throw(
NotImplementedException(
"Cannot use value_to_duckdb to convert string values - use DuckDB.assign_string_element on a vector instead"
)
)
value_to_duckdb(val) = val
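The epoch offsets subtracted above bridge Julia's Dates epoch and the Unix epoch that DuckDB stores dates and timestamps against. A standalone sketch of the arithmetic, with no DuckDB handles involved; the value 719528 is an assumption inferred from the Dates stdlib (`date2epochdays` counts from 0000-01-01), not a constant taken from this package:

```julia
using Dates

# Julia's Dates "rounding epoch" is 0000-01-01; DuckDB stores dates as days
# since the Unix epoch (1970-01-01), hence the constant offset subtracted
# in value_to_duckdb. The offset is recoverable from the Dates stdlib:
offset_days = Dates.date2epochdays(Date(1970, 1, 1))
println(offset_days)                                           # 719528
# A Julia Date maps to a DuckDB date by subtracting that offset:
println(Dates.date2epochdays(Date(1970, 1, 2)) - offset_days)  # 1
# DateTime works the same way at millisecond resolution:
println(Dates.datetime2epochms(DateTime(1970, 1, 1)) == offset_days * 86_400_000)  # true
```

Time needs no epoch shift, only a unit change: Julia stores nanoseconds within the day while DuckDB's TIME is in microseconds, hence the division by 1000; conversely, DuckDB timestamps are in microseconds while Julia's epoch milliseconds are coarser, hence the multiplication by 1000.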
function tbl_scan_column(
input_column::AbstractVector{JL_TYPE},
row_offset::Int64,
col_idx::Int64,
result_idx::Int64,
scan_count::Int64,
output::DuckDB.DataChunk,
::Type{DUCK_TYPE},
::Type{JL_TYPE}
) where {DUCK_TYPE, JL_TYPE}
vector::Vec = DuckDB.get_vector(output, result_idx)
result_array::Vector{DUCK_TYPE} = DuckDB.get_array(vector, DUCK_TYPE)
validity::ValidityMask = DuckDB.get_validity(vector)
for i::Int64 in 1:scan_count
val = getindex(input_column, row_offset + i)
if val === missing
DuckDB.setinvalid(validity, i)
else
result_array[i] = value_to_duckdb(val)
end
end
end
function tbl_scan_string_column(
input_column::AbstractVector{JL_TYPE},
row_offset::Int64,
col_idx::Int64,
result_idx::Int64,
scan_count::Int64,
output::DuckDB.DataChunk,
::Type{DUCK_TYPE},
::Type{JL_TYPE}
) where {DUCK_TYPE, JL_TYPE}
vector::Vec = DuckDB.get_vector(output, result_idx)
validity::ValidityMask = DuckDB.get_validity(vector)
for i::Int64 in 1:scan_count
val = getindex(input_column, row_offset + i)
if val === missing
DuckDB.setinvalid(validity, i)
else
DuckDB.assign_string_element(vector, i, val)
end
end
end
function tbl_scan_function(tbl, entry)
result_type = table_result_type(tbl, entry)
if result_type <: AbstractString
return tbl_scan_string_column
end
return tbl_scan_column
end
function tbl_bind_function(info::DuckDB.BindInfo)
# fetch the tbl name from the function parameters
parameter = DuckDB.get_parameter(info, 0)
name = DuckDB.getvalue(parameter, String)
# fetch the actual tbl using the function name
extra_data = DuckDB.get_extra_data(info)
tbl = extra_data[name]
# set the cardinality
row_count::UInt64 = Tables.rowcount(tbl)
DuckDB.set_stats_cardinality(info, row_count, true)
# register the result columns
input_columns = Vector()
scan_types::Vector{Type} = Vector()
result_types::Vector{Type} = Vector()
scan_functions::Vector{Function} = Vector()
for entry in Tables.columnnames(tbl)
result_type = table_result_type(tbl, entry)
scan_function = tbl_scan_function(tbl, entry)
push!(input_columns, tbl[entry])
push!(scan_types, eltype(tbl[entry]))
push!(result_types, julia_to_duck_type(result_type))
push!(scan_functions, scan_function)
DuckDB.add_result_column(info, string(entry), result_type)
end
return TableBindInfo(tbl, input_columns, scan_types, result_types, scan_functions)
end
mutable struct TableGlobalInfo
pos::Int64
global_lock::ReentrantLock
function TableGlobalInfo()
return new(0, ReentrantLock())
end
end
mutable struct TableLocalInfo
columns::Vector{Int64}
current_pos::Int64
end_pos::Int64
function TableLocalInfo(columns)
return new(columns, 0, 0)
end
end
function tbl_global_init_function(info::DuckDB.InitInfo)
bind_info = DuckDB.get_bind_info(info, TableBindInfo)
# figure out the maximum number of threads to launch from the tbl size
row_count::Int64 = Tables.rowcount(bind_info.tbl)
max_threads::Int64 = ceil(row_count / DuckDB.ROW_GROUP_SIZE)
DuckDB.set_max_threads(info, max_threads)
return TableGlobalInfo()
end
function tbl_local_init_function(info::DuckDB.InitInfo)
columns = DuckDB.get_projected_columns(info)
return TableLocalInfo(columns)
end
function tbl_scan_function(info::DuckDB.FunctionInfo, output::DuckDB.DataChunk)
bind_info = DuckDB.get_bind_info(info, TableBindInfo)
global_info = DuckDB.get_init_info(info, TableGlobalInfo)
local_info = DuckDB.get_local_info(info, TableLocalInfo)
if local_info.current_pos >= local_info.end_pos
# ran out of data to scan in the local info: fetch new rows from the global state (if any)
        # we fetch rows from the global state in increments of ROW_GROUP_SIZE
lock(global_info.global_lock) do
row_count::Int64 = Tables.rowcount(bind_info.tbl)
local_info.current_pos = global_info.pos
total_scan_amount::Int64 = DuckDB.ROW_GROUP_SIZE
if local_info.current_pos + total_scan_amount >= row_count
total_scan_amount = row_count - local_info.current_pos
end
local_info.end_pos = local_info.current_pos + total_scan_amount
return global_info.pos += total_scan_amount
end
end
scan_count::Int64 = DuckDB.VECTOR_SIZE
current_row::Int64 = local_info.current_pos
if current_row + scan_count >= local_info.end_pos
scan_count = local_info.end_pos - current_row
end
local_info.current_pos += scan_count
result_idx::Int64 = 1
for col_idx::Int64 in local_info.columns
if col_idx == 0
result_idx += 1
continue
end
bind_info.scan_functions[col_idx](
bind_info.input_columns[col_idx],
current_row,
col_idx,
result_idx,
scan_count,
output,
bind_info.result_types[col_idx],
bind_info.scan_types[col_idx]
)
result_idx += 1
end
DuckDB.set_size(output, scan_count)
return
end
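The two-level chunking above can be modeled without any DuckDB handles: a local scanner claims up to a row group of rows under the global lock, then serves them out one vector-sized chunk per call until the claim is exhausted. In the sketch below, `plan_chunks` is a hypothetical helper (not part of the package), and the constants `2048` and `2048 * 60` are assumptions standing in for `DuckDB.VECTOR_SIZE` and `DuckDB.ROW_GROUP_SIZE`:

```julia
# Hypothetical standalone model of the two-level scan: claim a row-group
# worth of rows at a time, then emit one vector-sized DataChunk per call.
function plan_chunks(row_count; row_group = 2048 * 60, vec = 2048)
    chunks = Int[]
    pos = 0                                       # mirrors global_info.pos
    while pos < row_count
        claimed = min(row_group, row_count - pos) # one lock-protected claim
        done = 0
        while done < claimed                      # per-DataChunk scan loop
            n = min(vec, claimed - done)
            push!(chunks, n)
            done += n
        end
        pos += claimed
    end
    return chunks
end

println(plan_chunks(5000))   # [2048, 2048, 904]
```

The lock is only needed for the claim, not for the per-chunk loop, which is what lets multiple Julia tasks scan disjoint row ranges in parallel.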
function register_table(con::Connection, tbl, name::AbstractString)
con.db.registered_objects[name] = columntable(tbl)
DBInterface.execute(
con,
string("CREATE OR REPLACE VIEW \"", name, "\" AS SELECT * FROM julia_tbl_scan('", name, "')")
)
return
end
register_table(db::DB, tbl, name::AbstractString) = register_table(db.main_connection, tbl, name)
function unregister_table(con::Connection, name::AbstractString)
pop!(con.db.registered_objects, name)
DBInterface.execute(con, string("DROP VIEW IF EXISTS \"", name, "\""))
return
end
unregister_table(db::DB, name::AbstractString) = unregister_table(db.main_connection, name)
# for backwards compatibility:
const register_data_frame = register_table
const unregister_data_frame = unregister_table
function _add_table_scan(db::DB)
# add the table scan function
DuckDB.create_table_function(
db.main_connection,
"julia_tbl_scan",
[String],
tbl_bind_function,
tbl_global_init_function,
tbl_scan_function,
db.handle.registered_objects,
true,
tbl_local_init_function
)
return
end


@@ -0,0 +1,48 @@
function DBInterface.transaction(f, con::Connection)
begin_transaction(con)
try
f()
catch
rollback(con)
rethrow()
end
commit(con)
return
end
function DBInterface.transaction(f, db::DB)
return DBInterface.transaction(f, db.main_connection)
end
"""
DuckDB.begin_transaction(db)

Begin a transaction.
"""
function begin_transaction end
begin_transaction(con::Connection) = execute(con, "BEGIN TRANSACTION;")
begin_transaction(db::DB) = begin_transaction(db.main_connection)
transaction(con::Connection) = begin_transaction(con)
transaction(db::DB) = begin_transaction(db)
"""
DuckDB.commit(db)

Commit a transaction.
"""
function commit end
commit(con::Connection) = execute(con, "COMMIT TRANSACTION;")
commit(db::DB) = commit(db.main_connection)
"""
DuckDB.rollback(db)

Roll back a transaction.
"""
function rollback end
rollback(con::Connection) = execute(con, "ROLLBACK TRANSACTION;")
rollback(db::DB) = rollback(db.main_connection)


@@ -0,0 +1,36 @@
"""
DuckDB validity mask
"""
struct ValidityMask
data::Vector{UInt64}
function ValidityMask(data::Vector{UInt64})
result = new(data)
return result
end
end
const BITS_PER_VALUE = 64;
function get_entry_index(row_idx)
return ((row_idx - 1) ÷ BITS_PER_VALUE) + 1
end
function get_index_in_entry(row_idx)
return (row_idx - 1) % BITS_PER_VALUE
end
function setinvalid(mask::ValidityMask, index)
entry_idx = get_entry_index(index)
index_in_entry = get_index_in_entry(index)
mask.data[entry_idx] &= ~(1 << index_in_entry)
return
end
function isvalid(mask::ValidityMask, index)::Bool
entry_idx = get_entry_index(index)
index_in_entry = get_index_in_entry(index)
return (mask.data[entry_idx] & (1 << index_in_entry)) != 0
end
all_valid(mask::ValidityMask) = all(==(typemax(eltype(mask.data))), mask.data)
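A pure-Julia illustration of the bit layout these helpers implement: row `i` occupies bit `(i - 1) % 64` of word `((i - 1) ÷ 64) + 1`, and a cleared bit marks the row as invalid (NULL). This sketch recomputes the indices inline rather than calling the helpers above:

```julia
# A mask covering 128 rows, all initially valid (every bit set).
bits = fill(typemax(UInt64), 2)
row = 70
entry = ((row - 1) ÷ 64) + 1         # word index: 2
bit = (row - 1) % 64                 # bit index within the word: 5
bits[entry] &= ~(UInt64(1) << bit)   # clear the bit => row 70 is now NULL
println((bits[entry] >> bit) & 1)    # 0
println(count_ones(bits[1]) + count_ones(bits[2]))  # 127
```

The `all_valid` check above is just the observation that a fully-set word equals `typemax(UInt64)`, so a mask with any cleared bit fails it.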


@@ -0,0 +1,59 @@
"""
DuckDB value
"""
mutable struct Value
handle::duckdb_value
function Value(handle::duckdb_value)
result = new(handle)
finalizer(_destroy_value, result)
return result
end
end
function _destroy_value(val::Value)
if val.handle != C_NULL
duckdb_destroy_value(val.handle)
end
val.handle = C_NULL
return
end
getvalue(val::Value, ::Type{T}) where {T <: Int64} = duckdb_get_int64(val.handle)
function getvalue(val::Value, ::Type{T}) where {T <: String}
ptr = duckdb_get_varchar(val.handle)
result = unsafe_string(ptr)
duckdb_free(ptr)
return result
end
function getvalue(val::Value, ::Type{T}) where {T}
throw(NotImplementedException("Unsupported type for getvalue"))
end
create_value(val::T) where {T <: Bool} = Value(duckdb_create_bool(val))
create_value(val::T) where {T <: Int8} = Value(duckdb_create_int8(val))
create_value(val::T) where {T <: Int16} = Value(duckdb_create_int16(val))
create_value(val::T) where {T <: Int32} = Value(duckdb_create_int32(val))
create_value(val::T) where {T <: Int64} = Value(duckdb_create_int64(val))
create_value(val::T) where {T <: Int128} = Value(duckdb_create_hugeint(val))
create_value(val::T) where {T <: UInt8} = Value(duckdb_create_uint8(val))
create_value(val::T) where {T <: UInt16} = Value(duckdb_create_uint16(val))
create_value(val::T) where {T <: UInt32} = Value(duckdb_create_uint32(val))
create_value(val::T) where {T <: UInt64} = Value(duckdb_create_uint64(val))
create_value(val::T) where {T <: UInt128} = Value(duckdb_create_uhugeint(val))
create_value(val::T) where {T <: Float32} = Value(duckdb_create_float(val))
create_value(val::T) where {T <: Float64} = Value(duckdb_create_double(val))
create_value(val::T) where {T <: Date} =
Value(duckdb_create_date(Dates.date2epochdays(val) - ROUNDING_EPOCH_TO_UNIX_EPOCH_DAYS))
create_value(val::T) where {T <: Time} = Value(duckdb_create_time(Dates.value(val) ÷ 1000))
create_value(val::T) where {T <: DateTime} =
Value(duckdb_create_timestamp((Dates.datetime2epochms(val) - ROUNDING_EPOCH_TO_UNIX_EPOCH_MS) * 1000))
create_value(val::T) where {T <: AbstractString} = Value(duckdb_create_varchar_length(val, ncodeunits(val)))
function create_value(val::AbstractVector{T}) where {T}
type = create_logical_type(T)
values = create_value.(val)
return Value(duckdb_create_list_value(type.handle, map(x -> x.handle, values), length(values)))
end
function create_value(val::T) where {T}
    throw(NotImplementedException("Unsupported type for create_value"))
end


@@ -0,0 +1,58 @@
"""
DuckDB vector
"""
struct Vec
handle::duckdb_vector
function Vec(handle::duckdb_vector)
result = new(handle)
return result
end
end
function get_array(vector::Vec, ::Type{T}, size = VECTOR_SIZE)::Vector{T} where {T}
raw_ptr = duckdb_vector_get_data(vector.handle)
ptr = Base.unsafe_convert(Ptr{T}, raw_ptr)
return unsafe_wrap(Vector{T}, ptr, size, own = false)
end
function get_validity(vector::Vec, size = VECTOR_SIZE)::ValidityMask
duckdb_vector_ensure_validity_writable(vector.handle)
validity_ptr = duckdb_vector_get_validity(vector.handle)
ptr = Base.unsafe_convert(Ptr{UInt64}, validity_ptr)
size_words = div(size, BITS_PER_VALUE, RoundUp)
validity_vector = unsafe_wrap(Vector{UInt64}, ptr, size_words, own = false)
return ValidityMask(validity_vector)
end
function all_valid(vector::Vec, size = VECTOR_SIZE)::Bool
validity_ptr = duckdb_vector_get_validity(vector.handle)
validity_ptr == C_NULL && return true
size_words = div(size, BITS_PER_VALUE, RoundUp)
validity_vector = unsafe_wrap(Vector{UInt64}, validity_ptr, size_words, own = false)
return all_valid(ValidityMask(validity_vector))
end
function list_child(vector::Vec)::Vec
return Vec(duckdb_list_vector_get_child(vector.handle))
end
function list_size(vector::Vec)::UInt64
return duckdb_list_vector_get_size(vector.handle)
end
function struct_child(vector::Vec, index::UInt64)::Vec
return Vec(duckdb_struct_vector_get_child(vector.handle, index))
end
function union_member(vector::Vec, index::UInt64)::Vec
return Vec(duckdb_union_vector_get_member(vector.handle, index))
end
function assign_string_element(vector::Vec, index::Int64, str::String)
return duckdb_vector_assign_string_element_len(vector.handle, index, str, sizeof(str))
end
function assign_string_element(vector::Vec, index::Int64, str::AbstractString)
return duckdb_vector_assign_string_element_len(vector.handle, index, str, sizeof(str))
end
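The `sizeof` in both methods is deliberate: the C API wants the UTF-8 byte length, which differs from Julia's character count (`length`) for non-ASCII strings. A quick standalone check:

```julia
# length counts characters; sizeof (== ncodeunits for String) counts bytes.
s = "🦆DB"
println(length(s))   # 3 characters
println(sizeof(s))   # 6 bytes: the duck emoji alone is 4 bytes in UTF-8
```

Passing `length` here would truncate any string containing multi-byte characters, which is exactly the class of data the test suite exercises.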

external/duckdb/tools/juliapkg/test.sh vendored Executable file

@@ -0,0 +1,9 @@
set -e
export JULIA_DUCKDB_LIBRARY="`pwd`/../../build/debug/src/libduckdb.dylib"
#export JULIA_DUCKDB_LIBRARY="`pwd`/../../build/release/src/libduckdb.dylib"
# memory profiling: --track-allocation=user
export JULIA_NUM_THREADS=1
julia --project -e "import Pkg; Pkg.test(; test_args = [\"$1\"])"



@@ -0,0 +1,45 @@
using DataFrames
using Tables
using DuckDB
using Test
using Dates
using FixedPointDecimals
using UUIDs
test_files = [
"test_appender.jl",
"test_basic_queries.jl",
"test_big_nested.jl",
"test_config.jl",
"test_connection.jl",
"test_tbl_scan.jl",
"test_prepare.jl",
"test_transaction.jl",
"test_sqlite.jl",
"test_replacement_scan.jl",
"test_table_function.jl",
"test_old_interface.jl",
"test_all_types.jl",
"test_union_type.jl",
"test_decimals.jl",
"test_threading.jl",
"test_tpch.jl",
"test_tpch_multithread.jl",
"test_stream_data_chunk.jl",
"test_scalar_udf.jl"
]
if length(ARGS) > 0 && !isempty(ARGS[1])
filtered_test_files = []
for test_file in test_files
if test_file == ARGS[1]
push!(filtered_test_files, test_file)
end
end
test_files = filtered_test_files
end
for fname in test_files
println(fname)
include(fname)
end


@@ -0,0 +1,190 @@
# test_all_types.jl
@testset "Test All Types" begin
db = DBInterface.connect(DuckDB.DB)
con = DBInterface.connect(db)
df = DataFrame(
DBInterface.execute(
con,
"""SELECT * EXCLUDE(time, time_tz, fixed_int_array, fixed_varchar_array, fixed_nested_int_array,
fixed_nested_varchar_array, fixed_struct_array, struct_of_fixed_array, fixed_array_of_int_list,
list_of_fixed_int_array, bignum)
, CASE WHEN time = '24:00:00'::TIME THEN '23:59:59.999999'::TIME ELSE time END AS time
, CASE WHEN time_tz = '24:00:00-15:59:59'::TIMETZ THEN '23:59:59.999999-15:59:59'::TIMETZ ELSE time_tz END AS time_tz
FROM test_all_types()
"""
)
)
#println(names(df))
    # we could also use 'propertynames()' to get the column names as symbols;
    # that might make for a better testing approach if we add a dictionary
    # that maps from each symbol to its expected result
@test isequal(df.bool, [false, true, missing])
@test isequal(df.tinyint, [-128, 127, missing])
@test isequal(df.smallint, [-32768, 32767, missing])
@test isequal(df.int, [-2147483648, 2147483647, missing])
@test isequal(df.bigint, [-9223372036854775808, 9223372036854775807, missing])
@test isequal(
df.hugeint,
[-170141183460469231731687303715884105728, 170141183460469231731687303715884105727, missing]
)
@test isequal(df.uhugeint, [0, 340282366920938463463374607431768211455, missing])
@test isequal(df.utinyint, [0, 255, missing])
@test isequal(df.usmallint, [0, 65535, missing])
@test isequal(df.uint, [0, 4294967295, missing])
@test isequal(df.ubigint, [0, 18446744073709551615, missing])
@test isequal(df.float, [-3.4028235f38, 3.4028235f38, missing])
@test isequal(df.double, [-1.7976931348623157e308, 1.7976931348623157e308, missing])
@test isequal(df.dec_4_1, [-999.9, 999.9, missing])
@test isequal(df.dec_9_4, [-99999.9999, 99999.9999, missing])
@test isequal(df.dec_18_6, [-999999999999.999999, 999999999999.999999, missing])
@test isequal(
df.dec38_10,
[-9999999999999999999999999999.9999999999, 9999999999999999999999999999.9999999999, missing]
)
@test isequal(df.small_enum, ["DUCK_DUCK_ENUM", "GOOSE", missing])
@test isequal(df.medium_enum, ["enum_0", "enum_299", missing])
@test isequal(df.large_enum, ["enum_0", "enum_69999", missing])
@test isequal(df.date, [Dates.Date(-5877641, 6, 25), Dates.Date(5881580, 7, 10), missing])
@test isequal(df.time, [Dates.Time(0, 0, 0), Dates.Time(23, 59, 59, 999, 999), missing])
@test isequal(df.time_tz, [Dates.Time(0, 0, 0), Dates.Time(23, 59, 59, 999, 999), missing])
@test isequal(
df.timestamp,
[Dates.DateTime(-290308, 12, 22, 0, 0, 0), Dates.DateTime(294247, 1, 10, 4, 0, 54, 775), missing]
)
@test isequal(
df.timestamp_tz,
[Dates.DateTime(-290308, 12, 22, 0, 0, 0), Dates.DateTime(294247, 1, 10, 4, 0, 54, 775), missing]
)
@test isequal(
df.timestamp_s,
[Dates.DateTime(-290308, 12, 22, 0, 0, 0), Dates.DateTime(294247, 1, 10, 4, 0, 54, 0), missing]
)
@test isequal(
df.timestamp_ms,
[Dates.DateTime(-290308, 12, 22, 0, 0, 0), Dates.DateTime(294247, 1, 10, 4, 0, 54, 775), missing]
)
@test isequal(
df.timestamp_ns,
[Dates.DateTime(1677, 9, 22, 0, 0, 0, 0), Dates.DateTime(2262, 4, 11, 23, 47, 16, 854), missing]
)
@test isequal(
df.interval,
[
Dates.CompoundPeriod(Dates.Month(0), Dates.Day(0), Dates.Microsecond(0)),
Dates.CompoundPeriod(Dates.Month(999), Dates.Day(999), Dates.Microsecond(999999999)),
missing
]
)
@test isequal(df.varchar, ["🦆🦆🦆🦆🦆🦆", "goo\0se", missing])
@test isequal(
df.blob,
[
UInt8[
0x74,
0x68,
0x69,
0x73,
0x69,
0x73,
0x61,
0x6c,
0x6f,
0x6e,
0x67,
0x62,
0x6c,
0x6f,
0x62,
0x00,
0x77,
0x69,
0x74,
0x68,
0x6e,
0x75,
0x6c,
0x6c,
0x62,
0x79,
0x74,
0x65,
0x73
],
UInt8[0x00, 0x00, 0x00, 0x61],
missing
]
)
@test isequal(df.uuid, [UUID(0), UUID(UInt128(340282366920938463463374607431768211455)), missing])
@test isequal(df.int_array, [[], [42, 999, missing, missing, -42], missing])
@test isequal(df.double_array, [[], [42, NaN, Inf, -Inf, missing, -42], missing])
@test isequal(
df.date_array,
[
[],
[
Dates.Date(1970, 1, 1),
Dates.Date(5881580, 7, 11),
Dates.Date(-5877641, 6, 24),
missing,
Dates.Date(2022, 5, 12)
],
missing
]
)
@test isequal(
df.timestamp_array,
[
[],
[
Dates.DateTime(1970, 1, 1),
Dates.DateTime(294247, 1, 10, 4, 0, 54, 775),
Dates.DateTime(-290308, 12, 21, 19, 59, 5, 225),
missing,
Dates.DateTime(2022, 5, 12, 16, 23, 45)
],
missing
]
)
@test isequal(
df.timestamptz_array,
[
[],
[
Dates.DateTime(1970, 1, 1),
Dates.DateTime(294247, 1, 10, 4, 0, 54, 775),
Dates.DateTime(-290308, 12, 21, 19, 59, 5, 225),
missing,
Dates.DateTime(2022, 05, 12, 23, 23, 45)
],
missing
]
)
@test isequal(df.varchar_array, [[], ["🦆🦆🦆🦆🦆🦆", "goose", missing, ""], missing])
@test isequal(
df.nested_int_array,
[[], [[], [42, 999, missing, missing, -42], missing, [], [42, 999, missing, missing, -42]], missing]
)
@test isequal(df.struct, [(a = missing, b = missing), (a = 42, b = "🦆🦆🦆🦆🦆🦆"), missing])
@test isequal(
df.struct_of_arrays,
[
(a = missing, b = missing),
(a = [42, 999, missing, missing, -42], b = ["🦆🦆🦆🦆🦆🦆", "goose", missing, ""]),
missing
]
)
@test isequal(df.array_of_structs, [[], [(a = missing, b = missing), (a = 42, b = "🦆🦆🦆🦆🦆🦆"), missing], missing])
@test isequal(df.map, [Dict(), Dict("key1" => "🦆🦆🦆🦆🦆🦆", "key2" => "goose"), missing])
end


@@ -0,0 +1,158 @@
@testset "Appender Error" begin
db = DBInterface.connect(DuckDB.DB)
con = DBInterface.connect(db)
@test_throws DuckDB.QueryException DuckDB.Appender(db, "nonexistanttable")
@test_throws DuckDB.QueryException DuckDB.Appender(con, "t")
end
@testset "Appender Usage - Schema $(schema_provided ? "Provided" : "Not Provided")" for schema_provided in (false, true)
db = DBInterface.connect(DuckDB.DB)
table_name = "integers"
if schema_provided
schema_name = "test"
full_table_name = "$(schema_name).$(table_name)"
DBInterface.execute(db, "CREATE SCHEMA $(schema_name)")
else
schema_name = nothing
full_table_name = table_name
end
DBInterface.execute(db, "CREATE TABLE $(full_table_name)(i INTEGER)")
appender = DuckDB.Appender(db, table_name, schema_name)
DuckDB.close(appender)
DuckDB.close(appender)
# close!
appender = DuckDB.Appender(db, table_name, schema_name)
DBInterface.close!(appender)
appender = DuckDB.Appender(db, table_name, schema_name)
for i in 0:9
DuckDB.append(appender, i)
DuckDB.end_row(appender)
end
DuckDB.flush(appender)
DuckDB.close(appender)
results = DBInterface.execute(db, "SELECT * FROM $(full_table_name)")
df = DataFrame(results)
@test names(df) == ["i"]
@test size(df, 1) == 10
@test df.i == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    # closing the appender again is a no-op
DuckDB.close(appender)
end
@testset "Appender API" begin
# Open the database
db = DBInterface.connect(DuckDB.DB)
uuid = Base.UUID("a36a5689-48ec-4104-b147-9fed600d8250")
    # Test data for the appender API test
# - `col_name`: DuckDB column name
# - `duck_type`: DuckDB column type
# - `append_value`: Value to insert via DuckDB.append
# - `ref_value`: (optional) Expected value from querying the DuckDB table. If not provided, uses `append_value`
test_data = [
(; col_name = :bool, duck_type = "BOOLEAN", append_value = true, ref_value = true),
(; col_name = :tint, duck_type = "TINYINT", append_value = -1, ref_value = Int8(-1)),
(; col_name = :sint, duck_type = "SMALLINT", append_value = -2, ref_value = Int16(-2)),
(; col_name = :int, duck_type = "INTEGER", append_value = -3, ref_value = Int32(-3)),
(; col_name = :bint, duck_type = "BIGINT", append_value = -4, ref_value = Int64(-4)),
(; col_name = :hint, duck_type = "HUGEINT", append_value = Int128(-5), ref_value = Int128(-5)),
(; col_name = :utint, duck_type = "UTINYINT", append_value = 1, ref_value = UInt8(1)),
(; col_name = :usint, duck_type = "USMALLINT", append_value = 2, ref_value = UInt16(2)),
(; col_name = :uint, duck_type = "UINTEGER", append_value = 3, ref_value = UInt32(3)),
(; col_name = :ubint, duck_type = "UBIGINT", append_value = 4, ref_value = UInt64(4)),
(; col_name = :uhint, duck_type = "UHUGEINT", append_value = UInt128(5), ref_value = UInt128(5)),
(; col_name = :dec16, duck_type = "DECIMAL(4,2)", append_value = FixedDecimal{Int16, 2}(1.01)),
(; col_name = :dec32, duck_type = "DECIMAL(9,2)", append_value = FixedDecimal{Int32, 2}(1.02)),
(; col_name = :dec64, duck_type = "DECIMAL(18,2)", append_value = FixedDecimal{Int64, 2}(1.03)),
(; col_name = :dec128, duck_type = "DECIMAL(38,2)", append_value = FixedDecimal{Int128, 2}(1.04)),
(; col_name = :float, duck_type = "FLOAT", append_value = 1.0, ref_value = Float32(1.0)),
(; col_name = :double, duck_type = "DOUBLE", append_value = 2.0, ref_value = Float64(2.0)),
(; col_name = :date, duck_type = "DATE", append_value = Dates.Date("1970-04-11")),
(; col_name = :time, duck_type = "TIME", append_value = Dates.Time(0, 0, 0, 0, 200)),
(; col_name = :timestamp, duck_type = "TIMESTAMP", append_value = Dates.DateTime("1970-01-02T01:23:45.678")),
(; col_name = :missingval, duck_type = "INTEGER", append_value = missing),
(; col_name = :nothingval, duck_type = "INTEGER", append_value = nothing, ref_value = missing),
(; col_name = :largeval, duck_type = "INTEGER", append_value = Int32(2^16)),
(; col_name = :uuid, duck_type = "UUID", append_value = uuid),
(; col_name = :varchar, duck_type = "VARCHAR", append_value = "Foo"),
# lists
(; col_name = :list_bool, duck_type = "BOOLEAN[]", append_value = Vector{Bool}([true, false, true])),
(; col_name = :list_int8, duck_type = "TINYINT[]", append_value = Vector{Int8}([1, -2, 3])),
(; col_name = :list_int16, duck_type = "SMALLINT[]", append_value = Vector{Int16}([1, -2, 3])),
(; col_name = :list_int32, duck_type = "INTEGER[]", append_value = Vector{Int32}([1, -2, 3])),
(; col_name = :list_int64, duck_type = "BIGINT[]", append_value = Vector{Int64}([1, -2, 3])),
(;
col_name = :list_int128,
duck_type = "HUGEINT[]",
append_value = Vector{Int128}([Int128(1), Int128(-2), Int128(3)])
),
# (; col_name = :list_uint8, duck_type = "UTINYINT[]", append_value = Vector{UInt8}([1, 2, 3])),
(; col_name = :list_uint16, duck_type = "USMALLINT[]", append_value = Vector{UInt16}([1, 2, 3])),
(; col_name = :list_uint32, duck_type = "UINTEGER[]", append_value = Vector{UInt32}([1, 2, 3])),
(; col_name = :list_uint64, duck_type = "UBIGINT[]", append_value = Vector{UInt64}([1, 2, 3])),
(;
col_name = :list_uint128,
duck_type = "UHUGEINT[]",
append_value = Vector{UInt128}([UInt128(1), UInt128(2), UInt128(3)])
),
(; col_name = :list_float, duck_type = "FLOAT[]", append_value = Vector{Float32}([1.0, 2.0, 3.0])),
(; col_name = :list_double, duck_type = "DOUBLE[]", append_value = Vector{Float64}([1.0, 2.0, 3.0])),
(; col_name = :list_string, duck_type = "VARCHAR[]", append_value = Vector{String}(["a", "bb", "ccc"])),
(;
col_name = :list_date,
duck_type = "DATE[]",
append_value = Vector{Dates.Date}([
Dates.Date("1970-01-01"),
Dates.Date("1970-01-02"),
Dates.Date("1970-01-03")
])
),
(;
col_name = :list_time,
duck_type = "TIME[]",
append_value = Vector{Dates.Time}([Dates.Time(1), Dates.Time(1, 2), Dates.Time(1, 2, 3)])
),
(;
col_name = :list_timestamp,
duck_type = "TIMESTAMP[]",
append_value = Vector{Dates.DateTime}([
Dates.DateTime("1970-01-01T00:00:00"),
Dates.DateTime("1970-01-02T00:00:00"),
Dates.DateTime("1970-01-03T00:00:00")
])
)
]
sql = """CREATE TABLE dtypes(
$(join(("$(row.col_name) $(row.duck_type)" for row in test_data), ",\n"))
)"""
DuckDB.execute(db, sql)
appender = DuckDB.Appender(db, "dtypes")
for row in test_data
DuckDB.append(appender, row.append_value)
end
# End the row of the appender
DuckDB.end_row(appender)
# Destroy the appender and flush the data
DuckDB.flush(appender)
DuckDB.close(appender)
results = DBInterface.execute(db, "select * from dtypes;")
df = DataFrame(results)
for row in test_data
ref_value = get(row, :ref_value, row.append_value)
@test isequal(df[!, row.col_name], [ref_value])
end
# close the database
DBInterface.close!(db)
end


@@ -0,0 +1,181 @@
# test_basic_queries.jl
using Tables: partitions
@testset "Test DBInterface.execute" begin
con = DBInterface.connect(DuckDB.DB)
results = DBInterface.execute(con, "SELECT 42 a")
# iterator
for row in Tables.rows(results)
@test row.a == 42
@test row[1] == 42
end
# convert to DataFrame
df = DataFrame(results)
@test names(df) == ["a"]
@test size(df, 1) == 1
@test df.a == [42]
# do block syntax to automatically close cursor
df = DBInterface.execute(con, "SELECT 42 a") do results
return DataFrame(results)
end
@test names(df) == ["a"]
@test size(df, 1) == 1
@test df.a == [42]
DBInterface.close!(con)
end
@testset "Test numeric data types" begin
con = DBInterface.connect(DuckDB.DB)
results = DBInterface.execute(
con,
"""
SELECT 42::TINYINT a, 42::INT16 b, 42::INT32 c, 42::INT64 d, 42::UINT8 e, 42::UINT16 f, 42::UINT32 g, 42::UINT64 h
UNION ALL
SELECT NULL, NULL, NULL, NULL, NULL, NULL, 43, NULL
"""
)
df = DataFrame(results)
@test size(df, 1) == 2
@test isequal(df.a, [42, missing])
@test isequal(df.b, [42, missing])
@test isequal(df.c, [42, missing])
@test isequal(df.d, [42, missing])
@test isequal(df.e, [42, missing])
@test isequal(df.f, [42, missing])
@test isequal(df.g::Vector{Int}, [42, 43])
@test isequal(df.h, [42, missing])
DBInterface.close!(con)
end
@testset "Test strings" begin
con = DBInterface.connect(DuckDB.DB)
results = DBInterface.execute(
con,
"""
SELECT 'hello world' s
UNION ALL
SELECT NULL
UNION ALL
SELECT 'this is a long string'
UNION ALL
SELECT 'obligatory mühleisen'
UNION ALL
SELECT '🦆🍞🦆'
"""
)
df = DataFrame(results)
@test size(df, 1) == 5
@test isequal(df.s, ["hello world", missing, "this is a long string", "obligatory mühleisen", "🦆🍞🦆"])
for s in ["foo", "🦆DB", SubString("foobar", 1, 3), SubString("🦆ling", 1, 6)]
results = DBInterface.execute(con, "SELECT length(?) as len", [s])
@test only(results).len == 3
end
DBInterface.close!(con)
end
@testset "DBInterface.execute - parser error" begin
con = DBInterface.connect(DuckDB.DB)
# parser error
@test_throws DuckDB.QueryException DBInterface.execute(con, "SELEC")
DBInterface.close!(con)
end
@testset "DBInterface.execute - binder error" begin
con = DBInterface.connect(DuckDB.DB)
# binder error
@test_throws DuckDB.QueryException DBInterface.execute(con, "SELECT * FROM this_table_does_not_exist")
DBInterface.close!(con)
end
@testset "DBInterface.execute - runtime error" begin
con = DBInterface.connect(DuckDB.DB)
res = DBInterface.execute(con, "select current_setting('threads')")
df = DataFrame(res)
print(df)
# run-time error
@test_throws DuckDB.QueryException DBInterface.execute(
con,
"SELECT i::int FROM (SELECT '42' UNION ALL SELECT 'hello') tbl(i)"
)
DBInterface.close!(con)
end
# test a PIVOT query that generates multiple prepared statements and will fail with execute
@testset "Test DBInterface.query" begin
db = DuckDB.DB()
con = DuckDB.connect(db)
DuckDB.execute(con, "CREATE TABLE Cities (Country VARCHAR, Name VARCHAR, Year INT, Population INT);")
DuckDB.execute(con, "INSERT INTO Cities VALUES ('NL', 'Amsterdam', 2000, 1005)")
DuckDB.execute(con, "INSERT INTO Cities VALUES ('NL', 'Amsterdam', 2010, 1065)")
results = DuckDB.query(con, "PIVOT Cities ON Year USING first(Population);")
# iterator
for row in Tables.rows(results)
@test row[:Name] == "Amsterdam"
@test row[4] == 1065
end
# convert to DataFrame
df = DataFrame(results)
@test names(df) == ["Country", "Name", "2000", "2010"]
@test size(df, 1) == 1
@test df[1, :Country] == "NL"
@test df[1, :Name] == "Amsterdam"
@test df[1, "2000"] == 1005
@test df[1, 4] == 1065
@test DataFrame(DuckDB.query(db, "select 'a'; select 2;"))[1, 1] == "a"
DBInterface.close!(con)
end
@testset "Test chunked response" begin
con = DBInterface.connect(DuckDB.DB)
DBInterface.execute(con, "CREATE TABLE chunked_table AS SELECT * FROM range(2049)")
result = DBInterface.execute(con, "SELECT * FROM chunked_table;")
chunks_it = partitions(result)
chunks = collect(chunks_it)
@test length(chunks) == 2
@test_throws DuckDB.NotImplementedException collect(chunks_it)
result = DBInterface.execute(con, "SELECT * FROM chunked_table;", DuckDB.StreamResult)
chunks_it = partitions(result)
chunks = collect(chunks_it)
@test length(chunks) == 2
@test_throws DuckDB.NotImplementedException collect(chunks_it)
DuckDB.execute(
con,
"""
CREATE TABLE large (x1 INT, x2 INT, x3 INT, x4 INT, x5 INT, x6 INT, x7 INT, x8 INT, x9 INT, x10 INT, x11 INT);
"""
)
DuckDB.execute(con, "INSERT INTO large VALUES (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1);")
result = DBInterface.execute(con, "SELECT * FROM large ;")
chunks_it = partitions(result)
chunks = collect(chunks_it)
@test length(chunks) == 1
DBInterface.close!(con)
end

@testset "Test big list" begin
con = DBInterface.connect(DuckDB.DB)
DBInterface.execute(con, "CREATE TABLE list_table (int_list INT[]);")
DBInterface.execute(con, "INSERT INTO list_table VALUES (range(2049));")
df = DataFrame(DBInterface.execute(con, "SELECT * FROM list_table;"))
@test length(df[1, :int_list]) == 2049
DBInterface.close!(con)
end
@testset "Test big bitstring" begin
con = DBInterface.connect(DuckDB.DB)
DBInterface.execute(con, "CREATE TABLE bit_table (bits BIT);")
# 131073 = 64 * 2048 + 1
DBInterface.execute(con, "INSERT INTO bit_table VALUES (bitstring('1010', 131073));")
df = DataFrame(DBInterface.execute(con, "SELECT * FROM bit_table;"))
# Currently mapped to Julia in an odd way.
# Can reenable following https://github.com/duckdb/duckdb/issues/7065
@test length(df[1, :bits]) == 131073 skip = true
DBInterface.close!(con)
end
@testset "Test big string" begin
con = DBInterface.connect(DuckDB.DB)
DBInterface.execute(con, "CREATE TABLE str_table (str VARCHAR);")
DBInterface.execute(con, "INSERT INTO str_table VALUES (repeat('🦆', 1024) || '🪿');")
df = DataFrame(DBInterface.execute(con, "SELECT * FROM str_table;"))
@test length(df[1, :str]) == 1025
DBInterface.close!(con)
end
@testset "Test big map" begin
con = DBInterface.connect(DuckDB.DB)
DBInterface.execute(con, "CREATE TABLE map_table (map MAP(VARCHAR, INT));")
DBInterface.execute(
con,
"INSERT INTO map_table VALUES (map_from_entries([{'k': 'billy' || num, 'v': num} for num in range(2049)]));"
)
df = DataFrame(DBInterface.execute(con, "SELECT * FROM map_table;"))
@test length(df[1, :map]) == 2049
DBInterface.close!(con)
end
@testset "Test big struct" begin
con = DBInterface.connect(DuckDB.DB)
DBInterface.execute(con, "CREATE TABLE struct_table (stct STRUCT(a INT[], b INT[]));")
DBInterface.execute(con, "INSERT INTO struct_table VALUES ({'a': range(1024), 'b': range(1025)});")
df = DataFrame(DBInterface.execute(con, "SELECT * FROM struct_table;"))
s = df[1, :stct]
@test length(s.a) == 1024
@test length(s.b) == 1025
DBInterface.close!(con)
end
@testset "Test big union" begin
con = DBInterface.connect(DuckDB.DB)
DBInterface.execute(con, "CREATE TABLE union_table (uni UNION(a INT[], b INT));")
DBInterface.execute(con, "INSERT INTO union_table (uni) VALUES (union_value(a := range(2049))), (42);")
df = DataFrame(DBInterface.execute(con, "SELECT * FROM union_table;"))
@test length(df[1, :uni]) == 2049
DBInterface.close!(con)
end

@testset "C API Type Checks" begin
# Check struct sizes.
# A timestamp struct size mismatch would mean the structs are stored as pointers, which happens if they are declared as mutable structs.
@test sizeof(DuckDB.duckdb_timestamp_struct) ==
sizeof(DuckDB.duckdb_date_struct) + sizeof(DuckDB.duckdb_time_struct)
# Both structs are equivalent and are actually stored as a union type in C.
@test sizeof(DuckDB.duckdb_string_t) == sizeof(DuckDB.duckdb_string_t_ptr)
end

# test_config.jl
@testset "Test configuration parameters" begin
# by default NULLs come last
con = DBInterface.connect(DuckDB.DB, ":memory:")
results = DBInterface.execute(con, "SELECT 42 a UNION ALL SELECT NULL ORDER BY a")
tbl = rowtable(results)
@test isequal(tbl, [(a = 42,), (a = missing,)])
DBInterface.close!(con)
# if we add this configuration flag, nulls should come first
config = DuckDB.Config()
DuckDB.set_config(config, "default_null_order", "nulls_first")
con = DBInterface.connect(DuckDB.DB, ":memory:", config)
# NULL should come first now
results = DBInterface.execute(con, "SELECT 42 a UNION ALL SELECT NULL ORDER BY a")
df = DataFrame(results)
@test names(df) == ["a"]
@test size(df, 1) == 2
@test isequal(df.a, [missing, 42])
DBInterface.close!(con)
DuckDB.set_config(config, "unrecognized option", "aaa")
@test_throws DuckDB.ConnectionException con = DBInterface.connect(DuckDB.DB, ":memory:", config)
# verify that double-closing the config does not cause any problems
DBInterface.close!(config)
DBInterface.close!(config)
# test different ways to create a config object, all should be equivalent
conf1 = DuckDB.Config()
DuckDB.set_config(conf1, "default_null_order", "nulls_first")
conf2 = DuckDB.Config()
conf2["default_null_order"] = "nulls_first"
conf3 = DuckDB.Config(default_null_order = "nulls_first")
conf4 = DuckDB.Config(["default_null_order" => "nulls_first"])
@testset for config in [conf1, conf2, conf3, conf4]
con = DBInterface.connect(DuckDB.DB, ":memory:", config)
# NULL should come first now
results = DBInterface.execute(con, "SELECT 42 a UNION ALL SELECT NULL ORDER BY a")
tbl = rowtable(results)
@test isequal(tbl, [(a = missing,), (a = 42,)])
DBInterface.close!(con)
DuckDB.set_config(config, "unrecognized option", "aaa")
@test_throws DuckDB.ConnectionException con = DBInterface.connect(DuckDB.DB, ":memory:", config)
# verify that double-closing the config does not cause any problems
DBInterface.close!(config)
DBInterface.close!(config)
end
# config options can be specified directly in the call
con = DBInterface.connect(DuckDB.DB, ":memory:"; config = ["default_null_order" => "nulls_first"])
tbl = DBInterface.execute(con, "SELECT 42 a UNION ALL SELECT NULL ORDER BY a") |> rowtable
@test isequal(tbl, [(a = missing,), (a = 42,)])
close(con)
con = DBInterface.connect(DuckDB.DB, ":memory:"; config = (; default_null_order = "nulls_first"))
tbl = DBInterface.execute(con, "SELECT 42 a UNION ALL SELECT NULL ORDER BY a") |> rowtable
@test isequal(tbl, [(a = missing,), (a = 42,)])
close(con)
# special handling of the readonly option
file = tempname()
con = DBInterface.connect(DuckDB.DB, file)
DBInterface.execute(con, "CREATE TABLE t1(a INTEGER)")
close(con)
con = DBInterface.connect(DuckDB.DB, file; readonly = true)
@test_throws DuckDB.QueryException DBInterface.execute(con, "CREATE TABLE t2(a INTEGER)")
close(con)
end
@testset "Test Set TimeZone" begin
con = DBInterface.connect(DuckDB.DB, ":memory:")
DBInterface.execute(con, "SET TimeZone='UTC'")
results = DBInterface.execute(con, "SELECT CURRENT_SETTING('TimeZone') AS tz")
df = DataFrame(results)
@test isequal(df[1, "tz"], "UTC")
DBInterface.execute(con, "SET TimeZone='America/Los_Angeles'")
results = DBInterface.execute(con, "SELECT CURRENT_SETTING('TimeZone') AS tz")
df = DataFrame(results)
@test isequal(df[1, "tz"], "America/Los_Angeles")
DBInterface.close!(con)
end

# test_connection.jl
@testset "Test opening and closing an in-memory database" begin
con = DBInterface.connect(DuckDB.DB, ":memory:")
DBInterface.close!(con)
# verify that double-closing does not cause any problems
DBInterface.close!(con)
DBInterface.close!(con)
@test 1 == 1 # reaching this point without an error means double-closing is safe
con = DBInterface.connect(DuckDB.DB, ":memory:")
@test isopen(con)
close(con)
@test !isopen(con)
end
@testset "Test opening a bogus directory" begin
@test_throws DuckDB.ConnectionException DBInterface.connect(DuckDB.DB, "/path/to/bogus/directory")
end
@testset "Test opening and closing an on-disk database" begin
# This checks for an issue where the DB and the connection are
# closed but the actual db is not (and subsequently cannot be opened
# in a different process). To check this, we create a DB, write some
# data to it, close the connection and check if the WAL file exists.
#
# Ideally, the WAL file should not exist, but Julia's garbage collection
# may not have run yet, so open database handles may still exist, preventing
# the database from being closed properly.
db_path = joinpath(mktempdir(), "duckdata.db")
db_path_wal = db_path * ".wal"
function write_data(dbfile::String)
db = DuckDB.DB(dbfile)
conn = DBInterface.connect(db)
DBInterface.execute(conn, "CREATE OR REPLACE TABLE test (a INTEGER, b INTEGER);")
DBInterface.execute(conn, "INSERT INTO test VALUES (1, 2);")
DBInterface.close!(conn)
DuckDB.close_database(db)
return true
end
write_data(db_path) # call the function
@test isfile(db_path_wal) === false # WAL file should not exist
@test isfile(db_path) # check if the database file exists
# check if the database can be opened
if haskey(ENV, "JULIA_DUCKDB_LIBRARY")
duckdb_binary = joinpath(dirname(ENV["JULIA_DUCKDB_LIBRARY"]), "..", "duckdb")
result = run(`$duckdb_binary $db_path -c "SELECT * FROM test LIMIT 1"`) # check if the database can be opened
@test success(result)
end
end

# test_decimals.jl
@testset "Test decimal support" begin
con = DBInterface.connect(DuckDB.DB)
results = DBInterface.execute(
con,
"SELECT 42.3::DECIMAL(4,1) a, 4923.3::DECIMAL(9,1) b, 421.423::DECIMAL(18,3) c, 129481294.3392::DECIMAL(38,4) d"
)
# convert to DataFrame
df = DataFrame(results)
@test names(df) == ["a", "b", "c", "d"]
@test size(df, 1) == 1
@test df.a == [42.3]
@test df.b == [4923.3]
@test df.c == [421.423]
@test df.d == [129481294.3392]
DBInterface.close!(con)
end
# test returning decimals in a table function
function my_bind_function(info::DuckDB.BindInfo)
DuckDB.add_result_column(info, "a", FixedDecimal{Int16, 0})
DuckDB.add_result_column(info, "b", FixedDecimal{Int32, 1})
DuckDB.add_result_column(info, "c", FixedDecimal{Int64, 2})
DuckDB.add_result_column(info, "d", FixedDecimal{Int128, 3})
return missing
end
mutable struct MyInitStruct
pos::Int64
function MyInitStruct()
return new(0)
end
end
function my_init_function(info::DuckDB.InitInfo)
return MyInitStruct()
end
function my_main_function(info::DuckDB.FunctionInfo, output::DuckDB.DataChunk)
init_info = DuckDB.get_init_info(info, MyInitStruct)
a_array = DuckDB.get_array(output, 1, Int16)
b_array = DuckDB.get_array(output, 2, Int32)
c_array = DuckDB.get_array(output, 3, Int64)
d_array = DuckDB.get_array(output, 4, Int128)
count = 0
multiplier = 1
for i in 1:(DuckDB.VECTOR_SIZE)
if init_info.pos >= 3
break
end
a_array[count + 1] = 42 * multiplier
b_array[count + 1] = 42 * multiplier
c_array[count + 1] = 42 * multiplier
d_array[count + 1] = 42 * multiplier
count += 1
init_info.pos += 1
multiplier *= 10
end
DuckDB.set_size(output, count)
return
end
@testset "Test returning decimals from a table functions" begin
con = DBInterface.connect(DuckDB.DB)
arguments::Vector{DataType} = Vector()
DuckDB.create_table_function(con, "my_function", arguments, my_bind_function, my_init_function, my_main_function)
GC.gc()
# 3 elements
results = DBInterface.execute(con, "SELECT * FROM my_function()")
GC.gc()
df = DataFrame(results)
@test names(df) == ["a", "b", "c", "d"]
@test size(df, 1) == 3
@test df.a == [42, 420, 4200]
@test df.b == [4.2, 42, 420]
@test df.c == [0.42, 4.2, 42]
@test df.d == [0.042, 0.42, 4.2]
end

# test_old_interface.jl
@testset "DB Connection" begin
db = DuckDB.open(":memory:")
con = DuckDB.connect(db)
@test isa(con, DuckDB.Connection)
DuckDB.disconnect(con)
DuckDB.close(db)
end
@testset "Test append DataFrame" begin
# Open the database
db = DuckDB.open(":memory:")
con = DuckDB.connect(db)
# Create the table the data is appended to
DuckDB.execute(
con,
"CREATE TABLE dtypes(bool BOOLEAN, tint TINYINT, sint SMALLINT, int INTEGER, bint BIGINT, utint UTINYINT, usint USMALLINT, uint UINTEGER, ubint UBIGINT, float FLOAT, double DOUBLE, date DATE, time TIME, vchar VARCHAR, nullval INTEGER)"
)
# Create test DataFrame
input_df = DataFrame(
bool = [true, false],
tint = Int8.(1:2),
sint = Int16.(1:2),
int = Int32.(1:2),
bint = Int64.(1:2),
utint = UInt8.(1:2),
usint = UInt16.(1:2),
uint = UInt32.(1:2),
ubint = UInt64.(1:2),
float = Float32.(1:2),
double = Float64.(1:2),
date = [Dates.Date("1970-04-11"), Dates.Date("1970-04-12")],
time = [Dates.Time(0, 0, 0, 100, 0), Dates.Time(0, 0, 0, 200, 0)],
vchar = ["Foo", "Bar"],
nullval = [missing, Int32(2)]
)
# append the DataFrame to the table
DuckDB.appendDataFrame(input_df, con, "dtypes")
# Output the data from the table
output_df = DataFrame(DuckDB.toDataFrame(con, "select * from dtypes;"))
# Compare each column of the input and output dataframe with each other
for (col_pos, input_col) in enumerate(eachcol(input_df))
@test isequal(input_col, output_df[:, col_pos])
end
# Disconnect and close the database
DuckDB.disconnect(con)
DuckDB.close(db)
end
@testset "Test README" begin
db = DuckDB.open(":memory:")
con = DuckDB.connect(db)
res = DuckDB.execute(con, "CREATE TABLE integers(date DATE, jcol INTEGER)")
res = DuckDB.execute(con, "INSERT INTO integers VALUES ('2021-09-27', 4), ('2021-09-28', 6), ('2021-09-29', 8)")
res = DuckDB.execute(con, "SELECT * FROM integers")
df = DataFrame(DuckDB.toDataFrame(res))
@test isa(df, DataFrame)
df = DataFrame(DuckDB.toDataFrame(con, "SELECT * FROM integers"))
@test isa(df, DataFrame)
DuckDB.appendDataFrame(df, con, "integers")
DuckDB.disconnect(con)
DuckDB.close(db)
end
@testset "HUGE Int test" begin
db = DuckDB.open(":memory:")
con = DuckDB.connect(db)
res = DuckDB.execute(con, "CREATE TABLE huge(id INTEGER,data HUGEINT);")
res = DuckDB.execute(con, "INSERT INTO huge VALUES (1,NULL), (2, 1761718171), (3, 171661889178);")
res = DuckDB.toDataFrame(con, "SELECT * FROM huge")
DuckDB.disconnect(con)
DuckDB.close(db)
end
@testset "Interval type" begin
db = DuckDB.open(":memory:")
con = DuckDB.connect(db)
res = DuckDB.execute(con, "CREATE TABLE interval(interval INTERVAL);")
res = DuckDB.execute(
con,
"""
INSERT INTO interval VALUES
(INTERVAL 5 HOUR),
(INTERVAL 12 MONTH),
(INTERVAL 12 MICROSECOND),
(INTERVAL 1 YEAR);
"""
)
res = DataFrame(DuckDB.toDataFrame(con, "SELECT * FROM interval;"))
@test isa(res, DataFrame)
DuckDB.disconnect(con)
DuckDB.close(db)
end
@testset "Timestamp" begin
db = DuckDB.open(":memory:")
con = DuckDB.connect(db)
# insert without timezone, display as UTC
res = DuckDB.execute(con, "CREATE TABLE timestamp(timestamp TIMESTAMP , data INTEGER);")
res = DuckDB.execute(
con,
"INSERT INTO timestamp VALUES ('2021-09-27 11:30:00.000', 4), ('2021-09-28 12:30:00.000', 6), ('2021-09-29 13:30:00.000', 8);"
)
res = DuckDB.execute(con, "SELECT * FROM timestamp WHERE timestamp='2021-09-27T11:30:00Z';")
df = DataFrame(res)
@test isequal(df[1, "timestamp"], DateTime(2021, 9, 27, 11, 30, 0))
# insert with timezone, display as UTC
res = DuckDB.execute(con, "CREATE TABLE timestamp1(timestamp TIMESTAMP , data INTEGER);")
res = DuckDB.execute(
con,
"INSERT INTO timestamp1 VALUES ('2021-09-27T10:30:00.000', 4), ('2021-09-28T11:30:00.000', 6), ('2021-09-29T12:30:00.000', 8);"
)
res = DuckDB.execute(con, "SELECT * FROM timestamp1 WHERE timestamp=?;", [DateTime(2021, 9, 27, 10, 30, 0)])
df = DataFrame(res)
@test isequal(df[1, "timestamp"], DateTime(2021, 9, 27, 10, 30, 0))
# query with local datetime, display as UTC
res = DuckDB.execute(con, "SELECT * FROM timestamp1 WHERE timestamp='2021-09-27T10:30:00.000';")
df = DataFrame(res)
@test isequal(df[1, "timestamp"], DateTime(2021, 9, 27, 10, 30, 0))
DuckDB.disconnect(con)
DuckDB.close(db)
end
@testset "TimestampTZ" begin
db = DuckDB.open(":memory:")
con = DuckDB.connect(db)
DuckDB.execute(con, "SET TimeZone='Asia/Shanghai'") # UTC+8
res = DuckDB.execute(con, "SELECT TIMESTAMPTZ '2021-09-27 11:30:00' tz, TIMESTAMP '2021-09-27 11:30:00' ts;")
df = DataFrame(res)
@test isequal(df[1, "tz"], DateTime(2021, 9, 27, 3, 30, 0))
@test isequal(df[1, "ts"], DateTime(2021, 9, 27, 11, 30, 0))
res = DuckDB.execute(con, "CREATE TABLE timestamptz(timestamp TIMESTAMPTZ , data INTEGER);")
res = DuckDB.execute(
con,
"INSERT INTO timestamptz VALUES ('2021-09-27 11:30:00.000', 4), ('2021-09-28 12:30:00.000', 6), ('2021-09-29 13:30:00.000', 8);"
)
res = DuckDB.execute(con, "SELECT * FROM timestamptz WHERE timestamp='2021-09-27 11:30:00'")
df = DataFrame(res)
@test isequal(df[1, "data"], 4)
@test isequal(df[1, "timestamp"], DateTime(2021, 9, 27, 3, 30, 0))
res = DuckDB.execute(con, "SELECT * FROM timestamptz WHERE timestamp='2021-09-27T03:30:00Z'")
df = DataFrame(res)
@test isequal(df[1, "data"], 4)
@test isequal(df[1, "timestamp"], DateTime(2021, 9, 27, 3, 30, 0))
res = DuckDB.execute(con, "SELECT * FROM timestamptz WHERE timestamp='2021-09-27T12:30:00+09'")
df = DataFrame(res)
@test isequal(df[1, "data"], 4)
@test isequal(df[1, "timestamp"], DateTime(2021, 9, 27, 3, 30, 0))
DuckDB.disconnect(con)
DuckDB.close(db)
end
@testset "Items table" begin
db = DuckDB.open(":memory:")
con = DuckDB.connect(db)
res = DuckDB.execute(con, "CREATE TABLE items(item VARCHAR, value DECIMAL(10,2), count INTEGER);")
res = DuckDB.execute(con, "INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2);")
res = DataFrame(DuckDB.toDataFrame(con, "SELECT * FROM items;"))
@test isa(res, DataFrame)
DuckDB.disconnect(con)
end
@testset "Integers and dates table" begin
db = DuckDB.DB()
res = DBInterface.execute(db, "CREATE TABLE integers(date DATE, data INTEGER);")
res =
DBInterface.execute(db, "INSERT INTO integers VALUES ('2021-09-27', 4), ('2021-09-28', 6), ('2021-09-29', 8);")
res = DBInterface.execute(db, "SELECT * FROM integers;")
res = DataFrame(DuckDB.toDataFrame(res))
@test res.date == [Date(2021, 9, 27), Date(2021, 9, 28), Date(2021, 9, 29)]
@test isa(res, DataFrame)
DBInterface.close!(db)
end

# test_prepare.jl
@testset "Test DBInterface.prepare" begin
con = DBInterface.connect(DuckDB.DB)
DBInterface.execute(con, "CREATE TABLE test_table(i INTEGER, j DOUBLE)")
stmt = DBInterface.prepare(con, "INSERT INTO test_table VALUES(?, ?)")
DBInterface.execute(stmt, [1, 3.5])
DBInterface.execute(stmt, [missing, nothing])
DBInterface.execute(stmt, [2, 0.5])
results = DBInterface.execute(con, "SELECT * FROM test_table")
df = DataFrame(results)
@test isequal(df.i, [1, missing, 2])
@test isequal(df.j, [3.5, missing, 0.5])
# execute many
DBInterface.executemany(stmt, (col1 = [1, 2, 3, 4, 5], col2 = [1, 2, 4, 8, -0.5]))
results = DBInterface.execute(con, "SELECT * FROM test_table")
df = DataFrame(results)
@test isequal(df.i, [1, missing, 2, 1, 2, 3, 4, 5])
@test isequal(df.j, [3.5, missing, 0.5, 1, 2, 4, 8, -0.5])
# can bind vectors to parameters
stmt = DBInterface.prepare(con, "FROM test_table WHERE i IN ?;")
results = DBInterface.execute(stmt, ([1, 2],))
df = DataFrame(results)
@test all(df.i .∈ Ref([1, 2]))
# verify that double-closing does not cause any problems
DBInterface.close!(stmt)
DBInterface.close!(stmt)
DBInterface.close!(con)
DBInterface.close!(con)
end
@testset "Test DBInterface.prepare with various types" begin
con = DBInterface.connect(DuckDB.DB)
type_names = [
"BOOLEAN",
"TINYINT",
"SMALLINT",
"INTEGER",
"BIGINT",
"UTINYINT",
"USMALLINT",
"UINTEGER",
"UBIGINT",
"FLOAT",
"DOUBLE",
"DATE",
"TIME",
"TIMESTAMP",
"VARCHAR",
"INTEGER",
"BLOB"
]
type_values = [
Bool(true),
Int8(3),
Int16(4),
Int32(8),
Int64(20),
UInt8(42),
UInt16(300),
UInt32(420421),
UInt64(43294832),
Float32(0.5),
Float64(0.25),
Date(1992, 9, 20),
Time(23, 10, 33),
DateTime(1992, 9, 20, 23, 10, 33),
String("hello world"),
missing,
rand(UInt8, 100)
]
for i in 1:size(type_values, 1)
stmt = DBInterface.prepare(con, string("SELECT ?::", type_names[i], " a"))
result = DataFrame(DBInterface.execute(stmt, [type_values[i]]))
@test isequal(result.a, [type_values[i]])
end
end
@testset "DBInterface.prepare: named parameters not supported yet" begin
con = DBInterface.connect(DuckDB.DB)
DBInterface.execute(con, "CREATE TABLE test_table(i INTEGER, j DOUBLE)")
@test_throws DuckDB.QueryException DBInterface.prepare(con, "INSERT INTO test_table VALUES(:col1, :col2)")
DBInterface.close!(con)
end
@testset "prepare: Named parameters" begin
con = DBInterface.connect(DuckDB.DB)
DBInterface.execute(con, "CREATE TABLE test_table(i INTEGER, j DOUBLE)")
# Check named syntax with Kwargs and Dict
stmt = DBInterface.prepare(con, raw"INSERT INTO test_table VALUES($col1, $col2)")
DBInterface.execute(stmt, Dict(["col1" => 1, "col2" => 3.5]))
DBInterface.execute(stmt; col1 = 2, col2 = 4.5)
results = DBInterface.execute(con, "SELECT * FROM test_table") |> DataFrame
@test isequal(results.i, [1, 2])
@test isequal(results.j, [3.5, 4.5])
# Check positional syntax
DBInterface.execute(con, "TRUNCATE TABLE test_table")
stmt = DBInterface.prepare(con, raw"INSERT INTO test_table VALUES($2, $1)")
DBInterface.execute(stmt, (3.5, 1))
DBInterface.execute(stmt, (4.5, 2))
results = DBInterface.execute(con, "SELECT * FROM test_table") |> DataFrame
@test isequal(results.i, [1, 2])
@test isequal(results.j, [3.5, 4.5])
DBInterface.close!(con)
end
@testset "DBInterface.prepare: execute many" begin
con = DBInterface.connect(DuckDB.DB)
DBInterface.execute(con, "CREATE TABLE test_table(i INTEGER, j DOUBLE)")
@test_throws DuckDB.QueryException DBInterface.prepare(con, "INSERT INTO test_table VALUES(:col1, :col2)")
stmt = DBInterface.prepare(con, raw"INSERT INTO test_table VALUES($col1, $col2)")
col1 = [1, 2, 3, 4, 5]
col2 = [1, 2, 4, 8, -0.5]
DBInterface.executemany(stmt, (col1 = col1, col2 = col2))
results = DBInterface.execute(con, "SELECT * FROM test_table") |> DataFrame
@test isequal(results.i, col1)
@test isequal(results.j, col2)
DBInterface.close!(con)
end
@testset "DBInterface.prepare: ambiguous parameters" begin
con = DBInterface.connect(DuckDB.DB)
stmt = DBInterface.prepare(con, "SELECT ? AS a")
result = DataFrame(DBInterface.execute(stmt, [42]))
@test isequal(result.a, [42])
result = DataFrame(DBInterface.execute(stmt, ["hello world"]))
@test isequal(result.a, ["hello world"])
result = DataFrame(DBInterface.execute(stmt, [DateTime(1992, 9, 20, 23, 10, 33)]))
@test isequal(result.a, [DateTime(1992, 9, 20, 23, 10, 33)])
end

# test_replacement_scan.jl
function RangeReplacementScan(info)
table_name = DuckDB.get_table_name(info)
number = tryparse(Int64, table_name)
if number === nothing
return
end
DuckDB.set_function_name(info, "range")
DuckDB.add_function_parameter(info, DuckDB.create_value(number))
return
end
@testset "Test replacement scans" begin
con = DBInterface.connect(DuckDB.DB)
# add a replacement scan that turns any number provided as a table name into range(X)
DuckDB.add_replacement_scan!(con, RangeReplacementScan, nothing)
df = DataFrame(DBInterface.execute(con, "SELECT * FROM \"2\" tbl(a)"))
@test df.a == [0, 1]
# this still fails
@test_throws DuckDB.QueryException DBInterface.execute(con, "SELECT * FROM nonexistent")
DBInterface.close!(con)
end
function RepeatReplacementScan(info)
table_name = DuckDB.get_table_name(info)
splits = split(table_name, "*")
if size(splits, 1) != 2
return
end
number = tryparse(Int64, splits[2])
if number === nothing
return
end
DuckDB.set_function_name(info, "repeat")
DuckDB.add_function_parameter(info, DuckDB.create_value(splits[1]))
DuckDB.add_function_parameter(info, DuckDB.create_value(number))
return
end
@testset "Test string replacement scans" begin
con = DBInterface.connect(DuckDB.DB)
# add a replacement scan that turns any table name of the form "str*N" into repeat(str, N)
DuckDB.add_replacement_scan!(con, RepeatReplacementScan, nothing)
df = DataFrame(DBInterface.execute(con, "SELECT * FROM \"hello*2\" tbl(a)"))
@test df.a == ["hello", "hello"]
# this still fails
@test_throws DuckDB.QueryException DBInterface.execute(con, "SELECT * FROM nonexistent")
DBInterface.close!(con)
end
function ErrorReplacementScan(info)
throw("replacement scan eek")
end
@testset "Test error replacement scans" begin
con = DBInterface.connect(DuckDB.DB)
DuckDB.add_replacement_scan!(con, ErrorReplacementScan, nothing)
@test_throws DuckDB.QueryException DBInterface.execute(con, "SELECT * FROM nonexistent")
DBInterface.close!(con)
end

# test_scalar_udf.jl
# Define a simple scalar UDF that doubles the input value
function my_double_function(
info::DuckDB.duckdb_function_info,
input::DuckDB.duckdb_data_chunk,
output::DuckDB.duckdb_vector
)
# Convert input data chunk to DataChunk object
input_chunk = DuckDB.DataChunk(input, false)
n = DuckDB.get_size(input_chunk)
# Get input vector (assuming one input parameter)
input_vector = DuckDB.get_vector(input_chunk, 1)
input_array = DuckDB.get_array(input_vector, Int64, n)
# Get output vector
output_array = DuckDB.get_array(DuckDB.Vec(output), Int64, n)
# Perform the operation: double each input value
for i in 1:n
output_array[i] = input_array[i] * 2
end
end
# Define a scalar UDF that returns NULL for odd numbers and the number itself for even numbers
function my_null_function(
info::DuckDB.duckdb_function_info,
input::DuckDB.duckdb_data_chunk,
output::DuckDB.duckdb_vector
)
# Convert input data chunk to DataChunk object
input_chunk = DuckDB.DataChunk(input, false)
n = DuckDB.get_size(input_chunk)
# Get input vector
input_vector = DuckDB.get_vector(input_chunk, 1)
input_array = DuckDB.get_array(input_vector, Int64, n)
validity_input = DuckDB.get_validity(input_vector)
# Get output vector
output_vector = DuckDB.Vec(output)
output_array = DuckDB.get_array(output_vector, Int64, n)
validity_output = DuckDB.get_validity(output_vector)
# Perform the operation
for i in 1:n
if DuckDB.isvalid(validity_input, i)
if input_array[i] % 2 == 0
output_array[i] = input_array[i]
# Validity is true by default, no need to set
else
# Set output as NULL
DuckDB.setinvalid(validity_output, i)
end
else
# Input is NULL, set output as NULL
DuckDB.setinvalid(validity_output, i)
end
end
end
# Define a scalar UDF that always throws an error
function my_error_function(
info::DuckDB.duckdb_function_info,
input::DuckDB.duckdb_data_chunk,
output::DuckDB.duckdb_vector
)
throw(ErrorException("Runtime error in scalar function"))
end
function my_string_function_count_a(
info::DuckDB.duckdb_function_info,
input::DuckDB.duckdb_data_chunk,
output::DuckDB.duckdb_vector
)
input_chunk = DuckDB.DataChunk(input, false)
output_vec = DuckDB.Vec(output)
n = DuckDB.get_size(input_chunk)
chunks = [input_chunk]
extra_info_ptr = DuckDB.duckdb_scalar_function_get_extra_info(info)
extra_info::DuckDB.ScalarFunction = unsafe_pointer_to_objref(extra_info_ptr)
conversion_data = DuckDB.ColumnConversionData(chunks, 1, extra_info.logical_parameters[1], nothing)
a_data_converted = DuckDB.convert_column(conversion_data)
output_data = DuckDB.get_array(DuckDB.Vec(output), Int, n)
for row in 1:n
result = count(x -> x == 'a', a_data_converted[row])
output_data[row] = result
end
return nothing
end
function my_string_function_reverse_concat(
info::DuckDB.duckdb_function_info,
input::DuckDB.duckdb_data_chunk,
output::DuckDB.duckdb_vector
)
input_chunk = DuckDB.DataChunk(input, false)
output_vec = DuckDB.Vec(output)
n = Int64(DuckDB.get_size(input_chunk))
chunks = [input_chunk]
extra_info_ptr = DuckDB.duckdb_scalar_function_get_extra_info(info)
extra_info::DuckDB.ScalarFunction = unsafe_pointer_to_objref(extra_info_ptr)
conversion_data_a = DuckDB.ColumnConversionData(chunks, 1, extra_info.logical_parameters[1], nothing)
conversion_data_b = DuckDB.ColumnConversionData(chunks, 2, extra_info.logical_parameters[2], nothing)
a_data_converted = DuckDB.convert_column(conversion_data_a)
b_data_converted = DuckDB.convert_column(conversion_data_b)
for row in 1:n
result = string(reverse(a_data_converted[row]), b_data_converted[row])
DuckDB.assign_string_element(output_vec, row, result)
end
return nothing
end
@testset "Test custom scalar functions" begin
# Connect to DuckDB
db = DuckDB.DB()
con = DuckDB.connect(db)
# Create the test table
DuckDB.query(con, "CREATE TABLE test_table AS SELECT i FROM range(10) t(i)")
# Define logical type BIGINT
type_bigint = DuckDB.duckdb_create_logical_type(DuckDB.DUCKDB_TYPE_BIGINT)
# Test 1: Double Function
# Create the scalar function
f_double = DuckDB.duckdb_create_scalar_function()
DuckDB.duckdb_scalar_function_set_name(f_double, "double_value")
# Set parameter types
DuckDB.duckdb_scalar_function_add_parameter(f_double, type_bigint)
# Set return type
DuckDB.duckdb_scalar_function_set_return_type(f_double, type_bigint)
# Set the function
CMyDoubleFunction = @cfunction(
my_double_function,
Cvoid,
(DuckDB.duckdb_function_info, DuckDB.duckdb_data_chunk, DuckDB.duckdb_vector)
)
DuckDB.duckdb_scalar_function_set_function(f_double, CMyDoubleFunction)
# Register the function
res = DuckDB.duckdb_register_scalar_function(con.handle, f_double)
@test res == DuckDB.DuckDBSuccess
# Execute the function in a query
results = DuckDB.query(con, "SELECT i, double_value(i) as doubled FROM test_table")
df = DataFrame(results)
@test names(df) == ["i", "doubled"]
@test size(df, 1) == 10
@test df.doubled == df.i .* 2
# Test 2: Null Function
# Create the scalar function
f_null = DuckDB.duckdb_create_scalar_function()
DuckDB.duckdb_scalar_function_set_name(f_null, "null_if_odd")
# Set parameter types
DuckDB.duckdb_scalar_function_add_parameter(f_null, type_bigint)
# Set return type
DuckDB.duckdb_scalar_function_set_return_type(f_null, type_bigint)
# Set the function
CMyNullFunction = @cfunction(
my_null_function,
Cvoid,
(DuckDB.duckdb_function_info, DuckDB.duckdb_data_chunk, DuckDB.duckdb_vector)
)
DuckDB.duckdb_scalar_function_set_function(f_null, CMyNullFunction)
# Register the function
res_null = DuckDB.duckdb_register_scalar_function(con.handle, f_null)
@test res_null == DuckDB.DuckDBSuccess
# Execute the function in a query
results_null = DuckDB.query(con, "SELECT i, null_if_odd(i) as value_or_null FROM test_table")
df_null = DataFrame(results_null)
@test names(df_null) == ["i", "value_or_null"]
@test size(df_null, 1) == 10
expected_values = Vector{Union{Missing, Int64}}(undef, 10)
for idx in 1:10
i = idx - 1 # Since i ranges from 0 to 9
if i % 2 == 0
expected_values[idx] = i
else
expected_values[idx] = missing
end
end
@test all(df_null.value_or_null .=== expected_values)
# Test 3: Error Function
# Create the scalar function
f_error = DuckDB.duckdb_create_scalar_function()
DuckDB.duckdb_scalar_function_set_name(f_error, "error_function")
# Set parameter types
DuckDB.duckdb_scalar_function_add_parameter(f_error, type_bigint)
# Set return type
DuckDB.duckdb_scalar_function_set_return_type(f_error, type_bigint)
# Set the function
CMyErrorFunction = @cfunction(
my_error_function,
Cvoid,
(DuckDB.duckdb_function_info, DuckDB.duckdb_data_chunk, DuckDB.duckdb_vector)
)
DuckDB.duckdb_scalar_function_set_function(f_error, CMyErrorFunction)
# Register the function
res_error = DuckDB.duckdb_register_scalar_function(con.handle, f_error)
@test res_error == DuckDB.DuckDBSuccess
# Expect the Julia ErrorException to propagate through the query
@test_throws ErrorException DuckDB.query(con, "SELECT error_function(i) FROM test_table")
# Clean up logical type
DuckDB.duckdb_destroy_logical_type(type_bigint)
# Disconnect and close
DuckDB.disconnect(con)
DuckDB.close(db)
end
mysum(a, b) = a + b # Dummy function
my_reverse(s) = string(reverse(s))
@testset "UDF_Macro" begin
# Parse Expression
expr = :(mysum(a::Int, b::String)::Int)
func_name, func_params, return_value = DuckDB._udf_parse_function_expr(expr)
@test func_name == :mysum
@test func_params == [(:a, :Int), (:b, :String)]
@test return_value == :Int
# Build expressions
var_names, expressions =
DuckDB._udf_generate_conversion_expressions(func_params, :log_types, :convert, :param, :chunk)
@test var_names == [:param_1, :param_2]
@test expressions[1] == :(param_1 = convert(Int, log_types[1], chunk, 1))
@test expressions[2] == :(param_2 = convert(String, log_types[2], chunk, 2))
# Generate UDF
db = DuckDB.DB()
con = DuckDB.connect(db)
fun = DuckDB.@create_scalar_function mysum(a::Int, b::Int)::Int
DuckDB.register_scalar_function(con, fun) # Register UDF
@test_throws ArgumentError DuckDB.register_scalar_function(con, fun) # Register UDF twice
DuckDB.execute(con, "CREATE TABLE test1 (a INT, b INT);")
DuckDB.execute(con, "INSERT INTO test1 VALUES ('1', '2'), ('3','4'), ('5', '6')")
result = DuckDB.execute(con, "SELECT mysum(a, b) as result FROM test1") |> DataFrame
@test result.result == [3, 7, 11]
end
@testset "UDF Macro Various Types" begin
import Dates
db = DuckDB.DB()
con = DuckDB.connect(db)
my_reverse_inner = (s) -> ("Inner:" * string(reverse(s)))
fun_is_weekend = (d) -> Dates.dayofweek(d) in (6, 7)
date_2020 = (x) -> Dates.Date(2020, 1, 1) + Dates.Day(x) # Dummy function
my_and(a, b) = a && b
my_int_add(a, b) = a + b
my_mixed_add(a::Int, b::Float64) = a + b
df_numbers =
DataFrame(a = rand(1:100, 30), b = rand(1:100, 30), c = rand(30), d = rand(Bool, 30), e = rand(Bool, 30))
df_strings = DataFrame(a = ["hello", "world", "julia", "duckdb", "🦆DB"])
t = Date(2020, 1, 1):Day(1):Date(2020, 12, 31)
df_dates = DataFrame(t = t, k = 1:length(t), is_weekend = fun_is_weekend.(t))
DuckDB.register_table(con, df_strings, "test_strings")
DuckDB.register_table(con, df_dates, "test_dates")
DuckDB.register_table(con, df_numbers, "test_numbers")
# Register UDFs
fun_string = DuckDB.@create_scalar_function my_reverse(s::String)::String (s) -> my_reverse_inner(s)
DuckDB.register_scalar_function(con, fun_string) # Register UDF
fun_date = DuckDB.@create_scalar_function is_weekend(d::Date)::Bool fun_is_weekend
fun_date2 = DuckDB.@create_scalar_function date_2020(x::Int)::Date date_2020
DuckDB.register_scalar_function(con, fun_date) # Register UDF
DuckDB.register_scalar_function(con, fun_date2) # Register UDF
fun_and = DuckDB.@create_scalar_function my_and(a::Bool, b::Bool)::Bool my_and
fun_int_add = DuckDB.@create_scalar_function my_int_add(a::Int, b::Int)::Int my_int_add
fun_mixed_add = DuckDB.@create_scalar_function my_mixed_add(a::Int, b::Float64)::Float64 my_mixed_add
DuckDB.register_scalar_function(con, fun_and)
DuckDB.register_scalar_function(con, fun_int_add)
DuckDB.register_scalar_function(con, fun_mixed_add)
result1 = DuckDB.execute(con, "SELECT my_reverse(a) as result FROM test_strings") |> DataFrame
@test result1.result == my_reverse_inner.(df_strings.a)
result2_1 = DuckDB.execute(con, "SELECT is_weekend(t) as result FROM test_dates") |> DataFrame
@test result2_1.result == fun_is_weekend.(df_dates.t)
result2_2 = DuckDB.execute(con, "SELECT date_2020(k) as result FROM test_dates") |> DataFrame
@test result2_2.result == date_2020.(df_dates.k)
result3 = DuckDB.execute(con, "SELECT my_and(d, e) as result FROM test_numbers") |> DataFrame
@test result3.result == my_and.(df_numbers.d, df_numbers.e)
result4 = DuckDB.execute(con, "SELECT my_int_add(a, b) as result FROM test_numbers") |> DataFrame
@test result4.result == my_int_add.(df_numbers.a, df_numbers.b)
result5 = DuckDB.execute(con, "SELECT my_mixed_add(a, c) as result FROM test_numbers") |> DataFrame
@test result5.result == my_mixed_add.(df_numbers.a, df_numbers.c)
end
@testset "UDF Macro Exception" begin
f_error = function (a)
if iseven(a)
throw(ArgumentError("Even number"))
else
return a + 1
end
end
db = DuckDB.DB()
con = DuckDB.connect(db)
fun_error = DuckDB.@create_scalar_function f_error(a::Int)::Int f_error
DuckDB.register_scalar_function(con, fun_error) # Register UDF
df = DataFrame(a = 1:10)
DuckDB.register_table(con, df, "test1")
@test_throws Exception DuckDB.execute(con, "SELECT f_error(a) as result FROM test1") |> DataFrame
end
@testset "UDF Macro Missing Values" begin
f_add = (a, b) -> a + b
db = DuckDB.DB()
con = DuckDB.connect(db)
fun = DuckDB.@create_scalar_function f_add(a::Int, b::Int)::Int f_add
DuckDB.register_scalar_function(con, fun)
df = DataFrame(a = [1, missing, 3], b = [missing, 2, 3])
DuckDB.register_table(con, df, "test1")
result = DuckDB.execute(con, "SELECT f_add(a, b) as result FROM test1") |> DataFrame
@test isequal(result.result, [missing, missing, 6])
end
@testset "UDF Macro Benchmark" begin
# Check if the generated UDF is comparable to pure Julia or DuckDB expressions
#
# Currently, UDFs take about as much time as pure Julia or DuckDB expressions
# - The evaluation of the wrapper takes around 20% of the execution time
# - slow calls are setindex! and getindex
# - table_scan_func is the slowest call
db = DuckDB.DB()
con = DuckDB.connect(db)
fun_int = DuckDB.@create_scalar_function mysum(a::Int, b::Int)::Int
fun_float = DuckDB.@create_scalar_function mysum_f(a::Float64, b::Float64)::Float64 mysum
DuckDB.register_scalar_function(con, fun_int) # Register UDF
DuckDB.register_scalar_function(con, fun_float) # Register UDF
N = 10_000_000
df = DataFrame(a = 1:N, b = 1:N, c = rand(N), d = rand(N))
DuckDB.register_table(con, df, "test1")
# Precompile functions
precompile(mysum, (Int, Int))
precompile(mysum, (Float64, Float64))
DuckDB.execute(con, "SELECT mysum(a, b) as result FROM test1")
DuckDB.execute(con, "SELECT mysum_f(c, d) as result FROM test1")
# INTEGER Benchmark
t1 = @elapsed result_exp = df.a .+ df.b
t2 = @elapsed result = DuckDB.execute(con, "SELECT mysum(a, b) as result FROM test1")
t3 = @elapsed result2 = DuckDB.execute(con, "SELECT a + b as result FROM test1")
@test DataFrame(result).result == result_exp
# Prints:
# Benchmark Int: Julia Expression: 0.092947083, UDF: 0.078665125, DDB: 0.065306042
@info "Benchmark Int: Julia Expression: $t1, UDF: $t2, DDB: $t3"
# FLOAT Benchmark
t1 = @elapsed result_exp = df.c .+ df.d
t2 = @elapsed result = DuckDB.execute(con, "SELECT mysum_f(c, d) as result FROM test1")
t3 = @elapsed result2 = DuckDB.execute(con, "SELECT c + d as result FROM test1")
@test DataFrame(result).result ≈ result_exp atol = 1e-6
# Prints:
# Benchmark Float: Julia Expression: 0.090409625, UDF: 0.080781, DDB: 0.054156167
@info "Benchmark Float: Julia Expression: $t1, UDF: $t2, DDB: $t3"
end
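The UDF macro testsets above (including the missing-values case) exercise a wrapper that loops element-wise over each input chunk and propagates NULLs. A stand-alone sketch of that pattern in plain Julia (`apply_udf` is a hypothetical name, not part of the DuckDB.jl API):

```julia
# Hypothetical sketch (not DuckDB.jl's actual generated wrapper): apply a scalar
# function element-wise over column vectors, propagating `missing` like SQL NULL.
function apply_udf(f, cols...)
    n = length(first(cols))
    out = Vector{Any}(undef, n)
    for i in 1:n
        row = map(c -> c[i], cols)  # one "row" of inputs, as a tuple
        out[i] = any(ismissing, row) ? missing : f(row...)
    end
    return out
end

apply_udf((a, b) -> a + b, [1, missing, 3], [10, 20, 30])  # Any[11, missing, 33]
```

This mirrors the behavior asserted in the "UDF Macro Missing Values" testset: any `missing` input yields a `missing` output without calling the user function.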

View File

@@ -0,0 +1,327 @@
# test_sqlite.jl
# tests adopted from SQLite.jl
using Tables
function setup_clean_test_db(f::Function, args...)
tables = [
"album",
"artist",
"customer",
"employee",
"genre",
"invoice",
"invoiceline",
"mediatype",
"playlist",
"playlisttrack",
"track"
]
con = DBInterface.connect(DuckDB.DB)
datadir = joinpath(@__DIR__, "../data")
for table in tables
DBInterface.execute(con, "CREATE TABLE $table AS SELECT * FROM '$datadir/$table.parquet'")
end
try
f(con)
finally
close(con)
end
end
@testset "DB Connection" begin
con = DBInterface.connect(DuckDB.DB)
@test con isa DuckDB.DB
DBInterface.close!(con)
end
@testset "Issue #207: 32 bit integers" begin
setup_clean_test_db() do db
ds = DBInterface.execute(db, "SELECT 42::INT64 a FROM Track LIMIT 1") |> columntable
@test ds.a[1] isa Int64
end
end
@testset "Regular DuckDB Tests" begin
setup_clean_test_db() do db
@test_throws DuckDB.QueryException DBInterface.execute(db, "just some syntax error")
# syntax correct, table missing
@test_throws DuckDB.QueryException DBInterface.execute(
db,
"SELECT name FROM sqlite_nomaster WHERE type='table';"
)
end
end
@testset "close!(query)" begin
setup_clean_test_db() do db
qry = DBInterface.execute(db, "SELECT name FROM sqlite_master WHERE type='table';")
DBInterface.close!(qry)
return DBInterface.close!(qry) # test it doesn't throw on double-close
end
end
@testset "Query tables" begin
setup_clean_test_db() do db
ds = DBInterface.execute(db, "SELECT name FROM sqlite_master WHERE type='table';") |> columntable
@test length(ds) == 1
@test keys(ds) == (:name,)
@test length(ds.name) == 11
end
end
@testset "DBInterface.execute([f])" begin
setup_clean_test_db() do db
# pipe approach
results = DBInterface.execute(db, "SELECT * FROM Employee;") |> columntable
@test length(results) == 15
@test length(results[1]) == 8
# callable approach
@test isequal(DBInterface.execute(columntable, db, "SELECT * FROM Employee"), results)
employees_stmt = DBInterface.prepare(db, "SELECT * FROM Employee")
@test isequal(columntable(DBInterface.execute(employees_stmt)), results)
@test isequal(DBInterface.execute(columntable, employees_stmt), results)
@testset "throwing from f()" begin
f(::DuckDB.QueryResult) = error("I'm throwing!")
@test_throws ErrorException DBInterface.execute(f, employees_stmt)
@test_throws ErrorException DBInterface.execute(f, db, "SELECT * FROM Employee")
end
return DBInterface.close!(employees_stmt)
end
end
@testset "isempty(::Query)" begin
setup_clean_test_db() do db
@test !DBInterface.execute(isempty, db, "SELECT * FROM Employee")
@test DBInterface.execute(isempty, db, "SELECT * FROM Employee WHERE FirstName='Joanne'")
end
end
@testset "empty query has correct schema and return type" begin
setup_clean_test_db() do db
empty_scheme = DBInterface.execute(Tables.schema, db, "SELECT * FROM Employee WHERE FirstName='Joanne'")
all_scheme = DBInterface.execute(Tables.schema, db, "SELECT * FROM Employee")
@test empty_scheme.names == all_scheme.names
@test all(ea -> ea[1] <: ea[2], zip(empty_scheme.types, all_scheme.types))
empty_tbl = DBInterface.execute(columntable, db, "SELECT * FROM Employee WHERE FirstName='Joanne'")
all_tbl = DBInterface.execute(columntable, db, "SELECT * FROM Employee")
@test propertynames(empty_tbl) == propertynames(all_tbl)
end
end
@testset "Create table, run commit/rollback tests" begin
setup_clean_test_db() do db
DBInterface.execute(db, "create table temp as select * from album")
DBInterface.execute(db, "alter table temp add column colyear int")
DBInterface.execute(db, "update temp set colyear = 2014")
r = DBInterface.execute(db, "select * from temp limit 10") |> columntable
@test length(r) == 4 && length(r[1]) == 10
@test all(==(2014), r[4])
@test_throws DuckDB.QueryException DuckDB.rollback(db)
@test_throws DuckDB.QueryException DuckDB.commit(db)
DuckDB.transaction(db)
DBInterface.execute(db, "update temp set colyear = 2015")
DuckDB.rollback(db)
r = DBInterface.execute(db, "select * from temp limit 10") |> columntable
@test all(==(2014), r[4])
DuckDB.transaction(db)
DBInterface.execute(db, "update temp set colyear = 2015")
DuckDB.commit(db)
r = DBInterface.execute(db, "select * from temp limit 10") |> columntable
@test all(==(2015), r[4])
end
end
@testset "Dates" begin
setup_clean_test_db() do db
DBInterface.execute(db, "create table temp as select * from album")
DBInterface.execute(db, "alter table temp add column dates date")
stmt = DBInterface.prepare(db, "update temp set dates = ?")
DBInterface.execute(stmt, (Date(2014, 1, 1),))
r = DBInterface.execute(db, "select * from temp limit 10") |> columntable
@test length(r) == 4 && length(r[1]) == 10
@test isa(r[4][1], Date)
@test all(Bool[x == Date(2014, 1, 1) for x in r[4]])
return DBInterface.execute(db, "drop table temp")
end
end
@testset "Prepared Statements" begin
setup_clean_test_db() do db
DBInterface.execute(db, "CREATE TABLE temp AS SELECT * FROM Album")
r = DBInterface.execute(db, "SELECT * FROM temp LIMIT ?", [3]) |> columntable
@test length(r) == 3 && length(r[1]) == 3
r = DBInterface.execute(db, "SELECT * FROM temp WHERE Title ILIKE ?", ["%time%"]) |> columntable
@test r[1] == [76, 111, 187]
DBInterface.execute(db, "INSERT INTO temp VALUES (?1, ?3, ?2)", [0, 0, "Test Album"])
r = DBInterface.execute(db, "SELECT * FROM temp WHERE AlbumId = 0") |> columntable
@test r[1][1] == 0
@test r[2][1] == "Test Album"
@test r[3][1] == 0
DuckDB.drop!(db, "temp")
DBInterface.execute(db, "CREATE TABLE temp AS SELECT * FROM Album")
# FIXME Does it make sense to use named parameters here?
r = DBInterface.execute(db, "SELECT * FROM temp LIMIT ?", (a = 3,)) |> columntable
@test length(r) == 3 && length(r[1]) == 3
r = DBInterface.execute(db, "SELECT * FROM temp LIMIT ?", a = 3) |> columntable
@test length(r) == 3 && length(r[1]) == 3
r = DBInterface.execute(db, "SELECT * FROM temp WHERE Title ILIKE ?", (word = "%time%",)) |> columntable
@test r[1] == [76, 111, 187]
# FIXME: these are supposed to be named parameter tests, but we don't support that yet
DBInterface.execute(db, "INSERT INTO temp VALUES (?, ?, ?)", (lid = 0, title = "Test Album", rid = 1))
DBInterface.execute(db, "INSERT INTO temp VALUES (?, ?, ?)", lid = 400, title = "Test2 Album", rid = 3)
r = DBInterface.execute(db, "SELECT * FROM temp WHERE AlbumId IN (0, 400)") |> columntable
@test r[1] == [0, 400]
@test r[2] == ["Test Album", "Test2 Album"]
@test r[3] == [1, 3]
return DuckDB.drop!(db, "temp")
end
end
@testset "DuckDB to Julia type conversion" begin
binddb = DBInterface.connect(DuckDB.DB)
DBInterface.execute(
binddb,
"CREATE TABLE temp (n INTEGER, i1 INT, i2 integer,
f1 REAL, f2 FLOAT, f3 DOUBLE,
s1 TEXT, s2 CHAR(10), s3 VARCHAR(15), s4 NVARCHAR(5),
d1 DATETIME, ts1 TIMESTAMP)"
)
DBInterface.execute(
binddb,
"INSERT INTO temp VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
[
missing,
Int64(6),
Int64(4),
6.4,
6.3,
Int64(7),
"some long text",
"short text",
"another text",
"short",
"2021-02-21",
"2021-02-12 12:01:32"
]
)
rr = DBInterface.execute(rowtable, binddb, "SELECT * FROM temp")
@test length(rr) == 1
r = first(rr)
@test typeof.(Tuple(r)) ==
(Missing, Int32, Int32, Float32, Float32, Float64, String, String, String, String, DateTime, DateTime)
# Issue #4809: Concrete `String` types.
# Want to test exactly the types `execute` returns, so check the schema directly and
# avoid calling `Tuple` or anything else that would narrow the types in the result.
schema = Tables.schema(rr)
@test nonmissingtype.(schema.types) ==
(Int32, Int32, Int32, Float32, Float32, Float64, String, String, String, String, DateTime, DateTime)
end
@testset "Issue #158: Missing DB File" begin
@test_throws DuckDB.ConnectionException DuckDB.DB("nonexistentdir/not_there.db")
end
@testset "Issue #180, Query" begin
param = "Hello!"
query = DBInterface.execute(DuckDB.DB(), "SELECT ?1 UNION ALL SELECT ?1", [param])
param = "x"
for row in query
@test row[1] == "Hello!"
GC.gc() # this must NOT garbage collect the "Hello!" bound value
end
db = DBInterface.connect(DuckDB.DB)
DBInterface.execute(db, "CREATE TABLE T (a TEXT, PRIMARY KEY (a))")
q = DBInterface.prepare(db, "INSERT INTO T VALUES(?)")
DBInterface.execute(q, ["a"])
@test_throws DuckDB.QueryException DBInterface.execute(q, [1, "a"])
end
@testset "show(DB)" begin
io = IOBuffer()
db = DuckDB.DB()
show(io, db)
@test String(take!(io)) == "DuckDB.DB(\":memory:\")"
DBInterface.close!(db)
end
@testset "DuckDB.execute()" begin
db = DBInterface.connect(DuckDB.DB)
DBInterface.execute(db, "CREATE TABLE T (x INT UNIQUE)")
q = DBInterface.prepare(db, "INSERT INTO T VALUES(?)")
DuckDB.execute(q, (1,))
r = DBInterface.execute(db, "SELECT * FROM T") |> columntable
@test r[1] == [1]
DuckDB.execute(q, [2])
r = DBInterface.execute(db, "SELECT * FROM T") |> columntable
@test r[1] == [1, 2]
q = DBInterface.prepare(db, "INSERT INTO T VALUES(?)")
DuckDB.execute(q, [3])
r = DBInterface.execute(columntable, db, "SELECT * FROM T")
@test r[1] == [1, 2, 3]
DuckDB.execute(q, [4])
r = DBInterface.execute(columntable, db, "SELECT * FROM T")
@test r[1] == [1, 2, 3, 4]
DuckDB.execute(db, "INSERT INTO T VALUES(?)", [5])
r = DBInterface.execute(columntable, db, "SELECT * FROM T")
@test r[1] == [1, 2, 3, 4, 5]
r = DBInterface.execute(db, strip(" SELECT * FROM T ")) |> columntable
@test r[1] == [1, 2, 3, 4, 5]
r = DBInterface.execute(db, "SELECT * FROM T")
@test Tables.istable(r)
@test Tables.rowaccess(r)
@test Tables.rows(r) === r
@test Base.IteratorSize(typeof(r)) == Base.SizeUnknown()
row = first(r)
end
@testset "last_insert_rowid unsupported" begin
db = DBInterface.connect(DuckDB.DB)
@test_throws DuckDB.NotImplementedException DBInterface.lastrowid(db)
end
@testset "Escaping" begin
@test DuckDB.esc_id(["1", "2", "3"]) == "\"1\",\"2\",\"3\""
end
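The escaping test above pins down the identifier-quoting convention. A minimal sketch of the same rule (`quote_ident` is a stand-in reimplementation for illustration, not DuckDB.jl's actual `esc_id`): wrap each identifier in double quotes, double any embedded quotes, and join vectors with commas.

```julia
# Hypothetical reimplementation of the quoting convention, for illustration only.
quote_ident(id::AbstractString) = string('"', replace(id, "\"" => "\"\""), '"')
quote_ident(ids::AbstractVector) = join(quote_ident.(ids), ',')

quote_ident(["1", "2", "3"])  # "\"1\",\"2\",\"3\""
quote_ident("escape 10.0%")   # "\"escape 10.0%\""
```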
@testset "Issue #253: Ensure query column names are unique by default" begin
db = DuckDB.DB()
res = DBInterface.execute(db, "select 1 as x2, 2 as x2, 3 as x2, 4 as x2_2") |> columntable
@test res == (x2 = [1], x2_1 = [2], x2_2 = [3], x2_2_1 = [4])
end
@testset "drop!() table name escaping" begin
db = DuckDB.DB()
DBInterface.execute(db, "CREATE TABLE \"escape 10.0%\"(i INTEGER)")
# table exists
DBInterface.execute(db, "SELECT * FROM \"escape 10.0%\"")
# drop the table
DuckDB.drop!(db, "escape 10.0%")
# it should no longer exist
@test_throws DuckDB.QueryException DBInterface.execute(db, "SELECT * FROM \"escape 10.0%\"")
end

View File

@@ -0,0 +1,96 @@
# test_stream_data_chunk.jl
@testset "Test streaming result sets" begin
result_types = [DuckDB.MaterializedResult, DuckDB.StreamResult]
for result_type in result_types
con = DBInterface.connect(DuckDB.DB)
res = DBInterface.execute(con, "SELECT * FROM range(10000) t(i)", result_type)
@test res.names == [:i]
@test res.types == [Union{Missing, Int64}]
# loop over the chunks and perform a sum + count
sum::Int64 = 0
total_count::Int64 = 0
while true
# fetch the next chunk
chunk = DuckDB.nextDataChunk(res)
if chunk === missing
# consumed all chunks
break
end
# read the data of this chunk
count = DuckDB.get_size(chunk)
data = DuckDB.get_array(chunk, 1, Int64)
for i in 1:count
sum += data[i]
end
total_count += count
DuckDB.destroy_data_chunk(chunk)
end
@test sum == 49995000
@test total_count == 10000
end
GC.gc(true)
end
@testset "Test giant streaming result" begin
# this would take forever if it wasn't streaming
con = DBInterface.connect(DuckDB.DB)
res = DBInterface.execute(con, "SELECT * FROM range(1000000000000) t(i)", DuckDB.StreamResult)
@test res.names == [:i]
@test res.types == [Union{Missing, Int64}]
# fetch the first three chunks
for i in 1:3
chunk = DuckDB.nextDataChunk(res)
@test chunk !== missing
DuckDB.destroy_data_chunk(chunk)
end
DBInterface.close!(res)
DBInterface.close!(con)
GC.gc(true)
end
@testset "Test streaming data chunk destruction" begin
paths = ["types_map.parquet", "types_list.parquet", "types_nested.parquet"]
for path in paths
# DuckDB in-memory database
connection = DBInterface.connect(DuckDB.DB)
statement = DuckDB.Stmt(connection, "SELECT * FROM read_parquet(?, file_row_number=1)", DuckDB.StreamResult)
result = DBInterface.execute(statement, [joinpath(@__DIR__, "resources", path)])
num_columns = length(result.types)
while true
chunk = DuckDB.nextDataChunk(result)
chunk === missing && break # are we done?
num_rows = DuckDB.get_size(chunk) # number of rows in the retrieved chunk
row_ids = DuckDB.get_array(chunk, num_columns, Int64)
# move over each column, last column are the row_ids
for column_idx in 1:(num_columns - 1)
column_name::Symbol = result.names[column_idx]
# Convert from the DuckDB internal types into Julia types
duckdb_logical_type = DuckDB.LogicalType(DuckDB.duckdb_column_logical_type(result.handle, column_idx))
duckdb_conversion_state = DuckDB.ColumnConversionData([chunk], column_idx, duckdb_logical_type, nothing)
duckdb_data = DuckDB.convert_column(duckdb_conversion_state)
for i in 1:num_rows
row_id = row_ids[i] + 1 # julia indices start at 1
value = duckdb_data[i]
@test value !== missing
end
end
DuckDB.destroy_data_chunk(chunk)
end
close(connection)
end
GC.gc(true)
end

View File

@@ -0,0 +1,223 @@
# test_table_function.jl
struct MyBindStruct
count::Int64
function MyBindStruct(count::Int64)
return new(count)
end
end
function my_bind_function(info::DuckDB.BindInfo)
DuckDB.add_result_column(info, "forty_two", Int64)
parameter = DuckDB.get_parameter(info, 0)
number = DuckDB.getvalue(parameter, Int64)
return MyBindStruct(number)
end
mutable struct MyInitStruct
pos::Int64
function MyInitStruct()
return new(0)
end
end
function my_init_function(info::DuckDB.InitInfo)
return MyInitStruct()
end
function my_main_function_print(info::DuckDB.FunctionInfo, output::DuckDB.DataChunk)
bind_info = DuckDB.get_bind_info(info, MyBindStruct)
init_info = DuckDB.get_init_info(info, MyInitStruct)
result_array = DuckDB.get_array(output, 1, Int64)
count = 0
for i in 1:(DuckDB.VECTOR_SIZE)
if init_info.pos >= bind_info.count
break
end
result_array[count + 1] = init_info.pos % 2 == 0 ? 42 : 84
# We print within the table function to test behavior with synchronous API calls in Julia table functions
println(result_array[count + 1])
count += 1
init_info.pos += 1
end
DuckDB.set_size(output, count)
return
end
function my_main_function(info::DuckDB.FunctionInfo, output::DuckDB.DataChunk)
bind_info = DuckDB.get_bind_info(info, MyBindStruct)
init_info = DuckDB.get_init_info(info, MyInitStruct)
result_array = DuckDB.get_array(output, 1, Int64)
count = 0
for i in 1:(DuckDB.VECTOR_SIZE)
if init_info.pos >= bind_info.count
break
end
result_array[count + 1] = init_info.pos % 2 == 0 ? 42 : 84
count += 1
init_info.pos += 1
end
DuckDB.set_size(output, count)
return
end
function my_main_function_nulls(info::DuckDB.FunctionInfo, output::DuckDB.DataChunk)
bind_info = DuckDB.get_bind_info(info, MyBindStruct)
init_info = DuckDB.get_init_info(info, MyInitStruct)
result_array = DuckDB.get_array(output, 1, Int64)
validity = DuckDB.get_validity(output, 1)
count = 0
for i in 1:(DuckDB.VECTOR_SIZE)
if init_info.pos >= bind_info.count
break
end
if init_info.pos % 2 == 0
result_array[count + 1] = 42
else
DuckDB.setinvalid(validity, count + 1)
end
count += 1
init_info.pos += 1
end
DuckDB.set_size(output, count)
return
end
@testset "Test custom table functions that produce IO" begin
con = DBInterface.connect(DuckDB.DB)
DuckDB.create_table_function(
con,
"forty_two_print",
[Int64],
my_bind_function,
my_init_function,
my_main_function_print
)
GC.gc()
# 3 elements
results = DBInterface.execute(con, "SELECT * FROM forty_two_print(3)")
GC.gc()
df = DataFrame(results)
@test names(df) == ["forty_two"]
@test size(df, 1) == 3
@test df.forty_two == [42, 84, 42]
# > vsize elements
results = DBInterface.execute(con, "SELECT COUNT(*) cnt FROM forty_two_print(10000)")
GC.gc()
df = DataFrame(results)
@test df.cnt == [10000]
# @time begin
# results = DBInterface.execute(con, "SELECT SUM(forty_two) cnt FROM forty_two(10000000)")
# end
# df = DataFrame(results)
# println(df)
end
@testset "Test custom table functions" begin
con = DBInterface.connect(DuckDB.DB)
DuckDB.create_table_function(con, "forty_two", [Int64], my_bind_function, my_init_function, my_main_function)
GC.gc()
# 3 elements
results = DBInterface.execute(con, "SELECT * FROM forty_two(3)")
GC.gc()
df = DataFrame(results)
@test names(df) == ["forty_two"]
@test size(df, 1) == 3
@test df.forty_two == [42, 84, 42]
# > vsize elements
results = DBInterface.execute(con, "SELECT COUNT(*) cnt FROM forty_two(10000)")
GC.gc()
df = DataFrame(results)
@test df.cnt == [10000]
# @time begin
# results = DBInterface.execute(con, "SELECT SUM(forty_two) cnt FROM forty_two(10000000)")
# end
# df = DataFrame(results)
# println(df)
# return null values from a table function
DuckDB.create_table_function(
con,
"forty_two_nulls",
[Int64],
my_bind_function,
my_init_function,
my_main_function_nulls
)
results = DBInterface.execute(con, "SELECT COUNT(*) total_cnt, COUNT(forty_two) cnt FROM forty_two_nulls(10000)")
df = DataFrame(results)
@test df.total_cnt == [10000]
@test df.cnt == [5000]
# @time begin
# results = DBInterface.execute(con, "SELECT SUM(forty_two) cnt FROM forty_two_nulls(10000000)")
# end
# df = DataFrame(results)
# println(df)
end
function my_bind_error_function(info::DuckDB.BindInfo)
throw("bind error")
end
function my_init_error_function(info::DuckDB.InitInfo)
throw("init error")
end
function my_main_error_function(info::DuckDB.FunctionInfo, output::DuckDB.DataChunk)
throw("runtime error")
end
@testset "Test table function errors" begin
con = DBInterface.connect(DuckDB.DB)
DuckDB.create_table_function(
con,
"bind_error_function",
[Int64],
my_bind_error_function,
my_init_function,
my_main_function
)
DuckDB.create_table_function(
con,
"init_error_function",
[Int64],
my_bind_function,
my_init_error_function,
my_main_function
)
DuckDB.create_table_function(
con,
"main_error_function",
[Int64],
my_bind_function,
my_init_function,
my_main_error_function
)
@test_throws DuckDB.QueryException DBInterface.execute(con, "SELECT * FROM bind_error_function(3)")
@test_throws DuckDB.QueryException DBInterface.execute(con, "SELECT * FROM init_error_function(3)")
@test_throws DuckDB.QueryException DBInterface.execute(con, "SELECT * FROM main_error_function(3)")
end
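The bind/init/main table functions above emit at most `DuckDB.VECTOR_SIZE` rows per call to the main function and report the count with `set_size`. The chunk-partitioning arithmetic can be sketched on its own; the value `2048` below is an assumption (DuckDB's usual default standard vector size), not read from the library:

```julia
const ASSUMED_VECTOR_SIZE = 2048  # assumption: DuckDB's default standard vector size

# Sketch of how a table function's outer loop partitions `total` rows into
# vector-sized chunks, mimicking the init_info.pos / set_size bookkeeping.
function chunk_sizes(total::Integer)
    sizes = Int[]
    pos = 0
    while pos < total
        n = min(ASSUMED_VECTOR_SIZE, total - pos)
        push!(sizes, n)
        pos += n
    end
    return sizes
end

chunk_sizes(10000)  # four full chunks of 2048 followed by one of 1808
```

This is why the tests query `forty_two(10000)`: it forces multiple main-function invocations rather than a single chunk.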

View File

@@ -0,0 +1,328 @@
# test_tbl_scan.jl
@testset "Test standard DataFrame scan" begin
con = DBInterface.connect(DuckDB.DB)
df = DataFrame(a = [1, 2, 3], b = [42, 84, 42])
DuckDB.register_table(con, df, "my_df")
GC.gc()
results = DBInterface.execute(con, "SELECT * FROM my_df")
GC.gc()
df = DataFrame(results)
@test names(df) == ["a", "b"]
@test size(df, 1) == 3
@test df.a == [1, 2, 3]
@test df.b == [42, 84, 42]
DBInterface.close!(con)
end
@testset "Test standard table scan" begin
df = (a = [1, 2, 3], b = [42, 84, 42])
for df in [df, Tables.rowtable(df)]
con = DBInterface.connect(DuckDB.DB)
DuckDB.register_table(con, df, "my_df")
GC.gc()
results = DBInterface.execute(con, "SELECT * FROM my_df")
GC.gc()
df = columntable(results)
@test Tables.columnnames(df) == (:a, :b)
@test Tables.rowcount(df) == 3
@test df.a == [1, 2, 3]
@test df.b == [42, 84, 42]
DBInterface.close!(con)
end
end
@testset "Test DataFrame scan with NULL values" begin
con = DBInterface.connect(DuckDB.DB)
df = DataFrame(a = [1, missing, 3], b = [missing, 84, missing])
DuckDB.register_table(con, df, "my_df")
results = DBInterface.execute(con, "SELECT * FROM my_df")
df = DataFrame(results)
@test names(df) == ["a", "b"]
@test size(df, 1) == 3
@test isequal(df.a, [1, missing, 3])
@test isequal(df.b, [missing, 84, missing])
DBInterface.close!(con)
end
@testset "Test table scan with NULL values" begin
df = (a = [1, missing, 3], b = [missing, 84, missing])
for df in [df, Tables.rowtable(df)]
con = DBInterface.connect(DuckDB.DB)
DuckDB.register_table(con, df, "my_df")
results = DBInterface.execute(con, "SELECT * FROM my_df")
df = columntable(results)
@test Tables.columnnames(df) == (:a, :b)
@test Tables.rowcount(df) == 3
@test isequal(df.a, [1, missing, 3])
@test isequal(df.b, [missing, 84, missing])
DBInterface.close!(con)
end
end
@testset "Test DataFrame scan with numerics" begin
con = DBInterface.connect(DuckDB.DB)
numeric_types = [Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64]
for type in numeric_types
my_df = DataFrame(a = [1, missing, 3], b = [missing, 84, missing])
my_df[!, :a] = convert.(Union{type, Missing}, my_df[!, :a])
my_df[!, :b] = convert.(Union{type, Missing}, my_df[!, :b])
DuckDB.register_table(con, my_df, "my_df")
results = DBInterface.execute(con, "SELECT * FROM my_df")
df = DataFrame(results)
@test isequal(df, my_df)
end
DBInterface.close!(con)
end
@testset "Test table scan with numerics" begin
for tblf in [Tables.columntable, Tables.rowtable]
con = DBInterface.connect(DuckDB.DB)
numeric_types = [Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64]
for type in numeric_types
my_df = (a = [1, missing, 3], b = [missing, 84, missing])
my_df = map(my_df) do col
return convert.(Union{type, Missing}, col)
end
DuckDB.register_table(con, tblf(my_df), "my_df")
results = DBInterface.execute(con, "SELECT * FROM my_df")
df = columntable(results)
@test isequal(df, my_df)
end
DBInterface.close!(con)
end
end
@testset "Test DataFrame scan with various types" begin
con = DBInterface.connect(DuckDB.DB)
# boolean
my_df = DataFrame(a = [true, false, missing])
DuckDB.register_table(con, my_df, "my_df")
results = DBInterface.execute(con, "SELECT * FROM my_df")
df = DataFrame(results)
@test isequal(df, my_df)
# date/time/timestamp
my_df = DataFrame(
date = [Date(1992, 9, 20), missing, Date(1950, 2, 3)],
time = [Time(23, 3, 1), Time(11, 49, 33), missing],
timestamp = [DateTime(1992, 9, 20, 23, 3, 1), DateTime(1950, 2, 3, 11, 49, 3), missing]
)
DuckDB.register_table(con, my_df, "my_df")
results = DBInterface.execute(con, "SELECT * FROM my_df")
df = DataFrame(results)
@test isequal(df, my_df)
DBInterface.close!(con)
end
@testset "Test table scan with various types" begin
for tblf in [Tables.columntable, Tables.rowtable]
con = DBInterface.connect(DuckDB.DB)
# boolean
my_df = (a = [true, false, missing],)
DuckDB.register_table(con, tblf(my_df), "my_df")
results = DBInterface.execute(con, "SELECT * FROM my_df")
df = columntable(results)
@test isequal(df, my_df)
# date/time/timestamp
my_df = (
date = [Date(1992, 9, 20), missing, Date(1950, 2, 3)],
time = [Time(23, 3, 1), Time(11, 49, 33), missing],
timestamp = [DateTime(1992, 9, 20, 23, 3, 1), DateTime(1950, 2, 3, 11, 49, 3), missing]
)
DuckDB.register_table(con, tblf(my_df), "my_df")
results = DBInterface.execute(con, "SELECT * FROM my_df")
df = columntable(results)
@test isequal(df, my_df)
DBInterface.close!(con)
end
end
@testset "Test DataFrame scan with strings" begin
con = DBInterface.connect(DuckDB.DB)
# strings
my_df = DataFrame(str = ["hello", "this is a very long string", missing, "obligatory mühleisen"])
DuckDB.register_table(con, my_df, "my_df")
results = DBInterface.execute(con, "SELECT * FROM my_df")
df = DataFrame(results)
@test isequal(df, my_df)
DBInterface.close!(con)
end
@testset "Test table scan with strings" begin
for tblf in [Tables.columntable, Tables.rowtable]
con = DBInterface.connect(DuckDB.DB)
# strings
my_df = (str = ["hello", "this is a very long string", missing, "obligatory mühleisen"],)
DuckDB.register_table(con, tblf(my_df), "my_df")
results = DBInterface.execute(con, "SELECT * FROM my_df")
df = columntable(results)
@test isequal(df, my_df)
DBInterface.close!(con)
end
end
@testset "Test DataFrame scan projection pushdown" begin
con = DBInterface.connect(DuckDB.DB)
df = DataFrame(a = [1, 2, 3], b = [42, 84, 42], c = [3, 7, 18])
DuckDB.register_table(con, df, "my_df")
GC.gc()
results = DBInterface.execute(con, "SELECT b FROM my_df")
GC.gc()
df = DataFrame(results)
@test names(df) == ["b"]
@test size(df, 1) == 3
@test df.b == [42, 84, 42]
results = DBInterface.execute(con, "SELECT c, b FROM my_df")
GC.gc()
df = DataFrame(results)
@test names(df) == ["c", "b"]
@test size(df, 1) == 3
@test df.b == [42, 84, 42]
@test df.c == [3, 7, 18]
results = DBInterface.execute(con, "SELECT c, a, a FROM my_df")
GC.gc()
df = DataFrame(results)
@test names(df) == ["c", "a", "a_1"]
@test size(df, 1) == 3
@test df.c == [3, 7, 18]
@test df.a == [1, 2, 3]
@test df.a_1 == [1, 2, 3]
results = DBInterface.execute(con, "SELECT COUNT(*) cnt FROM my_df")
GC.gc()
df = DataFrame(results)
@test names(df) == ["cnt"]
@test size(df, 1) == 1
@test df.cnt == [3]
GC.gc()
DBInterface.close!(con)
end
@testset "Test table scan projection pushdown" begin
for tblf in [Tables.columntable, Tables.rowtable]
con = DBInterface.connect(DuckDB.DB)
df = (a = [1, 2, 3], b = [42, 84, 42], c = [3, 7, 18])
DuckDB.register_table(con, tblf(df), "my_df")
GC.gc()
results = DBInterface.execute(con, "SELECT b FROM my_df")
GC.gc()
df = columntable(results)
@test Tables.columnnames(df) == (:b,)
@test Tables.rowcount(df) == 3
@test df.b == [42, 84, 42]
results = DBInterface.execute(con, "SELECT c, b FROM my_df")
GC.gc()
df = columntable(results)
@test Tables.columnnames(df) == (:c, :b)
@test Tables.rowcount(df) == 3
@test df.b == [42, 84, 42]
@test df.c == [3, 7, 18]
results = DBInterface.execute(con, "SELECT c, a, a FROM my_df")
GC.gc()
df = columntable(results)
@test Tables.columnnames(df) == (:c, :a, :a_1)
@test Tables.rowcount(df) == 3
@test df.c == [3, 7, 18]
@test df.a == [1, 2, 3]
@test df.a_1 == [1, 2, 3]
results = DBInterface.execute(con, "SELECT COUNT(*) cnt FROM my_df")
GC.gc()
df = columntable(results)
@test Tables.columnnames(df) == (:cnt,)
@test Tables.rowcount(df) == 1
@test df.cnt == [3]
GC.gc()
DBInterface.close!(con)
end
end
@testset "Test large DataFrame scan" begin
con = DBInterface.connect(DuckDB.DB)
my_df = DataFrame(DBInterface.execute(con, "SELECT i%5 AS i FROM range(10000000) tbl(i)"))
DuckDB.register_table(con, my_df, "my_df")
GC.gc()
results = DBInterface.execute(con, "SELECT SUM(i) AS sum FROM my_df")
GC.gc()
df = DataFrame(results)
@test names(df) == ["sum"]
@test size(df, 1) == 1
@test df.sum == [20000000]
DBInterface.close!(con)
end
@testset "Test large table scan" begin
for tblf in [Tables.columntable, Tables.rowtable]
con = DBInterface.connect(DuckDB.DB)
my_df = tblf(DBInterface.execute(con, "SELECT i%5 AS i FROM range(10000000) tbl(i)"))
DuckDB.register_table(con, my_df, "my_df")
GC.gc()
results = DBInterface.execute(con, "SELECT SUM(i) AS sum FROM my_df")
GC.gc()
df = columntable(results)
@test Tables.columnnames(df) == (:sum,)
@test Tables.rowcount(df) == 1
@test df.sum == [20000000]
DBInterface.close!(con)
end
end
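`register_table` keeps a reference to the Julia object alive for the lifetime of the connection. A minimal sketch of dropping that reference once the table is no longer needed — this assumes `DuckDB.unregister_table`, the counterpart of `register_table` in the package:

```julia
using DuckDB, DBInterface, Tables

con = DBInterface.connect(DuckDB.DB)
tbl = (i = collect(1:1000),)
DuckDB.register_table(con, tbl, "tmp_tbl")
res = Tables.columntable(DBInterface.execute(con, "SELECT SUM(i) AS s FROM tmp_tbl"))
@assert res.s == [500500]
# drop the registration so the Julia object can be garbage collected
DuckDB.unregister_table(con, "tmp_tbl")
DBInterface.close!(con)
```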

# test_threading.jl
@testset "Test threading" begin
con = DBInterface.connect(DuckDB.DB)
DBInterface.execute(con, "CREATE TABLE integers AS SELECT * FROM range(100000000) t(i)")
results = DBInterface.execute(con, "SELECT SUM(i) sum FROM integers")
df = DataFrame(results)
@test df.sum == [4999999950000000]
DBInterface.close!(con)
end
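The test above runs on a single connection. For actual parallel query execution, launch Julia with `JULIA_NUM_THREADS` set and open one connection per task against a shared database handle — a sketch of that pattern (a shared `DuckDB.DB` serving multiple connections, as in the union-type test further down):

```julia
using DuckDB, DBInterface, DataFrames

# one shared in-memory database; each task gets its own connection
db = DuckDB.DB(":memory:")
con0 = DBInterface.connect(db)
DBInterface.execute(con0, "CREATE TABLE integers AS SELECT * FROM range(1000000) t(i)")

tasks = map(1:4) do _
    Threads.@spawn begin
        con = DBInterface.connect(db)
        df = DataFrame(DBInterface.execute(con, "SELECT SUM(i) AS sum FROM integers"))
        DBInterface.close!(con)
        df.sum[1]
    end
end
@assert all(==(499999500000), fetch.(tasks))
DBInterface.close!(con0)
```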

# test_tpch.jl
# DuckDB needs to have been built with TPCH (BUILD_TPCH=1) to run this test!
@testset "Test TPC-H" begin
sf = "0.1"
# load TPC-H into DuckDB
native_con = DBInterface.connect(DuckDB.DB)
try
DBInterface.execute(native_con, "CALL dbgen(sf=$sf)")
catch
@info "TPC-H extension not available; skipping"
return
end
# convert all tables to Julia DataFrames
customer = DataFrame(DBInterface.execute(native_con, "SELECT * FROM customer"))
lineitem = DataFrame(DBInterface.execute(native_con, "SELECT * FROM lineitem"))
nation = DataFrame(DBInterface.execute(native_con, "SELECT * FROM nation"))
orders = DataFrame(DBInterface.execute(native_con, "SELECT * FROM orders"))
part = DataFrame(DBInterface.execute(native_con, "SELECT * FROM part"))
partsupp = DataFrame(DBInterface.execute(native_con, "SELECT * FROM partsupp"))
region = DataFrame(DBInterface.execute(native_con, "SELECT * FROM region"))
supplier = DataFrame(DBInterface.execute(native_con, "SELECT * FROM supplier"))
# now open a new in-memory database, and register the dataframes there
df_con = DBInterface.connect(DuckDB.DB)
DuckDB.register_table(df_con, customer, "customer")
DuckDB.register_table(df_con, lineitem, "lineitem")
DuckDB.register_table(df_con, nation, "nation")
DuckDB.register_table(df_con, orders, "orders")
DuckDB.register_table(df_con, part, "part")
DuckDB.register_table(df_con, partsupp, "partsupp")
DuckDB.register_table(df_con, region, "region")
DuckDB.register_table(df_con, supplier, "supplier")
GC.gc()
# run all the queries
for i in 1:22
# print("Q$i\n")
# for each query, compare the results of the query ran on the original tables
# versus the result when run on the Julia DataFrames
res = DataFrame(DBInterface.execute(df_con, "PRAGMA tpch($i)"))
res2 = DataFrame(DBInterface.execute(native_con, "PRAGMA tpch($i)"))
@test isequal(res, res2)
# print("Native DuckDB\n")
# @time begin
# results = DBInterface.execute(native_con, "PRAGMA tpch($i)")
# end
# print("DataFrame\n")
# @time begin
# results = DBInterface.execute(df_con, "PRAGMA tpch($i)")
# end
end
DBInterface.close!(df_con)
DBInterface.close!(native_con)
end

# test_tpch_multithread.jl
# DuckDB needs to have been built with TPCH (BUILD_TPCH=1) to run this test!
function test_tpch_multithread()
sf = "0.10"
# load TPC-H into DuckDB
native_con = DBInterface.connect(DuckDB.DB)
try
DBInterface.execute(native_con, "CALL dbgen(sf=$sf)")
catch
@info "TPC-H extension not available; skipping"
return
end
# convert all tables to Julia DataFrames
customer = DataFrame(DBInterface.execute(native_con, "SELECT * FROM customer"))
lineitem = DataFrame(DBInterface.execute(native_con, "SELECT * FROM lineitem"))
nation = DataFrame(DBInterface.execute(native_con, "SELECT * FROM nation"))
orders = DataFrame(DBInterface.execute(native_con, "SELECT * FROM orders"))
part = DataFrame(DBInterface.execute(native_con, "SELECT * FROM part"))
partsupp = DataFrame(DBInterface.execute(native_con, "SELECT * FROM partsupp"))
region = DataFrame(DBInterface.execute(native_con, "SELECT * FROM region"))
supplier = DataFrame(DBInterface.execute(native_con, "SELECT * FROM supplier"))
id = Threads.threadid()
# now open a new in-memory database, and register the dataframes there
df_con = DBInterface.connect(DuckDB.DB)
DuckDB.register_table(df_con, customer, "customer")
DuckDB.register_table(df_con, lineitem, "lineitem")
DuckDB.register_table(df_con, nation, "nation")
DuckDB.register_table(df_con, orders, "orders")
DuckDB.register_table(df_con, part, "part")
DuckDB.register_table(df_con, partsupp, "partsupp")
DuckDB.register_table(df_con, region, "region")
DuckDB.register_table(df_con, supplier, "supplier")
GC.gc()
# Execute all the queries
for _ in 1:10
for i in 1:22
print("T:$id | Q:$i\n")
res = DataFrame(DBInterface.execute(df_con, "PRAGMA tpch($i)"))
end
end
DBInterface.close!(df_con)
return DBInterface.close!(native_con)
end
@testset "Test TPC-H Stresstest" begin
test_tpch_multithread()
end

# test_transaction.jl
@testset "Test DBInterface.transaction" begin
con = DBInterface.connect(DuckDB.DB, ":memory:")
# throw an exception in DBInterface.transaction
# this should cause a rollback to happen
@test_throws DuckDB.QueryException DBInterface.transaction(con) do
DBInterface.execute(con, "CREATE TABLE integers(i INTEGER)")
return DBInterface.execute(con, "SELEC")
end
# verify that the table does not exist
@test_throws DuckDB.QueryException DBInterface.execute(con, "SELECT * FROM integers")
# no exception, this should work and be committed
DBInterface.transaction(con) do
return DBInterface.execute(con, "CREATE TABLE integers(i INTEGER)")
end
DBInterface.execute(con, "SELECT * FROM integers")
DBInterface.close!(con)
end
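`DBInterface.transaction` wraps DuckDB's plain SQL transaction commands; the same commit/rollback behaviour can be sketched explicitly (standard DuckDB SQL, no extra API assumed):

```julia
using DuckDB, DBInterface

con = DBInterface.connect(DuckDB.DB, ":memory:")
DBInterface.execute(con, "BEGIN TRANSACTION")
DBInterface.execute(con, "CREATE TABLE t(i INTEGER)")
DBInterface.execute(con, "ROLLBACK")
# the CREATE TABLE was rolled back, so creating it again succeeds
DBInterface.execute(con, "BEGIN TRANSACTION")
DBInterface.execute(con, "CREATE TABLE t(i INTEGER)")
DBInterface.execute(con, "COMMIT")
DBInterface.close!(con)
```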

# test_union_type.jl
@testset "Test Union Type" begin
db = DBInterface.connect(DuckDB.DB)
con = DBInterface.connect(db)
DBInterface.execute(
con,
"""
create table tbl (
u UNION (a BOOL, b VARCHAR)
);
"""
)
DBInterface.execute(
con,
"""
insert into tbl VALUES('str'), (true);
"""
)
df = DataFrame(DBInterface.execute(
con,
"""
select u from tbl;
"""
))
@test isequal(df.u, ["str", true])
DBInterface.execute(
con,
"""
insert into tbl VALUES(NULL);
"""
)
df = DataFrame(DBInterface.execute(
con,
"""
select u from tbl;
"""
))
@test isequal(df.u, ["str", true, missing])
end
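When reading union columns back, it can be useful to inspect which member is active. DuckDB's built-in SQL function `union_tag` returns the declared member name per row — a sketch reusing the `tbl` from above (`union_tag` is a DuckDB SQL function, not part of the Julia API):

```julia
df = DataFrame(DBInterface.execute(
    con,
    """
    select union_tag(u) as tag from tbl;
    """
))
# one tag per row: the member name ("b" for the VARCHAR value, "a" for the
# BOOL value) and missing for the NULL row
```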

# external/duckdb/tools/juliapkg/update_api.sh
#!/usr/bin/env bash
set -euo pipefail
echo "Updating api.jl..."
OLD_API_FILE=tools/juliapkg/src/api_old.jl
ORIG_DIR=$(pwd)
GIT_ROOT_DIR=$(git rev-parse --show-toplevel)
cd "$GIT_ROOT_DIR"
# Generate the Julia API
python tools/juliapkg/scripts/generate_c_api_julia.py \
--auto-1-index \
--capi-dir src/include/duckdb/main/capi/header_generation \
tools/juliapkg/src/api.jl
echo "Formatting..."
cd "$ORIG_DIR"
./format.sh