should be it

2025-10-24 19:21:19 -05:00
parent a4b23fc57c
commit f09560c7b1
14047 changed files with 3161551 additions and 1 deletion


@@ -0,0 +1,21 @@
cmake_minimum_required(VERSION 3.5...3.29)
project(CoreFunctionsExtension)
include_directories(include)
add_subdirectory(aggregate)
add_subdirectory(scalar)
set(CORE_FUNCTION_FILES ${CORE_FUNCTION_FILES} core_functions_extension.cpp
function_list.cpp lambda_functions.cpp)
build_static_extension(core_functions ${CORE_FUNCTION_FILES})
set(PARAMETERS "-warnings")
build_loadable_extension(core_functions ${PARAMETERS} ${CORE_FUNCTION_FILES})
target_link_libraries(core_functions_loadable_extension duckdb_skiplistlib)
install(
TARGETS core_functions_extension
EXPORT "${DUCKDB_EXPORT_SET}"
LIBRARY DESTINATION "${INSTALL_LIB_DIR}"
ARCHIVE DESTINATION "${INSTALL_LIB_DIR}")


@@ -0,0 +1,51 @@
`core_functions` contains the set of functions that is included in the core system.
These functions are bundled with every installation of DuckDB.
In order to add new functions, add their definition to the `functions.json` file in the respective directory.
The function headers can then be generated from the set of functions using the following command:
```bash
python3 scripts/generate_functions.py
```
#### Function Format
Functions are defined according to the following format:
```json
{
"name": "date_diff",
"parameters": "part,startdate,enddate",
"description": "The number of partition boundaries between the timestamps",
"example": "date_diff('hour', TIMESTAMPTZ '1992-09-30 23:59:59', TIMESTAMPTZ '1992-10-01 01:58:00')",
"type": "scalar_function_set",
"struct": "DateDiffFun",
"aliases": ["datediff"]
}
```
* *name* signifies the function name at the SQL level.
* *parameters* is a comma separated list of parameter names (for documentation purposes).
* *description* is a description of the function (for documentation purposes).
* *example* is an example of how to use the function (for documentation purposes).
* *type* is the type of function, e.g. `scalar_function`, `scalar_function_set`, `aggregate_function`, etc.
* *struct* is the **optional** name of the struct that holds the definition of the function in the generated header. By default, the function name is converted to CamelCase and `Fun` is appended, e.g. `date_diff` -> `DateDiffFun`.
* *aliases* is an **optional** list of aliases for the function at the SQL level.
##### Scalar Function
Scalar functions require the following function to be defined:
```cpp
ScalarFunction DateDiffFun::GetFunction() {
return ...
}
```
##### Scalar Function Set
Scalar function sets require the following function to be defined:
```cpp
ScalarFunctionSet DateDiffFun::GetFunctions() {
return ...
}
```
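##### Aggregate Function and Aggregate Function Set
Aggregate functions follow the same pattern; for example, the `corr` and `avg` entries added in this commit are backed by the following definitions:
```cpp
AggregateFunction CorrFun::GetFunction() {
	return ...
}
AggregateFunctionSet AvgFun::GetFunctions() {
	return ...
}
```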


@@ -0,0 +1,9 @@
add_subdirectory(algebraic)
add_subdirectory(distributive)
add_subdirectory(holistic)
add_subdirectory(nested)
add_subdirectory(regression)
set(CORE_FUNCTION_FILES
${CORE_FUNCTION_FILES}
PARENT_SCOPE)


@@ -0,0 +1,238 @@
# Aggregate Functions
Aggregate functions combine a set of values into a single value.
In DuckDB, they appear in several contexts:
* As part of the `SELECT` list of a query with a `GROUP BY` clause (ordinary aggregation)
* As the only elements of the `SELECT` list of a query _without_ a `GROUP BY` clause (simple aggregation)
* Modified by an `OVER` clause (windowed aggregation)
* As an argument to the `list_aggregate` function (list aggregation)
## Aggregation Operations
In order to define an aggregate function, you need to define some operations.
These operations accumulate data into a `State` object that is specific to the aggregate.
Each `State` represents the accumulated values for a single result,
so if (say) there are multiple groups in a `GROUP BY`,
each result value would need its own `State` object.
Unlike simple scalar functions, an aggregate requires several of these operations:
| Operation | Description | Required |
| :-------- | :---------- | :------- |
| `size` | Returns the fixed size of the `State` | X |
| `initialize` | Constructs the `State` in raw memory | X |
| `destructor` | Destructs the `State` back to raw memory | |
| `update` | Accumulate the arguments into the corresponding `State` | X |
| `simple_update` | Accumulate the arguments into a single `State`. | |
| `combine` | Merge one `State` into another | |
| `finalize` | Convert a `State` into a final value. | X |
| `window` | Compute a windowed aggregate value from the inputs and frame bounds | |
| `bind` | Modify the binding of the aggregate | |
| `statistics` | Derive statistics of the result from the statistics of the arguments | |
| `serialize` | Write a `State` to a relocatable binary blob | |
| `deserialize` | Read a `State` from a binary blob | |
In addition to these high-level functions,
there is also an `AggregateExecutor` template that can be used to generate these functions
from row-oriented static methods in a class.
There are also a number of helper objects that contain various bits of context for the aggregate,
such as binding data and extracted validity masks.
By combining them into these helper objects, we reduce the number of arguments to various functions.
The helpers can vary by the number of arguments, and we will refer to them simply as `info` below.
Consult the code for details on what is available.
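For the sketches in the sections below, assume a minimal, hypothetical sum-of-doubles aggregate; the names `ExampleSumState` and `ExampleSumOperation` are illustrative only and do not appear in the codebase:
```cpp
// Hypothetical fixed-size state for a simple SUM(DOUBLE) aggregate.
struct ExampleSumState {
	uint64_t count; // number of values accumulated so far
	double sum;     // running total
};
```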
### Size
```cpp
size()
```
`State`s are allocated in memory blocks by the various operators,
so each aggregate has to tell the operator how much memory it will require.
Note that this is just the memory that the aggregate needs to get started -
it is perfectly legal to allocate variable amounts of memory
and store pointers to it in the `State`.
### Initialize
```cpp
initialize(State *)
```
Construct a _single_ empty `State` from uninitialized memory.
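A minimal sketch of the corresponding template generator method for the hypothetical `ExampleSumState` above:
```cpp
// Sketch for the hypothetical ExampleSumState: construct a single empty state in-place.
template <class STATE>
static void Initialize(STATE &state) {
	state.count = 0;
	state.sum = 0;
}
```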
### Destructor
```cpp
destructor(Vector &state, AggregateInputData &info, idx_t count)
```
Destruct a `Vector` of state pointers.
If you are using a template, the method has the signature
```cpp
Destroy(State &state, AggregateInputData &info)
```
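The trivial `ExampleSumState` above owns no resources and would not need a destructor; a state that does own heap memory typically just invokes its C++ destructor here, as the sources below do:
```cpp
// Template generator method: destruct the state back to raw memory.
template <class STATE>
static void Destroy(STATE &state, AggregateInputData &info) {
	state.~STATE();
}
```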
### Update and Simple Update
```cpp
update(Vector inputs[], AggregateInputData &info, idx_t ninputs, Vector &states, idx_t count)
```
Accumulate the input values for each row into the `State` object for that row.
The `states` argument contains pointers to the states,
which allows different rows to be accumulated into the same `State` if they belong to the same group.
This type of operation is called "scattering", which is why
the template generator methods for `update` operations are called `ScatterUpdate`s.
```cpp
simple_update(Vector inputs[], AggregateInputData &info, idx_t ninputs, State *state, idx_t count)
```
Accumulate the input arguments for each row into a single `State`.
Simple updates are used when there is only one `State` being updated,
usually for `SELECT` queries with no `GROUP BY` clause.
They are defined when an update can be performed more efficiently in a single tight loop.
There are some other places where this operation will be used, if available,
when the caller has only one state to update.
The template generator methods for simple updates are just called `Update`s.
The template generators use two methods for single rows:
```cpp
ConstantOperation(State& state, const Arg1Type &arg1, ..., AggregateInputInfo &info, idx_t count)
```
Called when there is a single value that can be accumulated `count` times.
```cpp
Operation(State& state, const Arg1Type &arg1, ..., AggregateInputInfo &info)
```
Called for each tuple of argument values with the `State` to update.
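A minimal sketch of both single-row methods for the hypothetical sum aggregate, using the unary-aggregate input helper seen elsewhere in this commit:
```cpp
// Sketch: called once per input row.
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary) {
	state.count++;
	state.sum += input;
}
// Sketch: called when a single value repeats `count` times; fold it in directly.
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary, idx_t count) {
	state.count += count;
	state.sum += double(input) * double(count);
}
```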
### Combine
```cpp
combine(Vector &sources, Vector &targets, AggregateInputData &info, idx_t count)
```
Merges the source states into the corresponding target states.
If you are using template generators,
the generator is `StateCombine` and the method it wraps is:
```cpp
Combine(const State& source, State &target, AggregateInputData &info)
```
Note that `source` should _not_ be modified, because for efficiency the caller may reuse it
for multiple operations (e.g., window segment trees).
If you wish to combine destructively, you _must_ check that the `combine_type` member
of the `AggregateInputData` argument is set to `ALLOW_DESTRUCTIVE`.
This is useful when the aggregate can move data more efficiently than copying it.
`LIST` is an example, where the internal linked list data structures can be spliced instead of copied.
The `combine` operation is optional, but it is needed for multi-threaded aggregation.
If it is not provided, then _all_ aggregate functions in the grouping must be computed on a single thread.
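A minimal sketch for the hypothetical sum aggregate:
```cpp
// Sketch: merge `source` into `target`, leaving `source` unmodified.
template <class STATE>
static void Combine(const STATE &source, STATE &target, AggregateInputData &info) {
	target.count += source.count;
	target.sum += source.sum;
}
```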
### Finalize
```cpp
finalize(Vector &state, AggregateInputData &info, Vector &result, idx_t count, idx_t offset)
```
Converts states into result values.
If you are using template generators, the generator is `StateFinalize`
and the method you define is:
```cpp
Finalize(const State &state, ResultType &result, AggregateFinalizeData &info)
```
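A minimal sketch for the hypothetical sum aggregate, returning NULL when no rows were accumulated:
```cpp
// Sketch: convert a state into a result value.
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
	if (state.count == 0) {
		finalize_data.ReturnNull();
	} else {
		target = state.sum;
	}
}
```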
### Window
```cpp
window(Vector inputs[], const ValidityMask &filter,
AggregateInputData &info, idx_t ninputs, State *state,
const FrameBounds &frame, const FrameBounds &prev, Vector &result, idx_t rid,
idx_t bias)
```
The Window operator usually works with the basic aggregation operations `update`, `combine` and `finalize`
to compute moving aggregates, either via segment trees or by simply computing the aggregate over a range of inputs.
In some situations, this is either overkill (`COUNT(*)`) or too slow (`MODE`)
and an optional window function can be defined.
This function will be passed the values in the window frame,
along with the current frame, the previous frame,
the result `Vector` and the result row number being computed.
The previous frame is provided so the function can use
the delta from the previous frame to update the `State`.
This could be kept in the `State` itself.
The `bias` argument was used for handling large input partitions,
and contains the partition offset where the `inputs` rows start.
Currently, it is always zero, but this could change in the future
to handle constrained memory situations.
The template generator method for windowing is:
```cpp
Window(const ArgType *arg, ValidityMask &filter, ValidityMask &valid,
AggregateInputData &info, State *state,
const FrameBounds &frame, const FrameBounds &prev,
ResultType &result, idx_t rid, idx_t bias)
```
### Bind
```cpp
bind(ClientContext &context, AggregateFunction &function, vector<unique_ptr<Expression>> &arguments)
```
Like scalar functions, aggregates can sometimes have complex binding rules
or need to cache data (such as constant arguments to quantiles).
The `bind` function is how the aggregate hooks into the binding system.
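A minimal sketch of a bind function that resolves the signature and returns (optional) cached bind data; the name `ExampleSumBind` is hypothetical, but the same shape can be seen in `BindDecimalAvg` later in this commit:
```cpp
unique_ptr<FunctionData> ExampleSumBind(ClientContext &context, AggregateFunction &function,
                                        vector<unique_ptr<Expression>> &arguments) {
	// resolve the function signature from the bound argument type
	function.arguments[0] = arguments[0]->return_type;
	// return a FunctionData subclass here if the aggregate needs to cache anything
	return nullptr;
}
```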
### Statistics
```cpp
statistics(ClientContext &context, BoundAggregateExpression &expr, AggregateStatisticsInput &input)
```
Also like scalar functions, aggregates can sometimes produce result statistics
based on their arguments.
The `statistics` function is how the aggregate hooks into the planner.
### Serialization
```cpp
serialize(Serializer &serializer, const optional_ptr<FunctionData> bind_data, const AggregateFunction &function);
deserialize(Deserializer &deserializer, AggregateFunction &function);
```
Again like scalar functions, bound aggregates can be serialised as part of a query plan.
These functions save and restore the binding data from binary blobs.
### Ignore Nulls
The templating system needs to know whether the aggregate ignores nulls,
so the template generators require the `IgnoreNull` static method to be defined.
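Putting the pieces together, the hypothetical example could then be registered through the template generators as a unary aggregate, mirroring the registrations in the source files below:
```cpp
// Hypothetical operation struct bundling the methods sketched in the previous sections.
struct ExampleSumOperation {
	// Initialize, Operation, ConstantOperation, Combine and Finalize as sketched above, plus:
	static bool IgnoreNull() {
		return true;
	}
};
// Template arguments: STATE, INPUT_TYPE, RESULT_TYPE, OPERATION
AggregateFunction example_sum =
    AggregateFunction::UnaryAggregate<ExampleSumState, double, double, ExampleSumOperation>(LogicalType::DOUBLE,
                                                                                             LogicalType::DOUBLE);
```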
## Ordered Aggregates
Some aggregates (e.g., `STRING_AGG`) are order-sensitive.
Unless marked otherwise by setting the `order_dependent` flag to `NOT_ORDER_DEPENDENT`,
the aggregate will be assumed to be order-sensitive.
If the aggregate is order-sensitive and the user specifies an `ORDER BY` clause in the arguments,
then it will be wrapped to make sure that the arguments are cached and sorted
before being passed to the aggregate operations:
```sql
-- Concatenate the strings in alphabetical order
STRING_AGG(code, ',' ORDER BY code)
```


@@ -0,0 +1,5 @@
add_library_unity(duckdb_core_functions_algebraic OBJECT corr.cpp stddev.cpp
avg.cpp covar.cpp)
set(CORE_FUNCTION_FILES
${CORE_FUNCTION_FILES} $<TARGET_OBJECTS:duckdb_core_functions_algebraic>
PARENT_SCOPE)


@@ -0,0 +1,314 @@
#include "core_functions/aggregate/algebraic_functions.hpp"
#include "core_functions/aggregate/sum_helpers.hpp"
#include "duckdb/common/types/hugeint.hpp"
#include "duckdb/common/types/time.hpp"
#include "duckdb/common/exception.hpp"
#include "duckdb/function/function_set.hpp"
#include "duckdb/planner/expression.hpp"
namespace duckdb {
namespace {
template <class T>
struct AvgState {
uint64_t count;
T value;
void Initialize() {
this->count = 0;
}
void Combine(const AvgState<T> &other) {
this->count += other.count;
this->value += other.value;
}
};
struct IntervalAvgState {
int64_t count;
interval_t value;
void Initialize() {
this->count = 0;
this->value = interval_t();
}
void Combine(const IntervalAvgState &other) {
this->count += other.count;
this->value = AddOperator::Operation<interval_t, interval_t, interval_t>(this->value, other.value);
}
};
struct KahanAvgState {
uint64_t count;
double value;
double err;
void Initialize() {
this->count = 0;
this->err = 0.0;
}
void Combine(const KahanAvgState &other) {
this->count += other.count;
KahanAddInternal(other.value, this->value, this->err);
KahanAddInternal(other.err, this->value, this->err);
}
};
struct AverageDecimalBindData : public FunctionData {
explicit AverageDecimalBindData(double scale) : scale(scale) {
}
double scale;
public:
unique_ptr<FunctionData> Copy() const override {
return make_uniq<AverageDecimalBindData>(scale);
};
bool Equals(const FunctionData &other_p) const override {
auto &other = other_p.Cast<AverageDecimalBindData>();
return scale == other.scale;
}
};
struct AverageSetOperation {
template <class STATE>
static void Initialize(STATE &state) {
state.Initialize();
}
template <class STATE>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
target.Combine(source);
}
template <class STATE>
static void AddValues(STATE &state, idx_t count) {
state.count += count;
}
};
template <class T>
static T GetAverageDivident(uint64_t count, optional_ptr<FunctionData> bind_data) {
T divident = T(count);
if (bind_data) {
auto &avg_bind_data = bind_data->Cast<AverageDecimalBindData>();
divident *= avg_bind_data.scale;
}
return divident;
}
struct IntegerAverageOperation : public BaseSumOperation<AverageSetOperation, RegularAdd> {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.count == 0) {
finalize_data.ReturnNull();
} else {
double divident = GetAverageDivident<double>(state.count, finalize_data.input.bind_data);
target = double(state.value) / divident;
}
}
};
struct IntegerAverageOperationHugeint : public BaseSumOperation<AverageSetOperation, AddToHugeint> {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.count == 0) {
finalize_data.ReturnNull();
} else {
long double divident = GetAverageDivident<long double>(state.count, finalize_data.input.bind_data);
target = Hugeint::Cast<long double>(state.value) / divident;
}
}
};
struct DiscreteAverageOperation : public BaseSumOperation<AverageSetOperation, AddToHugeint> {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.count == 0) {
finalize_data.ReturnNull();
} else {
hugeint_t remainder;
target = Hugeint::Cast<T>(Hugeint::DivMod(state.value, state.count, remainder));
// Round the result
target += (remainder > (state.count / 2));
}
}
};
struct HugeintAverageOperation : public BaseSumOperation<AverageSetOperation, HugeintAdd> {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.count == 0) {
finalize_data.ReturnNull();
} else {
long double divident = GetAverageDivident<long double>(state.count, finalize_data.input.bind_data);
target = Hugeint::Cast<long double>(state.value) / divident;
}
}
};
struct NumericAverageOperation : public BaseSumOperation<AverageSetOperation, RegularAdd> {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.count == 0) {
finalize_data.ReturnNull();
} else {
target = state.value / state.count;
}
}
};
struct KahanAverageOperation : public BaseSumOperation<AverageSetOperation, KahanAdd> {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.count == 0) {
finalize_data.ReturnNull();
} else {
target = (state.value / state.count) + (state.err / state.count);
}
}
};
struct IntervalAverageOperation : public BaseSumOperation<AverageSetOperation, IntervalAdd> {
// Override BaseSumOperation::Initialize because
// IntervalAvgState does not have an assignment constructor from 0
static void Initialize(IntervalAvgState &state) {
AverageSetOperation::Initialize<IntervalAvgState>(state);
}
template <class RESULT_TYPE, class STATE>
static void Finalize(STATE &state, RESULT_TYPE &target, AggregateFinalizeData &finalize_data) {
if (state.count == 0) {
finalize_data.ReturnNull();
} else {
// DivideOperator does not borrow fractions right,
// TODO: Maybe it should?
// Copy PG implementation.
const auto &value = state.value;
const auto count = UnsafeNumericCast<int64_t>(state.count);
target.months = value.months / count;
auto months_remainder = value.months % count;
target.days = value.days / count;
auto days_remainder = value.days % count;
target.micros = value.micros / count;
auto micros_remainder = value.micros % count;
// Shift the remainders right
months_remainder *= Interval::DAYS_PER_MONTH;
target.days += months_remainder / count;
days_remainder += months_remainder % count;
days_remainder *= Interval::MICROS_PER_DAY;
micros_remainder += days_remainder / count;
target.micros += micros_remainder;
}
}
};
struct TimeTZAverageOperation : public BaseSumOperation<AverageSetOperation, AddToHugeint> {
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &aggr_unary) {
const auto micros = Time::NormalizeTimeTZ(input).micros;
AverageSetOperation::template AddValues<STATE>(state, 1);
AddToHugeint::template AddNumber<STATE, int64_t>(state, micros);
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &aggr_unary, idx_t count) {
const auto micros = Time::NormalizeTimeTZ(input).micros;
AverageSetOperation::template AddValues<STATE>(state, count);
AddToHugeint::template AddConstant<STATE, int64_t>(state, micros, count);
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.count == 0) {
finalize_data.ReturnNull();
} else {
uint64_t remainder;
auto micros = Hugeint::Cast<int64_t>(Hugeint::DivModPositive(state.value, state.count, remainder));
// Round the result
micros += (remainder > (state.count / 2));
target = dtime_tz_t(dtime_t(micros), 0);
}
}
};
AggregateFunction GetAverageAggregate(PhysicalType type) {
switch (type) {
case PhysicalType::INT16: {
return AggregateFunction::UnaryAggregate<AvgState<int64_t>, int16_t, double, IntegerAverageOperation>(
LogicalType::SMALLINT, LogicalType::DOUBLE);
}
case PhysicalType::INT32: {
return AggregateFunction::UnaryAggregate<AvgState<hugeint_t>, int32_t, double, IntegerAverageOperationHugeint>(
LogicalType::INTEGER, LogicalType::DOUBLE);
}
case PhysicalType::INT64: {
return AggregateFunction::UnaryAggregate<AvgState<hugeint_t>, int64_t, double, IntegerAverageOperationHugeint>(
LogicalType::BIGINT, LogicalType::DOUBLE);
}
case PhysicalType::INT128: {
return AggregateFunction::UnaryAggregate<AvgState<hugeint_t>, hugeint_t, double, HugeintAverageOperation>(
LogicalType::HUGEINT, LogicalType::DOUBLE);
}
case PhysicalType::INTERVAL: {
return AggregateFunction::UnaryAggregate<IntervalAvgState, interval_t, interval_t, IntervalAverageOperation>(
LogicalType::INTERVAL, LogicalType::INTERVAL);
}
default:
throw InternalException("Unimplemented average aggregate");
}
}
unique_ptr<FunctionData> BindDecimalAvg(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
auto decimal_type = arguments[0]->return_type;
function = GetAverageAggregate(decimal_type.InternalType());
function.name = "avg";
function.arguments[0] = decimal_type;
function.return_type = LogicalType::DOUBLE;
return make_uniq<AverageDecimalBindData>(
Hugeint::Cast<double>(Hugeint::POWERS_OF_TEN[DecimalType::GetScale(decimal_type)]));
}
} // namespace
AggregateFunctionSet AvgFun::GetFunctions() {
AggregateFunctionSet avg;
avg.AddFunction(AggregateFunction({LogicalTypeId::DECIMAL}, LogicalTypeId::DECIMAL, nullptr, nullptr, nullptr,
nullptr, nullptr, FunctionNullHandling::DEFAULT_NULL_HANDLING, nullptr,
BindDecimalAvg));
avg.AddFunction(GetAverageAggregate(PhysicalType::INT16));
avg.AddFunction(GetAverageAggregate(PhysicalType::INT32));
avg.AddFunction(GetAverageAggregate(PhysicalType::INT64));
avg.AddFunction(GetAverageAggregate(PhysicalType::INT128));
avg.AddFunction(GetAverageAggregate(PhysicalType::INTERVAL));
avg.AddFunction(AggregateFunction::UnaryAggregate<AvgState<double>, double, double, NumericAverageOperation>(
LogicalType::DOUBLE, LogicalType::DOUBLE));
avg.AddFunction(AggregateFunction::UnaryAggregate<AvgState<hugeint_t>, int64_t, int64_t, DiscreteAverageOperation>(
LogicalType::TIMESTAMP, LogicalType::TIMESTAMP));
avg.AddFunction(AggregateFunction::UnaryAggregate<AvgState<hugeint_t>, int64_t, int64_t, DiscreteAverageOperation>(
LogicalType::TIMESTAMP_TZ, LogicalType::TIMESTAMP_TZ));
avg.AddFunction(AggregateFunction::UnaryAggregate<AvgState<hugeint_t>, int64_t, int64_t, DiscreteAverageOperation>(
LogicalType::TIME, LogicalType::TIME));
avg.AddFunction(
AggregateFunction::UnaryAggregate<AvgState<hugeint_t>, dtime_tz_t, dtime_tz_t, TimeTZAverageOperation>(
LogicalType::TIME_TZ, LogicalType::TIME_TZ));
return avg;
}
AggregateFunction FAvgFun::GetFunction() {
return AggregateFunction::UnaryAggregate<KahanAvgState, double, double, KahanAverageOperation>(LogicalType::DOUBLE,
LogicalType::DOUBLE);
}
} // namespace duckdb


@@ -0,0 +1,13 @@
#include "core_functions/aggregate/algebraic_functions.hpp"
#include "core_functions/aggregate/algebraic/covar.hpp"
#include "core_functions/aggregate/algebraic/stddev.hpp"
#include "core_functions/aggregate/algebraic/corr.hpp"
#include "duckdb/function/function_set.hpp"
namespace duckdb {
AggregateFunction CorrFun::GetFunction() {
return AggregateFunction::BinaryAggregate<CorrState, double, double, double, CorrOperation>(
LogicalType::DOUBLE, LogicalType::DOUBLE, LogicalType::DOUBLE);
}
} // namespace duckdb


@@ -0,0 +1,17 @@
#include "core_functions/aggregate/algebraic_functions.hpp"
#include "duckdb/common/types/null_value.hpp"
#include "core_functions/aggregate/algebraic/covar.hpp"
namespace duckdb {
AggregateFunction CovarPopFun::GetFunction() {
return AggregateFunction::BinaryAggregate<CovarState, double, double, double, CovarPopOperation>(
LogicalType::DOUBLE, LogicalType::DOUBLE, LogicalType::DOUBLE);
}
AggregateFunction CovarSampFun::GetFunction() {
return AggregateFunction::BinaryAggregate<CovarState, double, double, double, CovarSampOperation>(
LogicalType::DOUBLE, LogicalType::DOUBLE, LogicalType::DOUBLE);
}
} // namespace duckdb


@@ -0,0 +1,79 @@
[
{
"name": "avg",
"parameters": "x",
"description": "Calculates the average value for all tuples in x.",
"example": "SUM(x) / COUNT(*)",
"type": "aggregate_function_set",
"aliases": ["mean"]
},
{
"name": "corr",
"parameters": "y,x",
"description": "Returns the correlation coefficient for non-NULL pairs in a group.",
"example": "COVAR_POP(y, x) / (STDDEV_POP(x) * STDDEV_POP(y))",
"type": "aggregate_function"
},
{
"name": "covar_pop",
"parameters": "y,x",
"description": "Returns the population covariance of input values.",
"example": "(SUM(x*y) - SUM(x) * SUM(y) / COUNT(*)) / COUNT(*)",
"type": "aggregate_function"
},
{
"name": "covar_samp",
"parameters": "y,x",
"description": "Returns the sample covariance for non-NULL pairs in a group.",
"example": "(SUM(x*y) - SUM(x) * SUM(y) / COUNT(*)) / (COUNT(*) - 1)",
"type": "aggregate_function"
},
{
"name": "favg",
"parameters": "x",
"description": "Calculates the average using a more accurate floating point summation (Kahan Sum)",
"example": "favg(A)",
"type": "aggregate_function",
"struct": "FAvgFun"
},
{
"name": "sem",
"parameters": "x",
"description": "Returns the standard error of the mean",
"example": "",
"type": "aggregate_function",
"struct": "StandardErrorOfTheMeanFun"
},
{
"name": "stddev_pop",
"parameters": "x",
"description": "Returns the population standard deviation.",
"example": "sqrt(var_pop(x))",
"type": "aggregate_function",
"struct": "StdDevPopFun"
},
{
"name": "stddev_samp",
"parameters": "x",
"description": "Returns the sample standard deviation",
"example": "sqrt(var_samp(x))",
"type": "aggregate_function",
"aliases": ["stddev"],
"struct": "StdDevSampFun"
},
{
"name": "var_pop",
"parameters": "x",
"description": "Returns the population variance.",
"example": "",
"type": "aggregate_function"
},
{
"name": "var_samp",
"parameters": "x",
"description": "Returns the sample variance of all input values.",
"example": "(SUM(x^2) - SUM(x)^2 / COUNT(x)) / (COUNT(x) - 1)",
"type": "aggregate_function",
"aliases": ["variance"]
}
]


@@ -0,0 +1,34 @@
#include "core_functions/aggregate/algebraic_functions.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "duckdb/function/function_set.hpp"
#include "core_functions/aggregate/algebraic/stddev.hpp"
#include <cmath>
namespace duckdb {
AggregateFunction StdDevSampFun::GetFunction() {
return AggregateFunction::UnaryAggregate<StddevState, double, double, STDDevSampOperation>(LogicalType::DOUBLE,
LogicalType::DOUBLE);
}
AggregateFunction StdDevPopFun::GetFunction() {
return AggregateFunction::UnaryAggregate<StddevState, double, double, STDDevPopOperation>(LogicalType::DOUBLE,
LogicalType::DOUBLE);
}
AggregateFunction VarPopFun::GetFunction() {
return AggregateFunction::UnaryAggregate<StddevState, double, double, VarPopOperation>(LogicalType::DOUBLE,
LogicalType::DOUBLE);
}
AggregateFunction VarSampFun::GetFunction() {
return AggregateFunction::UnaryAggregate<StddevState, double, double, VarSampOperation>(LogicalType::DOUBLE,
LogicalType::DOUBLE);
}
AggregateFunction StandardErrorOfTheMeanFun::GetFunction() {
return AggregateFunction::UnaryAggregate<StddevState, double, double, StandardErrorOfTheMeanOperation>(
LogicalType::DOUBLE, LogicalType::DOUBLE);
}
} // namespace duckdb


@@ -0,0 +1,16 @@
add_library_unity(
duckdb_core_functions_distributive
OBJECT
kurtosis.cpp
string_agg.cpp
sum.cpp
arg_min_max.cpp
approx_count.cpp
skew.cpp
bitagg.cpp
bitstring_agg.cpp
product.cpp
bool.cpp)
set(CORE_FUNCTION_FILES
${CORE_FUNCTION_FILES} $<TARGET_OBJECTS:duckdb_core_functions_distributive>
PARENT_SCOPE)


@@ -0,0 +1,103 @@
#include "duckdb/common/exception.hpp"
#include "duckdb/common/types/hash.hpp"
#include "duckdb/common/types/hyperloglog.hpp"
#include "core_functions/aggregate/distributive_functions.hpp"
#include "duckdb/function/function_set.hpp"
#include "duckdb/planner/expression/bound_aggregate_expression.hpp"
#include "hyperloglog.hpp"
namespace duckdb {
// Algorithms from
// "New cardinality estimation algorithms for HyperLogLog sketches"
// Otmar Ertl, arXiv:1702.01284
namespace {
struct ApproxDistinctCountState {
HyperLogLog hll;
};
struct ApproxCountDistinctFunction {
template <class STATE>
static void Initialize(STATE &state) {
new (&state) STATE();
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
target.hll.Merge(source.hll);
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
target = UnsafeNumericCast<T>(state.hll.Count());
}
static bool IgnoreNull() {
return true;
}
};
void ApproxCountDistinctSimpleUpdateFunction(Vector inputs[], AggregateInputData &, idx_t input_count, data_ptr_t state,
idx_t count) {
D_ASSERT(input_count == 1);
auto &input = inputs[0];
if (count > STANDARD_VECTOR_SIZE) {
throw InternalException("ApproxCountDistinct - count must be at most vector size");
}
Vector hash_vec(LogicalType::HASH, count);
VectorOperations::Hash(input, hash_vec, count);
auto agg_state = reinterpret_cast<ApproxDistinctCountState *>(state);
agg_state->hll.Update(input, hash_vec, count);
}
void ApproxCountDistinctUpdateFunction(Vector inputs[], AggregateInputData &, idx_t input_count, Vector &state_vector,
idx_t count) {
D_ASSERT(input_count == 1);
auto &input = inputs[0];
UnifiedVectorFormat idata;
input.ToUnifiedFormat(count, idata);
if (count > STANDARD_VECTOR_SIZE) {
throw InternalException("ApproxCountDistinct - count must be at most vector size");
}
Vector hash_vec(LogicalType::HASH, count);
VectorOperations::Hash(input, hash_vec, count);
UnifiedVectorFormat sdata;
state_vector.ToUnifiedFormat(count, sdata);
const auto states = UnifiedVectorFormat::GetDataNoConst<ApproxDistinctCountState *>(sdata);
UnifiedVectorFormat hdata;
hash_vec.ToUnifiedFormat(count, hdata);
const auto *hashes = UnifiedVectorFormat::GetData<hash_t>(hdata);
for (idx_t i = 0; i < count; i++) {
if (idata.validity.RowIsValid(idata.sel->get_index(i))) {
auto agg_state = states[sdata.sel->get_index(i)];
const auto hash = hashes[hdata.sel->get_index(i)];
agg_state->hll.InsertElement(hash);
}
}
}
AggregateFunction GetApproxCountDistinctFunction(const LogicalType &input_type) {
auto fun = AggregateFunction(
{input_type}, LogicalTypeId::BIGINT, AggregateFunction::StateSize<ApproxDistinctCountState>,
AggregateFunction::StateInitialize<ApproxDistinctCountState, ApproxCountDistinctFunction>,
ApproxCountDistinctUpdateFunction,
AggregateFunction::StateCombine<ApproxDistinctCountState, ApproxCountDistinctFunction>,
AggregateFunction::StateFinalize<ApproxDistinctCountState, int64_t, ApproxCountDistinctFunction>,
ApproxCountDistinctSimpleUpdateFunction);
fun.null_handling = FunctionNullHandling::SPECIAL_HANDLING;
return fun;
}
} // namespace
AggregateFunction ApproxCountDistinctFun::GetFunction() {
return GetApproxCountDistinctFunction(LogicalType::ANY);
}
} // namespace duckdb


@@ -0,0 +1,929 @@
#include "duckdb/common/exception.hpp"
#include "duckdb/common/operator/comparison_operators.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "core_functions/aggregate/distributive_functions.hpp"
#include "duckdb/function/cast/cast_function_set.hpp"
#include "duckdb/function/function_set.hpp"
#include "duckdb/planner/expression/bound_aggregate_expression.hpp"
#include "duckdb/planner/expression/bound_comparison_expression.hpp"
#include "duckdb/planner/expression_binder.hpp"
#include "duckdb/function/create_sort_key.hpp"
#include "duckdb/function/aggregate/minmax_n_helpers.hpp"
namespace duckdb {
namespace {
struct ArgMinMaxStateBase {
ArgMinMaxStateBase() : is_initialized(false), arg_null(false), val_null(false) {
}
template <class T>
static inline void CreateValue(T &value) {
}
template <class T>
static inline void AssignValue(T &target, T new_value, AggregateInputData &aggregate_input_data) {
target = new_value;
}
template <typename T>
static inline void ReadValue(Vector &result, T &arg, T &target) {
target = arg;
}
bool is_initialized;
bool arg_null;
bool val_null;
};
// Out-of-line specialisations
template <>
void ArgMinMaxStateBase::CreateValue(string_t &value) {
value = string_t(uint32_t(0));
}
template <>
void ArgMinMaxStateBase::AssignValue(string_t &target, string_t new_value, AggregateInputData &aggregate_input_data) {
if (new_value.IsInlined()) {
target = new_value;
} else {
// non-inlined string, need to allocate space for it
auto len = new_value.GetSize();
char *ptr;
if (!target.IsInlined() && target.GetSize() >= len) {
// Target has enough space, reuse ptr
ptr = target.GetPointer();
} else {
// Target might be too small, allocate
ptr = reinterpret_cast<char *>(aggregate_input_data.allocator.Allocate(len));
}
memcpy(ptr, new_value.GetData(), len);
target = string_t(ptr, UnsafeNumericCast<uint32_t>(len));
}
}
template <>
void ArgMinMaxStateBase::ReadValue(Vector &result, string_t &arg, string_t &target) {
target = StringVector::AddStringOrBlob(result, arg);
}
template <class A, class B>
struct ArgMinMaxState : public ArgMinMaxStateBase {
using ARG_TYPE = A;
using BY_TYPE = B;
ARG_TYPE arg;
BY_TYPE value;
ArgMinMaxState() {
CreateValue(arg);
CreateValue(value);
}
};
template <class COMPARATOR>
struct ArgMinMaxBase {
template <class STATE>
static void Initialize(STATE &state) {
new (&state) STATE;
}
template <class STATE>
static void Destroy(STATE &state, AggregateInputData &aggr_input_data) {
state.~STATE();
}
template <class A_TYPE, class B_TYPE, class STATE>
static void Assign(STATE &state, const A_TYPE &x, const B_TYPE &y, const bool x_null, const bool y_null,
AggregateInputData &aggregate_input_data) {
D_ASSERT(aggregate_input_data.bind_data);
const auto &bind_data = aggregate_input_data.bind_data->Cast<ArgMinMaxFunctionData>();
if (bind_data.null_handling == ArgMinMaxNullHandling::IGNORE_ANY_NULL) {
STATE::template AssignValue<A_TYPE>(state.arg, x, aggregate_input_data);
STATE::template AssignValue<B_TYPE>(state.value, y, aggregate_input_data);
} else {
state.arg_null = x_null;
state.val_null = y_null;
if (!state.arg_null) {
STATE::template AssignValue<A_TYPE>(state.arg, x, aggregate_input_data);
}
if (!state.val_null) {
STATE::template AssignValue<B_TYPE>(state.value, y, aggregate_input_data);
}
}
}
template <class A_TYPE, class B_TYPE, class STATE, class OP>
static void Operation(STATE &state, const A_TYPE &x, const B_TYPE &y, AggregateBinaryInput &binary) {
D_ASSERT(binary.input.bind_data);
const auto &bind_data = binary.input.bind_data->Cast<ArgMinMaxFunctionData>();
if (!state.is_initialized) {
if (bind_data.null_handling == ArgMinMaxNullHandling::IGNORE_ANY_NULL &&
binary.left_mask.RowIsValid(binary.lidx) && binary.right_mask.RowIsValid(binary.ridx)) {
Assign(state, x, y, !binary.left_mask.RowIsValid(binary.lidx),
!binary.right_mask.RowIsValid(binary.ridx), binary.input);
state.is_initialized = true;
return;
}
if (bind_data.null_handling == ArgMinMaxNullHandling::HANDLE_ARG_NULL &&
binary.right_mask.RowIsValid(binary.ridx)) {
Assign(state, x, y, !binary.left_mask.RowIsValid(binary.lidx),
!binary.right_mask.RowIsValid(binary.ridx), binary.input);
state.is_initialized = true;
return;
}
if (bind_data.null_handling == ArgMinMaxNullHandling::HANDLE_ANY_NULL) {
Assign(state, x, y, !binary.left_mask.RowIsValid(binary.lidx),
!binary.right_mask.RowIsValid(binary.ridx), binary.input);
state.is_initialized = true;
}
} else {
OP::template Execute<A_TYPE, B_TYPE, STATE>(state, x, y, binary);
}
}
template <class A_TYPE, class B_TYPE, class STATE>
static void Execute(STATE &state, A_TYPE x_data, B_TYPE y_data, AggregateBinaryInput &binary) {
D_ASSERT(binary.input.bind_data);
const auto &bind_data = binary.input.bind_data->Cast<ArgMinMaxFunctionData>();
if (binary.right_mask.RowIsValid(binary.ridx) && COMPARATOR::Operation(y_data, state.value)) {
if (bind_data.null_handling != ArgMinMaxNullHandling::IGNORE_ANY_NULL ||
binary.left_mask.RowIsValid(binary.lidx)) {
Assign(state, x_data, y_data, !binary.left_mask.RowIsValid(binary.lidx), false, binary.input);
}
}
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &aggregate_input_data) {
if (!source.is_initialized) {
return;
}
if (!target.is_initialized || target.val_null ||
(!source.val_null && COMPARATOR::Operation(source.value, target.value))) {
Assign(target, source.arg, source.value, source.arg_null, false, aggregate_input_data);
target.is_initialized = true;
}
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (!state.is_initialized || state.arg_null) {
finalize_data.ReturnNull();
} else {
STATE::template ReadValue<T>(finalize_data.result, state.arg, target);
}
}
static bool IgnoreNull() {
return false;
}
template <ArgMinMaxNullHandling NULL_HANDLING>
static unique_ptr<FunctionData> Bind(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
if (arguments[1]->return_type.InternalType() == PhysicalType::VARCHAR) {
ExpressionBinder::PushCollation(context, arguments[1], arguments[1]->return_type);
}
function.arguments[0] = arguments[0]->return_type;
function.return_type = arguments[0]->return_type;
auto function_data = make_uniq<ArgMinMaxFunctionData>(NULL_HANDLING);
return unique_ptr<FunctionData>(std::move(function_data));
}
};
struct SpecializedGenericArgMinMaxState {
static bool CreateExtraState(idx_t count) {
// nop extra state
return false;
}
static void PrepareData(Vector &by, idx_t count, bool &, UnifiedVectorFormat &result) {
by.ToUnifiedFormat(count, result);
}
};
template <OrderType ORDER_TYPE>
struct GenericArgMinMaxState {
static Vector CreateExtraState(idx_t count) {
return Vector(LogicalType::BLOB, count);
}
static void PrepareData(Vector &by, idx_t count, Vector &extra_state, UnifiedVectorFormat &result) {
OrderModifiers modifiers(ORDER_TYPE, OrderByNullType::NULLS_LAST);
CreateSortKeyHelpers::CreateSortKeyWithValidity(by, extra_state, modifiers, count);
extra_state.ToUnifiedFormat(count, result);
}
};
template <typename COMPARATOR, OrderType ORDER_TYPE, class UPDATE_TYPE = SpecializedGenericArgMinMaxState>
struct VectorArgMinMaxBase : ArgMinMaxBase<COMPARATOR> {
template <class STATE>
static void Update(Vector inputs[], AggregateInputData &aggregate_input_data, idx_t input_count,
Vector &state_vector, idx_t count) {
D_ASSERT(aggregate_input_data.bind_data);
const auto &bind_data = aggregate_input_data.bind_data->Cast<ArgMinMaxFunctionData>();
auto &arg = inputs[0];
UnifiedVectorFormat adata;
arg.ToUnifiedFormat(count, adata);
using ARG_TYPE = typename STATE::ARG_TYPE;
using BY_TYPE = typename STATE::BY_TYPE;
auto &by = inputs[1];
UnifiedVectorFormat bdata;
auto extra_state = UPDATE_TYPE::CreateExtraState(count);
UPDATE_TYPE::PrepareData(by, count, extra_state, bdata);
const auto bys = UnifiedVectorFormat::GetData<BY_TYPE>(bdata);
UnifiedVectorFormat sdata;
state_vector.ToUnifiedFormat(count, sdata);
STATE *last_state = nullptr;
sel_t assign_sel[STANDARD_VECTOR_SIZE];
idx_t assign_count = 0;
auto states = UnifiedVectorFormat::GetData<STATE *>(sdata);
for (idx_t i = 0; i < count; i++) {
const auto sidx = sdata.sel->get_index(i);
auto &state = *states[sidx];
const auto aidx = adata.sel->get_index(i);
const auto arg_null = !adata.validity.RowIsValid(aidx);
if (bind_data.null_handling == ArgMinMaxNullHandling::IGNORE_ANY_NULL && arg_null) {
continue;
}
const auto bidx = bdata.sel->get_index(i);
if (!bdata.validity.RowIsValid(bidx)) {
if (bind_data.null_handling == ArgMinMaxNullHandling::HANDLE_ANY_NULL && !state.is_initialized) {
state.is_initialized = true;
state.val_null = true;
if (!arg_null) {
if (&state == last_state) {
assign_count--;
}
assign_sel[assign_count++] = UnsafeNumericCast<sel_t>(i);
last_state = &state;
}
}
continue;
}
const auto bval = bys[bidx];
if (!state.is_initialized || state.val_null || COMPARATOR::template Operation<BY_TYPE>(bval, state.value)) {
STATE::template AssignValue<BY_TYPE>(state.value, bval, aggregate_input_data);
state.arg_null = arg_null;
// micro-adaptivity: it is common we overwrite the same state repeatedly
// e.g. when running arg_max(val, ts) and ts is sorted in ascending order
// this check essentially says:
// "if we are overriding the same state as the last row, the last write was pointless"
// hence we skip the last write altogether
if (!arg_null) {
if (&state == last_state) {
assign_count--;
}
assign_sel[assign_count++] = UnsafeNumericCast<sel_t>(i);
last_state = &state;
}
state.is_initialized = true;
}
}
if (assign_count == 0) {
// no need to assign anything: nothing left to do
return;
}
Vector sort_key(LogicalType::BLOB);
auto modifiers = OrderModifiers(ORDER_TYPE, OrderByNullType::NULLS_LAST);
// slice with a selection vector and generate sort keys
SelectionVector sel(assign_sel);
Vector sliced_input(arg, sel, assign_count);
CreateSortKeyHelpers::CreateSortKey(sliced_input, assign_count, modifiers, sort_key);
auto sort_key_data = FlatVector::GetData<string_t>(sort_key);
// now assign sort keys
for (idx_t i = 0; i < assign_count; i++) {
const auto sidx = sdata.sel->get_index(sel.get_index(i));
auto &state = *states[sidx];
STATE::template AssignValue<ARG_TYPE>(state.arg, sort_key_data[i], aggregate_input_data);
}
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &aggregate_input_data) {
if (!source.is_initialized) {
return;
}
if (!target.is_initialized || target.val_null ||
(!source.val_null && COMPARATOR::Operation(source.value, target.value))) {
target.val_null = source.val_null;
if (!target.val_null) {
STATE::template AssignValue<typename STATE::BY_TYPE>(target.value, source.value, aggregate_input_data);
}
target.arg_null = source.arg_null;
if (!target.arg_null) {
STATE::template AssignValue<typename STATE::ARG_TYPE>(target.arg, source.arg, aggregate_input_data);
}
target.is_initialized = true;
}
}
template <class STATE>
static void Finalize(STATE &state, AggregateFinalizeData &finalize_data) {
if (!state.is_initialized || state.arg_null) {
finalize_data.ReturnNull();
} else {
CreateSortKeyHelpers::DecodeSortKey(state.arg, finalize_data.result, finalize_data.result_idx,
OrderModifiers(ORDER_TYPE, OrderByNullType::NULLS_LAST));
}
}
template <ArgMinMaxNullHandling NULL_HANDLING>
static unique_ptr<FunctionData> Bind(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
if (arguments[1]->return_type.InternalType() == PhysicalType::VARCHAR) {
ExpressionBinder::PushCollation(context, arguments[1], arguments[1]->return_type);
}
function.arguments[0] = arguments[0]->return_type;
function.return_type = arguments[0]->return_type;
auto function_data = make_uniq<ArgMinMaxFunctionData>(NULL_HANDLING);
return unique_ptr<FunctionData>(std::move(function_data));
}
};
template <class OP>
bind_aggregate_function_t GetBindFunction(const ArgMinMaxNullHandling null_handling) {
switch (null_handling) {
case ArgMinMaxNullHandling::HANDLE_ARG_NULL:
return OP::template Bind<ArgMinMaxNullHandling::HANDLE_ARG_NULL>;
case ArgMinMaxNullHandling::HANDLE_ANY_NULL:
return OP::template Bind<ArgMinMaxNullHandling::HANDLE_ANY_NULL>;
default:
return OP::template Bind<ArgMinMaxNullHandling::IGNORE_ANY_NULL>;
}
}
template <class OP>
AggregateFunction GetGenericArgMinMaxFunction(const ArgMinMaxNullHandling null_handling) {
using STATE = ArgMinMaxState<string_t, string_t>;
auto bind = GetBindFunction<OP>(null_handling);
return AggregateFunction(
{LogicalType::ANY, LogicalType::ANY}, LogicalType::ANY, AggregateFunction::StateSize<STATE>,
AggregateFunction::StateInitialize<STATE, OP, AggregateDestructorType::LEGACY>, OP::template Update<STATE>,
AggregateFunction::StateCombine<STATE, OP>, AggregateFunction::StateVoidFinalize<STATE, OP>, nullptr, bind,
AggregateFunction::StateDestroy<STATE, OP>);
}
template <class OP, class ARG_TYPE, class BY_TYPE>
AggregateFunction GetVectorArgMinMaxFunctionInternal(const LogicalType &by_type, const LogicalType &type,
const ArgMinMaxNullHandling null_handling) {
#ifndef DUCKDB_SMALLER_BINARY
using STATE = ArgMinMaxState<ARG_TYPE, BY_TYPE>;
auto bind = GetBindFunction<OP>(null_handling);
return AggregateFunction({type, by_type}, type, AggregateFunction::StateSize<STATE>,
AggregateFunction::StateInitialize<STATE, OP, AggregateDestructorType::LEGACY>,
OP::template Update<STATE>, AggregateFunction::StateCombine<STATE, OP>,
AggregateFunction::StateVoidFinalize<STATE, OP>, nullptr, bind,
AggregateFunction::StateDestroy<STATE, OP>);
#else
auto function = GetGenericArgMinMaxFunction<OP>(null_handling);
function.arguments = {type, by_type};
function.return_type = type;
return function;
#endif
}
#ifndef DUCKDB_SMALLER_BINARY
template <class OP, class ARG_TYPE>
AggregateFunction GetVectorArgMinMaxFunctionBy(const LogicalType &by_type, const LogicalType &type,
const ArgMinMaxNullHandling null_handling) {
switch (by_type.InternalType()) {
case PhysicalType::INT32:
return GetVectorArgMinMaxFunctionInternal<OP, ARG_TYPE, int32_t>(by_type, type, null_handling);
case PhysicalType::INT64:
return GetVectorArgMinMaxFunctionInternal<OP, ARG_TYPE, int64_t>(by_type, type, null_handling);
case PhysicalType::INT128:
return GetVectorArgMinMaxFunctionInternal<OP, ARG_TYPE, hugeint_t>(by_type, type, null_handling);
case PhysicalType::DOUBLE:
return GetVectorArgMinMaxFunctionInternal<OP, ARG_TYPE, double>(by_type, type, null_handling);
case PhysicalType::VARCHAR:
return GetVectorArgMinMaxFunctionInternal<OP, ARG_TYPE, string_t>(by_type, type, null_handling);
default:
throw InternalException("Unimplemented arg_min/arg_max aggregate");
}
}
#endif
const vector<LogicalType> ArgMaxByTypes() {
vector<LogicalType> types = {LogicalType::INTEGER, LogicalType::BIGINT, LogicalType::HUGEINT,
LogicalType::DOUBLE, LogicalType::VARCHAR, LogicalType::DATE,
LogicalType::TIMESTAMP, LogicalType::TIMESTAMP_TZ, LogicalType::BLOB};
return types;
}
template <class OP, class ARG_TYPE>
void AddVectorArgMinMaxFunctionBy(AggregateFunctionSet &fun, const LogicalType &type,
const ArgMinMaxNullHandling null_handling) {
auto by_types = ArgMaxByTypes();
for (const auto &by_type : by_types) {
#ifndef DUCKDB_SMALLER_BINARY
fun.AddFunction(GetVectorArgMinMaxFunctionBy<OP, ARG_TYPE>(by_type, type, null_handling));
#else
fun.AddFunction(GetVectorArgMinMaxFunctionInternal<OP, string_t, string_t>(by_type, type, null_handling));
#endif
}
}
template <class OP, class ARG_TYPE, class BY_TYPE>
AggregateFunction GetArgMinMaxFunctionInternal(const LogicalType &by_type, const LogicalType &type,
const ArgMinMaxNullHandling null_handling) {
#ifndef DUCKDB_SMALLER_BINARY
using STATE = ArgMinMaxState<ARG_TYPE, BY_TYPE>;
auto function =
AggregateFunction::BinaryAggregate<STATE, ARG_TYPE, BY_TYPE, ARG_TYPE, OP, AggregateDestructorType::LEGACY>(
type, by_type, type);
if (type.InternalType() == PhysicalType::VARCHAR || by_type.InternalType() == PhysicalType::VARCHAR) {
function.destructor = AggregateFunction::StateDestroy<STATE, OP>;
}
function.bind = GetBindFunction<OP>(null_handling);
#else
auto function = GetGenericArgMinMaxFunction<OP>(null_handling);
function.arguments = {type, by_type};
function.return_type = type;
#endif
return function;
}
#ifndef DUCKDB_SMALLER_BINARY
template <class OP, class ARG_TYPE>
AggregateFunction GetArgMinMaxFunctionBy(const LogicalType &by_type, const LogicalType &type,
const ArgMinMaxNullHandling null_handling) {
switch (by_type.InternalType()) {
case PhysicalType::INT32:
return GetArgMinMaxFunctionInternal<OP, ARG_TYPE, int32_t>(by_type, type, null_handling);
case PhysicalType::INT64:
return GetArgMinMaxFunctionInternal<OP, ARG_TYPE, int64_t>(by_type, type, null_handling);
case PhysicalType::INT128:
return GetArgMinMaxFunctionInternal<OP, ARG_TYPE, hugeint_t>(by_type, type, null_handling);
case PhysicalType::DOUBLE:
return GetArgMinMaxFunctionInternal<OP, ARG_TYPE, double>(by_type, type, null_handling);
case PhysicalType::VARCHAR:
return GetArgMinMaxFunctionInternal<OP, ARG_TYPE, string_t>(by_type, type, null_handling);
default:
throw InternalException("Unimplemented arg_min/arg_max by aggregate");
}
}
#endif
template <class OP, class ARG_TYPE>
void AddArgMinMaxFunctionBy(AggregateFunctionSet &fun, const LogicalType &type, ArgMinMaxNullHandling null_handling) {
auto by_types = ArgMaxByTypes();
for (const auto &by_type : by_types) {
#ifndef DUCKDB_SMALLER_BINARY
fun.AddFunction(GetArgMinMaxFunctionBy<OP, ARG_TYPE>(by_type, type, null_handling));
#else
fun.AddFunction(GetArgMinMaxFunctionInternal<OP, string_t, string_t>(by_type, type, null_handling));
#endif
}
}
template <class OP>
AggregateFunction GetDecimalArgMinMaxFunction(const LogicalType &by_type, const LogicalType &type,
ArgMinMaxNullHandling null_handling) {
D_ASSERT(type.id() == LogicalTypeId::DECIMAL);
#ifndef DUCKDB_SMALLER_BINARY
switch (type.InternalType()) {
case PhysicalType::INT16:
return GetArgMinMaxFunctionBy<OP, int16_t>(by_type, type, null_handling);
case PhysicalType::INT32:
return GetArgMinMaxFunctionBy<OP, int32_t>(by_type, type, null_handling);
case PhysicalType::INT64:
return GetArgMinMaxFunctionBy<OP, int64_t>(by_type, type, null_handling);
default:
return GetArgMinMaxFunctionBy<OP, hugeint_t>(by_type, type, null_handling);
}
#else
return GetArgMinMaxFunctionInternal<OP, string_t, string_t>(by_type, type, null_handling);
#endif
}
template <class OP, ArgMinMaxNullHandling NULL_HANDLING>
unique_ptr<FunctionData> BindDecimalArgMinMax(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
auto decimal_type = arguments[0]->return_type;
auto by_type = arguments[1]->return_type;
// To avoid a combinatorial explosion, cast the ordering argument to one from the list
auto by_types = ArgMaxByTypes();
idx_t best_target = DConstants::INVALID_INDEX;
int64_t lowest_cost = NumericLimits<int64_t>::Maximum();
for (idx_t i = 0; i < by_types.size(); ++i) {
// Before falling back to casting, check for a physical type match for the by_type
if (by_types[i].InternalType() == by_type.InternalType()) {
lowest_cost = 0;
best_target = DConstants::INVALID_INDEX;
break;
}
auto cast_cost = CastFunctionSet::ImplicitCastCost(context, by_type, by_types[i]);
if (cast_cost < 0) {
continue;
}
if (cast_cost < lowest_cost) {
best_target = i;
}
}
if (best_target != DConstants::INVALID_INDEX) {
by_type = by_types[best_target];
}
auto name = std::move(function.name);
function = GetDecimalArgMinMaxFunction<OP>(by_type, decimal_type, NULL_HANDLING);
function.name = std::move(name);
function.return_type = decimal_type;
auto function_data = make_uniq<ArgMinMaxFunctionData>(NULL_HANDLING);
return unique_ptr<FunctionData>(std::move(function_data));
}
template <class OP>
void AddDecimalArgMinMaxFunctionBy(AggregateFunctionSet &fun, const LogicalType &by_type,
const ArgMinMaxNullHandling null_handling) {
switch (null_handling) {
case ArgMinMaxNullHandling::IGNORE_ANY_NULL:
fun.AddFunction(AggregateFunction({LogicalTypeId::DECIMAL, by_type}, LogicalTypeId::DECIMAL, nullptr, nullptr,
nullptr, nullptr, nullptr, nullptr,
BindDecimalArgMinMax<OP, ArgMinMaxNullHandling::IGNORE_ANY_NULL>));
break;
case ArgMinMaxNullHandling::HANDLE_ARG_NULL:
fun.AddFunction(AggregateFunction({LogicalTypeId::DECIMAL, by_type}, LogicalTypeId::DECIMAL, nullptr, nullptr,
nullptr, nullptr, nullptr, nullptr,
BindDecimalArgMinMax<OP, ArgMinMaxNullHandling::HANDLE_ARG_NULL>));
break;
case ArgMinMaxNullHandling::HANDLE_ANY_NULL:
fun.AddFunction(AggregateFunction({LogicalTypeId::DECIMAL, by_type}, LogicalTypeId::DECIMAL, nullptr, nullptr,
nullptr, nullptr, nullptr, nullptr,
BindDecimalArgMinMax<OP, ArgMinMaxNullHandling::HANDLE_ANY_NULL>));
break;
}
}
template <class OP>
void AddGenericArgMinMaxFunction(AggregateFunctionSet &fun, const ArgMinMaxNullHandling null_handling) {
fun.AddFunction(GetGenericArgMinMaxFunction<OP>(null_handling));
}
template <class COMPARATOR, OrderType ORDER_TYPE>
void AddArgMinMaxFunctions(AggregateFunctionSet &fun, const ArgMinMaxNullHandling null_handling) {
using GENERIC_VECTOR_OP = VectorArgMinMaxBase<LessThan, ORDER_TYPE, GenericArgMinMaxState<ORDER_TYPE>>;
#ifndef DUCKDB_SMALLER_BINARY
using OP = ArgMinMaxBase<COMPARATOR>;
using VECTOR_OP = VectorArgMinMaxBase<COMPARATOR, ORDER_TYPE>;
#else
using OP = GENERIC_VECTOR_OP;
using VECTOR_OP = GENERIC_VECTOR_OP;
#endif
AddArgMinMaxFunctionBy<OP, int32_t>(fun, LogicalType::INTEGER, null_handling);
AddArgMinMaxFunctionBy<OP, int64_t>(fun, LogicalType::BIGINT, null_handling);
AddArgMinMaxFunctionBy<OP, double>(fun, LogicalType::DOUBLE, null_handling);
AddArgMinMaxFunctionBy<OP, string_t>(fun, LogicalType::VARCHAR, null_handling);
AddArgMinMaxFunctionBy<OP, date_t>(fun, LogicalType::DATE, null_handling);
AddArgMinMaxFunctionBy<OP, timestamp_t>(fun, LogicalType::TIMESTAMP, null_handling);
AddArgMinMaxFunctionBy<OP, timestamp_t>(fun, LogicalType::TIMESTAMP_TZ, null_handling);
AddArgMinMaxFunctionBy<OP, string_t>(fun, LogicalType::BLOB, null_handling);
auto by_types = ArgMaxByTypes();
for (const auto &by_type : by_types) {
AddDecimalArgMinMaxFunctionBy<OP>(fun, by_type, null_handling);
}
AddVectorArgMinMaxFunctionBy<VECTOR_OP, string_t>(fun, LogicalType::ANY, null_handling);
// we always use LessThan when using sort keys because the ORDER_TYPE takes care of selecting the lowest or highest
AddGenericArgMinMaxFunction<GENERIC_VECTOR_OP>(fun, null_handling);
}
//------------------------------------------------------------------------------
// ArgMinMax(N) Function
//------------------------------------------------------------------------------
//------------------------------------------------------------------------------
// State
//------------------------------------------------------------------------------
template <class A, class B, class COMPARATOR>
class ArgMinMaxNState {
public:
using VAL_TYPE = A;
using ARG_TYPE = B;
using V = typename VAL_TYPE::TYPE;
using K = typename ARG_TYPE::TYPE;
BinaryAggregateHeap<K, V, COMPARATOR> heap;
bool is_initialized = false;
void Initialize(ArenaAllocator &allocator, idx_t nval) {
heap.Initialize(allocator, nval);
is_initialized = true;
}
};
//------------------------------------------------------------------------------
// Operation
//------------------------------------------------------------------------------
template <class STATE>
void ArgMinMaxNUpdate(Vector inputs[], AggregateInputData &aggr_input, idx_t input_count, Vector &state_vector,
idx_t count) {
D_ASSERT(aggr_input.bind_data);
const auto &bind_data = aggr_input.bind_data->Cast<ArgMinMaxFunctionData>();
auto &val_vector = inputs[0];
auto &arg_vector = inputs[1];
auto &n_vector = inputs[2];
UnifiedVectorFormat val_format;
UnifiedVectorFormat arg_format;
UnifiedVectorFormat n_format;
UnifiedVectorFormat state_format;
auto val_extra_state = STATE::VAL_TYPE::CreateExtraState(val_vector, count);
auto arg_extra_state = STATE::ARG_TYPE::CreateExtraState(arg_vector, count);
STATE::VAL_TYPE::PrepareData(val_vector, count, val_extra_state, val_format, bind_data.nulls_last);
STATE::ARG_TYPE::PrepareData(arg_vector, count, arg_extra_state, arg_format, bind_data.nulls_last);
n_vector.ToUnifiedFormat(count, n_format);
state_vector.ToUnifiedFormat(count, state_format);
auto states = UnifiedVectorFormat::GetData<STATE *>(state_format);
for (idx_t i = 0; i < count; i++) {
const auto arg_idx = arg_format.sel->get_index(i);
const auto val_idx = val_format.sel->get_index(i);
if (bind_data.null_handling == ArgMinMaxNullHandling::IGNORE_ANY_NULL &&
(!arg_format.validity.RowIsValid(arg_idx) || !val_format.validity.RowIsValid(val_idx))) {
continue;
}
if (bind_data.null_handling == ArgMinMaxNullHandling::HANDLE_ARG_NULL &&
!val_format.validity.RowIsValid(val_idx)) {
continue;
}
const auto state_idx = state_format.sel->get_index(i);
auto &state = *states[state_idx];
// Initialize the heap if necessary and add the input to the heap
if (!state.is_initialized) {
static constexpr int64_t MAX_N = 1000000;
const auto nidx = n_format.sel->get_index(i);
if (!n_format.validity.RowIsValid(nidx)) {
throw InvalidInputException("Invalid input for arg_min/arg_max: n value cannot be NULL");
}
const auto nval = UnifiedVectorFormat::GetData<int64_t>(n_format)[nidx];
if (nval <= 0) {
throw InvalidInputException("Invalid input for arg_min/arg_max: n value must be > 0");
}
if (nval >= MAX_N) {
throw InvalidInputException("Invalid input for arg_min/arg_max: n value must be < %d", MAX_N);
}
state.Initialize(aggr_input.allocator, UnsafeNumericCast<idx_t>(nval));
}
// Now add the input to the heap
auto arg_val = STATE::ARG_TYPE::Create(arg_format, arg_idx);
auto val_val = STATE::VAL_TYPE::Create(val_format, val_idx);
state.heap.Insert(aggr_input.allocator, arg_val, val_val);
}
}
//------------------------------------------------------------------------------
// Bind
//------------------------------------------------------------------------------
template <class VAL_TYPE, class ARG_TYPE, class COMPARATOR>
void SpecializeArgMinMaxNFunction(AggregateFunction &function) {
using STATE = ArgMinMaxNState<VAL_TYPE, ARG_TYPE, COMPARATOR>;
using OP = MinMaxNOperation;
function.state_size = AggregateFunction::StateSize<STATE>;
function.initialize = AggregateFunction::StateInitialize<STATE, OP, AggregateDestructorType::LEGACY>;
function.combine = AggregateFunction::StateCombine<STATE, OP>;
function.destructor = AggregateFunction::StateDestroy<STATE, OP>;
function.finalize = MinMaxNOperation::Finalize<STATE>;
function.update = ArgMinMaxNUpdate<STATE>;
}
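// The overloads below dispatch on the physical types of the two input columns to pick a
// fixed-width heap entry where possible; any other type falls back to the generic
// sort-key representation (MinMaxFallbackValue).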
template <class VAL_TYPE, class COMPARATOR>
void SpecializeArgMinMaxNFunction(PhysicalType arg_type, AggregateFunction &function) {
switch (arg_type) {
#ifndef DUCKDB_SMALLER_BINARY
case PhysicalType::VARCHAR:
SpecializeArgMinMaxNFunction<VAL_TYPE, MinMaxStringValue, COMPARATOR>(function);
break;
case PhysicalType::INT32:
SpecializeArgMinMaxNFunction<VAL_TYPE, MinMaxFixedValue<int32_t>, COMPARATOR>(function);
break;
case PhysicalType::INT64:
SpecializeArgMinMaxNFunction<VAL_TYPE, MinMaxFixedValue<int64_t>, COMPARATOR>(function);
break;
case PhysicalType::FLOAT:
SpecializeArgMinMaxNFunction<VAL_TYPE, MinMaxFixedValue<float>, COMPARATOR>(function);
break;
case PhysicalType::DOUBLE:
SpecializeArgMinMaxNFunction<VAL_TYPE, MinMaxFixedValue<double>, COMPARATOR>(function);
break;
#endif
default:
SpecializeArgMinMaxNFunction<VAL_TYPE, MinMaxFallbackValue, COMPARATOR>(function);
break;
}
}
template <class COMPARATOR>
void SpecializeArgMinMaxNFunction(PhysicalType val_type, PhysicalType arg_type, AggregateFunction &function) {
switch (val_type) {
#ifndef DUCKDB_SMALLER_BINARY
case PhysicalType::VARCHAR:
SpecializeArgMinMaxNFunction<MinMaxStringValue, COMPARATOR>(arg_type, function);
break;
case PhysicalType::INT32:
SpecializeArgMinMaxNFunction<MinMaxFixedValue<int32_t>, COMPARATOR>(arg_type, function);
break;
case PhysicalType::INT64:
SpecializeArgMinMaxNFunction<MinMaxFixedValue<int64_t>, COMPARATOR>(arg_type, function);
break;
case PhysicalType::FLOAT:
SpecializeArgMinMaxNFunction<MinMaxFixedValue<float>, COMPARATOR>(arg_type, function);
break;
case PhysicalType::DOUBLE:
SpecializeArgMinMaxNFunction<MinMaxFixedValue<double>, COMPARATOR>(arg_type, function);
break;
#endif
default:
SpecializeArgMinMaxNFunction<MinMaxFallbackValue, COMPARATOR>(arg_type, function);
break;
}
}
template <class VAL_TYPE, class ARG_TYPE, class COMPARATOR>
void SpecializeArgMinMaxNullNFunction(AggregateFunction &function) {
using STATE = ArgMinMaxNState<VAL_TYPE, ARG_TYPE, COMPARATOR>;
using OP = MinMaxNOperation;
function.state_size = AggregateFunction::StateSize<STATE>;
function.initialize = AggregateFunction::StateInitialize<STATE, OP, AggregateDestructorType::LEGACY>;
function.combine = AggregateFunction::StateCombine<STATE, OP>;
function.destructor = AggregateFunction::StateDestroy<STATE, OP>;
function.finalize = MinMaxNOperation::Finalize<STATE>;
function.update = ArgMinMaxNUpdate<STATE>;
}
template <class VAL_TYPE, bool NULLS_LAST, class COMPARATOR>
void SpecializeArgMinMaxNullNFunction(PhysicalType arg_type, AggregateFunction &function) {
switch (arg_type) {
#ifndef DUCKDB_SMALLER_BINARY
case PhysicalType::VARCHAR:
SpecializeArgMinMaxNullNFunction<VAL_TYPE, MinMaxFallbackValue, COMPARATOR>(function);
break;
case PhysicalType::INT32:
SpecializeArgMinMaxNullNFunction<VAL_TYPE, MinMaxFixedValueOrNull<int32_t, NULLS_LAST>, COMPARATOR>(function);
break;
case PhysicalType::INT64:
SpecializeArgMinMaxNullNFunction<VAL_TYPE, MinMaxFixedValueOrNull<int64_t, NULLS_LAST>, COMPARATOR>(function);
break;
case PhysicalType::FLOAT:
SpecializeArgMinMaxNullNFunction<VAL_TYPE, MinMaxFixedValueOrNull<float, NULLS_LAST>, COMPARATOR>(function);
break;
case PhysicalType::DOUBLE:
SpecializeArgMinMaxNullNFunction<VAL_TYPE, MinMaxFixedValueOrNull<double, NULLS_LAST>, COMPARATOR>(function);
break;
#endif
default:
SpecializeArgMinMaxNullNFunction<VAL_TYPE, MinMaxFallbackValue, COMPARATOR>(function);
break;
}
}
template <bool NULLS_LAST, class COMPARATOR>
void SpecializeArgMinMaxNullNFunction(PhysicalType val_type, PhysicalType arg_type, AggregateFunction &function) {
switch (val_type) {
#ifndef DUCKDB_SMALLER_BINARY
case PhysicalType::VARCHAR:
SpecializeArgMinMaxNullNFunction<MinMaxFallbackValue, NULLS_LAST, COMPARATOR>(arg_type, function);
break;
case PhysicalType::INT32:
SpecializeArgMinMaxNullNFunction<MinMaxFixedValueOrNull<int32_t, NULLS_LAST>, NULLS_LAST, COMPARATOR>(arg_type,
function);
break;
case PhysicalType::INT64:
SpecializeArgMinMaxNullNFunction<MinMaxFixedValueOrNull<int64_t, NULLS_LAST>, NULLS_LAST, COMPARATOR>(arg_type,
function);
break;
case PhysicalType::FLOAT:
SpecializeArgMinMaxNullNFunction<MinMaxFixedValueOrNull<float, NULLS_LAST>, NULLS_LAST, COMPARATOR>(arg_type,
function);
break;
case PhysicalType::DOUBLE:
SpecializeArgMinMaxNullNFunction<MinMaxFixedValueOrNull<double, NULLS_LAST>, NULLS_LAST, COMPARATOR>(arg_type,
function);
break;
#endif
default:
SpecializeArgMinMaxNullNFunction<MinMaxFallbackValue, NULLS_LAST, COMPARATOR>(arg_type, function);
break;
}
}
template <ArgMinMaxNullHandling NULL_HANDLING, bool NULLS_LAST, class COMPARATOR>
unique_ptr<FunctionData> ArgMinMaxNBind(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
for (auto &arg : arguments) {
if (arg->return_type.id() == LogicalTypeId::UNKNOWN) {
throw ParameterNotResolvedException();
}
}
const auto val_type = arguments[0]->return_type.InternalType();
const auto arg_type = arguments[1]->return_type.InternalType();
function.return_type = LogicalType::LIST(arguments[0]->return_type);
// Specialize the function based on the input types
auto function_data = make_uniq<ArgMinMaxFunctionData>(NULL_HANDLING, NULLS_LAST);
if (NULL_HANDLING != ArgMinMaxNullHandling::IGNORE_ANY_NULL) {
SpecializeArgMinMaxNullNFunction<NULLS_LAST, COMPARATOR>(val_type, arg_type, function);
} else {
SpecializeArgMinMaxNFunction<COMPARATOR>(val_type, arg_type, function);
}
return unique_ptr<FunctionData>(std::move(function_data));
}
template <ArgMinMaxNullHandling NULL_HANDLING, bool NULLS_LAST, class COMPARATOR>
void AddArgMinMaxNFunction(AggregateFunctionSet &set) {
AggregateFunction function({LogicalTypeId::ANY, LogicalTypeId::ANY, LogicalType::BIGINT},
LogicalType::LIST(LogicalType::ANY), nullptr, nullptr, nullptr, nullptr, nullptr,
nullptr, ArgMinMaxNBind<NULL_HANDLING, NULLS_LAST, COMPARATOR>);
	set.AddFunction(function);
}
} // namespace
//------------------------------------------------------------------------------
// Function Registration
//------------------------------------------------------------------------------
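// Illustrative SQL usage (example names are not part of this file): arg_min(arg, val) returns
// the value of "arg" at the row with the minimum "val"; arg_min(arg, val, n) returns a LIST of
// the "arg" values at the n smallest "val"s, e.g.
//   SELECT arg_min(name, price), arg_min(name, price, 3) FROM items;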
AggregateFunctionSet ArgMinFun::GetFunctions() {
AggregateFunctionSet fun;
AddArgMinMaxFunctions<LessThan, OrderType::ASCENDING>(fun, ArgMinMaxNullHandling::IGNORE_ANY_NULL);
AddArgMinMaxNFunction<ArgMinMaxNullHandling::IGNORE_ANY_NULL, true, LessThan>(fun);
return fun;
}
AggregateFunctionSet ArgMaxFun::GetFunctions() {
AggregateFunctionSet fun;
AddArgMinMaxFunctions<GreaterThan, OrderType::DESCENDING>(fun, ArgMinMaxNullHandling::IGNORE_ANY_NULL);
AddArgMinMaxNFunction<ArgMinMaxNullHandling::IGNORE_ANY_NULL, false, GreaterThan>(fun);
return fun;
}
AggregateFunctionSet ArgMinNullFun::GetFunctions() {
AggregateFunctionSet fun;
AddArgMinMaxFunctions<LessThan, OrderType::ASCENDING>(fun, ArgMinMaxNullHandling::HANDLE_ARG_NULL);
return fun;
}
AggregateFunctionSet ArgMaxNullFun::GetFunctions() {
AggregateFunctionSet fun;
AddArgMinMaxFunctions<GreaterThan, OrderType::DESCENDING>(fun, ArgMinMaxNullHandling::HANDLE_ARG_NULL);
return fun;
}
AggregateFunctionSet ArgMinNullsLastFun::GetFunctions() {
AggregateFunctionSet fun;
AddArgMinMaxFunctions<LessThan, OrderType::ASCENDING>(fun, ArgMinMaxNullHandling::HANDLE_ANY_NULL);
AddArgMinMaxNFunction<ArgMinMaxNullHandling::HANDLE_ANY_NULL, true, LessThan>(fun);
return fun;
}
AggregateFunctionSet ArgMaxNullsLastFun::GetFunctions() {
AggregateFunctionSet fun;
AddArgMinMaxFunctions<GreaterThan, OrderType::DESCENDING>(fun, ArgMinMaxNullHandling::HANDLE_ANY_NULL);
AddArgMinMaxNFunction<ArgMinMaxNullHandling::HANDLE_ANY_NULL, false, GreaterThan>(fun);
return fun;
}
} // namespace duckdb

View File

@@ -0,0 +1,235 @@
#include "core_functions/aggregate/distributive_functions.hpp"
#include "duckdb/common/exception.hpp"
#include "duckdb/common/types/null_value.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "duckdb/common/vector_operations/aggregate_executor.hpp"
#include "duckdb/common/types/bit.hpp"
#include "duckdb/common/types/cast_helpers.hpp"
namespace duckdb {
namespace {
template <class T>
struct BitState {
using TYPE = T;
bool is_set;
T value;
};
template <class OP>
AggregateFunction GetBitfieldUnaryAggregate(LogicalType type) {
switch (type.id()) {
case LogicalTypeId::TINYINT:
return AggregateFunction::UnaryAggregate<BitState<uint8_t>, int8_t, int8_t, OP>(type, type);
case LogicalTypeId::SMALLINT:
return AggregateFunction::UnaryAggregate<BitState<uint16_t>, int16_t, int16_t, OP>(type, type);
case LogicalTypeId::INTEGER:
return AggregateFunction::UnaryAggregate<BitState<uint32_t>, int32_t, int32_t, OP>(type, type);
case LogicalTypeId::BIGINT:
return AggregateFunction::UnaryAggregate<BitState<uint64_t>, int64_t, int64_t, OP>(type, type);
case LogicalTypeId::HUGEINT:
return AggregateFunction::UnaryAggregate<BitState<hugeint_t>, hugeint_t, hugeint_t, OP>(type, type);
case LogicalTypeId::UTINYINT:
return AggregateFunction::UnaryAggregate<BitState<uint8_t>, uint8_t, uint8_t, OP>(type, type);
case LogicalTypeId::USMALLINT:
return AggregateFunction::UnaryAggregate<BitState<uint16_t>, uint16_t, uint16_t, OP>(type, type);
case LogicalTypeId::UINTEGER:
return AggregateFunction::UnaryAggregate<BitState<uint32_t>, uint32_t, uint32_t, OP>(type, type);
case LogicalTypeId::UBIGINT:
return AggregateFunction::UnaryAggregate<BitState<uint64_t>, uint64_t, uint64_t, OP>(type, type);
case LogicalTypeId::UHUGEINT:
return AggregateFunction::UnaryAggregate<BitState<uhugeint_t>, uhugeint_t, uhugeint_t, OP>(type, type);
default:
throw InternalException("Unimplemented bitfield type for unary aggregate");
}
}
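// Shared skeleton for bit_and / bit_or / bit_xor: the first value seen is assigned to the
// state, every subsequent value is folded in through the derived OP::Execute, and Combine
// merges partial states with the same operator. A state that never saw a value finalizes
// to NULL.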
struct BitwiseOperation {
template <class STATE>
static void Initialize(STATE &state) {
// If there are no matching rows, returns a null value.
state.is_set = false;
}
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &) {
if (!state.is_set) {
OP::template Assign<INPUT_TYPE>(state, input);
state.is_set = true;
} else {
OP::template Execute<INPUT_TYPE>(state, input);
}
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input,
idx_t count) {
OP::template Operation<INPUT_TYPE, STATE, OP>(state, input, unary_input);
}
template <class INPUT_TYPE, class STATE>
static void Assign(STATE &state, INPUT_TYPE input) {
state.value = typename STATE::TYPE(input);
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
if (!source.is_set) {
// source is NULL, nothing to do.
return;
}
if (!target.is_set) {
// target is NULL, use source value directly.
OP::template Assign<typename STATE::TYPE>(target, source.value);
target.is_set = true;
} else {
OP::template Execute<typename STATE::TYPE>(target, source.value);
}
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (!state.is_set) {
finalize_data.ReturnNull();
} else {
target = T(state.value);
}
}
static bool IgnoreNull() {
return true;
}
};
struct BitAndOperation : public BitwiseOperation {
template <class INPUT_TYPE, class STATE>
static void Execute(STATE &state, INPUT_TYPE input) {
state.value &= typename STATE::TYPE(input);
}
};
struct BitOrOperation : public BitwiseOperation {
template <class INPUT_TYPE, class STATE>
static void Execute(STATE &state, INPUT_TYPE input) {
state.value |= typename STATE::TYPE(input);
}
};
struct BitXorOperation : public BitwiseOperation {
template <class INPUT_TYPE, class STATE>
static void Execute(STATE &state, INPUT_TYPE input) {
state.value ^= typename STATE::TYPE(input);
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input,
idx_t count) {
for (idx_t i = 0; i < count; i++) {
Operation<INPUT_TYPE, STATE, OP>(state, input, unary_input);
}
}
};
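// Variant of the bitwise operations for the BIT type: non-inlined bitstrings are copied into
// heap memory owned by the state (released again in Destroy), and Finalize hands the result
// back through the string heap of the result vector.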
struct BitStringBitwiseOperation : public BitwiseOperation {
template <class STATE>
static void Destroy(STATE &state, AggregateInputData &aggr_input_data) {
if (state.is_set && !state.value.IsInlined()) {
delete[] state.value.GetData();
}
}
template <class INPUT_TYPE, class STATE>
static void Assign(STATE &state, INPUT_TYPE input) {
D_ASSERT(state.is_set == false);
if (input.IsInlined()) {
state.value = input;
} else { // non-inlined string, need to allocate space for it
auto len = input.GetSize();
auto ptr = new char[len];
memcpy(ptr, input.GetData(), len);
state.value = string_t(ptr, UnsafeNumericCast<uint32_t>(len));
}
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (!state.is_set) {
finalize_data.ReturnNull();
} else {
target = finalize_data.ReturnString(state.value);
}
}
};
struct BitStringAndOperation : public BitStringBitwiseOperation {
template <class INPUT_TYPE, class STATE>
static void Execute(STATE &state, INPUT_TYPE input) {
Bit::BitwiseAnd(input, state.value, state.value);
}
};
struct BitStringOrOperation : public BitStringBitwiseOperation {
template <class INPUT_TYPE, class STATE>
static void Execute(STATE &state, INPUT_TYPE input) {
Bit::BitwiseOr(input, state.value, state.value);
}
};
struct BitStringXorOperation : public BitStringBitwiseOperation {
template <class INPUT_TYPE, class STATE>
static void Execute(STATE &state, INPUT_TYPE input) {
Bit::BitwiseXor(input, state.value, state.value);
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input,
idx_t count) {
for (idx_t i = 0; i < count; i++) {
Operation<INPUT_TYPE, STATE, OP>(state, input, unary_input);
}
}
};
} // namespace
AggregateFunctionSet BitAndFun::GetFunctions() {
AggregateFunctionSet bit_and;
for (auto &type : LogicalType::Integral()) {
bit_and.AddFunction(GetBitfieldUnaryAggregate<BitAndOperation>(type));
}
bit_and.AddFunction(
AggregateFunction::UnaryAggregateDestructor<BitState<string_t>, string_t, string_t, BitStringAndOperation>(
LogicalType::BIT, LogicalType::BIT));
return bit_and;
}
AggregateFunctionSet BitOrFun::GetFunctions() {
AggregateFunctionSet bit_or;
for (auto &type : LogicalType::Integral()) {
bit_or.AddFunction(GetBitfieldUnaryAggregate<BitOrOperation>(type));
}
bit_or.AddFunction(
AggregateFunction::UnaryAggregateDestructor<BitState<string_t>, string_t, string_t, BitStringOrOperation>(
LogicalType::BIT, LogicalType::BIT));
return bit_or;
}
AggregateFunctionSet BitXorFun::GetFunctions() {
AggregateFunctionSet bit_xor;
for (auto &type : LogicalType::Integral()) {
bit_xor.AddFunction(GetBitfieldUnaryAggregate<BitXorOperation>(type));
}
bit_xor.AddFunction(
AggregateFunction::UnaryAggregateDestructor<BitState<string_t>, string_t, string_t, BitStringXorOperation>(
LogicalType::BIT, LogicalType::BIT));
return bit_xor;
}
} // namespace duckdb

View File

@@ -0,0 +1,324 @@
#include "core_functions/aggregate/distributive_functions.hpp"
#include "duckdb/common/exception.hpp"
#include "duckdb/common/types/null_value.hpp"
#include "duckdb/common/vector_operations/aggregate_executor.hpp"
#include "duckdb/common/types/bit.hpp"
#include "duckdb/common/types/uhugeint.hpp"
#include "duckdb/storage/statistics/base_statistics.hpp"
#include "duckdb/execution/expression_executor.hpp"
#include "duckdb/common/types/cast_helpers.hpp"
#include "duckdb/common/operator/subtract.hpp"
#include "duckdb/common/serializer/deserializer.hpp"
#include "duckdb/common/serializer/serializer.hpp"
namespace duckdb {
namespace {
template <class INPUT_TYPE>
struct BitAggState {
bool is_set;
string_t value;
INPUT_TYPE min;
INPUT_TYPE max;
};
struct BitstringAggBindData : public FunctionData {
Value min;
Value max;
BitstringAggBindData() {
}
BitstringAggBindData(Value min, Value max) : min(std::move(min)), max(std::move(max)) {
}
unique_ptr<FunctionData> Copy() const override {
return make_uniq<BitstringAggBindData>(*this);
}
bool Equals(const FunctionData &other_p) const override {
auto &other = other_p.Cast<BitstringAggBindData>();
if (min.IsNull() && other.min.IsNull() && max.IsNull() && other.max.IsNull()) {
return true;
}
if (Value::NotDistinctFrom(min, other.min) && Value::NotDistinctFrom(max, other.max)) {
return true;
}
return false;
}
static void Serialize(Serializer &serializer, const optional_ptr<FunctionData> bind_data_p,
const AggregateFunction &) {
auto &bind_data = bind_data_p->Cast<BitstringAggBindData>();
serializer.WriteProperty(100, "min", bind_data.min);
serializer.WriteProperty(101, "max", bind_data.max);
}
static unique_ptr<FunctionData> Deserialize(Deserializer &deserializer, AggregateFunction &) {
Value min;
Value max;
deserializer.ReadProperty(100, "min", min);
deserializer.ReadProperty(101, "max", max);
return make_uniq<BitstringAggBindData>(min, max);
}
};
struct BitStringAggOperation {
static constexpr const idx_t MAX_BIT_RANGE = 1000000000; // for now capped at 1 billion bits
template <class STATE>
static void Initialize(STATE &state) {
state.is_set = false;
}
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input) {
auto &bind_agg_data = unary_input.input.bind_data->template Cast<BitstringAggBindData>();
if (!state.is_set) {
if (bind_agg_data.min.IsNull() || bind_agg_data.max.IsNull()) {
throw BinderException(
"Could not retrieve required statistics. Alternatively, try by providing the statistics "
"explicitly: BITSTRING_AGG(col, min, max) ");
}
state.min = bind_agg_data.min.GetValue<INPUT_TYPE>();
state.max = bind_agg_data.max.GetValue<INPUT_TYPE>();
if (state.min > state.max) {
throw InvalidInputException("Invalid explicit bitstring range: Minimum (%s) > maximum (%s)",
NumericHelper::ToString(state.min), NumericHelper::ToString(state.max));
}
idx_t bit_range =
GetRange(bind_agg_data.min.GetValue<INPUT_TYPE>(), bind_agg_data.max.GetValue<INPUT_TYPE>());
if (bit_range > MAX_BIT_RANGE) {
throw OutOfRangeException(
"The range between min and max value (%s <-> %s) is too large for bitstring aggregation",
NumericHelper::ToString(state.min), NumericHelper::ToString(state.max));
}
idx_t len = Bit::ComputeBitstringLen(bit_range);
auto target = len > string_t::INLINE_LENGTH ? string_t(new char[len], UnsafeNumericCast<uint32_t>(len))
: string_t(UnsafeNumericCast<uint32_t>(len));
Bit::SetEmptyBitString(target, bit_range);
state.value = target;
state.is_set = true;
}
if (input >= state.min && input <= state.max) {
Execute(state, input, bind_agg_data.min.GetValue<INPUT_TYPE>());
} else {
throw OutOfRangeException("Value %s is outside of provided min and max range (%s <-> %s)",
NumericHelper::ToString(input), NumericHelper::ToString(state.min),
NumericHelper::ToString(state.max));
}
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input,
idx_t count) {
OP::template Operation<INPUT_TYPE, STATE, OP>(state, input, unary_input);
}
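	// Number of distinct positions in the range [min, max]. The result saturates to the idx_t
	// maximum when the subtraction overflows, which the MAX_BIT_RANGE check above then rejects.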
template <class INPUT_TYPE>
static idx_t GetRange(INPUT_TYPE min, INPUT_TYPE max) {
if (min > max) {
throw InvalidInputException("Invalid explicit bitstring range: Minimum (%d) > maximum (%d)", min, max);
}
INPUT_TYPE result;
if (!TrySubtractOperator::Operation(max, min, result)) {
return NumericLimits<idx_t>::Maximum();
}
auto val = NumericCast<idx_t>(result);
if (val == NumericLimits<idx_t>::Maximum()) {
return val;
}
return val + 1;
}
template <class INPUT_TYPE, class STATE>
static void Execute(STATE &state, INPUT_TYPE input, INPUT_TYPE min) {
Bit::SetBit(state.value, UnsafeNumericCast<idx_t>(input - min), 1);
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
if (!source.is_set) {
return;
}
if (!target.is_set) {
Assign(target, source.value);
target.is_set = true;
target.min = source.min;
target.max = source.max;
} else {
Bit::BitwiseOr(source.value, target.value, target.value);
}
}
template <class INPUT_TYPE, class STATE>
static void Assign(STATE &state, INPUT_TYPE input) {
D_ASSERT(state.is_set == false);
if (input.IsInlined()) {
state.value = input;
} else { // non-inlined string, need to allocate space for it
auto len = input.GetSize();
auto ptr = new char[len];
memcpy(ptr, input.GetData(), len);
state.value = string_t(ptr, UnsafeNumericCast<uint32_t>(len));
}
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (!state.is_set) {
finalize_data.ReturnNull();
} else {
target = StringVector::AddStringOrBlob(finalize_data.result, state.value);
}
}
template <class STATE>
static void Destroy(STATE &state, AggregateInputData &aggr_input_data) {
if (state.is_set && !state.value.IsInlined()) {
delete[] state.value.GetData();
}
}
static bool IgnoreNull() {
return true;
}
};
template <>
void BitStringAggOperation::Execute(BitAggState<hugeint_t> &state, hugeint_t input, hugeint_t min) {
idx_t val;
if (Hugeint::TryCast(input - min, val)) {
Bit::SetBit(state.value, val, 1);
} else {
throw OutOfRangeException("Range too large for bitstring aggregation");
}
}
template <>
idx_t BitStringAggOperation::GetRange(hugeint_t min, hugeint_t max) {
hugeint_t result;
if (!TrySubtractOperator::Operation(max, min, result)) {
return NumericLimits<idx_t>::Maximum();
}
idx_t range;
if (!Hugeint::TryCast(result + 1, range) || result == NumericLimits<hugeint_t>::Maximum()) {
return NumericLimits<idx_t>::Maximum();
}
return range;
}
template <>
void BitStringAggOperation::Execute(BitAggState<uhugeint_t> &state, uhugeint_t input, uhugeint_t min) {
idx_t val;
if (Uhugeint::TryCast(input - min, val)) {
Bit::SetBit(state.value, val, 1);
} else {
throw OutOfRangeException("Range too large for bitstring aggregation");
}
}
template <>
idx_t BitStringAggOperation::GetRange(uhugeint_t min, uhugeint_t max) {
uhugeint_t result;
if (!TrySubtractOperator::Operation(max, min, result)) {
return NumericLimits<idx_t>::Maximum();
}
idx_t range;
if (!Uhugeint::TryCast(result + 1, range) || result == NumericLimits<uhugeint_t>::Maximum()) {
return NumericLimits<idx_t>::Maximum();
}
return range;
}
unique_ptr<BaseStatistics> BitstringPropagateStats(ClientContext &context, BoundAggregateExpression &expr,
AggregateStatisticsInput &input) {
if (NumericStats::HasMinMax(input.child_stats[0])) {
auto &bind_agg_data = input.bind_data->Cast<BitstringAggBindData>();
bind_agg_data.min = NumericStats::Min(input.child_stats[0]);
bind_agg_data.max = NumericStats::Max(input.child_stats[0]);
}
return nullptr;
}
unique_ptr<FunctionData> BindBitstringAgg(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
if (arguments.size() == 3) {
if (!arguments[1]->IsFoldable() || !arguments[2]->IsFoldable()) {
throw BinderException("bitstring_agg requires a constant min and max argument");
}
auto min = ExpressionExecutor::EvaluateScalar(context, *arguments[1]);
auto max = ExpressionExecutor::EvaluateScalar(context, *arguments[2]);
Function::EraseArgument(function, arguments, 2);
Function::EraseArgument(function, arguments, 1);
return make_uniq<BitstringAggBindData>(min, max);
}
return make_uniq<BitstringAggBindData>();
}
template <class TYPE>
void BindBitString(AggregateFunctionSet &bitstring_agg, const LogicalTypeId &type) {
auto function =
AggregateFunction::UnaryAggregateDestructor<BitAggState<TYPE>, TYPE, string_t, BitStringAggOperation>(
type, LogicalType::BIT);
	function.bind = BindBitstringAgg; // creates a new 'BitstringAggBindData'
function.serialize = BitstringAggBindData::Serialize;
function.deserialize = BitstringAggBindData::Deserialize;
function.statistics = BitstringPropagateStats; // stores min and max from column stats in BitstringAggBindData
bitstring_agg.AddFunction(function); // uses the BitstringAggBindData to access statistics for creating bitstring
function.arguments = {type, type, type};
function.statistics = nullptr; // min and max are provided as arguments
bitstring_agg.AddFunction(function);
}
void GetBitStringAggregate(const LogicalType &type, AggregateFunctionSet &bitstring_agg) {
switch (type.id()) {
case LogicalType::TINYINT: {
return BindBitString<int8_t>(bitstring_agg, type.id());
}
case LogicalType::SMALLINT: {
return BindBitString<int16_t>(bitstring_agg, type.id());
}
case LogicalType::INTEGER: {
return BindBitString<int32_t>(bitstring_agg, type.id());
}
case LogicalType::BIGINT: {
return BindBitString<int64_t>(bitstring_agg, type.id());
}
case LogicalType::HUGEINT: {
return BindBitString<hugeint_t>(bitstring_agg, type.id());
}
case LogicalType::UTINYINT: {
return BindBitString<uint8_t>(bitstring_agg, type.id());
}
case LogicalType::USMALLINT: {
return BindBitString<uint16_t>(bitstring_agg, type.id());
}
case LogicalType::UINTEGER: {
return BindBitString<uint32_t>(bitstring_agg, type.id());
}
case LogicalType::UBIGINT: {
return BindBitString<uint64_t>(bitstring_agg, type.id());
}
case LogicalType::UHUGEINT: {
return BindBitString<uhugeint_t>(bitstring_agg, type.id());
}
default:
throw InternalException("Unimplemented bitstring aggregate");
}
}
} // namespace
AggregateFunctionSet BitstringAggFun::GetFunctions() {
AggregateFunctionSet bitstring_agg("bitstring_agg");
for (auto &type : LogicalType::Integral()) {
GetBitStringAggregate(type, bitstring_agg);
}
return bitstring_agg;
}
} // namespace duckdb

View File

@@ -0,0 +1,114 @@
#include "core_functions/aggregate/distributive_functions.hpp"
#include "duckdb/common/exception.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "duckdb/planner/expression/bound_aggregate_expression.hpp"
#include "duckdb/function/function_set.hpp"
namespace duckdb {
namespace {
struct BoolState {
bool empty;
bool val;
};
struct BoolAndFunFunction {
template <class STATE>
static void Initialize(STATE &state) {
state.val = true;
state.empty = true;
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
target.val = target.val && source.val;
target.empty = target.empty && source.empty;
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.empty) {
finalize_data.ReturnNull();
return;
}
target = state.val;
}
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input) {
state.empty = false;
state.val = input && state.val;
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input,
idx_t count) {
for (idx_t i = 0; i < count; i++) {
Operation<INPUT_TYPE, STATE, OP>(state, input, unary_input);
}
}
static bool IgnoreNull() {
return true;
}
};
struct BoolOrFunFunction {
template <class STATE>
static void Initialize(STATE &state) {
state.val = false;
state.empty = true;
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
target.val = target.val || source.val;
target.empty = target.empty && source.empty;
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.empty) {
finalize_data.ReturnNull();
return;
}
target = state.val;
}
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input) {
state.empty = false;
state.val = input || state.val;
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input,
idx_t count) {
for (idx_t i = 0; i < count; i++) {
Operation<INPUT_TYPE, STATE, OP>(state, input, unary_input);
}
}
static bool IgnoreNull() {
return true;
}
};
} // namespace
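// bool_or / bool_and are flagged as neither order- nor distinct-dependent: OR and AND over
// booleans yield the same result regardless of input order or duplicate values.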
AggregateFunction BoolOrFun::GetFunction() {
auto fun = AggregateFunction::UnaryAggregate<BoolState, bool, bool, BoolOrFunFunction>(
LogicalType(LogicalTypeId::BOOLEAN), LogicalType::BOOLEAN);
fun.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
fun.distinct_dependent = AggregateDistinctDependent::NOT_DISTINCT_DEPENDENT;
return fun;
}
AggregateFunction BoolAndFun::GetFunction() {
auto fun = AggregateFunction::UnaryAggregate<BoolState, bool, bool, BoolAndFunFunction>(
LogicalType(LogicalTypeId::BOOLEAN), LogicalType::BOOLEAN);
fun.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
fun.distinct_dependent = AggregateDistinctDependent::NOT_DISTINCT_DEPENDENT;
return fun;
}
} // namespace duckdb

View File

@@ -0,0 +1,168 @@
[
{
"name": "approx_count_distinct",
"parameters": "any",
"description": "Computes the approximate count of distinct elements using HyperLogLog.",
"example": "approx_count_distinct(A)",
"type": "aggregate_function"
},
{
"name": "arg_min",
"parameters": "arg,val",
"description": "Finds the row with the minimum val. Calculates the non-NULL arg expression at that row.",
"example": "arg_min(A, B)",
"type": "aggregate_function_set",
"aliases": ["argmin", "min_by"]
},
{
"name": "arg_min_null",
"parameters": "arg,val",
"description": "Finds the row with the minimum val. Calculates the arg expression at that row.",
"example": "arg_min_null(A, B)",
"type": "aggregate_function_set"
},
{
"name": "arg_min_nulls_last",
"parameters": "arg,val,N",
"description": "Finds the rows with N minimum vals, including nulls. Calculates the arg expression at that row.",
"example": "arg_min_null_val(A, B, N)",
"type": "aggregate_function_set"
},
{
"name": "arg_max",
"parameters": "arg,val",
"description": "Finds the row with the maximum val. Calculates the non-NULL arg expression at that row.",
"example": "arg_max(A, B)",
"type": "aggregate_function_set",
"aliases": ["argmax", "max_by"]
},
{
"name": "arg_max_null",
"parameters": "arg,val",
"description": "Finds the row with the maximum val. Calculates the arg expression at that row.",
"example": "arg_max_null(A, B)",
"type": "aggregate_function_set"
},
{
"name": "arg_max_nulls_last",
"parameters": "arg,val,N",
"description": "Finds the rows with N maximum vals, including nulls. Calculates the arg expression at that row.",
"example": "arg_min_null_val(A, B, N)",
"type": "aggregate_function_set"
},
{
"name": "bit_and",
"parameters": "arg",
"description": "Returns the bitwise AND of all bits in a given expression.",
"example": "bit_and(A)",
"type": "aggregate_function_set"
},
{
"name": "bit_or",
"parameters": "arg",
"description": "Returns the bitwise OR of all bits in a given expression.",
"example": "bit_or(A)",
"type": "aggregate_function_set"
},
{
"name": "bit_xor",
"parameters": "arg",
"description": "Returns the bitwise XOR of all bits in a given expression.",
"example": "bit_xor(A)",
"type": "aggregate_function_set"
},
{
"name": "bitstring_agg",
"parameters": "arg",
"description": "Returns a bitstring with bits set for each distinct value.",
"example": "bitstring_agg(A)",
"type": "aggregate_function_set"
},
{
"name": "bool_and",
"parameters": "arg",
"description": "Returns TRUE if every input value is TRUE, otherwise FALSE.",
"example": "bool_and(A)",
"type": "aggregate_function"
},
{
"name": "bool_or",
"parameters": "arg",
"description": "Returns TRUE if any input value is TRUE, otherwise FALSE.",
"example": "bool_or(A)",
"type": "aggregate_function"
},
{
"name": "count_if",
"parameters": "arg",
"description": "Counts the total number of TRUE values for a boolean column",
"example": "count_if(A)",
"type": "aggregate_function",
"aliases": ["countif"]
},
{
"name": "entropy",
"parameters": "x",
"description": "Returns the log-2 entropy of count input-values.",
"example": "",
"type": "aggregate_function_set"
},
{
"name": "kahan_sum",
"parameters": "arg",
"description": "Calculates the sum using a more accurate floating point summation (Kahan Sum).",
"example": "kahan_sum(A)",
"type": "aggregate_function",
"aliases": ["fsum", "sumkahan"]
},
{
"name": "kurtosis",
"parameters": "x",
"description": "Returns the excess kurtosis (Fishers definition) of all input values, with a bias correction according to the sample size",
"example": "",
"type": "aggregate_function"
},
{
"name": "kurtosis_pop",
"parameters": "x",
"description": "Returns the excess kurtosis (Fishers definition) of all input values, without bias correction",
"example": "",
"type": "aggregate_function"
},
{
"name": "product",
"parameters": "arg",
"description": "Calculates the product of all tuples in arg.",
"example": "product(A)",
"type": "aggregate_function"
},
{
"name": "skewness",
"parameters": "x",
"description": "Returns the skewness of all input values.",
"example": "skewness(A)",
"type": "aggregate_function"
},
{
"name": "string_agg",
"parameters": "str,arg",
"description": "Concatenates the column string values with an optional separator.",
"example": "string_agg(A, '-')",
"type": "aggregate_function_set",
"aliases": ["group_concat","listagg"]
},
{
"name": "sum",
"parameters": "arg",
"description": "Calculates the sum value for all tuples in arg.",
"example": "sum(A)",
"type": "aggregate_function_set"
},
{
"name": "sum_no_overflow",
"parameters": "arg",
"description": "Internal only. Calculates the sum value for all tuples in arg without overflow checks.",
"example": "sum_no_overflow(A)",
"type": "aggregate_function_set"
}
]

View File

@@ -0,0 +1,121 @@
#include "core_functions/aggregate/distributive_functions.hpp"
#include "duckdb/common/exception.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "duckdb/planner/expression/bound_aggregate_expression.hpp"
#include "duckdb/common/algorithm.hpp"
namespace duckdb {
namespace {
struct KurtosisState {
idx_t n;
double sum;
double sum_sqr;
double sum_cub;
double sum_four;
};
struct KurtosisFlagBiasCorrection {};
struct KurtosisFlagNoBiasCorrection {};
template <class KURTOSIS_FLAG>
struct KurtosisOperation {
template <class STATE>
static void Initialize(STATE &state) {
state.n = 0;
state.sum = state.sum_sqr = state.sum_cub = state.sum_four = 0.0;
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input,
idx_t count) {
for (idx_t i = 0; i < count; i++) {
Operation<INPUT_TYPE, STATE, OP>(state, input, unary_input);
}
}
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input) {
state.n++;
state.sum += input;
state.sum_sqr += pow(input, 2);
state.sum_cub += pow(input, 3);
state.sum_four += pow(input, 4);
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
if (source.n == 0) {
return;
}
target.n += source.n;
target.sum += source.sum;
target.sum_sqr += source.sum_sqr;
target.sum_cub += source.sum_cub;
target.sum_four += source.sum_four;
}
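	// Finalize derives the central moments m2 and m4 from the raw power sums and returns the
	// excess kurtosis m4 / m2^2 - 3; when KURTOSIS_FLAG requests bias correction, the standard
	// sample correction is applied instead (which additionally requires n > 3).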
template <class TARGET_TYPE, class STATE>
static void Finalize(STATE &state, TARGET_TYPE &target, AggregateFinalizeData &finalize_data) {
auto n = (double)state.n;
if (n <= 1) {
finalize_data.ReturnNull();
return;
}
if (std::is_same<KURTOSIS_FLAG, KurtosisFlagBiasCorrection>::value && n <= 3) {
finalize_data.ReturnNull();
return;
}
double temp = 1 / n;
		//! The long double variant is needed for correct results on 32-bit Linux builds
long double temp_aux = 1 / n;
if (state.sum_sqr - state.sum * state.sum * temp == 0 ||
state.sum_sqr - state.sum * state.sum * temp_aux == 0) {
finalize_data.ReturnNull();
return;
}
double m4 =
temp * (state.sum_four - 4 * state.sum_cub * state.sum * temp +
6 * state.sum_sqr * state.sum * state.sum * temp * temp - 3 * pow(state.sum, 4) * pow(temp, 3));
double m2 = temp * (state.sum_sqr - state.sum * state.sum * temp);
if (m2 <= 0) { // m2 shouldn't be below 0 but floating points are weird
finalize_data.ReturnNull();
return;
}
if (std::is_same<KURTOSIS_FLAG, KurtosisFlagNoBiasCorrection>::value) {
target = m4 / (m2 * m2) - 3;
} else {
target = (n - 1) * ((n + 1) * m4 / (m2 * m2) - 3 * (n - 1)) / ((n - 2) * (n - 3));
}
if (!Value::DoubleIsFinite(target)) {
throw OutOfRangeException("Kurtosis is out of range!");
}
}
static bool IgnoreNull() {
return true;
}
};
} // namespace
AggregateFunction KurtosisFun::GetFunction() {
auto result =
AggregateFunction::UnaryAggregate<KurtosisState, double, double, KurtosisOperation<KurtosisFlagBiasCorrection>>(
LogicalType::DOUBLE, LogicalType::DOUBLE);
result.errors = FunctionErrors::CAN_THROW_RUNTIME_ERROR;
return result;
}
AggregateFunction KurtosisPopFun::GetFunction() {
auto result = AggregateFunction::UnaryAggregate<KurtosisState, double, double,
KurtosisOperation<KurtosisFlagNoBiasCorrection>>(
LogicalType::DOUBLE, LogicalType::DOUBLE);
result.errors = FunctionErrors::CAN_THROW_RUNTIME_ERROR;
return result;
}
} // namespace duckdb

View File

@@ -0,0 +1,65 @@
#include "core_functions/aggregate/distributive_functions.hpp"
#include "duckdb/common/exception.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "duckdb/planner/expression/bound_aggregate_expression.hpp"
#include "duckdb/function/function_set.hpp"
namespace duckdb {
namespace {
struct ProductState {
bool empty;
double val;
};
struct ProductFunction {
template <class STATE>
static void Initialize(STATE &state) {
state.val = 1;
state.empty = true;
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
target.val *= source.val;
target.empty = target.empty && source.empty;
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.empty) {
finalize_data.ReturnNull();
return;
}
target = state.val;
}
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input) {
if (state.empty) {
state.empty = false;
}
state.val *= input;
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input,
idx_t count) {
for (idx_t i = 0; i < count; i++) {
Operation<INPUT_TYPE, STATE, OP>(state, input, unary_input);
}
}
static bool IgnoreNull() {
return true;
}
};
} // namespace
AggregateFunction ProductFun::GetFunction() {
return AggregateFunction::UnaryAggregate<ProductState, double, double, ProductFunction>(
LogicalType(LogicalTypeId::DOUBLE), LogicalType::DOUBLE);
}
} // namespace duckdb

View File

@@ -0,0 +1,90 @@
#include "core_functions/aggregate/distributive_functions.hpp"
#include "duckdb/common/exception.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "duckdb/planner/expression/bound_aggregate_expression.hpp"
#include "duckdb/common/algorithm.hpp"
namespace duckdb {
namespace {
struct SkewState {
size_t n;
double sum;
double sum_sqr;
double sum_cub;
};
struct SkewnessOperation {
template <class STATE>
static void Initialize(STATE &state) {
state.n = 0;
state.sum = state.sum_sqr = state.sum_cub = 0;
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input,
idx_t count) {
for (idx_t i = 0; i < count; i++) {
Operation<INPUT_TYPE, STATE, OP>(state, input, unary_input);
}
}
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input) {
state.n++;
state.sum += input;
state.sum_sqr += pow(input, 2);
state.sum_cub += pow(input, 3);
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
if (source.n == 0) {
return;
}
target.n += source.n;
target.sum += source.sum;
target.sum_sqr += source.sum_sqr;
target.sum_cub += source.sum_cub;
}
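	// Finalize computes the adjusted sample skewness sqrt(n * (n - 1)) / (n - 2) * m3 / m2^(3/2)
	// from the raw power sums; a zero denominator yields NaN instead of an error.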
template <class TARGET_TYPE, class STATE>
static void Finalize(STATE &state, TARGET_TYPE &target, AggregateFinalizeData &finalize_data) {
if (state.n <= 2) {
finalize_data.ReturnNull();
return;
}
double n = state.n;
double temp = 1 / n;
auto p = std::pow(temp * (state.sum_sqr - state.sum * state.sum * temp), 3);
if (p < 0) {
p = 0; // Shouldn't be below 0 but floating points are weird
}
double div = std::sqrt(p);
if (div == 0) {
target = NAN;
return;
}
double temp1 = std::sqrt(n * (n - 1)) / (n - 2);
target = temp1 * temp *
(state.sum_cub - 3 * state.sum_sqr * state.sum * temp + 2 * pow(state.sum, 3) * temp * temp) / div;
if (!Value::DoubleIsFinite(target)) {
throw OutOfRangeException("SKEW is out of range!");
}
}
static bool IgnoreNull() {
return true;
}
};
} // namespace
AggregateFunction SkewnessFun::GetFunction() {
return AggregateFunction::UnaryAggregate<SkewState, double, double, SkewnessOperation>(LogicalType::DOUBLE,
LogicalType::DOUBLE);
}
} // namespace duckdb

View File

@@ -0,0 +1,171 @@
#include "core_functions/aggregate/distributive_functions.hpp"
#include "duckdb/common/exception.hpp"
#include "duckdb/common/types/null_value.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "duckdb/common/algorithm.hpp"
#include "duckdb/execution/expression_executor.hpp"
#include "duckdb/planner/expression/bound_constant_expression.hpp"
#include "duckdb/common/serializer/serializer.hpp"
#include "duckdb/common/serializer/deserializer.hpp"
namespace duckdb {
namespace {
struct StringAggState {
idx_t size;
idx_t alloc_size;
char *dataptr;
};
struct StringAggBindData : public FunctionData {
explicit StringAggBindData(string sep_p) : sep(std::move(sep_p)) {
}
string sep;
unique_ptr<FunctionData> Copy() const override {
return make_uniq<StringAggBindData>(sep);
}
bool Equals(const FunctionData &other_p) const override {
auto &other = other_p.Cast<StringAggBindData>();
return sep == other.sep;
}
};
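// string_agg keeps one growing character buffer per state: the buffer capacity is doubled
// whenever the next value plus separator does not fit, and Combine appends the source buffer
// to the target, inserting the separator in between.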
struct StringAggFunction {
template <class STATE>
static void Initialize(STATE &state) {
state.dataptr = nullptr;
state.alloc_size = 0;
state.size = 0;
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (!state.dataptr) {
finalize_data.ReturnNull();
} else {
target = string_t(state.dataptr, state.size);
}
}
static bool IgnoreNull() {
return true;
}
static inline void PerformOperation(StringAggState &state, ArenaAllocator &allocator, const char *str,
const char *sep, idx_t str_size, idx_t sep_size) {
if (!state.dataptr) {
// first iteration: allocate space for the string and copy it into the state
state.alloc_size = MaxValue<idx_t>(8, NextPowerOfTwo(str_size));
state.dataptr = char_ptr_cast(allocator.Allocate(state.alloc_size));
state.size = str_size;
memcpy(state.dataptr, str, str_size);
} else {
// subsequent iteration: first check if we have space to place the string and separator
idx_t required_size = state.size + str_size + sep_size;
if (required_size > state.alloc_size) {
// no space! allocate extra space
const auto old_size = state.alloc_size;
while (state.alloc_size < required_size) {
state.alloc_size *= 2;
}
state.dataptr =
char_ptr_cast(allocator.Reallocate(data_ptr_cast(state.dataptr), old_size, state.alloc_size));
}
// copy the separator
memcpy(state.dataptr + state.size, sep, sep_size);
state.size += sep_size;
// copy the string
memcpy(state.dataptr + state.size, str, str_size);
state.size += str_size;
}
}
static inline void PerformOperation(StringAggState &state, ArenaAllocator &allocator, string_t str,
optional_ptr<FunctionData> data_p) {
auto &data = data_p->Cast<StringAggBindData>();
PerformOperation(state, allocator, str.GetData(), data.sep.c_str(), str.GetSize(), data.sep.size());
}
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input) {
PerformOperation(state, unary_input.input.allocator, input, unary_input.input.bind_data);
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input,
idx_t count) {
for (idx_t i = 0; i < count; i++) {
Operation<INPUT_TYPE, STATE, OP>(state, input, unary_input);
}
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &aggr_input_data) {
if (!source.dataptr) {
// source is not set: skip combining
return;
}
PerformOperation(target, aggr_input_data.allocator,
string_t(source.dataptr, UnsafeNumericCast<uint32_t>(source.size)), aggr_input_data.bind_data);
}
};
unique_ptr<FunctionData> StringAggBind(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
if (arguments.size() == 1) {
// single argument: default to comma
return make_uniq<StringAggBindData>(",");
}
D_ASSERT(arguments.size() == 2);
if (arguments[1]->HasParameter()) {
throw ParameterNotResolvedException();
}
if (!arguments[1]->IsFoldable()) {
throw BinderException("Separator argument to StringAgg must be a constant");
}
auto separator_val = ExpressionExecutor::EvaluateScalar(context, *arguments[1]);
string separator_string = ",";
if (separator_val.IsNull()) {
arguments[0] = make_uniq<BoundConstantExpression>(Value(LogicalType::VARCHAR));
} else {
separator_string = separator_val.ToString();
}
Function::EraseArgument(function, arguments, arguments.size() - 1);
return make_uniq<StringAggBindData>(std::move(separator_string));
}
void StringAggSerialize(Serializer &serializer, const optional_ptr<FunctionData> bind_data_p,
const AggregateFunction &function) {
auto bind_data = bind_data_p->Cast<StringAggBindData>();
serializer.WriteProperty(100, "separator", bind_data.sep);
}
unique_ptr<FunctionData> StringAggDeserialize(Deserializer &deserializer, AggregateFunction &bound_function) {
auto sep = deserializer.ReadProperty<string>(100, "separator");
return make_uniq<StringAggBindData>(std::move(sep));
}
} // namespace
AggregateFunctionSet StringAggFun::GetFunctions() {
AggregateFunctionSet string_agg;
AggregateFunction string_agg_param(
{LogicalType::ANY_PARAMS(LogicalType::VARCHAR)}, LogicalType::VARCHAR,
AggregateFunction::StateSize<StringAggState>,
AggregateFunction::StateInitialize<StringAggState, StringAggFunction>,
AggregateFunction::UnaryScatterUpdate<StringAggState, string_t, StringAggFunction>,
AggregateFunction::StateCombine<StringAggState, StringAggFunction>,
AggregateFunction::StateFinalize<StringAggState, string_t, StringAggFunction>,
AggregateFunction::UnaryUpdate<StringAggState, string_t, StringAggFunction>, StringAggBind);
string_agg_param.serialize = StringAggSerialize;
string_agg_param.deserialize = StringAggDeserialize;
string_agg.AddFunction(string_agg_param);
string_agg_param.arguments.emplace_back(LogicalType::VARCHAR);
string_agg.AddFunction(string_agg_param);
return string_agg;
}
} // namespace duckdb

View File

@@ -0,0 +1,309 @@
#include "core_functions/aggregate/distributive_functions.hpp"
#include "core_functions/aggregate/sum_helpers.hpp"
#include "duckdb/common/exception.hpp"
#include "duckdb/common/bignum.hpp"
#include "duckdb/common/types/decimal.hpp"
#include "duckdb/planner/expression/bound_aggregate_expression.hpp"
#include "duckdb/common/serializer/deserializer.hpp"
namespace duckdb {
namespace {
struct SumSetOperation {
template <class STATE>
static void Initialize(STATE &state) {
state.Initialize();
}
template <class STATE>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
target.Combine(source);
}
template <class STATE>
static void AddValues(STATE &state, idx_t count) {
state.isset = true;
}
};
struct IntegerSumOperation : public BaseSumOperation<SumSetOperation, RegularAdd> {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (!state.isset) {
finalize_data.ReturnNull();
} else {
target = Hugeint::Convert(state.value);
}
}
};
struct SumToHugeintOperation : public BaseSumOperation<SumSetOperation, AddToHugeint> {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (!state.isset) {
finalize_data.ReturnNull();
} else {
target = state.value;
}
}
};
template <class ADD_OPERATOR>
struct DoubleSumOperation : public BaseSumOperation<SumSetOperation, ADD_OPERATOR> {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (!state.isset) {
finalize_data.ReturnNull();
} else {
target = state.value;
}
}
};
using NumericSumOperation = DoubleSumOperation<RegularAdd>;
using KahanSumOperation = DoubleSumOperation<KahanAdd>;
struct HugeintSumOperation : public BaseSumOperation<SumSetOperation, HugeintAdd> {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (!state.isset) {
finalize_data.ReturnNull();
} else {
target = state.value;
}
}
};
unique_ptr<FunctionData> SumNoOverflowBind(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
throw BinderException("sum_no_overflow is for internal use only!");
}
void SumNoOverflowSerialize(Serializer &serializer, const optional_ptr<FunctionData> bind_data,
const AggregateFunction &function) {
return;
}
unique_ptr<FunctionData> SumNoOverflowDeserialize(Deserializer &deserializer, AggregateFunction &function) {
function.return_type = deserializer.Get<const LogicalType &>();
return nullptr;
}
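// sum_no_overflow accumulates into a plain int64_t state. It cannot be called directly
// (the bind function throws); instead SumPropagateStats below rewrites a regular sum into
// this variant when statistics prove the result fits in 64 bits.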
AggregateFunction GetSumAggregateNoOverflow(PhysicalType type) {
switch (type) {
case PhysicalType::INT32: {
auto function = AggregateFunction::UnaryAggregate<SumState<int64_t>, int32_t, hugeint_t, IntegerSumOperation>(
LogicalType::INTEGER, LogicalType::HUGEINT);
function.name = "sum_no_overflow";
function.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
function.bind = SumNoOverflowBind;
function.serialize = SumNoOverflowSerialize;
function.deserialize = SumNoOverflowDeserialize;
return function;
}
case PhysicalType::INT64: {
auto function = AggregateFunction::UnaryAggregate<SumState<int64_t>, int64_t, hugeint_t, IntegerSumOperation>(
LogicalType::BIGINT, LogicalType::HUGEINT);
function.name = "sum_no_overflow";
function.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
function.bind = SumNoOverflowBind;
function.serialize = SumNoOverflowSerialize;
function.deserialize = SumNoOverflowDeserialize;
return function;
}
default:
throw BinderException("Unsupported internal type for sum_no_overflow");
}
}
AggregateFunction GetSumAggregateNoOverflowDecimal() {
AggregateFunction aggr({LogicalTypeId::DECIMAL}, LogicalTypeId::DECIMAL, nullptr, nullptr, nullptr, nullptr,
nullptr, FunctionNullHandling::DEFAULT_NULL_HANDLING, nullptr, SumNoOverflowBind);
aggr.serialize = SumNoOverflowSerialize;
aggr.deserialize = SumNoOverflowDeserialize;
return aggr;
}
unique_ptr<BaseStatistics> SumPropagateStats(ClientContext &context, BoundAggregateExpression &expr,
AggregateStatisticsInput &input) {
if (input.node_stats && input.node_stats->has_max_cardinality) {
auto &numeric_stats = input.child_stats[0];
if (!NumericStats::HasMinMax(numeric_stats)) {
return nullptr;
}
auto internal_type = numeric_stats.GetType().InternalType();
hugeint_t max_negative;
hugeint_t max_positive;
switch (internal_type) {
case PhysicalType::INT32:
max_negative = NumericStats::Min(numeric_stats).GetValueUnsafe<int32_t>();
max_positive = NumericStats::Max(numeric_stats).GetValueUnsafe<int32_t>();
break;
case PhysicalType::INT64:
max_negative = NumericStats::Min(numeric_stats).GetValueUnsafe<int64_t>();
max_positive = NumericStats::Max(numeric_stats).GetValueUnsafe<int64_t>();
break;
default:
throw InternalException("Unsupported type for propagate sum stats");
}
auto max_sum_negative = max_negative * Hugeint::Convert(input.node_stats->max_cardinality);
auto max_sum_positive = max_positive * Hugeint::Convert(input.node_stats->max_cardinality);
if (max_sum_positive >= NumericLimits<int64_t>::Maximum() ||
max_sum_negative <= NumericLimits<int64_t>::Minimum()) {
// sum can potentially exceed int64_t bounds: use hugeint sum
return nullptr;
}
// total sum is guaranteed to fit in a single int64: use int64 sum instead of hugeint sum
expr.function = GetSumAggregateNoOverflow(internal_type);
}
return nullptr;
}
AggregateFunction GetSumAggregate(PhysicalType type) {
switch (type) {
case PhysicalType::BOOL: {
auto function = AggregateFunction::UnaryAggregate<SumState<int64_t>, bool, hugeint_t, IntegerSumOperation>(
LogicalType::BOOLEAN, LogicalType::HUGEINT);
function.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
return function;
}
case PhysicalType::INT16: {
auto function = AggregateFunction::UnaryAggregate<SumState<int64_t>, int16_t, hugeint_t, IntegerSumOperation>(
LogicalType::SMALLINT, LogicalType::HUGEINT);
function.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
return function;
}
case PhysicalType::INT32: {
auto function =
AggregateFunction::UnaryAggregate<SumState<hugeint_t>, int32_t, hugeint_t, SumToHugeintOperation>(
LogicalType::INTEGER, LogicalType::HUGEINT);
function.statistics = SumPropagateStats;
function.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
return function;
}
case PhysicalType::INT64: {
auto function =
AggregateFunction::UnaryAggregate<SumState<hugeint_t>, int64_t, hugeint_t, SumToHugeintOperation>(
LogicalType::BIGINT, LogicalType::HUGEINT);
function.statistics = SumPropagateStats;
function.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
return function;
}
case PhysicalType::INT128: {
auto function =
AggregateFunction::UnaryAggregate<SumState<hugeint_t>, hugeint_t, hugeint_t, HugeintSumOperation>(
LogicalType::HUGEINT, LogicalType::HUGEINT);
function.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
return function;
}
default:
throw InternalException("Unimplemented sum aggregate");
}
}
unique_ptr<FunctionData> BindDecimalSum(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
auto decimal_type = arguments[0]->return_type;
function = GetSumAggregate(decimal_type.InternalType());
function.name = "sum";
function.arguments[0] = decimal_type;
function.return_type = LogicalType::DECIMAL(Decimal::MAX_WIDTH_DECIMAL, DecimalType::GetScale(decimal_type));
function.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
return nullptr;
}
struct BignumState {
bool is_set;
BignumIntermediate value;
};
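// BIGNUM sum: values are accumulated in an arena-allocated variable-width integer
// intermediate, so the running total never overflows a fixed-width type.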
struct BignumOperation {
template <class STATE>
static void Initialize(STATE &state) {
state.is_set = false;
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input,
idx_t count) {
for (idx_t i = 0; i < count; i++) {
Operation<INPUT_TYPE, STATE, OP>(state, input, unary_input);
}
}
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input) {
if (!state.is_set) {
state.is_set = true;
state.value.Initialize(unary_input.input.allocator);
}
BignumIntermediate rhs(input);
state.value.AddInPlace(unary_input.input.allocator, rhs);
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &input) {
if (!source.is_set) {
return;
}
if (!target.is_set) {
target.value = source.value;
target.is_set = true;
return;
}
target.value.AddInPlace(input.allocator, source.value);
target.is_set = true;
}
template <class TARGET_TYPE, class STATE>
static void Finalize(STATE &state, TARGET_TYPE &target, AggregateFinalizeData &finalize_data) {
if (!state.is_set) {
finalize_data.ReturnNull();
} else {
target = state.value.ToBignum(finalize_data.input.allocator);
}
}
static bool IgnoreNull() {
return true;
}
};
} // namespace
AggregateFunctionSet SumFun::GetFunctions() {
AggregateFunctionSet sum;
// decimal
sum.AddFunction(AggregateFunction({LogicalTypeId::DECIMAL}, LogicalTypeId::DECIMAL, nullptr, nullptr, nullptr,
nullptr, nullptr, FunctionNullHandling::DEFAULT_NULL_HANDLING, nullptr,
BindDecimalSum));
sum.AddFunction(GetSumAggregate(PhysicalType::BOOL));
sum.AddFunction(GetSumAggregate(PhysicalType::INT16));
sum.AddFunction(GetSumAggregate(PhysicalType::INT32));
sum.AddFunction(GetSumAggregate(PhysicalType::INT64));
sum.AddFunction(GetSumAggregate(PhysicalType::INT128));
sum.AddFunction(AggregateFunction::UnaryAggregate<SumState<double>, double, double, NumericSumOperation>(
LogicalType::DOUBLE, LogicalType::DOUBLE));
sum.AddFunction(AggregateFunction::UnaryAggregate<BignumState, bignum_t, bignum_t, BignumOperation>(
LogicalType::BIGNUM, LogicalType::BIGNUM));
return sum;
}
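// count_if is implemented as SUM over BOOLEAN: every true input contributes 1 to the total.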
AggregateFunction CountIfFun::GetFunction() {
return GetSumAggregate(PhysicalType::BOOL);
}
AggregateFunctionSet SumNoOverflowFun::GetFunctions() {
AggregateFunctionSet sum_no_overflow;
sum_no_overflow.AddFunction(GetSumAggregateNoOverflow(PhysicalType::INT32));
sum_no_overflow.AddFunction(GetSumAggregateNoOverflow(PhysicalType::INT64));
sum_no_overflow.AddFunction(GetSumAggregateNoOverflowDecimal());
return sum_no_overflow;
}
AggregateFunction KahanSumFun::GetFunction() {
return AggregateFunction::UnaryAggregate<KahanSumState, double, double, KahanSumOperation>(LogicalType::DOUBLE,
LogicalType::DOUBLE);
}
} // namespace duckdb

View File

@@ -0,0 +1,12 @@
add_library_unity(
duckdb_core_functions_holistic
OBJECT
approx_top_k.cpp
quantile.cpp
reservoir_quantile.cpp
mad.cpp
approximate_quantile.cpp
mode.cpp)
set(CORE_FUNCTION_FILES
${CORE_FUNCTION_FILES} $<TARGET_OBJECTS:duckdb_core_functions_holistic>
PARENT_SCOPE)

View File

@@ -0,0 +1,417 @@
#include "core_functions/aggregate/histogram_helpers.hpp"
#include "core_functions/aggregate/holistic_functions.hpp"
#include "duckdb/function/aggregate/sort_key_helpers.hpp"
#include "duckdb/execution/expression_executor.hpp"
#include "duckdb/common/string_map_set.hpp"
#include "duckdb/common/printer.hpp"
namespace duckdb {
namespace {
struct ApproxTopKString {
ApproxTopKString() : str(UINT32_C(0)), hash(0) {
}
ApproxTopKString(string_t str_p, hash_t hash_p) : str(str_p), hash(hash_p) {
}
string_t str;
hash_t hash;
};
struct ApproxTopKHash {
std::size_t operator()(const ApproxTopKString &k) const {
return k.hash;
}
};
struct ApproxTopKEquality {
bool operator()(const ApproxTopKString &a, const ApproxTopKString &b) const {
return Equals::Operation(a.str, b.str);
}
};
template <typename T>
using approx_topk_map_t = unordered_map<ApproxTopKString, T, ApproxTopKHash, ApproxTopKEquality>;
// approx top k algorithm based on "A parallel space saving algorithm for frequent items and the Hurwitz zeta
// distribution" arxiv link - https://arxiv.org/pdf/1401.0702
// together with the filter extension (Filtered Space-Saving) from "Estimating Top-k Destinations in Data Streams"
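// In short: up to "capacity" (k * MONITORED_VALUES_RATIO) candidate values are monitored and
// kept sorted by count; hashes of unmonitored values are only counted in a small filter
// array, and a value is promoted into the monitored set (replacing the current minimum)
// once its filter count would reach that minimum count.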
struct ApproxTopKValue {
//! The counter
idx_t count = 0;
//! Index in the values array
idx_t index = 0;
//! The string value
ApproxTopKString str_val;
//! Allocated data
char *dataptr = nullptr;
uint32_t size = 0;
uint32_t capacity = 0;
};
struct InternalApproxTopKState {
// the top-k data structure has two components
// a list of k values sorted on "count" (i.e. values[0] has the lowest count)
// a lookup map: string_t -> idx in "values" array
unsafe_unique_array<ApproxTopKValue> stored_values;
unsafe_vector<reference<ApproxTopKValue>> values;
approx_topk_map_t<reference<ApproxTopKValue>> lookup_map;
unsafe_vector<idx_t> filter;
idx_t k = 0;
idx_t capacity = 0;
idx_t filter_mask;
void Initialize(idx_t kval) {
static constexpr idx_t MONITORED_VALUES_RATIO = 3;
static constexpr idx_t FILTER_RATIO = 8;
D_ASSERT(values.empty());
D_ASSERT(lookup_map.empty());
k = kval;
capacity = kval * MONITORED_VALUES_RATIO;
stored_values = make_unsafe_uniq_array_uninitialized<ApproxTopKValue>(capacity);
values.reserve(capacity);
// we scale the filter based on the number of values we are monitoring
idx_t filter_size = NextPowerOfTwo(capacity * FILTER_RATIO);
filter_mask = filter_size - 1;
filter.resize(filter_size);
}
static void CopyValue(ApproxTopKValue &value, const ApproxTopKString &input, AggregateInputData &input_data) {
value.str_val.hash = input.hash;
if (input.str.IsInlined()) {
// no need to copy
value.str_val = input;
return;
}
value.size = UnsafeNumericCast<uint32_t>(input.str.GetSize());
if (value.size > value.capacity) {
// need to re-allocate for this value
value.capacity = UnsafeNumericCast<uint32_t>(NextPowerOfTwo(value.size));
value.dataptr = char_ptr_cast(input_data.allocator.Allocate(value.capacity));
}
// copy over the data
memcpy(value.dataptr, input.str.GetData(), value.size);
value.str_val.str = string_t(value.dataptr, value.size);
}
void InsertOrReplaceEntry(const ApproxTopKString &input, AggregateInputData &aggr_input, idx_t increment = 1) {
if (values.size() < capacity) {
D_ASSERT(increment > 0);
// we can always add this entry
auto &val = stored_values[values.size()];
val.index = values.size();
values.push_back(val);
}
auto &value = values.back().get();
if (value.count > 0) {
// the capacity is reached - we need to replace an entry
// we use the filter as an early out
// based on the hash - we find a slot in the filter
// instead of monitoring the value immediately, we add to the slot in the filter
// ONLY when the value in the filter exceeds the current min value, we start monitoring the value
// this speeds up the algorithm as switching monitor values means we need to erase/insert in the hash table
auto &filter_value = filter[input.hash & filter_mask];
if (filter_value + increment < value.count) {
// if the filter has a lower count than the current min count
// we can skip adding this entry (for now)
filter_value += increment;
return;
}
// the filter exceeds the min value - start monitoring this value
// erase the existing entry from the map
// and set the filter for the minimum value back to the current minimum value
filter[value.str_val.hash & filter_mask] = value.count;
lookup_map.erase(value.str_val);
}
CopyValue(value, input, aggr_input);
lookup_map.insert(make_pair(value.str_val, reference<ApproxTopKValue>(value)));
IncrementCount(value, increment);
}
void IncrementCount(ApproxTopKValue &value, idx_t increment = 1) {
value.count += increment;
// maintain sortedness of "values"
// swap while we have a higher count than the next entry
while (value.index > 0 && values[value.index].get().count > values[value.index - 1].get().count) {
// swap the elements around
auto &left = values[value.index];
auto &right = values[value.index - 1];
std::swap(left.get().index, right.get().index);
std::swap(left, right);
}
}
void Verify() const {
#ifdef DEBUG
if (values.empty()) {
D_ASSERT(lookup_map.empty());
return;
}
D_ASSERT(values.size() <= capacity);
for (idx_t k = 0; k < values.size(); k++) {
auto &val = values[k].get();
D_ASSERT(val.count > 0);
// verify map exists
auto entry = lookup_map.find(val.str_val);
D_ASSERT(entry != lookup_map.end());
// verify the index is correct
D_ASSERT(val.index == k);
if (k > 0) {
// sortedness
D_ASSERT(val.count <= values[k - 1].get().count);
}
}
// verify lookup map does not contain extra entries
D_ASSERT(lookup_map.size() == values.size());
#endif
}
};
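// The aggregate state itself is just a pointer: the internal state (lookup map, filter and
// value storage) is heap-allocated lazily on first use and released again in Destroy.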
struct ApproxTopKState {
InternalApproxTopKState *state;
InternalApproxTopKState &GetState() {
if (!state) {
state = new InternalApproxTopKState();
}
return *state;
}
const InternalApproxTopKState &GetState() const {
if (!state) {
throw InternalException("No state available");
}
return *state;
}
};
struct ApproxTopKOperation {
template <class STATE>
static void Initialize(STATE &state) {
state.state = nullptr;
}
template <class TYPE, class STATE>
static void Operation(STATE &aggr_state, const TYPE &input, AggregateInputData &aggr_input, Vector &top_k_vector,
idx_t offset, idx_t count) {
auto &state = aggr_state.GetState();
if (state.values.empty()) {
static constexpr int64_t MAX_APPROX_K = 1000000;
// not initialized yet - initialize the K value and set all counters to 0
UnifiedVectorFormat kdata;
top_k_vector.ToUnifiedFormat(count, kdata);
auto kidx = kdata.sel->get_index(offset);
if (!kdata.validity.RowIsValid(kidx)) {
throw InvalidInputException("Invalid input for approx_top_k: k value cannot be NULL");
}
auto kval = UnifiedVectorFormat::GetData<int64_t>(kdata)[kidx];
if (kval <= 0) {
throw InvalidInputException("Invalid input for approx_top_k: k value must be > 0");
}
if (kval >= MAX_APPROX_K) {
throw InvalidInputException("Invalid input for approx_top_k: k value must be < %d", MAX_APPROX_K);
}
state.Initialize(UnsafeNumericCast<idx_t>(kval));
}
ApproxTopKString topk_string(input, Hash(input));
auto entry = state.lookup_map.find(topk_string);
if (entry != state.lookup_map.end()) {
// the input is monitored - increment the count
state.IncrementCount(entry->second.get());
} else {
// the input is not monitored - replace the first entry with the current entry and increment
state.InsertOrReplaceEntry(topk_string, aggr_input);
}
}
template <class STATE, class OP>
static void Combine(const STATE &aggr_source, STATE &aggr_target, AggregateInputData &aggr_input) {
if (!aggr_source.state) {
// source state is empty
return;
}
auto &source = aggr_source.GetState();
auto &target = aggr_target.GetState();
if (source.values.empty()) {
// source is empty
return;
}
source.Verify();
auto min_source = source.values.back().get().count;
idx_t min_target;
if (target.values.empty()) {
min_target = 0;
target.Initialize(source.k);
} else {
if (source.k != target.k) {
throw NotImplementedException("Approx Top K - cannot combine approx_top_K with different k values. "
"K values must be the same for all entries within the same group");
}
min_target = target.values.back().get().count;
}
// for all entries in target
// check if they are tracked in source
// if they are - add the tracked count
// if they are not - add the source minimum count
for (idx_t target_idx = 0; target_idx < target.values.size(); target_idx++) {
auto &val = target.values[target_idx].get();
auto source_entry = source.lookup_map.find(val.str_val);
idx_t increment = min_source;
if (source_entry != source.lookup_map.end()) {
increment = source_entry->second.get().count;
}
if (increment == 0) {
continue;
}
target.IncrementCount(val, increment);
}
// now for each entry in source, if it is not tracked by the target, add the target minimum
for (auto &source_entry : source.values) {
auto &source_val = source_entry.get();
auto target_entry = target.lookup_map.find(source_val.str_val);
if (target_entry != target.lookup_map.end()) {
// already tracked - no need to add anything
continue;
}
auto new_count = source_val.count + min_target;
idx_t increment;
if (target.values.size() >= target.capacity) {
idx_t current_min = target.values.empty() ? 0 : target.values.back().get().count;
D_ASSERT(target.values.size() == target.capacity);
// target already has capacity values
// check if we should insert this entry
if (new_count <= current_min) {
// if we do not we can skip this entry
continue;
}
increment = new_count - current_min;
} else {
// target does not have capacity entries yet
// just add this entry with the full count
increment = new_count;
}
target.InsertOrReplaceEntry(source_val.str_val, aggr_input, increment);
}
// copy over the filter
D_ASSERT(source.filter.size() == target.filter.size());
for (idx_t filter_idx = 0; filter_idx < source.filter.size(); filter_idx++) {
target.filter[filter_idx] += source.filter[filter_idx];
}
target.Verify();
}
template <class STATE>
static void Destroy(STATE &state, AggregateInputData &aggr_input_data) {
delete state.state;
}
static bool IgnoreNull() {
return true;
}
};
template <class T = string_t, class OP = HistogramGenericFunctor>
void ApproxTopKUpdate(Vector inputs[], AggregateInputData &aggr_input, idx_t input_count, Vector &state_vector,
idx_t count) {
using STATE = ApproxTopKState;
auto &input = inputs[0];
UnifiedVectorFormat sdata;
state_vector.ToUnifiedFormat(count, sdata);
auto &top_k_vector = inputs[1];
auto extra_state = OP::CreateExtraState(count);
UnifiedVectorFormat input_data;
OP::PrepareData(input, count, extra_state, input_data);
auto states = UnifiedVectorFormat::GetData<STATE *>(sdata);
auto data = UnifiedVectorFormat::GetData<T>(input_data);
for (idx_t i = 0; i < count; i++) {
auto idx = input_data.sel->get_index(i);
if (!input_data.validity.RowIsValid(idx)) {
continue;
}
auto &state = *states[sdata.sel->get_index(i)];
ApproxTopKOperation::Operation<T, STATE>(state, data[idx], aggr_input, top_k_vector, i, count);
}
}
template <class OP = HistogramGenericFunctor>
void ApproxTopKFinalize(Vector &state_vector, AggregateInputData &, Vector &result, idx_t count, idx_t offset) {
UnifiedVectorFormat sdata;
state_vector.ToUnifiedFormat(count, sdata);
auto states = UnifiedVectorFormat::GetData<ApproxTopKState *>(sdata);
auto &mask = FlatVector::Validity(result);
auto old_len = ListVector::GetListSize(result);
idx_t new_entries = 0;
// figure out how much space we need
for (idx_t i = 0; i < count; i++) {
auto &state = states[sdata.sel->get_index(i)]->GetState();
if (state.values.empty()) {
continue;
}
// get up to k values for each state
// this can be less if fewer unique values were found
new_entries += MinValue<idx_t>(state.values.size(), state.k);
}
// reserve space in the list vector
ListVector::Reserve(result, old_len + new_entries);
auto list_entries = FlatVector::GetData<list_entry_t>(result);
auto &child_data = ListVector::GetEntry(result);
idx_t current_offset = old_len;
for (idx_t i = 0; i < count; i++) {
const auto rid = i + offset;
auto &state = states[sdata.sel->get_index(i)]->GetState();
if (state.values.empty()) {
mask.SetInvalid(rid);
continue;
}
auto &list_entry = list_entries[rid];
list_entry.offset = current_offset;
for (idx_t val_idx = 0; val_idx < MinValue<idx_t>(state.values.size(), state.k); val_idx++) {
auto &val = state.values[val_idx].get();
D_ASSERT(val.count > 0);
OP::template HistogramFinalize<string_t>(val.str_val.str, child_data, current_offset);
current_offset++;
}
list_entry.length = current_offset - list_entry.offset;
}
D_ASSERT(current_offset == old_len + new_entries);
ListVector::SetListSize(result, current_offset);
result.Verify(count);
}
unique_ptr<FunctionData> ApproxTopKBind(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
for (auto &arg : arguments) {
if (arg->return_type.id() == LogicalTypeId::UNKNOWN) {
throw ParameterNotResolvedException();
}
}
if (arguments[0]->return_type.id() == LogicalTypeId::VARCHAR) {
function.update = ApproxTopKUpdate<string_t, HistogramStringFunctor>;
function.finalize = ApproxTopKFinalize<HistogramStringFunctor>;
}
function.return_type = LogicalType::LIST(arguments[0]->return_type);
return nullptr;
}
} // namespace
AggregateFunction ApproxTopKFun::GetFunction() {
using STATE = ApproxTopKState;
using OP = ApproxTopKOperation;
return AggregateFunction("approx_top_k", {LogicalTypeId::ANY, LogicalType::BIGINT},
LogicalType::LIST(LogicalType::ANY), AggregateFunction::StateSize<STATE>,
AggregateFunction::StateInitialize<STATE, OP>, ApproxTopKUpdate,
AggregateFunction::StateCombine<STATE, OP>, ApproxTopKFinalize, nullptr, ApproxTopKBind,
AggregateFunction::StateDestroy<STATE, OP>);
}
} // namespace duckdb

View File

@@ -0,0 +1,484 @@
#include "duckdb/execution/expression_executor.hpp"
#include "core_functions/aggregate/holistic_functions.hpp"
#include "t_digest.hpp"
#include "duckdb/planner/expression.hpp"
#include "duckdb/common/operator/cast_operators.hpp"
#include "duckdb/common/serializer/serializer.hpp"
#include "duckdb/common/serializer/deserializer.hpp"
#include <stdlib.h>
namespace duckdb {
namespace {
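// approx_quantile is backed by a t-digest (constructed below with compression parameter 100):
// inputs are encoded to double and added to the digest, and Finalize reads the requested
// quantile(s) back and decodes them to the target type, clamping on overflow.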
struct ApproxQuantileState {
duckdb_tdigest::TDigest *h;
idx_t pos;
};
struct ApproxQuantileCoding {
template <typename INPUT_TYPE, typename SAVE_TYPE>
static SAVE_TYPE Encode(const INPUT_TYPE &input) {
return Cast::template Operation<INPUT_TYPE, SAVE_TYPE>(input);
}
template <typename SAVE_TYPE, typename TARGET_TYPE>
static bool Decode(const SAVE_TYPE &source, TARGET_TYPE &target) {
// The result is approximate, so clamp instead of overflowing.
if (TryCast::Operation(source, target, false)) {
return true;
} else if (source < 0) {
target = NumericLimits<TARGET_TYPE>::Minimum();
} else {
target = NumericLimits<TARGET_TYPE>::Maximum();
}
return false;
}
};
template <>
double ApproxQuantileCoding::Encode(const dtime_tz_t &input) {
return Encode<uint64_t, double>(input.sort_key());
}
template <>
bool ApproxQuantileCoding::Decode(const double &source, dtime_tz_t &target) {
uint64_t sort_key;
const auto decoded = Decode<double, uint64_t>(source, sort_key);
if (decoded) {
// We can invert the sort key because its offset was not touched.
auto offset = dtime_tz_t::decode_offset(sort_key);
auto micros = dtime_tz_t::decode_micros(sort_key);
micros -= int64_t(dtime_tz_t::encode_offset(offset) * dtime_tz_t::OFFSET_MICROS);
target = dtime_tz_t(dtime_t(micros), offset);
} else if (source < 0) {
target = Value::MinimumValue(LogicalTypeId::TIME_TZ).GetValue<dtime_tz_t>();
} else {
target = Value::MaximumValue(LogicalTypeId::TIME_TZ).GetValue<dtime_tz_t>();
}
return decoded;
}
struct ApproximateQuantileBindData : public FunctionData {
ApproximateQuantileBindData() {
}
explicit ApproximateQuantileBindData(float quantile_p) : quantiles(1, quantile_p) {
}
explicit ApproximateQuantileBindData(vector<float> quantiles_p) : quantiles(std::move(quantiles_p)) {
}
unique_ptr<FunctionData> Copy() const override {
return make_uniq<ApproximateQuantileBindData>(quantiles);
}
bool Equals(const FunctionData &other_p) const override {
auto &other = other_p.Cast<ApproximateQuantileBindData>();
// return quantiles == other.quantiles;
if (quantiles != other.quantiles) {
return false;
}
return true;
}
static void Serialize(Serializer &serializer, const optional_ptr<FunctionData> bind_data_p,
const AggregateFunction &function) {
auto &bind_data = bind_data_p->Cast<ApproximateQuantileBindData>();
serializer.WriteProperty(100, "quantiles", bind_data.quantiles);
}
static unique_ptr<FunctionData> Deserialize(Deserializer &deserializer, AggregateFunction &function) {
auto result = make_uniq<ApproximateQuantileBindData>();
deserializer.ReadProperty(100, "quantiles", result->quantiles);
return std::move(result);
}
vector<float> quantiles;
};
struct ApproxQuantileOperation {
using SAVE_TYPE = duckdb_tdigest::Value;
template <class STATE>
static void Initialize(STATE &state) {
state.pos = 0;
state.h = nullptr;
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input,
idx_t count) {
for (idx_t i = 0; i < count; i++) {
Operation<INPUT_TYPE, STATE, OP>(state, input, unary_input);
}
}
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input) {
auto val = ApproxQuantileCoding::template Encode<INPUT_TYPE, SAVE_TYPE>(input);
if (!Value::DoubleIsFinite(val)) {
return;
}
if (!state.h) {
state.h = new duckdb_tdigest::TDigest(100);
}
state.h->add(val);
state.pos++;
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
if (source.pos == 0) {
return;
}
D_ASSERT(source.h);
if (!target.h) {
target.h = new duckdb_tdigest::TDigest(100);
}
target.h->merge(source.h);
target.pos += source.pos;
}
template <class STATE>
static void Destroy(STATE &state, AggregateInputData &aggr_input_data) {
if (state.h) {
delete state.h;
}
}
static bool IgnoreNull() {
return true;
}
};
struct ApproxQuantileScalarOperation : public ApproxQuantileOperation {
template <class TARGET_TYPE, class STATE>
static void Finalize(STATE &state, TARGET_TYPE &target, AggregateFinalizeData &finalize_data) {
if (state.pos == 0) {
finalize_data.ReturnNull();
return;
}
D_ASSERT(state.h);
D_ASSERT(finalize_data.input.bind_data);
state.h->compress();
auto &bind_data = finalize_data.input.bind_data->template Cast<ApproximateQuantileBindData>();
D_ASSERT(bind_data.quantiles.size() == 1);
const auto source = state.h->quantile(bind_data.quantiles[0]);
ApproxQuantileCoding::Decode(source, target);
}
};
AggregateFunction GetApproximateQuantileAggregateFunction(const LogicalType &type) {
// Not binary comparable
if (type == LogicalType::TIME_TZ) {
return AggregateFunction::UnaryAggregateDestructor<ApproxQuantileState, dtime_tz_t, dtime_tz_t,
ApproxQuantileScalarOperation>(type, type);
}
switch (type.InternalType()) {
case PhysicalType::INT8:
return AggregateFunction::UnaryAggregateDestructor<ApproxQuantileState, int8_t, int8_t,
ApproxQuantileScalarOperation>(type, type);
case PhysicalType::INT16:
return AggregateFunction::UnaryAggregateDestructor<ApproxQuantileState, int16_t, int16_t,
ApproxQuantileScalarOperation>(type, type);
case PhysicalType::INT32:
return AggregateFunction::UnaryAggregateDestructor<ApproxQuantileState, int32_t, int32_t,
ApproxQuantileScalarOperation>(type, type);
case PhysicalType::INT64:
return AggregateFunction::UnaryAggregateDestructor<ApproxQuantileState, int64_t, int64_t,
ApproxQuantileScalarOperation>(type, type);
case PhysicalType::INT128:
return AggregateFunction::UnaryAggregateDestructor<ApproxQuantileState, hugeint_t, hugeint_t,
ApproxQuantileScalarOperation>(type, type);
case PhysicalType::FLOAT:
return AggregateFunction::UnaryAggregateDestructor<ApproxQuantileState, float, float,
ApproxQuantileScalarOperation>(type, type);
case PhysicalType::DOUBLE:
return AggregateFunction::UnaryAggregateDestructor<ApproxQuantileState, double, double,
ApproxQuantileScalarOperation>(type, type);
default:
throw InternalException("Unimplemented quantile aggregate");
}
}
AggregateFunction GetApproximateQuantileDecimalAggregateFunction(const LogicalType &type) {
switch (type.InternalType()) {
case PhysicalType::INT8:
return GetApproximateQuantileAggregateFunction(LogicalType::TINYINT);
case PhysicalType::INT16:
return GetApproximateQuantileAggregateFunction(LogicalType::SMALLINT);
case PhysicalType::INT32:
return GetApproximateQuantileAggregateFunction(LogicalType::INTEGER);
case PhysicalType::INT64:
return GetApproximateQuantileAggregateFunction(LogicalType::BIGINT);
case PhysicalType::INT128:
return GetApproximateQuantileAggregateFunction(LogicalType::HUGEINT);
default:
throw InternalException("Unimplemented quantile decimal aggregate");
}
}
float CheckApproxQuantile(const Value &quantile_val) {
if (quantile_val.IsNull()) {
throw BinderException("APPROXIMATE QUANTILE parameter cannot be NULL");
}
auto quantile = quantile_val.GetValue<float>();
if (quantile < 0 || quantile > 1) {
throw BinderException("APPROXIMATE QUANTILE can only take parameters in range [0, 1]");
}
return quantile;
}
unique_ptr<FunctionData> BindApproxQuantile(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
if (arguments[1]->HasParameter()) {
throw ParameterNotResolvedException();
}
if (!arguments[1]->IsFoldable()) {
throw BinderException("APPROXIMATE QUANTILE can only take constant quantile parameters");
}
Value quantile_val = ExpressionExecutor::EvaluateScalar(context, *arguments[1]);
if (quantile_val.IsNull()) {
throw BinderException("APPROXIMATE QUANTILE parameter list cannot be NULL");
}
vector<float> quantiles;
switch (quantile_val.type().id()) {
case LogicalTypeId::LIST:
for (const auto &element_val : ListValue::GetChildren(quantile_val)) {
quantiles.push_back(CheckApproxQuantile(element_val));
}
break;
case LogicalTypeId::ARRAY:
for (const auto &element_val : ArrayValue::GetChildren(quantile_val)) {
quantiles.push_back(CheckApproxQuantile(element_val));
}
break;
default:
quantiles.push_back(CheckApproxQuantile(quantile_val));
break;
}
// remove the quantile argument so we can use the unary aggregate
Function::EraseArgument(function, arguments, arguments.size() - 1);
return make_uniq<ApproximateQuantileBindData>(quantiles);
}
AggregateFunction ApproxQuantileDecimalFunction(const LogicalType &type) {
auto function = GetApproximateQuantileDecimalAggregateFunction(type);
function.name = "approx_quantile";
function.serialize = ApproximateQuantileBindData::Serialize;
function.deserialize = ApproximateQuantileBindData::Deserialize;
return function;
}
unique_ptr<FunctionData> BindApproxQuantileDecimal(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
auto bind_data = BindApproxQuantile(context, function, arguments);
function = ApproxQuantileDecimalFunction(arguments[0]->return_type);
return bind_data;
}
AggregateFunction GetApproximateQuantileAggregate(const LogicalType &type) {
auto fun = GetApproximateQuantileAggregateFunction(type);
fun.bind = BindApproxQuantile;
fun.serialize = ApproximateQuantileBindData::Serialize;
fun.deserialize = ApproximateQuantileBindData::Deserialize;
// temporarily push an argument so we can bind the actual quantile
fun.arguments.emplace_back(LogicalType::FLOAT);
return fun;
}
template <class CHILD_TYPE>
struct ApproxQuantileListOperation : public ApproxQuantileOperation {
template <class RESULT_TYPE, class STATE>
static void Finalize(STATE &state, RESULT_TYPE &target, AggregateFinalizeData &finalize_data) {
if (state.pos == 0) {
finalize_data.ReturnNull();
return;
}
D_ASSERT(finalize_data.input.bind_data);
auto &bind_data = finalize_data.input.bind_data->template Cast<ApproximateQuantileBindData>();
auto &result = ListVector::GetEntry(finalize_data.result);
auto ridx = ListVector::GetListSize(finalize_data.result);
ListVector::Reserve(finalize_data.result, ridx + bind_data.quantiles.size());
auto rdata = FlatVector::GetData<CHILD_TYPE>(result);
D_ASSERT(state.h);
state.h->compress();
auto &entry = target;
entry.offset = ridx;
entry.length = bind_data.quantiles.size();
for (size_t q = 0; q < entry.length; ++q) {
const auto &quantile = bind_data.quantiles[q];
const auto &source = state.h->quantile(quantile);
auto &target = rdata[ridx + q];
ApproxQuantileCoding::Decode(source, target);
}
ListVector::SetListSize(finalize_data.result, entry.offset + entry.length);
}
};
template <class STATE, class INPUT_TYPE, class RESULT_TYPE, class OP>
AggregateFunction ApproxQuantileListAggregate(const LogicalType &input_type, const LogicalType &child_type) {
LogicalType result_type = LogicalType::LIST(child_type);
return AggregateFunction(
{input_type}, result_type, AggregateFunction::StateSize<STATE>, AggregateFunction::StateInitialize<STATE, OP>,
AggregateFunction::UnaryScatterUpdate<STATE, INPUT_TYPE, OP>, AggregateFunction::StateCombine<STATE, OP>,
AggregateFunction::StateFinalize<STATE, RESULT_TYPE, OP>, AggregateFunction::UnaryUpdate<STATE, INPUT_TYPE, OP>,
nullptr, AggregateFunction::StateDestroy<STATE, OP>);
}
template <typename INPUT_TYPE, typename SAVE_TYPE>
AggregateFunction GetTypedApproxQuantileListAggregateFunction(const LogicalType &type) {
using STATE = ApproxQuantileState;
using OP = ApproxQuantileListOperation<INPUT_TYPE>;
auto fun = ApproxQuantileListAggregate<STATE, INPUT_TYPE, list_entry_t, OP>(type, type);
fun.serialize = ApproximateQuantileBindData::Serialize;
fun.deserialize = ApproximateQuantileBindData::Deserialize;
return fun;
}
AggregateFunction GetApproxQuantileListAggregateFunction(const LogicalType &type) {
switch (type.id()) {
case LogicalTypeId::TINYINT:
return GetTypedApproxQuantileListAggregateFunction<int8_t, int8_t>(type);
case LogicalTypeId::SMALLINT:
return GetTypedApproxQuantileListAggregateFunction<int16_t, int16_t>(type);
case LogicalTypeId::INTEGER:
case LogicalTypeId::DATE:
case LogicalTypeId::TIME:
return GetTypedApproxQuantileListAggregateFunction<int32_t, int32_t>(type);
case LogicalTypeId::BIGINT:
case LogicalTypeId::TIMESTAMP:
case LogicalTypeId::TIMESTAMP_TZ:
return GetTypedApproxQuantileListAggregateFunction<int64_t, int64_t>(type);
case LogicalTypeId::TIME_TZ:
// Not binary comparable
return GetTypedApproxQuantileListAggregateFunction<dtime_tz_t, dtime_tz_t>(type);
case LogicalTypeId::HUGEINT:
return GetTypedApproxQuantileListAggregateFunction<hugeint_t, hugeint_t>(type);
case LogicalTypeId::FLOAT:
return GetTypedApproxQuantileListAggregateFunction<float, float>(type);
case LogicalTypeId::DOUBLE:
return GetTypedApproxQuantileListAggregateFunction<double, double>(type);
case LogicalTypeId::DECIMAL:
switch (type.InternalType()) {
case PhysicalType::INT16:
return GetTypedApproxQuantileListAggregateFunction<int16_t, int16_t>(type);
case PhysicalType::INT32:
return GetTypedApproxQuantileListAggregateFunction<int32_t, int32_t>(type);
case PhysicalType::INT64:
return GetTypedApproxQuantileListAggregateFunction<int64_t, int64_t>(type);
case PhysicalType::INT128:
return GetTypedApproxQuantileListAggregateFunction<hugeint_t, hugeint_t>(type);
default:
throw NotImplementedException("Unimplemented approximate quantile list decimal aggregate");
}
default:
throw NotImplementedException("Unimplemented approximate quantile list aggregate");
}
}
AggregateFunction ApproxQuantileDecimalListFunction(const LogicalType &type) {
auto function = GetApproxQuantileListAggregateFunction(type);
function.name = "approx_quantile";
function.serialize = ApproximateQuantileBindData::Serialize;
function.deserialize = ApproximateQuantileBindData::Deserialize;
return function;
}
unique_ptr<FunctionData> BindApproxQuantileDecimalList(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
auto bind_data = BindApproxQuantile(context, function, arguments);
function = ApproxQuantileDecimalListFunction(arguments[0]->return_type);
return bind_data;
}
AggregateFunction GetApproxQuantileListAggregate(const LogicalType &type) {
auto fun = GetApproxQuantileListAggregateFunction(type);
fun.bind = BindApproxQuantile;
fun.serialize = ApproximateQuantileBindData::Serialize;
fun.deserialize = ApproximateQuantileBindData::Deserialize;
// temporarily push an argument so we can bind the actual quantile
auto list_of_float = LogicalType::LIST(LogicalType::FLOAT);
fun.arguments.push_back(list_of_float);
return fun;
}
unique_ptr<FunctionData> ApproxQuantileDecimalDeserialize(Deserializer &deserializer, AggregateFunction &function) {
auto bind_data = ApproximateQuantileBindData::Deserialize(deserializer, function);
auto &return_type = deserializer.Get<const LogicalType &>();
if (return_type.id() == LogicalTypeId::LIST) {
function = ApproxQuantileDecimalListFunction(function.arguments[0]);
} else {
function = ApproxQuantileDecimalFunction(function.arguments[0]);
}
return bind_data;
}
AggregateFunction GetApproxQuantileDecimal() {
// stub function - the actual function is set during bind or deserialize
AggregateFunction fun({LogicalTypeId::DECIMAL, LogicalType::FLOAT}, LogicalTypeId::DECIMAL, nullptr, nullptr,
nullptr, nullptr, nullptr, nullptr, BindApproxQuantileDecimal);
fun.serialize = ApproximateQuantileBindData::Serialize;
fun.deserialize = ApproxQuantileDecimalDeserialize;
return fun;
}
AggregateFunction GetApproxQuantileDecimalList() {
// stub function - the actual function is set during bind or deserialize
AggregateFunction fun({LogicalTypeId::DECIMAL, LogicalType::LIST(LogicalType::FLOAT)},
LogicalType::LIST(LogicalTypeId::DECIMAL), nullptr, nullptr, nullptr, nullptr, nullptr,
nullptr, BindApproxQuantileDecimalList);
fun.serialize = ApproximateQuantileBindData::Serialize;
fun.deserialize = ApproxQuantileDecimalDeserialize;
return fun;
}
} // namespace
AggregateFunctionSet ApproxQuantileFun::GetFunctions() {
AggregateFunctionSet approx_quantile;
approx_quantile.AddFunction(GetApproxQuantileDecimal());
approx_quantile.AddFunction(GetApproximateQuantileAggregate(LogicalType::SMALLINT));
approx_quantile.AddFunction(GetApproximateQuantileAggregate(LogicalType::INTEGER));
approx_quantile.AddFunction(GetApproximateQuantileAggregate(LogicalType::BIGINT));
approx_quantile.AddFunction(GetApproximateQuantileAggregate(LogicalType::HUGEINT));
approx_quantile.AddFunction(GetApproximateQuantileAggregate(LogicalType::DOUBLE));
approx_quantile.AddFunction(GetApproximateQuantileAggregate(LogicalType::DATE));
approx_quantile.AddFunction(GetApproximateQuantileAggregate(LogicalType::TIME));
approx_quantile.AddFunction(GetApproximateQuantileAggregate(LogicalType::TIME_TZ));
approx_quantile.AddFunction(GetApproximateQuantileAggregate(LogicalType::TIMESTAMP));
approx_quantile.AddFunction(GetApproximateQuantileAggregate(LogicalType::TIMESTAMP_TZ));
// List variants
approx_quantile.AddFunction(GetApproxQuantileDecimalList());
approx_quantile.AddFunction(GetApproxQuantileListAggregate(LogicalTypeId::TINYINT));
approx_quantile.AddFunction(GetApproxQuantileListAggregate(LogicalTypeId::SMALLINT));
approx_quantile.AddFunction(GetApproxQuantileListAggregate(LogicalTypeId::INTEGER));
approx_quantile.AddFunction(GetApproxQuantileListAggregate(LogicalTypeId::BIGINT));
approx_quantile.AddFunction(GetApproxQuantileListAggregate(LogicalTypeId::HUGEINT));
approx_quantile.AddFunction(GetApproxQuantileListAggregate(LogicalTypeId::FLOAT));
approx_quantile.AddFunction(GetApproxQuantileListAggregate(LogicalTypeId::DOUBLE));
approx_quantile.AddFunction(GetApproxQuantileListAggregate(LogicalType::DATE));
approx_quantile.AddFunction(GetApproxQuantileListAggregate(LogicalType::TIME));
approx_quantile.AddFunction(GetApproxQuantileListAggregate(LogicalType::TIME_TZ));
approx_quantile.AddFunction(GetApproxQuantileListAggregate(LogicalType::TIMESTAMP));
approx_quantile.AddFunction(GetApproxQuantileListAggregate(LogicalType::TIMESTAMP_TZ));
return approx_quantile;
}
} // namespace duckdb

View File

@@ -0,0 +1,59 @@
[
{
"name": "approx_quantile",
"parameters": "x,pos",
"description": "Computes the approximate quantile using T-Digest.",
"example": "approx_quantile(x, 0.5)",
"type": "aggregate_function_set"
},
{
"name": "mad",
"parameters": "x",
"description": "Returns the median absolute deviation for the values within x. NULL values are ignored. Temporal types return a positive INTERVAL.\t",
"example": "mad(x)",
"type": "aggregate_function_set"
},
{
"name": "median",
"parameters": "x",
"description": "Returns the middle value of the set. NULL values are ignored. For even value counts, interpolate-able types (numeric, date/time) return the average of the two middle values. Non-interpolate-able types (everything else) return the lower of the two middle values.",
"example": "median(x)",
"type": "aggregate_function_set"
},
{
"name": "mode",
"parameters": "x",
"description": "Returns the most frequent value for the values within x. NULL values are ignored.",
"example": "",
"type": "aggregate_function_set"
},
{
"name": "quantile_disc",
"parameters": "x,pos",
"description": "Returns the exact quantile number between 0 and 1 . If pos is a LIST of FLOATs, then the result is a LIST of the corresponding exact quantiles.",
"example": "quantile_disc(x, 0.5)",
"type": "aggregate_function_set",
"aliases": ["quantile"]
},
{
"name": "quantile_cont",
"parameters": "x,pos",
"description": "Returns the interpolated quantile number between 0 and 1 . If pos is a LIST of FLOATs, then the result is a LIST of the corresponding interpolated quantiles.\t",
"example": "quantile_cont(x, 0.5)",
"type": "aggregate_function_set"
},
{
"name": "reservoir_quantile",
"parameters": "x,quantile,sample_size",
"description": "Gives the approximate quantile using reservoir sampling, the sample size is optional and uses 8192 as a default size.",
"example": "reservoir_quantile(A, 0.5, 1024)",
"type": "aggregate_function_set"
},
{
"name": "approx_top_k",
"parameters": "val,k",
"description": "Finds the k approximately most occurring values in the data set",
"example": "approx_top_k(x, 5)",
"type": "aggregate_function"
}
]

View File

@@ -0,0 +1,348 @@
#include "core_functions/aggregate/holistic_functions.hpp"
#include "duckdb/planner/expression.hpp"
#include "duckdb/common/operator/cast_operators.hpp"
#include "duckdb/common/operator/abs.hpp"
#include "core_functions/aggregate/quantile_state.hpp"
namespace duckdb {
namespace {
struct FrameSet {
inline explicit FrameSet(const SubFrames &frames_p) : frames(frames_p) {
}
inline idx_t Size() const {
idx_t result = 0;
for (const auto &frame : frames) {
result += frame.end - frame.start;
}
return result;
}
inline bool Contains(idx_t i) const {
for (idx_t f = 0; f < frames.size(); ++f) {
const auto &frame = frames[f];
if (frame.start <= i && i < frame.end) {
return true;
}
}
return false;
}
const SubFrames &frames;
};
struct QuantileReuseUpdater {
idx_t *index;
idx_t j;
inline QuantileReuseUpdater(idx_t *index, idx_t j) : index(index), j(j) {
}
inline void Neither(idx_t begin, idx_t end) {
}
inline void Left(idx_t begin, idx_t end) {
}
inline void Right(idx_t begin, idx_t end) {
for (; begin < end; ++begin) {
index[j++] = begin;
}
}
inline void Both(idx_t begin, idx_t end) {
}
};
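// Reuse the previous frame's row index for the rows that overlap with the current frames
// instead of rebuilding the index from scratch every time the window slides.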
void ReuseIndexes(idx_t *index, const SubFrames &currs, const SubFrames &prevs) {
// Copy overlapping indices by scanning the previous set and copying down into holes.
// We copy instead of leaving gaps in case there are fewer values in the current frame.
FrameSet prev_set(prevs);
FrameSet curr_set(currs);
const auto prev_count = prev_set.Size();
idx_t j = 0;
for (idx_t p = 0; p < prev_count; ++p) {
auto idx = index[p];
// Shift down into any hole
if (j != p) {
index[j] = idx;
}
// Skip overlapping values
if (curr_set.Contains(idx)) {
++j;
}
}
// Insert new indices
if (j > 0) {
QuantileReuseUpdater updater(index, j);
AggregateExecutor::IntersectFrames(prevs, currs, updater);
} else {
// No overlap: overwrite with new values
for (const auto &curr : currs) {
for (auto idx = curr.start; idx < curr.end; ++idx) {
index[j++] = idx;
}
}
}
}
//===--------------------------------------------------------------------===//
// Median Absolute Deviation
//===--------------------------------------------------------------------===//
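// mad(x) = median(|x_i - median(x)|): Finalize first interpolates the median of the collected
// values and then interpolates the median of the absolute deviations from it via MadAccessor.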
template <typename T, typename R, typename MEDIAN_TYPE>
struct MadAccessor {
using INPUT_TYPE = T;
using RESULT_TYPE = R;
const MEDIAN_TYPE &median;
explicit MadAccessor(const MEDIAN_TYPE &median_p) : median(median_p) {
}
inline RESULT_TYPE operator()(const INPUT_TYPE &input) const {
const RESULT_TYPE delta = input - UnsafeNumericCast<RESULT_TYPE>(median);
return TryAbsOperator::Operation<RESULT_TYPE, RESULT_TYPE>(delta);
}
};
// hugeint_t - double => undefined
template <>
struct MadAccessor<hugeint_t, double, double> {
using INPUT_TYPE = hugeint_t;
using RESULT_TYPE = double;
using MEDIAN_TYPE = double;
const MEDIAN_TYPE &median;
explicit MadAccessor(const MEDIAN_TYPE &median_p) : median(median_p) {
}
inline RESULT_TYPE operator()(const INPUT_TYPE &input) const {
const auto delta = Hugeint::Cast<double>(input) - median;
return TryAbsOperator::Operation<double, double>(delta);
}
};
// date_t - timestamp_t => interval_t
template <>
struct MadAccessor<date_t, interval_t, timestamp_t> {
using INPUT_TYPE = date_t;
using RESULT_TYPE = interval_t;
using MEDIAN_TYPE = timestamp_t;
const MEDIAN_TYPE &median;
explicit MadAccessor(const MEDIAN_TYPE &median_p) : median(median_p) {
}
inline RESULT_TYPE operator()(const INPUT_TYPE &input) const {
const auto dt = Cast::Operation<date_t, timestamp_t>(input);
const auto delta = dt - median;
return Interval::FromMicro(TryAbsOperator::Operation<int64_t, int64_t>(delta));
}
};
// timestamp_t - timestamp_t => interval_t
template <>
struct MadAccessor<timestamp_t, interval_t, timestamp_t> {
using INPUT_TYPE = timestamp_t;
using RESULT_TYPE = interval_t;
using MEDIAN_TYPE = timestamp_t;
const MEDIAN_TYPE &median;
explicit MadAccessor(const MEDIAN_TYPE &median_p) : median(median_p) {
}
inline RESULT_TYPE operator()(const INPUT_TYPE &input) const {
const auto delta = input - median;
return Interval::FromMicro(TryAbsOperator::Operation<int64_t, int64_t>(delta));
}
};
// dtime_t - dtime_t => interval_t
template <>
struct MadAccessor<dtime_t, interval_t, dtime_t> {
using INPUT_TYPE = dtime_t;
using RESULT_TYPE = interval_t;
using MEDIAN_TYPE = dtime_t;
const MEDIAN_TYPE &median;
explicit MadAccessor(const MEDIAN_TYPE &median_p) : median(median_p) {
}
inline RESULT_TYPE operator()(const INPUT_TYPE &input) const {
const auto delta = input - median;
return Interval::FromMicro(TryAbsOperator::Operation<int64_t, int64_t>(delta));
}
};
template <typename MEDIAN_TYPE>
struct MedianAbsoluteDeviationOperation : QuantileOperation {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.v.empty()) {
finalize_data.ReturnNull();
return;
}
using INPUT_TYPE = typename STATE::InputType;
D_ASSERT(finalize_data.input.bind_data);
auto &bind_data = finalize_data.input.bind_data->Cast<QuantileBindData>();
D_ASSERT(bind_data.quantiles.size() == 1);
const auto &q = bind_data.quantiles[0];
QuantileInterpolator<false> interp(q, state.v.size(), false);
const auto med = interp.template Operation<INPUT_TYPE, MEDIAN_TYPE>(state.v.data(), finalize_data.result);
MadAccessor<INPUT_TYPE, T, MEDIAN_TYPE> accessor(med);
target = interp.template Operation<INPUT_TYPE, T>(state.v.data(), finalize_data.result, accessor);
}
template <class STATE, class INPUT_TYPE, class RESULT_TYPE>
static void Window(AggregateInputData &aggr_input_data, const WindowPartitionInput &partition,
const_data_ptr_t g_state, data_ptr_t l_state, const SubFrames &frames, Vector &result,
idx_t ridx) {
auto &state = *reinterpret_cast<STATE *>(l_state);
auto gstate = reinterpret_cast<const STATE *>(g_state);
auto &data = state.GetOrCreateWindowCursor(partition);
const auto &fmask = partition.filter_mask;
auto rdata = FlatVector::GetData<RESULT_TYPE>(result);
QuantileIncluded<INPUT_TYPE> included(fmask, data);
const auto n = FrameSize(included, frames);
if (!n) {
auto &rmask = FlatVector::Validity(result);
rmask.Set(ridx, false);
return;
}
// Compute the median
D_ASSERT(aggr_input_data.bind_data);
auto &bind_data = aggr_input_data.bind_data->Cast<QuantileBindData>();
D_ASSERT(bind_data.quantiles.size() == 1);
const auto &quantile = bind_data.quantiles[0];
auto &window_state = state.GetOrCreateWindowState();
MEDIAN_TYPE med;
if (gstate && gstate->HasTree()) {
med = gstate->GetWindowState().template WindowScalar<MEDIAN_TYPE, false>(data, frames, n, result, quantile);
} else {
window_state.UpdateSkip(data, frames, included);
med = window_state.template WindowScalar<MEDIAN_TYPE, false>(data, frames, n, result, quantile);
}
// Lazily initialise frame state
window_state.SetCount(frames.back().end - frames.front().start);
auto index2 = window_state.m.data();
D_ASSERT(index2);
// The replacement trick does not work on the second index because if
// the median has changed, the previous order is not correct.
// It is probably close, however, and so reuse is helpful.
auto &prevs = window_state.prevs;
ReuseIndexes(index2, frames, prevs);
std::partition(index2, index2 + window_state.count, included);
QuantileInterpolator<false> interp(quantile, n, false);
// Compute mad from the second index
using ID = QuantileIndirect<INPUT_TYPE>;
ID indirect(data);
using MAD = MadAccessor<INPUT_TYPE, RESULT_TYPE, MEDIAN_TYPE>;
MAD mad(med);
using MadIndirect = QuantileComposed<MAD, ID>;
MadIndirect mad_indirect(mad, indirect);
rdata[ridx] = interp.template Operation<idx_t, RESULT_TYPE, MadIndirect>(index2, result, mad_indirect);
// Prev is used by both skip lists and increments
prevs = frames;
}
};
unique_ptr<FunctionData> BindMAD(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
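// The only quantile MAD needs is the median, i.e. 0.5 (encoded here as DECIMAL(2,1) value 5)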
return make_uniq<QuantileBindData>(Value::DECIMAL(int16_t(5), 2, 1));
}
template <typename INPUT_TYPE, typename MEDIAN_TYPE, typename TARGET_TYPE>
AggregateFunction GetTypedMedianAbsoluteDeviationAggregateFunction(const LogicalType &input_type,
const LogicalType &target_type) {
using STATE = QuantileState<INPUT_TYPE, QuantileStandardType>;
using OP = MedianAbsoluteDeviationOperation<MEDIAN_TYPE>;
auto fun = AggregateFunction::UnaryAggregateDestructor<STATE, INPUT_TYPE, TARGET_TYPE, OP,
AggregateDestructorType::LEGACY>(input_type, target_type);
fun.bind = BindMAD;
fun.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
#ifndef DUCKDB_SMALLER_BINARY
fun.window = OP::template Window<STATE, INPUT_TYPE, TARGET_TYPE>;
fun.window_init = OP::template WindowInit<STATE, INPUT_TYPE>;
#endif
return fun;
}
AggregateFunction GetMedianAbsoluteDeviationAggregateFunctionInternal(const LogicalType &type) {
switch (type.id()) {
case LogicalTypeId::FLOAT:
return GetTypedMedianAbsoluteDeviationAggregateFunction<float, float, float>(type, type);
case LogicalTypeId::DOUBLE:
return GetTypedMedianAbsoluteDeviationAggregateFunction<double, double, double>(type, type);
case LogicalTypeId::DECIMAL:
switch (type.InternalType()) {
case PhysicalType::INT16:
return GetTypedMedianAbsoluteDeviationAggregateFunction<int16_t, int16_t, int16_t>(type, type);
case PhysicalType::INT32:
return GetTypedMedianAbsoluteDeviationAggregateFunction<int32_t, int32_t, int32_t>(type, type);
case PhysicalType::INT64:
return GetTypedMedianAbsoluteDeviationAggregateFunction<int64_t, int64_t, int64_t>(type, type);
case PhysicalType::INT128:
return GetTypedMedianAbsoluteDeviationAggregateFunction<hugeint_t, hugeint_t, hugeint_t>(type, type);
default:
throw NotImplementedException("Unimplemented Median Absolute Deviation DECIMAL aggregate");
}
break;
case LogicalTypeId::DATE:
return GetTypedMedianAbsoluteDeviationAggregateFunction<date_t, timestamp_t, interval_t>(type,
LogicalType::INTERVAL);
case LogicalTypeId::TIMESTAMP:
case LogicalTypeId::TIMESTAMP_TZ:
return GetTypedMedianAbsoluteDeviationAggregateFunction<timestamp_t, timestamp_t, interval_t>(
type, LogicalType::INTERVAL);
case LogicalTypeId::TIME:
case LogicalTypeId::TIME_TZ:
return GetTypedMedianAbsoluteDeviationAggregateFunction<dtime_t, dtime_t, interval_t>(type,
LogicalType::INTERVAL);
default:
throw NotImplementedException("Unimplemented Median Absolute Deviation aggregate");
}
}
AggregateFunction GetMedianAbsoluteDeviationAggregateFunction(const LogicalType &type) {
auto result = GetMedianAbsoluteDeviationAggregateFunctionInternal(type);
result.errors = FunctionErrors::CAN_THROW_RUNTIME_ERROR;
return result;
}
unique_ptr<FunctionData> BindMedianAbsoluteDeviationDecimal(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
function = GetMedianAbsoluteDeviationAggregateFunction(arguments[0]->return_type);
function.name = "mad";
function.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
return BindMAD(context, function, arguments);
}
} // namespace
AggregateFunctionSet MadFun::GetFunctions() {
AggregateFunctionSet mad("mad");
mad.AddFunction(AggregateFunction({LogicalTypeId::DECIMAL}, LogicalTypeId::DECIMAL, nullptr, nullptr, nullptr,
nullptr, nullptr, nullptr, BindMedianAbsoluteDeviationDecimal));
const vector<LogicalType> MAD_TYPES = {LogicalType::FLOAT, LogicalType::DOUBLE, LogicalType::DATE,
LogicalType::TIMESTAMP, LogicalType::TIME, LogicalType::TIMESTAMP_TZ,
LogicalType::TIME_TZ};
for (const auto &type : MAD_TYPES) {
mad.AddFunction(GetMedianAbsoluteDeviationAggregateFunction(type));
}
return mad;
}
} // namespace duckdb

View File

@@ -0,0 +1,580 @@
#include "duckdb/common/exception.hpp"
#include "duckdb/common/uhugeint.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "duckdb/common/operator/comparison_operators.hpp"
#include "duckdb/common/types/column/column_data_collection.hpp"
#include "core_functions/aggregate/distributive_functions.hpp"
#include "core_functions/aggregate/holistic_functions.hpp"
#include "duckdb/planner/expression/bound_aggregate_expression.hpp"
#include "duckdb/common/unordered_map.hpp"
#include "duckdb/common/owning_string_map.hpp"
#include "duckdb/function/create_sort_key.hpp"
#include "duckdb/function/aggregate/sort_key_helpers.hpp"
#include "duckdb/common/algorithm.hpp"
#include <functional>
// MODE( <expr1> )
// Returns the most frequent value for the values within expr1.
// NULL values are ignored. If all the values are NULL, or there are 0 rows, then the function returns NULL.
namespace std {} // namespace std
namespace duckdb {
namespace {
struct ModeAttr {
ModeAttr() : count(0), first_row(std::numeric_limits<idx_t>::max()) {
}
size_t count;
idx_t first_row;
};
template <class T>
struct ModeStandard {
using MAP_TYPE = unordered_map<T, ModeAttr>;
static MAP_TYPE *CreateEmpty(ArenaAllocator &) {
return new MAP_TYPE();
}
static MAP_TYPE *CreateEmpty(Allocator &) {
return new MAP_TYPE();
}
template <class INPUT_TYPE, class RESULT_TYPE>
static RESULT_TYPE Assign(Vector &result, INPUT_TYPE input) {
return RESULT_TYPE(input);
}
};
struct ModeString {
using MAP_TYPE = OwningStringMap<ModeAttr>;
static MAP_TYPE *CreateEmpty(ArenaAllocator &allocator) {
return new MAP_TYPE(allocator);
}
static MAP_TYPE *CreateEmpty(Allocator &allocator) {
return new MAP_TYPE(allocator);
}
template <class INPUT_TYPE, class RESULT_TYPE>
static RESULT_TYPE Assign(Vector &result, INPUT_TYPE input) {
return StringVector::AddStringOrBlob(result, input);
}
};
template <class KEY_TYPE, class TYPE_OP>
struct ModeState {
using Counts = typename TYPE_OP::MAP_TYPE;
ModeState() {
}
SubFrames prevs;
Counts *frequency_map = nullptr;
KEY_TYPE *mode = nullptr;
size_t nonzero = 0;
bool valid = false;
size_t count = 0;
//! The collection being read
const ColumnDataCollection *inputs;
//! The state used for reading the collection on this thread
ColumnDataScanState *scan = nullptr;
//! The data chunk paged into
DataChunk page;
//! The data pointer
const KEY_TYPE *data = nullptr;
//! The validity mask
const ValidityMask *validity = nullptr;
~ModeState() {
if (frequency_map) {
delete frequency_map;
}
if (mode) {
delete mode;
}
if (scan) {
delete scan;
}
}
void InitializePage(const WindowPartitionInput &partition) {
if (!scan) {
scan = new ColumnDataScanState();
}
if (page.ColumnCount() == 0) {
D_ASSERT(partition.inputs);
inputs = partition.inputs;
D_ASSERT(partition.column_ids.size() == 1);
inputs->InitializeScan(*scan, partition.column_ids);
inputs->InitializeScanChunk(*scan, page);
}
}
inline sel_t RowOffset(idx_t row_idx) const {
D_ASSERT(RowIsVisible(row_idx));
return UnsafeNumericCast<sel_t>(row_idx - scan->current_row_index);
}
inline bool RowIsVisible(idx_t row_idx) const {
return (row_idx < scan->next_row_index && scan->current_row_index <= row_idx);
}
inline idx_t Seek(idx_t row_idx) {
if (!RowIsVisible(row_idx)) {
D_ASSERT(inputs);
inputs->Seek(row_idx, *scan, page);
data = FlatVector::GetData<KEY_TYPE>(page.data[0]);
validity = &FlatVector::Validity(page.data[0]);
}
return RowOffset(row_idx);
}
inline const KEY_TYPE &GetCell(idx_t row_idx) {
const auto offset = Seek(row_idx);
return data[offset];
}
inline bool RowIsValid(idx_t row_idx) {
const auto offset = Seek(row_idx);
return validity->RowIsValid(offset);
}
void Reset() {
if (frequency_map) {
frequency_map->clear();
}
nonzero = 0;
count = 0;
valid = false;
}
void ModeAdd(idx_t row) {
const auto &key = GetCell(row);
auto &attr = (*frequency_map)[key];
auto new_count = (attr.count += 1);
if (new_count == 1) {
++nonzero;
attr.first_row = row;
} else {
attr.first_row = MinValue(row, attr.first_row);
}
if (new_count > count) {
valid = true;
count = new_count;
if (mode) {
*mode = key;
} else {
mode = new KEY_TYPE(key);
}
}
}
void ModeRm(idx_t frame) {
const auto &key = GetCell(frame);
auto &attr = (*frequency_map)[key];
auto old_count = attr.count;
nonzero -= size_t(old_count == 1);
attr.count -= 1;
if (count == old_count && key == *mode) {
valid = false;
}
}
typename Counts::const_iterator Scan() const {
//! Initialize control variables to the first entry of the frequency map
auto highest_frequency = frequency_map->begin();
for (auto i = highest_frequency; i != frequency_map->end(); ++i) {
// Tie break with the lowest insert position
if (i->second.count > highest_frequency->second.count ||
(i->second.count == highest_frequency->second.count &&
i->second.first_row < highest_frequency->second.first_row)) {
highest_frequency = i;
}
}
return highest_frequency;
}
};
template <typename STATE>
struct ModeIncluded {
inline explicit ModeIncluded(const ValidityMask &fmask_p, STATE &state) : fmask(fmask_p), state(state) {
}
inline bool operator()(const idx_t &idx) const {
return fmask.RowIsValid(idx) && state.RowIsValid(idx);
}
const ValidityMask &fmask;
STATE &state;
};
template <typename TYPE_OP>
struct BaseModeFunction {
template <class STATE>
static void Initialize(STATE &state) {
new (&state) STATE();
}
template <class INPUT_TYPE, class STATE, class OP>
static void Execute(STATE &state, const INPUT_TYPE &key, AggregateInputData &input_data) {
if (!state.frequency_map) {
state.frequency_map = TYPE_OP::CreateEmpty(input_data.allocator);
}
auto &i = (*state.frequency_map)[key];
++i.count;
i.first_row = MinValue<idx_t>(i.first_row, state.count);
++state.count;
}
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &key, AggregateUnaryInput &aggr_input) {
Execute<INPUT_TYPE, STATE, OP>(state, key, aggr_input.input);
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
if (!source.frequency_map) {
return;
}
if (!target.frequency_map) {
// Copy - don't destroy! Otherwise windowing will break.
target.frequency_map = new typename STATE::Counts(*source.frequency_map);
target.count = source.count;
return;
}
for (auto &val : *source.frequency_map) {
auto &i = (*target.frequency_map)[val.first];
i.count += val.second.count;
i.first_row = MinValue(i.first_row, val.second.first_row);
}
target.count += source.count;
}
static bool IgnoreNull() {
return true;
}
template <class STATE>
static void Destroy(STATE &state, AggregateInputData &aggr_input_data) {
state.~STATE();
}
};
template <typename TYPE_OP>
struct TypedModeFunction : BaseModeFunction<TYPE_OP> {
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &key, AggregateUnaryInput &aggr_input, idx_t count) {
if (!state.frequency_map) {
state.frequency_map = TYPE_OP::CreateEmpty(aggr_input.input.allocator);
}
auto &i = (*state.frequency_map)[key];
i.count += count;
i.first_row = MinValue<idx_t>(i.first_row, state.count);
state.count += count;
}
};
template <typename TYPE_OP>
struct ModeFunction : TypedModeFunction<TYPE_OP> {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (!state.frequency_map) {
finalize_data.ReturnNull();
return;
}
auto highest_frequency = state.Scan();
if (highest_frequency != state.frequency_map->end()) {
target = TYPE_OP::template Assign<T, T>(finalize_data.result, highest_frequency->first);
} else {
finalize_data.ReturnNull();
}
}
template <typename STATE, typename INPUT_TYPE>
struct UpdateWindowState {
STATE &state;
ModeIncluded<STATE> &included;
inline UpdateWindowState(STATE &state, ModeIncluded<STATE> &included) : state(state), included(included) {
}
inline void Neither(idx_t begin, idx_t end) {
}
inline void Left(idx_t begin, idx_t end) {
for (; begin < end; ++begin) {
if (included(begin)) {
state.ModeRm(begin);
}
}
}
inline void Right(idx_t begin, idx_t end) {
for (; begin < end; ++begin) {
if (included(begin)) {
state.ModeAdd(begin);
}
}
}
inline void Both(idx_t begin, idx_t end) {
}
};
template <class STATE, class INPUT_TYPE, class RESULT_TYPE>
static void Window(AggregateInputData &aggr_input_data, const WindowPartitionInput &partition,
const_data_ptr_t g_state, data_ptr_t l_state, const SubFrames &frames, Vector &result,
idx_t rid) {
auto &state = *reinterpret_cast<STATE *>(l_state);
state.InitializePage(partition);
const auto &fmask = partition.filter_mask;
auto rdata = FlatVector::GetData<RESULT_TYPE>(result);
auto &rmask = FlatVector::Validity(result);
auto &prevs = state.prevs;
if (prevs.empty()) {
prevs.resize(1);
}
ModeIncluded<STATE> included(fmask, state);
if (!state.frequency_map) {
state.frequency_map = TYPE_OP::CreateEmpty(Allocator::DefaultAllocator());
}
const size_t tau_inverse = 4; // tau==0.25
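// If only a small fraction (tau) of the tracked values still have a nonzero count, or the new
// frames do not overlap the previous ones at all, rebuilding the frequency map from scratch is
// cheaper; otherwise update it incrementally by removing rows that left and adding rows that entered.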
if (state.nonzero <= (state.frequency_map->size() / tau_inverse) || prevs.back().end <= frames.front().start ||
frames.back().end <= prevs.front().start) {
state.Reset();
// for f ∈ F do
for (const auto &frame : frames) {
for (auto i = frame.start; i < frame.end; ++i) {
if (included(i)) {
state.ModeAdd(i);
}
}
}
} else {
using Updater = UpdateWindowState<STATE, INPUT_TYPE>;
Updater updater(state, included);
AggregateExecutor::IntersectFrames(prevs, frames, updater);
}
if (!state.valid) {
// Rescan
auto highest_frequency = state.Scan();
if (highest_frequency != state.frequency_map->end()) {
*(state.mode) = highest_frequency->first;
state.count = highest_frequency->second.count;
state.valid = (state.count > 0);
}
}
if (state.valid) {
rdata[rid] = TYPE_OP::template Assign<INPUT_TYPE, RESULT_TYPE>(result, *state.mode);
} else {
rmask.Set(rid, false);
}
prevs = frames;
}
};
template <typename TYPE_OP>
struct ModeFallbackFunction : BaseModeFunction<TYPE_OP> {
template <class STATE>
static void Finalize(STATE &state, AggregateFinalizeData &finalize_data) {
if (!state.frequency_map) {
finalize_data.ReturnNull();
return;
}
auto highest_frequency = state.Scan();
if (highest_frequency != state.frequency_map->end()) {
CreateSortKeyHelpers::DecodeSortKey(highest_frequency->first, finalize_data.result,
finalize_data.result_idx,
OrderModifiers(OrderType::ASCENDING, OrderByNullType::NULLS_LAST));
} else {
finalize_data.ReturnNull();
}
}
};
AggregateFunction GetFallbackModeFunction(const LogicalType &type) {
using STATE = ModeState<string_t, ModeString>;
using OP = ModeFallbackFunction<ModeString>;
AggregateFunction aggr({type}, type, AggregateFunction::StateSize<STATE>,
AggregateFunction::StateInitialize<STATE, OP, AggregateDestructorType::LEGACY>,
AggregateSortKeyHelpers::UnaryUpdate<STATE, OP>, AggregateFunction::StateCombine<STATE, OP>,
AggregateFunction::StateVoidFinalize<STATE, OP>, nullptr);
aggr.destructor = AggregateFunction::StateDestroy<STATE, OP>;
return aggr;
}
template <typename INPUT_TYPE, typename TYPE_OP = ModeStandard<INPUT_TYPE>>
AggregateFunction GetTypedModeFunction(const LogicalType &type) {
using STATE = ModeState<INPUT_TYPE, TYPE_OP>;
using OP = ModeFunction<TYPE_OP>;
auto func =
AggregateFunction::UnaryAggregateDestructor<STATE, INPUT_TYPE, INPUT_TYPE, OP, AggregateDestructorType::LEGACY>(
type, type);
func.window = OP::template Window<STATE, INPUT_TYPE, INPUT_TYPE>;
return func;
}
AggregateFunction GetModeAggregate(const LogicalType &type) {
switch (type.InternalType()) {
#ifndef DUCKDB_SMALLER_BINARY
case PhysicalType::INT8:
return GetTypedModeFunction<int8_t>(type);
case PhysicalType::UINT8:
return GetTypedModeFunction<uint8_t>(type);
case PhysicalType::INT16:
return GetTypedModeFunction<int16_t>(type);
case PhysicalType::UINT16:
return GetTypedModeFunction<uint16_t>(type);
case PhysicalType::INT32:
return GetTypedModeFunction<int32_t>(type);
case PhysicalType::UINT32:
return GetTypedModeFunction<uint32_t>(type);
case PhysicalType::INT64:
return GetTypedModeFunction<int64_t>(type);
case PhysicalType::UINT64:
return GetTypedModeFunction<uint64_t>(type);
case PhysicalType::INT128:
return GetTypedModeFunction<hugeint_t>(type);
case PhysicalType::UINT128:
return GetTypedModeFunction<uhugeint_t>(type);
case PhysicalType::FLOAT:
return GetTypedModeFunction<float>(type);
case PhysicalType::DOUBLE:
return GetTypedModeFunction<double>(type);
case PhysicalType::INTERVAL:
return GetTypedModeFunction<interval_t>(type);
case PhysicalType::VARCHAR:
return GetTypedModeFunction<string_t, ModeString>(type);
#endif
default:
return GetFallbackModeFunction(type);
}
}
unique_ptr<FunctionData> BindModeAggregate(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
function = GetModeAggregate(arguments[0]->return_type);
function.name = "mode";
return nullptr;
}
} // namespace
AggregateFunctionSet ModeFun::GetFunctions() {
AggregateFunctionSet mode("mode");
mode.AddFunction(AggregateFunction({LogicalTypeId::ANY}, LogicalTypeId::ANY, nullptr, nullptr, nullptr, nullptr,
nullptr, nullptr, BindModeAggregate));
return mode;
}
//===--------------------------------------------------------------------===//
// Entropy
//===--------------------------------------------------------------------===//
namespace {
template <class STATE>
double FinalizeEntropy(STATE &state) {
if (!state.frequency_map) {
return 0;
}
double count = static_cast<double>(state.count);
double entropy = 0;
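// Shannon entropy: H = sum_i p_i * log2(1 / p_i), with p_i = count_i / total count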
for (auto &val : *state.frequency_map) {
double val_sec = static_cast<double>(val.second.count);
entropy += (val_sec / count) * log2(count / val_sec);
}
return entropy;
}
template <typename TYPE_OP>
struct EntropyFunction : TypedModeFunction<TYPE_OP> {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
target = FinalizeEntropy(state);
}
};
template <typename TYPE_OP>
struct EntropyFallbackFunction : BaseModeFunction<TYPE_OP> {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
target = FinalizeEntropy(state);
}
};
template <typename INPUT_TYPE, typename TYPE_OP = ModeStandard<INPUT_TYPE>>
AggregateFunction GetTypedEntropyFunction(const LogicalType &type) {
using STATE = ModeState<INPUT_TYPE, TYPE_OP>;
using OP = EntropyFunction<TYPE_OP>;
auto func =
AggregateFunction::UnaryAggregateDestructor<STATE, INPUT_TYPE, double, OP, AggregateDestructorType::LEGACY>(
type, LogicalType::DOUBLE);
func.null_handling = FunctionNullHandling::SPECIAL_HANDLING;
return func;
}
AggregateFunction GetFallbackEntropyFunction(const LogicalType &type) {
using STATE = ModeState<string_t, ModeString>;
using OP = EntropyFallbackFunction<ModeString>;
AggregateFunction func({type}, LogicalType::DOUBLE, AggregateFunction::StateSize<STATE>,
AggregateFunction::StateInitialize<STATE, OP, AggregateDestructorType::LEGACY>,
AggregateSortKeyHelpers::UnaryUpdate<STATE, OP>, AggregateFunction::StateCombine<STATE, OP>,
AggregateFunction::StateFinalize<STATE, double, OP>, nullptr);
func.destructor = AggregateFunction::StateDestroy<STATE, OP>;
func.null_handling = FunctionNullHandling::SPECIAL_HANDLING;
return func;
}
AggregateFunction GetEntropyFunction(const LogicalType &type) {
switch (type.InternalType()) {
#ifndef DUCKDB_SMALLER_BINARY
case PhysicalType::UINT16:
return GetTypedEntropyFunction<uint16_t>(type);
case PhysicalType::UINT32:
return GetTypedEntropyFunction<uint32_t>(type);
case PhysicalType::UINT64:
return GetTypedEntropyFunction<uint64_t>(type);
case PhysicalType::INT16:
return GetTypedEntropyFunction<int16_t>(type);
case PhysicalType::INT32:
return GetTypedEntropyFunction<int32_t>(type);
case PhysicalType::INT64:
return GetTypedEntropyFunction<int64_t>(type);
case PhysicalType::FLOAT:
return GetTypedEntropyFunction<float>(type);
case PhysicalType::DOUBLE:
return GetTypedEntropyFunction<double>(type);
case PhysicalType::VARCHAR:
return GetTypedEntropyFunction<string_t, ModeString>(type);
#endif
default:
return GetFallbackEntropyFunction(type);
}
}
unique_ptr<FunctionData> BindEntropyAggregate(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
function = GetEntropyFunction(arguments[0]->return_type);
function.name = "entropy";
return nullptr;
}
} // namespace
AggregateFunctionSet EntropyFun::GetFunctions() {
AggregateFunctionSet entropy("entropy");
entropy.AddFunction(AggregateFunction({LogicalTypeId::ANY}, LogicalType::DOUBLE, nullptr, nullptr, nullptr, nullptr,
nullptr, nullptr, BindEntropyAggregate));
return entropy;
}
} // namespace duckdb



@@ -0,0 +1,833 @@
#include "duckdb/execution/expression_executor.hpp"
#include "core_functions/aggregate/holistic_functions.hpp"
#include "duckdb/common/enums/quantile_enum.hpp"
#include "duckdb/planner/expression.hpp"
#include "duckdb/common/operator/cast_operators.hpp"
#include "duckdb/common/operator/abs.hpp"
#include "core_functions/aggregate/quantile_state.hpp"
#include "duckdb/common/types/timestamp.hpp"
#include "duckdb/common/serializer/serializer.hpp"
#include "duckdb/common/serializer/deserializer.hpp"
#include "duckdb/function/aggregate/sort_key_helpers.hpp"
namespace duckdb {
template <class INPUT_TYPE>
struct IndirectLess {
inline explicit IndirectLess(const INPUT_TYPE *inputs_p) : inputs(inputs_p) {
}
inline bool operator()(const idx_t &lhi, const idx_t &rhi) const {
return inputs[lhi] < inputs[rhi];
}
const INPUT_TYPE *inputs;
};
template <typename T>
static inline T QuantileAbs(const T &t) {
return AbsOperator::Operation<T, T>(t);
}
template <>
inline Value QuantileAbs(const Value &v) {
const auto &type = v.type();
switch (type.id()) {
case LogicalTypeId::DECIMAL: {
const auto integral = IntegralValue::Get(v);
const auto width = DecimalType::GetWidth(type);
const auto scale = DecimalType::GetScale(type);
switch (type.InternalType()) {
case PhysicalType::INT16:
return Value::DECIMAL(QuantileAbs<int16_t>(Cast::Operation<hugeint_t, int16_t>(integral)), width, scale);
case PhysicalType::INT32:
return Value::DECIMAL(QuantileAbs<int32_t>(Cast::Operation<hugeint_t, int32_t>(integral)), width, scale);
case PhysicalType::INT64:
return Value::DECIMAL(QuantileAbs<int64_t>(Cast::Operation<hugeint_t, int64_t>(integral)), width, scale);
case PhysicalType::INT128:
return Value::DECIMAL(QuantileAbs<hugeint_t>(integral), width, scale);
default:
throw InternalException("Unknown DECIMAL type");
}
}
default:
return Value::DOUBLE(QuantileAbs<double>(v.GetValue<double>()));
}
}
//===--------------------------------------------------------------------===//
// Quantile Bind Data
//===--------------------------------------------------------------------===//
QuantileBindData::QuantileBindData() {
}
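// a negative quantile selects from the top (descending order); its absolute value is used as the fraction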
QuantileBindData::QuantileBindData(const Value &quantile_p)
: quantiles(1, QuantileValue(QuantileAbs(quantile_p))), order(1, 0), desc(quantile_p < 0) {
}
QuantileBindData::QuantileBindData(const vector<Value> &quantiles_p) {
vector<Value> normalised;
size_t pos = 0;
size_t neg = 0;
for (idx_t i = 0; i < quantiles_p.size(); ++i) {
const auto &q = quantiles_p[i];
pos += (q > 0);
neg += (q < 0);
normalised.emplace_back(QuantileAbs(q));
order.push_back(i);
}
if (pos && neg) {
throw BinderException("QUANTILE parameters must have consistent signs");
}
desc = (neg > 0);
IndirectLess<Value> lt(normalised.data());
std::sort(order.begin(), order.end(), lt);
for (const auto &q : normalised) {
quantiles.emplace_back(QuantileValue(q));
}
}
QuantileBindData::QuantileBindData(const QuantileBindData &other) : order(other.order), desc(other.desc) {
for (const auto &q : other.quantiles) {
quantiles.emplace_back(q);
}
}
unique_ptr<FunctionData> QuantileBindData::Copy() const {
return make_uniq<QuantileBindData>(*this);
}
bool QuantileBindData::Equals(const FunctionData &other_p) const {
auto &other = other_p.Cast<QuantileBindData>();
return desc == other.desc && quantiles == other.quantiles && order == other.order;
}
void QuantileBindData::Serialize(Serializer &serializer, const optional_ptr<FunctionData> bind_data_p,
const AggregateFunction &function) {
auto &bind_data = bind_data_p->Cast<QuantileBindData>();
vector<Value> raw;
for (const auto &q : bind_data.quantiles) {
raw.emplace_back(q.val);
}
serializer.WriteProperty(100, "quantiles", raw);
serializer.WriteProperty(101, "order", bind_data.order);
serializer.WriteProperty(102, "desc", bind_data.desc);
}
unique_ptr<FunctionData> QuantileBindData::Deserialize(Deserializer &deserializer, AggregateFunction &function) {
auto result = make_uniq<QuantileBindData>();
vector<Value> raw;
deserializer.ReadProperty(100, "quantiles", raw);
deserializer.ReadProperty(101, "order", result->order);
deserializer.ReadProperty(102, "desc", result->desc);
QuantileSerializationType deserialization_type;
deserializer.ReadPropertyWithExplicitDefault(103, "quantile_type", deserialization_type,
QuantileSerializationType::NON_DECIMAL);
if (deserialization_type != QuantileSerializationType::NON_DECIMAL) {
deserializer.ReadDeletedProperty<LogicalType>(104, "logical_type");
}
for (const auto &r : raw) {
result->quantiles.emplace_back(QuantileValue(r));
}
return std::move(result);
}
//===--------------------------------------------------------------------===//
// Quantile Casts
//===--------------------------------------------------------------------===//
template <>
interval_t QuantileCast::Operation(const dtime_t &src, Vector &result) {
return {0, 0, src.micros};
}
template <>
string_t QuantileCast::Operation(const string_t &src, Vector &result) {
return StringVector::AddStringOrBlob(result, src);
}
//===--------------------------------------------------------------------===//
// Scalar Quantile
//===--------------------------------------------------------------------===//
template <bool DISCRETE, class TYPE_OP = QuantileStandardType>
struct QuantileScalarOperation : public QuantileOperation {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.v.empty()) {
finalize_data.ReturnNull();
return;
}
D_ASSERT(finalize_data.input.bind_data);
auto &bind_data = finalize_data.input.bind_data->Cast<QuantileBindData>();
D_ASSERT(bind_data.quantiles.size() == 1);
QuantileInterpolator<DISCRETE> interp(bind_data.quantiles[0], state.v.size(), bind_data.desc);
target = interp.template Operation<typename STATE::InputType, T>(state.v.data(), finalize_data.result);
}
template <class STATE, class INPUT_TYPE, class RESULT_TYPE>
static void Window(AggregateInputData &aggr_input_data, const WindowPartitionInput &partition,
const_data_ptr_t g_state, data_ptr_t l_state, const SubFrames &frames, Vector &result,
idx_t ridx) {
auto &state = *reinterpret_cast<STATE *>(l_state);
auto gstate = reinterpret_cast<const STATE *>(g_state);
auto &data = state.GetOrCreateWindowCursor(partition);
const auto &fmask = partition.filter_mask;
QuantileIncluded<INPUT_TYPE> included(fmask, data);
const auto n = FrameSize(included, frames);
D_ASSERT(aggr_input_data.bind_data);
auto &bind_data = aggr_input_data.bind_data->Cast<QuantileBindData>();
auto rdata = FlatVector::GetData<RESULT_TYPE>(result);
auto &rmask = FlatVector::Validity(result);
if (!n) {
rmask.Set(ridx, false);
return;
}
const auto &quantile = bind_data.quantiles[0];
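// if a shared global state with a pre-built tree is available, query it directly;
// otherwise maintain a per-state skip list updated incrementally from the previous frames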
if (gstate && gstate->HasTree()) {
rdata[ridx] = gstate->GetWindowState().template WindowScalar<RESULT_TYPE, DISCRETE>(data, frames, n, result,
quantile);
} else {
auto &window_state = state.GetOrCreateWindowState();
// Update the skip list
window_state.UpdateSkip(data, frames, included);
// Find the position(s) needed
rdata[ridx] = window_state.template WindowScalar<RESULT_TYPE, DISCRETE>(data, frames, n, result, quantile);
// Save the previous state for next time
window_state.prevs = frames;
}
}
};
struct QuantileScalarFallback : QuantileOperation {
template <class INPUT_TYPE, class STATE, class OP>
static void Execute(STATE &state, const INPUT_TYPE &key, AggregateInputData &input_data) {
state.AddElement(key, input_data);
}
template <class STATE>
static void Finalize(STATE &state, AggregateFinalizeData &finalize_data) {
if (state.v.empty()) {
finalize_data.ReturnNull();
return;
}
D_ASSERT(finalize_data.input.bind_data);
auto &bind_data = finalize_data.input.bind_data->Cast<QuantileBindData>();
D_ASSERT(bind_data.quantiles.size() == 1);
QuantileInterpolator<true> interp(bind_data.quantiles[0], state.v.size(), bind_data.desc);
auto interpolation_result = interp.InterpolateInternal<string_t>(state.v.data());
CreateSortKeyHelpers::DecodeSortKey(interpolation_result, finalize_data.result, finalize_data.result_idx,
OrderModifiers(OrderType::ASCENDING, OrderByNullType::NULLS_LAST));
}
};
//===--------------------------------------------------------------------===//
// Quantile List
//===--------------------------------------------------------------------===//
template <class CHILD_TYPE, bool DISCRETE>
struct QuantileListOperation : QuantileOperation {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.v.empty()) {
finalize_data.ReturnNull();
return;
}
D_ASSERT(finalize_data.input.bind_data);
auto &bind_data = finalize_data.input.bind_data->Cast<QuantileBindData>();
auto &result = ListVector::GetEntry(finalize_data.result);
auto ridx = ListVector::GetListSize(finalize_data.result);
ListVector::Reserve(finalize_data.result, ridx + bind_data.quantiles.size());
auto rdata = FlatVector::GetData<CHILD_TYPE>(result);
auto v_t = state.v.data();
D_ASSERT(v_t);
auto &entry = target;
entry.offset = ridx;
idx_t lower = 0;
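// bind_data.order lists the quantiles in ascending order, so each interpolation
// can start its partial sort at the lower bound left by the previous one (interp.FRN)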
for (const auto &q : bind_data.order) {
const auto &quantile = bind_data.quantiles[q];
QuantileInterpolator<DISCRETE> interp(quantile, state.v.size(), bind_data.desc);
interp.begin = lower;
rdata[ridx + q] = interp.template Operation<typename STATE::InputType, CHILD_TYPE>(v_t, result);
lower = interp.FRN;
}
entry.length = bind_data.quantiles.size();
ListVector::SetListSize(finalize_data.result, entry.offset + entry.length);
}
template <class STATE, class INPUT_TYPE, class RESULT_TYPE>
static void Window(AggregateInputData &aggr_input_data, const WindowPartitionInput &partition,
const_data_ptr_t g_state, data_ptr_t l_state, const SubFrames &frames, Vector &list,
idx_t lidx) {
auto &state = *reinterpret_cast<STATE *>(l_state);
auto gstate = reinterpret_cast<const STATE *>(g_state);
auto &data = state.GetOrCreateWindowCursor(partition);
const auto &fmask = partition.filter_mask;
D_ASSERT(aggr_input_data.bind_data);
auto &bind_data = aggr_input_data.bind_data->Cast<QuantileBindData>();
QuantileIncluded<INPUT_TYPE> included(fmask, data);
const auto n = FrameSize(included, frames);
// Result is a constant LIST<RESULT_TYPE> with a fixed length
if (!n) {
auto &lmask = FlatVector::Validity(list);
lmask.Set(lidx, false);
return;
}
if (gstate && gstate->HasTree()) {
gstate->GetWindowState().template WindowList<CHILD_TYPE, DISCRETE>(data, frames, n, list, lidx, bind_data);
} else {
auto &window_state = state.GetOrCreateWindowState();
window_state.UpdateSkip(data, frames, included);
window_state.template WindowList<CHILD_TYPE, DISCRETE>(data, frames, n, list, lidx, bind_data);
window_state.prevs = frames;
}
}
};
struct QuantileListFallback : QuantileOperation {
template <class INPUT_TYPE, class STATE, class OP>
static void Execute(STATE &state, const INPUT_TYPE &key, AggregateInputData &input_data) {
state.AddElement(key, input_data);
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.v.empty()) {
finalize_data.ReturnNull();
return;
}
D_ASSERT(finalize_data.input.bind_data);
auto &bind_data = finalize_data.input.bind_data->Cast<QuantileBindData>();
auto &result = ListVector::GetEntry(finalize_data.result);
auto ridx = ListVector::GetListSize(finalize_data.result);
ListVector::Reserve(finalize_data.result, ridx + bind_data.quantiles.size());
D_ASSERT(state.v.data());
auto &entry = target;
entry.offset = ridx;
idx_t lower = 0;
for (const auto &q : bind_data.order) {
const auto &quantile = bind_data.quantiles[q];
QuantileInterpolator<true> interp(quantile, state.v.size(), bind_data.desc);
interp.begin = lower;
auto interpolation_result = interp.InterpolateInternal<string_t>(state.v.data());
CreateSortKeyHelpers::DecodeSortKey(interpolation_result, result, ridx + q,
OrderModifiers(OrderType::ASCENDING, OrderByNullType::NULLS_LAST));
lower = interp.FRN;
}
entry.length = bind_data.quantiles.size();
ListVector::SetListSize(finalize_data.result, entry.offset + entry.length);
}
};
//===--------------------------------------------------------------------===//
// Discrete Quantiles
//===--------------------------------------------------------------------===//
template <class OP>
AggregateFunction GetDiscreteQuantileTemplated(const LogicalType &type) {
switch (type.InternalType()) {
#ifndef DUCKDB_SMALLER_BINARY
case PhysicalType::INT8:
return OP::template GetFunction<int8_t>(type);
case PhysicalType::INT16:
return OP::template GetFunction<int16_t>(type);
case PhysicalType::INT32:
return OP::template GetFunction<int32_t>(type);
case PhysicalType::INT64:
return OP::template GetFunction<int64_t>(type);
case PhysicalType::INT128:
return OP::template GetFunction<hugeint_t>(type);
case PhysicalType::FLOAT:
return OP::template GetFunction<float>(type);
case PhysicalType::DOUBLE:
return OP::template GetFunction<double>(type);
case PhysicalType::INTERVAL:
return OP::template GetFunction<interval_t>(type);
case PhysicalType::VARCHAR:
return OP::template GetFunction<string_t, QuantileStringType>(type);
#endif
default:
return OP::GetFallback(type);
}
}
struct ScalarDiscreteQuantile {
template <typename INPUT_TYPE, class TYPE_OP = QuantileStandardType>
static AggregateFunction GetFunction(const LogicalType &type) {
using STATE = QuantileState<INPUT_TYPE, TYPE_OP>;
using OP = QuantileScalarOperation<true>;
auto fun = AggregateFunction::UnaryAggregateDestructor<STATE, INPUT_TYPE, INPUT_TYPE, OP,
AggregateDestructorType::LEGACY>(type, type);
#ifndef DUCKDB_SMALLER_BINARY
fun.window = OP::Window<STATE, INPUT_TYPE, INPUT_TYPE>;
fun.window_init = OP::WindowInit<STATE, INPUT_TYPE>;
#endif
return fun;
}
static AggregateFunction GetFallback(const LogicalType &type) {
using STATE = QuantileState<string_t, QuantileStringType>;
using OP = QuantileScalarFallback;
AggregateFunction fun({type}, type, AggregateFunction::StateSize<STATE>,
AggregateFunction::StateInitialize<STATE, OP, AggregateDestructorType::LEGACY>,
AggregateSortKeyHelpers::UnaryUpdate<STATE, OP>,
AggregateFunction::StateCombine<STATE, OP>,
AggregateFunction::StateVoidFinalize<STATE, OP>, nullptr, nullptr,
AggregateFunction::StateDestroy<STATE, OP>);
return fun;
}
};
template <class STATE, class INPUT_TYPE, class RESULT_TYPE, class OP>
static AggregateFunction QuantileListAggregate(const LogicalType &input_type, const LogicalType &child_type) { // NOLINT
LogicalType result_type = LogicalType::LIST(child_type);
return AggregateFunction(
{input_type}, result_type, AggregateFunction::StateSize<STATE>,
AggregateFunction::StateInitialize<STATE, OP, AggregateDestructorType::LEGACY>,
AggregateFunction::UnaryScatterUpdate<STATE, INPUT_TYPE, OP>, AggregateFunction::StateCombine<STATE, OP>,
AggregateFunction::StateFinalize<STATE, RESULT_TYPE, OP>, AggregateFunction::UnaryUpdate<STATE, INPUT_TYPE, OP>,
nullptr, AggregateFunction::StateDestroy<STATE, OP>);
}
struct ListDiscreteQuantile {
template <typename INPUT_TYPE, class TYPE_OP = QuantileStandardType>
static AggregateFunction GetFunction(const LogicalType &type) {
using STATE = QuantileState<INPUT_TYPE, TYPE_OP>;
using OP = QuantileListOperation<INPUT_TYPE, true>;
auto fun = QuantileListAggregate<STATE, INPUT_TYPE, list_entry_t, OP>(type, type);
fun.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
#ifndef DUCKDB_SMALLER_BINARY
fun.window = OP::template Window<STATE, INPUT_TYPE, list_entry_t>;
fun.window_init = OP::template WindowInit<STATE, INPUT_TYPE>;
#endif
return fun;
}
static AggregateFunction GetFallback(const LogicalType &type) {
using STATE = QuantileState<string_t, QuantileStringType>;
using OP = QuantileListFallback;
AggregateFunction fun({type}, LogicalType::LIST(type), AggregateFunction::StateSize<STATE>,
AggregateFunction::StateInitialize<STATE, OP, AggregateDestructorType::LEGACY>,
AggregateSortKeyHelpers::UnaryUpdate<STATE, OP>,
AggregateFunction::StateCombine<STATE, OP>,
AggregateFunction::StateFinalize<STATE, list_entry_t, OP>, nullptr, nullptr,
AggregateFunction::StateDestroy<STATE, OP>);
return fun;
}
};
AggregateFunction GetDiscreteQuantile(const LogicalType &type) {
return GetDiscreteQuantileTemplated<ScalarDiscreteQuantile>(type);
}
AggregateFunction GetDiscreteQuantileList(const LogicalType &type) {
return GetDiscreteQuantileTemplated<ListDiscreteQuantile>(type);
}
//===--------------------------------------------------------------------===//
// Continuous Quantiles
//===--------------------------------------------------------------------===//
template <class OP>
AggregateFunction GetContinuousQuantileTemplated(const LogicalType &type) {
switch (type.id()) {
case LogicalTypeId::TINYINT:
return OP::template GetFunction<int8_t, double>(type, LogicalType::DOUBLE);
case LogicalTypeId::SMALLINT:
return OP::template GetFunction<int16_t, double>(type, LogicalType::DOUBLE);
case LogicalTypeId::SQLNULL:
case LogicalTypeId::INTEGER:
return OP::template GetFunction<int32_t, double>(type, LogicalType::DOUBLE);
case LogicalTypeId::BIGINT:
return OP::template GetFunction<int64_t, double>(type, LogicalType::DOUBLE);
case LogicalTypeId::HUGEINT:
return OP::template GetFunction<hugeint_t, double>(type, LogicalType::DOUBLE);
case LogicalTypeId::FLOAT:
return OP::template GetFunction<float, float>(type, type);
case LogicalTypeId::UTINYINT:
case LogicalTypeId::USMALLINT:
case LogicalTypeId::UINTEGER:
case LogicalTypeId::UBIGINT:
case LogicalTypeId::UHUGEINT:
case LogicalTypeId::DOUBLE:
return OP::template GetFunction<double, double>(LogicalType::DOUBLE, LogicalType::DOUBLE);
case LogicalTypeId::DECIMAL:
switch (type.InternalType()) {
case PhysicalType::INT16:
return OP::template GetFunction<int16_t, int16_t>(type, type);
case PhysicalType::INT32:
return OP::template GetFunction<int32_t, int32_t>(type, type);
case PhysicalType::INT64:
return OP::template GetFunction<int64_t, int64_t>(type, type);
case PhysicalType::INT128:
return OP::template GetFunction<hugeint_t, hugeint_t>(type, type);
default:
throw NotImplementedException("Unimplemented continuous quantile DECIMAL aggregate");
}
case LogicalTypeId::DATE:
return OP::template GetFunction<date_t, timestamp_t>(type, LogicalType::TIMESTAMP);
case LogicalTypeId::TIMESTAMP:
case LogicalTypeId::TIMESTAMP_TZ:
case LogicalTypeId::TIMESTAMP_SEC:
case LogicalTypeId::TIMESTAMP_MS:
case LogicalTypeId::TIMESTAMP_NS:
return OP::template GetFunction<timestamp_t, timestamp_t>(type, type);
case LogicalTypeId::TIME:
case LogicalTypeId::TIME_TZ:
return OP::template GetFunction<dtime_t, dtime_t>(type, type);
default:
throw NotImplementedException("Unimplemented continuous quantile aggregate");
}
}
struct ScalarContinuousQuantile {
template <typename INPUT_TYPE, typename TARGET_TYPE>
static AggregateFunction GetFunction(const LogicalType &input_type, const LogicalType &target_type) {
using STATE = QuantileState<INPUT_TYPE, QuantileStandardType>;
using OP = QuantileScalarOperation<false>;
auto fun =
AggregateFunction::UnaryAggregateDestructor<STATE, INPUT_TYPE, TARGET_TYPE, OP,
AggregateDestructorType::LEGACY>(input_type, target_type);
fun.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
#ifndef DUCKDB_SMALLER_BINARY
fun.window = OP::template Window<STATE, INPUT_TYPE, TARGET_TYPE>;
fun.window_init = OP::template WindowInit<STATE, INPUT_TYPE>;
#endif
return fun;
}
};
struct ListContinuousQuantile {
template <typename INPUT_TYPE, typename TARGET_TYPE>
static AggregateFunction GetFunction(const LogicalType &input_type, const LogicalType &target_type) {
using STATE = QuantileState<INPUT_TYPE, QuantileStandardType>;
using OP = QuantileListOperation<TARGET_TYPE, false>;
auto fun = QuantileListAggregate<STATE, INPUT_TYPE, list_entry_t, OP>(input_type, target_type);
fun.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
#ifndef DUCKDB_SMALLER_BINARY
fun.window = OP::template Window<STATE, INPUT_TYPE, list_entry_t>;
fun.window_init = OP::template WindowInit<STATE, INPUT_TYPE>;
#endif
return fun;
}
};
AggregateFunction GetContinuousQuantile(const LogicalType &type) {
return GetContinuousQuantileTemplated<ScalarContinuousQuantile>(type);
}
AggregateFunction GetContinuousQuantileList(const LogicalType &type) {
return GetContinuousQuantileTemplated<ListContinuousQuantile>(type);
}
//===--------------------------------------------------------------------===//
// Quantile binding
//===--------------------------------------------------------------------===//
static Value CheckQuantile(const Value &quantile_val) {
if (quantile_val.IsNull()) {
throw BinderException("QUANTILE parameter cannot be NULL");
}
auto quantile = quantile_val.GetValue<double>();
if (quantile < -1 || quantile > 1) {
throw BinderException("QUANTILE can only take parameters in the range [-1, 1]");
}
if (Value::IsNan(quantile)) {
throw BinderException("QUANTILE parameter cannot be NaN");
}
return quantile_val;
}
unique_ptr<FunctionData> BindQuantile(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
if (arguments.size() < 2) {
throw BinderException("QUANTILE requires a range argument between [0, 1]");
}
if (arguments[1]->HasParameter()) {
throw ParameterNotResolvedException();
}
if (!arguments[1]->IsFoldable()) {
throw BinderException("QUANTILE can only take constant parameters");
}
Value quantile_val = ExpressionExecutor::EvaluateScalar(context, *arguments[1]);
if (quantile_val.IsNull()) {
throw BinderException("QUANTILE argument must not be NULL");
}
vector<Value> quantiles;
switch (quantile_val.type().id()) {
case LogicalTypeId::LIST:
for (const auto &element_val : ListValue::GetChildren(quantile_val)) {
quantiles.push_back(CheckQuantile(element_val));
}
break;
case LogicalTypeId::ARRAY:
for (const auto &element_val : ArrayValue::GetChildren(quantile_val)) {
quantiles.push_back(CheckQuantile(element_val));
}
break;
default:
quantiles.push_back(CheckQuantile(quantile_val));
break;
}
Function::EraseArgument(function, arguments, arguments.size() - 1);
return make_uniq<QuantileBindData>(quantiles);
}
//===--------------------------------------------------------------------===//
// Function definitions
//===--------------------------------------------------------------------===//
static bool CanInterpolate(const LogicalType &type) {
if (type.HasAlias()) {
return false;
}
switch (type.id()) {
case LogicalTypeId::DECIMAL:
case LogicalTypeId::SQLNULL:
case LogicalTypeId::TINYINT:
case LogicalTypeId::SMALLINT:
case LogicalTypeId::INTEGER:
case LogicalTypeId::UTINYINT:
case LogicalTypeId::USMALLINT:
case LogicalTypeId::UINTEGER:
case LogicalTypeId::UBIGINT:
case LogicalTypeId::BIGINT:
case LogicalTypeId::UHUGEINT:
case LogicalTypeId::HUGEINT:
case LogicalTypeId::FLOAT:
case LogicalTypeId::DOUBLE:
case LogicalTypeId::DATE:
case LogicalTypeId::TIMESTAMP:
case LogicalTypeId::TIMESTAMP_TZ:
case LogicalTypeId::TIMESTAMP_SEC:
case LogicalTypeId::TIMESTAMP_MS:
case LogicalTypeId::TIMESTAMP_NS:
case LogicalTypeId::TIME:
case LogicalTypeId::TIME_TZ:
return true;
default:
return false;
}
}
struct MedianFunction {
static AggregateFunction GetAggregate(const LogicalType &type) {
auto fun = CanInterpolate(type) ? GetContinuousQuantile(type) : GetDiscreteQuantile(type);
fun.name = "median";
fun.serialize = QuantileBindData::Serialize;
fun.deserialize = Deserialize;
return fun;
}
static unique_ptr<FunctionData> Deserialize(Deserializer &deserializer, AggregateFunction &function) {
auto bind_data = QuantileBindData::Deserialize(deserializer, function);
auto &input_type = function.arguments[0];
function = GetAggregate(input_type);
return bind_data;
}
static unique_ptr<FunctionData> Bind(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
function = GetAggregate(arguments[0]->return_type);
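// the median is the 0.5 quantile (DECIMAL(2,1) with value 5 encodes 0.5)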
return make_uniq<QuantileBindData>(Value::DECIMAL(int16_t(5), 2, 1));
}
};
struct DiscreteQuantileListFunction {
static AggregateFunction GetAggregate(const LogicalType &type) {
auto fun = GetDiscreteQuantileList(type);
fun.name = "quantile_disc";
fun.bind = Bind;
fun.serialize = QuantileBindData::Serialize;
fun.deserialize = Deserialize;
// temporarily push an argument so we can bind the actual quantile
fun.arguments.emplace_back(LogicalType::LIST(LogicalType::DOUBLE));
fun.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
return fun;
}
static unique_ptr<FunctionData> Deserialize(Deserializer &deserializer, AggregateFunction &function) {
auto bind_data = QuantileBindData::Deserialize(deserializer, function);
auto &input_type = function.arguments[0];
function = GetAggregate(input_type);
return bind_data;
}
static unique_ptr<FunctionData> Bind(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
function = GetAggregate(arguments[0]->return_type);
return BindQuantile(context, function, arguments);
}
};
struct DiscreteQuantileFunction {
static AggregateFunction GetAggregate(const LogicalType &type) {
auto fun = GetDiscreteQuantile(type);
fun.name = "quantile_disc";
fun.bind = Bind;
fun.serialize = QuantileBindData::Serialize;
fun.deserialize = Deserialize;
// temporarily push an argument so we can bind the actual quantile
fun.arguments.emplace_back(LogicalType::DOUBLE);
fun.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
return fun;
}
static unique_ptr<FunctionData> Deserialize(Deserializer &deserializer, AggregateFunction &function) {
auto bind_data = QuantileBindData::Deserialize(deserializer, function);
auto &quantile_data = bind_data->Cast<QuantileBindData>();
auto &input_type = function.arguments[0];
if (quantile_data.quantiles.size() == 1) {
function = GetAggregate(input_type);
} else {
function = DiscreteQuantileListFunction::GetAggregate(input_type);
}
return bind_data;
}
static unique_ptr<FunctionData> Bind(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
function = GetAggregate(arguments[0]->return_type);
return BindQuantile(context, function, arguments);
}
};
struct ContinuousQuantileFunction {
static AggregateFunction GetAggregate(const LogicalType &type) {
auto fun = GetContinuousQuantile(type);
fun.name = "quantile_cont";
fun.bind = Bind;
fun.serialize = QuantileBindData::Serialize;
fun.deserialize = Deserialize;
// temporarily push an argument so we can bind the actual quantile
fun.arguments.emplace_back(LogicalType::DOUBLE);
fun.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
return fun;
}
static unique_ptr<FunctionData> Deserialize(Deserializer &deserializer, AggregateFunction &function) {
auto bind_data = QuantileBindData::Deserialize(deserializer, function);
auto &input_type = function.arguments[0];
function = GetAggregate(input_type);
return bind_data;
}
static unique_ptr<FunctionData> Bind(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
function = GetAggregate(function.arguments[0].id() == LogicalTypeId::DECIMAL ? arguments[0]->return_type
: function.arguments[0]);
return BindQuantile(context, function, arguments);
}
};
struct ContinuousQuantileListFunction {
static AggregateFunction GetAggregate(const LogicalType &type) {
auto fun = GetContinuousQuantileList(type);
fun.name = "quantile_cont";
fun.bind = Bind;
fun.serialize = QuantileBindData::Serialize;
fun.deserialize = Deserialize;
// temporarily push an argument so we can bind the actual quantile
auto list_of_double = LogicalType::LIST(LogicalType::DOUBLE);
fun.arguments.push_back(list_of_double);
fun.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
return fun;
}
static unique_ptr<FunctionData> Deserialize(Deserializer &deserializer, AggregateFunction &function) {
auto bind_data = QuantileBindData::Deserialize(deserializer, function);
auto &input_type = function.arguments[0];
function = GetAggregate(input_type);
return bind_data;
}
static unique_ptr<FunctionData> Bind(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
function = GetAggregate(function.arguments[0].id() == LogicalTypeId::DECIMAL ? arguments[0]->return_type
: function.arguments[0]);
return BindQuantile(context, function, arguments);
}
};
template <class OP>
AggregateFunction EmptyQuantileFunction(LogicalType input, const LogicalType &result, const LogicalType &extra_arg) {
AggregateFunction fun({std::move(input)}, std::move(result), nullptr, nullptr, nullptr, nullptr, nullptr, nullptr,
OP::Bind);
if (extra_arg.id() != LogicalTypeId::INVALID) {
fun.arguments.push_back(extra_arg);
}
fun.serialize = QuantileBindData::Serialize;
fun.deserialize = OP::Deserialize;
fun.order_dependent = AggregateOrderDependent::NOT_ORDER_DEPENDENT;
return fun;
}
AggregateFunctionSet MedianFun::GetFunctions() {
AggregateFunctionSet set("median");
set.AddFunction(EmptyQuantileFunction<MedianFunction>(LogicalType::ANY, LogicalType::ANY, LogicalTypeId::INVALID));
return set;
}
AggregateFunctionSet QuantileDiscFun::GetFunctions() {
AggregateFunctionSet set("quantile_disc");
set.AddFunction(
EmptyQuantileFunction<DiscreteQuantileFunction>(LogicalType::ANY, LogicalType::ANY, LogicalType::DOUBLE));
set.AddFunction(EmptyQuantileFunction<DiscreteQuantileListFunction>(LogicalType::ANY, LogicalType::ANY,
LogicalType::LIST(LogicalType::DOUBLE)));
// this function is here for deserialization - it cannot be called by users
set.AddFunction(
EmptyQuantileFunction<DiscreteQuantileFunction>(LogicalType::ANY, LogicalType::ANY, LogicalType::INVALID));
return set;
}
vector<LogicalType> GetContinuousQuantileTypes() {
return {LogicalType::TINYINT, LogicalType::SMALLINT, LogicalType::INTEGER, LogicalType::BIGINT,
LogicalType::HUGEINT, LogicalType::FLOAT, LogicalType::DOUBLE, LogicalType::DATE,
LogicalType::TIMESTAMP, LogicalType::TIME, LogicalType::TIMESTAMP_TZ, LogicalType::TIME_TZ};
}
AggregateFunctionSet QuantileContFun::GetFunctions() {
AggregateFunctionSet quantile_cont("quantile_cont");
quantile_cont.AddFunction(EmptyQuantileFunction<ContinuousQuantileFunction>(
LogicalTypeId::DECIMAL, LogicalTypeId::DECIMAL, LogicalType::DOUBLE));
quantile_cont.AddFunction(EmptyQuantileFunction<ContinuousQuantileListFunction>(
LogicalTypeId::DECIMAL, LogicalTypeId::DECIMAL, LogicalType::LIST(LogicalType::DOUBLE)));
for (const auto &type : GetContinuousQuantileTypes()) {
quantile_cont.AddFunction(EmptyQuantileFunction<ContinuousQuantileFunction>(type, type, LogicalType::DOUBLE));
quantile_cont.AddFunction(
EmptyQuantileFunction<ContinuousQuantileListFunction>(type, type, LogicalType::LIST(LogicalType::DOUBLE)));
}
return quantile_cont;
}
} // namespace duckdb


@@ -0,0 +1,443 @@
#include "duckdb/execution/expression_executor.hpp"
#include "duckdb/execution/reservoir_sample.hpp"
#include "core_functions/aggregate/holistic_functions.hpp"
#include "duckdb/planner/expression.hpp"
#include "duckdb/common/queue.hpp"
#include "duckdb/common/serializer/serializer.hpp"
#include "duckdb/common/serializer/deserializer.hpp"
#include <algorithm>
#include <stdlib.h>
namespace duckdb {
namespace {
template <typename T>
struct ReservoirQuantileState {
T *v;
idx_t len;
idx_t pos;
BaseReservoirSampling *r_samp;
void Resize(idx_t new_len) {
if (new_len <= len) {
return;
}
T *old_v = v;
v = (T *)realloc(v, new_len * sizeof(T));
if (!v) {
free(old_v);
throw InternalException("Memory allocation failure");
}
len = new_len;
}
void ReplaceElement(T &input) {
v[r_samp->min_weighted_entry_index] = input;
r_samp->ReplaceElement();
}
void FillReservoir(idx_t sample_size, T element) {
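// reservoir sampling: fill the reservoir until it holds sample_size elements,
// then replace the minimum-weight entry whenever the sampler selects the current row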
if (pos < sample_size) {
v[pos++] = element;
r_samp->InitializeReservoirWeights(pos, len);
} else {
D_ASSERT(r_samp->next_index_to_sample >= r_samp->num_entries_to_skip_b4_next_sample);
if (r_samp->next_index_to_sample == r_samp->num_entries_to_skip_b4_next_sample) {
ReplaceElement(element);
}
}
}
};
struct ReservoirQuantileBindData : public FunctionData {
ReservoirQuantileBindData() {
}
ReservoirQuantileBindData(double quantile_p, idx_t sample_size_p)
: quantiles(1, quantile_p), sample_size(sample_size_p) {
}
ReservoirQuantileBindData(vector<double> quantiles_p, idx_t sample_size_p)
: quantiles(std::move(quantiles_p)), sample_size(sample_size_p) {
}
unique_ptr<FunctionData> Copy() const override {
return make_uniq<ReservoirQuantileBindData>(quantiles, sample_size);
}
bool Equals(const FunctionData &other_p) const override {
auto &other = other_p.Cast<ReservoirQuantileBindData>();
return quantiles == other.quantiles && sample_size == other.sample_size;
}
static void Serialize(Serializer &serializer, const optional_ptr<FunctionData> bind_data_p,
const AggregateFunction &function) {
auto &bind_data = bind_data_p->Cast<ReservoirQuantileBindData>();
serializer.WriteProperty(100, "quantiles", bind_data.quantiles);
serializer.WriteProperty(101, "sample_size", bind_data.sample_size);
}
static unique_ptr<FunctionData> Deserialize(Deserializer &deserializer, AggregateFunction &function) {
auto result = make_uniq<ReservoirQuantileBindData>();
deserializer.ReadProperty(100, "quantiles", result->quantiles);
deserializer.ReadProperty(101, "sample_size", result->sample_size);
return std::move(result);
}
vector<double> quantiles;
idx_t sample_size;
};
struct ReservoirQuantileOperation {
template <class STATE>
static void Initialize(STATE &state) {
state.v = nullptr;
state.len = 0;
state.pos = 0;
state.r_samp = nullptr;
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input,
idx_t count) {
for (idx_t i = 0; i < count; i++) {
Operation<INPUT_TYPE, STATE, OP>(state, input, unary_input);
}
}
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input) {
auto &bind_data = unary_input.input.bind_data->template Cast<ReservoirQuantileBindData>();
if (state.pos == 0) {
state.Resize(bind_data.sample_size);
}
if (!state.r_samp) {
state.r_samp = new BaseReservoirSampling();
}
D_ASSERT(state.v);
state.FillReservoir(bind_data.sample_size, input);
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
if (source.pos == 0) {
return;
}
if (target.pos == 0) {
target.Resize(source.len);
}
if (!target.r_samp) {
target.r_samp = new BaseReservoirSampling();
}
for (idx_t src_idx = 0; src_idx < source.pos; src_idx++) {
target.FillReservoir(target.len, source.v[src_idx]);
}
}
template <class STATE>
static void Destroy(STATE &state, AggregateInputData &aggr_input_data) {
if (state.v) {
free(state.v);
state.v = nullptr;
}
if (state.r_samp) {
delete state.r_samp;
state.r_samp = nullptr;
}
}
static bool IgnoreNull() {
return true;
}
};
struct ReservoirQuantileScalarOperation : public ReservoirQuantileOperation {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.pos == 0) {
finalize_data.ReturnNull();
return;
}
D_ASSERT(state.v);
D_ASSERT(finalize_data.input.bind_data);
auto &bind_data = finalize_data.input.bind_data->template Cast<ReservoirQuantileBindData>();
auto v_t = state.v;
D_ASSERT(bind_data.quantiles.size() == 1);
auto offset = (idx_t)((double)(state.pos - 1) * bind_data.quantiles[0]);
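// partially sort the sample so the element at the quantile offset ends up in its final position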
std::nth_element(v_t, v_t + offset, v_t + state.pos);
target = v_t[offset];
}
};
AggregateFunction GetReservoirQuantileAggregateFunction(PhysicalType type) {
switch (type) {
case PhysicalType::INT8:
return AggregateFunction::UnaryAggregateDestructor<ReservoirQuantileState<int8_t>, int8_t, int8_t,
ReservoirQuantileScalarOperation>(LogicalType::TINYINT,
LogicalType::TINYINT);
case PhysicalType::INT16:
return AggregateFunction::UnaryAggregateDestructor<ReservoirQuantileState<int16_t>, int16_t, int16_t,
ReservoirQuantileScalarOperation>(LogicalType::SMALLINT,
LogicalType::SMALLINT);
case PhysicalType::INT32:
return AggregateFunction::UnaryAggregateDestructor<ReservoirQuantileState<int32_t>, int32_t, int32_t,
ReservoirQuantileScalarOperation>(LogicalType::INTEGER,
LogicalType::INTEGER);
case PhysicalType::INT64:
return AggregateFunction::UnaryAggregateDestructor<ReservoirQuantileState<int64_t>, int64_t, int64_t,
ReservoirQuantileScalarOperation>(LogicalType::BIGINT,
LogicalType::BIGINT);
case PhysicalType::INT128:
return AggregateFunction::UnaryAggregateDestructor<ReservoirQuantileState<hugeint_t>, hugeint_t, hugeint_t,
ReservoirQuantileScalarOperation>(LogicalType::HUGEINT,
LogicalType::HUGEINT);
case PhysicalType::FLOAT:
return AggregateFunction::UnaryAggregateDestructor<ReservoirQuantileState<float>, float, float,
ReservoirQuantileScalarOperation>(LogicalType::FLOAT,
LogicalType::FLOAT);
case PhysicalType::DOUBLE:
return AggregateFunction::UnaryAggregateDestructor<ReservoirQuantileState<double>, double, double,
ReservoirQuantileScalarOperation>(LogicalType::DOUBLE,
LogicalType::DOUBLE);
default:
throw InternalException("Unimplemented reservoir quantile aggregate");
}
}
template <class CHILD_TYPE>
struct ReservoirQuantileListOperation : public ReservoirQuantileOperation {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.pos == 0) {
finalize_data.ReturnNull();
return;
}
D_ASSERT(finalize_data.input.bind_data);
auto &bind_data = finalize_data.input.bind_data->template Cast<ReservoirQuantileBindData>();
auto &result = ListVector::GetEntry(finalize_data.result);
auto ridx = ListVector::GetListSize(finalize_data.result);
ListVector::Reserve(finalize_data.result, ridx + bind_data.quantiles.size());
auto rdata = FlatVector::GetData<CHILD_TYPE>(result);
auto v_t = state.v;
D_ASSERT(v_t);
auto &entry = target;
entry.offset = ridx;
entry.length = bind_data.quantiles.size();
for (size_t q = 0; q < entry.length; ++q) {
const auto &quantile = bind_data.quantiles[q];
auto offset = (idx_t)((double)(state.pos - 1) * quantile);
std::nth_element(v_t, v_t + offset, v_t + state.pos);
rdata[ridx + q] = v_t[offset];
}
ListVector::SetListSize(finalize_data.result, entry.offset + entry.length);
}
};
template <class STATE, class INPUT_TYPE, class RESULT_TYPE, class OP>
AggregateFunction ReservoirQuantileListAggregate(const LogicalType &input_type, const LogicalType &child_type) {
LogicalType result_type = LogicalType::LIST(child_type);
return AggregateFunction(
{input_type}, result_type, AggregateFunction::StateSize<STATE>, AggregateFunction::StateInitialize<STATE, OP>,
AggregateFunction::UnaryScatterUpdate<STATE, INPUT_TYPE, OP>, AggregateFunction::StateCombine<STATE, OP>,
AggregateFunction::StateFinalize<STATE, RESULT_TYPE, OP>, AggregateFunction::UnaryUpdate<STATE, INPUT_TYPE, OP>,
nullptr, AggregateFunction::StateDestroy<STATE, OP>);
}
template <typename INPUT_TYPE, typename SAVE_TYPE>
AggregateFunction GetTypedReservoirQuantileListAggregateFunction(const LogicalType &type) {
using STATE = ReservoirQuantileState<SAVE_TYPE>;
using OP = ReservoirQuantileListOperation<INPUT_TYPE>;
auto fun = ReservoirQuantileListAggregate<STATE, INPUT_TYPE, list_entry_t, OP>(type, type);
return fun;
}
AggregateFunction GetReservoirQuantileListAggregateFunction(const LogicalType &type) {
switch (type.id()) {
case LogicalTypeId::TINYINT:
return GetTypedReservoirQuantileListAggregateFunction<int8_t, int8_t>(type);
case LogicalTypeId::SMALLINT:
return GetTypedReservoirQuantileListAggregateFunction<int16_t, int16_t>(type);
case LogicalTypeId::INTEGER:
return GetTypedReservoirQuantileListAggregateFunction<int32_t, int32_t>(type);
case LogicalTypeId::BIGINT:
return GetTypedReservoirQuantileListAggregateFunction<int64_t, int64_t>(type);
case LogicalTypeId::HUGEINT:
return GetTypedReservoirQuantileListAggregateFunction<hugeint_t, hugeint_t>(type);
case LogicalTypeId::FLOAT:
return GetTypedReservoirQuantileListAggregateFunction<float, float>(type);
case LogicalTypeId::DOUBLE:
return GetTypedReservoirQuantileListAggregateFunction<double, double>(type);
case LogicalTypeId::DECIMAL:
switch (type.InternalType()) {
case PhysicalType::INT16:
return GetTypedReservoirQuantileListAggregateFunction<int16_t, int16_t>(type);
case PhysicalType::INT32:
return GetTypedReservoirQuantileListAggregateFunction<int32_t, int32_t>(type);
case PhysicalType::INT64:
return GetTypedReservoirQuantileListAggregateFunction<int64_t, int64_t>(type);
case PhysicalType::INT128:
return GetTypedReservoirQuantileListAggregateFunction<hugeint_t, hugeint_t>(type);
default:
throw NotImplementedException("Unimplemented reservoir quantile list aggregate");
}
default:
// TODO: Add quantitative temporal types
throw NotImplementedException("Unimplemented reservoir quantile list aggregate");
}
}
double CheckReservoirQuantile(const Value &quantile_val) {
if (quantile_val.IsNull()) {
throw BinderException("RESERVOIR_QUANTILE QUANTILE parameter cannot be NULL");
}
auto quantile = quantile_val.GetValue<double>();
if (quantile < 0 || quantile > 1) {
throw BinderException("RESERVOIR_QUANTILE can only take parameters in the range [0, 1]");
}
return quantile;
}
unique_ptr<FunctionData> BindReservoirQuantile(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
D_ASSERT(arguments.size() >= 2);
if (arguments[1]->HasParameter()) {
throw ParameterNotResolvedException();
}
if (!arguments[1]->IsFoldable()) {
throw BinderException("RESERVOIR_QUANTILE can only take constant quantile parameters");
}
Value quantile_val = ExpressionExecutor::EvaluateScalar(context, *arguments[1]);
vector<double> quantiles;
if (quantile_val.type().id() != LogicalTypeId::LIST) {
quantiles.push_back(CheckReservoirQuantile(quantile_val));
} else {
for (const auto &element_val : ListValue::GetChildren(quantile_val)) {
quantiles.push_back(CheckReservoirQuantile(element_val));
}
}
if (arguments.size() == 2) {
// remove the quantile argument so we can use the unary aggregate
if (function.arguments.size() == 2) {
Function::EraseArgument(function, arguments, arguments.size() - 1);
} else {
arguments.pop_back();
}
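// no sample size was given - use the default reservoir size of 8192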
return make_uniq<ReservoirQuantileBindData>(quantiles, 8192ULL);
}
if (!arguments[2]->IsFoldable()) {
throw BinderException("RESERVOIR_QUANTILE can only take constant sample size parameters");
}
Value sample_size_val = ExpressionExecutor::EvaluateScalar(context, *arguments[2]);
if (sample_size_val.IsNull()) {
throw BinderException("Size of the RESERVOIR_QUANTILE sample cannot be NULL");
}
auto sample_size = sample_size_val.GetValue<int32_t>();
if (sample_size_val.IsNull() || sample_size <= 0) {
throw BinderException("Size of the RESERVOIR_QUANTILE sample must be bigger than 0");
}
// remove the quantile arguments so we can use the unary aggregate
if (function.arguments.size() == arguments.size()) {
Function::EraseArgument(function, arguments, arguments.size() - 1);
Function::EraseArgument(function, arguments, arguments.size() - 1);
} else {
arguments.pop_back();
arguments.pop_back();
}
return make_uniq<ReservoirQuantileBindData>(quantiles, NumericCast<idx_t>(sample_size));
}
unique_ptr<FunctionData> BindReservoirQuantileDecimal(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
function = GetReservoirQuantileAggregateFunction(arguments[0]->return_type.InternalType());
auto bind_data = BindReservoirQuantile(context, function, arguments);
function.name = "reservoir_quantile";
function.serialize = ReservoirQuantileBindData::Serialize;
function.deserialize = ReservoirQuantileBindData::Deserialize;
return bind_data;
}
AggregateFunction GetReservoirQuantileAggregate(PhysicalType type) {
auto fun = GetReservoirQuantileAggregateFunction(type);
fun.bind = BindReservoirQuantile;
fun.serialize = ReservoirQuantileBindData::Serialize;
fun.deserialize = ReservoirQuantileBindData::Deserialize;
// temporarily push an argument so we can bind the actual quantile
fun.arguments.emplace_back(LogicalType::DOUBLE);
return fun;
}
AggregateFunction GetReservoirQuantileListAggregate(const LogicalType &type) {
auto fun = GetReservoirQuantileListAggregateFunction(type);
fun.bind = BindReservoirQuantile;
fun.serialize = ReservoirQuantileBindData::Serialize;
fun.deserialize = ReservoirQuantileBindData::Deserialize;
// temporarily push an argument so we can bind the actual quantile
auto list_of_double = LogicalType::LIST(LogicalType::DOUBLE);
fun.arguments.push_back(list_of_double);
return fun;
}
void DefineReservoirQuantile(AggregateFunctionSet &set, const LogicalType &type) {
// Four versions: type, scalar/list[, count]
auto fun = GetReservoirQuantileAggregate(type.InternalType());
set.AddFunction(fun);
fun.arguments.emplace_back(LogicalType::INTEGER);
set.AddFunction(fun);
// List variants
fun = GetReservoirQuantileListAggregate(type);
set.AddFunction(fun);
fun.arguments.emplace_back(LogicalType::INTEGER);
set.AddFunction(fun);
}
void GetReservoirQuantileDecimalFunction(AggregateFunctionSet &set, const vector<LogicalType> &arguments,
const LogicalType &return_value) {
AggregateFunction fun(arguments, return_value, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr,
BindReservoirQuantileDecimal);
fun.serialize = ReservoirQuantileBindData::Serialize;
fun.deserialize = ReservoirQuantileBindData::Deserialize;
set.AddFunction(fun);
fun.arguments.emplace_back(LogicalType::INTEGER);
set.AddFunction(fun);
}
} // namespace
AggregateFunctionSet ReservoirQuantileFun::GetFunctions() {
AggregateFunctionSet reservoir_quantile;
// DECIMAL
GetReservoirQuantileDecimalFunction(reservoir_quantile, {LogicalTypeId::DECIMAL, LogicalType::DOUBLE},
LogicalTypeId::DECIMAL);
GetReservoirQuantileDecimalFunction(reservoir_quantile,
{LogicalTypeId::DECIMAL, LogicalType::LIST(LogicalType::DOUBLE)},
LogicalType::LIST(LogicalTypeId::DECIMAL));
DefineReservoirQuantile(reservoir_quantile, LogicalTypeId::TINYINT);
DefineReservoirQuantile(reservoir_quantile, LogicalTypeId::SMALLINT);
DefineReservoirQuantile(reservoir_quantile, LogicalTypeId::INTEGER);
DefineReservoirQuantile(reservoir_quantile, LogicalTypeId::BIGINT);
DefineReservoirQuantile(reservoir_quantile, LogicalTypeId::HUGEINT);
DefineReservoirQuantile(reservoir_quantile, LogicalTypeId::FLOAT);
DefineReservoirQuantile(reservoir_quantile, LogicalTypeId::DOUBLE);
return reservoir_quantile;
}
} // namespace duckdb


@@ -0,0 +1,5 @@
add_library_unity(duckdb_core_functions_nested OBJECT binned_histogram.cpp
list.cpp histogram.cpp)
set(CORE_FUNCTION_FILES
${CORE_FUNCTION_FILES} $<TARGET_OBJECTS:duckdb_core_functions_nested>
PARENT_SCOPE)


@@ -0,0 +1,414 @@
#include "duckdb/function/scalar/nested_functions.hpp"
#include "core_functions/aggregate/nested_functions.hpp"
#include "duckdb/planner/expression/bound_aggregate_expression.hpp"
#include "duckdb/common/types/vector.hpp"
#include "core_functions/aggregate/histogram_helpers.hpp"
#include "core_functions/scalar/generic_functions.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "duckdb/common/algorithm.hpp"
namespace duckdb {
namespace {
template <class T>
struct HistogramBinState {
using TYPE = T;
unsafe_vector<T> *bin_boundaries;
unsafe_vector<idx_t> *counts;
void Initialize() {
bin_boundaries = nullptr;
counts = nullptr;
}
void Destroy() {
if (bin_boundaries) {
delete bin_boundaries;
bin_boundaries = nullptr;
}
if (counts) {
delete counts;
counts = nullptr;
}
}
bool IsSet() {
return bin_boundaries;
}
template <class OP>
void InitializeBins(Vector &bin_vector, idx_t count, idx_t pos, AggregateInputData &aggr_input) {
bin_boundaries = new unsafe_vector<T>();
counts = new unsafe_vector<idx_t>();
UnifiedVectorFormat bin_data;
bin_vector.ToUnifiedFormat(count, bin_data);
auto bin_counts = UnifiedVectorFormat::GetData<list_entry_t>(bin_data);
auto bin_index = bin_data.sel->get_index(pos);
auto bin_list = bin_counts[bin_index];
if (!bin_data.validity.RowIsValid(bin_index)) {
throw BinderException("Histogram bin list cannot be NULL");
}
auto &bin_child = ListVector::GetEntry(bin_vector);
auto bin_count = ListVector::GetListSize(bin_vector);
UnifiedVectorFormat bin_child_data;
auto extra_state = OP::CreateExtraState(bin_count);
OP::PrepareData(bin_child, bin_count, extra_state, bin_child_data);
bin_boundaries->reserve(bin_list.length);
for (idx_t i = 0; i < bin_list.length; i++) {
auto bin_child_idx = bin_child_data.sel->get_index(bin_list.offset + i);
if (!bin_child_data.validity.RowIsValid(bin_child_idx)) {
throw BinderException("Histogram bin entry cannot be NULL");
}
bin_boundaries->push_back(OP::template ExtractValue<T>(bin_child_data, bin_list.offset + i, aggr_input));
}
// sort the bin boundaries
std::sort(bin_boundaries->begin(), bin_boundaries->end());
// ensure there are no duplicate bin boundaries
for (idx_t i = 1; i < bin_boundaries->size(); i++) {
if (Equals::Operation((*bin_boundaries)[i - 1], (*bin_boundaries)[i])) {
bin_boundaries->erase_at(i);
i--;
}
}
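// allocate one extra count for values that fall outside the provided bin boundaries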
counts->resize(bin_list.length + 1);
}
};
struct HistogramBinFunction {
template <class STATE>
static void Initialize(STATE &state) {
state.Initialize();
}
template <class STATE>
static void Destroy(STATE &state, AggregateInputData &aggr_input_data) {
state.Destroy();
}
static bool IgnoreNull() {
return true;
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &input_data) {
if (!source.bin_boundaries) {
// nothing to combine
return;
}
if (!target.bin_boundaries) {
// target does not have bin boundaries - copy everything over
target.bin_boundaries = new unsafe_vector<typename STATE::TYPE>();
target.counts = new unsafe_vector<idx_t>();
*target.bin_boundaries = *source.bin_boundaries;
*target.counts = *source.counts;
} else {
// both source and target have bin boundaries
if (*target.bin_boundaries != *source.bin_boundaries) {
throw NotImplementedException(
"Histogram - cannot combine histograms with different bin boundaries. "
"Bin boundaries must be the same for all histograms within the same group");
}
if (target.counts->size() != source.counts->size()) {
throw InternalException("Histogram combine - bin boundaries are the same but counts are different");
}
D_ASSERT(target.counts->size() == source.counts->size());
for (idx_t bin_idx = 0; bin_idx < target.counts->size(); bin_idx++) {
(*target.counts)[bin_idx] += (*source.counts)[bin_idx];
}
}
}
};
struct HistogramRange {
static constexpr bool EXACT = false;
template <class T>
static idx_t GetBin(T value, const unsafe_vector<T> &bin_boundaries) {
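// a value is assigned to the first bin whose boundary is >= the value;
// values above the last boundary map to the extra "other" bin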
auto entry = std::lower_bound(bin_boundaries.begin(), bin_boundaries.end(), value);
return UnsafeNumericCast<idx_t>(entry - bin_boundaries.begin());
}
};
struct HistogramExact {
static constexpr bool EXACT = true;
template <class T>
static idx_t GetBin(T value, const unsafe_vector<T> &bin_boundaries) {
auto entry = std::lower_bound(bin_boundaries.begin(), bin_boundaries.end(), value);
if (entry == bin_boundaries.end() || !(*entry == value)) {
// entry not found - return last bucket
return bin_boundaries.size();
}
return UnsafeNumericCast<idx_t>(entry - bin_boundaries.begin());
}
};
template <class OP, class T, class HIST>
void HistogramBinUpdateFunction(Vector inputs[], AggregateInputData &aggr_input, idx_t input_count,
Vector &state_vector, idx_t count) {
auto &input = inputs[0];
UnifiedVectorFormat sdata;
state_vector.ToUnifiedFormat(count, sdata);
auto &bin_vector = inputs[1];
auto extra_state = OP::CreateExtraState(count);
UnifiedVectorFormat input_data;
OP::PrepareData(input, count, extra_state, input_data);
auto states = UnifiedVectorFormat::GetData<HistogramBinState<T> *>(sdata);
auto data = UnifiedVectorFormat::GetData<T>(input_data);
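// for each non-NULL input row: lazily initialize the bin boundaries from the second argument, then increment the matching bin count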
for (idx_t i = 0; i < count; i++) {
auto idx = input_data.sel->get_index(i);
if (!input_data.validity.RowIsValid(idx)) {
continue;
}
auto &state = *states[sdata.sel->get_index(i)];
if (!state.IsSet()) {
state.template InitializeBins<OP>(bin_vector, count, i, aggr_input);
}
auto bin_entry = HIST::template GetBin<T>(data[idx], *state.bin_boundaries);
++(*state.counts)[bin_entry];
}
}
bool SupportsOtherBucket(const LogicalType &type) {
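// the overflow ("other") bucket needs a sentinel key, so alias types and types without a natural sentinel are excluded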
if (type.HasAlias()) {
return false;
}
switch (type.id()) {
case LogicalTypeId::TINYINT:
case LogicalTypeId::SMALLINT:
case LogicalTypeId::INTEGER:
case LogicalTypeId::BIGINT:
case LogicalTypeId::HUGEINT:
case LogicalTypeId::FLOAT:
case LogicalTypeId::DOUBLE:
case LogicalTypeId::DECIMAL:
case LogicalTypeId::UTINYINT:
case LogicalTypeId::USMALLINT:
case LogicalTypeId::UINTEGER:
case LogicalTypeId::UBIGINT:
case LogicalTypeId::UHUGEINT:
case LogicalTypeId::TIME:
case LogicalTypeId::TIME_TZ:
case LogicalTypeId::DATE:
case LogicalTypeId::TIMESTAMP:
case LogicalTypeId::TIMESTAMP_TZ:
case LogicalTypeId::TIMESTAMP_SEC:
case LogicalTypeId::TIMESTAMP_MS:
case LogicalTypeId::TIMESTAMP_NS:
case LogicalTypeId::VARCHAR:
case LogicalTypeId::BLOB:
case LogicalTypeId::STRUCT:
case LogicalTypeId::LIST:
return true;
default:
return false;
}
}
Value OtherBucketValue(const LogicalType &type) {
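// sentinel key for the overflow bucket: the maximum value, infinity, an empty string/blob, a struct of NULLs or an empty list, depending on the type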
switch (type.id()) {
case LogicalTypeId::TINYINT:
case LogicalTypeId::SMALLINT:
case LogicalTypeId::INTEGER:
case LogicalTypeId::BIGINT:
case LogicalTypeId::HUGEINT:
case LogicalTypeId::DECIMAL:
case LogicalTypeId::UTINYINT:
case LogicalTypeId::USMALLINT:
case LogicalTypeId::UINTEGER:
case LogicalTypeId::UBIGINT:
case LogicalTypeId::UHUGEINT:
case LogicalTypeId::TIME:
case LogicalTypeId::TIME_TZ:
return Value::MaximumValue(type);
case LogicalTypeId::DATE:
case LogicalTypeId::TIMESTAMP:
case LogicalTypeId::TIMESTAMP_TZ:
case LogicalTypeId::TIMESTAMP_SEC:
case LogicalTypeId::TIMESTAMP_MS:
case LogicalTypeId::TIMESTAMP_NS:
case LogicalTypeId::FLOAT:
case LogicalTypeId::DOUBLE:
return Value::Infinity(type);
case LogicalTypeId::VARCHAR:
return Value("");
case LogicalTypeId::BLOB:
return Value::BLOB("");
case LogicalTypeId::STRUCT: {
// for structs we can set all child members to NULL
auto &child_types = StructType::GetChildTypes(type);
child_list_t<Value> child_list;
for (auto &child_type : child_types) {
child_list.push_back(make_pair(child_type.first, Value(child_type.second)));
}
return Value::STRUCT(std::move(child_list));
}
case LogicalTypeId::LIST:
return Value::LIST(ListType::GetChildType(type), vector<Value>());
default:
throw InternalException("Unsupported type for other bucket");
}
}
void IsHistogramOtherBinFunction(DataChunk &args, ExpressionState &state, Vector &result) {
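// returns true for rows whose value equals the sentinel overflow-bucket key of their type (always false for unsupported types)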
auto &input_type = args.data[0].GetType();
if (!SupportsOtherBucket(input_type)) {
result.Reference(Value::BOOLEAN(false));
return;
}
auto v = OtherBucketValue(input_type);
Vector ref(v);
VectorOperations::NotDistinctFrom(args.data[0], ref, result, args.size());
}
template <class OP, class T>
void HistogramBinFinalizeFunction(Vector &state_vector, AggregateInputData &, Vector &result, idx_t count,
idx_t offset) {
UnifiedVectorFormat sdata;
state_vector.ToUnifiedFormat(count, sdata);
auto states = UnifiedVectorFormat::GetData<HistogramBinState<T> *>(sdata);
auto &mask = FlatVector::Validity(result);
auto old_len = ListVector::GetListSize(result);
idx_t new_entries = 0;
bool supports_other_bucket = SupportsOtherBucket(MapType::KeyType(result.GetType()));
// figure out how much space we need
for (idx_t i = 0; i < count; i++) {
auto &state = *states[sdata.sel->get_index(i)];
if (!state.bin_boundaries) {
continue;
}
new_entries += state.bin_boundaries->size();
if (state.counts->back() > 0 && supports_other_bucket) {
// overflow bucket has entries
new_entries++;
}
}
// reserve space in the list vector
ListVector::Reserve(result, old_len + new_entries);
auto &keys = MapVector::GetKeys(result);
auto &values = MapVector::GetValues(result);
auto list_entries = FlatVector::GetData<list_entry_t>(result);
auto count_entries = FlatVector::GetData<uint64_t>(values);
idx_t current_offset = old_len;
for (idx_t i = 0; i < count; i++) {
const auto rid = i + offset;
auto &state = *states[sdata.sel->get_index(i)];
if (!state.bin_boundaries) {
mask.SetInvalid(rid);
continue;
}
auto &list_entry = list_entries[rid];
list_entry.offset = current_offset;
for (idx_t bin_idx = 0; bin_idx < state.bin_boundaries->size(); bin_idx++) {
OP::template HistogramFinalize<T>((*state.bin_boundaries)[bin_idx], keys, current_offset);
count_entries[current_offset] = (*state.counts)[bin_idx];
current_offset++;
}
if (state.counts->back() > 0 && supports_other_bucket) {
// add overflow bucket ("others")
// the overflow bucket uses the type's sentinel value (OtherBucketValue) as its key
keys.SetValue(current_offset, OtherBucketValue(keys.GetType()));
count_entries[current_offset] = state.counts->back();
current_offset++;
}
list_entry.length = current_offset - list_entry.offset;
}
D_ASSERT(current_offset == old_len + new_entries);
ListVector::SetListSize(result, current_offset);
result.Verify(count);
}
template <class OP, class T, class HIST>
AggregateFunction GetHistogramBinFunction(const LogicalType &type) {
using STATE_TYPE = HistogramBinState<T>;
const char *function_name = HIST::EXACT ? "histogram_exact" : "histogram";
auto struct_type = LogicalType::MAP(type, LogicalType::UBIGINT);
return AggregateFunction(
function_name, {type, LogicalType::LIST(type)}, struct_type, AggregateFunction::StateSize<STATE_TYPE>,
AggregateFunction::StateInitialize<STATE_TYPE, HistogramBinFunction>, HistogramBinUpdateFunction<OP, T, HIST>,
AggregateFunction::StateCombine<STATE_TYPE, HistogramBinFunction>, HistogramBinFinalizeFunction<OP, T>, nullptr,
nullptr, AggregateFunction::StateDestroy<STATE_TYPE, HistogramBinFunction>);
}
template <class HIST>
AggregateFunction GetHistogramBinFunction(const LogicalType &type) {
if (type.id() == LogicalTypeId::DECIMAL) {
return GetHistogramBinFunction<HIST>(LogicalType::DOUBLE);
}
switch (type.InternalType()) {
#ifndef DUCKDB_SMALLER_BINARY
case PhysicalType::BOOL:
return GetHistogramBinFunction<HistogramFunctor, bool, HIST>(type);
case PhysicalType::UINT8:
return GetHistogramBinFunction<HistogramFunctor, uint8_t, HIST>(type);
case PhysicalType::UINT16:
return GetHistogramBinFunction<HistogramFunctor, uint16_t, HIST>(type);
case PhysicalType::UINT32:
return GetHistogramBinFunction<HistogramFunctor, uint32_t, HIST>(type);
case PhysicalType::UINT64:
return GetHistogramBinFunction<HistogramFunctor, uint64_t, HIST>(type);
case PhysicalType::INT8:
return GetHistogramBinFunction<HistogramFunctor, int8_t, HIST>(type);
case PhysicalType::INT16:
return GetHistogramBinFunction<HistogramFunctor, int16_t, HIST>(type);
case PhysicalType::INT32:
return GetHistogramBinFunction<HistogramFunctor, int32_t, HIST>(type);
case PhysicalType::INT64:
return GetHistogramBinFunction<HistogramFunctor, int64_t, HIST>(type);
case PhysicalType::FLOAT:
return GetHistogramBinFunction<HistogramFunctor, float, HIST>(type);
case PhysicalType::DOUBLE:
return GetHistogramBinFunction<HistogramFunctor, double, HIST>(type);
case PhysicalType::VARCHAR:
return GetHistogramBinFunction<HistogramStringFunctor, string_t, HIST>(type);
#endif
default:
return GetHistogramBinFunction<HistogramGenericFunctor, string_t, HIST>(type);
}
}
template <class HIST>
unique_ptr<FunctionData> HistogramBinBindFunction(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
for (auto &arg : arguments) {
if (arg->return_type.id() == LogicalTypeId::UNKNOWN) {
throw ParameterNotResolvedException();
}
}
function = GetHistogramBinFunction<HIST>(arguments[0]->return_type);
return nullptr;
}
} // namespace
AggregateFunction HistogramFun::BinnedHistogramFunction() {
return AggregateFunction("histogram", {LogicalType::ANY, LogicalType::LIST(LogicalType::ANY)}, LogicalTypeId::MAP,
nullptr, nullptr, nullptr, nullptr, nullptr, nullptr,
HistogramBinBindFunction<HistogramRange>, nullptr);
}
AggregateFunction HistogramExactFun::GetFunction() {
return AggregateFunction("histogram_exact", {LogicalType::ANY, LogicalType::LIST(LogicalType::ANY)},
LogicalTypeId::MAP, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr,
HistogramBinBindFunction<HistogramExact>, nullptr);
}
ScalarFunction IsHistogramOtherBinFun::GetFunction() {
return ScalarFunction("is_histogram_other_bin", {LogicalType::ANY}, LogicalType::BOOLEAN,
IsHistogramOtherBinFunction);
}
} // namespace duckdb

View File

@@ -0,0 +1,25 @@
[
{
"name": "histogram",
"parameters": "arg",
"description": "Returns a LIST of STRUCTs with the fields bucket and count.",
"example": "histogram(A)",
"type": "aggregate_function_set",
"extra_functions": ["static AggregateFunction GetHistogramUnorderedMap(LogicalType &type);", "static AggregateFunction BinnedHistogramFunction();"]
},
{
"name": "histogram_exact",
"parameters": "arg,bins",
"description": "Returns a LIST of STRUCTs with the fields bucket and count matching the buckets exactly.",
"example": "histogram_exact(A, [0, 1, 2])",
"type": "aggregate_function"
},
{
"name": "list",
"parameters": "arg",
"description": "Returns a LIST containing all the values of a column.",
"example": "list(A)",
"type": "aggregate_function",
"aliases": ["array_agg"]
}
]

View File

@@ -0,0 +1,238 @@
#include "duckdb/function/scalar/nested_functions.hpp"
#include "core_functions/aggregate/nested_functions.hpp"
#include "duckdb/common/types/vector.hpp"
#include "duckdb/common/string_map_set.hpp"
#include "core_functions/aggregate/histogram_helpers.hpp"
#include "duckdb/common/owning_string_map.hpp"
namespace duckdb {
namespace {
template <class MAP_TYPE>
struct HistogramFunction {
template <class STATE>
static void Initialize(STATE &state) {
state.hist = nullptr;
}
template <class STATE>
static void Destroy(STATE &state, AggregateInputData &) {
if (state.hist) {
delete state.hist;
}
}
static bool IgnoreNull() {
return true;
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &input_data) {
if (!source.hist) {
return;
}
if (!target.hist) {
target.hist = MAP_TYPE::CreateEmpty(input_data.allocator);
}
for (auto &entry : *source.hist) {
(*target.hist)[entry.first] += entry.second;
}
}
};
template <class TYPE>
struct DefaultMapType {
using MAP_TYPE = TYPE;
static TYPE *CreateEmpty(ArenaAllocator &) {
return new TYPE();
}
};
template <class TYPE>
struct StringMapType {
using MAP_TYPE = TYPE;
static TYPE *CreateEmpty(ArenaAllocator &allocator) {
return new TYPE(allocator);
}
};
template <class OP, class T, class MAP_TYPE>
void HistogramUpdateFunction(Vector inputs[], AggregateInputData &aggr_input, idx_t input_count, Vector &state_vector,
idx_t count) {
D_ASSERT(input_count == 1);
auto &input = inputs[0];
UnifiedVectorFormat sdata;
state_vector.ToUnifiedFormat(count, sdata);
auto extra_state = OP::CreateExtraState(count);
UnifiedVectorFormat input_data;
OP::PrepareData(input, count, extra_state, input_data);
auto states = UnifiedVectorFormat::GetData<HistogramAggState<T, typename MAP_TYPE::MAP_TYPE> *>(sdata);
auto input_values = UnifiedVectorFormat::GetData<T>(input_data);
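// for each non-NULL input value, lazily create the per-group map and increment that value's count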
for (idx_t i = 0; i < count; i++) {
auto idx = input_data.sel->get_index(i);
if (!input_data.validity.RowIsValid(idx)) {
continue;
}
auto &state = *states[sdata.sel->get_index(i)];
if (!state.hist) {
state.hist = MAP_TYPE::CreateEmpty(aggr_input.allocator);
}
auto &input_value = input_values[idx];
++(*state.hist)[input_value];
}
}
template <class OP, class T, class MAP_TYPE>
void HistogramFinalizeFunction(Vector &state_vector, AggregateInputData &, Vector &result, idx_t count, idx_t offset) {
using HIST_STATE = HistogramAggState<T, typename MAP_TYPE::MAP_TYPE>;
UnifiedVectorFormat sdata;
state_vector.ToUnifiedFormat(count, sdata);
auto states = UnifiedVectorFormat::GetData<HIST_STATE *>(sdata);
auto &mask = FlatVector::Validity(result);
auto old_len = ListVector::GetListSize(result);
idx_t new_entries = 0;
// figure out how much space we need
for (idx_t i = 0; i < count; i++) {
auto &state = *states[sdata.sel->get_index(i)];
if (!state.hist) {
continue;
}
new_entries += state.hist->size();
}
// reserve space in the list vector
ListVector::Reserve(result, old_len + new_entries);
auto &keys = MapVector::GetKeys(result);
auto &values = MapVector::GetValues(result);
auto list_entries = FlatVector::GetData<list_entry_t>(result);
auto count_entries = FlatVector::GetData<uint64_t>(values);
idx_t current_offset = old_len;
for (idx_t i = 0; i < count; i++) {
const auto rid = i + offset;
auto &state = *states[sdata.sel->get_index(i)];
if (!state.hist) {
mask.SetInvalid(rid);
continue;
}
auto &list_entry = list_entries[rid];
list_entry.offset = current_offset;
for (auto &entry : *state.hist) {
OP::template HistogramFinalize<T>(entry.first, keys, current_offset);
count_entries[current_offset] = entry.second;
current_offset++;
}
list_entry.length = current_offset - list_entry.offset;
}
D_ASSERT(current_offset == old_len + new_entries);
ListVector::SetListSize(result, current_offset);
result.Verify(count);
}
template <class OP, class T, class MAP_TYPE>
AggregateFunction GetHistogramFunction(const LogicalType &type) {
using STATE_TYPE = HistogramAggState<T, typename MAP_TYPE::MAP_TYPE>;
using HIST_FUNC = HistogramFunction<MAP_TYPE>;
auto struct_type = LogicalType::MAP(type, LogicalType::UBIGINT);
return AggregateFunction(
"histogram", {type}, struct_type, AggregateFunction::StateSize<STATE_TYPE>,
AggregateFunction::StateInitialize<STATE_TYPE, HIST_FUNC>, HistogramUpdateFunction<OP, T, MAP_TYPE>,
AggregateFunction::StateCombine<STATE_TYPE, HIST_FUNC>, HistogramFinalizeFunction<OP, T, MAP_TYPE>, nullptr,
nullptr, AggregateFunction::StateDestroy<STATE_TYPE, HIST_FUNC>);
}
template <class OP, class T, class MAP_TYPE>
AggregateFunction GetMapTypeInternal(const LogicalType &type) {
return GetHistogramFunction<OP, T, MAP_TYPE>(type);
}
template <class OP, class T, bool IS_ORDERED>
AggregateFunction GetMapType(const LogicalType &type) {
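// ordered histograms keep their keys sorted in a std::map; the unordered variant trades ordering for hash-map lookups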
if (IS_ORDERED) {
return GetMapTypeInternal<OP, T, DefaultMapType<map<T, idx_t>>>(type);
}
return GetMapTypeInternal<OP, T, DefaultMapType<unordered_map<T, idx_t>>>(type);
}
template <class OP, bool IS_ORDERED>
AggregateFunction GetStringMapType(const LogicalType &type) {
if (IS_ORDERED) {
return GetMapTypeInternal<OP, string_t, StringMapType<OrderedOwningStringMap<idx_t>>>(type);
} else {
return GetMapTypeInternal<OP, string_t, StringMapType<OwningStringMap<idx_t>>>(type);
}
}
template <bool IS_ORDERED = true>
AggregateFunction GetHistogramFunction(const LogicalType &type) {
switch (type.InternalType()) {
#ifndef DUCKDB_SMALLER_BINARY
case PhysicalType::BOOL:
return GetMapType<HistogramFunctor, bool, IS_ORDERED>(type);
case PhysicalType::UINT8:
return GetMapType<HistogramFunctor, uint8_t, IS_ORDERED>(type);
case PhysicalType::UINT16:
return GetMapType<HistogramFunctor, uint16_t, IS_ORDERED>(type);
case PhysicalType::UINT32:
return GetMapType<HistogramFunctor, uint32_t, IS_ORDERED>(type);
case PhysicalType::UINT64:
return GetMapType<HistogramFunctor, uint64_t, IS_ORDERED>(type);
case PhysicalType::INT8:
return GetMapType<HistogramFunctor, int8_t, IS_ORDERED>(type);
case PhysicalType::INT16:
return GetMapType<HistogramFunctor, int16_t, IS_ORDERED>(type);
case PhysicalType::INT32:
return GetMapType<HistogramFunctor, int32_t, IS_ORDERED>(type);
case PhysicalType::INT64:
return GetMapType<HistogramFunctor, int64_t, IS_ORDERED>(type);
case PhysicalType::FLOAT:
return GetMapType<HistogramFunctor, float, IS_ORDERED>(type);
case PhysicalType::DOUBLE:
return GetMapType<HistogramFunctor, double, IS_ORDERED>(type);
case PhysicalType::VARCHAR:
return GetStringMapType<HistogramStringFunctor, IS_ORDERED>(type);
#endif
default:
return GetStringMapType<HistogramGenericFunctor, IS_ORDERED>(type);
}
}
template <bool IS_ORDERED = true>
unique_ptr<FunctionData> HistogramBindFunction(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
D_ASSERT(arguments.size() == 1);
if (arguments[0]->return_type.id() == LogicalTypeId::UNKNOWN) {
throw ParameterNotResolvedException();
}
function = GetHistogramFunction<IS_ORDERED>(arguments[0]->return_type);
return make_uniq<VariableReturnBindData>(function.return_type);
}
} // namespace
AggregateFunctionSet HistogramFun::GetFunctions() {
AggregateFunctionSet fun;
AggregateFunction histogram_function("histogram", {LogicalType::ANY}, LogicalTypeId::MAP, nullptr, nullptr, nullptr,
nullptr, nullptr, nullptr, HistogramBindFunction, nullptr);
fun.AddFunction(HistogramFun::BinnedHistogramFunction());
fun.AddFunction(histogram_function);
return fun;
}
AggregateFunction HistogramFun::GetHistogramUnorderedMap(LogicalType &type) {
return AggregateFunction("histogram", {LogicalType::ANY}, LogicalTypeId::MAP, nullptr, nullptr, nullptr, nullptr,
nullptr, nullptr, HistogramBindFunction<false>, nullptr);
}
} // namespace duckdb

View File

@@ -0,0 +1,201 @@
#include "duckdb/common/pair.hpp"
#include "duckdb/common/types/list_segment.hpp"
#include "core_functions/aggregate/nested_functions.hpp"
#include "duckdb/planner/expression/bound_aggregate_expression.hpp"
namespace duckdb {
namespace {
struct ListBindData : public FunctionData {
explicit ListBindData(const LogicalType &stype_p);
LogicalType stype;
ListSegmentFunctions functions;
unique_ptr<FunctionData> Copy() const override {
return make_uniq<ListBindData>(stype);
}
bool Equals(const FunctionData &other_p) const override {
auto &other = other_p.Cast<ListBindData>();
return stype == other.stype;
}
};
ListBindData::ListBindData(const LogicalType &stype_p) : stype(stype_p) {
// always unnest once because the result vector is of type LIST
auto type = ListType::GetChildType(stype_p);
GetSegmentDataFunctions(functions, type);
}
struct ListAggState {
LinkedList linked_list;
};
struct ListFunction {
template <class STATE>
static void Initialize(STATE &state) {
state.linked_list.total_capacity = 0;
state.linked_list.first_segment = nullptr;
state.linked_list.last_segment = nullptr;
}
static bool IgnoreNull() {
return false;
}
};
void ListUpdateFunction(Vector inputs[], AggregateInputData &aggr_input_data, idx_t input_count, Vector &state_vector,
idx_t count) {
D_ASSERT(input_count == 1);
auto &input = inputs[0];
RecursiveUnifiedVectorFormat input_data;
Vector::RecursiveToUnifiedFormat(input, count, input_data);
UnifiedVectorFormat states_data;
state_vector.ToUnifiedFormat(count, states_data);
auto states = UnifiedVectorFormat::GetData<ListAggState *>(states_data);
auto &list_bind_data = aggr_input_data.bind_data->Cast<ListBindData>();
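// append every input row to the group's linked list of segments (NULLs are kept, since IgnoreNull is false)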
for (idx_t i = 0; i < count; i++) {
auto &state = *states[states_data.sel->get_index(i)];
aggr_input_data.allocator.AlignNext();
list_bind_data.functions.AppendRow(aggr_input_data.allocator, state.linked_list, input_data, i);
}
}
void ListAbsorbFunction(Vector &states_vector, Vector &combined, AggregateInputData &aggr_input_data, idx_t count) {
D_ASSERT(aggr_input_data.combine_type == AggregateCombineType::ALLOW_DESTRUCTIVE);
UnifiedVectorFormat states_data;
states_vector.ToUnifiedFormat(count, states_data);
auto states_ptr = UnifiedVectorFormat::GetData<ListAggState *>(states_data);
auto combined_ptr = FlatVector::GetData<ListAggState *>(combined);
for (idx_t i = 0; i < count; i++) {
auto &state = *states_ptr[states_data.sel->get_index(i)];
if (state.linked_list.total_capacity == 0) {
// NULL, no need to append
// this can happen when adding a FILTER to the grouping, e.g.,
// LIST(i) FILTER (WHERE i <> 3)
continue;
}
if (combined_ptr[i]->linked_list.total_capacity == 0) {
combined_ptr[i]->linked_list = state.linked_list;
continue;
}
// append the linked list
combined_ptr[i]->linked_list.last_segment->next = state.linked_list.first_segment;
combined_ptr[i]->linked_list.last_segment = state.linked_list.last_segment;
combined_ptr[i]->linked_list.total_capacity += state.linked_list.total_capacity;
}
}
void ListFinalize(Vector &states_vector, AggregateInputData &aggr_input_data, Vector &result, idx_t count,
idx_t offset) {
UnifiedVectorFormat states_data;
states_vector.ToUnifiedFormat(count, states_data);
auto states = UnifiedVectorFormat::GetData<ListAggState *>(states_data);
D_ASSERT(result.GetType().id() == LogicalTypeId::LIST);
auto &mask = FlatVector::Validity(result);
auto result_data = FlatVector::GetData<list_entry_t>(result);
size_t total_len = ListVector::GetListSize(result);
auto &list_bind_data = aggr_input_data.bind_data->Cast<ListBindData>();
// first iterate over all entries and set up the list entries, and get the newly required total length
for (idx_t i = 0; i < count; i++) {
auto &state = *states[states_data.sel->get_index(i)];
const auto rid = i + offset;
result_data[rid].offset = total_len;
if (state.linked_list.total_capacity == 0) {
mask.SetInvalid(rid);
result_data[rid].length = 0;
continue;
}
// set the length and offset of this list in the result vector
auto total_capacity = state.linked_list.total_capacity;
result_data[rid].length = total_capacity;
total_len += total_capacity;
}
// reserve capacity, then iterate over all entries again and copy over the data to the child vector
ListVector::Reserve(result, total_len);
auto &result_child = ListVector::GetEntry(result);
for (idx_t i = 0; i < count; i++) {
auto &state = *states[states_data.sel->get_index(i)];
const auto rid = i + offset;
if (state.linked_list.total_capacity == 0) {
continue;
}
idx_t current_offset = result_data[rid].offset;
list_bind_data.functions.BuildListVector(state.linked_list, result_child, current_offset);
}
ListVector::SetListSize(result, total_len);
}
void ListCombineFunction(Vector &states_vector, Vector &combined, AggregateInputData &aggr_input_data, idx_t count) {
// Can we use destructive combining?
if (aggr_input_data.combine_type == AggregateCombineType::ALLOW_DESTRUCTIVE) {
ListAbsorbFunction(states_vector, combined, aggr_input_data, count);
return;
}
UnifiedVectorFormat states_data;
states_vector.ToUnifiedFormat(count, states_data);
auto states_ptr = UnifiedVectorFormat::GetData<const ListAggState *>(states_data);
auto combined_ptr = FlatVector::GetData<ListAggState *>(combined);
auto &list_bind_data = aggr_input_data.bind_data->Cast<ListBindData>();
auto result_type = ListType::GetChildType(list_bind_data.stype);
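// non-destructive combine: materialize each source linked list into a temporary vector and append its rows to the target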
for (idx_t i = 0; i < count; i++) {
auto &source = *states_ptr[states_data.sel->get_index(i)];
auto &target = *combined_ptr[i];
const auto entry_count = source.linked_list.total_capacity;
Vector input(result_type, source.linked_list.total_capacity);
list_bind_data.functions.BuildListVector(source.linked_list, input, 0);
RecursiveUnifiedVectorFormat input_data;
Vector::RecursiveToUnifiedFormat(input, entry_count, input_data);
for (idx_t entry_idx = 0; entry_idx < entry_count; ++entry_idx) {
aggr_input_data.allocator.AlignNext();
list_bind_data.functions.AppendRow(aggr_input_data.allocator, target.linked_list, input_data, entry_idx);
}
}
}
unique_ptr<FunctionData> ListBindFunction(ClientContext &context, AggregateFunction &function,
vector<unique_ptr<Expression>> &arguments) {
function.return_type = LogicalType::LIST(arguments[0]->return_type);
return make_uniq<ListBindData>(function.return_type);
}
} // namespace
AggregateFunction ListFun::GetFunction() {
auto func = AggregateFunction(
{LogicalType::TEMPLATE("T")}, LogicalType::LIST(LogicalType::TEMPLATE("T")),
AggregateFunction::StateSize<ListAggState>, AggregateFunction::StateInitialize<ListAggState, ListFunction>,
ListUpdateFunction, ListCombineFunction, ListFinalize, nullptr, ListBindFunction, nullptr, nullptr, nullptr);
return func;
}
} // namespace duckdb

View File

@@ -0,0 +1,13 @@
add_library_unity(
duckdb_core_functions_regression
OBJECT
regr_sxy.cpp
regr_intercept.cpp
regr_count.cpp
regr_r2.cpp
regr_avg.cpp
regr_slope.cpp
regr_sxx_syy.cpp)
set(CORE_FUNCTION_FILES
${CORE_FUNCTION_FILES} $<TARGET_OBJECTS:duckdb_core_functions_regression>
PARENT_SCOPE)

View File

@@ -0,0 +1,68 @@
[
{
"name": "regr_avgx",
"parameters": "y,x",
"description": "Returns the average of the independent variable for non-NULL pairs in a group, where x is the independent variable and y is the dependent variable.",
"example": "",
"type": "aggregate_function"
},
{
"name": "regr_avgy",
"parameters": "y,x",
"description": "Returns the average of the dependent variable for non-NULL pairs in a group, where x is the independent variable and y is the dependent variable.",
"example": "",
"type": "aggregate_function"
},
{
"name": "regr_count",
"parameters": "y,x",
"description": "Returns the number of non-NULL number pairs in a group.",
"example": "(SUM(x*y) - SUM(x) * SUM(y) / COUNT(*)) / COUNT(*)",
"type": "aggregate_function"
},
{
"name": "regr_intercept",
"parameters": "y,x",
"description": "Returns the intercept of the univariate linear regression line for non-NULL pairs in a group.",
"example": "AVG(y)-REGR_SLOPE(y, x)*AVG(x)",
"type": "aggregate_function"
},
{
"name": "regr_r2",
"parameters": "y,x",
"description": "Returns the coefficient of determination for non-NULL pairs in a group.",
"example": "",
"type": "aggregate_function"
},
{
"name": "regr_slope",
"parameters": "y,x",
"description": "Returns the slope of the linear regression line for non-NULL pairs in a group.",
"example": "COVAR_POP(x, y) / VAR_POP(x)",
"type": "aggregate_function"
},
{
"name": "regr_sxx",
"parameters": "y,x",
"description": "",
"example": "REGR_COUNT(y, x) * VAR_POP(x)",
"type": "aggregate_function",
"struct": "RegrSXXFun"
},
{
"name": "regr_sxy",
"parameters": "y,x",
"description": "Returns the population covariance of input values",
"example": "REGR_COUNT(y, x) * COVAR_POP(y, x)",
"type": "aggregate_function",
"struct": "RegrSXYFun"
},
{
"name": "regr_syy",
"parameters": "y,x",
"description": "",
"example": "REGR_COUNT(y, x) * VAR_POP(y)",
"type": "aggregate_function",
"struct": "RegrSYYFun"
}
]

View File

@@ -0,0 +1,69 @@
#include "duckdb/common/exception.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "core_functions/aggregate/regression_functions.hpp"
#include "duckdb/planner/expression/bound_aggregate_expression.hpp"
#include "duckdb/function/function_set.hpp"
namespace duckdb {
namespace {
struct RegrState {
double sum;
size_t count;
};
struct RegrAvgFunction {
template <class STATE>
static void Initialize(STATE &state) {
state.sum = 0;
state.count = 0;
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
target.sum += source.sum;
target.count += source.count;
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.count == 0) {
finalize_data.ReturnNull();
} else {
target = state.sum / (double)state.count;
}
}
static bool IgnoreNull() {
return true;
}
};
struct RegrAvgXFunction : RegrAvgFunction {
template <class A_TYPE, class B_TYPE, class STATE, class OP>
static void Operation(STATE &state, const A_TYPE &y, const B_TYPE &x, AggregateBinaryInput &idata) {
state.sum += x;
state.count++;
}
};
struct RegrAvgYFunction : RegrAvgFunction {
template <class A_TYPE, class B_TYPE, class STATE, class OP>
static void Operation(STATE &state, const A_TYPE &y, const B_TYPE &x, AggregateBinaryInput &idata) {
state.sum += y;
state.count++;
}
};
} // namespace
AggregateFunction RegrAvgxFun::GetFunction() {
return AggregateFunction::BinaryAggregate<RegrState, double, double, double, RegrAvgXFunction>(
LogicalType::DOUBLE, LogicalType::DOUBLE, LogicalType::DOUBLE);
}
AggregateFunction RegrAvgyFun::GetFunction() {
return AggregateFunction::BinaryAggregate<RegrState, double, double, double, RegrAvgYFunction>(
LogicalType::DOUBLE, LogicalType::DOUBLE, LogicalType::DOUBLE);
}
} // namespace duckdb

View File

@@ -0,0 +1,18 @@
#include "duckdb/common/exception.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "core_functions/aggregate/regression_functions.hpp"
#include "duckdb/planner/expression/bound_aggregate_expression.hpp"
#include "core_functions/aggregate/regression/regr_count.hpp"
#include "duckdb/function/function_set.hpp"
namespace duckdb {
AggregateFunction RegrCountFun::GetFunction() {
auto regr_count = AggregateFunction::BinaryAggregate<size_t, double, double, uint32_t, RegrCountFunction>(
LogicalType::DOUBLE, LogicalType::DOUBLE, LogicalType::UINTEGER);
regr_count.name = "regr_count";
regr_count.null_handling = FunctionNullHandling::SPECIAL_HANDLING;
return regr_count;
}
} // namespace duckdb

View File

@@ -0,0 +1,70 @@
//! AVG(y)-REGR_SLOPE(y,x)*AVG(x)
#include "core_functions/aggregate/regression_functions.hpp"
#include "core_functions/aggregate/regression/regr_slope.hpp"
#include "duckdb/function/function_set.hpp"
namespace duckdb {
namespace {
struct RegrInterceptState {
size_t count;
double sum_x;
double sum_y;
RegrSlopeState slope;
};
struct RegrInterceptOperation {
template <class STATE>
static void Initialize(STATE &state) {
state.count = 0;
state.sum_x = 0;
state.sum_y = 0;
RegrSlopeOperation::Initialize<RegrSlopeState>(state.slope);
}
template <class A_TYPE, class B_TYPE, class STATE, class OP>
static void Operation(STATE &state, const A_TYPE &y, const B_TYPE &x, AggregateBinaryInput &idata) {
state.count++;
state.sum_x += x;
state.sum_y += y;
RegrSlopeOperation::Operation<A_TYPE, B_TYPE, RegrSlopeState, OP>(state.slope, y, x, idata);
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &aggr_input_data) {
target.count += source.count;
target.sum_x += source.sum_x;
target.sum_y += source.sum_y;
RegrSlopeOperation::Combine<RegrSlopeState, OP>(source.slope, target.slope, aggr_input_data);
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
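// intercept = AVG(y) - REGR_SLOPE(y, x) * AVG(x); returns NULL for empty groups or when the slope is NaN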
if (state.count == 0) {
finalize_data.ReturnNull();
return;
}
RegrSlopeOperation::Finalize<T, RegrSlopeState>(state.slope, target, finalize_data);
if (Value::IsNan(target)) {
finalize_data.ReturnNull();
return;
}
auto x_avg = state.sum_x / state.count;
auto y_avg = state.sum_y / state.count;
target = y_avg - target * x_avg;
}
static bool IgnoreNull() {
return true;
}
};
} // namespace
AggregateFunction RegrInterceptFun::GetFunction() {
return AggregateFunction::BinaryAggregate<RegrInterceptState, double, double, double, RegrInterceptOperation>(
LogicalType::DOUBLE, LogicalType::DOUBLE, LogicalType::DOUBLE);
}
} // namespace duckdb

View File

@@ -0,0 +1,77 @@
// REGR_R2(y, x)
// Returns the coefficient of determination for non-null pairs in a group.
// It is computed for non-null pairs using the following formula:
// null if var_pop(x) = 0, else
// 1 if var_pop(y) = 0 and var_pop(x) <> 0, else
// power(corr(y,x), 2)
#include "core_functions/aggregate/algebraic/corr.hpp"
#include "duckdb/function/function_set.hpp"
#include "core_functions/aggregate/regression_functions.hpp"
namespace duckdb {
namespace {
struct RegrR2State {
CorrState corr;
StddevState var_pop_x;
StddevState var_pop_y;
};
struct RegrR2Operation {
template <class STATE>
static void Initialize(STATE &state) {
CorrOperation::Initialize<CorrState>(state.corr);
STDDevBaseOperation::Initialize<StddevState>(state.var_pop_x);
STDDevBaseOperation::Initialize<StddevState>(state.var_pop_y);
}
template <class A_TYPE, class B_TYPE, class STATE, class OP>
static void Operation(STATE &state, const A_TYPE &y, const B_TYPE &x, AggregateBinaryInput &idata) {
CorrOperation::Operation<A_TYPE, B_TYPE, CorrState, OP>(state.corr, y, x, idata);
STDDevBaseOperation::Execute<A_TYPE, StddevState>(state.var_pop_x, x);
STDDevBaseOperation::Execute<A_TYPE, StddevState>(state.var_pop_y, y);
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &aggr_input_data) {
CorrOperation::Combine<CorrState, OP>(source.corr, target.corr, aggr_input_data);
STDDevBaseOperation::Combine<StddevState, OP>(source.var_pop_x, target.var_pop_x, aggr_input_data);
STDDevBaseOperation::Combine<StddevState, OP>(source.var_pop_y, target.var_pop_y, aggr_input_data);
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
auto var_pop_x = state.var_pop_x.count > 1 ? (state.var_pop_x.dsquared / state.var_pop_x.count) : 0;
if (!Value::DoubleIsFinite(var_pop_x)) {
throw OutOfRangeException("VARPOP(X) is out of range!");
}
if (var_pop_x == 0) {
finalize_data.ReturnNull();
return;
}
auto var_pop_y = state.var_pop_y.count > 1 ? (state.var_pop_y.dsquared / state.var_pop_y.count) : 0;
if (!Value::DoubleIsFinite(var_pop_y)) {
throw OutOfRangeException("VARPOP(Y) is out of range!");
}
if (var_pop_y == 0) {
target = 1;
return;
}
CorrOperation::Finalize<T, CorrState>(state.corr, target, finalize_data);
target = pow(target, 2);
}
static bool IgnoreNull() {
return true;
}
};
} // namespace
AggregateFunction RegrR2Fun::GetFunction() {
return AggregateFunction::BinaryAggregate<RegrR2State, double, double, double, RegrR2Operation>(
LogicalType::DOUBLE, LogicalType::DOUBLE, LogicalType::DOUBLE);
}
} // namespace duckdb

View File

@@ -0,0 +1,20 @@
// REGR_SLOPE(y, x)
// Returns the slope of the linear regression line for non-null pairs in a group.
// It is computed for non-null pairs using the following formula:
// COVAR_POP(x,y) / VAR_POP(x)
//! Input : Any numeric type
//! Output : Double
#include "core_functions/aggregate/regression/regr_slope.hpp"
#include "duckdb/function/function_set.hpp"
#include "core_functions/aggregate/regression_functions.hpp"
namespace duckdb {
AggregateFunction RegrSlopeFun::GetFunction() {
return AggregateFunction::BinaryAggregate<RegrSlopeState, double, double, double, RegrSlopeOperation>(
LogicalType::DOUBLE, LogicalType::DOUBLE, LogicalType::DOUBLE);
}
} // namespace duckdb

View File

@@ -0,0 +1,78 @@
// REGR_SXX(y, x)
// Returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs.
// REGR_SYY(y, x)
// Returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs.
#include "core_functions/aggregate/regression/regr_count.hpp"
#include "duckdb/function/function_set.hpp"
#include "core_functions/aggregate/regression_functions.hpp"
namespace duckdb {
namespace {
struct RegrSState {
size_t count;
StddevState var_pop;
};
struct RegrBaseOperation {
template <class STATE>
static void Initialize(STATE &state) {
RegrCountFunction::Initialize<size_t>(state.count);
STDDevBaseOperation::Initialize<StddevState>(state.var_pop);
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &aggr_input_data) {
RegrCountFunction::Combine<size_t, OP>(source.count, target.count, aggr_input_data);
STDDevBaseOperation::Combine<StddevState, OP>(source.var_pop, target.var_pop, aggr_input_data);
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
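// result = REGR_COUNT * VAR_POP of the accumulated variable (x for REGR_SXX, y for REGR_SYY)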
if (state.var_pop.count == 0) {
finalize_data.ReturnNull();
return;
}
auto var_pop = state.var_pop.count > 1 ? (state.var_pop.dsquared / state.var_pop.count) : 0;
if (!Value::DoubleIsFinite(var_pop)) {
throw OutOfRangeException("VARPOP is out of range!");
}
RegrCountFunction::Finalize<T, size_t>(state.count, target, finalize_data);
target *= var_pop;
}
static bool IgnoreNull() {
return true;
}
};
struct RegrSXXOperation : RegrBaseOperation {
template <class A_TYPE, class B_TYPE, class STATE, class OP>
static void Operation(STATE &state, const A_TYPE &y, const B_TYPE &x, AggregateBinaryInput &idata) {
RegrCountFunction::Operation<A_TYPE, B_TYPE, size_t, OP>(state.count, y, x, idata);
STDDevBaseOperation::Execute<A_TYPE, StddevState>(state.var_pop, x);
}
};
struct RegrSYYOperation : RegrBaseOperation {
template <class A_TYPE, class B_TYPE, class STATE, class OP>
static void Operation(STATE &state, const A_TYPE &y, const B_TYPE &x, AggregateBinaryInput &idata) {
RegrCountFunction::Operation<A_TYPE, B_TYPE, size_t, OP>(state.count, y, x, idata);
STDDevBaseOperation::Execute<A_TYPE, StddevState>(state.var_pop, y);
}
};
} // namespace
AggregateFunction RegrSXXFun::GetFunction() {
return AggregateFunction::BinaryAggregate<RegrSState, double, double, double, RegrSXXOperation>(
LogicalType::DOUBLE, LogicalType::DOUBLE, LogicalType::DOUBLE);
}
AggregateFunction RegrSYYFun::GetFunction() {
return AggregateFunction::BinaryAggregate<RegrSState, double, double, double, RegrSYYOperation>(
LogicalType::DOUBLE, LogicalType::DOUBLE, LogicalType::DOUBLE);
}
} // namespace duckdb

View File

@@ -0,0 +1,57 @@
// REGR_SXY(y, x)
// Returns REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2) for non-null pairs.
#include "core_functions/aggregate/regression/regr_count.hpp"
#include "core_functions/aggregate/algebraic/covar.hpp"
#include "core_functions/aggregate/regression_functions.hpp"
#include "duckdb/function/function_set.hpp"
namespace duckdb {
namespace {
struct RegrSXyState {
size_t count;
CovarState cov_pop;
};
struct RegrSXYOperation {
template <class STATE>
static void Initialize(STATE &state) {
RegrCountFunction::Initialize<size_t>(state.count);
CovarOperation::Initialize<CovarState>(state.cov_pop);
}
template <class A_TYPE, class B_TYPE, class STATE, class OP>
static void Operation(STATE &state, const A_TYPE &y, const B_TYPE &x, AggregateBinaryInput &idata) {
RegrCountFunction::Operation<A_TYPE, B_TYPE, size_t, OP>(state.count, y, x, idata);
CovarOperation::Operation<A_TYPE, B_TYPE, CovarState, OP>(state.cov_pop, y, x, idata);
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &aggr_input_data) {
CovarOperation::Combine<CovarState, OP>(source.cov_pop, target.cov_pop, aggr_input_data);
RegrCountFunction::Combine<size_t, OP>(source.count, target.count, aggr_input_data);
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
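// result = REGR_COUNT(y, x) * COVAR_POP(y, x)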
CovarPopOperation::Finalize<T, CovarState>(state.cov_pop, target, finalize_data);
auto cov_pop = target;
RegrCountFunction::Finalize<T, size_t>(state.count, target, finalize_data);
target *= cov_pop;
}
static bool IgnoreNull() {
return true;
}
};
} // namespace
AggregateFunction RegrSXYFun::GetFunction() {
return AggregateFunction::BinaryAggregate<RegrSXyState, double, double, double, RegrSXYOperation>(
LogicalType::DOUBLE, LogicalType::DOUBLE, LogicalType::DOUBLE);
}
} // namespace duckdb

View File

@@ -0,0 +1,14 @@
import os
prefix = os.path.join('extension', 'core_functions')
def list_files_recursive(rootdir, suffix):
file_list = []
for root, _, files in os.walk(rootdir):
file_list += [os.path.join(root, f) for f in files if f.endswith(suffix)]
return file_list
include_directories = [os.path.join(prefix, x) for x in ['include']]
source_files = list_files_recursive(prefix, '.cpp')

View File

@@ -0,0 +1,33 @@
#include "core_functions_extension.hpp"
#include "core_functions/function_list.hpp"
namespace duckdb {
static void LoadInternal(ExtensionLoader &loader) {
FunctionList::RegisterExtensionFunctions(loader, CoreFunctionList::GetFunctionList());
}
void CoreFunctionsExtension::Load(ExtensionLoader &loader) {
LoadInternal(loader);
}
std::string CoreFunctionsExtension::Name() {
return "core_functions";
}
std::string CoreFunctionsExtension::Version() const {
#ifdef EXT_VERSION_CORE_FUNCTIONS
return EXT_VERSION_CORE_FUNCTIONS;
#else
return "";
#endif
}
} // namespace duckdb
extern "C" {
DUCKDB_CPP_EXTENSION_ENTRY(core_functions, loader) {
duckdb::LoadInternal(loader);
}
}

View File

@@ -0,0 +1,418 @@
#include "core_functions/function_list.hpp"
#include "core_functions/aggregate/algebraic_functions.hpp"
#include "core_functions/aggregate/distributive_functions.hpp"
#include "core_functions/aggregate/holistic_functions.hpp"
#include "core_functions/aggregate/nested_functions.hpp"
#include "core_functions/aggregate/regression_functions.hpp"
#include "core_functions/scalar/bit_functions.hpp"
#include "core_functions/scalar/blob_functions.hpp"
#include "core_functions/scalar/date_functions.hpp"
#include "core_functions/scalar/enum_functions.hpp"
#include "core_functions/scalar/generic_functions.hpp"
#include "core_functions/scalar/list_functions.hpp"
#include "core_functions/scalar/map_functions.hpp"
#include "core_functions/scalar/math_functions.hpp"
#include "core_functions/scalar/operators_functions.hpp"
#include "core_functions/scalar/random_functions.hpp"
#include "core_functions/scalar/secret_functions.hpp"
#include "core_functions/scalar/string_functions.hpp"
#include "core_functions/scalar/struct_functions.hpp"
#include "core_functions/scalar/union_functions.hpp"
#include "core_functions/scalar/array_functions.hpp"
#include "core_functions/scalar/debug_functions.hpp"
namespace duckdb {
// Scalar Function
#define DUCKDB_SCALAR_FUNCTION_BASE(_PARAM, _NAME, _ALIAS_OF) \
{ _NAME, _ALIAS_OF, _PARAM::Parameters, _PARAM::Description, _PARAM::Example, _PARAM::Categories, _PARAM::GetFunction, nullptr, nullptr, nullptr }
#define DUCKDB_SCALAR_FUNCTION(_PARAM) DUCKDB_SCALAR_FUNCTION_BASE(_PARAM, _PARAM::Name, _PARAM::Name)
#define DUCKDB_SCALAR_FUNCTION_ALIAS(_PARAM) DUCKDB_SCALAR_FUNCTION_BASE(_PARAM::ALIAS, _PARAM::Name, _PARAM::ALIAS::Name)
// Scalar Function Set
#define DUCKDB_SCALAR_FUNCTION_SET_BASE(_PARAM, _NAME, _ALIAS_OF) \
{ _NAME, _ALIAS_OF, _PARAM::Parameters, _PARAM::Description, _PARAM::Example, _PARAM::Categories, nullptr, _PARAM::GetFunctions, nullptr, nullptr }
#define DUCKDB_SCALAR_FUNCTION_SET(_PARAM) DUCKDB_SCALAR_FUNCTION_SET_BASE(_PARAM, _PARAM::Name, _PARAM::Name)
#define DUCKDB_SCALAR_FUNCTION_SET_ALIAS(_PARAM) DUCKDB_SCALAR_FUNCTION_SET_BASE(_PARAM::ALIAS, _PARAM::Name, _PARAM::ALIAS::Name)
// Aggregate Function
#define DUCKDB_AGGREGATE_FUNCTION_BASE(_PARAM, _NAME, _ALIAS_OF) \
{ _NAME, _ALIAS_OF, _PARAM::Parameters, _PARAM::Description, _PARAM::Example, _PARAM::Categories, nullptr, nullptr, _PARAM::GetFunction, nullptr }
#define DUCKDB_AGGREGATE_FUNCTION(_PARAM) DUCKDB_AGGREGATE_FUNCTION_BASE(_PARAM, _PARAM::Name, _PARAM::Name)
#define DUCKDB_AGGREGATE_FUNCTION_ALIAS(_PARAM) DUCKDB_AGGREGATE_FUNCTION_BASE(_PARAM::ALIAS, _PARAM::Name, _PARAM::ALIAS::Name)
// Aggregate Function Set
#define DUCKDB_AGGREGATE_FUNCTION_SET_BASE(_PARAM, _NAME, _ALIAS_OF) \
{ _NAME, _ALIAS_OF, _PARAM::Parameters, _PARAM::Description, _PARAM::Example, _PARAM::Categories, nullptr, nullptr, nullptr, _PARAM::GetFunctions }
#define DUCKDB_AGGREGATE_FUNCTION_SET(_PARAM) DUCKDB_AGGREGATE_FUNCTION_SET_BASE(_PARAM, _PARAM::Name, _PARAM::Name)
#define DUCKDB_AGGREGATE_FUNCTION_SET_ALIAS(_PARAM) DUCKDB_AGGREGATE_FUNCTION_SET_BASE(_PARAM::ALIAS, _PARAM::Name, _PARAM::ALIAS::Name)
#define FINAL_FUNCTION \
{ nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr }
// this list is generated by scripts/generate_functions.py
static const StaticFunctionDefinition core_functions[] = {
DUCKDB_SCALAR_FUNCTION(FactorialOperatorFun),
DUCKDB_SCALAR_FUNCTION_SET(BitwiseAndFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(ListHasAnyFunAlias),
DUCKDB_SCALAR_FUNCTION(PowOperatorFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ListDistanceFunAlias),
DUCKDB_SCALAR_FUNCTION_SET(LeftShiftFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ListCosineDistanceFunAlias),
DUCKDB_SCALAR_FUNCTION_ALIAS(ListHasAllFunAlias2),
DUCKDB_SCALAR_FUNCTION_SET(RightShiftFun),
DUCKDB_SCALAR_FUNCTION_SET(AbsOperatorFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(ListHasAllFunAlias),
DUCKDB_SCALAR_FUNCTION_ALIAS(PowOperatorFunAlias),
DUCKDB_SCALAR_FUNCTION(StartsWithOperatorFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(AbsFun),
DUCKDB_SCALAR_FUNCTION(AcosFun),
DUCKDB_SCALAR_FUNCTION(AcoshFun),
DUCKDB_SCALAR_FUNCTION_SET(AgeFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(AggregateFun),
DUCKDB_SCALAR_FUNCTION(AliasFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(ApplyFun),
DUCKDB_AGGREGATE_FUNCTION(ApproxCountDistinctFun),
DUCKDB_AGGREGATE_FUNCTION_SET(ApproxQuantileFun),
DUCKDB_AGGREGATE_FUNCTION(ApproxTopKFun),
DUCKDB_AGGREGATE_FUNCTION_SET(ArgMaxFun),
DUCKDB_AGGREGATE_FUNCTION_SET(ArgMaxNullFun),
DUCKDB_AGGREGATE_FUNCTION_SET(ArgMaxNullsLastFun),
DUCKDB_AGGREGATE_FUNCTION_SET(ArgMinFun),
DUCKDB_AGGREGATE_FUNCTION_SET(ArgMinNullFun),
DUCKDB_AGGREGATE_FUNCTION_SET(ArgMinNullsLastFun),
DUCKDB_AGGREGATE_FUNCTION_SET_ALIAS(ArgmaxFun),
DUCKDB_AGGREGATE_FUNCTION_SET_ALIAS(ArgminFun),
DUCKDB_AGGREGATE_FUNCTION_ALIAS(ArrayAggFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(ArrayAggrFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(ArrayAggregateFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(ArrayApplyFun),
DUCKDB_SCALAR_FUNCTION_SET(ArrayCosineDistanceFun),
DUCKDB_SCALAR_FUNCTION_SET(ArrayCosineSimilarityFun),
DUCKDB_SCALAR_FUNCTION_SET(ArrayCrossProductFun),
DUCKDB_SCALAR_FUNCTION_SET(ArrayDistanceFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(ArrayDistinctFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ArrayDotProductFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(ArrayFilterFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ArrayGradeUpFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(ArrayHasAllFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(ArrayHasAnyFun),
DUCKDB_SCALAR_FUNCTION_SET(ArrayInnerProductFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ArrayNegativeDotProductFun),
DUCKDB_SCALAR_FUNCTION_SET(ArrayNegativeInnerProductFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ArrayReduceFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ArrayReverseSortFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ArraySliceFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ArraySortFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(ArrayTransformFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(ArrayUniqueFun),
DUCKDB_SCALAR_FUNCTION(ArrayValueFun),
DUCKDB_SCALAR_FUNCTION(ASCIIFun),
DUCKDB_SCALAR_FUNCTION(AsinFun),
DUCKDB_SCALAR_FUNCTION(AsinhFun),
DUCKDB_SCALAR_FUNCTION(AtanFun),
DUCKDB_SCALAR_FUNCTION(Atan2Fun),
DUCKDB_SCALAR_FUNCTION(AtanhFun),
DUCKDB_AGGREGATE_FUNCTION_SET(AvgFun),
DUCKDB_SCALAR_FUNCTION_SET(BarFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(Base64Fun),
DUCKDB_SCALAR_FUNCTION_SET(BinFun),
DUCKDB_AGGREGATE_FUNCTION_SET(BitAndFun),
DUCKDB_SCALAR_FUNCTION_SET(BitCountFun),
DUCKDB_AGGREGATE_FUNCTION_SET(BitOrFun),
DUCKDB_SCALAR_FUNCTION(BitPositionFun),
DUCKDB_AGGREGATE_FUNCTION_SET(BitXorFun),
DUCKDB_SCALAR_FUNCTION_SET(BitStringFun),
DUCKDB_AGGREGATE_FUNCTION_SET(BitstringAggFun),
DUCKDB_AGGREGATE_FUNCTION(BoolAndFun),
DUCKDB_AGGREGATE_FUNCTION(BoolOrFun),
DUCKDB_SCALAR_FUNCTION(CanCastImplicitlyFun),
DUCKDB_SCALAR_FUNCTION(CardinalityFun),
DUCKDB_SCALAR_FUNCTION(CastToTypeFun),
DUCKDB_SCALAR_FUNCTION(CbrtFun),
DUCKDB_SCALAR_FUNCTION_SET(CeilFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(CeilingFun),
DUCKDB_SCALAR_FUNCTION_SET(CenturyFun),
DUCKDB_SCALAR_FUNCTION(ChrFun),
DUCKDB_AGGREGATE_FUNCTION(CorrFun),
DUCKDB_SCALAR_FUNCTION(CosFun),
DUCKDB_SCALAR_FUNCTION(CoshFun),
DUCKDB_SCALAR_FUNCTION(CotFun),
DUCKDB_AGGREGATE_FUNCTION(CountIfFun),
DUCKDB_AGGREGATE_FUNCTION_ALIAS(CountifFun),
DUCKDB_AGGREGATE_FUNCTION(CovarPopFun),
DUCKDB_AGGREGATE_FUNCTION(CovarSampFun),
DUCKDB_SCALAR_FUNCTION(CurrentDatabaseFun),
DUCKDB_SCALAR_FUNCTION(CurrentQueryFun),
DUCKDB_SCALAR_FUNCTION(CurrentSchemaFun),
DUCKDB_SCALAR_FUNCTION(CurrentSchemasFun),
DUCKDB_SCALAR_FUNCTION(CurrentSettingFun),
DUCKDB_SCALAR_FUNCTION(DamerauLevenshteinFun),
DUCKDB_SCALAR_FUNCTION_SET(DateDiffFun),
DUCKDB_SCALAR_FUNCTION_SET(DatePartFun),
DUCKDB_SCALAR_FUNCTION_SET(DateSubFun),
DUCKDB_SCALAR_FUNCTION_SET(DateTruncFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(DatediffFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(DatepartFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(DatesubFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(DatetruncFun),
DUCKDB_SCALAR_FUNCTION_SET(DayFun),
DUCKDB_SCALAR_FUNCTION_SET(DayNameFun),
DUCKDB_SCALAR_FUNCTION_SET(DayOfMonthFun),
DUCKDB_SCALAR_FUNCTION_SET(DayOfWeekFun),
DUCKDB_SCALAR_FUNCTION_SET(DayOfYearFun),
DUCKDB_SCALAR_FUNCTION_SET(DecadeFun),
DUCKDB_SCALAR_FUNCTION(DecodeFun),
DUCKDB_SCALAR_FUNCTION(DegreesFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(Editdist3Fun),
DUCKDB_SCALAR_FUNCTION_ALIAS(ElementAtFun),
DUCKDB_SCALAR_FUNCTION(EncodeFun),
DUCKDB_AGGREGATE_FUNCTION_SET(EntropyFun),
DUCKDB_SCALAR_FUNCTION(EnumCodeFun),
DUCKDB_SCALAR_FUNCTION(EnumFirstFun),
DUCKDB_SCALAR_FUNCTION(EnumLastFun),
DUCKDB_SCALAR_FUNCTION(EnumRangeFun),
DUCKDB_SCALAR_FUNCTION(EnumRangeBoundaryFun),
DUCKDB_SCALAR_FUNCTION_SET(EpochFun),
DUCKDB_SCALAR_FUNCTION_SET(EpochMsFun),
DUCKDB_SCALAR_FUNCTION_SET(EpochNsFun),
DUCKDB_SCALAR_FUNCTION_SET(EpochUsFun),
DUCKDB_SCALAR_FUNCTION_SET(EquiWidthBinsFun),
DUCKDB_SCALAR_FUNCTION_SET(EraFun),
DUCKDB_SCALAR_FUNCTION(EvenFun),
DUCKDB_SCALAR_FUNCTION(ExpFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(FactorialFun),
DUCKDB_AGGREGATE_FUNCTION(FAvgFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(FilterFun),
DUCKDB_SCALAR_FUNCTION(ListFlattenFun),
DUCKDB_SCALAR_FUNCTION_SET(FloorFun),
DUCKDB_SCALAR_FUNCTION(FormatFun),
DUCKDB_SCALAR_FUNCTION(FormatreadabledecimalsizeFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(FormatreadablesizeFun),
DUCKDB_SCALAR_FUNCTION(FormatBytesFun),
DUCKDB_SCALAR_FUNCTION(FromBase64Fun),
DUCKDB_SCALAR_FUNCTION_ALIAS(FromBinaryFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(FromHexFun),
DUCKDB_AGGREGATE_FUNCTION_ALIAS(FsumFun),
DUCKDB_SCALAR_FUNCTION(GammaFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(GcdFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(GenRandomUuidFun),
DUCKDB_SCALAR_FUNCTION_SET(GenerateSeriesFun),
DUCKDB_SCALAR_FUNCTION(GetBitFun),
DUCKDB_SCALAR_FUNCTION(GetCurrentTimestampFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(GradeUpFun),
DUCKDB_SCALAR_FUNCTION_SET(GreatestFun),
DUCKDB_SCALAR_FUNCTION_SET(GreatestCommonDivisorFun),
DUCKDB_AGGREGATE_FUNCTION_SET_ALIAS(GroupConcatFun),
DUCKDB_SCALAR_FUNCTION(HammingFun),
DUCKDB_SCALAR_FUNCTION(HashFun),
DUCKDB_SCALAR_FUNCTION_SET(HexFun),
DUCKDB_AGGREGATE_FUNCTION_SET(HistogramFun),
DUCKDB_AGGREGATE_FUNCTION(HistogramExactFun),
DUCKDB_SCALAR_FUNCTION_SET(HoursFun),
DUCKDB_SCALAR_FUNCTION(InSearchPathFun),
DUCKDB_SCALAR_FUNCTION(InstrFun),
DUCKDB_SCALAR_FUNCTION(IsHistogramOtherBinFun),
DUCKDB_SCALAR_FUNCTION_SET(IsFiniteFun),
DUCKDB_SCALAR_FUNCTION_SET(IsInfiniteFun),
DUCKDB_SCALAR_FUNCTION_SET(IsNanFun),
DUCKDB_SCALAR_FUNCTION_SET(ISODayOfWeekFun),
DUCKDB_SCALAR_FUNCTION_SET(ISOYearFun),
DUCKDB_SCALAR_FUNCTION(JaccardFun),
DUCKDB_SCALAR_FUNCTION_SET(JaroSimilarityFun),
DUCKDB_SCALAR_FUNCTION_SET(JaroWinklerSimilarityFun),
DUCKDB_SCALAR_FUNCTION_SET(JulianDayFun),
DUCKDB_AGGREGATE_FUNCTION(KahanSumFun),
DUCKDB_AGGREGATE_FUNCTION(KurtosisFun),
DUCKDB_AGGREGATE_FUNCTION(KurtosisPopFun),
DUCKDB_SCALAR_FUNCTION_SET(LastDayFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(LcmFun),
DUCKDB_SCALAR_FUNCTION_SET(LeastFun),
DUCKDB_SCALAR_FUNCTION_SET(LeastCommonMultipleFun),
DUCKDB_SCALAR_FUNCTION(LeftFun),
DUCKDB_SCALAR_FUNCTION(LeftGraphemeFun),
DUCKDB_SCALAR_FUNCTION(LevenshteinFun),
DUCKDB_SCALAR_FUNCTION(LogGammaFun),
DUCKDB_AGGREGATE_FUNCTION(ListFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(ListAggrFun),
DUCKDB_SCALAR_FUNCTION(ListAggregateFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(ListApplyFun),
DUCKDB_SCALAR_FUNCTION_SET(ListCosineDistanceFun),
DUCKDB_SCALAR_FUNCTION_SET(ListCosineSimilarityFun),
DUCKDB_SCALAR_FUNCTION_SET(ListDistanceFun),
DUCKDB_SCALAR_FUNCTION(ListDistinctFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ListDotProductFun),
DUCKDB_SCALAR_FUNCTION(ListFilterFun),
DUCKDB_SCALAR_FUNCTION_SET(ListGradeUpFun),
DUCKDB_SCALAR_FUNCTION(ListHasAllFun),
DUCKDB_SCALAR_FUNCTION(ListHasAnyFun),
DUCKDB_SCALAR_FUNCTION_SET(ListInnerProductFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ListNegativeDotProductFun),
DUCKDB_SCALAR_FUNCTION_SET(ListNegativeInnerProductFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ListPackFun),
DUCKDB_SCALAR_FUNCTION_SET(ListReduceFun),
DUCKDB_SCALAR_FUNCTION_SET(ListReverseSortFun),
DUCKDB_SCALAR_FUNCTION_SET(ListSliceFun),
DUCKDB_SCALAR_FUNCTION_SET(ListSortFun),
DUCKDB_SCALAR_FUNCTION(ListTransformFun),
DUCKDB_SCALAR_FUNCTION(ListUniqueFun),
DUCKDB_SCALAR_FUNCTION_SET(ListValueFun),
DUCKDB_AGGREGATE_FUNCTION_SET_ALIAS(ListaggFun),
DUCKDB_SCALAR_FUNCTION(LnFun),
DUCKDB_SCALAR_FUNCTION_SET(LogFun),
DUCKDB_SCALAR_FUNCTION(Log10Fun),
DUCKDB_SCALAR_FUNCTION(Log2Fun),
DUCKDB_SCALAR_FUNCTION(LpadFun),
DUCKDB_SCALAR_FUNCTION_SET(LtrimFun),
DUCKDB_AGGREGATE_FUNCTION_SET(MadFun),
DUCKDB_SCALAR_FUNCTION_SET(MakeDateFun),
DUCKDB_SCALAR_FUNCTION(MakeTimeFun),
DUCKDB_SCALAR_FUNCTION_SET(MakeTimestampFun),
DUCKDB_SCALAR_FUNCTION_SET(MakeTimestampMsFun),
DUCKDB_SCALAR_FUNCTION_SET(MakeTimestampNsFun),
DUCKDB_SCALAR_FUNCTION_SET(MapFun),
DUCKDB_SCALAR_FUNCTION(MapConcatFun),
DUCKDB_SCALAR_FUNCTION(MapEntriesFun),
DUCKDB_SCALAR_FUNCTION(MapExtractFun),
DUCKDB_SCALAR_FUNCTION(MapExtractValueFun),
DUCKDB_SCALAR_FUNCTION(MapFromEntriesFun),
DUCKDB_SCALAR_FUNCTION(MapKeysFun),
DUCKDB_SCALAR_FUNCTION(MapValuesFun),
DUCKDB_AGGREGATE_FUNCTION_SET_ALIAS(MaxByFun),
DUCKDB_AGGREGATE_FUNCTION_SET_ALIAS(MeanFun),
DUCKDB_AGGREGATE_FUNCTION_SET(MedianFun),
DUCKDB_SCALAR_FUNCTION_SET(MicrosecondsFun),
DUCKDB_SCALAR_FUNCTION_SET(MillenniumFun),
DUCKDB_SCALAR_FUNCTION_SET(MillisecondsFun),
DUCKDB_AGGREGATE_FUNCTION_SET_ALIAS(MinByFun),
DUCKDB_SCALAR_FUNCTION_SET(MinutesFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(MismatchesFun),
DUCKDB_AGGREGATE_FUNCTION_SET(ModeFun),
DUCKDB_SCALAR_FUNCTION_SET(MonthFun),
DUCKDB_SCALAR_FUNCTION_SET(MonthNameFun),
DUCKDB_SCALAR_FUNCTION_SET(NanosecondsFun),
DUCKDB_SCALAR_FUNCTION_SET(NextAfterFun),
DUCKDB_SCALAR_FUNCTION(NormalizedIntervalFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(NowFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(OrdFun),
DUCKDB_SCALAR_FUNCTION_SET(ParseDirnameFun),
DUCKDB_SCALAR_FUNCTION_SET(ParseDirpathFun),
DUCKDB_SCALAR_FUNCTION_SET(ParseFilenameFun),
DUCKDB_SCALAR_FUNCTION_SET(ParsePathFun),
DUCKDB_SCALAR_FUNCTION(PiFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(PositionFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(PowFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(PowerFun),
DUCKDB_SCALAR_FUNCTION(PrintfFun),
DUCKDB_AGGREGATE_FUNCTION(ProductFun),
DUCKDB_AGGREGATE_FUNCTION_SET_ALIAS(QuantileFun),
DUCKDB_AGGREGATE_FUNCTION_SET(QuantileContFun),
DUCKDB_AGGREGATE_FUNCTION_SET(QuantileDiscFun),
DUCKDB_SCALAR_FUNCTION_SET(QuarterFun),
DUCKDB_SCALAR_FUNCTION(RadiansFun),
DUCKDB_SCALAR_FUNCTION(RandomFun),
DUCKDB_SCALAR_FUNCTION_SET(ListRangeFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ReduceFun),
DUCKDB_AGGREGATE_FUNCTION(RegrAvgxFun),
DUCKDB_AGGREGATE_FUNCTION(RegrAvgyFun),
DUCKDB_AGGREGATE_FUNCTION(RegrCountFun),
DUCKDB_AGGREGATE_FUNCTION(RegrInterceptFun),
DUCKDB_AGGREGATE_FUNCTION(RegrR2Fun),
DUCKDB_AGGREGATE_FUNCTION(RegrSlopeFun),
DUCKDB_AGGREGATE_FUNCTION(RegrSXXFun),
DUCKDB_AGGREGATE_FUNCTION(RegrSXYFun),
DUCKDB_AGGREGATE_FUNCTION(RegrSYYFun),
DUCKDB_SCALAR_FUNCTION_SET(RepeatFun),
DUCKDB_SCALAR_FUNCTION(ReplaceFun),
DUCKDB_SCALAR_FUNCTION(ReplaceTypeFun),
DUCKDB_AGGREGATE_FUNCTION_SET(ReservoirQuantileFun),
DUCKDB_SCALAR_FUNCTION(ReverseFun),
DUCKDB_SCALAR_FUNCTION(RightFun),
DUCKDB_SCALAR_FUNCTION(RightGraphemeFun),
DUCKDB_SCALAR_FUNCTION_SET(RoundFun),
DUCKDB_SCALAR_FUNCTION(RpadFun),
DUCKDB_SCALAR_FUNCTION_SET(RtrimFun),
DUCKDB_SCALAR_FUNCTION_SET(SecondsFun),
DUCKDB_AGGREGATE_FUNCTION(StandardErrorOfTheMeanFun),
DUCKDB_SCALAR_FUNCTION(SetBitFun),
DUCKDB_SCALAR_FUNCTION(SetseedFun),
DUCKDB_SCALAR_FUNCTION_SET(SignFun),
DUCKDB_SCALAR_FUNCTION_SET(SignBitFun),
DUCKDB_SCALAR_FUNCTION(SinFun),
DUCKDB_SCALAR_FUNCTION(SinhFun),
DUCKDB_AGGREGATE_FUNCTION(SkewnessFun),
DUCKDB_SCALAR_FUNCTION(SqrtFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(StartsWithFun),
DUCKDB_SCALAR_FUNCTION(StatsFun),
DUCKDB_AGGREGATE_FUNCTION_ALIAS(StddevFun),
DUCKDB_AGGREGATE_FUNCTION(StdDevPopFun),
DUCKDB_AGGREGATE_FUNCTION(StdDevSampFun),
DUCKDB_AGGREGATE_FUNCTION_SET(StringAggFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(StrposFun),
DUCKDB_SCALAR_FUNCTION(StructInsertFun),
DUCKDB_SCALAR_FUNCTION(StructUpdateFun),
DUCKDB_AGGREGATE_FUNCTION_SET(SumFun),
DUCKDB_AGGREGATE_FUNCTION_SET(SumNoOverflowFun),
DUCKDB_AGGREGATE_FUNCTION_ALIAS(SumkahanFun),
DUCKDB_SCALAR_FUNCTION(TanFun),
DUCKDB_SCALAR_FUNCTION(TanhFun),
DUCKDB_SCALAR_FUNCTION_SET(TimeBucketFun),
DUCKDB_SCALAR_FUNCTION(TimeTZSortKeyFun),
DUCKDB_SCALAR_FUNCTION_SET(TimezoneFun),
DUCKDB_SCALAR_FUNCTION_SET(TimezoneHourFun),
DUCKDB_SCALAR_FUNCTION_SET(TimezoneMinuteFun),
DUCKDB_SCALAR_FUNCTION_SET(ToBaseFun),
DUCKDB_SCALAR_FUNCTION(ToBase64Fun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ToBinaryFun),
DUCKDB_SCALAR_FUNCTION_SET(ToCenturiesFun),
DUCKDB_SCALAR_FUNCTION_SET(ToDaysFun),
DUCKDB_SCALAR_FUNCTION_SET(ToDecadesFun),
DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ToHexFun),
DUCKDB_SCALAR_FUNCTION(ToHoursFun),
DUCKDB_SCALAR_FUNCTION(ToMicrosecondsFun),
DUCKDB_SCALAR_FUNCTION_SET(ToMillenniaFun),
DUCKDB_SCALAR_FUNCTION(ToMillisecondsFun),
DUCKDB_SCALAR_FUNCTION(ToMinutesFun),
DUCKDB_SCALAR_FUNCTION_SET(ToMonthsFun),
DUCKDB_SCALAR_FUNCTION_SET(ToQuartersFun),
DUCKDB_SCALAR_FUNCTION(ToSecondsFun),
DUCKDB_SCALAR_FUNCTION(ToTimestampFun),
DUCKDB_SCALAR_FUNCTION_SET(ToWeeksFun),
DUCKDB_SCALAR_FUNCTION_SET(ToYearsFun),
DUCKDB_SCALAR_FUNCTION_ALIAS(TransactionTimestampFun),
DUCKDB_SCALAR_FUNCTION(TranslateFun),
DUCKDB_SCALAR_FUNCTION_SET(TrimFun),
DUCKDB_SCALAR_FUNCTION_SET(TruncFun),
DUCKDB_SCALAR_FUNCTION(CurrentTransactionIdFun),
DUCKDB_SCALAR_FUNCTION(TypeOfFun),
DUCKDB_SCALAR_FUNCTION(UnbinFun),
DUCKDB_SCALAR_FUNCTION(UnhexFun),
DUCKDB_SCALAR_FUNCTION(UnicodeFun),
DUCKDB_SCALAR_FUNCTION(UnionExtractFun),
DUCKDB_SCALAR_FUNCTION(UnionTagFun),
DUCKDB_SCALAR_FUNCTION(UnionValueFun),
DUCKDB_SCALAR_FUNCTION(UnpivotListFun),
DUCKDB_SCALAR_FUNCTION(UrlDecodeFun),
DUCKDB_SCALAR_FUNCTION(UrlEncodeFun),
DUCKDB_SCALAR_FUNCTION(UUIDFun),
DUCKDB_SCALAR_FUNCTION(UUIDExtractTimestampFun),
DUCKDB_SCALAR_FUNCTION(UUIDExtractVersionFun),
DUCKDB_SCALAR_FUNCTION(UUIDv4Fun),
DUCKDB_SCALAR_FUNCTION(UUIDv7Fun),
DUCKDB_AGGREGATE_FUNCTION(VarPopFun),
DUCKDB_AGGREGATE_FUNCTION(VarSampFun),
DUCKDB_AGGREGATE_FUNCTION_ALIAS(VarianceFun),
DUCKDB_SCALAR_FUNCTION(VectorTypeFun),
DUCKDB_SCALAR_FUNCTION(VersionFun),
DUCKDB_SCALAR_FUNCTION_SET(WeekFun),
DUCKDB_SCALAR_FUNCTION_SET(WeekDayFun),
DUCKDB_SCALAR_FUNCTION_SET(WeekOfYearFun),
DUCKDB_SCALAR_FUNCTION_SET(BitwiseXorFun),
DUCKDB_SCALAR_FUNCTION_SET(YearFun),
DUCKDB_SCALAR_FUNCTION_SET(YearWeekFun),
DUCKDB_SCALAR_FUNCTION_SET(BitwiseOrFun),
DUCKDB_SCALAR_FUNCTION_SET(BitwiseNotFun),
FINAL_FUNCTION
};
const StaticFunctionDefinition *CoreFunctionList::GetFunctionList() {
return core_functions;
}
} // namespace duckdb

View File

@@ -0,0 +1,70 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// duckdb/core_functions/aggregate/algebraic/corr.hpp
//
//
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/aggregate_function.hpp"
#include "core_functions/aggregate/algebraic/covar.hpp"
#include "core_functions/aggregate/algebraic/stddev.hpp"
namespace duckdb {
struct CorrState {
CovarState cov_pop;
StddevState dev_pop_x;
StddevState dev_pop_y;
};
// Returns the correlation coefficient for non-null pairs in a group.
// CORR(y, x) = COVAR_POP(y, x) / (STDDEV_POP(x) * STDDEV_POP(y))
struct CorrOperation {
template <class STATE>
static void Initialize(STATE &state) {
CovarOperation::Initialize<CovarState>(state.cov_pop);
STDDevBaseOperation::Initialize<StddevState>(state.dev_pop_x);
STDDevBaseOperation::Initialize<StddevState>(state.dev_pop_y);
}
template <class A_TYPE, class B_TYPE, class STATE, class OP>
static void Operation(STATE &state, const A_TYPE &y, const B_TYPE &x, AggregateBinaryInput &idata) {
CovarOperation::Operation<A_TYPE, B_TYPE, CovarState, OP>(state.cov_pop, y, x, idata);
STDDevBaseOperation::Execute<A_TYPE, StddevState>(state.dev_pop_x, x);
STDDevBaseOperation::Execute<B_TYPE, StddevState>(state.dev_pop_y, y);
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &aggr_input_data) {
CovarOperation::Combine<CovarState, OP>(source.cov_pop, target.cov_pop, aggr_input_data);
STDDevBaseOperation::Combine<StddevState, OP>(source.dev_pop_x, target.dev_pop_x, aggr_input_data);
STDDevBaseOperation::Combine<StddevState, OP>(source.dev_pop_y, target.dev_pop_y, aggr_input_data);
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.cov_pop.count == 0 || state.dev_pop_x.count == 0 || state.dev_pop_y.count == 0) {
finalize_data.ReturnNull();
} else {
auto cov = state.cov_pop.co_moment / state.cov_pop.count;
auto std_x = state.dev_pop_x.count > 1 ? sqrt(state.dev_pop_x.dsquared / state.dev_pop_x.count) : 0;
if (!Value::DoubleIsFinite(std_x)) {
throw OutOfRangeException("STDDEV_POP for X is out of range!");
}
auto std_y = state.dev_pop_y.count > 1 ? sqrt(state.dev_pop_y.dsquared / state.dev_pop_y.count) : 0;
if (!Value::DoubleIsFinite(std_y)) {
throw OutOfRangeException("STDDEV_POP for Y is out of range!");
}
target = std_x * std_y != 0 ? cov / (std_x * std_y) : NAN;
}
}
static bool IgnoreNull() {
return true;
}
};
} // namespace duckdb
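
The finalize step above is the textbook identity CORR(y, x) = COVAR_POP(y, x) / (STDDEV_POP(x) * STDDEV_POP(y)) evaluated on the accumulated component states. As a sanity check, here is a minimal standalone sketch of the same computation over plain vectors; it sits outside the aggregate framework and every name in it is illustrative only:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative only: CORR(y, x) = COVAR_POP(y, x) / (STDDEV_POP(x) * STDDEV_POP(y)),
// mirroring the finalize logic in CorrOperation over two equally sized vectors.
static double NaiveCorr(const std::vector<double> &y, const std::vector<double> &x) {
    const double n = static_cast<double>(x.size());
    double mean_x = 0, mean_y = 0;
    for (size_t i = 0; i < x.size(); i++) {
        mean_x += x[i] / n;
        mean_y += y[i] / n;
    }
    double cov = 0, var_x = 0, var_y = 0;
    for (size_t i = 0; i < x.size(); i++) {
        cov += (x[i] - mean_x) * (y[i] - mean_y) / n;
        var_x += (x[i] - mean_x) * (x[i] - mean_x) / n;
        var_y += (y[i] - mean_y) * (y[i] - mean_y) / n;
    }
    const double denom = std::sqrt(var_x) * std::sqrt(var_y);
    return denom != 0 ? cov / denom : NAN;
}
```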

View File

@@ -0,0 +1,101 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// duckdb/core_functions/aggregate/algebraic/covar.hpp
//
//
//===----------------------------------------------------------------------===//
// COVAR_POP(y,x)
#pragma once
#include "duckdb/function/aggregate_function.hpp"
namespace duckdb {
struct CovarState {
uint64_t count;
double meanx;
double meany;
double co_moment;
};
struct CovarOperation {
template <class STATE>
static void Initialize(STATE &state) {
state.count = 0;
state.meanx = 0;
state.meany = 0;
state.co_moment = 0;
}
template <class A_TYPE, class B_TYPE, class STATE, class OP>
static void Operation(STATE &state, const A_TYPE &y, const B_TYPE &x, AggregateBinaryInput &idata) {
// update the running means and the co-moment
const double n = static_cast<double>(++(state.count));
const double dx = (x - state.meanx);
const double meanx = state.meanx + dx / n;
const double dy = (y - state.meany);
const double meany = state.meany + dy / n;
// Schubert and Gertz SSDBM 2018 (4.3)
const double C = state.co_moment + dx * (y - meany);
state.meanx = meanx;
state.meany = meany;
state.co_moment = C;
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
if (target.count == 0) {
target = source;
} else if (source.count > 0) {
const auto count = target.count + source.count;
D_ASSERT(count >= target.count); // This is a check that we are not overflowing
const auto target_count = static_cast<double>(target.count);
const auto source_count = static_cast<double>(source.count);
const auto total_count = static_cast<double>(count);
const auto meanx = (source_count * source.meanx + target_count * target.meanx) / total_count;
const auto meany = (source_count * source.meany + target_count * target.meany) / total_count;
// Schubert and Gertz SSDBM 2018, equation 21
const auto deltax = target.meanx - source.meanx;
const auto deltay = target.meany - source.meany;
target.co_moment =
source.co_moment + target.co_moment + deltax * deltay * source_count * target_count / total_count;
target.meanx = meanx;
target.meany = meany;
target.count = count;
}
}
static bool IgnoreNull() {
return true;
}
};
struct CovarPopOperation : public CovarOperation {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.count == 0) {
finalize_data.ReturnNull();
} else {
target = state.co_moment / state.count;
}
}
};
struct CovarSampOperation : public CovarOperation {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.count < 2) {
finalize_data.ReturnNull();
} else {
target = state.co_moment / (state.count - 1);
}
}
};
} // namespace duckdb
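
CovarOperation keeps a single-pass co-moment, and Combine merges two partially aggregated states with the parallel formula cited above (Schubert and Gertz, SSDBM 2018). A rough standalone sketch of the same recurrences, with illustrative names and no DuckDB types:

```cpp
#include <cstdint>

// Illustrative only: the single-pass co-moment update and the parallel merge,
// mirroring CovarOperation::Operation and CovarOperation::Combine.
struct NaiveCovarState {
    uint64_t count = 0;
    double meanx = 0, meany = 0, co_moment = 0;

    void Update(double y, double x) {
        const double n = static_cast<double>(++count);
        const double dx = x - meanx;
        meanx += dx / n;
        meany += (y - meany) / n;
        co_moment += dx * (y - meany); // uses the already-updated meany
    }

    void Merge(const NaiveCovarState &other) {
        if (other.count == 0) {
            return;
        }
        if (count == 0) {
            *this = other;
            return;
        }
        const double n1 = static_cast<double>(count), n2 = static_cast<double>(other.count);
        const double n = n1 + n2;
        const double dx = meanx - other.meanx;
        const double dy = meany - other.meany;
        co_moment += other.co_moment + dx * dy * n1 * n2 / n;
        meanx = (n1 * meanx + n2 * other.meanx) / n;
        meany = (n1 * meany + n2 * other.meany) / n;
        count += other.count;
    }

    double CovarPop() const {
        return count ? co_moment / static_cast<double>(count) : 0;
    }
};
```

Merging two states built over disjoint halves of the input gives, up to floating-point rounding, the same co-moment as a single pass over the whole, which is what makes the aggregate parallelizable.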

View File

@@ -0,0 +1,151 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// duckdb/core_functions/aggregate/algebraic/stddev.hpp
//
//
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/aggregate_function.hpp"
#include <ctgmath>
namespace duckdb {
struct StddevState {
uint64_t count; // n
double mean; // M1
double dsquared; // M2
};
// Streaming approximate standard deviation using Welford's
// method, DOI: 10.2307/1266577
struct STDDevBaseOperation {
template <class STATE>
static void Initialize(STATE &state) {
state.count = 0;
state.mean = 0;
state.dsquared = 0;
}
template <class INPUT_TYPE, class STATE>
static void Execute(STATE &state, const INPUT_TYPE &input) {
// update running mean and d^2
state.count++;
const double mean_differential = (input - state.mean) / state.count;
const double new_mean = state.mean + mean_differential;
const double dsquared_increment = (input - new_mean) * (input - state.mean);
const double new_dsquared = state.dsquared + dsquared_increment;
state.mean = new_mean;
state.dsquared = new_dsquared;
}
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &) {
Execute(state, input);
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input,
idx_t count) {
for (idx_t i = 0; i < count; i++) {
Operation<INPUT_TYPE, STATE, OP>(state, input, unary_input);
}
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
if (target.count == 0) {
target = source;
} else if (source.count > 0) {
const auto count = target.count + source.count;
D_ASSERT(count >= target.count); // This is a check that we are not overflowing
const double target_count = static_cast<double>(target.count);
const double source_count = static_cast<double>(source.count);
const double total_count = static_cast<double>(count);
const auto delta = source.mean - target.mean;
const auto mean = std::fma(source_count / total_count, delta, target.mean);
target.dsquared =
source.dsquared + target.dsquared + delta * delta * source_count * target_count / total_count;
target.mean = mean;
target.count = count;
}
}
static bool IgnoreNull() {
return true;
}
};
struct VarSampOperation : public STDDevBaseOperation {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.count <= 1) {
finalize_data.ReturnNull();
} else {
target = state.dsquared / (state.count - 1);
if (!Value::DoubleIsFinite(target)) {
throw OutOfRangeException("VARSAMP is out of range!");
}
}
}
};
struct VarPopOperation : public STDDevBaseOperation {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.count == 0) {
finalize_data.ReturnNull();
} else {
target = state.count > 1 ? (state.dsquared / state.count) : 0;
if (!Value::DoubleIsFinite(target)) {
throw OutOfRangeException("VARPOP is out of range!");
}
}
}
};
struct STDDevSampOperation : public STDDevBaseOperation {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.count <= 1) {
finalize_data.ReturnNull();
} else {
target = sqrt(state.dsquared / (state.count - 1));
if (!Value::DoubleIsFinite(target)) {
throw OutOfRangeException("STDDEV_SAMP is out of range!");
}
}
}
};
struct STDDevPopOperation : public STDDevBaseOperation {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.count == 0) {
finalize_data.ReturnNull();
} else {
target = state.count > 1 ? sqrt(state.dsquared / state.count) : 0;
if (!Value::DoubleIsFinite(target)) {
throw OutOfRangeException("STDDEV_POP is out of range!");
}
}
}
};
struct StandardErrorOfTheMeanOperation : public STDDevBaseOperation {
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.count == 0) {
finalize_data.ReturnNull();
} else {
target = sqrt(state.dsquared / state.count) / sqrt((state.count));
if (!Value::DoubleIsFinite(target)) {
throw OutOfRangeException("SEM is out of range!");
}
}
}
};
} // namespace duckdb
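
All of the finalizers above share the same Welford state (count, running mean, and the sum of squared deviations M2); they differ only in the divisor and in whether a square root is taken. A minimal sketch of that shared update and a few of the finalizers, outside the aggregate framework and with illustrative names:

```cpp
#include <cmath>
#include <cstdint>

// Illustrative only: Welford's streaming update, mirroring STDDevBaseOperation::Execute,
// plus the population/sample finalizers.
struct NaiveStddevState {
    uint64_t count = 0;
    double mean = 0;     // M1
    double dsquared = 0; // M2

    void Update(double input) {
        count++;
        const double delta = input - mean;
        mean += delta / static_cast<double>(count);
        dsquared += delta * (input - mean); // uses the already-updated mean
    }

    double VarPop() const {
        return count > 1 ? dsquared / static_cast<double>(count) : 0;
    }
    double VarSamp() const {
        return count > 1 ? dsquared / static_cast<double>(count - 1) : 0;
    }
    double StddevPop() const {
        return std::sqrt(VarPop());
    }
    double Sem() const { // standard error of the mean
        return count ? StddevPop() / std::sqrt(static_cast<double>(count)) : 0;
    }
};
```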

View File

@@ -0,0 +1,136 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/aggregate/algebraic_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct AvgFun {
static constexpr const char *Name = "avg";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Calculates the average value for all tuples in x.";
static constexpr const char *Example = "SUM(x) / COUNT(*)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct MeanFun {
using ALIAS = AvgFun;
static constexpr const char *Name = "mean";
};
struct CorrFun {
static constexpr const char *Name = "corr";
static constexpr const char *Parameters = "y,x";
static constexpr const char *Description = "Returns the correlation coefficient for non-NULL pairs in a group.";
static constexpr const char *Example = "COVAR_POP(y, x) / (STDDEV_POP(x) * STDDEV_POP(y))";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct CovarPopFun {
static constexpr const char *Name = "covar_pop";
static constexpr const char *Parameters = "y,x";
static constexpr const char *Description = "Returns the population covariance of input values.";
static constexpr const char *Example = "(SUM(x*y) - SUM(x) * SUM(y) / COUNT(*)) / COUNT(*)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct CovarSampFun {
static constexpr const char *Name = "covar_samp";
static constexpr const char *Parameters = "y,x";
static constexpr const char *Description = "Returns the sample covariance for non-NULL pairs in a group.";
static constexpr const char *Example = "(SUM(x*y) - SUM(x) * SUM(y) / COUNT(*)) / (COUNT(*) - 1)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct FAvgFun {
static constexpr const char *Name = "favg";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Calculates the average using a more accurate floating point summation (Kahan Sum)";
static constexpr const char *Example = "favg(A)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct StandardErrorOfTheMeanFun {
static constexpr const char *Name = "sem";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the standard error of the mean";
static constexpr const char *Example = "";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct StdDevPopFun {
static constexpr const char *Name = "stddev_pop";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the population standard deviation.";
static constexpr const char *Example = "sqrt(var_pop(x))";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct StdDevSampFun {
static constexpr const char *Name = "stddev_samp";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the sample standard deviation";
static constexpr const char *Example = "sqrt(var_samp(x))";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct StddevFun {
using ALIAS = StdDevSampFun;
static constexpr const char *Name = "stddev";
};
struct VarPopFun {
static constexpr const char *Name = "var_pop";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the population variance.";
static constexpr const char *Example = "";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct VarSampFun {
static constexpr const char *Name = "var_samp";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the sample variance of all input values.";
static constexpr const char *Example = "(SUM(x^2) - SUM(x)^2 / COUNT(x)) / (COUNT(x) - 1)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct VarianceFun {
using ALIAS = VarSampFun;
static constexpr const char *Name = "variance";
};
} // namespace duckdb

View File

@@ -0,0 +1,302 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/aggregate/distributive_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct ApproxCountDistinctFun {
static constexpr const char *Name = "approx_count_distinct";
static constexpr const char *Parameters = "any";
static constexpr const char *Description = "Computes the approximate count of distinct elements using HyperLogLog.";
static constexpr const char *Example = "approx_count_distinct(A)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct ArgMinFun {
static constexpr const char *Name = "arg_min";
static constexpr const char *Parameters = "arg,val";
static constexpr const char *Description = "Finds the row with the minimum val. Calculates the non-NULL arg expression at that row.";
static constexpr const char *Example = "arg_min(A, B)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct ArgminFun {
using ALIAS = ArgMinFun;
static constexpr const char *Name = "argmin";
};
struct MinByFun {
using ALIAS = ArgMinFun;
static constexpr const char *Name = "min_by";
};
struct ArgMinNullFun {
static constexpr const char *Name = "arg_min_null";
static constexpr const char *Parameters = "arg,val";
static constexpr const char *Description = "Finds the row with the minimum val. Calculates the arg expression at that row.";
static constexpr const char *Example = "arg_min_null(A, B)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct ArgMinNullsLastFun {
static constexpr const char *Name = "arg_min_nulls_last";
static constexpr const char *Parameters = "arg,val,N";
static constexpr const char *Description = "Finds the rows with N minimum vals, including nulls. Calculates the arg expression at that row.";
static constexpr const char *Example = "arg_min_null_val(A, B, N)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct ArgMaxFun {
static constexpr const char *Name = "arg_max";
static constexpr const char *Parameters = "arg,val";
static constexpr const char *Description = "Finds the row with the maximum val. Calculates the non-NULL arg expression at that row.";
static constexpr const char *Example = "arg_max(A, B)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct ArgmaxFun {
using ALIAS = ArgMaxFun;
static constexpr const char *Name = "argmax";
};
struct MaxByFun {
using ALIAS = ArgMaxFun;
static constexpr const char *Name = "max_by";
};
struct ArgMaxNullFun {
static constexpr const char *Name = "arg_max_null";
static constexpr const char *Parameters = "arg,val";
static constexpr const char *Description = "Finds the row with the maximum val. Calculates the arg expression at that row.";
static constexpr const char *Example = "arg_max_null(A, B)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct ArgMaxNullsLastFun {
static constexpr const char *Name = "arg_max_nulls_last";
static constexpr const char *Parameters = "arg,val,N";
static constexpr const char *Description = "Finds the rows with N maximum vals, including nulls. Calculates the arg expression at that row.";
static constexpr const char *Example = "arg_min_null_val(A, B, N)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct BitAndFun {
static constexpr const char *Name = "bit_and";
static constexpr const char *Parameters = "arg";
static constexpr const char *Description = "Returns the bitwise AND of all bits in a given expression.";
static constexpr const char *Example = "bit_and(A)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct BitOrFun {
static constexpr const char *Name = "bit_or";
static constexpr const char *Parameters = "arg";
static constexpr const char *Description = "Returns the bitwise OR of all bits in a given expression.";
static constexpr const char *Example = "bit_or(A)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct BitXorFun {
static constexpr const char *Name = "bit_xor";
static constexpr const char *Parameters = "arg";
static constexpr const char *Description = "Returns the bitwise XOR of all bits in a given expression.";
static constexpr const char *Example = "bit_xor(A)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct BitstringAggFun {
static constexpr const char *Name = "bitstring_agg";
static constexpr const char *Parameters = "arg";
static constexpr const char *Description = "Returns a bitstring with bits set for each distinct value.";
static constexpr const char *Example = "bitstring_agg(A)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct BoolAndFun {
static constexpr const char *Name = "bool_and";
static constexpr const char *Parameters = "arg";
static constexpr const char *Description = "Returns TRUE if every input value is TRUE, otherwise FALSE.";
static constexpr const char *Example = "bool_and(A)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct BoolOrFun {
static constexpr const char *Name = "bool_or";
static constexpr const char *Parameters = "arg";
static constexpr const char *Description = "Returns TRUE if any input value is TRUE, otherwise FALSE.";
static constexpr const char *Example = "bool_or(A)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct CountIfFun {
static constexpr const char *Name = "count_if";
static constexpr const char *Parameters = "arg";
static constexpr const char *Description = "Counts the total number of TRUE values for a boolean column";
static constexpr const char *Example = "count_if(A)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct CountifFun {
using ALIAS = CountIfFun;
static constexpr const char *Name = "countif";
};
struct EntropyFun {
static constexpr const char *Name = "entropy";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the log-2 entropy of count input-values.";
static constexpr const char *Example = "";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct KahanSumFun {
static constexpr const char *Name = "kahan_sum";
static constexpr const char *Parameters = "arg";
static constexpr const char *Description = "Calculates the sum using a more accurate floating point summation (Kahan Sum).";
static constexpr const char *Example = "kahan_sum(A)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct FsumFun {
using ALIAS = KahanSumFun;
static constexpr const char *Name = "fsum";
};
struct SumkahanFun {
using ALIAS = KahanSumFun;
static constexpr const char *Name = "sumkahan";
};
struct KurtosisFun {
static constexpr const char *Name = "kurtosis";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the excess kurtosis (Fishers definition) of all input values, with a bias correction according to the sample size";
static constexpr const char *Example = "";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct KurtosisPopFun {
static constexpr const char *Name = "kurtosis_pop";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the excess kurtosis (Fishers definition) of all input values, without bias correction";
static constexpr const char *Example = "";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct ProductFun {
static constexpr const char *Name = "product";
static constexpr const char *Parameters = "arg";
static constexpr const char *Description = "Calculates the product of all tuples in arg.";
static constexpr const char *Example = "product(A)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct SkewnessFun {
static constexpr const char *Name = "skewness";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the skewness of all input values.";
static constexpr const char *Example = "skewness(A)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct StringAggFun {
static constexpr const char *Name = "string_agg";
static constexpr const char *Parameters = "str,arg";
static constexpr const char *Description = "Concatenates the column string values with an optional separator.";
static constexpr const char *Example = "string_agg(A, '-')";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct GroupConcatFun {
using ALIAS = StringAggFun;
static constexpr const char *Name = "group_concat";
};
struct ListaggFun {
using ALIAS = StringAggFun;
static constexpr const char *Name = "listagg";
};
struct SumFun {
static constexpr const char *Name = "sum";
static constexpr const char *Parameters = "arg";
static constexpr const char *Description = "Calculates the sum value for all tuples in arg.";
static constexpr const char *Example = "sum(A)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct SumNoOverflowFun {
static constexpr const char *Name = "sum_no_overflow";
static constexpr const char *Parameters = "arg";
static constexpr const char *Description = "Internal only. Calculates the sum value for all tuples in arg without overflow checks.";
static constexpr const char *Example = "sum_no_overflow(A)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
} // namespace duckdb
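
Several declarations above (`kahan_sum`, `fsum`, `sumkahan`, `favg`) refer to compensated (Kahan) summation. The generated header only declares them; as background, here is a minimal sketch of the Kahan algorithm they are named after (illustrative only, not the extension's actual implementation):

```cpp
#include <vector>

// Illustrative only: Kahan (compensated) summation. The compensation term
// recovers low-order bits that are lost when adding a small value to a large
// running total.
static double KahanSum(const std::vector<double> &values) {
    double sum = 0.0;
    double compensation = 0.0;
    for (double v : values) {
        const double y = v - compensation;
        const double t = sum + y;
        compensation = (t - sum) - y; // what was lost in this addition
        sum = t;
    }
    return sum;
}
```

Adding many small values to a large running total this way keeps the accumulated rounding error bounded instead of growing with the number of terms.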

View File

@@ -0,0 +1,99 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// duckdb/core_functions/aggregate/histogram_helpers.hpp
//
//
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/common/common.hpp"
#include "duckdb/function/create_sort_key.hpp"
namespace duckdb {
struct HistogramFunctor {
template <class T>
static void HistogramFinalize(T value, Vector &result, idx_t offset) {
FlatVector::GetData<T>(result)[offset] = value;
}
static bool CreateExtraState(idx_t count) {
return false;
}
static void PrepareData(Vector &input, idx_t count, bool &, UnifiedVectorFormat &result) {
input.ToUnifiedFormat(count, result);
}
template <class T>
static T ExtractValue(UnifiedVectorFormat &bin_data, idx_t offset, AggregateInputData &) {
return UnifiedVectorFormat::GetData<T>(bin_data)[bin_data.sel->get_index(offset)];
}
static bool RequiresExtract() {
return false;
}
};
struct HistogramStringFunctorBase {
template <class T>
static T ExtractValue(UnifiedVectorFormat &bin_data, idx_t offset, AggregateInputData &aggr_input) {
auto &input_str = UnifiedVectorFormat::GetData<T>(bin_data)[bin_data.sel->get_index(offset)];
if (input_str.IsInlined()) {
// inlined strings can be inserted directly
return input_str;
}
// if the string is not inlined we need to allocate space for it
auto input_str_size = UnsafeNumericCast<uint32_t>(input_str.GetSize());
auto string_memory = aggr_input.allocator.Allocate(input_str_size);
// copy over the string
memcpy(string_memory, input_str.GetData(), input_str_size);
// now insert it into the histogram
string_t histogram_str(char_ptr_cast(string_memory), input_str_size);
return histogram_str;
}
static bool RequiresExtract() {
return true;
}
};
struct HistogramStringFunctor : HistogramStringFunctorBase {
template <class T>
static void HistogramFinalize(T value, Vector &result, idx_t offset) {
FlatVector::GetData<string_t>(result)[offset] = StringVector::AddStringOrBlob(result, value);
}
static bool CreateExtraState(idx_t count) {
return false;
}
static void PrepareData(Vector &input, idx_t count, bool &, UnifiedVectorFormat &result) {
input.ToUnifiedFormat(count, result);
}
};
struct HistogramGenericFunctor : HistogramStringFunctorBase {
template <class T>
static void HistogramFinalize(T value, Vector &result, idx_t offset) {
CreateSortKeyHelpers::DecodeSortKey(value, result, offset,
OrderModifiers(OrderType::ASCENDING, OrderByNullType::NULLS_LAST));
}
static Vector CreateExtraState(idx_t count) {
return Vector(LogicalType::BLOB, count);
}
static void PrepareData(Vector &input, idx_t count, Vector &extra_state, UnifiedVectorFormat &result) {
OrderModifiers modifiers(OrderType::ASCENDING, OrderByNullType::NULLS_LAST);
CreateSortKeyHelpers::CreateSortKey(input, count, modifiers, extra_state);
input.Flatten(count);
extra_state.Flatten(count);
FlatVector::Validity(extra_state).Initialize(FlatVector::Validity(input));
extra_state.ToUnifiedFormat(count, result);
}
};
} // namespace duckdb

View File

@@ -0,0 +1,104 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/aggregate/holistic_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct ApproxQuantileFun {
static constexpr const char *Name = "approx_quantile";
static constexpr const char *Parameters = "x,pos";
static constexpr const char *Description = "Computes the approximate quantile using T-Digest.";
static constexpr const char *Example = "approx_quantile(x, 0.5)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct MadFun {
static constexpr const char *Name = "mad";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the median absolute deviation for the values within x. NULL values are ignored. Temporal types return a positive INTERVAL. ";
static constexpr const char *Example = "mad(x)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct MedianFun {
static constexpr const char *Name = "median";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the middle value of the set. NULL values are ignored. For even value counts, interpolate-able types (numeric, date/time) return the average of the two middle values. Non-interpolate-able types (everything else) return the lower of the two middle values.";
static constexpr const char *Example = "median(x)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct ModeFun {
static constexpr const char *Name = "mode";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the most frequent value for the values within x. NULL values are ignored.";
static constexpr const char *Example = "";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct QuantileDiscFun {
static constexpr const char *Name = "quantile_disc";
static constexpr const char *Parameters = "x,pos";
static constexpr const char *Description = "Returns the exact quantile number between 0 and 1 . If pos is a LIST of FLOATs, then the result is a LIST of the corresponding exact quantiles.";
static constexpr const char *Example = "quantile_disc(x, 0.5)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct QuantileFun {
using ALIAS = QuantileDiscFun;
static constexpr const char *Name = "quantile";
};
struct QuantileContFun {
static constexpr const char *Name = "quantile_cont";
static constexpr const char *Parameters = "x,pos";
static constexpr const char *Description = "Returns the interpolated quantile number between 0 and 1 . If pos is a LIST of FLOATs, then the result is a LIST of the corresponding interpolated quantiles. ";
static constexpr const char *Example = "quantile_cont(x, 0.5)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct ReservoirQuantileFun {
static constexpr const char *Name = "reservoir_quantile";
static constexpr const char *Parameters = "x,quantile,sample_size";
static constexpr const char *Description = "Gives the approximate quantile using reservoir sampling, the sample size is optional and uses 8192 as a default size.";
static constexpr const char *Example = "reservoir_quantile(A, 0.5, 1024)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
};
struct ApproxTopKFun {
static constexpr const char *Name = "approx_top_k";
static constexpr const char *Parameters = "val,k";
static constexpr const char *Description = "Finds the k approximately most occurring values in the data set";
static constexpr const char *Example = "approx_top_k(x, 5)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
} // namespace duckdb

View File

@@ -0,0 +1,56 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/aggregate/nested_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct HistogramFun {
static constexpr const char *Name = "histogram";
static constexpr const char *Parameters = "arg";
static constexpr const char *Description = "Returns a LIST of STRUCTs with the fields bucket and count.";
static constexpr const char *Example = "histogram(A)";
static constexpr const char *Categories = "";
static AggregateFunctionSet GetFunctions();
static AggregateFunction GetHistogramUnorderedMap(LogicalType &type);
static AggregateFunction BinnedHistogramFunction();
};
struct HistogramExactFun {
static constexpr const char *Name = "histogram_exact";
static constexpr const char *Parameters = "arg,bins";
static constexpr const char *Description = "Returns a LIST of STRUCTs with the fields bucket and count matching the buckets exactly.";
static constexpr const char *Example = "histogram_exact(A, [0, 1, 2])";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct ListFun {
static constexpr const char *Name = "list";
static constexpr const char *Parameters = "arg";
static constexpr const char *Description = "Returns a LIST containing all the values of a column.";
static constexpr const char *Example = "list(A)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct ArrayAggFun {
using ALIAS = ListFun;
static constexpr const char *Name = "array_agg";
};
} // namespace duckdb

View File

@@ -0,0 +1,65 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// duckdb/core_functions/aggregate/quantile_helpers.hpp
//
//
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/common/common.hpp"
#include "duckdb/common/enums/quantile_enum.hpp"
#include "core_functions/aggregate/holistic_functions.hpp"
namespace duckdb {
// Avoid using naked Values in inner loops...
struct QuantileValue {
explicit QuantileValue(const Value &v) : val(v), dbl(v.GetValue<double>()) {
const auto &type = val.type();
switch (type.id()) {
case LogicalTypeId::DECIMAL: {
integral = IntegralValue::Get(v);
scaling = Hugeint::POWERS_OF_TEN[DecimalType::GetScale(type)];
break;
}
default:
break;
}
}
Value val;
// DOUBLE
double dbl;
// DECIMAL
hugeint_t integral;
hugeint_t scaling;
inline bool operator==(const QuantileValue &other) const {
return val == other.val;
}
};
struct QuantileBindData : public FunctionData {
QuantileBindData();
explicit QuantileBindData(const Value &quantile_p);
explicit QuantileBindData(const vector<Value> &quantiles_p);
QuantileBindData(const QuantileBindData &other);
unique_ptr<FunctionData> Copy() const override;
bool Equals(const FunctionData &other_p) const override;
static void Serialize(Serializer &serializer, const optional_ptr<FunctionData> bind_data_p,
const AggregateFunction &function);
static unique_ptr<FunctionData> Deserialize(Deserializer &deserializer, AggregateFunction &function);
vector<QuantileValue> quantiles;
vector<idx_t> order;
bool desc;
};
} // namespace duckdb

View File

@@ -0,0 +1,414 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// duckdb/core_functions/aggregate/quantile_sort_tree.hpp
//
//
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/common/types/column/column_data_collection.hpp"
#include "core_functions/aggregate/quantile_helpers.hpp"
#include "duckdb/common/operator/cast_operators.hpp"
#include "duckdb/common/operator/interpolate.hpp"
#include "duckdb/common/operator/multiply.hpp"
#include "duckdb/planner/expression/bound_constant_expression.hpp"
#include "duckdb/function/window/window_index_tree.hpp"
#include <algorithm>
#include <stdlib.h>
#include <utility>
namespace duckdb {
// Paged access
template <typename INPUT_TYPE>
struct QuantileCursor {
explicit QuantileCursor(const WindowPartitionInput &partition) : inputs(*partition.inputs) {
D_ASSERT(partition.column_ids.size() == 1);
inputs.InitializeScan(scan, partition.column_ids);
inputs.InitializeScanChunk(scan, page);
D_ASSERT(partition.all_valid.size() == 1);
all_valid = partition.all_valid[0];
}
inline sel_t RowOffset(idx_t row_idx) const {
D_ASSERT(RowIsVisible(row_idx));
return UnsafeNumericCast<sel_t>(row_idx - scan.current_row_index);
}
inline bool RowIsVisible(idx_t row_idx) const {
return (row_idx < scan.next_row_index && scan.current_row_index <= row_idx);
}
inline idx_t Seek(idx_t row_idx) {
if (!RowIsVisible(row_idx)) {
inputs.Seek(row_idx, scan, page);
data = FlatVector::GetData<INPUT_TYPE>(page.data[0]);
validity = &FlatVector::Validity(page.data[0]);
}
return RowOffset(row_idx);
}
inline const INPUT_TYPE &operator[](idx_t row_idx) {
const auto offset = Seek(row_idx);
return data[offset];
}
inline bool RowIsValid(idx_t row_idx) {
const auto offset = Seek(row_idx);
return validity->RowIsValid(offset);
}
inline bool AllValid() {
return all_valid;
}
//! Windowed paging
const ColumnDataCollection &inputs;
//! The state used for reading the collection on this thread
ColumnDataScanState scan;
//! The data chunk paged into
DataChunk page;
//! The data pointer
const INPUT_TYPE *data = nullptr;
//! The validity mask
const ValidityMask *validity = nullptr;
//! Paged chunks do not track this but it is really necessary for performance
bool all_valid;
};
// Direct access
template <typename T>
struct QuantileDirect {
using INPUT_TYPE = T;
using RESULT_TYPE = T;
inline const INPUT_TYPE &operator()(const INPUT_TYPE &x) const {
return x; // NOLINT
}
};
// Indirect access
template <typename T>
struct QuantileIndirect {
using INPUT_TYPE = idx_t;
using RESULT_TYPE = T;
using CURSOR = QuantileCursor<RESULT_TYPE>;
CURSOR &data;
explicit QuantileIndirect(CURSOR &data_p) : data(data_p) {
}
inline RESULT_TYPE operator()(const INPUT_TYPE &input) const {
return data[input];
}
};
// Composed access
template <typename OUTER, typename INNER>
struct QuantileComposed {
using INPUT_TYPE = typename INNER::INPUT_TYPE;
using RESULT_TYPE = typename OUTER::RESULT_TYPE;
const OUTER &outer;
const INNER &inner;
explicit QuantileComposed(const OUTER &outer_p, const INNER &inner_p) : outer(outer_p), inner(inner_p) {
}
inline RESULT_TYPE operator()(const idx_t &input) const {
return outer(inner(input));
}
};
// Accessed comparison
template <typename ACCESSOR>
struct QuantileCompare {
using INPUT_TYPE = typename ACCESSOR::INPUT_TYPE;
const ACCESSOR &accessor_l;
const ACCESSOR &accessor_r;
const bool desc;
// Single cursor for linear operations
explicit QuantileCompare(const ACCESSOR &accessor, bool desc_p)
: accessor_l(accessor), accessor_r(accessor), desc(desc_p) {
}
// Independent cursors for sorting
explicit QuantileCompare(const ACCESSOR &accessor_l, const ACCESSOR &accessor_r, bool desc_p)
: accessor_l(accessor_l), accessor_r(accessor_r), desc(desc_p) {
}
inline bool operator()(const INPUT_TYPE &lhs, const INPUT_TYPE &rhs) const {
const auto lval = accessor_l(lhs);
const auto rval = accessor_r(rhs);
return desc ? LessThan::Operation(rval, lval) : LessThan::Operation(lval, rval);
}
};
struct QuantileCast {
template <class INPUT_TYPE, class TARGET_TYPE>
static inline TARGET_TYPE Operation(const INPUT_TYPE &src, Vector &result) {
return Cast::Operation<INPUT_TYPE, TARGET_TYPE>(src);
}
};
template <>
interval_t QuantileCast::Operation(const dtime_t &src, Vector &result);
template <>
string_t QuantileCast::Operation(const string_t &src, Vector &result);
// Continuous interpolation
template <bool DISCRETE>
struct QuantileInterpolator {
QuantileInterpolator(const QuantileValue &q, const idx_t n_p, const bool desc_p)
: desc(desc_p), RN((double)(n_p - 1) * q.dbl), FRN(ExactNumericCast<idx_t>(floor(RN))),
CRN(ExactNumericCast<idx_t>(ceil(RN))), begin(0), end(n_p) {
}
template <class INPUT_TYPE, class TARGET_TYPE, typename ACCESSOR = QuantileDirect<INPUT_TYPE>>
TARGET_TYPE Interpolate(INPUT_TYPE lidx, INPUT_TYPE hidx, Vector &result, const ACCESSOR &accessor) const {
using ACCESS_TYPE = typename ACCESSOR::RESULT_TYPE;
if (lidx == hidx) {
return QuantileCast::Operation<ACCESS_TYPE, TARGET_TYPE>(accessor(lidx), result);
} else {
auto lo = QuantileCast::Operation<ACCESS_TYPE, TARGET_TYPE>(accessor(lidx), result);
auto hi = QuantileCast::Operation<ACCESS_TYPE, TARGET_TYPE>(accessor(hidx), result);
return InterpolateOperator::Operation<TARGET_TYPE>(lo, RN - FRN, hi);
}
}
template <class INPUT_TYPE, class TARGET_TYPE, typename ACCESSOR = QuantileDirect<INPUT_TYPE>>
TARGET_TYPE Operation(INPUT_TYPE *v_t, Vector &result, const ACCESSOR &accessor = ACCESSOR()) const {
using ACCESS_TYPE = typename ACCESSOR::RESULT_TYPE;
QuantileCompare<ACCESSOR> comp(accessor, desc);
if (CRN == FRN) {
std::nth_element(v_t + begin, v_t + FRN, v_t + end, comp);
return QuantileCast::Operation<ACCESS_TYPE, TARGET_TYPE>(accessor(v_t[FRN]), result);
} else {
std::nth_element(v_t + begin, v_t + FRN, v_t + end, comp);
std::nth_element(v_t + FRN, v_t + CRN, v_t + end, comp);
auto lo = QuantileCast::Operation<ACCESS_TYPE, TARGET_TYPE>(accessor(v_t[FRN]), result);
auto hi = QuantileCast::Operation<ACCESS_TYPE, TARGET_TYPE>(accessor(v_t[CRN]), result);
return InterpolateOperator::Operation<TARGET_TYPE>(lo, RN - FRN, hi);
}
}
template <class INPUT_TYPE, class TARGET_TYPE>
inline TARGET_TYPE Extract(const INPUT_TYPE *dest, Vector &result) const {
if (CRN == FRN) {
return QuantileCast::Operation<INPUT_TYPE, TARGET_TYPE>(dest[0], result);
} else {
auto lo = QuantileCast::Operation<INPUT_TYPE, TARGET_TYPE>(dest[0], result);
auto hi = QuantileCast::Operation<INPUT_TYPE, TARGET_TYPE>(dest[1], result);
return InterpolateOperator::Operation<TARGET_TYPE>(lo, RN - FRN, hi);
}
}
const bool desc;
const double RN;
const idx_t FRN;
const idx_t CRN;
idx_t begin;
idx_t end;
};
// Discrete "interpolation"
template <>
struct QuantileInterpolator<true> {
static inline idx_t Index(const QuantileValue &q, const idx_t n) {
idx_t floored;
switch (q.val.type().id()) {
case LogicalTypeId::DECIMAL: {
// Integer arithmetic for accuracy
const auto integral = q.integral;
const auto scaling = q.scaling;
const auto scaled_q =
DecimalMultiplyOverflowCheck::Operation<hugeint_t, hugeint_t, hugeint_t>(Hugeint::Convert(n), integral);
const auto scaled_n =
DecimalMultiplyOverflowCheck::Operation<hugeint_t, hugeint_t, hugeint_t>(Hugeint::Convert(n), scaling);
floored = Cast::Operation<hugeint_t, idx_t>((scaled_n - scaled_q) / scaling);
break;
}
default:
const auto scaled_q = double(n) * q.dbl;
floored = LossyNumericCast<idx_t>(floor(double(n) - scaled_q));
break;
}
return MaxValue<idx_t>(1, n - floored) - 1;
}
QuantileInterpolator(const QuantileValue &q, const idx_t n_p, bool desc_p)
: desc(desc_p), FRN(Index(q, n_p)), CRN(FRN), begin(0), end(n_p) {
}
template <class INPUT_TYPE, class TARGET_TYPE, typename ACCESSOR = QuantileDirect<INPUT_TYPE>>
TARGET_TYPE Interpolate(INPUT_TYPE lidx, INPUT_TYPE hidx, Vector &result, const ACCESSOR &accessor) const {
using ACCESS_TYPE = typename ACCESSOR::RESULT_TYPE;
return QuantileCast::Operation<ACCESS_TYPE, TARGET_TYPE>(accessor(lidx), result);
}
template <class INPUT_TYPE, typename ACCESSOR = QuantileDirect<INPUT_TYPE>>
typename ACCESSOR::RESULT_TYPE InterpolateInternal(INPUT_TYPE *v_t, const ACCESSOR &accessor = ACCESSOR()) const {
QuantileCompare<ACCESSOR> comp(accessor, desc);
std::nth_element(v_t + begin, v_t + FRN, v_t + end, comp);
return accessor(v_t[FRN]);
}
template <class INPUT_TYPE, class TARGET_TYPE, typename ACCESSOR = QuantileDirect<INPUT_TYPE>>
TARGET_TYPE Operation(INPUT_TYPE *v_t, Vector &result, const ACCESSOR &accessor = ACCESSOR()) const {
using ACCESS_TYPE = typename ACCESSOR::RESULT_TYPE;
return QuantileCast::Operation<ACCESS_TYPE, TARGET_TYPE>(InterpolateInternal(v_t, accessor), result);
}
template <class INPUT_TYPE, class TARGET_TYPE>
TARGET_TYPE Extract(const INPUT_TYPE *dest, Vector &result) const {
return QuantileCast::Operation<INPUT_TYPE, TARGET_TYPE>(dest[0], result);
}
const bool desc;
const idx_t FRN;
const idx_t CRN;
idx_t begin;
idx_t end;
};
template <typename INPUT_TYPE>
struct QuantileIncluded {
using CURSOR_TYPE = QuantileCursor<INPUT_TYPE>;
inline explicit QuantileIncluded(const ValidityMask &fmask_p, CURSOR_TYPE &dmask_p)
: fmask(fmask_p), dmask(dmask_p) {
}
inline bool operator()(const idx_t &idx) {
return fmask.RowIsValid(idx) && dmask.RowIsValid(idx);
}
inline bool AllValid() {
return fmask.AllValid() && dmask.AllValid();
}
const ValidityMask &fmask;
CURSOR_TYPE &dmask;
};
struct QuantileSortTree {
unique_ptr<WindowIndexTree> index_tree;
QuantileSortTree(AggregateInputData &aggr_input_data, const WindowPartitionInput &partition) {
// TODO: Two pass parallel sorting using Build
auto &inputs = *partition.inputs;
auto &interrupt = partition.interrupt_state;
ColumnDataScanState scan;
DataChunk sort;
inputs.InitializeScan(scan, partition.column_ids);
inputs.InitializeScanChunk(scan, sort);
// Sort on the single argument
auto &bind_data = aggr_input_data.bind_data->Cast<QuantileBindData>();
auto order_expr = make_uniq<BoundConstantExpression>(Value(sort.GetTypes()[0]));
auto order_type = bind_data.desc ? OrderType::DESCENDING : OrderType::ASCENDING;
BoundOrderModifier order_bys;
order_bys.orders.emplace_back(BoundOrderByNode(order_type, OrderByNullType::NULLS_LAST, std::move(order_expr)));
vector<column_t> sort_idx(1, 0);
const auto count = partition.count;
index_tree = make_uniq<WindowIndexTree>(partition.context.client, order_bys, sort_idx, count);
auto index_state = index_tree->GetLocalState(partition.context);
auto &local_state = index_state->Cast<WindowIndexTreeLocalState>();
// Build the indirection array by scanning the valid indices
const auto &filter_mask = partition.filter_mask;
SelectionVector filter_sel(STANDARD_VECTOR_SIZE);
while (inputs.Scan(scan, sort)) {
const auto row_idx = scan.current_row_index;
if (!filter_mask.AllValid() || !partition.all_valid[0]) {
auto &key = sort.data[0];
auto &validity = FlatVector::Validity(key);
idx_t filtered = 0;
for (sel_t i = 0; i < sort.size(); ++i) {
if (filter_mask.RowIsValid(i + row_idx) && validity.RowIsValid(i)) {
filter_sel[filtered++] = i;
}
}
local_state.Sink(partition.context, sort, row_idx, filter_sel, filtered, interrupt);
} else {
local_state.Sink(partition.context, sort, row_idx, nullptr, 0, interrupt);
}
}
local_state.Finalize(partition.context, interrupt);
}
inline idx_t SelectNth(const SubFrames &frames, size_t n) const {
return index_tree->SelectNth(frames, n).first;
}
template <typename INPUT_TYPE, typename RESULT_TYPE, bool DISCRETE>
RESULT_TYPE WindowScalar(QuantileCursor<INPUT_TYPE> &data, const SubFrames &frames, const idx_t n, Vector &result,
const QuantileValue &q) {
D_ASSERT(n > 0);
// Thread safe and idempotent.
index_tree->Build();
// Find the interpolated indices within the frame
QuantileInterpolator<DISCRETE> interp(q, n, false);
const auto lo_data = SelectNth(frames, interp.FRN);
auto hi_data = lo_data;
if (interp.CRN != interp.FRN) {
hi_data = SelectNth(frames, interp.CRN);
}
// Interpolate indirectly
using ID = QuantileIndirect<INPUT_TYPE>;
ID indirect(data);
return interp.template Interpolate<idx_t, RESULT_TYPE, ID>(lo_data, hi_data, result, indirect);
}
template <typename INPUT_TYPE, typename CHILD_TYPE, bool DISCRETE>
void WindowList(QuantileCursor<INPUT_TYPE> &data, const SubFrames &frames, const idx_t n, Vector &list,
const idx_t lidx, const QuantileBindData &bind_data) {
D_ASSERT(n > 0);
// Thread safe and idempotent.
index_tree->Build();
// Result is a constant LIST<CHILD_TYPE> with a fixed length
auto ldata = FlatVector::GetData<list_entry_t>(list);
auto &lentry = ldata[lidx];
lentry.offset = ListVector::GetListSize(list);
lentry.length = bind_data.quantiles.size();
ListVector::Reserve(list, lentry.offset + lentry.length);
ListVector::SetListSize(list, lentry.offset + lentry.length);
auto &result = ListVector::GetEntry(list);
auto rdata = FlatVector::GetData<CHILD_TYPE>(result);
using ID = QuantileIndirect<INPUT_TYPE>;
ID indirect(data);
for (const auto &q : bind_data.order) {
const auto &quantile = bind_data.quantiles[q];
QuantileInterpolator<DISCRETE> interp(quantile, n, false);
const auto lo_data = SelectNth(frames, interp.FRN);
auto hi_data = lo_data;
if (interp.CRN != interp.FRN) {
hi_data = SelectNth(frames, interp.CRN);
}
// Interpolate indirectly
rdata[lentry.offset + q] =
interp.template Interpolate<idx_t, CHILD_TYPE, ID>(lo_data, hi_data, result, indirect);
}
}
};
} // namespace duckdb
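
Both interpolators reduce a quantile to at most two order statistics: the continuous one computes RN = (n - 1) * q, partially sorts with `std::nth_element` so that positions FRN = floor(RN) and CRN = ceil(RN) hold their order statistics, and interpolates between them. A rough standalone sketch of the continuous case over a plain vector (illustrative only; the header above works through paged cursors and sort trees instead):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative only: quantile_cont over a plain vector, mirroring the FRN/CRN
// selection and interpolation in QuantileInterpolator<false> above.
static double NaiveQuantileCont(std::vector<double> v, double q) {
    if (v.empty()) {
        return NAN;
    }
    const double rn = static_cast<double>(v.size() - 1) * q;
    const auto frn = static_cast<std::ptrdiff_t>(std::floor(rn));
    const auto crn = static_cast<std::ptrdiff_t>(std::ceil(rn));
    // Partially sort so that position frn holds its order statistic.
    std::nth_element(v.begin(), v.begin() + frn, v.end());
    if (crn == frn) {
        return v[static_cast<size_t>(frn)];
    }
    // Position crn holds the next order statistic; interpolate between the two.
    std::nth_element(v.begin() + frn, v.begin() + crn, v.end());
    const double lo = v[static_cast<size_t>(frn)];
    const double hi = v[static_cast<size_t>(crn)];
    return lo + (rn - std::floor(rn)) * (hi - lo);
}
```

For example, `NaiveQuantileCont({1, 2, 3, 4}, 0.5)` gives 2.5, the average of the two middle values.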

View File

@@ -0,0 +1,310 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// duckdb/core_functions/aggregate/quantile_state.hpp
//
//
//===----------------------------------------------------------------------===//
#pragma once
#include "core_functions/aggregate/quantile_sort_tree.hpp"
#include "SkipList.h"
namespace duckdb {
struct QuantileOperation {
template <class STATE>
static void Initialize(STATE &state) {
new (&state) STATE();
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &unary_input,
idx_t count) {
for (idx_t i = 0; i < count; i++) {
Operation<INPUT_TYPE, STATE, OP>(state, input, unary_input);
}
}
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &aggr_input) {
state.AddElement(input, aggr_input.input);
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
if (source.v.empty()) {
return;
}
target.v.insert(target.v.end(), source.v.begin(), source.v.end());
}
template <class STATE>
static void Destroy(STATE &state, AggregateInputData &) {
state.~STATE();
}
static bool IgnoreNull() {
return true;
}
template <class STATE, class INPUT_TYPE>
static void WindowInit(AggregateInputData &aggr_input_data, const WindowPartitionInput &partition,
data_ptr_t g_state) {
D_ASSERT(partition.inputs);
const auto &stats = partition.stats;
// If frames overlap significantly, then use local skip lists.
if (stats[0].end <= stats[1].begin) {
// Frames can overlap
const auto overlap = double(stats[1].begin - stats[0].end);
const auto cover = double(stats[1].end - stats[0].begin);
const auto ratio = overlap / cover;
if (ratio > .75) {
return;
}
}
// Build the tree
auto &state = *reinterpret_cast<STATE *>(g_state);
auto &window_state = state.GetOrCreateWindowState();
window_state.qst = make_uniq<QuantileSortTree>(aggr_input_data, partition);
}
template <class INPUT_TYPE>
static idx_t FrameSize(QuantileIncluded<INPUT_TYPE> &included, const SubFrames &frames) {
// Count the number of valid values
idx_t n = 0;
if (included.AllValid()) {
for (const auto &frame : frames) {
n += frame.end - frame.start;
}
} else {
// NULLs or FILTERed values,
for (const auto &frame : frames) {
for (auto i = frame.start; i < frame.end; ++i) {
n += included(i);
}
}
}
return n;
}
};
template <class T>
struct SkipLess {
inline bool operator()(const T &lhi, const T &rhi) const {
return lhi.second < rhi.second;
}
};
template <typename INPUT_TYPE>
struct WindowQuantileState {
// Windowed Quantile merge sort trees
unique_ptr<QuantileSortTree> qst;
// Windowed Quantile skip lists
using SkipType = pair<idx_t, INPUT_TYPE>;
using SkipListType = duckdb_skiplistlib::skip_list::HeadNode<SkipType, SkipLess<SkipType>>;
SubFrames prevs;
unique_ptr<SkipListType> s;
mutable vector<SkipType> skips;
// Windowed MAD indirection
idx_t count;
vector<idx_t> m;
using IncludedType = QuantileIncluded<INPUT_TYPE>;
using CursorType = QuantileCursor<INPUT_TYPE>;
WindowQuantileState() : count(0) {
}
inline void SetCount(size_t count_p) {
count = count_p;
if (count >= m.size()) {
m.resize(count);
}
}
inline SkipListType &GetSkipList(bool reset = false) {
if (reset || !s) {
s.reset();
s = make_uniq<SkipListType>();
}
return *s;
}
struct SkipListUpdater {
SkipListType &skip;
CursorType &data;
IncludedType &included;
inline SkipListUpdater(SkipListType &skip, CursorType &data, IncludedType &included)
: skip(skip), data(data), included(included) {
}
inline void Neither(idx_t begin, idx_t end) {
}
inline void Left(idx_t begin, idx_t end) {
for (; begin < end; ++begin) {
if (included(begin)) {
skip.remove(SkipType(begin, data[begin]));
}
}
}
inline void Right(idx_t begin, idx_t end) {
for (; begin < end; ++begin) {
if (included(begin)) {
skip.insert(SkipType(begin, data[begin]));
}
}
}
inline void Both(idx_t begin, idx_t end) {
}
};
void UpdateSkip(CursorType &data, const SubFrames &frames, IncludedType &included) {
// No overlap, or no data
if (!s || prevs.back().end <= frames.front().start || frames.back().end <= prevs.front().start) {
auto &skip = GetSkipList(true);
for (const auto &frame : frames) {
for (auto i = frame.start; i < frame.end; ++i) {
if (included(i)) {
skip.insert(SkipType(i, data[i]));
}
}
}
} else {
auto &skip = GetSkipList();
SkipListUpdater updater(skip, data, included);
AggregateExecutor::IntersectFrames(prevs, frames, updater);
}
}
bool HasTree() const {
return qst.get();
}
template <typename RESULT_TYPE, bool DISCRETE>
RESULT_TYPE WindowScalar(CursorType &data, const SubFrames &frames, const idx_t n, Vector &result,
const QuantileValue &q) const {
D_ASSERT(n > 0);
if (qst) {
return qst->WindowScalar<INPUT_TYPE, RESULT_TYPE, DISCRETE>(data, frames, n, result, q);
} else if (s) {
// Find the position(s) needed
try {
QuantileInterpolator<DISCRETE> interp(q, s->size(), false);
s->at(interp.FRN, interp.CRN - interp.FRN + 1, skips);
array<INPUT_TYPE, 2> dest;
dest[0] = skips[0].second;
if (skips.size() > 1) {
dest[1] = skips[1].second;
} else {
// Avoid UMA (reading uninitialized memory) by duplicating the single value
dest[1] = skips[0].second;
}
return interp.template Extract<INPUT_TYPE, RESULT_TYPE>(dest.data(), result);
} catch (const duckdb_skiplistlib::skip_list::IndexError &idx_err) {
throw InternalException(idx_err.message());
}
} else {
throw InternalException("No accelerator for scalar QUANTILE");
}
}
template <typename CHILD_TYPE, bool DISCRETE>
void WindowList(CursorType &data, const SubFrames &frames, const idx_t n, Vector &list, const idx_t lidx,
const QuantileBindData &bind_data) const {
D_ASSERT(n > 0);
// Result is a constant LIST<CHILD_TYPE> with a fixed length
auto ldata = FlatVector::GetData<list_entry_t>(list);
auto &lentry = ldata[lidx];
lentry.offset = ListVector::GetListSize(list);
lentry.length = bind_data.quantiles.size();
ListVector::Reserve(list, lentry.offset + lentry.length);
ListVector::SetListSize(list, lentry.offset + lentry.length);
auto &result = ListVector::GetEntry(list);
auto rdata = FlatVector::GetData<CHILD_TYPE>(result);
for (const auto &q : bind_data.order) {
const auto &quantile = bind_data.quantiles[q];
rdata[lentry.offset + q] = WindowScalar<CHILD_TYPE, DISCRETE>(data, frames, n, result, quantile);
}
}
};
struct QuantileStandardType {
template <class T>
static T Operation(T input, AggregateInputData &) {
return input;
}
};
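// For strings that are not inlined, the payload is copied into the aggregate
// allocator so the stored string_t stays valid after the input vector is gone.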
struct QuantileStringType {
template <class T>
static T Operation(T input, AggregateInputData &input_data) {
if (input.IsInlined()) {
return input;
}
auto string_data = input_data.allocator.Allocate(input.GetSize());
memcpy(string_data, input.GetData(), input.GetSize());
return string_t(char_ptr_cast(string_data), UnsafeNumericCast<uint32_t>(input.GetSize()));
}
};
template <typename INPUT_TYPE, class TYPE_OP>
struct QuantileState {
using InputType = INPUT_TYPE;
using CursorType = QuantileCursor<INPUT_TYPE>;
// Regular aggregation
vector<INPUT_TYPE> v;
// Window Quantile State
unique_ptr<WindowQuantileState<INPUT_TYPE>> window_state;
unique_ptr<CursorType> window_cursor;
void AddElement(INPUT_TYPE element, AggregateInputData &aggr_input) {
v.emplace_back(TYPE_OP::Operation(element, aggr_input));
}
bool HasTree() const {
return window_state && window_state->HasTree();
}
WindowQuantileState<INPUT_TYPE> &GetOrCreateWindowState() {
if (!window_state) {
window_state = make_uniq<WindowQuantileState<INPUT_TYPE>>();
}
return *window_state;
}
WindowQuantileState<INPUT_TYPE> &GetWindowState() {
return *window_state;
}
const WindowQuantileState<INPUT_TYPE> &GetWindowState() const {
return *window_state;
}
CursorType &GetOrCreateWindowCursor(const WindowPartitionInput &partition) {
if (!window_cursor) {
window_cursor = make_uniq<CursorType>(partition);
}
return *window_cursor;
}
CursorType &GetWindowCursor() {
return *window_cursor;
}
const CursorType &GetWindowCursor() const {
return *window_cursor;
}
};
} // namespace duckdb

View File

@@ -0,0 +1,42 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// duckdb/core_functions/aggregate/regression/regr_count.hpp
//
//
//===----------------------------------------------------------------------===//
// REGR_COUNT(y, x)
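// Counts the non-NULL (y, x) pairs in the group: the state is a plain integer,
// incremented once per pair and summed when partial states are combined.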
#pragma once
#include "duckdb/function/aggregate_function.hpp"
#include "core_functions/aggregate/algebraic/covar.hpp"
#include "core_functions/aggregate/algebraic/stddev.hpp"
namespace duckdb {
struct RegrCountFunction {
template <class STATE>
static void Initialize(STATE &state) {
state = 0;
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &) {
target += source;
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
target = static_cast<T>(state);
}
static bool IgnoreNull() {
return true;
}
template <class A_TYPE, class B_TYPE, class STATE, class OP>
static void Operation(STATE &state, const A_TYPE &, const B_TYPE &, AggregateBinaryInput &) {
state += 1;
}
};
} // namespace duckdb

View File

@@ -0,0 +1,57 @@
// REGR_SLOPE(y, x)
// Returns the slope of the linear regression line for non-null pairs in a group.
// It is computed for non-null pairs using the following formula:
// COVAR_POP(x,y) / VAR_POP(x)
//! Input : Any numeric type
//! Output : Double
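// Illustrative check: for the pairs (1,1), (2,2), (3,3), COVAR_POP(x,y) = 2/3 and
// VAR_POP(x) = 2/3, so the slope is 1, matching the line y = x.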
#pragma once
#include "core_functions/aggregate/algebraic/stddev.hpp"
#include "core_functions/aggregate/algebraic/covar.hpp"
namespace duckdb {
struct RegrSlopeState {
CovarState cov_pop;
StddevState var_pop;
};
struct RegrSlopeOperation {
template <class STATE>
static void Initialize(STATE &state) {
CovarOperation::Initialize<CovarState>(state.cov_pop);
STDDevBaseOperation::Initialize<StddevState>(state.var_pop);
}
template <class A_TYPE, class B_TYPE, class STATE, class OP>
static void Operation(STATE &state, const A_TYPE &y, const B_TYPE &x, AggregateBinaryInput &idata) {
CovarOperation::Operation<A_TYPE, B_TYPE, CovarState, OP>(state.cov_pop, y, x, idata);
STDDevBaseOperation::Execute<A_TYPE, StddevState>(state.var_pop, x);
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &aggr_input_data) {
CovarOperation::Combine<CovarState, OP>(source.cov_pop, target.cov_pop, aggr_input_data);
STDDevBaseOperation::Combine<StddevState, OP>(source.var_pop, target.var_pop, aggr_input_data);
}
template <class T, class STATE>
static void Finalize(STATE &state, T &target, AggregateFinalizeData &finalize_data) {
if (state.cov_pop.count == 0 || state.var_pop.count == 0) {
finalize_data.ReturnNull();
} else {
auto cov = state.cov_pop.co_moment / state.cov_pop.count;
auto var_pop = state.var_pop.count > 1 ? (state.var_pop.dsquared / state.var_pop.count) : 0;
if (!Value::DoubleIsFinite(var_pop)) {
throw OutOfRangeException("VARPOP is out of range!");
}
target = var_pop != 0 ? cov / var_pop : NAN;
}
}
static bool IgnoreNull() {
return true;
}
};
} // namespace duckdb

View File

@@ -0,0 +1,108 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/aggregate/regression_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct RegrAvgxFun {
static constexpr const char *Name = "regr_avgx";
static constexpr const char *Parameters = "y,x";
static constexpr const char *Description = "Returns the average of the independent variable for non-NULL pairs in a group, where x is the independent variable and y is the dependent variable.";
static constexpr const char *Example = "";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct RegrAvgyFun {
static constexpr const char *Name = "regr_avgy";
static constexpr const char *Parameters = "y,x";
static constexpr const char *Description = "Returns the average of the dependent variable for non-NULL pairs in a group, where x is the independent variable and y is the dependent variable.";
static constexpr const char *Example = "";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct RegrCountFun {
static constexpr const char *Name = "regr_count";
static constexpr const char *Parameters = "y,x";
static constexpr const char *Description = "Returns the number of non-NULL number pairs in a group.";
static constexpr const char *Example = "(SUM(x*y) - SUM(x) * SUM(y) / COUNT(*)) / COUNT(*)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct RegrInterceptFun {
static constexpr const char *Name = "regr_intercept";
static constexpr const char *Parameters = "y,x";
static constexpr const char *Description = "Returns the intercept of the univariate linear regression line for non-NULL pairs in a group.";
static constexpr const char *Example = "AVG(y)-REGR_SLOPE(y, x)*AVG(x)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct RegrR2Fun {
static constexpr const char *Name = "regr_r2";
static constexpr const char *Parameters = "y,x";
static constexpr const char *Description = "Returns the coefficient of determination for non-NULL pairs in a group.";
static constexpr const char *Example = "";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct RegrSlopeFun {
static constexpr const char *Name = "regr_slope";
static constexpr const char *Parameters = "y,x";
static constexpr const char *Description = "Returns the slope of the linear regression line for non-NULL pairs in a group.";
static constexpr const char *Example = "COVAR_POP(x, y) / VAR_POP(x)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct RegrSXXFun {
static constexpr const char *Name = "regr_sxx";
static constexpr const char *Parameters = "y,x";
static constexpr const char *Description = "";
static constexpr const char *Example = "REGR_COUNT(y, x) * VAR_POP(x)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct RegrSXYFun {
static constexpr const char *Name = "regr_sxy";
static constexpr const char *Parameters = "y,x";
static constexpr const char *Description = "Returns the population covariance of input values";
static constexpr const char *Example = "REGR_COUNT(y, x) * COVAR_POP(y, x)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
struct RegrSYYFun {
static constexpr const char *Name = "regr_syy";
static constexpr const char *Parameters = "y,x";
static constexpr const char *Description = "";
static constexpr const char *Example = "REGR_COUNT(y, x) * VAR_POP(y)";
static constexpr const char *Categories = "";
static AggregateFunction GetFunction();
};
} // namespace duckdb

View File

@@ -0,0 +1,191 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// duckdb/core_functions/aggregate/sum_helpers.hpp
//
//
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/common/types/hugeint.hpp"
#include "duckdb/common/operator/add.hpp"
#include "duckdb/common/operator/multiply.hpp"
#include "duckdb/function/aggregate_state.hpp"
#include "duckdb/common/operator/cast_operators.hpp"
namespace duckdb {
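// Kahan (compensated) summation: `err` carries the low-order bits lost when adding
// `input` to `summed`, so long running sums of doubles lose far less precision than
// a naive accumulation would.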
static inline void KahanAddInternal(double input, double &summed, double &err) {
double diff = input - err;
double newval = summed + diff;
err = (newval - summed) - diff;
summed = newval;
}
template <class T>
struct SumState {
bool isset;
T value;
void Initialize() {
this->isset = false;
this->value = 0;
}
void Combine(const SumState<T> &other) {
this->isset = other.isset || this->isset;
this->value += other.value;
}
};
struct KahanSumState {
bool isset;
double value;
double err;
void Initialize() {
this->isset = false;
this->err = 0.0;
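// Note that `value` is left untouched here; BaseSumOperation::Initialize (further
// down in this file) sets state.value = 0 before delegating to the STATEOP policy.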
}
void Combine(const KahanSumState &other) {
this->isset = other.isset || this->isset;
KahanAddInternal(other.value, this->value, this->err);
KahanAddInternal(other.err, this->value, this->err);
}
};
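// The *Add policy structs below implement AddNumber/AddConstant for a particular
// accumulator type and are plugged into BaseSumOperation as its ADDOP parameter.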
struct RegularAdd {
template <class STATE, class T>
static void AddNumber(STATE &state, T input) {
state.value += input;
}
template <class STATE, class T>
static void AddConstant(STATE &state, T input, idx_t count) {
state.value += input * int64_t(count);
}
};
struct HugeintAdd {
template <class STATE, class T>
static void AddNumber(STATE &state, T input) {
state.value = Hugeint::Add(state.value, input);
}
template <class STATE, class T>
static void AddConstant(STATE &state, T input, idx_t count) {
AddNumber(state, Hugeint::Multiply(input, UnsafeNumericCast<int64_t>(count)));
}
};
struct IntervalAdd {
template <class STATE, class T>
static void AddNumber(STATE &state, T input) {
state.value = AddOperator::Operation<interval_t, interval_t, interval_t>(state.value, input);
}
template <class STATE, class T>
static void AddConstant(STATE &state, T input, idx_t count) {
const auto count64 = Cast::Operation<idx_t, int64_t>(count);
input = MultiplyOperator::Operation<interval_t, int64_t, interval_t>(input, count64);
state.value = AddOperator::Operation<interval_t, interval_t, interval_t>(state.value, input);
}
};
struct KahanAdd {
template <class STATE, class T>
static void AddNumber(STATE &state, T input) {
KahanAddInternal(input, state.value, state.err);
}
template <class STATE, class T>
static void AddConstant(STATE &state, T input, idx_t count) {
KahanAddInternal(input * count, state.value, state.err);
}
};
struct AddToHugeint {
static void AddValue(hugeint_t &result, uint64_t value, int positive) {
// integer summation taken from Tim Gubner et al. - Efficient Query Processing
// with Optimistically Compressed Hash Tables & Strings in the USSR
// add the value to the lower part of the hugeint
result.lower += value;
// now handle overflows
int overflow = result.lower < value;
// we consider two situations:
// (1) input[idx] is positive, and current value is lower than value: overflow
// (2) input[idx] is negative, and current value is higher than value: underflow
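// two's-complement view: a wrapped add of a positive value carries +1 into the
// upper limb, while a non-wrapping add of a negative value borrows -1 from it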
if (!(overflow ^ positive)) {
// in the case of an overflow or underflow we either increment or decrement the upper base
// positive: +1, negative: -1
result.upper += -1 + 2 * positive;
}
}
template <class STATE, class T>
static void AddNumber(STATE &state, T input) {
AddValue(state.value, uint64_t(input), input >= 0);
}
template <class STATE, class T>
static void AddConstant(STATE &state, T input, idx_t count) {
// add a constant X number of times
// fast path: check if value * count fits into a uint64_t
// note that we check if value * VECTOR_SIZE fits in a uint64_t to avoid having to actually do a division
// this is still a pretty high number (18014398509481984) so most positive numbers will fit
if (input >= 0 && uint64_t(input) < (NumericLimits<uint64_t>::Maximum() / STANDARD_VECTOR_SIZE)) {
// if it does just multiply it and add the value
uint64_t value = uint64_t(input) * count;
AddValue(state.value, value, 1);
} else {
// if it doesn't fit we have two choices
// either we loop over count and add the values individually
// or we convert to a hugeint and multiply the hugeint
// the problem is that hugeint multiplication is expensive
// hence we switch here: with a low count we do the loop
// with a high count we do the hugeint multiplication
if (count < 8) {
for (idx_t i = 0; i < count; i++) {
AddValue(state.value, uint64_t(input), input >= 0);
}
} else {
hugeint_t addition = hugeint_t(input) * Hugeint::Convert(count);
state.value += addition;
}
}
}
};
template <class STATEOP, class ADDOP>
struct BaseSumOperation {
template <class STATE>
static void Initialize(STATE &state) {
state.value = 0;
STATEOP::template Initialize<STATE>(state);
}
template <class STATE, class OP>
static void Combine(const STATE &source, STATE &target, AggregateInputData &aggr_input_data) {
STATEOP::template Combine<STATE>(source, target, aggr_input_data);
}
template <class INPUT_TYPE, class STATE, class OP>
static void Operation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &) {
STATEOP::template AddValues<STATE>(state, 1);
ADDOP::template AddNumber<STATE, INPUT_TYPE>(state, input);
}
template <class INPUT_TYPE, class STATE, class OP>
static void ConstantOperation(STATE &state, const INPUT_TYPE &input, AggregateUnaryInput &, idx_t count) {
STATEOP::template AddValues<STATE>(state, count);
ADDOP::template AddConstant<STATE, INPUT_TYPE>(state, input, count);
}
static bool IgnoreNull() {
return true;
}
};
} // namespace duckdb

View File

@@ -0,0 +1,107 @@
#pragma once
#include "duckdb/common/typedefs.hpp"
#include "duckdb/common/algorithm.hpp"
#include <cmath>
namespace duckdb {
//-------------------------------------------------------------------------
// Folding Operations
//-------------------------------------------------------------------------
struct InnerProductOp {
static constexpr bool ALLOW_EMPTY = true;
template <class TYPE>
static TYPE Operation(const TYPE *lhs_data, const TYPE *rhs_data, const idx_t count) {
TYPE result = 0;
auto lhs_ptr = lhs_data;
auto rhs_ptr = rhs_data;
for (idx_t i = 0; i < count; i++) {
const auto x = *lhs_ptr++;
const auto y = *rhs_ptr++;
result += x * y;
}
return result;
}
};
struct NegativeInnerProductOp {
static constexpr bool ALLOW_EMPTY = true;
template <class TYPE>
static TYPE Operation(const TYPE *lhs_data, const TYPE *rhs_data, const idx_t count) {
return -InnerProductOp::Operation(lhs_data, rhs_data, count);
}
};
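// Cosine similarity = dot(l, r) / (|l| * |r|); the result is clamped to [-1, 1]
// to absorb floating-point drift. E.g. for (1, 0) and (0, 1) the similarity is 0.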
struct CosineSimilarityOp {
static constexpr bool ALLOW_EMPTY = false;
template <class TYPE>
static TYPE Operation(const TYPE *lhs_data, const TYPE *rhs_data, const idx_t count) {
TYPE distance = 0;
TYPE norm_l = 0;
TYPE norm_r = 0;
auto l_ptr = lhs_data;
auto r_ptr = rhs_data;
for (idx_t i = 0; i < count; i++) {
const auto x = *l_ptr++;
const auto y = *r_ptr++;
distance += x * y;
norm_l += x * x;
norm_r += y * y;
}
auto similarity = distance / std::sqrt(norm_l * norm_r);
return std::max(static_cast<TYPE>(-1.0), std::min(similarity, static_cast<TYPE>(1.0)));
}
};
struct CosineDistanceOp {
static constexpr bool ALLOW_EMPTY = false;
template <class TYPE>
static TYPE Operation(const TYPE *lhs_data, const TYPE *rhs_data, const idx_t count) {
return static_cast<TYPE>(1.0) - CosineSimilarityOp::Operation(lhs_data, rhs_data, count);
}
};
struct DistanceSquaredOp {
static constexpr bool ALLOW_EMPTY = true;
template <class TYPE>
static TYPE Operation(const TYPE *lhs_data, const TYPE *rhs_data, const idx_t count) {
TYPE distance = 0;
auto l_ptr = lhs_data;
auto r_ptr = rhs_data;
for (idx_t i = 0; i < count; i++) {
const auto x = *l_ptr++;
const auto y = *r_ptr++;
const auto diff = x - y;
distance += diff * diff;
}
return distance;
}
};
struct DistanceOp {
static constexpr bool ALLOW_EMPTY = true;
template <class TYPE>
static TYPE Operation(const TYPE *lhs_data, const TYPE *rhs_data, const idx_t count) {
return std::sqrt(DistanceSquaredOp::Operation(lhs_data, rhs_data, count));
}
};
} // namespace duckdb

View File

@@ -0,0 +1,19 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// extension/core_functions/include/core_functions/function_list.hpp
//
//
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_list.hpp"
namespace duckdb {
struct CoreFunctionList {
static const StaticFunctionDefinition *GetFunctionList();
};
} // namespace duckdb

View File

@@ -0,0 +1,100 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/scalar/array_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct ArrayValueFun {
static constexpr const char *Name = "array_value";
static constexpr const char *Parameters = "any,...";
static constexpr const char *Description = "Creates an `ARRAY` containing the argument values.";
static constexpr const char *Example = "array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT)";
static constexpr const char *Categories = "array";
static ScalarFunction GetFunction();
};
struct ArrayCrossProductFun {
static constexpr const char *Name = "array_cross_product";
static constexpr const char *Parameters = "array,array";
static constexpr const char *Description = "Computes the cross product of two arrays of size 3. The array elements can not be `NULL`.";
static constexpr const char *Example = "array_cross_product(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))";
static constexpr const char *Categories = "array";
static ScalarFunctionSet GetFunctions();
};
struct ArrayCosineSimilarityFun {
static constexpr const char *Name = "array_cosine_similarity";
static constexpr const char *Parameters = "array1,array2";
static constexpr const char *Description = "Computes the cosine similarity between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments.";
static constexpr const char *Example = "array_cosine_similarity(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))";
static constexpr const char *Categories = "array";
static ScalarFunctionSet GetFunctions();
};
struct ArrayCosineDistanceFun {
static constexpr const char *Name = "array_cosine_distance";
static constexpr const char *Parameters = "array1,array2";
static constexpr const char *Description = "Computes the cosine distance between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments.";
static constexpr const char *Example = "array_cosine_distance(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))";
static constexpr const char *Categories = "array";
static ScalarFunctionSet GetFunctions();
};
struct ArrayDistanceFun {
static constexpr const char *Name = "array_distance";
static constexpr const char *Parameters = "array1,array2";
static constexpr const char *Description = "Computes the distance between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments.";
static constexpr const char *Example = "array_distance(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))";
static constexpr const char *Categories = "array";
static ScalarFunctionSet GetFunctions();
};
struct ArrayInnerProductFun {
static constexpr const char *Name = "array_inner_product";
static constexpr const char *Parameters = "array1,array2";
static constexpr const char *Description = "Computes the inner product between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments.";
static constexpr const char *Example = "array_inner_product(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))";
static constexpr const char *Categories = "array";
static ScalarFunctionSet GetFunctions();
};
struct ArrayDotProductFun {
using ALIAS = ArrayInnerProductFun;
static constexpr const char *Name = "array_dot_product";
};
struct ArrayNegativeInnerProductFun {
static constexpr const char *Name = "array_negative_inner_product";
static constexpr const char *Parameters = "array1,array2";
static constexpr const char *Description = "Computes the negative inner product between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments.";
static constexpr const char *Example = "array_negative_inner_product(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))";
static constexpr const char *Categories = "array";
static ScalarFunctionSet GetFunctions();
};
struct ArrayNegativeDotProductFun {
using ALIAS = ArrayNegativeInnerProductFun;
static constexpr const char *Name = "array_negative_dot_product";
};
} // namespace duckdb

View File

@@ -0,0 +1,58 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/scalar/bit_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct GetBitFun {
static constexpr const char *Name = "get_bit";
static constexpr const char *Parameters = "bitstring,index";
static constexpr const char *Description = "Extracts the nth bit from bitstring; the first (leftmost) bit is indexed 0";
static constexpr const char *Example = "get_bit('0110010'::BIT, 2)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct SetBitFun {
static constexpr const char *Name = "set_bit";
static constexpr const char *Parameters = "bitstring,index,new_value";
static constexpr const char *Description = "Sets the nth bit in bitstring to newvalue; the first (leftmost) bit is indexed 0. Returns a new bitstring";
static constexpr const char *Example = "set_bit('0110010'::BIT, 2, 0)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct BitPositionFun {
static constexpr const char *Name = "bit_position";
static constexpr const char *Parameters = "substring,bitstring";
static constexpr const char *Description = "Returns first starting index of the specified substring within bits, or zero if it is not present. The first (leftmost) bit is indexed 1";
static constexpr const char *Example = "bit_position('010'::BIT, '1110101'::BIT)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct BitStringFun {
static constexpr const char *Name = "bitstring";
static constexpr const char *Parameters = "bitstring,length";
static constexpr const char *Description = "Pads the bitstring until the specified length";
static constexpr const char *Example = "bitstring('1010'::BIT, 7)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
} // namespace duckdb

View File

@@ -0,0 +1,64 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/scalar/blob_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct DecodeFun {
static constexpr const char *Name = "decode";
static constexpr const char *Parameters = "blob";
static constexpr const char *Description = "Converts `blob` to `VARCHAR`. Fails if `blob` is not valid UTF-8.";
static constexpr const char *Example = "decode('\\xC3\\xBC'::BLOB)";
static constexpr const char *Categories = "blob";
static ScalarFunction GetFunction();
};
struct EncodeFun {
static constexpr const char *Name = "encode";
static constexpr const char *Parameters = "string";
static constexpr const char *Description = "Converts the `string` to `BLOB`. Converts UTF-8 characters into literal encoding.";
static constexpr const char *Example = "encode('my_string_with_ü')";
static constexpr const char *Categories = "blob";
static ScalarFunction GetFunction();
};
struct FromBase64Fun {
static constexpr const char *Name = "from_base64";
static constexpr const char *Parameters = "string";
static constexpr const char *Description = "Converts a base64 encoded `string` to a character string (`BLOB`).";
static constexpr const char *Example = "from_base64('QQ==')";
static constexpr const char *Categories = "string,blob";
static ScalarFunction GetFunction();
};
struct ToBase64Fun {
static constexpr const char *Name = "to_base64";
static constexpr const char *Parameters = "blob";
static constexpr const char *Description = "Converts a `blob` to a base64 encoded string.";
static constexpr const char *Example = "to_base64('A'::BLOB)";
static constexpr const char *Categories = "string,blob";
static ScalarFunction GetFunction();
};
struct Base64Fun {
using ALIAS = ToBase64Fun;
static constexpr const char *Name = "base64";
};
} // namespace duckdb

View File

@@ -0,0 +1,674 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/scalar/date_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct AgeFun {
static constexpr const char *Name = "age";
static constexpr const char *Parameters = "timestamp,timestamp";
static constexpr const char *Description = "Subtract arguments, resulting in the time difference between the two timestamps";
static constexpr const char *Example = "age(TIMESTAMP '2001-04-10', TIMESTAMP '1992-09-20')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct CenturyFun {
static constexpr const char *Name = "century";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the century component from a date or timestamp";
static constexpr const char *Example = "century(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct DateDiffFun {
static constexpr const char *Name = "date_diff";
static constexpr const char *Parameters = "part,startdate,enddate";
static constexpr const char *Description = "The number of partition boundaries between the timestamps";
static constexpr const char *Example = "date_diff('hour', TIMESTAMPTZ '1992-09-30 23:59:59', TIMESTAMPTZ '1992-10-01 01:58:00')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct DatediffFun {
using ALIAS = DateDiffFun;
static constexpr const char *Name = "datediff";
};
struct DatePartFun {
static constexpr const char *Name = "date_part";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Get subfield (equivalent to extract)";
static constexpr const char *Example = "date_part('minute', TIMESTAMP '1992-09-20 20:38:40')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct DatepartFun {
using ALIAS = DatePartFun;
static constexpr const char *Name = "datepart";
};
struct DateSubFun {
static constexpr const char *Name = "date_sub";
static constexpr const char *Parameters = "part,startdate,enddate";
static constexpr const char *Description = "The number of complete partitions between the timestamps";
static constexpr const char *Example = "date_sub('hour', TIMESTAMPTZ '1992-09-30 23:59:59', TIMESTAMPTZ '1992-10-01 01:58:00')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct DatesubFun {
using ALIAS = DateSubFun;
static constexpr const char *Name = "datesub";
};
struct DateTruncFun {
static constexpr const char *Name = "date_trunc";
static constexpr const char *Parameters = "part,timestamp";
static constexpr const char *Description = "Truncate to specified precision";
static constexpr const char *Example = "date_trunc('hour', TIMESTAMPTZ '1992-09-20 20:38:40')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct DatetruncFun {
using ALIAS = DateTruncFun;
static constexpr const char *Name = "datetrunc";
};
struct DayFun {
static constexpr const char *Name = "day";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the day component from a date or timestamp";
static constexpr const char *Example = "day(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct DayNameFun {
static constexpr const char *Name = "dayname";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "The (English) name of the weekday";
static constexpr const char *Example = "dayname(TIMESTAMP '1992-03-22')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct DayOfMonthFun {
static constexpr const char *Name = "dayofmonth";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the dayofmonth component from a date or timestamp";
static constexpr const char *Example = "dayofmonth(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct DayOfWeekFun {
static constexpr const char *Name = "dayofweek";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the dayofweek component from a date or timestamp";
static constexpr const char *Example = "dayofweek(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct DayOfYearFun {
static constexpr const char *Name = "dayofyear";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the dayofyear component from a date or timestamp";
static constexpr const char *Example = "dayofyear(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct DecadeFun {
static constexpr const char *Name = "decade";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the decade component from a date or timestamp";
static constexpr const char *Example = "decade(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct EpochFun {
static constexpr const char *Name = "epoch";
static constexpr const char *Parameters = "temporal";
static constexpr const char *Description = "Extract the epoch component from a temporal type";
static constexpr const char *Example = "epoch(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct EpochMsFun {
static constexpr const char *Name = "epoch_ms";
static constexpr const char *Parameters = "temporal";
static constexpr const char *Description = "Extract the epoch component in milliseconds from a temporal type";
static constexpr const char *Example = "epoch_ms(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct EpochUsFun {
static constexpr const char *Name = "epoch_us";
static constexpr const char *Parameters = "temporal";
static constexpr const char *Description = "Extract the epoch component in microseconds from a temporal type";
static constexpr const char *Example = "epoch_us(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct EpochNsFun {
static constexpr const char *Name = "epoch_ns";
static constexpr const char *Parameters = "temporal";
static constexpr const char *Description = "Extract the epoch component in nanoseconds from a temporal type";
static constexpr const char *Example = "epoch_ns(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct EraFun {
static constexpr const char *Name = "era";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the era component from a date or timestamp";
static constexpr const char *Example = "era(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct GetCurrentTimestampFun {
static constexpr const char *Name = "get_current_timestamp";
static constexpr const char *Parameters = "";
static constexpr const char *Description = "Returns the current timestamp";
static constexpr const char *Example = "get_current_timestamp()";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct NowFun {
using ALIAS = GetCurrentTimestampFun;
static constexpr const char *Name = "now";
};
struct TransactionTimestampFun {
using ALIAS = GetCurrentTimestampFun;
static constexpr const char *Name = "transaction_timestamp";
};
struct HoursFun {
static constexpr const char *Name = "hour";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the hour component from a date or timestamp";
static constexpr const char *Example = "hour(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct ISODayOfWeekFun {
static constexpr const char *Name = "isodow";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the isodow component from a date or timestamp";
static constexpr const char *Example = "isodow(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct ISOYearFun {
static constexpr const char *Name = "isoyear";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the isoyear component from a date or timestamp";
static constexpr const char *Example = "isoyear(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct JulianDayFun {
static constexpr const char *Name = "julian";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the Julian Day number from a date or timestamp";
static constexpr const char *Example = "julian(timestamp '2006-01-01 12:00')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct LastDayFun {
static constexpr const char *Name = "last_day";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Returns the last day of the month";
static constexpr const char *Example = "last_day(TIMESTAMP '1992-03-22 01:02:03.1234')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct MakeDateFun {
static constexpr const char *Name = "make_date";
static constexpr const char *Parameters = "year,month,day\001date-struct::STRUCT(year BIGINT, month BIGINT, day BIGINT)";
static constexpr const char *Description = "The date for the given parts.\001The date for the given struct.";
static constexpr const char *Example = "make_date(1992, 9, 20)\001make_date({'year': 2024, 'month': 11, 'day': 14})";
static constexpr const char *Categories = "\001";
static ScalarFunctionSet GetFunctions();
};
struct MakeTimeFun {
static constexpr const char *Name = "make_time";
static constexpr const char *Parameters = "hour,minute,seconds";
static constexpr const char *Description = "The time for the given parts";
static constexpr const char *Example = "make_time(13, 34, 27.123456)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct MakeTimestampFun {
static constexpr const char *Name = "make_timestamp";
static constexpr const char *Parameters = "year,month,day,hour,minute,seconds";
static constexpr const char *Description = "The timestamp for the given parts";
static constexpr const char *Example = "make_timestamp(1992, 9, 20, 13, 34, 27.123456)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct MakeTimestampMsFun {
static constexpr const char *Name = "make_timestamp_ms";
static constexpr const char *Parameters = "nanos";
static constexpr const char *Description = "The timestamp for the given microseconds since the epoch";
static constexpr const char *Example = "make_timestamp_ms(1732117793000000)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct MakeTimestampNsFun {
static constexpr const char *Name = "make_timestamp_ns";
static constexpr const char *Parameters = "nanos";
static constexpr const char *Description = "The timestamp for the given nanoseconds since epoch";
static constexpr const char *Example = "make_timestamp_ns(1732117793000000000)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct MicrosecondsFun {
static constexpr const char *Name = "microsecond";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the microsecond component from a date or timestamp";
static constexpr const char *Example = "microsecond(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct MillenniumFun {
static constexpr const char *Name = "millennium";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the millennium component from a date or timestamp";
static constexpr const char *Example = "millennium(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct MillisecondsFun {
static constexpr const char *Name = "millisecond";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the millisecond component from a date or timestamp";
static constexpr const char *Example = "millisecond(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct MinutesFun {
static constexpr const char *Name = "minute";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the minute component from a date or timestamp";
static constexpr const char *Example = "minute(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct MonthFun {
static constexpr const char *Name = "month";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the month component from a date or timestamp";
static constexpr const char *Example = "month(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct MonthNameFun {
static constexpr const char *Name = "monthname";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "The (English) name of the month";
static constexpr const char *Example = "monthname(TIMESTAMP '1992-09-20')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct NanosecondsFun {
static constexpr const char *Name = "nanosecond";
static constexpr const char *Parameters = "tsns";
static constexpr const char *Description = "Extract the nanosecond component from a date or timestamp";
static constexpr const char *Example = "nanosecond(timestamp_ns '2021-08-03 11:59:44.123456789')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct NormalizedIntervalFun {
static constexpr const char *Name = "normalized_interval";
static constexpr const char *Parameters = "interval";
static constexpr const char *Description = "Normalizes an INTERVAL to an equivalent interval";
static constexpr const char *Example = "normalized_interval(INTERVAL '30 days')";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct QuarterFun {
static constexpr const char *Name = "quarter";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the quarter component from a date or timestamp";
static constexpr const char *Example = "quarter(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct SecondsFun {
static constexpr const char *Name = "second";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the second component from a date or timestamp";
static constexpr const char *Example = "second(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct TimeBucketFun {
static constexpr const char *Name = "time_bucket";
static constexpr const char *Parameters = "bucket_width,timestamp,origin";
static constexpr const char *Description = "Truncate TIMESTAMPTZ by the specified interval bucket_width. Buckets are aligned relative to origin TIMESTAMPTZ. The origin defaults to 2000-01-03 00:00:00+00 for buckets that do not include a month or year interval, and to 2000-01-01 00:00:00+00 for month and year buckets";
static constexpr const char *Example = "time_bucket(INTERVAL '2 weeks', TIMESTAMP '1992-04-20 15:26:00-07', TIMESTAMP '1992-04-01 00:00:00-07')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct TimezoneFun {
static constexpr const char *Name = "timezone";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the timezone component from a date or timestamp";
static constexpr const char *Example = "timezone(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct TimezoneHourFun {
static constexpr const char *Name = "timezone_hour";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the timezone_hour component from a date or timestamp";
static constexpr const char *Example = "timezone_hour(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct TimezoneMinuteFun {
static constexpr const char *Name = "timezone_minute";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the timezone_minute component from a date or timestamp";
static constexpr const char *Example = "timezone_minute(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct TimeTZSortKeyFun {
static constexpr const char *Name = "timetz_byte_comparable";
static constexpr const char *Parameters = "time_tz";
static constexpr const char *Description = "Converts a TIME WITH TIME ZONE to an integer sort key";
static constexpr const char *Example = "timetz_byte_comparable('18:18:16.21-07:00'::TIMETZ)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct ToCenturiesFun {
static constexpr const char *Name = "to_centuries";
static constexpr const char *Parameters = "integer";
static constexpr const char *Description = "Construct a century interval";
static constexpr const char *Example = "to_centuries(5)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct ToDaysFun {
static constexpr const char *Name = "to_days";
static constexpr const char *Parameters = "integer";
static constexpr const char *Description = "Construct a day interval";
static constexpr const char *Example = "to_days(5)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct ToDecadesFun {
static constexpr const char *Name = "to_decades";
static constexpr const char *Parameters = "integer";
static constexpr const char *Description = "Construct a decade interval";
static constexpr const char *Example = "to_decades(5)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct ToHoursFun {
static constexpr const char *Name = "to_hours";
static constexpr const char *Parameters = "integer";
static constexpr const char *Description = "Construct a hour interval";
static constexpr const char *Example = "to_hours(5)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct ToMicrosecondsFun {
static constexpr const char *Name = "to_microseconds";
static constexpr const char *Parameters = "integer";
static constexpr const char *Description = "Construct a microsecond interval";
static constexpr const char *Example = "to_microseconds(5)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct ToMillenniaFun {
static constexpr const char *Name = "to_millennia";
static constexpr const char *Parameters = "integer";
static constexpr const char *Description = "Construct a millenium interval";
static constexpr const char *Example = "to_millennia(1)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct ToMillisecondsFun {
static constexpr const char *Name = "to_milliseconds";
static constexpr const char *Parameters = "double";
static constexpr const char *Description = "Construct a millisecond interval";
static constexpr const char *Example = "to_milliseconds(5.5)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct ToMinutesFun {
static constexpr const char *Name = "to_minutes";
static constexpr const char *Parameters = "integer";
static constexpr const char *Description = "Construct a minute interval";
static constexpr const char *Example = "to_minutes(5)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct ToMonthsFun {
static constexpr const char *Name = "to_months";
static constexpr const char *Parameters = "integer";
static constexpr const char *Description = "Construct a month interval";
static constexpr const char *Example = "to_months(5)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct ToQuartersFun {
static constexpr const char *Name = "to_quarters";
static constexpr const char *Parameters = "integer";
static constexpr const char *Description = "Construct a quarter interval";
static constexpr const char *Example = "to_quarters(5)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct ToSecondsFun {
static constexpr const char *Name = "to_seconds";
static constexpr const char *Parameters = "double";
static constexpr const char *Description = "Construct a second interval";
static constexpr const char *Example = "to_seconds(5.5)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct ToTimestampFun {
static constexpr const char *Name = "to_timestamp";
static constexpr const char *Parameters = "sec";
static constexpr const char *Description = "Converts secs since epoch to a timestamp with time zone";
static constexpr const char *Example = "to_timestamp(1284352323.5)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct ToWeeksFun {
static constexpr const char *Name = "to_weeks";
static constexpr const char *Parameters = "integer";
static constexpr const char *Description = "Construct a week interval";
static constexpr const char *Example = "to_weeks(5)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct ToYearsFun {
static constexpr const char *Name = "to_years";
static constexpr const char *Parameters = "integer";
static constexpr const char *Description = "Construct a year interval";
static constexpr const char *Example = "to_years(5)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct WeekFun {
static constexpr const char *Name = "week";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the week component from a date or timestamp";
static constexpr const char *Example = "week(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct WeekDayFun {
static constexpr const char *Name = "weekday";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the weekday component from a date or timestamp";
static constexpr const char *Example = "weekday(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct WeekOfYearFun {
static constexpr const char *Name = "weekofyear";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the weekofyear component from a date or timestamp";
static constexpr const char *Example = "weekofyear(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct YearFun {
static constexpr const char *Name = "year";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the year component from a date or timestamp";
static constexpr const char *Example = "year(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct YearWeekFun {
static constexpr const char *Name = "yearweek";
static constexpr const char *Parameters = "ts";
static constexpr const char *Description = "Extract the yearweek component from a date or timestamp";
static constexpr const char *Example = "yearweek(timestamp '2021-08-03 11:59:44.123456')";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
} // namespace duckdb

View File

@@ -0,0 +1,28 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/scalar/debug_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct VectorTypeFun {
static constexpr const char *Name = "vector_type";
static constexpr const char *Parameters = "col";
static constexpr const char *Description = "Returns the VectorType of a given column";
static constexpr const char *Example = "vector_type(col)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
} // namespace duckdb

View File

@@ -0,0 +1,68 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/scalar/enum_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct EnumFirstFun {
static constexpr const char *Name = "enum_first";
static constexpr const char *Parameters = "enum";
static constexpr const char *Description = "Returns the first value of the input enum type";
static constexpr const char *Example = "enum_first(NULL::mood)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct EnumLastFun {
static constexpr const char *Name = "enum_last";
static constexpr const char *Parameters = "enum";
static constexpr const char *Description = "Returns the last value of the input enum type";
static constexpr const char *Example = "enum_last(NULL::mood)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct EnumCodeFun {
static constexpr const char *Name = "enum_code";
static constexpr const char *Parameters = "enum";
static constexpr const char *Description = "Returns the numeric value backing the given enum value";
static constexpr const char *Example = "enum_code('happy'::mood)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct EnumRangeFun {
static constexpr const char *Name = "enum_range";
static constexpr const char *Parameters = "enum";
static constexpr const char *Description = "Returns all values of the input enum type as an array";
static constexpr const char *Example = "enum_range(NULL::mood)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct EnumRangeBoundaryFun {
static constexpr const char *Name = "enum_range_boundary";
static constexpr const char *Parameters = "start,end";
static constexpr const char *Description = "Returns the range between the two given enum values as an array. The values must be of the same enum type. When the first parameter is NULL, the result starts with the first value of the enum type. When the second parameter is NULL, the result ends with the last value of the enum type";
static constexpr const char *Example = "enum_range_boundary(NULL, 'happy'::mood)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
} // namespace duckdb

View File

@@ -0,0 +1,208 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/scalar/generic_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct AliasFun {
static constexpr const char *Name = "alias";
static constexpr const char *Parameters = "expr";
static constexpr const char *Description = "Returns the name of a given expression";
static constexpr const char *Example = "alias(42 + 1)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct CurrentSettingFun {
static constexpr const char *Name = "current_setting";
static constexpr const char *Parameters = "setting_name";
static constexpr const char *Description = "Returns the current value of the configuration setting";
static constexpr const char *Example = "current_setting('access_mode')";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct HashFun {
static constexpr const char *Name = "hash";
static constexpr const char *Parameters = "value";
static constexpr const char *Description = "Returns a `UBIGINT` with the hash of the `value`. Note that this is not a cryptographic hash.";
static constexpr const char *Example = "hash('🦆')";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct LeastFun {
static constexpr const char *Name = "least";
static constexpr const char *Parameters = "arg1,arg2,...";
static constexpr const char *Description = "Returns the smallest value. For strings lexicographical ordering is used. Note that uppercase characters are considered “smaller” than lowercase characters, and collations are not supported.";
static constexpr const char *Example = "least(42, 84)\002least('abc', 'bcd', 'cde', 'EFG')";
static constexpr const char *Categories = "string,numeric,date,timestamp,aggregate";
static ScalarFunctionSet GetFunctions();
};
struct GreatestFun {
static constexpr const char *Name = "greatest";
static constexpr const char *Parameters = "arg1,arg2,...";
static constexpr const char *Description = "Returns the largest value. For strings lexicographical ordering is used. Note that lowercase characters are considered “larger” than uppercase characters and collations are not supported.";
static constexpr const char *Example = "greatest(42, 84)\002greatest('abc', 'bcd', 'cde', 'EFG')";
static constexpr const char *Categories = "string,numeric,date,timestamp,aggregate";
static ScalarFunctionSet GetFunctions();
};
struct StatsFun {
static constexpr const char *Name = "stats";
static constexpr const char *Parameters = "expression";
static constexpr const char *Description = "Returns a string with statistics about the expression. Expression can be a column, constant, or SQL expression";
static constexpr const char *Example = "stats(5)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct TypeOfFun {
static constexpr const char *Name = "typeof";
static constexpr const char *Parameters = "expression";
static constexpr const char *Description = "Returns the name of the data type of the result of the expression";
static constexpr const char *Example = "typeof('abc')";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct CanCastImplicitlyFun {
static constexpr const char *Name = "can_cast_implicitly";
static constexpr const char *Parameters = "source_type,target_type";
static constexpr const char *Description = "Whether or not we can implicitly cast from the source type to the other type";
static constexpr const char *Example = "can_cast_implicitly(NULL::INTEGER, NULL::BIGINT)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct CurrentQueryFun {
static constexpr const char *Name = "current_query";
static constexpr const char *Parameters = "";
static constexpr const char *Description = "Returns the current query as a string";
static constexpr const char *Example = "current_query()";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct CurrentSchemaFun {
static constexpr const char *Name = "current_schema";
static constexpr const char *Parameters = "";
static constexpr const char *Description = "Returns the name of the currently active schema. Default is main";
static constexpr const char *Example = "current_schema()";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct CurrentSchemasFun {
static constexpr const char *Name = "current_schemas";
static constexpr const char *Parameters = "include_implicit";
static constexpr const char *Description = "Returns list of schemas. Pass a parameter of True to include implicit schemas";
static constexpr const char *Example = "current_schemas(true)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct CurrentDatabaseFun {
static constexpr const char *Name = "current_database";
static constexpr const char *Parameters = "";
static constexpr const char *Description = "Returns the name of the currently active database";
static constexpr const char *Example = "current_database()";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct InSearchPathFun {
static constexpr const char *Name = "in_search_path";
static constexpr const char *Parameters = "database_name,schema_name";
static constexpr const char *Description = "Returns whether or not the database/schema are in the search path";
static constexpr const char *Example = "in_search_path('memory', 'main')";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct CurrentTransactionIdFun {
static constexpr const char *Name = "txid_current";
static constexpr const char *Parameters = "";
static constexpr const char *Description = "Returns the current transactions ID (a BIGINT). It will assign a new one if the current transaction does not have one already";
static constexpr const char *Example = "txid_current()";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct VersionFun {
static constexpr const char *Name = "version";
static constexpr const char *Parameters = "";
static constexpr const char *Description = "Returns the currently active version of DuckDB in this format: v0.3.2 ";
static constexpr const char *Example = "version()";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct EquiWidthBinsFun {
static constexpr const char *Name = "equi_width_bins";
static constexpr const char *Parameters = "min,max,bin_count,nice_rounding";
static constexpr const char *Description = "Generates bin_count equi-width bins between the min and max. If enabled nice_rounding makes the numbers more readable/less jagged";
static constexpr const char *Example = "equi_width_bins(0, 10, 2, true)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct IsHistogramOtherBinFun {
static constexpr const char *Name = "is_histogram_other_bin";
static constexpr const char *Parameters = "val";
static constexpr const char *Description = "Whether or not the provided value is the histogram \"other\" bin (used for values not belonging to any provided bin)";
static constexpr const char *Example = "is_histogram_other_bin(v)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct CastToTypeFun {
static constexpr const char *Name = "cast_to_type";
static constexpr const char *Parameters = "param,type";
static constexpr const char *Description = "Casts the first argument to the type of the second argument";
static constexpr const char *Example = "cast_to_type('42', NULL::INTEGER)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct ReplaceTypeFun {
static constexpr const char *Name = "replace_type";
static constexpr const char *Parameters = "param,type1,type2";
static constexpr const char *Description = "Casts all fields of type1 to type2";
static constexpr const char *Example = "replace_type({duck: 3.141592653589793::DOUBLE}, NULL::DOUBLE, NULL::DECIMAL(15,2))";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
} // namespace duckdb
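
Several `Example` constants in this header (`least`, `greatest`) pack more than one example into a single string, separated by the control character `\002`; later headers additionally use `\001` to separate the variants of overloaded signatures (see `list_slice` below). A minimal sketch of splitting such a constant (the `SplitOn` helper is illustrative, not DuckDB code):

```cpp
#include <string>
#include <vector>

// Illustrative helper: splits a generated metadata string on a control character,
// e.g. '\002' between examples or '\001' between overload variants.
static std::vector<std::string> SplitOn(const std::string &input, char separator) {
	std::vector<std::string> parts;
	size_t start = 0;
	for (size_t pos = input.find(separator); pos != std::string::npos; pos = input.find(separator, start)) {
		parts.push_back(input.substr(start, pos - start));
		start = pos + 1;
	}
	parts.push_back(input.substr(start));
	return parts;
}

// SplitOn(duckdb::LeastFun::Example, '\002') yields
// {"least(42, 84)", "least('abc', 'bcd', 'cde', 'EFG')"}.
```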

View File

@@ -0,0 +1,412 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/scalar/list_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct ListFlattenFun {
static constexpr const char *Name = "flatten";
static constexpr const char *Parameters = "nested_list";
static constexpr const char *Description = "Flattens a nested list by one level.";
static constexpr const char *Example = "flatten([[1, 2, 3], [4, 5]])";
static constexpr const char *Categories = "list";
static ScalarFunction GetFunction();
};
struct ListAggregateFun {
static constexpr const char *Name = "list_aggregate";
static constexpr const char *Parameters = "list,function_name";
static constexpr const char *Description = "Executes the aggregate function `function_name` on the elements of `list`.";
static constexpr const char *Example = "list_aggregate([1, 2, NULL], 'min')";
static constexpr const char *Categories = "list";
static ScalarFunction GetFunction();
};
struct ArrayAggregateFun {
using ALIAS = ListAggregateFun;
static constexpr const char *Name = "array_aggregate";
};
struct ListAggrFun {
using ALIAS = ListAggregateFun;
static constexpr const char *Name = "list_aggr";
};
struct ArrayAggrFun {
using ALIAS = ListAggregateFun;
static constexpr const char *Name = "array_aggr";
};
struct AggregateFun {
using ALIAS = ListAggregateFun;
static constexpr const char *Name = "aggregate";
};
struct ListDistinctFun {
static constexpr const char *Name = "list_distinct";
static constexpr const char *Parameters = "list";
static constexpr const char *Description = "Removes all duplicates and `NULL` values from a list. Does not preserve the original order.";
static constexpr const char *Example = "list_distinct([1, 1, NULL, -3, 1, 5])";
static constexpr const char *Categories = "list";
static ScalarFunction GetFunction();
};
struct ArrayDistinctFun {
using ALIAS = ListDistinctFun;
static constexpr const char *Name = "array_distinct";
};
struct ListUniqueFun {
static constexpr const char *Name = "list_unique";
static constexpr const char *Parameters = "list";
static constexpr const char *Description = "Counts the unique elements of a `list`.";
static constexpr const char *Example = "list_unique([1, 1, NULL, -3, 1, 5])";
static constexpr const char *Categories = "list";
static ScalarFunction GetFunction();
};
struct ArrayUniqueFun {
using ALIAS = ListUniqueFun;
static constexpr const char *Name = "array_unique";
};
struct ListValueFun {
static constexpr const char *Name = "list_value";
static constexpr const char *Parameters = "any,...";
static constexpr const char *Description = "Creates a LIST containing the argument values.";
static constexpr const char *Example = "list_value(4, 5, 6)";
static constexpr const char *Categories = "list";
static ScalarFunctionSet GetFunctions();
};
struct ListPackFun {
using ALIAS = ListValueFun;
static constexpr const char *Name = "list_pack";
};
struct ListSliceFun {
static constexpr const char *Name = "list_slice";
static constexpr const char *Parameters = "list,begin,end\001list,begin,end,step";
static constexpr const char *Description = "Extracts a sublist or substring using slice conventions. Negative values are accepted.\001list_slice with added step feature.";
static constexpr const char *Example = "list_slice([4, 5, 6], 2, 3)\002array_slice('DuckDB', 3, 4)\002array_slice('DuckDB', 3, NULL)\002array_slice('DuckDB', 0, -3)\001list_slice([4, 5, 6], 1, 3, 2)";
static constexpr const char *Categories = "list,string\001list";
static ScalarFunctionSet GetFunctions();
};
struct ArraySliceFun {
using ALIAS = ListSliceFun;
static constexpr const char *Name = "array_slice";
};
struct ListSortFun {
static constexpr const char *Name = "list_sort";
static constexpr const char *Parameters = "list";
static constexpr const char *Description = "Sorts the elements of the list.";
static constexpr const char *Example = "list_sort([3, 6, 1, 2])";
static constexpr const char *Categories = "list";
static ScalarFunctionSet GetFunctions();
};
struct ArraySortFun {
using ALIAS = ListSortFun;
static constexpr const char *Name = "array_sort";
};
struct ListGradeUpFun {
static constexpr const char *Name = "list_grade_up";
static constexpr const char *Parameters = "list";
static constexpr const char *Description = "Works like list_sort, but the results are the indexes that correspond to the position in the original list instead of the actual values.";
static constexpr const char *Example = "list_grade_up([3, 6, 1, 2])";
static constexpr const char *Categories = "list";
static ScalarFunctionSet GetFunctions();
};
struct ArrayGradeUpFun {
using ALIAS = ListGradeUpFun;
static constexpr const char *Name = "array_grade_up";
};
struct GradeUpFun {
using ALIAS = ListGradeUpFun;
static constexpr const char *Name = "grade_up";
};
struct ListReverseSortFun {
static constexpr const char *Name = "list_reverse_sort";
static constexpr const char *Parameters = "list";
static constexpr const char *Description = "Sorts the elements of the list in reverse order.";
static constexpr const char *Example = "list_reverse_sort([3, 6, 1, 2])";
static constexpr const char *Categories = "list";
static ScalarFunctionSet GetFunctions();
};
struct ArrayReverseSortFun {
using ALIAS = ListReverseSortFun;
static constexpr const char *Name = "array_reverse_sort";
};
struct ListTransformFun {
static constexpr const char *Name = "list_transform";
static constexpr const char *Parameters = "list,lambda(x)";
static constexpr const char *Description = "Returns a list that is the result of applying the `lambda` function to each element of the input `list`. The return type is defined by the return type of the `lambda` function.";
static constexpr const char *Example = "list_transform([1, 2, 3], lambda x : x + 1)";
static constexpr const char *Categories = "list,lambda";
static ScalarFunction GetFunction();
};
struct ArrayTransformFun {
using ALIAS = ListTransformFun;
static constexpr const char *Name = "array_transform";
};
struct ListApplyFun {
using ALIAS = ListTransformFun;
static constexpr const char *Name = "list_apply";
};
struct ArrayApplyFun {
using ALIAS = ListTransformFun;
static constexpr const char *Name = "array_apply";
};
struct ApplyFun {
using ALIAS = ListTransformFun;
static constexpr const char *Name = "apply";
};
struct ListFilterFun {
static constexpr const char *Name = "list_filter";
static constexpr const char *Parameters = "list,lambda(x)";
static constexpr const char *Description = "Constructs a list from those elements of the input `list` for which the `lambda` function returns `true`. DuckDB must be able to cast the `lambda` function's return type to `BOOL`. The return type of `list_filter` is the same as the input list's.";
static constexpr const char *Example = "list_filter([3, 4, 5], lambda x : x > 4)";
static constexpr const char *Categories = "list,lambda";
static ScalarFunction GetFunction();
};
struct ArrayFilterFun {
using ALIAS = ListFilterFun;
static constexpr const char *Name = "array_filter";
};
struct FilterFun {
using ALIAS = ListFilterFun;
static constexpr const char *Name = "filter";
};
struct ListReduceFun {
static constexpr const char *Name = "list_reduce";
static constexpr const char *Parameters = "list,lambda(x,y),initial_value";
static constexpr const char *Description = "Reduces all elements of the input `list` into a single scalar value by executing the `lambda` function on a running result and the next list element. The `lambda` function has an optional `initial_value` argument.";
static constexpr const char *Example = "list_reduce([1, 2, 3], lambda x, y : x + y)";
static constexpr const char *Categories = "list,lambda";
static ScalarFunctionSet GetFunctions();
};
struct ArrayReduceFun {
using ALIAS = ListReduceFun;
static constexpr const char *Name = "array_reduce";
};
struct ReduceFun {
using ALIAS = ListReduceFun;
static constexpr const char *Name = "reduce";
};
struct GenerateSeriesFun {
static constexpr const char *Name = "generate_series";
static constexpr const char *Parameters = "start,stop,step";
static constexpr const char *Description = "Creates a list of values between `start` and `stop` - the stop parameter is inclusive.";
static constexpr const char *Example = "generate_series(2, 5, 3)";
static constexpr const char *Categories = "list";
static ScalarFunctionSet GetFunctions();
};
struct ListRangeFun {
static constexpr const char *Name = "range";
static constexpr const char *Parameters = "start,stop,step";
static constexpr const char *Description = "Creates a list of values between `start` and `stop` - the stop parameter is exclusive.";
static constexpr const char *Example = "range(2, 5, 3)";
static constexpr const char *Categories = "list";
static ScalarFunctionSet GetFunctions();
};
struct ListCosineDistanceFun {
static constexpr const char *Name = "list_cosine_distance";
static constexpr const char *Parameters = "list1,list2";
static constexpr const char *Description = "Computes the cosine distance between two same-sized lists.";
static constexpr const char *Example = "list_cosine_distance([1, 2, 3], [1, 2, 3])";
static constexpr const char *Categories = "list";
static ScalarFunctionSet GetFunctions();
};
struct ListCosineDistanceFunAlias {
using ALIAS = ListCosineDistanceFun;
static constexpr const char *Name = "<=>";
};
struct ListCosineSimilarityFun {
static constexpr const char *Name = "list_cosine_similarity";
static constexpr const char *Parameters = "list1,list2";
static constexpr const char *Description = "Computes the cosine similarity between two same-sized lists.";
static constexpr const char *Example = "list_cosine_similarity([1, 2, 3], [1, 2, 3])";
static constexpr const char *Categories = "list";
static ScalarFunctionSet GetFunctions();
};
struct ListDistanceFun {
static constexpr const char *Name = "list_distance";
static constexpr const char *Parameters = "list1,list2";
static constexpr const char *Description = "Calculates the Euclidean distance between two points with coordinates given in two inputs lists of equal length.";
static constexpr const char *Example = "list_distance([1, 2, 3], [1, 2, 5])";
static constexpr const char *Categories = "list";
static ScalarFunctionSet GetFunctions();
};
struct ListDistanceFunAlias {
using ALIAS = ListDistanceFun;
static constexpr const char *Name = "<->";
};
struct ListInnerProductFun {
static constexpr const char *Name = "list_inner_product";
static constexpr const char *Parameters = "list1,list2";
static constexpr const char *Description = "Computes the inner product between two same-sized lists.";
static constexpr const char *Example = "list_inner_product([1, 2, 3], [1, 2, 3])";
static constexpr const char *Categories = "list";
static ScalarFunctionSet GetFunctions();
};
struct ListDotProductFun {
using ALIAS = ListInnerProductFun;
static constexpr const char *Name = "list_dot_product";
};
struct ListNegativeInnerProductFun {
static constexpr const char *Name = "list_negative_inner_product";
static constexpr const char *Parameters = "list1,list2";
static constexpr const char *Description = "Computes the negative inner product between two same-sized lists.";
static constexpr const char *Example = "list_negative_inner_product([1, 2, 3], [1, 2, 3])";
static constexpr const char *Categories = "list";
static ScalarFunctionSet GetFunctions();
};
struct ListNegativeDotProductFun {
using ALIAS = ListNegativeInnerProductFun;
static constexpr const char *Name = "list_negative_dot_product";
};
struct UnpivotListFun {
static constexpr const char *Name = "unpivot_list";
static constexpr const char *Parameters = "any,...";
static constexpr const char *Description = "Identical to list_value, but generated as part of unpivot for better error messages.";
static constexpr const char *Example = "unpivot_list(4, 5, 6)";
static constexpr const char *Categories = "list";
static ScalarFunction GetFunction();
};
struct ListHasAnyFun {
static constexpr const char *Name = "list_has_any";
static constexpr const char *Parameters = "list1,list2";
static constexpr const char *Description = "Returns true if the lists have any element in common. NULLs are ignored.";
static constexpr const char *Example = "list_has_any([1, 2, 3], [2, 3, 4])";
static constexpr const char *Categories = "list";
static ScalarFunction GetFunction();
};
struct ArrayHasAnyFun {
using ALIAS = ListHasAnyFun;
static constexpr const char *Name = "array_has_any";
};
struct ListHasAnyFunAlias {
using ALIAS = ListHasAnyFun;
static constexpr const char *Name = "&&";
};
struct ListHasAllFun {
static constexpr const char *Name = "list_has_all";
static constexpr const char *Parameters = "list1,list2";
static constexpr const char *Description = "Returns true if all elements of list2 are in list1. NULLs are ignored.";
static constexpr const char *Example = "list_has_all([1, 2, 3], [2, 3])";
static constexpr const char *Categories = "list";
static ScalarFunction GetFunction();
};
struct ArrayHasAllFun {
using ALIAS = ListHasAllFun;
static constexpr const char *Name = "array_has_all";
};
struct ListHasAllFunAlias {
using ALIAS = ListHasAllFun;
static constexpr const char *Name = "@>";
};
struct ListHasAllFunAlias2 {
using ALIAS = ListHasAllFun;
static constexpr const char *Name = "<@";
};
} // namespace duckdb
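
Many structs in this header (`ArrayAggregateFun`, `FilterFun`, `ApplyFun`, ...) are pure aliases: they carry only a `Name` and a `using ALIAS = ...;` that points at the primary definition, so the alias name can be registered against the same `GetFunction()`/`GetFunctions()` implementation. A compile-time resolution sketch (the `ResolveAlias` trait is illustrative, not the generator's actual registration code):

```cpp
#include <type_traits>

// Illustrative trait: maps an alias struct to its primary definition;
// primary definitions (no ALIAS member) resolve to themselves.
template <class FUN, class = void>
struct ResolveAlias {
	using type = FUN;
};
template <class FUN>
struct ResolveAlias<FUN, std::void_t<typename FUN::ALIAS>> {
	using type = typename FUN::ALIAS;
};

// ResolveAlias<duckdb::ArrayFilterFun>::type is duckdb::ListFilterFun, so
// "array_filter" and "filter" end up backed by the same implementation.
```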

View File

@@ -0,0 +1,114 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/scalar/map_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct CardinalityFun {
static constexpr const char *Name = "cardinality";
static constexpr const char *Parameters = "map";
static constexpr const char *Description = "Returns the size of the map (or the number of entries in the map)";
static constexpr const char *Example = "cardinality( map([4, 2], ['a', 'b']) );";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct MapFun {
static constexpr const char *Name = "map";
static constexpr const char *Parameters = "keys,values";
static constexpr const char *Description = "Creates a map from a set of keys and values";
static constexpr const char *Example = "map(['key1', 'key2'], ['val1', 'val2'])";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct MapEntriesFun {
static constexpr const char *Name = "map_entries";
static constexpr const char *Parameters = "map";
static constexpr const char *Description = "Returns the map entries as a list of keys/values";
static constexpr const char *Example = "map_entries(map(['key'], ['val']))";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct MapExtractFun {
static constexpr const char *Name = "map_extract";
static constexpr const char *Parameters = "map,key";
static constexpr const char *Description = "Returns a list containing the value for a given key or an empty list if the key is not contained in the map. The type of the key provided in the second parameter must match the type of the maps keys else an error is returned";
static constexpr const char *Example = "map_extract(map(['key'], ['val']), 'key')";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct ElementAtFun {
using ALIAS = MapExtractFun;
static constexpr const char *Name = "element_at";
};
struct MapExtractValueFun {
static constexpr const char *Name = "map_extract_value";
static constexpr const char *Parameters = "map,key";
static constexpr const char *Description = "Returns the value for a given key or NULL if the key is not contained in the map. The type of the key provided in the second parameter must match the type of the maps keys else an error is returned";
static constexpr const char *Example = "map_extract_value(map(['key'], ['val']), 'key')";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct MapFromEntriesFun {
static constexpr const char *Name = "map_from_entries";
static constexpr const char *Parameters = "map";
static constexpr const char *Description = "Returns a map created from the entries of the array";
static constexpr const char *Example = "map_from_entries([{k: 5, v: 'val1'}, {k: 3, v: 'val2'}]);";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct MapConcatFun {
static constexpr const char *Name = "map_concat";
static constexpr const char *Parameters = "any,...";
static constexpr const char *Description = "Returns a map created from merging the input maps, on key collision the value is taken from the last map with that key";
static constexpr const char *Example = "map_concat(map([1, 2], ['a', 'b']), map([2, 3], ['c', 'd']));";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct MapKeysFun {
static constexpr const char *Name = "map_keys";
static constexpr const char *Parameters = "map";
static constexpr const char *Description = "Returns the keys of a map as a list";
static constexpr const char *Example = "map_keys(map(['key'], ['val']))";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct MapValuesFun {
static constexpr const char *Name = "map_values";
static constexpr const char *Parameters = "map";
static constexpr const char *Description = "Returns the values of a map as a list";
static constexpr const char *Example = "map_values(map(['key'], ['val']))";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
} // namespace duckdb
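
These headers only describe how the functions are declared and registered; once the extension is loaded (or statically linked), the functions are available in any connection. A short sketch exercising two of the map functions through the embedded C++ client API (the query text is just an example):

```cpp
#include "duckdb.hpp"

int main() {
	duckdb::DuckDB db(nullptr); // in-memory database
	duckdb::Connection con(db);

	// cardinality() and map_extract() as declared above, called from SQL
	auto result = con.Query("SELECT cardinality(m), map_extract(m, 'key') "
	                        "FROM (SELECT map(['key'], ['val']) AS m)");
	result->Print();
	return 0;
}
```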

View File

@@ -0,0 +1,496 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/scalar/math_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct AbsOperatorFun {
static constexpr const char *Name = "@";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Absolute value";
static constexpr const char *Example = "abs(-17.4)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct AbsFun {
using ALIAS = AbsOperatorFun;
static constexpr const char *Name = "abs";
};
struct PowOperatorFun {
static constexpr const char *Name = "**";
static constexpr const char *Parameters = "x,y";
static constexpr const char *Description = "Computes x to the power of y";
static constexpr const char *Example = "pow(2, 3)\002power(2, 3)\0022 ** 3\0022 ^ 3";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct PowFun {
using ALIAS = PowOperatorFun;
static constexpr const char *Name = "pow";
};
struct PowerFun {
using ALIAS = PowOperatorFun;
static constexpr const char *Name = "power";
};
struct PowOperatorFunAlias {
using ALIAS = PowOperatorFun;
static constexpr const char *Name = "^";
};
struct FactorialOperatorFun {
static constexpr const char *Name = "!__postfix";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Factorial of x. Computes the product of the current integer and all integers below it";
static constexpr const char *Example = "4!";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct FactorialFun {
using ALIAS = FactorialOperatorFun;
static constexpr const char *Name = "factorial";
};
struct AcosFun {
static constexpr const char *Name = "acos";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the arccosine of x";
static constexpr const char *Example = "acos(0.5)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct AsinFun {
static constexpr const char *Name = "asin";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the arcsine of x";
static constexpr const char *Example = "asin(0.5)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct AtanFun {
static constexpr const char *Name = "atan";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the arctangent of x";
static constexpr const char *Example = "atan(0.5)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct Atan2Fun {
static constexpr const char *Name = "atan2";
static constexpr const char *Parameters = "y,x";
static constexpr const char *Description = "Computes the arctangent (y, x)";
static constexpr const char *Example = "atan2(1.0, 0.0)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct BitCountFun {
static constexpr const char *Name = "bit_count";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the number of bits that are set";
static constexpr const char *Example = "bit_count(31)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct CbrtFun {
static constexpr const char *Name = "cbrt";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the cube root of x";
static constexpr const char *Example = "cbrt(8)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct CeilFun {
static constexpr const char *Name = "ceil";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Rounds the number up";
static constexpr const char *Example = "ceil(17.4)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct CeilingFun {
using ALIAS = CeilFun;
static constexpr const char *Name = "ceiling";
};
struct CosFun {
static constexpr const char *Name = "cos";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the cos of x";
static constexpr const char *Example = "cos(90)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct CotFun {
static constexpr const char *Name = "cot";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the cotangent of x";
static constexpr const char *Example = "cot(0.5)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct DegreesFun {
static constexpr const char *Name = "degrees";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Converts radians to degrees";
static constexpr const char *Example = "degrees(pi())";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct EvenFun {
static constexpr const char *Name = "even";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Rounds x to next even number by rounding away from zero";
static constexpr const char *Example = "even(2.9)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct ExpFun {
static constexpr const char *Name = "exp";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes e to the power of x";
static constexpr const char *Example = "exp(1)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct FloorFun {
static constexpr const char *Name = "floor";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Rounds the number down";
static constexpr const char *Example = "floor(17.4)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct IsFiniteFun {
static constexpr const char *Name = "isfinite";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns true if the floating point value is finite, false otherwise";
static constexpr const char *Example = "isfinite(5.5)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct IsInfiniteFun {
static constexpr const char *Name = "isinf";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns true if the floating point value is infinite, false otherwise";
static constexpr const char *Example = "isinf('Infinity'::float)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct IsNanFun {
static constexpr const char *Name = "isnan";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns true if the floating point value is not a number, false otherwise";
static constexpr const char *Example = "isnan('NaN'::FLOAT)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct GammaFun {
static constexpr const char *Name = "gamma";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Interpolation of (x-1) factorial (so decimal inputs are allowed)";
static constexpr const char *Example = "gamma(5.5)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct GreatestCommonDivisorFun {
static constexpr const char *Name = "greatest_common_divisor";
static constexpr const char *Parameters = "x,y";
static constexpr const char *Description = "Computes the greatest common divisor of x and y";
static constexpr const char *Example = "greatest_common_divisor(42, 57)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct GcdFun {
using ALIAS = GreatestCommonDivisorFun;
static constexpr const char *Name = "gcd";
};
struct LeastCommonMultipleFun {
static constexpr const char *Name = "least_common_multiple";
static constexpr const char *Parameters = "x,y";
static constexpr const char *Description = "Computes the least common multiple of x and y";
static constexpr const char *Example = "least_common_multiple(42, 57)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct LcmFun {
using ALIAS = LeastCommonMultipleFun;
static constexpr const char *Name = "lcm";
};
struct LogGammaFun {
static constexpr const char *Name = "lgamma";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the log of the gamma function";
static constexpr const char *Example = "lgamma(2)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct LnFun {
static constexpr const char *Name = "ln";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the natural logarithm of x";
static constexpr const char *Example = "ln(2)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct Log2Fun {
static constexpr const char *Name = "log2";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the 2-log of x";
static constexpr const char *Example = "log2(8)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct Log10Fun {
static constexpr const char *Name = "log10";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the 10-log of x";
static constexpr const char *Example = "log10(1000)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct LogFun {
static constexpr const char *Name = "log";
static constexpr const char *Parameters = "b,x";
static constexpr const char *Description = "Computes the logarithm of x to base b. b may be omitted, in which case the default 10";
static constexpr const char *Example = "log(2, 64)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct NextAfterFun {
static constexpr const char *Name = "nextafter";
static constexpr const char *Parameters = "x,y";
static constexpr const char *Description = "Returns the next floating point value after x in the direction of y";
static constexpr const char *Example = "nextafter(1::float, 2::float)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct PiFun {
static constexpr const char *Name = "pi";
static constexpr const char *Parameters = "";
static constexpr const char *Description = "Returns the value of pi";
static constexpr const char *Example = "pi()";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct RadiansFun {
static constexpr const char *Name = "radians";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Converts degrees to radians";
static constexpr const char *Example = "radians(90)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct RoundFun {
static constexpr const char *Name = "round";
static constexpr const char *Parameters = "x,precision";
static constexpr const char *Description = "Rounds x to s decimal places";
static constexpr const char *Example = "round(42.4332, 2)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct SignFun {
static constexpr const char *Name = "sign";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the sign of x as -1, 0 or 1";
static constexpr const char *Example = "sign(-349)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct SignBitFun {
static constexpr const char *Name = "signbit";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns whether the signbit is set or not";
static constexpr const char *Example = "signbit(-0.0)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct SinFun {
static constexpr const char *Name = "sin";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the sin of x";
static constexpr const char *Example = "sin(90)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct SqrtFun {
static constexpr const char *Name = "sqrt";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Returns the square root of x";
static constexpr const char *Example = "sqrt(4)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct TanFun {
static constexpr const char *Name = "tan";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the tan of x";
static constexpr const char *Example = "tan(90)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct TruncFun {
static constexpr const char *Name = "trunc";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Truncates the number";
static constexpr const char *Example = "trunc(17.4)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct CoshFun {
static constexpr const char *Name = "cosh";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the hyperbolic cos of x";
static constexpr const char *Example = "cosh(1)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct SinhFun {
static constexpr const char *Name = "sinh";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the hyperbolic sin of x";
static constexpr const char *Example = "sinh(1)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct TanhFun {
static constexpr const char *Name = "tanh";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the hyperbolic tan of x";
static constexpr const char *Example = "tanh(1)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct AcoshFun {
static constexpr const char *Name = "acosh";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the inverse hyperbolic cos of x";
static constexpr const char *Example = "acosh(2.3)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct AsinhFun {
static constexpr const char *Name = "asinh";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the inverse hyperbolic sin of x";
static constexpr const char *Example = "asinh(0.5)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct AtanhFun {
static constexpr const char *Name = "atanh";
static constexpr const char *Parameters = "x";
static constexpr const char *Description = "Computes the inverse hyperbolic tan of x";
static constexpr const char *Example = "atanh(0.5)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
} // namespace duckdb
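
A rough sketch of the kind of definition that backs a plain `GetFunction()` declaration such as `RadiansFun`, assuming DuckDB's usual `DataChunk`/`UnaryExecutor` helpers; the callback body is illustrative rather than the shipped implementation:

```cpp
#include "duckdb/common/vector_operations/unary_executor.hpp"
#include "core_functions/scalar/math_functions.hpp"

namespace duckdb {

// Illustrative callback: converts degrees to radians element-wise.
static void RadiansFunction(DataChunk &args, ExpressionState &state, Vector &result) {
	UnaryExecutor::Execute<double, double>(args.data[0], result, args.size(), [](double degrees) {
		return degrees * (3.141592653589793 / 180.0);
	});
}

ScalarFunction RadiansFun::GetFunction() {
	return ScalarFunction({LogicalType::DOUBLE}, LogicalType::DOUBLE, RadiansFunction);
}

} // namespace duckdb
```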

View File

@@ -0,0 +1,78 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/scalar/operators_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct BitwiseAndFun {
static constexpr const char *Name = "&";
static constexpr const char *Parameters = "left,right";
static constexpr const char *Description = "Bitwise AND";
static constexpr const char *Example = "91 & 15";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct BitwiseOrFun {
static constexpr const char *Name = "|";
static constexpr const char *Parameters = "left,right";
static constexpr const char *Description = "Bitwise OR";
static constexpr const char *Example = "32 | 3";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct BitwiseNotFun {
static constexpr const char *Name = "~";
static constexpr const char *Parameters = "input";
static constexpr const char *Description = "Bitwise NOT";
static constexpr const char *Example = "~15";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct LeftShiftFun {
static constexpr const char *Name = "<<";
static constexpr const char *Parameters = "input";
static constexpr const char *Description = "Bitwise shift left";
static constexpr const char *Example = "1 << 4";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct RightShiftFun {
static constexpr const char *Name = ">>";
static constexpr const char *Parameters = "input";
static constexpr const char *Description = "Bitwise shift right";
static constexpr const char *Example = "8 >> 2";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
struct BitwiseXorFun {
static constexpr const char *Name = "xor";
static constexpr const char *Parameters = "left,right";
static constexpr const char *Description = "Bitwise XOR";
static constexpr const char *Example = "xor(17, 5)";
static constexpr const char *Categories = "";
static ScalarFunctionSet GetFunctions();
};
} // namespace duckdb
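
Entries declared with `GetFunctions()` return a `ScalarFunctionSet`, i.e. one overload per supported type. A trimmed sketch for the `xor` entry, showing only two integer overloads (the callback and the overload list are illustrative, not the exhaustive shipped set):

```cpp
#include "duckdb/common/vector_operations/binary_executor.hpp"
#include "core_functions/scalar/operators_functions.hpp"

namespace duckdb {

// Illustrative callback: element-wise bitwise XOR for one integer type.
template <class T>
static void BitwiseXorFunction(DataChunk &args, ExpressionState &state, Vector &result) {
	BinaryExecutor::Execute<T, T, T>(args.data[0], args.data[1], result, args.size(),
	                                 [](T left, T right) { return left ^ right; });
}

ScalarFunctionSet BitwiseXorFun::GetFunctions() {
	ScalarFunctionSet functions;
	functions.AddFunction(ScalarFunction({LogicalType::INTEGER, LogicalType::INTEGER}, LogicalType::INTEGER,
	                                     BitwiseXorFunction<int32_t>));
	functions.AddFunction(ScalarFunction({LogicalType::BIGINT, LogicalType::BIGINT}, LogicalType::BIGINT,
	                                     BitwiseXorFunction<int64_t>));
	return functions;
}

} // namespace duckdb
```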

View File

@@ -0,0 +1,94 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/scalar/random_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct RandomFun {
static constexpr const char *Name = "random";
static constexpr const char *Parameters = "";
static constexpr const char *Description = "Returns a random number between 0 and 1";
static constexpr const char *Example = "random()";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct SetseedFun {
static constexpr const char *Name = "setseed";
static constexpr const char *Parameters = "";
static constexpr const char *Description = "Sets the seed to be used for the random function";
static constexpr const char *Example = "setseed(0.42)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct UUIDFun {
static constexpr const char *Name = "uuid";
static constexpr const char *Parameters = "";
static constexpr const char *Description = "Returns a random UUID v4 similar to this: eeccb8c5-9943-b2bb-bb5e-222f4e14b687";
static constexpr const char *Example = "uuid()";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct GenRandomUuidFun {
using ALIAS = UUIDFun;
static constexpr const char *Name = "gen_random_uuid";
};
struct UUIDv4Fun {
static constexpr const char *Name = "uuidv4";
static constexpr const char *Parameters = "";
static constexpr const char *Description = "Returns a random UUIDv4 similar to this: eeccb8c5-9943-b2bb-bb5e-222f4e14b687";
static constexpr const char *Example = "uuidv4()";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct UUIDv7Fun {
static constexpr const char *Name = "uuidv7";
static constexpr const char *Parameters = "";
static constexpr const char *Description = "Returns a random UUID v7 similar to this: 019482e4-1441-7aad-8127-eec99573b0a0";
static constexpr const char *Example = "uuidv7()";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct UUIDExtractVersionFun {
static constexpr const char *Name = "uuid_extract_version";
static constexpr const char *Parameters = "uuid";
static constexpr const char *Description = "Extract a version for the given UUID.";
static constexpr const char *Example = "uuid_extract_version('019482e4-1441-7aad-8127-eec99573b0a0')";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct UUIDExtractTimestampFun {
static constexpr const char *Name = "uuid_extract_timestamp";
static constexpr const char *Parameters = "uuid";
static constexpr const char *Description = "Extract the timestamp for the given UUID v7.";
static constexpr const char *Example = "uuid_extract_timestamp('019482e4-1441-7aad-8127-eec99573b0a0')";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
} // namespace duckdb

View File

@@ -0,0 +1,27 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// duckdb/core_functions/scalar/secret_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct WhichSecretFun {
static constexpr const char *Name = "which_secret";
static constexpr const char *Parameters = "path,type";
static constexpr const char *Description = "Print out the name of the secret that will be used for reading a path";
static constexpr const char *Example = "which_secret('s3://some/authenticated/path.csv', 's3')";
static ScalarFunction GetFunction();
};
} // namespace duckdb

View File

@@ -0,0 +1,484 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/scalar/string_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct StartsWithOperatorFun {
static constexpr const char *Name = "^@";
static constexpr const char *Parameters = "string,search_string";
static constexpr const char *Description = "Returns `true` if `string` begins with `search_string`.";
static constexpr const char *Example = "starts_with('abc', 'a')";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct StartsWithFun {
using ALIAS = StartsWithOperatorFun;
static constexpr const char *Name = "starts_with";
};
struct ASCIIFun {
static constexpr const char *Name = "ascii";
static constexpr const char *Parameters = "string";
static constexpr const char *Description = "Returns an integer that represents the Unicode code point of the first character of the `string`.";
static constexpr const char *Example = "ascii('Ω')";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct BarFun {
static constexpr const char *Name = "bar";
static constexpr const char *Parameters = "x,min,max,width";
static constexpr const char *Description = "Draws a band whose width is proportional to (`x - min`) and equal to `width` characters when `x` = `max`. `width` defaults to 80.";
static constexpr const char *Example = "bar(5, 0, 20, 10)";
static constexpr const char *Categories = "string";
static ScalarFunctionSet GetFunctions();
};
struct BinFun {
static constexpr const char *Name = "bin";
static constexpr const char *Parameters = "string::VARCHAR\001value::ANY";
static constexpr const char *Description = "Converts the `string` to binary representation.\001Converts the `value` to binary representation.";
static constexpr const char *Example = "bin('Aa')\001bin(42)";
static constexpr const char *Categories = "string\001numeric";
static ScalarFunctionSet GetFunctions();
};
struct ToBinaryFun {
using ALIAS = BinFun;
static constexpr const char *Name = "to_binary";
};
struct ChrFun {
static constexpr const char *Name = "chr";
static constexpr const char *Parameters = "code_point";
static constexpr const char *Description = "Returns a character which is corresponding the ASCII code value or Unicode code point.";
static constexpr const char *Example = "chr(65)";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct DamerauLevenshteinFun {
static constexpr const char *Name = "damerau_levenshtein";
static constexpr const char *Parameters = "s1,s2";
static constexpr const char *Description = "Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., `a` and `A`) are considered different.";
static constexpr const char *Example = "damerau_levenshtein('duckdb', 'udckbd')";
static constexpr const char *Categories = "text_similarity";
static ScalarFunction GetFunction();
};
struct FormatFun {
static constexpr const char *Name = "format";
static constexpr const char *Parameters = "format,parameters...";
static constexpr const char *Description = "Formats a string using the fmt syntax.";
static constexpr const char *Example = "format('Benchmark \"{}\" took {} seconds', 'CSV', 42)";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct FormatBytesFun {
static constexpr const char *Name = "format_bytes";
static constexpr const char *Parameters = "integer";
static constexpr const char *Description = "Converts `integer` to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.).";
static constexpr const char *Example = "format_bytes(16_000)";
static constexpr const char *Categories = "string,numeric";
static ScalarFunction GetFunction();
};
struct FormatreadablesizeFun {
using ALIAS = FormatBytesFun;
static constexpr const char *Name = "formatReadableSize";
};
struct FormatreadabledecimalsizeFun {
static constexpr const char *Name = "formatReadableDecimalSize";
static constexpr const char *Parameters = "integer";
static constexpr const char *Description = "Converts `integer` to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.).";
static constexpr const char *Example = "formatReadableDecimalSize(16_000)";
static constexpr const char *Categories = "string,numeric";
static ScalarFunction GetFunction();
};
struct HammingFun {
static constexpr const char *Name = "hamming";
static constexpr const char *Parameters = "s1,s2";
static constexpr const char *Description = "The Hamming distance between to strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., `a` and `A`) are considered different.";
static constexpr const char *Example = "hamming('duck', 'luck')";
static constexpr const char *Categories = "text_similarity";
static ScalarFunction GetFunction();
};
struct MismatchesFun {
using ALIAS = HammingFun;
static constexpr const char *Name = "mismatches";
};
struct HexFun {
static constexpr const char *Name = "hex";
static constexpr const char *Parameters = "string::VARCHAR\001blob::BLOB\001value::ANY";
static constexpr const char *Description = "Converts the `string` to hexadecimal representation.\001Converts `blob` to `VARCHAR` using hexadecimal encoding.\001Converts the `value` to `VARCHAR` using hexadecimal representation.";
static constexpr const char *Example = "hex('Hello')\001hex('\\xAA\\xBB'::BLOB)\001hex(42)";
static constexpr const char *Categories = "string\001blob\001numeric";
static ScalarFunctionSet GetFunctions();
};
struct ToHexFun {
using ALIAS = HexFun;
static constexpr const char *Name = "to_hex";
};
struct InstrFun {
static constexpr const char *Name = "instr";
static constexpr const char *Parameters = "string,search_string";
static constexpr const char *Description = "Returns location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found.";
static constexpr const char *Example = "instr('test test', 'es')\002position('b' IN 'abc')";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct StrposFun {
using ALIAS = InstrFun;
static constexpr const char *Name = "strpos";
};
struct PositionFun {
using ALIAS = InstrFun;
static constexpr const char *Name = "position";
};
struct JaccardFun {
static constexpr const char *Name = "jaccard";
static constexpr const char *Parameters = "s1,s2";
static constexpr const char *Description = "The Jaccard similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1.";
static constexpr const char *Example = "jaccard('duck', 'luck')";
static constexpr const char *Categories = "text_similarity";
static ScalarFunction GetFunction();
};
struct JaroSimilarityFun {
static constexpr const char *Name = "jaro_similarity";
static constexpr const char *Parameters = "s1,s2,score_cutoff";
static constexpr const char *Description = "The Jaro similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0.";
static constexpr const char *Example = "jaro_similarity('duck', 'duckdb')";
static constexpr const char *Categories = "text_similarity";
static ScalarFunctionSet GetFunctions();
};
struct JaroWinklerSimilarityFun {
static constexpr const char *Name = "jaro_winkler_similarity";
static constexpr const char *Parameters = "s1,s2,score_cutoff";
static constexpr const char *Description = "The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0.";
static constexpr const char *Example = "jaro_winkler_similarity('duck', 'duckdb')";
static constexpr const char *Categories = "text_similarity";
static ScalarFunctionSet GetFunctions();
};
struct LeftFun {
static constexpr const char *Name = "left";
static constexpr const char *Parameters = "string,count";
static constexpr const char *Description = "Extracts the left-most count characters.";
static constexpr const char *Example = "left('Hello🦆', 2)";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct LeftGraphemeFun {
static constexpr const char *Name = "left_grapheme";
static constexpr const char *Parameters = "string,count";
static constexpr const char *Description = "Extracts the left-most count grapheme clusters.";
static constexpr const char *Example = "left_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct LevenshteinFun {
static constexpr const char *Name = "levenshtein";
static constexpr const char *Parameters = "s1,s2";
static constexpr const char *Description = "The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., `a` and `A`) are considered different.";
static constexpr const char *Example = "levenshtein('duck', 'db')";
static constexpr const char *Categories = "text_similarity";
static ScalarFunction GetFunction();
};
struct Editdist3Fun {
using ALIAS = LevenshteinFun;
static constexpr const char *Name = "editdist3";
};
struct LpadFun {
static constexpr const char *Name = "lpad";
static constexpr const char *Parameters = "string,count,character";
static constexpr const char *Description = "Pads the `string` with the `character` on the left until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters.";
static constexpr const char *Example = "lpad('hello', 8, '>')";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct LtrimFun {
static constexpr const char *Name = "ltrim";
static constexpr const char *Parameters = "string,characters";
static constexpr const char *Description = "Removes any occurrences of any of the `characters` from the left side of the `string`. `characters` defaults to `space`.";
static constexpr const char *Example = "ltrim(' test ')\002ltrim('>>>>test<<', '><')";
static constexpr const char *Categories = "string";
static ScalarFunctionSet GetFunctions();
};
struct ParseDirnameFun {
static constexpr const char *Name = "parse_dirname";
static constexpr const char *Parameters = "path,separator";
static constexpr const char *Description = "Returns the top-level directory name from the given `path`. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.";
static constexpr const char *Example = "parse_dirname('path/to/file.csv', 'system')";
static constexpr const char *Categories = "string";
static ScalarFunctionSet GetFunctions();
};
struct ParseDirpathFun {
static constexpr const char *Name = "parse_dirpath";
static constexpr const char *Parameters = "path,separator";
static constexpr const char *Description = "Returns the head of the `path` (the pathname until the last slash) similarly to Python's `os.path.dirname`. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.";
static constexpr const char *Example = "parse_dirpath('path/to/file.csv', 'forward_slash')";
static constexpr const char *Categories = "string";
static ScalarFunctionSet GetFunctions();
};
struct ParseFilenameFun {
static constexpr const char *Name = "parse_filename";
static constexpr const char *Parameters = "string,trim_extension,separator";
static constexpr const char *Description = "Returns the last component of the `path` similarly to Python's `os.path.basename` function. If `trim_extension` is `true`, the file extension will be removed (defaults to `false`). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.";
static constexpr const char *Example = "parse_filename('path/to/file.csv', true, 'forward_slash')";
static constexpr const char *Categories = "string";
static ScalarFunctionSet GetFunctions();
};
struct ParsePathFun {
static constexpr const char *Name = "parse_path";
static constexpr const char *Parameters = "path,separator";
static constexpr const char *Description = "Returns a list of the components (directories and filename) in the `path` similarly to Python's `pathlib.parts` function. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.";
static constexpr const char *Example = "parse_path('path/to/file.csv', 'system')";
static constexpr const char *Categories = "string";
static ScalarFunctionSet GetFunctions();
};
struct PrintfFun {
static constexpr const char *Name = "printf";
static constexpr const char *Parameters = "format,parameters...";
static constexpr const char *Description = "Formats a `string` using printf syntax.";
static constexpr const char *Example = "printf('Benchmark \"%s\" took %d seconds', 'CSV', 42)";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct RepeatFun {
static constexpr const char *Name = "repeat";
static constexpr const char *Parameters = "string::VARCHAR,count::BIGINT\001list::ANY[],count::BIGINT\001blob::BLOB,count::BIGINT";
static constexpr const char *Description = "Repeats the `string` `count` number of times.\001Repeats the `list` `count` number of times.\001Repeats the `blob` `count` number of times.";
static constexpr const char *Example = "repeat('A', 5)\001repeat([1, 2, 3], 5)\001repeat('\\xAA\\xBB'::BLOB, 5)";
static constexpr const char *Categories = "string\001list\001blob";
static ScalarFunctionSet GetFunctions();
};
struct ReplaceFun {
static constexpr const char *Name = "replace";
static constexpr const char *Parameters = "string,source,target";
static constexpr const char *Description = "Replaces any occurrences of the `source` with `target` in `string`.";
static constexpr const char *Example = "replace('hello', 'l', '-')";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct ReverseFun {
static constexpr const char *Name = "reverse";
static constexpr const char *Parameters = "string";
static constexpr const char *Description = "Reverses the `string`.";
static constexpr const char *Example = "reverse('hello')";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct RightFun {
static constexpr const char *Name = "right";
static constexpr const char *Parameters = "string,count";
static constexpr const char *Description = "Extract the right-most `count` characters.";
static constexpr const char *Example = "right('Hello🦆', 3)";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct RightGraphemeFun {
static constexpr const char *Name = "right_grapheme";
static constexpr const char *Parameters = "string,count";
static constexpr const char *Description = "Extracts the right-most `count` grapheme clusters.";
static constexpr const char *Example = "right_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct RpadFun {
static constexpr const char *Name = "rpad";
static constexpr const char *Parameters = "string,count,character";
static constexpr const char *Description = "Pads the `string` with the `character` on the right until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters.";
static constexpr const char *Example = "rpad('hello', 10, '<')";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct RtrimFun {
static constexpr const char *Name = "rtrim";
static constexpr const char *Parameters = "string,characters";
static constexpr const char *Description = "Removes any occurrences of any of the `characters` from the right side of the `string`. `characters` defaults to `space`.";
static constexpr const char *Example = "rtrim(' test ')\002rtrim('>>>>test<<', '><')";
static constexpr const char *Categories = "string";
static ScalarFunctionSet GetFunctions();
};
struct TranslateFun {
static constexpr const char *Name = "translate";
static constexpr const char *Parameters = "string,from,to";
static constexpr const char *Description = "Replaces each character in `string` that matches a character in the `from` set with the corresponding character in the `to` set. If `from` is longer than `to`, occurrences of the extra characters in `from` are deleted.";
static constexpr const char *Example = "translate('12345', '143', 'ax')";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct TrimFun {
static constexpr const char *Name = "trim";
static constexpr const char *Parameters = "string,characters";
static constexpr const char *Description = "Removes any occurrences of any of the `characters` from either side of the `string`. `characters` defaults to `space`.";
static constexpr const char *Example = "trim(' test ')\002trim('>>>>test<<', '><')";
static constexpr const char *Categories = "string";
static ScalarFunctionSet GetFunctions();
};
struct UnbinFun {
static constexpr const char *Name = "unbin";
static constexpr const char *Parameters = "value";
static constexpr const char *Description = "Converts a `value` from binary representation to a blob.";
static constexpr const char *Example = "unbin('0110')";
static constexpr const char *Categories = "string,blob";
static ScalarFunction GetFunction();
};
struct FromBinaryFun {
using ALIAS = UnbinFun;
static constexpr const char *Name = "from_binary";
};
struct UnhexFun {
static constexpr const char *Name = "unhex";
static constexpr const char *Parameters = "value";
static constexpr const char *Description = "Converts a `value` from hexadecimal representation to a blob.";
static constexpr const char *Example = "unhex('2A')";
static constexpr const char *Categories = "string,blob";
static ScalarFunction GetFunction();
};
struct FromHexFun {
using ALIAS = UnhexFun;
static constexpr const char *Name = "from_hex";
};
struct UnicodeFun {
static constexpr const char *Name = "unicode";
static constexpr const char *Parameters = "string";
static constexpr const char *Description = "Returns an `INTEGER` representing the `unicode` codepoint of the first character in the `string`.";
static constexpr const char *Example = "[unicode('âbcd'), unicode('â'), unicode(''), unicode(NULL)]";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct OrdFun {
using ALIAS = UnicodeFun;
static constexpr const char *Name = "ord";
};
struct ToBaseFun {
static constexpr const char *Name = "to_base";
static constexpr const char *Parameters = "number,radix,min_length";
static constexpr const char *Description = "Converts `number` to a string in the given base `radix`, optionally padding with leading zeros to `min_length`.";
static constexpr const char *Example = "to_base(42, 16, 5)";
static constexpr const char *Categories = "string,numeric";
static ScalarFunctionSet GetFunctions();
};
struct UrlEncodeFun {
static constexpr const char *Name = "url_encode";
static constexpr const char *Parameters = "string";
static constexpr const char *Description = "Encodes a URL to a representation using Percent-Encoding.";
static constexpr const char *Example = "url_encode('this string has/ special+ characters>')";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
struct UrlDecodeFun {
static constexpr const char *Name = "url_decode";
static constexpr const char *Parameters = "string";
static constexpr const char *Description = "Decodes a URL from a representation using Percent-Encoding.";
static constexpr const char *Example = "url_decode('https%3A%2F%2Fduckdb.org%2Fwhy_duckdb%23portable')";
static constexpr const char *Categories = "string";
static ScalarFunction GetFunction();
};
} // namespace duckdb

View File

@@ -0,0 +1,38 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/scalar/struct_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct StructInsertFun {
static constexpr const char *Name = "struct_insert";
static constexpr const char *Parameters = "struct,any";
static constexpr const char *Description = "Adds field(s)/value(s) to an existing STRUCT with the argument values. The entry name(s) will be the bound variable name(s)";
static constexpr const char *Example = "struct_insert({'a': 1}, b := 2)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct StructUpdateFun {
static constexpr const char *Name = "struct_update";
static constexpr const char *Parameters = "struct,any";
static constexpr const char *Description = "Changes field(s)/value(s) to an existing STRUCT with the argument values. The entry name(s) will be the bound variable name(s)";
static constexpr const char *Example = "struct_update({'a': 1}, a := 2)";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
} // namespace duckdb

View File

@@ -0,0 +1,48 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions/scalar/union_functions.hpp
//
//
//===----------------------------------------------------------------------===//
// This file is automatically generated by scripts/generate_functions.py
// Do not edit this file manually, your changes will be overwritten
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb/function/function_set.hpp"
namespace duckdb {
struct UnionExtractFun {
static constexpr const char *Name = "union_extract";
static constexpr const char *Parameters = "union,tag";
static constexpr const char *Description = "Extract the value with the named tags from the union. NULL if the tag is not currently selected";
static constexpr const char *Example = "union_extract(s, 'k')";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct UnionTagFun {
static constexpr const char *Name = "union_tag";
static constexpr const char *Parameters = "union";
static constexpr const char *Description = "Retrieve the currently selected tag of the union as an ENUM";
static constexpr const char *Example = "union_tag(union_value(k := 'foo'))";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
struct UnionValueFun {
static constexpr const char *Name = "union_value";
static constexpr const char *Parameters = "tag";
static constexpr const char *Description = "Create a single member UNION containing the argument value. The tag of the value will be the bound variable name";
static constexpr const char *Example = "union_value(k := 'hello')";
static constexpr const char *Categories = "";
static ScalarFunction GetFunction();
};
} // namespace duckdb

View File

@@ -0,0 +1,22 @@
//===----------------------------------------------------------------------===//
// DuckDB
//
// core_functions_extension.hpp
//
//
//===----------------------------------------------------------------------===//
#pragma once
#include "duckdb.hpp"
namespace duckdb {
class CoreFunctionsExtension : public Extension {
public:
void Load(ExtensionLoader &db) override;
std::string Name() override;
std::string Version() const override;
};
} // namespace duckdb

View File

@@ -0,0 +1,407 @@
#include "duckdb/function/lambda_functions.hpp"
#include "duckdb/common/serializer/serializer.hpp"
#include "duckdb/common/serializer/deserializer.hpp"
#include "duckdb/planner/expression/bound_function_expression.hpp"
#include "duckdb/planner/expression/bound_cast_expression.hpp"
namespace duckdb {
//===--------------------------------------------------------------------===//
// Helper functions
//===--------------------------------------------------------------------===//
//! LambdaExecuteInfo holds information for executing the lambda expression on an input chunk and
//! a resulting lambda chunk.
struct LambdaExecuteInfo {
LambdaExecuteInfo(ClientContext &context, const Expression &lambda_expr, const DataChunk &args,
const bool has_index, const Vector &child_vector)
: has_index(has_index) {
expr_executor = make_uniq<ExpressionExecutor>(context, lambda_expr);
// get the input types for the input chunk
vector<LogicalType> input_types;
if (has_index) {
input_types.push_back(LogicalType::BIGINT);
}
input_types.push_back(child_vector.GetType());
for (idx_t i = 1; i < args.ColumnCount(); i++) {
input_types.push_back(args.data[i].GetType());
}
// get the result types
vector<LogicalType> result_types {lambda_expr.return_type};
// initialize the data chunks
input_chunk.InitializeEmpty(input_types);
lambda_chunk.Initialize(Allocator::DefaultAllocator(), result_types);
};
//! The expression executor that executes the lambda expression
unique_ptr<ExpressionExecutor> expr_executor;
//! The input chunk on which we execute the lambda expression
DataChunk input_chunk;
//! The chunk holding the result of executing the lambda expression
DataChunk lambda_chunk;
//! True, if this lambda expression expects an index vector in the input chunk
bool has_index;
};
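// Illustrative sketch (not part of the original source): the input_chunk built by the
// constructor above has the following column layout, assuming one captured column besides
// the list itself:
//   has_index == true:   [BIGINT element index | list element | captured column]
//   has_index == false:  [list element | captured column]
// lambda_chunk then holds a single column of the lambda expression's return type.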
//! A helper struct with information that is specific to the list_filter function
struct ListFilterInfo {
//! The new list lengths after filtering out elements
vector<idx_t> entry_lengths;
//! The length of the current list
idx_t length = 0;
//! The offset of the current list
idx_t offset = 0;
//! The current row index
idx_t row_idx = 0;
//! The length of the source list
idx_t src_length = 0;
};
//! ListTransformFunctor contains list_transform specific functionality
struct ListTransformFunctor {
static void ReserveNewLengths(vector<idx_t> &, const idx_t) {
// NOP
}
static void PushEmptyList(vector<idx_t> &) {
// NOP
}
//! Sets the list entries of the result vector
static void SetResultEntry(list_entry_t *result_entries, idx_t &offset, const list_entry_t &entry,
const idx_t row_idx, vector<idx_t> &) {
result_entries[row_idx].offset = offset;
result_entries[row_idx].length = entry.length;
offset += entry.length;
}
//! Appends the lambda vector to the result's child vector
static void AppendResult(Vector &result, Vector &lambda_vector, const idx_t elem_cnt, list_entry_t *,
ListFilterInfo &, LambdaExecuteInfo &) {
ListVector::Append(result, lambda_vector, elem_cnt, 0);
}
};
//! ListFilterFunctor contains list_filter specific functionality
struct ListFilterFunctor {
//! Initializes the entry_lengths vector
static void ReserveNewLengths(vector<idx_t> &entry_lengths, const idx_t row_count) {
entry_lengths.reserve(row_count);
}
//! Pushes an empty list to the entry_lengths vector
static void PushEmptyList(vector<idx_t> &entry_lengths) {
entry_lengths.emplace_back(0);
}
//! Pushes the length of the original list to the entry_lengths vector
static void SetResultEntry(list_entry_t *, idx_t &, const list_entry_t &entry, const idx_t,
vector<idx_t> &entry_lengths) {
entry_lengths.push_back(entry.length);
}
//! Uses the lambda vector to filter the incoming list and to append the filtered list to the result vector
static void AppendResult(Vector &result, Vector &lambda_vector, const idx_t elem_cnt, list_entry_t *result_entries,
ListFilterInfo &info, LambdaExecuteInfo &execute_info) {
idx_t count = 0;
SelectionVector sel(elem_cnt);
UnifiedVectorFormat lambda_data;
lambda_vector.ToUnifiedFormat(elem_cnt, lambda_data);
auto lambda_values = UnifiedVectorFormat::GetData<bool>(lambda_data);
auto &lambda_validity = lambda_data.validity;
// compute the new lengths and offsets, and create a selection vector
for (idx_t i = 0; i < elem_cnt; i++) {
auto entry_idx = lambda_data.sel->get_index(i);
// set length and offset of empty lists
while (info.row_idx < info.entry_lengths.size() && !info.entry_lengths[info.row_idx]) {
result_entries[info.row_idx].offset = info.offset;
result_entries[info.row_idx].length = 0;
info.row_idx++;
}
// found a true value
if (lambda_validity.RowIsValid(entry_idx) && lambda_values[entry_idx]) {
sel.set_index(count++, i);
info.length++;
}
info.src_length++;
// we traversed the entire source list
if (info.entry_lengths[info.row_idx] == info.src_length) {
// set the offset and length of the result entry
result_entries[info.row_idx].offset = info.offset;
result_entries[info.row_idx].length = info.length;
// reset all other fields
info.offset += info.length;
info.row_idx++;
info.length = 0;
info.src_length = 0;
}
}
// set length and offset of all remaining empty lists
while (info.row_idx < info.entry_lengths.size() && !info.entry_lengths[info.row_idx]) {
result_entries[info.row_idx].offset = info.offset;
result_entries[info.row_idx].length = 0;
info.row_idx++;
}
// slice the input chunk's corresponding vector to get the new lists
// and append them to the result
idx_t source_list_idx = execute_info.has_index ? 1 : 0;
Vector result_lists(execute_info.input_chunk.data[source_list_idx], sel, count);
ListVector::Append(result, result_lists, count, 0);
}
};
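// Illustrative SQL for the two functors above (results shown as comments; lambda syntax as
// documented for DuckDB's list functions):
//   SELECT list_transform([1, 2, 3], x -> x + 1);   -- [2, 3, 4]
//   SELECT list_filter([1, 2, 3, 4], x -> x > 2);   -- [3, 4]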
vector<LambdaFunctions::ColumnInfo> LambdaFunctions::GetColumnInfo(DataChunk &args, const idx_t row_count) {
vector<ColumnInfo> data;
// skip the input list and then insert all remaining input vectors
for (idx_t i = 1; i < args.ColumnCount(); i++) {
data.emplace_back(args.data[i]);
args.data[i].ToUnifiedFormat(row_count, data.back().format);
}
return data;
}
vector<reference<LambdaFunctions::ColumnInfo>>
LambdaFunctions::GetMutableColumnInfo(vector<LambdaFunctions::ColumnInfo> &data) {
vector<reference<ColumnInfo>> inconstant_info;
for (auto &entry : data) {
if (entry.vector.get().GetVectorType() != VectorType::CONSTANT_VECTOR) {
inconstant_info.push_back(entry);
}
}
return inconstant_info;
}
static void ExecuteExpression(const idx_t elem_cnt, const LambdaFunctions::ColumnInfo &column_info,
const vector<LambdaFunctions::ColumnInfo> &column_infos, const Vector &index_vector,
LambdaExecuteInfo &info) {
info.input_chunk.SetCardinality(elem_cnt);
info.lambda_chunk.SetCardinality(elem_cnt);
// slice the child vector
Vector slice(column_info.vector, column_info.sel, elem_cnt);
// reference the child vector (and the index vector)
if (info.has_index) {
info.input_chunk.data[0].Reference(index_vector);
info.input_chunk.data[1].Reference(slice);
} else {
info.input_chunk.data[0].Reference(slice);
}
idx_t slice_offset = info.has_index ? 2 : 1;
// (slice and) reference the other columns
vector<Vector> slices;
for (idx_t i = 0; i < column_infos.size(); i++) {
if (column_infos[i].vector.get().GetVectorType() == VectorType::CONSTANT_VECTOR) {
// only reference constant vectors
info.input_chunk.data[i + slice_offset].Reference(column_infos[i].vector);
} else {
// slice inconstant vectors
slices.emplace_back(column_infos[i].vector, column_infos[i].sel, elem_cnt);
info.input_chunk.data[i + slice_offset].Reference(slices.back());
}
}
// execute the lambda expression
info.expr_executor->Execute(info.input_chunk, info.lambda_chunk);
}
//===--------------------------------------------------------------------===//
// ListLambdaBindData
//===--------------------------------------------------------------------===//
void ListLambdaBindData::Serialize(Serializer &serializer, const optional_ptr<FunctionData> bind_data_p,
const ScalarFunction &) {
auto &bind_data = bind_data_p->Cast<ListLambdaBindData>();
serializer.WriteProperty(100, "return_type", bind_data.return_type);
serializer.WritePropertyWithDefault(101, "lambda_expr", bind_data.lambda_expr, unique_ptr<Expression>());
serializer.WriteProperty(102, "has_index", bind_data.has_index);
serializer.WritePropertyWithDefault<bool>(103, "has_initial", bind_data.has_initial, false);
}
unique_ptr<FunctionData> ListLambdaBindData::Deserialize(Deserializer &deserializer, ScalarFunction &) {
auto return_type = deserializer.ReadProperty<LogicalType>(100, "return_type");
auto lambda_expr = deserializer.ReadPropertyWithExplicitDefault<unique_ptr<Expression>>(101, "lambda_expr",
unique_ptr<Expression>());
auto has_index = deserializer.ReadProperty<bool>(102, "has_index");
auto has_initial = deserializer.ReadPropertyWithExplicitDefault<bool>(103, "has_initial", false);
return make_uniq<ListLambdaBindData>(return_type, std::move(lambda_expr), has_index, has_initial);
}
//===--------------------------------------------------------------------===//
// LambdaFunctions
//===--------------------------------------------------------------------===//
LogicalType LambdaFunctions::DetermineListChildType(const LogicalType &child_type) {
if (child_type.id() != LogicalTypeId::SQLNULL && child_type.id() != LogicalTypeId::UNKNOWN) {
if (child_type.id() == LogicalTypeId::ARRAY) {
return ArrayType::GetChildType(child_type);
} else if (child_type.id() == LogicalTypeId::LIST) {
return ListType::GetChildType(child_type);
}
throw InternalException("The first argument must be a list or array type");
}
return child_type;
}
LogicalType LambdaFunctions::BindBinaryChildren(const vector<LogicalType> &function_child_types,
const idx_t parameter_idx) {
auto list_type = DetermineListChildType(function_child_types[0]);
switch (parameter_idx) {
case 0:
return list_type;
case 1:
return LogicalType::BIGINT;
default:
throw BinderException("This lambda function only supports up to two lambda parameters!");
}
}
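// Illustrative sketch: the second lambda parameter bound above is the 1-based element index,
// e.g. (assuming the documented two-parameter lambda form):
//   SELECT list_transform(['a', 'b', 'c'], (x, i) -> concat(i, ':', x));   -- ['1:a', '2:b', '3:c']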
template <class FUNCTION_FUNCTOR>
static void ExecuteLambda(DataChunk &args, ExpressionState &state, Vector &result) {
bool result_is_null = false;
LambdaFunctions::LambdaInfo info(args, state, result, result_is_null);
if (result_is_null) {
return;
}
auto result_entries = FlatVector::GetData<list_entry_t>(result);
auto mutable_column_infos = LambdaFunctions::GetMutableColumnInfo(info.column_infos);
// special-handling for the child_vector
auto child_vector_size = ListVector::GetListSize(args.data[0]);
LambdaFunctions::ColumnInfo child_info(*info.child_vector);
info.child_vector->ToUnifiedFormat(child_vector_size, child_info.format);
// get the expression executor
LambdaExecuteInfo execute_info(state.GetContext(), *info.lambda_expr, args, info.has_index, *info.child_vector);
// get list_filter specific info
ListFilterInfo list_filter_info;
FUNCTION_FUNCTOR::ReserveNewLengths(list_filter_info.entry_lengths, info.row_count);
// additional index vector
Vector index_vector(LogicalType::BIGINT);
// loop over the child entries and create chunks to be executed by the expression executor
idx_t elem_cnt = 0;
idx_t offset = 0;
for (idx_t row_idx = 0; row_idx < info.row_count; row_idx++) {
auto list_idx = info.list_column_format.sel->get_index(row_idx);
const auto &list_entry = info.list_entries[list_idx];
// set the result to NULL for this row
if (!info.list_column_format.validity.RowIsValid(list_idx)) {
info.result_validity->SetInvalid(row_idx);
FUNCTION_FUNCTOR::PushEmptyList(list_filter_info.entry_lengths);
continue;
}
FUNCTION_FUNCTOR::SetResultEntry(result_entries, offset, list_entry, row_idx, list_filter_info.entry_lengths);
// empty list, nothing to execute
if (list_entry.length == 0) {
continue;
}
// iterate the elements of the current list and create the corresponding selection vectors
for (idx_t child_idx = 0; child_idx < list_entry.length; child_idx++) {
// reached STANDARD_VECTOR_SIZE elements
if (elem_cnt == STANDARD_VECTOR_SIZE) {
execute_info.lambda_chunk.Reset();
ExecuteExpression(elem_cnt, child_info, info.column_infos, index_vector, execute_info);
auto &lambda_vector = execute_info.lambda_chunk.data[0];
FUNCTION_FUNCTOR::AppendResult(result, lambda_vector, elem_cnt, result_entries, list_filter_info,
execute_info);
elem_cnt = 0;
}
// FIXME: reuse same selection vector for inconstant rows
// adjust indexes for slicing
child_info.sel.set_index(elem_cnt, list_entry.offset + child_idx);
for (auto &entry : mutable_column_infos) {
entry.get().sel.set_index(elem_cnt, row_idx);
}
// set the index vector
if (info.has_index) {
index_vector.SetValue(elem_cnt, Value::BIGINT(NumericCast<int64_t>(child_idx + 1)));
}
elem_cnt++;
}
}
execute_info.lambda_chunk.Reset();
ExecuteExpression(elem_cnt, child_info, info.column_infos, index_vector, execute_info);
auto &lambda_vector = execute_info.lambda_chunk.data[0];
FUNCTION_FUNCTOR::AppendResult(result, lambda_vector, elem_cnt, result_entries, list_filter_info, execute_info);
if (info.is_all_constant && !info.is_volatile) {
result.SetVectorType(VectorType::CONSTANT_VECTOR);
}
}
unique_ptr<FunctionData> LambdaFunctions::ListLambdaPrepareBind(vector<unique_ptr<Expression>> &arguments,
ClientContext &context,
ScalarFunction &bound_function) {
// NULL list parameter
if (arguments[0]->return_type.id() == LogicalTypeId::SQLNULL) {
bound_function.arguments[0] = LogicalType::SQLNULL;
bound_function.return_type = LogicalType::SQLNULL;
return make_uniq<ListLambdaBindData>(bound_function.return_type, nullptr);
}
// prepared statements
if (arguments[0]->return_type.id() == LogicalTypeId::UNKNOWN) {
throw ParameterNotResolvedException();
}
arguments[0] = BoundCastExpression::AddArrayCastToList(context, std::move(arguments[0]));
D_ASSERT(arguments[0]->return_type.id() == LogicalTypeId::LIST);
return nullptr;
}
unique_ptr<FunctionData> LambdaFunctions::ListLambdaBind(ClientContext &context, ScalarFunction &bound_function,
vector<unique_ptr<Expression>> &arguments,
const bool has_index) {
unique_ptr<FunctionData> bind_data = ListLambdaPrepareBind(arguments, context, bound_function);
if (bind_data) {
return bind_data;
}
// get the lambda expression and put it in the bind info
auto &bound_lambda_expr = arguments[1]->Cast<BoundLambdaExpression>();
auto lambda_expr = std::move(bound_lambda_expr.lambda_expr);
return make_uniq<ListLambdaBindData>(bound_function.return_type, std::move(lambda_expr), has_index);
}
void LambdaFunctions::ListTransformFunction(DataChunk &args, ExpressionState &state, Vector &result) {
ExecuteLambda<ListTransformFunctor>(args, state, result);
}
void LambdaFunctions::ListFilterFunction(DataChunk &args, ExpressionState &state, Vector &result) {
ExecuteLambda<ListFilterFunctor>(args, state, result);
}
} // namespace duckdb

View File

@@ -0,0 +1,19 @@
add_subdirectory(array)
add_subdirectory(bit)
add_subdirectory(blob)
add_subdirectory(date)
add_subdirectory(debug)
add_subdirectory(enum)
add_subdirectory(generic)
add_subdirectory(list)
add_subdirectory(map)
add_subdirectory(math)
add_subdirectory(operators)
add_subdirectory(random)
add_subdirectory(string)
add_subdirectory(struct)
add_subdirectory(union)
set(CORE_FUNCTION_FILES
${CORE_FUNCTION_FILES}
PARENT_SCOPE)

View File

@@ -0,0 +1,5 @@
add_library_unity(duckdb_core_functions_array OBJECT array_functions.cpp
array_value.cpp)
set(CORE_FUNCTION_FILES
${CORE_FUNCTION_FILES} $<TARGET_OBJECTS:duckdb_core_functions_array>
PARENT_SCOPE)

View File

@@ -0,0 +1,281 @@
#include "core_functions/scalar/array_functions.hpp"
#include "core_functions/array_kernels.hpp"
#include "duckdb/planner/expression/bound_function_expression.hpp"
namespace duckdb {
static unique_ptr<FunctionData> ArrayGenericBinaryBind(ClientContext &context, ScalarFunction &bound_function,
vector<unique_ptr<Expression>> &arguments) {
const auto &lhs_type = arguments[0]->return_type;
const auto &rhs_type = arguments[1]->return_type;
if (lhs_type.IsUnknown() && rhs_type.IsUnknown()) {
bound_function.arguments[0] = rhs_type;
bound_function.arguments[1] = lhs_type;
bound_function.return_type = LogicalType::UNKNOWN;
return nullptr;
}
bound_function.arguments[0] = lhs_type.IsUnknown() ? rhs_type : lhs_type;
bound_function.arguments[1] = rhs_type.IsUnknown() ? lhs_type : rhs_type;
if (bound_function.arguments[0].id() != LogicalTypeId::ARRAY ||
bound_function.arguments[1].id() != LogicalTypeId::ARRAY) {
throw InvalidInputException(
StringUtil::Format("%s: Arguments must be arrays of FLOAT or DOUBLE", bound_function.name));
}
const auto lhs_size = ArrayType::GetSize(bound_function.arguments[0]);
const auto rhs_size = ArrayType::GetSize(bound_function.arguments[1]);
if (lhs_size != rhs_size) {
throw BinderException("%s: Array arguments must be of the same size", bound_function.name);
}
const auto &lhs_element_type = ArrayType::GetChildType(bound_function.arguments[0]);
const auto &rhs_element_type = ArrayType::GetChildType(bound_function.arguments[1]);
// Resolve common type
LogicalType common_type;
if (!LogicalType::TryGetMaxLogicalType(context, lhs_element_type, rhs_element_type, common_type)) {
throw BinderException("%s: Cannot infer common element type (left = '%s', right = '%s')", bound_function.name,
lhs_element_type.ToString(), rhs_element_type.ToString());
}
// Ensure it is float or double
if (common_type.id() != LogicalTypeId::FLOAT && common_type.id() != LogicalTypeId::DOUBLE) {
throw BinderException("%s: Arguments must be arrays of FLOAT or DOUBLE", bound_function.name);
}
// The important part is just that we resolve the size of the input arrays
bound_function.arguments[0] = LogicalType::ARRAY(common_type, lhs_size);
bound_function.arguments[1] = LogicalType::ARRAY(common_type, rhs_size);
return nullptr;
}
//------------------------------------------------------------------------------
// Element-wise combine functions
//------------------------------------------------------------------------------
// Given two arrays of the same size, combine their elements into a single array
// of the same size as the input arrays.
namespace {
struct CrossProductOp {
template <class TYPE>
static void Operation(const TYPE *lhs_data, const TYPE *rhs_data, TYPE *res_data, idx_t size) {
D_ASSERT(size == 3);
auto lx = lhs_data[0];
auto ly = lhs_data[1];
auto lz = lhs_data[2];
auto rx = rhs_data[0];
auto ry = rhs_data[1];
auto rz = rhs_data[2];
res_data[0] = ly * rz - lz * ry;
res_data[1] = lz * rx - lx * rz;
res_data[2] = lx * ry - ly * rx;
}
};
} // namespace
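// Illustrative usage of the cross product kernel above (3-element FLOAT arrays; result shown as a comment):
//   SELECT array_cross_product(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT),
//                              array_value(4.0::FLOAT, 5.0::FLOAT, 6.0::FLOAT));
//   -- [-3.0, 6.0, -3.0]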
template <class TYPE, class OP, idx_t N>
static void ArrayFixedCombine(DataChunk &args, ExpressionState &state, Vector &result) {
const auto &lstate = state.Cast<ExecuteFunctionState>();
const auto &expr = lstate.expr.Cast<BoundFunctionExpression>();
const auto &func_name = expr.function.name;
const auto count = args.size();
auto &lhs_child = ArrayVector::GetEntry(args.data[0]);
auto &rhs_child = ArrayVector::GetEntry(args.data[1]);
auto &res_child = ArrayVector::GetEntry(result);
const auto &lhs_child_validity = FlatVector::Validity(lhs_child);
const auto &rhs_child_validity = FlatVector::Validity(rhs_child);
UnifiedVectorFormat lhs_format;
UnifiedVectorFormat rhs_format;
args.data[0].ToUnifiedFormat(count, lhs_format);
args.data[1].ToUnifiedFormat(count, rhs_format);
auto lhs_data = FlatVector::GetData<TYPE>(lhs_child);
auto rhs_data = FlatVector::GetData<TYPE>(rhs_child);
auto res_data = FlatVector::GetData<TYPE>(res_child);
for (idx_t i = 0; i < count; i++) {
const auto lhs_idx = lhs_format.sel->get_index(i);
const auto rhs_idx = rhs_format.sel->get_index(i);
if (!lhs_format.validity.RowIsValid(lhs_idx) || !rhs_format.validity.RowIsValid(rhs_idx)) {
FlatVector::SetNull(result, i, true);
continue;
}
const auto left_offset = lhs_idx * N;
if (!lhs_child_validity.CheckAllValid(left_offset + N, left_offset)) {
throw InvalidInputException(StringUtil::Format("%s: left argument can not contain NULL values", func_name));
}
const auto right_offset = rhs_idx * N;
if (!rhs_child_validity.CheckAllValid(right_offset + N, right_offset)) {
throw InvalidInputException(
StringUtil::Format("%s: right argument can not contain NULL values", func_name));
}
const auto result_offset = i * N;
const auto lhs_data_ptr = lhs_data + left_offset;
const auto rhs_data_ptr = rhs_data + right_offset;
const auto res_data_ptr = res_data + result_offset;
OP::Operation(lhs_data_ptr, rhs_data_ptr, res_data_ptr, N);
}
if (count == 1) {
result.SetVectorType(VectorType::CONSTANT_VECTOR);
}
}
//------------------------------------------------------------------------------
// Generic "fold" function
//------------------------------------------------------------------------------
// Given two arrays, combine and reduce their elements into a single scalar value.
template <class TYPE, class OP>
static void ArrayGenericFold(DataChunk &args, ExpressionState &state, Vector &result) {
const auto &lstate = state.Cast<ExecuteFunctionState>();
const auto &expr = lstate.expr.Cast<BoundFunctionExpression>();
const auto &func_name = expr.function.name;
const auto count = args.size();
auto &lhs_child = ArrayVector::GetEntry(args.data[0]);
auto &rhs_child = ArrayVector::GetEntry(args.data[1]);
const auto &lhs_child_validity = FlatVector::Validity(lhs_child);
const auto &rhs_child_validity = FlatVector::Validity(rhs_child);
UnifiedVectorFormat lhs_format;
UnifiedVectorFormat rhs_format;
args.data[0].ToUnifiedFormat(count, lhs_format);
args.data[1].ToUnifiedFormat(count, rhs_format);
auto lhs_data = FlatVector::GetData<TYPE>(lhs_child);
auto rhs_data = FlatVector::GetData<TYPE>(rhs_child);
auto res_data = FlatVector::GetData<TYPE>(result);
const auto array_size = ArrayType::GetSize(args.data[0].GetType());
D_ASSERT(array_size == ArrayType::GetSize(args.data[1].GetType()));
for (idx_t i = 0; i < count; i++) {
const auto lhs_idx = lhs_format.sel->get_index(i);
const auto rhs_idx = rhs_format.sel->get_index(i);
if (!lhs_format.validity.RowIsValid(lhs_idx) || !rhs_format.validity.RowIsValid(rhs_idx)) {
FlatVector::SetNull(result, i, true);
continue;
}
const auto left_offset = lhs_idx * array_size;
if (!lhs_child_validity.CheckAllValid(left_offset + array_size, left_offset)) {
throw InvalidInputException(StringUtil::Format("%s: left argument can not contain NULL values", func_name));
}
const auto right_offset = rhs_idx * array_size;
if (!rhs_child_validity.CheckAllValid(right_offset + array_size, right_offset)) {
throw InvalidInputException(
StringUtil::Format("%s: right argument can not contain NULL values", func_name));
}
const auto lhs_data_ptr = lhs_data + left_offset;
const auto rhs_data_ptr = rhs_data + right_offset;
res_data[i] = OP::Operation(lhs_data_ptr, rhs_data_ptr, array_size);
}
if (count == 1) {
result.SetVectorType(VectorType::CONSTANT_VECTOR);
}
}
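// Illustrative usage of the fold-style functions registered below (list-to-array casts assumed;
// results shown as comments):
//   SELECT array_inner_product([1.0, 2.0, 3.0]::FLOAT[3], [4.0, 5.0, 6.0]::FLOAT[3]);  -- 32.0
//   SELECT array_distance([1.0, 2.0, 3.0]::FLOAT[3], [4.0, 5.0, 6.0]::FLOAT[3]);       -- 5.196152...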
//------------------------------------------------------------------------------
// Function Registration
//------------------------------------------------------------------------------
// Note: In the future we could add a wrapper with a non-type template parameter to specialize for specific array sizes
// e.g. 256, 512, 1024, 2048 etc. which may allow the compiler to vectorize the loop better. Perhaps something for an
// extension.
template <class OP>
static void AddArrayFoldFunction(ScalarFunctionSet &set, const LogicalType &type) {
const auto array = LogicalType::ARRAY(type, optional_idx());
if (type.id() == LogicalTypeId::FLOAT) {
ScalarFunction function({array, array}, type, ArrayGenericFold<float, OP>, ArrayGenericBinaryBind);
BaseScalarFunction::SetReturnsError(function);
set.AddFunction(function);
} else if (type.id() == LogicalTypeId::DOUBLE) {
ScalarFunction function({array, array}, type, ArrayGenericFold<double, OP>, ArrayGenericBinaryBind);
BaseScalarFunction::SetReturnsError(function);
set.AddFunction(function);
} else {
throw NotImplementedException("Array function not implemented for type %s", type.ToString());
}
}
ScalarFunctionSet ArrayDistanceFun::GetFunctions() {
ScalarFunctionSet set("array_distance");
for (auto &type : LogicalType::Real()) {
AddArrayFoldFunction<DistanceOp>(set, type);
}
return set;
}
ScalarFunctionSet ArrayInnerProductFun::GetFunctions() {
ScalarFunctionSet set("array_inner_product");
for (auto &type : LogicalType::Real()) {
AddArrayFoldFunction<InnerProductOp>(set, type);
}
return set;
}
ScalarFunctionSet ArrayNegativeInnerProductFun::GetFunctions() {
ScalarFunctionSet set("array_negative_inner_product");
for (auto &type : LogicalType::Real()) {
AddArrayFoldFunction<NegativeInnerProductOp>(set, type);
}
return set;
}
ScalarFunctionSet ArrayCosineSimilarityFun::GetFunctions() {
ScalarFunctionSet set("array_cosine_similarity");
for (auto &type : LogicalType::Real()) {
AddArrayFoldFunction<CosineSimilarityOp>(set, type);
}
return set;
}
ScalarFunctionSet ArrayCosineDistanceFun::GetFunctions() {
ScalarFunctionSet set("array_cosine_distance");
for (auto &type : LogicalType::Real()) {
AddArrayFoldFunction<CosineDistanceOp>(set, type);
}
return set;
}
ScalarFunctionSet ArrayCrossProductFun::GetFunctions() {
ScalarFunctionSet set("array_cross_product");
auto float_array = LogicalType::ARRAY(LogicalType::FLOAT, 3);
auto double_array = LogicalType::ARRAY(LogicalType::DOUBLE, 3);
set.AddFunction(
ScalarFunction({float_array, float_array}, float_array, ArrayFixedCombine<float, CrossProductOp, 3>));
set.AddFunction(
ScalarFunction({double_array, double_array}, double_array, ArrayFixedCombine<double, CrossProductOp, 3>));
for (auto &func : set.functions) {
BaseScalarFunction::SetReturnsError(func);
}
return set;
}
} // namespace duckdb

View File

@@ -0,0 +1,91 @@
#include "core_functions/scalar/array_functions.hpp"
#include "duckdb/function/scalar/nested_functions.hpp"
#include "duckdb/storage/statistics/array_stats.hpp"
#include "duckdb/planner/expression/bound_function_expression.hpp"
namespace duckdb {
namespace {
void ArrayValueFunction(DataChunk &args, ExpressionState &state, Vector &result) {
auto array_type = result.GetType();
D_ASSERT(array_type.id() == LogicalTypeId::ARRAY);
D_ASSERT(args.ColumnCount() == ArrayType::GetSize(array_type));
auto &child_type = ArrayType::GetChildType(array_type);
result.SetVectorType(VectorType::CONSTANT_VECTOR);
for (idx_t i = 0; i < args.ColumnCount(); i++) {
if (args.data[i].GetVectorType() != VectorType::CONSTANT_VECTOR) {
result.SetVectorType(VectorType::FLAT_VECTOR);
}
}
auto num_rows = args.size();
auto num_columns = args.ColumnCount();
auto &child = ArrayVector::GetEntry(result);
if (num_columns > 1) {
// Ensure that the child has a validity mask of the correct size
// The SetValue call below expects the validity mask to be initialized
auto &child_validity = FlatVector::Validity(child);
child_validity.Resize(num_rows * num_columns);
}
for (idx_t i = 0; i < num_rows; i++) {
for (idx_t j = 0; j < num_columns; j++) {
auto val = args.GetValue(j, i).DefaultCastAs(child_type);
child.SetValue((i * num_columns) + j, val);
}
}
result.Verify(args.size());
}
unique_ptr<FunctionData> ArrayValueBind(ClientContext &context, ScalarFunction &bound_function,
vector<unique_ptr<Expression>> &arguments) {
if (arguments.empty()) {
throw InvalidInputException("array_value requires at least one argument");
}
// construct return type
LogicalType child_type = arguments[0]->return_type;
for (idx_t i = 1; i < arguments.size(); i++) {
child_type = LogicalType::MaxLogicalType(context, child_type, arguments[i]->return_type);
}
if (arguments.size() > ArrayType::MAX_ARRAY_SIZE) {
throw OutOfRangeException("Array size exceeds maximum allowed size");
}
// this is more for completeness reasons
bound_function.varargs = child_type;
bound_function.return_type = LogicalType::ARRAY(child_type, arguments.size());
return make_uniq<VariableReturnBindData>(bound_function.return_type);
}
unique_ptr<BaseStatistics> ArrayValueStats(ClientContext &context, FunctionStatisticsInput &input) {
auto &child_stats = input.child_stats;
auto &expr = input.expr;
auto list_stats = ArrayStats::CreateEmpty(expr.return_type);
auto &list_child_stats = ArrayStats::GetChildStats(list_stats);
for (idx_t i = 0; i < child_stats.size(); i++) {
list_child_stats.Merge(child_stats[i]);
}
return list_stats.ToUnique();
}
} // namespace
ScalarFunction ArrayValueFun::GetFunction() {
// the arguments and return types are actually set in the binder function
ScalarFunction fun("array_value", {}, LogicalTypeId::ARRAY, ArrayValueFunction, ArrayValueBind, nullptr,
ArrayValueStats);
fun.varargs = LogicalType::ANY;
fun.null_handling = FunctionNullHandling::SPECIAL_HANDLING;
return fun;
}
} // namespace duckdb

View File

@@ -0,0 +1,60 @@
[
{
"name": "array_value",
"parameters": "any,...",
"description": "Creates an `ARRAY` containing the argument values.",
"example": "array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT)",
"categories": ["array"],
"type": "scalar_function"
},
{
"name": "array_cross_product",
"parameters": "array, array",
"description": "Computes the cross product of two arrays of size 3. The array elements can not be `NULL`.",
"example": "array_cross_product(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))",
"categories": ["array"],
"type": "scalar_function_set"
},
{
"name": "array_cosine_similarity",
"parameters": "array1,array2",
"description": "Computes the cosine similarity between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments.",
"example": "array_cosine_similarity(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))",
"categories": ["array"],
"type": "scalar_function_set"
},
{
"name": "array_cosine_distance",
"parameters": "array1,array2",
"description": "Computes the cosine distance between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments.",
"example": "array_cosine_distance(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))",
"categories": ["array"],
"type": "scalar_function_set"
},
{
"name": "array_distance",
"parameters": "array1,array2",
"description": "Computes the distance between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments.",
"example": "array_distance(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))",
"categories": ["array"],
"type": "scalar_function_set"
},
{
"name": "array_inner_product",
"parameters": "array1,array2",
"description": "Computes the inner product between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments.",
"example": "array_inner_product(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))",
"categories": ["array"],
"type": "scalar_function_set",
"aliases": ["array_dot_product"]
},
{
"name": "array_negative_inner_product",
"parameters": "array1,array2",
"description": "Computes the negative inner product between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments.",
"example": "array_negative_inner_product(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))",
"categories": ["array"],
"type": "scalar_function_set",
"aliases": ["array_negative_dot_product"]
}
]

View File

@@ -0,0 +1,4 @@
add_library_unity(duckdb_core_functions_bit OBJECT bitstring.cpp)
set(CORE_FUNCTION_FILES
${CORE_FUNCTION_FILES} $<TARGET_OBJECTS:duckdb_core_functions_bit>
PARENT_SCOPE)

View File

@@ -0,0 +1,129 @@
#include "core_functions/scalar/bit_functions.hpp"
#include "duckdb/common/types/bit.hpp"
#include "duckdb/common/types/cast_helpers.hpp"
namespace duckdb {
//===--------------------------------------------------------------------===//
// BitStringFunction
//===--------------------------------------------------------------------===//
template <bool FROM_STRING>
static void BitStringFunction(DataChunk &args, ExpressionState &state, Vector &result) {
BinaryExecutor::Execute<string_t, int32_t, string_t>(
args.data[0], args.data[1], result, args.size(), [&](string_t input, int32_t n) {
if (n < 0) {
throw InvalidInputException("The bitstring length cannot be negative");
}
idx_t input_length;
if (FROM_STRING) {
input_length = input.GetSize();
} else {
input_length = Bit::BitLength(input);
}
if (idx_t(n) < input_length) {
throw InvalidInputException("Length must be equal or larger than input string");
}
idx_t len;
if (FROM_STRING) {
Bit::TryGetBitStringSize(input, len, nullptr); // string verification
}
len = Bit::ComputeBitstringLen(UnsafeNumericCast<idx_t>(n));
string_t target = StringVector::EmptyString(result, len);
if (FROM_STRING) {
Bit::BitString(input, UnsafeNumericCast<idx_t>(n), target);
} else {
Bit::ExtendBitString(input, UnsafeNumericCast<idx_t>(n), target);
}
target.Finalize();
return target;
});
}
ScalarFunctionSet BitStringFun::GetFunctions() {
ScalarFunctionSet bitstring;
bitstring.AddFunction(
ScalarFunction({LogicalType::VARCHAR, LogicalType::INTEGER}, LogicalType::BIT, BitStringFunction<true>));
bitstring.AddFunction(
ScalarFunction({LogicalType::BIT, LogicalType::INTEGER}, LogicalType::BIT, BitStringFunction<false>));
for (auto &func : bitstring.functions) {
BaseScalarFunction::SetReturnsError(func);
}
return bitstring;
}
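// Illustrative usage of the functions registered above (the result is left-padded with zeros
// up to the requested length; result shown as a comment):
//   SELECT bitstring('1010'::BIT, 7);   -- 0001010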
//===--------------------------------------------------------------------===//
// get_bit
//===--------------------------------------------------------------------===//
namespace {
struct GetBitOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA input, TB n) {
if (n < 0 || (idx_t)n > Bit::BitLength(input) - 1) {
throw OutOfRangeException("bit index %s out of valid range (0..%s)", NumericHelper::ToString(n),
NumericHelper::ToString(Bit::BitLength(input) - 1));
}
return UnsafeNumericCast<TR>(Bit::GetBit(input, UnsafeNumericCast<idx_t>(n)));
}
};
} // namespace
ScalarFunction GetBitFun::GetFunction() {
ScalarFunction func({LogicalType::BIT, LogicalType::INTEGER}, LogicalType::INTEGER,
ScalarFunction::BinaryFunction<string_t, int32_t, int32_t, GetBitOperator>);
BaseScalarFunction::SetReturnsError(func);
return func;
}
//===--------------------------------------------------------------------===//
// set_bit
//===--------------------------------------------------------------------===//
static void SetBitOperation(DataChunk &args, ExpressionState &state, Vector &result) {
TernaryExecutor::Execute<string_t, int32_t, int32_t, string_t>(
args.data[0], args.data[1], args.data[2], result, args.size(),
[&](string_t input, int32_t n, int32_t new_value) {
if (new_value != 0 && new_value != 1) {
throw InvalidInputException("The new bit must be 1 or 0");
}
if (n < 0 || (idx_t)n > Bit::BitLength(input) - 1) {
throw OutOfRangeException("bit index %s out of valid range (0..%s)", NumericHelper::ToString(n),
NumericHelper::ToString(Bit::BitLength(input) - 1));
}
string_t target = StringVector::EmptyString(result, input.GetSize());
memcpy(target.GetDataWriteable(), input.GetData(), input.GetSize());
Bit::SetBit(target, UnsafeNumericCast<idx_t>(n), UnsafeNumericCast<idx_t>(new_value));
return target;
});
}
ScalarFunction SetBitFun::GetFunction() {
ScalarFunction function({LogicalType::BIT, LogicalType::INTEGER, LogicalType::INTEGER}, LogicalType::BIT,
SetBitOperation);
BaseScalarFunction::SetReturnsError(function);
return function;
}
//===--------------------------------------------------------------------===//
// bit_position
//===--------------------------------------------------------------------===//
namespace {
struct BitPositionOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA substring, TB input) {
if (substring.GetSize() > input.GetSize()) {
return 0;
}
return UnsafeNumericCast<TR>(Bit::BitPosition(substring, input));
}
};
} // namespace
ScalarFunction BitPositionFun::GetFunction() {
return ScalarFunction({LogicalType::BIT, LogicalType::BIT}, LogicalType::INTEGER,
ScalarFunction::BinaryFunction<string_t, string_t, int32_t, BitPositionOperator>);
}
} // namespace duckdb

View File

@@ -0,0 +1,31 @@
[
{
"name": "get_bit",
"parameters": "bitstring,index",
"description": "Extracts the nth bit from bitstring; the first (leftmost) bit is indexed 0",
"example": "get_bit('0110010'::BIT, 2)",
"type": "scalar_function"
},
{
"name": "set_bit",
"parameters": "bitstring,index,new_value",
"description": "Sets the nth bit in bitstring to newvalue; the first (leftmost) bit is indexed 0. Returns a new bitstring",
"example": "set_bit('0110010'::BIT, 2, 0)",
"type": "scalar_function"
},
{
"name": "bit_position",
"parameters": "substring,bitstring",
"description": "Returns first starting index of the specified substring within bits, or zero if it is not present. The first (leftmost) bit is indexed 1",
"example": "bit_position('010'::BIT, '1110101'::BIT)",
"type": "scalar_function"
},
{
"name": "bitstring",
"parameters": "bitstring,length",
"description": "Pads the bitstring until the specified length",
"example": "bitstring('1010'::BIT, 7)",
"struct": "BitStringFun",
"type": "scalar_function_set"
}
]

View File

@@ -0,0 +1,4 @@
add_library_unity(duckdb_core_functions_blob OBJECT base64.cpp encode.cpp)
set(CORE_FUNCTION_FILES
${CORE_FUNCTION_FILES} $<TARGET_OBJECTS:duckdb_core_functions_blob>
PARENT_SCOPE)

View File

@@ -0,0 +1,50 @@
#include "core_functions/scalar/blob_functions.hpp"
#include "duckdb/common/types/blob.hpp"
namespace duckdb {
namespace {
struct Base64EncodeOperator {
template <class INPUT_TYPE, class RESULT_TYPE>
static RESULT_TYPE Operation(INPUT_TYPE input, Vector &result) {
auto result_str = StringVector::EmptyString(result, Blob::ToBase64Size(input));
Blob::ToBase64(input, result_str.GetDataWriteable());
result_str.Finalize();
return result_str;
}
};
struct Base64DecodeOperator {
template <class INPUT_TYPE, class RESULT_TYPE>
static RESULT_TYPE Operation(INPUT_TYPE input, Vector &result) {
auto result_size = Blob::FromBase64Size(input);
auto result_blob = StringVector::EmptyString(result, result_size);
Blob::FromBase64(input, data_ptr_cast(result_blob.GetDataWriteable()), result_size);
result_blob.Finalize();
return result_blob;
}
};
void Base64EncodeFunction(DataChunk &args, ExpressionState &state, Vector &result) {
// encode the blob data as a base64 string
UnaryExecutor::ExecuteString<string_t, string_t, Base64EncodeOperator>(args.data[0], result, args.size());
}
void Base64DecodeFunction(DataChunk &args, ExpressionState &state, Vector &result) {
// decode the base64-encoded string back into blob data
UnaryExecutor::ExecuteString<string_t, string_t, Base64DecodeOperator>(args.data[0], result, args.size());
}
} // namespace
ScalarFunction ToBase64Fun::GetFunction() {
return ScalarFunction({LogicalType::BLOB}, LogicalType::VARCHAR, Base64EncodeFunction);
}
ScalarFunction FromBase64Fun::GetFunction() {
ScalarFunction function({LogicalType::VARCHAR}, LogicalType::BLOB, Base64DecodeFunction);
BaseScalarFunction::SetReturnsError(function);
return function;
}
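// Illustrative usage (mirrors the examples in functions.json below; results shown as comments):
//   SELECT to_base64('A'::BLOB);    -- QQ==
//   SELECT from_base64('QQ==');     -- A (as a BLOB)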
} // namespace duckdb

View File

@@ -0,0 +1,46 @@
#include "core_functions/scalar/blob_functions.hpp"
#include "utf8proc_wrapper.hpp"
#include "duckdb/common/exception/conversion_exception.hpp"
namespace duckdb {
namespace {
void EncodeFunction(DataChunk &args, ExpressionState &state, Vector &result) {
// encode is essentially a nop cast from varchar to blob
// we only need to reinterpret the data using the blob type
result.Reinterpret(args.data[0]);
}
struct BlobDecodeOperator {
template <class INPUT_TYPE, class RESULT_TYPE>
static RESULT_TYPE Operation(INPUT_TYPE input) {
auto input_data = input.GetData();
auto input_length = input.GetSize();
if (Utf8Proc::Analyze(input_data, input_length) == UnicodeType::INVALID) {
throw ConversionException(
"Failure in decode: could not convert blob to UTF8 string, the blob contained invalid UTF8 characters");
}
return input;
}
};
void DecodeFunction(DataChunk &args, ExpressionState &state, Vector &result) {
// decode is also a nop cast, but requires verification that the provided string is actually valid UTF8
UnaryExecutor::Execute<string_t, string_t, BlobDecodeOperator>(args.data[0], result, args.size());
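// the decoded strings still point into the input vector's string heap, so keep that heap alive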
StringVector::AddHeapReference(result, args.data[0]);
}
} // namespace
ScalarFunction EncodeFun::GetFunction() {
return ScalarFunction({LogicalType::VARCHAR}, LogicalType::BLOB, EncodeFunction);
}
ScalarFunction DecodeFun::GetFunction() {
ScalarFunction function({LogicalType::BLOB}, LogicalType::VARCHAR, DecodeFunction);
BaseScalarFunction::SetReturnsError(function);
return function;
}
} // namespace duckdb

View File

@@ -0,0 +1,35 @@
[
{
"name": "decode",
"parameters": "blob",
"description": "Converts `blob` to `VARCHAR`. Fails if `blob` is not valid UTF-8.",
"example": "decode('\\xC3\\xBC'::BLOB)",
"type": "scalar_function",
"categories": ["blob"]
},
{
"name": "encode",
"parameters": "string",
"description": "Converts the `string` to `BLOB`. Converts UTF-8 characters into literal encoding.",
"example": "encode('my_string_with_\u00fc')",
"type": "scalar_function",
"categories": ["blob"]
},
{
"name": "from_base64",
"parameters": "string",
"description": "Converts a base64 encoded `string` to a character string (`BLOB`).",
"example": "from_base64('QQ==')",
"type": "scalar_function",
"categories": ["string", "blob"]
},
{
"name": "to_base64",
"parameters": "blob",
"description": "Converts a `blob` to a base64 encoded string.",
"example": "to_base64('A'::BLOB)",
"type": "scalar_function",
"categories": ["string", "blob"],
"aliases": ["base64"]
}
]

View File

@@ -0,0 +1,27 @@
add_library_unity(
duckdb_core_functions_date
OBJECT
current.cpp
age.cpp
date_diff.cpp
date_sub.cpp
to_interval.cpp
time_bucket.cpp
date_trunc.cpp
epoch.cpp
date_part.cpp
make_date.cpp)
set(CORE_FUNCTION_FILES
${CORE_FUNCTION_FILES} $<TARGET_OBJECTS:duckdb_core_functions_date>
PARENT_SCOPE)
# https://learn.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion?view=msvc-170
# Lower function inlining: /Ob2 causes an ICE with the ARM64 toolchain (both
# cross-compile and native): error MSB6006: "CL.exe" exited with code -529706956
if(MSVC
AND (CMAKE_VS_PLATFORM_NAME STREQUAL "ARM64")
AND (CMAKE_BUILD_TYPE STREQUAL "Release" OR CMAKE_BUILD_TYPE STREQUAL
"RelWithDebInfo"))
set_source_files_properties(ub_duckdb_core_functions_date
PROPERTIES COMPILE_OPTIONS /Ob1)
endif()

View File

@@ -0,0 +1,55 @@
#include "core_functions/scalar/date_functions.hpp"
#include "duckdb/common/types/interval.hpp"
#include "duckdb/common/types/time.hpp"
#include "duckdb/common/types/timestamp.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "duckdb/common/vector_operations/unary_executor.hpp"
#include "duckdb/common/vector_operations/binary_executor.hpp"
#include "duckdb/transaction/meta_transaction.hpp"
namespace duckdb {
static void AgeFunctionStandard(DataChunk &input, ExpressionState &state, Vector &result) {
D_ASSERT(input.ColumnCount() == 1);
// Subtract argument from current_date (at midnight)
// Theoretically, this should be TZ-sensitive, but since we have to be able to handle
// plain TZ when ICU is not loaded, we implement this in UTC (like everything else)
// To get the PG behaviour, we overload these functions in ICU for TSTZ arguments.
auto current_date = Timestamp::FromDatetime(
Timestamp::GetDate(MetaTransaction::Get(state.GetContext()).start_timestamp), dtime_t(0));
UnaryExecutor::ExecuteWithNulls<timestamp_t, interval_t>(input.data[0], result, input.size(),
[&](timestamp_t input, ValidityMask &mask, idx_t idx) {
if (Timestamp::IsFinite(input)) {
return Interval::GetAge(current_date, input);
} else {
mask.SetInvalid(idx);
return interval_t();
}
});
}
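// Two-argument variant: computes the symbolic interval between the two timestamps
// (first minus second); infinite inputs produce NULL.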
static void AgeFunction(DataChunk &input, ExpressionState &state, Vector &result) {
D_ASSERT(input.ColumnCount() == 2);
BinaryExecutor::ExecuteWithNulls<timestamp_t, timestamp_t, interval_t>(
input.data[0], input.data[1], result, input.size(),
[&](timestamp_t input1, timestamp_t input2, ValidityMask &mask, idx_t idx) {
if (Timestamp::IsFinite(input1) && Timestamp::IsFinite(input2)) {
return Interval::GetAge(input1, input2);
} else {
mask.SetInvalid(idx);
return interval_t();
}
});
}
ScalarFunctionSet AgeFun::GetFunctions() {
ScalarFunctionSet age("age");
age.AddFunction(ScalarFunction({LogicalType::TIMESTAMP}, LogicalType::INTERVAL, AgeFunctionStandard));
age.AddFunction(
ScalarFunction({LogicalType::TIMESTAMP, LogicalType::TIMESTAMP}, LogicalType::INTERVAL, AgeFunction));
return age;
}
} // namespace duckdb

View File

@@ -0,0 +1,30 @@
#include "core_functions/scalar/date_functions.hpp"
#include "duckdb/common/exception.hpp"
#include "duckdb/common/operator/cast_operators.hpp"
#include "duckdb/common/types/timestamp.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "duckdb/main/client_context.hpp"
#include "duckdb/planner/expression/bound_function_expression.hpp"
#include "duckdb/transaction/meta_transaction.hpp"
#include "duckdb/planner/expression/bound_cast_expression.hpp"
namespace duckdb {
static timestamp_t GetTransactionTimestamp(ExpressionState &state) {
return MetaTransaction::Get(state.GetContext()).start_timestamp;
}
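// The returned value is the transaction's start timestamp, so repeated calls within the same transaction give the same result.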
static void CurrentTimestampFunction(DataChunk &input, ExpressionState &state, Vector &result) {
D_ASSERT(input.ColumnCount() == 0);
auto ts = GetTransactionTimestamp(state);
auto val = Value::TIMESTAMPTZ(timestamp_tz_t(ts));
result.Reference(val);
}
ScalarFunction GetCurrentTimestampFun::GetFunction() {
ScalarFunction current_timestamp({}, LogicalType::TIMESTAMP_TZ, CurrentTimestampFunction);
current_timestamp.stability = FunctionStability::CONSISTENT_WITHIN_QUERY;
return current_timestamp;
}
} // namespace duckdb

View File

@@ -0,0 +1,458 @@
#include "core_functions/scalar/date_functions.hpp"
#include "duckdb/common/enums/date_part_specifier.hpp"
#include "duckdb/common/exception.hpp"
#include "duckdb/common/operator/subtract.hpp"
#include "duckdb/common/types/date.hpp"
#include "duckdb/common/types/interval.hpp"
#include "duckdb/common/types/time.hpp"
#include "duckdb/common/types/timestamp.hpp"
#include "duckdb/common/vector_operations/ternary_executor.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "duckdb/common/string_util.hpp"
namespace duckdb {
// This function is an implementation of the "period-crossing" date difference function from T-SQL
// https://docs.microsoft.com/en-us/sql/t-sql/functions/datediff-transact-sql?view=sql-server-ver15
namespace {
struct DateDiff {
template <class TA, class TB, class TR, class OP>
static inline void BinaryExecute(Vector &left, Vector &right, Vector &result, idx_t count) {
BinaryExecutor::ExecuteWithNulls<TA, TB, TR>(
left, right, result, count, [&](TA startdate, TB enddate, ValidityMask &mask, idx_t idx) {
if (Value::IsFinite(startdate) && Value::IsFinite(enddate)) {
return OP::template Operation<TA, TB, TR>(startdate, enddate);
} else {
mask.SetInvalid(idx);
return TR();
}
});
}
// We need to truncate down, not towards 0
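// e.g. with units = 1000: Truncate(-1500, 1000) == -2, whereas plain C++ division gives -1500 / 1000 == -1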
static inline int64_t Truncate(int64_t value, int64_t units) {
return (value + (value < 0)) / units - (value < 0);
}
static inline int64_t Diff(int64_t start, int64_t end, int64_t units) {
return Truncate(end, units) - Truncate(start, units);
}
struct YearOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return Date::ExtractYear(enddate) - Date::ExtractYear(startdate);
}
};
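// Count month boundary crossings by flattening (year, month) into a single month index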
struct MonthOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
int32_t start_year, start_month, start_day;
Date::Convert(startdate, start_year, start_month, start_day);
int32_t end_year, end_month, end_day;
Date::Convert(enddate, end_year, end_month, end_day);
return (end_year * 12 + end_month - 1) - (start_year * 12 + start_month - 1);
}
};
struct DayOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return TR(Date::EpochDays(enddate)) - TR(Date::EpochDays(startdate));
}
};
struct DecadeOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return Date::ExtractYear(enddate) / 10 - Date::ExtractYear(startdate) / 10;
}
};
struct CenturyOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return Date::ExtractYear(enddate) / 100 - Date::ExtractYear(startdate) / 100;
}
};
struct MilleniumOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return Date::ExtractYear(enddate) / 1000 - Date::ExtractYear(startdate) / 1000;
}
};
struct QuarterOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
int32_t start_year, start_month, start_day;
Date::Convert(startdate, start_year, start_month, start_day);
int32_t end_year, end_month, end_day;
Date::Convert(enddate, end_year, end_month, end_day);
return (end_year * 12 + end_month - 1) / Interval::MONTHS_PER_QUARTER -
(start_year * 12 + start_month - 1) / Interval::MONTHS_PER_QUARTER;
}
};
struct WeekOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
// Weeks do not count Monday crossings, just distance
return (enddate.days - startdate.days) / Interval::DAYS_PER_WEEK;
}
};
struct ISOYearOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return Date::ExtractISOYearNumber(enddate) - Date::ExtractISOYearNumber(startdate);
}
};
struct MicrosecondsOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return Date::EpochMicroseconds(enddate) - Date::EpochMicroseconds(startdate);
}
};
struct MillisecondsOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return Date::EpochMicroseconds(enddate) / Interval::MICROS_PER_MSEC -
Date::EpochMicroseconds(startdate) / Interval::MICROS_PER_MSEC;
}
};
struct SecondsOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return Date::Epoch(enddate) - Date::Epoch(startdate);
}
};
struct MinutesOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return Date::Epoch(enddate) / Interval::SECS_PER_MINUTE -
Date::Epoch(startdate) / Interval::SECS_PER_MINUTE;
}
};
struct HoursOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return Date::Epoch(enddate) / Interval::SECS_PER_HOUR - Date::Epoch(startdate) / Interval::SECS_PER_HOUR;
}
};
};
// TIMESTAMP specialisations
template <>
int64_t DateDiff::YearOperator::Operation(timestamp_t startdate, timestamp_t enddate) {
return YearOperator::Operation<date_t, date_t, int64_t>(Timestamp::GetDate(startdate), Timestamp::GetDate(enddate));
}
template <>
int64_t DateDiff::MonthOperator::Operation(timestamp_t startdate, timestamp_t enddate) {
return MonthOperator::Operation<date_t, date_t, int64_t>(Timestamp::GetDate(startdate),
Timestamp::GetDate(enddate));
}
template <>
int64_t DateDiff::DayOperator::Operation(timestamp_t startdate, timestamp_t enddate) {
return DayOperator::Operation<date_t, date_t, int64_t>(Timestamp::GetDate(startdate), Timestamp::GetDate(enddate));
}
template <>
int64_t DateDiff::DecadeOperator::Operation(timestamp_t startdate, timestamp_t enddate) {
return DecadeOperator::Operation<date_t, date_t, int64_t>(Timestamp::GetDate(startdate),
Timestamp::GetDate(enddate));
}
template <>
int64_t DateDiff::CenturyOperator::Operation(timestamp_t startdate, timestamp_t enddate) {
return CenturyOperator::Operation<date_t, date_t, int64_t>(Timestamp::GetDate(startdate),
Timestamp::GetDate(enddate));
}
template <>
int64_t DateDiff::MilleniumOperator::Operation(timestamp_t startdate, timestamp_t enddate) {
return MilleniumOperator::Operation<date_t, date_t, int64_t>(Timestamp::GetDate(startdate),
Timestamp::GetDate(enddate));
}
template <>
int64_t DateDiff::QuarterOperator::Operation(timestamp_t startdate, timestamp_t enddate) {
return QuarterOperator::Operation<date_t, date_t, int64_t>(Timestamp::GetDate(startdate),
Timestamp::GetDate(enddate));
}
template <>
int64_t DateDiff::WeekOperator::Operation(timestamp_t startdate, timestamp_t enddate) {
return WeekOperator::Operation<date_t, date_t, int64_t>(Timestamp::GetDate(startdate), Timestamp::GetDate(enddate));
}
template <>
int64_t DateDiff::ISOYearOperator::Operation(timestamp_t startdate, timestamp_t enddate) {
return ISOYearOperator::Operation<date_t, date_t, int64_t>(Timestamp::GetDate(startdate),
Timestamp::GetDate(enddate));
}
template <>
int64_t DateDiff::MicrosecondsOperator::Operation(timestamp_t startdate, timestamp_t enddate) {
const auto start = Timestamp::GetEpochMicroSeconds(startdate);
const auto end = Timestamp::GetEpochMicroSeconds(enddate);
return SubtractOperatorOverflowCheck::Operation<int64_t, int64_t, int64_t>(end, start);
}
template <>
int64_t DateDiff::MillisecondsOperator::Operation(timestamp_t startdate, timestamp_t enddate) {
D_ASSERT(Timestamp::IsFinite(startdate));
D_ASSERT(Timestamp::IsFinite(enddate));
return Diff(startdate.value, enddate.value, Interval::MICROS_PER_MSEC);
}
template <>
int64_t DateDiff::SecondsOperator::Operation(timestamp_t startdate, timestamp_t enddate) {
D_ASSERT(Timestamp::IsFinite(startdate));
D_ASSERT(Timestamp::IsFinite(enddate));
return Diff(startdate.value, enddate.value, Interval::MICROS_PER_SEC);
}
template <>
int64_t DateDiff::MinutesOperator::Operation(timestamp_t startdate, timestamp_t enddate) {
D_ASSERT(Timestamp::IsFinite(startdate));
D_ASSERT(Timestamp::IsFinite(enddate));
return Diff(startdate.value, enddate.value, Interval::MICROS_PER_MINUTE);
}
template <>
int64_t DateDiff::HoursOperator::Operation(timestamp_t startdate, timestamp_t enddate) {
D_ASSERT(Timestamp::IsFinite(startdate));
D_ASSERT(Timestamp::IsFinite(enddate));
return Diff(startdate.value, enddate.value, Interval::MICROS_PER_HOUR);
}
// TIME specialisations
template <>
int64_t DateDiff::YearOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"year\" not recognized");
}
template <>
int64_t DateDiff::MonthOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"month\" not recognized");
}
template <>
int64_t DateDiff::DayOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"day\" not recognized");
}
template <>
int64_t DateDiff::DecadeOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"decade\" not recognized");
}
template <>
int64_t DateDiff::CenturyOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"century\" not recognized");
}
template <>
int64_t DateDiff::MilleniumOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"millennium\" not recognized");
}
template <>
int64_t DateDiff::QuarterOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"quarter\" not recognized");
}
template <>
int64_t DateDiff::WeekOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"week\" not recognized");
}
template <>
int64_t DateDiff::ISOYearOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"isoyear\" not recognized");
}
template <>
int64_t DateDiff::MicrosecondsOperator::Operation(dtime_t startdate, dtime_t enddate) {
return enddate.micros - startdate.micros;
}
template <>
int64_t DateDiff::MillisecondsOperator::Operation(dtime_t startdate, dtime_t enddate) {
return enddate.micros / Interval::MICROS_PER_MSEC - startdate.micros / Interval::MICROS_PER_MSEC;
}
template <>
int64_t DateDiff::SecondsOperator::Operation(dtime_t startdate, dtime_t enddate) {
return enddate.micros / Interval::MICROS_PER_SEC - startdate.micros / Interval::MICROS_PER_SEC;
}
template <>
int64_t DateDiff::MinutesOperator::Operation(dtime_t startdate, dtime_t enddate) {
return enddate.micros / Interval::MICROS_PER_MINUTE - startdate.micros / Interval::MICROS_PER_MINUTE;
}
template <>
int64_t DateDiff::HoursOperator::Operation(dtime_t startdate, dtime_t enddate) {
return enddate.micros / Interval::MICROS_PER_HOUR - startdate.micros / Interval::MICROS_PER_HOUR;
}
template <typename TA, typename TB, typename TR>
int64_t DifferenceDates(DatePartSpecifier type, TA startdate, TB enddate) {
switch (type) {
case DatePartSpecifier::YEAR:
return DateDiff::YearOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::MONTH:
return DateDiff::MonthOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::DAY:
case DatePartSpecifier::DOW:
case DatePartSpecifier::ISODOW:
case DatePartSpecifier::DOY:
case DatePartSpecifier::JULIAN_DAY:
return DateDiff::DayOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::DECADE:
return DateDiff::DecadeOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::CENTURY:
return DateDiff::CenturyOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::MILLENNIUM:
return DateDiff::MilleniumOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::QUARTER:
return DateDiff::QuarterOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::WEEK:
case DatePartSpecifier::YEARWEEK:
return DateDiff::WeekOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::ISOYEAR:
return DateDiff::ISOYearOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::MICROSECONDS:
return DateDiff::MicrosecondsOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::MILLISECONDS:
return DateDiff::MillisecondsOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::SECOND:
case DatePartSpecifier::EPOCH:
return DateDiff::SecondsOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::MINUTE:
return DateDiff::MinutesOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::HOUR:
return DateDiff::HoursOperator::template Operation<TA, TB, TR>(startdate, enddate);
default:
throw NotImplementedException("Specifier type not implemented for DATEDIFF");
}
}
struct DateDiffTernaryOperator {
template <typename TS, typename TA, typename TB, typename TR>
static inline TR Operation(TS part, TA startdate, TB enddate, ValidityMask &mask, idx_t idx) {
if (Value::IsFinite(startdate) && Value::IsFinite(enddate)) {
return DifferenceDates<TA, TB, TR>(GetDatePartSpecifier(part.GetString()), startdate, enddate);
} else {
mask.SetInvalid(idx);
return TR();
}
}
};
template <typename TA, typename TB, typename TR>
void DateDiffBinaryExecutor(DatePartSpecifier type, Vector &left, Vector &right, Vector &result, idx_t count) {
switch (type) {
case DatePartSpecifier::YEAR:
DateDiff::BinaryExecute<TA, TB, TR, DateDiff::YearOperator>(left, right, result, count);
break;
case DatePartSpecifier::MONTH:
DateDiff::BinaryExecute<TA, TB, TR, DateDiff::MonthOperator>(left, right, result, count);
break;
case DatePartSpecifier::DAY:
case DatePartSpecifier::DOW:
case DatePartSpecifier::ISODOW:
case DatePartSpecifier::DOY:
case DatePartSpecifier::JULIAN_DAY:
DateDiff::BinaryExecute<TA, TB, TR, DateDiff::DayOperator>(left, right, result, count);
break;
case DatePartSpecifier::DECADE:
DateDiff::BinaryExecute<TA, TB, TR, DateDiff::DecadeOperator>(left, right, result, count);
break;
case DatePartSpecifier::CENTURY:
DateDiff::BinaryExecute<TA, TB, TR, DateDiff::CenturyOperator>(left, right, result, count);
break;
case DatePartSpecifier::MILLENNIUM:
DateDiff::BinaryExecute<TA, TB, TR, DateDiff::MilleniumOperator>(left, right, result, count);
break;
case DatePartSpecifier::QUARTER:
DateDiff::BinaryExecute<TA, TB, TR, DateDiff::QuarterOperator>(left, right, result, count);
break;
case DatePartSpecifier::WEEK:
case DatePartSpecifier::YEARWEEK:
DateDiff::BinaryExecute<TA, TB, TR, DateDiff::WeekOperator>(left, right, result, count);
break;
case DatePartSpecifier::ISOYEAR:
DateDiff::BinaryExecute<TA, TB, TR, DateDiff::ISOYearOperator>(left, right, result, count);
break;
case DatePartSpecifier::MICROSECONDS:
DateDiff::BinaryExecute<TA, TB, TR, DateDiff::MicrosecondsOperator>(left, right, result, count);
break;
case DatePartSpecifier::MILLISECONDS:
DateDiff::BinaryExecute<TA, TB, TR, DateDiff::MillisecondsOperator>(left, right, result, count);
break;
case DatePartSpecifier::SECOND:
case DatePartSpecifier::EPOCH:
DateDiff::BinaryExecute<TA, TB, TR, DateDiff::SecondsOperator>(left, right, result, count);
break;
case DatePartSpecifier::MINUTE:
DateDiff::BinaryExecute<TA, TB, TR, DateDiff::MinutesOperator>(left, right, result, count);
break;
case DatePartSpecifier::HOUR:
DateDiff::BinaryExecute<TA, TB, TR, DateDiff::HoursOperator>(left, right, result, count);
break;
default:
throw NotImplementedException("Specifier type not implemented for DATEDIFF");
}
}
template <typename T>
void DateDiffFunction(DataChunk &args, ExpressionState &state, Vector &result) {
D_ASSERT(args.ColumnCount() == 3);
auto &part_arg = args.data[0];
auto &start_arg = args.data[1];
auto &end_arg = args.data[2];
if (part_arg.GetVectorType() == VectorType::CONSTANT_VECTOR) {
// Common case of constant part.
if (ConstantVector::IsNull(part_arg)) {
result.SetVectorType(VectorType::CONSTANT_VECTOR);
ConstantVector::SetNull(result, true);
} else {
const auto type = GetDatePartSpecifier(ConstantVector::GetData<string_t>(part_arg)->GetString());
DateDiffBinaryExecutor<T, T, int64_t>(type, start_arg, end_arg, result, args.size());
}
} else {
TernaryExecutor::ExecuteWithNulls<string_t, T, T, int64_t>(
part_arg, start_arg, end_arg, result, args.size(),
DateDiffTernaryOperator::Operation<string_t, T, T, int64_t>);
}
}
} // namespace
ScalarFunctionSet DateDiffFun::GetFunctions() {
ScalarFunctionSet date_diff("date_diff");
date_diff.AddFunction(ScalarFunction({LogicalType::VARCHAR, LogicalType::DATE, LogicalType::DATE},
LogicalType::BIGINT, DateDiffFunction<date_t>));
date_diff.AddFunction(ScalarFunction({LogicalType::VARCHAR, LogicalType::TIMESTAMP, LogicalType::TIMESTAMP},
LogicalType::BIGINT, DateDiffFunction<timestamp_t>));
date_diff.AddFunction(ScalarFunction({LogicalType::VARCHAR, LogicalType::TIME, LogicalType::TIME},
LogicalType::BIGINT, DateDiffFunction<dtime_t>));
return date_diff;
}
} // namespace duckdb

File diff suppressed because it is too large

View File

@@ -0,0 +1,458 @@
#include "core_functions/scalar/date_functions.hpp"
#include "duckdb/common/enums/date_part_specifier.hpp"
#include "duckdb/common/exception.hpp"
#include "duckdb/common/operator/subtract.hpp"
#include "duckdb/common/types/date.hpp"
#include "duckdb/common/types/interval.hpp"
#include "duckdb/common/types/time.hpp"
#include "duckdb/common/types/timestamp.hpp"
#include "duckdb/common/vector_operations/ternary_executor.hpp"
#include "duckdb/common/vector_operations/vector_operations.hpp"
#include "duckdb/common/string_util.hpp"
namespace duckdb {
namespace {
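// date_sub counts the number of *complete* units between two instants,
// unlike date_diff, which counts partition-boundary crossings.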
struct DateSub {
static int64_t SubtractMicros(timestamp_t startdate, timestamp_t enddate) {
const auto start = Timestamp::GetEpochMicroSeconds(startdate);
const auto end = Timestamp::GetEpochMicroSeconds(enddate);
return SubtractOperatorOverflowCheck::Operation<int64_t, int64_t, int64_t>(end, start);
}
template <class TA, class TB, class TR, class OP>
static inline void BinaryExecute(Vector &left, Vector &right, Vector &result, idx_t count) {
BinaryExecutor::ExecuteWithNulls<TA, TB, TR>(
left, right, result, count, [&](TA startdate, TB enddate, ValidityMask &mask, idx_t idx) {
if (Value::IsFinite(startdate) && Value::IsFinite(enddate)) {
return OP::template Operation<TA, TB, TR>(startdate, enddate);
} else {
mask.SetInvalid(idx);
return TR();
}
});
}
struct MonthOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA start_ts, TB end_ts) {
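// normalize so that start <= end; negate the result for reversed arguments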
if (start_ts > end_ts) {
return -MonthOperator::Operation<TA, TB, TR>(end_ts, start_ts);
}
// The number of complete months depends on whether end_ts is on the last day of the month.
date_t end_date;
dtime_t end_time;
Timestamp::Convert(end_ts, end_date, end_time);
int32_t yyyy, mm, dd;
Date::Convert(end_date, yyyy, mm, dd);
const auto end_days = Date::MonthDays(yyyy, mm);
if (end_days == dd) {
// Now check whether the start day is after the end day
date_t start_date;
dtime_t start_time;
Timestamp::Convert(start_ts, start_date, start_time);
Date::Convert(start_date, yyyy, mm, dd);
if (dd > end_days || (dd == end_days && start_time < end_time)) {
// Move back to the same time on the last day of the (shorter) end month
start_date = Date::FromDate(yyyy, mm, end_days);
start_ts = Timestamp::FromDatetime(start_date, start_time);
}
}
// Our interval difference will now give the correct result.
// Note that PG gives different interval subtraction results,
// so if we change this we will have to reimplement.
return Interval::GetAge(end_ts, start_ts).months;
}
};
struct QuarterOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA start_ts, TB end_ts) {
return MonthOperator::Operation<TA, TB, TR>(start_ts, end_ts) / Interval::MONTHS_PER_QUARTER;
}
};
struct YearOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA start_ts, TB end_ts) {
return MonthOperator::Operation<TA, TB, TR>(start_ts, end_ts) / Interval::MONTHS_PER_YEAR;
}
};
struct DecadeOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA start_ts, TB end_ts) {
return MonthOperator::Operation<TA, TB, TR>(start_ts, end_ts) / Interval::MONTHS_PER_DECADE;
}
};
struct CenturyOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA start_ts, TB end_ts) {
return MonthOperator::Operation<TA, TB, TR>(start_ts, end_ts) / Interval::MONTHS_PER_CENTURY;
}
};
struct MilleniumOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA start_ts, TB end_ts) {
return MonthOperator::Operation<TA, TB, TR>(start_ts, end_ts) / Interval::MONTHS_PER_MILLENIUM;
}
};
struct DayOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return SubtractMicros(startdate, enddate) / Interval::MICROS_PER_DAY;
}
};
struct WeekOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return SubtractMicros(startdate, enddate) / Interval::MICROS_PER_WEEK;
}
};
struct MicrosecondsOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return SubtractMicros(startdate, enddate);
}
};
struct MillisecondsOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return SubtractMicros(startdate, enddate) / Interval::MICROS_PER_MSEC;
}
};
struct SecondsOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return SubtractMicros(startdate, enddate) / Interval::MICROS_PER_SEC;
}
};
struct MinutesOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return SubtractMicros(startdate, enddate) / Interval::MICROS_PER_MINUTE;
}
};
struct HoursOperator {
template <class TA, class TB, class TR>
static inline TR Operation(TA startdate, TB enddate) {
return SubtractMicros(startdate, enddate) / Interval::MICROS_PER_HOUR;
}
};
};
// DATE specialisations
template <>
int64_t DateSub::YearOperator::Operation(date_t startdate, date_t enddate) {
dtime_t t0(0);
return YearOperator::Operation<timestamp_t, timestamp_t, int64_t>(Timestamp::FromDatetime(startdate, t0),
Timestamp::FromDatetime(enddate, t0));
}
template <>
int64_t DateSub::MonthOperator::Operation(date_t startdate, date_t enddate) {
dtime_t t0(0);
return MonthOperator::Operation<timestamp_t, timestamp_t, int64_t>(Timestamp::FromDatetime(startdate, t0),
Timestamp::FromDatetime(enddate, t0));
}
template <>
int64_t DateSub::DayOperator::Operation(date_t startdate, date_t enddate) {
dtime_t t0(0);
return DayOperator::Operation<timestamp_t, timestamp_t, int64_t>(Timestamp::FromDatetime(startdate, t0),
Timestamp::FromDatetime(enddate, t0));
}
template <>
int64_t DateSub::DecadeOperator::Operation(date_t startdate, date_t enddate) {
dtime_t t0(0);
return DecadeOperator::Operation<timestamp_t, timestamp_t, int64_t>(Timestamp::FromDatetime(startdate, t0),
Timestamp::FromDatetime(enddate, t0));
}
template <>
int64_t DateSub::CenturyOperator::Operation(date_t startdate, date_t enddate) {
dtime_t t0(0);
return CenturyOperator::Operation<timestamp_t, timestamp_t, int64_t>(Timestamp::FromDatetime(startdate, t0),
Timestamp::FromDatetime(enddate, t0));
}
template <>
int64_t DateSub::MilleniumOperator::Operation(date_t startdate, date_t enddate) {
dtime_t t0(0);
return MilleniumOperator::Operation<timestamp_t, timestamp_t, int64_t>(Timestamp::FromDatetime(startdate, t0),
Timestamp::FromDatetime(enddate, t0));
}
template <>
int64_t DateSub::QuarterOperator::Operation(date_t startdate, date_t enddate) {
dtime_t t0(0);
return QuarterOperator::Operation<timestamp_t, timestamp_t, int64_t>(Timestamp::FromDatetime(startdate, t0),
Timestamp::FromDatetime(enddate, t0));
}
template <>
int64_t DateSub::WeekOperator::Operation(date_t startdate, date_t enddate) {
dtime_t t0(0);
return WeekOperator::Operation<timestamp_t, timestamp_t, int64_t>(Timestamp::FromDatetime(startdate, t0),
Timestamp::FromDatetime(enddate, t0));
}
template <>
int64_t DateSub::MicrosecondsOperator::Operation(date_t startdate, date_t enddate) {
dtime_t t0(0);
return MicrosecondsOperator::Operation<timestamp_t, timestamp_t, int64_t>(Timestamp::FromDatetime(startdate, t0),
Timestamp::FromDatetime(enddate, t0));
}
template <>
int64_t DateSub::MillisecondsOperator::Operation(date_t startdate, date_t enddate) {
dtime_t t0(0);
return MillisecondsOperator::Operation<timestamp_t, timestamp_t, int64_t>(Timestamp::FromDatetime(startdate, t0),
Timestamp::FromDatetime(enddate, t0));
}
template <>
int64_t DateSub::SecondsOperator::Operation(date_t startdate, date_t enddate) {
dtime_t t0(0);
return SecondsOperator::Operation<timestamp_t, timestamp_t, int64_t>(Timestamp::FromDatetime(startdate, t0),
Timestamp::FromDatetime(enddate, t0));
}
template <>
int64_t DateSub::MinutesOperator::Operation(date_t startdate, date_t enddate) {
dtime_t t0(0);
return MinutesOperator::Operation<timestamp_t, timestamp_t, int64_t>(Timestamp::FromDatetime(startdate, t0),
Timestamp::FromDatetime(enddate, t0));
}
template <>
int64_t DateSub::HoursOperator::Operation(date_t startdate, date_t enddate) {
dtime_t t0(0);
return HoursOperator::Operation<timestamp_t, timestamp_t, int64_t>(Timestamp::FromDatetime(startdate, t0),
Timestamp::FromDatetime(enddate, t0));
}
// TIME specialisations
template <>
int64_t DateSub::YearOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"year\" not recognized");
}
template <>
int64_t DateSub::MonthOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"month\" not recognized");
}
template <>
int64_t DateSub::DayOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"day\" not recognized");
}
template <>
int64_t DateSub::DecadeOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"decade\" not recognized");
}
template <>
int64_t DateSub::CenturyOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"century\" not recognized");
}
template <>
int64_t DateSub::MilleniumOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"millennium\" not recognized");
}
template <>
int64_t DateSub::QuarterOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"quarter\" not recognized");
}
template <>
int64_t DateSub::WeekOperator::Operation(dtime_t startdate, dtime_t enddate) {
throw NotImplementedException("\"time\" units \"week\" not recognized");
}
template <>
int64_t DateSub::MicrosecondsOperator::Operation(dtime_t startdate, dtime_t enddate) {
return enddate.micros - startdate.micros;
}
template <>
int64_t DateSub::MillisecondsOperator::Operation(dtime_t startdate, dtime_t enddate) {
return (enddate.micros - startdate.micros) / Interval::MICROS_PER_MSEC;
}
template <>
int64_t DateSub::SecondsOperator::Operation(dtime_t startdate, dtime_t enddate) {
return (enddate.micros - startdate.micros) / Interval::MICROS_PER_SEC;
}
template <>
int64_t DateSub::MinutesOperator::Operation(dtime_t startdate, dtime_t enddate) {
return (enddate.micros - startdate.micros) / Interval::MICROS_PER_MINUTE;
}
template <>
int64_t DateSub::HoursOperator::Operation(dtime_t startdate, dtime_t enddate) {
return (enddate.micros - startdate.micros) / Interval::MICROS_PER_HOUR;
}
template <typename TA, typename TB, typename TR>
int64_t SubtractDateParts(DatePartSpecifier type, TA startdate, TB enddate) {
switch (type) {
case DatePartSpecifier::YEAR:
case DatePartSpecifier::ISOYEAR:
return DateSub::YearOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::MONTH:
return DateSub::MonthOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::DAY:
case DatePartSpecifier::DOW:
case DatePartSpecifier::ISODOW:
case DatePartSpecifier::DOY:
case DatePartSpecifier::JULIAN_DAY:
return DateSub::DayOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::DECADE:
return DateSub::DecadeOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::CENTURY:
return DateSub::CenturyOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::MILLENNIUM:
return DateSub::MilleniumOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::QUARTER:
return DateSub::QuarterOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::WEEK:
case DatePartSpecifier::YEARWEEK:
return DateSub::WeekOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::MICROSECONDS:
return DateSub::MicrosecondsOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::MILLISECONDS:
return DateSub::MillisecondsOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::SECOND:
case DatePartSpecifier::EPOCH:
return DateSub::SecondsOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::MINUTE:
return DateSub::MinutesOperator::template Operation<TA, TB, TR>(startdate, enddate);
case DatePartSpecifier::HOUR:
return DateSub::HoursOperator::template Operation<TA, TB, TR>(startdate, enddate);
default:
throw NotImplementedException("Specifier type not implemented for DATESUB");
}
}
struct DateSubTernaryOperator {
template <typename TS, typename TA, typename TB, typename TR>
static inline TR Operation(TS part, TA startdate, TB enddate, ValidityMask &mask, idx_t idx) {
if (Value::IsFinite(startdate) && Value::IsFinite(enddate)) {
return SubtractDateParts<TA, TB, TR>(GetDatePartSpecifier(part.GetString()), startdate, enddate);
} else {
mask.SetInvalid(idx);
return TR();
}
}
};
template <typename TA, typename TB, typename TR>
void DateSubBinaryExecutor(DatePartSpecifier type, Vector &left, Vector &right, Vector &result, idx_t count) {
switch (type) {
case DatePartSpecifier::YEAR:
case DatePartSpecifier::ISOYEAR:
DateSub::BinaryExecute<TA, TB, TR, DateSub::YearOperator>(left, right, result, count);
break;
case DatePartSpecifier::MONTH:
DateSub::BinaryExecute<TA, TB, TR, DateSub::MonthOperator>(left, right, result, count);
break;
case DatePartSpecifier::DAY:
case DatePartSpecifier::DOW:
case DatePartSpecifier::ISODOW:
case DatePartSpecifier::DOY:
case DatePartSpecifier::JULIAN_DAY:
DateSub::BinaryExecute<TA, TB, TR, DateSub::DayOperator>(left, right, result, count);
break;
case DatePartSpecifier::DECADE:
DateSub::BinaryExecute<TA, TB, TR, DateSub::DecadeOperator>(left, right, result, count);
break;
case DatePartSpecifier::CENTURY:
DateSub::BinaryExecute<TA, TB, TR, DateSub::CenturyOperator>(left, right, result, count);
break;
case DatePartSpecifier::MILLENNIUM:
DateSub::BinaryExecute<TA, TB, TR, DateSub::MilleniumOperator>(left, right, result, count);
break;
case DatePartSpecifier::QUARTER:
DateSub::BinaryExecute<TA, TB, TR, DateSub::QuarterOperator>(left, right, result, count);
break;
case DatePartSpecifier::WEEK:
case DatePartSpecifier::YEARWEEK:
DateSub::BinaryExecute<TA, TB, TR, DateSub::WeekOperator>(left, right, result, count);
break;
case DatePartSpecifier::MICROSECONDS:
DateSub::BinaryExecute<TA, TB, TR, DateSub::MicrosecondsOperator>(left, right, result, count);
break;
case DatePartSpecifier::MILLISECONDS:
DateSub::BinaryExecute<TA, TB, TR, DateSub::MillisecondsOperator>(left, right, result, count);
break;
case DatePartSpecifier::SECOND:
case DatePartSpecifier::EPOCH:
DateSub::BinaryExecute<TA, TB, TR, DateSub::SecondsOperator>(left, right, result, count);
break;
case DatePartSpecifier::MINUTE:
DateSub::BinaryExecute<TA, TB, TR, DateSub::MinutesOperator>(left, right, result, count);
break;
case DatePartSpecifier::HOUR:
DateSub::BinaryExecute<TA, TB, TR, DateSub::HoursOperator>(left, right, result, count);
break;
default:
throw NotImplementedException("Specifier type not implemented for DATESUB");
}
}
template <typename T>
void DateSubFunction(DataChunk &args, ExpressionState &state, Vector &result) {
D_ASSERT(args.ColumnCount() == 3);
auto &part_arg = args.data[0];
auto &start_arg = args.data[1];
auto &end_arg = args.data[2];
if (part_arg.GetVectorType() == VectorType::CONSTANT_VECTOR) {
// Common case of constant part.
if (ConstantVector::IsNull(part_arg)) {
result.SetVectorType(VectorType::CONSTANT_VECTOR);
ConstantVector::SetNull(result, true);
} else {
const auto type = GetDatePartSpecifier(ConstantVector::GetData<string_t>(part_arg)->GetString());
DateSubBinaryExecutor<T, T, int64_t>(type, start_arg, end_arg, result, args.size());
}
} else {
TernaryExecutor::ExecuteWithNulls<string_t, T, T, int64_t>(
part_arg, start_arg, end_arg, result, args.size(),
DateSubTernaryOperator::Operation<string_t, T, T, int64_t>);
}
}
} // namespace
ScalarFunctionSet DateSubFun::GetFunctions() {
ScalarFunctionSet date_sub("date_sub");
date_sub.AddFunction(ScalarFunction({LogicalType::VARCHAR, LogicalType::DATE, LogicalType::DATE},
LogicalType::BIGINT, DateSubFunction<date_t>));
date_sub.AddFunction(ScalarFunction({LogicalType::VARCHAR, LogicalType::TIMESTAMP, LogicalType::TIMESTAMP},
LogicalType::BIGINT, DateSubFunction<timestamp_t>));
date_sub.AddFunction(ScalarFunction({LogicalType::VARCHAR, LogicalType::TIME, LogicalType::TIME},
LogicalType::BIGINT, DateSubFunction<dtime_t>));
return date_sub;
}
} // namespace duckdb

Some files were not shown because too many files have changed in this diff