docs for the database utils brought to you 100% by claude

# Database Utilities – Grade Snapshot System

## Overview

This module implements a **ClickHouse-backed grade snapshot and diffing system**. It ingests grade data from an external API, persists **immutable snapshots**, tracks **stable entities** (users, classes, assignments), and computes **changes over time** (new / updated / removed grades).

The design emphasizes:

* **Idempotent ingestion**
* **Historical accuracy**
* **Efficient change detection**
* **Append-only semantics** (ClickHouse-friendly)

All functionality lives in the `database_utils` namespace.

---

## Core Concepts

### 1. Stable Entities vs Snapshots

| Concept                 | Description                                                |
| ----------------------- | ---------------------------------------------------------- |
| **User**                | A logical account                                          |
| **Class**               | A course belonging to a user (stable across time)          |
| **Assignment**          | A specific graded item within a class (stable across time) |
| **Snapshot (response)** | A point-in-time capture of all grades returned by the API  |
| **Grade history**       | Per-assignment grades linked to a snapshot                 |
| **Diffs**               | Computed changes between two snapshots                     |

Stable entities are **created once and reused**. Snapshots are **immutable** and **time-ordered**.

---

## UUID Utilities

### `parse_uuid(string) → UUID`

Parses a standard UUID string (`xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`) into ClickHouse’s `UUID { high, low }` format.

* Validates format
* Throws on malformed input
* Used everywhere UUIDs enter the DB

### `uuid_to_string(UUID) → string`

Converts ClickHouse UUIDs back into standard string format.

---

## Date Handling

### `parse_date_to_clickhouse(string) → uint16_t`

Converts an API date string (`YYYY-MM-DD`) into ClickHouse `Date` format (days since Unix epoch).

* Empty or invalid dates → epoch (`0`)
* Logs parsing failures instead of throwing

---

## Database Handle

```cpp
using CHClient = std::shared_ptr<clickhouse::Client>;
```

All functions accept a shared ClickHouse client to allow:

* Connection reuse
* Shared ownership across callers (`clickhouse::Client` itself is not thread-safe, so concurrent use from several threads needs external synchronization or one client per thread)
* Easy dependency injection

---

## User Operations

### `get_all_users()`

Returns all users with:

* `user_id`
* `username`
* `password`

Primarily for administration/debugging.

---

### `register_user(username, password)`

Inserts a new user row.

⚠️ **Note:** Passwords are currently stored in plaintext. Hashing should be added before production use.

---

### `authenticate_user(username, password)`

Returns `true` if a user with the given username and password exists.

* Uses `count()` for a minimal result payload
* Simple boolean authentication check

---

### `get_user_uuid(username)`

Returns the user’s UUID if found, or an empty `std::optional` otherwise.

---

## Snapshot Insertion Flow

### `insert_grade_snapshot(user_id, api_response) → response_id`

This is the **main ingestion pipeline**.

#### Step-by-Step Flow

1. **Insert `grade_responses`**
   * One row per API fetch
   * Contains metadata (`success`, `total_classes`, timestamp)
2. **Fetch the generated `response_id`**
   * Reads back the most recent response for that user (this assumes no other ingestion for the same user runs concurrently)
3. **Process each class**
   * Calls `get_or_create_class()` to ensure a stable `class_id`
   * Links the class to the response (`response_classes`)
4. **Process assignments**
   * Calls `get_or_create_assignment()` to ensure a stable `assignment_id`
5. **Insert grade history**
   * Batched inserts into `assignment_grade_history`
   * Each row ties together:
     * response
     * assignment
     * score
     * attempts

Snapshots are **never updated**, only appended.
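
Step 5 batches grade-history rows instead of inserting them one at a time. The sketch below shows that accumulation under stated assumptions: class and assignment IDs are already resolved, and scores are numeric. `ResolvedClass`, `ResolvedAssignment`, `GradeHistoryBatch`, and `build_grade_history_batch` are hypothetical names. The column set is inferred from the row description above (response, assignment, score, attempts), not taken from the real table schema.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical pre-resolved input: IDs were already obtained via
// get_or_create_class() / get_or_create_assignment().
struct ResolvedAssignment {
    std::string assignment_id;
    double score;       // assumed numeric; the real score type may differ
    uint32_t attempts;
};

struct ResolvedClass {
    std::string class_id;
    std::vector<ResolvedAssignment> assignments;
};

// Column-oriented buffer: one vector per assignment_grade_history column,
// so the whole snapshot can be sent as a single batched insert.
struct GradeHistoryBatch {
    std::vector<std::string> response_id;
    std::vector<std::string> assignment_id;
    std::vector<double> score;
    std::vector<uint32_t> attempts;

    std::size_t size() const { return assignment_id.size(); }
};

GradeHistoryBatch build_grade_history_batch(const std::string& response_id,
                                            const std::vector<ResolvedClass>& classes) {
    GradeHistoryBatch batch;
    for (const auto& cls : classes) {
        for (const auto& a : cls.assignments) {
            // Every row carries the same response_id, tying it to this snapshot.
            batch.response_id.push_back(response_id);
            batch.assignment_id.push_back(a.assignment_id);
            batch.score.push_back(a.score);
            batch.attempts.push_back(a.attempts);
        }
    }
    return batch;
}
```

In the real function these vectors would presumably be copied into ClickHouse columns and sent with one insert. ClickHouse handles a few large inserts far better than many single-row ones.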

---

## Stable Entity Management

### `get_or_create_class(user_id, class_data) → class_id`

* Searches by `(user_id, class_name)`
* If found:
  * Updates metadata (teacher, period, category)
* If not found:
  * Inserts a new class record
* Returns the stable `class_id`

---

### `get_or_create_assignment(user_id, class_id, assignment_data) → assignment_id`

* Searches by `(user_id, class_id, assignment_name)`
* Updates due date / major flag if it exists
* Otherwise inserts a new assignment
* Returns stable `assignment_id`
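
The get-or-create contract for assignments can be illustrated with an in-memory stand-in for the table. This is a sketch only. `AssignmentStore`, `AssignmentRow`, and the counter-based ID format are hypothetical, and the real function runs ClickHouse queries instead. The key point it shows is that the natural key `(user_id, class_id, assignment_name)` always resolves to the same ID, while mutable metadata is refreshed.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical in-memory stand-in for the assignments table.
struct AssignmentRow {
    std::string assignment_id;
    std::string due_date;  // raw API string here, for simplicity
    bool is_major_grade;
};

class AssignmentStore {
public:
    using Key = std::tuple<std::string, std::string, std::string>;  // (user_id, class_id, name)

    std::string get_or_create(const std::string& user_id, const std::string& class_id,
                              const std::string& name, const std::string& due_date,
                              bool is_major_grade) {
        const Key key{user_id, class_id, name};
        auto it = rows_.find(key);
        if (it != rows_.end()) {
            // Existing assignment: refresh mutable metadata, keep the stable ID.
            it->second.due_date = due_date;
            it->second.is_major_grade = is_major_grade;
            return it->second.assignment_id;
        }
        // New assignment: mint an ID (a counter here; the real table presumably uses UUIDs).
        std::string id = "assignment-" + std::to_string(++next_id_);
        rows_.emplace(key, AssignmentRow{id, due_date, is_major_grade});
        return id;
    }

    const AssignmentRow* lookup(const std::string& user_id, const std::string& class_id,
                                const std::string& name) const {
        auto it = rows_.find(Key{user_id, class_id, name});
        return it == rows_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return rows_.size(); }

private:
    std::map<Key, AssignmentRow> rows_;
    unsigned long next_id_ = 0;
};
```

ClickHouse does not enforce unique keys. Against the real table, this lookup-then-insert is therefore not atomic, and two concurrent ingestions for the same user could both insert the same assignment.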

---

## Snapshot Loading

### `load_latest_snapshot(user_id)`

Loads the most recent snapshot for a user.

Returns `std::nullopt` if none exists.

---

### `load_snapshot_by_id(response_id)`

Loads a **fully hydrated snapshot**, including:

* User
* Classes
* Assignments
* Grades

#### Result Structure

```text
GradeSnapshot
├── response_id
├── user_id
├── classes[class_name] -> ClassRecord
├── assignments[class::assignment] -> AssignmentRecord
└── grades[assignment_id] -> GradeRecord
```

Used for:

* Diffing
* UI display
* Historical comparisons

---

## Change Detection

### `has_changes(user_id, new_api_response) → bool`

Fast pre-check before inserting a new snapshot.

Detects:

* New assignments
* Removed assignments
* Score changes
* Attempt changes

If no prior snapshot exists → **changes detected**.
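
Because `has_changes()` only needs a yes/no answer, it can stop at the first difference instead of building a full diff. The sketch below assumes both the latest snapshot and the incoming response have already been flattened into maps keyed by `class_name::assignment_name`. `GradeValue`, `FlatGrades`, `same_grade`, and `has_changes_sketch` are hypothetical names, and the numeric score is an assumption.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical flattened snapshot: "class_name::assignment_name" -> grade.
struct GradeValue {
    double score;
    uint32_t attempts;
};

using FlatGrades = std::map<std::string, GradeValue>;

bool same_grade(const GradeValue& a, const GradeValue& b) {
    return a.score == b.score && a.attempts == b.attempts;
}

// Returns true at the first difference; never builds a full diff list.
bool has_changes_sketch(const std::optional<FlatGrades>& latest, const FlatGrades& incoming) {
    if (!latest) return true;  // no prior snapshot: changes detected
    // Different sizes mean at least one assignment was added or removed.
    if (latest->size() != incoming.size()) return true;
    for (const auto& [key, grade] : incoming) {
        auto it = latest->find(key);
        // With equal sizes, a missing key implies both an addition and a removal.
        if (it == latest->end()) return true;
        if (!same_grade(it->second, grade)) return true;  // score or attempts changed
    }
    return false;
}
```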

---

## Snapshot Diffing

### `diff_snapshots(old, new) → vector<AssignmentDiff>`

Computes **semantic differences** between two snapshots.

#### Change Types

| Type      | Meaning                         |
| --------- | ------------------------------- |
| `NEW`     | Assignment did not exist before |
| `UPDATED` | Score or attempts changed       |
| `REMOVED` | Assignment disappeared          |

Each diff includes:

* Assignment ID
* Class name
* Assignment name
* Old grade (optional)
* New grade
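
The three change types fall out of two passes over the snapshots, keyed by assignment key. This is a simplified sketch and not the module's implementation. `Snapshot`, `Grade`, `Diff`, and `diff_snapshots_sketch` are hypothetical. The sketch also models a missing side as an empty `std::optional` on both ends, whereas the module as described keeps a non-optional new grade and substitutes placeholders when logging `REMOVED` rows.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

enum class ChangeType { NEW, UPDATED, REMOVED };

struct Grade {
    double score;       // assumed numeric
    uint32_t attempts;
};

struct Diff {
    ChangeType type;
    std::string key;                 // "class_name::assignment_name"
    std::optional<Grade> old_grade;  // empty for NEW
    std::optional<Grade> new_grade;  // empty for REMOVED (in this sketch)
};

using Snapshot = std::map<std::string, Grade>;  // key -> grade

std::vector<Diff> diff_snapshots_sketch(const Snapshot& old_snap, const Snapshot& new_snap) {
    std::vector<Diff> diffs;
    // Pass 1: each key in the new snapshot is NEW, UPDATED, or unchanged.
    for (const auto& [key, grade] : new_snap) {
        auto it = old_snap.find(key);
        if (it == old_snap.end()) {
            diffs.push_back({ChangeType::NEW, key, std::nullopt, grade});
        } else if (it->second.score != grade.score || it->second.attempts != grade.attempts) {
            diffs.push_back({ChangeType::UPDATED, key, it->second, grade});
        }
    }
    // Pass 2: keys present only in the old snapshot were REMOVED.
    for (const auto& [key, grade] : old_snap) {
        if (new_snap.find(key) == new_snap.end()) {
            diffs.push_back({ChangeType::REMOVED, key, grade, std::nullopt});
        }
    }
    return diffs;
}
```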

---

## Grade Update Logging

### `insert_grade_updates(user_id, old_response_id, new_response_id, diffs)`

Persists diffs into `grade_updates`.

#### Features

* Nullable old values for new assignments
* Placeholder values for removed assignments
* Compact enum encoding for change type
* Batched insert for efficiency

This table provides a **clear audit trail** of grade changes over time.
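
The four features above map naturally onto column buffers: nullable old values, a placeholder for the missing new value, and a small integer per change type. The sketch below shows that mapping. The enum values `1/2/3`, the placeholders `-1.0` and `0`, and the names `DiffRow`, `GradeUpdateColumns`, and `to_columns` are all hypothetical. The real encoding and placeholder values are not documented here.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical compact encoding for the change type column.
enum class ChangeType : int8_t { NEW = 1, UPDATED = 2, REMOVED = 3 };

struct Grade { double score; uint32_t attempts; };

struct DiffRow {
    ChangeType type;
    std::string assignment_id;
    std::optional<Grade> old_grade;  // empty for NEW
    std::optional<Grade> new_grade;  // empty for REMOVED
};

// Column buffers for a single batched insert into grade_updates.
struct GradeUpdateColumns {
    std::vector<std::string> assignment_id;
    std::vector<int8_t> change_type;                    // compact enum encoding
    std::vector<std::optional<double>> old_score;       // would map to a Nullable column
    std::vector<std::optional<uint32_t>> old_attempts;
    std::vector<double> new_score;                      // non-nullable: placeholder for REMOVED
    std::vector<uint32_t> new_attempts;
};

constexpr double kRemovedScore = -1.0;    // hypothetical placeholder
constexpr uint32_t kRemovedAttempts = 0;  // hypothetical placeholder

GradeUpdateColumns to_columns(const std::vector<DiffRow>& diffs) {
    GradeUpdateColumns cols;
    for (const auto& d : diffs) {
        cols.assignment_id.push_back(d.assignment_id);
        cols.change_type.push_back(static_cast<int8_t>(d.type));
        if (d.old_grade) {
            cols.old_score.push_back(d.old_grade->score);
            cols.old_attempts.push_back(d.old_grade->attempts);
        } else {
            // NEW assignments have no previous value: store NULL.
            cols.old_score.push_back(std::nullopt);
            cols.old_attempts.push_back(std::nullopt);
        }
        cols.new_score.push_back(d.new_grade ? d.new_grade->score : kRemovedScore);
        cols.new_attempts.push_back(d.new_grade ? d.new_grade->attempts : kRemovedAttempts);
    }
    return cols;
}
```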

---

## Assignment Key Strategy

```cpp
"class_name::assignment_name"
```

Used as a human-stable lookup key when comparing snapshots.

* Avoids relying on database IDs during diffing
* Keeps logic resilient to ID reuse or refactors
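
Building the key is a plain concatenation. The sketch below uses a hypothetical helper name, `make_assignment_key`, and follows the documented `::` format.

```cpp
#include <string>

// Builds the composite lookup key used when comparing snapshots.
// The "::" separator follows the documented key format.
std::string make_assignment_key(const std::string& class_name,
                                const std::string& assignment_name) {
    return class_name + "::" + assignment_name;
}
```

Because the key is derived from names, this strategy has some consequences. Two same-named assignments in one class share a key. A renamed class or assignment shows up as a `REMOVED` plus a `NEW`. A name that itself contains `::` can collide with a different class/assignment pair.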

---

## Design Guarantees

* ✅ Snapshots are immutable
* ✅ Stable IDs persist across time
* ✅ Changes are explicitly logged
* ✅ Efficient ClickHouse-friendly inserts
* ⚠️ Queries are built with SQL string concatenation; they should be parameterized to rule out SQL injection
* ⚠️ Password hashing not implemented
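
Until queries are parameterized, values spliced into SQL strings must at least be escaped. ClickHouse requires single quotes and backslashes to be escaped inside single-quoted string literals. The sketch below is a stopgap under that rule only, not a substitute for parameterization. `quote_clickhouse_string` is a hypothetical helper.

```cpp
#include <string>

// Wraps a value in single quotes and backslash-escapes the two characters
// ClickHouse requires to be escaped inside a string literal: ' and \.
std::string quote_clickhouse_string(const std::string& value) {
    std::string out;
    out.reserve(value.size() + 2);
    out.push_back('\'');
    for (char c : value) {
        if (c == '\\' || c == '\'') {
            out.push_back('\\');  // escape the special character
        }
        out.push_back(c);
    }
    out.push_back('\'');
    return out;
}
```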

# `database_utils` – Function Documentation

---

## **UUID Utilities**

### `clickhouse::UUID parse_uuid(const std::string& str)`

* **Purpose:** Converts a UUID string (`xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`) into a ClickHouse `UUID` object.
* **Input:** `str` – UUID string.
* **Output:** `clickhouse::UUID` (high/low 64-bit parts).
* **Behavior:**
  * Throws `std::runtime_error` if the string is not 36 characters long or is otherwise malformed.
  * Removes hyphens and parses the hex digits into two 64-bit integers.
* **Usage:** Any place UUID strings need to be stored in ClickHouse.

---

### `std::string uuid_to_string(const clickhouse::UUID& u)`

* **Purpose:** Converts a ClickHouse `UUID` back into a human-readable UUID string.
* **Input:** `u` – ClickHouse UUID.
* **Output:** Standard UUID string.
* **Behavior:** Formats the `high` and `low` 64-bit integers as a zero-padded, 36-character string with hyphens.
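
The parse/format pair described above can be sketched without the ClickHouse headers. `Uuid128` is a hypothetical stand-in for `clickhouse::UUID` that uses this document's `high`/`low` naming. The first 16 hex digits (8-4-4) form `high` and the last 16 (4-12) form `low`. `parse_uuid_sketch` and `uuid_to_string_sketch` are illustrative names, not the module's functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for clickhouse::UUID, using this document's naming.
struct Uuid128 {
    uint64_t high;  // hex digits 1-16
    uint64_t low;   // hex digits 17-32
};

Uuid128 parse_uuid_sketch(const std::string& str) {
    if (str.size() != 36 || str[8] != '-' || str[13] != '-' || str[18] != '-' || str[23] != '-') {
        throw std::runtime_error("invalid UUID: " + str);
    }
    std::string hex;  // the 32 hex digits with hyphens removed
    for (std::size_t i = 0; i < str.size(); ++i) {
        if (i != 8 && i != 13 && i != 18 && i != 23) hex.push_back(str[i]);
    }
    // Converts 16 hex digits starting at offset into one 64-bit value.
    auto parse_half = [&](std::size_t offset) {
        uint64_t value = 0;
        for (std::size_t i = offset; i < offset + 16; ++i) {
            const char c = hex[i];
            uint64_t digit;
            if (c >= '0' && c <= '9') digit = c - '0';
            else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
            else throw std::runtime_error("invalid UUID: " + str);
            value = (value << 4) | digit;
        }
        return value;
    };
    return Uuid128{parse_half(0), parse_half(16)};
}

// Formats back into the canonical 8-4-4-4-12 form with lowercase hex.
std::string uuid_to_string_sketch(const Uuid128& u) {
    char buf[37];  // 36 characters + terminating NUL
    std::snprintf(buf, sizeof(buf), "%08llx-%04llx-%04llx-%04llx-%012llx",
                  static_cast<unsigned long long>(u.high >> 32),
                  static_cast<unsigned long long>((u.high >> 16) & 0xFFFF),
                  static_cast<unsigned long long>(u.high & 0xFFFF),
                  static_cast<unsigned long long>(u.low >> 48),
                  static_cast<unsigned long long>(u.low & 0xFFFFFFFFFFFFULL));
    return std::string(buf);
}
```

This sketch always emits lowercase hex, so an uppercase input parses correctly but does not round-trip character for character.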

---

## **Date Utilities**

### `uint16_t parse_date_to_clickhouse(const std::string& date_str)`

* **Purpose:** Converts a date string from the API into ClickHouse `Date` format.
* **Input:** `date_str` – string in `YYYY-MM-DD` format.
* **Output:** `uint16_t` representing days since 1970-01-01.
* **Behavior:**
  * Empty or invalid strings return `0` (epoch).
  * Logs warnings for parse failures.
* **Usage:** Assignments’ `due_date` conversion.
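
The sketch below implements the documented contract: `YYYY-MM-DD` becomes days since the epoch, and empty or invalid input yields `0` with a warning rather than an exception. For the day count it uses Howard Hinnant's `days_from_civil` algorithm. That is a swapped-in technique, because the module's actual conversion method is not documented. `parse_date_sketch` is a hypothetical name, and logging goes to `std::cerr` here as an assumption.

```cpp
#include <cstdint>
#include <cstdio>
#include <iostream>
#include <string>

// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's days_from_civil).
int64_t days_from_civil(int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
    return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

uint16_t parse_date_sketch(const std::string& date_str) {
    if (date_str.empty()) return 0;

    int y = 0;
    unsigned m = 0, d = 0;
    char extra = 0;
    // Exactly three fields must match; a fourth (%c) means trailing junk.
    const bool parsed =
        std::sscanf(date_str.c_str(), "%4d-%2u-%2u%c", &y, &m, &d, &extra) == 3;

    static const unsigned kDaysInMonth[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    const bool leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    const bool valid = parsed && m >= 1 && m <= 12 && d >= 1 &&
                       d <= kDaysInMonth[m - 1] + ((m == 2 && leap) ? 1u : 0u);

    if (valid) {
        const int64_t days = days_from_civil(y, m, d);
        // ClickHouse Date is 16-bit, so only 1970-01-01 .. 2149-06-06 fit.
        if (days >= 0 && days <= UINT16_MAX) return static_cast<uint16_t>(days);
    }
    std::cerr << "warning: could not parse date '" << date_str << "', using epoch\n";
    return 0;
}
```

One consequence of this contract is that a genuine `1970-01-01` is indistinguishable from a parse failure in the stored value. Only the log line tells them apart.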

---

## **User Operations**

### `std::vector<UserRecord> get_all_users(const CHClient& client)`

* **Purpose:** Retrieves all users from the database.
* **Input:** `client` – ClickHouse client.
* **Output:** Vector of `UserRecord` (`user_id`, `username`, `password`).
* **Behavior:** Logs the size of each result batch and the total number of users retrieved.

---

### `bool register_user(const CHClient& client, const std::string& username, const std::string& password)`

* **Purpose:** Inserts a new user into the database.
* **Input:** `username`, `password`.
* **Output:** `true` on success.
* **Behavior:** Currently stores passwords in plaintext; logs success.

---

### `bool authenticate_user(const CHClient& client, const std::string& username, const std::string& password)`

* **Purpose:** Checks whether a user exists with the given credentials.
* **Input:** `username`, `password`.
* **Output:** `true` if valid, `false` otherwise.
* **Behavior:** Uses `SELECT count()` for a boolean check. Logs results.

---

### `std::optional<clickhouse::UUID> get_user_uuid(const CHClient& client, const std::string& username)`

* **Purpose:** Retrieves a user’s UUID based on their username.
* **Input:** `username`.
* **Output:** `optional<UUID>`; empty if the user is not found.
* **Behavior:** Uses `LIMIT 1` for efficiency.

---

## **Class & Assignment Management**

### `std::string get_or_create_class(const CHClient& client, const std::string& user_id, const api_utils::ClassGrades& class_data)`

* **Purpose:** Ensures a stable `class_id` exists for a user.
* **Input:** `user_id`, `class_data` (name, teacher, period, category).
* **Output:** `class_id` as string.
* **Behavior:**
  * Searches for existing class.
  * Updates metadata if found.
  * Inserts new record if not found.
  * Links class to the user in `user_classes`.

---

### `std::string get_or_create_assignment(const CHClient& client, const std::string& user_id, const std::string& class_id, const api_utils::AssignmentGrade& assignment_data)`

* **Purpose:** Ensures a stable `assignment_id` exists within a class.
* **Input:** `user_id`, `class_id`, `assignment_data` (name, dueDate, isMajorGrade).
* **Output:** `assignment_id` as string.
* **Behavior:**
  * Updates existing assignment if found.
  * Inserts a new assignment if not found.
  * Uses `parse_date_to_clickhouse()` for due date.

---

## **Snapshot Insertion**

### `std::string insert_grade_snapshot(const CHClient& client, const std::string& user_id, const api_utils::GradesResponse& api_response)`

* **Purpose:** Inserts a complete snapshot of grades from the API.
* **Input:** `user_id`, `api_response` (success flag, total classes, grades per class/assignment).
* **Output:** `response_id` of the inserted snapshot; empty string on failure.
* **Behavior:**
  1. Inserts metadata into `grade_responses`.
  2. Retrieves `response_id`.
  3. Processes each class:
     * Calls `get_or_create_class()`
     * Links to response in `response_classes`
  4. Processes each assignment:
     * Calls `get_or_create_assignment()`
     * Inserts grades into `assignment_grade_history`.
* **Notes:** Immutable snapshots; append-only.

---

## **Snapshot Loading**

### `std::optional<GradeSnapshot> load_latest_snapshot(const CHClient& client, const std::string& user_id)`

* **Purpose:** Loads the most recent snapshot for a user.
* **Output:** Fully populated `GradeSnapshot` or `nullopt` if none exists.
* **Behavior:** Picks the newest response by ordering on `fetched_at DESC` with `LIMIT 1`.

---

### `std::optional<GradeSnapshot> load_snapshot_by_id(const CHClient& client, const std::string& response_id)`

* **Purpose:** Loads a snapshot by `response_id`.
* **Output:** `GradeSnapshot` including:
  * Classes
  * Assignments
  * Grades
* **Behavior:** Joins `user_classes`, `user_assignments`, `assignment_grade_history`, `response_classes`.

---

## **Diffing & Change Detection**

### `bool has_changes(const CHClient& client, const std::string& user_id, const api_utils::GradesResponse& new_api_response)`

* **Purpose:** Detects whether a new API response differs from the latest snapshot.
* **Output:** `true` if changes exist, `false` otherwise.
* **Checks for:**
  * New assignments
  * Removed assignments
  * Score/attempt changes

---

### `std::vector<AssignmentDiff> diff_snapshots(const GradeSnapshot& old_snapshot, const GradeSnapshot& new_snapshot)`

* **Purpose:** Returns detailed differences between two snapshots.
* **Output:** Vector of `AssignmentDiff`.
* **Change Types:** `NEW`, `UPDATED`, `REMOVED`.
* **Behavior:** Compares old and new grades per assignment key.

---

### `void insert_grade_updates(const CHClient& client, const std::string& user_id, const std::string& old_response_id, const std::string& new_response_id, const std::vector<AssignmentDiff>& diffs)`

* **Purpose:** Inserts diffs into the `grade_updates` table.
* **Behavior:**
  * Maps each `AssignmentDiff` to ClickHouse columns.
  * Handles nullable old values for new assignments.
  * Uses placeholder values for removed assignments.
  * Logs the number of inserted updates.