# Database Utilities – Grade Snapshot System

## Overview

This module implements a **ClickHouse-backed grade snapshot and diffing system**. It ingests grade data from an external API, persists **immutable snapshots**, tracks **stable entities** (users, classes, assignments), and computes **changes over time** (new / updated / removed grades).

The design emphasizes:

* **Idempotent ingestion**
* **Historical accuracy**
* **Efficient change detection**
* **Append-only semantics** (ClickHouse-friendly)

All functionality lives in the `database_utils` namespace.

---

## Core Concepts

### 1. Stable Entities vs Snapshots

| Concept                 | Description                                                |
| ----------------------- | ---------------------------------------------------------- |
| **User**                | A logical account                                          |
| **Class**               | A course belonging to a user (stable across time)          |
| **Assignment**          | A specific graded item within a class (stable across time) |
| **Snapshot (response)** | A point-in-time capture of all grades returned by the API  |
| **Grade history**       | Per-assignment grades linked to a snapshot                 |
| **Diffs**               | Computed changes between two snapshots                     |

Stable entities are **created once and reused**. Snapshots are **immutable** and **time-ordered**.

---

## UUID Utilities

### `parse_uuid(string) → UUID`

Parses a standard UUID string (`xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`) into ClickHouse’s `UUID { high, low }` format.

* Validates format
* Throws on malformed input
* Used everywhere UUIDs enter the DB

### `uuid_to_string(UUID) → string`

Converts ClickHouse UUIDs back into standard string format.

---

## Date Handling

### `parse_date_to_clickhouse(string) → uint16_t`

Converts an API date string (`YYYY-MM-DD`) into ClickHouse `Date` format (days since Unix epoch).
* Empty or invalid dates → epoch (`0`)
* Logs parsing failures instead of throwing

---

## Database Handle

```cpp
using CHClient = std::shared_ptr<clickhouse::Client>;
```

All functions accept a shared ClickHouse client handle, which allows:

* Connection reuse
* Shared ownership across threads (`std::shared_ptr` makes ownership of the handle thread-safe; it does not by itself make concurrent queries on one client safe)
* Easy dependency injection

---

## User Operations

### `get_all_users()`

Returns all users with:

* `user_id`
* `username`
* `password`

Primarily for administration and debugging.

---

### `register_user(username, password)`

Inserts a new user row.

⚠️ **Note:** Passwords are currently stored in plaintext. Hashing should be added before production use.

---

### `authenticate_user(username, password)`

Returns `true` if a user with the given username and password exists.

* Uses `count()` to keep the result payload minimal
* Simple boolean authentication check

---

### `get_user_uuid(username)`

Returns the user’s UUID if found.

---

## Snapshot Insertion Flow

### `insert_grade_snapshot(user_id, api_response) → response_id`

This is the **main ingestion pipeline**.

#### Step-by-Step Flow

1. **Insert into `grade_responses`**
   * One row per API fetch
   * Contains metadata (`success`, `total_classes`, timestamp)
2. **Fetch the generated `response_id`**
   * Reads back the most recent response for that user
3. **Process each class**
   * `get_or_create_class()` ensures a stable `class_id`
   * Links the class to the response (`response_classes`)
4. **Process assignments**
   * `get_or_create_assignment()` ensures a stable `assignment_id`
5. **Insert grade history**
   * Batched inserts into `assignment_grade_history`
   * Each row ties together:
     * response
     * assignment
     * score
     * attempts

Snapshots are **never updated**, only appended.
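An in-memory model of the five steps above shows the two properties the pipeline depends on. Stable IDs are reused across fetches, and every fetch appends a new snapshot plus new history rows. This is a minimal sketch under stated assumptions, not the ClickHouse implementation. `InMemoryGradeStore`, the simplified `ApiClass`/`ApiAssignment` types, and the counter-based IDs are hypothetical stand-ins for the real tables, API types, and UUIDs. The metadata updates applied to existing classes and assignments are omitted.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for api_utils::AssignmentGrade / ClassGrades.
struct ApiAssignment {
    std::string name;
    double score;
    int attempts;
};

struct ApiClass {
    std::string name;
    std::vector<ApiAssignment> assignments;
};

// One row of a hypothetical in-memory assignment_grade_history table.
struct HistoryRow {
    std::string response_id;
    std::string assignment_id;
    double score;
    int attempts;
};

class InMemoryGradeStore {
public:
    std::string insert_grade_snapshot(const std::string& user_id,
                                      const std::vector<ApiClass>& classes) {
        // Steps 1-2: every fetch gets a fresh response_id (a counter stands in for a UUID).
        std::string response_id = "r" + std::to_string(++response_counter_);
        std::vector<HistoryRow> batch;  // collected first, then appended at once (step 5)
        for (const auto& cls : classes) {
            // Step 3: stable class_id keyed by (user_id, class_name).
            std::string class_id = get_or_create_class(user_id, cls.name);
            response_classes_.emplace_back(response_id, class_id);
            for (const auto& a : cls.assignments) {
                // Step 4: stable assignment_id keyed by (user_id, class_id, assignment_name).
                std::string assignment_id = get_or_create_assignment(user_id, class_id, a.name);
                batch.push_back({response_id, assignment_id, a.score, a.attempts});
            }
        }
        // Append-only: earlier history rows are never modified.
        history_.insert(history_.end(), batch.begin(), batch.end());
        return response_id;
    }

    std::string get_or_create_class(const std::string& user_id, const std::string& class_name) {
        auto key = std::make_pair(user_id, class_name);
        auto it = classes_.find(key);
        if (it != classes_.end()) return it->second;  // reuse the stable ID
        std::string class_id = "c" + std::to_string(classes_.size() + 1);
        classes_.emplace(key, class_id);
        return class_id;
    }

    std::string get_or_create_assignment(const std::string& user_id, const std::string& class_id,
                                         const std::string& assignment_name) {
        auto key = std::make_tuple(user_id, class_id, assignment_name);
        auto it = assignments_.find(key);
        if (it != assignments_.end()) return it->second;  // reuse the stable ID
        std::string assignment_id = "a" + std::to_string(assignments_.size() + 1);
        assignments_.emplace(key, assignment_id);
        return assignment_id;
    }

    const std::vector<HistoryRow>& history() const { return history_; }

private:
    int response_counter_ = 0;
    std::map<std::pair<std::string, std::string>, std::string> classes_;
    std::map<std::tuple<std::string, std::string, std::string>, std::string> assignments_;
    std::vector<std::pair<std::string, std::string>> response_classes_;  // (response_id, class_id)
    std::vector<HistoryRow> history_;
};
```

Ingesting the same class twice yields one `class_id` and one `assignment_id`. It also yields two `response_id`s and two history rows, and that per-fetch history is what makes diffing between snapshots possible.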
---

## Stable Entity Management

### `get_or_create_class(user_id, class_data) → class_id`

* Searches by `(user_id, class_name)`
* If found, updates metadata (teacher, period, category)
* If not found, inserts a new class record
* Returns the stable `class_id`

---

### `get_or_create_assignment(user_id, class_id, assignment_data) → assignment_id`

* Searches by `(user_id, class_id, assignment_name)`
* If found, updates the due date and major-grade flag
* Otherwise, inserts a new assignment
* Returns the stable `assignment_id`

---

## Snapshot Loading

### `load_latest_snapshot(user_id)`

Loads the most recent snapshot for a user. Returns `std::nullopt` if none exists.

---

### `load_snapshot_by_id(response_id)`

Loads a **fully hydrated snapshot**, including:

* User
* Classes
* Assignments
* Grades

#### Result Structure

```text
GradeSnapshot
├── response_id
├── user_id
├── classes[class_name] -> ClassRecord
├── assignments[class::assignment] -> AssignmentRecord
└── grades[assignment_id] -> GradeRecord
```

Used for:

* Diffing
* UI display
* Historical comparisons

---

## Change Detection

### `has_changes(user_id, new_api_response) → bool`

A fast pre-check run before inserting a new snapshot. It detects:

* New assignments
* Removed assignments
* Score changes
* Attempt changes

If no prior snapshot exists, it reports that **changes were detected**.

---

## Snapshot Diffing

### `diff_snapshots(old, new) → vector<AssignmentDiff>`

Computes **semantic differences** between two snapshots.

#### Change Types

| Type      | Meaning                         |
| --------- | ------------------------------- |
| `NEW`     | Assignment did not exist before |
| `UPDATED` | Score or attempts changed       |
| `REMOVED` | Assignment disappeared          |

Each diff includes:

* Assignment ID
* Class name
* Assignment name
* Old grade (optional)
* New grade

---

## Grade Update Logging

### `insert_grade_updates(user_id, old_response_id, new_response_id, diffs)`

Persists diffs into `grade_updates`.
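Below is a minimal sketch of flattening one `AssignmentDiff` into a `grade_updates` row. It covers nullable old values, placeholders for removed assignments, and a compact change-type encoding. The Int8 codes, the `-1` placeholders, the `GradeRecord` fields, and the names `GradeUpdateRow`/`to_grade_update_row` are assumptions for illustration. The real column names and placeholder values are not documented here.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Assumed Int8 codes; the real encoding used by grade_updates may differ.
enum class ChangeType : std::int8_t { NEW = 0, UPDATED = 1, REMOVED = 2 };

// Assumed field layout for GradeRecord.
struct GradeRecord {
    double score;
    int attempts;
};

struct AssignmentDiff {
    ChangeType type;
    std::string assignment_id;
    std::string class_name;
    std::string assignment_name;
    std::optional<GradeRecord> old_grade;  // empty for NEW
    GradeRecord new_grade;                 // not meaningful for REMOVED
};

// Hypothetical flattened row; user_id and the two response IDs are omitted for brevity.
struct GradeUpdateRow {
    std::string assignment_id;
    std::int8_t change_type = 0;
    std::optional<double> old_score;  // would map to a Nullable column: null for NEW
    std::optional<int> old_attempts;
    double new_score = 0.0;
    int new_attempts = 0;
};

// Hypothetical placeholders written for REMOVED assignments.
constexpr double kRemovedScore = -1.0;
constexpr int kRemovedAttempts = -1;

GradeUpdateRow to_grade_update_row(const AssignmentDiff& diff) {
    GradeUpdateRow row;
    row.assignment_id = diff.assignment_id;
    row.change_type = static_cast<std::int8_t>(diff.type);  // compact enum encoding
    if (diff.old_grade) {
        row.old_score = diff.old_grade->score;
        row.old_attempts = diff.old_grade->attempts;
    }
    if (diff.type == ChangeType::REMOVED) {
        row.new_score = kRemovedScore;
        row.new_attempts = kRemovedAttempts;
    } else {
        row.new_score = diff.new_grade.score;
        row.new_attempts = diff.new_grade.attempts;
    }
    return row;
}
```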
#### Features

* Nullable old values for new assignments
* Placeholder values for removed assignments
* Compact enum encoding for the change type
* Batched insert for efficiency

This table provides a **clear audit trail** of grade changes over time.

---

## Assignment Key Strategy

```cpp
"class_name::assignment_name"
```

Used as a human-stable lookup key when comparing snapshots.

* Avoids relying on database IDs during diffing
* Keeps the logic resilient to ID reuse or refactors

---

## Design Guarantees

* ✅ Snapshots are immutable
* ✅ Stable IDs persist across time
* ✅ Changes are explicitly logged
* ✅ Efficient, ClickHouse-friendly inserts
* ⚠️ SQL string concatenation is used (should be parameterized later)
* ⚠️ Password hashing not implemented

---

# `database_utils` – Function Documentation

---

## **UUID Utilities**

### `clickhouse::UUID parse_uuid(const std::string& str)`

* **Purpose:** Converts a UUID string (`xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`) into a ClickHouse `UUID` object.
* **Input:** `str` – UUID string.
* **Output:** `clickhouse::UUID` (high/low 64-bit parts).
* **Behavior:**
  * Throws `std::runtime_error` if the string is not 36 characters long or is otherwise malformed.
  * Removes the hyphens and parses the hex digits into two 64-bit integers.
* **Usage:** Anywhere UUID strings need to be stored in ClickHouse.

---

### `std::string uuid_to_string(const clickhouse::UUID& u)`

* **Purpose:** Converts a ClickHouse `UUID` back into a human-readable UUID string.
* **Input:** `u` – ClickHouse UUID.
* **Output:** Standard UUID string.
* **Behavior:** Formats the `high` and `low` 64-bit integers as a 36-character string of zero-padded hex groups separated by hyphens.

---

## **Date Utilities**

### `uint16_t parse_date_to_clickhouse(const std::string& date_str)`

* **Purpose:** Converts a date string from the API into ClickHouse `Date` format.
* **Input:** `date_str` – string in `YYYY-MM-DD` format.
* **Output:** `uint16_t` representing days since 1970-01-01.
* **Behavior:**
  * Empty or invalid strings return `0` (epoch).
  * Logs warnings for parse failures.
* **Usage:** Converts assignments’ `due_date` values.

---

## **User Operations**

### `std::vector<UserRecord> get_all_users(const CHClient& client)`

* **Purpose:** Retrieves all users from the database.
* **Input:** `client` – ClickHouse client.
* **Output:** Vector of `UserRecord` (includes `user_id` and login credentials).
* **Behavior:** Logs each batch’s size and the total number of users retrieved.

---

### `bool register_user(const CHClient& client, const std::string& username, const std::string& password)`

* **Purpose:** Inserts a new user into the database.
* **Input:** `username`, `password`.
* **Output:** `true` on success.
* **Behavior:** Currently stores passwords in plaintext; logs success.

---

### `bool authenticate_user(const CHClient& client, const std::string& username, const std::string& password)`

* **Purpose:** Checks whether a user exists with the given credentials.
* **Input:** `username`, `password`.
* **Output:** `true` if valid, `false` otherwise.
* **Behavior:** Uses `SELECT count()` for the boolean check. Logs results.

---

### `std::optional<std::string> get_user_uuid(const CHClient& client, const std::string& username)`

* **Purpose:** Retrieves a user’s UUID based on their username.
* **Input:** `username`.
* **Output:** `std::optional<std::string>`; empty if the user is not found.
* **Behavior:** Uses `LIMIT 1` for efficiency.

---

## **Class & Assignment Management**

### `std::string get_or_create_class(const CHClient& client, const std::string& user_id, const api_utils::ClassGrades& class_data)`

* **Purpose:** Ensures a stable `class_id` exists for a user’s class.
* **Input:** `user_id`, `class_data` (name, teacher, period, category).
* **Output:** `class_id` as a string.
* **Behavior:**
  * Searches for an existing class.
  * Updates metadata if found.
  * Inserts a new record if not found.
  * Links the class to the user in `user_classes`.
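The module builds SQL by string concatenation, as noted under Design Guarantees. A username or class name containing a quote could therefore break lookups such as the one in `get_or_create_class`. Until queries are parameterized, one stopgap is to escape values for ClickHouse single-quoted string literals, where `'` and `\` must be escaped. The helper and query text below are a hypothetical sketch. The column names `class_id`, `user_id`, and `class_name` in `user_classes` are assumed, not taken from the real schema.

```cpp
#include <string>

// Hypothetical helper: escape a value for a single-quoted ClickHouse string literal.
// ClickHouse treats backslash as an escape character, so both ' and \ get a \ prefix.
std::string escape_clickhouse_string(const std::string& value) {
    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        if (c == '\\' || c == '\'') {
            out.push_back('\\');
        }
        out.push_back(c);
    }
    return out;
}

// Hypothetical lookup for a get_or_create_class()-style function.
// Column names are assumed; toUUID() converts the user_id string for comparison.
std::string build_class_lookup_query(const std::string& user_id, const std::string& class_name) {
    return "SELECT class_id FROM user_classes WHERE user_id = toUUID('" +
           escape_clickhouse_string(user_id) + "') AND class_name = '" +
           escape_clickhouse_string(class_name) + "' LIMIT 1";
}
```

Escaping only protects the literal. Parameterized queries remain the better long-term fix.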
---

### `std::string get_or_create_assignment(const CHClient& client, const std::string& user_id, const std::string& class_id, const api_utils::AssignmentGrade& assignment_data)`

* **Purpose:** Ensures a stable `assignment_id` exists within a class.
* **Input:** `user_id`, `class_id`, `assignment_data` (name, `dueDate`, `isMajorGrade`).
* **Output:** `assignment_id` as a string.
* **Behavior:**
  * Updates the existing assignment if found.
  * Inserts a new assignment if not found.
  * Uses `parse_date_to_clickhouse()` for the due date.

---

## **Snapshot Insertion**

### `std::string insert_grade_snapshot(const CHClient& client, const std::string& user_id, const api_utils::GradesResponse& api_response)`

* **Purpose:** Inserts a complete snapshot of grades from the API.
* **Input:** `user_id`, `api_response` (success flag, total classes, grades per class/assignment).
* **Output:** `response_id` of the inserted snapshot; empty string on failure.
* **Behavior:**
  1. Inserts metadata into `grade_responses`.
  2. Retrieves the `response_id`.
  3. Processes each class:
     * Calls `get_or_create_class()`
     * Links the class to the response in `response_classes`
  4. Processes each assignment:
     * Calls `get_or_create_assignment()`
     * Inserts grades into `assignment_grade_history`
* **Notes:** Snapshots are immutable; all inserts are append-only.

---

## **Snapshot Loading**

### `std::optional<GradeSnapshot> load_latest_snapshot(const CHClient& client, const std::string& user_id)`

* **Purpose:** Loads the most recent snapshot for a user.
* **Output:** Fully populated `GradeSnapshot`, or `std::nullopt` if none exists.
* **Behavior:** Selects the latest response with `ORDER BY fetched_at DESC LIMIT 1`.

---

### `std::optional<GradeSnapshot> load_snapshot_by_id(const CHClient& client, const std::string& response_id)`

* **Purpose:** Loads a snapshot by `response_id`.
* **Output:** `GradeSnapshot` including:
  * Classes
  * Assignments
  * Grades
* **Behavior:** Joins `user_classes`, `user_assignments`, `assignment_grade_history`, and `response_classes`.
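Three keyed maps are enough to assemble the `GradeSnapshot` result structure from the joined rows. This is a minimal sketch under stated assumptions. `JoinedRow`, `hydrate_snapshot`, the record fields, and the choice of `std::map` are hypothetical. The real function reads its columns from ClickHouse result blocks rather than a vector.

```cpp
#include <map>
#include <string>
#include <vector>

// Assumed record layouts; only the key structure mirrors the documented GradeSnapshot.
struct ClassRecord { std::string class_id; std::string class_name; };
struct AssignmentRecord { std::string assignment_id; std::string class_id; std::string name; };
struct GradeRecord { double score; int attempts; };

struct GradeSnapshot {
    std::string response_id;
    std::string user_id;
    std::map<std::string, ClassRecord> classes;           // keyed by class_name
    std::map<std::string, AssignmentRecord> assignments;  // keyed by "class::assignment"
    std::map<std::string, GradeRecord> grades;            // keyed by assignment_id
};

// Hypothetical flattened row produced by joining user_classes, user_assignments,
// assignment_grade_history and response_classes for one response_id.
struct JoinedRow {
    std::string class_id, class_name, assignment_id, assignment_name;
    double score;
    int attempts;
};

std::string assignment_key(const std::string& class_name, const std::string& assignment_name) {
    return class_name + "::" + assignment_name;
}

GradeSnapshot hydrate_snapshot(const std::string& response_id, const std::string& user_id,
                               const std::vector<JoinedRow>& rows) {
    GradeSnapshot snap;
    snap.response_id = response_id;
    snap.user_id = user_id;
    for (const auto& r : rows) {
        // A class repeats once per assignment row; emplace keeps the first copy.
        snap.classes.emplace(r.class_name, ClassRecord{r.class_id, r.class_name});
        snap.assignments.emplace(assignment_key(r.class_name, r.assignment_name),
                                 AssignmentRecord{r.assignment_id, r.class_id, r.assignment_name});
        snap.grades[r.assignment_id] = GradeRecord{r.score, r.attempts};
    }
    return snap;
}
```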
---

## **Diffing & Change Detection**

### `bool has_changes(const CHClient& client, const std::string& user_id, const api_utils::GradesResponse& new_api_response)`

* **Purpose:** Detects whether a new API response differs from the latest snapshot.
* **Output:** `true` if changes exist, `false` otherwise.
* **Checks for:**
  * New assignments
  * Removed assignments
  * Score/attempt changes

---

### `std::vector<AssignmentDiff> diff_snapshots(const GradeSnapshot& old_snapshot, const GradeSnapshot& new_snapshot)`

* **Purpose:** Returns detailed differences between two snapshots.
* **Output:** Vector of `AssignmentDiff`.
* **Change Types:** `NEW`, `UPDATED`, `REMOVED`.
* **Behavior:** Compares old and new grades per assignment key (`class_name::assignment_name`).

---

### `void insert_grade_updates(const CHClient& client, const std::string& user_id, const std::string& old_response_id, const std::string& new_response_id, const std::vector<AssignmentDiff>& diffs)`

* **Purpose:** Inserts diffs into the `grade_updates` table.
* **Behavior:**
  * Maps each `AssignmentDiff` to ClickHouse columns.
  * Handles nullable old values for new assignments.
  * Uses placeholder values for removed assignments.
  * Logs the number of inserted updates.
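The comparison rules can be sketched over a simplified snapshot view keyed by `class_name::assignment_name`, following the assignment key strategy. `SnapshotView`, `Entry`, `diff_snapshot_views`, and `has_changes_view` are hypothetical names. The real functions operate on `GradeSnapshot`, and `has_changes` compares against a raw API response.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

struct GradeRecord { double score; int attempts; };  // assumed fields

// Simplified snapshot entry holding everything diffing needs.
struct Entry {
    std::string assignment_id, class_name, assignment_name;
    GradeRecord grade;
};
using SnapshotView = std::map<std::string, Entry>;  // keyed by "class::assignment"

enum class ChangeType { NEW, UPDATED, REMOVED };

struct AssignmentDiff {
    ChangeType type;
    std::string assignment_id, class_name, assignment_name;
    std::optional<GradeRecord> old_grade;  // empty for NEW
    GradeRecord new_grade;                 // zeroed for REMOVED
};

std::vector<AssignmentDiff> diff_snapshot_views(const SnapshotView& old_snap,
                                                const SnapshotView& new_snap) {
    std::vector<AssignmentDiff> diffs;
    // NEW and UPDATED: walk the new snapshot and look each key up in the old one.
    for (const auto& [key, cur] : new_snap) {
        auto it = old_snap.find(key);
        if (it == old_snap.end()) {
            diffs.push_back({ChangeType::NEW, cur.assignment_id, cur.class_name,
                             cur.assignment_name, std::nullopt, cur.grade});
            continue;
        }
        const GradeRecord& prev = it->second.grade;
        if (prev.score != cur.grade.score || prev.attempts != cur.grade.attempts) {
            diffs.push_back({ChangeType::UPDATED, cur.assignment_id, cur.class_name,
                             cur.assignment_name, prev, cur.grade});
        }
    }
    // REMOVED: keys present in the old snapshot but missing from the new one.
    for (const auto& [key, prev] : old_snap) {
        if (new_snap.count(key) == 0) {
            diffs.push_back({ChangeType::REMOVED, prev.assignment_id, prev.class_name,
                             prev.assignment_name, prev.grade, GradeRecord{}});
        }
    }
    return diffs;
}

// A has_changes()-style check reduces to "is the diff non-empty?".
bool has_changes_view(const SnapshotView& old_snap, const SnapshotView& new_snap) {
    return !diff_snapshot_views(old_snap, new_snap).empty();
}
```

In this sketch the key is `class_name::assignment_name`. Renaming an assignment therefore appears as one `REMOVED` diff plus one `NEW` diff, not as an `UPDATED` one.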