Files
skyward-analysis-daemon/docs/database_utils.md

13 KiB
Raw Blame History

Database Utilities Grade Snapshot System

Overview

This module implements a ClickHouse-backed grade snapshot and diffing system. It ingests grade data from an external API, persists immutable snapshots, tracks stable entities (users, classes, assignments), and computes changes over time (new / updated / removed grades).

The design emphasizes:

  • Idempotent ingestion
  • Historical accuracy
  • Efficient change detection
  • Append-only semantics (ClickHouse-friendly)

All functionality lives in the database_utils namespace.


Core Concepts

1. Stable Entities vs Snapshots

Concept Description
User A logical account
Class A course belonging to a user (stable across time)
Assignment A specific graded item within a class (stable across time)
Snapshot (response) A point-in-time capture of all grades returned by the API
Grade history Per-assignment grades linked to a snapshot
Diffs Computed changes between two snapshots

Stable entities are created once and reused. Snapshots are immutable and time-ordered.


UUID Utilities

parse_uuid(string) → UUID

Parses a standard UUID string (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) into ClickHouses UUID { high, low } format.

  • Validates format
  • Throws on malformed input
  • Used everywhere UUIDs enter the DB

uuid_to_string(UUID) → string

Converts ClickHouse UUIDs back into standard string format.


Date Handling

parse_date_to_clickhouse(string) → uint16_t

Converts an API date string (YYYY-MM-DD) into ClickHouse Date format (days since Unix epoch).

  • Empty or invalid dates → epoch (0)
  • Logs parsing failures instead of throwing

Database Handle

using CHClient = std::shared_ptr<clickhouse::Client>;

All functions accept a shared ClickHouse client to allow:

  • Connection reuse
  • Thread-safe sharing
  • Easy dependency injection

User Operations

get_all_users()

Returns all users with:

  • user_id
  • username
  • password

Primarily for administration/debugging.


register_user(username, password)

Inserts a new user row.

⚠️ Note: Passwords are currently stored in plaintext. Hashing should be added before production use.


authenticate_user(username, password)

Returns true if a matching user exists.

  • Uses count() for minimal payload
  • Simple boolean authentication check

get_user_uuid(username)

Returns the users UUID if found.


Snapshot Insertion Flow

insert_grade_snapshot(user_id, api_response) → response_id

This is the main ingestion pipeline.

Step-by-Step Flow

  1. Insert grade_responses

    • One row per API fetch
    • Contains metadata (success, total_classes, timestamp)
  2. Fetch generated response_id

    • Most recent response for that user
  3. Process each class

    • get_or_create_class()
    • Ensures a stable class_id
    • Links class to the response (response_classes)
  4. Process assignments

    • get_or_create_assignment()
    • Ensures stable assignment_id
  5. Insert grade history

    • Batched inserts into assignment_grade_history

    • Each row ties:

      • response
      • assignment
      • score
      • attempts

Snapshots are never updated, only appended.


Stable Entity Management

get_or_create_class(user_id, class_data) → class_id

  • Searches by (user_id, class_name)

  • If found:

    • Updates metadata (teacher, period, category)
  • If not found:

    • Inserts a new class record
  • Returns the stable class_id


get_or_create_assignment(user_id, class_id, assignment_data) → assignment_id

  • Searches by (user_id, class_id, assignment_name)
  • Updates due date / major flag if it exists
  • Otherwise inserts a new assignment
  • Returns stable assignment_id

Snapshot Loading

load_latest_snapshot(user_id)

Loads the most recent snapshot for a user.

Returns std::nullopt if none exists.


load_snapshot_by_id(response_id)

Loads a fully hydrated snapshot, including:

  • User
  • Classes
  • Assignments
  • Grades

Result Structure

GradeSnapshot
├── response_id
├── user_id
├── classes[class_name] -> ClassRecord
├── assignments[class::assignment] -> AssignmentRecord
└── grades[assignment_id] -> GradeRecord

Used for:

  • Diffing
  • UI display
  • Historical comparisons

Change Detection

has_changes(user_id, new_api_response) → bool

Fast pre-check before inserting a new snapshot.

Detects:

  • New assignments
  • Removed assignments
  • Score changes
  • Attempt changes

If no prior snapshot exists → changes detected.


Snapshot Diffing

diff_snapshots(old, new) → vector<AssignmentDiff>

Computes semantic differences between two snapshots.

Change Types

Type Meaning
NEW Assignment did not exist before
UPDATED Score or attempts changed
REMOVED Assignment disappeared

Each diff includes:

  • Assignment ID
  • Class name
  • Assignment name
  • Old grade (optional)
  • New grade

Grade Update Logging

insert_grade_updates(user_id, old_response_id, new_response_id, diffs)

Persists diffs into grade_updates.

Features

  • Nullable old values for new assignments
  • Placeholder values for removed assignments
  • Compact enum encoding for change type
  • Batched insert for efficiency

This table provides a clear audit trail of grade changes over time.


Assignment Key Strategy

"class_name::assignment_name"

Used as a human-stable lookup key when comparing snapshots.

  • Avoids relying on database IDs during diffing
  • Keeps logic resilient to ID reuse or refactors

Design Guarantees

  • Snapshots are immutable
  • Stable IDs persist across time
  • Changes are explicitly logged
  • Efficient ClickHouse-friendly inserts
  • ⚠️ SQL string concatenation is used (should be parameterized later)
  • ⚠️ Password hashing not implemented

database_utils Function Documentation


UUID Utilities

clickhouse::UUID parse_uuid(const std::string& str)

  • Purpose: Converts a UUID string (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) into a ClickHouse UUID object.

  • Input: str UUID string.

  • Output: clickhouse::UUID (high/low 64-bit parts).

  • Behavior:

    • Throws std::runtime_error if string is not 36 characters or malformed.
    • Removes hyphens and parses hex into two 64-bit integers.
  • Usage: Any place UUID strings need to be stored in ClickHouse.


std::string uuid_to_string(const clickhouse::UUID& u)

  • Purpose: Converts a ClickHouse UUID back into a human-readable UUID string.
  • Input: u ClickHouse UUID.
  • Output: Standard UUID string.
  • Behavior: Formats high and low 64-bit integers as a zero-padded 36-character string with hyphens.

Date Utilities

uint16_t parse_date_to_clickhouse(const std::string& date_str)

  • Purpose: Converts a date string from the API into ClickHouse Date format.

  • Input: date_str string in YYYY-MM-DD format.

  • Output: uint16_t representing days since 1970-01-01.

  • Behavior:

    • Empty or invalid strings return 0 (epoch).
    • Logs warnings for parse failures.
  • Usage: Assignments due_date conversion.


User Operations

std::vector<UserRecord> get_all_users(const CHClient& client)

  • Purpose: Retrieves all users from the database.
  • Input: client ClickHouse client.
  • Output: Vector of UserRecord (includes user_id, login info).
  • Behavior: Logs batch size and total retrieved.

bool register_user(const CHClient& client, const std::string& username, const std::string& password)

  • Purpose: Inserts a new user into the database.
  • Input: username, password.
  • Output: true on success.
  • Behavior: Currently stores passwords in plaintext; logs success.

bool authenticate_user(const CHClient& client, const std::string& username, const std::string& password)

  • Purpose: Checks if a user exists with given credentials.
  • Input: username, password.
  • Output: true if valid, false otherwise.
  • Behavior: Uses SELECT count() for boolean check. Logs results.

std::optional<clickhouse::UUID> get_user_uuid(const CHClient& client, const std::string& username)

  • Purpose: Retrieves a users UUID based on their username.
  • Input: username.
  • Output: optional<UUID>; empty if user not found.
  • Behavior: Uses LIMIT 1 for efficiency.

Class & Assignment Management

std::string get_or_create_class(const CHClient& client, const std::string& user_id, const api_utils::ClassGrades& class_data)

  • Purpose: Ensures a stable class_id exists for a user.

  • Input: user_id, class_data (name, teacher, period, category).

  • Output: class_id as string.

  • Behavior:

    • Searches for existing class.
    • Updates metadata if found.
    • Inserts new record if not found.
    • Links class to the user in user_classes.

std::string get_or_create_assignment(const CHClient& client, const std::string& user_id, const std::string& class_id, const api_utils::AssignmentGrade& assignment_data)

  • Purpose: Ensures a stable assignment_id exists within a class.

  • Input: user_id, class_id, assignment_data (name, dueDate, isMajorGrade).

  • Output: assignment_id as string.

  • Behavior:

    • Updates existing assignment if found.
    • Inserts a new assignment if not found.
    • Uses parse_date_to_clickhouse() for due date.

Snapshot Insertion

std::string insert_grade_snapshot(const CHClient& client, const std::string& user_id, const api_utils::GradesResponse& api_response)

  • Purpose: Inserts a complete snapshot of grades from the API.

  • Input: user_id, api_response (success flag, total classes, grades per class/assignment).

  • Output: response_id of the inserted snapshot; empty string on failure.

  • Behavior:

    1. Inserts metadata into grade_responses.

    2. Retrieves response_id.

    3. Processes each class:

      • Calls get_or_create_class()
      • Links to response in response_classes
    4. Processes each assignment:

      • Calls get_or_create_assignment()
      • Inserts grades into assignment_grade_history.
  • Notes: Immutable snapshots; append-only.


Snapshot Loading

std::optional<GradeSnapshot> load_latest_snapshot(const CHClient& client, const std::string& user_id)

  • Purpose: Loads the most recent snapshot for a user.
  • Output: Fully populated GradeSnapshot or nullopt if none exists.
  • Behavior: Uses fetched_at DESC LIMIT 1.

std::optional<GradeSnapshot> load_snapshot_by_id(const CHClient& client, const std::string& response_id)

  • Purpose: Loads a snapshot by response_id.

  • Output: GradeSnapshot including:

    • Classes
    • Assignments
    • Grades
  • Behavior: Joins user_classes, user_assignments, assignment_grade_history, response_classes.


Diffing & Change Detection

bool has_changes(const CHClient& client, const std::string& user_id, const api_utils::GradesResponse& new_api_response)

  • Purpose: Detects if new API response differs from the latest snapshot.

  • Output: true if changes exist, false otherwise.

  • Checks for:

    • New assignments
    • Removed assignments
    • Score/attempt changes

std::vector<AssignmentDiff> diff_snapshots(const GradeSnapshot& old_snapshot, const GradeSnapshot& new_snapshot)

  • Purpose: Returns detailed differences between two snapshots.
  • Output: Vector of AssignmentDiff.
  • Change Types: NEW, UPDATED, REMOVED.
  • Behavior: Compares old and new grades per assignment key.

void insert_grade_updates(const CHClient& client, const std::string& user_id, const std::string& old_response_id, const std::string& new_response_id, const std::vector<AssignmentDiff>& diffs)

  • Purpose: Inserts diffs into grade_updates table.

  • Behavior:

    • Maps AssignmentDiff to ClickHouse columns.
    • Handles nullable old values for new assignments.
    • Uses placeholder values for removed assignments.
    • Logs number of inserted updates.