1 Client Data Structure Configuration
1.1 Overview
This document defines the integration between the Google Drive collaboration workspace, Hetzner S3 heavy storage, and the hooks parquet data lake that serves as the authoritative home for all ECIC client data.
1.2 Data Architecture Philosophy
- Google Drive: Collaboration and live document workspace
- Staging Storage: Raw Markdown exports for advisor handoffs and human readability
- Hetzner S3: Heavy/long-term storage (recordings, transcripts, archives)
- Hooks Parquet Lake: Authoritative structured data home and SEC compliance catalog
- DuckDB: Primary queryable interface over parquet lake
1.3 Data Flow: Staging → Parquet Pipeline
graph TB
A[Google Drive Document] --> B[Export to Markdown]
B --> C[Stage: hookslake-staging/raw/drive/client-id/doc.md]
C --> D[Transform to Structured Record]
D --> E[Append to Parquet: hookslake/metadata/drive/...]
C --> F[Advisor Access - Human Readable]
E --> G[DuckDB Query Interface]
G --> H[Compliance Dashboard]
G --> I[Business Intelligence]
style C fill:#fff3cd
style E fill:#d1ecf1
1.4 Google Drive Configuration
1.4.1 Drive Folder Structure
1.4.1.1 Parent Folder IDs (REQUIRED)
// Configuration variables needed:
GOOGLE_DRIVE_PARENT_ID_INDIVIDUAL = ""; // TODO: Provide individual client parent folder ID
GOOGLE_DRIVE_PARENT_ID_JOINT = ""; // TODO: Provide joint client parent folder ID (if different)
Recommended Drive Structure:
Ethical Capital Client Files/
├── Individual Clients/ <- GOOGLE_DRIVE_PARENT_ID_INDIVIDUAL
│ ├── [Client Name - Email]/
│ │ ├── Onboarding/
│ │ │ ├── Call Notes (Template Copy)
│ │ │ ├── Onboarding Form Responses
│ │ │ └── KYC Documents/
│ │ ├── Portfolio Reviews/
│ │ └── Correspondence/
│ └── ...
├── Joint Clients/ <- GOOGLE_DRIVE_PARENT_ID_JOINT (optional)
│ └── [Client Names - Primary Email]/
│ └── (same structure as individual)
└── Templates/ <- Template storage
├── Call Notes Template
├── Portfolio Review Template
└── Client Communication Templates
1.4.2 Document Template IDs (REQUIRED)
1.4.2.1 Call Notes Template
GOOGLE_DOCS_CALL_NOTES_TEMPLATE_ID = ""; // TODO: Provide call notes template document ID
Required Template Structure:
# Client Call Notes - [CLIENT_NAME]
**Date:** [CALL_DATE]
**Type:** [Discovery/Onboarding/Review]
**Participants:** [PARTICIPANTS]
## Pre-Call Context
**Onboarding Form Summary:**
- Ethical Priorities: [ETHICAL_CONSIDERATIONS]
- Risk Indicators: [BEHAVIORAL_RESPONSES]
- Investment Experience: [EXPERIENCE_LEVEL]
- Financial Context: [NET_WORTH_ESTIMATES]
## Call Discussion
### Key Points Discussed
[TO BE FILLED DURING/AFTER CALL]
### Client Questions & Concerns
[TO BE FILLED DURING/AFTER CALL]
### Strategy Preferences Identified
[TO BE FILLED DURING/AFTER CALL]
## Post-Call Analysis
### AI-Generated Insights
[AUTOMATED SECTION - POPULATED BY RECORDING PIPELINE]
### Action Items
- [ ] [ACTION_ITEM_1]
- [ ] [ACTION_ITEM_2]
### Next Steps
[TO BE FILLED]
## Recording & Transcript
**Recording Available:** [AUTO-POPULATED]
**Transcript URL:** [AUTO-POPULATED]
**Storage Location:** [AUTO-POPULATED]
1.4.2.2 Portfolio Review Template (Optional)
GOOGLE_DOCS_PORTFOLIO_REVIEW_TEMPLATE_ID = ""; // TODO: If separate template needed
1.5 Hetzner S3 Storage Configuration
1.5.1 S3 Storage Path Convention
Confirmed Path Structure:
client-recordings/
├── YYYY/ # Year (2024, 2025, etc.)
│ ├── MM/ # Month (01, 02, ..., 12)
│ │ ├── client-email/ # Email with @ replaced by _at_
│ │ │ ├── discovery-YYYY-MM-DD/
│ │ │ │ ├── recording-compressed.mp4
│ │ │ │ ├── extracted-audio.wav
│ │ │ │ ├── raw-transcript.json
│ │ │ │ ├── processed-insights.json
│ │ │ │ └── metadata.json
│ │ │ ├── onboarding-YYYY-MM-DD/
│ │ │ │ └── (same file structure)
│ │ │ └── review-YYYY-MM-DD/
│ │ │ └── (same file structure)
Example Path:
client-recordings/2025/01/jane.doe_at_example.com/discovery-2025-01-15/recording-compressed.mp4
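This path convention can be captured in a small helper. The following is a sketch only; the function name and signature are illustrative, not part of the existing pipeline:

```javascript
// Hypothetical helper that builds an S3 key following the path convention above.
// callType is one of 'discovery', 'onboarding', 'review'; date is a YYYY-MM-DD string.
function buildRecordingS3Path(clientEmail, callType, date, fileName) {
  const [year, month] = date.split('-');
  // Per the convention, only '@' is replaced in the email segment.
  const clientSegment = clientEmail.replace('@', '_at_');
  return `client-recordings/${year}/${month}/${clientSegment}/${callType}-${date}/${fileName}`;
}
```

For example, `buildRecordingS3Path('jane.doe@example.com', 'discovery', '2025-01-15', 'recording-compressed.mp4')` reproduces the example path above.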
1.5.2 S3 Environment Variables
HETZNER_S3_ENDPOINT = "https://s3.hetzner.com"; // TODO: Verify actual endpoint (Hetzner Object Storage endpoints typically look like https://<location>.your-objectstorage.com)
HETZNER_BUCKET_NAME = "ecic-client-recordings"; // TODO: Confirm bucket name
HETZNER_ACCESS_KEY = ""; // From Doppler
HETZNER_SECRET_KEY = ""; // From Doppler
1.6 Automated Folder Creation Logic
1.6.1 Drive Folder Creation Function
async function createClientFolder(clientData, env) {
const isJoint = clientData.household_addition === true;
const parentFolderId = isJoint ?
env.GOOGLE_DRIVE_PARENT_ID_JOINT || env.GOOGLE_DRIVE_PARENT_ID_INDIVIDUAL :
env.GOOGLE_DRIVE_PARENT_ID_INDIVIDUAL;
const folderName = isJoint ?
`${clientData.primary_contact.legal_name} & ${clientData.co_client.legal_name} - ${clientData.primary_contact.email}` :
`${clientData.primary_contact.legal_name} - ${clientData.primary_contact.email}`;
// Create main client folder
const clientFolder = await googleDrive.createFolder({
name: folderName,
parents: [parentFolderId]
}, env);
// Create subfolders
const subfolders = await Promise.all([
googleDrive.createFolder({ name: 'Onboarding', parents: [clientFolder.id] }, env),
googleDrive.createFolder({ name: 'Portfolio Reviews', parents: [clientFolder.id] }, env),
googleDrive.createFolder({ name: 'Correspondence', parents: [clientFolder.id] }, env),
googleDrive.createFolder({ name: 'KYC Documents', parents: [clientFolder.id] }, env)
]);
return {
clientFolderId: clientFolder.id,
onboardingFolderId: subfolders[0].id,
reviewsFolderId: subfolders[1].id,
correspondenceFolderId: subfolders[2].id,
kycFolderId: subfolders[3].id
};
}
1.6.2 Call Notes Document Creation
async function createCallNotesDocument(clientData, folderIds, env) {
// Copy template document
const callNotesDoc = await googleDrive.copyFile({
fileId: env.GOOGLE_DOCS_CALL_NOTES_TEMPLATE_ID,
name: `Call Notes - ${clientData.primary_contact.legal_name}`,
parents: [folderIds.onboardingFolderId]
}, env);
// Prefill template with onboarding form data
const replacements = {
'[CLIENT_NAME]': clientData.primary_contact.legal_name,
'[CALL_DATE]': new Date().toISOString().split('T')[0],
'[PARTICIPANTS]': `${clientData.primary_contact.legal_name}, Sloane Ortel`,
'[ETHICAL_CONSIDERATIONS]': clientData.ethical_input?.primary_considerations?.join(', ') || 'Not specified',
'[BEHAVIORAL_RESPONSES]': summarizeBehavioralResponses(clientData.behavioral_responses),
'[EXPERIENCE_LEVEL]': clientData.financial_context?.investment_experience || 'Not specified',
'[NET_WORTH_ESTIMATES]': `Net Worth: ${clientData.financial_context?.net_worth_estimate}, Investable: ${clientData.financial_context?.investable_net_worth}`
};
await googleDrive.batchUpdateDocument(callNotesDoc.id, replacements, env);
return {
callNotesDocId: callNotesDoc.id,
callNotesUrl: `https://docs.google.com/document/d/${callNotesDoc.id}/edit`
};
}
1.7 Integration with Hooks Parquet Lake
1.7.1 Data Lake Integration Flow
graph TB
A[Onboarding Form Submit] --> B[Create Drive Folder + Doc]
B --> C[Update LACRM with Links]
C --> D[Store Metadata in Hooks Lake]
E[Google Meet Recording] --> F[Hetzner S3 Storage]
F --> G[Update Drive Call Notes]
G --> H[Sync to Hooks Lake]
D --> I[DuckDB Catalog]
H --> I
I --> J[SEC Compliance Dashboard]
I --> K[Client Records Completeness]
1.8 Staging → Parquet Pipeline
1.8.1 Raw Markdown Export with YAML Frontmatter (Staging)
async function exportDriveDocToMarkdown(docId, clientEmail, docType, env) {
// Get Drive document metadata
const docMetadata = await googleDrive.getFileMetadata(docId, env);
const rawMarkdown = await googleDrive.exportDocument(docId, 'text/markdown', env);
// Generate YAML frontmatter
const frontmatter = generateDocumentFrontmatter({
clientEmail,
docType,
docMetadata,
exportTimestamp: new Date().toISOString()
});
// Combine frontmatter + content
const markdownWithFrontmatter = `---\n${frontmatter}\n---\n\n${rawMarkdown}`;
// Generate staging path
const clientId = clientEmail.replace('@', '_at_').replace(/\./g, '_'); // Replace ALL dots, not just the first
const timestamp = new Date().toISOString().split('T')[0];
const stagingPath = `hookslake-staging/raw/drive/${clientId}/${docType}-${timestamp}.md`;
// Store enhanced Markdown for advisor access
await s3Client.send(new PutObjectCommand({
Bucket: env.STAGING_BUCKET_NAME,
Key: stagingPath,
Body: markdownWithFrontmatter,
ContentType: 'text/markdown',
Metadata: {
clientEmail: clientEmail,
driveDocId: docId,
docType: docType,
exportTimestamp: new Date().toISOString(),
hasFrontmatter: 'true'
}
}));
return {
stagingPath: stagingPath,
markdownContent: markdownWithFrontmatter,
frontmatter: frontmatter,
stagingUrl: `${env.S3_ENDPOINT}/${env.STAGING_BUCKET_NAME}/${stagingPath}`
};
}
1.9 YAML Frontmatter Standards
1.9.1 Universal Fields (All Document Types)
# Document Identity
client_email: "jane.doe@example.com"
client_name: "Jane Doe"
doc_type: "call-notes" # call-notes, onboarding-form, portfolio-review, correspondence
submission_id: "sub_20250115_jane_doe_call_notes"
# Drive Integration
drive_file_id: "1abc123def456..."
drive_folder_id: "1xyz789uvw012..."
drive_url: "https://docs.google.com/document/d/1abc123def456/edit"
# Timestamps (ISO 8601)
created_date: "2025-01-14T10:30:00Z"
modified_date: "2025-01-15T14:20:00Z"
exported_date: "2025-01-15T16:45:00Z"
# Processing Status
processing_status: "exported" # draft, exported, analyzed, archived
version: 1 # Increments on each export
content_hash: "sha256:abc123..." # For change detection
# SEC Compliance
record_type: "5.6" # SEC Rule 204-2 section
retention_period: "7y" # 7 years minimum
compliance_status: "compliant"
1.9.2 Call Notes Specific Fields
# Meeting Details
call_type: "discovery" # discovery, onboarding, portfolio-review, check-in
call_date: "2025-01-15"
duration_minutes: 45
participants: ["Jane Doe", "Sloane Ortel"]
# Recording Integration
has_recording: true
recording_s3_path: "client-recordings/2025/01/jane.doe_at_example.com/discovery-2025-01-15/"
transcript_processed: true
ai_analysis_complete: true
# Client Context (from onboarding form)
ethical_priorities: ["animal_welfare", "environmental_impact"]
divestment_alignments: ["bds", "fossil_fuels"]
investment_experience: "limited"
risk_indicators:
sleep_stress_tolerance: "disagree" # q4_stress_sleep_patterns
general_risk_avoidance: "neutral" # q3_general_risk_avoidance
ethics_over_volatility: "strongly_agree" # q7_volatility_vs_ethics
# Strategy Assessment
anxiety_level: "moderate" # low, moderate, high (from AI analysis)
values_alignment: "excellent" # poor, fair, good, excellent
strategy_preference: "growth" # growth, income, diversification, balanced
red_flags: [] # Array of concern strings
green_lights: ["strong ethical conviction", "adequate emergency fund"]
1.9.3 Onboarding Form Specific Fields
# Form Metadata
form_version: "v8"
completion_duration_minutes: 23
submission_timestamp: "2025-01-14T09:15:00Z"
# Household Structure
is_joint_client: false
co_client_name: null # or "John Doe" if joint
# Behavioral Profile Summary
behavioral_summary:
missed_gains_tolerance: "neutral"
worst_case_focus: "agree"
general_risk_avoidance: "neutral"
stress_sleep_patterns: "disagree"
long_term_commitment: "strongly_agree"
values_alignment_worry: "strongly_agree"
volatility_vs_ethics: "strongly_agree"
# Financial Context
net_worth_estimate: "$250,000 - $500,000"
investable_net_worth: "$100,000 - $250,000"
emergency_liquidity: true
employment_status: "full_time"
# Ethical Framework
primary_concerns: ["animal_welfare", "environmental_impact", "human_rights"]
divestment_preferences: ["bds", "fossil_fuels"]
framework_evolution_comfort: "strongly_support"
novel_concerns: "Concerned about AI companies' labor practices"
# Professional Coordination
existing_team: ["accountant", "attorney"]
referral_requests: ["money_coach"]1.9.4 Portfolio Review Specific Fields
# Review Details
review_type: "quarterly" # monthly, quarterly, semi-annual, annual, ad-hoc
review_period_start: "2024-10-01"
review_period_end: "2024-12-31"
portfolio_value_start: 125000.00
portfolio_value_end: 132500.00
# Performance Summary
total_return_percent: 6.0
benchmark_return_percent: 4.8
outperformance_percent: 1.2
# Strategy Changes
strategy_adjustments: ["increased growth allocation", "reduced fossil fuel exposure"]
rebalancing_required: true
tax_loss_harvesting: false
# Client Satisfaction
client_questions_count: 3
concerns_raised: ["market volatility", "ESG fund performance"]
satisfaction_level: "high" # low, moderate, high
next_review_date: "2025-04-15"
1.9.5 Frontmatter Generation Function
function generateDocumentFrontmatter(data) {
const { clientEmail, docType, docMetadata, exportTimestamp, clientContext } = data;
// Universal fields for all documents
const universal = {
client_email: clientEmail,
client_name: clientContext?.legal_name || extractNameFromEmail(clientEmail),
doc_type: docType,
submission_id: generateSubmissionId(clientEmail, docType, exportTimestamp),
// Drive Integration
drive_file_id: docMetadata.id,
drive_folder_id: docMetadata.parents?.[0] || null,
drive_url: `https://docs.google.com/document/d/${docMetadata.id}/edit`,
// Timestamps
created_date: docMetadata.createdTime,
modified_date: docMetadata.modifiedTime,
exported_date: exportTimestamp,
// Processing Status
processing_status: 'exported',
version: 1, // TODO: Implement version tracking
content_hash: null, // TODO: Generate content hash
// SEC Compliance
record_type: getRecordType(docType),
retention_period: '7y',
compliance_status: 'compliant'
};
// Type-specific fields
let typeSpecific = {};
switch (docType) {
case 'call-notes':
typeSpecific = generateCallNotesFields(clientContext, docMetadata);
break;
case 'onboarding-form':
typeSpecific = generateOnboardingFields(clientContext, docMetadata);
break;
case 'portfolio-review':
typeSpecific = generatePortfolioReviewFields(clientContext, docMetadata);
break;
}
// Convert to YAML string
return Object.entries({ ...universal, ...typeSpecific })
.map(([key, value]) => {
if (typeof value === 'object' && value !== null) {
return `${key}:\n${Object.entries(value)
.map(([k, v]) => ` ${k}: ${formatYamlValue(v)}`)
.join('\n')}`;
}
return `${key}: ${formatYamlValue(value)}`;
})
.join('\n');
}
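The `content_hash` TODO above could be satisfied with the Web Crypto API, which is available in Cloudflare Workers (and recent Node). This is a sketch, not the pipeline's existing implementation:

```javascript
// Hypothetical implementation for the content_hash TODO above.
// Assumes a runtime exposing the Web Crypto API (Cloudflare Workers, recent Node).
async function generateContentHash(content) {
  const bytes = new TextEncoder().encode(content);
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  const hex = [...new Uint8Array(digest)]
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
  return `sha256:${hex}`; // matches the "sha256:..." format used in the frontmatter
}
```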
function generateCallNotesFields(clientContext, docMetadata) {
return {
// Meeting Details (extracted from doc title or context)
call_type: detectCallType(docMetadata.name),
call_date: extractCallDate(docMetadata.createdTime),
duration_minutes: null, // TODO: Extract from recording data
participants: [clientContext?.legal_name, 'Sloane Ortel'].filter(Boolean),
// Recording Integration
has_recording: false, // TODO: Check for linked recording
recording_s3_path: null,
transcript_processed: false,
ai_analysis_complete: false,
// Client Context (from onboarding form data)
ethical_priorities: clientContext?.ethical_input?.primary_considerations || [],
divestment_alignments: clientContext?.ethical_input?.divestment_alignments || [],
investment_experience: clientContext?.financial_context?.investment_experience || null,
risk_indicators: {
sleep_stress_tolerance: clientContext?.behavioral_responses?.q4_stress_sleep_patterns || null,
general_risk_avoidance: clientContext?.behavioral_responses?.q3_general_risk_avoidance || null,
ethics_over_volatility: clientContext?.behavioral_responses?.q7_volatility_vs_ethics || null
},
// Strategy Assessment (initially null, filled by AI analysis)
anxiety_level: null,
values_alignment: null,
strategy_preference: null,
red_flags: [],
green_lights: []
};
}
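`generateDocumentFrontmatter` references two helpers, `extractNameFromEmail` and `generateSubmissionId`, that are not defined in this document. Minimal sketches, assuming submission IDs follow the `sub_YYYYMMDD_name_doc_type` pattern shown in section 1.9.1:

```javascript
// Hypothetical sketch: derives "Jane Doe" from "jane.doe@example.com".
function extractNameFromEmail(email) {
  return email
    .split('@')[0]
    .split(/[._-]+/)
    .map(part => part.charAt(0).toUpperCase() + part.slice(1))
    .join(' ');
}

// Hypothetical sketch: builds IDs like "sub_20250115_jane_doe_call_notes".
function generateSubmissionId(clientEmail, docType, exportTimestamp) {
  const datePart = exportTimestamp.slice(0, 10).replace(/-/g, ''); // YYYYMMDD
  const namePart = clientEmail.split('@')[0].replace(/[^a-z0-9]+/gi, '_').toLowerCase();
  return `sub_${datePart}_${namePart}_${docType.replace(/-/g, '_')}`;
}
```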
function getRecordType(docType) {
// Maps document types to SEC Rule 204-2 record categories from ECIC matrix
const mapping = {
'call-notes': '5.6', // Client Communications (per matrix)
'onboarding-form': '5.5', // Client Due Diligence Information (per matrix)
'portfolio-review': '4.12', // Billing Records/Reviews (per matrix)
'correspondence': '5.6', // Client Communications (per matrix)
'discovery-call': '4.13', // RFP Communications (per matrix)
'client-agreement': '5.1' // Client Service Agreements (per matrix)
};
return mapping[docType] || '5.0';
}
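`generateCallNotesFields` above also leans on `detectCallType` and `extractCallDate`, which are undefined here. Hedged sketches under the assumption that call type appears in the document title:

```javascript
// Hypothetical sketch: infers the call type from a doc title like
// "Discovery Call Notes - Jane Doe"; returns null if no known type is found.
function detectCallType(docName) {
  const name = (docName || '').toLowerCase();
  for (const type of ['discovery', 'onboarding', 'portfolio-review', 'check-in']) {
    if (name.includes(type)) return type;
  }
  return null;
}

// Hypothetical sketch: normalizes a Drive createdTime (ISO 8601) to YYYY-MM-DD.
function extractCallDate(createdTime) {
  return createdTime ? createdTime.split('T')[0] : null;
}
```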
function formatYamlValue(value) {
if (value === null) return 'null';
if (typeof value === 'string') return `"${value.replace(/"/g, '\\"')}"`;
if (Array.isArray(value)) return `[${value.map(v => `"${v}"`).join(', ')}]`;
return value;
}
1.9.6 Enhanced Parquet Ingestion with YAML Metadata
async function ingestToParquetLake(docMetadata, markdownContent, frontmatter, env) {
// Parse YAML frontmatter for structured fields
const yamlData = parseYamlFrontmatter(frontmatter);
// Extract content without frontmatter
const contentOnly = markdownContent.replace(/^---\n.*?\n---\n\n/s, '');
// Transform to structured record with YAML metadata
const parquetRecord = {
// Core identifiers
client_id: yamlData.client_email.replace('@', '_at_').replace(/\./g, '_'), // Replace ALL dots
client_email: yamlData.client_email,
client_name: yamlData.client_name,
submission_id: yamlData.submission_id,
doc_type: yamlData.doc_type,
// Drive integration
drive_file_id: yamlData.drive_file_id,
drive_folder_id: yamlData.drive_folder_id,
drive_url: yamlData.drive_url,
// Timestamps
created_timestamp: yamlData.created_date,
modified_timestamp: yamlData.modified_date,
exported_timestamp: yamlData.exported_date,
// Content handling
staging_path: docMetadata.stagingPath,
staging_url: docMetadata.stagingUrl,
content_excerpt: contentOnly.substring(0, 500),
content_word_count: contentOnly.split(/\s+/).filter(Boolean).length,
content_hash: yamlData.content_hash || await generateContentHash(contentOnly),
// Storage strategy
full_content: contentOnly.length < 10000 ? contentOnly : null,
large_content_pointer: contentOnly.length >= 10000 ? docMetadata.stagingUrl : null,
// Processing metadata
processing_status: yamlData.processing_status,
version: yamlData.version,
has_frontmatter: true,
// SEC compliance
record_type: yamlData.record_type,
retention_period: yamlData.retention_period,
compliance_status: yamlData.compliance_status,
// Type-specific structured data (JSON columns)
call_notes_metadata: yamlData.doc_type === 'call-notes' ? {
call_type: yamlData.call_type,
call_date: yamlData.call_date,
duration_minutes: yamlData.duration_minutes,
participants: yamlData.participants,
has_recording: yamlData.has_recording,
recording_s3_path: yamlData.recording_s3_path,
ethical_priorities: yamlData.ethical_priorities,
risk_indicators: yamlData.risk_indicators,
strategy_assessment: {
anxiety_level: yamlData.anxiety_level,
values_alignment: yamlData.values_alignment,
strategy_preference: yamlData.strategy_preference,
red_flags: yamlData.red_flags,
green_lights: yamlData.green_lights
}
} : null,
onboarding_metadata: yamlData.doc_type === 'onboarding-form' ? {
form_version: yamlData.form_version,
completion_duration_minutes: yamlData.completion_duration_minutes,
is_joint_client: yamlData.is_joint_client,
behavioral_summary: yamlData.behavioral_summary,
financial_context: {
net_worth_estimate: yamlData.net_worth_estimate,
investable_net_worth: yamlData.investable_net_worth,
emergency_liquidity: yamlData.emergency_liquidity
},
ethical_framework: {
primary_concerns: yamlData.primary_concerns,
divestment_preferences: yamlData.divestment_preferences,
novel_concerns: yamlData.novel_concerns
}
} : null,
portfolio_review_metadata: yamlData.doc_type === 'portfolio-review' ? {
review_type: yamlData.review_type,
review_period_start: yamlData.review_period_start,
review_period_end: yamlData.review_period_end,
performance_summary: {
total_return_percent: yamlData.total_return_percent,
benchmark_return_percent: yamlData.benchmark_return_percent,
outperformance_percent: yamlData.outperformance_percent
},
strategy_changes: yamlData.strategy_adjustments,
client_satisfaction: {
satisfaction_level: yamlData.satisfaction_level,
concerns_raised: yamlData.concerns_raised
}
} : null
};
// Append to parquet dataset
await appendToParquetTable('hookslake/metadata/drive/drive_documents.parquet', parquetRecord, env);
return parquetRecord;
}
function parseYamlFrontmatter(frontmatterString) {
// Simple line-based parser for flat frontmatter fields only
// Note: nested mappings (e.g. risk_indicators, behavioral_summary) are not
// handled correctly here; in production, use a proper YAML library like js-yaml
const lines = frontmatterString.split('\n');
const result = {};
lines.forEach(line => {
if (line.includes(':')) {
const [key, ...valueParts] = line.split(':');
const value = valueParts.join(':').trim();
// Handle arrays [item1, item2]
if (value.startsWith('[') && value.endsWith(']')) {
result[key.trim()] = value.slice(1, -1).split(',').map(v => v.trim().replace(/"/g, ''));
}
// Handle quoted strings
else if (value.startsWith('"') && value.endsWith('"')) {
result[key.trim()] = value.slice(1, -1);
}
// Handle null
else if (value === 'null') {
result[key.trim()] = null;
}
// Handle booleans
else if (value === 'true' || value === 'false') {
result[key.trim()] = value === 'true';
}
// Handle numbers
else if (!isNaN(value) && value !== '') {
result[key.trim()] = parseFloat(value);
}
// Default to string
else {
result[key.trim()] = value;
}
}
});
return result;
}
1.9.7 Hooks Data Lake Schema
The hooks system will query this structured parquet data:
-- Drive documents metadata table
CREATE TABLE drive_documents (
client_id VARCHAR,
client_email VARCHAR,
submission_id VARCHAR,
doc_type VARCHAR, -- 'call-notes', 'onboarding-form', 'portfolio-review'
drive_file_id VARCHAR,
drive_folder_id VARCHAR,
document_title VARCHAR,
created_timestamp TIMESTAMP,
modified_timestamp TIMESTAMP,
exported_timestamp TIMESTAMP,
staging_path VARCHAR, -- hookslake-staging/raw/drive/client/doc.md
staging_url VARCHAR, -- Full URL for advisor access
content_excerpt VARCHAR(500),
content_word_count INTEGER,
content_hash VARCHAR,
full_content TEXT, -- For documents < 10KB
large_content_pointer VARCHAR, -- URL for documents >= 10KB
PRIMARY KEY (client_id, drive_file_id, exported_timestamp)
);
-- Client folder structure metadata (existing)
CREATE TABLE client_drive_folders (
client_email VARCHAR PRIMARY KEY,
legal_name VARCHAR,
is_joint BOOLEAN,
client_folder_id VARCHAR,
onboarding_folder_id VARCHAR,
call_notes_doc_id VARCHAR,
drive_folder_url VARCHAR,
call_notes_url VARCHAR,
created_date TIMESTAMP,
last_updated TIMESTAMP,
onboarding_status VARCHAR
);
-- Meeting recording catalog (existing)
CREATE TABLE client_recordings (
client_email VARCHAR,
recording_date DATE,
call_type VARCHAR, -- 'discovery', 'onboarding', 'review'
s3_path VARCHAR,
drive_doc_id VARCHAR,
transcript_processed BOOLEAN,
duration_minutes INTEGER,
recording_size_mb DECIMAL,
PRIMARY KEY (client_email, recording_date, call_type)
);
1.9.8 LACRM Custom Fields Integration
Update LACRM contacts with drive links that get synced to parquet lake:
async function updateLACRMWithDriveLinks(clientEmail, driveData, env) {
const contact = await lacrm.findContactByEmail(clientEmail, env);
if (contact) {
// These fields get synced to hooks parquet lake via LACRM ingestion
await lacrm.updateContact(contact.id, {
'Client Folder': `https://drive.google.com/drive/folders/${driveData.clientFolderId}`,
'Call Notes Doc': driveData.callNotesUrl,
'Onboarding Status': 'Drive Setup Complete',
'Drive Folder ID': driveData.clientFolderId, // For automation reference
'Call Notes Doc ID': driveData.callNotesDocId, // For recording pipeline
}, env);
}
}
1.9.9 Recording Pipeline Integration
When recordings arrive, they update both the Drive doc AND get cataloged in the lake:
async function linkRecordingToParquetLake(clientEmail, recordingData, env) {
// This gets picked up by hooks ingestion
const recordingMetadata = {
client_email: clientEmail,
recording_date: recordingData.recordingDate,
call_type: recordingData.callType, // 'discovery', 'onboarding', 'review'
s3_path: recordingData.s3StoragePath,
transcript_url: recordingData.transcriptUrl,
insights_url: recordingData.insightsUrl,
duration_minutes: Math.round(recordingData.duration / 60),
recording_size_mb: Math.round(recordingData.compressedSize / (1024 * 1024)),
drive_doc_updated: true,
processed_timestamp: new Date().toISOString()
};
// Store in KV for hooks ingestion pickup
await env.KV_STORAGE.put(
`recording_metadata_${clientEmail}_${recordingData.recordingDate}`,
JSON.stringify(recordingMetadata)
);
}
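On the hooks side, the KV entries written above need a consumer. A hypothetical drain function follows; the `KV_STORAGE` binding name comes from the code above, but the consume-and-delete strategy (and the omission of `list()` cursor pagination) are assumptions, not part of the existing pipeline:

```javascript
// Hypothetical hooks-side consumer for the recording metadata written to KV above.
// Assumes a Cloudflare Workers KV binding; list() supports a prefix filter.
// Cursor-based pagination for large listings is omitted for brevity.
async function drainRecordingMetadata(env) {
  const records = [];
  const listing = await env.KV_STORAGE.list({ prefix: 'recording_metadata_' });
  for (const key of listing.keys) {
    const value = await env.KV_STORAGE.get(key.name, { type: 'json' });
    if (value) records.push(value);
    await env.KV_STORAGE.delete(key.name); // consume once picked up for ingestion
  }
  return records;
}
```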
## Implementation Checklist
### **Phase 1: Configuration Setup**
- [ ] **GOOGLE_DRIVE_PARENT_ID_INDIVIDUAL**: Provide individual client parent folder ID
- [ ] **GOOGLE_DRIVE_PARENT_ID_JOINT**: Provide joint client parent folder ID (if different from individual)
- [ ] **GOOGLE_DOCS_CALL_NOTES_TEMPLATE_ID**: Create and provide call notes template document ID
- [ ] **HETZNER_BUCKET_NAME**: Confirm or create Hetzner S3 bucket name
- [ ] **HETZNER_S3_ENDPOINT**: Verify correct Hetzner S3 endpoint URL
### **Phase 2: Template Creation**
- [ ] Create call notes Google Doc template with placeholder text
- [ ] Test template copying and text replacement functionality
- [ ] Verify template permissions allow Worker access
- [ ] Create additional templates if needed (portfolio review, correspondence)
### **Phase 3: Testing**
- [ ] Test folder creation with sample client data
- [ ] Verify S3 path convention with sample recording
- [ ] Test integration between Drive folder creation and recording pipeline
- [ ] Validate LACRM updates with drive links
## Required Information from User
### **Immediate Need (for automation implementation):**
1. **Google Drive Parent Folder ID(s)**
- Single parent for all clients, OR
- Separate parents for individual vs joint clients
- How to obtain: Navigate to desired parent folder in Google Drive, copy ID from URL
2. **Call Notes Template Document ID**
- Google Doc template for call notes
- Should contain placeholder text for automated replacement
- How to obtain: Create template document, copy ID from URL
3. **S3 Storage Configuration**
```javascript
// Heavy storage (recordings, transcripts, long-term archives)
HETZNER_BUCKET_NAME = "ecic-client-recordings"; // Confirm bucket name
HETZNER_S3_ENDPOINT = "https://s3.hetzner.com"; // Verify endpoint
// Staging storage (raw Markdown for advisor access)
STAGING_BUCKET_NAME = "ecic-hookslake-staging"; // Or use same bucket with prefix
```

Storage Structure:

```
# Heavy Storage (Hetzner S3)
client-recordings/YYYY/MM/client-email/call-type-date/
├── recording-compressed.mp4
├── extracted-audio.wav
├── raw-transcript.json
└── processed-insights.json

# Staging Storage (Raw Markdown)
hookslake-staging/raw/drive/client-id/
├── call-notes-2025-01-15.md
├── onboarding-form-2025-01-14.md
└── portfolio-review-2025-01-20.md

# Parquet Lake (Structured Data)
hookslake/metadata/drive/
├── drive_documents.parquet
├── client_drive_folders.parquet
└── client_recordings.parquet
```
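For items 1 and 2 above, the ID can also be pulled out of a copied Drive URL programmatically. A hypothetical regex-based helper (not part of the pipeline):

```javascript
// Hypothetical helper: extracts a Drive file/folder ID from common URL shapes,
// e.g. .../drive/folders/<id> or .../document/d/<id>/edit. Returns null if absent.
function extractDriveId(url) {
  const match = url.match(/\/(?:folders|d)\/([a-zA-Z0-9_-]+)/);
  return match ? match[1] : null;
}
```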
1.9.10 Optional Enhancements:
- Portfolio review template ID (if separate document needed)
- Client correspondence template ID
- Custom folder naming conventions
- Additional subfolder structures
1.10 SEC Compliance Integration
1.10.1 Records Management Matrix Alignment
The automated data structure supports SEC Rule 204-2 compliance via hooks integration:
Section 5.0 - Client Records:
- 5.1 Client Service Agreements: Client agreements → LACRM + Drive folder links → parquet lake catalog
- 5.5 Client Due Diligence Information: Onboarding forms + KYC data → structured analysis
- 5.6 Client Communications: Call notes + Gmail metadata → automated archival
- 5.8 Investment Policy Statements: Portfolio reviews + client preferences → strategy documentation
Section 4.0 - Investment Advisory Records:
- 4.12 Billing Records/Reviews: Portfolio review documents → performance tracking
- 4.13 RFP Communications: Discovery call recordings → client acquisition pipeline
1.10.2 Hooks Parquet Lake Queries for Compliance
-- Client records completeness dashboard (enhanced with document tracking)
SELECT
c.client_email,
c.legal_name,
c.drive_folder_url IS NOT NULL as has_folder,
c.call_notes_url IS NOT NULL as has_notes,
COUNT(DISTINCT r.recording_date) as total_recordings,
MAX(r.recording_date) as last_call_date,
COUNT(DISTINCT d.drive_file_id) as total_documents,
MAX(d.exported_timestamp) as last_document_export,
-- Document types breakdown
SUM(CASE WHEN d.doc_type = 'call-notes' THEN 1 ELSE 0 END) as call_notes_docs,
SUM(CASE WHEN d.doc_type = 'onboarding-form' THEN 1 ELSE 0 END) as onboarding_docs,
SUM(CASE WHEN d.doc_type = 'portfolio-review' THEN 1 ELSE 0 END) as review_docs
FROM client_drive_folders c
LEFT JOIN client_recordings r ON c.client_email = r.client_email
LEFT JOIN drive_documents d ON c.client_email = d.client_email
GROUP BY c.client_email, c.legal_name, c.drive_folder_url, c.call_notes_url;
-- Document staging and parquet health check
SELECT
doc_type,
COUNT(*) as total_documents,
AVG(content_word_count) as avg_word_count,
SUM(CASE WHEN full_content IS NOT NULL THEN 1 ELSE 0 END) as stored_inline,
SUM(CASE WHEN large_content_pointer IS NOT NULL THEN 1 ELSE 0 END) as stored_staging,
-- Content freshness
COUNT(CASE WHEN exported_timestamp >= CURRENT_DATE - INTERVAL 7 DAYS THEN 1 END) as exported_last_7d,
-- Staging accessibility check
COUNT(DISTINCT staging_path) as unique_staging_paths
FROM drive_documents
WHERE exported_timestamp >= '2025-01-01'
GROUP BY doc_type;
-- SEC compliance document inventory (aligned with matrix)
SELECT
'Section 5.1 - Client Service Agreements' as record_type,
COUNT(CASE WHEN d.record_type = '5.1' THEN 1 END) as document_count,
COUNT(DISTINCT d.client_email) as clients_covered
FROM drive_documents d
UNION ALL
SELECT
'Section 5.5 - Client Due Diligence Information' as record_type,
COUNT(CASE WHEN d.record_type = '5.5' THEN 1 END) as document_count,
COUNT(DISTINCT d.client_email) as clients_covered
FROM drive_documents d
UNION ALL
SELECT
'Section 5.6 - Client Communications' as record_type,
COUNT(CASE WHEN d.record_type = '5.6' THEN 1 END) as document_count,
COUNT(DISTINCT d.client_email) as clients_covered
FROM drive_documents d
UNION ALL
SELECT
'Section 4.12 - Billing Records/Reviews' as record_type,
COUNT(CASE WHEN d.record_type = '4.12' THEN 1 END) as document_count,
COUNT(DISTINCT d.client_email) as clients_covered
FROM drive_documents d
UNION ALL
SELECT
'Section 4.13 - RFP Communications' as record_type,
COUNT(*) as document_count,
COUNT(DISTINCT r.client_email) as clients_covered
FROM client_recordings r
WHERE r.call_type = 'discovery';
-- Advisor handoff query - get staged Markdown URLs
SELECT
d.client_email,
d.doc_type,
d.document_title,
d.staging_url as markdown_url_for_advisor,
d.exported_timestamp,
d.content_excerpt
FROM drive_documents d
WHERE d.client_email = ? -- Parameter for specific client
AND d.exported_timestamp >= CURRENT_DATE - INTERVAL 30 DAYS
ORDER BY d.exported_timestamp DESC;
1.11 Worker Orchestration Flow
1.11.1 Complete Automation Pipeline
async function handleClientOnboardingComplete(formData, env) {
// Phase 1: Drive Setup
const driveData = await createClientFolder(formData, env);
const callNotesDoc = await createCallNotesDocument(formData, driveData, env);
// Phase 2: LACRM Integration
await updateLACRMWithDriveLinks(formData.primary_contact.email, {
...driveData,
...callNotesDoc
}, env);
// Phase 3: Export to Staging + Parquet
const markdownExport = await exportDriveDocToMarkdown(
callNotesDoc.callNotesDocId,
formData.primary_contact.email,
'call-notes',
env
);
const parquetRecord = await ingestToParquetLake(
{
clientEmail: formData.primary_contact.email,
driveDocId: callNotesDoc.callNotesDocId,
driveFolderId: driveData.clientFolderId,
docType: 'call-notes',
stagingPath: markdownExport.stagingPath,
stagingUrl: markdownExport.stagingUrl
},
markdownExport.markdownContent,
markdownExport.frontmatter, // frontmatter is a required argument of ingestToParquetLake
env
);
// Phase 4: Recording Pipeline Preparation
await prepareRecordingPipeline(formData.primary_contact.email, driveData, env);
return {
success: true,
driveFolder: driveData.clientFolderId,
callNotesDoc: callNotesDoc.callNotesUrl,
stagingMarkdown: markdownExport.stagingUrl,
parquetRecord: parquetRecord.submission_id
};
}
1.11.2 Follow-up Queue Jobs
For heavy operations that shouldn’t block the main response:
// Queue job for periodic document sync
async function scheduleDocumentSync(clientEmail, env) {
await env.DOCUMENT_SYNC_QUEUE.send({
type: 'periodic_sync',
clientEmail: clientEmail,
syncTypes: ['call-notes', 'onboarding-form', 'portfolio-review'],
schedule: 'daily' // or 'on_modification'
});
}
// Queue job for recording processing
async function handleRecordingDetected(recordingData, env) {
// This integrates with existing recording pipeline
const clientData = await getClientDriveData(recordingData.clientEmail, env);
// Process recording to S3
const recordingStorage = await processRecordingToS3(recordingData, env);
// Update Drive call notes doc
await updateCallNotesWithRecording(clientData.callNotesDocId, recordingStorage, env);
// Export updated doc to staging + parquet
const updatedExport = await exportDriveDocToMarkdown(
clientData.callNotesDocId,
recordingData.clientEmail,
'call-notes',
env
);
await ingestToParquetLake(
{
stagingPath: updatedExport.stagingPath,
stagingUrl: updatedExport.stagingUrl
},
updatedExport.markdownContent,
updatedExport.frontmatter,
env
); // ingestToParquetLake expects (docMetadata, markdownContent, frontmatter, env)
}
Next Steps: This staging → parquet architecture ensures:
1. Advisors get human-readable Markdown for quick handoffs
2. Parquet lake stays clean with structured metadata
3. Hooks system ingests everything for compliance dashboards
4. SEC compliance maintained through automated document cataloging
5. Business intelligence enabled via DuckDB query interface
Once you provide the Google Drive IDs and S3 configuration, the automation can orchestrate Drive creation, Markdown staging, parquet ingestion, and full integration with your existing hooks data lake.