API Data Flow Architecture

1 Reporting Center API Data Flow Architecture

Comprehensive mapping of all data sources, transformation pipelines, and API endpoints required for the ECIC Static Reporting Center.

1.1 Executive Summary

Purpose: Design unified API layer connecting all ECIC data sources to interactive Quarto dashboards
Architecture: Event-driven data pipelines feeding centralized API serving real-time and historical data
Security: Role-based access with data sensitivity classification at the API level

1.2 Data Source Inventory

1.2.1 Primary Data Sources

1.2.1.1 1. Hooks Data Lake (DuckDB)

Location: ~/dev/workbench/hooks/data/duckdb/lake_catalog.duckdb
Type: Analytical database (primary reporting source)
Update Frequency: Real-time ingestion + daily batch processing
Data Volume: 9,941 files catalogued, 110 SEC record types

Key Datasets:
- contacts_index - 269KB unified contact mapping
- matrix_records_retention - SEC compliance tracking
- client_communications - Email metadata (180-day retention)
- portfolio_positions - Account holdings and performance
- compliance_audit_trail - Regulatory record completeness
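
As a sketch of how reporting code might read these datasets, the snippet below runs a read-only aggregate over compliance_audit_trail; the record_type column is an assumption about that table's schema, not a confirmed field.

# Minimal sketch: read-only query against the Hooks lake catalog.
# The record_type column is assumed from the dataset list above; adjust to the actual schema.
import os
import duckdb

LAKE_PATH = os.path.expanduser("~/dev/workbench/hooks/data/duckdb/lake_catalog.duckdb")

def compliance_completeness_summary() -> list[tuple]:
    # read_only=True keeps reporting queries from blocking ingestion writers
    con = duckdb.connect(LAKE_PATH, read_only=True)
    try:
        return con.execute(
            """
            SELECT record_type, COUNT(*) AS records
            FROM compliance_audit_trail
            GROUP BY record_type
            ORDER BY records DESC
            """
        ).fetchall()
    finally:
        con.close()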

1.2.1.2 2. LACRM API

Type: CRM system of record
API Limits: 100 requests/minute
Authentication: API key (stored in Doppler)
Update Pattern: 4-hour sync cycles with incremental updates

Critical Data:
- Client master records (48 active contacts)
- Custom fields mapping to SEC record types
- Pipeline/opportunity tracking
- Contact interaction history
- Document attachment metadata

1.2.1.3 3. Google Workspace APIs

Services: Gmail, Drive, Calendar, Docs
Authentication: OAuth 2.0 with service accounts
Update Pattern: Daily ingestion (2 AM UTC)
Retention: 180-day email metadata compliance

Data Types:
- Email communication summaries
- Document collaboration activity
- Meeting/calendar data
- Shared drive file inventory
- Form responses and submissions

1.2.1.4 4. Altruist Custody Platform

Critical Limitation: No API access available
Current Process: Manual copy/paste from platform
Data Staleness Risk: Updates dependent on manual process
Integration Need: Screen scraping or alternative solution

Essential Data:
- Real-time account balances and positions
- Transaction history and confirmations
- Performance attribution by account
- Fee billing and cost basis tracking

1.2.2 Secondary Data Sources

1.2.2.1 5. Market Data & Benchmarks

Need: External market data for performance comparison
Potential Sources: Alpha Vantage API, Yahoo Finance, SEC EDGAR
Update Frequency: Daily market close

Requirements:
- S&P 500, Russell indices for benchmarking
- ESG/sustainable investing index performance
- Sector performance for attribution analysis
- Yield curve and fixed income benchmarks

1.2.2.2 6. Regulatory Data

Sources: SEC IARD system, Utah Division of Securities
Type: Form ADV filings, registration status, regulatory notices
Update Pattern: On-demand when filings occur

1.2.2.3 7. Vendor Performance Data

Sources: Direct vendor APIs where available
Types: SLA metrics, cost data, service quality scores
Integration Complexity: Varies by vendor API capabilities

1.3 API Architecture Design

1.3.1 Core API Principles

1. Unified Data Layer

graph TB
    A[LACRM API] --> E[Data Processing Layer]
    B[Google Workspace] --> E
    C[Hooks DuckDB] --> E
    D[Altruist Manual] --> E
    E --> F[Unified API Gateway]
    F --> G[Dashboard Consumption]
    F --> H[Report Generation]
    F --> I[Compliance Monitoring]

    style E fill:#e1f5fe
    style F fill:#c8e6c9
    style D fill:#ffcccc

2. Event-Driven Architecture
- Real-time events for critical data changes
- Batch processing for historical analysis
- Webhook integration for external systems
- Queue-based processing for reliability
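
A minimal sketch of the queue-based processing idea using only the standard library; the event shape and handler logic are illustrative, not part of any existing ECIC pipeline.

# Illustrative queue-based event processing; event names are hypothetical.
import queue
import threading

events: "queue.Queue[dict]" = queue.Queue()

def handle_event(event: dict) -> None:
    # Route critical events (e.g., compliance alerts) immediately;
    # everything else is left for the batch pipelines to pick up.
    if event.get("type") == "compliance_alert":
        print("alerting:", event)
    else:
        print("queued for batch:", event)

def worker() -> None:
    while True:
        event = events.get()       # blocks until an event arrives
        try:
            handle_event(event)
        finally:
            events.task_done()     # lets publishers join() on the queue

threading.Thread(target=worker, daemon=True).start()
events.put({"type": "compliance_alert", "source": "hooks_monitoring"})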

3. Data Classification & Access Control

Level 1: Public (aggregated performance summaries)
Level 2: Internal (operational metrics, vendor data)
Level 3: Restricted (individual client data, detailed financials)
Level 4: Confidential (regulatory investigations, strategic plans)
Level 5: Executive (board materials, compliance issues)
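
One way to enforce these levels at the API layer is a numeric comparison between a route's classification and the caller's clearance; the role names in the sketch below are illustrative.

# Illustrative mapping of roles to the five sensitivity levels above.
from enum import IntEnum

class Sensitivity(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3
    CONFIDENTIAL = 4
    EXECUTIVE = 5

ROLE_CLEARANCE = {          # hypothetical role names
    "advisor": Sensitivity.RESTRICTED,
    "operations": Sensitivity.INTERNAL,
    "principal": Sensitivity.EXECUTIVE,
}

def can_access(role: str, endpoint_level: Sensitivity) -> bool:
    # A caller may read anything at or below their clearance level.
    return ROLE_CLEARANCE.get(role, Sensitivity.PUBLIC) >= endpoint_level

assert can_access("principal", Sensitivity.EXECUTIVE)
assert not can_access("operations", Sensitivity.CONFIDENTIAL)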

1.3.2 API Endpoint Architecture

1.3.2.1 Authentication & Authorization

/api/auth/
  POST /login          # Google SSO integration
  POST /refresh        # Token refresh
  GET  /permissions    # Role-based permissions check
  POST /logout         # Session termination

1.3.2.2 Executive Dashboard APIs

/api/executive/
  GET  /dashboard      # Real-time KPI summary
  GET  /aum           # Assets under management trends
  GET  /clients       # Client count and growth metrics
  GET  /performance   # Portfolio performance summary
  GET  /alerts        # Priority alerts and actions needed

Data Sources: Hooks DuckDB + LACRM + Altruist
Update Frequency: Every 4 hours
Cache Strategy: 1-hour cache with real-time invalidation
Security Level: Executive (Level 5)

1.3.2.3 Regulatory Intelligence APIs

/api/regulatory/
  GET  /compliance    # SEC record compliance status
  GET  /form-adv      # Form ADV amendment tracking
  GET  /examinations  # Examination readiness checklist
  GET  /filings       # Regulatory filing status
  GET  /audit-trail   # Document completeness audit

Data Sources: Hooks DuckDB (110 SEC record types) + LACRM custom fields
Update Frequency: Daily
Cache Strategy: 8-hour cache
Security Level: Confidential (Level 4)

1.3.2.4 Client Analytics APIs

/api/clients/
  GET  /performance   # Portfolio performance by strategy
  GET  /demographics  # Client distribution analysis
  GET  /satisfaction  # Client satisfaction metrics
  GET  /lifecycle     # Client acquisition/retention
  GET  /communications # Communication touch analysis

Data Sources: Hooks DuckDB + LACRM + Google Workspace + Altruist
Update Frequency: Daily for aggregates, 4-hour for individual metrics
Cache Strategy: 2-hour cache
Security Level: Restricted (Level 3)

1.3.2.5 Operations APIs

/api/operations/
  GET  /vendors       # Vendor performance scorecards
  GET  /processes     # Process efficiency metrics
  GET  /systems       # System health and uptime
  GET  /costs         # Cost analysis and budget tracking
  GET  /automation    # Automation status and effectiveness

Data Sources: Multiple vendor APIs + system monitoring + cost tracking
Update Frequency: Hourly for health, daily for performance metrics
Cache Strategy: 30-minute cache
Security Level: Internal (Level 2)

1.3.2.6 Strategic Intelligence APIs

/api/strategic/
  GET  /market        # Market analysis and trends
  GET  /competition   # Competitive positioning
  GET  /growth        # Growth planning metrics
  GET  /initiatives   # Strategic initiative tracking
  GET  /scenarios     # Risk/scenario modeling

Data Sources: External market data + internal strategic planning data
Update Frequency: Daily for market data, weekly for strategic metrics
Cache Strategy: 4-hour cache
Security Level: Executive (Level 5)

1.3.3 Data Transformation Pipelines

1.3.3.1 Real-Time Processing

# Priority data flows requiring immediate updates
REAL_TIME_SOURCES = {
    'client_communications': 'google_gmail_api',
    'compliance_alerts': 'hooks_monitoring',
    'system_health': 'infrastructure_monitoring',
    'priority_tasks': 'current_priorities_yaml'
}

1.3.3.2 Batch Processing

# Scheduled data processing pipelines
BATCH_SCHEDULES = {
    'daily_2am': [
        'google_workspace_sync',
        'lacrm_full_sync',
        'market_data_update',
        'performance_calculations'
    ],
    'hourly': [
        'lacrm_incremental_sync',
        'hooks_data_validation',
        'vendor_health_checks'
    ],
    'weekly': [
        'compliance_completeness_audit',
        'client_satisfaction_analysis',
        'strategic_metrics_update'
    ]
}
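
A minimal dispatcher over BATCH_SCHEDULES (defined above) might look like the sketch below; the job registry entries are placeholders for the real pipeline functions.

# Illustrative dispatcher: resolves schedule names to callables and runs them in order.
# Assumes the BATCH_SCHEDULES dictionary defined above is in scope.
import logging

logger = logging.getLogger("batch")

JOB_REGISTRY = {  # hypothetical mapping of job names to pipeline functions
    "lacrm_incremental_sync": lambda: logger.info("syncing LACRM deltas"),
    "hooks_data_validation": lambda: logger.info("validating hooks lake"),
    "vendor_health_checks": lambda: logger.info("pinging vendor APIs"),
}

def run_schedule(schedule_name: str) -> None:
    # Invoked by cron (or an equivalent scheduler) with 'daily_2am', 'hourly', or 'weekly'.
    for job_name in BATCH_SCHEDULES.get(schedule_name, []):
        job = JOB_REGISTRY.get(job_name)
        if job is None:
            logger.warning("no implementation registered for %s", job_name)
            continue
        try:
            job()
        except Exception:
            logger.exception("batch job %s failed", job_name)

run_schedule("hourly")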

1.4 API Implementation Requirements

1.4.1 Technical Specifications

1.4.1.1 1. Python FastAPI Framework

Rationale:
- Native async support for high-performance data serving
- Automatic OpenAPI documentation generation
- Built-in data validation with Pydantic
- Excellent integration with DuckDB and other data sources
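
As a minimal sketch of this pattern, the route below serves an AUM summary from the Hooks lake with a Pydantic response model; the portfolio_positions column names (account_id, market_value) are assumptions about the lake schema, not confirmed fields.

# Minimal FastAPI sketch: Pydantic response model backed by a DuckDB query.
import os
import duckdb
from fastapi import FastAPI
from pydantic import BaseModel

LAKE_PATH = os.path.expanduser("~/dev/workbench/hooks/data/duckdb/lake_catalog.duckdb")
app = FastAPI(title="ECIC Reporting API")

class AumSummary(BaseModel):
    accounts: int
    market_value: float

@app.get("/api/executive/aum", response_model=AumSummary)
def aum_summary() -> AumSummary:
    con = duckdb.connect(LAKE_PATH, read_only=True)
    try:
        accounts, market_value = con.execute(
            "SELECT COUNT(DISTINCT account_id), COALESCE(SUM(market_value), 0) "
            "FROM portfolio_positions"
        ).fetchone()
        return AumSummary(accounts=accounts, market_value=market_value)
    finally:
        con.close()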

1.4.1.2 2. Authentication Integration

# Google Workspace SSO integration
AUTHENTICATION = {
    'provider': 'google_workspace_sso',
    'mfa_required': True,
    'session_timeout': 8, # hours
    'concurrent_sessions': 2,
    'role_based_access': True
}
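
A sketch of how the server side of the Google SSO flow might verify ID tokens with the google-auth library; the client ID placeholder and the FastAPI header dependency are assumptions, and the real client ID would come from Doppler rather than source code.

# Illustrative ID-token verification for Google Workspace SSO (google-auth library).
from fastapi import Header, HTTPException
from google.auth.transport import requests as google_requests
from google.oauth2 import id_token

GOOGLE_CLIENT_ID = "your-oauth-client-id.apps.googleusercontent.com"  # placeholder

def current_user(authorization: str = Header(...)) -> dict:
    # Expecting "Authorization: Bearer <google_id_token>" on every API call.
    token = authorization.removeprefix("Bearer ").strip()
    try:
        claims = id_token.verify_oauth2_token(
            token, google_requests.Request(), GOOGLE_CLIENT_ID
        )
    except ValueError:
        raise HTTPException(status_code=401, detail="invalid or expired token")
    return {"email": claims.get("email"), "domain": claims.get("hd")}

This function could then be wired in as a FastAPI dependency on the routes listed earlier, with role checks layered on top of the returned claims.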

1.4.1.3 3. Data Connection Layer

# Connection pooling and management
DATABASE_CONNECTIONS = {
    'hooks_duckdb': '~/dev/workbench/hooks/data/duckdb/lake_catalog.duckdb',
    'lacrm_api': 'https://api.lessannoyingcrm.com/v2/',
    'google_apis': ['gmail', 'drive', 'calendar'],
    'altruist_manual': 'manual_csv_import_endpoint'
}

1.4.1.4 4. Caching Strategy

# Redis-based caching with invalidation
CACHE_CONFIGURATION = {
    'executive_dashboard': {'ttl': 3600, 'invalidate_on': ['aum_update', 'client_change']},
    'regulatory_compliance': {'ttl': 28800, 'invalidate_on': ['document_upload', 'filing_update']},
    'client_analytics': {'ttl': 7200, 'invalidate_on': ['performance_update', 'client_communication']},
    'operations_metrics': {'ttl': 1800, 'invalidate_on': ['vendor_status_change']}
}
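
A sketch of how the TTL and invalidation rules above could be applied with redis-py, reusing the CACHE_CONFIGURATION dictionary; the key naming scheme and local Redis host are assumptions.

# Illustrative Redis caching helper driven by CACHE_CONFIGURATION (defined above).
import json
import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cached_payload(section: str, builder) -> dict:
    # Serve from cache when fresh; otherwise rebuild and store with the section's TTL.
    key = f"reporting:{section}"
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    payload = builder()
    cache.setex(key, CACHE_CONFIGURATION[section]["ttl"], json.dumps(payload))
    return payload

def invalidate(event: str) -> None:
    # Drop every cached section that lists this event in its invalidate_on rules.
    stale = [
        f"reporting:{section}"
        for section, cfg in CACHE_CONFIGURATION.items()
        if event in cfg["invalidate_on"]
    ]
    if stale:
        cache.delete(*stale)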

1.4.2 Data Quality & Validation

1.4.2.1 Data Consistency Checks

validation_rules:
  client_data:
    - unique_client_id_across_systems
    - lacrm_hooks_contact_reconciliation
    - altruist_account_mapping_validation

  compliance_data:
    - sec_record_completeness_by_type
    - retention_period_compliance
    - audit_trail_integrity

  performance_data:
    - position_reconciliation_altruist_hooks
    - benchmark_data_freshness
    - calculation_accuracy_validation
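
As an example, the lacrm_hooks_contact_reconciliation rule might reduce to a set comparison like the sketch below; the client ID values shown are illustrative, and the two ID sets would come from the real source queries.

# Illustrative contact reconciliation between LACRM and the Hooks lake.
def reconcile_contacts(lacrm_ids: set[str], hooks_ids: set[str]) -> dict:
    return {
        "missing_in_hooks": sorted(lacrm_ids - hooks_ids),
        "missing_in_lacrm": sorted(hooks_ids - lacrm_ids),
        "matched": len(lacrm_ids & hooks_ids),
    }

result = reconcile_contacts({"C001", "C002", "C003"}, {"C002", "C003", "C004"})
# {'missing_in_hooks': ['C001'], 'missing_in_lacrm': ['C004'], 'matched': 2}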

1.4.2.2 Error Handling & Monitoring

# Comprehensive error tracking and alerting
ERROR_HANDLING = {
    'api_failures': 'log_and_alert_immediate',
    'data_inconsistencies': 'log_and_queue_for_review',
    'authentication_failures': 'security_alert_and_log',
    'performance_degradation': 'performance_monitoring_alert'
}

1.5 Integration Challenges & Solutions

1.5.1 1. Altruist API Limitation

Problem: No API access for real-time custody data
Solutions:
- Screen Scraping: Automated browser interaction (legal/ToS review required)
- CSV Import API: Manual export → automated processing pipeline
- Alternative Custodian: Evaluate custodians with robust APIs
- Reconciliation Reports: Enhanced manual process with validation
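
A sketch of the CSV import option, assuming a manually exported positions file; the column names used here are placeholders for Altruist's actual export format.

# Illustrative parser for a manually exported Altruist positions CSV.
# Column names ("Account", "Symbol", "Market Value") are assumptions about the export.
import csv
from pathlib import Path

def load_altruist_positions(export_path: str) -> list[dict]:
    rows = []
    with Path(export_path).open(newline="") as f:
        for row in csv.DictReader(f):
            rows.append({
                "account_id": row["Account"].strip(),
                "symbol": row["Symbol"].strip(),
                "market_value": float(row["Market Value"].replace(",", "")),
            })
    return rows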

1.5.2 2. LACRM Rate Limiting

Problem: 100 requests/minute limit
Solutions:
- Intelligent Queuing: Batch requests and optimize query patterns
- Incremental Sync: Only sync changed records since last update
- Caching Strategy: Aggressive caching with smart invalidation
- Webhook Integration: Real-time updates instead of polling
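
A sketch of the intelligent-queuing approach: a client-side throttle that keeps calls under the 100 requests/minute ceiling. The class and its defaults are illustrative, not an existing ECIC module.

# Illustrative client-side throttle for the LACRM limit of 100 requests/minute.
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_calls: int = 100, window_seconds: float = 60.0):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls: deque[float] = deque()

    def wait(self) -> None:
        # Block until another call fits inside the rolling window.
        now = time.monotonic()
        while self.calls and now - self.calls[0] > self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            sleep_for = self.window - (now - self.calls[0])
            if sleep_for > 0:
                time.sleep(sleep_for)
            self.calls.popleft()
        self.calls.append(time.monotonic())

lacrm_limiter = RateLimiter()
# Call lacrm_limiter.wait() before every request to https://api.lessannoyingcrm.com/v2/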

1.5.3 3. Google Workspace Quotas

Problem: API quotas for Gmail, Drive, Calendar
Solutions:
- Service Account Distribution: Multiple service accounts for load balancing
- Batch Processing: Combine multiple operations in single requests
- Selective Sync: Only sync relevant data based on business rules
- Exponential Backoff: Robust retry mechanisms for quota exhaustion
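
A sketch of the exponential-backoff approach, written generically so it can wrap any quota-limited call; the commented Gmail usage line is hypothetical.

# Illustrative exponential backoff with jitter for quota-limited API calls.
import random
import time

def with_backoff(call, max_attempts: int = 5, base_delay: float = 1.0):
    # Retries `call` on exception, doubling the delay each attempt (1s, 2s, 4s, ...).
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))

# Hypothetical usage with a Gmail client:
# messages = with_backoff(lambda: gmail.users().messages().list(userId="me").execute())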

1.5.4 4. Data Freshness vs Performance

Problem: Balance between real-time data and system performance
Solutions:
- Tiered Refresh Rates: Critical data (1hr), Important (4hr), Standard (daily)
- Event-Driven Updates: Real-time updates only for material changes
- Progressive Loading: Fast initial load with progressive detail enhancement
- Background Processing: Heavy calculations during off-hours

1.6 Security & Compliance Integration

1.6.1 Data Privacy Controls

privacy_framework:
  pii_handling:
    - client_names: "hash_for_analytics_mask_for_reports"
    - ssn_tax_ids: "encrypted_storage_no_api_exposure"
    - account_numbers: "masked_display_full_access_restricted"

  audit_logging:
    - api_access: "who_when_what_from_where"
    - data_exports: "complete_audit_trail_required"
    - permission_changes: "approval_workflow_required"
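
A sketch of the hashing and masking rules above; the salt value and mask format are assumptions, and in practice the salt would live in Doppler rather than source code.

# Illustrative PII helpers matching the rules above.
import hashlib

ANALYTICS_SALT = "rotate-me"  # placeholder; store in Doppler

def hash_client_name(name: str) -> str:
    # Stable, salted hash so analytics can group by client without exposing the name.
    return hashlib.sha256(f"{ANALYTICS_SALT}:{name.lower().strip()}".encode()).hexdigest()[:16]

def mask_account_number(account_number: str) -> str:
    # Show only the last four digits on reports; full values stay access-restricted.
    return "****" + account_number[-4:]

print(hash_client_name("Jane Example"))   # 16-hex-character token
print(mask_account_number("123456789"))   # '****6789'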

1.6.2 SEC Compliance Integration

regulatory_requirements:
  books_and_records:
    - api_access_logs: "7_year_retention_searchable"
    - data_modifications: "complete_change_audit_trail"
    - report_generation: "pdf_export_permanent_archive"

  fiduciary_duty:
    - client_data_accuracy: "real_time_validation_required"
    - performance_reporting: "standardized_calculation_methods"
    - conflict_disclosure: "automated_conflict_detection"

1.7 Performance Requirements

1.7.1 Response Time Targets

performance_sla:
  executive_dashboard:
    target: "<2 seconds initial load"
    acceptable: "<5 seconds with cold cache"

  detailed_reports:
    target: "<10 seconds for complex analysis"
    acceptable: "<30 seconds for historical deep-dives"

  real_time_alerts:
    target: "<1 second for critical notifications"
    acceptable: "<5 seconds for standard alerts"

1.7.2 Scalability Planning

scaling_requirements:
  current_state:
    clients: 48
    aum: "~$50M (estimated)"
    data_volume: "9,941 files catalogued"

  growth_projections:
    year_1: "100 clients, $100M AUM"
    year_3: "300 clients, $300M AUM"
    year_5: "500 clients, $500M AUM"

  infrastructure_scaling:
    api_performance: "linear scaling with client count"
    data_storage: "exponential growth planning required"
    concurrent_users: "10-20 simultaneous dashboard users"

1.8 Implementation Priority Matrix

1.8.1 Phase 1: Foundation APIs (Weeks 1-4)

High Priority:
- Authentication and authorization framework
- Executive dashboard core endpoints
- LACRM integration with proper rate limiting
- Hooks DuckDB connection layer

1.8.2 Phase 2: Core Reporting (Weeks 5-8)

Medium Priority:
- Regulatory compliance APIs
- Client analytics endpoints
- Google Workspace integration
- Caching and performance optimization

1.8.3 Phase 3: Advanced Features (Weeks 9-12)

Lower Priority:
- Strategic intelligence APIs
- Advanced vendor integrations
- Market data integration
- Altruist workaround implementation

1.9 Questions for Strategic Research

1.9.1 Technical Architecture

  1. Observable JS Performance: How well does Observable JS handle large DuckDB result sets (10k+ rows)?
  2. Real-time Updates: Can Quarto dashboards refresh data without full page reload?
  3. Mobile Responsiveness: How do interactive dashboards perform on mobile devices?
  4. Export Capabilities: Can Observable JS charts be exported to PDF/PNG for compliance documentation?

1.9.2 Data Integration

  1. Altruist Alternatives: What custodians provide robust APIs for RIA firms?
  2. Screen Scraping Legality: Legal review of automated data extraction from Altruist platform
  3. Data Quality Thresholds: What accuracy levels are acceptable for different report types?
  4. Historical Data Retention: How much historical data should remain in active vs archive systems?

1.9.3 Regulatory Compliance

  1. SEC API Requirements: Are there specific SEC requirements for how advisers present data via APIs?
  2. Audit Trail Standards: What level of API access logging is required for regulatory examinations?
  3. Client Data Consent: Do clients need explicit consent for data aggregation and reporting?
  4. Third-Party Integration: How do SEC custody rules apply to API-based data access?

1.9.4 Performance & Scaling

  1. DuckDB Performance: At what data volume do DuckDB queries become too slow for real-time dashboards?
  2. Cloudflare Limitations: Are there Cloudflare Pages limitations for interactive dashboard complexity?
  3. Concurrent User Load: How many simultaneous dashboard users can the architecture support?
  4. Geographic Performance: How do dashboard load times vary for remote users?

1.9.5 User Experience

  1. Training Requirements: What training is needed for non-technical users to access interactive reports?
  2. Dashboard Maintenance: How much ongoing maintenance do Observable JS dashboards require?
  3. Print-Friendly Reports: How to ensure dashboards can produce print-friendly versions for meetings?
  4. Accessibility Compliance: How to ensure dashboards meet ADA accessibility requirements?

Next Steps:

  1. Review API architecture with technical team
  2. Prioritize integration challenges based on business impact
  3. Begin Phase 1 implementation with authentication and core endpoints
  4. Establish testing framework for API performance and reliability