Google Drive Backup SOP

graph LR
  A[Drive Discovery] --> B[OAuth Authentication]
  B --> C[Shared Drive Enumeration]
  C --> D[rclone Configuration]
  D --> E[Parallel Drive Sync]
  E --> F[Hetzner S3 Storage]
  F --> G[Sync Verification]
  G --> H[Log Archival]

1 Purpose

Maintain a comprehensive backup of Google Drive content, including the personal drive and all shared drives, to ensure business continuity, disaster recovery, and regulatory compliance for client documents and operational files.

2 Triggers

  • Scheduled: Daily at 03:00 UTC via cron job for a complete backup refresh
  • Manual: Operator execution for immediate backup or specific drive recovery
  • Event-driven: Pre-maintenance backup, disaster recovery procedures, compliance audits

3 Inputs

  • Google OAuth Token: Valid workspace authentication with drive.readonly scope
  • Shared Drive Configuration: Current list of accessible shared drives and permissions
  • Hetzner Credentials: S3-compatible storage access keys and bucket configuration
  • Backup Policy: Retention settings, drive inclusion/exclusion rules, sync parameters

4 Steps

4.1 Drive Discovery and Authentication

  • Token Validation: Verify Google OAuth token validity and refresh if necessary
  • Shared Drive Enumeration: Query Google Drive API for all accessible shared drives
  • Permission Verification: Confirm read access to each drive and identify access restrictions
  • Configuration Update: Refresh shared drive manifest if older than 24 hours
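
The 24-hour refresh rule for the shared drive manifest can be sketched as a small check. This is a minimal illustration, not the pipeline's actual code: the function name, the constant, and the idea of storing a "last refreshed" timestamp are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the "older than 24 hours" rule; the constant name is hypothetical.
MANIFEST_MAX_AGE = timedelta(hours=24)


def manifest_is_stale(last_refreshed, now=None, max_age=MANIFEST_MAX_AGE):
    """Return True when the shared drive manifest should be refreshed.

    `last_refreshed` is a timezone-aware datetime, or None if no manifest
    has been written yet (which always counts as stale).
    """
    if last_refreshed is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    # Strictly older than the threshold triggers a refresh.
    return now - last_refreshed > max_age
```

A run would call this before enumeration and query the Drive API for shared drives only when it returns True.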

4.2 rclone Configuration Generation

  • Dynamic Config: Generate temporary rclone configuration with OAuth credentials
  • Drive Mapping: Create individual rclone remotes for personal drive and each shared drive
  • S3 Integration: Configure Hetzner Object Storage target with encryption and access controls
  • Performance Tuning: Set transfer parameters (--checkers 8, --transfers 4, plus rate limiting)
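
One way to generate the temporary configuration is to build the INI text in memory and write it to a short-lived file. The sketch below assumes remote names (gdrive-my-drive, gdrive-<name>, hetzner) and a function signature that are purely illustrative; the option keys (type, scope, token, team_drive, provider, endpoint, access_key_id, secret_access_key) are standard rclone config options for the drive and s3 backends.

```python
import configparser
import io
import json


def build_rclone_config(token, shared_drives, s3_endpoint, s3_access_key, s3_secret_key):
    """Return rclone config text with one Drive remote per drive plus one S3 remote.

    `token` is the OAuth token as a dict; `shared_drives` maps a remote-name
    suffix to a shared drive ID. All remote names here are hypothetical.
    """
    config = configparser.ConfigParser(interpolation=None)
    token_json = json.dumps(token)  # rclone stores the token as single-line JSON

    # Personal drive remote, restricted to the read-only scope.
    config["gdrive-my-drive"] = {
        "type": "drive",
        "scope": "drive.readonly",
        "token": token_json,
    }
    # One remote per shared drive; team_drive selects the shared drive by ID.
    for name, drive_id in shared_drives.items():
        config[f"gdrive-{name}"] = {
            "type": "drive",
            "scope": "drive.readonly",
            "token": token_json,
            "team_drive": drive_id,
        }
    # S3-compatible target; endpoint and keys come from the Hetzner credentials input.
    config["hetzner"] = {
        "type": "s3",
        "provider": "Other",
        "endpoint": s3_endpoint,
        "access_key_id": s3_access_key,
        "secret_access_key": s3_secret_key,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()
```

The returned text would be written to a file readable only by the backup user and deleted once the run finishes.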

4.3 Backup Execution

  • Personal Drive: Sync primary Google Drive to raw/google-drive/my-drive/
  • Shared Drives: Parallel sync of each shared drive to dedicated subdirectories
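  • Incremental Sync: rclone copy transfers only new or changed files, comparing size and modification time by default, or size and checksum when --checksum is set; unlike rclone sync, it never deletes files from the destination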
  • Error Handling: Automatic retry for transient failures, skip inaccessible files with logging
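
The execution step could be driven by a thin wrapper that builds one rclone copy command per drive and runs them in parallel. In this sketch the remote name, the bucket name, the --tpslimit and --retries values, and the log layout are assumptions; the flags themselves are standard rclone flags, and the checker and transfer counts come from the tuning step above.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

# Hypothetical remote and bucket; the raw/google-drive prefix follows the SOP.
DEST_ROOT = "hetzner:backups/raw/google-drive"


def build_copy_command(source_remote, dest_subdir, config_path, log_path):
    """Build the rclone copy command for one drive."""
    return [
        "rclone", "copy", f"{source_remote}:", f"{DEST_ROOT}/{dest_subdir}/",
        "--config", config_path,
        "--checkers", "8",
        "--transfers", "4",
        "--checksum",           # compare size and checksum instead of size and modtime
        "--tpslimit", "10",     # assumed API rate limit (transactions per second)
        "--retries", "3",       # assumed retry count for failed runs
        "--log-file", log_path,
        "--log-level", "INFO",
    ]


def run_backups(jobs, config_path, log_dir, max_parallel=4, runner=subprocess.run):
    """Run one rclone copy per (source_remote, dest_subdir) pair in parallel.

    Returns a dict mapping each source remote to its process exit code.
    """
    def run_one(job):
        source_remote, dest_subdir = job
        log_path = f"{log_dir}/{source_remote}.log"
        cmd = build_copy_command(source_remote, dest_subdir, config_path, log_path)
        return source_remote, runner(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return dict(pool.map(run_one, jobs))
```

A non-zero exit code for a drive marks it for the error reporting step; the `runner` parameter exists only so the wrapper can be exercised without invoking rclone.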

4.4 Data Validation

  • Transfer Verification: Confirm file count and size matching between source and destination
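  • Checksum Validation: Verify file integrity using MD5 checksums where both sides expose them; Google-native files (Docs, Sheets, Slides) have no source checksum and cannot be verified this way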
  • Access Logging: Record all files accessed during backup process for audit trail
  • Error Reporting: Document any failed transfers, permission issues, or data quality problems
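
The count and size comparison can be approximated from the output of rclone size --json, which reports a "count" and a "bytes" field for a remote path. The function below is a sketch with hypothetical names. Byte totals may legitimately differ for Google-native files, which are converted on export, and the destination may hold extra files because rclone copy never deletes, so a mismatch is a prompt for review rather than proof of failure.

```python
import json


def compare_size_reports(source_json, dest_json):
    """Compare two `rclone size --json` outputs.

    Returns the count and byte differences (destination minus source)
    and an overall `match` flag.
    """
    source = json.loads(source_json)
    dest = json.loads(dest_json)
    count_delta = dest["count"] - source["count"]
    bytes_delta = dest["bytes"] - source["bytes"]
    return {
        "count_delta": count_delta,
        "bytes_delta": bytes_delta,
        "match": count_delta == 0 and bytes_delta == 0,
    }
```

For file-level verification, rclone check compares the two sides file by file and is the more thorough option when time allows.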

4.5 Log Management

  • Execution Logs: Detailed rclone logs with timestamp, file counts, error conditions
  • Summary Generation: JSON summary with drive names, file counts, sync duration
  • Archive Storage: Compress and store logs with retention policy alignment
  • Performance Metrics: Transfer speeds, error rates, completion statistics
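
A JSON summary along these lines would cover the drive names, file counts, and sync duration listed above. The field names and the input shape are assumptions chosen for illustration; the real summary schema is not defined in this SOP.

```python
import gzip
import json


def build_summary(started_at, drives):
    """Build the JSON run summary.

    `drives` is a list of dicts with the hypothetical keys
    name, files, errors, and duration_seconds.
    """
    # Drives sync in parallel, so the longest drive approximates wall-clock time.
    longest = max((d["duration_seconds"] for d in drives), default=0.0)
    summary = {
        "started_at": started_at,
        "drive_count": len(drives),
        "total_files": sum(d["files"] for d in drives),
        "total_errors": sum(d["errors"] for d in drives),
        "longest_drive_seconds": longest,
        "drives": drives,
    }
    return json.dumps(summary, indent=2, sort_keys=True)


def compress_log(log_text):
    """Return the gzip-compressed bytes of a log for archival."""
    return gzip.compress(log_text.encode("utf-8"))
```

The compressed bytes can then be uploaded alongside the summary and expired under the same retention policy as the backups.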

5 Exceptions

5.1 Authentication Failures

  • OAuth Expiry: Automatic token refresh using stored refresh token
  • Permission Changes: Alert for drives with reduced access, update drive manifest
  • API Quota: Implement exponential backoff and resume from checkpoint
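
Exponential backoff for quota errors can be sketched as a retry wrapper. The exception class, the attempt limit, and the delay bounds below are hypothetical; a real implementation would catch whatever error the Drive client raises for quota responses. Checkpoint resume is not covered here.

```python
import random
import time


class QuotaExceeded(Exception):
    """Hypothetical error raised by the caller's API wrapper on a quota response."""


def call_with_backoff(operation, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, jitter=random.random):
    """Call `operation`, retrying on QuotaExceeded with doubling delays.

    The delay before retry n is min(max_delay, base_delay * 2**n) plus up
    to one second of jitter. The last failure is re-raised to the caller.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except QuotaExceeded:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + jitter())
```

The jitter spreads out retries when several drive syncs hit the quota at the same moment.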

5.2 Storage Issues

  • Hetzner Unavailable: Queue backup for retry, alert infrastructure team
  • Storage Full: Alert for capacity management, implement automatic cleanup
  • Network Failures: Re-run the interrupted copy; rclone skips files that already match at the destination, so only the remaining files are transferred

5.3 Data Quality Issues

  • Corrupted Files: Skip and log corrupted files, flag for manual review
  • Large Files: Account for Google Drive export size limits on Google-native files and for export format conversions; log any file that cannot be exported
  • Access Restrictions: Document files that cannot be backed up due to permissions

5.4 Performance Issues

  • Slow Transfers: Adjust transfer parameters, check network connectivity
  • API Rate Limits: Implement smart throttling with burst handling
  • Concurrent Limits: Balance parallel transfers with API quota management
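
Throttling with burst handling is commonly modeled as a token bucket: requests spend tokens, tokens refill at a steady rate, and the bucket size caps the burst. The class below is a generic sketch of that technique, not the pipeline's implementation; rclone exposes comparable behavior through its --tpslimit and --tpslimit-burst flags.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.updated = clock()

    def try_acquire(self):
        """Take one token if available; return False when the caller must wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Sharing one bucket across all parallel drive syncs keeps the combined request rate under the API quota regardless of how many drives run at once.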

6 Owner Handoffs

  • Data Management → Infrastructure for storage capacity planning and network issues
  • Data Management → Compliance for retention policy enforcement and audit requirements
  • Data Management → Security for access control violations or encryption issues

7 SLAs

  • Daily Backup: Complete within 6 hours of the 03:00 UTC start, i.e. by 09:00 UTC
  • Manual Backup: Complete within 2 hours for emergency recovery scenarios
  • Error Resolution: Automatic retry within 30 minutes, manual escalation if still unresolved after 4 hours
  • Recovery Time: Restore critical files within 1 hour of request
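
The SLA thresholds above reduce to simple time arithmetic, which a monitoring job could evaluate as follows. The function and constant names are hypothetical; only the 6-hour and 4-hour values come from this SOP.

```python
from datetime import datetime, timedelta, timezone

DAILY_SLA = timedelta(hours=6)         # daily backup must finish within 6 hours
ESCALATION_AFTER = timedelta(hours=4)  # unresolved errors escalate at 4 hours


def daily_deadline(run_date):
    """Return the SLA deadline for the scheduled run on `run_date` (03:00 UTC start)."""
    start = datetime(run_date.year, run_date.month, run_date.day, 3, 0, tzinfo=timezone.utc)
    return start + DAILY_SLA


def needs_manual_escalation(first_error_at, now):
    """Return True once an error has been open for the escalation window."""
    return now - first_error_at >= ESCALATION_AFTER
```

For a run on 2 January, `daily_deadline` yields 09:00 UTC the same day; a completion timestamp later than that is an SLA breach.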

8 Controls

  • Encryption at Rest: All backup data encrypted using AES-256 on Hetzner storage
  • Access Logging: Complete audit trail of all file access and transfer operations
  • Retention Policy: 90-day backup retention with automated cleanup of older backups
  • Permission Validation: Regular verification of backup system access rights
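
The 90-day retention rule can be expressed as a filter over backup objects and their creation times. How backups are keyed and dated in the bucket is not specified in this SOP, so the input mapping below is an assumption.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # retention window from the Controls section


def expired_backups(backups, now):
    """Return the keys of backups older than the retention window, sorted.

    `backups` maps a backup key (hypothetical naming) to its
    timezone-aware creation datetime.
    """
    cutoff = now - RETENTION
    return sorted(key for key, created in backups.items() if created < cutoff)
```

The cleanup job would delete only the returned keys and record each deletion in the audit log.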

9 Audit Artifacts

  • Backup Logs: Complete rclone execution logs with file-level detail
  • Sync Summaries: JSON reports with drive statistics, transfer counts, error summaries
  • Performance Reports: Transfer speeds, completion times, error rates by drive
  • Compliance Records: Backup verification reports for regulatory documentation

10 Security Considerations

  • OAuth Scope Limitation: Read-only access to Drive content, no modification capabilities
  • Temporary Credentials: rclone configuration files deleted after execution
  • Network Security: Encrypted transfers using HTTPS and TLS throughout pipeline
  • Access Control: Backup storage accessible only to authorized operations personnel

11 Disaster Recovery

  • Full Restore: Complete workspace restoration from Hetzner backup within 24 hours
  • Selective Restore: Individual file or folder recovery within 2 hours
  • Cross-Region Backup: Consider a secondary backup location for geographic redundancy
  • Restore Testing: Monthly verification of backup integrity and restore procedures

12 Storage Management

  • Capacity Planning: Monitor storage growth trends and plan capacity expansion
  • Cost Optimization: Archive older backups to cheaper storage tiers
  • Deduplication: Implement file deduplication to reduce storage requirements
  • Lifecycle Management: Automated transition between storage classes based on age

13 Monitoring & Alerts

  • Backup Success: Daily confirmation of successful backup completion
  • Error Conditions: Immediate alerts for authentication failures, storage issues
  • Performance Degradation: Alerts for backup duration exceeding SLA thresholds
  • Capacity Warnings: Proactive alerts when storage utilization approaches limits

14 FAQs

How are Google Workspace files (Docs, Sheets, Slides) handled? Google native formats are exported to Microsoft Office formats (DOCX, XLSX, PPTX) during backup to ensure portability and recoverability.

What happens if a shared drive is removed during backup? The backup continues with remaining drives. Removed drives are flagged in logs and their backup data is retained according to retention policy.

Can individual files be restored without full drive restoration? Yes, the backup structure preserves original file hierarchy enabling selective file or folder restoration using standard S3 tools.
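
As an illustration of selective restore, a single file can be pulled back with rclone copyto, which copies one source object to one target path. The remote name, bucket name, and function below are hypothetical; the raw/google-drive prefix follows the backup layout described in this SOP.

```python
import posixpath

# Hypothetical remote and bucket; the prefix matches the backup destination layout.
BACKUP_ROOT = "hetzner:backups/raw/google-drive"


def build_restore_command(drive_subdir, relative_path, restore_dir, config_path):
    """Build an rclone copyto command that restores one file.

    `relative_path` is the file's path inside the drive, which the backup
    preserves under the drive's subdirectory.
    """
    relative_path = relative_path.lstrip("/")  # keep join from treating it as absolute
    source = posixpath.join(BACKUP_ROOT, drive_subdir, relative_path)
    target = posixpath.join(restore_dir, relative_path)
    return ["rclone", "copyto", source, target, "--config", config_path]
```

Restoring a whole folder works the same way with rclone copy and the folder path in place of the file path.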

How is backup storage cost managed? Storage costs are optimized through lifecycle policies, deduplication, and retention management. Older backups are moved to cheaper storage tiers automatically.