Automated Spell Check & Content Validation

1 Purpose

Implement automated spell checking and content validation workflows to maintain professional content standards across all public-facing materials while supporting financial and investment terminology.

2 Spell Check Implementation

2.1 CSpell Configuration Files

2.1.1 Primary Configuration

// .cspell.json
{
  "version": "0.2",
  "language": "en",
  "languageSettings": [
    {
      "languageId": "markdown",
      "allowCompoundWords": true,
      "dictionaries": [
        "en_US",
        "financial-terms",
        "investment-vocabulary",
        "company-names",
        "proper-nouns"
      ]
    }
  ],
  "files": [
    "**/*.{md,qmd,html,txt}",
    "content/**/*.{md,qmd}",
    "blog/**/*.{md,qmd}",
    "strategies/**/*.{md,qmd}"
  ],
  "ignorePaths": [
    "node_modules/**",
    ".git/**",
    "dist/**",
    "build/**",
    "*.min.js",
    "*.bundle.*"
  ],
  "words": [
    "ECIC",
    "ethicic",
    "fiduciary",
    "stewardship",
    "ESG",
    "SRI",
    "sustainability",
    "diversification",
    "rebalancing",
    "drawdown",
    "Sharpe",
    "sortino",
    "calmar",
    "volatility",
    "correlation",
    "capitalization",
    "cryptocurrency",
    "blockchain",
    "DeFi",
    "robo",
    "advisor",
    "fintech"
  ],
  "ignoreWords": [
    "githubusercontent",
    "kinsta",
    "ubicloud",
    "deepinfra",
    "localhost",
    "webhook",
    "API",
    "JSON",
    "YAML",
    "OAuth",
    "PostgreSQL",
    "DuckDB"
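  ],
  "dictionaries": [
    "en_US",
    "financial-terms",
    "investment-vocabulary"
  ],
  "dictionaryDefinitions": [
    {
      "name": "financial-terms",
      "path": "./dictionaries/financial-terms.txt",
      "addWords": true
    },
    {
      "name": "investment-vocabulary",
      "path": "./dictionaries/investment-vocabulary.txt",
      "addWords": true
    }
  ]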
}

2.1.2 Financial Terms Dictionary

# dictionaries/financial-terms.txt
401k
403b
529
IRA
Roth
SEP
SIMPLE
fiduciary
stewardship
rebalancing
diversification
capitalization
alpha
beta
sharpe
sortino
calmar
drawdown
volatility
correlation
covariance
attribution
optimization
backtesting
benchmarking
outperformance
underperformance
risk-adjusted
downside
upside
maximum
drawdown
recovery
rolling
annualized
compounded
geometric
arithmetic
standard
deviation
skewness
kurtosis
value-at-risk
conditional
tail
risk

2.1.3 Investment Vocabulary Dictionary

# dictionaries/investment-vocabulary.txt
equities
fixed-income
alternatives
commodities
REITs
MLPs
ETFs
ETPs
mutual
funds
index
active
passive
factor
smart-beta
momentum
value
growth
quality
profitability
investment
low-volatility
minimum
variance
equal
weight
market
cap
small
mid
large
mega
micro
developed
emerging
frontier
markets
domestic
international
global
regional
sector
industry
thematic
dividend
yield
growth
aristocrats
kings
champions
sustainable
responsible
impact
ESG
environmental
social
governance
exclusionary
screening
positive
negative
best-in-class
integration
shareholder
engagement
proxy
voting
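
Because both word lists are maintained by hand, entries can repeat within a file (drawdown appears twice in the financial terms list, growth twice in the investment vocabulary list). The following is a minimal sketch of a duplicate check, assuming the one-entry-per-line format with `#` comment lines shown above. The function names are illustrative and not part of CSpell, and comparing case-insensitively is a choice made here, not a CSpell rule.

```python
from collections import Counter
from typing import Iterable


def load_words(lines: Iterable[str]) -> list[str]:
    """Return dictionary entries, skipping blank lines and '#' comment lines."""
    words = []
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            words.append(entry)
    return words


def find_duplicates(words: Iterable[str]) -> list[str]:
    """Return entries that occur more than once, compared case-insensitively."""
    counts = Counter(word.lower() for word in words)
    return sorted(word for word, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical excerpt; in practice the lines would be read from the dictionary file
    sample = ["# dictionaries/financial-terms.txt", "drawdown", "volatility", "maximum", "drawdown"]
    print(find_duplicates(load_words(sample)))
```

Running the check before committing a dictionary change keeps the lists short and makes accidental repeats visible.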

2.2 GitHub Actions Workflow

2.2.1 Comprehensive Spell Check Action

# .github/workflows/spell-check.yml
name: Spell Check and Content Validation

on:
  push:
    branches: [main, develop, staging]
    paths: ['content/**', 'blog/**', 'strategies/**', '**/*.md', '**/*.qmd']
  pull_request:
    branches: [main]
    paths: ['content/**', 'blog/**', 'strategies/**', '**/*.md', '**/*.qmd']
  schedule:
    - cron: '0 6 * * *'  # Daily at 6 AM UTC

jobs:
  spell-check:
    name: Spell Check Content
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18'
          cache: 'npm'

      - name: Install Dependencies
        run: |
          npm install -g cspell@latest
          npm install --save-dev @cspell/cspell-bundled-dicts @cspell/cspell-json-reporter

      - name: Validate CSpell Configuration
        run: |
          echo "🔍 Validating CSpell configuration..."
          cspell --config .cspell.json --validate-directives
          echo "✅ Configuration is valid"

      - name: Run Spell Check on All Content
        id: spell-check
        run: |
          echo "🔤 Running spell check on content files..."

          # Create results directory
          mkdir -p spell-check-results

          # Run spell check and capture results
          if cspell --config .cspell.json "content/**/*.{md,qmd}" \
             --reporter @cspell/cspell-json-reporter \
             > spell-check-results/results.json 2>&1; then
            echo "✅ No spelling errors found"
            echo "spell_errors=false" >> "$GITHUB_OUTPUT"
          else
            echo "❌ Spelling errors detected"
            echo "spell_errors=true" >> "$GITHUB_OUTPUT"

            # Generate human-readable report
            cspell --config .cspell.json "content/**/*.{md,qmd}" \
              --reporter default > spell-check-results/report.txt 2>&1 || true
          fi

      - name: Generate Spell Check Summary
        if: steps.spell-check.outputs.spell_errors == 'true'
        run: |
          echo "📝 Generating spell check summary..."

          # Extract unique misspelled words
          grep -o 'Unknown word.*' spell-check-results/report.txt | \
            sed 's/Unknown word (\([^)]*\)).*/\1/' | \
            sort -u > spell-check-results/misspelled-words.txt

          echo "🔤 Unique misspelled words:"
          cat spell-check-results/misspelled-words.txt

          # Create GitHub issue comment body
          echo "## Spell Check Results" > spell-check-results/comment.md
          echo "" >> spell-check-results/comment.md
          echo "❌ **Spelling errors detected in the following files:**" >> spell-check-results/comment.md
          echo "" >> spell-check-results/comment.md
          echo '```' >> spell-check-results/comment.md
          cat spell-check-results/report.txt >> spell-check-results/comment.md
          echo '```' >> spell-check-results/comment.md

      - name: Upload Spell Check Results
        if: steps.spell-check.outputs.spell_errors == 'true'
        uses: actions/upload-artifact@v4
        with:
          name: spell-check-results
          path: spell-check-results/
          retention-days: 7

      - name: Comment on Pull Request
        if: github.event_name == 'pull_request' && steps.spell-check.outputs.spell_errors == 'true'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const comment = fs.readFileSync('spell-check-results/comment.md', 'utf8');

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: comment
            });

      - name: Fail on Spelling Errors
        if: steps.spell-check.outputs.spell_errors == 'true' && (github.ref == 'refs/heads/main' || github.base_ref == 'main')
        run: |
          echo "❌ Spelling errors found in content targeting the main branch"
          echo "Please fix spelling errors before merging to main"
          exit 1

  content-validation:
    name: Content Structure Validation
    runs-on: ubuntu-latest
    needs: spell-check

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install Python Dependencies
        run: |
          pip install pyyaml python-frontmatter textstat

      - name: Validate Content Structure
        run: |
          python - << 'EOF'
          import os
          import frontmatter
          import yaml
          from pathlib import Path
          import textstat

          def validate_frontmatter(file_path):
              """Validate required front matter fields"""
              try:
                  with open(file_path, 'r', encoding='utf-8') as f:
                      post = frontmatter.load(f)

                  required_fields = ['title']
                  recommended_fields = ['description', 'date', 'author']

                  errors = []
                  warnings = []

                  # Check required fields
                  for field in required_fields:
                      if field not in post.metadata:
                          errors.append(f"Missing required field: {field}")

                  # Check recommended fields
                  for field in recommended_fields:
                      if field not in post.metadata:
                          warnings.append(f"Missing recommended field: {field}")

                  # Validate title length
                  if 'title' in post.metadata:
                      title_len = len(post.metadata['title'])
                      if title_len > 60:
                          warnings.append(f"Title too long: {title_len} chars (recommended: <60)")

                  # Validate description length
                  if 'description' in post.metadata:
                      desc_len = len(post.metadata['description'])
                      if desc_len > 160:
                          warnings.append(f"Description too long: {desc_len} chars (recommended: <160)")

                  # Check readability
                  content = post.content
                  if content.strip():
                      reading_level = textstat.flesch_reading_ease(content)
                      if reading_level < 60:  # Below "Standard" level
                          warnings.append(f"Content may be too complex (Flesch score: {reading_level:.1f})")

                  return errors, warnings

              except Exception as e:
                  return [f"Error parsing file: {str(e)}"], []

          # Process all content files
          content_files = []
          for pattern in ['content/**/*.md', 'content/**/*.qmd', 'blog/**/*.md', 'blog/**/*.qmd', 'strategies/**/*.md', 'strategies/**/*.qmd']:
              content_files.extend(Path('.').glob(pattern))

          total_errors = 0
          total_warnings = 0

          print("📋 Content Structure Validation Report")
          print("=" * 50)

          for file_path in content_files:
              errors, warnings = validate_frontmatter(file_path)

              if errors or warnings:
                  print(f"\n📄 {file_path}")

                  for error in errors:
                      print(f"  ❌ {error}")
                      total_errors += 1

                  for warning in warnings:
                      print(f"  ⚠️ {warning}")
                      total_warnings += 1

          print(f"\n📊 Summary:")
          print(f"  Files processed: {len(content_files)}")
          print(f"  Total errors: {total_errors}")
          print(f"  Total warnings: {total_warnings}")

          if total_errors > 0:
              print(f"\n❌ Content validation failed with {total_errors} errors")
              raise SystemExit(1)
          else:
              print(f"\n✅ Content validation passed")
          EOF
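
The validation step above warns when the Flesch reading ease score falls below 60. For reference, the standard formula is 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). The sketch below computes it from counts supplied by the caller. It is an illustration of the formula only: textstat derives the counts from the text itself, and the function name here is an assumption, not part of textstat.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Compute the Flesch reading ease score from pre-counted totals."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    # Hypothetical passage: 100 words, 5 sentences, 150 syllables
    score = flesch_reading_ease(words=100, sentences=5, syllables=150)
    print(f"Flesch score: {score:.1f}")
    print("flagged" if score < 60 else "ok")
```

Longer sentences and longer words both lower the score, which is why dense financial prose tends to trigger the warning.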

2.3 Pre-commit Hook Integration

2.3.1 Pre-commit Configuration

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: cspell
        name: Spell Check
        entry: cspell
        language: node
        files: \.(md|qmd|txt)$
        args: ['--config', '.cspell.json']
        additional_dependencies: ['cspell@latest']

      - id: content-validation
        name: Content Structure Validation
        entry: python
        language: python
        files: \.(md|qmd)$
        args: ['-c', 'import frontmatter; import sys; [frontmatter.load(f) for f in sys.argv[1:]]']
        additional_dependencies: ['python-frontmatter']

2.4 Local Development Tools

2.4.1 Spell Check Script

#!/bin/bash
# scripts/spell-check.sh

set -e

echo "🔤 Running spell check on content..."

# Check if cspell is installed
if ! command -v cspell &> /dev/null; then
    echo "Installing cspell..."
    npm install -g cspell@latest
fi

# Create results directory
mkdir -p .spell-check-results

# Run spell check
if cspell --config .cspell.json "content/**/*.{md,qmd}" --reporter default; then
    echo "✅ No spelling errors found"
    rm -rf .spell-check-results
    exit 0
else
    echo "❌ Spelling errors detected"

    # Save machine-readable results (requires the @cspell/cspell-json-reporter package)
    cspell --config .cspell.json "content/**/*.{md,qmd}" \
        --reporter @cspell/cspell-json-reporter > .spell-check-results/errors.json 2>&1 || true

    echo ""
    echo "💡 To add words to dictionary:"
    echo "echo 'newword' >> dictionaries/financial-terms.txt"
    echo ""
    echo "💡 To ignore a word in this file only:"
    echo "Add '<!-- cspell:ignore word -->' comment"
    echo ""
    echo "💡 To disable spell check for a section:"
    echo "<!-- cspell:disable -->"
    echo "content with technical terms"
    echo "<!-- cspell:enable -->"

    exit 1
fi

2.4.2 Dictionary Management Script

#!/bin/bash
# scripts/manage-dictionary.sh

DICT_FILE="dictionaries/financial-terms.txt"

case "$1" in
    "add")
        if [ -z "$2" ]; then
            echo "Usage: $0 add <word>"
            exit 1
        fi
        echo "$2" >> "$DICT_FILE"
        sort -u "$DICT_FILE" -o "$DICT_FILE"
        echo "✅ Added '$2' to financial terms dictionary"
        ;;

    "remove")
        if [ -z "$2" ]; then
            echo "Usage: $0 remove <word>"
            exit 1
        fi
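        # Delete exact whole-line matches; works with both GNU and BSD tools
        grep -vxF -- "$2" "$DICT_FILE" > "$DICT_FILE.tmp" || true
        mv "$DICT_FILE.tmp" "$DICT_FILE"
        echo "✅ Removed '$2' from financial terms dictionary"
        ;;

    "list")
        echo "📚 Financial terms dictionary contents:"
        sort "$DICT_FILE"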
        ;;

    "check")
        if [ -z "$2" ]; then
            echo "Usage: $0 check <word>"
            exit 1
        fi
        if grep -qxF -- "$2" "$DICT_FILE"; then
            echo "✅ '$2' is in the dictionary"
        else
            echo "❌ '$2' is not in the dictionary"
        fi
        ;;

    *)
        echo "Usage: $0 {add|remove|list|check} [word]"
        exit 1
        ;;
esac
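
Where shell utilities differ between platforms, the add and remove operations can also be sketched in Python. This is a minimal sketch under the same assumptions as the script above (one entry per line, file kept sorted and unique); the helper names are hypothetical, and comment lines are treated like any other line, as they are by `sort -u`.

```python
from pathlib import Path


def add_word(words: list[str], word: str) -> list[str]:
    """Return the word list with `word` included, sorted and de-duplicated."""
    return sorted(set(words) | {word})


def remove_word(words: list[str], word: str) -> list[str]:
    """Return the word list without exact matches of `word`."""
    return [w for w in words if w != word]


def read_dictionary(path: Path) -> list[str]:
    """Read one entry per line, ignoring blank lines."""
    text = path.read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def write_dictionary(path: Path, words: list[str]) -> None:
    """Write entries back, one per line, with a trailing newline."""
    path.write_text("\n".join(words) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # In-memory demonstration; no dictionary file is touched
    words = ["beta", "alpha"]
    words = add_word(words, "sortino")
    words = remove_word(words, "beta")
    print(words)
```

Matching on exact strings avoids the regular-expression pitfalls of passing a user-supplied word to a pattern-based tool.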

2.5 Content Quality Metrics

2.5.1 Quality Dashboard Script

# scripts/content-quality-dashboard.py
import json
import subprocess
import os
from pathlib import Path
from datetime import datetime

def run_spell_check():
    """Run spell check and return results"""
    try:
        result = subprocess.run([
            'cspell', '--config', '.cspell.json',
            'content/**/*.md', 'content/**/*.qmd',
            '--reporter', '@cspell/cspell-json-reporter'
        ], capture_output=True, text=True)
    except OSError as e:
        # Raised when the cspell executable cannot be started
        return {"error_count": -1, "errors": [f"Spell check failed: {e}"]}

    if result.returncode == 0:
        return {"error_count": 0, "errors": []}

    # The JSON reporter is expected to emit one JSON document with an "issues" list
    try:
        report = json.loads(result.stdout)
    except json.JSONDecodeError:
        return {"error_count": -1, "errors": ["Could not parse spell check output"]}

    errors = report.get('issues', [])
    return {"error_count": len(errors), "errors": errors}

def count_content_files():
    """Count total content files"""
    patterns = ['content/**/*.md', 'content/**/*.qmd']
    files = []
    for pattern in patterns:
        files.extend(Path('.').glob(pattern))
    return len(files)

def generate_quality_report():
    """Generate comprehensive quality report"""
    print("📊 Content Quality Dashboard")
    print("=" * 40)
    print(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print()

    # File count
    file_count = count_content_files()
    print(f"📄 Total content files: {file_count}")

    # Spell check results
    spell_results = run_spell_check()
    if spell_results["error_count"] == 0:
        print("✅ Spell check: PASSED (0 errors)")
    elif spell_results["error_count"] > 0:
        print(f"❌ Spell check: FAILED ({spell_results['error_count']} errors)")

        # Show top misspelled words
        words = {}
        for error in spell_results["errors"]:
            word = error.get("text", "")
            if word:
                words[word] = words.get(word, 0) + 1

        if words:
            print("   Top misspelled words:")
            for word, count in sorted(words.items(), key=lambda x: x[1], reverse=True)[:10]:
                print(f"     - {word} ({count} times)")
    else:
        print("❌ Spell check: ERROR")

    print()
    print("💡 To fix spelling errors:")
    print("   - Add words to dictionaries/financial-terms.txt")
    print("   - Use inline comments: <!-- cspell:ignore word -->")
    print("   - Run: ./scripts/spell-check.sh")

if __name__ == "__main__":
    generate_quality_report()
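
If only the default reporter's text output is available, the dashboard's word tally can be approximated from it. The sketch below assumes each issue line contains the fragment `Unknown word (word)`, which is the same pattern the workflow's summary step extracts with grep and sed. The function name and sample lines are illustrative.

```python
import re
from collections import Counter

# Matches the "Unknown word (example)" fragment of each reported issue
UNKNOWN_WORD = re.compile(r"Unknown word \(([^)]*)\)")


def tally_unknown_words(report_text: str, limit: int = 10) -> list[tuple[str, int]]:
    """Return the most frequent flagged words with their counts."""
    counts = Counter(match.group(1) for match in UNKNOWN_WORD.finditer(report_text))
    return counts.most_common(limit)


if __name__ == "__main__":
    # Hypothetical report excerpt
    sample = (
        "content/a.md:3:5 - Unknown word (rebalencing)\n"
        "content/b.md:9:1 - Unknown word (fiduciery)\n"
        "content/b.md:12:8 - Unknown word (rebalencing)\n"
    )
    for word, count in tally_unknown_words(sample):
        print(f"{word}: {count}")
```

Words that recur across many files are usually candidates for the dictionaries, while one-off entries are more likely genuine typos.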

Together, the CSpell configuration, custom dictionaries, CI workflow, pre-commit hooks, and local scripts enforce consistent spelling and front matter across public-facing content while accepting the specialized vocabulary of financial services and investment management.