Scaling Strategies¶

This guide covers best practices for maximizing HEC-RAS compute throughput using ras-commander's parallel execution capabilities.

Understanding HEC-RAS Scaling¶

Single Plan Performance¶

HEC-RAS 2D computations scale with CPU cores, but with diminishing returns:

Cores	Relative Speed	Efficiency
1	1.0x	100%
2	1.8x	90%
4	3.2x	80%
8	5.0x	62%
16	6.5x	41%
32	7.5x	23%

These figures are illustrative; actual performance varies with model size and complexity.
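
The Efficiency column is the speedup divided by the core count. A minimal sketch that recomputes it from the table's own figures (the `SPEEDUP` dictionary simply restates the numbers above and is not a set of measurements):

```python
# Speedups restated from the table above (illustrative, not measured).
SPEEDUP = {1: 1.0, 2: 1.8, 4: 3.2, 8: 5.0, 16: 6.5, 32: 7.5}


def parallel_efficiency(cores: int, speedup: float) -> float:
    """Return parallel efficiency: achieved speedup as a fraction of ideal."""
    return speedup / cores


for cores, speedup in SPEEDUP.items():
    eff = parallel_efficiency(cores, speedup)
    print(f"{cores:>2} cores: {speedup:.1f}x speedup, {eff:.0%} efficiency")
```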

Throughput vs. Latency¶

Two optimization targets:

Latency: Minimize time to complete a single plan (more cores per plan)
Throughput: Maximize plans completed per hour (more parallel plans)

For batch processing, throughput is usually more important.
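
To see why, consider a rough estimate for a 16-core machine using the illustrative speedups from the scaling table. The sketch assumes identical plans and ignores memory and disk contention, so treat the numbers as an upper bound rather than a prediction:

```python
# Illustrative speedups from the scaling table (not measurements).
SPEEDUP = {1: 1.0, 2: 1.8, 4: 3.2, 8: 5.0, 16: 6.5}
TOTAL_CORES = 16


def estimate(cores_per_plan: int, total_cores: int = TOTAL_CORES):
    """Return (workers, relative latency, relative throughput).

    Latency is relative to a single-core run of one plan (1.0 = same time).
    Throughput is plans finished per single-core plan duration.
    """
    workers = total_cores // cores_per_plan
    speedup = SPEEDUP[cores_per_plan]
    latency = 1.0 / speedup
    throughput = workers * speedup
    return workers, latency, throughput


for cores_per_plan in (16, 8, 4, 2, 1):
    workers, latency, throughput = estimate(cores_per_plan)
    print(f"{workers:>2} workers x {cores_per_plan:>2} cores: "
          f"latency {latency:.2f}, throughput {throughput:.1f}")
```

Under these assumptions the 1 x 16 layout finishes a single plan fastest, while layouts with more workers finish more plans per unit time. In practice, memory and disk contention limit how far the worker count can be pushed.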

Scaling Tiers¶

Tier 1: Single Machine Optimization¶

Goal: Maximize throughput on one workstation

Python

from ras_commander import RasCmdr, init_ras_project
import os

init_ras_project("/path/to/project", "6.5")

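# Plans to run (example plan numbers; replace with your own)
plans = ["01", "02", "03", "04"]

# Determine configuration from the logical core count
total_cores = os.cpu_count() or 1  # cpu_count() can return None
cores_per_plan = 4  # Sweet spot for most 2D models
num_workers = max(1, total_cores // cores_per_plan)  # Always at least one worker

results = RasCmdr.compute_parallel(
    plan_number=plans,
    num_workers=num_workers,
    num_cores=cores_per_plan
)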

Configuration Matrix (16-core machine):

Model Size	Workers	Cores/Plan	Rationale
Small (<10K cells)	8	2	I/O bound, parallelize
Medium (10K-100K cells)	4	4	Balanced
Large (>100K cells)	2	8	CPU bound, more cores help

Tier 2: Multi-Machine Parallel¶

Goal: Scale beyond single machine limits

Python

from ras_commander.remote import init_ras_worker, compute_parallel_remote

# Create heterogeneous worker pool
workers = [
    # Local workstation (16 cores)
    init_ras_worker("local", ras_version="6.5", num_cores=8),
    init_ras_worker("local", ras_version="6.5", num_cores=8),

    # Remote workstation 1 (32 cores)
    init_ras_worker("psexec", host="ws1.local", ras_version="6.5",
                    num_cores=8, session_id=2, ...),
    init_ras_worker("psexec", host="ws1.local", ras_version="6.5",
                    num_cores=8, session_id=2, ...),
    init_ras_worker("psexec", host="ws1.local", ras_version="6.5",
                    num_cores=8, session_id=2, ...),
    init_ras_worker("psexec", host="ws1.local", ras_version="6.5",
                    num_cores=8, session_id=2, ...),

    # Remote workstation 2 (16 cores)
    init_ras_worker("psexec", host="ws2.local", ras_version="6.5",
                    num_cores=8, session_id=2, ...),
    init_ras_worker("psexec", host="ws2.local", ras_version="6.5",
                    num_cores=8, session_id=2, ...),
]

# 8 workers, 64 total cores utilized
results = compute_parallel_remote(plan_number=plans, workers=workers)
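
Before building a pool like this, it can help to confirm that the workers assigned to each machine do not oversubscribe it. The sketch below uses plain dictionaries as a stand-in for the worker definitions; the `host` and `num_cores` keys, the `HOST_CORES` inventory, and both helper functions are hypothetical and are not part of ras-commander:

```python
from collections import Counter

# Hypothetical inventory: cores available on each machine.
HOST_CORES = {"local": 16, "ws1.local": 32, "ws2.local": 16}

# Hypothetical plain-dict description of the pool (not ras-commander objects).
planned_workers = (
    [{"host": "local", "num_cores": 8}] * 2
    + [{"host": "ws1.local", "num_cores": 8}] * 4
    + [{"host": "ws2.local", "num_cores": 8}] * 2
)


def cores_by_host(workers):
    """Sum the cores requested by the workers on each host."""
    totals = Counter()
    for worker in workers:
        totals[worker["host"]] += worker["num_cores"]
    return dict(totals)


def oversubscribed_hosts(workers, host_cores):
    """Return hosts whose workers request more cores than the host has."""
    return sorted(
        host for host, used in cores_by_host(workers).items()
        if used > host_cores.get(host, 0)
    )


usage = cores_by_host(planned_workers)
print(f"{len(planned_workers)} workers, {sum(usage.values())} cores requested")
print("Oversubscribed hosts:", oversubscribed_hosts(planned_workers, HOST_CORES))
```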

Tier 3: Hybrid Cloud¶

Goal: Burst capacity using cloud resources

Python

from ras_commander.remote import init_ras_worker

# Mix of on-premise and cloud workers
workers = [
    # On-premise (always available)
    init_ras_worker("local", ras_version="6.5", num_cores=8),
    init_ras_worker("psexec", host="office-ws1", ...),

    # Cloud burst (Docker on cloud VMs)
    init_ras_worker("docker", host="cloud-vm1.example.com",
                    image="hec-ras:6.5", num_cores=8, ...),
    init_ras_worker("docker", host="cloud-vm2.example.com",
                    image="hec-ras:6.5", num_cores=8, ...),
]

Optimization Techniques¶

1. Pre-process Geometry¶

Run geometry preprocessing once before parallel execution so that plans sharing the same geometry can reuse the result:

Python

from ras_commander import RasCmdr, init_ras_project

init_ras_project("/path/to/project", "6.5")

# Pre-process geometry (modifies original)
RasCmdr.compute_plan("01", clear_geompre=False)  # First run builds geom

# Parallel runs use cached geometry
results = RasCmdr.compute_parallel(
    plan_number=["02", "03", "04", "05"],
    num_workers=4,
    num_cores=4,
    clear_geompre=False  # Don't clear cached geometry
)

2. Use Fast Storage¶

Worker folder creation is I/O intensive. Storage hierarchy:

Storage Type	Copy Speed	Recommendation
NVMe SSD	Excellent	Use for worker folders
SATA SSD	Good	Acceptable
HDD	Poor	Avoid for parallel work
Network	Variable	Only for remote workers

Python

from pathlib import Path
from ras_commander import RasCmdr

# Use fast local storage for workers
fast_drive = Path("D:/temp/ras_workers")  # SSD/NVMe

results = RasCmdr.compute_parallel(
    plan_number=plans,
    dest_folder=fast_drive,
    num_workers=8
)
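
Speed is not the only storage concern: the destination drive also needs enough free space for the worker folders. A rough pre-flight check, assuming one full project copy per worker (the helper names and the 1.5x headroom factor are illustrative assumptions, not part of ras-commander):

```python
import shutil
from pathlib import Path


def folder_size_bytes(folder):
    """Total size of all regular files under `folder`."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def has_space_for_workers(project_dir, dest_dir, num_workers, headroom=1.5):
    """Check that dest_dir can hold one project copy per worker.

    `headroom` is an assumed safety factor for result files written
    during the runs; adjust it for your models.
    Returns (ok, required_bytes, free_bytes).
    """
    required = folder_size_bytes(project_dir) * num_workers * headroom
    free = shutil.disk_usage(dest_dir).free
    return required <= free, required, free
```

For example, `has_space_for_workers("/path/to/project", "D:/temp", num_workers=8)` returns the pass/fail flag together with the bytes required and the bytes free. The destination directory must already exist.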

3. Right-size Worker Count¶

More workers isn't always better:

Python

import psutil

def optimal_worker_config(model_cells, total_cores=None):
    """Suggest optimal worker configuration."""
    if total_cores is None:
        # Physical cores; psutil can return None if undetermined
        total_cores = psutil.cpu_count(logical=False) or 1

    # Reserve cores for system
    available_cores = max(1, total_cores - 2)

    if model_cells < 10000:
        cores_per_plan = 2
    elif model_cells < 100000:
        cores_per_plan = 4
    else:
        cores_per_plan = 8

    # Never request more cores per plan than are available
    cores_per_plan = min(cores_per_plan, available_cores)
    num_workers = max(1, available_cores // cores_per_plan)

    return {
        "num_workers": num_workers,
        "num_cores": cores_per_plan,
        "total_utilized": num_workers * cores_per_plan
    }

# Usage
config = optimal_worker_config(model_cells=50000)
print(f"Recommended: {config['num_workers']} workers, "
      f"{config['num_cores']} cores each")

4. Batch Processing Strategy¶

For very large plan sets, process in batches:

Python

from ras_commander import RasCmdr

def process_in_batches(all_plans, batch_size=20, **kwargs):
    """Process plans in batches to manage memory/resources."""
    all_results = {}

    for i in range(0, len(all_plans), batch_size):
        batch = all_plans[i:i + batch_size]
        print(f"Processing batch {i//batch_size + 1}: plans {batch[0]}-{batch[-1]}")

        results = RasCmdr.compute_parallel(
            plan_number=batch,
            **kwargs
        )
        all_results.update(results)

        # Optional: cleanup between batches
        # cleanup_worker_folders(kwargs.get('dest_folder'))

    return all_results

# Process 99 plans in batches of 20 (HEC-RAS plan numbers run from 01 to 99)
all_plans = [f"{i:02d}" for i in range(1, 100)]
results = process_in_batches(all_plans, batch_size=20, num_workers=4, num_cores=4)
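
Long batch runs are easier to recover when results are saved after each batch. The following sketch adds a JSON checkpoint around a generic `run_batch` callable. Both `run_batch` and the checkpoint file are hypothetical; in practice `run_batch` could wrap the `RasCmdr.compute_parallel` call used in `process_in_batches`:

```python
import json
from pathlib import Path


def process_with_checkpoint(all_plans, run_batch, checkpoint_path, batch_size=20):
    """Run plans in batches, skipping plans already recorded as successful.

    `run_batch` is any callable that takes a list of plan numbers and
    returns a dict mapping plan number -> bool (success flag).
    """
    checkpoint = Path(checkpoint_path)
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}

    # Only run plans that are missing or previously failed.
    pending = [p for p in all_plans if not done.get(p)]

    for i in range(0, len(pending), batch_size):
        batch = pending[i:i + batch_size]
        done.update(run_batch(batch))
        # Persist after every batch so an interrupted run can resume.
        checkpoint.write_text(json.dumps(done, indent=2))

    return done
```

Calling the function a second time with the same checkpoint file re-runs only the plans that have no successful result recorded.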

5. Memory Management¶

Monitor memory usage to avoid swapping:

Python

import psutil
from ras_commander import RasCmdr

def check_memory_for_workers(num_workers, mem_per_plan_gb=4):
    """Check if system has enough memory for planned workers."""
    available_gb = psutil.virtual_memory().available / (1024**3)
    required_gb = num_workers * mem_per_plan_gb

    if required_gb > available_gb * 0.8:  # 80% threshold
        suggested = int(available_gb * 0.8 / mem_per_plan_gb)
        print(f"Warning: {num_workers} workers need ~{required_gb:.1f} GB")
        print(f"Available: {available_gb:.1f} GB")
        print(f"Suggested max workers: {suggested}")
        return False
    return True

# Check before running
if check_memory_for_workers(num_workers=8, mem_per_plan_gb=4):
    results = RasCmdr.compute_parallel(...)
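
The check above only warns. A small extension, shown here as a sketch, returns the worker count allowed by whichever of CPU or memory is the tighter limit. The inputs are plain numbers so the logic can be tested without a particular machine; the 4 GB per plan default mirrors the example above and is an assumption to replace with a figure measured on your own model:

```python
def cap_workers(cpu_workers, available_gb, mem_per_plan_gb=4.0, mem_fraction=0.8):
    """Return the worker count allowed by both CPU and memory limits.

    cpu_workers: worker count derived from core counts.
    available_gb: free RAM in GB (e.g. from psutil.virtual_memory().available).
    mem_fraction: share of free RAM the workers may use (80% here).
    """
    mem_workers = int(available_gb * mem_fraction / mem_per_plan_gb)
    # Always allow at least one worker so a run can proceed.
    return max(1, min(cpu_workers, mem_workers))


print(cap_workers(cpu_workers=8, available_gb=16))  # memory-limited case
print(cap_workers(cpu_workers=4, available_gb=64))  # CPU-limited case
```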

Performance Benchmarking¶

Measuring Throughput¶

Python

import time
from ras_commander import RasCmdr

def benchmark_config(plans, num_workers, num_cores, **kwargs):
    """Benchmark a specific configuration."""
    start = time.perf_counter()  # Monotonic clock, suited to timing

    results = RasCmdr.compute_parallel(
        plan_number=plans,
        num_workers=num_workers,
        num_cores=num_cores,
        **kwargs
    )

    elapsed = time.perf_counter() - start
    successful = sum(1 for v in results.values() if v)

    return {
        "num_workers": num_workers,
        "num_cores": num_cores,
        "total_plans": len(plans),
        "successful": successful,
        "elapsed_seconds": elapsed,
        # Count only successful plans toward throughput
        "plans_per_hour": successful / elapsed * 3600,
        "avg_seconds_per_plan": elapsed / len(plans)
    }

# Compare configurations that each use 16 cores in total
# (`plans` is your list of plan numbers)
configs = [
    {"num_workers": 2, "num_cores": 8},
    {"num_workers": 4, "num_cores": 4},
    {"num_workers": 8, "num_cores": 2},
]

for config in configs:
    result = benchmark_config(plans[:8], **config)
    print(f"Workers={config['num_workers']}, Cores={config['num_cores']}: "
          f"{result['plans_per_hour']:.1f} plans/hour")
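
Rather than testing a fixed list, the worker count can also be ramped up until throughput stops improving. This sketch takes a hypothetical `measure` callable that returns plans per hour for a given worker count (for example, a thin wrapper around `benchmark_config`); the 5% improvement threshold is an arbitrary assumption:

```python
def ramp_up_workers(measure, max_workers, min_gain=0.05):
    """Double the worker count until throughput gains fall below min_gain.

    measure: callable(num_workers) -> plans per hour.
    Returns (best_workers, best_throughput).
    """
    best_workers, best_rate = 1, measure(1)
    workers = 2
    while workers <= max_workers:
        rate = measure(workers)
        if rate < best_rate * (1 + min_gain):
            break  # Not enough improvement; keep the previous setting.
        best_workers, best_rate = workers, rate
        workers *= 2
    return best_workers, best_rate
```

Starting from one worker keeps the first runs cheap, and the search stops as soon as an extra doubling no longer pays for itself.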

Common Bottlenecks¶

Symptom	Cause	Solution
Low CPU usage	Too few workers	Increase num_workers
High CPU but slow	Too many cores/plan	Decrease num_cores
Disk thrashing	HDD storage	Use SSD for workers
Memory pressure	Too many workers	Reduce num_workers
Network bottleneck	Remote workers	Check bandwidth

Summary Recommendations¶

Start conservative: Begin with fewer workers and scale up
Monitor resources: Watch CPU, memory, and disk I/O
Benchmark your model: Optimal config depends on model size
Use SSDs: Critical for worker folder creation
Pre-process geometry: Avoid redundant preprocessing
Consider throughput over latency: For batch work, running more plans in parallel usually wins