Earth Systems Modeling Operations Analyst

Science Systems and Applications, Inc.
75000.00 To 105000.00 (USD) Annually
United States, Maryland, Greenbelt
Jun 24, 2026

Science Systems and Applications, Inc. (SSAI) is seeking an Operations Analyst to support the reliable and timely production of near real-time GEOS model products by monitoring operational workflows and executing approved workflow scripts within an HPC environment. This individual will ensure that scheduled run cycles complete successfully and that near real-time outputs are delivered according to operational timelines.

Key Responsibilities:

  • Operate during scheduled shifts and participate in on-call rotation to support near real-time product generation.
  • Monitor workflow execution for operational cycles (e.g., data staging/ingest steps, model runs, post-processing, output archive, and product distribution).
  • Execute approved workflow scripts and operational commands according to operational procedures.
  • Monitor job status and system health in scheduling tools such as Slurm, PBS, and Cylc (job states, failures, retries, dependencies) and confirm expected workflow progress.
  • Perform routine operational checks:
    • Validate inputs/paths and confirm required inputs and dependencies exist
    • Inspect key logs for known error signatures
    • Run basic QC "sanity checks" on outputs as defined by operations procedures
  • Diagnose issues at the workflow level (missing or corrupt inputs, scheduler issues, environment/module mismatches, storage/permission problems) and initiate recovery actions per operational procedures.
  • Escalate to model/system specialists when problems exceed operator scope; provide actionable incident details (error logs, job IDs, timestamps, impacted cycles).
  • Maintain operational documentation:
    • Submit and update error tracking tickets.
    • Update web-based documentation of operational procedures.
  • Coordinate with upstream data providers on data outages, file modifications, or network issues.
  • Coordinate with downstream product users/teams to ensure timely near real-time delivery and communicate delays or expected recovery timelines.
  • Note: Operators do not modify or develop the model, but they are responsible for workflow execution, monitoring, and recovery within their authorized procedures.

Required Qualifications:

  • Bachelor's Degree (B.S.) and a minimum of 2 years related experience and/or training, or equivalent combination of education and experience.
  • Specifically, 1-3 years of Earth System Modeling operations experience in a production environment with scheduled near real-time workloads.
  • Hands-on Linux operations and troubleshooting in production:
    • Log review and diagnostics
    • Environment/module awareness
    • File system/storage space and permissions checks
    • Comfort using standard admin tools and CLIs for troubleshooting
  • Experience using a job scheduler (Slurm, PBS, Cylc, or equivalent) for monitoring and operational troubleshooting (job states, dependencies, reruns, resource/time failures).
  • Demonstrated experience supporting shift/on-call responsibilities and responding to time-critical incidents.
  • Basic knowledge of scripting/programming for operations:
    • bash/csh: workflow execution, wrapper scripts, log parsing, operational utilities
    • Python: simple tool development for status reporting, log parsing/QC automation, incident summaries
    • Perl: ability to maintain or extend existing operational scripts (at least to the level needed for troubleshooting and minor updates).
  • Workflow operations mindset: ability to follow procedures precisely and practice safe recovery (e.g., knowing when reruns are permitted and how to avoid data corruption or duplicate outputs).
  • Ability to verify output presence, inspect metadata, and perform basic sanity checks (e.g., practical familiarity with netCDF/HDF5).
  • Strong attention to detail and ability to follow procedures under time pressure.
  • Clear communication during escalations (what failed, when, where, which logs/job IDs).
  • Team collaboration during cross-functional troubleshooting (operations, science teams, data providers, and users).

Desired Qualifications:

  • Familiarity with numerical weather/climate operations concepts (cycles, near real-time product timing, typical failure modes).
  • Experience integrating lightweight monitoring/alerting (dashboards, alerts, automated status emails/messages).
  • Prior participation in incident management and structured post-incident review.
  • Experience running jobs in an HPC environment.
  • Experience with Sphinx for generating operational documentation from reStructuredText (rst) files.

Note: The actual salary offered will be determined based on factors including experience, qualifications, tenure, skill set, availability of qualified candidates, geographic location, certifications, and other job-related criteria deemed relevant to the position.

EEO/AA Veterans and Individuals with Disabilities

Physical Requirements: While performing the duties of this job, the employee is regularly required to stand, walk, and use hands to touch, handle or feel objects, tools or controls. The employee frequently is required to talk and hear and occasionally required to reach with hands and arms and stoop, kneel, crouch, or crawl. Must regularly lift and/or move up to 10 pounds, and occasionally lift and/or move up to 25 pounds. Specific vision abilities required by this job include close vision, peripheral vision, depth perception and the ability to adjust focus.
