Requisition ID# 169815
Job Category: Information Technology; Accounting / Finance; Engineering / Science
Job Level: Individual Contributor
Business Unit: Electric Engineering
Work Type: Hybrid
Job Location: Oakland
Department Overview
Electric T&D Engineering is responsible for electric system engineering and planning, asset strategy, and risk management across transmission, distribution, and substation asset families. This centralized, risk-informed approach allows PG&E to manage electric risk, asset and system health, interconnections, and performance by using consistent standards, work methods, prioritization, and program sponsorship, while leveraging lessons learned from inspections and asset data to inform asset management decisions. The department is accountable for asset planning and strategy, standards and work methods, and asset data management for Electric.
Position Summary
PG&E is seeking an experienced data science professional to join the Electric Compliance Assurance, Analysis & Intake (A&I) Team as a Data Scientist, Expert. This role will focus on developing advanced machine learning (ML) and predictive modeling solutions to improve electric and safety incident classification, reduce CPUC late reporting violations, predict compliance issues, and accelerate root cause evaluations.
The position will design and implement models for automated incident classification, predictive violation risk, root-cause clustering, and compliance monitoring. A strong candidate will be highly analytical, innovative, and collaborative, with expertise in data science techniques and a passion for applying AI/ML to real-world compliance challenges.
The headquarters location is the PG&E Oakland General Office, and the role may require occasional travel across PG&E's service territory.
PG&E is providing the salary range that the company in good faith believes it might pay for this position at the time of the job posting. This compensation range is specific to the locality of the job. The actual salary paid to an individual will be based on multiple factors, including, but not limited to, specific skills, education, licenses or certifications, experience, market value, geographic location, and internal equity. Although we estimate the successful candidate hired into this role will be placed towards the middle or entry point of the range, the decision will be made on a case-by-case basis related to these factors. A reasonable salary range is:
- Minimum Base Salary (Bay Area) $140,000.00
- Mid Base Salary (Bay Area) $189,000.00
- Maximum Base Salary (Bay Area) $238,000.00
Responsibilities
- Automated Incident Classification: Develop NLP-based and supervised learning models (e.g., logistic regression, Naive Bayes) to classify electric incidents as reportable vs. non-reportable; implement multiclass classification to identify safety incident type, severity, and other attributes.
- Predictive Violation Risk: Build predictive models to assess risk of CPUC Notice of Violation (NOV) for new incidents; analyze historical violations and identify patterns by asset type, region, and reporting type; apply multi-label classification to assign CPUC General Order (GO) rules to incidents.
- Root-Cause Clustering & Archetypes: Use unsupervised learning to discover recurring incident archetypes that inform preventative programs and corrective actions; generate cluster profiles, stability scores, and mappings from clusters to corrective actions.
- Cause Evaluation Modeling: Apply decision trees, random forests, and gradient boosting to recommend evidence-based root causes.
- Compliance Drift & Model Monitoring: Establish continuous monitoring for deployed models to detect input data drift, label shift, performance decay, and regulatory changes.
- Technical Development & Collaboration: Extract, transform, and load data from multiple PG&E systems for feature engineering; write modular, reusable Python code for data science workflows; partner with sponsor departments and subject matter experts to ensure models align with business needs; present findings and recommendations to senior leadership and act as peer reviewer for complex models.
- Active participation in the external data science/AI/ML community of practice (e.g., volunteering in professional organizations, conference presentations, publications, or similar activities).
- Competency with data science standards and processes (model evaluation, optimization, feature engineering) and best practices for implementation.
- Knowledge of industry trends and current issues in data science as demonstrated through peer-reviewed publications, conference presentations, or open-source contributions.
- Proficiency with commonly used data science and/or operations research programming languages, packages, and tools for building ML models and algorithms.
- Ability to explain technical concepts in breadth and depth, including statistical inference, ML algorithms, software engineering, and model deployment pipelines.
- Mastery in clearly communicating complex technical details and insights to colleagues and stakeholders.
- Strong foundation in mathematical and statistical principles underpinning data science.
- Ability to develop, coach, teach, and mentor others to meet both career and organizational goals.
Qualifications
Minimum
- Bachelor's Degree in Data Science, Machine Learning, Computer Science, Physics, Econometrics or Economics, Engineering, Mathematics, Applied Sciences, Statistics, or equivalent field.
- 6 years of experience in data science (or no experience, if the candidate possesses a Doctoral Degree or higher).
Desired
- Doctoral Degree in Data Science, Machine Learning, Computer Science, Physics, Econometrics or Economics, Engineering, Mathematics, Applied Sciences, Statistics, or equivalent field.
- Relevant industry (electric or gas utility, renewable energy, analytics consulting, etc.) experience.
- Proficiency in Python for data science (pandas, scikit-learn, NLP libraries).
- Experience with supervised and unsupervised learning, classification, clustering, and predictive modeling.
- Strong understanding of feature engineering and model evaluation.
- Ability to work with large, complex datasets from multiple sources.
- Skilled in presenting technical findings to non-technical audiences.