Title: Site Reliability Engineer
Agency: Texas Health and Human Services Commission
Solicitation: 529601671
Location: 4601 W. Guadalupe Street, Austin, TX 78701.
Duration: Ongoing, no end date
Visa: Must be a US Citizen, Green Card holder, or EAD holder. No H-1B.
Telework Policy: The position is hybrid (onsite and telework): 3 days remote, with 2 days (Mondays and Thursdays) required onsite at the location listed above. The program will accept LOCAL candidates only for this position.
REQUIRED/PREFERRED SKILLS:
- 8 years, Required - Experience in systems engineering, DevOps, or site reliability engineering roles
- 8 years, Required - Strong experience with Linux/Unix systems and system internals
- 8 years, Required - Proficiency in one or more programming/scripting languages (Python, Go, Java, Bash)
- 8 years, Required - Experience designing and operating highly available, distributed systems
- 8 years, Required - Strong knowledge of cloud platforms (AWS or GCP) and cloud-native services
- 8 years, Required - Experience with containerization and orchestration (Docker, Kubernetes)
- 8 years, Required - Strong understanding of monitoring, alerting, and logging concepts
- 8 years, Required - Experience defining and managing SLIs, SLOs, and error budgets
- 8 years, Required - Familiarity with incident management, root cause analysis (RCA), and postmortems
- 8 years, Required - Experience integrating security and compliance into operational workflows
- 4 years, Preferred - Familiarity with observability tools (Prometheus, Grafana, Application Insights, Datadog, Splunk)
- 4 years, Preferred - Experience operating 24x7 production environments with on-call rotations
The Site Reliability Engineer will be responsible for ensuring the reliability, availability, performance, and scalability of production systems by applying software engineering practices to infrastructure and operations. The role partners with development teams to build resilient, observable, and automated platforms that meet defined service level objectives (SLOs).
- Has 8 or more years of experience, relies on experience and judgment to plan and accomplish goals, and independently performs a variety of complicated tasks. A wide degree of creativity and latitude is expected.
- Understands business objectives and problems, identifies alternative solutions, performs studies and cost/benefit analysis of alternatives.
- Analyzes user requirements, procedures, and problems to automate processing or to improve existing computer systems. Confers with personnel of the organizational units involved to analyze current operational procedures, identify problems, and learn specific input and output requirements, such as forms of data input, how data is to be summarized, and formats for reports.
- Writes detailed descriptions of user needs, program functions, and steps required to develop or modify computer programs.
- Reviews computer system capabilities, specifications, and scheduling limitations to determine whether a requested program or program change is possible within the existing system.
Texas Health and Human Services Commission requires the services of 1 Systems Analyst 3, hereafter referred to as Candidate(s), who meets the general qualifications of Systems Analyst 3, Applications/Software Development and the specifications outlined in this document for the Texas Health and Human Services Commission.
All work products resulting from the project shall be considered "works made for hire" and are the property of the Texas Health and Human Services Commission. Selection may include pre-selection requirements that potential Vendors (and their Candidates) submit to and satisfy criminal background checks as authorized by Texas law. Texas Health and Human Services Commission will pay no fees for interviews or discussions that occur during the process of selecting a Candidate(s).