Job Description
Position Summary:
The IT Infrastructure Production Support Engineer provides advanced technical support and troubleshooting for Asia region enterprise infrastructure with deep expertise in virtualization, data storage, and networking technologies. This critical escalation role requires strong diagnostic skills to rapidly identify issues across multiple technology domains and coordinate with specialized teams to ensure swift resolution and minimal business impact.
Key Responsibilities:
Production Support & Incident Management:
Serve as primary escalation point for critical production incidents affecting virtualization, Windows/Linux OS, storage infrastructure, and enterprise networking
Perform rapid root cause analysis across infrastructure layers to identify and isolate issues
Coordinate incident response and engage specialized teams (network, security, compute, application) based on technical assessment
Monitor infrastructure health using tools such as SolarWinds, LiveNX, and Nagios, and proactively identify potential issues
Maintain incident documentation and contribute to post-incident reviews
Technical Troubleshooting & Problem Resolution:
Troubleshoot complex issues spanning operating systems, storage arrays, backup solutions, and cloud platforms
Diagnose and resolve performance issues related to compute, storage, and network infrastructure
Perform break-fix activities and system performance tuning
Identify network-related issues and coordinate with network engineering teams for resolution
Execute disaster recovery procedures and business continuity plans when required
Cross-Functional Collaboration:
Partner with Enterprise Infrastructure Compute, Security, Network, and Application teams
Effectively communicate technical issues to both technical and non-technical stakeholders
Identify patterns in incidents and work with engineering teams to implement permanent solutions
Collaborate with project teams during infrastructure changes to ensure smooth transitions to production
Documentation & Knowledge Management:
Create and maintain comprehensive system documentation including troubleshooting procedures and runbooks
Document incident resolution steps and contribute to the knowledge base
Develop automation scripts to streamline support activities
Basic Qualifications/Professional Skills:
B.S. degree in computer science, information technology, or a computer-related discipline, or 5-7+ years of IT work experience in a multi-site global infrastructure environment
Progressive career advancement with proven troubleshooting and problem-solving abilities
Fluent in English; Mandarin proficiency preferred
Strong communication, collaboration, and interpersonal skills
Self-motivated with keen attention to detail and excellent judgment under pressure
Ability to manage multiple concurrent incidents in high-pressure situations
Team player with a customer-focused mindset
Technical Skills/Experience:
Virtualization and Operating Systems (Strong/Required):
Proven experience with VMware in large-scale virtualized environments
Experience with virtual machine troubleshooting and performance optimization
Strong troubleshooting skills for Windows/Linux operating system issues
Deep understanding of Red Hat Enterprise Linux (RHEL) and other Linux distributions (CentOS, Oracle Linux, SUSE Linux)
Experience with Red Hat Satellite and automation solutions such as Ansible or Puppet
Proficiency in scripting languages including Shell, Ruby, and Perl for automation
Storage & Backup (Strong/Required):
5+ years of experience with enterprise storage and backup solutions
Experience with multiple storage platforms including Dell/EMC, NetApp, and Pure
Knowledge of image-level backups, array-based replication, and hypervisor-based replication
Experience with storage configuration, volume management, and multipathing (LVM, MPIO, EMC PowerPath)
Familiarity with SAN and NAS operations and monitoring tools
Understanding of data lifecycle management and tiering strategies
Network Knowledge (Working Knowledge/Required):
Strong understanding of network topology concepts and technologies
Ability to identify network-related issues and determine appropriate escalation path
Knowledge of core LAN/WAN network technologies
Familiarity with Cisco networking technologies and basic troubleshooting
Understanding of network security concepts and protocols
Ability to work with network teams to diagnose connectivity and performance issues
Knowledge of load balancers and network accelerators
Additional Technical Skills:
Strong understanding of network and server security
Experience with converged hardware platforms including Dell, HPE, and Cisco
Experience with system monitoring tools and techniques