System Administrator in Mountain View, CA at MDI Group

Date Posted: 10/5/2018

Job Description

MDI Group is a premier IT workforce solutions provider with more than 25 years of expertise in finding “best fit” IT talent for mid-sized to Fortune 500 clients. We have established relationships with our clients and work directly with the hiring managers. Why MDI Group? We learn what is most important to you in your job search and match that to the needs of our clients. We offer career coaching and resume services, skills certifications, interview preparation, health benefits, and a 401(k) plan. We are currently interviewing for the following direct-hire position:

System Administrator

Direct Hire

Mountain View, CA

Role Purpose:

We are looking for a well-rounded and motivated individual to support and administer our GPU computing cluster. The cluster is primarily used for high-performance deep learning applications. The role spans hardware maintenance and troubleshooting, software administration, and support for the researchers who use the cluster. This work is critical to continued innovation in the deep learning domain.

The position is balanced between maintaining the GPU cluster and maintaining other software development tools such as Git, JIRA, and Confluence. The candidate is responsible for maintaining and controlling the GPU cluster, and for troubleshooting technical issues related to these open source and collaborative development tools.

Major Responsibilities:

  • Maintaining existing hardware, tracking down and fixing issues
  • Utilizing Bright Cluster Management software to monitor and administer nodes on the cluster
  • Working with various teams to identify their needs and help them get the most out of our resources
  • Investigating software and hardware solutions to determine how best to grow and expand our existing infrastructure
  • Supporting researchers in setting up software and running experiments
  • Writing scripts to automate a variety of tasks
  • Managing SDK, API, and plug-in repositories
  • Conducting user management activities
  • Performing system and application maintenance
  • Conducting system and application performance tuning
  • Triaging crashes and user issues, and performing root cause analysis
  • Performing workflow customizations
  • Performing data migrations (project import/export)
  • Conducting application training for end users

Background, Experience & Qualifications:

  • BS in Computer Science, Electrical Engineering or related technical field required
  • 2-3 years’ experience in system administration or software development
  • Experience working with Git, Confluence, JIRA, and other development tools
  • Ability to multitask in a fast-paced global business environment
  • Must have excellent written and verbal communication skills
  • Postsecondary degree in a relevant field
  • 5 to 10 years’ experience administering and deploying Linux servers
  • Extensive understanding of high-performance computing applications, hardware, networking, and operating systems
  • Programming experience in languages like Bash and Python, mostly related to scripting
  • Good interpersonal skills, ability to seamlessly communicate and interact with others


  • Experience with InfiniBand
  • GPU administration experience is highly desired

