Questions and Answers (06/20/2006)
Q1: The NCCS did not end up having an emergency downtime on Tuesday. Was there a reason no email was sent to the user community?
Answer: This was an oversight on the part of the NCCS. We will try to make sure that users are well informed both when downtimes occur and when they are cancelled.
Q2: When will general access to the new Linux cluster be available to users?
Answer: The system is scheduled to arrive at the center during the week ending June 30th. It will require a series of integration tests and hardware certifications before general access is made available. The NCCS anticipates that general access will begin in mid- to late July.
Q3: I/O on palm has seemed very slow recently. Is this a known issue?
Answer: The system administrators are aware of the issue and are currently investigating a solution. The NCCS will communicate this solution to the user community as soon as possible. There was also a heavy I/O load on the system during the weekend of June 17-18, caused by a single user's code, which affected several other users.
Q4: What is the NCCS's opinion on the overall stability of palm?
Answer: Since the OS upgrades performed early this spring, the NCCS has been pleased with the increased stability of the systems. Unfortunately, there has been a large number of hardware failures in the last three weeks, on both the Altix system and its peripheral disk systems. Because of the nature of the system architecture, such failures result in total system downtime.
Q5: Does the NCCS have any plans to run additional diagnostics and schedule preventative maintenance of the hardware to address some of these downtime issues?
Answer: The NCCS is evaluating a series of diagnostic tools provided by the vendor, but has some initial concerns about the impact of running those tools on the systems during normal operations. In addition, the NCCS would like to begin a series of preventative maintenance downtimes, each covering a select portion of the system, but must be extremely sensitive to the impact this would have on user workloads.