SCCS Update

First, we’re in the process of fixing the issue. Fran, the primary ITS Unix admin (apparently there’s now an additional part-timer, and I was just confused), is helping us out; unfortunately, circumstances prevented us from addressing the problem earlier.

A brief reconstructed timeline:

  1. All power to campus is lost sometime between 9:30 and 10:00 this morning, central time. This is Penn Electric’s fault.
  2. SCCS goes down because our battery backup is old and sucky.
  3. ITS continues to run on the Beardsley generator and UPSes.
  4. Some sort of surge on one of the lines into Beardsley shuts down the generator and all of the UPSes at around noon central time. This might be Penn Electric’s fault; it might be Facilities’ fault.
  5. All of the ITS machines shut down. Hard.
  6. All of ITS runs around like so many headless chickens trying to diagnose the problem and resuscitate all of their machines.
  7. Sometime between 1:30 and 2:30 this afternoon central time, power to campus comes back.
  8. ITS’ machines come back.
  9. SCCS’ machines come back, but Roc can’t restart fully without human intervention. This is our fault.
  10. Fran, the ITS person who is allowed to look inside the SCCS’ password lockbox, agrees to help out after their fixage is completed.
  11. Currently, Fran is using printed instructions and over-the-phone coaching by Dan (who is actually still a sys-admin) to bring Roc back up.

Therefore, we expect service to be restored within the hour. Note that the above may be totally inaccurate, since I actually wasn’t there.

“Big Rock Candy Mountain” from O Brother, Where Art Thou? by Harry McClintock

7 comments on “SCCS Update”
  1. Nicolas Ward says:

    We’re back up! Yay! Thanks Fran!

  2. ccommack says:

    But mail folders are unwritable, making e-mail useless. Normal boot wrinkle, or time to dispatch a New Jerseyan sysadmin? (Also, why are you and your livejournal the points of contact for SCCS when it goes down? Aren’t you 5prun}{0r? And shouldn’t SCCS have an off-site status page, a la status.livejournal.com?)

  3. Nicolas Ward says:

    The main reason is that members of SWIL are generally our most active users, and members of SWIL also seem to check their LJ friends page obsessively. As the most reason SCCS SWILlie, I’m the official liaison, or something.

    I think CS has some space set up for us, but we haven’t set it up yet. And, in this case, that wouldn’t have done us any good.

  4. irilyth says:

    SWARPAnet could probably be persuaded to host status.sccs.swarthmore.edu, or sccs.swarpa.net, or some other name with a site that y’all could update with status information during an outage.

  5. ricerurouni says:

    Well, speaking as someone who was there, the Science Center only lost power for about an hour, from approx. 10:50-11:50. I don’t know whether that was indicative of power to the rest of campus returning or not.

  6. Nicolas Ward says:

    I have no explanation for how “recent” became “reason”.

  7. foxfour says:

    CS does provide us with relevant space, but it’s way out of date – status hasn’t been updated in a long time, says he-who-hasn’t-checked-recently.

    but yes, it wouldn’t have done much good in this case.

    boy am i glad i was asleep through all this. though in the future, i will be a close-living sysadmin, living an hour and a half from swat. hm.
