Meeting (to be) held on 2014/11/20 at 18:00 in Zepler CLS Lecture Room

Icinga Alerts

  • EAPOL-* checks
    • Should we change them so the checks are direct?
    • We probably need one check via JANET, plus direct checks against each server for ECS and SOTON
  • Node NFSEN checks
    • Temporarily disabled whilst we sort out problems with nfcapd
  • Node SYSLOG checks
    • A package upgrade should have fixed this issue; other nodes will need the same upgrade as they come online. We will be warned if more than one process is running on a node, which tells us that either there is a problem or the node needs upgrading. The max connections limit on the syslog server has been removed.
  • Node SSH-NODE-PASSWORD checks
    • Do we need the SSH check on nodes as well?
      • No, we should make the SSH check passive and send a critical check result via NSCA when the SSH-NODE-PASSWORD check fails because it cannot connect to the node.
    • How frequent should these checks be?
    • Why does Dropbear (SSH) keep dying on the Carnation Road node? Can we produce an upgrade to check the status of Dropbear and (re)start it if necessary?
      • morse will patch SSH on #263 to see if we can figure out what is wrong.
      • If the problem turns out to be something we cannot fix, we will add a hook to check and restart SSH when necessary.
    • Any ideas on how to avoid this failing so often?
    • BACKUP2/BACKUPTRANSFER also went critical yesterday
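
A rough sketch of the two SSH ideas discussed above, for illustration only and not an agreed implementation. It assumes Python is available where the code runs, that `pidof` exists on the node, and that Dropbear is started from an init script; the script path and host name in the usage comment are hypothetical. `nsca_line` builds the tab-separated line that `send_nsca` reads on stdin for a passive service check result, and `ensure_running` is the "check and (re)start" hook.

```python
import subprocess

CRITICAL = 2  # Nagios/Icinga return code for a critical state


def nsca_line(host, service, code, output):
    """Format one passive service check result as send_nsca reads it on stdin."""
    return "\t".join([host, service, str(code), output]) + "\n"


def is_running(name, run=subprocess.call):
    """Return True if `pidof` finds at least one process called `name`."""
    return run(["pidof", name], stdout=subprocess.DEVNULL) == 0


def ensure_running(name, start_cmd, run=subprocess.call):
    """Start the service if it is not running.

    Returns True if a start was attempted, False if it was already running.
    """
    if is_running(name, run):
        return False
    run(start_cmd)
    return True


# Example use (hypothetical host name and init script path):
#   ensure_running("dropbear", ["/etc/init.d/dropbear", "start"])
#   nsca_line("example-node", "SSH", CRITICAL, "cannot connect to node")
```

The `run` parameter is there only so the logic can be exercised without touching a real node; a hook deployed to the nodes would more likely be a small shell script run from cron.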

Building Zepler / SOWN@coordinates nodes

Todo List


  • None at present.
