Green Ganglia
- Studied architecture of Ganglia cluster monitoring framework
- gmond is the daemon that runs on each node of the cluster and monitors that node's state.
- Each gmond multicasts the state of the cluster that it sees
- All other gmond instances listen to these updates and maintain an up-to-date state of the cluster
- gmetad is an external daemon that connects to any one gmond instance in each cluster (failing over to the others) to maintain the state of all clusters, and can show it to the user through a web interface
- gexec is a cluster execution tool that listens to the state of the cluster, sorts the list of nodes by their utilization, and executes the job on the least loaded nodes
- Nodes have two states: Alive and Dead
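
The node selection described above can be sketched roughly as follows. This is only an illustration, not gexec's actual code: the `Node` record, its `load` field (a utilization value between 0 and 1) and the `least_loaded` helper are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of one node as seen through the gmond state."""
    name: str
    alive: bool   # True = Alive, False = Dead
    load: float   # assumed utilization in [0, 1]


def least_loaded(nodes, count):
    """Return up to `count` Alive nodes, least loaded first."""
    alive = [n for n in nodes if n.alive]
    return sorted(alive, key=lambda n: n.load)[:count]


if __name__ == "__main__":
    cluster = [
        Node("n1", True, 0.70),
        Node("n2", True, 0.10),
        Node("n3", False, 0.00),  # Dead nodes are never chosen
        Node("n4", True, 0.40),
    ]
    print([n.name for n in least_loaded(cluster, 2)])
```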
- Proposed changes
- Add a new state, Sleeping, so that a node is Alive, Dead or Sleeping
- Put machines which are not executing any gexec jobs to sleep. Before sleeping, a node announces that it is going to sleep and makes sure that at least one of the other gmond instances has noticed the announcement.
- When the user provides an option to save power, gexec will execute jobs on the same nodes until their utilization is high, and only then will it choose a different node
- gexec will also consider various other scheduling parameters such as:
- Deadlines for job finishing
- Priority of jobs
- Energy allowance
- Floor plan of the cluster (possibly turning off HVAC when none of the nodes in a cluster are active)
- ...
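
A minimal sketch of the proposed behaviour, under stated assumptions: the `State` enum, the `ClusterNode` record, the `HIGH_LOAD` threshold of 0.8 and both helper functions are hypothetical and are not part of Ganglia or gexec. It covers only the Sleeping state and the power-saving placement; deadlines, priorities, energy allowance and floor plan are left out.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    ALIVE = "alive"
    DEAD = "dead"
    SLEEPING = "sleeping"   # the proposed third state


@dataclass
class ClusterNode:
    name: str
    state: State
    load: float    # assumed utilization in [0, 1]
    jobs: int = 0  # number of gexec jobs currently running


HIGH_LOAD = 0.8  # assumed cut-off for "utilization is high"


def pick_node(nodes, save_power):
    """Choose a node for the next job, or None if no node is Alive."""
    alive = [n for n in nodes if n.state is State.ALIVE]
    if not alive:
        return None
    if not save_power:
        # Existing behaviour: least loaded node first.
        return min(alive, key=lambda n: n.load)
    # Power-saving behaviour: keep filling nodes that already run jobs
    # until they reach the threshold, and only then move to another node.
    busy = [n for n in alive if n.jobs > 0 and n.load < HIGH_LOAD]
    if busy:
        return max(busy, key=lambda n: n.load)
    return min(alive, key=lambda n: n.load)


def sleep_candidates(nodes):
    """Alive nodes with no gexec jobs; these could be put to sleep."""
    return [n for n in nodes if n.state is State.ALIVE and n.jobs == 0]
```

With the power-saving option, idle nodes stay idle for longer and so become candidates for sleeping; without it, the selection falls back to the least loaded node.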