This post exists entirely to help the next person who runs into a similar issue monitoring Riak with Sensu using Basho’s riak_nagios checks: https://github.com/basho/riak_nagios
The error message
UNKNOWN: Couldn't find unused nodename, too many concurrent checks.
This error message is entirely unhelpful. It led me down the garden path of trying to change the Erlang node name used for the connection, which was ultimately futile.
Troubleshooting
As the ubuntu user:
$ /usr/lib/riak/erts-5.9.1/bin/escript /usr/local/sbin/check_node --node riak@`hostname -f` --name sensu@`hostname -f` --cookie riak node_up
OKAY: riak@ip-XX-XX-XX-XX.ec2.internal is responding to pings
As the sensu user:
$ sudo -u sensu /usr/lib/riak/erts-5.9.1/bin/escript /usr/local/sbin/check_node --node riak@`hostname -f` --name sensu@`hostname -f` --cookie riak node_up
UNKNOWN: Couldn't find unused nodename, too many concurrent checks.
With the help of @hq1aerosol, I patched check_node.erl to expose the underlying error:
diff --git a/src/check_node.erl b/src/check_node.erl
index aeff65e..3905e5b 100644
--- a/src/check_node.erl
+++ b/src/check_node.erl
@@ -68,10 +68,10 @@ retry_connect(Name0, Number, Node, Cookie) ->
end;
{error, Reason} ->
case Reason of
- {shutdown, _} ->
+ {shutdown, Foo} ->
case Number < 250 of
true -> retry_connect(Name0, Number + 1, Node, Cookie);
- false -> {unknown, "Couldn't find unused nodename, too many concurrent checks.", []}
+ false -> {unknown, "Foo ~p", [Foo]}
end;
_ ->
case check_cookie() of
Great! Now let's see what happens:
$ sudo -u sensu /usr/lib/riak/erts-5.9.1/bin/escript /usr/local/sbin/check_node --node riak@`hostname -f` --name sensu@`hostname -f` --cookie riak node_up
UNKNOWN: Foo {child,undefined,net_sup_dynamic,
{erl_distribution,start_link,
[['250sensu@ip-XX-XX-XX-XX.ec2.internal']]},
permanent,1000,supervisor,
[erl_distribution]}
In the end, I determined that the sensu apt package installs to /opt/sensu and creates a sensu user whose home directory is /opt/sensu, which that user cannot write to. Erlang requires a writable HOME directory for .erlang.cookie.
Quite obviously, the error “{child,undefined,net_sup_dynamic, {erl_distribution,start_link” means that there was an error writing the user’s connection cookie. Obviously.
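A quick way to confirm this diagnosis for any account is to ask, as that user, whether its HOME is a directory it can write to. The script below is a minimal sketch; the helper name `home_is_writable` is hypothetical, and it only checks the directory itself, not whether an existing .erlang.cookie inside it is readable.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the directory exists and the current
# user can write to it, i.e. Erlang could create .erlang.cookie there.
home_is_writable() {
  local dir="$1"
  [ -n "$dir" ] && [ -d "$dir" ] && [ -w "$dir" ]
}

# Check the directory given as the first argument, defaulting to $HOME.
target="${1:-$HOME}"
if home_is_writable "$target"; then
  echo "$target is writable by $(id -un)"
else
  echo "$target is NOT writable by $(id -un); Erlang cannot create .erlang.cookie there"
fi
```

Saved under a hypothetical name such as check-home.sh, it can be run as the monitoring user with `sudo -H -u sensu bash check-home.sh`. The `-H` flag asks sudo to set HOME to the target user's home directory, so the check sees the same /opt/sensu the packaged sensu user gets.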
Solution
In the end I had two choices:
- Let the sensu user have write permissions to binaries, gems, etc. Nope.
- Wrap check_node with an environment change for its home directory. Fine.
Wrapping the check_node command so that it runs with a different HOME seemed like the lesser of the two evils. Here’s how I accomplished it:
riak-check-node.sh
#!/bin/bash
COOKIE=$(grep '^-setcookie' /etc/riak/vm.args | awk '{print $2}')
HOSTNAME=$(hostname -f)
ESCRIPT=/usr/lib/riak/erts-5.9.1/bin/escript

# Erlang requires a writeable $HOME for $HOME/.erlang.cookie
if [ ! -w "$HOME" ]; then
  mkdir -p "/tmp/$USER" || {
    echo "No writeable homedir for .erlang.cookie."
    exit 1
  }
  if [ ! -w "/tmp/$USER" ]; then
    echo "/tmp/$USER is not writeable for .erlang.cookie."
    exit 1
  fi
  export HOME="/tmp/$USER"
fi

"$ESCRIPT" /usr/local/sbin/check_node \
  --node "riak@$HOSTNAME" \
  --cookie "$COOKIE" \
  "$1"
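The HOME fallback is the only part of the wrapper with real logic, so it can be pulled into a function and exercised without Riak or Sensu installed. This is a sketch under assumptions: the function name `ensure_writable_home` is hypothetical; it uses `id -un` instead of `$USER` on the assumption that `$USER` may not be set in the environment the Sensu client runs checks in; and it returns instead of exiting so the caller decides what to do.

```shell
#!/bin/bash
# Hypothetical refactor of the wrapper's HOME fallback.
# If $HOME is not writable, point it at /tmp/<user>, creating it if needed.
ensure_writable_home() {
  local user fallback
  user="$(id -un)"
  fallback="/tmp/$user"

  # Nothing to do when Erlang can already write $HOME/.erlang.cookie.
  if [ -n "${HOME:-}" ] && [ -w "$HOME" ]; then
    return 0
  fi

  mkdir -p "$fallback" || {
    echo "No writeable homedir for .erlang.cookie." >&2
    return 1
  }
  # /tmp/<user> is a predictable path, so also require that we own it.
  if [ ! -O "$fallback" ] || [ ! -w "$fallback" ]; then
    echo "$fallback is not owned by and writeable for $user." >&2
    return 1
  fi
  export HOME="$fallback"
}

# Usage inside the wrapper (assumed layout):
#   ensure_writable_home || exit 1
#   "$ESCRIPT" /usr/local/sbin/check_node --node "riak@$HOSTNAME" --cookie "$COOKIE" "$1"
```

The ownership test (`-O`) is an addition over the original wrapper: on a shared host another account could pre-create /tmp/sensu, and a directory it controls is not a safe place for the cookie.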
riak.json
{
  "checks": {
    "riak-up": {
      "handlers": ["default"],
      "command": "/usr/local/sbin/riak-check-node.sh node_up",
      "interval": 60,
      "subscribers": ["riak"],
      "standalone": true
    }
  }
}
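Sensu runs this command every `interval` seconds and, following the Nagios plugin convention, reads its exit status: 0 is OK, 1 WARNING, 2 CRITICAL, anything else UNKNOWN. That matters for the wrapper above, whose own failure paths `exit 1` and would therefore surface as warnings. When testing the check by hand, a small translation helper makes the result easier to read; `sensu_status` and `run_check` are hypothetical names, not part of Sensu.

```shell
#!/bin/bash
# Hypothetical helper: map a check's exit status to the state name Sensu
# reports, following the Nagios plugin convention.
sensu_status() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    *) echo "UNKNOWN" ;;
  esac
}

# Run a check command (all arguments), then print its state, status and output.
run_check() {
  local output rc
  output="$("$@" 2>&1)"
  rc=$?
  echo "$(sensu_status "$rc") (exit $rc): $output"
  return "$rc"
}

# Example:
#   run_check /usr/local/sbin/riak-check-node.sh node_up
```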
Hope this helps