

Easy ejabberd clustering (multi-master, fault tolerant, failover)

ejabberd is a Jabber/XMPP + MQTT + SIP server written in Erlang, featuring distributed operation with load balancing across a fault-tolerant cluster.

First and foremost, for the general cluster configuration steps see this post. In short, I installed ejabberd on two Ubuntu servers in the same network using the binary installers and copied the Erlang cookie file from /etc/ejabberd/bin of ejabberdnode1 (the intended master) to /etc/ejabberd/bin of ejabberdnode2 (the intended slave). Once you have a functioning cluster using those steps and you want to add multi-master support for better failover and fault tolerance of your ejabberd cluster, come back here and read this post.

After getting our small cluster up and running we started testing cluster failure handling. We quickly discovered that no matter how many slaves you had deployed, as soon as your master node experienced issues the entire cluster became inoperable. To make things worse, fixing the master didn't fix the cluster: you had to completely cycle the ejabberd instances of every slave to get them working properly again, or type in the mnesia commands to make the master communicate with the slaves. This is messy, it's stupid, but most importantly it pretty much kills the benefit of having a cluster in the first place. If you have a single point of failure that is capable of crippling an entire cluster… well, it's going to bite you in the ass.

After about a day of piecing together old mailing lists, Stack Overflow posts, articles on mnesia replication, several forum threads about modifying ejabberdctl, and plenty of tinkering, we finally arrived at a multi-master cluster that can handle any of the nodes dying without bringing down the entire cluster. The fix doesn't so much involve ejabberd itself as it involves using proper Erlang/mnesia replication and removing the dependencies on remote tables in the underlying data stores that ejabberd uses for route and session management.
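For reference, here is a minimal sketch of the kind of mnesia calls involved in promoting a node to a full master, run from that node's live Erlang shell (for example via `ejabberdctl debug`). The node name `ejabberd@ejabberdnode1` is a placeholder, and mirroring the existing master's storage types is an assumption for illustration; the exact table list and copy types depend on your ejabberd version and which backends you use.

```erlang
%% Minimal sketch: promote the local node to a full mnesia master.
%% Run from the node's live Erlang shell (e.g. `ejabberdctl debug`).
%% 'ejabberd@ejabberdnode1' is a placeholder for an existing master.
Master = 'ejabberd@ejabberdnode1'.

%% 1. Tell mnesia about an existing master so the schemas merge.
mnesia:change_config(extra_db_nodes, [Master]).

%% 2. Keep the schema on local disc instead of ram-only, otherwise this
%%    node forgets the cluster layout on restart.
mnesia:change_table_copy_type(schema, node(), disc_copies).

%% 3. Add a local copy of every table, mirroring the storage type the
%%    existing master uses for it (ram_copies for the route/session
%%    tables, disc-backed copies for persistent data). This is the step
%%    that copies the whole database over the network; tables that
%%    already exist locally simply return {aborted, {already_exists, ...}}.
[mnesia:add_table_copy(Tab, node(),
     rpc:call(Master, mnesia, table_info, [Tab, storage_type]))
 || Tab <- mnesia:system_info(tables), Tab =/= schema].
```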

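Once the copies are in place, a quick sanity check from any node (again via `ejabberdctl debug`) is to confirm that it sees the other masters and actually holds local copies of the routing and session tables. The table names `route` and `session` assume the stock mnesia backend.

```erlang
%% Every master should list every other master here:
mnesia:system_info(running_db_nodes).

%% These should return a real storage type (ram_copies), not 'unknown',
%% meaning the node holds its own copy rather than depending on a
%% remote table:
mnesia:table_info(session, storage_type).
mnesia:table_info(route, storage_type).
```

If either call returns unknown, that node is still reading the table from a remote node and will go down with it.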
I've updated the plugin to support this functionality, so deploying failover masters is easier: using the plugin to create a second, third, or Nth master node is super simple. Note that this requires doing a complete mnesia db copy over the network, so it might hang for a bit if your databases are large or network latency is high (or both), but once it finishes, you're done!

We're only running a small cluster of 3 nodes, so we've deployed all 3 nodes as masters; this allows us to kill/cycle any of them without any special procedures needed to leave/join the cluster after a failure, reboot, etc. You could also run additional masters with their own slaves as a failover cluster. The purpose of an ejabberd cluster is fault tolerance and scalability, being able to use multiple servers for a single domain or a small group of large domains. That being said, I cannot comment on the performance issues one might experience running a masters-only cluster at massive scale, so using a handful of masters with regular non-copy/remote slaves might be a better architecture if you're dealing with large clusters; I don't know, so I can't advise one way or the other at this point.

I hope this tool helps you out and saves you some time.
