I have a three-node EMQX cluster at v5.0.8 and am having an issue where a broker waits indefinitely for a QoS 2 PUBREL from the client and never times out. It is almost as if the default of 300 seconds for the await_rel_timeout property is being ignored. When the client tries to reconnect, it stays connected for 3 seconds and is then disconnected. When I put a trace on the client ID, it shows the broker still waiting for PUBREL packets that are really old (days old). The only way I've been able to fix it is to clean the session.
I'm wondering whether there have been any fixes related to this functionality, or whether there are particular parameters I should set. Maybe I'm misunderstanding the protocol, or it might even be a client issue, but it seems odd that the server keeps waiting for a PUBREL that is clearly outdated.
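To make clear what behaviour I am expecting, here is a minimal Python sketch of how I understand the broker side of an inbound QoS 2 exchange: the broker records the packet ID when it receives the PUBLISH, removes it when the PUBREL arrives, and drops entries older than the timeout. This is only a toy model of my assumption, not EMQX's actual implementation; the class and method names (`AwaitingRel`, `on_publish`, `on_pubrel`, `expire`) are made up for illustration.

```python
import time


class AwaitingRel:
    """Toy model of a broker-side table of QoS 2 packet IDs awaiting PUBREL.

    Hypothetical illustration only; not how EMQX is implemented.
    """

    def __init__(self, await_rel_timeout=300.0, clock=time.monotonic):
        self.timeout = await_rel_timeout  # seconds to wait for PUBREL
        self.clock = clock                # injectable clock, useful for testing
        self.pending = {}                 # packet_id -> time the PUBLISH arrived

    def on_publish(self, packet_id):
        # Broker receives a QoS 2 PUBLISH, would reply PUBREC, and starts waiting.
        # A duplicate PUBLISH with the same packet ID keeps the original timestamp.
        self.pending.setdefault(packet_id, self.clock())

    def on_pubrel(self, packet_id):
        # Client releases the message; broker would reply PUBCOMP.
        # Returns True if the packet ID was actually being awaited.
        return self.pending.pop(packet_id, None) is not None

    def expire(self):
        # Drop every entry that has waited at least `timeout` seconds
        # and return the expired packet IDs.
        now = self.clock()
        expired = [pid for pid, ts in self.pending.items()
                   if now - ts >= self.timeout]
        for pid in expired:
            del self.pending[pid]
        return expired
```

Under this model, an entry that is days old would have been removed by `expire()` long ago, which is why the trace output surprises me.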
Below is a sample of the error logs I am seeing.
line: 821, mfa: emqx_session:redispatch_shared_messages/1, msg: qos2_lost_no_redispatch