NLB and WSS config

Hi, I've set up cert-manager and version 2.1.1 of the EMQX Kubernetes operator with its CRDs; the EMQX image is 5.0.23. The Kubernetes version is 1.22 and the storage class is gp3. The initial cluster was formed with 3 cores (no replicants) and looks good. The Kubernetes resource uses API version v2alpha1.

I want to expose only the WSS port via an AWS Network Load Balancer. The information I've found so far is quite scarce. I tried exposing it via the EMQX definition and also via a separate ingress NLB. The certificate is generated via AWS Certificate Manager. I've created a user in the built-in database with "Publish & Subscribe" authorization on topic "test" (I even tried it as a superuser). The NLB would expose port 8084 (TLS) and port 18083 for the dashboard.

Testing via an MQTT client always shows "Unable to connect. Reason: 'handshake timed out after 10000ms'". Trying via the dashboard WebSocket client says "… is Disconnected".
Command tested with the mqtt client:
mqtt publish --topic test --message Hello --host nlbaddress.eu-central-1.amazonaws.com --port 8084 -ws -u username -pw password

For the dashboard WebSocket client, I left the standard values and only updated the address, set the port to 8084, and added the username and password.

Increasing logging to debug shows nothing useful, only lots of these messages, which seem cluster related:
"2023-04-25T15:37:58.620833+00:00 [info] msg: terminate, mfa: emqx_connection:terminate/2, line: 666, peername: 10.33.22.27:12288, reason: {shutdown,tcp_closed}"

What I want to achieve:

  • Expose the WSS port to devices on 8084 via NLB, preferably on path /mqtt
  • Make use of SSL termination of the Network Load Balancer
  • Expose the dashboard on 18083
    (- Get metrics via Prometheus and monitor)
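
For the Prometheus goal, a minimal scrape-config sketch; this assumes EMQX 5's /api/v5/prometheus/stats endpoint on the dashboard port, and the job name and service address are placeholders, not something from my actual setup:

    scrape_configs:
      - job_name: "emqx"                               # placeholder job name
        metrics_path: "/api/v5/prometheus/stats"       # EMQX 5 metrics endpoint
        static_configs:
          - targets: ["emqx-dashboard.emqx-operator-system.svc:18083"]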

Setup is the following:

apiVersion: apps.emqx.io/v2alpha1
kind: EMQX
metadata:
  name: emqx
  namespace: emqx-operator-system
  labels:
    app.kubernetes.io/name: emqx
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-attributes: load_balancing.cross_zone.enabled=true
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=true
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:.....certificate here.......
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: 8084,wss    
spec:
  image: emqx:5.0.23
  imagePullPolicy: IfNotPresent
  bootstrapConfig: |
    node {
      cookie = emqxsecretcookie
      data_dir = "data"
      etc_dir = "etc"
    }
    cluster {
      discovery_strategy = dns
      dns {
        record_type = srv
        name = "emqx-headless.emqx-operator-system.svc.cluster.local"
      }
    }
    dashboard {
      listeners.http {
          bind: 18083
      }
      default_username: "admin"
      default_password: "somepasshere"
    }
    listeners.tcp.default {
      bind = "0.0.0.0:8083"
      max_connections = 1024000
    }
    sysmon.vm.long_schedule = disabled
  coreTemplate:
    metadata:
      name: emqx-core
      labels:
        apps.emqx.io/instance: emqx
        apps.emqx.io/db-role: core
    spec:
      volumeClaimTemplates:
        storageClassName: gp3-store
        resources:
          requests:
            storage: 1Gi
        accessModes:
        - ReadWriteOnce
      replicas: 3
      command:
        - "/usr/bin/docker-entrypoint.sh"
      args:
        - "/opt/emqx/bin/emqx"
        - "foreground"
      ports:
        - containerPort: 8083
      podSecurityContext:
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        fsGroupChangePolicy: Always
      containerSecurityContext:
        runAsUser: 1000
        runAsGroup: 1000
      livenessProbe:
        httpGet:
          path: /status
          port: 18083
        initialDelaySeconds: 60
        periodSeconds: 30
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /status
          port: 18083
        initialDelaySeconds: 10
        periodSeconds: 5
        failureThreshold: 12
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh","-c","emqx ctl cluster leave"]
  replicantTemplate:
    metadata:
      name: emqx-replicant
      labels:
        apps.emqx.io/instance: emqx
        apps.emqx.io/db-role: replicant
    spec:
      replicas: 0
      command:
        - "/usr/bin/docker-entrypoint.sh"
      args:
        - "/opt/emqx/bin/emqx"
        - "foreground"
      ports:
        - containerPort: 1883
      podSecurityContext:
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        fsGroupChangePolicy: Always
        supplementalGroups:
          - 1000
      containerSecurityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
      livenessProbe:
        httpGet:
          path: /status
          port: 18083
        initialDelaySeconds: 60
        periodSeconds: 30
        failureThreshold: 10
      readinessProbe:
        httpGet:
          path: /status
          port: 18083
        initialDelaySeconds: 10
        periodSeconds: 5
        failureThreshold: 30
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh","-c","emqx ctl cluster leave"]
  dashboardServiceTemplate:
    metadata:
      name: emqx-dashboard
    spec:
      selector:
        apps.emqx.io/db-role: core
      ports:
        - name: "dashboard-listeners-http-bind"
          protocol: TCP
          port: 18083
          targetPort: 18083
  listenersServiceTemplate:
    spec:
      type: LoadBalancer
      ports:
        - name: "wss"
          protocol: TCP
          port: 8084
          targetPort: 8083

The targets appear healthy and there is no NLB IP restriction (open to 0.0.0.0/0). What could be the reason the handshake fails / I get disconnected via the dashboard tester? Did I get something wrong? Thanks!

I also tried plain MQTT on port 1883, no SSL, no WebSocket, but still the same…

The connection is closed immediately:

mqttx conn -h 'lb-address' -p 1883 -u 'username' -P 'password'
[4/26/2023] [4:28:11 PM] › … Connecting…
[4/26/2023] [4:28:11 PM] › ✖ Connection closed

Exceed the maximum reconnect times limit, stop retry

mqtt publish --topic test --message Hello --host lb-address --port 1883 -u username -pw password --debug
Client 'UNKNOWN@lb-address.amazonaws.com' sending CONNECT MqttConnect{keepAlive=60, cleanStart=true, sessionExpiryInterval=0, simpleAuth=MqttSimpleAuth{username and password}}
Client 'UNKNOWN@lb-address.amazonaws.com' DISCONNECTED Server closed connection without DISCONNECT.: com.hivemq.client.mqtt.exceptions.ConnectionClosedException: Server closed connection without DISCONNECT.
Unable to connect: com.hivemq.client.mqtt.exceptions.ConnectionClosedException: Server closed connection without DISCONNECT.
at com.hivemq.client.internal.mqtt.MqttBlockingClient.connect(MqttBlockingClient.java:101)

The logs show these kinds of lines, whether I try something or not.

2023-04-26T13:10:48.667386+00:00 [info] msg: terminate, mfa: emqx_connection:terminate/2, line: 666, peername: 10.67.196.210:18566, reason: {shutdown,malformed_packet}
2023-04-26T13:10:49.427551+00:00 [debug] msg: raw_bin_received, mfa: emqx_connection:when_bytes_in/3, line: 771, peername: 10.67.195.212:49191, bin: 0D0A0D0A000D0A515549540A211100544F71ABD20A43C3D4F371075B030004C2AF681304003E0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000104600044D51545404C2003C00166D7174742D6578706C6F7265722D346234613462313100107465737438323435684B6C515571757500107465737438323435684B6C5155717575, size: 172, type: hex
2023-04-26T13:10:49.427736+00:00 [info] mfa: emqx_connection:parse_incoming/2, line: 807, peername: 10.67.195.212:49191, at_state: <<“clean”>>, input_bytes: <<13,10,13,10,0,13,10,81,85,73,84,10,33,17,0,84,79,113,171,210,10,67,195,212,243,113,7,91,3,0,4,194,175,104,19,4,0,62,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,…>>, parsed_packets: [], reason: malformed_packet
2023-04-26T13:10:49.428044+00:00 [debug] msg: mqtt_packet_received, mfa: emqx_connection:handle_msg/2, line: 567, peername: 10.67.195.212:49191, packet: {frame_error,malformed_packet}, tag: MQTT
2023-04-26T13:10:49.428250+00:00 [debug] msg: emqx_connection_terminated, mfa: emqx_connection:terminate/2, line: 661, peername: 10.67.195.212:49191, reason: {shutdown,malformed_packet}, tag: SOCKET
2023-04-26T13:10:49.428400+00:00 [info] msg: terminate, mfa: emqx_connection:terminate/2, line: 666, peername: 10.67.195.212:49191, reason: {shutdown,malformed_packet}
2023-04-26T13:10:49.753300+00:00 [debug] msg: raw_bin_received, mfa: emqx_connection:when_bytes_in/3, line: 771, peername: 10.67.197.19:16448, bin: 0D0A0D0A000D0A515549540A20000007030004A9B87E8F, size: 23, type: hex
2023-04-26T13:10:49.753537+00:00 [info] mfa: emqx_connection:parse_incoming/2, line: 807, peername: 10.67.197.19:16448, at_state: <<“clean”>>, input_bytes: <<13,10,13,10,0,13,10,81,85,73,84,10,32,0,0,7,3,0,4,169,184,126,143>>, parsed_packets: [], reason: malformed_packet
2023-04-26T13:10:49.753699+00:00 [debug] msg: mqtt_packet_received, mfa: emqx_connection:handle_msg/2, line: 567, peername: 10.67.197.19:16448, packet: {frame_error,malformed_packet}, tag: MQTT
2023-04-26T13:10:49.753851+00:00 [debug] msg: emqx_connection_terminated, mfa: emqx_connection:terminate/2, line: 661, peername: 10.67.197.19:16448, reason: {shutdown,malformed_packet}, tag: SOCKET

I was expecting plain MQTT to work at least :(

Does anyone have a working Kubernetes config with an NLB? Or any ideas on what might be wrong here?

Hello,

It looks like there is a protocol mismatch caused by an invalid listener configuration:

    listeners.tcp.default {
      bind = "0.0.0.0:8083"
      max_connections = 1024000
    }

This creates a TCP listener (i.e. a plain-text MQTT listener) on port 8083, not a WSS one.
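
For reference, a plain WebSocket listener on that port, which is what the NLB should forward decrypted traffic to, would look something like this (a sketch; adjust the limits and path to your needs):

    listeners.ws.default {
      bind = "0.0.0.0:8083"
      max_connections = 1024000
      websocket.mqtt_path = "/mqtt"
    }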

Hi, and thanks dmif for looking into this one! But if I create/open a WSS listener, don't I also need to provide the certificate? At least when I try to open one in the web console, it asks for it. I'm hoping to terminate SSL/TLS on the Network Load Balancer and just open 8083 (the plain WebSocket port), something like they did for MQTT in the docs:

The example is just not on Kubernetes but plain EC2, not using an ingress NLB but a Classic Load Balancer (which is obsolete), and it uses MQTT/MQTT-SSL. But the principles should be the same, if I'm not wrong?! :)
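
For comparison, my understanding is that terminating TLS on the broker itself would mean giving the listener the certificate directly, something like this (a sketch; the file paths are placeholders, not from my setup):

    listeners.wss.default {
      bind = "0.0.0.0:8084"
      max_connections = 1024000
      ssl_options {
        certfile = "/etc/certs/cert.pem"   # placeholder path
        keyfile = "/etc/certs/key.pem"     # placeholder path
      }
    }

But that is exactly what I am trying to avoid by terminating on the NLB.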

Ok, update on the adventure…

I now have plain MQTT and plain WS working; I send test messages and they arrive. But anything over TLS just does not work…

These are the open listeners (everything else is default):

    listeners.tcp.default {
      bind = "0.0.0.0:1883"
      max_connections = 1024000
    }
    listeners.ws.default {
      bind = "0.0.0.0:8083"
      max_connections = 1024000
      websocket.mqtt_path = "/mqtt"
    }

This is the load balancer section:

  listenersServiceTemplate:
    metadata:
      name: emqx-listeners
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
        service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
        service.beta.kubernetes.io/aws-load-balancer-attributes: load_balancing.cross_zone.enabled=true
        #service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=true
        service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:eu-west-1:....:certificate/f.......
        service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
        service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "8883,8084"
    spec:
      type: LoadBalancer
      loadBalancerClass: service.k8s.aws/nlb
#      selector:
#        apps.emqx.io/db-role: core
      ports:
        - name: "plain"
          protocol: TCP
          port: 1883
          targetPort: 1883
        - name: "secure"
          protocol: TCP
          port: 8883
          targetPort: 1883
        - name: "ws"
          protocol: TCP
          port: 8083
          targetPort: 8083
        - name: "wss"
          protocol: TCP
          port: 8084
          targetPort: 8083

So the flow is (user → NLB → target):

  • TCP 8083 → TCP 8083: working
  • TLS 8084 → TCP 8083: not working
  • TCP 1883 → TCP 1883: working
  • TLS 8883 → TCP 1883: not working

The errors on TLS are "handshake timed out after 10000" or "Timeout while waiting for CONNACK".

The NLB logs are kind of useless in this case; all I see is this:

tls 2.0 2023-04-27T12:45:22 net/k8s-emqxoper-emqxlist-d528e8bb37/6a19709fb667dca6 7d1e926689cbdd38 my_ip_here:57688 10.67.195.6:8883 29934 - 0 0 - - - - - - - - - -

Is this setup not possible with the newest version, emqx:5.0.23? I have seen similar conceptual setups with previous versions…

What is a bit strange to me is this official example:

“Add some annotations in EMQX custom resources’ metadata, just as shown below:”

## Specifies the ARN of one or more certificates managed by the AWS Certificate Manager.
service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-west-2:xxxxx:certificate/xxxxxxx
## Specifies whether to use TLS for the backend traffic between the load balancer and the kubernetes pods.
service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
## Specifies a frontend port with a TLS listener. This means that accessing port 1883 through AWS NLB service requires TLS authentication,
## but direct access to K8S service port does not require TLS authentication
service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "1883"

Why is the port here stated as 1883? Shouldn't it be 8883? Or is the actual port number irrelevant in this context, and what matters is the annotation itself?
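
If I understand the AWS Load Balancer Controller behaviour correctly, the annotation simply lists which NLB frontend ports get a TLS listener, whatever their numbers; the targetPort stays plain TCP toward the pods. For my "secure" port from the Service above, that would mean:

    # NLB frontend           -> pod (plain TCP)
    - name: "secure"
      protocol: TCP
      port: 8883              # listed in aws-load-balancer-ssl-ports, so the NLB terminates TLS here
      targetPort: 1883        # decrypted MQTT is forwarded to the plain tcp listener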

Anyway, in my case, since this setup works on the non-TLS listeners/ports but not on the TLS ones, it surely must be something related to TLS. Unfortunately, logging does not help at all :(

Ok, so it is working now with the setup above. The problem was the clients used for testing (and their ability to test MQTT over TLS WebSocket) and the parameters used; the TLS handshake was being dropped at the NLB. It is also very important to connect using the domain associated with the certificate. Hopefully the above setup will be useful to others.