Original Video:Slack: Real-Time Communication with HAProxy and Route53 on AWS

Preface:

Original video comes from AWS “My Architect”, it introduce how slack to build their service on top of AWS.

This video is very easy to understand within 5 minutes, it make me to think about - Could we use this video to think about how to build slack-like service from scratch?

原始影片是在講解如何透過 AWS 的服務來架設 Slack ,不過裡面有許多思考脈絡很適合去思考 ”如何架設一個類似 Slack 的架構”. 這邊很歡迎大家去看原始影片,只要 05:43 而已,但是裡面有許多細節可以提供我們慢慢地思考.

Disclaimer

This article just made by me which to think about how to make a slack-like service from scratch. It not offical slack report or documentation here. Every ideas are just my assumptions.

Feel free to correct me once you find anything is wrong. :)

How to build slack-like service from scratch? (from my point of view)

From beginning

Slack is a “online messaging service”, so if we want to build something like slack. Your choice might be

  • xmpp
  • webRTC (not only online messaging, but also video/audio streaming)
  • Or we could build something just using websocket?

user use “websocket” to connect server and communicate with each other.

So, the basic architecture is very simple.

user -> websocket -> server <- websocket <- user

Challenge 1: More users

When the lots of user come-in use your slack-like sytem, you have to build a HAPROXY to load-balance your websocket connection.

user -> websocket -> HAPROXY -> server(s) <- HAPROXY <- websocket <- user

Challenge 2: Need sercurity of your transmission.

You need a way to protect the trasmission between each server. In that way, you might need IPSEC to increase your transmission security.

user -> websocket -> HAPROXY -> server <- IPSEC -> server <- HAPROXY <- websocket <- user

Challenge 3: Need reduce the huge amount request cause server heavy loading.

Although we using the HAPROXY to do load-balancing of websocket request, but if too much user it will have huge amount of database access connection and access request to server.

It have following problems(challenges):

  • Start connection take too long and expensive
  • Client data footprint become large for mobile user
  • Reconnection storm are resource intensive

How could we try to optimize it? Slace have their own edge-cache system which call “Flannel” (Application-level edge cache)

(without Flannel)

(with Flannel)

Flannel” provide following features:

  • Keep persistent connection with server but provide slimmed down version client side data to client.
  • So, client don’t need handle huge connection bootstrape data instead of using few data with flannel.
  • Provide a lazily load from client side
  • Reduce mobile user connection waiting time and data size.

EX: Client side will connect to flannel for “auto-complete” result from “at”(@) someone.

user -> websocket -> HAPROXY -> Flannel -> server <- IPSEC -> server <- Flannel <- HAPROXY <- websocket <- user

Challenge 4: Large pre-load message

When client is login after long-time, it might need take full bootstrape login process. It will also to retrival all up-to-date information to keep your slack channel data.

In that way, user will connect to slack-like.com to retrieval related data. In other way, we also need DNS server to serve huge domain name translation service.

AWS Route 53” is AWS DNS service, it also provide HTPROXY health check to make sure every load balancer works well.

(Offline data sync) Client -> Route 53 --> HAPROXY -> Flannel...

Question: Why use HAPROXY and ELB in the same system?

Mobile user and some use case need slimmed down data transfer will go by HAPROXY with self-defined LB rule to “Flannel”

Other requests (“CloudFront”) will use ELB.

Wrap-up

This article just trying to build a slack-like service from scratch. Then, we trying to improvde our service when we have some new challenge here.

It is a good practice for me to think about whole architecture for realtime messaging service. Hope you enjoy it.

Reference:


Evan

Attitude is everything