[TIL] How to build a slack-like service from scratch

Original Video: Slack: Real-Time Communication with HAProxy and Route53 on AWS

Preface:

The original video comes from the AWS “This Is My Architecture” series. It introduces how Slack builds its service on top of AWS.

The video is easy to follow and takes about 5 minutes. It made me wonder: could we use this video to infer how to build a Slack-like service from scratch?

The video explains how Slack is set up on AWS services, but much of its reasoning is also well suited to thinking about “how to build a Slack-like architecture”. I encourage everyone to watch the original; it is only 05:43 long, yet it contains many details worth thinking through slowly.

How to build a Slack-like service from scratch?

From the beginning

Slack is an “online messaging service”, so if we want to build something like Slack, the choices might be:

  • XMPP
  • WebRTC (not only online messaging, but also video/audio streaming)
  • Or we could build something using just WebSocket?

Users connect to the server over WebSocket and communicate with each other through it.

So the basic architecture is very simple.

user -> websocket -> server <- websocket <- user

Challenge 1: More users

When lots of users start using your Slack-like system, you have to put HAProxy in front of your servers to load-balance the WebSocket connections.

user -> websocket -> HAProxy -> server(s) <- HAProxy <- websocket <- user
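WebSocket connections are long-lived, so what matters for balancing is how many connections each server currently holds. The sketch below illustrates least-connections selection, a strategy HAProxy offers as `leastconn`. This is only an illustration of the idea, not HAProxy's code and not Slack's actual configuration; the class and server names are hypothetical.

```python
class LeastConnBalancer:
    """Hypothetical least-connections picker for long-lived connections."""

    def __init__(self, servers):
        # Active connection count per backend server.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Pick the server with the fewest active connections; on a tie,
        # min() returns the first one in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when the WebSocket connection closes.
        self.active[server] -= 1


balancer = LeastConnBalancer(["server-1", "server-2"])
first = balancer.acquire()
second = balancer.acquire()
balancer.release(first)
third = balancer.acquire()  # goes back to the server that just freed a slot
```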

Challenge 2: Secure transmission

You need a way to protect the traffic between servers. For that, you might use IPsec to secure server-to-server transmission.

user -> websocket -> HAProxy -> server <- IPsec -> server <- HAProxy <- websocket <- user

Challenge 3: Reduce the heavy server load caused by a huge number of requests

Although we use HAProxy to load-balance the WebSocket requests, too many users still produce a huge number of database connections and access requests to the servers.

This causes the following problems (challenges):

  • Starting a connection takes too long and is expensive
  • The client data footprint becomes large for mobile users
  • Reconnection storms are resource intensive

How could we optimize this? Slack has its own edge-cache system called “Flannel” (an application-level edge cache).

(without Flannel)

(with Flannel)

“Flannel” provides the following features:

  • Keeps a persistent connection to the servers but gives the client a slimmed-down version of the client-side data.
  • So the client does not need to handle a huge connection bootstrap payload; it gets by with a small amount of data from Flannel.
  • Lets the client load data lazily.
  • Reduces connection wait time and data size for mobile users.

EX: The client connects to Flannel to get “auto-complete” results when the user mentions (@) someone.
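The auto-complete example can be sketched roughly as follows: the edge cache holds the full team data, hands the client only a slim bootstrap payload, and answers prefix queries on demand. This is a sketch under my own assumptions and not Flannel's real design; `EdgeCache`, the payload fields, and the method names are all hypothetical.

```python
class EdgeCache:
    """Hypothetical Flannel-like edge cache; not Slack's implementation."""

    def __init__(self, team_users):
        # The full team data stays at the edge, loaded once from the backend.
        self.team_users = list(team_users)

    def bootstrap(self, user):
        # Slimmed-down payload: the client learns who it is and how big the
        # team is, but does not download every user record up front.
        return {"self": user, "user_count": len(self.team_users)}

    def autocomplete(self, prefix, limit=5):
        # Lazy lookup: the client asks only when someone types "@prefix".
        prefix = prefix.lower()
        matches = [u for u in self.team_users if u.lower().startswith(prefix)]
        return sorted(matches)[:limit]


cache = EdgeCache(["alice", "albert", "bob"])
payload = cache.bootstrap("bob")
suggestions = cache.autocomplete("al")
```

The trade-off is one extra round trip per lookup in exchange for a much smaller payload at connection time.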

user -> websocket -> HAProxy -> Flannel -> server <- IPsec -> server <- Flannel <- HAProxy <- websocket <- user

Challenge 4: Large pre-load messages

When a client logs in after a long time away, it may need to go through the full bootstrap login process. It also has to retrieve all the latest information to bring its Slack channel data up to date.

For that, the messaging servers need something like a CDN. We could use AWS EC2 in this case to reduce pre-load latency and connection intensity.

To make the EC2 instances easy to reach, AWS Route 53 provides a scalable, highly available DNS service.

Wrap-up

This article simply tries to build a Slack-like service from scratch, and then improves the service each time a new challenge comes up.

It was good practice for me to think through the whole architecture of a real-time messaging service. Hope you enjoyed it.


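[TIL] Learn from Taipei 5G Summit 2017 (Part 1)

Opportunities in the new 5G radio bands

Brief takeaways:

The first day's sessions focused on the business opportunities and applications that the new frequency bands may bring. Naturally, they stayed within three main expectations (higher frequency bands, more data capacity, more flexible applications).

Beyond these, everything comes back to the key factors of 5G (low latency, high concurrency, and high bandwidth).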

5G NR (New Radio)

  • The frequency range extends from 4G (up to 6 GHz) to 5G NR (up to 100 GHz)
  • It also includes mmWave (24 GHz ~ 100 GHz)

(refer to the slide)

mmWave (millimeter wave)

  • Multi-gigabit data rates
  • Much more capacity
  • Flexible deployments

(refer to the Qualcomm slide)

Key Factors

  • eMBB (enhanced Mobile Broadband)
    • It means more data bandwidth for mobile systems.
  • mMTC (massive Machine Type Communications)
    • For IoT and IoT-related industries with many concurrent connections. It includes massive connectivity for IoT (or other) devices.
  • URLLC (Ultra-Reliable and Low Latency Communications)
    • It refers to very low latency communication, typically used for autonomous driving.

These three aspects, “high bandwidth”, “high concurrency”, and “ultra-low latency”, are the three facets of 5G.

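[Paper Reading][Bloom Filter] An in-depth discussion of Bloom Filter

About Bloom Filter

Use cases

When handling large data sets, you often need an index that can find data quickly. Such an index is called a filter.

To find a number's position, or whether it exists at all, the simple approach is to check every element once, which gives a time complexity of O(n).

A faster approach is to turn all the numbers into an array and record “1” in a number's slot when that number exists (a mapping table). The time complexity improves to O(1), but the space complexity becomes O(n).

So is there a data structure for a filter that has O(1) time complexity without needing O(n) space?

A Bloom Filter is a hashing table that gives you fast lookup (O(1)). It works as follows:

  • Each item is passed through a hash function and stored at some position in an array
  • Each position holds only one of two values (1: present, 0: absent)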
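The two points above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the paper or from any particular library. The class name, the parameter names `m` (number of slots) and `k` (number of hash functions), and the choice to derive the `k` hashes by salting SHA-256 are all my assumptions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch; slots hold only 0 or 1."""

    def __init__(self, m, k):
        self.m = m              # number of slots in the array
        self.k = k              # number of hash functions
        self.bits = [0] * m

    def _positions(self, item):
        # Derive k array positions by salting one SHA-256 hash with an index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=3)
bf.add("apple")
```

After this, `bf.might_contain("apple")` is always true, while a lookup for an item that was never added is usually false and only occasionally true (a false positive).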

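Bloom Filter implementation details:

Implementing a Bloom Filter involves some trade-offs:

  • The false-positive probability (p)
  • The number of entries the filter needs to represent all the data (n)
  • The number of data slots the filter needs (m)
  • The number of hash functions (k)

Given a preset probability p and a desired filter size n, we can calculate the required number of data slots m and the number of hash functions k.

The formulas are as follows (they are all in the code as well). These are the standard Bloom filter sizing formulas:

m = -(n × ln p) / (ln 2)^2
k = (m / n) × ln 2

You can refer to this web page.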
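As a worked example of the sizing step, the sketch below applies the standard Bloom filter sizing formulas, m = -(n × ln p) / (ln 2)^2 and k = (m / n) × ln 2, where n is the expected number of items, p the target false-positive probability, m the number of slots, and k the number of hash functions. The function name is hypothetical, and rounding m up and k to the nearest integer is my own choice.

```python
import math


def bloom_parameters(n, p):
    """Return (m, k) for n expected items and false-positive rate p."""
    # Number of slots, rounded up so the target rate is not exceeded.
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    # Number of hash functions, rounded to the nearest integer (at least 1).
    k = max(1, round(m / n * math.log(2)))
    return m, k


m, k = bloom_parameters(n=1000, p=0.01)
print(m, k)  # 9586 7
```

So for 1,000 items at a 1% false-positive rate, the filter needs roughly 9.6 bits per item and 7 hash functions.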

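Bloom Filter advantages:

  • Fast lookup time, O(1)
    • (Don't assume a database would be faster: many databases use a Bloom Filter for their own lookups (ex: Cassandra Partition Query))
  • Small index data structure (each slot is a single bit, 0 or 1)

Bloom Filter drawbacks (limitations):

Such a fast data structure comes with inherent limitations:

  • It can only tell you that an item is definitely not present; it cannot confirm that an item is definitely present. False positives are possible. (There are no false negatives, though.)
  • Data can only be added; it cannot be deleted.

Supplement: a picture that makes these easy to understand (False Positive, False Negative, True Positive, True Negative)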

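Bloom Filter use case - Cassandra Partition Cache

You might wonder: to find out whether some data exists somewhere, why not just query the database directly? In fact, Cassandra itself uses a Bloom Filter to decide whether data exists in a given partition.

In Cassandra's read path, the Bloom Filter quickly indicates whether an SSTable may contain the requested partition. If it may, Cassandra checks the Partition Key Cache; on a cache miss, it has to go on to the Partition Index to locate the data.

Ref: DataStax About Read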

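An improved Bloom Filter: the Counting Filter

A Bloom Filter cannot delete data, because when you change a slot from 1 back to 0 you do not know how many items map to that position. Adding a counter to each data slot later solved this problem.

The counter records how many items have been mapped to that slot by the hashing functions. When deleting, you can then confirm whether every item mapped to that slot has actually been removed.

It works as follows:

  • When an item is added to the CBF (Counting Bloom Filter), the value at each hashed position is incremented by one
  • When an item is deleted, that value is decremented by one.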
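The add and delete steps can be sketched as follows. This is a minimal illustration under my own assumptions and not a reference implementation: the class and method names are hypothetical, positions are derived by salting SHA-256, and `remove` refuses to decrement for an item that does not appear to be present so that no counter goes negative.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter sketch; slots hold counters."""

    def __init__(self, m, k):
        self.m = m                # number of counter slots
        self.k = k                # number of hash functions
        self.counts = [0] * m

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def might_contain(self, item):
        return all(self.counts[pos] > 0 for pos in self._positions(item))

    def add(self, item):
        # Increment the counter at every hashed position.
        for pos in self._positions(item):
            self.counts[pos] += 1

    def remove(self, item):
        # Only decrement when the item appears to be present.
        if not self.might_contain(item):
            return False
        for pos in self._positions(item):
            self.counts[pos] -= 1
        return True


cbf = CountingBloomFilter(m=1024, k=3)
cbf.add("apple")
cbf.add("apple")
cbf.remove("apple")  # one copy is still counted, so the item stays present
```

Note that removing an item that was never added but happens to be a false positive would still corrupt the counters; the guard in `remove` cannot detect that case.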

Drawbacks of the Counting Bloom Filter

The counter field added to make deletion possible makes the data structure larger again.

Code:

Enough talk, let's look at the code.


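[Good Read] Observability 3 ways: Logging, Metrics & Tracing

The team lead of Zipkin (a distributed tracing service much like OpenTracing) spoke at a Dot Conference about three ways of measuring:

  • Log
  • Metrics
  • Tracing

Using the display of response time as the example, the talk shows how to present it with logs, metrics, and tracing. It also helps you understand how to apply these three kinds of measurement to solve your problems.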
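To illustrate the difference between the three, here is a small sketch that records the same response time in all three ways. It is my own illustration and not taken from the talk. The names `handle_request`, `latencies_ms`, and `spans` are hypothetical, and a plain list and dict stand in for a real metrics library and a real tracer such as Zipkin.

```python
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("demo")

latencies_ms = []   # metrics: plain numbers to aggregate, no request context
spans = []          # tracing: timed operations tied together by a trace id


def handle_request(path):
    trace_id = uuid.uuid4().hex
    start = time.perf_counter()
    time.sleep(0.01)  # stand-in for the real work
    elapsed_ms = (time.perf_counter() - start) * 1000

    # Log: one human-readable event for this single request.
    log.info("GET %s took %.1f ms", path, elapsed_ms)
    # Metric: just the number, to be averaged or bucketed later.
    latencies_ms.append(elapsed_ms)
    # Trace: the duration plus an id that links it to related spans.
    spans.append({"trace_id": trace_id, "name": path,
                  "duration_ms": elapsed_ms})
    return elapsed_ms


elapsed = handle_request("/users")
```

The same number ends up in three places, but each answers a different question: what happened in this request, how the service behaves overall, and where the time went across services.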

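[Good Read] Pascal at Apple

A very interesting walkthrough of a piece of computing archaeology: Pascal at Apple.

It contains many fun little stories:

  1. Apple developed its own variant of Pascal, and the Lisa had the world's first OOP app ecosystem
  2. Back then, every engineer who wanted to write Mac apps had to buy a Lisa. (Note: a Lisa cost five thousand US dollars at the time.)
  3. The Lisa had a built-in native Pascal compiler
  4. Many Mac apps of the time were developed in that particular Pascal, including the compiler itself.

Such a fun bit of history.

The article is here….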

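[Conference Notes][iThome GopherDay] What can Golang do? (Using project 52 as examples)

Talk abstract

Picking up a new language is never a simple matter.

Whether you are learning the toolset or the syntax, the biggest question along the way is always: what can this language actually do?

From 2015/06 to 2016/06, the speaker challenged himself to write one small project every week for a year, under the name “Project 52”. Through “Project 52”, the speaker will show you what Golang can actually do:

Concise syntax, a powerful toolchain, and very high performance. Golang, developed by Google, not only lets you focus on development but also helps you build efficient applications.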

Slide

iTHome Gopher Day 2017: What can Golang do? (Using project 52 as examples) from Evan Lin

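Thoughts

This was the first Golang conference held in Taiwan. Thanks to the iThome staff. The venue was packed, and the lunch boxes and refreshments were great. Thank you to all the speakers for sharing, and I hope every attendee enjoyed it.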