[Paper Notes][Bloom Filter] Cuckoo Filter

(Image: Coolshell)

[Cuckoo Filter: Practically Better Than Bloom] [slide]

This continues the previous post, the Bloom Filter deep dive.

A quick review of the Bloom Filter and the Counting Filter

A Bloom Filter is a fast lookup index structure with the following advantage:

  • It can check whether an item exists in \(O(1)\) time (* with a chance of false positives)

Disadvantages:

  • It only supports inserting items; deletion is not supported. (* Deletion requires rebuilding the Bloom Filter.)
  • The false positive rate can only be reduced by increasing the number of bit positions (\(m\)), which in turn increases the space cost.

The Counting Filter was introduced to solve the problem that deleting an item requires rebuilding the whole Bloom Filter.

The Counting Filter's disadvantage

Excessive space usage

The Counting Filter removes the Bloom Filter's no-deletion limitation by adding a counter field to each position, but because every position needs this extra field, its space usage becomes quite inefficient.

In terms of space efficiency, according to the paper, you may need 3 to 4 times as much space to maintain the same false positive rate.

Why the Counting Filter needs 3 to 4 times the space

Mainly because, to satisfy the dynamic deletion requirement, each item is inserted at \(k\) positions (in other words, one item is duplicated into \(k\) counters).

The reasons are:

  • To support dynamic deletion, a counter is added to record how many items map to each position; but if an item were stored at only one position, deleting it could easily wipe out data that other items had hashed into through collisions.
  • To avoid this problem, each item has to be duplicated into several counters as the way to cope with hashing collisions.

A closer look at the 3 to 4 times increase

The notation used here:

  • \(P(c(i) = j)\): the probability that the \(i\)-th counter is incremented \(j\) times
  • \(n\): the number of elements
  • \(k\): the number of hash functions, i.e. how many copies each item gets
  • \(m\): the number of counters in the table
\[P(c(i) = j) = \binom{nk}{j} \left(\frac{1}{m}\right)^{j} \left(1 - \frac{1}{m}\right)^{nk-j}\]

The Bloom Filter analysis from the previous post constrains \(k, m, n\) by \(k \le (\ln 2) \cdot m/n\), so the expression can be bounded as:

\[P\left(\max_i c(i) \ge j\right) \le m \left(\frac{e \ln 2}{j}\right)^{j}\]

So if each counter is 4 bits wide, it only overflows once \(j\) reaches 16, and the bound above shows how unlikely that is.
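
To make the numbers concrete (my own arithmetic, following the standard Counting Bloom Filter analysis): plugging \(j = 16\) into the bound gives

\[P\left(\max_i c(i) \ge 16\right) \le m \left(\frac{e \ln 2}{16}\right)^{16} \approx 1.37 \times 10^{-15} \times m\]

so 4-bit counters overflow with negligible probability, yet every position now costs 4 bits instead of 1.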

Working through this calculation, a Counting Filter ends up needing roughly 3 to 4 times the space to satisfy the same conditions as the original Bloom Filter.

Cuckoo Filter

Advantages:

Before getting into more of the implementation, let's look at the Cuckoo Filter's advantages.

  • Supports dynamic insertion and deletion (same as the Counting Filter)
  • Better lookup performance even when the structure is close to full (up to 95% space utilization)
  • Easier to implement than the quotient filter
  • Uses less space than a Bloom Filter when the target false positive rate is low (< 3%)

A comparison with other structures:

(image from paper “Cuckoo Filter: Practically Better Than Bloom”)

This table clearly lists the Cuckoo Filter's advantages. To summarize a few points:

  • Neither the Bloom Filter nor the Blocked Bloom Filter supports deletion (meaning dynamic deletion)
  • The Counting Filter needs roughly 3 to 4 times the space cost.

How it works:

The idea behind the Cuckoo Filter is quite simple (see the sketch after this list):

  • A fingerprint is computed for each item as its signature; this is the value stored in the bucket. Note again that storing fingerprints is what makes the Cuckoo Filter's lookup so fast.
  • When the target bucket is already occupied, the filter acts like a cuckoo kicking another bird out of the nest: the existing value is evicted and relocated to the alternate position given by another hash function. If that bucket is occupied too, its occupant is kicked out in turn, and so on.
  • If the number of evictions exceeds a predefined MaxCount, the filter is considered full and you still have to grow it and rebuild.
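
Here is a minimal Go sketch of the insertion path described above (my own illustration, not the paper's reference implementation). It assumes one fingerprint per bucket, an FNV-based fingerprint, and a power-of-two number of buckets so the XOR trick works; real implementations pack several fingerprints per bucket.

package main

import (
	"fmt"
	"hash/fnv"
)

const (
	numBuckets = 1 << 16 // must be a power of two, because of the XOR below
	maxKicks   = 500     // the "MaxCount" eviction limit mentioned above
)

type CuckooFilter struct {
	buckets [numBuckets]uint16 // 0 means empty, anything else is a fingerprint
}

func fingerprint(item string) uint16 {
	h := fnv.New32a()
	h.Write([]byte(item))
	fp := uint16(h.Sum32())
	if fp == 0 {
		fp = 1 // reserve 0 for "empty"
	}
	return fp
}

func index1(item string) uint32 {
	h := fnv.New64a()
	h.Write([]byte(item))
	return uint32(h.Sum64()) % numBuckets
}

// altIndex derives the second candidate bucket from the first index and the
// fingerprint only, so it can be recomputed during eviction without the key.
func altIndex(i uint32, fp uint16) uint32 {
	h := fnv.New32a()
	h.Write([]byte{byte(fp >> 8), byte(fp)})
	return (i ^ h.Sum32()) % numBuckets
}

func (cf *CuckooFilter) Insert(item string) bool {
	fp := fingerprint(item)
	i1 := index1(item)
	i2 := altIndex(i1, fp)
	for _, i := range []uint32{i1, i2} {
		if cf.buckets[i] == 0 {
			cf.buckets[i] = fp
			return true
		}
	}
	// Both candidate buckets are full: start kicking fingerprints around.
	i := i1
	for kick := 0; kick < maxKicks; kick++ {
		fp, cf.buckets[i] = cf.buckets[i], fp // evict the current occupant
		i = altIndex(i, fp)                   // it moves to its alternate bucket
		if cf.buckets[i] == 0 {
			cf.buckets[i] = fp
			return true
		}
	}
	return false // too many kicks: the filter is effectively full, rebuild needed
}

func (cf *CuckooFilter) Lookup(item string) bool {
	fp := fingerprint(item)
	i1 := index1(item)
	return cf.buckets[i1] == fp || cf.buckets[altIndex(i1, fp)] == fp
}

func main() {
	cf := &CuckooFilter{}
	cf.Insert("hello")
	fmt.Println(cf.Lookup("hello"), cf.Lookup("world")) // true false (with a small false-positive chance)
}

Deletion works the same way as Lookup: check the two candidate buckets for the fingerprint and clear it if found, which is how the Cuckoo Filter supports dynamic deletion without counters.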

Disadvantages:

There are still a few drawbacks worth sharing:

  • Because the alternate bucket is computed with an XOR, the number of buckets must be an integer power of 2.
  • Storing a fingerprint in the filter as the item's signature has its benefits, but it also introduces another source of collisions that causes false positives.

Source Code

I wrote a simple example here; I also recommend taking a look at this excellent example.

Reference

[TIL] Kubernetes: all traffic between pods, or between pod and node, is dropped

Symptom:

  • Services/Pods are created successfully, but you cannot connect to the pods.
  • You cannot connect to a pod on another node (or even on the same node).
  • All kubectl status output looks fine.
  • Your Docker version is newer than 1.13 (everything works well if your Docker version is 1.12).

This happens with "kubeadm" but does not happen with "minikube".

Diagnosis:

Check the iptables rules.

sudo iptables-save

-A INPUT -j KUBE-FIREWALL
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j DROP
-A FORWARD -i docker0 -o docker0 -j DROP
-A OUTPUT -j KUBE-FIREWALL
-A DOCKER-ISOLATION -j RETURN

As you can observe, the rule "-A FORWARD -i docker0 ! -o docker0 -j DROP" is what drops the forwarded traffic.

Root cause:

Refer to moby issue 40182 (still not resolved as of Kubernetes 1.8).

Solution:

  1. Downgrade to Docker v1.12.x
  2. Change the iptables FORWARD policy to accept everything (not suggested)
    • sudo iptables -P FORWARD ACCEPT
  3. Run the Docker daemon with --iptables=false (not easy when you use Kubernetes)

Also refer to the great slide deck "All The Troubles You Get Into When Setting up a Production-ready Kubernetes Cluster" by Jimmy Lu.

Reference:

[TIL][Golangtw] Notes from Dave Cheney's session at Golangtw

Meetup notes

pprof -> gives no clue about how much of the CPU we actually used. trace -> helps to understand processing time and the path from a channel send to the matching channel receive.

The "synchronization blocking profile" is a good clue for figuring out problems related to unbuffered channels.

The execution tracer also helps to diagnose problems when you trace a running HTTP server. It lets us see the network requests and the HTTP serving timeline.
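
As a concrete starting point (my own minimal sketch, not code shown in the session), the standard library's net/http/pprof package already exposes the execution tracer on a running HTTP server:

package main

import (
	"fmt"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers, including /debug/pprof/trace
)

func main() {
	http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello")
	})
	// The default mux now also serves the pprof and trace endpoints.
	http.ListenAndServe(":6060", nil)
}

A trace can then be captured and opened with:

curl -o trace.out "http://localhost:6060/debug/pprof/trace?seconds=5"
go tool trace trace.out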

View Option -> Flow Arrow displays all the incoming and outgoing flows.

Goroutine analysis -> to understand the goroutine path.

The execution tracer has low overhead, unless lots of goroutines are waiting (not always, but the overhead is roughly 10% at most).

[TIL] How to build a Slack-like service from scratch

Original video: Slack: Real-Time Communication with HAProxy and Route53 on AWS

Preface:

The original video comes from AWS's "This Is My Architecture" series; it introduces how Slack builds its service on top of AWS.

The video is easy to follow and only about five minutes long, and it made me wonder: could we use it as a starting point to think about how to build a Slack-like service from scratch?

The original video explains how Slack runs its service on AWS, but its line of reasoning is well suited to asking "how would you design a Slack-like architecture". I encourage everyone to watch the original video. It is only 05:43 long, yet it contains many details worth thinking through slowly.

Disclaimer

This article is just my own attempt to think through how to build a Slack-like service from scratch. It is not an official Slack report or documentation. Every idea here is just my assumption.

Feel free to correct me if you find anything wrong. :)

How to build a Slack-like service from scratch (from my point of view)

From the beginning

Slack is an "online messaging service", so if we want to build something like Slack, our choices might be:

  • XMPP
  • WebRTC (not only online messaging, but also video/audio streaming)
  • Or we could build something using just WebSockets?

Users use a "websocket" to connect to the server and communicate with each other.

So the basic architecture is very simple (a minimal sketch follows the diagram).

user -> websocket -> server <- websocket <- user
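
To make this concrete, here is a minimal sketch of such a server, assuming we take the WebSocket route and use the golang.org/x/net/websocket package (my own choice for illustration; the video does not say how Slack implements its message servers). It simply echoes every message back to the sender:

package main

import (
	"io"
	"log"
	"net/http"

	"golang.org/x/net/websocket"
)

// echoServer copies every message a client sends straight back to it.
// A real chat server would instead keep a registry of connected clients
// per channel and fan each message out to the other members.
func echoServer(ws *websocket.Conn) {
	io.Copy(ws, ws)
}

func main() {
	http.Handle("/ws", websocket.Handler(echoServer))
	log.Fatal(http.ListenAndServe(":8080", nil))
}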

Challenge 1: More users

When lots of users come in to use your Slack-like system, you have to add HAProxy to load-balance your websocket connections.

user -> websocket -> HAPROXY -> server(s) <- HAPROXY <- websocket <- user

Challenge 2: Securing your transmission

You need a way to protect the transmission between servers. For that, you might use IPsec to increase transmission security.

user -> websocket -> HAPROXY -> server <- IPSEC -> server <- HAPROXY <- websocket <- user

Challenge 3: Reducing the huge number of requests that cause heavy server load

Although we use HAProxy to load-balance the websocket requests, with too many users there will still be a huge number of database connections and access requests hitting the servers.

This brings the following problems (challenges):

  • Starting a connection takes too long and is expensive
  • The client data footprint becomes large for mobile users
  • Reconnection storms are resource intensive

How could we optimize this? Slack has its own edge-cache system called "Flannel" (an application-level edge cache).

(without Flannel)

(with Flannel)

"Flannel" provides the following features:

  • Keeps a persistent connection with the server, but serves a slimmed-down version of the client-side data to the client.
  • So the client doesn't need to handle the huge connection bootstrap data; it only pulls a small amount of data through Flannel.
  • Provides lazy loading from the client side.
  • Reduces connection wait time and data size for mobile users.

Example: the client queries Flannel for the "auto-complete" results when "at"-ing (@) someone.

user -> websocket -> HAPROXY -> Flannel -> server <- IPSEC -> server <- Flannel <- HAPROXY <- websocket <- user

Challenge 4: Large pre-load messages

When a client logs in after a long time away, it might need to go through the full bootstrap login process. It also has to retrieve all the up-to-date information to bring your Slack channel data up to date.

In that case, users connect to slack-like.com to retrieve the related data. On top of that, we also need a DNS service that can handle a huge volume of domain name lookups.

"AWS Route 53" is AWS's DNS service; it also provides health checks for HAProxy to make sure every load balancer works well.

(Offline data sync) Client -> Route 53 --> HAPROXY -> Flannel...

Question: Why use HAPROXY and ELB in the same system?

Mobile users, and other use cases that need slimmed-down data transfer, go through HAProxy with self-defined load-balancing rules to "Flannel".

Other requests ("CloudFront") use ELB.

Wrap-up

This article is just an attempt to build a Slack-like service from scratch, and then to improve the service as each new challenge comes up.

It was a good exercise for me to think through the whole architecture of a real-time messaging service. Hope you enjoy it.

Reference:

[TIL] Learnings from Taipei 5G Summit 2017 (Part 1)

Opportunities of the new 5G radio broadband

Brief takeaways:

The main theme of the first day was the business opportunities and applications that the new frequency bands may bring, which come down to three main expectations (higher frequency bands, more data capacity, and more flexible applications).

Beyond that, everything still revolves around the key factors of 5G (low latency, massive concurrency, and high bandwidth).

5G NR (New Radio)

  • The band range extends from 4G (up to 6 GHz) to 5G NR (up to 100 GHz)
  • It also includes mmWave (24 GHz ~ 100 GHz)

(refer to the slide)

mmWave (millimeter wave)

  • Multi-gigabit data rates
  • Much more capacity
  • Flexible deployments

(refer to the Qualcomm slide)

Key Factors

  • eMBB (enhanced Mobile Broadband):
    • It means more data bandwidth for mobile systems.
  • mMTC (massive Machine Type Communications)
    • For IoT and other high-concurrency IoT-related industries. It covers massive connectivity for IoT (and other) devices.
  • URLLC (Ultra-Reliable and Low Latency Communications)
    • It refers to very low latency communication, typically used for autonomous driving.

These three aspects, "high bandwidth", "massive concurrency", and "ultra-low latency", are the three pillars of 5G.

[Paper Notes][Bloom Filter] A Deep Dive into the Bloom Filter

About the Bloom Filter

Use cases

When processing large amounts of data, you often need an index that can locate data quickly. Such an index is called a filter.

To find where a number is stored, or whether it exists at all, the naive way is to check every entry one by one, which takes \(O(n)\) time.

A faster way is to map all the numbers into an array and mark an entry as "1" when that number exists (a mapping table). This improves the time complexity to \(O(1)\), but the space complexity grows to \(n\).

So, is there a filter data structure that keeps the \(O(1)\) time complexity without needing \(n\) space?

The Bloom Filter is a hashing-based data structure that gives you fast ( \(O(1)\) ) lookups. It works as follows (see the sketch after this list):

  • Each item is run through hash functions and recorded at the resulting array positions
  • Each position stores only two values (1: present, 0: absent)
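
Here is a minimal Go sketch of this flow (my own illustration; the double-hashing scheme built on FNV is an assumption for the example, not something the Bloom Filter prescribes):

package main

import (
	"fmt"
	"hash/fnv"
)

const (
	m = 1 << 20 // number of bit positions
	k = 7       // number of hash functions
)

type BloomFilter struct {
	bits [m]bool // one flag per position; a real implementation would pack bits
}

// positions derives k indexes from two FNV hashes (h1 + i*h2), a common
// double-hashing trick standing in for k independent hash functions.
func positions(item string) [k]uint64 {
	h1 := fnv.New64a()
	h1.Write([]byte(item))
	a := h1.Sum64()
	h2 := fnv.New64()
	h2.Write([]byte(item))
	b := h2.Sum64() | 1
	var idx [k]uint64
	for i := uint64(0); i < k; i++ {
		idx[i] = (a + i*b) % m
	}
	return idx
}

// Add sets the bit at every position the item hashes to.
func (bf *BloomFilter) Add(item string) {
	for _, p := range positions(item) {
		bf.bits[p] = true
	}
}

// Contains can return a false positive, but never a false negative.
func (bf *BloomFilter) Contains(item string) bool {
	for _, p := range positions(item) {
		if !bf.bits[p] {
			return false // definitely not present
		}
	}
	return true // probably present
}

func main() {
	bf := &BloomFilter{}
	bf.Add("cassandra-partition-42")
	fmt.Println(bf.Contains("cassandra-partition-42")) // true
	fmt.Println(bf.Contains("some-unknown-key"))       // false (almost certainly)
}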

Bloom Filter implementation details:

There are a few trade-offs to make when implementing a Bloom Filter:

  • The false positive probability (\(p\))
  • The number of elements the filter has to represent (\(n\))
  • The number of bit positions in the filter (\(m\))
  • The number of hash functions (\(k\))

Given a target probability \(p\) and the expected number of elements \(n\), we can compute the required number of bits \(m\) and the number of hash functions \(k\).

The formulas are as follows (they are also in the code):

\[m = -\frac{n \ln p}{(\ln 2)^2}\] \[k = \frac{m}{n} \ln 2\]
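
As a quick worked example (my own numbers, not from the original post): for \(n = 1{,}000{,}000\) elements and a target false positive rate \(p = 1\%\),

\[m = -\frac{10^6 \cdot \ln 0.01}{(\ln 2)^2} \approx 9{,}585{,}059 \text{ bits} \approx 1.2\ \text{MB}, \qquad k = \frac{m}{n} \ln 2 \approx 6.6 \approx 7\]

so about 1.2 MB and 7 hash functions are enough to answer membership queries over a million items at a 1% false positive rate.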

You can refer to this web page.

Bloom Filter advantages:

  • Fast \(O(1)\) query time
    • (Don't assume a database lookup would be faster; many databases use a Bloom Filter internally for their queries, e.g. Cassandra partition queries.)
  • Small index structure (each position is only one bit, 0 or 1)

Bloom Filter disadvantages (limitations):

But such a fast data structure has its inherent limitations:

  • It can only tell you that an item is definitely absent; it cannot confirm that an item is definitely present. False positives are possible (but false negatives are not).
  • Items can only be added; they cannot be deleted.

Side note: an easy-to-understand diagram of false positive, false negative, true positive, and true negative.

Bloom Filter use case: the Cassandra partition cache

You might wonder: to check whether some data exists somewhere, why not just query the database directly? In fact, Cassandra uses a Bloom Filter to decide whether the data exists in a given partition.

The Bloom Filter quickly tells you whether the data is in the partition key cache; if it is not, the read has to go through the partition index to locate the data.

Ref: DataStax, "About reads"

An improved Bloom Filter: the Counting Filter

The Bloom Filter cannot delete data, because when you flip a position from 1 back to 0 you don't know how many items were originally mapped to it. Adding a counter to each position solves this deletion problem.

The counter records how many items the hash functions have mapped onto that position, so that when deleting you can confirm whether every item mapped to that position has actually been removed.

It works as follows (a small sketch follows the list):

  • When inserting an item into the CBF (Counting Bloom Filter), increment the counter at each hashed position by one
  • When deleting an item, decrement those counters by one.
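
A minimal Go sketch of these two operations (again my own illustration, with the same kind of assumed FNV-derived hashing as in the Bloom Filter sketch above):

package main

import (
	"fmt"
	"hash/fnv"
)

const (
	m = 1 << 20 // number of counters
	k = 7       // number of hash functions
)

type CountingBloomFilter struct {
	counters [m]uint8 // 4 bits per counter is enough in theory; uint8 keeps the sketch simple
}

// positions derives k indexes from one FNV hash, purely for illustration.
func positions(item string) [k]uint64 {
	h := fnv.New64a()
	h.Write([]byte(item))
	a := h.Sum64()
	b := a>>32 | 1
	var idx [k]uint64
	for i := uint64(0); i < k; i++ {
		idx[i] = (a + i*b) % m
	}
	return idx
}

// Add increments every counter the item maps to.
func (cbf *CountingBloomFilter) Add(item string) {
	for _, p := range positions(item) {
		cbf.counters[p]++
	}
}

// Remove decrements those counters, which is exactly what a plain Bloom Filter cannot do.
func (cbf *CountingBloomFilter) Remove(item string) {
	for _, p := range positions(item) {
		if cbf.counters[p] > 0 {
			cbf.counters[p]--
		}
	}
}

// Contains behaves like the Bloom Filter lookup: false positives are still possible.
func (cbf *CountingBloomFilter) Contains(item string) bool {
	for _, p := range positions(item) {
		if cbf.counters[p] == 0 {
			return false
		}
	}
	return true
}

func main() {
	var cbf CountingBloomFilter
	cbf.Add("key")
	cbf.Remove("key")
	fmt.Println(cbf.Contains("key")) // false again after the delete
}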

Counting Bloom Filter disadvantages

The counter field added to support deletion makes the data structure considerably larger again.

Code:

Enough talk, let's look at the code.

Reference