Blog E

May 14th, 2015

[MOOCS][Golang]MIT6_824 Distributed Systems Week1

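##Preface:

Someone asked about this course on Golang.Tw, and its assignments are written in Golang, so I was interested enough to give it a try.

The course link is here: 6.824: Distributed Systems

Articles in the MIT 6.824 Distributed Systems series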

6.824: Distributed Systems

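##Week 1:

###Paper1: the famous Google MapReduce paper

The main reading for the week is the well-known MapReduce system paper.

###Lab1: write a simple Map and Reduce

Lab1 is a small Word Count program. Two functions have to be completed:

Map: split the text into words and put them into a List. Each entry in the List holds a word and its number of occurrences.
Reduce: for each word found in the files, take all of its counts and return their sum.

The lab is simple; the point is to get a basic understanding of MapReduce. I did not quite get it at first, but after comparing the reference article with the algorithm in the paper it made sense, and it turned out to be quite fun. The lab also ships with go test, so you can verify that your idea works.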
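The two functions described above can be sketched roughly as follows. This is only a minimal illustration, not the lab's actual skeleton: the `KeyValue` type, the slice-based signatures, and the grouping done in `main` are assumptions made to keep the example self-contained.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue is a hypothetical stand-in for the lab's key/value pair type.
type KeyValue struct {
	Key   string
	Value string
}

// Map splits the text into words and emits one ("word", "1") pair per occurrence.
func Map(text string) []KeyValue {
	words := strings.FieldsFunc(text, func(r rune) bool { return !unicode.IsLetter(r) })
	pairs := make([]KeyValue, 0, len(words))
	for _, w := range words {
		pairs = append(pairs, KeyValue{Key: w, Value: "1"})
	}
	return pairs
}

// Reduce receives every value emitted for one word and returns their sum.
func Reduce(key string, values []string) string {
	total := 0
	for _, v := range values {
		n, err := strconv.Atoi(v)
		if err != nil {
			continue // skip malformed counts in this sketch
		}
		total += n
	}
	return strconv.Itoa(total)
}

func main() {
	// Stand-in for the framework: group the Map output by key, then reduce one key.
	grouped := map[string][]string{}
	for _, kv := range Map("the quick fox and the lazy dog") {
		grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
	}
	fmt.Println("the:", Reduce("the", grouped["the"]))
}
```

In the real lab the framework does the grouping between the two phases; the `main` above only imitates that step for a single word.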

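###A brief introduction to RPC

The second half of week 1 covers RPC. Luckily I had been looking into it recently, so I got through the notes quickly.

The main topics are:

What an RPC server is.
What can go wrong in an RPC architecture? (lost connection, request never received, request not completed)
How to deal with it:
- "At least once":
  - Pro: you can be sure the server executed the request.
  - Where it applies:
    - OK: reads that are safe to repeat
    - Not OK: deposits
- "At most once" (this is what Go RPC provides)
  - The server picks out duplicate requests and replies with the previous result instead of running the request again.
  - How duplicates are detected: through a unique ID.
  - Lab2 works on this part.
- "Exactly once":
  - An at-most-once system plus unbounded retries and a fault-tolerant design. Lab3 works on this.
Go thread:
- When to use Go channels and when to use shared memory + locks:
  - Channels: two threads need to communicate, with one waiting for the other's result.
  - Shared memory + locks: data has to be shared, but neither thread waits for a result from the other.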
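The at-most-once idea and the two concurrency styles in the list above can be sketched together. This is an illustration under stated assumptions, not the course's code: `Request`, `Server`, and `Deposit` are hypothetical names, and the "RPC" is a plain method call. The server keeps a lock-protected table of replies keyed by request ID (shared memory + lock), and `main` waits for a goroutine's result over a channel.

```go
package main

import (
	"fmt"
	"sync"
)

// Request carries a client-chosen unique ID so the server can spot retries.
type Request struct {
	ID     string
	Amount int
}

// Server applies each request ID at most once.
type Server struct {
	mu      sync.Mutex
	balance int
	replies map[string]int // request ID -> reply already sent
}

func NewServer() *Server {
	return &Server{replies: make(map[string]int)}
}

// Deposit adds Amount to the balance unless this ID was already handled,
// in which case it returns the remembered reply without running again.
func (s *Server) Deposit(req Request) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	if reply, seen := s.replies[req.ID]; seen {
		return reply
	}
	s.balance += req.Amount
	s.replies[req.ID] = s.balance
	return s.balance
}

func main() {
	s := NewServer()

	// Channel: main waits for the goroutine's result.
	done := make(chan int)
	go func() { done <- s.Deposit(Request{ID: "req-1", Amount: 100}) }()
	first := <-done

	// A retry with the same ID must not deposit a second time.
	retry := s.Deposit(Request{ID: "req-1", Amount: 100})
	fmt.Println(first, retry) // prints "100 100"
}
```

With at-least-once semantics the retry would have raised the balance to 200, which is why the notes list deposits as a case where that mode cannot be used.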

##References:

About the Inverted Index in MapReduce

May 12th, 2015

[iOS/Golang]Server-Side In-App-Purchase Verification in Apple Store

Steps

Connect to Apple iTunes

// iOS Http code
NSMutableDictionary *parameters = [NSMutableDictionary dictionary];
[parameters addEntriesFromDictionary:[credentials dictionary]];

// receipt is an object of my own making, but base64String just returns an
// NSString representation of the receipt data.
parameters[PURCHASE_RECEIPT] = [receipt base64String];

NSURLRequest *request =
    [[AFHTTPRequestSerializer serializer]
        requestWithMethod:@"POST"
                URLString:urlString
                parameters:parameters];

AFHTTPRequestOperation *operation =
    [[AFHTTPRequestOperation alloc]
        initWithRequest:request];
operation.responseSerializer = [AFJSONResponseSerializer serializer];

<snip>

[operation start];

// Json data
{"status":0,
    "environment":"Sandbox",
    "receipt":
    {"receipt_type":"ProductionSandbox",
        "adam_id":0,
        "bundle_id":"<snip>",
        "application_version":"1.0",
        "download_id":0,
        "request_date":"2013-11-12 01:43:06 Etc\/GMT",
        "request_date_ms":"1384220586352",
        "request_date_pst":"2013-11-11 17:43:06 America\/Los_Angeles",
        "in_app":[
                  {"quantity":"1",
                      "product_id":"<snip>",
                      "transaction_id":"1000000092978110",
                      "original_transaction_id":"1000000092978110",
                      "purchase_date":"2013-11-12 01:36:49 Etc\/GMT",
                      "purchase_date_ms":"1384220209000",
                      "purchase_date_pst":"2013-11-11 17:36:49 America\/Los_Angeles",
                      "original_purchase_date":"2013-11-12 01:36:49 Etc\/GMT",
                      "original_purchase_date_ms":"1384220209000",
                      "original_purchase_date_pst":"2013-11-11 17:36:49 America\/Los_Angeles",
                      "is_trial_period":"false"}
                  ]
    }
}
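On the Golang side, a response like the one above can be decoded into a struct. The following is a minimal sketch, not a complete verifier: `VerifyResponse`, `InApp`, and `ParseVerifyResponse` are hypothetical names, only a handful of the JSON fields are mapped, and the only check made is that `status` is 0.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InApp maps a subset of one entry in the receipt's "in_app" array.
type InApp struct {
	Quantity      string `json:"quantity"`
	ProductID     string `json:"product_id"`
	TransactionID string `json:"transaction_id"`
}

// VerifyResponse maps a subset of the verification response.
type VerifyResponse struct {
	Status      int    `json:"status"`
	Environment string `json:"environment"`
	Receipt     struct {
		BundleID string  `json:"bundle_id"`
		InApp    []InApp `json:"in_app"`
	} `json:"receipt"`
}

// ParseVerifyResponse decodes the JSON body and reports a non-zero status as an error.
func ParseVerifyResponse(body []byte) (*VerifyResponse, error) {
	var r VerifyResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if r.Status != 0 {
		return &r, fmt.Errorf("receipt rejected, status %d", r.Status)
	}
	return &r, nil
}

func main() {
	// Trimmed-down sample in the same shape as the JSON above; IDs are placeholders.
	body := []byte(`{"status":0,"environment":"Sandbox",
		"receipt":{"bundle_id":"com.example.app",
		"in_app":[{"quantity":"1","product_id":"example.product","transaction_id":"1000000092978110"}]}}`)

	r, err := ParseVerifyResponse(body)
	if err != nil {
		panic(err)
	}
	for _, p := range r.Receipt.InApp {
		fmt.Println(r.Environment, p.ProductID, p.TransactionID)
	}
}
```

A real server would also compare `bundle_id` and `product_id` against its own expected values and remember each `transaction_id` so a receipt cannot be redeemed twice.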


May 11th, 2015

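[Golang] Using build tags for different build configs

##Preface

When writing a web service, requirements like these can come up:

Different servers need different configuration parameters
Different OSes have different command sets
Debug logging should be quick to switch on or off

At first I handled these with OS detection or a configuration file, but eventually I wanted different build configs to produce different binaries.

So I looked into it: go build offers the following two ways to get part of the way there.

##Go build -ldflags

This sets the value of a variable at go build time. Personally I am more used to setting such values through OS environment variables and reading them inside the program.

In your main program, first define a variable flagString: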

package main

import (
	"fmt"
)

var flagString string

func main() {
	fmt.Println("This build with ldflag:", flagString)
}

Then set the variable from outside with go build -ldflags '-X main.flagString "test"' (Go 1.5 and later expect the form -X main.flagString=test). The output will be:

>> This build with ldflag: test

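This way a parameter can be set directly, so an initial value is supplied from outside the program.

##Go build -tags

go build -tags lets you include different files at compile time. That means a lot can go into such a file, for example:

Different sets of defines (the effect of ifdef)
Different function implementations (the same function implemented differently under different settings)

Below is a simple example in which different build configs load different define values.

file: debug_config.go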

//+build debug

package main

var TestString string = "test debug"
var TestString2 string = " and it will run every module with debug tag."

func GetConfigString() string {
	return "it is debug....."
}

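Note: //+build debug needs a blank line before and after it (no blank line is needed before it when it is the first line of the file). The blank line after it is what separates it from the package clause. Since Go 1.17 the preferred spelling is //go:build debug.

There is also the regular config file, release_config.go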

//+build !debug

package main

var TestString string = "test release"
var TestString2 string = " and it will run every module as no debug tag."

func GetConfigString() string {
	return "it is release....."
}

Finally, main can refer to these define values directly. file: main.go

package main

import (
	"fmt"
)

func main() {
	fmt.Println("This build is ", TestString, TestString2, GetConfigString())
}

If you run go build -tags debug, the output is:

>> This build is  test debug  and it will run every module with debug tag. it is debug.....

If you run plain go build, the !debug file is picked up by default, and the output is:

>> This build is  test release  and it will run every module as no debug tag. it is release.....

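As you can see, this can supply not only define values but also different implementations of the same function.

##When to use which, and takeaways

Differences in use:

ldflags passes in parameters, much like gcc's ldflags.
tags is a lot like gcc -D, but having to put //+build XXXX in each file feels a bit tedious. On the other hand, since the split is per file, you can add as many definitions and functions as you like.

When to use ldflags:

I think it suits changing initial values or tweaking settings inside the program. **For example:**

buffer value: change the buffer size at build time for different tests and uses.
log flag: decide whether to print logs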
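The two ldflags uses above could look roughly like this. It is a sketch under assumptions: the variable names `bufferSize` and `debugLog` and the helper `parseConfig` are made up for illustration. Because -X can only set string variables, the values are declared as strings and parsed at startup.

```go
package main

import (
	"fmt"
	"strconv"
)

// Defaults that can be overridden at link time. -X only sets string variables,
// so numeric and boolean settings are kept as strings and parsed below.
var (
	bufferSize = "4096"
	debugLog   = "false"
)

// parseConfig converts the link-time strings into typed values.
func parseConfig(size, debug string) (int, bool, error) {
	n, err := strconv.Atoi(size)
	if err != nil {
		return 0, false, fmt.Errorf("bad buffer size %q: %v", size, err)
	}
	d, err := strconv.ParseBool(debug)
	if err != nil {
		return 0, false, fmt.Errorf("bad debug flag %q: %v", debug, err)
	}
	return n, d, nil
}

func main() {
	size, debug, err := parseConfig(bufferSize, debugLog)
	if err != nil {
		panic(err)
	}
	buf := make([]byte, size)
	if debug {
		fmt.Println("debug: buffer allocated")
	}
	fmt.Println("buffer size:", len(buf))
}
```

With Go 1.5 or later, a build such as go build -ldflags "-X main.bufferSize=8192 -X main.debugLog=true" would override both defaults without touching the source.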

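When to use tags:

tags has many uses; here are a few I have come across:

debug log/encrypt: define a debug func with different release and debug implementations (print logs or not); encryption can also be switched on or off in certain situations for testing.
Cross-platform: different OS platforms need different settings and implementations

I will keep adding more cases here as I come across them.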


May 6th, 2015

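[Golang] Learning Disque (1)

(pic: source from here)

The figure above shows Redis being used as a message queue, which is also what Disque sets out to do.

##Preface

Over the past month or two, Disque's usage and open questions have come up a lot in discussions. I was not very familiar with message queue systems (especially the backend side), so I took some time to install Disque and run the sample code, hoping to understand the basic architecture.

##About Disque

Disque was created by Salvatore Sanfilippo, the original author of Redis. He saw how people were handling message queues on top of Redis, took the Redis source code, and reworked it into a system dedicated to message queues.

Its main features are as follows: (see here)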

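Message delivery can be at-least-once or at-most-once.
Messages must be acknowledged by the consumer.
Without an acknowledgement, a message is redelivered until it expires. The acknowledgement is broadcast to every node holding a copy of the message, after which the message is garbage collected or deleted.
Queues are durable.
Disque runs in memory only by default; durability comes from synchronous replication.
To maximize throughput, queues are not globally consistent, but ordering is best effort.
Under heavy load, messages are not dropped, but new messages are refused.
Consumers and producers can inspect the messages in a queue through commands.
Queues provide FIFO on a best-effort basis.
A group of masters act as brokers, and a client can talk to any node.
Brokers hold named queues, with no intervention needed from consumers or producers.
Sending a message is transactional, guaranteeing the requested number of copies in the cluster.
Receiving a message is not transactional.
Consumers block on receive by default, but can also peek at new messages.
When a queue is full, a producer sending a new message can get an error, or can let the cluster replicate the message asynchronously.
Delayed jobs are supported, with a granularity of seconds and delays of up to years, at the cost of memory.
Consumers and producers can connect to different nodes.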

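###Pros, cons, and comparison:

The pros are easy to see:

Easy to install and use, and small. It comes with its own data store (similar to Redis)
When the transport format and interface do not need to be complex, Disque should perform reasonably well (based on Redis vs. RabbitMQ)

A few cons to keep in mind:

Disque is still alpha
Disque is currently single-threaded

On the comparison with Kafka, Salvatore Sanfilippo gave this amusing reply on Twitter: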

    Salvatore Sanfilippo: Disque VS Kafka is the new Redis VS PosgreSQL which was the new Apple VS Orange :-)

##Installation and usage

###Installing and running Disque

git clone https://github.com/antirez/disque
cd disque
make
make test (make sure the ports are free and the build succeeds)
cd src
./disque-server

###Basic operations

Running the disque command-line client connects to the local Disque server automatically.

    // Producer: add a job to the queue "job1" (the trailing 0 is the command timeout in ms)
    ADDJOB job1 "This is sample job1" 0
    ->DI3b2954204f8f86168198266221515fb011a1eea005a0SQ   # the server replies with the job ID
    
    // Consumer: fetch a job from the queue "job1"
    GETJOB FROM job1
    ->1) 1) "job1"
    ->   2) "DI3b2954204f8f86168198266221515fb011a1eea005a0SQ"
    ->   3) "This is sample job1"

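Testing the job queue from Golang

The job-queue mechanism here comes from the go-disque source code.

There are three programs below: two workers and one sender. The sender (disque-enque) queues two messages in Disque; once a worker comes online and starts fetching messages, it carries out what each message specifies.

The code is largely unmodified:

Worker (consumer): Downloader: downloads the content of a given web page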

    package main
    
    import (
    	"github.com/EverythingMe/go-disque/tasque"
    
    	"crypto/md5"
    	"fmt"
    	"io"
    	"net/http"
    	"os"
    	// "time"
    )
    
    // Step 1: Define a handler that has a unique name
    var Downloader = tasque.FuncHandler(func(t *tasque.Task) error {
    
    	u := t.Params["url"].(string)
    	res, err := http.Get(u)
    	if err != nil {
    		return err
    	}
    	defer res.Body.Close()
    
    	fp, err := os.Create(fmt.Sprintf("/tmp/%x", md5.Sum([]byte(u))))
    	if err != nil {
    		return err
    	}
    	defer fp.Close()
    
    	if _, err := io.Copy(fp, res.Body); err != nil {
    		return err
    	}
    	fmt.Printf("Downloaded %s successfully\n", u)
    
    	return nil
    
    }, "download")
    
    // Step 2: Registering the handler and starting a Worker
    
    func main() {
    
    	// Worker with 10 concurrent goroutines. In real life scenarios set this to much higher values...
    	worker := tasque.NewWorker(10, "127.0.0.1:7711")
    
    	// register our downloader
    	worker.Handle(Downloader)
    
    	// Run the worker
    	worker.Run()
    
    }

Worker (consumer): foo: prints a given string

    package main
    
    import (
    	"fmt"
    	"github.com/EverythingMe/go-disque/tasque"
    )
    
    // Step 1: Define a handler that has a unique name
    var fooWorker = tasque.FuncHandler(func(t *tasque.Task) error {
    	u := t.Params["text"].(string)
    	fmt.Printf("foo worker runs, param is %s\n", u)
    	return nil
    
    }, "foo")
    
    // Step 2: Registering the handler and starting a Worker
    
    func main() {
    
    	// Worker with 10 concurrent goroutines. In real life scenarios set this to much higher values...
    	worker := tasque.NewWorker(10, "127.0.0.1:7711")
    
    	// register our foo handler
    	worker.Handle(fooWorker)
    
    	// Run the worker
    	worker.Run()
    
    }

Finally, the Enque program that dispatches the work (this role is called the producer)

package main

import (
	"fmt"
	"github.com/EverythingMe/go-disque/tasque"
	"time"
)

func main() {

	client := tasque.NewClient(5*time.Second, "127.0.0.1:7711")
	task := tasque.NewTask("download").Set("url", "http://google.com")
	err := client.Do(task)
	if err != nil {
		panic(err)
	}
	fmt.Println("First work queue run..")

	client = tasque.NewClient(5*time.Second, "127.0.0.1:7711")
	task = tasque.NewTask("foo").Set("text", "I am the kind of world.")
	err = client.Do(task)
	if err != nil {
		panic(err)
	}
	fmt.Println("Second work queue run..")
}

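##Takeaways

Disque makes good use of Redis's strengths and sketches out the basic features for everyone. Put simply, the original author saw too many people building message queues on top of Redis and made this change.

I have had few occasions to use it myself so far, but I expect that having tried it will make it easier to pick up when the need arises.

##Related articles

Golang client for Disque, the Persistent Distributed Job Priority Queue
- Another Disque client for Golang.
Adventures with Disque
- A review of Disque that also compares several Disque clients.
Disque: Disque tutorial
- A Chinese translation of the installation and usage guide; very useful.
Geek News: Disque, a new message queue from the author of Redis
- A Chinese-language news piece introducing Disque
The author of Redis on Redis use cases
What can Redis do? A close look at 11 web application scenarios
Hacker News: disque
[李会军•宁静致远: performance comparison of Redis as a message queue and RabbitMQ](http://devres.zoomquiet.io/data/20110714104018/index.html)
Book: Implementing task queues with Redis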
April 23rd, 2015

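[Heroku][Golang] Some notes on Heroku and Golang

Notes to self.

About Heroku

All commands below require installing the Heroku Toolbelt and logging in to Heroku with heroku login.

Tail the server's current log: heroku logs -t (more commands here)
Restart a crashed Heroku app/worker: heroku restart web.1 (more commands)
Quick rollback on Heroku: in the control panel, under [activity], find the change you want and click [roll back to here] next to it.
Performance status: the Librato add-on can monitor your web app on Heroku.

About Golang

Golang scopes by package, so files in the same directory that declare the same package are grouped together.
init() functions run in alphabetical order of file name: axxx.go bxxx.go ... zxxx.go (the go tool hands the files to the compiler sorted by name)
More on go import can be found here (I think it is the FAQ)
- import _ "xxx": runs the package's init without bringing the package's names in (no compiler error if it is unused)
- import . "fmt": call all of its functions without the package prefix
- import m "fmt": fmt.Println becomes m.Println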
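The three import forms above can be seen side by side in one small program. The packages chosen here (image/png, fmt, strings) and the helper `shout` are only for illustration.

```go
package main

import (
	_ "image/png" // blank import: only runs the package's init (it registers the PNG decoder)
	m "fmt"       // alias: fmt.Println is now written m.Println
	. "strings"   // dot import: ToUpper can be called without the strings. prefix
)

// shout upper-cases s using the dot-imported strings.ToUpper.
func shout(s string) string {
	return ToUpper(s) + "!"
}

func main() {
	m.Println(shout("hello")) // prints "HELLO!"
}
```

Dot imports make it harder to tell where a name comes from, so they are usually kept to tests and small examples like this one.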


April 21st, 2015

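[Ubuntu][Golang] Tuning connection capacity on Ubuntu Server

Preface:

There are two directions for hosting a Golang server; most people who need to reach 10k or even 100k connections will probably run it on a machine they set up themselves. Here are some notes from that experience.

Environment

The server runs on the following hardware and software:

Ubuntu 14.04 X64
MySQL + Apache + Ejabberd

Understanding and raising the machine's default limits

Leaving aside the connection capacity of the hardware, some software settings still need adjusting.

Raising the MySQL connection limit

The first problem you hit is MySQL queries getting stuck: every query request hangs on the MySQL side. The main setting to raise is max_connections, which was originally only 100. It is set as follows: (details here)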

    mysql> set GLOBAL max_connections=65536;

Some problems may remain; I suggest raising max_connect_errors as well

    mysql> set GLOBAL max_connect_errors=65536;

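After MySQL, the Ubuntu settings are next

With the MySQL settings changed, everything runs much more smoothly. Problems still show up occasionally, though, so the investigation has to continue.

Problem: too few TCP connections accepted at once

The first adjustment is to raise the TCP connection limits in /etc/sysctl.conf. The settings below are for reference. Note that net.ipv4.tcp_tw_recycle is known to break clients behind NAT and was removed in Linux 4.12, so treat that line with care on newer kernels.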

    net.ipv4.tcp_syncookies = 1
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_fin_timeout = 30
    net.ipv4.tcp_keepalive_time = 1200
    net.ipv4.ip_local_port_range = 1024 65000
    net.ipv4.tcp_max_syn_backlog = 8192

Problem: "too many open files"

With the connection limit out of the way, a "too many open files" error appeared.

According to this Stack Overflow answer, Linux treats every TCP connection as a file. So once TCP connections are opened up to 65000 as described above, each connection holds an open file, and the next thing to blow up is the open-file limit. The result is:

    Too many open files

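This comes down to Ubuntu's default settings.

    ulimit -n //in a console, shows the maximum number of files that can be open at once. Default: about 1000 (typically 1024)

The fix is to edit /etc/security/limits.conf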

    *         hard    nofile      500000
    *         soft    nofile      500000
    root      hard    nofile      500000
    root      soft    nofile      500000

This should be enough, as a first step, to sustain at least 3000 simultaneous server requests.
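To check that the raised limit is actually visible to a Go server process, the limit can be read from inside the program. This is a small sketch that assumes a Linux host; `openFileLimit` is a hypothetical helper name, and it only reads the limit, it does not change it.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit returns the soft and hard limits on open file descriptors
// (the same values that ulimit -n and ulimit -Hn report).
func openFileLimit() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return uint64(rl.Cur), uint64(rl.Max), nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		panic(err)
	}
	fmt.Println("open files: soft limit", soft, "hard limit", hard)
}
```

If the soft limit printed here is still around 1000 after editing limits.conf, the process was probably started from a session that had not picked up the new settings yet.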