Motivation
I'm treating these questions as a review.
GFS
Describe a sequence of events that would result in a client reading stale data from the Google File System.
It is exactly the scenario in this figure: if even one secondary fails to apply the mutation, the primary only reports failure to the client, and the client retries, which is how the replicas end up in this state. A reader that then hits a replica that missed the write reads stale data.
VM FT
How does VM FT handle network partitions? That is, is it possible that if the primary and the backup end up in different network partitions that the backup will become a primary too and the system will run with two primaries?
Test-and-Set on the shared storage: whoever wins the test-and-set goes live as the sole primary; the one that loses gives up.
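A minimal sketch of that arbitration, with the shared disk server's test-and-set stood in by an atomic flag (all names here are made up for illustration):

```go
package main

import (
    "fmt"
    "sync/atomic"
)

// tasFlag stands in for the test-and-set flag kept on the shared disk server.
var tasFlag int32

// tryBecomePrimary returns true only for the first caller to flip the flag.
func tryBecomePrimary() bool {
    return atomic.CompareAndSwapInt32(&tasFlag, 0, 1)
}

func main() {
    if tryBecomePrimary() {
        fmt.Println("won the test-and-set: go live as primary")
    } else {
        fmt.Println("lost the test-and-set: do not go live")
    }
}
```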
Raft
Suppose we have the scenario shown in the Raft paper’s Figure 7: a cluster of seven servers, with the log contents shown. The first server crashes (the one at the top of the figure), and cannot be contacted. A leader election ensues. For each of the servers marked (a), (d), and (f), could that server be elected? If yes, which servers would vote for it? If no, what specific Raft mechanism(s) would prevent it from being elected?
Here is the RequestVote code:
func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
    rf.Lock()
    lastIdx := rf.getLastLogIdxX()
    lastTerm := rf.getLogTermX(lastIdx)
    if args.Term < rf.Term {
        // The candidate's term is stale: reject.
        reply.VoteGranted = false
    } else {
        if args.Term > rf.Term {
            // Saw a newer term: step down to follower with no vote cast yet.
            rf.setFollowerX(args.Term, -1)
        }
        // Reject if we already voted for someone else in this term, or if the
        // candidate's log is not at least as up-to-date as ours (election restriction).
        if rf.VoteFor != -1 && rf.VoteFor != args.Id ||
            !(lastTerm < args.AtTerm || lastTerm == args.AtTerm && lastIdx <= args.AtIndex) {
            reply.VoteGranted = false
        } else {
            reply.VoteGranted = true
            //DPrintf("(%v) [RequestVote] accept1 args:%v log:%v", rf.Me, args, rf.Log)
            rf.setFollowerX(args.Term, args.Id)
            rf.resetElectionTimer()
        }
    }
    reply.Term = rf.Term
    rf.Unlock()
}
Skipping the term checks and the already-voted cases, the key point is:
!(lastTerm < args.AtTerm || lastTerm == args.AtTerm && lastIdx <= args.AtIndex)
- compare the term of the last log entry
- if the last terms are equal, compare log lengths
- (a) would get votes from (b), (e), and (f), so together with its own vote it can become leader.
- (d) would get every server's vote, so it can become leader.
- (f) has a smaller last-log term than everyone else, so it cannot become leader.
Could a received InstallSnapshot RPC cause the state machine to go backwards in time? That is, could step 8 in Figure 13 cause the state machine to be reset so that it reflects fewer executed operations? If yes, explain how this could happen. If no, explain why it can’t happen.
By Log Matching, an entry's index and term never change. By Leader Completeness, an entry that has been committed stays in the log from then on. By State Machine Safety, once an entry has been applied at an index, no other entry is ever applied there.
In other words, the committed portion of the log is exactly what the state machine reflects.
Because InstallSnapshot only trims log entries that are already committed, resetting the state machine from the snapshot does no harm.
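A sketch of how an InstallSnapshot handler can guard this, written in the same style as the RequestVote code above; the commitIndex field and the stand-in types are assumptions, not the lab's actual code:

```go
package raft

import "sync"

// Minimal stand-ins so the sketch compiles on its own; the real lab code has
// richer versions of these types.
type Raft struct {
    sync.Mutex
    Term        int
    commitIndex int
}

type InstallSnapshotArgs struct {
    Term              int
    LastIncludedIndex int
    Data              []byte
}

type InstallSnapshotReply struct{ Term int }

// InstallSnapshot sketch: the follower refuses a snapshot that does not go
// beyond what it has already committed, so handing the snapshot to the state
// machine (step 8) can only move it forward, never backward.
func (rf *Raft) InstallSnapshot(args *InstallSnapshotArgs, reply *InstallSnapshotReply) {
    rf.Lock()
    defer rf.Unlock()
    reply.Term = rf.Term
    if args.Term < rf.Term {
        return // stale leader: ignore
    }
    if args.LastIncludedIndex <= rf.commitIndex {
        return // snapshot adds nothing new: keep our log and state machine
    }
    // ... otherwise discard the covered log prefix and reset the state
    // machine from args.Data, which now reflects *more* executed operations.
}
```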
Zookeeper
One use of Zookeeper is as a fault-tolerant lock service (see the section “Simple locks” on page 6). Why isn't it possible for two clients to acquire the same lock? In particular, how does Zookeeper decide if a client has failed and it can give the client’s locks to other clients?
It's basically flock: create a znode with the EPHEMERAL flag; if it already exists, someone holds the lock. Everyone else sets a watch and waits. If the lock holder dies, ZooKeeper itself removes the ephemeral znode (the lock) once the client's session expires, and the waiters learn about the release through their watches.
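A rough sketch of the ephemeral-znode lock, assuming the github.com/go-zookeeper/zk client (error handling mostly elided):

```go
package main

import (
    "time"

    "github.com/go-zookeeper/zk"
)

func acquireLock(conn *zk.Conn, path string) {
    for {
        // The ephemeral znode disappears automatically if our session dies,
        // which is exactly how ZooKeeper decides a client has failed.
        _, err := conn.Create(path, nil, zk.FlagEphemeral, zk.WorldACL(zk.PermAll))
        if err == nil {
            return // we hold the lock
        }
        // Someone else holds it: watch the znode and wait for its deletion.
        exists, _, ch, _ := conn.ExistsW(path)
        if exists {
            <-ch // fires when the holder releases or its session expires
        }
    }
}

func main() {
    conn, _, err := zk.Connect([]string{"127.0.0.1:2181"}, time.Second)
    if err != nil {
        panic(err)
    }
    defer conn.Close()
    acquireLock(conn, "/my-lock")
}
```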
Chain Replication
Suppose Chain Replication replied to update requests from the head, as soon as the next chain server said it received the forwarded update, instead of responding from the tail. Explain how that could cause Chain Replication to produce results that are not linearizable.
This is just like Raft committing without waiting for a majority. If we reply before the last server (the tail) has the update, a failure partway down the chain can leave the servers behind it missing this write; a later read, which is served by the tail, would then fail to see a write that was already acknowledged, which is not linearizable.
Distributed Transactions
6.033 Book. Read just these parts of Chapter 9: 9.1.5, 9.1.6, 9.5.2, 9.5.3, 9.6.3. The last two sections (on two-phase locking and distributed two-phase commit) are the most important. The Question: describe a situation where Two-Phase Locking yields higher performance than Simple Locking.
Simple Locking: acquire every lock up front, run, then release them all. Two-Phase Locking: acquire locks one at a time as the transaction touches each record, and release them only after the transaction has finished.
A: read x, write y
B: read y
With simple locking, B has to wait until A is completely done before it can run (or A waits for B). With 2PL, an interleaving like read x (A), read y (B), write y (A) is allowed, so B's read does not have to wait.
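A small sketch of that schedule under 2PL, using one mutex per record; x and y are just shared integers here, not a real database:

```go
package main

import (
    "fmt"
    "sync"
)

var (
    muX, muY sync.Mutex
    x, y     int
)

func txnA() {
    muX.Lock() // growing phase: lock x when first touched
    _ = x      // read x
    muY.Lock() // lock y only when we get to the write
    y = x + 1  // write y
    // shrinking phase: release everything only after the transaction is done
    muY.Unlock()
    muX.Unlock()
}

func txnB() {
    muY.Lock()
    fmt.Println("B read y =", y)
    muY.Unlock() // released at the end of B
}

func main() {
    // B's read of y can run while A is still reading x; under simple locking
    // A would already hold both locks, so B would have to wait the whole time.
    var wg sync.WaitGroup
    wg.Add(2)
    go func() { defer wg.Done(); txnA() }()
    go func() { defer wg.Done(); txnB() }()
    wg.Wait()
}
```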
Spanner
Suppose a Spanner server’s TT.now() returns correct information, but the uncertainty is large. For example, suppose the absolute time is 10:15:30, and TT.now() returns the interval [10:15:20,10:15:40]. That interval is correct in that it contains the absolute time, but the error bound is 10 seconds. See Section 3 for an explanation TT.now(). What bad effect will a large error bound have on Spanner’s operation? Give a specific example.
A read/write transaction's commit has to wait (commit wait) until its timestamp is guaranteed to be in the past, so the larger the error bound, the longer every read/write commit has to wait; with the 10-second bound above, every commit stalls for on the order of ten seconds while holding its locks.
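A toy illustration of commit wait with a simulated TrueTime (the Interval and fakeTT names are made up); it shows how the wait scales with the error bound:

```go
package main

import (
    "fmt"
    "time"
)

// Interval is a hypothetical TrueTime interval: the absolute time is
// guaranteed to lie somewhere in [Earliest, Latest].
type Interval struct{ Earliest, Latest time.Time }

// fakeTT simulates TT.now() with a fixed error bound.
type fakeTT struct{ bound time.Duration }

func (t fakeTT) Now() Interval {
    now := time.Now()
    return Interval{Earliest: now.Add(-t.bound), Latest: now.Add(t.bound)}
}

// commitWait blocks until the commit timestamp s is guaranteed to have passed
// (TT.after(s)), which is what forces the transaction to keep holding locks.
func commitWait(tt fakeTT, s time.Time) {
    for !tt.Now().Earliest.After(s) {
        time.Sleep(10 * time.Millisecond)
    }
}

func main() {
    // A 1-second bound so the demo finishes quickly; the question's 10-second
    // bound would make every read/write commit stall for tens of seconds.
    tt := fakeTT{bound: time.Second}
    s := tt.Now().Latest // commit timestamp picked at commit time
    start := time.Now()
    commitWait(tt, s)
    fmt.Println("commit wait took", time.Since(start))
}
```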
Frangipani
Suppose a server modifies an i-node, appends the modification to its log, then another server modifies the same i-node, and then the first server crashes. The recovery system will see the i-node modification in the crashed server’s log, but should not apply that log entry to the i-node, because that would un-do the second server’s change. How does Frangipani avoid or cope with this situation?
What happens when someone needs a file that was modified by a server that has since crashed?
Before modifying a file, a Frangipani client first writes the op into its log on Petal, so a crash has two cases:
- the op made it into the log
- the op did not make it into the log
If it is in the log, the server that now wants the lock replays it; if it is not, the lock is simply handed over.
Replay is exactly what the question is about: what if several servers (including the crashed one) have modified the same file? How is the conflict resolved?
The file's metadata (inode) carries a version number, and every logged op records the version it produces (old version + 1); during replay, an op whose version number is not greater than the inode's current one is simply ignored.
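A tiny sketch of that version check during replay; all the names are hypothetical, not Frangipani's actual data structures:

```go
package main

import "fmt"

type Inode struct{ Version uint64 }

type LogRecord struct {
    Version uint64 // version the op would produce (old version + 1)
    Apply   func(*Inode)
}

// replay applies a record only if it is newer than what the inode already has,
// so a crashed server's stale record cannot undo a later server's change.
func replay(rec LogRecord, ino *Inode) {
    if rec.Version <= ino.Version {
        fmt.Println("skip: inode already reflects a newer write")
        return
    }
    rec.Apply(ino)
    ino.Version = rec.Version
}

func main() {
    ino := &Inode{Version: 7}                             // second server's change already applied
    stale := LogRecord{Version: 7, Apply: func(*Inode) {}} // crashed server's record
    replay(stale, ino)                                     // skipped: would otherwise undo the newer change
}
```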
COPS
The last sentence in Section 4.3 says a client clears its context after a put, replacing the context with just the put. The text observes “This put depends on all previous key-version pairs and thus is nearer than them.” Why does clearing the context and replacing it with just the put make sense? You might think that the client’s subsequent puts would need to carry along the dependency information about previous gets. What entity ultimately uses the context information, and why does it not need the information about gets before the last put?
Recall the Raft snapshot question: why can't a snapshot move the state machine backwards? Because a snapshot only ever covers committed log entries.
Here the context plays the role of the log: it records the operations the client has already performed (so dependencies are transitive). Once the put completes it is effectively committed, so everything before it can be dropped; the state already reflects it, and the earlier entries are no longer needed.
Then why do the reads have to be kept at all? So that later operations know the earlier reads already happened, which preserves the ordering (worth reading alongside the SUNDR question).
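A hypothetical sketch of the client-side context handling; storeGet and storePutAfter are stubs standing in for COPS's get and put_after, not a real API:

```go
package main

import "fmt"

type Dep struct {
    Key     string
    Version uint64
}

type Client struct{ ctx []Dep }

// storeGet / storePutAfter stand in for the COPS key-value operations.
func storeGet(key string) (string, uint64)             { return "v", 1 }
func storePutAfter(key, val string, deps []Dep) uint64 { return 2 }

func (c *Client) Get(key string) string {
    val, ver := storeGet(key)
    c.ctx = append(c.ctx, Dep{key, ver}) // remember what this read depends on
    return val
}

func (c *Client) Put(key, val string) {
    ver := storePutAfter(key, val, c.ctx) // new version depends on everything in ctx
    c.ctx = []Dep{{key, ver}}             // clear: the put is "nearer" than all of it
}

func main() {
    c := &Client{}
    _ = c.Get("x")
    c.Put("y", "1")
    fmt.Println(c.ctx) // only the dependency on y's new version remains
}
```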
FaRM
Suppose there are two FaRM transactions that both increment the same object. They start at the same time and see the same initial value for the object. One transaction completely finishes committing (see Section 4 and Figure 4). Then the second transaction starts to commit. There are no failures. What is the evidence that FaRM will use to realize that it must abort the second transaction? At what point in the Section 4 / Figure 4 protocol will FaRM realize that it must abort?
FaRM uses optimistic concurrency control: it lets everyone proceed and, when something goes wrong (which FaRM detects with its checks), the transaction retries. Pessimistic locking would instead make the other side wait up front.
FaRM's commit protocol has two checking phases before the write-back:
- LOCK
- VALIDATE
Both check the lock bit and the object's version number: LOCK covers the objects the transaction wrote, VALIDATE covers the objects it only read.
In this question both transactions do ++ on the same object, i.e. a write, so the second transaction is aborted in the LOCK phase, because the version it read no longer matches after the first transaction's commit.
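A rough sketch of the LOCK-phase check with hypothetical names; the second increment fails exactly because the version it read no longer matches:

```go
package main

import (
    "fmt"
    "sync"
)

// Object stands in for a FaRM object header: a lock bit plus a version number
// (these names are illustrative, not FaRM's actual layout).
type Object struct {
    mu      sync.Mutex
    locked  bool
    version uint64
    value   int64
}

// tryLock is the check done at LOCK: refuse if the object is already locked
// or if its version no longer matches the version read at execution time.
func (o *Object) tryLock(readVersion uint64) bool {
    o.mu.Lock()
    defer o.mu.Unlock()
    if o.locked || o.version != readVersion {
        return false // abort: write-write conflict detected at LOCK
    }
    o.locked = true
    return true
}

// commitWrite installs the new value, bumps the version, and releases the lock.
func (o *Object) commitWrite(v int64) {
    o.mu.Lock()
    defer o.mu.Unlock()
    o.value = v
    o.version++
    o.locked = false
}

func main() {
    obj := &Object{version: 1, value: 10}

    // Both transactions read version 1 and value 10.
    readVer, readVal := obj.version, obj.value

    // Transaction 1 commits completely.
    if obj.tryLock(readVer) {
        obj.commitWrite(readVal + 1)
    }

    // Transaction 2 now tries to commit: LOCK fails because the version changed.
    fmt.Println("txn2 LOCK succeeded?", obj.tryLock(readVer)) // false -> abort
}
```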
Memcache at Facebook
Section 3.3 implies that a client that writes data does not delete the corresponding key from the Gutter servers, even though the client does try to delete the key from the ordinary Memcached servers (Figure 1). Explain why it would be a bad idea for writing clients to delete keys from Gutter servers.
Gutter servers are the backup squad that takes over when a regular Memcached server fails, and the whole point of the cache is to keep heavy traffic from hammering the backend.
So, to the question: why not invalidate the gutter servers? Gutter keys are given a very short expiration time, which takes the place of invalidation. You get a little stale data, but it ensures the backend does not get overwhelmed.
SUNDR
In the simple straw-man, both fetch and modify operations are placed in the log and signed. Suppose an alternate design that only signs and logs modify operations. Does this allow a malicious server to break fetch-modify consistency or fork consistency? Why or why not?
Think back to why reads in Raft have to go through Raft's consensus: without it, a read can return a state that is missing some writes. Likewise, if fetches are not signed and logged, the server can answer a fetch from an old or forked state and the fetching client has no way to tell, so fetch-modify consistency can be broken.
Spark
What applications can Spark support well that MapReduce/Hadoop cannot support?
Compute-heavy, iterative workloads that repeatedly read and write the same data: Spark can keep that data in memory across steps, while MapReduce/Hadoop has to go back through the file system between jobs.
Bitcoin
Try to buy something with Bitcoin. It may help to cooperate with some 6.824 class-mates, and it may help to start a few days early. If you decide to give up, that’s OK. Briefly describe your experience.
No money, so I skipped this one.
Blockstack
Why is it important that Blockstack names be unique, human-readable, and decentralized? Why is providing all three properties hard?
A name is how you identify a user, and identity is where security starts, so all three properties matter.
- unique: two users can never end up with the same name
- human-readable: convenient for people to use
- decentralized: no single party gets to control the namespace
You can get any two of the three:
- human-readable + decentralized: your own private notes (not unique)
- human-readable + unique: email addresses (not decentralized)
- unique + decentralized: public keys (not human-readable)
Centralization is like having a root folder: inside it you can be both human-readable and unique. As soon as you decentralize, name collisions become easy, which is why getting all three at once is hard.