Last Time:
* MapReduce
  - Functional transforms
  - Expose abstractions -- can ask GFS for position of bits of file
  - Tasks >> workers for load management
  - Replicate tail for stragglers
  - Atomic updates for simple consistency

Today:
* End of class logistics
* Course evals
* Consistency models
* Strong vs. weak consistency & what it means to programmer (usually not much), but matters when:
  - Operating system
  - Implementing synch library
  - Wait-free data structure

Take-home exam is reasonable. Nominal exam time is Monday, 4/21 from 4 PM to 6 PM. (Also guaranteed to have 4/22 1:30-3:30 exam time free.) Exam will be published 10 AM on 4/21 and due at 10 PM on 4/22.

Draft policy: If you use a source outside the class material, cite it. If you use a source outside class material that provides the answer, be prepared to explain why it's the right answer.

Project reports: by rule, they can't be due later than the 15th, so they're due on the 15th. However, late work will be accepted with a 2% penalty per day up to 4/20.

Report outline:
* Intro
* Serial problem
* Concurrency architecture
* Performance results (wins are not the "goodness" criterion, "good science" is)
* Lessons Learned (what to avoid if project turned into standard project)
* Conclusion

Length: just explain everything asked for -- needs to be long enough to explain what you did. If you have questions about level of detail, mail bnoble. No "preconceived notions" of length. (deduction: Formatting like research paper would be ftw)

Demos: if you have something really cool, feel free to show. Inform of demo by 4/18.

--insert half an hour of evals--

On to consistency models!

Let's have a look at two examples to implement synchronization constraints in a wait-free way.

Happens-before, Snippet 1:

    P1                      P2
    data = //something;     while(!flag) {}
    flag = true;            myData = data;

Snippet 2, a simplified version of Dekker's algorithm, "at-most once":

    P1                      P2
    xflag = true;           yflag = true;
    if(!yflag) {            if(!xflag) {
        critical();             critical();
    }                       }

Notice: these are just garden-variety loads and stores.

Remember conservative cache consistency:
- missed the recap, hopefully it'll come around again

Formal requirements are called "sequential consistency", then relax to get "processor consistency" and then "weak ordering".

Sequential consistency is the strongest model we can imagine using. It has 3 requirements:
0. Processor sees its own memory operations in program order (unless it doesn't matter).
1. Each processor makes its memory ops visible in program order (even if it "doesn't matter").
2. All processors agree on SOME global interleaved order.
Note: real processors have a tendency to not work this way.

The conservative cache consistency model we talked about at the beginning of class implements this model: no read is allowed to complete before writes are flushed, and no write can complete before shared copies are invalidated.

Sequential consistency is a pretty restrictive model.

Model 2: Processor consistency.
Relax condition 2 -- all processors might not agree on any one particular global ordering. In particular, I see my own accesses in order, I expose my writes in order, but reads can happen "early".

The at-most-once example (snippet 2) does not work in a processor consistency model: each processor's read of the other's flag can happen before its own write is visible, so both reads can return false and both processors enter the critical section.

Model 3: Weak consistency.
* Still see own accesses in order.
* All "normal" loads/stores can be issued to other processors in any order.
* Read-modify-write instructions are guaranteed to read before they write.
* There are also some additional special instructions.
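
The claim that snippet 2 is safe under sequential consistency but breaks once reads can happen "early" can be checked by brute force. The sketch below is not from lecture: it uses a deliberately simplified model in which the four memory operations of snippet 2 are put into one global order, and "reads early" is modeled as letting a processor's read be ordered before its own earlier write. The names `Op` and `both_can_enter` are made up for illustration.

```cpp
#include <algorithm>
#include <array>

// The four memory operations of snippet 2 (hypothetical names).
enum Op { P1_WRITE_X = 0, P1_READ_Y = 1, P2_WRITE_Y = 2, P2_READ_X = 3 };

// Returns true if SOME global order of the four operations lets both
// processors enter critical().
//  - keep_program_order == true : each processor's write precedes its read
//    (the sequential consistency rules).
//  - keep_program_order == false: a read may be ordered before the same
//    processor's write (simplified stand-in for "reads can happen early").
bool both_can_enter(bool keep_program_order) {
    std::array<int, 4> order = {P1_WRITE_X, P1_READ_Y, P2_WRITE_Y, P2_READ_X};
    std::sort(order.begin(), order.end());  // start from the first permutation
    do {
        int pos[4];
        for (int i = 0; i < 4; ++i) pos[order[i]] = i;

        // Skip orders that violate program order when it must be preserved.
        if (keep_program_order &&
            (pos[P1_WRITE_X] > pos[P1_READ_Y] ||
             pos[P2_WRITE_Y] > pos[P2_READ_X])) {
            continue;
        }

        // Replay the operations against a single shared memory.
        bool xflag = false, yflag = false;
        bool p1_enters = false, p2_enters = false;
        for (int op : order) {
            switch (op) {
                case P1_WRITE_X: xflag = true;        break;
                case P1_READ_Y:  p1_enters = !yflag;  break;
                case P2_WRITE_Y: yflag = true;        break;
                case P2_READ_X:  p2_enters = !xflag;  break;
            }
        }
        if (p1_enters && p2_enters) return true;
    } while (std::next_permutation(order.begin(), order.end()));
    return false;
}
```

With program order preserved, no interleaving lets both processors in, because that would need each read to precede the other processor's write, which is a cycle. Once reads may move ahead of the preceding write, the order read y, read x, write x, write y admits both.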

Weak consistency breaks snippet 1 as well: P1's write to data can become visible after its write to flag, so P2 can see flag == true and still read the old data.

Let's talk about how to enforce consistency on the Itanium, which has weak consistency (never mind it also being VLIW).
* Can decorate memory accesses with modifiers:
  - acquire: must be made visible before any subsequent accesses (in program order)
  - release: all prior accesses must be made visible before this one
* +1 instruction: mf: memory fence (no reordering of memory stuff before/after)

So, to make snippet 1 work on Itanium:

    P1                        P2
    data = foo;               acquire while(!flag) {}
    release flag = true;      myData = data;

To fix snippet 2, we need a fence in each processor between setting its own flag and the if block that reads the other flag, which returns to sequential consistency.
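
For comparison, here is a rough sketch of the same two fixes in portable C++ atomics rather than Itanium assembly. This is not from lecture: it assumes `std::memory_order_release` / `std::memory_order_acquire` play the role of the release/acquire decorations and that `std::atomic_thread_fence(std::memory_order_seq_cst)` plays the role of mf. The function names and the value 42 are made up for illustration.

```cpp
#include <atomic>
#include <thread>

// Snippet 1: publish data with a release store, consume with an acquire load.
int message_passing() {
    int data = 0;                    // plain, non-atomic payload
    std::atomic<bool> flag{false};
    int my_data = -1;

    std::thread p1([&] {
        data = 42;                                        // normal store
        flag.store(true, std::memory_order_release);      // "release flag = true"
    });
    std::thread p2([&] {
        while (!flag.load(std::memory_order_acquire)) {}  // "acquire while(!flag) {}"
        my_data = data;  // ordered after the acquire load, so it sees 42
    });
    p1.join();
    p2.join();
    return my_data;
}

// Snippet 2: a full fence between each processor's store and its load.
// Returns how many of the two threads entered the "critical section".
int at_most_once() {
    std::atomic<bool> xflag{false}, yflag{false};
    std::atomic<int> entered{0};

    std::thread p1([&] {
        xflag.store(true, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // stands in for mf
        if (!yflag.load(std::memory_order_relaxed)) entered.fetch_add(1);
    });
    std::thread p2([&] {
        yflag.store(true, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // stands in for mf
        if (!xflag.load(std::memory_order_relaxed)) entered.fetch_add(1);
    });
    p1.join();
    p2.join();
    return entered.load();
}
```

The fences are totally ordered with respect to each other, so whichever thread's fence comes second is guaranteed to see the other thread's flag already set. That is why at most one thread gets in. It is still possible for neither thread to get in, which matches the "at-most once" name.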