The text in square brackets refers to chapters and sections of the required textbook Attiya, H., Welch, J.: Distributed Computing Fundamentals, Simulations, and Advanced Topics (Second Edition). John Wiley and Sons, Inc.(2004) (not the first edition)
Message-passing systems and basic algorithms. [Chs 1 and 2] A formal definition of a message-passing model. Synchronous and asynchronous variants. Time and message complexities. An asynchronous broadcast algorithm down a rooted tree. An asynchronous algorithm for constructing a spanning tree given a root. An "eventually holding invariant" technique for proving correctness of asynchronous algorithms.
Leader election on rings. [Ch 3] The leader election problem. The notions of uniform and anonymous algorithms. Impossibility of solving leader election by a deterministic anonymous algorithm on a ring due to inability to "break symmetry". An O(n log n) message complexity uniform non-anonymous asynchronous algorithm that uses a distributed analog of the divide and conquer technique. A lower bound of Omega(n log n) on message complexity of leader election on asynchronous uniform rings. A technique for proving lower bounds by "reapplying schedules". The lower bound requires that message deliveries may be delayed. A gap between synchronous and asynchronous models. A synchronous uniform algorithm with O(n) message complexity for a more general model of unsynchronized starts.
Causality and time. [Ch 6] Logical clock, or Lamport clock, as a way to totally order events in a distributed system and offer the illusion of global time available to processors. The happens-before relation that captures the possibility of one event causally affecting other event, and two events not affecting each other (concurrent events). An implementation of happens-before relation using Vector clock of n coordinates for n processors and a proof of its correctness. A lower bound of n on the number of coordinates of any vector implementation of happens-before relation. The notion of consistent cut that "freezes" processors when "every effect has a cause". The consistent cut algorithm for finding a consistent cut majorized by a cut; processors must maintain arrays of vector clocks. The distributed snapshot algorithm for finding a consistent cut using a "lightweight message flood" initiated by any processor at any time; messages must be delivered in the order they were sent, for any pair of processors. A model of a distributed system with hardware clocks without drift but with unknown offsets, and the clock synchronization problem. An algorithm that achieves u(1-1/n) synchronization where u is the uncertainty in message processing time. An impossibility result of achieving synchronization better than u(1-1/n).
Mutual exclusion in shared memory. [Ch 4] A formal definition of shared memory model and mutual exclusion problem. An algorithm ensuring no deadlock that uses a single one bit test&set register. An algorithm providing a stronger fairness property i.e., FIFO entry order (and hence no lockout) that uses a single read-modify-write register with 2log n bits. A lower bound of n on the number of memory states required to ensure the strong fairness property of k-bounded waiting (implied by FIFO). Mutual exclusion can be done from "nothing" i.e., from read-write registers. The Bakery algorithm that provides no lockout, but uses unbounded registers. A two-processor algorithm that uses only three bits and the idea of exchanging "priority" between processors, and its generalization to n processors using a tournament tree.
Fault-tolerant consensus. [Ch 5] Definition of the consensus problem with termination, agreement and validity properties. Crash failures of processors as a way to model benign faults in real distributed systems, and Byzantine failures as a way to model malicious faults. Synchronous message-passing algorithms: one that tolerates at most f crash failures and uses f+1 rounds, and one that tolerates at most f Byzantine failures and uses 2(f+1) rounds but restricts f<n/4. The algorithms use a "rotating coordinator" technique to ensure that there is at least one round "coordinated" by a non-faulty processor. Byzantine failures make consensus difficult. Impossibility of achieving consensus in synchronous message passing model when f>=n/3 and there can be f Byzantine failures. Asynchrony that makes consensus quite difficult, too. Impossibility of achieving consensus by a wait-free algorithm in a shared-memory system with single-writer multi-reader registers. The proof shows that no processor can decide in a certain admissible execution because we can always maintain "ambiguity" (or bivalence).
Advanced topics. Fault-tolerant scheduling for computing on the Internet: overview of paper [1].
| [1] | Malewicz, G., Rosenberg, A., Yurkewych, M.: On Scheduling Complex Dags for Internet-Based Computing. 19th IEEE International Parallel & Distributed Processing Symposium (IPDPS'05) (2005) 66 |