Engineering Cloud Computing
ENGR-E 516 (Fall 2025)
Course Description
This course will teach the fundamental concepts, engineering principles, and practical skills pertaining to the effective use of cloud computing. This course will focus on both cloud applications and the design of cloud platforms. We will cover the relevant concepts from operating systems, computer networks, and distributed systems.
This course should be useful to anyone who wants a deeper understanding of how the cloud works, as well as those who want to learn how to easily and effectively use the cloud for running their applications at low cost. We will look at a wide spectrum of cloud-based applications such as parallel data processing (e.g., MapReduce) and data storage and caching (e.g., key-value stores).
We will also look at the challenges involved in the efficient operation of large-scale cloud platforms with hundreds of thousands of servers. The course will cover a wide gamut of data center optimization techniques such as hardware virtualization, distributed resource management, and software-defined datacenters.
This course will expose students to popular cloud platforms such as Amazon EC2, Google Cloud Platform, and Microsoft Azure, and introduce students to new developments such as serverless computing, edge-clouds, and sustainable computing.
Prerequisites
The course has no official prerequisites. However, it requires a high level of comfort with systems programming and debugging. The assignments in this course will include nontrivial programming in the language of your choice. A good way to gauge your preparedness is to see how comfortable you are with the first programming assignment, Simple Key-Value Store, which involves spawning processes and using sockets.
References
Textbooks
- DS. Distributed Systems: Principles and Paradigms, 3rd Edition (Maarten Van Steen and Andrew Tanenbaum) Online version
Papers
Other references
- CCTP. Cloud Computing: Theory and Practice, 2nd Edition (Dan C. Marinescu)
- OS3EP. Operating Systems: Three Easy Pieces. http://pages.cs.wisc.edu/~remzi/OSTEP/
Schedule, notes, and readings
Please see the readings for each module.
| Lecture | Topic | Slides | Reading |
|---|---|---|---|
| A0 | Course Intro | cloud/0-admin.pdf | Berkeley View |
| A1 | Intro to cloud computing | cloud/1-intro-annot.pdf | |
| A2 | ..continued | (same as above) | |
| A3 | OS: system calls | cloud/2-OS-1.pdf | OS3EP Chapter 4 |
| A4 | OS: concurrency | cloud/2-OS-annot.pdf | OS3EP |
| A5 | Networks | cloud/3-net-2-annot.pdf | |
| A6 | Networks: Socket programming | cloud/3-net-2-annot.pdf | |
| B1 | Client-server modeling | cloud/4-servers-annot-1.pdf | Markov Chains |
| B2 | -More queueing theory- | cloud/4-servers-annot2.pdf | M/M/1 Queues |
| F3 | Parallel scaling | cloud/6-scaling-annot.pdf | Amdahl's Law |
| F4 | Elastic scaling | cloud/6-scaling-annot.pdf | |
| B1 | Map-Reduce | cloud/7-MapRed-annot-1.pdf, cloud/7-MapRed-annot-2pdf.pdf | 1. MapReduce |
| C1 | Cloud infrastructure | cloud/10-iaas.pdf | |
| C2 | OS Virtualization | cloud/13-osvirt.pdf | |
| C3 | Cloud Storage | cloud/15-storage-annot.pdf | 11 |
| C4 | Functions as a Service | cloud/16-serverless-annot.pdf | 9, 10 |
| | Midterm | | |
| D1 | Hardware Virtualization | cloud/11-virt-1.pdf | |
| D2 | CPU Virt | cloud/11-virt-2.pdf | 4. VMWare, 5. KVM |
| D3 | Paravirtualization | cloud/11-virt-3.pdf | 3. Xen |
| D4 | Memory Virtualization | cloud/11-virt-4.pdf | |
| D5 | Live Migration | cloud/11-virt-5.pdf | 6. Xen-migration |
| D6 | Cluster management | cloud/12-clustmgmt-annot.pdf | 7. ESX, 8. Remus |
| E2 | Transient Computing | cloud/transient.pdf | 12. SpotCheck |
| | Serverless pt 2 | serverless2 | |
| | Containers vs. VMs | containers-vms | |
| 28 | Energy and Carbon | carbon | |
| | Course Wrapup | | |
Evaluation Criteria
Cloud computing is a fast-evolving field. In the same spirit, the course is going to be fluid in its structure and evaluation, and will also depend on student interest and capabilities. This is not a conventional "paint by numbers" course with structured homework etc.
The rough breakdown is as follows, but is subject to change:
| Component | Weight |
|---|---|
| Programming assignments (4) | 40% |
| Homework and Readings | 10% |
| Midterm Exam | 20% |
| Final exam | 20% |
| Lecture notes and class participation | 10% |
All work must be your own. The use of generative AI "tools" such as large language models is strictly prohibited.
Assignments
Students will implement various classic distributed algorithms (such as Map-Reduce, distributed key-value stores) on public clouds, and learn to use various cloud services such as Functions as a Service, various storage services, and how to use cloud VMs to develop and deploy applications.
The design-oriented assignments will involve a large degree of programming and debugging. In most cases, the programming assignments are language agnostic (you can pick any reasonable programming language). However, you should be comfortable with systems programming in C for the final assignment.
A key learning objective of this course is to design, architect, and implement a distributed system from scratch, and to design useful test-cases for evaluating the implementation. Therefore, no starter-code or templates will be provided, to give students the maximum flexibility and freedom to explore the unconstrained design space. Points will be awarded for correct and faithful designs, complete implementation, adequate testing, and reports and documentation.
Most programming assignments will take significantly longer than you anticipate. Start early. Please see the assignment descriptions below (from last year) to get a sense of what they will look like. In general, all programming assignments in this course only specify the "end goal", and you must figure out how to get there: what and how to implement, what libraries to use, etc. There will be no starter-code, no templates, no training wheels. You are on your own. There are no group assignments.
Likely assignments and schedule:
| # | Task | Approx Due Date |
|---|---|---|
| 0 | Simple Key-Value Store. Spawn processes and sockets | Lec 5 |
| 1 | Producer/Consumer Queue System | Lec 10 |
| 2 | Deploy Assignment 1 on GCP VMs using APIs | Lec 14 |
| 3 | Map-Reduce with functions | Lec 20 |
| 4 | LXC resource controller | End |
Exams
The exams will test how well students have understood various virtualization techniques, cloud performance and cost tradeoffs, and how techniques learnt in class can be applied to emerging cloud offerings and applications.
Participation and Lecture notes
Since a majority of the class instruction is on the whiteboard, each lecture will have two scribes who will prepare notes with the major concepts and questions covered in class.
Late submission policy
Students have a total of four late days and may use them as they wish. Beyond that, late submissions will not be accepted.
There is a tight integration of assignments and lectures. Hence, late submissions are discouraged. It is strongly recommended to start early—completing the assignments always takes more time than you think.
Administrative Information
Class Information
| Where | When |
|---|---|
| Global Studies (GA) 1112 | Mondays and Wednesdays 9:35–11:00 |
Office Hours
| Who | Username | Office Location | Office Hours |
|---|---|---|---|
| Prateek Sharma | prateeks | Luddy 4126 | Wed 4–5 |
| Abdul Rehman | abrehman | TBD | TBD |