Codegenerierung für Datenverarbeitung
Contents
Code generation is a key technique for efficient program execution and data processing. This lecture will cover the following topics from a practical perspective with accompanying hands-on exercises:
- Execution models of programs (interpretation, bytecode, machine code generation, etc.)
- Program representations (source code, intermediate representations (IRs), different forms of bytecode)
- Classical techniques of code generation
- SSA and optimization techniques, exemplary described on LLVM-IR
- Machine code generation: instruction selection and register allocation
- Execution of programs in virtual machines (e.g., WebAssembly, BPF, JavaScript)
- Sandboxing and optimizations for JIT compilation
- Execution of database queries (e.g., SQL, data frame API)
- Execution models and code representations
- Execution of machine code/binary translation (e.g., RISC-V)
- Specifics when translating machine code
Organization
- Lecture with integrated exercises: Thu 10–14 (c.t., with break) in 02.11.018
- Exercises will include hands-on programming tasks
- (Basic) C++ knowledge is required
- Language: English
- Module: CIT3230001, 6 ECTS, Bachelor/Master elective
- Area "Databases and Information Systems" for Informatics, Wirtschaftsinformatik/Information Systems, Informatics: Games Engineering, Biomedical Engineering
- B1.2 "Advanced Topics in Data Engineering" for Data Engineering and Analystics
- Written exam (90 minutes), might change to oral on low registration count. Exam from winter 2022/23: [exam2223.pdf].
- Zulip stream for this lecture; private contact via e-mail
Prerequisites
The course is aimed at bachelor/master students who have taken the following (or similar) courses:
- IN0004 Introduction to Computer Architecture
- IN0008 Fundamentals of Databases
Material
Material and exercises will be regularly provided throughout the semester.
Script (updated regularly with new lecture content, includes content of slides and additional information and comments): [book.pdf]
Note: The schedule below is preliminary and is likely to change during the semester.
Date | Lecture Topic | Homework |
---|---|---|
17.10. | Overview, Motivation, Interpretation Techniques
[lec01.pdf]
Exercise: (no exercise session) | [hw1.txt] [hw1-sol.c] |
24.10. | Compiler Front-end
[lec02.pdf]
Exercise: discussion of homework | [hw2.txt] |
31.10. | IR Concepts, Control Flow Graph, SSA Construction
[lec03.pdf] [prs03.pdf]
Exercise: IR design [ex03.txt] | |
07.11. | LLVM-IR and IR Design Considerations
Exercise: getting started with LLVM | |
14.11. | Analyses and Transformations
Exercise: writing an LLVM-IR pass | |
21.11. | Vectorization
Exercise: auto-vectorization | |
28.11. | Instruction Selection
Exercise: TBD | |
05.12. | (no lecture/exercise, Dies Academicus) | |
12.12. | Register Allocation
Exercise: SSA destruction | |
19.12. | Object Files, Linker, Loader
Exercise: ELF file construction | |
09.01. | Unwinding and Debuginfo
Exercise: unwind info generation | |
16.01. | JIT-compilation and Sandboxing
Exercise: WebAssembly hands-on | |
23.01. | Query Compilation
Exercise: SQLite hands-on | |
30.01. | Binary Translation
Exercise: QEMU hands-on | |
06.02. | TBD (possible guest lecture)
Exercise: TBD/wrap up |