The Murgia Hack System

FOSDEM 2018: Everything is a device!

My presentation about MH at FOSDEM2018 is online.

Overview of an MH System

In a system running MH, each process has its own local bus. The kernel maintains a global list of devices.
If it has sufficient permissions, a process can add a device from that list to its own local bus and use it.

There are three broad kinds of devices:

  • User devices. These are devices created by processes, and they are the only inter-process communication mechanism present in the system.
  • Kernel Devices. These are special devices created by the kernel, used to provide kernel services to a process – timers, memory allocation, etc.
  • Hardware Devices. This is a mapping from real hardware to I/O devices. A special hardware device is the platform device, which gives access to the whole machine to a process.
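
The relationship between the global device list and the per-process local buses can be pictured with a small model. This is only an illustrative sketch in Python: the names `Kernel`, `Process`, `register` and `attach`, and the per-process set of allowed devices, are assumptions made for the example and not MH's actual interface.

```python
class Kernel:
    """Toy model: the kernel keeps the global list of devices."""

    def __init__(self):
        # device name -> kind ("user", "kernel" or "hardware")
        self.devices = {}

    def register(self, name, kind):
        self.devices[name] = kind


class Process:
    """Toy model: each process has its own local bus."""

    def __init__(self, kernel, allowed):
        self.kernel = kernel
        self.allowed = set(allowed)   # hypothetical permission model
        self.bus = []                 # devices attached to the local bus

    def attach(self, name):
        """Add a device from the global list to this process's bus."""
        if name not in self.kernel.devices:
            raise KeyError(name)
        if name not in self.allowed:
            raise PermissionError(name)
        self.bus.append(name)
        return len(self.bus) - 1      # slot on the local bus


kernel = Kernel()
kernel.register("timer", "kernel")
kernel.register("platform", "hardware")

proc = Process(kernel, allowed={"timer"})
slot = proc.attach("timer")           # permitted: the device lands on the bus
```

In this model, attaching the platform device from `proc` would raise `PermissionError`, since the process was not granted access to it.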

The Process-Device interface

A process and a device communicate using three mechanisms:

  • I/O ports. Similar to the Intel I/O space, a process can write an inline value to, or read one from, a specified port of the device. The meaning assigned to these actions is device-specific.
  • DMA. A device can read or write directly to a process's address space. An I/O MMU mechanism is present so that a device can only access memory explicitly exported by the process.
  • IRQs. A device can send interrupts to a process.
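
To make the three mechanisms concrete, here is a minimal Python model of a device that uses all of them. Everything in it is hypothetical (the `Device` class, the port numbers, the doubling behaviour); it only illustrates how port writes, DMA into exported memory, and interrupts could fit together.

```python
class Device:
    """Toy device: doubles the value written to port 0 and DMAs the result."""

    def __init__(self):
        self.exported = None      # memory the process has exported, if any
        self.irq_handler = None   # where interrupts are delivered
        self.completed = 0        # readable through port 1

    # I/O ports: inline values with a device-specific meaning
    def write_port(self, port, value):
        if port == 0:
            self._dma_write(0, value * 2)   # result goes out via DMA
            self.completed += 1
            self._raise_irq()               # then signal completion

    def read_port(self, port):
        return self.completed if port == 1 else 0

    # DMA: only explicitly exported memory is reachable (the I/O MMU check)
    def _dma_write(self, offset, value):
        if self.exported is None or not 0 <= offset < len(self.exported):
            raise PermissionError("memory not exported to device")
        self.exported[offset] = value

    # IRQs: the device notifies the process
    def _raise_irq(self):
        if self.irq_handler is not None:
            self.irq_handler()


dev = Device()
buffer = [0] * 4                              # process memory
irqs = []
dev.exported = buffer                         # export the buffer to the device
dev.irq_handler = lambda: irqs.append("done")
dev.write_port(0, 21)                         # buffer[0] is now 42
```

A device that was never given an exported buffer fails the same write with `PermissionError`, which is the role the I/O MMU plays in the description above.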

The MH Process/Device Interface

Virtual Memory Mappings States in MH

I like going back to the drawing board. It gives you the opportunity to think, to look at what you have, and to reconsider what you need.

Moving MH's memory map system (pmap) to the machine-independent part of the kernel (similar to the earlier change that did the same for the CPU and IPI management subsystem) offered such an opportunity for this key component.

Compared to other memory subsystems I have seen in the past, I always thought of MH's virtual memory as extremely simple. So, before blindly porting code that shows signs of ageing and in which different strata are starting to emerge, I decided to look at the states of a virtual mapping. After all, I expected it to be simple.

The first layer was easy to draw:

Top-level view

That is, a user PTE is either unmapped, mapped to a hardware I/O page, or mapped to a memory page. The dot points to the default state: at start, we expect all virtual memory to be unmapped.

Of course, it's not that simple. Mapped pages can be wired by the kernel, i.e. made impossible for userspace programs to unmap. The act of wiring a page is still initiated by the user, by exporting memory to a hwdev. Wiring is needed because exporting a page to a hwdev allows the device to do DMA on that memory region.

An updated graph will look like this:

Refinement 1

Which is – still – very simple. The square represents what's left of the node called Mapped in the earlier figure.
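
The states so far can be written down as a small transition table. This is a sketch of the description above, not of the kernel code: the event names are hypothetical, and two details are assumptions of the model, namely that only memory pages get wired and that un-exporting is what unwires a page.

```python
from enum import Enum


class State(Enum):
    UNMAPPED = "unmapped"
    IO = "mapped to a hardware I/O page"
    MAPPED = "mapped to a memory page"
    WIRED = "mapped and wired"


# (state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.UNMAPPED, "map_io"): State.IO,
    (State.UNMAPPED, "map_mem"): State.MAPPED,
    (State.IO, "unmap"): State.UNMAPPED,
    (State.MAPPED, "unmap"): State.UNMAPPED,
    (State.MAPPED, "export"): State.WIRED,     # wiring, initiated by the user
    (State.WIRED, "unexport"): State.MAPPED,   # assumed inverse of export
    # deliberately no (WIRED, "unmap"): wired pages cannot be unmapped
}


def step(state, event):
    """Apply one event, rejecting anything the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} not allowed in state {state.name}") from None


pte = State.UNMAPPED                 # the default state
for event in ("map_mem", "export"):
    pte = step(pte, event)           # ends up in State.WIRED
```

Trying `step(State.WIRED, "unmap")` raises `ValueError`, which is the whole point of wiring.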

But this ignores something really important: copy-on-write. MH supports forking of processes. Like essentially any other OS that supports paging, it uses copy-on-write as an optimization for duplicating address spaces: instead of being copied, the memory is shared but set read-only.

When a process tries to write to a shared page, it generates a page fault, which is forwarded as an interrupt to the process. It is the responsibility of userspace to create a new page, copy the content of the shared page into it, and substitute the read-only shared mapping with the writable copy.

This means that this state, RO Shared, can only be changed by userspace by unmapping it. There is an exception, of course, which is due to an optimization: when all processes but one have created their own copies, the page is effectively not shared anymore and can be made private again.

An important note to add at this point is that copy-on-write is the only case in MH where memory is shared between processes.
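
The copy-on-write flow described above can be modelled in a few lines. This is a toy model under stated assumptions: the `Page` class, the dictionary-based address spaces, and the `fork`, `write_fault` and `reclaim` functions are all hypothetical names, and sharing is tracked with a simple counter.

```python
class Page:
    def __init__(self, data):
        self.data = bytearray(data)
        self.sharers = 1          # number of address spaces mapping the page


def fork(parent):
    """Duplicate an address space: share every page and mark it RO Shared."""
    child = {}
    for va, mapping in parent.items():
        mapping["page"].sharers += 1
        mapping["state"] = "ro_shared"
        child[va] = {"page": mapping["page"], "state": "ro_shared"}
    return child


def write_fault(space, va):
    """What userspace does on a write fault to an RO Shared page."""
    old = space[va]["page"]
    new = Page(old.data)                             # create a page, copy content
    space[va] = {"page": new, "state": "private"}    # unmap shared, map the copy
    old.sharers -= 1
    return old


def reclaim(spaces, page):
    """The optimization: the last remaining sharer gets the page back as private."""
    if page.sharers == 1:
        for space in spaces:
            for mapping in space.values():
                if mapping["page"] is page:
                    mapping["state"] = "private"


parent = {0x1000: {"page": Page(b"hello"), "state": "private"}}
child = fork(parent)
old = write_fault(child, 0x1000)             # child gets its own copy
child[0x1000]["page"].data[0:1] = b"j"       # and can now modify it
reclaim([parent, child], old)                # parent is the only sharer left
```

After these steps the parent still sees `hello`, the child sees `jello`, and both mappings are private again.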

Updating the graph yields:

Refinement 2

This starts to look interesting, and messy enough to look real. It is not complete, yet.

The read/write information is lost during copy-on-write, so in the current implementation the page goes back to being private but read-only, no matter its original state. This is not a problem, though, as long as the userspace process can handle unexpected RO page faults by fixing the mapping back to writable. The MRG system library supports this.
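
As an illustration of what such a fix-up might look like, here is a minimal sketch of a userspace write-fault handler. It is not taken from the MRG system library: the function name, the mapping dictionary, and the idea that userspace records which regions it wanted writable are all assumptions made for the example.

```python
def on_write_fault(mapping, wanted_writable):
    """Return the action a userspace fault handler might take.

    mapping: dict with "state" ("private" or "ro_shared") and "writable".
    wanted_writable: whether the process recorded this region as writable
    (hypothetical bookkeeping kept in userspace).
    """
    if mapping["state"] == "ro_shared":
        return "copy"                 # regular copy-on-write path
    if not mapping["writable"] and wanted_writable:
        mapping["writable"] = True    # private again but left read-only: fix it
        return "fixed"
    return "error"                    # a genuine write to read-only memory


mapping = {"state": "private", "writable": False}
action = on_write_fault(mapping, wanted_writable=True)
```

Here the page was made private again but left read-only, so the handler simply restores write access and lets the faulting instruction be retried.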

Adding details about the R/W state leads to a more complete graph, with clear signs of state explosion:

Refinement 3

Is this complete? Of course not. We're missing information about the executable state of a page. A page can be executable or not, and as long as it is private and not wired its state can be changed.

I won't display this, though, as it is essentially a state orthogonal to the lifetime of a page.

History and motivation

As opposed to other kernels and microkernels (probably), MH is based on a completely random ideology, picked arbitrarily, in a Cambridge pub, after evidently too many beers.

Unimpressed by the lack of shape in modern software, some day in 2014 I thought that it would be really cool to build a system made of tangible abstractions. A system described in terms of objects that can be very easily understood would be – I decided – very pleasant to play with, and to use as a base for complex systems!

A system built with a single tangible abstraction – I continued – would be even more pleasant and simple!

Abstractions and inspirations

The search for that abstraction wasn't easy. Having ruled out exokernels and L4 pretty quickly, I decided to have a look at the classics.

Mach is beautiful, if you don't look at the code. I had my share of fun hacking on it, and it is definitely made of abstractions that are easy to tinker with and that have proven themselves capable of building complex systems since 1985. But its abstractions are not clearly linkable to well-defined existing objects: every introduction to Mach needs to explain what a port is, and port sets, and port rights, and memory objects.

The beauty of Mach, and what I wanted to take from it, is that it defines its basic abstractions, and a set of principles, and maps every possible activity of a computer into these abstractions. Mach calls you into experimenting with it. That's what I wanted to have.

Another system I wanted to steal from is the UNIX operating system. "Everything is a file", despite being a lie since at least the addition of networking, is an incredibly powerful principle.

The early UNIXes loosely presented to the user a model of what a machine was at the time: a single cpu, interrupts (signals), a disk (filesystem), and a terminal.

World views

The world of a userspace program in Mach is made of ports and memory objects; in UNIX, it is that of a simplified computer. I liked the latter. A computer is understandable by a programmer.

I decided to move toward a system that presented something familiar, a UNIX process model, in a world where a computer is not made only of internal disks and where not many terminals are around. Furthermore, I wanted to achieve an extensibility similar to that of Mach by letting userspace processes export the same abstractions that the kernel uses to export its services. And finally, I wanted a system that is fun to use and extend.