Container FS Interfaces

James Bottomley

About Me


Container evangelist

Open Source Advocate

  • Converting Business to Open Source

Kernel Developer

  • SCSI Subsystem Maintainer
  • PA-RISC architecture Maintainer
Linux Containers API
Traditional Virtualization (Hypervisor) is about emulating Hardware

Container Virtualization is about Virtualizing the Operating System itself
Container "guests" don't have a second kernel

[Diagram labels: Block I/O, Network NS, Mount NS, User NS, Cgroup NS]

Making it all work: User Namespaces
A user namespace gives enhanced privileges to a user
Such as the ability to create other namespaces
It also maps between interior and exterior ids





shadow-utils provides newuidmap and newgidmap for this
Retains the concept of an "owning" uid, which is root-like if mapped to uid 0
Unmapped uids are inaccessible, even to "root" in the namespace
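
The interior/exterior mapping can be sketched in a few lines of Python. This is an illustrative model only, not kernel code: it assumes the three-column "inside outside count" layout that /proc/<pid>/uid_map uses, and the names parse_id_map, to_exterior and to_interior are made up for the example.

```python
from typing import List, Optional, Tuple

# One extent per uid_map line: (inside_start, outside_start, count).
Extent = Tuple[int, int, int]


def parse_id_map(text: str) -> List[Extent]:
    """Parse text in the 'inside outside count' layout of /proc/<pid>/uid_map."""
    extents = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, count = (int(f) for f in fields)
        extents.append((inside, outside, count))
    return extents


def to_exterior(extents: List[Extent], interior_id: int) -> Optional[int]:
    """Map an id seen inside the namespace to the id seen outside it."""
    for inside, outside, count in extents:
        if inside <= interior_id < inside + count:
            return outside + (interior_id - inside)
    return None  # unmapped


def to_interior(extents: List[Extent], exterior_id: int) -> Optional[int]:
    """Map an outside id back into the namespace; None if unmapped."""
    for inside, outside, count in extents:
        if outside <= exterior_id < outside + count:
            return inside + (exterior_id - outside)
    return None  # unmapped: inaccessible even to interior "root"


# Hypothetical mapping: interior ids 0..65535 are exterior ids 100000..165535.
EXAMPLE_MAP = "0 100000 65536\n"
```

With EXAMPLE_MAP, interior uid 0 maps to exterior uid 100000, while exterior uid 1000 has no interior id at all, which is the "unmapped uids are inaccessible" case.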

So s_user_ns is the answer?

Unfortunately: no

To use it, you need a superblock mount inside the container

Most container roots are bind mounts (no superblock)

struct mount, vfsmount and super_block (oh my!)

struct mount is fs internal, struct vfsmount is kernel visible

A filesystem tree is a set of struct mounts

Every mount namespace has a separate set of struct mounts (copied from the original)

Every struct mount points to a refcounted struct super_block

But the relation is many to one

If I mount something external I get a new super_block

But if I bind mount, I get a pointer to an old super_block
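
The many-to-one relation can be illustrated with a toy model. This is only a sketch in Python, not the kernel's data structures: the SuperBlock and Mount classes and the mount_new and bind_mount helpers are hypothetical names, and the only fields modelled are the ones the slides mention.

```python
from dataclasses import dataclass


@dataclass
class SuperBlock:
    """Toy stand-in for struct super_block."""
    source: str
    s_user_ns: str = "init_user_ns"
    refcount: int = 0


@dataclass
class Mount:
    """Toy stand-in for struct mount: a mountpoint plus a superblock pointer."""
    mountpoint: str
    sb: SuperBlock
    root: str = "/"  # subtree of the superblock exposed at the mountpoint


def mount_new(source: str, mountpoint: str,
              user_ns: str = "init_user_ns") -> Mount:
    """Mounting something external creates a fresh superblock."""
    sb = SuperBlock(source, user_ns)
    sb.refcount += 1
    return Mount(mountpoint, sb)


def bind_mount(old: Mount, subdir: str, mountpoint: str) -> Mount:
    """A bind mount only takes another reference on the old superblock."""
    old.sb.refcount += 1
    return Mount(mountpoint, old.sb, root=subdir)
```

In this model a container root created with bind_mount shares the host's SuperBlock object, so it also shares the host's s_user_ns, which is why a bind-mounted root cannot pick up the container's user namespace.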

Old solution is bindfs: a FUSE-based shifting bind mount
simply remounts a subtree with a uid/gid shift (requires root to mount)
Proposed solution: shiftfs
A bind mount with a superblock
Picks up s_user_ns when mounted
requires admin to mark which part of the tree is shifting bindable
so that the container can then remount it
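
Why picking up s_user_ns matters can be shown with a small model of id translation. This is a simplified sketch, not the shiftfs implementation: the UserNS class is hypothetical, its methods are named after the kernel's make_kuid and from_kuid helpers but are toy reimplementations, and 65534 is assumed as the overflow uid (the usual default).

```python
OVERFLOW_UID = 65534  # assumed default overflow ("nobody") uid


class UserNS:
    """Toy user namespace holding (inside, outside, count) extents."""

    def __init__(self, name, extents):
        self.name = name
        self.extents = extents

    def make_kuid(self, uid):
        """Interior uid -> kernel-wide id, or None if unmapped."""
        for inside, outside, count in self.extents:
            if inside <= uid < inside + count:
                return outside + (uid - inside)
        return None

    def from_kuid(self, kuid):
        """Kernel-wide id -> interior uid, or None if unmapped."""
        for inside, outside, count in self.extents:
            if outside <= kuid < outside + count:
                return inside + (kuid - outside)
        return None


# The initial namespace maps every id to itself.
INIT_NS = UserNS("init", [(0, 0, 2**32 - 1)])


def uid_seen_by(viewer_ns, s_user_ns, disk_uid):
    """Owner uid a process in viewer_ns would see for an on-disk uid.

    The on-disk uid is interpreted in the superblock's s_user_ns, then
    translated into the viewer's namespace; unmapped ids show as overflow.
    """
    kuid = s_user_ns.make_kuid(disk_uid)
    if kuid is None:
        return OVERFLOW_UID
    uid = viewer_ns.from_kuid(kuid)
    return OVERFLOW_UID if uid is None else uid
```

For a hypothetical container namespace UserNS("c", [(0, 100000, 65536)]), a file owned by on-disk uid 0 appears as the overflow uid when the superblock's s_user_ns is INIT_NS (the plain bind mount case), but appears as uid 0 when the superblock carries the container's namespace, which is the effect a shifting mount is after.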


unshare, nsenter, arch containers
What happened at LSF/MM

faking a bind mount simply to get a superblock feels "icky"

Can't we make the s_user_ns work?

Old proposal: inode view of user (uid, kuid, iuid)

Prototyped using security xattrs and it can be made to work

But: can't be made to work with s_user_ns

WTF is James thinking now?

In other words, make s_user_ns a property of vfsmount

Allows other vfsmount properties as well

Final alternative: abandon s_user_ns

Add inode mark (xattr) saying use user view or kernel view

Still no agreed solution
But hopefully there will be soon
We have several ways to solve the issue, but are not sure which is best
Presented using impress.js by Bartek Szopka

Thank You!