Container FS Interfaces

James Bottomley
jejb@linux.vnet.ibm.com

About Me

Container evangelist

Open Source Advocate

Converting Business to Open Source

Kernel Developer

SCSI Subsystem Maintainer
PA-RISC architecture Maintainer

Linux Containers API

Traditional virtualization (hypervisor) is about emulating hardware

Container virtualization is about virtualizing the operating system itself

Container "guests" don't have a second kernel

Cgroups

Block I/O

CPU

Devices

Memory

Network

Freezer

...
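A process's membership in the cgroup controllers above can be read from /proc/<pid>/cgroup, one line per hierarchy in the form hierarchy-ID:controller-list:cgroup-path (a pure cgroup v2 system shows a single `0::/path` line). A minimal Python sketch that parses it; the class and function names are illustrative, not part of any library:

```python
from dataclasses import dataclass


@dataclass
class CgroupEntry:
    hierarchy_id: int
    controllers: list[str]  # e.g. ["cpu", "cpuacct"]; empty for the cgroup v2 hierarchy
    path: str               # cgroup path relative to the root of its hierarchy


def parse_cgroup_line(line: str) -> CgroupEntry:
    # Split on the first two colons only, in case the path itself contains ':'
    hid, ctrls, path = line.rstrip("\n").split(":", 2)
    return CgroupEntry(int(hid), ctrls.split(",") if ctrls else [], path)


def read_cgroups(pid: str = "self") -> list[CgroupEntry]:
    """Return the cgroup membership of a process (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return [parse_cgroup_line(line) for line in f if line.strip()]
```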

Namespaces

Network NS

IPC NS

Mount NS

PID NS

UTS NS

User NS

Cgroup NS (4.6+)
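Each namespace type above shows up as a symbolic link under /proc/<pid>/ns whose target reads like `user:[4026531837]`. Per namespaces(7), two processes share a namespace when those files have the same device and inode numbers. A small Python sketch; the helper names are hypothetical:

```python
import os
import re

# Link targets look like "user:[4026531837]" or "mnt:[4026531841]"
_NS_TARGET = re.compile(r"([a-z_]+):\[(\d+)\]")


def parse_ns_target(target: str) -> tuple[str, int]:
    """Split a namespace link target into (type, inode number)."""
    m = _NS_TARGET.fullmatch(target)
    if m is None:
        raise ValueError(f"unexpected namespace link target: {target!r}")
    return m.group(1), int(m.group(2))


def list_namespaces(pid: str = "self") -> dict[str, tuple[str, int]]:
    """Map each entry in /proc/<pid>/ns to its (type, inode) pair (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    return {name: parse_ns_target(os.readlink(os.path.join(ns_dir, name)))
            for name in sorted(os.listdir(ns_dir))}


def same_namespace(pid_a: str, pid_b: str, kind: str) -> bool:
    """True if both processes are in the same namespace of this kind, e.g. 'user'."""
    # os.stat follows the link to the namespace inode itself
    a = os.stat(f"/proc/{pid_a}/ns/{kind}")
    b = os.stat(f"/proc/{pid_b}/ns/{kind}")
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```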

Making it all work: User Namespaces

A user namespace gives enhanced privileges to an unprivileged user

Such as the ability to create other namespaces

It also maps between interior and exterior ids

/proc/<pid>/uid_map

/proc/<pid>/gid_map

/proc/<pid>/projid_map

/proc/<pid>/setgroups

shadow-utils provides the setuid helpers newuidmap and newgidmap to write these maps

The namespace retains the concept of an "owning" uid, which is root-like if mapped to uid 0

Unmapped uids are inaccessible, even to "root" in the namespace
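Each line of /proc/<pid>/uid_map (and gid_map) holds three numbers: the first id inside the namespace, the first id outside it, and the length of the range; ids outside every range are unmapped. A minimal Python sketch of parsing and applying such a map, plus roughly the setup that `unshare --map-root-user` performs for an unprivileged caller. The helper names are illustrative, and `become_root_in_new_userns` assumes Linux, Python 3.12+ (for `os.unshare`) and a single-threaded process:

```python
import os
from typing import NamedTuple, Optional


class IdExtent(NamedTuple):
    inside: int   # first id as seen inside the namespace
    outside: int  # first id in the parent namespace
    count: int    # number of consecutive ids in this range


def parse_id_map(text: str) -> list:
    """Parse the contents of /proc/<pid>/uid_map or gid_map."""
    return [IdExtent(*map(int, line.split()))
            for line in text.splitlines() if line.strip()]


def to_outside(extents: list, inside_id: int) -> Optional[int]:
    """Translate an interior id to the parent namespace; None if unmapped."""
    for e in extents:
        if e.inside <= inside_id < e.inside + e.count:
            return e.outside + (inside_id - e.inside)
    return None


def to_inside(extents: list, outside_id: int) -> Optional[int]:
    """Translate an exterior id into the namespace; None if unmapped."""
    for e in extents:
        if e.outside <= outside_id < e.outside + e.count:
            return e.inside + (outside_id - e.outside)
    return None


def become_root_in_new_userns() -> None:
    """Map the caller's own uid/gid to 0 in a fresh user namespace."""
    uid, gid = os.getuid(), os.getgid()  # read first: after unshare they show the overflow id
    os.unshare(os.CLONE_NEWUSER)         # fails with EINVAL if the process is multithreaded
    # Each map must arrive in a single write(); setgroups must be denied
    # before an unprivileged process may write its own gid_map.
    with open("/proc/self/setgroups", "w") as f:
        f.write("deny")
    with open("/proc/self/uid_map", "w") as f:
        f.write(f"0 {uid} 1\n")
    with open("/proc/self/gid_map", "w") as f:
        f.write(f"0 {gid} 1\n")
```

After `become_root_in_new_userns()` returns, `os.getuid()` reports 0, but the capabilities gained only apply to inodes whose owner is mapped, so root inside gets no extra access to files owned by other host users.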

So s_user_ns (the super_block's owning user namespace, used to interpret on-disk uids and gids) is the answer?

Unfortunately: no

To use it, you need a mount inside the container that creates its own superblock

Most container roots are bind mounts (no superblock of their own)

struct mount, vfsmount and super_block (oh my!)

struct mount is private to the VFS core (fs/), struct vfsmount is visible to the rest of the kernel

A filesystem tree is a set of struct mounts

Every mount namespace has a separate set of struct mounts (copied from the original)

Every struct mount points to a refcounted struct super_block

But the relation is many-to-one: many struct mounts can share one super_block

If I mount a new filesystem, I get a new super_block

But if I bind mount, I get a pointer to an existing super_block
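A toy Python model (not kernel code; the classes only borrow the kernel's struct names) of the many-to-one relation above. It shows why a bind-mounted container root cannot carry its own s_user_ns: the bind mount adds a new struct mount but reuses the host's super_block, and with it the host's s_user_ns. As a simplification, every filesystem mount here creates a fresh superblock, whereas the kernel may reuse one when the same device is mounted again; the paths are made-up examples.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class SuperBlock:
    """Stand-in for struct super_block: one per filesystem instance."""
    source: str
    s_user_ns: str   # namespace used to interpret on-disk uids/gids, fixed at creation


@dataclass(eq=False)
class Mount:
    """Stand-in for struct mount: one node in a mount tree."""
    mountpoint: str
    sb: SuperBlock   # many mounts may point at the same superblock
    root: str = "/"  # subtree of the superblock exposed at this mount


def mount_fs(source: str, mountpoint: str, mounter_user_ns: str) -> Mount:
    # Mounting a filesystem instantiates a new superblock owned by the mounter's namespace
    return Mount(mountpoint, SuperBlock(source, mounter_user_ns))


def bind_mount(existing: Mount, subdir: str, mountpoint: str) -> Mount:
    # A bind mount only adds a struct mount; the superblock (and its s_user_ns) is shared
    return Mount(mountpoint, existing.sb, root=subdir)
```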

Old solution is bindfs: a FUSE-based shifting bind mount

https://github.com/mpartel/bindfs/

simply remounts a subtree with a uid/gid shift (requires root to mount)

Proposed solution: shiftfs

A bind mount with a superblock

Picks up s_user_ns when mounted

requires the admin to mark which part of the tree is shifting-bindable

so that the container can then remount it
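A toy Python model of the shiftfs flow described above, under stated assumptions: `mark_shiftable` and `mount_shiftfs` are hypothetical names (the real patch's interface is not reproduced), each user namespace has a single uid range, the host admin is modelled as a flag, and `make_kuid`/`from_kuid` only loosely mirror the kernel helpers of the same names. It shows the point of the design: the container's shiftfs superblock picks up the container's namespace as s_user_ns, so an unshifted on-disk uid 0 becomes the kernel id the container sees as its own root.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class UserNs:
    name: str
    base: int              # exterior id that interior id 0 maps to
    count: int             # size of the single mapped range
    is_host: bool = False  # stands in for "admin in the initial namespace"

    def make_kuid(self, uid: int) -> Optional[int]:
        return self.base + uid if 0 <= uid < self.count else None

    def from_kuid(self, kuid: int) -> Optional[int]:
        off = kuid - self.base
        return off if 0 <= off < self.count else None


@dataclass(frozen=True)
class ShiftfsSuperBlock:
    lower_path: str
    s_user_ns: UserNs  # taken from whoever performs the mount


@dataclass
class ShiftfsModel:
    marked: set = field(default_factory=set)

    def mark_shiftable(self, path: str, caller: UserNs) -> None:
        # Step 1: only the host admin may declare a subtree shifting-bindable
        if not caller.is_host:
            raise PermissionError("only the host admin may mark a subtree")
        self.marked.add(path)

    def mount_shiftfs(self, path: str, caller: UserNs) -> ShiftfsSuperBlock:
        # Step 2: the container remounts a marked subtree; the new superblock
        # records the caller's user namespace as its s_user_ns
        if path not in self.marked:
            raise PermissionError(f"{path} is not marked shifting-bindable")
        return ShiftfsSuperBlock(path, caller)


def kuid_of_on_disk(sb: ShiftfsSuperBlock, on_disk_uid: int) -> Optional[int]:
    # On-disk ids are interpreted relative to the superblock's s_user_ns
    return sb.s_user_ns.make_kuid(on_disk_uid)
```

Without the shiftfs mount, the same file reached through a plain bind mount keeps the host's s_user_ns, so on-disk uid 0 stays kuid 0, which the container's namespace cannot map, and the container's root sees the file as owned by the overflow id (nobody).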

Demo

unshare, nsenter, arch containers

What happened at LSF/MM

faking a bind mount simply to get a superblock feels "icky"

Can't we make the s_user_ns work?

Old proposal: inode view of user (uid, kuid, iuid)

Prototyped using security xattrs and it can be made to work

But: can't be made to work with s_user_ns

WTF is James thinking now?

Idea: make s_user_ns a property of vfsmount rather than super_block

Allows other vfsmount properties as well
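To illustrate the vfsmount idea, a toy Python model in which the namespace used to translate ownership hangs off each mount rather than the superblock. The `mnt_user_ns` field name is hypothetical and single-range maps are a simplification. Two bind mounts of one superblock can then present the same on-disk ids shifted differently, with no fake superblock required.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UidMap:
    outside: int  # exterior id that interior id 0 maps to
    count: int    # size of the mapped range

    def to_outside(self, inside: int) -> Optional[int]:
        return self.outside + inside if 0 <= inside < self.count else None


@dataclass
class SuperBlock:
    name: str  # in this variant the superblock carries no user namespace


@dataclass
class VfsMount:
    sb: SuperBlock
    mnt_user_ns: UidMap  # hypothetical: namespace moved from super_block to vfsmount


def kuid_of(mnt: VfsMount, on_disk_uid: int) -> Optional[int]:
    # Ownership is interpreted through the mount's namespace, not the superblock's
    return mnt.mnt_user_ns.to_outside(on_disk_uid)
```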

Final alternative: abandon s_user_ns

Add an inode mark (xattr) saying whether to use the user view or the kernel view

Conclusions

Still no agreed solution

But hopefully there will be one soon

We have several ways to solve the issue, but are not sure which is best

Presented using impress.js by Bartek Szopka

Web Developer!

Thank You!

Questions?