.TH UNSHARE 1 "February 2016" "util-linux" "User Commands"
.SH NAME
unshare \- run program in new namespaces
.SH SYNOPSIS
.B unshare
[options]
.RI [ program
.RI [ arguments ]]
.SH DESCRIPTION
The
.B unshare
command creates new namespaces
(as specified by the command-line options described below)
and then executes the specified \fIprogram\fR.
If \fIprogram\fR is not given, then ``${SHELL}'' is
run (default: /bin/sh).
.PP
By default, a new namespace persists only as long as it has member processes.
A new namespace can be made persistent even when it has no member processes
by bind mounting
/proc/\fIpid\fR/ns/\fItype\fR files to a filesystem path.
A namespace that has been made persistent in this way can subsequently
be entered with
.BR \%nsenter (1)
even after the \fIprogram\fR terminates (except for PID namespaces, where
a permanently running init process is required).
Once a persistent \%namespace is no longer needed,
it can be unpersisted by using
.BR umount (8)
to remove the bind mount.
See the \fBEXAMPLES\fR section for more details.
.PP
.B unshare
since util-linux version 2.36 uses \fI/proc/[pid]/ns/pid_for_children\fP and \fI/proc/[pid]/ns/time_for_children\fP
files for persistent PID and time namespaces. This change requires Linux kernel 4.17 or newer.
.PP
The following types of namespaces can be created with
.BR unshare :
.TP
.B mount namespace
Mounting and unmounting filesystems will not affect the rest of the system,
except for filesystems which are explicitly marked as
shared (with \fBmount \-\-make-shared\fP; see \fI/proc/self/mountinfo\fP or
\fBfindmnt \-o+PROPAGATION\fP for the \fBshared\fP flags).
For further details, see
.BR mount_namespaces (7).
.IP
.B unshare
since util-linux version 2.27 automatically sets propagation to \fBprivate\fP
in a new mount namespace to make sure that the new namespace is really
unshared. It is possible to disable this feature with the option
\fB\-\-propagation unchanged\fP.
Note that \fBprivate\fP is the kernel default.
.TP
.B UTS namespace
Setting hostname or domainname will not affect the rest of the system.
For further details, see
.BR uts_namespaces (7).
.TP
.B IPC namespace
The process will have an independent namespace for POSIX message queues
as well as System V \%message queues,
semaphore sets and shared memory segments.
For further details, see
.BR ipc_namespaces (7).
.TP
.B network namespace
The process will have independent IPv4 and IPv6 stacks, IP routing tables,
firewall rules, the \fI/proc/net\fP and \fI/sys/class/net\fP directory trees,
sockets, etc.
For further details, see
.BR network_namespaces (7).
.TP
.B PID namespace
Children will have a distinct set of PID-to-process mappings from their parent.
For further details, see
.BR pid_namespaces (7).
.TP
.B cgroup namespace
The process will have a virtualized view of \fI/proc\:/self\:/cgroup\fP, and new
cgroup mounts will be rooted at the namespace cgroup root.
For further details, see
.BR cgroup_namespaces (7).
.TP
.B user namespace
The process will have a distinct set of UIDs, GIDs and capabilities.
For further details, see
.BR user_namespaces (7).
.TP
.B time namespace
The process can have a distinct view of
.B CLOCK_MONOTONIC
and/or
.B CLOCK_BOOTTIME
which can be changed using \fI/proc/self/timens_offsets\fP.
For further details, see
.BR time_namespaces (7).
.SH OPTIONS
.TP
.BR \-i , " \-\-ipc" [ =\fIfile ]
Unshare the IPC namespace. If \fIfile\fP is specified, then a persistent
namespace is created by a bind mount.
.TP
.BR \-m , " \-\-mount" [ =\fIfile ]
Unshare the mount namespace. If \fIfile\fP is specified, then a persistent
namespace is created by a bind mount.
Note that \fIfile\fP must be located on a mount whose propagation type
is not \fBshared\fP (or an error results).
Use the command \fBfindmnt \-o+PROPAGATION\fP
when not sure about the current setting. See also the examples below.
.TP
.BR \-n , " \-\-net" [ =\fIfile ]
Unshare the network namespace. If \fIfile\fP is specified, then a persistent
namespace is created by a bind mount.
.TP
.BR \-p , " \-\-pid" [ =\fIfile ]
Unshare the PID namespace. If \fIfile\fP is specified, then a persistent
namespace is created by a bind mount.
(Creation of a persistent PID namespace will fail if the
.B \-\-fork
option is not also specified.)
.IP
See also the \fB\-\-fork\fP and
\fB\-\-mount-proc\fP options.
.TP
.BR \-u , " \-\-uts" [ =\fIfile ]
Unshare the UTS namespace. If \fIfile\fP is specified, then a persistent
namespace is created by a bind mount.
.TP
.BR \-U , " \-\-user" [ =\fIfile ]
Unshare the user namespace. If \fIfile\fP is specified, then a persistent
namespace is created by a bind mount.
.TP
.BR \-C , " \-\-cgroup" [ =\fIfile ]
Unshare the cgroup namespace. If \fIfile\fP is specified, then a persistent
namespace is created by a bind mount.
.TP
.BR \-T , " \-\-time" [ =\fIfile ]
Unshare the time namespace. If \fIfile\fP is specified, then a persistent
namespace is created by a bind mount. The \fB\-\-monotonic\fP and
\fB\-\-boottime\fP options can be used to specify the corresponding
offset in the time namespace.
.TP
.BR \-f , " \-\-fork"
Fork the specified \fIprogram\fR as a child process of \fBunshare\fR rather
than running it directly. This is useful when creating a new PID namespace.
Note that while \fBunshare\fR is waiting for the child process,
it ignores SIGINT and SIGTERM and does not forward any signals to the
child. Signals therefore have to be sent to the child process directly.
.TP
.B \-\-keep\-caps
When the \fB\-\-user\fP option is given, ensure that capabilities granted
in the user namespace are preserved in the child process.
.TP
.BR \-\-kill\-child [ =\fIsigname ]
When \fBunshare\fR terminates, have \fIsigname\fP be sent to the forked child process.
Combined with \fB\-\-pid\fR, this allows for easy and reliable killing of the entire
process tree below \fBunshare\fR.
If not given, \fIsigname\fP defaults to \fBSIGKILL\fR.
This option implies \fB\-\-fork\fR.
.TP
.BR \-\-mount\-proc [ =\fImountpoint ]
Just before running the program, mount the proc filesystem at \fImountpoint\fP
(default is /proc). This is useful when creating a new PID namespace. It also
implies creating a new mount namespace since the /proc mount would otherwise
interfere with existing programs on the system. The new proc filesystem is explicitly
mounted as private (with MS_PRIVATE|MS_REC).
.TP
.BI \-\-map\-user= uid|name
Run the program only after the current effective user ID has been mapped to \fIuid\fP.
If this option is specified multiple times, the last occurrence takes precedence.
This option implies \fB\-\-user\fR.
.TP
.BI \-\-map\-group= gid|name
Run the program only after the current effective group ID has been mapped to \fIgid\fP.
If this option is specified multiple times, the last occurrence takes precedence.
This option implies \fB\-\-setgroups=deny\fR and \fB\-\-user\fR.
.TP
.BR \-r , " \-\-map\-root\-user"
Run the program only after the current effective user and group IDs have been mapped to
the superuser UID and GID in the newly created user namespace. This makes it possible to
conveniently gain capabilities needed to manage various aspects of the newly created
namespaces (such as configuring interfaces in the network namespace or mounting filesystems in
the mount namespace) even when run unprivileged. As a mere convenience feature, it does not support
more sophisticated use cases, such as mapping multiple ranges of UIDs and GIDs.
This option implies \fB\-\-setgroups=deny\fR and \fB\-\-user\fR.
This option is equivalent to \fB\-\-map\-user=0 \-\-map\-group=0\fR.
.TP
.BR \-c , " \-\-map\-current\-user"
Run the program only after the current effective user and group IDs have been mapped to
the same UID and GID in the newly created user namespace. This option implies
\fB\-\-setgroups=deny\fR and \fB\-\-user\fR.
This option is equivalent to \fB\-\-map\-user=$(id \-ru) \-\-map\-group=$(id \-rg)\fR.
.TP
.BR "\-\-propagation private" | shared | slave | unchanged
Recursively set the mount propagation flag in the new mount namespace. The default
is to set the propagation to \fIprivate\fP. It is possible to disable this feature
with the argument \fBunchanged\fR. The option is silently ignored when the mount
namespace (\fB\-\-mount\fP) is not requested.
.TP
.BR "\-\-setgroups allow" | deny
Allow or deny the
.BR setgroups (2)
system call in a user namespace.
.sp
To be able to call
.BR setgroups (2),
the calling process must at least have CAP_SETGID.
But since Linux 3.19 a further restriction applies:
the kernel gives permission to call
.BR \%setgroups (2)
only after the GID map (\fB/proc/\fIpid\fB/gid_map\fR) has been set.
The GID map is writable by root when
.BR \%setgroups (2)
is enabled (i.e., \fBallow\fR, the default), and
the GID map becomes writable by unprivileged processes when
.BR \%setgroups (2)
is permanently disabled (with \fBdeny\fR).
.TP
.BR \-R , " \-\-root=\fIdir"
Run the command with the root directory set to \fIdir\fP.
.TP
.BR \-w , " \-\-wd=\fIdir"
Change the working directory to \fIdir\fP.
.TP
.BR \-S , " \-\-setuid \fIuid"
Set the user ID which will be used in the entered namespace.
.TP
.BR \-G , " \-\-setgid \fIgid"
Set the group ID which will be used in the entered namespace and drop
supplementary groups.
.TP
.BI \-\-monotonic " offset"
Set the offset of
.B CLOCK_MONOTONIC
which will be used in the entered time namespace. This option requires
unsharing a time namespace with \fB\-\-time\fP.
.TP
.BI \-\-boottime " offset"
Set the offset of
.B CLOCK_BOOTTIME
which will be used in the entered time namespace. This option requires
unsharing a time namespace with \fB\-\-time\fP.
.TP
.BR \-V , " \-\-version"
Display version information and exit.
.TP
.BR \-h , " \-\-help"
Display help text and exit.
.SH NOTES
Mounting the proc and sysfs filesystems as root in a user namespace has to be
restricted so that a less privileged user cannot gain more access to sensitive
files that a more privileged user made unavailable. In short, the rule for proc
and sysfs is to be as close to a bind mount as possible.
.SH EXAMPLES
The following command creates a PID namespace, using
.B \-\-fork
to ensure that the executed command is performed in a child process
that (being the first process in the namespace) has PID 1.
The
.B \-\-mount-proc
option ensures that a new mount namespace is also simultaneously created
and that a new
.BR proc (5)
filesystem is mounted that contains information corresponding to the new
PID namespace.
When the
.B readlink
command terminates, the new namespaces are automatically torn down.
.PP
.in +4n
.EX
.B # unshare \-\-fork \-\-pid \-\-mount-proc readlink /proc/self
1
.EE
.in
.PP
As an unprivileged user, create a new user namespace where the user's
credentials are mapped to the root IDs inside the namespace:
.PP
.in +4n
.EX
.B $ id \-u; id \-g
1000
1000
.B $ unshare \-\-user \-\-map-root-user \e
.B "        sh \-c \(aqwhoami; cat /proc/self/uid_map /proc/self/gid_map\(aq"
root
         0       1000          1
         0       1000          1
.EE
.in
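.PP
The same mapping can also be requested explicitly.
The following is a sketch that assumes a util-linux version providing the
.B \-\-map\-user
and
.B \-\-map\-group
options, and a system on which unprivileged user namespaces are enabled:
.PP
.in +4n
.EX
.B $ unshare \-\-map\-user=0 \-\-map\-group=0 sh \-c \(aqid \-u; id \-g\(aq
0
0
.EE
.in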
.PP
The following commands create a new persistent UTS namespace
and modify the hostname as seen in that namespace.
The namespace is then entered with
.BR nsenter (1)
in order to display the modified hostname;
this step demonstrates that the UTS namespace continues to exist
even though the namespace had no member processes after the
.B unshare
command terminated.
The namespace is then destroyed by removing the bind mount.
.PP
.in +4n
.EX
.B # touch /root/uts-ns
.B # unshare \-\-uts=/root/uts-ns hostname FOO
.B # nsenter \-\-uts=/root/uts-ns hostname
FOO
.B # umount /root/uts-ns
.EE
.in
.PP
The following commands
establish a persistent mount namespace referenced by the bind mount
.IR /root/namespaces/mnt .
In order to ensure that the creation of that bind mount succeeds,
the parent directory
.RI ( /root/namespaces )
is made a bind mount whose propagation type is not
.BR shared .
.PP
.in +4n
.EX
.B # mount \-\-bind /root/namespaces /root/namespaces
.B # mount \-\-make-private /root/namespaces
.B # touch /root/namespaces/mnt
.B # unshare \-\-mount=/root/namespaces/mnt
.EE
.in
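.PP
A mount namespace persisted in this way could later be re-entered with
.BR nsenter (1)
and released once it is no longer needed.
The following is only a sketch; it assumes that the shell started by the
.B unshare
command above has already exited and that the commands are run
in the initial mount namespace:
.PP
.in +4n
.EX
.BR "# nsenter \-\-mount=/root/namespaces/mnt true " "# Succeeds only if the namespace can be entered"
.BR "# umount /root/namespaces/mnt " "# Remove the bind mount"
.EE
.in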
.PP
The following commands demonstrate the use of the
.B \-\-kill-child
option when creating a PID namespace, in order to ensure that when
.B unshare
is killed, all of the processes within the PID namespace are killed.
.PP
.in +4n
.EX
.BR "# set +m " "# Don't print job status messages"
.B # unshare \-\-pid \-\-fork \-\-mount\-proc \-\-kill\-child \-\- \e
.B "       bash \-\-norc \-c \(aq(sleep 555 &) && (ps a &) && sleep 999\(aq &"
[1] 53456
#     PID TTY      STAT   TIME COMMAND
      1 pts/3    S+     0:00 sleep 999
      3 pts/3    S+     0:00 sleep 555
      5 pts/3    R+     0:00 ps a

.BR "# ps h \-o 'comm' $! " "# Show that background job is unshare(1)"
unshare
.BR "# kill $! " "# Kill unshare(1)"
.B # pidof sleep
.EE
.in
.PP
The
.B pidof
command prints no output, because the
.B sleep
processes have been killed.
More precisely, when the
.B sleep
process that has PID 1 in the namespace (i.e., the namespace's init process)
was killed, this caused all other processes in the namespace to be killed.
By contrast, a similar series of commands where the
.B \-\-kill\-child
option is not used shows that when
.B unshare
terminates, the processes in the PID namespace are not killed:
.PP
.in +4n
.EX
.B # unshare \-\-pid \-\-fork \-\-mount\-proc \-\- \e
.B "       bash \-\-norc \-c \(aq(sleep 555 &) && (ps a &) && sleep 999\(aq &"
[1] 53479
#     PID TTY      STAT   TIME COMMAND
      1 pts/3    S+     0:00 sleep 999
      3 pts/3    S+     0:00 sleep 555
      5 pts/3    R+     0:00 ps a

.B # kill $!
.B # pidof sleep
53482 53480
.EE
.in
.PP
The following example demonstrates the creation of a time namespace
where the boottime clock is set to a point several years in the past:
.PP
.in +4n
.EX
.BR "# uptime \-p " "# Show uptime in initial time namespace"
up 21 hours, 30 minutes
.B # unshare \-\-time \-\-fork \-\-boottime 300000000 uptime \-p
up 9 years, 28 weeks, 1 day, 2 hours, 50 minutes
.EE
.in
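.PP
The monotonic clock can be offset in the same way with
.BR \-\-monotonic .
The following is a sketch that assumes a kernel with time namespace support;
the offset of 3600 seconds is an arbitrary illustrative value, and the
output is not shown because it depends on the system.
The command prints the clock offsets that apply inside the new namespace:
.PP
.in +4n
.EX
.B # unshare \-\-time \-\-fork \-\-monotonic 3600 cat /proc/self/timens_offsets
.EE
.in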
.SH AUTHORS
.UR dottedmag@dottedmag.net
Mikhail Gusarov
.UE
.br
.UR kzak@redhat.com
Karel Zak
.UE
.SH SEE ALSO
.BR clone (2),
.BR unshare (2),
.BR namespaces (7),
.BR mount (8)
.SH AVAILABILITY
The unshare command is part of the util-linux package and is available from
https://www.kernel.org/pub/linux/utils/util-linux/.