Throughout the kernel there are access restrictions relating to jailed processes. Usually, these restrictions only check whether the process is jailed, and if so, returns an error. For example:
if (jailed(td->td_ucred)) return (EPERM);
System V IPC is based on messages. Processes can send
each other these messages which tell them how to act. The
functions which deal with messages are: msgctl(3),
msgget(3), msgsnd(3) and msgrcv(3). Earlier, I
mentioned that there were certain sysctls you could turn on or
off in order to affect the behavior of
jail. One of these sysctls was
security.jail.sysvipc_allowed
. By default,
this sysctl is set to 0. If it were set to 1, it would defeat
the whole purpose of having a jail;
privileged users from the jail
would be able to affect processes outside the jailed
environment. The difference between a message and a signal is
that the message only consists of the signal number.
/usr/src/sys/kern/sysv_msg.c
:
msgget(key, msgflg)
:
msgget
returns (and possibly creates) a
message descriptor that designates a message queue for use
in other functions.
msgctl(msgid, cmd, buf)
: Using this
function, a process can query the status of a message
descriptor.
msgsnd(msgid, msgp, msgsz, msgflg)
:
msgsnd
sends a message to a
process.
msgrcv(msgid, msgp, msgsz, msgtyp,
msgflg)
: a process receives messages using
this function
In each of the system calls corresponding to these functions, there is this conditional:
/usr/src/sys/kern/sysv_msg.c
:
if (!jail_sysvipc_allowed && jailed(td->td_ucred))
return (ENOSYS);
Semaphore system calls allow processes to synchronize execution by doing a set of operations atomically on a set of semaphores. Basically semaphores provide another way for processes lock resources. However, process waiting on a semaphore, that is being used, will sleep until the resources are relinquished. The following semaphore system calls are blocked inside a jail: semget(2), semctl(2) and semop(2).
/usr/src/sys/kern/sysv_sem.c
:
semctl(semid, semnum, cmd, ...)
:
semctl
does the specified
cmd
on the semaphore queue indicated by
semid
.
semget(key, nsems, flag)
:
semget
creates an array of semaphores,
corresponding to key
.
key and flag take on the same meaning as they
do in msgget.
semop(semid, array, nops)
:
semop
performs a group of operations
indicated by array
, to the set of
semaphores identified by
semid
.
System V IPC allows for processes to share memory. Processes can communicate directly with each other by sharing parts of their virtual address space and then reading and writing data stored in the shared memory. These system calls are blocked within a jailed environment: shmdt(2), shmat(2), shmctl(2) and shmget(2).
/usr/src/sys/kern/sysv_shm.c
:
shmctl(shmid, cmd, buf)
:
shmctl
does various control operations
on the shared memory region identified by
shmid
.
shmget(key, size, flag)
:
shmget
accesses or creates a shared
memory region of size
bytes.
shmat(shmid, addr, flag)
:
shmat
attaches a shared memory region
identified by shmid
to the address
space of a process.
shmdt(addr)
:
shmdt
detaches the shared memory region
previously attached at
addr
.
Jail treats the socket(2)
system call and related lower-level socket functions in a
special manner. In order to determine whether a certain
socket is allowed to be created, it first checks to see if the
sysctl
security.jail.socket_unixiproute_only
is
set. If set, sockets are only allowed to be created if the
family specified is either PF_LOCAL
,
PF_INET
or PF_ROUTE
.
Otherwise, it returns an error.
/usr/src/sys/kern/uipc_socket.c
:
int
socreate(int dom, struct socket **aso, int type, int proto,
struct ucred *cred, struct thread *td)
{
struct protosw *prp;
...
if (jailed(cred) && jail_socket_unixiproute_only &&
prp->pr_domain->dom_family != PF_LOCAL &&
prp->pr_domain->dom_family != PF_INET &&
prp->pr_domain->dom_family != PF_ROUTE) {
return (EPROTONOSUPPORT);
}
...
}
The Berkeley Packet Filter provides a raw interface to data link layers in a protocol independent fashion. BPF is now controlled by the devfs(8) whether it can be used in a jailed environment.
There are certain protocols which are very common, such as
TCP, UDP, IP and ICMP. IP and ICMP are on the same level: the
network layer 2. There are certain precautions which are
taken in order to prevent a jailed process from binding a
protocol to a certain address only if the
nam
parameter is set.
nam
is a pointer to a
sockaddr
structure, which describes the
address on which to bind the service. A more exact definition
is that sockaddr
"may be used as a template
for referring to the identifying tag and length of each
address". In the function
in_pcbbind_setup()
, sin
is a pointer to a sockaddr_in
structure,
which contains the port, address, length and domain family of
the socket which is to be bound. Basically, this disallows
any processes from jail to be able
to specify the address that does not belong to the
jail in which the calling process
exists.
/usr/src/sys/netinet/in_pcb.c
:
int
in_pcbbind_setup(struct inpcb *inp, struct sockaddr *nam, in_addr_t *laddrp,
u_short *lportp, struct ucred *cred)
{
...
struct sockaddr_in *sin;
...
if (nam) {
sin = (struct sockaddr_in *)nam;
...
if (sin->sin_addr.s_addr != INADDR_ANY)
if (prison_ip(cred, 0, &sin->sin_addr.s_addr))
return(EINVAL);
...
if (lport) {
...
if (prison && prison_ip(cred, 0, &sin->sin_addr.s_addr))
return (EADDRNOTAVAIL);
...
}
}
if (lport == 0) {
...
if (laddr.s_addr != INADDR_ANY)
if (prison_ip(cred, 0, &laddr.s_addr))
return (EINVAL);
...
}
...
if (prison_ip(cred, 0, &laddr.s_addr))
return (EINVAL);
...
}
You might be wondering what function
prison_ip()
does.
prison_ip()
is given three arguments, a
pointer to the credential(represented by
cred
), any flags, and an IP address. It
returns 1 if the IP address does NOT belong to the
jail or 0 otherwise. As you can
see from the code, if it is indeed an IP address not belonging
to the jail, the protocol is not
allowed to bind to that address.
/usr/src/sys/kern/kern_jail.c:
int
prison_ip(struct ucred *cred, int flag, u_int32_t *ip)
{
u_int32_t tmp;
if (!jailed(cred))
return (0);
if (flag)
tmp = *ip;
else
tmp = ntohl(*ip);
if (tmp == INADDR_ANY) {
if (flag)
*ip = cred->cr_prison->pr_ip;
else
*ip = htonl(cred->cr_prison->pr_ip);
return (0);
}
if (tmp == INADDR_LOOPBACK) {
if (flag)
*ip = cred->cr_prison->pr_ip;
else
*ip = htonl(cred->cr_prison->pr_ip);
return (0);
}
if (cred->cr_prison->pr_ip != tmp)
return (1);
return (0);
}
Even root
users within the
jail are not allowed to unset or
modify any file flags, such as immutable, append-only, and
undeleteable flags, if the securelevel is greater than
0.
/usr/src/sys/ufs/ufs/ufs_vnops.c:
static int ufs_setattr(ap) ... { ... if (!priv_check_cred(cred, PRIV_VFS_SYSFLAGS, 0)) { if (ip->i_flags & (SF_NOUNLINK | SF_IMMUTABLE | SF_APPEND)) { error = securelevel_gt(cred, 0); if (error) return (error); } ... } }/usr/src/sys/kern/kern_priv.c
int priv_check_cred(struct ucred *cred, int priv, int flags) { ... error = prison_priv_check(cred, priv); if (error) return (error); ... }/usr/src/sys/kern/kern_jail.c
int prison_priv_check(struct ucred *cred, int priv) { ... switch (priv) { ... case PRIV_VFS_SYSFLAGS: if (jail_chflags_allowed) return (0); else return (EPERM); ... } ... }
All FreeBSD documents are available for download at https://download.freebsd.org/ftp/doc/
Questions that are not answered by the
documentation may be
sent to <freebsd-questions@FreeBSD.org>.
Send questions about this document to <freebsd-doc@FreeBSD.org>.