
Anatomy of SSSD user lookup

This blog post describes how a user lookup request is handled in SSSD. It should help you understand
what the SSSD architecture looks like and how data flows through SSSD, and as a result help you
identify which part might not be functioning correctly on your system. It is aimed mostly at users
and administrators; for developers, we have a separate document about SSSD internals on the SSSD
wiki written by Yassir Elley. This document re-uses some of the info from the internals one.

We'll look at the most common operation: looking up user info on a remote server. I won't go into
server-specific details, so most of the info should be equally true for LDAP, Active Directory or
FreeIPA servers. There's also more functionality in SSSD than looking up users, such as sudo or
autofs integration, but those are out of scope for this post as well.

Before going into SSSD details, let's do a really quick intro into what happens on the system in
general when you request a user from a remote server. Let's say the admin configured SSSD and tests
the configuration by requesting the admin user:

$ getent passwd admin

When information is requested about a user (with getent, id or similar), typically one of the
functions of the Name Service Switch, such as getpwnam() or initgroups() in glibc, is called. There's
lots of information about the Name Service Switch in the libc manual, but for our purposes it's
enough to know that libc opens and reads the config file /etc/nsswitch.conf to find out which modules
should be contacted in which order. The module that all of us have on our Linux machines is files,
which reads user info from /etc/passwd and group info from /etc/group. There also exists an ldap
module that would read the info directly from an LDAP server and, of course, an sss module that talks
to SSSD. So how does that work?

The first thing to keep in mind is that, unlike nss_ldap or pam_ldap, SSSD is not just a module
that is loaded in the context of the application, but rather a daemon that the modules communicate
with. Almost no logic is implemented in the modules; all the functionality happens in the daemon.
A user-visible effect during debugging is that strace is not too helpful, as it would only show
whether the request made it to SSSD. For debugging the rest, the SSSD debug logs should be used.

Earlier I said that SSSD is a daemon. That's not really precise; SSSD is actually a set of daemons
that communicate with one another. There are three kinds of SSSD processes. One is the sssd process
itself. Its purpose is to read the config file after startup and spawn the other processes according
to it. Then there are the responder, or front end, processes that listen to queries from
applications, like the query that would come from the getent command. If a responder process needs
to contact the remote service for data, it talks to the last SSSD process type, the data provider,
or back end, process. This architecture allows for a pluggable setup where different back end
processes talk to different remote servers, while all these remote servers can be accessed
from a range of applications or subsystems by the same lookup code in the responders.

Each process is represented by a section in the sssd.conf config file. The main sssd process is
represented by the [sssd] section. The front end processes are defined on the services line in the
[sssd] section and each can be configured in a section named after the service. Finally, the
back end processes are those configured in the [domain] sections. Each process also logs to its
own logfile.
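Putting that mapping together, a minimal sssd.conf could look like the sketch below. The domain
name ipa.example.com and the exact options are placeholders, not a drop-in configuration:

```ini
[sssd]
# read by the main sssd process at startup; one front end
# process is spawned per entry on the services line
services = nss, pam
domains = ipa.example.com

[nss]
# options for the sssd_nss front end process

[pam]
# options for the sssd_pam front end process

[domain/ipa.example.com]
# options for the back end process serving this domain
id_provider = ipa
```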

Let's continue with the getent passwd admin example. To illustrate the flow, there is a diagram
that the text follows. The full arrows represent local I/O operations (like opening a file), the
empty arrows represent local IPC over UNIX sockets and the dotted arrow represents network I/O.

The user issued the getent command, which calls libc's getpwnam() (diagram step 1); libc then opens
the nss_sss module as per nsswitch.conf and passes in the request. First, the nss_sss memory-mapped
cache is consulted; that's step 2 on the diagram. If the data is present in the cache, it is
returned without even contacting the SSSD, which is extremely fast. Otherwise, the request is passed
to the SSSD's responder process (step 3), in particular sssd_nss. The request first looks into the
SSSD on-disk cache (step 4). If the data is present in the cache and valid, the nss responder reads
it from the cache and returns it to the application.

If the data is not present in the cache at all, or if it's expired, the sssd_nss request queries
the appropriate back end process (step 5) and waits for a reply. The back end process connects to the
remote server, runs the search (step 6) and stores the resulting data into the cache (step 7). When
the search request is finished, the provider process signals back to the responder process that the
cache is updated (step 8). At that point, the front-end responder process checks the cache again.
If there's any data in the cache after the back end has updated it, the data is returned to the
application, even in cases when the back end failed to update the cache for some reason; it's
better to return stale data than none. Of course, if no data is found in the cache after the back
end has finished, an empty result is returned. This final cache check is represented by step
9 in the diagram.

When I said the back end runs a search against the server, I simplified the matter a lot.
The search can involve many different steps, such as resolving the server to connect to,
authenticating to the server, performing the search itself and storing the resulting data into the
database. Some of the steps might even require a helper process; for instance, authenticating
against a remote server using a keytab is done in a helper process called ldap_child that logs to
its own logfile, /var/log/sssd/ldap_child.log.

Since most of these steps happen in the back end itself, the problem or misconfiguration most often
lies in the back end part. But it is still very important to know the overall architecture and be
able to identify if and how the request made it to the back end at all. In the next part, we'll
apply this new information to perform a small case study and repair a buggy SSSD setup.

Troubleshooting a failing SSSD user lookup

With the SSSD architecture in mind, we can try a case study. Consider an IPA client where no users,
not even the admin, show up:
$ getent passwd admin
$ echo $?
2

The admin user was not found! Given our knowledge of the architecture, let's first see if the system
is configured to query sssd for user information at all:

$ grep passwd /etc/nsswitch.conf
passwd: files sss

It is. So the request was passed on to the nss responder process, since the only other possibility
is a successful return from the memory cache. We need to raise the debug_level in the [nss] section
like this:

[nss]
debug_level = 7

and restart sssd:

# systemctl restart sssd

Then we'll request the admin user again and inspect the NSS logs:

[sssd[nss]] [accept_fd_handler] (0x0400): Client connected!
[sssd[nss]] [sss_cmd_get_version] (0x0200): Received client version [1].
[sssd[nss]] [sss_cmd_get_version] (0x0200): Offered version [1].
[sssd[nss]] [nss_cmd_getbynam] (0x0400): Running command [17] with input [admin].
[sssd[nss]] [sss_parse_name_for_domains] (0x0200): name 'admin' matched without domain, user is
admin
[sssd[nss]] [nss_cmd_getbynam] (0x0100): Requesting info for [admin] from []
[sssd[nss]] [nss_cmd_getpwnam_search] (0x0100): Requesting info for [admin@ipa.example.com]
[sssd[nss]] [sss_dp_issue_request] (0x0400): Issuing request for [0x4266f9:1:admin@ipa.example.com]
[sssd[nss]] [sss_dp_get_account_msg] (0x0400): Creating request for
[ipa.example.com][4097][1][name=admin]
[sssd[nss]] [sss_dp_internal_get_send] (0x0400): Entering request
[0x4266f9:1:admin@ipa.example.com]
[sssd[nss]] [sss_dp_get_reply] (0x1000): Got reply from Data Provider - DP error code: 1 errno: 11
error message: Fast reply - offline
[sssd[nss]] [nss_cmd_getby_dp_callback] (0x0040): Unable to get information from Data Provider
Error: 1, 11, Fast reply - offline
Will try to return what we have in cache

Well, apparently the request for the admin user was received and passed on to the back end process,
but the back end replied that it switched to offline mode. That means we need to also enable
debugging in the domain part and continue the investigation there. We need to add debug_level to
the [domain] section and restart sssd again. Then run the getent command and inspect the file called
/var/log/sssd/sssd_ipa.example.com.log, starting at the time that corresponds to the NSS responder
sending the request to the back end (as indicated by sss_dp_issue_request in the nss log). In the
domain log we see:

[sssd[be[ipa.example.com]]] [fo_resolve_service_done] (0x0020): Failed to resolve server
'master.ipa.example.com:389': Domain name not found
[sssd[be[ipa.example.com]]] [set_server_common_status] (0x0100): Marking server
'master.ipa.example.com:389' as 'not working'
[sssd[be[ipa.example.com]]] [be_resolve_server_process] (0x0080): Couldn't resolve server
(master.ipa.example.com:389), resolver returned (11)
[sssd[be[ipa.example.com]]] [be_resolve_server_process] (0x1000): Trying with the next one!
[sssd[be[ipa.example.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
[sssd[be[ipa.example.com]]] [get_server_status] (0x1000): Status of server
'master.ipa.example.com:389' is 'not working'
[sssd[be[ipa.example.com]]] [get_port_status] (0x1000): Port status of port 0 for server '(no name)'
is 'not working'
[sssd[be[ipa.example.com]]] [get_server_status] (0x1000): Status of server
'master.ipa.example.com:389' is 'not working'
[sssd[be[ipa.example.com]]] [fo_resolve_service_send] (0x0020): No available servers for service
'IPA'
[sssd[be[ipa.example.com]]] [be_resolve_server_done] (0x1000): Server resolution failed: 5
[sssd[be[ipa.example.com]]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline
(5 [Input/output error])
[sssd[be[ipa.example.com]]] [be_ptask_create] (0x0400): Periodic task [Check if online (periodic)]
was created
[sssd[be[ipa.example.com]]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]:
scheduling task 70 seconds from now [1426087775]
[sssd[be[ipa.example.com]]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks.

OK, that gets us somewhere. Indeed, our /etc/resolv.conf file was pointing to a bad nameserver.
After fixing the resolver settings and restarting SSSD, everything seems to be working:

$ getent passwd admin
admin:*:1546600000:1546600000:Administrator:/home/admin:/bin/bash
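Before the fix, the resolver problem could also have been confirmed entirely outside SSSD. This
small sketch uses getaddrinfo(), which goes through the same system resolver configured in
/etc/resolv.conf; master.ipa.example.com is just the server name from the example above:

```c
#include <netdb.h>
#include <stdio.h>

/* Return 1 if the system resolver can resolve the hostname,
 * 0 otherwise, printing the resolver error on failure. */
static int can_resolve(const char *host)
{
    struct addrinfo *res = NULL;
    int rc = getaddrinfo(host, NULL, NULL, &res);

    if (rc != 0) {
        fprintf(stderr, "%s: %s\n", host, gai_strerror(rc));
        return 0;
    }
    freeaddrinfo(res);
    return 1;
}

int main(void)
{
    const char *host = "master.ipa.example.com"; /* server from the example */
    printf("%s %s\n", host,
           can_resolve(host) ? "resolves" : "does not resolve");
    return 0;
}
```

If this check fails with something like "Domain name not found", the problem lies in the system's
DNS configuration rather than in SSSD itself.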