LDAP 101: It’s A Database, Stupid
We’re doing an LDAP-for-authentication rollout at my day job – the sort of thing there are lots of docs about already. One of the things we’ve run into is the lack of a single, complete document describing the whole tool ecosystem, from what LDAP is and how it works all the way through to how to use it to authenticate users.
So I thought I’d write one.
This post will cover some introductory knowledge about LDAP. Subsequent posts will introduce some LDAP tools and go into more detail on the data stored in a directory and on implementing an authentication system.
“LDAP” is an extensive subject, and I’m not going to try to cover every aspect of it. (For that, see the links at the bottom of this post.) I’ll be demonstrating simple bind authentication, without SASL or Kerberos/GSSAPI, and I won’t be going into much detail outside of users and groups. In particular, I won’t be covering much history, and I won’t be covering Active Directory (the other widely deployed authentication and directory service built on LDAP).
LDIF
I’ll be using LDIF (the LDAP Data Interchange Format) to describe LDAP data throughout this series. Don’t worry too much about the details for now; I’ll cover LDIF in more depth later on. All you need to understand at this point is how attributes and objects appear in simple, common cases.
It’s A Database, Stupid
LDAP is a protocol and a data model for accessing a hierarchical (tree-shaped) database. The database itself is normally called a “directory” (hence the name), a bit of nomenclature inherited from LDAP’s predecessor, X.500, and it is populated with objects that are bundles of attributes. LDAP supports direct lookup of an object by name, searches through a subtree or through the whole directory for objects matching a sophisticated pattern language, atomic updates to part or all of an individual object, and network federation.
Attributes
LDAP’s data model is built on top of attributes. Attributes form the basis of object names, store data for user applications, and store data maintained by and for the directory service itself.
Each attribute has a name and a value. Some attributes can appear multiple times in the same object; others can appear only once.
In LDIF, each attribute is written at the start of a line: the attribute name, followed by a colon, followed by the attribute value. A long value can span multiple lines; a line that begins with a single space is a continuation of the preceding line’s attribute value, and that leading space is discarded.
This LDIF fragment describes a single attribute:
givenName: John
This is a pair of attributes:
givenName: John
surname: Franklin
This is an attribute with multiple values:
mail: john@example.com
mail: jfranklin@example.net
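To make the line-folding and multi-value rules concrete, here’s a minimal Python sketch of reading attribute lines. It assumes plain name: value lines only; base64 values (written with ::), URL values and comments, which full LDIF allows, are not handled, and the function names are my own.

```python
from collections import defaultdict


def unfold(lines):
    """Join folded LDIF lines: a line starting with a single space continues
    the previous line, and that one leading space is discarded."""
    logical = []
    for line in lines:
        if line.startswith(" ") and logical:
            logical[-1] += line[1:]
        else:
            logical.append(line)
    return logical


def parse_attributes(text):
    """Parse 'name: value' lines into {name: [values]}.

    Repeated names accumulate, so multi-valued attributes keep every value.
    Sketch only: no base64 ('::'), URL ('<') or comment lines.
    """
    attrs = defaultdict(list)
    for line in unfold(text.splitlines()):
        if not line:
            continue  # blank lines separate objects; ignored here
        name, _, value = line.partition(":")
        attrs[name].append(value.lstrip(" "))
    return dict(attrs)


example = """givenName: John
surname: Franklin
mail: john@example.com
mail: jfranklin@example.net
description: A value that is folded
  onto a second line"""

print(parse_attributes(example))
```

Note that the folded description line keeps one space between “folded” and “onto”: the continuation line starts with two spaces, and only the first is removed.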
Attributes are divided into two groups: user attributes, which hold data for applications using the directory, and operational attributes, which hold data for use by the directory system.
Names
Every object in an LDAP directory has a distinguished name (DN) which identifies its location in the directory’s tree. An object’s name is a sequence of attributes, written with the root of the tree on the right. In a DN, each attribute is written in name=value form rather than LDIF’s name: value form, and the attributes are separated by commas.
This is a DN:
givenName=John,o=Example
DNs have a parent-and-child relationship. In the example above, givenName=John,o=Example is the DN of a child of the object named o=Example. Directory trees can have very deep hierarchies of names, or very shallow hierarchies of names, depending on the needs of the application.
Every directory tree has a special DN called its base DN which acts as the root of the directory tree. The base DN of a directory tree may contain multiple segments, which would imply that it has parents, but the parent objects do not need to exist in any directory tree. Every directory service has a second special DN, identified by the empty string, naming the root of the directory service.
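The parent/child structure of DNs can be sketched in a few lines of Python. This is a simplified illustration, assuming DN values contain no RFC 4514 escapes (such as an escaped comma) and no multi-valued RDNs joined with +; real DN handling also applies each attribute’s matching rules, which this sketch does not.

```python
def split_dn(dn):
    """Split a DN into (attribute, value) pairs, most specific first.

    Assumes no RFC 4514 escapes and no multi-valued RDNs ('+').
    """
    if dn == "":
        return []  # the empty DN names the root of the directory service
    pairs = []
    for rdn in dn.split(","):
        name, _, value = rdn.partition("=")
        pairs.append((name.strip(), value.strip()))
    return pairs


def parent_dn(dn):
    """Drop the leftmost RDN; a one-segment DN's parent is the root DN ''."""
    _, _, rest = dn.partition(",")
    return rest.lstrip()


def is_descendant(dn, ancestor):
    """True if dn sits strictly below ancestor in the tree.

    Attribute names compare case-insensitively; values compare exactly,
    a simplification of the schema's matching rules.
    """
    d = [(n.lower(), v) for n, v in split_dn(dn)]
    a = [(n.lower(), v) for n, v in split_dn(ancestor)]
    return len(d) > len(a) and d[len(d) - len(a):] == a


print(parent_dn("givenName=John,o=Example"))
print(is_descendant("givenName=John,o=Example", "o=Example"))
```

Because every non-empty DN ends in the empty sequence of segments, is_descendant treats everything in the directory as a descendant of the root DN "".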
Objects
Objects hold the data stored in the directory. Each object has a name, distinct from the names of all other objects, and a suite of attributes describing the object.
In LDIF, objects are written as a list of attributes, starting with a dn attribute holding the object’s DN. Multiple objects can appear in a single LDIF file, separated by a blank line.
This is a pair of objects:
dn: o=Example
objectClass: organization
o: Example

dn: commonName=John Franklin,o=Example
objectClass: person
givenName: John
surname: Franklin
commonName: John Franklin
(The dn attribute in the LDIF description is not an attribute of the object, but rather part of the LDIF language. It must appear first in any LDIF representation.)
Every object has a user attribute named objectClass, which identifies what kind of data the object holds.
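Here’s a minimal Python sketch of how a reader might split an LDIF file into objects, keyed by DN. It assumes simple content records only: records separated by blank lines, each starting with a dn line. It deliberately skips continuation lines, comments, base64 values and change records, and the function name is my own.

```python
def parse_ldif(text):
    """Parse LDIF content records into {dn: {attribute: [values]}}.

    Sketch only: records are separated by blank lines and must start with
    a dn line. Folded lines, comments, base64 ('::') values and change
    records are not handled.
    """
    entries = {}
    record = []
    # A trailing blank line flushes the final record.
    for line in text.splitlines() + [""]:
        if line.strip():
            record.append(line)
            continue
        if not record:
            continue  # extra blank lines between records
        name, _, dn = record[0].partition(":")
        if name.strip().lower() != "dn":
            raise ValueError("each LDIF record must start with a dn line")
        attrs = {}
        for attr_line in record[1:]:
            attr, _, value = attr_line.partition(":")
            attrs.setdefault(attr.strip(), []).append(value.lstrip())
        # Every object needs at least one objectClass value.
        if not any(a.lower() == "objectclass" for a in attrs):
            raise ValueError(f"{dn.strip()} has no objectClass attribute")
        # The dn line names the object; it is not stored as an attribute.
        entries[dn.strip()] = attrs
        record = []
    return entries


sample = """dn: o=Example
objectClass: organization
o: Example

dn: commonName=John Franklin,o=Example
objectClass: person
givenName: John
surname: Franklin
commonName: John Franklin
"""

for dn, attrs in parse_ldif(sample).items():
    print(dn, attrs)
```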
Schemata
Unlike more recent key/value databases, LDAP validates stored data against a schema. Each object’s objectClass values identify the schemata that apply to that object.
Object classes are broken down into two groups: structural classes and auxiliary classes. (There’s a third group, abstract classes, which we won’t cover here. Abstract classes do not usually appear in directory objects’ objectClass values.) Every object in the directory must have exactly one structural object class (its objectClass values may also list that class’s structural superclasses), and can have any number of auxiliary classes, including none.
The schema for an object class includes the class’s names (most object classes have a single name, but aliases are permitted, with the first name in the schema being the canonical one), a description, the name of its superclass (the class it extends), whether the class is structural or auxiliary, a list of attributes that objects of the class must have, and a list of attributes they may have.
Each attribute in an object class definition has a corresponding attribute type definition. The attribute type definition includes the attribute’s names (as with object classes, attributes can have multiple names; this is relatively common), a description, the matching rules used to compare values for equality, ordering, and substring matches, the syntax rules for the attribute’s values, a flag indicating whether the attribute is limited to a single value, and a usage field indicating whether it is a user attribute or an operational (directory) attribute.
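Here’s a minimal Python sketch of how a server might apply those definitions when an object is added. The schema is a tiny, partly made-up stand-in: person’s required attributes (cn and sn) follow the standard schema but its optional list is trimmed, and examplePerson and exampleAux are hypothetical classes. Attribute names are assumed to be stored under their canonical names, so aliases such as surname are not resolved here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectClass:
    name: str
    kind: str                      # "structural" or "auxiliary"
    must: frozenset = frozenset()  # attributes objects of this class must have
    may: frozenset = frozenset()   # attributes objects of this class may have
    sup: Optional[str] = None      # superclass whose MUST/MAY lists are inherited


# Trimmed-down, partly hypothetical schema; real servers load far larger ones.
SCHEMA = {c.name.lower(): c for c in [
    ObjectClass("person", "structural",
                must=frozenset({"cn", "sn"}),
                may=frozenset({"description", "telephoneNumber", "userPassword"})),
    ObjectClass("examplePerson", "structural",  # hypothetical subclass
                may=frozenset({"givenName", "mail"}), sup="person"),
    ObjectClass("exampleAux", "auxiliary",      # hypothetical auxiliary class
                may=frozenset({"description"})),
]}


def lineage(name):
    """Yield a class followed by each of its superclasses."""
    while name is not None:
        cls = SCHEMA[name.lower()]
        yield cls
        name = cls.sup


def validate(entry):
    """Return a list of schema violations for entry ({attr: [values]})."""
    classes = [SCHEMA[v.lower()] for v in entry["objectClass"]]
    # Superclasses may be listed next to their subclass; only the most
    # specific structural class counts towards the "exactly one" rule.
    inherited = {sup.name for c in classes for sup in list(lineage(c.name))[1:]}
    structural = [c for c in classes
                  if c.kind == "structural" and c.name not in inherited]
    problems = []
    if len(structural) != 1:
        problems.append(f"expected one structural class, found {len(structural)}")
    must, may = set(), set()
    for c in classes:
        for cls in lineage(c.name):
            must |= cls.must
            may |= cls.may
    present = set(entry) - {"objectClass"}
    problems += [f"missing required attribute {a}" for a in sorted(must - present)]
    problems += [f"attribute {a} not allowed" for a in sorted(present - must - may)]
    return problems


print(validate({"objectClass": ["person"], "cn": ["John Franklin"], "givenName": ["John"]}))
```

Walking the superclass chain is what lets a subclass such as examplePerson inherit person’s required cn and sn without restating them.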
The LDAP specifications (RFC 4510 and friends) provide a set of common object classes and attribute types that are available on most LDAP servers. There are also common schemata for a number of use cases, including acting as an address book, authentication database, or network information service. Most directory servers also allow users to provide their own schemata to customize the directory for their particular needs.
Parent/child relationships between objects are not specified by schema information. Instead, each application and deployment works out the hierarchy that best fits the problem at hand. Several common hierarchy conventions are documented in RFCs and reference manuals, but most LDAP tools are flexible enough to work with a wide variety of tree shapes.
Finding Things
LDAP provides the ability to search for objects within a directory tree. A search request comes with four things:
- The search base DN.
- The search scope.
- A filter.
- A list of attributes to return.
The search base and the search scope combine to limit the parts of the tree that the search will examine. There are four scopes:
- base: Return only the object identified by the search base DN, or no object.
- one: Return only objects that are immediate children of the object identified by the search base DN.
- subtree: Return the object identified by the search base DN and all of its descendants.
- children: Return all descendants of the object identified by the search base DN, but not that object itself. (This is a common extension, rather than part of the LDAP standard.)
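The four scopes can be sketched as a single predicate in Python. This is a simplified illustration with hypothetical names, assuming DNs are compared as plain strings (no case folding, no escaped commas), which a real server would not do.

```python
def depth(dn):
    """Number of RDN segments in a DN; the root DN '' has depth 0."""
    return 0 if dn == "" else len(dn.split(","))


def in_scope(dn, base, scope):
    """Decide whether dn falls inside a search of base with the given scope.

    Naive DN handling: exact string comparison, no escaped commas.
    """
    below = dn != "" if base == "" else dn.endswith("," + base)
    if scope == "base":
        return dn == base
    if scope == "one":
        return below and depth(dn) == depth(base) + 1
    if scope == "subtree":
        return dn == base or below
    if scope == "children":
        return below
    raise ValueError(f"unknown scope {scope!r}")


DNS = [
    "o=Example",
    "ou=People,o=Example",
    "cn=John Franklin,ou=People,o=Example",
    "o=Other",
]

for scope in ("base", "one", "subtree", "children"):
    print(scope, [dn for dn in DNS if in_scope(dn, "o=Example", scope)])
```

The only difference between subtree and children is whether the base object itself is included.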
The directory applies the search filter to the objects that are in the search scope, including only objects that match the filter in the final result. The filter is specified using a small expression language that permits several kinds of matching on attributes. Primitive filters, which match attributes against values or patterns, have the form (NAME OPERATOR VALUE), while composite filters, which combine filters together, have the form (OPERATOR FILTER [FILTER...]).
Some examples:
| Filter | Description |
|---|---|
| (givenName=John) | Match any object with a givenName attribute whose value is equal to John (under the equality rules given in the givenName attribute type definition in the schema). |
| (objectClass=person) | Match any person object. |
| (objectClass=*) | Match any object. (Idiomatic; every object has an objectClass attribute, and * matches all values. This is the default for many tools if no other filter is provided.) |
| (&(objectClass=person)(mail=*@example.com)) | Match any person object with a mail attribute that ends with @example.com. The filters are logically ANDed together. |
The full filter language includes support for partial matches, numerical comparison, presence or absence of an attribute, and several other features that I’m not going to go into here.
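To show how primitive and composite filters nest, here’s a small Python sketch of a filter parser and evaluator covering only the forms in the table above. It handles &, |, !, equality, presence (attr=*) and * wildcards; it assumes no escaped characters, ignores attribute aliases, and compares names and values case-insensitively as a stand-in for the real per-attribute matching rules. The function names are my own.

```python
import re


def parse_filter(text):
    """Parse a filter string into a tree: (op, [children]) or ('=', attr, value)."""
    pos = 0

    def parse():
        nonlocal pos
        if text[pos] != "(":
            raise ValueError(f"expected '(' at {pos}")
        pos += 1
        op = text[pos]
        if op in "&|!":
            pos += 1
            children = []
            while text[pos] == "(":
                children.append(parse())
            if op == "!" and len(children) != 1:
                raise ValueError("'!' takes exactly one filter")
            node = (op, children)
        else:
            end = text.index(")", pos)
            attr, _, value = text[pos:end].partition("=")
            node = ("=", attr, value)
            pos = end
        if text[pos] != ")":
            raise ValueError(f"expected ')' at {pos}")
        pos += 1
        return node

    tree = parse()
    if pos != len(text):
        raise ValueError("trailing characters after filter")
    return tree


def matches(tree, entry):
    """Evaluate a parsed filter against entry ({attr: [values]})."""
    op = tree[0]
    if op == "&":
        return all(matches(child, entry) for child in tree[1])
    if op == "|":
        return any(matches(child, entry) for child in tree[1])
    if op == "!":
        return not matches(tree[1][0], entry)
    _, attr, pattern = tree
    values = next((v for k, v in entry.items() if k.lower() == attr.lower()), [])
    if pattern == "*":
        return bool(values)  # presence test
    # Each '*' matches any run of characters; everything else is literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return any(re.fullmatch(regex, v, re.IGNORECASE) for v in values)


john = {"objectClass": ["person"], "givenName": ["John"],
        "mail": ["john@example.com", "jfranklin@example.net"]}
org = {"objectClass": ["organization"], "o": ["Example"]}

f = parse_filter("(&(objectClass=person)(mail=*@example.com))")
print(matches(f, john), matches(f, org))
```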
Finally, the directory server extracts each of the attributes named in the search request. Searches can ask for attributes by exact name, or by requesting the attribute * (which returns all user attributes) or + (which returns all operational attributes). Attributes with multiple names are normally returned under their canonical name; for example, asking for the surname attribute will normally return attributes named sn under commonly used schemata. The final search result includes the matched objects’ DNs, as well as the requested attributes.
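A rough Python sketch of that attribute-selection step follows. The alias table is a hypothetical, hand-written stand-in for what a server derives from its attribute type definitions, and the entry is assumed to be stored under canonical names. createTimestamp, modifiersName and entryUUID are real operational attributes, but the set here is far from complete. Following the LDAP convention, an empty attribute list is treated like *.

```python
# Hypothetical alias table: lower-cased name or alias -> canonical name.
ALIASES = {"surname": "sn", "sn": "sn", "commonname": "cn", "cn": "cn"}

# A few operational attributes; a real server knows this from its schema.
OPERATIONAL = {"createTimestamp", "modifiersName", "entryUUID"}


def canonical(name):
    return ALIASES.get(name.lower(), name)


def select_attributes(entry, requested):
    """Pick the attributes a search should return from entry.

    '*' selects all user attributes, '+' all operational ones, and other
    names (or their aliases) select individual attributes.
    """
    requested = requested or ["*"]  # empty list means all user attributes
    wanted = {canonical(r).lower() for r in requested if r not in ("*", "+")}
    result = {}
    for name, values in entry.items():
        operational = name in OPERATIONAL
        if (("*" in requested and not operational)
                or ("+" in requested and operational)
                or name.lower() in wanted):
            result[name] = values  # returned under the stored canonical name
    return result


entry = {
    "objectClass": ["person"],
    "cn": ["John Franklin"],
    "sn": ["Franklin"],
    "createTimestamp": ["20240101000000Z"],
}

print(select_attributes(entry, ["surname"]))
print(select_attributes(entry, ["*", "+"]))
```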
Indexing
Like any database system, LDAP directory services generally provide indexing to ensure common searches return quickly, without having to walk the entire directory tree to find the matches. I’ll cover this in more detail for OpenLDAP in a later post.
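The core idea fits in a few lines of Python: an equality index maps each (attribute, value) pair to the DNs that hold it, so an equality filter on an indexed attribute becomes a lookup instead of a walk over every object. This illustrates the concept under simple assumptions (lower-casing as the only normalization, no presence or substring indexes, a hypothetical class name); it is not how OpenLDAP actually stores its indexes.

```python
from collections import defaultdict


class EqualityIndex:
    """Map (attribute, normalized value) -> set of DNs for chosen attributes.

    Normalization is just lower-casing here; a real server applies the
    attribute's equality matching rule.
    """

    def __init__(self, attributes):
        self.attributes = {a.lower() for a in attributes}
        self.postings = defaultdict(set)

    def add(self, dn, entry):
        for name, values in entry.items():
            if name.lower() in self.attributes:
                for value in values:
                    self.postings[(name.lower(), value.lower())].add(dn)

    def lookup(self, attribute, value):
        if attribute.lower() not in self.attributes:
            raise KeyError(f"{attribute} is not indexed; a full scan is needed")
        return set(self.postings.get((attribute.lower(), value.lower()), ()))


idx = EqualityIndex(["objectClass", "mail"])
idx.add("o=Example", {"objectClass": ["organization"], "o": ["Example"]})
idx.add("cn=John Franklin,o=Example",
        {"objectClass": ["person"], "mail": ["john@example.com"]})

print(idx.lookup("objectClass", "Person"))
```

Unindexed attributes raise an error in this sketch to make the cost visible; a real server would fall back to scanning the candidate objects instead.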
Next
Next time, we’ll look at OpenLDAP and the UNIX ldap utility programs, and construct a simple address book.
References
3 Comments

By David Dossot, November 25, 2011 @ 5:06 pm
Thank you for bringing LDAP up.
I wish it would be used more often and I blame myself for not using it every time I have the weakness to create a “user” (or “player”) table in a DB.
LDAP servers are solid and scalable, and the protocol itself is versatile enough to support most use cases, so why am I still creating user tables? Why, despite wanting to use it, do I still not do it?
Maybe because the protocol is borderline fuggly and the client libraries out there suck bags? Maybe I’m just lazy and a tainted LDAP aficionado :P
By Owen, November 26, 2011 @ 2:00 pm
I’ve actually tried to use LDAP in anger for some of my own data storage needs (specifically, users) and run into aggravating problems that sent me back to other databases. I really want to like LDAP more than I do, but I keep being frustrated by:
- Extremely limited transaction support. In particular, no XA support in common implementations. (I don’t think pervasive XA is a good thing, but one data store and one message queue in the same transaction is a common pattern in my code and it works well.)
- Difficulty rolling out custom schemata. The mechanism for adding a new schema is implementation-dependent, and frequently quite invasive (for example, in OpenLDAP you need to modify the server configuration to add a new schema).
- Deeply legacy assumptions embedded in standard schemata. The person/organizationalPerson/inetOrgPerson schemata, which are used very widely, contain most of these assumptions about names.
- Poor client library support. I took a good run at using Spring-LDAP to handle the authentication backend for something I’m working on, and could not figure out how to delete an attribute. That, plus the lack of ODirectoryM libraries that really understand things like attribute aliases, makes writing general data-access code against LDAP pretty obnoxious.
By David Dossot, December 13, 2011 @ 5:16 pm
Any experience/thoughts about Apache Directory or 389 Directory Server? They seem to both support dynamic schema reconfiguration.