Definitions: Zones, Databases and Datasets

This section defines the terms zone, database and dataset, and clarifies how they differ and what they mean in the context of the Spamhaus data.

  • A zone is a DNS or HTTP API endpoint used to access a set of one or more databases. For example, zen is the endpoint that exposes the three databases sbl, xbl and pbl together, so that all of them can be queried with a single request.

  • A database is a corpus of data that is distributed as a single entity and is composed of one or more datasets. For example, the sbl database groups together the actual sbl dataset, the css dataset and the DROP dataset. Usually the records returned by each dataset can be distinguished by their return code, which allows the querying software to take different actions depending on which dataset matched the query.

  • A dataset is a set of records that share the same purpose and policy, and are usually built by the same processes. Sometimes a dataset hides an additional layer of datasets that, for technical reasons, have separate origins but are presented as a single entity because the consumer can treat them the same way. An example is css, which is actually composed of two separate datasets: css4 (for IPv4 data) and css6 (for IPv6 data). This last level is usually invisible to the end user and is reported here only for the sake of completeness.
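
The three levels defined above can be sketched as a small data model. This is only an illustration of the hierarchy, not how the data is actually stored or distributed: the class names (Zone, Database, Dataset), the parts field and the dataset_names helper are hypothetical, and the datasets of xbl and pbl are left empty because this section does not list them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Dataset:
    name: str
    # Internal sub-datasets, normally invisible to the consumer (e.g. css4/css6).
    parts: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Database:
    name: str
    datasets: Tuple[Dataset, ...]


@dataclass(frozen=True)
class Zone:
    name: str
    databases: Tuple[Database, ...]


# The sbl database groups three datasets, as described in the text.
sbl = Database(
    "sbl",
    (
        Dataset("sbl"),
        Dataset("css", parts=("css4", "css6")),
        Dataset("DROP"),
    ),
)

# Placeholders: the datasets of xbl and pbl are not enumerated in this section.
xbl = Database("xbl", ())
pbl = Database("pbl", ())

# The zen zone exposes the three databases through a single endpoint.
zen = Zone("zen", (sbl, xbl, pbl))


def dataset_names(zone: Zone) -> List[str]:
    """List every dataset reachable through a zone, in declaration order."""
    return [ds.name for db in zone.databases for ds in db.datasets]
```

With this model, dataset_names(zen) walks from the zone through its databases down to the datasets, which mirrors the fact that a single query against zen can match any dataset in any of the databases it exposes.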

The distinction above can be confusing, because some zones share their name with a database and a dataset. This is for historical reasons: the SBL was originally a database consisting of a single dataset and published as a zone. Over time, other datasets were created and added to the same zone.

Usually this is not a big issue: when consuming the data, all the querier sees, and should care about, is the return code received in reply to a query, and this is independent of the zone the query was performed against.
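
As a rough sketch of what this means for querying software, the function below decides which dataset matched by looking only at the return codes, without taking the queried zone as input. The names RETURN_CODE_TO_DATASET and matched_datasets are hypothetical, and the return code values in the table are assumptions used for illustration; they should be checked against the published return code documentation before being relied upon.

```python
from typing import Dict, Iterable, List

# Illustrative table only: these return code values are assumed for the
# example and must be verified against the official return code list.
RETURN_CODE_TO_DATASET: Dict[str, str] = {
    "127.0.0.2": "sbl",
    "127.0.0.3": "css",
    "127.0.0.9": "DROP",
}


def matched_datasets(return_codes: Iterable[str]) -> List[str]:
    """Map the return codes of a reply to dataset names.

    The zone that was queried is deliberately not a parameter: the
    mapping depends on the return code alone. Unknown codes are skipped.
    """
    return [
        RETURN_CODE_TO_DATASET[code]
        for code in return_codes
        if code in RETURN_CODE_TO_DATASET
    ]
```

The same function can therefore be applied to a reply from zen or from a zone exposing a single database, and the caller can branch on the resulting dataset names to apply a different policy for each.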

References to, for example, “listed by sbl” should therefore be understood as “listed by the sbl dataset”, unless specified otherwise.