Ansible: Using Host Info on Other Hosts

Problem: you want to use data for one set of hosts to configure something on a different set of hosts. Specifically, there are some values in the host variables of the first group that you need to collect and use on one or more other hosts.

If you recall my previous post on registering multiple results in a variable, I had a YAML data structure there that described the set of database instances on a host:

db_instances:
  instance1:
    shmmem: 8G
    uid: 6953
  instance2:
    shmmem: 16G
    uid: 6954

This is the same data structure, but with an extra field shown: the ‘uid’ value is the hardcoded user ID of the instance user account (which is named after the instance). Each database instance runs under a dedicated account (to which we apply the project resource limits previously covered). So on the host, we create user accounts based on the information for each instance:

- name: create DB instance user
  user:
    name: "{{ item.key }}"
    uid: "{{ item.value.uid }}"
    comment: "{{ item.key }} database user"
  with_dict: "{{ db_instances }}"

This is fine, but each of these hosts is actually a Solaris virtual zone running on one of a set of physical, clustered nodes. By default in Solaris, listing the processes in the global zone (the top-level, ‘parent’ OS instance for the physical server) also shows all the processes running in the local zones attached to that node; crucially, the owning user IDs of those processes are displayed numerically, because those users don’t exist in the global zone’s namespace. The database administrators find this confusing and ugly; to fix it, we’ll need to create the same accounts with the same UIDs (for all the instances on all hosts) in the global zones. (Note that each instance UID is unique across the entire cluster.)

To achieve this in Ansible, we’ll need to access the db_instances data for all the database hosts, but use it on a different set of hosts. Normally, host variable data is considered specific to that host. My first thought was to reference it via the special hostvars dictionary. That turned out to be a non-starter, since I’d need to loop through all the instances within the hostvars entries for all the relevant hosts, within a third loop applying that to each global zone. Ansible’s task loops are quite extensive, but the more complex ones, such as with_nested, operate on separate lists rather than dictionaries. Both of the structures in question here (hostvars and db_instances) are dictionaries, and the elements we want from hostvars are nested. with_subelements can sort of handle this, but not the outermost loop as well. (This is a good reason to sometimes prefer lists for structured data even when a dictionary feels more appropriate - there are more methods available to parse and iterate lists than dictionaries; see the sketch below.)
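For comparison, here’s a minimal sketch of how with_subelements can walk a nested list structure in a single loop. It’s purely hypothetical - the db_hosts_with_instances variable and its layout are invented for illustration and aren’t part of the role:

# Hypothetical list-shaped data, e.g. defined in group_vars:
#
# db_hosts_with_instances:
#   - host: dbzone1
#     instances:
#       - { name: instance1, uid: 6953 }
#       - { name: instance2, uid: 6954 }

- name: show each host/instance pairing (illustration only)
  debug:
    msg: "{{ item.0.host }} runs {{ item.1.name }} (uid {{ item.1.uid }})"
  with_subelements:
    - "{{ db_hosts_with_instances }}"   # outer list: one entry per host
    - instances                         # key naming the sub-list to iterate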

It was then that I discovered playbook delegation, which allows you to run a task intended for the current host in play against a different host altogether. (One of the things I really like about Ansible is that one can always find a suitable filter or module to achieve even quite complex tasks that initially appear insurmountable. Studying the documentation in detail helps, but quite often some facets are only referenced in forum examples and ServerFault answers.)
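As a quick sketch of what delegation looks like (hypothetical details, not part of the role described here): a play targeting a group of hosts can have a per-host check executed from the control machine instead.

- name: wait for each host in the play to come back up (sketch only)
  wait_for:
    host: "{{ inventory_hostname }}"  # check the host currently in play...
    port: 22
    state: started
  delegate_to: localhost              # ...but run the check from the control node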

The example use for delegation given in the manual is to update a central load balancer configuration when a backend node is added or removed. However, I can use it here to run the user creation task against the global zones as well as the database host zone. Here’s another task to do this:

- name: create dummy accounts for all database instance users on nodes
  delegate_to: "{{ item.0 }}" # run task on node, not this host
  user:
    name: "{{ item.1 }}"  # instance name (user name)
    uid: "{{ db_instances[item.1].uid }}" # instance UID value
    comment: "{{ item.1 }} database user"
    shell: '/bin/false'
  # run in nested loops:
  # for each <global_zone>:
  #   for each <db_instance>:
  with_nested:
    - "{{ global_zones }}"
    - "{{ db_instances.keys() }}"

It’s critical to note that this task is run against each database host. global_zones is a list of the hostnames for all the nodes; it’s actually taken from an existing Ansible host group, extracted from the groups dictionary. Within a nested loop, we delegate the user creation task to each global zone in turn, looping through the database instances defined for the host and creating ‘dummy’ accounts based on the instance name and assigned UID as before. These dummy accounts have an invalid shell (and could also share a common home directory), since they shouldn’t be used for login or program execution on the global zones; they’re purely to give the ps command something to map numeric UIDs against sensibly.
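For reference, global_zones might be populated along these lines (a sketch - the inventory group name ‘global_zones’ is my assumption here, not necessarily what the real inventory calls it):

# e.g. in group_vars or role defaults: build the list from an inventory group
global_zones: "{{ groups['global_zones'] }}"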

We loop over the defined instances by extracting the names of the keys from the db_instances dictionary; the keys() method does this (and is a good example of a feature that is discussed in forums but not covered in the Ansible documentation, since it’s actually a Python dict method inherited from the underlying implementation). Within the loop, we can then use this instance name (assigned to item.1) to look up the uid value in the dictionary.

This solution works well, but one aspect of it makes me slightly uneasy. The use of delegation appears to break the (informal) contract with the Ansible user that a role is normally expected to operate (i.e. change state) only on the host(s) to which it is applied. In this case, we apply the role containing the task to one set of hosts (the database zones) but it actually creates users on a second, unrelated set of hosts (the global zones). True, the global_zones list has to be explicitly defined and passed across, but it feels somewhat non-intuitive. Additionally, a limited set of hosts specified via the --limit option to ansible-playbook will not work as might be expected in this case, since it won’t affect the global_zones list. (I did try using the intersection of this list with the ansible_play_hosts list, but the latter only contains the name of the host currently executing the role at any one time rather than all the hosts in scope; we need an ansible_limit_hosts magic variable too.)
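For the record, the intersection attempt amounted to swapping the first loop list in the task above for something like this (a rough reconstruction of what I tried):

  with_nested:
    - "{{ global_zones | intersect(ansible_play_hosts) }}"  # only zones also in the play
    - "{{ db_instances.keys() }}"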

In Ansible terms, it would be more typical if the role were applied to the global zones and referenced data from the database zones - but that gets us back to trawling through hostvars (oh, if only I hadn’t chosen a dictionary!). What I’ve done here isn’t against the letter of our coding standard, but it arguably violates the spirit of the standard - that tasks and roles should be as simple and transparent in their use as possible - to a significant degree. (As I don’t kowtow religiously to that spirit anyway, I settled for adding a bold warning to the README - caveat emptor!)