Troubleshoot node connectivity issues in Ansible Automation Platform controller

Check connectivity to hosts you're managing with your AAP controller and get a spreadsheet-based summary of any exceptions.

Imagine you have hundreds or thousands of hosts to manage from your Ansible Automation Platform (AAP) controller, but you cannot reach some of them. It could be because firewalls are blocking you, or maybe the service account or sudo is not yet configured on your managed nodes. Another possibility is that the environment changed, and suddenly you cannot automate some of your nodes.

Check the documentation for more information about what AAP requires to connect to its targets.

If you have only a handful of exceptions, you can just grab the output of your Ansible playbook and check them case by case.

But what if you have dozens or hundreds of cases to investigate? Wouldn't it be nice to have a summary of all these exceptions that you could open in a spreadsheet and distribute to your fellow sysadmins and network subject matter experts to help you?

Read on to learn how I solved this issue in three steps.

1. Check connectivity to the targets

First, I wrote a playbook to check connectivity to my targets:

---
- name: Check Connectivity and Report
  hosts: nodes
  gather_facts: false   # fact gathering would fail on unreachable hosts before task 01 runs
  tasks:

    - name: 01 - Test Connectivity
      ansible.builtin.ping:
      register: connectivity
      ignore_unreachable: true   # keep unreachable hosts in the play so they appear in the report

    # Build one "host;result" string per host. A successful ping returns no msg, so it defaults to OK.
    - name: 02 - Save summary of connectivity check
      ansible.builtin.set_fact:
        summary: "{{ (summary | default([])) + [item + ';' + _result] }}"
      vars:
        # splitlines() | join() collapses a multi-line error message into a single line
        _result: "{{ (hostvars[item]['connectivity']['msg'] | default('OK')).splitlines() | join() }}"
      loop: "{{ ansible_play_hosts }}"
      delegate_to: localhost
      run_once: true

    - name: 03 - Show result
      ansible.builtin.debug:
        msg: "{{ summary }}"
      delegate_to: localhost
      run_once: true

    # aapwork is a server from which the file can be retrieved later
    - name: 04 - Save result to csv file
      ansible.builtin.copy:
        content: "{{ (summary | sort | join('\n')) + '\n' }}"
        dest: /tmp/connectivity_test.csv
      delegate_to: aapwork
      run_once: true
...

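The playbook runs against all the hosts in my nodes group. I explicitly set gather_facts to false because implicit fact gathering would hit the unreachable hosts first and drop them from the play. Instead, I want the connectivity test to happen in a task that carries a special flag (ignore_unreachable).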
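The play targets a group named nodes, and task 04 delegates to a host named aapwork, so both names need to resolve in the inventory. Here is a minimal sketch in Ansible's YAML inventory format. Every hostname and address in it is a placeholder, not taken from my environment:

```yaml
# Hypothetical inventory; all hostnames and addresses below are placeholders.
all:
  hosts:
    # Server that task 04 delegates to when it writes the CSV file
    aapwork:
      ansible_host: aapwork.example.com
  children:
    # Group targeted by "hosts: nodes" in the play
    nodes:
      hosts:
        node01.example.com:
        node02.example.com:
        node03.example.com:
```

From the command line, you could pass a file like this with ansible-playbook -i followed by the inventory path. In AAP, the same group and host would be defined in the inventory attached to the job template.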

Some comments about the tasks:

  • 01 - Test Connectivity: ignore_unreachable is set to true. Without this, the playbook would not execute the remaining tasks for an unreachable node. The next two tasks run on localhost rather than on the node itself, but that is all I need, because they only use the connectivity test's registered results to build my summary.
  • 02 - Save summary of connectivity check: This executes after all nodes are tested. I run a loop based on ansible_play_hosts (an Ansible magic variable listing the hosts still active in the current play; because of ignore_unreachable, that includes the unreachable ones). For each host, I append an element to the list named summary, made up of the hostname, a semicolon, and either the error message or OK. I used splitlines() and the Jinja2 join filter to flatten the message onto one line in case it contains line feeds. This summarization task includes:
    • delegate_to: localhost, which runs the task on localhost (the AAP controller or Ansible control node).
    • run_once: true, because the loop already processes the whole list of hosts, so I only need to invoke the task once (instead of once per host).
  • 03 - Show result: This is a simple display of the list accumulated in the previous task. It is also executed only once and on localhost. (And yes, these two tasks could be coded as a block.)
  • 04 - Save result to a CSV file: The last task writes the summary list to a file on an external server (aapwork in my example). Here are some important aspects:
    • I executed this in my AAP, so localhost is my execution environment (EE). This is why I want to write the file to a server I can connect to later to retrieve the output file. Saving a file and trying to retrieve it from an EE would require additional steps, which are not necessary for this use case.
    • If you run this playbook from the command line, it is OK to use localhost as the delegated host in this task because it is easy to get the file manually.
    • The Jinja2 expression sorts the output and writes each list item as a line in the file.
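As mentioned in the comments on task 03, tasks 02 and 03 could be coded as a block. The following is a minimal sketch of that variant, not something I ran for this article. It assumes that delegate_to and run_once declared on the block are inherited by both tasks, which is how Ansible treats these keywords on blocks. The block's name is made up for illustration, and task 04 is left out for brevity:

```yaml
---
- name: Check Connectivity and Report (block variant, sketch)
  hosts: nodes
  gather_facts: false
  tasks:

    - name: 01 - Test Connectivity
      ansible.builtin.ping:
      register: connectivity
      ignore_unreachable: true

    # delegate_to and run_once are declared once here and
    # inherited by every task inside the block.
    - name: Summarize connectivity results (hypothetical block name)
      delegate_to: localhost
      run_once: true
      block:

        - name: 02 - Save summary of connectivity check
          ansible.builtin.set_fact:
            summary: "{{ (summary | default([])) + [item + ';' + _result] }}"
          vars:
            _result: "{{ (hostvars[item]['connectivity']['msg'] | default('OK')).splitlines() | join() }}"
          loop: "{{ ansible_play_hosts }}"

        - name: 03 - Show result
          ansible.builtin.debug:
            msg: "{{ summary }}"
...
```

The behavior should match the original tasks. The only gain is that the two keywords are no longer repeated on each task.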

2. Execute the playbook

Here's a look at the playbook's execution in AAP:

Image: Screenshot of execution in Ansible Automation Platform (Roberto Nozaki, CC BY-SA 4.0)

Notice that the playbook finished successfully (because I had the ignore_unreachable option set to true).

Also, in my limited inventory, I had one case of "Invalid/incorrect password" and another case of "Failed to connect to the host via ssh."

In a more realistic environment, I would have many more hosts and issues to analyze, which is where this playbook could be really useful.

3. View the output in a spreadsheet

In the last task, I wrote a CSV file, which I grabbed and opened using a spreadsheet application.

Open (or import) the CSV file in your favorite spreadsheet tool. Remember to select the semicolon character (and only it) as the field separator because my playbook uses it to join the hostname and the result in task 02 - Save summary of connectivity check.
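If you would rather check the report without leaving Ansible, the same semicolon separator applies. Here is a rough sketch that reads the file back and shows only the exceptions. It assumes the community.general collection is installed and that the report is still at /tmp/connectivity_test.csv on aapwork. The column names host and result are my own labels, since the file has no header row:

```yaml
---
- name: Review the connectivity report (sketch)
  hosts: aapwork
  gather_facts: false
  tasks:

    - name: Parse the semicolon-separated report
      community.general.read_csv:
        path: /tmp/connectivity_test.csv
        delimiter: ";"
        # The file has no header row, so column names are supplied here.
        # "host" and "result" are illustrative labels.
        fieldnames:
          - host
          - result
      register: report

    - name: Show only the hosts that did not answer OK
      ansible.builtin.debug:
        msg: "{{ report['list'] | rejectattr('result', 'equalto', 'OK') | list }}"
...
```

One caveat applies to the spreadsheet as much as to this sketch: an error message that itself contains a semicolon would be split into extra fields.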

Image: Screenshot of a small part of a spreadsheet listing errors (Roberto Nozaki, CC BY-SA 4.0)

Wrap up

In a scenario where you could have many different issues for many hosts, having a summary like this in a spreadsheet might be really helpful.

Connectivity problems to your managed hosts can happen at the beginning of a project when groups of hosts are added (during the acquisition of another company, for example) or due to network, firewall, or security changes. If this happens to you, this troubleshooting method may help you identify the source of your problems more efficiently.


Roberto Nozaki

Roberto Nozaki (RHCSA/RHCE/RHCA) is an Automation Principal Consultant at Red Hat Canada, where he specializes in IT automation with Ansible.
