Start a long-running script on Windows and detach

Basically, the need is to start a bat file in a particular working directory and finish the playbook run without waiting for the script to finish.

Start-Process powershell seems to be the best method for this. If I run the command from the server itself the child processes of the bat file run and all is well.

When I run this from ansible they fail to start. I’ve tried a number of different parameter combinations and haven’t found any success.

- name: execute long running script
  ansible.windows.win_powershell:
    script: |
      Set-Location -Path "{{ long_running_directory }}"
      $process = Start-Process -WorkingDirectory "{{ long_running_directory }}" -FilePath "{{ long_running_script }}" -PassThru
      $process.id
      Get-CimInstance win32_process -Filter "ParentProcessId = $($process.id)" | select Name, ProcessId, CommandLine
  register: long_running_id

The bat file has 3 steps

forfiles /p path /D -15 /M  *.err /C "cmd /c del @path"
forfiles /p pathj /D -15 /M *.txt /C "cmd /c del @path"
perl script.pl parm1 param1

What else can be done to run this?

What do these failures look like? Some detail here might help us help you.

I get the processID of the start-process. There is no error. But the child processes that I would expect do not spawn.

 "long_running_id": {
        "changed": true,
        "debug": [],
        "error": [],
        "failed": false,
        "host_err": "",
        "host_out": "",
        "information": [],
        "output": [
            3868,
            {
                "CommandLine": "\\??\\C:\\Windows\\system32\\conhost.exe 0x4",
                "Name": "conhost.exe",
                "ProcessId": 18992
            }
        ],
        "result": {},
        "verbose": [],
        "warning": []
    }
}

This bit of the response looks like it's the id you want

When you launch it, can you check for a matching process id or check event logging for what might be causing it to stop?

When you start a process through a network logon like WinRM/SSH/etc., it will be scoped to what Windows calls a “Job”. When the main process that controls this job ends (the Ansible task finishes), Windows will stop any remaining processes in that job and thus kill your long-running process.

You can use async with poll: 0 to have it run in the background, but keep in mind async tasks always have a set timeout. If this is truly something that should be running in the background, you really should look at running it through either the task scheduler or a service rather than this ad hoc method.

async with poll: 0 can be added to anything? I don’t see that param in the docs

EDIT: Asynchronous actions and polling — Ansible Community Documentation
found these and tried. but it did not leave it running

I looked at a service first, but it interacts poorly with that. 1. It is a bat file, which requires another thing to launch it. 2. I used nssm and it throws an error on start, then shows the state flipping between running → paused.

Scheduled tasks are not centrally managed, and that is not really how this should be done across a large fleet of machines.

Not sure what you mean, there’s modules in Ansible to generate these tasks and do whatever you need.

As for the async/poll, your task needs to look like

- name: run process in background
  ansible.windows.win_command:
    cmd: '"{{ batch_script_path }}"'
    chdir: '{{ long_running_directory }}'
  # This will cap the time to just 60 seconds, set it whatever you need
  async: 60
  poll: 0  

The trouble with using async here is

  • You need to specify a timeout, you cannot just have it run indefinitely
  • There is no way to easily retrigger this outside of Ansible
  • There is no way to monitor the process

This is why using a service or scheduled task is more ideal for such scenarios. It is using the official Windows mechanisms to run tasks in the background and gives you the ability to check if it’s still running, restart as needed, and define more complex actions like what to do on a failure, etc. Granted using a service is a bit harder for simple scripts as you need to use something like nssm to wrap it but using a scheduled task is definitely doable here.
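A minimal sketch of the scheduled task route, assuming the community.windows collection is installed. The task name long-running-script and the SYSTEM account are placeholders made up for illustration; the two variables are the ones from the original task. No trigger is defined, so the task only runs when something starts it, here through the Start-ScheduledTask cmdlet.

```yaml
# Sketch only: task name and account are assumptions, adjust to your environment.
- name: register a scheduled task for the long running script
  community.windows.win_scheduled_task:
    name: long-running-script   # hypothetical task name
    actions:
      - path: cmd.exe
        arguments: '/c "{{ long_running_script }}"'
        working_directory: '{{ long_running_directory }}'
    username: SYSTEM            # assumption: the script works under this account
    state: present

- name: start the scheduled task on demand
  ansible.windows.win_powershell:
    script: |
      Start-ScheduledTask -TaskName 'long-running-script'
```

Because the process is started by the Task Scheduler service rather than by the WinRM/SSH connection, it should not end up inside the connection's job. Get-ScheduledTask and Get-ScheduledTaskInfo can then be used to see the task state and the last run result.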

I set the timeout to 24 hours in seconds and it did not work…

I hear what you are saying that you think this is incorrect operation, but I have already stated the reasons your stated better solutions are not better for this case.

Calling Start-Process should be able to spawn a new process that is not dependent on the initiating process. I have tested this outside of Ansible. What is Ansible doing to break the behavior?

What I am doing shouldn’t require async, simply kick off a process and be done.

This gets into why async is not a great option: there’s no real way to determine why the process failed without checking the status with async_status or manually checking the async task file to figure out what went wrong. Once you get out of Ansible, there is no status checking beyond seeing whether the process is still running through something like the task manager or procexp.

I’ve stated why that doesn’t work but will try and clarify. Say I have this Ansible task

- ansible.windows.win_powershell:
    script: |
        Start-Process -FilePath powershell.exe -ArgumentList '-NoExit "Some Id"'
        # Time to capture child process details manually
        Start-Sleep -Seconds 60

If we were to capture the process tree of this task as it is running you’ll see something like this (the actual tree may slightly differ depending on your connection plugin):

You can see that the spawned PowerShell process in the tree (4136) is a child process of the Ansible task (7356). If we were to look at the process metadata we can see that 4136 is part of a Windows job where one of the job limits is “Kill on Job Close”. Ansible does not set this metadata as it is applied as part of the service which handles the connection and automatically applied to any child process spawned inside that connection to ensure any dangling processes are cleaned up on exit.

The only way to avoid that is to explicitly spawn a process that breaks away from the job. Unfortunately this is not exposed at any level in PowerShell or .NET, only at the C layer through the CREATE_BREAKAWAY_FROM_JOB flag.

This is essentially what async does and it achieves this with some more complicated code to call CreateProcess directly with this flag to spawn the process explicitly not in the job.

So taking my example from before and slightly tweaking it to

- name: run process in background
  ansible.windows.win_command:
    cmd: powershell.exe Start-Sleep -Seconds 120
  async: 60
  poll: 0

You can see in the process tree that it is no longer tied to the connection (and subsequently the job set to kill any process on close) and has continued to run even though the Ansible task has finished and exited

You can also use async_status with the jid to get the status of the async task which if it failed will include the error message

ansible win-host -m async_status -a "jid=..."

In my case it returns an error because the task exceeded the 60 second timeout which was expected.

$ ansible SERVER2022 -m async_status -a "jid=j744454629280.9928"
[ERROR]: Task failed: Module failed: failure during async watchdog: timed out waiting for module completion
Origin: <adhoc 'async_status' task>

{'action': {'module': 'async_status', 'args': {'jid': 'j744454629280.9928'}}, 'timeout': 0, 'async_val': 0, 'poll': 15}

SERVER2022 | FAILED! => 
    ansible_async_watchdog_pid: 4508
    ansible_job_id: j744454629280.9928
    changed: false
    finished: 1
    msg: 'failure during async watchdog: timed out waiting for module completion'
    results_file: C:\Users\vagrant-domain\.ansible_async\j744454629280.9928
    started: 1
    stderr: ''
    stderr_lines: <omitted>
    stdout: ''
    stdout_lines: <omitted>
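
The same check can also be done from a playbook instead of an ad hoc command. A rough sketch, assuming the registered variable names background_job and job_status (both made up here) and a longer timeout than the 60 seconds used above so the sleep has time to finish:

```yaml
- name: run process in background
  ansible.windows.win_command:
    cmd: powershell.exe Start-Sleep -Seconds 120
  async: 300   # assumption: comfortably longer than the command needs
  poll: 0
  register: background_job

- name: wait for the background job and surface any error
  async_status:
    jid: '{{ background_job.ansible_job_id }}'
  register: job_status
  # Retry until the job reports finished; a failed job returns its error message
  until: job_status.finished | bool
  retries: 30
  delay: 10
```

The retries and delay values here allow roughly five minutes of waiting; they are illustrative and should be sized to match the async timeout.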

Thank you for the details.

My first thought was to use a service, but it wasn’t a fit with the scripts. Scheduled tasks unfortunately are missing an execute action from Ansible.

I’ve taken your suggestion and am using that plus a PowerShell script to initiate them.