In this article I am going to explain how batch processing works in AX2009. I don't mean
how to set up a batch group or anything else that you can find in the manual. What I mean
is what each AOS is doing in the background to decide how and when to pick up batches,
process them and complete them. Understanding this background can help in advanced batch
troubleshooting or development scenarios.
Batch processing changed in AX2009. AOSes can now run batch processes directly, so if you
want to see what's happening with a batch process it can be more difficult than in AX3 or
AX4, as there is no client sitting there running for you to look at.
What happens now is that each AOS has a dedicated thread which checks for batches. Basically
all this thread does is call Classes\BatchRun.serverGetTask() once every 60 seconds (the timing
is not configurable), and if there is any work for that AOS to do then the AOS picks up a task
from there.
I'll give an example of an end-to-end batch process to show what happens where and when:
- A user sends a report to batch. It goes into the batch queue in BATCHJOB (the header) and
BATCH (the batch tasks).
- Once every 60 seconds each AOS that has been configured for batch processing
(in Administration->Setup->Server configuration) calls the X++ method
Classes\BatchRun.serverGetTask().
- In serverGetTask() the logic is exposed in X++ so we can all see what happens. This is the
main place where we decide what to pick up for batch processing. Basically it checks whether
there are any tasks in the BATCH table waiting for this AOS, based on the batch groups that
this AOS is configured to process and on the time that the records in BATCH are due to be
processed (e.g. if something is scheduled for 21:00 each day then it won't get picked up until
21:00, despite the fact that the AOS polls every 60 seconds). There are a few stages to this method:
1. First we check if there is a task (a task is a record in the BATCH table) ready for us. The
query for this looks like this:
    select firstonly pessimisticlock RecId, CreatedBy, ExecutedBy, StartDateTime, Status,
        SessionIdx, SessionLoginDateTime, Company, ServerId, Info
        from batch
        where batch.Status == BatchStatus::Ready
           && batch.RunType == BatchRunType::Server
           && (Session::isServer() || batch.CreatedBy == user)
        join Language from userInfo
            where userInfo.Id == batch.CreatedBy
               && userInfo.Enable == true
        exists join batchServerGroup
            where batchServerGroup.ServerId == serverId
               && batch.GroupId == batchServerGroup.GroupId;
2. If a task is returned in step 1 then there's nothing more to do and we start processing that task.
If no task is returned then we look to see if any batch jobs need to be started. The query for this looks like this:
    update_recordset batchJob setting
        Status = BatchStatus::Executing,
        StartDateTime = thisDate
        where batchJob.Status == BatchStatus::Waiting
           && batchJob.OrigStartDateTime <= thisDate
        exists join batch
            where batch.BatchJobId == batchJob.RecId
        exists join batchServerGroup
            where batch.GroupId == batchServerGroup.GroupId
               && batchServerGroup.ServerId == serverId;
3. After step 2 we run Classes\BatchRun.serverProcessDependencies(). Something interesting
happens in here: we use the table "BatchGlobal" as a focal point. We might have several
AOSes running batch processing in the same environment, so for some operations we look at
this table to see whether another AOS has already done something, to decide whether the
current AOS needs to do it as well. For dependencies we just make sure that another AOS is
not doing this in the same second. If we continue, the queries we run to set more tasks
(again, tasks are just records in the BATCH table) to ready for processing are below. You can
see in the queries how we update the status on the BATCH table records, checking that we
only do it for waiting tasks in executing jobs which do not have any constraints that are
not completed yet:
    //There are no more available tasks and the user is asking for any task.
    //Search for more tasks with dependencies
    update_recordset batch setting Status = BatchStatus::Ready
        where batch.Status == BatchStatus::Waiting
           && batch.ConstraintType == BatchConstraintType::Or
        exists join batchJob
            where batchJob.Status == BatchStatus::Executing
               && batch.BatchJobId == batchJob.RecId
        exists join constraintsOr
            where constraintsOr.BatchId == batch.RecId
        exists join batchDependsOr
            where
            (
                ((batchDependsOr.Status == BatchStatus::Finished
                    && (constraintsOr.ExpectedStatus == BatchDependencyStatus::Finished
                        || constraintsOr.ExpectedStatus == BatchDependencyStatus::FinishedOrError))
                || (batchDependsOr.Status == BatchStatus::Error
                    && (constraintsOr.ExpectedStatus == BatchDependencyStatus::Error
                        || constraintsOr.ExpectedStatus == BatchDependencyStatus::FinishedOrError)))
                && constraintsOr.DependsOnBatchId == batchDependsOr.RecId
            );
    update_recordset batch setting Status = BatchStatus::Ready
        where batch.Status == BatchStatus::Waiting
           && batch.ConstraintType == BatchConstraintType::And
        exists join batchJob
            where batchJob.Status == BatchStatus::Executing
               && batch.BatchJobId == batchJob.RecId
        notexists join constraintsAnd exists join batchDependsAnd
            where
            (
                constraintsAnd.DependsOnBatchId == batchDependsAnd.RecId
                && constraintsAnd.BatchId == batch.RecId
                && ((batchDependsAnd.Status != BatchStatus::Finished && batchDependsAnd.Status != BatchStatus::Error)
                    || (constraintsAnd.ExpectedStatus == BatchDependencyStatus::Finished
                        && batchDependsAnd.Status == BatchStatus::Error)
                    || (constraintsAnd.ExpectedStatus == BatchDependencyStatus::Error
                        && batchDependsAnd.Status == BatchStatus::Finished))
            );
4. When serverProcessDependencies() in step 3 is complete we call serverGetOneTask() again (the same query as in step 1). If some more tasks were set to "ready" in step 3 then we might pick up a task to work on here. Of course if no tasks were set to "ready" in step 3 then we won't find a task and we'll just do nothing.
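The two dependency queries in step 3 are easier to reason about as plain predicates. Below is a rough Python model of them. It is not AX code and says nothing about how the kernel runs the queries: the class, enum and function names are my own, and it only covers the constraint checks, assuming the task is already in status Waiting and its batch job is Executing.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Status(Enum):              # models BatchStatus (only the values used here)
    WAITING = auto()
    READY = auto()
    EXECUTING = auto()
    FINISHED = auto()
    ERROR = auto()


class Expected(Enum):            # models BatchDependencyStatus
    FINISHED = auto()
    ERROR = auto()
    FINISHED_OR_ERROR = auto()


@dataclass
class Constraint:
    expected: Expected           # models the constraint's ExpectedStatus
    depends_on_status: Status    # status of the task named by DependsOnBatchId


def satisfied(c: Constraint) -> bool:
    """OR query: the dependency ended in a state the constraint accepts."""
    if c.depends_on_status is Status.FINISHED:
        return c.expected in (Expected.FINISHED, Expected.FINISHED_OR_ERROR)
    if c.depends_on_status is Status.ERROR:
        return c.expected in (Expected.ERROR, Expected.FINISHED_OR_ERROR)
    return False


def blocking(c: Constraint) -> bool:
    """AND query: the condition inside the notexists join."""
    done = c.depends_on_status in (Status.FINISHED, Status.ERROR)
    return (
        not done
        or (c.expected is Expected.FINISHED and c.depends_on_status is Status.ERROR)
        or (c.expected is Expected.ERROR and c.depends_on_status is Status.FINISHED)
    )


def becomes_ready(constraint_type: str, constraints: list) -> bool:
    """constraint_type is "or" or "and" (models BatchConstraintType)."""
    if constraint_type == "or":
        return any(satisfied(c) for c in constraints)    # exists join
    return not any(blocking(c) for c in constraints)     # notexists join
```

Two things fall out of the model. An "and" task with no constraint records at all passes the notexists check and is set to ready, while an "or" task needs at least one satisfied constraint. Also, blocking() is simply the negation of satisfied(): the AND query is written as "no constraint is still unsatisfied".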
- So, back to the report which we sent to batch. If in steps 1-4 above we found that this record was ready to process and picked it up, what happens next inside the AOS kernel is that we start a worker session. This can be thought of as a bit like a client session, just without a client: it has its own session ID and you'll see that ID recorded against the record in the BATCH table. From this point it calls BatchRun.runJobStatic() and actually runs the batch process; this is just normal X++ running the process. When runJobStatic() completes we call BatchRun.serverFinishTask(), which just sets the status of the record in BATCH to either "finished" or "error" (if it failed for some reason).
- Now our batch task (the record in the BATCH table) is finished, but the header for this batch, the Tables\BatchJob record, is not set to finished yet. For this part there is another background process running every 60 seconds on each AOS, which just calls BatchRun.serverProcessFinishedJobs(). We can see in this X++ method what it does: it uses the BatchGlobal table again, to make sure that between all AOSes we only check for finished jobs a maximum of once every 60 seconds. If it has been 60 seconds then it runs a whole load of queries (too many to copy here, but you can look in the method to see them) to create the batch history (various tables), set the BatchJob record to finished, and delete the completed tasks and constraints.
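To make the order of the four stages in serverGetTask() concrete, here is a small Python model of one poll. It is only a sketch under assumptions: Job and Task are hypothetical stand-ins for BatchJob and BATCH records, statuses are plain strings, and the dependency check is passed in as a function (deps_met) so the sketch can stay focused on the order of the stages. Record locking, the RunType and user checks from the step 1 query, and the BatchGlobal same-second check are all left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class Job:                       # stands in for a BatchJob record
    status: str                  # "waiting" or "executing" in this sketch
    start_at: datetime           # stands in for OrigStartDateTime


@dataclass
class Task:                      # stands in for a BATCH record
    job: Job
    group: str                   # stands in for GroupId
    status: str = "waiting"


def get_one_task(tasks, groups) -> Optional[Task]:
    """Stages 1 and 4: first ready task in a group this server processes."""
    for t in tasks:
        if t.status == "ready" and t.group in groups:
            return t
    return None


def server_get_task(tasks, groups, now: datetime,
                    deps_met: Callable[[Task], bool]) -> Optional[Task]:
    task = get_one_task(tasks, groups)                 # stage 1
    if task is not None:
        return task
    for t in tasks:                                    # stage 2: start due jobs
        if (t.group in groups and t.job.status == "waiting"
                and t.job.start_at <= now):
            t.job.status = "executing"
    for t in tasks:                                    # stage 3: promote tasks
        if (t.status == "waiting" and t.job.status == "executing"
                and deps_met(t)):
            t.status = "ready"
    return get_one_task(tasks, groups)                 # stage 4
```

With a job due at 21:00, a poll at 20:59 returns nothing and leaves the job waiting; the poll at 21:00 sets the job to executing in stage 2, promotes its task in stage 3 and returns it in stage 4. That is the "polls every 60 seconds but won't pick it up until 21:00" behaviour described above.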
There are a couple of other background things that happen in the AOS kernel for batch processing:
1. Every 5 minutes it calls BatchRun.serverCleanUpDeadTasks(). Again we use the BatchGlobal table, so that this only runs once every 5 minutes between all AOSes. This method just sets tasks back to "ready" if the session ID for the worker session (the one I mentioned earlier, which we create when we start processing a task) is no longer a valid session. Basically, if a task fails with an X++ exception or something like that, then the worker session will end, and if you have configured this batch task to allow some retries then it's this method which resets the task so that it gets a retry.
2. Every 5 minutes each AOS checks the server settings, to see whether it's still supposed to process the same batch groups, or whether it's not supposed to be a batch server any more, and so on.
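The "only once every N seconds between all AOSes" idea behind BatchGlobal, combined with the dead-task reset, can be modelled in a few lines of Python. Treat this as an illustration of the idea rather than the real implementation: GlobalThrottle, the dictionary keys and retries_left are names I made up, a thread lock stands in for whatever database locking the real table relies on, and decrementing a retry counter is an assumption. What happens to a dead task with no retries left is not covered here, so the sketch leaves such tasks untouched.

```python
import threading


class GlobalThrottle:
    """Stands in for a BatchGlobal row: one shared 'last run' time."""

    def __init__(self, interval_seconds: float):
        self.interval = interval_seconds
        self.last_run = None                  # no AOS has run the operation yet
        self._lock = threading.Lock()         # stands in for database locking

    def try_claim(self, now: float) -> bool:
        """Return True only for a caller allowed to run at this time."""
        with self._lock:
            if self.last_run is not None and now - self.last_run < self.interval:
                return False                  # someone ran it too recently
            self.last_run = now
            return True


def clean_up_dead_tasks(tasks, live_sessions, throttle, now) -> int:
    """Reset executing tasks whose worker session is no longer valid."""
    if not throttle.try_claim(now):
        return 0                              # another AOS did this recently
    reset = 0
    for t in tasks:
        dead = t["status"] == "executing" and t["session"] not in live_sessions
        if dead and t["retries_left"] > 0:    # retries_left is hypothetical
            t["status"] = "ready"             # picked up again by a later poll
            t["retries_left"] -= 1            # assumed bookkeeping
            reset += 1
    return reset
```

Each AOS would call clean_up_dead_tasks() on its own timer with the same shared throttle. Whichever one gets there first in a given interval does the work and the others return immediately, which is the effect described for the 5-minute and 60-second checks above.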