Opened 17 years ago

Closed 17 years ago

#636 closed task (fixed)

Kill a running job

Reported by: Johan Enell
Owned by: Nicklas Nordborg
Priority: major
Milestone: BASE 2.6
Component: core
Version:
Keywords:
Cc:

Description (last modified by Nicklas Nordborg)

It should be possible for a user to kill a running job. The core sends a kill signal to the job, and the job tells the plug-in to end all processes (threads?) it has started. If the job is running on a job agent, the signal will have to be communicated through the job agent.

This also means that a job agent should stop all jobs gracefully when the job agent itself is stopped (see #798).

I have been thinking a bit about this. It is not trivial, since it requires that the plug-in is willing to be killed.

There are two main approaches:

1. Send a signal using the Thread class

  • The BASE core calls Thread.interrupt() on the thread that is executing the plug-in.
  • The plug-in must regularly check Thread.interrupted(). If this returns true, it should stop what it is doing, roll back any changes and exit.
  • If the plug-in is waiting in a blocking call when Thread.interrupt() is called, that blocking call may instead throw an InterruptedException in the plug-in's thread.
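
As an illustration of what this approach asks of a plug-in, here is a minimal sketch using only the standard java.lang.Thread API. The class and method names (InterruptiblePluginSketch, Plugin, killDemo) are made up for this example and are not part of BASE. The "rollback" is just a flag standing in for real cleanup.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptiblePluginSketch {

    /** Hypothetical plug-in: works in small steps and polls the interrupt flag. */
    static class Plugin implements Runnable {
        final CountDownLatch started = new CountDownLatch(1);
        final AtomicBoolean rolledBack = new AtomicBoolean(false);

        @Override
        public void run() {
            started.countDown();
            try {
                while (true) {
                    // Thread.interrupted() returns true at most once per
                    // interrupt, because it also clears the flag.
                    if (Thread.interrupted()) {
                        throw new InterruptedException("kill requested");
                    }
                    // Blocking call: throws InterruptedException if the thread
                    // is interrupted while (or just before) sleeping.
                    Thread.sleep(10);
                }
            } catch (InterruptedException e) {
                // Stand-in for rolling back changes and cleaning up.
                rolledBack.set(true);
            }
        }
    }

    /** Plays the role of the core: start the plug-in, interrupt it, wait for it to exit. */
    public static boolean killDemo() {
        Plugin plugin = new Plugin();
        Thread worker = new Thread(plugin, "plugin-thread");
        worker.start();
        try {
            plugin.started.await();
            worker.interrupt();
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive() && plugin.rolledBack.get();
    }

    public static void main(String[] args) {
        System.out.println("stopped and rolled back: " + killDemo());
    }
}
```

Both exit paths (the polled flag and the exception from the blocking call) end up in the same catch block, which is probably the simplest way for a plug-in to keep its cleanup in one place.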

There are a number of problems with this approach:

  • Plug-ins must be aware of this and be coded to check the Thread.interrupted() status and be able to act on it. Most plug-ins today are not coded like this.
  • If the plug-in is running on a job agent the kill signal must be communicated to the job agent first. The problem is that the only way to know which job agent a plug-in is running on is to ask all of them.
  • Plug-ins may be running on something else that we don't know of.

Possible solutions to the above problems include:

  • Create a tagging interface: Killable. If a plug-in implements this interface it must also promise to check and act on the Thread.interrupted() status. The core (and web client) will know whether a plug-in is killable and don't have to put up a "Kill" button for plug-ins that aren't.
  • Store some kind of callback hook in the database. A callback hook is registered by the application that is starting the job (e.g. the internal job queue or a job agent). If no hook has been registered for a job, it can't be killed (even if the plug-in implements the Killable interface). The core could provide simple implementations for "same-process-hook" and "external-process-hook".
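
A rough sketch of how the two ideas could combine when deciding whether to show a "Kill" button. Only the name Killable comes from the proposal above; the Plugin interface, the example plug-in classes and canKill() are hypothetical and exist only for this illustration.

```java
public class KillableSketch {

    /** Hypothetical minimal plug-in contract, for illustration only. */
    interface Plugin {
        void run();
    }

    /** Tagging interface: no methods, only a promise to honour interrupts. */
    interface Killable {
    }

    static class ImportPlugin implements Plugin, Killable {
        @Override
        public void run() {
            // would poll Thread.interrupted() while working
        }
    }

    static class LegacyPlugin implements Plugin {
        @Override
        public void run() {
            // not coded to react to interrupts
        }
    }

    /**
     * A job can be killed only if the plug-in is tagged as Killable AND the
     * application that started the job registered a callback hook for it.
     */
    public static boolean canKill(Plugin plugin, boolean hookRegistered) {
        return plugin instanceof Killable && hookRegistered;
    }

    public static void main(String[] args) {
        System.out.println(canKill(new ImportPlugin(), true));
        System.out.println(canKill(new ImportPlugin(), false));
        System.out.println(canKill(new LegacyPlugin(), true));
    }
}
```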

2. Send a signal using the ProgressReporter

The idea is that since most plug-ins already regularly report their progress, the same mechanism could be used to convey information in the other direction. This could be implemented as a flag in the Jobs table. The flag is set when a user clicks the "Kill" button in the web interface. When the progress reporter is about to update the status, it could just as well check this flag and throw an exception if it has been set.

The good thing about this solution is that it requires no (or very little) cooperation from the plug-in side or from the job agent side. Plug-ins should already be prepared to handle exceptions properly (roll back, clean up and exit).
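
A small sketch of the idea, under the assumption that the flag can be read cheaply at every progress update. Everything here is hypothetical: JobRecord stands in for a row in the Jobs table, and KillAwareProgressReporter, its display() method and JobKilledException are illustrative names, not existing BASE classes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class KillFlagProgressSketch {

    /** Stand-in for a row in the Jobs table: progress plus the kill flag. */
    static class JobRecord {
        final AtomicBoolean killRequested = new AtomicBoolean(false);
        volatile int percent;
    }

    /** Unchecked, so plug-ins need no new throws clauses. */
    static class JobKilledException extends RuntimeException {
        JobKilledException(String message) {
            super(message);
        }
    }

    /** Checks the kill flag before each status update. */
    static class KillAwareProgressReporter {
        private final JobRecord job;

        KillAwareProgressReporter(JobRecord job) {
            this.job = job;
        }

        void display(int percent, String message) {
            if (job.killRequested.get()) {
                throw new JobKilledException("Job was killed by the user");
            }
            job.percent = percent;
        }
    }

    /**
     * Simulates a plug-in reporting progress in steps of 10. The "user"
     * sets the kill flag just before the given step is reported.
     * Returns the last progress value that was actually stored.
     */
    public static int runUntilKilled(int killAtPercent) {
        JobRecord job = new JobRecord();
        KillAwareProgressReporter progress = new KillAwareProgressReporter(job);
        try {
            for (int p = 0; p <= 100; p += 10) {
                if (p == killAtPercent) {
                    job.killRequested.set(true); // user clicks "Kill"
                }
                progress.display(p, "working");
            }
        } catch (JobKilledException e) {
            // The plug-in's normal exception handling (rollback, cleanup) runs here.
        }
        return job.percent;
    }

    public static void main(String[] args) {
        System.out.println(runUntilKilled(50));
    }
}
```

The plug-in loop contains no kill-specific code at all; it only needs its usual exception handling, which is the main attraction of this approach.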

There are also some problems:

  • Plug-ins are not required to use the progress reporter at all, and even if they do, there may be long intervals between the updates. For example, the Base1PluginExecuter doesn't update the progress when the external plug-in code is running, thus it would only be possible to kill this plug-in during the data export or import phase.

Note! The ability to kill a plug-in via the progress reporter could just be a special hook in solution 1 above. The Killable interface could have one method where the core could query the plug-in if it should use the thread or the progress reporter to kill it.

Change History (13)

comment:1 by Jari Häkkinen, 17 years ago

Milestone: BASE 2.5 → BASE 2.6
Summary: Kill a runing job → Kill a running job

comment:2 by Nicklas Nordborg, 17 years ago

Description: modified (diff)

comment:3 by Johan Enell, 17 years ago

Milestone: BASE 2.6 → BASE 2.x+

comment:4 by Nicklas Nordborg, 17 years ago

Description: modified (diff)

comment:5 by Nicklas Nordborg, 17 years ago

Milestone: BASE 2.x+ → BASE 2.6
Status: new → assigned

I'll start with this now. We had some discussions around the coffee table, and a lot more good ideas came up. Option 1 is the foundation, but it has been extended considerably, and the solution will also be able to cover option 2. Here is some more information.

  1. Generalize the API to be able to send any kind of signal to a plug-in (or any interested party). We will create a Signal class and define the Signal.KILL signal.
  2. Plug-ins that are able to receive signals should implement the Signalable interface. The plug-ins are required to provide a SignalHandler and information about which signals are supported.
  3. The SignalHandler is another interface, and we will provide two implementations: a ThreadSignalHandler that uses Thread.interrupt() and a ProgressReporterSignalHandler that uses the progress reporter.
  4. To solve the problem of transporting signals from the web server to the signal handler, we introduce two more interfaces: SignalTransporter and SignalReceiver. The implementations come in pairs: a transporter knows how to send a signal to its receiver. This can happen in the local VM, through a socket or through web services. We will provide an implementation that uses sockets: SocketSignalTransporter and SocketSignalReceiver.
  5. Creating a receiver is the responsibility of the job queue manager, i.e. our internal job queue or the job agent. If everything is in place, the core will make sure to register each signal handler with the receiver and put some vital information in the database.
  6. The vital information describes how to create a signal transporter that can send signals to the receiver. It includes which transporter class to use and some kind of initialisation parameters (for example the IP address and port number of the receiver).
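
To make the handler part of this design more concrete, here is a simplified sketch of two handlers, one interrupt-based and one flag-based. The names Signal, SignalHandler and Signalable are taken from the list above, but the method signatures, the enum representation of Signal and the ThreadHandler and FlagHandler classes are assumptions made for this example; the interfaces actually committed to BASE may look different.

```java
public class SignalHandlerSketch {

    /** Assumed representation; the plan above only says a Signal class with KILL. */
    enum Signal { KILL }

    interface SignalHandler {
        boolean supports(Signal signal);
        void handleSignal(Signal signal);
    }

    /** Implemented by plug-ins that can receive signals. */
    interface Signalable {
        SignalHandler getSignalHandler();
    }

    /** Delivers KILL by interrupting the thread that runs the plug-in. */
    static class ThreadHandler implements SignalHandler {
        private volatile Thread worker;

        void setWorkerThread(Thread worker) {
            this.worker = worker;
        }

        @Override
        public boolean supports(Signal signal) {
            return signal == Signal.KILL;
        }

        @Override
        public void handleSignal(Signal signal) {
            Thread t = worker;
            if (supports(signal) && t != null) {
                t.interrupt();
            }
        }
    }

    /** Delivers KILL by setting a flag that is checked on each progress update. */
    static class FlagHandler implements SignalHandler {
        private volatile boolean killed;

        @Override
        public boolean supports(Signal signal) {
            return signal == Signal.KILL;
        }

        @Override
        public void handleSignal(Signal signal) {
            if (supports(signal)) {
                killed = true;
            }
        }

        /** Would be called from the progress reporter before updating the status. */
        void checkKilled() {
            if (killed) {
                throw new IllegalStateException("Job was killed");
            }
        }
    }

    /** Interrupts the calling thread via the handler and reads back (and clears) the flag. */
    public static boolean threadDemo() {
        ThreadHandler handler = new ThreadHandler();
        handler.setWorkerThread(Thread.currentThread());
        handler.handleSignal(Signal.KILL);
        return Thread.interrupted();
    }

    /** Sends KILL to a flag-based handler and checks that the next progress check fails. */
    public static boolean flagDemo() {
        FlagHandler handler = new FlagHandler();
        handler.handleSignal(Signal.KILL);
        try {
            handler.checkKilled();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("thread handler delivered KILL: " + threadDemo());
        System.out.println("flag handler delivered KILL: " + flagDemo());
    }
}
```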

When a plug-in is started the following happens:

  1. The core checks if it implements the Signalable interface.
  2. If so a SignalHandler is created.
  3. This handler is registered with the SignalReceiver provided by the queue manager.
  4. The vital transporter information is saved to the database

When the user wants to kill a job the following happens:

  1. The core checks if transporter information has been saved in the database
  2. If so, a SignalTransporter object is created
  3. The send(KILL) method is called
  4. The transporter knows how to send this to the receiver
  5. The receiver gets the signal and routes it to the registered signal handler
  6. The signal handler notifies the plug-in (for example by calling Thread.interrupt())
  7. The plug-in becomes aware of the notification and takes some action
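
The two sequences above can be sketched end to end with an in-VM transporter and receiver. This is only an illustration of the flow: the Local* classes, the "local:" info string, the in-memory map standing in for the database and the startJob()/killJob() helpers are all hypothetical, and a real transporter would be created from a stored class name and initialisation parameters rather than directly.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SignalFlowSketch {

    enum Signal { KILL }

    interface SignalHandler {
        void handleSignal(Signal signal);
    }

    interface Signalable {
        SignalHandler getSignalHandler();
    }

    /** Receiver owned by the queue manager; routes signals to registered handlers. */
    static class LocalReceiver {
        private final Map<String, SignalHandler> handlers = new ConcurrentHashMap<>();

        /** Registers a handler and returns the info needed to reach it later. */
        String register(String jobId, SignalHandler handler) {
            handlers.put(jobId, handler);
            return "local:" + jobId;
        }

        void receive(String jobId, Signal signal) {
            SignalHandler handler = handlers.get(jobId);
            if (handler != null) {
                handler.handleSignal(signal);
            }
        }
    }

    /** Transporter paired with LocalReceiver; built from the stored info string. */
    static class LocalTransporter {
        private final LocalReceiver receiver;
        private final String jobId;

        LocalTransporter(LocalReceiver receiver, String info) {
            this.receiver = receiver;
            this.jobId = info.substring("local:".length());
        }

        void send(Signal signal) {
            receiver.receive(jobId, signal);
        }
    }

    static class DemoPlugin implements Signalable {
        volatile boolean killed;

        @Override
        public SignalHandler getSignalHandler() {
            return signal -> {
                if (signal == Signal.KILL) {
                    killed = true;
                }
            };
        }
    }

    /** Stand-in for the database: job id -> transporter information. */
    static final Map<String, String> database = new ConcurrentHashMap<>();

    /** Start sequence: check Signalable, create handler, register, store info. */
    static void startJob(String jobId, Object plugin, LocalReceiver receiver) {
        if (plugin instanceof Signalable) {
            SignalHandler handler = ((Signalable) plugin).getSignalHandler();
            database.put(jobId, receiver.register(jobId, handler));
        }
    }

    /** Kill sequence: look up info, create transporter, send KILL. */
    static boolean killJob(String jobId, LocalReceiver receiver) {
        String info = database.get(jobId);
        if (info == null) {
            return false; // nothing registered, the job cannot be signalled
        }
        new LocalTransporter(receiver, info).send(Signal.KILL);
        return true;
    }

    public static boolean demo() {
        LocalReceiver receiver = new LocalReceiver();
        DemoPlugin signalable = new DemoPlugin();
        startJob("job-1", signalable, receiver);
        startJob("job-2", new Object(), receiver); // plug-in without signal support
        boolean sent1 = killJob("job-1", receiver);
        boolean sent2 = killJob("job-2", receiver);
        return sent1 && signalable.killed && !sent2;
    }

    public static void main(String[] args) {
        System.out.println("flow works: " + demo());
    }
}
```

Because the web server only needs the stored transporter information, it never has to know where the job is actually running, which addresses the "ask all job agents" problem from the description.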

comment:6 by Nicklas Nordborg, 17 years ago

(In [4073]) References #636: Kill a running job

Infrastructure for signal processing is set up. Need more implementors. Need more support in the core to be able to use it (in Job class). Internal job queue and job agents must also be updated.

comment:7 by Nicklas Nordborg, 17 years ago

(In [4074]) References #636: Kill a running job

Infrastructure is now in place. The implementation for the internal job queue has been tested and is working. Still need to:

  • Implement signal transportation to job agents and via progress reporter
  • Implement support in other core plug-ins (only import plug-ins based on FlatFileParser are working right now)
  • Document everything

comment:8 by Nicklas Nordborg, 17 years ago

(In [4078]) References #636: Kill a running job

  • Signals are now working with job agents and the progress reporter.
  • The Lowess and intensity calculator plug-ins now support signals.
  • Documentation has been written.

More things to do:

  • Add support in rest of core plug-ins

comment:9 by Nicklas Nordborg, 17 years ago

(In [4081]) References #636: Kill a running job

Updated UML diagram

comment:10 by Nicklas Nordborg, 17 years ago

(In [4118]) References #636: Kill a running job

Implemented support for cancelling the job in several of the core plug-ins and in a lot of places in the core that are used by the plug-ins.

comment:11 by Nicklas Nordborg, 17 years ago

(In [4120]) References #636: Kill a running job

All core plug-ins can now be killed. Added this to the example plug-ins as well. Support for interruption in some more places in the core.

comment:12 by Nicklas Nordborg, 17 years ago

(In [4128]) References #636: Kill a running job

Support for interruption in some more places in the core.

comment:13 by Nicklas Nordborg, 17 years ago

Resolution: fixed
Status: assigned → closed

Reopen or create new tickets if there are problems or support for interruption is missing somewhere.
