Dahu: online data analysis server
Dahu is a JSON-RPC server, operated over Tango, that executes Python code remotely for low-latency online data analysis.
The dahu server executes jobs:
Each job lives in its own thread (yes, a thread, not a process: it is up to the plugin’s developer to ensure that the work being done is GIL-compliant).
Each job executes one plugin, provided by the plugin’s developer (i.e. the scientist)
The job deserializes the JSON strings coming from Tango and serializes the ones returned to Tango.
Jobs are executed asynchronously: the request for calculation is answered immediately with a jobid (an integer, unique within the process).
The jobid can be used to poll the server for the status of the job or for manual synchronization (mind that Tango can time-out!).
When jobs are finished, the client is notified via Tango events about the status
Results can be retrieved after the job has finished.
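The job life cycle described above can be sketched with the standard library alone. This is only an illustration under stated assumptions, not dahu's actual implementation or API: the names `start_job`, `get_status`, `wait_for`, `get_result` and `double_plugin` are hypothetical, the status strings are invented for the example, and the Tango transport and event notification are left out.

```python
import itertools
import json
import threading

# Hypothetical names throughout: this is NOT dahu's API, only an illustration.
_job_ids = itertools.count(1)  # job ids are unique within this process
_jobs = {}                     # jobid -> {"status", "result", "thread"}


def double_plugin(params):
    """Toy plugin: takes a dictionary, returns a dictionary."""
    return {"value": 2 * params["value"]}


def _run(jobid, plugin, json_input):
    """Body of the job thread: deserialize, run the plugin, serialize."""
    job = _jobs[jobid]
    job["status"] = "running"
    try:
        params = json.loads(json_input)
        job["result"] = json.dumps(plugin(params))
        job["status"] = "success"
    except Exception as err:
        job["result"] = json.dumps({"error": str(err)})
        job["status"] = "failure"


def start_job(plugin, json_input):
    """Start the job in its own thread and return its id without waiting."""
    jobid = next(_job_ids)
    thread = threading.Thread(target=_run, args=(jobid, plugin, json_input))
    _jobs[jobid] = {"status": "starting", "result": None, "thread": thread}
    thread.start()
    return jobid


def get_status(jobid):
    """Poll the status of a job."""
    return _jobs[jobid]["status"]


def wait_for(jobid, timeout=None):
    """Manual synchronization: block until the job ends or the timeout expires."""
    _jobs[jobid]["thread"].join(timeout)
    return get_status(jobid)


def get_result(jobid):
    """Return the JSON string produced by the job (None while unfinished)."""
    return _jobs[jobid]["result"]
```

A client would typically call `start_job(double_plugin, '{"value": 21}')`, keep the returned integer, and later call `wait_for` or `get_status` followed by `get_result`. The result is stored before the status changes, so a job reported as finished always has its result available.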
Jobs execute plugins:
Plugins are written in Python (extensions in Cython or OpenCL are common)
Plugins can be classes or simple functions
The input and output MUST be JSON-serializable as simple dictionaries
Plugins are dynamically loaded from Python modules
Plugins can be profiled for performance analysis
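As a rough illustration of the plugin properties listed above, the sketch below shows a function-style plugin, a class-style plugin, run-time loading by dotted name, and profiling. Everything here is an assumption made for the example: `sum_plugin`, `MeanPlugin`, `load_callable`, `run_profiled` and `check_json_serializable` are hypothetical names, and the class layout is not dahu's plugin base class. Only `importlib`, `cProfile`, `pstats` and `json` from the standard library are used.

```python
import cProfile
import importlib
import io
import json
import pstats


def sum_plugin(params):
    """Function-style plugin: dictionary in, dictionary out."""
    return {"total": sum(params["values"])}


class MeanPlugin:
    """Class-style plugin (hypothetical layout, not dahu's base class)."""

    def __init__(self, params):
        self.input = params
        self.output = {}

    def process(self):
        values = self.input["values"]
        self.output["mean"] = sum(values) / len(values)
        return self.output


def load_callable(dotted_name):
    """Import 'package.module.name' at run time and return the named object."""
    module_name, _, attribute = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


def run_profiled(func, params):
    """Run func(params) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, params)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


def check_json_serializable(result):
    """Raise TypeError unless result is a dictionary that json can dump."""
    if not isinstance(result, dict):
        raise TypeError("plugin output must be a dictionary")
    json.dumps(result)  # raises TypeError on non-serializable values
    return result
```

For instance, `load_callable("json.dumps")` returns the standard `json.dumps` function; a real deployment would pass the dotted name of a plugin module instead. `check_json_serializable` catches the common mistake of returning NumPy arrays or other objects that `json` cannot encode.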
Offline processing
All jobs can be run offline using the dahu-reprocess command line tool. This tool is not multithreaded and runs plugins directly. It is intended for:
offline development
re-processing failed online jobs (where performance is less critical).
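The idea of running a plugin directly, without threads or a server, can be sketched as follows. This is not the dahu-reprocess tool, whose options are not described here: `run_offline`, `reprocess_files` and `count_plugin` are hypothetical names, and the assumption is simply that each job description is available as a JSON document.

```python
import json


def count_plugin(params):
    """Placeholder plugin used only for this illustration."""
    return {"n_values": len(params["values"])}


def run_offline(plugin, json_text):
    """Run one job synchronously, in the calling thread."""
    params = json.loads(json_text)
    try:
        return plugin(params)
    except Exception as err:
        # Errors are reported instead of aborting the whole batch.
        return {"error": str(err)}


def reprocess_files(plugin, paths):
    """Process JSON input files one after the other (no multithreading)."""
    results = {}
    for path in paths:
        with open(path) as handle:
            results[path] = run_offline(plugin, handle.read())
    return results
```

Because everything runs sequentially in the main thread, exceptions and debugger breakpoints behave as in any ordinary script, which is what makes this mode convenient for development.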
Dahu is light!
Dahu is a small project started at ESRF in 2013 with fewer than 1000 lines of code. It has been used in production since then on a couple of beamlines. With its FIFO scheduler, dahu is very fast (1 µs locally, 0.3 ms from Tango).