All the tests were done sending 10000 logs in total, sending them as fast as possible:
Scenario | Process | Threads | Output | Logging Proxy | Central Logger | Oneway | Total | Average | Min | Max |
---|---|---|---|---|---|---|---|---|---|---|
1 | Client | 1 | Log (stdout) | 18380 us | 1.8 us | 1.6 us | 308 us | |||
1 | Client | 1 | Log (stdout) | 601893 us | 60.1 us | 5.9 us | 525.8 us | |||
2 | Client | 1 | Log (stdout) | 681522 us | 68.1 us | 52.8 us | 531.7 us | |||
3 | Client | 1 | Log (stdout) | 337458 us | 33.7 | 19.2 us | 353.4 us | |||
4 | Client | 1 | Log (file) | 18639 us | 1.8 us | 1.6 us | 295 us | |||
4 | Client | 1 | Log (file) | 548046 us | 54.8 us | 6.3 us | 783.2 us | |||
5 | Client | 1 | Log (file) | 619841 us | 61.9 us | 52.5 us | 946.9 us | |||
6 | Client | 1 | Log (file) | 320333 us | 32.0 | 16.8 us | 393.5 us | |||
7 | Client | 1 | stdout | 18841 us | 1.8 us | 0.8 us | 447.9 us | |||
8 | Client | 1 | file | 21358 us | 2.1 us | 1.4 us | 332.8 us | |||
9 | Client | 1 | no-op | 834 us | 0.1 us | 0.02 us | 95.2 us | |||
10 | Client | 10 | Log (stdout) | 64520.4 us | 6.4 us | 1.7 us | 610.9 us | |||
10 | Client | 10 | Log (stdout) | 1415860 us | 141.5 us | 4.9 us | 12617.4 us | |||
11 | Client | 10 | Log (stdout) | 2950060 us | 294.902 us | 66.257 us | 5005.26 us | |||
12 | Client | 10 | Log (stdout) | 958351 us | 95.75 us | 16.7 us | 65872.4 us | |||
13 | Client | 10 | Log (file) | 63096.3 us | 6.2 us | 1.7 us | 1007.8 us | |||
13 | Client | 10 | Log (file) | 930486 us | 93.0 us | 6.8 us | 15150.4 us | |||
14 | Client | 10 | Log (file) | 2319040 us | 231.8 us | 54.7 us | 5164.8 us | |||
15 | Client | 10 | Log (file) | 660748 us | 66.0 us | 15.3 us | 67563.6 us | |||
16 | Client | 10 | stdout | 301182 us | 30.1 us | 0.8 us | 1655.3 us | |||
17 | Client | 10 | file | 210906 us | 21.06 us | 1.5 us | 285.3 us | |||
18 | Client | 10 | no-op | 2409 us | 0.2 us | 0.02 us | 257.84 us | |||
19 | Component | 1 | Log (stdout) | |||||||
20 | Component | 1 | Log (stdout) | |||||||
21 | Component | 1 | Log (stdout) | |||||||
22 | Component | 1 | Log (file) | |||||||
23 | Component | 1 | Log (file) | |||||||
24 | Component | 1 | Log (file) | |||||||
25 | Component | 1 | stdout | |||||||
26 | Component | 1 | file | |||||||
27 | Component | 1 | no-op | |||||||
28 | Component | 10 | Log (stdout) | |||||||
29 | Component | 10 | Log (stdout) | |||||||
30 | Component | 10 | Log (stdout) | |||||||
31 | Component | 10 | Log (file) | |||||||
32 | Component | 10 | Log (file) | |||||||
33 | Component | 10 | Log (file) | |||||||
34 | Component | 10 | stdout | |||||||
35 | Component | 10 | file | |||||||
36 | Component | 10 | no-op |
From the statistics results, we can infer that (in a local environment) the biggest performance hit comes from the LoggingProxy itself and not the communication with the Central Logger. Going a bit further we found two slow downs down the path:
xmlElement = XMLParser::parseString(iter->c_str(), true); |
And
ts.tv_sec=0; ts.tv_nsec=100; //FIXME: This nanosleep provide stability to the logging system. It shouldn't exist :( nanosleep(&ts,NULL); |
Which combined with several mutexes, end up generating thread contention on the logging system.
The impact is of about 60/70 μs on average for each log call, which is not a big number, but we would like to be closer to the 1/10 μs range.
There are different strategies that could be used to fix this problem, such as refactoring the code, replacing the XML parsing library or the mechanism used within the library, re-structuring the use of mutexes to multiple instances or changing them to be RW mutexes.
In the case of the central logger, it's not a mystery the slow down comes from the remote call to the central logger.
m_logger->writeRecords(reclist); |
Here the slowdown accounts for ~70/80 μs on a local deployment, but can get closer to 1 ms when the central logger is on a different machine than the process generating the logs.
Again, there are several alternatives to improve the thread contention problems, for example by queuing this task and dispatching them on a separate thread. It is also possible to replace the writeRecords method to be oneway on the IDL, which makes it 2x/3x faster on the local deployment, and it also makes it invariant to the deployment as this call does not wait for a reply from the remote object. The drawback is losing the ability to assess whether the message was sent or there was a problem along the way, which could be leveraged by the use of a callback object.