OpenId() is slow, why?
Hi Community! I need your advice.
The method below is 6 times slower on production server than on developing machine:
ClassMethod runme4()
{
s cnt=615210
s st=$zh
for i=1:1:cnt {
s p=##class(digi.packet).%OpenId("packet||5237")
w:p="" "not found",!
k p
}
w $zh-st,!
}
BUT the server is 6 times faster if OpenId replaced with simple read of a large global (s p=^someLargeGlobal). Any ideas what makes OpenId so slow only on the server?
And another question: I wanted to profile the code with
Thanks!
Some details:
- The server is almost idle at the experiment time.
- The database is exactly same.
- Cache version is same (2017.2) but the server is Linux and dev pc is Windows.
- digi.packet is a regular
%Persistent class with nothing special
A rule of thumb:
Global buffer size ~50% of physical memory (set it manually !)
- enable / enforce large pages
- run it in a loop.
- the first OPEN has to fetch the whole global structure from disk, all following load from Global buffers
Thanks. What do you mean with "run it in a loop"?
Sorry I saw your FOR loop only when the reply was out already
So you are not hurt by First-Touch effect anyhow.
As it is always the same object even buffer size should have no influence.
is there any other load on the server that could have influence ?
>is there any other load on the server that could have influence ?
The load is first explanation I had in mind. But I've done the comparison several times with server load near to none.
One more thought:
Depending on the call structure of your methos/classmethods it's necessary to increase routine buffers too.
In our case 128 MB was too less. We configure to 256 MB on all production systems and it runs flawless since.
Hi Alexander,
how did you realize that you have to adjust the buffers? What kind of problems you had?
We did some execution time measurements and had extremely high times in some nested calls compared between two machines. The calls were between some very big routines. The most extreme has its object code splitted to 46 ^rOBJ subnodes and has a total size of >1MB.
You realize the difference only in very long loops (as in the sample above) and deep nested calls to big routines (BTW: classes are compiled to routines too).
The measurements were made on a system with high load so the routine buffer was quickly invalidated by other processes.
I hope this helps.
You can check the mgstat's RouLas (routine loads and saves) - see this article that includes discussion of sizing routine buffers, among many other tips for sizing a system's memory config!
https://community.intersystems.com/post/intersystems-data-platforms-and-performance-part-4-looking-memory
Hello @Arto Alatalo
If there is no overload on production server, perhaps can you perform a D ^%BENCHLANG on production and dev machine?
It's benchmark CPU against ObjectScript language.
Compare the COSMark result.
Good idea! Never heard about BENCHLANG before. The server is almost idle at night time so now I know what it can do in its spare time.
The other commenters have good suggestions but I would recommend additionally collecting a pButtons to compare. Try and determine if there are any differences other than OS - hardware, memory configuration, global mappings, or anything else. We want to make sure we're comparing apples to apples! If everything seems to be the same, then you could reach out to the WRC who can help investigate this further.
For sure there are tons of differences in the hardware! And the price is primary of them! The productions server cost is I guess like ten of my developing laptops. And this is one of the reasons I expect the server be fast :)
Thanks. Yes, I'm now automatically collecting execution time every 15 mins to check the performance overnight when all the users are gone.
Any particular reason your id is not an integer?
I'd think it's a parent-child relationship but why a parent id is a string?
Hi Eduard,
no, it is not parent-child relationship, it's composite primary key, and I have good reasons for it.
Which are?
Can you compare the performance to index opening:
s p=##class(digi.packet).IDKEYOpen("packet", 5237)
Yes, I can try it but how do you mean it is related to my problem? My problem is that my laptop executes very same code against very same database 6 times faster than powerful idle server.
Without reports suggested by other users there's not much I can tell you.
What are global buffer sizes on prod/dev machines?
Preheat global cache by looping over data global and compare the timings after that.
List of utilities/reports that should help you:
Check this series of articles by @Murray Oldfield.
This IDKEYOpen looks as capricious beast. Because of some reasons digi.packet has no such method. But its sibling class do have it although in lowercase:
zidkeyOpen(K1,K2,concurrency=-1,sc) public { Set:'($data(sc)#2) sc=1 Quit ..%OpenId(K1_"||"_K2,concurrency,.sc) }
A few other classed have IDKEYOpen in upper case:
zIDKEYOpen(K1,concurrency=-1,sc) public { Set:'($data(sc)#2) sc=1 Quit ..%OpenId(.K1,concurrency,.sc) }
Looks as the compiler generates this method only in some cases. But why in lower case? Strange.
What for the performance, the method looks as a very simple wrapper around %OpenId, so no much performance difference to expect.
The name of the <index>Open method is based on the name of the index. I suspect that in one case the name of the IDKEY index is "IDKEY" and in another it is "idkey". No <index>Open method is generated if the index is not unique, primary or id.
Oh yes, indeed. Thanks Daniel.
You are wondering about a SLOW production server and a FAST development machine?
I show you, how to turn your machines (production or development or both) into a SLOW and FAST systems by using your own measurement method.
ClassMethod Unclean() { s p=##class(digi.packet).%OpenId("packet||5237") d ..runme4() } ClassMethod Clean() { d ..runme4() }
Login to either of your systems and try the abowe methods in a terminal session. On my local system (of course, with an own persistent class) the one method is 12 times faster then the other.
Maybe you have an "Unclean" situation on your development system...
Also, sometime it is worth to put a line like
do $system.OBJ.ShowObjects()
as a first line into the runme4() method
No, this is not my case :) I've exact copy of the test method on each machine.
I can see some difference between the methods: with p existing, each next Open spends some time checking if the object already open. But 12 times... I wonder what makes it.
Thanks for your time.
The %OpenId() method checks, if the object to be opened ALREADY open (in the current session/job). If it's open then it just returns with an success. In other case, it loads it from the disk.
If on your development system, for whatever reason, there is an open instance then you save 6.1 million times the "load" operation.
You wrote
while not specifically answering your general question, note that opening an object is very different than
Set p=^SomeLargeGlobal.
When you call %OpenId
Set p=^SomeLargeGlobal
and %OpenId can be quite different, at least academically.
If your class as relationships and/or serial/embedded classes I do not believe those properties are fully loaded into memory, but I could be incorrect.
In practice, if I need an object I use objects, if I need only a small part of the object/row I'll use embedded SQL for better performance.
Again this does not specifically answer your general question but I think it is useful to understand what %OpenId does and why it's not the same as
Set p=^SomeLargeGlobal.
Yes, for sure pure global reading is not the same as Open. With this test I just wanted to make sure the slowness is not related to global access speed.
You wrote:
Just to see how much gain I can have, I compared options 0 and 1, and the diff is surprisingly small: the test with 0 takes 34 secs, and with 1 takes 35. I would expect a bigger difference.
The difference between runs can be caused simply by normal variations in system load. The difference between Concurrency = 0 and Concurrency = 1 is only present when the object is stored in multiple global nodes. There are five possible values that are described in the %Library.Persistent class documentation (viewable through the SMP). Concurrency = 1 is referred to as "atomic read". For single node objects there is no extra work required for atomic read.