Speeding up $listget
Hi,
I'm trying to find the fastest way to get data from a class, and I find it very slow compared to traditional globals. I hope some of you can shed some light on this :-)
I have thousands of records in a class, and to access them quickly I use $order ($o) on the index. From there, I get the values using $listget(). Something like this:
set FromDateH = +$h-1
set id="" // start $order from the first id under this date
for {
set id=$order(^TestI("StartDateIDX",FromDateH,id))
quit:id=""
set dat=$lg(^TestD(id)) //dat=$lb("a","b","c","d","e")
}
I find it quite slow, so I've tried the same code but changing the contents of ^TEST from a List to a String:
set FromDateH = +$h-1
set id="" // start $order from the first id under this date
for {
set id=$order(^TESTINDEX("StartDateIDX",FromDateH,id))
quit:id=""
set dat=$g(^TEST2(id)) //dat = "a#b#c#d#e"
}
I find that getting the data from a String (like "a#b#c#d#e") is 4 to 5 times faster than getting it from a List (like $lb("a","b","c","d","e")). When you are managing a few records it may not make a difference, but in my case it's having a huge impact.
Does anyone know how I can speed up $listget() so that its performance is similar to the old-fashioned MUMPS way of getting delimiter-separated string values?
Cheers! :-)
Once you have the $list value in dat, use $listnext rather than a loop with $listget.
For example:
set ptr = 0
while $listnext(dat,ptr,value) {
// <do something with your value in value>
}
Thanks David!
But the problem is that what takes a long time is getting the list values, not iterating over them.
Yes, sorry, my mistake. Actually the line should be:
compared to:
Thanks to Julius's suggestion, I've run the %SYS.MONLBL analysis tool, and clearly something goes wrong when getting the data from a list:
If you're sure that larger ids are generated later, you can get only the first id from the index and then iterate the data global directly:
Also, you can use the three-argument form of $order to iterate and get the data in one command.
Finally, consider profiling the work-heavy code with %SYS.MONLBL to verify which lines consume the most time.
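The first two ideas could be combined roughly as follows. This is only a sketch: it reuses the global names from the question and assumes that ids really do grow over time, so that every record stored after the first indexed id is wanted. If that assumption does not hold, the loop will read records from other dates.

```objectscript
 // Sketch only. Assumption: larger ids are always created later.
 set FromDateH = +$horolog-1
 // First id recorded in the index for that date
 set id = $order(^TestI("StartDateIDX",FromDateH,""))
 if id'="" {
     // Step back one subscript so the loop below starts at that first id
     set id = $order(^TestD(id),-1)
     for {
         // Three-argument $order: the third argument receives the node value,
         // so no separate global reference is needed to fetch the data
         set id = $order(^TestD(id),1,dat)
         quit:id=""
         // dat now holds the $list stored at ^TestD(id)
     }
 }
```

With this shape the index is touched once and each data node is read once, instead of one index read plus one data read per record.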
$get is as fast as the global can be read. Some ideas:
I'm not sure I'm reading this correctly, but I believe the key difference in the cold runs is 10,399 vs 5,853, again suggesting that ^ListData went to disk more often.
I'm surprised that it makes such a big difference, but I suspect what's happening here is that your copy of ^ListData into ^StringData resulted in a more space-efficient organization. You might want to look at the packing column of a detailed report from the %GSIZE utility.
It's possible that something about your data causes $list to store it less efficiently, but your data hasn't convinced me of that. If you copied ^ListData unchanged into, say, ^ListData2, my guess is that you would see a similar improvement.
You are doing disk block reads in one case, which is why it is slower. How big is your global buffer pool? Also, how big are your globals ^TestD and ^TEST2? Use 'd ^%GSIZE' to find their sizes on disk. The $lb version will be slightly bigger, because each element carries a type byte and a length byte; this shows up when the data is very small, like these single-character ASCII elements. On the other hand, $lb means you never need to worry about data that contains '#' characters, and it preserves types, whereas the "a#b#..." format needs to convert everything into a string before storing it, which adds runtime overhead too.
--
Mark
Compared to a delimited string, lists have the overhead of storing the length of each element, typically one extra byte. Numbers and Unicode characters are also stored differently, sometimes more efficiently, sometimes less. Otherwise, there is no difference between fetching a delimited string or a list.
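A quick way to see the per-element overhead described above is to compare the encoded lengths of the two sample values from the question. This is a small sketch using local variables only (the names list and str are illustrative); it does not touch any global.

```objectscript
 // Sketch: compare the encoded size of a $list with a delimited string
 set list = $listbuild("a","b","c","d","e")
 set str = "a#b#c#d#e"
 // $length on a $list value counts the encoded bytes, including the
 // per-element header bytes
 write "list length:   ",$length(list),!
 write "string length: ",$length(str),!   // 9
```

The difference is most visible with tiny elements like these; with longer values the header bytes become a small fraction of the total.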
The DataBlkRd and DataBlkBuf columns show that ^StringGlobal was read entirely from global buffers, whereas ^ListGlobal had to read over 9,000 blocks from disk. In each case, it seems that the global occupies about 17,000 blocks; about 136 MB, assuming 8 KB blocks.
I suggest that you do the following:
Based on your numbers, the first runs will be cold, and should take a minute or two. The second runs should be essentially instantaneous.
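One way to observe the cold versus warm difference is to time two consecutive passes over the same global. The following is only an illustrative sketch, not necessarily the procedure intended above; it uses ^ListData, the global name mentioned earlier in this thread, and $zhorolog for elapsed seconds.

```objectscript
 // Sketch: time two passes over the same global; the second pass should
 // be served mostly from global buffers if the global fits in the pool
 for pass=1:1:2 {
     set start = $zhorolog, id = "", count = 0
     for {
         set id = $order(^ListData(id),1,dat)
         quit:id=""
         set count = count+1
     }
     write "pass ",pass,": ",count," nodes in ",$zhorolog-start," s",!
 }
```

For a fair comparison between two globals, each would need to start from the same buffer state, since a global that is already cached will always look faster.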
Finally, I've run some tests. I duplicated the list global, changing its values to strings, so I can compare two different globals with the same data but stored differently. The results show that accessing a list is much slower.
But finally, after reading a bit of the documentation, I found that I could improve the performance by changing the database block size from 8 KB to 64 KB. And it really worked:
So
You are comparing apples with oranges!
The line (your case 1):
set dat=$lg(^TestD(id)) //dat=$lb("a","b","c","d","e")
sets dat to the FIRST list item of ^TestD(id) (or to "" if that list item is NOT present), but ONLY if ^TestD(id) is defined AND it is a Cache list.
Otherwise, you will get an <UNDEFINED> error if ^TestD(id) does not exist, or a <LIST> error if ^TestD(id) exists but its content is not a list!
The line (your case 2):
set dat=$g(^TEST2(id)) //dat = "a#b#c#d#e"
sets dat to the content of ^TEST2(id) if it exists, or to "" if there is no ^TEST2(id).
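The difference between the two lines can be shown with local variables, so that no global is modified. This is a minimal sketch; the variable names list and str are illustrative.

```objectscript
 // Sketch: what each expression actually returns
 set list = $listbuild("a","b","c","d","e")
 set str = "a#b#c#d#e"
 write $listget(list),!            // a  (first element only)
 write $listget(list,3),!          // c  (third element)
 write $listget(list,9,"n/a"),!    // n/a  (no ninth element, default returned)
 write $get(str),!                 // a#b#c#d#e  (the whole value)
 write $piece(str,"#",3),!         // c  (third delimited piece)
```

So the one-argument $lg in case 1 extracts a single element, while $g in case 2 only fetches the stored value; a like-for-like comparison would fetch the whole node in both cases and extract elements afterwards.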