stream.Read() only reading in chunks of 1K
I use the following code to calculate the SHA1 of a file:
set stream = ##class(%Stream.FileBinary).%New()
do stream.LinkToFile(filename)
write $SYSTEM.Encryption.Base64Encode($SYSTEM.Encryption.SHA1HashStream(stream))
This code is called thousands of times and performance is critical. I have tried to implement the same logic in another (lower-level) language and it is almost twice as fast. It was unclear why, so I started investigating.
Using Process Monitor, I can see that the file is read in chunks of 1024 bytes (1 KB), which is suboptimal: reading a 1 MB file requires 1024 file system calls. Usually a bigger buffer is used (e.g. 4096 or 81920 bytes).
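To put rough numbers on the per-hash cost, a timing loop along these lines can be used; this is only a sketch, and the file path and iteration count are placeholders rather than values from the original code:

set filename = "C:\temp\test.bin"
set stream = ##class(%Stream.FileBinary).%New()
do stream.LinkToFile(filename)
set start = $zhorolog
for i=1:1:1000 {
    // rewind and hash the same stream repeatedly to average out noise
    do stream.Rewind()
    set hash = $SYSTEM.Encryption.SHA1HashStream(stream)
}
write "seconds per hash: ", ($zhorolog - start) / 1000, !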
The SHA1HashStream() function is implemented this way:
do $System.Encryption.SHAHashReset(160)
set sc=stream.Rewind() If $$$ISERR(sc) Quit ""
while 'stream.AtEnd {
    do $System.Encryption.SHAHashInput(160, stream.Read(32000,.sc))
    if $$$ISERR(sc) Quit
}
quit $System.Encryption.SHAHashResult(160)
stream.Read(32000) will result in the following call:
Read:32000
So I expect it to read the file in chunks of 32000 bytes, but that's not the case.
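One quick sanity check is whether Read(32000) itself still returns roughly 32000 bytes per call, which would mean the 1024-byte reads seen in Process Monitor happen below the ObjectScript level (in the stream/device buffering) rather than in Read() itself. A minimal sketch, reusing the same classes as above:

set stream = ##class(%Stream.FileBinary).%New()
do stream.LinkToFile(filename)
while 'stream.AtEnd {
    set chunk = stream.Read(32000,.sc)
    // if this prints ~32000 per iteration, the 1 KB reads are happening
    // at the file/buffer level, not at the Read() level
    write $length(chunk), !
}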
Is this expected behavior? Is there a way to change it?
EDIT: I have been able to force 1024-byte reads in the other language implementation and it is still about twice as fast, so the performance issue is probably caused by something else.
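For comparison, one experiment that removes the chunked loop entirely (only possible when the file fits in a single ObjectScript string, roughly 3.6 MB with long strings enabled) is to read the stream in one call and hash the result with SHA1Hash. Whether this changes the low-level read pattern is exactly the open question above, so this is only a sketch:

set stream = ##class(%Stream.FileBinary).%New()
do stream.LinkToFile(filename)
if stream.Size <= 3641144 {
    // 3641144 bytes is the usual maximum local string length
    set data = stream.Read(stream.Size,.sc)
    write $SYSTEM.Encryption.Base64Encode($SYSTEM.Encryption.SHA1Hash(data))
}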
The documentation for the Read method says "Some stream classes use this to optimize the amount of data returned to align this with the underlying storage of the stream." I take this to mean that for a file stream, it might be trying to read in a way that aligns with how the drive is formatted. Can you run the command below and see if the Bytes Per Cluster is 1024?
Hello, I got the same as you (4096):