Image Download from URL
Hello,
I'm currently struggling with an HTTP request to a URL that serves a JPEG image file.
Testing the request with a browser or Postman results in the image being shown normally.
Using a %Net.HttpRequest with different configurations has resulted in a corrupted file.
My code works for some URLs from other servers perfectly fine, but with some it produces corrupted file contents which do not represent a jpeg.
Set REQ=##class(%Net.HttpRequest).%New()
Set REQ.Server="www.distrelec.de"
set REQ.SSLConfiguration="agimero.quwiki.de"
SET REQ.FollowRedirect=1
SET REQ.ContentType="image/jpeg;charset=UTF-8"
DO REQ.SetHeader("Connection","keep-alive")
DO REQ.SetHeader("Accept-Encoding","gzip,deflate,br")
SET REQ.Port=443
set REQ.Https=1
SET STATUS=REQ.Get("/Web/WebShopImages/landscape_medium/_t/if/sortimentsboxen-1.jpg")
SET STREAM=REQ.HttpResponse.Data
I have tried different approaches, such as plain HTTP on port 80, different SSL configurations, and $system.Util.Decompress, but STREAM always contains less data than what I see in Postman or my browser. The image should be 18 KB, but STREAM only contains 13 KB.
Converting to Base64 also yielded no displayable result.
Here are the headers sent and received with Postman.
To download that image, you need just a few lines of code
Class Some.Class Extends %RegisteredObject
{

ClassMethod GetImage()
{
    s req=##class(%Net.HttpRequest).%New()
    s req.Server="www.distrelec.de"
    s req.SSLConfiguration="SSL" // use your SSL-Config-Name
    d req.Get("/Web/WebShopImages/landscape_medium/_t/if/sortimentsboxen-1.jpg",1)
    q req.HttpResponse
}

}
So your code is more or less OK, but what comes back in the response looks suspicious:
set rsp=##class(Some.Class).GetImage()
zw rsp
rsp=7@%Net.HttpResponse  ; <OREF>
+----------------- general information ---------------
|      oref value: 7
|      class name: %Net.HttpResponse
| reference count: 3
+----------------- attribute values ------------------
|    ContentBoundary = ""
|        ContentInfo = "charset=UTF-8"
|      ContentLength = 17759
|        ContentType = "image/jpeg;charset=UTF-8"
|               Data = "8@%Stream.GlobalCharacter"
|Headers("CACHE-CONTROL") = "max-age=0"
|Headers("CONTENT-LENGTH") = 17759
|Headers("CONTENT-TYPE") = "image/jpeg;charset=UTF-8"
|    Headers("DATE") = "Mon, 14 Nov 2022 10:57:10 GMT"
|    Headers("ETAG") = "a919e895229c7883864aecbfa2717516"
|Headers("LAST-MODIFIED") = "Thu, 01 Jan 1970 00:00:01 GMT"
|Headers("SET-COOKIE") = "visid_incap_2373370=1U8JalxbRJOzRFviQpi05AYfcmMAAAAAQUIPAAAAAADRm"
|Headers("STRICT-TRANSPORT-SECURITY") = "max-age=31536000; includeSubDomains; preload"
|   Headers("X-CDN") = "Imperva"
| Headers("X-IINFO") = "5-19652015-0 0CNN RT(1668423430526 51) q(0 -1 -1 0) r(0 -1)"
|        HttpVersion = "HTTP/1.1"
|       ReasonPhrase = "OK"
|         StatusCode = 200
|         StatusLine = "HTTP/1.1 200 OK"
+-----------------------------------------------------
The server reports the content type as "image/jpeg", which is fine, but the charset=UTF-8 part is, I think, the problem. A JPEG image usually starts with these (hex) bytes:
FF D8 FF E0 00 10 4A 46 49 46 ...
The HTTP-Response gives us
do rsp.Data.Rewind()
zzdump rsp.Data.Read(10)
3F 3F 3F 10 4A 46 49 46 00 01    ???.JFIF..
I'm in no way a web expert, but it seems to me that Cache tries to decode the incoming raw JPEG data as UTF-8 (according to Content-Type = image/jpeg;charset=UTF-8). The first byte, FF, already causes an error (FF can never occur in valid UTF-8), so a "?" character is returned as a replacement. The next two "?" (hex 3F) characters also result from bytes that cannot be decoded. As for why the same URL works in Chrome or Firefox: I think they either ignore the charset=UTF-8 or simply show the raw data after the first decoding error.
Thank you for the thorough explanation!
Is there any way to prevent the UTF decoding when using %Net.HttpRequest? Or maybe revert it?
No, you can't revert it, because the first characters (3 x '?') are replacement characters, so the original bytes are already lost. You could try contacting the WRC.
I think I have a solution for you:
ClassMethod GetImage()
{
    s req=##class(%Net.HttpRequest).%New()
    s req.Server="www.distrelec.de"
    s req.SSLConfiguration="SSL"
    s req.ReadRawMode=1 // <<---- this is your solution
    d req.Get("/Web/WebShopImages/landscape_medium/_t/if/sortimentsboxen-1.jpg")
    q req.HttpResponse
}
To get the image
s rsp=##class(Some.Class).GetImage()
i rsp.StatusCode=200 {
    s file="c:\temp\imageName.jpg"
    o file:"nwu":0
    i $t u file d rsp.Data.Rewind(),rsp.Data.OutputToDevice() c file
}
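If you prefer to avoid OPEN/USE/CLOSE, here is a rough sketch of the same idea using a file stream and checking the status values along the way. SaveImage and its default file name are made up for this example, and the "SSL" configuration name is a placeholder as before. It also checks that the body starts with FF D8 (the JPEG signature mentioned above) before writing anything, so a response that was still character-decoded would be rejected. I have not run this against the server in question, so treat it as a starting point only.

```objectscript
/// Hypothetical helper (not from the original answer): download the image in
/// raw mode and store it via %Stream.FileBinary.
ClassMethod SaveImage(file As %String = "c:\temp\imageName.jpg") As %Status
{
    set req=##class(%Net.HttpRequest).%New()
    set req.Server="www.distrelec.de"
    set req.Https=1
    set req.SSLConfiguration="SSL" // placeholder, use your SSL-Config-Name
    set req.ReadRawMode=1          // read the body without character set translation

    set sc=req.Get("/Web/WebShopImages/landscape_medium/_t/if/sortimentsboxen-1.jpg")
    if $system.Status.IsError(sc) quit sc

    set rsp=req.HttpResponse
    if rsp.StatusCode'=200 quit $system.Status.Error(5001,"HTTP status "_rsp.StatusCode)

    // A JPEG has to start with the bytes FF D8
    do rsp.Data.Rewind()
    set head=rsp.Data.Read(2)
    if ($ascii(head,1)'=255)||($ascii(head,2)'=216) quit $system.Status.Error(5001,"Response is not a JPEG")

    // Copy the raw response body into the target file
    do rsp.Data.Rewind()
    set out=##class(%Stream.FileBinary).%New()
    set sc=out.LinkToFile(file)
    if $system.Status.IsError(sc) quit sc
    set sc=out.CopyFrom(rsp.Data)
    if $system.Status.IsError(sc) quit sc
    quit out.%Save()
}
```

Called as set sc=##class(Some.Class).SaveImage(), with any error text available through $system.Status.GetErrorText(sc).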
That's all...
The ReadRawMode did the trick! Thank you very much.