InterSystems IRIS currently limits classes to 999 properties.
But what if you need to store more data per object?
This article answers that question (with a bonus cameo by Community Python Gateway, showing how you can transfer wide datasets into Python).
The answer is actually very simple: InterSystems IRIS limits classes to 999 properties, but not to 999 primitives. A property in InterSystems IRIS can itself be an object with 999 properties, and so on, so the limit is easy to work around.
Approach 1.
Store 100 properties per serial property. First, create a serial class that holds a hundred properties:
Class Test.Serial Extends %SerialObject
{
Property col0;
...
Property col99;
}
Then, in your main class, add as many properties as you need:
Class Test.Record Extends %Persistent
{
Property col00 As Test.Serial;
Property col01 As Test.Serial;
...
Property col63 As Test.Serial;
}
This immediately raises your limit to 99900 properties.
This approach offers uniform access to all properties via both the SQL and object layers (a property's location is always derivable from its number).
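As a sketch of that uniform access (assuming the layout above, where logical property n lives in serial n\100, slot n#100, so property 205 is col02.col5):

```objectscript
// Object access: logical property 205 = serial col02, slot col5
set obj = ##class(Test.Record).%OpenId(id)
write obj.col02.col5

// SQL access: serial properties project as underscore-delimited columns
// SELECT col02_col5 FROM Test.Record
```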
Approach 2.
One $lb property.
Class Test.Record Extends %Persistent
{
Property col As %List;
}
This approach is simpler but does not provide explicit column names.
Use the SQL $LIST* functions to access list elements.
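For example, a minimal sketch of both access paths (note that $LIST* positions are 1-based):

```objectscript
// SQL access to the 5th element:
// SELECT $LISTGET(col, 5) FROM Test.Record

// ObjectScript access on an opened object:
set obj = ##class(Test.Record).%OpenId(id)
write $listget(obj.col, 5)
```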
Approach 3.
Use a collection (List Of / Array Of) property.
Class Test.Record Extends %Persistent
{
Property col As List Of %Integer;
}
This approach also does not provide explicit column names for individual values (but do you really need them?). Use property parameters to project the property as an SQL column or child table.
Docs for collection properties.
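On the object side, a list collection is manipulated through its Insert/SetAt/GetAt/Count methods. A short sketch, assuming the Test.Record class above:

```objectscript
set obj = ##class(Test.Record).%New()
do obj.col.Insert(42)   // append a value to the list
do obj.col.SetAt(7, 1)  // replace element 1 (positions are 1-based)
write obj.col.GetAt(1)  // reads element 1 back
write obj.col.Count()   // number of elements stored
do obj.%Save()
```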
Approach 4.
Do not create properties at all; instead, expose the values via an SQL stored procedure and %DispatchGetProperty.
Class Test.Record Extends %Persistent
{
Parameter GLVN = {..GLVN("Test.Record")};
/// SELECT Test_Record.col(ID, 123)
/// FROM Test.Record
///
/// w ##class(Test.Record).col(1, 123)
ClassMethod col(id, num) As %Decimal [ SqlProc ]
{
#define GLVN(%class) ##Expression(##class(Test.Record).GLVN(%class))
quit $lg($$$GLVN("Test.Record")(id), num + 1)
}
/// Refer to properties as: obj.col123
Method %DispatchGetProperty(Property As %String) [ CodeMode = expression ]
{
..col(..%Id(), $e(Property, 4, *))
}
/// Get data global
/// w ##class(Test.Record).GLVN("Test.Record")
ClassMethod GLVN(class As %Dictionary.CacheClassname = {$classname()}) As %String
{
return:'$$$comClassDefined(class) ""
set strategy = $$$comClassKeyGet(class, $$$cCLASSstoragestrategy)
return $$$defMemberKeyGet(class, $$$cCLASSstorage, strategy, $$$cSDEFdatalocation)
}
The trick here is to store everything in the main $lb and use unallocated slots in the storage schema to hold your data. Here's an article on global storage.
With this approach, you can also easily transfer the data into a Python environment with Community Python Gateway via the ExecuteGlobal method.
This is also the fastest way to import CSV files due to the similarity of the structures.
Conclusion
The 999-property limit can easily be worked around in InterSystems IRIS.
Do you know other approaches to storing wide datasets? If so, please share them!
The question is how csvgen could be upgraded to consume CSV files with 1000+ columns.
While I always recommend the CSV2CLASS methods for generic solutions, wide datasets often have the (un)fortunate characteristic of also being long.
In that case, a custom object-less parser works better.
Here's how it can be implemented:
1. Align the storage schema with the CSV structure.
2. Modify this snippet for your class/CSV file:
Parameter GLVN = {..GLVN("Test.Record")};

Parameter SEPARATOR = ";";

ClassMethod Import(file = "source.csv", killExtent As %Boolean = {$$$YES})
{
    set stream = ##class(%Stream.FileCharacter).%New()
    do stream.LinkToFile(file)
    kill:killExtent @..#GLVN
    set i = 0
    set start = $zh
    while 'stream.AtEnd {
        set i = i + 1
        set line = stream.ReadLine($$$MaxStringLength)
        set @..#GLVN($i(@..#GLVN)) = ..ProcessLine(line)
        write:'(i#100000) "Processed:", i, !
    }
    set end = $zh
    write "Done",!
    write "Time: ", end - start, !
}

ClassMethod ProcessLine(line As %String) As %List
{
    set list = $lfs(line, ..#SEPARATOR)
    set list2 = ""
    set ptr = 0
    // NULLs and numbers handling.
    // Add generic handlers here.
    // For example, translate the "N/A" value into $lb() if that's how the source data rolls
    while $listnext(list, ptr, value) {
        set list2 = list2 _ $select($g(value)="":$lb(), $ISVALIDNUM(value):$lb(+value), 1:$lb(value))
    }
    // Add specific handlers here
    // For example, convert a date into horolog in column4
    // Add %%CLASSNAME
    set list2 = $lb() _ list2
    quit list2
}
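Usage would then look something like this (a sketch, assuming the snippet above lives in Test.Record and the file path is valid):

```objectscript
// import, wiping the extent first (the default)
do ##class(Test.Record).Import("source.csv")

// append to existing data instead
do ##class(Test.Record).Import("source.csv", 0)
```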
Thanks, Ed!
Could you make a PR?
I have no concrete ideas on how to automate this.
It's more of a case-by-case thing.
After more than 42 years of M programming, and 48 years of programming experience in total, I would say: if you need a class with about 1000 or more properties, then something is wrong with your (database) design. There is nothing more to say. Period.
Wide datasets are fairly typical for:
I'm fairly sure there are more areas, but I have not encountered them myself.
Recently I delivered a PoC with classes more than 6,400 columns wide, and that's where I got the inspiration for this article (I chose Approach 4).
@Renato Banzai also wrote an excellent article on his project with more than 999 properties.
Overall I'd like to say that a class with more than 999 properties is a correct design in many cases.
You're probably right for the majority of tasks. But how do you handle AI tasks, which NEED to manage thousands of features per entity? And from a data-storage perspective, features are properties/fields.
Anyway, I'm really curious how you deal with AI/ML tasks in IRIS or Caché.
I fully agree with Julius.
Any class that exceeds 100 properties is dead wrong.
We're not talking about (relational) tables (and even then it's wrong) but about OO design, which doesn't seem to be much in fashion these days.
Redesign your solution.
An entity–attribute–value (EAV) model is usually used for this purpose.
I have already written about this at the time: SQL index for array property elements.
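A minimal EAV-style sketch (hypothetical class; an array collection whose keys serve as attribute names):

```objectscript
Class Test.EAV Extends %Persistent
{
/// attribute name -> value; array collections project
/// to a child table that can be queried by attribute name
Property attrs As array Of %String;
}
```

Values are then set with obj.attrs.SetAt(value, "height") and read back with obj.attrs.GetAt("height"), which works well when only a fraction of the possible attributes are filled per record.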
That's all well and good for sparse datasets (where, say, a record has 10,000 possible attributes but on average only 50 are filled).
EAV does not help in dense cases where every record actually has all 10,000 attributes.
My EAV implementation is the same as your Approach 3, so it will work fine even with a fully populated 4,000,000 attributes.
Since strings are limited to 3,641,144 characters, the serial and %List approaches are ruled out.
All other things being equal, everything depends on the specific task: speed, support for objects/SQL, the ability to name each attribute, the number of attributes, and so on.
Approach 1 doesn't "raise your limit to 99900 properties" but rather to about 6,600.
You can test it with large.utils.cls:
w ##class(large.utils).init(66,100)
deleting large.serial100
creating large.serial100
compiling large.serial100
Compilation started on 07/31/2023 14:48:16
Compiling class large.serial100
Compiling routine large.serial100.1
Compilation finished successfully in 0.218s.
creating large.c66
compiling large.c66
Compilation started on 07/31/2023 14:48:16
Compiling class large.c66
Compiling table large.c66
Compiling routine large.c66.1
Compilation finished successfully in 8.356s.
1
w ##class(large.utils).init(67,100)
deleting large.serial100
creating large.serial100
compiling large.serial100
Compilation started on 07/31/2023 14:48:27
Compiling class large.serial100
Compiling routine large.serial100.1
Compilation finished successfully in 0.213s.
creating large.c67
compiling large.c67
Compilation started on 07/31/2023 14:48:27
Compiling class large.c67
Compiling table large.c67
Compiling routine large.c67.1
ERROR #5002: ObjectScript error: <MAXSTRING>CompileRtns+286^%occRoutine
ERROR #5002: ObjectScript error: <NOROUTINE>DescribePhase2+9^%occSysDescriptor *large.c67.1
Detected 2 errors during compilation in 6.896s.
IRISAPP>
I guess we are hitting some other limit.
I went with approach 4 myself.