How to select a random row from a table?

Question

Question

Henry Pereira · May 20, 2020

Hi community,

I need to write an SQL query to fetch a random record from a table, that table has millions of data.

In postgresql, for example, there is a RANDOM() function to do something like that:

SELECT column FROM table
ORDER BY RANDOM()
LIMIT 1

Is it possible to do something like that in Caché?

Thanks in advance

Discussion (8)0

Log in or sign up to continue

Jon Willeke · May 20, 2020

One thing I've done to split machine learning datasets is to use an auxiliary table that maps IDs to a random number. I write a stored procedure that returns a random number. I create a table with two columns, one for the ID of the source table, and one to hold a random number. I populate the column for the source IDs:

insert into random_table (source_id)
select id from source_table

I then populate the column for the random number:

update random_table
set random_number = MySchema.MySP_Random(1E9)

Then I can select with an ORDER BY clause on the random number:

select top 10 source_id
from random_table
order by random_number, source_id

It depends on your use case whether this will be appropriate for a source table with millions of rows. It's an expensive way to select just one row.

0 0

Robert Cemper · May 20, 2020

As there is no default RANDOM() in Caché/IRIS SQL you have to write it yourself.
Create this class:

Class User.Henry { ClassMethod Random(maxval As %Integer = 2) As %Integer [ SqlName = RANDOM, SqlProc ] { quit $random(maxval) } }

Assuming your table has defautl ID (%Integer, MINVAL=1)

SELECT TOP 1 column FROM table
HAVING ID >= RANDOM(MAX(ID))

>= ensures that you just get an existing ID and don't drop on a deleted one

Of course, if you know in advance the highest existing ID (e.g. maxid) then RANDOM(maxid) will do it.

3 0

score 0 · Answer 1 · 2020-05-20T15:23:34-04:00

Eduard Lebedyuk May 20, 2020 to Jon Willeke

Jonathan, where do you perform the training/fit part of the AI/ML pipeline?

0 0

score 0 · Answer 2 · 2020-05-20T16:52:22-04:00

Jon Willeke May 20, 2020 to Eduard Lebedyuk

This is for a simple train-test split as part of testing for IntegratedML, currently in beta.

0 0

score 0 · Answer 3 · 2020-05-20T15:01:07-04:00

Henry Pereira · May 20, 2020

Thank you very much guys

0 0

score 0 · Answer 4 · 2020-05-21T10:57:41-04:00

I'm curious to know what you're doing that you are looking for a random record in such a large dataset? Sincerely interested in what you're doing so I can learn and not challenging the validity of your task!

score 0 · Answer 5 · 2020-05-21T11:59:13-04:00

Henry Pereira May 21, 2020 to Michael Davidovich

Hi @Mike.Davidovich is to audit the data with impartiality

0 0

score 0 · Answer 6 · 2020-05-21T12:08:14-04:00

Michael Davidovich May 21, 2020 to Henry Pereira

Well that makes perfect sense!

0 0