JDBC counts don't match
I'm using a JDBC driver to connect PGSQL -> Cache. I'm noticing when I run SELECT queries and COUNT(*) command against the same table, I get different result sets. I'm pretty new to Cache in general - so I'm trying to understand why these would be different.
Examples (TransID and InvNum should occur in every "row"):
SELECT COUNT(*) FROM ACCT.Services = 1,090,324 WHERE ACCT.Dept = 483
SELECT TransID FROM ACCT.Services = 1,085,776 WHERE ACCT.Dept = 483
SELECT InvNum FROM ACCT.Services = 586,023 WHERE ACCT.Dept = 483
(I tried the same thing in MSSQL via LinkedServer and it unsurprisingly yielded the same result.)
I'm suspecting this has to do with indexes, possibly? The only reason I say this is because the TransID seems to be the main index on this table. Does the index only include 'active rows' entries? Appreciate any insights. Thanks.
Sorry for the typo - that should have been
SELECT COUNT(*) FROM ACCT.Services WHERE ACCT.Dept = 483
Result: 1,090,324
SELECT TransID FROM ACCT.Services WHERE ACCT.Dept = 483
Result: 1,085,776
SELECT InvNum FROM ACCT.Services WHERE ACCT.Dept = 483
Result: 586,023
Just to clarify, in those bottom two queries, is that the entire query? No grouping, no sorting, no COUNT() functions?
If these are counts, it sounds like you've got some nulls in those columns. Try SELECT COUNT(*) FROM ACCT.Services WHERE TransID IS NULL and see what it says.
Thanks for quick reply, David. I wanted to double-check a few things based on your recommendations.
No, there are no grouping or sorting functions in the syntax. The only filtering I'm using against the data is the Dept = 483.
In my additional testing, I tried some additional syntax and noticed the following:
* SELECT COUNT(*) FROM ACCT.Services WHERE ACCT.Dept = 483 yields 1,085,776.
* SELECT COUNT(*), TransID FROM ACCT.Services WHERE ACCT.Dept = 483 AND TransID IS NULL yields 1,085,776.
* SELECT COUNT(*), TransID, ActnCode FROM ACCT.Services WHERE ACCT.Dept = 483 yields 1,085,776.
* SELECT COUNT(*), TransID, ActnCode, Comment WHERE ACT.Dept = 482 yields a result of 1,090,324.
* SELECT COUNT(*), TransID, ActnCode, Comment...[All the rest of the columns; too long to included here] WHERE ACT.Dept = 482 yields a result of 1,090,324.
I did some additional SELECT testing with other fields and consistently saw that if I only added index fields (e.g. TransID and ActnCode - either both or separately) I get the same 1,085,776 result. If I add a non-indexed field (e.g. Comment, TransUserID) to the SELECT listing, I get the higher (1,090,324) result. This holds true in both of those cases whether I have the COUNT(*) included in the SELECT statement or not.
Thanks again.
It looks the issue with indexing. Did you add some new indexes while you already had some data?
IRIS requires to manually rebuild index after adding or changes.
look at the documentation
Thanks for the quick reply, Dmitry. No - there haven't been any indexes added or changed at all. I'll take a look at the documentation link you provided. Thanks!
In any way, in such cases the first thing is would be to try to rebuild indexes. It should not take to much time, or even if you are able to do it, purge indexes first, then rebuild from scratch.
if it’s still shows differently. In this unexpected case, I would likely to see the your table definition.
I'm not sure if this will help cinch the indexing being the culprit or not, but I pulled just the TransIDs out (into a table by themselves) and then tried to do a SELECT * FROM ACCT.Services WHERE TransID = ######### AND ACCT.Dept = 482 to pull specif TransID rows. I spot checked a few that corresponded to the results in the SELECT COUNT(*), TransID, ActnCode, Comment WHERE ACT.Dept = 482 query and have already hit about three that didn't have any results. I'm assuming at this point it is because they're somehow part of the index, but not actual "active rows". My SQL-brain can't comprehend how I can have a query display a result but not find the detail that corresponds with that row id...but I'm sure it's my lack of Cache-101 knowledge...
For performance reasons, it's possible to define an Index in a way, that some of the columns will be as part of the index itself, just for search, and some data could be in the data part of that index, which will be used to display in the result if requested. So, if your index is somehow corrupted, the SQL engine will expect the values there, and will look for it, so, it will not go to the place where the data originally seat. And as a result, you may not see some of the data in the output rows.