Sampling can be Evil

I was stuck on a problem for a couple of hours this afternoon because I trusted sampling too much. As usual, I used the following query to get a feel for the MySQL table I was interested in.

select * from some_table limit 10;

Super, huh? Short and quick, it tells you something about both the schema and the data within seconds. It worked well until this afternoon, when I saw that a string-typed column had the same value, “AAA”, in all 10 rows. I took for granted (without even thinking) that all the hundreds of thousands of rows would have “AAA” in this column. I tried desperately (yup, putting logs everywhere) to figure out why my program saw “BBB” instead of “AAA”. In the end, I saw the light and removed “limit 10” from the query, and those “BBB”s flew across my screen like crazy.
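For what it's worth, here is a small sketch of the trap and of a check that would have caught it. It uses Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere. The column name some_column and the 10/90 split of values are made up for illustration; they are not from my real table.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL table; names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("create table some_table (id integer primary key, some_column text)")

# 10 rows of 'AAA' come first, followed by 90 rows of 'BBB'.
rows = [("AAA",)] * 10 + [("BBB",)] * 90
conn.executemany("insert into some_table (some_column) values (?)", rows)

# The quick peek: only the first 10 rows.
peek = conn.execute(
    "select some_column from some_table order by id limit 10"
).fetchall()
peek_values = {value for (value,) in peek}

# The check that covers the whole table: every distinct value with its count.
counts = dict(conn.execute(
    "select some_column, count(*) from some_table group by some_column"
).fetchall())

print("values seen in the first 10 rows:", peek_values)
print("values in the whole table:", counts)
```

The same group-by query should work as-is in MySQL: select some_column, count(*) from some_table group by some_column; Keep in mind that "limit 10" without an "order by" is not a random sample. The server is free to return whichever rows it reaches first, and those often sit next to each other on disk and look alike.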
