Brute force is good, sometimes
I had to extract the clicks from about 12,000 IP addresses from our click database, over a period of one and a half months. The table is famously big[1] (140,000 rows a day) and indexed on the date.
I don’t know what got into me, but my first attempt was to generate one SQL query per IP and run each of them over the whole timespan. That’s 12,000 queries at about a minute apiece, in other words a bit more than 8 days. Plus I couldn’t run it all the time, as it bogged down the database.
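For what it's worth, here is the back-of-the-envelope arithmetic as a small Python sketch. The 45-day figure is my assumption for "one and a half months", and the function names are made up for this example.

```python
# Rough cost estimate for the one-query-per-IP approach.
# Figures come from the post, except DAYS, which is an assumed
# reading of "one and a half months".
QUERIES = 12_000
MINUTES_PER_QUERY = 1
ROWS_PER_DAY = 140_000
DAYS = 45  # assumption


def per_ip_cost_days(queries: int, minutes_per_query: float) -> float:
    """Total running time in days if the queries run back to back."""
    return queries * minutes_per_query / 60 / 24


def rows_in_timespan(rows_per_day: int, days: int) -> int:
    """Number of rows a single pass over the timespan has to read."""
    return rows_per_day * days


if __name__ == "__main__":
    print(f"per-IP queries: {per_ip_cost_days(QUERIES, MINUTES_PER_QUERY):.1f} days")
    print(f"rows in one pass: {rows_in_timespan(ROWS_PER_DAY, DAYS):,}")
```

So the per-IP approach costs days of query time, while a single pass only has to read a few million rows once.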
Then I found a better way. I read all the rows for the timespan once and checked whether each row's IP was in a hash of the addresses I wanted. If it was, I kept the row; otherwise I just went on to the next one. I can’t believe I was stupid enough to try the first approach at all, and that I’m now blogging about it…
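The single-pass idea looks roughly like this in Python. It is only a sketch, not the script I actually ran: the row layout (a dict with an "ip" key), the function name and the sample addresses are all assumptions for illustration, and a Python set stands in for the hash.

```python
from typing import Dict, Iterable, Iterator


def filter_rows_by_ip(rows: Iterable[Dict[str, str]],
                      wanted_ips: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield only the rows whose IP is one of the wanted addresses.

    Each row is assumed to be a dict with an "ip" key (hypothetical layout).
    """
    wanted = set(wanted_ips)  # set membership is a constant-time lookup on average
    for row in rows:
        if row["ip"] in wanted:
            yield row  # keep it
        # otherwise just go on to the next row


if __name__ == "__main__":
    # Made-up sample rows using documentation-only IP ranges.
    sample_rows = [
        {"ip": "192.0.2.1", "url": "/a"},
        {"ip": "198.51.100.7", "url": "/b"},
        {"ip": "192.0.2.1", "url": "/c"},
    ]
    for row in filter_rows_by_ip(sample_rows, ["192.0.2.1"]):
        print(row)
```

The rows would come from one date-bounded query, which is the one thing the index on the date is good for, and every row then costs a single lookup instead of a query of its own.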
[1] Big for our system, that is; you probably have much bigger tables.
Posted at 21:22, in the comp category.