Strategy to use for very large report output script?

skypanther

Exalted Code Master!
I have a client who, once or twice a year, needs to print out a report containing a lot of data: a couple of hundred database records, each with up to four photos that must be resized before being included in the report.

Ultimately, they'd like a PDF file out of this. Right now, I have a page that, with some CSS, does a fairly decent job of outputting their data in the format they want. But the script takes a long time to run (many minutes): it involves dozens of db queries, plus the PHP to resize the images and build the output. As a result, it puts a rather heavy burden on the shared server here.

I guess I'm going to modify the script to use AJAX to pull each record individually, so that the data is at least fetched asynchronously in many smaller page requests. That would eliminate the PDF output option, though. And it really just trades one big request for lots of smaller ones.

Before I go twiddling code, I was hoping someone could offer some alternative strategies I could investigate. A more efficient programming scheme would be nice, but I'm open to non-webby approaches too, like some way to dump the data, FTP it to a local PC, and do the processing offline. It has to be automated and simple, though, as they aren't very tech savvy.

Thanks,
Tim
 
I am assuming that the image resizing is why it takes so long. Unless the pictures are absolutely massive, could you just set the width and height attributes of the images instead of actually resizing them? It may not look amazing in a browser, but if you output that to a PDF or print it directly, it will look fine. Also, you say there are a lot of queries, and I don't know your SQL abilities, but combining some of them wherever possible wouldn't hurt.
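Something like this, just as a rough sketch (the file path and the 256x192 size are placeholders, not your actual code):

<?php
// Sketch only: emit the full-size upload and let the browser/print engine scale it.
// $photo is a placeholder path; 256x192 is roughly 1/4 of a 1024x768 original.
$photo = 'images/artisan_1_photo_1.jpg';
echo '<img src="' . htmlspecialchars($photo) . '" width="256" height="192" alt="">';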
 
Thanks, doctorallia. The images are resized on upload to 1024x768, but for the report they must be sized down again to fit on the printed page, so roughly 1/4 size. Since there are four images per artisan and roughly 250 artisans in the database, that's about 1000 resize operations. As you say, it might be better to do the resizing browser-side and use up a bit of their transfer bandwidth to save processing time.
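For reference, each of those resize operations is essentially the standard GD pattern, roughly like this (not my exact code; the paths and sizes are placeholders):

<?php
// Rough sketch of one resize operation (placeholder paths, not the production code).
function resize_photo($src, $dst, $newW = 256, $newH = 192) {
    $orig = imagecreatefromjpeg($src);            // load the 1024x768 upload
    $out  = imagecreatetruecolor($newW, $newH);   // ~1/4-size canvas for the printed page
    imagecopyresampled($out, $orig, 0, 0, 0, 0,
                       $newW, $newH, imagesx($orig), imagesy($orig));
    imagejpeg($out, $dst, 85);                    // write the print-size copy
    imagedestroy($orig);
    imagedestroy($out);
}

resize_photo('photos/original.jpg', 'photos/print_copy.jpg');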

As for query optimization, the biggest issue is how my code has to loop and repeat queries. Artisans fall into a dozen or so categories and must be output that way, so as I loop through each category, I query for the artisans in that category and then loop through them to output each one. The net of it is maybe 15 queries, so it's probably not that big a deal. (The images aren't stored in the db, just the paths to the physical files.)
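Roughly, the structure is something like this (the table and column names here are placeholders, not my real schema):

<?php
// Current shape: one query per category, then loop over the artisans in it.
// Assumes $pdo (a PDO connection) and $categories (an array of category rows) already exist.
foreach ($categories as $cat) {
    $stmt = $pdo->prepare('SELECT * FROM artisans WHERE category_id = ?');
    $stmt->execute([$cat['id']]);
    while ($artisan = $stmt->fetch(PDO::FETCH_ASSOC)) {
        // output this artisan's block, resizing up to four photos each time
    }
}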

They don't need any further post-processing or sorting, so until now I hadn't thought of fetching all the artisans with a single query and then sorting client-side. I'm not sure JavaScript could handle that much data.

Rather than spending a lot of time reworking my code, it would probably take less time to dump the db once a year, FTP all the images down to my local server, run the report there, and create a PDF that I email to my client.

Tim
 
It should still be possible to output all that data with a single query rather than 15 in a loop, although I suppose the JOINs could get messy.
It would probably save database processing time, though. If the database is being accessed remotely, the round-trip delay could also become a problem.
15 queries on 12 categories against a server with a 200 ms ping would mean a minimum of 36 seconds (15 × 12 × 200 ms) spent on round trips alone if handled synchronously :)
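
For instance, a single ordered query plus a control-break loop would look roughly like this (the table and column names are only guesses at your schema):

<?php
// Sketch: one query ordered by category, grouped while looping (schema names assumed).
// Assumes $pdo is an existing PDO connection.
$sql = 'SELECT c.name AS category_name, a.*
        FROM categories c
        JOIN artisans a ON a.category_id = c.id
        ORDER BY c.name, a.name';
$currentCategory = null;
foreach ($pdo->query($sql, PDO::FETCH_ASSOC) as $row) {
    if ($row['category_name'] !== $currentCategory) {
        $currentCategory = $row['category_name'];
        // start a new category section in the report
    }
    // output this artisan's block
}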
 