Performance in converting large amounts of Documents

Using OLE Automation API
alan
Posts:7
Joined:Thu Feb 14, 2008 5:01 pm
Performance in converting large amounts of Documents

Post by alan » Thu Feb 14, 2008 5:38 pm

I'm using the Automation API to convert a large number of documents. A PHP script dumps them into a "queue" directory, and every couple of minutes another PHP script fires off: it first checks whether a conversion job is still running and exits if so; otherwise it gets the list of files in the directory and starts converting them.
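
Roughly, the scheduler looks like the sketch below (simplified; the paths, the lock file name and the convert_one.php worker script are placeholders rather than my exact code):

[code]
<?php
// scheduler.php -- fired every couple of minutes (e.g. from Windows Task Scheduler)
$queueDir = "C:\\queue";                    // placeholder queue directory
$lock     = $queueDir . "\\convert.lock";   // placeholder lock file

// If a previous conversion run is still active, exit right away
if (file_exists($lock)) {
    exit(0);
}
touch($lock);

// Convert every queued file in turn; convert_one.php is a placeholder
// worker that makes the actual Print2Flash Automation API call
foreach (glob($queueDir . "\\*.*") as $file) {
    if ($file == $lock) continue;           // skip the lock file itself
    system("php convert_one.php " . escapeshellarg($file));
}

unlink($lock);                              // release the lock for the next run
?>
[/code]

One known weakness: if a run dies without removing the lock file, the stale lock has to be cleared by hand before conversions resume.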

It seems to work fairly well. The problem is that many documents can't be converted for one reason or another (too big, requires user input to complete the printing process, etc.). I've settled on a timeout that balances giving long documents enough time to print against wasting time waiting on documents that will never convert.
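
The timeout itself is enforced by running each conversion in a child process and giving up when the limit passes. Something along these lines (again simplified; the 300-second limit and convert_one.php are placeholders, and terminating the child doesn't necessarily stop whatever Word or Print2Flash is doing underneath):

[code]
<?php
// Run one conversion with a hard time limit
function convertWithTimeout($file, $timeoutSec)
{
    $cmd  = "php convert_one.php " . escapeshellarg($file);
    $proc = proc_open($cmd, array(1 => array("pipe", "w"),
                                  2 => array("pipe", "w")), $pipes);
    $start = time();
    while (true) {
        $status = proc_get_status($proc);
        if (!$status["running"]) {
            break;                           // conversion finished (or failed)
        }
        if (time() - $start > $timeoutSec) {
            proc_terminate($proc);           // give up on this document
            break;
        }
        sleep(1);
    }
    foreach ($pipes as $p) {
        fclose($p);
    }
    proc_close($proc);
}

convertWithTimeout("C:\\queue\\big-report.doc", 300);
?>
[/code]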

So, my question is: how do I scale this process up? Can I run multiple queues and web server (Apache) processes to make better use of my dual CPU cores (the CPU is idle a lot of the time)? Or will the underlying Windows printing subsystem be the choke point and make each job wait? And will there be a problem if two queues try to print the same type of document at once (for example, Print2Flash opening two instances of Word behind the scenes to convert both docs)?

Thanks for any insight/ideas.

staff
Posts:267
Joined:Sat Dec 15, 2007 4:48 pm

Re: Performance in converting large amounts of Documents

Post by staff » Sat Feb 16, 2008 12:48 pm

This is indeed a problem, as it is difficult to choose a "compromise" timeout value that works well both for large documents that legitimately take a long time to print and for documents that cannot be printed at all for the reasons you mentioned. Since in the general case it is impossible to know beforehand whether a document will print, the only way to detect a failed document is to wait for the timeout to expire.

Currently, Print2Flash can convert only a single document at a time; this is a consequence of the Windows printing architecture. You may safely send multiple documents to Print2Flash for conversion using the Automation API or Batch Processing: it puts them in a queue and processes them in turn, so no abnormal behavior is expected in that case.
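
For example, a client can simply submit queued documents one after another and let Print2Flash serialize the work internally. A rough PHP sketch (the ProgID and the ConvertFile call below are illustrative assumptions only; please refer to the Automation API documentation for the exact object and method names of your Print2Flash version):

[code]
<?php
// Illustrative only: ProgID and method name may differ in your Print2Flash version
$p2f = new COM("Print2Flash.Server");        // assumed ProgID -- see the API docs

foreach (glob("C:\\queue\\*.doc") as $doc) {
    $out = "C:\\converted\\" . basename($doc, ".doc") . ".swf";
    // Documents submitted in a loop are converted one at a time internally
    $p2f->ConvertFile($doc, $out);           // assumed method signature
    unlink($doc);                            // drop the source from the queue
}
?>
[/code]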

One possible solution would be support for converting multiple documents in parallel (multiple threads). This feature is already on our road map. With multiple threads it would be possible both to utilize CPU time better and to waste less time waiting for the timeouts of documents that will not print.

Another solution would be a special API for converting certain document types (Office documents and possibly PDF) that uses the internal APIs of MS Office and Adobe Acrobat. That way, if a document cannot be printed, we expect no timeout would be needed to discover it. This is also planned for future versions. Could you tell us which types of documents give you timeout problems?
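
As an illustration of the idea only (this is not a Print2Flash feature yet), a client could probe an Office document through Word's own COM interface before printing it. With alerts suppressed, a document that Word cannot open cleanly usually raises a catchable COM error instead of stalling the print job, though not every prompt can be suppressed this way:

[code]
<?php
// Illustration of using an application's own COM API instead of printing.
// Not part of the current Print2Flash API.
$word = new COM("Word.Application");
$word->Visible = false;
$word->DisplayAlerts = 0;                    // wdAlertsNone: suppress most dialogs

try {
    $doc = $word->Documents->Open("C:\\queue\\report.doc");
    $doc->Close(false);                      // opened fine -- safe to send for conversion
    echo "OK to convert\n";
} catch (com_exception $e) {
    echo "Skipping document: " . $e->getMessage() . "\n";
}

$word->Quit();
?>
[/code]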

rickey
Posts:2
Joined:Mon Mar 10, 2008 4:39 pm

Re: Performance in converting large amounts of Documents

Post by rickey » Mon Mar 10, 2008 4:46 pm

When should we expect support for multiple threads?

staff
Posts:267
Joined:Sat Dec 15, 2007 4:48 pm

Re: Performance in converting large amounts of Documents

Post by staff » Tue Mar 18, 2008 5:09 pm

We have decided to include this feature in our next major version, which should be released in early May.
