The Textract Timings.xlsx file data was gathered by putting the files on S3 and using the list of S3 objects to programmatically execute a series of start_document_text_detection() calls in Python in quick succession. All of the 16:XX (mi:ss) times listed in the file are relative to Tue, 18 Jan 2022 02:16:XX GMT, when the script was executed.
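The submission loop can be sketched roughly as below. This is a minimal reconstruction, not the original script: the bucket name is hypothetical, and the exact listing logic is an assumption; only start_document_text_detection() and its DocumentLocation argument come from the actual Textract API.

```python
def document_location(bucket, key):
    """Build the DocumentLocation argument for start_document_text_detection()."""
    return {"S3Object": {"Bucket": bucket, "Name": key}}

def submit_all(bucket):
    """List every object in the bucket and kick off a Textract
    text-detection job for each, in quick succession."""
    import boto3  # AWS SDK for Python

    s3 = boto3.client("s3")
    textract = boto3.client("textract")
    job_ids = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            resp = textract.start_document_text_detection(
                DocumentLocation=document_location(bucket, obj["Key"])
            )
            job_ids.append(resp["JobId"])
    return job_ids

if __name__ == "__main__":
    # "my-input-bucket" is a placeholder, not the bucket used in the experiment.
    print(submit_all("my-input-bucket"))
```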

The Comprehend Timings.xlsx file data was gathered by submitting Comprehend jobs in a pipeline.

  1. The first job in the list failed (it pointed to an input that didn’t exist). None of our other jobs were in the pipeline at this point.

  2. The second job completed, but no other jobs were submitted until it finished 8 minutes later. This 8-minute turnaround matched what we had previously seen for single jobs.

  3. The remaining jobs were submitted with a decreasing interval between submissions (400 s, 200 s, 100 s, …, 4 s, 4 s). This did not eliminate the long processing time; each job still took at least 6 minutes.
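The decreasing submission cadence in step 3 can be sketched as a simple halving schedule with a floor. The intermediate values between 100 s and 4 s are an assumption (the notes elide them with "…"); only the 400/200/100 start and the 4 s floor come from the text above.

```python
import time

def submission_delays(start=400, floor=4, count=9):
    """Delays (seconds) between successive job submissions:
    halve each time, never dropping below the floor."""
    delays, d = [], start
    for _ in range(count):
        delays.append(d)
        d = max(floor, d // 2)
    return delays

def submit_with_backoff(jobs, submit, delays=None):
    """Submit each job, sleeping the scheduled delay between submissions.
    `submit` is a caller-supplied callable (e.g. a Comprehend job starter)."""
    delays = delays or submission_delays(count=len(jobs))
    for job, delay in zip(jobs, delays):
        submit(job)
        time.sleep(delay)
```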
