- When a worker is lost, we retry the task
- Impact: We expect more Flows to run to completion, even when an infrastructure-level change occurs (such as a pod dying).
- PDF page results for a given file are now cached for 24 hours to prevent re-computation.
- We internally split PDFs into multiple pages and then re-join them to reduce memory usage of PDFService.
- Each step in a Flow is now limited to a maximum of 2 minutes.
- Each image produced per PDF page is limited to 30 MB.
- Reduce disk usage for files emitted during a Process Files run
- Improve quality of JPEG processing
- Process Files:
- produce_metadata_list is now set to ‘false’ by default
- Impact: The height / width of documents won’t be returned in the final IBOCR
- produce_word_metadata is now set to ‘false’ by default
- Impact: The coordinate-level metadata for extracted words will no longer be available in the IBOCR
- A newer OCR model is used by default
- Impact: We expect better OCR results overall with the newly deployed model. Let’s closely monitor how many pages are OCRed successfully.
- Preview/Experimental Release: Emerson OCR: Targeted at poor-quality ID cards. Currently for experimentation on 1-3 documents at a time.
- A new ‘fonts’ flag has been added to OCR options to help OCR specific fonts such as OCR-A, OCR-B, and MICR.
- A correct_resolution_auto flag, which is an improved version of resolution correction for images, has been added.
- Roll out new file browser UI
- Fixed regression where users using JWT authentication received an HTTP 400 authentication error.
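
For Flows that still rely on page dimensions or word coordinates in the IBOCR, the two Process Files defaults that changed can be set back to true explicitly. The sketch below only illustrates that idea: the flag names come from these notes, but the plain-dictionary shape and the `build_process_files_options` helper are hypothetical, since these notes do not show how options are passed to the step.

```python
# New defaults as described in the Process Files notes above.
# Only the two flags whose default values are stated are listed here.
NEW_DEFAULTS = {
    "produce_metadata_list": False,
    "produce_word_metadata": False,
}


def build_process_files_options(overrides=None):
    """Hypothetical helper: start from the new defaults, then apply overrides."""
    options = dict(NEW_DEFAULTS)      # copy so the defaults are never mutated
    options.update(overrides or {})   # explicit settings win over defaults
    return options


# Opt back in to the earlier behavior so that document height / width and
# word-level coordinates are still produced.
legacy_options = build_process_files_options({
    "produce_metadata_list": True,
    "produce_word_metadata": True,
})
```

Setting the flags explicitly, rather than depending on defaults, keeps a Flow's output stable if the defaults change again in a later release.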