LLMWhisperer Modes
Feature matrix for LLMWhisperer modes
Feature | Native Text | Low Cost | High Quality | Form |
---|---|---|---|---|
PDF (not scanned) | ✓ | ✓ | ✓ | ✓ |
PDF (scanned) | ✗ | ✓ | ✓ | ✓ |
PDF (with forms) | ✗ | ✗ | ✗ | ✓ |
Images | ✗ | ✓ | ✓ | ✓ |
MS Office Word | ✗ | ✓ | ✓ | ✓ |
MS Office Excel | ✗ | ✓ | ✓ | ✓ |
MS Office PowerPoint | ✗ | ✓ | ✓ | ✓ |
LibreOffice Writer | ✗ | ✓ | ✓ | ✓ |
LibreOffice Calc | ✗ | ✓ | ✓ | ✓ |
LibreOffice Impress | ✗ | ✓ | ✓ | ✓ |
Checkbox and radio button detection | ✗ No | ✗ No | ✗ No | ✓ Yes |
Line reproduction in output | ✗ No | ✓ Yes | ✓ Yes | ✓ Yes |
Extraction speed | Very fast | Fast | Medium | Medium |
Image preprocessing (median filter and Gaussian blur) | ✗ No | ✓ Yes | ✗ No | ✗ No |
Choice of line-splitting strategy | ✓ Yes | ✓ Yes | ✓ Yes | ✓ Yes |
Supported languages | All (unicode) | 120*+ | 300+ | 300+ |
Handwriting recognition | ✗ No | Basic support | ✓ Yes | ✓ Yes |
Layout-preserving output | ✓ Yes | ✓ Yes | ✓ Yes | ✓ Yes |
AI/ML-based enhancement | ✗ No | ✗ No | ✓ Yes | ✓ Yes |
Rotation and skew compensation | N/A | ✗ No | ✓ Yes | ✓ Yes |
Auto repair PDFs | ✓ Yes | ✓ Yes | ✓ Yes | ✓ Yes |
Dense text content | Best performance | Very good | Very good | Very good |
High-entropy content (pages containing a wide variety of text sizes) | Best performance | Very good | Very good | Very good |
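One way to use the matrix programmatically is to encode its rows as sets of supporting modes and intersect them for a given set of requirements. The sketch below is an illustration only: it covers a subset of the rows, the feature keys (`pdf_scanned`, `handwriting`, and so on) and the `supporting_modes` helper are hypothetical, and the enum values are the column names from the table, not confirmed API parameter values. "Basic support" for handwriting in Low Cost mode is treated as not supported.

```python
from enum import Enum


class Mode(Enum):
    # Column names from the feature matrix; these strings are display labels,
    # not confirmed LLMWhisperer API parameter values.
    NATIVE_TEXT = "Native Text"
    LOW_COST = "Low Cost"
    HIGH_QUALITY = "High Quality"
    FORM = "Form"


# Hypothetical encoding of a subset of the matrix rows: each key maps to the
# set of modes marked as supporting that feature.
FEATURES: dict[str, set[Mode]] = {
    "pdf_native": {Mode.NATIVE_TEXT, Mode.LOW_COST, Mode.HIGH_QUALITY, Mode.FORM},
    "pdf_scanned": {Mode.LOW_COST, Mode.HIGH_QUALITY, Mode.FORM},
    "pdf_forms": {Mode.FORM},
    "images": {Mode.LOW_COST, Mode.HIGH_QUALITY, Mode.FORM},
    "office_documents": {Mode.LOW_COST, Mode.HIGH_QUALITY, Mode.FORM},
    "checkbox_radio_detection": {Mode.FORM},
    # Low Cost has only "Basic support", so it is excluded here.
    "handwriting": {Mode.HIGH_QUALITY, Mode.FORM},
    "rotation_skew_compensation": {Mode.HIGH_QUALITY, Mode.FORM},
    "image_preprocessing": {Mode.LOW_COST},
}


def supporting_modes(*required: str) -> list[Mode]:
    """Return the modes that support every required feature, in table column order."""
    candidates = set(Mode)
    for feature in required:
        candidates &= FEATURES[feature]
    # Enum iteration follows definition order, which matches the table columns.
    return [mode for mode in Mode if mode in candidates]


print([mode.value for mode in supporting_modes("pdf_scanned", "handwriting")])
# prints ['High Quality', 'Form']
```

Calling `supporting_modes()` with no arguments returns all four modes, since no requirement narrows the candidate set.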
Recommended use cases
Mode | Native Text | Low Cost | High Quality | Form |
---|---|---|---|---|
Recommended use cases | • Low-latency requirements • All documents are PDFs • PDFs are native text PDFs (not scanned) • Cost-sensitive applications | • High-quality scanned PDFs • High-quality scanned images • Documents without handwriting | • Medium- or low-quality scanned PDFs • Medium- or low-quality scanned images • Handwritten documents | • Checkbox and radio button detection • Medium- or low-quality scanned PDFs • Medium- or low-quality scanned images • Handwritten documents |
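The recommendations above can be read as a simple decision procedure. The following is a minimal heuristic sketch, not official selection logic: the `DocumentProfile` fields and the `recommend_mode` function are hypothetical, and the returned strings are the mode names from the table rather than API parameter values. Checks run from the most restrictive requirement (form features, which only Form mode supports) to the cheapest fallback.

```python
from dataclasses import dataclass


@dataclass
class DocumentProfile:
    # Hypothetical description of an incoming document; field names are illustrative.
    is_pdf: bool = True
    has_native_text: bool = False      # PDF with an embedded text layer (not scanned)
    scan_quality: str = "high"         # "high", "medium" or "low" for scanned input
    has_handwriting: bool = False
    needs_form_features: bool = False  # PDF form fields or checkbox/radio detection


def recommend_mode(doc: DocumentProfile) -> str:
    """Pick a mode following the recommended use cases table (heuristic sketch)."""
    # Only Form mode detects checkboxes/radio buttons and handles PDF forms.
    if doc.needs_form_features:
        return "Form"
    # Native Text has no handwriting recognition, so check handwriting first.
    if doc.has_handwriting:
        return "High Quality"
    # Native Text applies only to PDFs that already carry a text layer.
    if doc.is_pdf and doc.has_native_text:
        return "Native Text"
    # Medium- or low-quality scans benefit from AI/ML enhancement.
    if doc.scan_quality in ("medium", "low"):
        return "High Quality"
    return "Low Cost"


print(recommend_mode(DocumentProfile(scan_quality="low")))
# prints High Quality
```

Ordering the handwriting check before the native-text check reflects the matrix: a native-text PDF with handwritten annotations cannot be fully handled by Native Text mode.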