Version: 2.0.0

Highlight API

Retrieves the line metadata used to highlight extracted text in the original document.

Endpoint: /highlights
URL: https://llmwhisperer-api.us-central.unstract.com/api/v2/highlights
Method: GET
Headers: unstract-key: <YOUR API KEY>

Parameters

Parameter | Type | Default | Required | Description
whisper_hash | string | | Yes | The whisper hash returned when the whisper process was started.
lines | string | | Yes | The lines whose metadata should be retrieved, given as comma-separated line numbers and ranges. For example, 1-5,7,21- retrieves metadata for lines 1, 2, 3, 4, 5, 7, and 21 through the last line.
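
To make the lines syntax concrete, the following is a minimal client-side sketch that expands a specification such as 1-5,7,21- into explicit line numbers. The function name expand_lines and the last_line argument are hypothetical and are not part of the API; the server's own parsing rules may differ in edge cases.

```python
def expand_lines(spec: str, last_line: int) -> list[int]:
    """Expand a spec such as "1-5,7,21-" into explicit line numbers.

    `last_line` is supplied by the caller (an assumption for this sketch);
    it is needed to resolve an open-ended range such as "21-".
    """
    numbers: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An open-ended range ("21-") runs through the last line.
            end = int(end_text) if end_text else last_line
            numbers.extend(range(start, end + 1))
        else:
            numbers.append(int(part))
    return numbers


print(expand_lines("1-5,7,21-", last_line=24))
# prints [1, 2, 3, 4, 5, 7, 21, 22, 23, 24]
```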

Example Curl Request

curl -X GET --location 'https://llmwhisperer-api.us-central.unstract.com/api/v2/highlights?whisper_hash=XXXXXXXXXXXXXXXXXXX&lines=1-2' \
-H 'unstract-key: <Your API Key>'

Note: To include the response headers in the output, pass -i to curl.
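
The same request can be sketched in Python using only the standard library. The helper names build_highlights_request and fetch_highlights are hypothetical; the URL, query parameters, and unstract-key header are taken from the request above.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://llmwhisperer-api.us-central.unstract.com/api/v2/highlights"


def build_highlights_request(
    api_key: str, whisper_hash: str, lines: str
) -> urllib.request.Request:
    """Build (but do not send) a GET request for the highlights endpoint."""
    query = urllib.parse.urlencode({"whisper_hash": whisper_hash, "lines": lines})
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        headers={"unstract-key": api_key},
        method="GET",
    )


def fetch_highlights(api_key: str, whisper_hash: str, lines: str) -> dict:
    """Send the request and decode the JSON body (hypothetical helper)."""
    request = build_highlights_request(api_key, whisper_hash, lines)
    # urlopen raises urllib.error.HTTPError for non-2xx statuses such as 400.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling fetch_highlights("<YOUR API KEY>", "<whisper hash>", "1-2") would issue the same request as the curl command.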

Response

HTTP Status | Content-Type | Headers | Description
200 | application/json | | Line metadata
400 | application/json | | Error while retrieving the line metadata. See the example 400 response below for the JSON format.

Example 200 Response

{
  "1": {
    "base_y": 0,
    "base_y_percent": 0,
    "height": 0,
    "height_percent": 0,
    "page": 0,
    "page_height": 0,
    "raw": [
      0,
      0,
      0,
      0
    ]
  },
  "2": {
    "base_y": 155,
    "base_y_percent": 4.8927,
    "height": 51,
    "height_percent": 1.6098,
    "page": 0,
    "page_height": 3168,
    "raw": [
      0,
      155,
      51,
      3168
    ]
  }
}

Example 400 Response

{
  "message": "Line metadata not found"
}
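
A client typically needs to tell the two response shapes apart. The sketch below assumes the caller already has the HTTP status code and the raw body text; HighlightError and parse_highlights_response are hypothetical names, not part of any official client.

```python
import json


class HighlightError(Exception):
    """Raised when the API returns an error body (hypothetical helper)."""


def parse_highlights_response(status: int, body: str) -> dict[int, dict]:
    """Return line metadata keyed by integer line number, or raise on error."""
    payload = json.loads(body)
    if status != 200:
        # Error bodies carry a "message" field, as in the 400 example.
        raise HighlightError(payload.get("message", "unknown error"))
    # JSON object keys are strings; convert them to ints for easier lookup.
    return {int(line_no): meta for line_no, meta in payload.items()}
```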

Line Metadata

This data can be used to show the bounding box of each extracted line of text in the original document, which is useful for highlighting the extracted text there.

The line metadata contains the bounding box coordinates for each line of text extracted from the document. For each line, the raw field holds these coordinates as an array of four integers, [page, base_y, height, page_height], and the same values are also available as named fields:

  • page: The page number of the document.
  • base_y: The y-coordinate of the bottom of the bounding box.
  • height: The height of the bounding box.
  • page_height: The height of the original page.

Note: Highlighting covers the entire row of the original document. The x-coordinates are not provided because the box always spans the full page width: x1 = 0 and x2 = page_width.


(0,y-height) +-------------------------+ (page_width,y-height)
             |                         |
             |                         |
(0,y)        +-------------------------+ (page_width,y)
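
The rectangle in the diagram can be computed from one line's metadata as sketched below. The Box type, the line_box function, and the page_width value are assumptions for illustration: the API does not return a page width, so the caller must supply it, and the sketch assumes y grows downward from the top of the page, as the diagram suggests.

```python
from typing import NamedTuple


class Box(NamedTuple):
    page: int
    x1: int
    y1: int  # top edge
    x2: int
    y2: int  # bottom edge


def line_box(meta: dict, page_width: int) -> Box:
    """Build the full-width highlight rectangle for one line of metadata."""
    top = meta["base_y"] - meta["height"]
    return Box(meta["page"], 0, top, page_width, meta["base_y"])


# Line 2 from the example response; the page width of 2448 is made up.
line_2 = {"base_y": 155, "height": 51, "page": 0, "page_height": 3168}
print(line_box(line_2, page_width=2448))
# prints Box(page=0, x1=0, y1=104, x2=2448, y2=155)
```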

line_metadata = {
    "1": {  # Line 1
        "base_y": 0,
        "base_y_percent": 0,
        "height": 0,  # Height
        "height_percent": 0,
        "page": 0,
        "page_height": 0,
        "raw": [  # All 0s represent a blank line
            0,  # Page number
            0,
            0,
            0
        ]
    },
    "2": {  # Line 2
        "base_y": 155,  # Y-coordinate
        "base_y_percent": 4.8927,
        "height": 51,  # Height
        "height_percent": 1.6098,
        "page": 0,  # Page number
        "page_height": 3168,  # Page height
        "raw": [
            0,  # Page number
            155,  # Y-coordinate
            51,  # Height
            3168  # Page height
        ]
    }
}
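
In the example, base_y_percent and height_percent match base_y and height expressed as a percentage of page_height (155 / 3168 is about 4.8927%). Assuming that relationship holds in general, the percentages can be used to place a highlight on a page rendered at a different size. The function names is_blank and scale_to_render below are hypothetical, and the sample data is abbreviated from the example above.

```python
def is_blank(meta: dict) -> bool:
    """A line whose raw values are all 0 represents a blank line."""
    return all(value == 0 for value in meta["raw"])


def scale_to_render(meta: dict, rendered_height: float) -> tuple[float, float]:
    """Return (top, bottom) of the line on a page rendered at rendered_height.

    Assumes the *_percent fields are percentages of the page height.
    """
    bottom = meta["base_y_percent"] / 100 * rendered_height
    height = meta["height_percent"] / 100 * rendered_height
    return bottom - height, bottom


# Abbreviated sample taken from the example response.
line_metadata = {
    "1": {"base_y_percent": 0, "height_percent": 0, "raw": [0, 0, 0, 0]},
    "2": {"base_y_percent": 4.8927, "height_percent": 1.6098,
          "raw": [0, 155, 51, 3168]},
}

for line_no, meta in line_metadata.items():
    if is_blank(meta):
        continue  # nothing to highlight for blank lines
    top, bottom = scale_to_render(meta, rendered_height=1000)
    print(f"line {line_no}: top={top:.1f}, bottom={bottom:.1f}")
```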