Highlight API
Retrieve the line metadata which helps with highlighting
Endpoint | /highlights |
URL | https://llmwhisperer-api.us-central.unstract.com/api/v2/highlights |
Method | GET |
Headers | unstract-key: <YOUR API KEY> |
Parameters
Parameter | Type | Default | Required | Description |
---|---|---|---|---|
whisper_hash | string | Yes | The whisper hash returned while starting the whisper process. | |
lines | string | Yes | Define which lines metadata to retrieve. You can specify which lines metadata to retrieve with this parameter. Example 1-5,7,21- will retrieve lines metadata 1,2,3,4,5,7,21,22,23,24... till the last line meta data. |
Example Curl Request
curl -X GET --location 'https://llmwhisperer-api.us-central.unstract.com/api/v2/highlights?whisper_hash=XXXXXXXXXXXXXXXXXXX&lines=1-2' \
-H 'unstract-key: <Your API Key>'
To include the headers in the response use curl -i
in the request.
Response
HTTP Status | Content-Type | Headers | Description |
---|---|---|---|
200 | application/json | Line metadata | |
400 | application/json | Error while retrieveing. Refer below for JSON format |
Example 200
Response
{
"1": {
"base_y": 0,
"base_y_percent": 0,
"height": 0,
"height_percent": 0,
"page": 0,
"page_height": 0,
"raw": [
0,
0,
0,
0
]
},
"2": {
"base_y": 155,
"base_y_percent": 4.8927,
"height": 51,
"height_percent": 1.6098,
"page": 0,
"page_height": 3168,
"raw": [
0,
155,
51,
3168
]
}
}
Example 400
Response
{
"message": "Line metadata not found"
}
Line Metadata
This data can be used show the bounding box of each line of text extracted from the document in the original document. Useful for highlighting the extracted text in the original document.
The line metadata contains the bounding box coordinates for each line of text extracted from the document. For each line, an array of arrays is provided with the bounding box coordinates. The bounding box information is represented as a array of four integers: [page_no, y, height, max_height], where:
page_no
: The page number of the document.base_y
: The y-coordinate of the bounding box (bottom).height
: The height of the bounding box.page_height
: The height of the original page.
Highlighting is for the entire row of the original document. The x coordinates are not provided as they will effectively be the width of the page. x1=0
and x2=page_width
always.
(0,y-height) +-------------------------+ (page_width,y-height)
| |
| |
(0,y) +-------------------------+ (page_width,y)
line_metadata = {
"1": { # Line 1
"base_y": 0,
"base_y_percent": 0,
"height": 0, # Height
"height_percent": 0,
"page": 0,
"page_height": 0,
"raw": [ #(All 0s represent a blank line)
0, # Page number
0,
0,
0
]
},
"2": { # Line 2
"base_y": 155, # Y-coordinate
"base_y_percent": 4.8927,
"height": 51, # Height
"height_percent": 1.6098,
"page": 0, # Page number
"page_height": 3168, # Page height
"raw": [
0, # Page number
155, # Y-coordinate
51, # Height
3168 # Page height
]
}
}