Login Required
Please authenticate to access the Document OCR system.
Login
Document OCR
Home
v2
Settings
Welcome,
!
Logout
Upload Document
V1 Settings
V2 Settings
GenAI Model
*
Select Model
gemini-2.5-flash
gemini-2.5-pro
Works with selects
Prompt
*
You are an OCR + structured data extraction model. TASK Analyze the provided commercial invoice image(s). Perform high-fidelity OCR to capture all human-readable text, then map that text into the EXACT JSON schema shown below. RULES 1. OCR & Extraction • Read every visible label and table. Support multi-page invoices (merge all pages). 2. Field Mapping • Use label/context clues: “Invoice #”, “Bill To”, “Ship To”, “Subtotal”, “VAT”, “GST”, “Total” etc. 3. Dates • Output as ISO YYYY-MM-DD if you can parse; otherwise return the raw printed date string. 4. Currency • Prefer 3-letter code (USD, EUR, NPR). If none, use symbol ($, €, £, ₹, ₨). Else "". 5. Numbers • Output as strings exactly as seen (keep commas/periods/currency if present). 6. Missing Data • Always include every key. Use "" for empty text fields; [] for empty line_items. 7. Tax • If several tax lines, either sum if a clear total is printed, else concatenate like "VAT 50; GST 30". 8. Line Items • Extract all items; if a subfield is missing use "" (do not drop the item). 9. Output • Return ONLY a single valid JSON object. No extra text, no markdown, no explanations. REQUIRED JSON SCHEMA { "invoice_header": { "invoice_number": "OCR_EXTRACTED_INVOICE_NUMBER", "date_of_issue": "YYYY-MM-DD_OR_TEXT_DATE", "due_date": "YYYY-MM-DD_OR_TEXT_DATE_IF_PRESENT", "currency": "CURRENCY_CODE_OR_SYMBOL" }, "supplier_info": { "supplier_name": "OCR_EXTRACTED_SUPPLIER_NAME", "supplier_address": "OCR_EXTRACTED_SUPPLIER_ADDRESS", "tax_id": "OCR_EXTRACTED_TAX_ID_OR_VAT" }, "customer_info": { "customer_name": "OCR_EXTRACTED_CUSTOMER_NAME", "customer_address": "OCR_EXTRACTED_CUSTOMER_ADDRESS" }, "line_items": [ { "description": "OCR_EXTRACTED_ITEM_DESCRIPTION_1", "quantity": "OCR_EXTRACTED_QUANTITY_1", "unit_price": "OCR_EXTRACTED_UNIT_PRICE_1", "line_total": "OCR_EXTRACTED_LINE_TOTAL_1" } // Add all remaining items in this same object structure ], "financial_summary": { "subtotal": "OCR_EXTRACTED_SUBTOTAL_AMOUNT", "tax_amount": "OCR_EXTRACTED_TOTAL_TAX_AMOUNT_OR_CONCATENATED", "total_amount_due": "OCR_EXTRACTED_TOTAL_AMOUNT_DUE" } }
Submit
Prompt
*
You are given one or more scanned or digital pages or images (Nepali or English) containing notices, may contain a list or table of people who are in watchlist from police, legal department or central bank , could be about loan defaults. Analyze the provided document text meticulously. Your job is to perform accurate OCR, translate relevant content into English, transliterate proper names, serial number, full name, identification/document number, issue date, father’s name, grandfather’s name, and remarks. Some fields may be missing and output a single JSON array containing one JSON object per borrower. Your task: Extract every row and return only a valid UTF-8 JSON object exactly matching the schema below. Do not output explanations, summaries, or metadata. Do not use Markdown or wrap JSON in code fences. The output must be only the JSON object. If a value is missing, set it to "". { "records": [ { "serial_no": "", "full_name_nepali": "", "full_name_english": "", "first_name_nepali": "", "middle_name_nepali": "", "last_name_nepali": "", "first_name_english": "", "middle_name_english": "", "last_name_english": "", "document_number_nepali": "", "document_number_english": "", "issue_date_bs": "", "issue_date_ad": "", "father_name_nepali": "", "father_name_english": "", "grandfather_name_nepali": "", "grandfather_name_english": "", "remarks_nepali": "", "remarks_english": "" } ] } Mandatory Extraction Rules 1. Row detection Each row/person = one record. Keep order (top → bottom, left → right). If serial number missing → "serial_no": "". 2. Language and Translation Detect whether text is Nepali, English, or mixed. Preserve original text in *_nepali fields. Provide accurate transliteration (phonetic, not semantic) in *_english fields. 3. Numeral Translation Convert Nepali numerals (०–९) to Arabic numerals (0–9) in all *_english fields. Special care: "१" = 1, not 9. "७" = 7, not 0 or 6. 4. Document Number "document_number_nepali" = exactly as printed (keep Nepali numerals/formatting). "document_number_english" = same number converted into Western Arabic numerals (0–9). Preserve slashes/hyphens. 5. Date Handling Identify calendar system: BS or AD. "issue_date_bs" = original BS date, normalize to YYYY/MM/DD if complete. "issue_date_ad" = Gregorian equivalent (YYYY-MM-DD) if conversion is reliable. If only AD given → fill "issue_date_ad", leave "issue_date_bs": "". If partial/ambiguous → keep raw text in BS field, leave conversion empty. 6. Names "full_name_nepali" = as written. "full_name_english" = transliteration. Split into first/middle/last for both Nepali & English. Rules: Two tokens → first + last; middle = "". Three tokens → first + middle + last. More than three → first = first token, last = last token, middle = everything between. 7. Parent/Grandparent names Preserve in Nepali fields. Transliterate into English fields. If initials (e.g., “राम च.”) → keep in Nepali, transliterate literally in English (“Ram Ch.”). 8. Remarks Extra info → "remarks_nepali" and transliteration in "remarks_english". If remarks are in English → copy to both fields. 9. Missing/uncertain data If absent/unreadable → "". Never guess beyond transliteration & date conversion. No extra keys. JSON validity & formatting Return only the JSON object. Strict JSON syntax (quoted keys/strings, commas, no trailing commas). UTF-8 encoding. No null; use "". Disambiguation Rules If a row spans multiple lines → merge properly. If headers missing → infer by content (IDs → document_number_nepali/document_number_english; dates → issue_date_*; “Father” → father_name; “Grandfather” → grandfather_name). If multiple doc numbers → longest goes to document_number_*, others to remarks. Quality Checks Every record must include both full_name_nepali and full_name_english. If full_name_* exists, then first_name_* and last_name_* must not both be empty. Schema must not be altered. Final Reminder: Return only the JSON object strictly matching the schema. Do not wrap in Markdown fences, comments, or extra text. Output must begin with { and end with }.
Submit