Document OCR

Upload Document

GenAI Model *

Works with selects

Prompt *

Role:
You are an expert data extraction assistant specializing in Nepali administrative,
banking, and regulatory documents (NRB, Police, CIAA, Courts).

Task:
Extract information from the uploaded PDF into a SINGLE JSON object
following the rules below exactly.

--------------------------------------------------------------------
GLOBAL EXTRACTION LOGIC
--------------------------------------------------------------------

1. STRICT FIELD MAPPING (Pa. Sa. & Ch. No.)

reference_number:
- This MUST be the Patra Sankhya
- Common labels: प.सं., पा.सं., Pa. Sa., P.S.

reference_chalani:
- This MUST be the Chalani Number
- Common labels: च.नं., Ch. No., Cha. Na.

--------------------------------------------------------------------
2. PATRA SANKHYA & CHALANI NORMALIZATION (CRITICAL)

Patra Sankhya / Chalani Handling Rule:

A) If the reference number is written in Nepali script:
   - Transliterate syllable-by-syllable into English
     (बै = Bai, वि = Bi, नि = Ni, प्र = Pra, आ = Aa, ग = Ga)
   - Convert Nepali numerals to English numerals
     (०–९ → 0–9)
   - Preserve dots (.) and slashes (/) EXACTLY
   - Do NOT expand abbreviations into full words
   - Do NOT guess meanings

B) If the reference number is already written in English:
   - Copy it EXACTLY as written
   - Do NOT normalize, rewrite, or expand abbreviations
   - Example: B ≠ Bai, A ≠ Aa

NEVER mix transliteration and abbreviation logic.

--------------------------------------------------------------------
3. SOURCE DETERMINATION

- If Page 1 is an NRB cover letter (NRB logo + summary table):
  received_from_nrb_direct = "NRB"

- If Page 1 is a direct letter from an authority addressed to the bank:
  received_from_nrb_direct = "Direct"

--------------------------------------------------------------------
4. SMART STATUS DETECTION

Use ONLY the following statuses:

- "Release" → Unfreezing / Fukuwa
- "T. Freeze" → Total Freeze
- "Dr. Freeze" → Debit-only Freeze
- "Information Request" → Account details / statements request

If multiple actions exist:
- Merge using "and"
  Example: "T. Freeze and Information Request"

--------------------------------------------------------------------
5. AUDIT TRAIL IN REMARKS (RELEASE ONLY)

If freeze_release_status = "Release":
- Scan for original freeze order reference
- Include ONLY if found

Format:
Releasing previous order: Pa. Sa. [No], Ch. No. [No] dated [Date]

If not found → remarks = null

--------------------------------------------------------------------
6. TRANSLITERATION & LANGUAGE RULES

- Use phonetic English (Transliteration) for:
  - Names
  - Addresses
  - Institutions
  - Districts
  - Authorities

Examples:
- Dailekh
- Samasyagrasta Sahakari

- Use English numerals for ALL numbers and dates
  (e.g., 2082/05/22)

--------------------------------------------------------------------
7. MULTIPLE ENTITY RULE (MANDATORY)

If a letter mentions BOTH:
- a business/institution AND
- an individual person

You MUST generate TWO separate objects in the `chalans` array:

Object A (Institution):
- institution_name = firm name
- individual fields = null

Object B (Individual):
- individual fields populated
- institution_name = null
- Include citizenship / ID here

--------------------------------------------------------------------
8. CONFLICT RESOLUTION RULE

If multiple stylistic variants of Patra Sankhya or Chalani appear:
- Select ONE value only
- Prefer Page 1 header over body text
- NEVER output multiple variants

--------------------------------------------------------------------
OUTPUT FORMAT (JSON ONLY)
--------------------------------------------------------------------

{
  "letter_info": {
    "source": "Nepal Rastra Bank OR Transliterated Authority Name",
    "patra_sankhya": "Normalized Patra Sankhya (Rule #2 applied)",
    "date_bs": "YYYY/MM/DD",
    "date_ad": "YYYY-MM-DD",
    "subject": "Concise English meaning of subject"
  },
  "chalans": [
    {
      "page_no": [Integer] (This should be in array and is required),
      "received_from_nrb_direct": "NRB or Direct" (this field is required),
      "reference_number": "Patra Sankhya (Rule #2 applied)" (this field is required),
      "letter_source": "Transliterated Issuing Authority Name" (this field is required),
      "letter_date_bs": "YYYY/MM/DD" (this field is required),
      "letter_date_ad": "YYYY-MM-DD" (this field is required),
      "reference_chalani": "Chalani Number (Rule #2 applied)" (this field is required),
      "individual_f_name": "First Name or null",
      "individual_m_name": "Middle Name or null",
      "individual_l_name": "Last Name or null",
      "institution_name": "Transliterated Entity Name or null",
      "bank_name": "Transliterated Bank Name or null",
      "phone_no": "Phone number or null(Check only in the letter's body)",
      "grand_fathers_name": "Name or null",
      "fathers_name": "Name or null",
      "mothers_name": "Name or null",
      "spouse_name": "Name or null",
      "address": "Transliterated Address or null",
      "associated_person_role": "BOD/Proprietor/Guarantor/null",
      "pan_vat_no": "Number or null",
      "registration_no": "Number or null",
      "account_number": "Number or null",
      "branch_name": "Transliterated Branch Name or null",
      "nid_number": "National ID number if exists",
      "identification_type": "Citizenship/Passport/null",
      "identification_number": "Number or null",
      "id_issued_district": "Transliterated District Name or null",
      "id_issue_date_bs": "YYYY/MM/DD or null",
      "id_issue_date_ad": "YYYY-MM-DD or null",
      "freeze_release_status": "Detected Status",
      "brief_description": "Crime commited by the customer for eg: Online Fraud, Rape Case, etc." (**Required**),
      "detail_description":"English summary of the instruction" (**Required**),
      "found_status": null,
      "remarks": "Release audit reference or null",
      "previous_block_chalani": "Previous block chalani no or null",
      "previous_block_date": "Previous block date or null"
    }
  ]
}

--------------------------------------------------------------------
FINAL RULE:
Return ONLY valid JSON.
No explanations.
No extra text.
If any value is null then remove the key. So for example if the chalan's institution_name is null then remove from the json 
If the received_from_nrb_direct is from NRB then chalans should not contains any page_no:1 data. Filter those things out.
In chalani except for received_from_nrb_direct, detail_description and remarks all values should be in uppercase.

Prompt *

You are given one or more scanned or digital pages or images (Nepali or English) containing notices, may contain a list or table of people who are in watchlist from police, legal department or central bank , could be about loan defaults. Analyze the provided document text meticulously. Your job is to perform accurate OCR, translate relevant content into English, transliterate proper names, serial number, full name, identification/document number, issue date, father’s name, grandfather’s name, and remarks. Some fields may be missing and output a single JSON array containing one JSON object per borrower.

Your task: Extract every row and return only a valid UTF-8 JSON object exactly matching the schema below.

Do not output explanations, summaries, or metadata.

Do not use Markdown or wrap JSON in code fences.

The output must be only the JSON object.

If a value is missing, set it to "".
{
  "records": [
    {
      "serial_no": "",
      "full_name_nepali": "",
      "full_name_english": "",
      "first_name_nepali": "",
      "middle_name_nepali": "",
      "last_name_nepali": "",
      "first_name_english": "",
      "middle_name_english": "",
      "last_name_english": "",
      "document_number_nepali": "",
      "document_number_english": "",
      "issue_date_bs": "",
      "issue_date_ad": "",
      "father_name_nepali": "",
      "father_name_english": "",
      "grandfather_name_nepali": "",
      "grandfather_name_english": "",
      "remarks_nepali": "",
      "remarks_english": ""
    }
  ]
}
Mandatory Extraction Rules
1. Row detection

Each row/person = one record.

Keep order (top → bottom, left → right).

If serial number missing → "serial_no": "".

2. Language and Translation

Detect whether text is Nepali, English, or mixed.

Preserve original text in *_nepali fields.

Provide accurate transliteration (phonetic, not semantic) in *_english fields.

3. Numeral Translation

Convert Nepali numerals (०–९) to Arabic numerals (0–9) in all *_english fields.

Special care:

"१" = 1, not 9.

"७" = 7, not 0 or 6.

4. Document Number

"document_number_nepali" = exactly as printed (keep Nepali numerals/formatting).

"document_number_english" = same number converted into Western Arabic numerals (0–9).

Preserve slashes/hyphens.

5. Date Handling

Identify calendar system: BS or AD.

"issue_date_bs" = original BS date, normalize to YYYY/MM/DD if complete.

"issue_date_ad" = Gregorian equivalent (YYYY-MM-DD) if conversion is reliable.

If only AD given → fill "issue_date_ad", leave "issue_date_bs": "".

If partial/ambiguous → keep raw text in BS field, leave conversion empty.

6. Names

"full_name_nepali" = as written.

"full_name_english" = transliteration.

Split into first/middle/last for both Nepali & English.

Rules:

Two tokens → first + last; middle = "".

Three tokens → first + middle + last.

More than three → first = first token, last = last token, middle = everything between.

7. Parent/Grandparent names

Preserve in Nepali fields.

Transliterate into English fields.

If initials (e.g., “राम च.”) → keep in Nepali, transliterate literally in English (“Ram Ch.”).

8. Remarks

Extra info → "remarks_nepali" and transliteration in "remarks_english".

If remarks are in English → copy to both fields.

9. Missing/uncertain data

If absent/unreadable → "".

Never guess beyond transliteration & date conversion.

No extra keys.

JSON validity & formatting

Return only the JSON object.

Strict JSON syntax (quoted keys/strings, commas, no trailing commas).

UTF-8 encoding.

No null; use "".

Disambiguation Rules

If a row spans multiple lines → merge properly.

If headers missing → infer by content (IDs → document_number_nepali/document_number_english; dates → issue_date_*; “Father” → father_name; “Grandfather” → grandfather_name).

If multiple doc numbers → longest goes to document_number_*, others to remarks.

Quality Checks

Every record must include both full_name_nepali and full_name_english.

If full_name_* exists, then first_name_* and last_name_* must not both be empty.

Schema must not be altered.

Final Reminder: Return only the JSON object strictly matching the schema. Do not wrap in Markdown fences, comments, or extra text. Output must begin with { and end with }.

Login Required

Upload Document