Document processors are at the heart of the Anesya system. They analyze, extract, and structure the information contained in your documents (invoices, contracts, forms…). This guide explains how to create a JSON schema compatible with OpenAI JSON, in order to define the exact structure of the output data you expect after extraction.
A valid schema always starts with a root object like this:
{
"type": "object",
"required": [...], // Array of your field names
"properties": { ... } // Place your properties inside
}
- type: "object" is mandatory at the root.
- required: the list of fields that must appear.
- properties: a dictionary of the fields to extract, each with its name, type, and description.
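As an illustration of the root structure, here is a minimal Python sketch that wraps a set of field definitions in the mandatory root object. The helper name `make_root_schema` and its `optional` parameter are hypothetical and not part of Anesya; the sketch only builds the JSON described above.

```python
import json


def make_root_schema(properties, optional=()):
    """Wrap field definitions in the root object described in this guide.

    Every field is listed in "required" unless its name appears in `optional`
    (a hypothetical convenience parameter, not an Anesya feature).
    """
    unknown = set(optional) - set(properties)
    if unknown:
        raise ValueError(f"optional names not in properties: {sorted(unknown)}")
    return {
        "type": "object",
        "required": [name for name in properties if name not in optional],
        "properties": properties,
    }


schema = make_root_schema({
    "customer_name": {"type": "string", "description": "Customer name"},
    "amount": {"type": "number", "description": "Invoice amount"},
})

# Serialize to JSON text, ready to paste wherever a schema is expected.
print(json.dumps(schema, indent=2))
```

Deriving required from properties avoids listing a field name that does not exist in the schema.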
Free text string.
"customer_name": {
"type": "string",
"description": "Customer name"
}
Decimal number (can include cents, etc.).
"amount": {
"type": "number",
"description": "Invoice amount"
}
Whole number without decimals.
"quantity": {
"type": "integer",
"description": "Quantity ordered"
}
Boolean value (true or false).
"is_signed": {
"type": "boolean",
"description": "Is the document signed?"
}
Predefined list of values (closed set of choices).
"status": {
"type": "enum",
"enum": ["pending", "approved", "rejected"],
"description": "Invoice status",
"context:descriptions": [
"Pending approval",
"Invoice approved",
"Invoice rejected"
]
}
Tip: context:descriptions is optional but useful for giving more context to each value.
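The enum example lists three values and three descriptions in the same order. Assuming the descriptions are matched to the values by position (the guide does not state this explicitly), a small Python sketch can check that the two lists line up. The function name `describe_enum` is hypothetical.

```python
def describe_enum(field):
    """Pair each enum value with its description.

    Assumption: "context:descriptions" is positional, i.e. the n-th
    description belongs to the n-th value of "enum".
    """
    values = field["enum"]
    descriptions = field.get("context:descriptions")
    if descriptions is None:
        # The key is optional, so a field without it is still fine.
        return {value: None for value in values}
    if len(descriptions) != len(values):
        raise ValueError(
            f"{len(values)} enum values but {len(descriptions)} descriptions"
        )
    return dict(zip(values, descriptions))


status = {
    "type": "enum",
    "enum": ["pending", "approved", "rejected"],
    "description": "Invoice status",
    "context:descriptions": [
        "Pending approval",
        "Invoice approved",
        "Invoice rejected",
    ],
}

for value, text in describe_enum(status).items():
    print(f"{value}: {text}")
```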
Sub-object containing its own fields.
"invoice_amount": {
"type": "object",
"required": ["amount", "iso_4217_currency_code"],
"properties": {
"amount": {
"type": "number",
"description": "Amount"
},
"iso_4217_currency_code": {
"type": "string",
"description": "ISO currency code (e.g. EUR, USD)"
}
},
"description": "Invoice amount"
}
List of items, typically objects or strings.
"orders": {
"type": "array",
"description": "List of orders",
"items": {
"type": "object",
"required": ["order_id", "customer_name"],
"properties": {
"order_id": {
"type": "string",
"description": "Order identifier"
},
"customer_name": {
"type": "string",
"description": "Customer name"
}
}
}
}

| ❌ Mistake | ✅ Solution |
|---|---|
| Missing or invalid type | Make sure each field has a valid type (among those listed above) |
| Missing description | Each field must have a clear description |
| Mismatch in required | All fields listed in required must exist in properties |
| Malformed object or array | Objects must have properties and required; arrays must have items |
| Duplicates or typos | Example: invoice_amoun2t instead of invoice_amount |
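The checks in this table can be approximated locally before submitting a schema. The Python sketch below is not Anesya's validator: the function name `find_schema_errors` and the wording of its messages are hypothetical, and the set of valid types is simply the list of types covered in this guide. It also assumes that the root object and array items do not need a description, since the examples in this guide omit it there.

```python
VALID_TYPES = {"string", "number", "integer", "boolean", "enum", "object", "array"}


def find_schema_errors(node, path="root", needs_description=False):
    """Return a list of human-readable problems found in a schema node."""
    if not isinstance(node, dict):
        return [f"{path}: field definition must be a JSON object"]

    errors = []
    node_type = node.get("type")
    if not isinstance(node_type, str) or node_type not in VALID_TYPES:
        return [f"{path}: missing or invalid type {node_type!r}"]

    if needs_description and not node.get("description"):
        errors.append(f"{path}: missing description")

    if node_type == "object":
        properties = node.get("properties")
        required = node.get("required")
        if not isinstance(properties, dict) or not isinstance(required, list):
            errors.append(f"{path}: object needs both properties and required")
            return errors
        # Every name in "required" must also be declared in "properties".
        for name in required:
            if name not in properties:
                errors.append(f"{path}: required field {name!r} is not in properties")
        for name, child in properties.items():
            errors.extend(find_schema_errors(child, f"{path}.{name}", True))
    elif node_type == "array":
        items = node.get("items")
        if not isinstance(items, dict):
            errors.append(f"{path}: array needs items")
        else:
            errors.extend(find_schema_errors(items, f"{path}[]", False))

    return errors


# A schema with a typo in a field name and a missing description.
bad_schema = {
    "type": "object",
    "required": ["invoice_amount"],
    "properties": {
        "invoice_amoun2t": {"type": "number"},
    },
}

for problem in find_schema_errors(bad_schema):
    print(problem)
```

An empty list means none of these particular checks failed; the platform may still apply rules that this sketch does not cover.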
Before submitting a schema, you can test it in our interface or with our API.
If an error is detected, you’ll see a message like:
❌ Invalid schema: root.status.type must be string or list
Context: You want to extract the following information from an e-commerce invoice:
- The order number
- The buyer’s first and last name
- The items included in the invoice, with their price and quantity
Here’s how you should design your schema:
Order Number
→ Add a field of type string. (If your order numbers never contain letters, you may use number instead.)
Buyer Information
→ Add a field of type object named customer_information
→ Inside this object, define two string fields: first_name and last_name
Invoice Items
→ Add a field of type array named items
→ The items array will contain multiple object entries
→ Each object will include two number fields: price and quantity
{
"type": "object",
"required": [
"order_number",
"customer_information",
"items"
],
"properties": {
"order_number": {
"type": "string",
"description": "The unique number assigned to the order"
},
"customer_information": {
"type": "object",
"required": [
"first_name",
"last_name"
],
"properties": {
"first_name": {
"type": "string",
"description": "The buyer's first name"
},
"last_name": {
"type": "string",
"description": "The buyer's last name"
}
},
"description": "Information about the buyer"
},
"items": {
"type": "array",
"description": "List of items included in the invoice",
"items": {
"type": "object",
"required": [
"price",
"quantity"
],
"properties": {
"price": {
"type": "number",
"description": "Price of the item"
},
"quantity": {
"type": "number",
"description": "Quantity of the item ordered"
}
}
}
}
}
}
Context: You want to extract the following information from a standard French payslip:
- The employee’s first and last name
- The employer’s name
- The net and gross salary amounts
- The pay period covered by the payslip
Here’s how you should design your schema:
Employee Information
→ Add an object field named employee
→ Inside this object, define two string fields: first_name and last_name
Employer Name
→ Add a string field named employer
Salary Amounts
→ Add two number fields: net_salary and gross_salary
→ These correspond to the salary after and before deductions, respectively
Pay Period
→ Add an object field named pay_period
→ Inside, define two string fields: start and end
→ Use context:type: "date" to specify the date format
{
"type": "object",
"required": ["employee", "employer", "net_salary", "pay_period"],
"properties": {
"employee": {
"type": "object",
"required": ["first_name", "last_name"],
"properties": {
"first_name": {
"type": "string",
"description": "Employee's first name"
},
"last_name": {
"type": "string",
"description": "Employee's last name"
}
},
"description": "Information about the employee"
},
"employer": {
"type": "string",
"description": "Name of the employer"
},
"net_salary": {
"type": "number",
"description": "Net amount paid to the employee"
},
"gross_salary": {
"type": "number",
"description": "Gross salary before deductions"
},
"pay_period": {
"type": "object",
"required": ["start", "end"],
"properties": {
"start": {
"type": "string",
"description": "Start date of the pay period",
"context:type": "date"
},
"end": {
"type": "string",
"description": "End date of the pay period",
"context:type": "date"
}
},
"description": "Period covered by the payslip"
}
}
}
Context: You want to extract the following information from a residential rental agreement:
- The landlord’s name
- The tenant’s first and last name
- The address of the rented property
- The monthly rent and security deposit
- The start and end dates of the lease
Here’s how you should design your schema:
Landlord Information
→ Add a string field named landlord
Tenant Information
→ Add an object field named tenant
→ Inside this object, define two string fields: first_name and last_name
Property Address
→ Add a string field named property_address
Rent & Deposit
→ Add two number fields: rental_amount and deposit_amount
Lease Period
→ Add an object field named lease_period
→ Inside, define two string fields: start_date and end_date
→ Use context:type: "date" to specify the date format
{
"type": "object",
"required": ["landlord", "tenant", "property_address", "rental_amount", "lease_period"],
"properties": {
"landlord": {
"type": "string",
"description": "Name of the landlord"
},
"tenant": {
"type": "object",
"required": ["first_name", "last_name"],
"properties": {
"first_name": {
"type": "string",
"description": "Tenant's first name"
},
"last_name": {
"type": "string",
"description": "Tenant's last name"
}
},
"description": "Information about the tenant"
},
"property_address": {
"type": "string",
"description": "Full address of the rented property"
},
"rental_amount": {
"type": "number",
"description": "Monthly rent amount"
},
"deposit_amount": {
"type": "number",
"description": "Amount of the security deposit"
},
"lease_period": {
"type": "object",
"required": ["start_date", "end_date"],
"properties": {
"start_date": {
"type": "string",
"description": "Start date of the lease",
"context:type": "date"
},
"end_date": {
"type": "string",
"description": "End date of the lease",
"context:type": "date"
}
},
"description": "Duration of the lease"
}
}
}
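Since the schema defines the structure of the extracted data, it can also be used to check a result after extraction. The following Python sketch walks a schema and reports the required fields that are absent from a result. The function name `missing_required` is hypothetical, the schema is a trimmed version of the rental agreement example, and the sample result (including its date format) is made up for illustration.

```python
def missing_required(schema, data, path="root"):
    """List the paths of required fields that are absent from `data`."""
    missing = []
    if schema.get("type") == "object" and isinstance(data, dict):
        for name in schema.get("required", []):
            if name not in data:
                missing.append(f"{path}.{name}")
        # Recurse into the fields that are present.
        for name, child in schema.get("properties", {}).items():
            if name in data:
                missing.extend(missing_required(child, data[name], f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(data, list):
        for index, element in enumerate(data):
            missing.extend(
                missing_required(schema.get("items", {}), element, f"{path}[{index}]")
            )
    return missing


# Trimmed version of the rental agreement schema above.
lease_schema = {
    "type": "object",
    "required": ["landlord", "lease_period"],
    "properties": {
        "landlord": {"type": "string", "description": "Name of the landlord"},
        "lease_period": {
            "type": "object",
            "required": ["start_date", "end_date"],
            "properties": {
                "start_date": {"type": "string", "description": "Start date of the lease"},
                "end_date": {"type": "string", "description": "End date of the lease"},
            },
            "description": "Duration of the lease",
        },
    },
}

# Made-up extraction result with no end date.
result = {
    "landlord": "Jane Doe",
    "lease_period": {"start_date": "2024-01-01"},
}

print(missing_required(lease_schema, result))
```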
Once your schema is created, you can use it to:
- Trigger an extraction via API
- Run an extraction directly on the Anesya platform
- Create a workflow using this schema