
# How to Build a JSON Extraction Schema

Schemas define the exact JSON structure you want Anesya to return during extraction.

Schemas are currently created and managed in the dashboard, then reused in the public API through the `schema` field of [Create extract](/api/schema/extracts/extract_create).

This guide explains how to build a JSON schema compatible with Anesya extraction, so you can define the exact structure of the output data you expect from invoices, contracts, forms, and other business documents.

## Overall Schema Structure

A valid schema always starts with a root object like this:


```json
{
  "type": "object",
  "required": ["invoice_number"],
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "Invoice number"
    }
  }
}
```

* `type: "object"`: this is mandatory at the root.
* `required`: a list of fields that must appear in the final result.
* `properties`: a **dictionary of fields** to extract, with their name, type, and description.
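
If you generate schemas from code instead of writing them by hand, small helpers keep the root shape consistent. The sketch below is plain Python. The helper names `string_field` and `object_schema` are hypothetical and are not part of any Anesya SDK. It builds the same structure as the root example above and refuses `required` entries that have no matching property.

```python
import json


def string_field(description):
    """Return a `string` field definition with its description."""
    return {"type": "string", "description": description}


def object_schema(properties, required):
    """Return an object node with `type`, `required`, and `properties`."""
    # Every name listed in `required` must also be defined in `properties`.
    unknown = [name for name in required if name not in properties]
    if unknown:
        raise ValueError(f"required fields missing from properties: {unknown}")
    return {"type": "object", "required": list(required), "properties": properties}


schema = object_schema(
    {"invoice_number": string_field("Invoice number")},
    required=["invoice_number"],
)
print(json.dumps(schema, indent=2))
```

You can paste the printed JSON into the dashboard when creating the schema.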


## Supported Field Types

### `string`

Free text string.


```json
"customer_name": {
  "type": "string",
  "description": "Customer name"
}
```

### `number`

Decimal number, which can include a fractional part (for example, cents).


```json
"amount": {
  "type": "number",
  "description": "Invoice amount"
}
```

### `integer`

Whole number without decimals.


```json
"quantity": {
  "type": "integer",
  "description": "Quantity ordered"
}
```

### `boolean`

Boolean value (`true` or `false`).


```json
"is_signed": {
  "type": "boolean",
  "description": "Is the document signed?"
}
```
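
The four scalar types above map naturally onto parsed JSON values. If you post-process extraction results, a check like the following can flag values of the wrong type. This is a sketch that assumes the result has been parsed into Python values (`str`, `int`, `float`, `bool`) and that Anesya's types follow standard JSON Schema semantics. `matches_type` is a hypothetical helper. Note that in Python `bool` is a subclass of `int`, so booleans must be ruled out explicitly for `integer` and `number`.

```python
def matches_type(value, field_type):
    """Return True if a parsed JSON value fits one of the scalar schema types."""
    # In Python, bool is a subclass of int, so handle booleans first.
    if isinstance(value, bool):
        return field_type == "boolean"
    if field_type == "string":
        return isinstance(value, str)
    if field_type == "integer":
        # JSON Schema (draft 6 and later) also accepts numbers like 3.0 as integers.
        return isinstance(value, int) or (isinstance(value, float) and value.is_integer())
    if field_type == "number":
        return isinstance(value, (int, float))
    if field_type == "boolean":
        return False  # booleans were already accepted above
    raise ValueError(f"not a scalar type: {field_type!r}")


print(matches_type(12.5, "number"))   # True
print(matches_type(12.5, "integer"))  # False
print(matches_type(True, "integer"))  # False
```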

### `enum`

Predefined list of values (closed set of choices). Declare it as a `string` field with an `enum` array; `enum` is not itself a valid `type`.


```json
"status": {
  "type": "string",
  "enum": ["pending", "approved", "rejected"],
  "description": "Invoice status",
  "context:descriptions": [
    "Pending approval",
    "Invoice approved",
    "Invoice rejected"
  ]
}
```

> **Tip**: `context:descriptions` is optional. If you use it, provide one description per `enum` value, in the same order as the `enum` values.
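
Because descriptions are paired with values by position, a length mismatch attaches the wrong description to a value. A quick local check catches this before you save the schema. The sketch below is plain Python; `enum_choices` is a hypothetical helper, not an Anesya API.

```python
def enum_choices(field):
    """Pair each enum value with its description (None when none is given)."""
    values = field["enum"]
    descriptions = field.get("context:descriptions")
    if descriptions is None:
        return [(value, None) for value in values]
    if len(descriptions) != len(values):
        raise ValueError(f"{len(values)} enum values but {len(descriptions)} descriptions")
    # Descriptions are matched by position, so their order must follow `enum`.
    return list(zip(values, descriptions))


status = {
    "type": "string",
    "enum": ["pending", "approved", "rejected"],
    "description": "Invoice status",
    "context:descriptions": ["Pending approval", "Invoice approved", "Invoice rejected"],
}
for value, description in enum_choices(status):
    print(f"{value}: {description}")
```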


### `object`

Sub-object containing its own fields.


```json
"invoice_amount": {
  "type": "object",
  "required": ["amount", "iso_4217_currency_code"],
  "properties": {
    "amount": {
      "type": "number",
      "description": "Amount"
    },
    "iso_4217_currency_code": {
      "type": "string",
      "description": "ISO currency code (e.g. EUR, USD)"
    }
  },
  "description": "Invoice amount"
}
```

### `array`

List of items, typically objects or strings.

#### Example: array of objects


```json
"orders": {
  "type": "array",
  "description": "List of orders",
  "items": {
    "type": "object",
    "required": ["order_id", "customer_name"],
    "properties": {
      "order_id": {
        "type": "string",
        "description": "Order identifier"
      },
      "customer_name": {
        "type": "string",
        "description": "Customer name"
      }
    }
  }
}
```

## Required Fields: `required`

In every `object`, the `required` key lists the fields that **must be extracted**. Each of these must exist in `properties`.
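
This rule applies at every level, including objects nested inside other objects and inside array `items`. The sketch below walks a schema recursively and reports each `required` name that has no matching property. It is plain Python; `missing_required` and the sample field names (`total`, `sku`, `lines`) are hypothetical, and the `root.properties...` path format only imitates the style of the dashboard's error messages.

```python
def missing_required(node, path="root"):
    """Return paths of names listed in `required` but absent from `properties`."""
    problems = []
    if node.get("type") == "object":
        properties = node.get("properties", {})
        for name in node.get("required", []):
            if name not in properties:
                problems.append(f"{path}.required.{name}")
        # Recurse into every sub-field, whether or not it is required.
        for name, child in properties.items():
            problems.extend(missing_required(child, f"{path}.properties.{name}"))
    elif node.get("type") == "array" and "items" in node:
        problems.extend(missing_required(node["items"], f"{path}.items"))
    return problems


schema = {
    "type": "object",
    "required": ["order_id", "total"],
    "properties": {
        "order_id": {"type": "string", "description": "Order identifier"},
        "lines": {
            "type": "array",
            "description": "Order lines",
            "items": {
                "type": "object",
                "required": ["sku"],
                "properties": {"price": {"type": "number", "description": "Price"}},
            },
        },
    },
}
print(missing_required(schema))
# ['root.required.total', 'root.properties.lines.items.required.sku']
```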

## Common Mistakes to Avoid

| ❌ Mistake | ✅ Solution |
|  --- | --- |
| Missing or invalid `type` | Make sure each field has a valid `type` (among those listed above) |
| Missing `description` | Each field must have a clear `description` |
| Mismatch in `required` | All fields listed in `required` must exist in `properties` |
| Malformed `object` or `array` | Objects must have `properties` and `required`, arrays must have `items` |
| Using `"type": "enum"` | Use `"type": "string"` with an `enum` array |
| Duplicate or misspelled field names (e.g. `invoice_amoun2t` instead of `invoice_amount`) | Check field names for typos and make sure each name appears only once per object |
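
Several of these mistakes can be caught mechanically before you open the dashboard. The sketch below is a hypothetical local linter in plain Python that covers the type, description, `"type": "enum"`, object, and array rules from the table. It is not Anesya's validator, which remains the source of truth. It assumes array `items` and the root object do not need their own `description`, as in the examples on this page. Typo review is left to a human.

```python
VALID_TYPES = {"string", "number", "integer", "boolean", "object", "array"}


def lint(node, path="root", needs_description=False):
    """Return human-readable problems found in a schema node and its children."""
    problems = []
    field_type = node.get("type")
    if field_type == "enum":
        problems.append(f'{path}: use "type": "string" with an "enum" array')
    elif field_type not in VALID_TYPES:
        problems.append(f"{path}: missing or invalid type {field_type!r}")
    if needs_description and not node.get("description"):
        problems.append(f"{path}: missing description")
    if field_type == "object":
        if "properties" not in node or "required" not in node:
            problems.append(f"{path}: objects need properties and required")
        for name, child in node.get("properties", {}).items():
            # Every named field needs its own description.
            problems.extend(lint(child, f"{path}.properties.{name}", True))
    elif field_type == "array":
        if "items" not in node:
            problems.append(f"{path}: arrays need items")
        else:
            # The array field carries the description; its items do not need one.
            problems.extend(lint(node["items"], f"{path}.items"))
    return problems


broken = {
    "type": "object",
    "required": ["status", "tags"],
    "properties": {
        "status": {"type": "enum", "enum": ["open", "closed"], "description": "Status"},
        "tags": {"type": "array"},
    },
}
for problem in lint(broken):
    print(problem)
```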


## Special Fields: Dates

To indicate that a field contains a **date**, add the `context:type` extension with the value `"date"`:


```json
"signature_date": {
  "type": "string",
  "description": "Signature date",
  "context:type": "date"
}
```

## Special Fields: Addresses

To indicate that a field contains an **address**, set the same `context:type` extension to `"address"`:


```json
"supplier_address": {
  "type": "string",
  "description": "Supplier address",
  "context:type": "address"
}
```

Use this only when you want Anesya to treat the field as a structured address.
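
In a larger schema, it helps to review which fields carry a `context:type` extension. The sketch below walks a schema recursively and collects them. It is plain Python; `context_types` is a hypothetical helper, and the path format simply mirrors the one used in error messages.

```python
def context_types(node, path="root"):
    """Map each field path to its `context:type` annotation, if any."""
    found = {}
    if "context:type" in node:
        found[path] = node["context:type"]
    for name, child in node.get("properties", {}).items():
        found.update(context_types(child, f"{path}.properties.{name}"))
    if "items" in node:
        found.update(context_types(node["items"], f"{path}.items"))
    return found


schema = {
    "type": "object",
    "required": ["signature_date"],
    "properties": {
        "signature_date": {"type": "string", "description": "Signature date", "context:type": "date"},
        "supplier_address": {"type": "string", "description": "Supplier address", "context:type": "address"},
        "supplier_name": {"type": "string", "description": "Supplier name"},
    },
}
print(context_types(schema))
# {'root.properties.signature_date': 'date', 'root.properties.supplier_address': 'address'}
```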

## Testing Your Schema

Before using a schema in production, create it in the dashboard and test it with a sample document by running [Create extract](/api/schema/extracts/extract_create).

If an error is detected, you’ll see a message like:


```text
❌ Invalid schema: root.properties.status.type must be string or list
```
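
The message points to the offending node with a dotted path that starts at `root`. The sketch below assumes that path format, which is taken from the sample message above; other messages may differ. Under that assumption, a few lines of Python locate the node in your schema. `resolve` is a hypothetical helper.

```python
def resolve(schema, error_path):
    """Follow a dotted path such as 'root.properties.status.type' into a schema."""
    parts = error_path.split(".")
    if parts[0] != "root":
        raise ValueError("expected the path to start with 'root'")
    node = schema
    # Field names that themselves contain dots would make this split ambiguous.
    for key in parts[1:]:
        node = node[key]
    return node


schema = {
    "type": "object",
    "required": ["status"],
    "properties": {
        "status": {"type": "enum", "enum": ["pending", "approved"], "description": "Invoice status"},
    },
}
print(resolve(schema, "root.properties.status.type"))  # enum
```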

## 🧪 Schema Examples

### E-commerce Invoice

**Context**:
You want to extract the following information from an e-commerce invoice:

* The **order number**
* The **buyer’s first and last name**
* The **items** included in the invoice, with their **price** and **quantity**


Here’s how you should design your schema:

1. **Order Number** → Add a `string` field named `order_number` (if your order numbers never contain letters, you may use `number` instead, although `string` keeps any leading zeros)
2. **Buyer Information** → Add a field of type `object` named `customer_information` → Inside this object, define two `string` fields: `first_name` and `last_name`
3. **Invoice Items** → Add a field of type `array` named `items` → The `items` array will contain multiple `object` entries → Each object will include two `number` fields: `price` and `quantity`



```json
{
    "type": "object",
    "required": [
        "order_number",
        "customer_information",
        "items"
    ],
    "properties": {
        "order_number": {
            "type": "string",
            "description": "The unique number assigned to the order"
        },
        "customer_information": {
            "type": "object",
            "required": [
                "first_name",
                "last_name"
            ],
            "properties": {
                "first_name": {
                    "type": "string",
                    "description": "The buyer's first name"
                },
                "last_name": {
                    "type": "string",
                    "description": "The buyer's last name"
                }
            },
            "description": "Information about the buyer"
        },
        "items": {
            "type": "array",
            "description": "List of items included in the invoice",
            "items": {
                "type": "object",
                "required": [
                    "price",
                    "quantity"
                ],
                "properties": {
                    "price": {
                        "type": "number",
                        "description": "Price of the item"
                    },
                    "quantity": {
                        "type": "number",
                        "description": "Quantity of the item ordered"
                    }
                }
            }
        }
    }
}
```
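
Because the schema defines the structure of the returned JSON, you can derive a placeholder document from it, for example to show downstream developers what to expect or to seed tests. The sketch below is plain Python. `skeleton` is a hypothetical helper, and its placeholder values (`""`, `0`, `False`) are arbitrary. They do not represent what Anesya returns for fields it cannot find.

```python
import json

# Arbitrary placeholder per scalar type; these are not values Anesya returns.
PLACEHOLDERS = {"string": "", "number": 0, "integer": 0, "boolean": False}


def skeleton(node):
    """Build a placeholder value with the shape described by a schema node."""
    field_type = node["type"]
    if field_type == "object":
        return {name: skeleton(child) for name, child in node["properties"].items()}
    if field_type == "array":
        # One element is enough to show the shape of each item.
        return [skeleton(node["items"])]
    return PLACEHOLDERS[field_type]


# The e-commerce schema above, with descriptions omitted for brevity.
schema = {
    "type": "object",
    "required": ["order_number", "customer_information", "items"],
    "properties": {
        "order_number": {"type": "string"},
        "customer_information": {
            "type": "object",
            "required": ["first_name", "last_name"],
            "properties": {"first_name": {"type": "string"}, "last_name": {"type": "string"}},
        },
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["price", "quantity"],
                "properties": {"price": {"type": "number"}, "quantity": {"type": "number"}},
            },
        },
    },
}

print(json.dumps(skeleton(schema), indent=2))
```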

### Payslip

**Context**:
You want to extract the following information from a standard French payslip:

* The **employee’s first and last name**
* The **employer’s name**
* The **net and gross salary amounts**
* The **pay period** covered by the payslip


Here’s how you should design your schema:

1. **Employee Information** → Add an `object` field named `employee` → Inside this object, define two `string` fields: `first_name` and `last_name`
2. **Employer Name** → Add a `string` field named `employer`
3. **Salary Amounts** → Add two `number` fields: `net_salary` and `gross_salary` → `net_salary` is the amount paid after deductions and `gross_salary` is the amount before deductions
4. **Pay Period** → Add an `object` field named `pay_period` → Inside, define two `string` fields: `start` and `end` → Add `"context:type": "date"` to each one to mark it as a date



```json
{
  "type": "object",
  "required": ["employee", "employer", "net_salary", "pay_period"],
  "properties": {
    "employee": {
      "type": "object",
      "required": ["first_name", "last_name"],
      "properties": {
        "first_name": {
          "type": "string",
          "description": "Employee's first name"
        },
        "last_name": {
          "type": "string",
          "description": "Employee's last name"
        }
      },
      "description": "Information about the employee"
    },
    "employer": {
      "type": "string",
      "description": "Name of the employer"
    },
    "net_salary": {
      "type": "number",
      "description": "Net amount paid to the employee"
    },
    "gross_salary": {
      "type": "number",
      "description": "Gross salary before deductions"
    },
    "pay_period": {
      "type": "object",
      "required": ["start", "end"],
      "properties": {
        "start": {
          "type": "string",
          "description": "Start date of the pay period",
          "context:type": "date"
        },
        "end": {
          "type": "string",
          "description": "End date of the pay period",
          "context:type": "date"
        }
      },
      "description": "Period covered by the payslip"
    }
  }
}
```
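
Note that `gross_salary` is not listed in `required`, so code that consumes the result should not assume it is present. The sketch below assumes the extraction result is available as a parsed Python dict shaped like this schema, and that a field that was not found is either absent or `null`. Check which behavior you actually observe. `total_deductions` and the sample values are hypothetical.

```python
def total_deductions(result):
    """Return gross minus net salary, or None when the gross amount is missing."""
    net = result["net_salary"]  # listed in required
    gross = result.get("gross_salary")  # not in required: may be absent or null
    if gross is None:
        return None
    # Round to cents to hide floating-point noise in the subtraction.
    return round(gross - net, 2)


# Hypothetical values; only the salary fields of the result are shown.
sample = {"net_salary": 2345.67, "gross_salary": 3000.00}
print(total_deductions(sample))  # 654.33
print(total_deductions({"net_salary": 2345.67}))  # None
```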

### Rental Agreement

**Context**:
You want to extract the following information from a residential rental agreement:

* The **landlord’s name**
* The **tenant’s first and last name**
* The **address** of the rented property
* The **monthly rent** and **security deposit**
* The **start and end dates** of the lease


Here’s how you should design your schema:

1. **Landlord Information** → Add a `string` field named `landlord`
2. **Tenant Information** → Add an `object` field named `tenant` → Inside this object, define two `string` fields: `first_name` and `last_name`
3. **Property Address** → Add a `string` field named `property_address`
4. **Rent & Deposit** → Add two `number` fields: `rental_amount` and `deposit_amount`
5. **Lease Period** → Add an `object` field named `lease_period` → Inside, define two `string` fields: `start_date` and `end_date` → Add `"context:type": "date"` to each one to mark it as a date



```json
{
  "type": "object",
  "required": ["landlord", "tenant", "property_address", "rental_amount", "lease_period"],
  "properties": {
    "landlord": {
      "type": "string",
      "description": "Name of the landlord"
    },
    "tenant": {
      "type": "object",
      "required": ["first_name", "last_name"],
      "properties": {
        "first_name": {
          "type": "string",
          "description": "Tenant's first name"
        },
        "last_name": {
          "type": "string",
          "description": "Tenant's last name"
        }
      },
      "description": "Information about the tenant"
    },
    "property_address": {
      "type": "string",
      "description": "Full address of the rented property"
    },
    "rental_amount": {
      "type": "number",
      "description": "Monthly rent amount"
    },
    "deposit_amount": {
      "type": "number",
      "description": "Amount of the security deposit"
    },
    "lease_period": {
      "type": "object",
      "required": ["start_date", "end_date"],
      "properties": {
        "start_date": {
          "type": "string",
          "description": "Start date of the lease",
          "context:type": "date"
        },
        "end_date": {
          "type": "string",
          "description": "End date of the lease",
          "context:type": "date"
        }
      },
      "description": "Duration of the lease"
    }
  }
}
```
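
In this schema, `deposit_amount` is not listed in `required`, so it may be missing from the result. As schemas grow, a small walker makes it easy to review which fields are optional before you save. The sketch below is plain Python; `optional_fields` is a hypothetical helper.

```python
def optional_fields(node, path="root"):
    """List paths of properties that their parent object does not mark as required."""
    optional = []
    if node.get("type") == "object":
        required = set(node.get("required", []))
        for name, child in node.get("properties", {}).items():
            child_path = f"{path}.properties.{name}"
            if name not in required:
                optional.append(child_path)
            optional.extend(optional_fields(child, child_path))
    elif node.get("type") == "array" and "items" in node:
        optional.extend(optional_fields(node["items"], f"{path}.items"))
    return optional


# The rental agreement schema above, with descriptions omitted for brevity.
schema = {
    "type": "object",
    "required": ["landlord", "tenant", "property_address", "rental_amount", "lease_period"],
    "properties": {
        "landlord": {"type": "string"},
        "tenant": {
            "type": "object",
            "required": ["first_name", "last_name"],
            "properties": {"first_name": {"type": "string"}, "last_name": {"type": "string"}},
        },
        "property_address": {"type": "string"},
        "rental_amount": {"type": "number"},
        "deposit_amount": {"type": "number"},
        "lease_period": {
            "type": "object",
            "required": ["start_date", "end_date"],
            "properties": {
                "start_date": {"type": "string", "context:type": "date"},
                "end_date": {"type": "string", "context:type": "date"},
            },
        },
    },
}

print(optional_fields(schema))  # ['root.properties.deposit_amount']
```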

## What’s Next?

Once your schema is created, you can use it to:

* Trigger an extraction [via API](/api/schema/extracts/extract_create) with your schema ID
* Run an extraction directly on the [Anesya platform](https://anesya.app/extract)
* [Create a workflow](https://anesya.app/workflow) using this schema