# Documents

> Upload, store, retrieve, and download reusable source documents.


Documents is the Anesya resource for storing source files in your workspace before downstream processing.

Use it when you want a stable document ID that can later be reused by:

* parsing
* extract
* list and search workflows
* operational or audit flows around uploaded files


If you do not need a reusable stored file, you can sometimes skip this step and send a file or URL directly to parsing or extract.

## What documents is for

Documents is the right entry point when you want to:

* upload one local file to Anesya
* keep a reusable file reference in the workspace
* list previously uploaded files
* search documents by filename or metadata
* download the stored source file later
* trim a PDF before parsing or extraction


The common workflow is:


```mermaid
flowchart LR
    A[Upload file] --> B[Get document ID]
    B --> C[Create parsing]
    B --> D[Create extract]
    B --> E[Retrieve or download later]
```

## Quick start

The most common document workflow has three steps:

1. upload one file
2. keep the returned document ID
3. reuse that ID in parsing or extract


### Upload a document

Use [Create document](/api/schema/documents/document_create) to store a new file.


```bash
curl -X POST "https://api.anesya.app/v0/documents" \
  -H "X-API-Key: $ANESYA_API_KEY" \
  -F "file=@invoice.pdf;type=application/pdf" \
  -F "filename=invoice.pdf" \
  -F 'metadata={"source":"api"}'
```

Example response:


```json
{
  "id": "300f339f-da71-4f9f-80f6-c25a63baae75",
  "filename": "invoice.pdf",
  "file_url": "/v0/documents/300f339f-da71-4f9f-80f6-c25a63baae75/download",
  "metadata": {
    "source": "api"
  },
  "page_count": 3,
  "created_at": "2025-06-12T14:56:10.682461Z",
  "updated_at": "2025-06-12T14:56:10.682461Z"
}
```

Save the returned `id`. This is the document ID you can reuse later.
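
In a shell script, one way to keep the ID is to capture it from the JSON response at upload time. The following is a minimal sketch that assumes the response shape shown above; the helper names `extract_document_id` and `upload_document` are illustrative and not part of the API, and the `sed` pattern is not a general JSON parser (a JSON-aware tool is more robust if you have one available).

```bash
# Hypothetical helper: print the value of the first "id" field found in
# JSON read from stdin. Relies on the response shape shown above.
extract_document_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: upload one file and print only the new document ID.
upload_document() {
  # $1 = path to the local file
  curl -sS -X POST "https://api.anesya.app/v0/documents" \
    -H "X-API-Key: $ANESYA_API_KEY" \
    -F "file=@$1" \
    | extract_document_id
}

# Usage (performs a real upload):
#   DOC_ID="$(upload_document invoice.pdf)"
```

The captured value can then stand in for `YOUR_DOCUMENT_ID` in later requests.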

### Reuse the document in parsing


```bash
curl -X POST "https://api.anesya.app/v0/parsing" \
  -H "X-API-Key: $ANESYA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": "YOUR_DOCUMENT_ID",
    "model": "PIGALLE"
  }'
```

### Reuse the document in extract


```bash
curl -X POST "https://api.anesya.app/v0/extract" \
  -H "X-API-Key: $ANESYA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": "YOUR_DOCUMENT_ID",
    "schema": "YOUR_SCHEMA_ID"
  }'
```
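
When the document ID lives in a shell variable rather than being pasted in by hand, the two requests above can be wrapped in small functions. This is only a sketch: the function names `create_parsing` and `create_extract` are hypothetical, and the request bodies simply mirror the examples above.

```bash
# Hypothetical wrapper around the parsing request shown above.
create_parsing() {
  # $1 = document ID
  curl -sS -X POST "https://api.anesya.app/v0/parsing" \
    -H "X-API-Key: $ANESYA_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(printf '{"document": "%s", "model": "PIGALLE"}' "$1")"
}

# Hypothetical wrapper around the extract request shown above.
create_extract() {
  # $1 = document ID, $2 = schema ID
  curl -sS -X POST "https://api.anesya.app/v0/extract" \
    -H "X-API-Key: $ANESYA_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(printf '{"document": "%s", "schema": "%s"}' "$1" "$2")"
}

# Usage (performs real requests):
#   create_parsing "$DOC_ID"
#   create_extract "$DOC_ID" "$SCHEMA_ID"
```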

## What you get back

The document resource returned by create, retrieve, and list contains:


```json
{
  "id": "300f339f-da71-4f9f-80f6-c25a63baae75",
  "filename": "invoice.pdf",
  "file_url": "/v0/documents/300f339f-da71-4f9f-80f6-c25a63baae75/download",
  "metadata": {
    "source": "api"
  },
  "page_count": 3,
  "created_at": "2025-06-12T14:56:10.682461Z",
  "updated_at": "2025-06-12T14:56:10.682461Z"
}
```

### Key fields

| Field | What it contains |
|  --- | --- |
| `id` | Stable document identifier |
| `filename` | Stored file name |
| `file_url` | Relative download endpoint for this file |
| `metadata` | Custom metadata object |
| `page_count` | Number of pages in the stored document |
| `created_at` | Creation timestamp |
| `updated_at` | Last update timestamp |


## Endpoints

The documents resource exposes four public actions.

### `POST /v0/documents`

Upload one new file and create a document resource.

#### Multipart fields

| Field | Type | Required | Description |
|  --- | --- | --- | --- |
| `file` | binary | yes | Source document file |
| `filename` | string | no | Optional filename override |
| `metadata` | object | no | Optional metadata object, sent as a JSON-encoded string in the form field |


#### Query parameters

| Parameter | Type | Description |
|  --- | --- | --- |
| `pdf_page_start` | integer | Optional first PDF page to keep |
| `pdf_page_end` | integer | Optional last PDF page to keep |


This PDF trimming happens at upload time.

It is useful when:

* the original PDF is larger than the subset you need
* you want to reduce downstream parsing work
* only a known page range matters
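
A trimmed upload can be sketched by adding the two query parameters to the upload URL. This example assumes page numbers are 1-based and that the end page is inclusive, neither of which is stated here, so check the API reference before relying on it. The function name `upload_pdf_pages` is illustrative.

```bash
# Hypothetical helper: upload a PDF but keep only a page range.
# Assumption: pages are 1-based and the end page is inclusive.
upload_pdf_pages() {
  # $1 = path to PDF, $2 = first page to keep, $3 = last page to keep
  curl -sS -X POST \
    "https://api.anesya.app/v0/documents?pdf_page_start=$2&pdf_page_end=$3" \
    -H "X-API-Key: $ANESYA_API_KEY" \
    -F "file=@$1;type=application/pdf"
}

# Usage (performs a real upload):
#   upload_pdf_pages report.pdf 2 5
```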


### `GET /v0/documents`

List documents in the workspace.

#### Query parameters

| Parameter | Type | Description |
|  --- | --- | --- |
| `page` | integer | Page number |
| `size` | integer | Page size |
| `search` | string | Search in filename and metadata values |
| `created_at_after` | date-time | Lower bound on `created_at` |
| `created_at_before` | date-time | Upper bound on `created_at` |


### `GET /v0/documents/{id}`

Retrieve one document by its UUID.
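
A retrieve call follows the same pattern as the other requests on this page. A minimal sketch, with the hypothetical function name `get_document`:

```bash
# Hypothetical helper: fetch one document object by its ID.
get_document() {
  # $1 = document ID (UUID)
  curl -sS "https://api.anesya.app/v0/documents/$1" \
    -H "X-API-Key: $ANESYA_API_KEY"
}

# Usage (performs a real request):
#   get_document "300f339f-da71-4f9f-80f6-c25a63baae75"
```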

### `GET /v0/documents/{id}/download`

Get the download URL for the stored file.

Important behavior:

* the documented response is `302`
* clients that want the actual file should follow the redirect
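
With `curl`, the redirect can be handled in two ways, sketched below. The function names are hypothetical, and whether the redirect target expects or tolerates the `X-API-Key` header is not stated here, so treat the choice between the two as an assumption to verify.

```bash
# Option 1: let curl follow the 302 and write the file body to disk.
download_document() {
  # $1 = document ID, $2 = output path
  curl -sS -fL "https://api.anesya.app/v0/documents/$1/download" \
    -H "X-API-Key: $ANESYA_API_KEY" \
    -o "$2"
}

# Option 2: print the redirect target without following it, so the
# second request can be made separately.
resolve_download_url() {
  # $1 = document ID
  curl -sS -o /dev/null -w '%{redirect_url}\n' \
    "https://api.anesya.app/v0/documents/$1/download" \
    -H "X-API-Key: $ANESYA_API_KEY"
}

# Usage (performs real requests):
#   download_document "$DOC_ID" invoice.pdf
#   curl -sS -o invoice.pdf "$(resolve_download_url "$DOC_ID")"
```

Headers set with `-H` are sent on every request `curl` makes, including the ones it issues after following a redirect with `-L`. Option 2 avoids forwarding the API key to the redirect target.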


## Response format details

Documents has three main response shapes.

### Create or retrieve

Both `POST /v0/documents` and `GET /v0/documents/{id}` return one document object.

### List documents

`GET /v0/documents` returns a paginated resource. Each entry in `results` is a document object, abbreviated here to two fields:


```json
{
  "count": 1,
  "next": null,
  "previous": null,
  "results": [
    {
      "id": "300f339f-da71-4f9f-80f6-c25a63baae75",
      "filename": "invoice.pdf"
    }
  ]
}
```
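
To walk every page, a client can keep requesting the `next` value until it is `null`. The sketch below assumes `next` holds an absolute URL that can be requested as-is, which the example above does not confirm. The names `next_page_url` and `list_all_documents` are hypothetical, and the `sed` pattern is a stand-in for a real JSON parser.

```bash
# Hypothetical helper: print the "next" URL from a list response on stdin,
# or nothing when "next" is null.
next_page_url() {
  sed -n 's/.*"next"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Print each page of the document list, one response body after another.
# The function body runs in a subshell so its variables do not leak.
list_all_documents() (
  # $1 = page size (optional, defaults to 50)
  url="https://api.anesya.app/v0/documents?size=${1:-50}"
  while [ -n "$url" ]; do
    body="$(curl -sS "$url" -H "X-API-Key: $ANESYA_API_KEY")" || exit 1
    printf '%s\n' "$body"
    url="$(printf '%s\n' "$body" | next_page_url)"
  done
)

# Usage (performs real requests):
#   list_all_documents 100
```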

### Download document

`GET /v0/documents/{id}/download` does not return the normal document object.

Instead, it redirects to the underlying file URL.

## Search and filtering

Document listing supports lightweight search and date filtering.

### Search

Use the `search` query parameter to search in:

* the document filename
* metadata values


Example:


```bash
curl -X GET "https://api.anesya.app/v0/documents?search=invoice" \
  -H "X-API-Key: $ANESYA_API_KEY"
```

### Date filtering

Use:

* `created_at_after`
* `created_at_before`


Example:


```bash
curl -X GET "https://api.anesya.app/v0/documents?created_at_after=2026-04-01T00:00:00Z&created_at_before=2026-04-30T23:59:59Z" \
  -H "X-API-Key: $ANESYA_API_KEY"
```
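
Search terms that contain spaces or other reserved characters need URL encoding. One option is to let `curl` build the query string with `-G` and `--data-urlencode`, as in this sketch. It assumes that `search` and the two date filters can be combined in a single request, which is not stated above; the function name `search_documents` is illustrative.

```bash
# Hypothetical helper: list documents matching a search term within a
# creation-time window. curl URL-encodes each parameter value.
search_documents() {
  # $1 = search text, $2 = created_at_after, $3 = created_at_before
  curl -sS -G "https://api.anesya.app/v0/documents" \
    -H "X-API-Key: $ANESYA_API_KEY" \
    --data-urlencode "search=$1" \
    --data-urlencode "created_at_after=$2" \
    --data-urlencode "created_at_before=$3"
}

# Usage (performs a real request):
#   search_documents "acme corp" 2026-04-01T00:00:00Z 2026-04-30T23:59:59Z
```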

## When to use documents vs direct parsing or extract

Use documents first when:

* the file should be reused later
* you want a stable internal document ID
* the workflow may involve several downstream steps
* the source file should be discoverable in list views
* you want to trim a PDF once and reuse the trimmed version


Skip documents and go directly to parsing or extract when:

* you only need a one-off processing call
* the file is already available as a public or pre-signed URL
* you do not need a stored document lifecycle


## Best practices

### 1. Use documents when reuse matters

If the same file may be parsed again, extracted again, or audited later, upload it first and keep the document ID.

### 2. Use metadata intentionally

Good metadata makes list and search workflows much easier.

Typical metadata examples:

* source system
* customer ID
* import batch identifier
* internal reference number


### 3. Trim PDFs early when only part of the file matters

If you only need a subset of pages, use `pdf_page_start` and `pdf_page_end` at upload time.

This avoids carrying an oversized document through the rest of the pipeline.

### 4. Treat the document ID as the canonical reusable reference

Once a document is uploaded, reuse the UUID rather than re-uploading the same file repeatedly.

### 5. Follow redirects for download

If you call the download endpoint programmatically, make sure your client follows `302` redirects or handles the redirected URL explicitly.

## Common pitfalls

### Re-uploading the same file every time

If the file is meant to be reused, upload it once and keep the document ID.
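
One way to avoid repeat uploads from a script is a small local cache keyed by file checksum. The sketch below is a client-side convention, not an API feature: the cache file location, the `ANESYA_DOC_CACHE` variable, and the function name `document_id_for` are all hypothetical. It assumes `sha256sum` is available and that the upload response contains an `id` field as shown in the examples on this page.

```bash
# Hypothetical cache file: one "<sha256> <document id>" pair per line.
ANESYA_DOC_CACHE="${ANESYA_DOC_CACHE:-$HOME/.anesya_documents}"

# Print the document ID for a file, uploading it only if its checksum
# has not been seen before. Runs in a subshell so variables do not leak.
document_id_for() (
  # $1 = path to the local file
  sum="$(sha256sum "$1" | cut -d ' ' -f 1)"
  # Look for an existing cache entry with this checksum.
  id="$(awk -v s="$sum" '$1 == s { print $2; exit }' "$ANESYA_DOC_CACHE" 2>/dev/null)"
  if [ -z "$id" ]; then
    # Not cached: upload, then pull the "id" field out of the response.
    id="$(curl -sS -X POST "https://api.anesya.app/v0/documents" \
            -H "X-API-Key: $ANESYA_API_KEY" \
            -F "file=@$1" \
          | sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
          | head -n 1)"
    [ -n "$id" ] || exit 1
    printf '%s %s\n' "$sum" "$id" >> "$ANESYA_DOC_CACHE"
  fi
  printf '%s\n' "$id"
)

# Usage (uploads only on the first call for a given file content):
#   DOC_ID="$(document_id_for invoice.pdf)"
```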

### Forgetting that search includes metadata values

If search results seem broader than expected, remember that metadata values are part of the search surface.

### Expecting the download endpoint to return JSON

The documented behavior is a redirect, not a standard document object.

### Forgetting PDF trimming is applied at upload time

If you trim pages during upload, the stored document itself reflects that subset.

That is usually desirable, but it should be intentional.

## Troubleshooting

### 401 Unauthorized

Your API key is missing or invalid. Check the `X-API-Key` header.

### 404 Not Found on retrieve or download

The document ID does not exist, is not accessible in the current workspace context, or the file is missing.

### Upload fails with validation errors

Check that:

* the `file` field is present
* the multipart request is correctly formed
* the PDF trim parameters are valid if you used them


### Download does not return the file body

Your client may not be following redirects automatically.

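Enable redirect following in your client (for example, `curl -L`), or read the `Location` header from the `302` response and request that URL yourself.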
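
The cases in this section can be checked from a script by looking at the HTTP status code before touching the response body. A minimal sketch, with hypothetical helper names `explain_status` and `status_of`; only the status codes named in this section are mapped, and anything else falls through to a generic message.

```bash
# Hypothetical helper: map an HTTP status code to a hint based on the
# troubleshooting cases described in this section.
explain_status() {
  case "$1" in
    2??) echo "ok" ;;
    302) echo "redirect: follow the Location header to get the file" ;;
    401) echo "unauthorized: check the X-API-Key header" ;;
    404) echo "not found: unknown ID, wrong workspace, or missing file" ;;
    *)   echo "unexpected status $1: inspect the response body" ;;
  esac
}

# Print only the status code of a GET request to the given URL,
# without following redirects.
status_of() {
  curl -sS -o /dev/null -w '%{http_code}' "$1" \
    -H "X-API-Key: $ANESYA_API_KEY"
}

# Usage (performs a real request):
#   explain_status "$(status_of "https://api.anesya.app/v0/documents/$DOC_ID")"
```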

## Related guides

* [API quickstart](/tutorials/quickstart)
* [Parsing](/tutorials/parsing)
* [Extract](/tutorials/extract)
* [Anesya API Reference for Coding Agents](/tutorials/agent-guide)
* [API reference](/api/schema/documents/document_create)