Scout Server

Scout server is a RESTful Flask application that provides endpoints for managing indexes, documents, metadata and performing searches. To get started with Scout, you can simply run the server from the command-line and specify the path to a SQLite database file which will act as the search index.

Note

If the database does not exist, it will be created.

$ scout my_search_index.db
 * Running on http://127.0.0.1:8000/ (Press CTRL+C to quit)

This will run Scout locally on port 8000 using the Werkzeug multi-threaded WSGI server. The werkzeug server is perfect for getting started and small deployments. For a production-ready server powered by gevent, you can instead run:

$ scout_wsgi my_search_index.db

You can find examples of using other high performance WSGI servers in the deployment section.

All of the examples below show raw HTTP requests using curl as well as the equivalent call using the Scout Python client. To initialize the client:

from scout.client import Scout
scout = Scout('http://localhost:8000')

# If authentication is enabled:
scout = Scout('http://localhost:8000', key='secret')

API Endpoints

There are four main concepts in Scout:

  • Indexes

  • Documents

  • Attachments

  • Metadata

Indexes

Named collection of searchable documents.

Documents

Searchable content, may be associated with any number of indexes.

Attachments

Arbitrary files which are associated with a document. For instance, if you were using Scout to provide search over a library of PDFs, your Document might contain the key search terms from the PDF and the actual PDF would be linked to the document as an attachment. A document may have any number of attachments, or none at all.

Attachment file data is stored in a compressed, content-addressable table. If the same file content is attached to multiple documents, only one copy of the data is stored on disk. When an attachment is removed, the underlying data is automatically cleaned up once no other attachment references it.

Metadata

Documents also can have metadata, arbitrary key/value pairs. Besides full-text search, Scout allows filtering based on metadata values. So in addition to storing useful things alongside your documents, you can also use metadata to provide an additional layer of filtering.

Index list

Endpoint: /

The index list endpoint returns a paginated list of indexes and the number of documents contained within each. New indexes can be created by POST-ing a name to this URL.

Valid GET parameters:

  • page: which page of results to fetch, by default 1.

  • ordering: order in which to return the indexes. By default they are returned ordered by name. Valid values are name, id, and document_count. By prefixing the name with a minus sign (“-”) you can indicate the results should be ordered descending.

Example GET request and response:

$ curl localhost:8000/

Using the Python client:

indexes = scout.get_indexes()

Response:

{
  "indexes": [
    {
      "document_count": 75,
      "documents": "/blog/",
      "id": 1,
      "name": "blog"
    },
    {
      "document_count": 36,
      "documents": "/photos/",
      "id": 2,
      "name": "photos"
    }
  ],
  "ordering": [],
  "page": 1,
  "pages": 1,
  "next_url": null,
  "previous_url": null
}

Note

The Scout.get_indexes() method returns just the "indexes" list from the response, not the full paginated envelope.

Example POST request and response:

$ curl -H "Content-Type: application/json" \
       -d '{"name": "test-index"}' \
       localhost:8000/

Using the Python client:

new_index = scout.create_index('test-index')

Response:

{
  "document_count": 0,
  "documents": [],
  "id": 3,
  "name": "test-index",
  "page": 1,
  "pages": 0,
  "next_url": null,
  "previous_url": null
}

The POST response corresponds to the serialized index detail for the newly-created index.

Index detail

Endpoint: /:index-name/

The index detail returns the name and ID of the index, as well as a paginated list of documents associated with the index. The index can be re-named by POSTing a name to this URL.

Valid GET parameters:

  • q: full-text search query.

  • page: which page of results to fetch, by default 1.

  • ordering: order in which to return the documents. By default they are returned in arbitrary order, unless a search query is present, in which case they are ordered by relevance. Valid choices are id, identifier, content, and score. By prefixing the name with a minus sign (“-”) you can indicate the results should be ordered descending. Note: this parameter can appear multiple times.

  • ranking: when a full-text search query is specified, this parameter determines the ranking algorithm. Valid choices are:

    • bm25: use the Okapi BM25 algorithm, which is natively supported by SQLite FTS5.

    • none: do not use any ranking algorithm. Search results will not have a score attribute.

  • Arbitrary metadata filters. See Filtering on Metadata for a description of metadata filtering.

When a search query is present, each returned document will have an additional field named score. This field contains the numerical value the scoring algorithm gave to the document. To disable scores when searching, you can specify ranking=none.

Example GET request and response.

$ curl localhost:8000/test-index/?q=test

Using the Python client:

results = scout.get_index('test-index', q='test')

Response:

{
  "document_count": 20,
  "documents": [
    {
      "attachments": [],
      "content": "test charlie document",
      "id": 115,
      "identifier": null,
      "indexes": [
        "test-index"
      ],
      "metadata": {
        "is_kitty": "no"
      },
      "score": -0.022727272727272728
    },
    {
      "attachments": [
        {
          "data": "/documents/116/attachments/example.jpg/download/",
          "data_length": 31337,
          "filename": "example.jpg",
          "mimetype": "image/jpeg",
          "timestamp": "2016-01-04 13:37:00"
        }
      ],
      "content": "test huey document",
      "id": 116,
      "identifier": null,
      "indexes": [
        "test-index"
      ],
      "metadata": {
        "is_kitty": "yes"
      },
      "score": -0.022727272727272728
    }
  ],
  "filtered_count": 2,
  "filters": {},
  "id": 3,
  "name": "test-index",
  "ordering": [],
  "page": 1,
  "pages": 1,
  "next_url": null,
  "previous_url": null,
  "ranking": "bm25",
  "search_term": "test"
}

The search response includes several fields beyond the document list:

  • document_count: the total number of documents in the queried scope (e.g. the index), regardless of the current search query or filters.

  • filtered_count: the number of documents that match the current search query and any metadata filters. This may be less than document_count.

  • search_term: echoes back the q parameter that was used.

  • ranking: echoes back the ranking algorithm used (bm25 or none).

  • filters: echoes back any metadata filters that were applied.

  • page, pages, next_url, previous_url: pagination controls.

POST requests update the name of the index, and like the index_list view, accept a name parameter. For example request and response, see the above section on creating a new index.

Example of renaming an index:

$ curl -H "Content-Type: application/json" -d '{"name": "new-name"}' localhost:8000/test-index/

Using the Python client:

renamed = scout.rename_index('test-index', 'new-name')

DELETE requests will delete the index, but all documents will be preserved in the database.

Example of deleting an index:

$ curl -X DELETE localhost:8000/photos/

Using the Python client:

scout.delete_index('photos')

Response:

{"success": true}

Filtering on Metadata

Suppose we have an index that contains all of our contacts. The search content consists of the person’s name, address, city, and state. We also have stored quite a bit of metadata about each person. A person record might look like this:

{'content': "Huey Leifer 123 Main Street Lawrence KS 66044"}

The metadata for this record consists of the following:

{'metadata': {
  'dob': '2010-06-01',
  'city': 'Lawrence',
  'state': 'KS',
}}

To search for all my relatives living in Kansas, I could use the following URL:

/contacts-index/?q=Leifer+OR+Morgan&state=KS

Using the Python client:

results = scout.get_index('contacts-index', q='Leifer OR Morgan', state='KS')

Let’s say we want to search our contacts index for all people who were born in 1983. We could use the following URL:

/contacts-index/?dob__ge=1983-01-01&dob__lt=1984-01-01

Using the Python client:

results = scout.get_index(
    'contacts-index',
    q='*',
    dob__ge='1983-01-01',
    dob__lt='1984-01-01')

To search for all people who live in Lawrence or Topeka, KS we could use the following URL:

/contacts-index/?city__in=Lawrence,Topeka&state=KS

Using the Python client:

results = scout.get_index(
    'contacts-index',
    q='*',
    city__in='Lawrence,Topeka',
    state='KS')

Scout will take all filters and return only those records that match all of the given conditions. However, when the same key is used multiple times, Scout will use OR to join those clauses. For example, another way we could query for people who live in Lawrence or Topeka would be:

/contacts-index/?q=*&city=Lawrence&city=Topeka&state=KS

With the client, you could pass these as a list:

results = scout.get_index(
    'contacts-index',
    q='*',
    city=['Lawrence', 'Topeka'],
    state='KS')

As you can see, we’re querying city=XXX twice. Scout will interpret that as meaning (city=Lawrence OR city=Topeka) AND state=KS.

Query operations

There are a number of operations available for use when querying metadata. Here is the complete list:

  • keyname__eq: Default (when only the key name is supplied). Returns documents whose metadata contains the given key/value pair.

  • keyname__ne: Not equals.

  • keyname__ge: Greater-than or equal-to.

  • keyname__gt: Greater-than.

  • keyname__le: Less-than or equal-to.

  • keyname__lt: Less-than.

  • keyname__in: In. The value should be a comma-separated list of values to match.

  • keyname__contains: Substring search.

  • keyname__startswith: Prefix search.

  • keyname__endswith: Suffix search.

  • keyname__regex: Search using a regular expression.

Note

All metadata values are stored as strings, regardless of the type used when creating or updating a document (e.g. passing an integer 33 will be stored as "33"). For the comparison operators __ge, __gt, __le, and __lt, if the filter value looks numeric, Scout will cast both the stored value and the filter value to a real number before comparing. This means that filtering should “just” work as expected - strings are ordered lexicographically and numeric values are ordered numerically when the filter requires it.

Document list

Endpoint: /documents/

The document list endpoint returns a paginated list of all documents, regardless of index. New documents are created by POST-ing the content, index(es) and optional metadata.

Valid GET parameters:

  • q: full-text search query.

  • page: which page of documents to fetch, by default 1.

  • index: the name of an index to restrict the results to. Note: this parameter can appear multiple times. When multiple indexes are specified, the document_count reflects the number of distinct documents across those indexes (a document belonging to multiple filtered indexes is only counted once).

  • ordering: order in which to return the documents. By default they are returned in arbitrary order, unless a search query is present, in which case they are ordered by relevance. Valid choices are id, identifier, content, and score. By prefixing the name with a minus sign (“-”) you can indicate the results should be ordered descending. Note: this parameter can appear multiple times.

  • ranking: when a full-text search query is specified, this parameter determines the ranking algorithm. Valid choices are:

    • bm25: use the Okapi BM25 algorithm, which is natively supported by SQLite FTS5.

    • none: do not use any ranking algorithm. Search results will not have a score attribute.

  • Arbitrary metadata filters. See Filtering on Metadata for a description of metadata filtering.

When a search query is present, each returned document will have an additional field named score. This field contains the numerical value the scoring algorithm gave to the document. To disable scores when searching, you can specify ranking=none.

Example GET request and response. In the request below we are searching for the string “test” in the photos, articles and videos indexes.

$ curl localhost:8000/documents/?q=test&index=photos&index=articles&index=videos

Using the Python client:

results = scout.get_documents(q='test', index=['photos', 'articles', 'videos'])

Response:

{
  "document_count": 207,
  "documents": [
    {
      "attachments": [
        {
          "data": "/documents/72/attachments/example.jpg/download/",
          "data_length": 31337,
          "filename": "example.jpg",
          "mimetype": "image/jpeg",
          "timestamp": "2016-03-01 13:37:00"
        }
      ],
      "content": "test photo",
      "id": 72,
      "identifier": null,
      "indexes": [
        "photos"
      ],
      "metadata": {
        "timestamp": "2016-03-01 13:37:00"
      },
      "score": -0.01304
    },
    {
      "attachments": [
        {
          "data": "/documents/61/attachments/movie.mp4/download/",
          "data_length": 3131337,
          "filename": "movie.mp4",
          "mimetype": "video/mp4",
          "timestamp": "2016-03-02 13:37:00"
        }
      ],
      "content": "test video upload",
      "id": 61,
      "identifier": null,
      "indexes": [
        "videos"
      ],
      "metadata": {
        "timestamp": "2016-03-02 13:37:00"
      },
      "score": -0.01407
    }
  ],
  "filtered_count": 2,
  "filters": {},
  "ordering": [],
  "page": 1,
  "pages": 1,
  "next_url": null,
  "previous_url": null,
  "ranking": "bm25",
  "search_term": "test"
}

POST requests should have the following parameters:

  • content (required): the document content.

  • index or indexes (required): the name(s) of the index(es) the document should be associated with.

  • identifier (optional): an application-defined identifier for the document. If a document with the same identifier already exists, the existing document will be updated instead of creating a new one. Identifiers may only contain letters, digits, _, -, ., and :. Characters such as spaces, slashes, ?, and % are not allowed and will be rejected with a 400 error.

  • metadata (optional): arbitrary key/value pairs.

Warning

Identifiers should not be purely numeric strings (e.g. "42"). When looking up a document, Scout first checks the identifier field, then falls back to matching by internal id. A numeric identifier could collide with the internal ID of an unrelated document, leading to unexpected results.

Example POST request creating a new document:

$ curl \
    -H "Content-Type: application/json" \
    -d '{"content": "New document", "indexes": ["test-index"]}' \
    http://localhost:8000/documents/

Using the Python client:

doc = scout.create_document('New document', 'test-index')

# With multiple indexes, an identifier, and metadata:
doc = scout.create_document(
    'New document',
    ['test-index', 'blog'],
    identifier='my-doc-1',
    author='somebody',
    published='true')

Response on creating a new document:

{
  "attachments": [],
  "content": "New document",
  "id": 121,
  "identifier": null,
  "indexes": [
    "test-index"
  ],
  "metadata": {}
}

Document detail

Endpoint: /documents/:document-id/

The document detail endpoint returns document content, indexes, and metadata. Documents can be updated or deleted by using POST and DELETE requests, respectively. When updating a document, you can update the content, index(es), identifier, and/or metadata.

The :document-id parameter is resolved in the following order:

  1. Scout first attempts to find a document whose user-defined identifier matches the given value.

  2. If no match is found and the value is numeric, Scout falls back to looking up the document by its internal id (rowid).

This means a purely numeric identifier like "42" is ambiguous. If no document has that identifier, the request will silently resolve to whichever document has internal ID 42. For this reason, user-defined identifiers should include at least one non-numeric character (e.g. "doc-42", "post:42").

Warning

If you choose to update metadata, all current metadata for the document will be removed, so it’s really more of a “replace” than an “update”. To clear all metadata for a document, pass "metadata": null or "metadata": {}.

Note

When updating a document, omitting a field preserves its current value. For example, omitting indexes from a POST will leave the document’s index associations unchanged. However, passing an empty list ("indexes": []) will explicitly clear all index associations.

Example GET request and response:

$ curl localhost:8000/documents/118/

Using the Python client:

doc = scout.get_document(118)

Response:

{
  "attachments": [],
  "content": "test zaizee document",
  "id": 118,
  "identifier": null,
  "indexes": [
    "test-index"
  ],
  "metadata": {
    "is_kitty": "yes"
  }
}

Here is an example of updating the content and indexes using a POST request:

$ curl \
    -H "Content-Type: application/json" \
    -d '{"content": "test zaizee updated", "indexes": ["test-index", "blog"]}' \
    http://localhost:8000/documents/118/

Using the Python client:

updated = scout.update_document(
    document_id=118,
    content='test zaizee updated',
    indexes=['test-index', 'blog'])

# Update metadata only (replaces all existing metadata):
updated = scout.update_document(
    document_id=118,
    metadata={'is_kitty': 'yes', 'color': 'gray'})

# Clear all metadata:
updated = scout.update_document(document_id=118, metadata={})

Response:

{
  "attachments": [],
  "content": "test zaizee updated",
  "id": 118,
  "identifier": null,
  "indexes": [
    "blog",
    "test-index"
  ],
  "metadata": {
    "is_kitty": "yes"
  }
}

DELETE requests can be used to completely remove a document. Deleting a document will also remove all of its metadata, index associations, and attachments. Orphaned attachment data (BLOBs not referenced by any other document) will be cleaned up automatically.

Example DELETE request and response:

$ curl -X DELETE localhost:8000/documents/121/

Using the Python client:

scout.delete_document(121)

Response:

{"success": true}

Working with Document Identifiers

Every document in Scout has an auto-generated integer id. Optionally, you can also assign a user-defined identifier, an application-specific string that lets you reference documents without tracking Scout’s internal IDs.

Identifiers are useful because they decouple your application’s data model from Scout’s storage. Instead of storing Scout document IDs in your application database, you can derive the identifier from something you already have (a primary key, a URL slug, a UUID, or any other unique string). This means your application can create, update, retrieve, and delete documents using its own natural keys.

Creating a document with an identifier

$ curl \
    -H "Content-Type: application/json" \
    -d '{"content": "My blog post about cats", "indexes": ["blog"], "identifier": "post-42"}' \
    http://localhost:8000/documents/

Using the Python client:

scout.create_document(
    'My blog post about cats',
    'blog',
    identifier='post-42')

Retrieving and updating by identifier

Once a document has an identifier, you can use it anywhere a document ID is accepted - in the URL path for GET, POST, and DELETE requests:

$ curl localhost:8000/documents/post-42/

Using the Python client:

doc = scout.get_document('post-42')

Updates work the same way:

scout.update_document(
    document_id='post-42',
    content='My updated blog post about cats and dogs',
    metadata={'tags': 'pets'})

Upsert behavior

When you create a document with an identifier that already exists, Scout updates the existing document instead of creating a duplicate. This gives you upsert semantics - your application can push content to Scout without checking whether the document already exists:

# First call creates the document.
scout.create_document('Draft v1', 'blog', identifier='post-99')

# Second call with the same identifier updates it.
scout.create_document('Draft v2 — now with more content', 'blog',
                      identifier='post-99')

# There is still only one document with identifier "post-99".
doc = scout.get_document('post-99')
print(doc['content'])  # Draft v2 — now with more content

This is especially convenient when re-indexing content. You can re-run your indexing script without worrying about creating duplicate documents. As long as you provide consistent identifiers, Scout will update in place.

Deleting by identifier

$ curl -X DELETE localhost:8000/documents/post-42/

Using the Python client:

scout.delete_document('post-42')

Identifier uniqueness

Each identifier can belong to at most one document. If you update a document’s identifier to a value that is currently owned by a different document, the other document’s identifier is silently cleared:

# doc-a owns "shared-id", doc-b has no identifier.
scout.create_document('aaa', 'idx', identifier='shared-id')
doc_b = scout.create_document('bbb', 'idx', identifier='other-id')

# Reassign "shared-id" from doc-a to doc-b:
scout.update_document(doc_b['id'], identifier='shared-id')

# doc-b now owns "shared-id".
scout.get_document('shared-id')['content']  # 'bbb'

# doc-a's identifier has been cleared to None.
scout.get_document(doc_a['id'])['identifier']  # None

The same rule applies during upsert: create_document with an existing identifier always resolves to the current owner. It will never “steal” the identifier to a new row, instead it updates the existing document in place. Theft can only happen through update_document.

Warning

Scout does not return an error or warning when an identifier is taken from another document. If your application shares identifiers across independent subsystems, use a prefix or some other convention to ensure uniqueness.

Update metadata

Endpoint: /documents/:document-id/metadata/

The update metadata endpoint merges new metadata into a document’s existing metadata, rather than replacing it entirely (as happens when you include metadata in a document update via Document detail). This is useful when you need to add or change individual keys without re-supplying the full set.

POST or PUT a JSON object with a metadata key. The value should be an object of key/value pairs to merge. The merge follows these rules:

  • Keys that already exist on the document are overwritten with the new values.

  • New keys are added.

  • Keys whose value is null are deleted from the document’s metadata (if they exist).

  • If metadata is empty ({}), all existing metadata is cleared.

Example POST request:

$ curl \
    -H "Content-Type: application/json" \
    -d '{"metadata": {"k1": "v1", "k2": "v2"}}' \
    http://localhost:8000/documents/118/metadata/

Using the Python client:

scout.update_metadata(118, k1='v1', k2='v2')

Response:

{
  "attachments": [],
  "content": "test zaizee document",
  "id": 118,
  "identifier": null,
  "indexes": [
    "test-index"
  ],
  "metadata": {
    "k1": "v1",
    "k2": "v2"
  }
}

Subsequent calls merge into the existing metadata. For example, after the request above:

$ curl \
    -H "Content-Type: application/json" \
    -d '{"metadata": {"k1": "v1-updated", "k3": "v3"}}' \
    http://localhost:8000/documents/118/metadata/

Using the Python client:

scout.update_metadata(118, k1='v1-updated', k3='v3')

The resulting metadata would be {"k1": "v1-updated", "k2": "v2", "k3": "v3"}.

To delete a specific key, set its value to null:

$ curl \
    -H "Content-Type: application/json" \
    -d '{"metadata": {"k2": null}}' \
    http://localhost:8000/documents/118/metadata/

Using the Python client:

scout.update_metadata(118, k2=None)

The resulting metadata would be {"k1": "v1-updated", "k3": "v3"}.

To clear all metadata, pass an empty object:

$ curl \
    -H "Content-Type: application/json" \
    -d '{"metadata": {}}' \
    http://localhost:8000/documents/118/metadata/

Using the Python client:

scout.update_metadata(118)

Note

This endpoint differs from updating metadata through the document detail endpoint. The document detail endpoint replaces all metadata, while this endpoint merges new values into the existing metadata.

Attachment list

Endpoint: /documents/:document-id/attachments/

The attachment list endpoint returns a paginated list of all attachments associated with a given document. New attachments are created by POST-ing a file to this endpoint.

Valid GET parameters:

  • page: which page of attachments to fetch, by default 1.

  • ordering: order in which to return the attachments. By default they are returned by filename. Valid choices are id, document, hash, filename, mimetype, and timestamp. By prefixing the name with a minus sign (“-”) you can indicate the results should be ordered descending. Note: this parameter can appear multiple times.

Example GET request and response.

$ curl localhost:8000/documents/13/attachments/?ordering=timestamp

Using the Python client:

attachments = scout.get_attachments(13, ordering='timestamp')

Response:

{
  "attachments": [
    {
      "data": "/documents/13/attachments/banner.jpg/download/",
      "data_length": 135350,
      "document": "/documents/13/",
      "filename": "banner.jpg",
      "mimetype": "image/jpeg",
      "timestamp": "2016-03-01 13:37:01"
    },
    {
      "data": "/documents/13/attachments/background.jpg/download/",
      "data_length": 25039,
      "document": "/documents/13/",
      "filename": "background.jpg",
      "mimetype": "image/jpeg",
      "timestamp": "2016-03-01 13:37:02"
    }
  ],
  "ordering": ["timestamp"],
  "page": 1,
  "pages": 1,
  "next_url": null,
  "previous_url": null
}

POST requests should contain the attachments as form-encoded file fields. When uploading files with curl or other HTTP clients, structured data (such as document content or metadata) must be sent alongside the files in a form field called data, encoded as a JSON string. The Scout client handles all encoding automatically.

Example POST request uploading a new attachment:

$ curl \
    -H "Content-Type: multipart/form-data" \
    -F 'data=""' \
    -F "file_0=@/path/to/image.jpg" \
    -X POST \
    http://localhost:8000/documents/13/attachments/

Using the Python client:

from io import BytesIO

result = scout.attach_files(13, {
    'image.jpg': open('/path/to/image.jpg', 'rb'),
})

# Multiple files at once:
result = scout.attach_files(13, {
    'photo.jpg': open('/path/to/photo.jpg', 'rb'),
    'notes.txt': BytesIO(b'Some notes about this document'),
})

Response on creating a new attachment:

{
  "attachments": [
    {
      "data": "/documents/13/attachments/some-image.jpg/download/",
      "data_length": 18912,
      "document": "/documents/13/",
      "filename": "some-image.jpg",
      "mimetype": "image/jpeg",
      "timestamp": "2016-03-14 13:38:00"
    }
  ]
}

Note

You can upload multiple attachments at the same time.

Attachments can also be included when creating or updating a document:

doc = scout.create_document(
    'Document with files',
    'my-index',
    attachments={
        'readme.txt': BytesIO(b'Read me!'),
        'data.csv': open('/path/to/data.csv', 'rb'),
    },
    author='alice')

Each document can have at most one attachment with a given filename. If you upload a file whose name matches an existing attachment on the same document, the previous attachment is replaced. No error is raised. This applies both when attaching files directly and when including files in a document create or update request.

Filenames are sanitized for security on upload. Path separators and dangerous characters are stripped (e.g. ../../etc/passwd becomes etc_passwd). If sanitization produces an empty string, the filename defaults to unnamed. The sanitized filename is what appears in the API and must be used to retrieve or delete the attachment. If you need to preserve the original filename, store it in the document’s metadata.

Attachment detail

Endpoint: /documents/:document-id/attachments/:filename/

The attachment detail endpoint returns basic information about the attachment, as well as a link to download the actual attached file. Attachments can be updated or deleted by using POST and DELETE requests, respectively. When you update an attachment, the original is deleted and a new attachment created for the uploaded content.

Example GET request and response:

$ curl localhost:8000/documents/13/attachments/test-image.png/

Using the Python client:

attachment = scout.get_attachment(13, 'test-image.png')

Response:

{
  "data": "/documents/13/attachments/test-image.png/download/",
  "data_length": 3710133,
  "document": "/documents/13/",
  "filename": "test-image.png",
  "mimetype": "image/png",
  "timestamp": "2016-03-14 22:10:00"
}

Example of updating (replacing) an attachment:

$ curl \
    -H "Content-Type: multipart/form-data" \
    -F 'data=""' \
    -F "file_0=@/path/to/new-image.png" \
    -X POST \
    http://localhost:8000/documents/13/attachments/test-image.png/

Using the Python client:

scout.update_file(13, 'test-image.png', open('/path/to/new-image.png', 'rb'))

DELETE requests are used to detach a file from a document.

Example DELETE request and response:

$ curl -X DELETE localhost:8000/documents/13/attachments/test-image.png/

Using the Python client:

scout.detach_file(13, 'test-image.png')

Response:

{"success": true}

Attachment download

Endpoint: /documents/:document-id/attachments/:filename/download/

The attachment download endpoint is a special URL that returns the attached file as a downloadable HTTP response. This is the only way to access an attachment’s underlying file data.

To download an attachment, simply send a GET request to the attachment’s “data” URL:

$ curl http://localhost:8000/documents/13/attachments/banner.jpg/download/

Using the Python client:

raw_bytes = scout.download_attachment(13, 'banner.jpg')

# Save to a file:
with open('banner.jpg', 'wb') as fh:
    fh.write(raw_bytes)

Global attachment list

Endpoint: /attachments/

The global attachment list endpoint returns a paginated list of all attachments across all documents in the database. This is useful for browsing or searching all uploaded files without knowing which document they belong to.

Valid GET parameters:

  • page: which page of results to fetch, by default 1.

  • ordering: order in which to return the attachments. By default they are returned by filename. Valid choices are filename, mimetype, timestamp, and id. By prefixing the name with a minus sign (“-”) you can indicate the results should be ordered descending.

  • index: restrict results to attachments on documents belonging to the specified index. Note: this parameter can appear multiple times.

  • filename: filter by exact filename.

  • mimetype: filter by exact MIME type.

Example GET request and response:

$ curl localhost:8000/attachments/?mimetype=image/jpeg

Using the Python client:

results = scout.search_attachments(mimetype='image/jpeg')

Response:

{
  "attachments": [
    {
      "data": "/documents/1/attachments/photo.jpg/download/",
      "data_length": 31337,
      "document": "/documents/1/",
      "filename": "photo.jpg",
      "mimetype": "image/jpeg",
      "timestamp": "2016-03-01 13:37:00"
    }
  ],
  "ordering": [],
  "page": 1,
  "pages": 1,
  "next_url": null,
  "previous_url": null
}

Example filtering by index:

$ curl localhost:8000/attachments/?index=blog&index=photos

Using the Python client:

results = scout.search_attachments(index='blog')

# Filter by index and filename:
results = scout.search_attachments(index='blog', filename='header.png')

This will return all attachments belonging to documents in the blog or photos indexes.

Example of using Authentication

Scout provides very basic key-based authentication. You can specify a single, global key which must be specified in order to access the API.

To specify the API key, you can pass it in on the command-line or specify it in a configuration file (described below).

Example of running scout with an API key:

$ scout -k secret /path/to/search.db

If we try to access the API without specifying the key, we get a 401 response stating Invalid API key:

$ curl localhost:8000/
Invalid API key

We can specify the key as a header:

$ curl -H "key: secret" localhost:8000/
{
  "indexes": []
}

Alternatively, the key can be specified as a GET argument:

$ curl localhost:8000/?key=secret
{
  "indexes": []
}

Using the Python client, the key is passed when initializing the client and is automatically included with every request:

scout = Scout('http://localhost:8000', key='secret')
indexes = scout.get_indexes()  # Key is sent automatically.

Configuration and Command-Line Options

The easiest way to run Scout is to invoke it directly from the command-line, passing the database in as the last argument:

$ scout /path/to/search.db

The database file can also be specified using the SCOUT_DATABASE environment variable:

$ SCOUT_DATABASE=/path/to/search.db scout

Scout supports a handful of configuration options to control its behavior when run from the command-line. The following table describes these options:

  • -H, --host: set the hostname to listen on. Defaults to 127.0.0.1.

  • -p, --port: set the port to listen on. Defaults to 8000.

  • -u, --url-prefix: URL path to prefix Scout API with, e.g. /search.

  • -d, --debug: boolean flag to run Scout in debug mode.

  • -c, --config: set the configuration file (a Python module). See the configuration options for available settings.

  • --paginate-by: set the number of documents displayed per page of results. Default is 50. Must be between 1 and 1000.

  • -k, --api-key: set the API key required to access Scout. By default no authentication is required.

  • -C, --cache-size: set the size of the SQLite page cache (in MB), defaults to 64.

  • -f, --fsync: require fsync after every SQLite transaction is committed. By default synchronous writes are disabled for performance.

  • -j, --journal-mode: specify SQLite journal-mode. Default is wal (recommended).

  • -l, --logfile: configure file for log output.

  • -m, --max-request-size: maximum size of request body in bytes. Default is 64MB (67108864 bytes).

  • --migrate: migrate older FTS4 schema to FTS5 in-place (only runs if needed).

Migrating from FTS4 to FTS5

Scout 4.x requires SQLite FTS5. If your database was created by an older version of Scout that used FTS4, the server will refuse to start and print an error directing you to migrate.

To migrate in-place, pass the --migrate flag when starting the server:

$ scout --migrate /path/to/search.db

The migration copies all document content into a new FTS5 table and builds the identifier lookup table. It runs inside a transaction, so if anything goes wrong the database is left unchanged. Once migration succeeds, subsequent starts no longer need the flag. Scout will detect the FTS5 schema and proceed normally.

Tip

As always, back up your database before migrating.

Python Configuration File

For more control, you can override certain settings and configuration values by specifying them in a Python module to use as a configuration file.

The following options can be overridden:

  • AUTHENTICATION (same as -k or --api-key).

  • DATABASE, the path to the SQLite database file containing the search index. This file will be created if it does not exist.

  • DEBUG (same as -d or --debug).

  • HOST (same as -H or --host).

  • PAGINATE_BY (same as --paginate-by).

  • PORT (same as -p or --port).

  • URL_PREFIX (same as -u or --url-prefix), a URL path to prefix the Scout API with.

  • SQLITE_PRAGMAS, a list of 2-tuples specifying SQLite pragmas to set on the database connection (e.g. [('journal_mode', 'wal'), ('cache_size', -65536)]).

  • MAX_CONTENT_LENGTH, maximum request body size in bytes (same as -m or --max-request-size).

Note

Options specified on the command-line will override any options specified in the configuration file.

Example configuration file:

# search_config.py
AUTHENTICATION = 'my-secret-key'
DATABASE = 'my_search.db'
HOST = '0.0.0.0'
PORT = 1234

Example of running Scout with the above config file. Note that since we specified the database in the config file, we do not need to pass one in on the command-line.

$ scout -c search_config.py

You can also specify the configuration file using the SCOUT_CONFIG environment variable:

$ SCOUT_CONFIG=search_config.py scout