Scout Client

Scout comes with a simple Python client. This document describes the client API.

class Scout(endpoint[, key=None])

The Scout class provides a simple, Pythonic API for interacting with and querying a Scout server.

Parameters:
  • endpoint – The base URL the Scout server is running on.

  • key – The authentication key (if used) required to access the Scout server.

Example of initializing the client:

>>> from scout.client import Scout
>>> scout = Scout('https://search.my-site.com/', key='secret!')

search(q, **kwargs)

Retrieve a paginated list of documents matching the search query. Searches can be restricted to one or more indexes by passing index= either as a str (single index) or a list of index names.

The following parameters are supported:

Parameters:
  • q – full-text search query using FTS5 query syntax. Use '*' to retrieve all documents.

  • ordering – columns to sort results by. By default, when you perform a search the results will be ordered by relevance.

  • index – one or more index names to restrict the results to.

  • ranking – ranking algorithm to use. The default is bm25; you can also specify none.

  • page – page number of results to retrieve.

  • **filters

    Arbitrary key/value pairs used to filter the metadata.

The Filtering on Metadata section describes how to use key/value pairs to construct filters on the document’s metadata.

See Document list for more information.

Note

This method is a thin wrapper around the Scout.get_documents() method. The behavior is identical except that the search() method makes the q= parameter required.
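Because results are paginated, collecting every match means requesting successive pages. The following is a minimal sketch rather than part of the client: iter_search_results is a hypothetical helper, and it assumes the response is a dictionary with a 'documents' list and a 'pages' count. Check those key names against the Document list reference before relying on them.

```python
def iter_search_results(scout, query, **kwargs):
    """Yield every document matching `query`, fetching one page at a time."""
    page = 1
    while True:
        response = scout.search(query, page=page, **kwargs)
        # Assumption: the response has 'documents' and 'pages' keys.
        for document in response.get('documents', []):
            yield document
        if page >= response.get('pages', 1):
            break
        page += 1
```

Extra keyword arguments such as index= or metadata filters are passed through to search() unchanged, so the helper would be called as iter_search_results(scout, 'python', index='blog').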

Index methods

get_indexes(**kwargs)

Return the list of indexes available on the server.

See Index list for more information.

create_index(name)

Create a new index with the given name. If an index with that name already exists, you will receive a 400 response.

See the POST section of Index list for more information.

rename_index(old_name, new_name)

Rename an existing index.

delete_index(name)

Delete an existing index. Any documents associated with the index will not be deleted.
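The rename and delete methods can be combined to swap a freshly built index into place. This is only a sketch under stated assumptions: rotate_index is a hypothetical helper, the '-retired' suffix is an arbitrary choice, and no error handling is shown.

```python
def rotate_index(scout, live_name, staging_name):
    """Replace the live index with a staging index that was built separately."""
    retired_name = live_name + '-retired'  # Hypothetical temporary name.
    # Move the current live index out of the way.
    scout.rename_index(live_name, retired_name)
    # Promote the staging index to the live name.
    scout.rename_index(staging_name, live_name)
    # Drop the old index; its documents are not deleted by this call.
    scout.delete_index(retired_name)
```

Since deleting an index leaves its documents in the database, any documents that belonged only to the retired index would need to be removed separately with delete_document().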

get_index(name, **kwargs)

Return details about the given index, along with a paginated list of the documents stored in it.

The following optional parameters are supported:

Parameters:
  • q – full-text search query to be run over the documents in this index.

  • ordering – columns to sort results by. By default, when you perform a search the results will be ordered by relevance.

  • ranking – ranking algorithm to use. The default is bm25; you can also specify none.

  • page – page number of results to retrieve.

  • **filters

    Arbitrary key/value pairs used to filter the metadata.

The Filtering on Metadata section describes how to use key/value pairs to construct filters on the document’s metadata.

See Index detail for more information.

Document methods

create_document(content, indexes[, identifier=None[, attachments=None[, **metadata]]])

Store a document in the specified index(es). If an identifier is provided and a document with that identifier already exists, the existing document will be updated.

Parameters:
  • content (str) – Text content to expose for search.

  • indexes – Either the name of an index or a list of index names.

  • identifier – Optional user-defined identifier for the document.

  • attachments – An optional mapping of filename to file-like object, which should be uploaded and stored as attachments on the given document.

  • metadata – Arbitrary key/value pairs to store alongside the document content.
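As a rough illustration of how these parameters fit together, the sketch below stores a short note with an identifier, one attachment, and one metadata key. The store_note helper, the 'notes' index name, and the 'note-' identifier prefix are all hypothetical; only the create_document() parameters come from the description above.

```python
import io

def store_note(scout, title, body, index='notes'):
    """Store `body` as a searchable document with the raw text attached."""
    # Attachments map a filename to a file-like object.
    attachment = io.BytesIO(body.encode('utf-8'))
    return scout.create_document(
        content=body,
        indexes=[index],
        identifier='note-' + title,
        attachments={title + '.txt': attachment},
        title=title)  # Extra keyword arguments are stored as metadata.
```

Because an identifier is supplied, calling store_note() again with the same title would update the existing document rather than create a second one.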

update_document(document_id[, content=None[, indexes=None[, metadata=None[, identifier=None[, attachments=None]]]]])

Update one or more attributes of a document that’s stored in the database.

Parameters:
  • document_id – The integer document ID or a string identifier for the document to update.

  • content (str) – Text content to expose for search (optional).

  • indexes – Either the name of an index or a list of index names (optional).

  • metadata – Arbitrary key/value pairs to store alongside the document content (optional).

  • identifier – Set or change the document’s identifier. Use document_id to specify which document to update.

  • attachments – An optional mapping of filename to file-like object, which should be uploaded and stored as attachments on the given document. If a filename already exists, it will be overwritten with the new attachment.

Note

The metadata and identifier parameters use sentinel defaults internally. Omitting either parameter preserves the existing value. Explicitly passing None (or {}, for metadata) clears it. For example:

# Preserves existing identifier and metadata:
scout.update_document(doc_id, content='new text')

# Clears the identifier:
scout.update_document(doc_id, identifier=None)

# Clears all metadata:
scout.update_document(doc_id, metadata={})

If you specify metadata with a non-empty dict, existing metadata is replaced entirely (not merged). Use update_metadata() for merge behavior.

Raises:

ValueError – if no fields are provided to update and no attachments are given.
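The sentinel-default behavior described in the note can be modeled in a few lines of plain Python. This is an illustration of the general pattern only, not the client's actual code: build_update and _UNSET are hypothetical names, and attachments are left out for brevity.

```python
_UNSET = object()  # Unique marker meaning "argument was not passed".

def build_update(content=None, indexes=None, metadata=_UNSET, identifier=_UNSET):
    """Collect only the fields the caller actually supplied."""
    fields = {}
    if content is not None:
        fields['content'] = content
    if indexes is not None:
        fields['indexes'] = indexes
    if metadata is not _UNSET:
        # Both None and {} mean "clear the metadata".
        fields['metadata'] = metadata or {}
    if identifier is not _UNSET:
        # None is kept as-is so that it clears the identifier.
        fields['identifier'] = identifier
    if not fields:
        raise ValueError('Nothing to update.')
    return fields
```

Comparing against a private object() with `is` is what lets the function tell an omitted argument apart from an explicit None, which a plain None default cannot do.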

delete_document(document_id)

Remove a document from the database, along with its index associations, metadata, and attachments.

Parameters:

document_id – The integer document ID, or a user-specified unique identifier.

Raises:

ValueError – if document_id is not provided.

get_document(document_id)

Retrieve content for the given document.

Parameters:

document_id – The integer document ID, or a user-specified unique identifier.

update_metadata(document_id, **metadata)

Update metadata for the document by merging the new values into the existing metadata.

Parameters:
  • document_id – The integer document ID, or a user-specified unique identifier.

  • metadata – Arbitrary key/value metadata.

Metadata is merged into the document’s existing metadata using the following rules:

Keys that already exist are overwritten with the new user-provided values. If a user-provided value is None, that key is deleted (if it exists on the document). If no new data is specified, all existing document metadata is cleared.
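The merge rules can be expressed as a small pure-Python function. This is a sketch of the documented behavior, not the server's implementation, and merge_metadata is a hypothetical name.

```python
def merge_metadata(existing, updates):
    """Return the metadata that results from merging `updates` into `existing`."""
    if not updates:
        # No new data at all clears the metadata.
        return {}
    merged = dict(existing)
    for key, value in updates.items():
        if value is None:
            # None deletes the key; a missing key is ignored.
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged
```

The function returns a new dictionary and leaves its inputs untouched, which makes it convenient for predicting the outcome of an update_metadata() call in tests.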

Example:

# `client` is a Scout instance. Assume Document 1's metadata is empty to begin with: {}

client.update_metadata(1, k1='v1', k2='v2')
# metadata = {'k1': 'v1', 'k2': 'v2'}

client.update_metadata(1, k1='v1x', k3='v3')
# metadata = {'k1': 'v1x', 'k2': 'v2', 'k3': 'v3'}

client.update_metadata(1, k1=None, k4='v4', k99=None)
# metadata = {'k2': 'v2', 'k3': 'v3', 'k4': 'v4'}

client.update_metadata(1)  # Clears metadata.
# metadata = {}

get_documents(**kwargs)

Retrieve a paginated list of all documents in the database, regardless of index. This method can also be used to perform full-text search queries across the entire database of documents, or a subset of indexes.

The following optional parameters are supported:

Parameters:
  • q – full-text search query to be run over all documents, or over the indexes given by index.

  • ordering – columns to sort results by. By default, when you perform a search the results will be ordered by relevance.

  • index – one or more index names to restrict the results to.

  • ranking – ranking algorithm to use. The default is bm25; you can also specify none.

  • page – page number of results to retrieve.

  • **filters

    Arbitrary key/value pairs used to filter the metadata.

The Filtering on Metadata section describes how to use key/value pairs to construct filters on the document’s metadata.

See Document list for more information.
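A thin convenience wrapper can make the optional parameters easier to combine. The sketch below is an assumption-laden example: find_documents is a hypothetical helper, and it only uses plain key=value filters, since the richer filter syntax is covered in the Filtering on Metadata section.

```python
def find_documents(scout, indexes=None, q=None, **filters):
    """List documents, optionally restricted by index, query and metadata."""
    params = dict(filters)
    if indexes is not None:
        # Accept a single index name or any iterable of names.
        params['index'] = [indexes] if isinstance(indexes, str) else list(indexes)
    if q:
        params['q'] = q
    return scout.get_documents(**params)
```

For example, find_documents(scout, indexes='blog', published='True') would ask for documents in a hypothetical blog index whose published metadata value is 'True'.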

Attachment methods

attach_files(document_id, attachments)

Parameters:
  • document_id – The integer document ID or the user-specified document identifier.

  • attachments – A dictionary mapping filename to file-like object.

Upload the attachments and associate them with the given document.

For more information, see Attachment list.
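When the files live on disk, the attachments dictionary can be built from a list of paths. A minimal sketch, assuming each file should be stored under its base name; attach_paths is a hypothetical helper.

```python
import os
from contextlib import ExitStack

def attach_paths(scout, document_id, paths):
    """Open each path in binary mode and attach it under its base filename."""
    with ExitStack() as stack:
        attachments = {
            os.path.basename(path): stack.enter_context(open(path, 'rb'))
            for path in paths
        }
        # Every file stays open for the upload and is closed afterwards.
        return scout.attach_files(document_id, attachments)
```

Two paths that share a base name would collide in the dictionary, so callers with such files would need to choose distinct filenames themselves.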

detach_file(document_id, filename)

Parameters:
  • document_id – The integer document ID or the user-specified document identifier.

  • filename – The filename of the attachment to remove.

Detach the specified file from the document.

update_file(document_id, filename, file_object)

Parameters:
  • document_id – The integer document ID or the user-specified document identifier.

  • filename – The filename of the attachment to update.

  • file_object – A file-like object.

Replace the contents of the current attachment with the contents of file_object.

get_attachments(document_id, **kwargs)

Parameters:

document_id – The integer document ID or the user-specified document identifier.

Retrieve a paginated list of attachments associated with the given document.

The following optional parameters are supported:

Parameters:
  • ordering – columns to use when sorting attachments.

  • page – page number of results to retrieve.

For more information, see Attachment list.

get_attachment(document_id, filename)

Parameters:
  • document_id – The integer document ID or the user-specified document identifier.

  • filename – The filename of the attachment.

Retrieve data about the given attachment.

For more information, see Attachment detail.

download_attachment(document_id, filename)

Parameters:
  • document_id – The integer document ID or the user-specified document identifier.

  • filename – The filename of the attachment.

Download the specified attachment. Returns the raw file bytes.

For more information, see Attachment download.
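Since the method returns raw bytes, saving an attachment locally is a matter of writing them to a file opened in binary mode. A minimal sketch; save_attachment and its dest_dir argument are hypothetical.

```python
import os

def save_attachment(scout, document_id, filename, dest_dir):
    """Download an attachment and write it into `dest_dir`; return the path."""
    data = scout.download_attachment(document_id, filename)
    # Use only the base name so the filename cannot point outside dest_dir.
    path = os.path.join(dest_dir, os.path.basename(filename))
    with open(path, 'wb') as fh:
        fh.write(data)
    return path
```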

search_attachments(**kwargs)

Search and filter attachments across all documents in the database.

The following optional parameters are supported:

Parameters:
  • ordering – columns to use when sorting attachments.

  • page – page number of results to retrieve.

  • index – restrict results to attachments on documents in the specified index.

  • filename – filter by exact filename.

  • mimetype – filter by exact MIME type.

For more information, see Global attachment list.

Example:

>>> results = scout.search_attachments(mimetype='image/jpeg')
>>> for attachment in results['attachments']:
...     print(attachment['filename'])

SearchProvider and SearchSite

The client module also provides helper classes for integrating Scout with application models. These make it easy to automatically index and remove objects.

class SearchProvider

Abstract base class that defines how to extract searchable data from an application object.

content(obj)

Return the text content for the given object to be indexed for search. Required.

identifier(obj)

Return a unique identifier string for the given object. Optional; if not implemented, no identifier will be stored.

metadata(obj)

Return a dictionary of metadata key/value pairs for the given object. Optional; if not implemented, no metadata will be stored.

class SearchSite(client, index)

Manages a registry of model classes and their search providers, and provides methods to store and remove objects from a Scout index.

Parameters:
  • client – A Scout client instance.

  • index – The name of the index to use for all operations.

register(model_class, search_provider)

Register a SearchProvider subclass for the given model class. Multiple providers can be registered for the same model class.

Parameters:
  • model_class – The class of objects to be indexed.

  • search_provider – A SearchProvider subclass (not an instance).

unregister(model_class[, search_provider=None])

Remove a search provider registration. If search_provider is None, all providers for the given model class are removed.

Parameters:
  • model_class – The class to unregister.

  • search_provider – Optional specific provider class to remove.

store(obj)

Index the given object using all registered providers for its type. Returns True if the object’s type was registered, False otherwise.

Parameters:

obj – The object to index.

remove(obj)

Remove the given object from the search index. Returns True if the object’s type was registered, False otherwise.

Parameters:

obj – The object to remove.

Example usage:

from scout.client import Scout, SearchProvider, SearchSite

class BlogPostProvider(SearchProvider):
    def content(self, post):
        return '%s\n%s' % (post.title, post.body)

    def identifier(self, post):
        return str(post.id)

    def metadata(self, post):
        return {
            'title': post.title,
            'published': str(post.is_published),
        }

scout = Scout('http://localhost:8000')
site = SearchSite(scout, 'blog-posts')
# BlogPost is the application's model class; my_post is an instance of it.
site.register(BlogPost, BlogPostProvider)

# Index a blog post.
site.store(my_post)

# Remove a blog post from the index.
site.remove(my_post)
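In practice the store() and remove() calls are usually driven by the state of the object. The sketch below shows one way this might look, assuming the object exposes an is_published attribute as in the example above; sync_post is a hypothetical helper.

```python
def sync_post(site, post):
    """Keep the search index in step with a post's published state."""
    if post.is_published:
        # Published posts are indexed (or re-indexed).
        return site.store(post)
    # Unpublished posts are taken out of the index.
    return site.remove(post)
```

Both branches return the boolean from SearchSite, so a False result indicates that the type of the object was never registered.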