.. _example:

Examples
========

This document walks through two complete examples showing how to use Scout in
practice. Both examples use the :ref:`Python client <client>`, but every
operation has an equivalent ``curl`` call documented in the :ref:`server`
reference.

.. contents:: In this document
   :local:
   :depth: 2


Example 1: Personal Blog
-------------------------

Consider a personal blog with two kinds of searchable content: **entries** and
**comments**. We will create a separate index for each, populate them with
documents, attach images to entries, and then search and filter the results.

Setting up
^^^^^^^^^^

Start the server and create a client:

.. code-block:: python

    from scout.client import Scout

    scout = Scout('http://localhost:8000')

Create the two indexes:

.. code-block:: python

    scout.create_index('blog-entries')
    scout.create_index('blog-comments')

Indexing blog entries
^^^^^^^^^^^^^^^^^^^^^

Each blog entry becomes a document whose ``content`` is the full text of the
post. Metadata stores structured fields we will want to filter on later:

.. code-block:: python

    scout.create_document(
        'Welcome to my blog!  This is my very first post, where I introduce '
        'myself and talk about what I plan to write about.',
        'blog-entries',
        identifier='entry-1',
        title='Welcome to my blog!',
        url='/blog/welcome/',
        published='true',
        date='2026-01-15')

    scout.create_document(
        'Today my cat ate a spider and later he was sick.',
        'blog-entries',
        identifier='entry-2',
        title='Spider Adventures',
        url='/blog/spiders/',
        published='true',
        date='2026-02-03')

    scout.create_document(
        'My cat ate another spider, I could tell because there were some '
        'legs on his bed. He was not sick, so it must have been a different '
        'type of spider',
        'blog-entries',
        identifier='entry-3',
        title='More Spider News',
        url='/blog/more-spiders/',
        published='true',
        date='2026-03-20')

    # A draft that has not been published yet.
    scout.create_document(
        'Draft post about pest control...still fleshing out ideas here.',
        'blog-entries',
        identifier='entry-4',
        title='Pest Control (draft)',
        url='/blog/pest-control/',
        published='false',
        date='2024-04-01')

Attaching images
^^^^^^^^^^^^^^^^

If your blog entries have a primary image, you can attach it to the document so
that search results can display a thumbnail:

.. code-block:: python

    scout.attach_files(
        'entry-2',
        {'spider.jpg': open('/path/to/spider.jpg', 'rb')})

    scout.attach_files(
        'entry-3',
        {'spider-remnants.png': open('/path/to/spider-remnants.png', 'rb')})

When you retrieve or search documents later, each result will include an
``attachments`` list with download URLs.

Indexing comments
^^^^^^^^^^^^^^^^^

Comments are stored in a separate index. The metadata includes the parent
entry's identifier (for filtering) and a spam flag:

.. code-block:: python

    scout.create_document(
        'Looking forward to the content',
        'blog-comments',
        identifier='comment-1',
        entry_id='entry-1',
        author='alice',
        spam='false',
        date='2026-01-16')

    scout.create_document(
        'What did the spider look like?',
        'blog-comments',
        identifier='comment-2',
        entry_id='entry-2',
        author='bob',
        spam='false',
        date='2026-02-04')

    scout.create_document(
        'Buy cheap watches at http://example.com',
        'blog-comments',
        identifier='comment-3',
        entry_id='entry-2',
        author='spambot',
        spam='true',
        date='2026-02-04')

Searching entries
^^^^^^^^^^^^^^^^^

Full-text search over all published entries:

.. code-block:: python

    results = scout.get_documents(q='spiders', index='blog-entries', published='true')
    for doc in results['documents']:
        print(doc['metadata']['title'], '-', doc['metadata']['url'], doc['score'])
    # Spider Adventures - /blog/spiders/ -0.268...
    # More Spider News - /blog/more-spiders/ -0.252...

Search with a wildcard to match prefixes:

.. code-block:: python

    results = scout.get_documents(q='spid*', index='blog-entries')
    for doc in results['documents']:
        print(doc['metadata']['title'])
    # Spider Adventures
    # More Spider News

Filtering by date range (all entries from February 2026 onward):

.. code-block:: python

    results = scout.get_documents(
        index='blog-entries',
        published='true',
        date__ge='2026-02-01')
    for doc in results['documents']:
        print(doc['metadata']['date'], doc['metadata']['title'])
    # 2026-02-03 Spider Adventures
    # 2026-03-20 More Spider News

Exclude drafts from results:

.. code-block:: python

    results = scout.get_documents(index='blog-entries', published='true')
    print(len(results['documents']))  # 3 (the draft is excluded)

Searching comments
^^^^^^^^^^^^^^^^^^

Find all non-spam comments on a particular entry:

.. code-block:: python

    results = scout.get_documents(
        index='blog-comments',
        entry_id='entry-2',
        spam='false')
    for doc in results['documents']:
        print(doc['metadata']['author'], ':', doc['content'])
    # bob : What did the spider look like?

Search comments across all entries:

.. code-block:: python

    results = scout.get_documents(q='spiders', index='blog-comments', spam='false')
    for doc in results['documents']:
        print(doc['metadata']['author'], 'on', doc['metadata']['entry_id'])
    # bob on entry-2

Updating and deleting
^^^^^^^^^^^^^^^^^^^^^

Because every document was created with an ``identifier``, we can update and
delete without ever tracking Scout's internal integer IDs.

Publish a draft by updating its metadata:

.. code-block:: python

    scout.update_document(
        document_id='entry-4',
        metadata={
            'title': 'Pest Control',
            'url': '/blog/pest-control/',
            'published': 'true',
            'date': '2026-04-10',
        })

Remove a spam comment:

.. code-block:: python

    scout.delete_document('comment-3')

Re-indexing content
^^^^^^^^^^^^^^^^^^^

Identifiers also give you upsert semantics: calling ``create_document`` with an
identifier that already exists will update the existing document rather than
creating a duplicate. This makes it safe to re-run your indexing script at any
time — for example after editing a blog post:

.. code-block:: python

    # This updates entry-2 in place because the identifier already exists.
    scout.create_document(
        'Today my cat ate a spider and later he was sick. I think it was '
        'one of those little, fast spiders.',
        'blog-entries',
        identifier='entry-2',
        title='Spider Adventures (updated)',
        url='/blog/spiders/',
        published='true',
        date='2026-02-03')

    # Verify there is still only one document with this identifier.
    doc = scout.get_document('entry-2')
    print(doc['content'][-20:])  # ...little, fast spiders.

This pattern is convenient for a full re-index. Iterate over every post in your
application database and call ``create_document`` with the same identifiers.
Scout handles the create-or-update logic for you.

Example 2: News Website
------------------------

A news website has several content types — articles, local events, and sports
scores — each in its own index. A **master** index that contains every document
allows site-wide search.

Setting up
^^^^^^^^^^

.. code-block:: python

    from scout.client import Scout

    scout = Scout('http://localhost:8000')

    scout.create_index('articles')
    scout.create_index('events')
    scout.create_index('sports')
    scout.create_index('master')  # Everything goes here too.

Indexing content
^^^^^^^^^^^^^^^^

A helper function keeps things DRY by always adding documents to the master
index alongside the category-specific index:

.. code-block:: python

    def index_content(content, category, **metadata):
        """Index a piece of content into its category index and the master index."""
        return scout.create_document(
            content,
            [category, 'master'],
            **metadata)

    # Articles
    index_content(
        'The city council voted Tuesday to approve the new downtown park '
        'proposal after months of public debate.',
        'articles',
        identifier='article-100',
        headline='City Council Approves Downtown Park',
        section='local',
        date='2024-06-11')

    index_content(
        'Global markets rallied on Friday after the central bank signaled a '
        'pause in rate hikes.  Tech stocks led the gains.',
        'articles',
        identifier='article-101',
        headline='Markets Rally on Rate Pause Signal',
        section='business',
        date='2024-06-14')

    index_content(
        'A new study shows that urban green spaces significantly improve '
        'mental health outcomes for nearby residents.',
        'articles',
        identifier='article-102',
        headline='Green Spaces Linked to Better Mental Health',
        section='science',
        date='2024-06-15')

    # Local events
    index_content(
        'The annual Summer Jazz Festival returns to Riverside Park on July 4th '
        'with headlining performances by several Grammy-winning artists.',
        'events',
        identifier='event-200',
        title='Summer Jazz Festival',
        venue='Riverside Park',
        date='2024-07-04')

    index_content(
        'Downtown Farmers Market every Saturday morning from 8am to noon.  '
        'Fresh produce, baked goods, and local crafts.',
        'events',
        identifier='event-201',
        title='Downtown Farmers Market',
        venue='Main Street Plaza',
        date='2024-06-01',
        recurring='true')

    # Sports scores
    index_content(
        'The Lions defeated the Bears 27-14 in a dominant home performance.  '
        'Quarterback Smith threw for 3 touchdowns.',
        'sports',
        identifier='game-300',
        home_team='Lions',
        away_team='Bears',
        score='27-14',
        date='2024-06-09')

    index_content(
        'Eagles and Hawks played to a 1-1 draw in a rain-soaked match.  '
        'Both goals came in the second half.',
        'sports',
        identifier='game-301',
        home_team='Eagles',
        away_team='Hawks',
        score='1-1',
        date='2024-06-10')

Searching within a category
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Search only articles:

.. code-block:: python

    results = scout.get_documents(q='park', index='articles')
    for doc in results['documents']:
        print(doc['metadata']['headline'])
    # City Council Approves Downtown Park

Filter articles by section:

.. code-block:: python

    results = scout.get_documents(index='articles', section='business')
    for doc in results['documents']:
        print(doc['metadata']['headline'])
    # Markets Rally on Rate Pause Signal

Search only events at a particular venue:

.. code-block:: python

    results = scout.get_documents(index='events', venue='Riverside Park')
    for doc in results['documents']:
        print(doc['metadata']['title'], '-', doc['metadata']['date'])
    # Summer Jazz Festival - 2024-07-04

Search sports results for a specific team:

.. code-block:: python

    results = scout.get_documents(index='sports', home_team='Lions')
    for doc in results['documents']:
        print(doc['metadata']['home_team'], 'vs', doc['metadata']['away_team'],
              doc['metadata']['score'])
    # Lions vs Bears 27-14

Site-wide search
^^^^^^^^^^^^^^^^

The master index lets you search across every content type at once:

.. code-block:: python

    results = scout.get_index('master', q='park')
    # OR: results = scout.get_documents(q='park', index='master')
    for doc in results['documents']:
        print(doc['indexes'], doc['content'][:60] + '...')
    # ['articles', 'master'] The city council voted Tuesday to approve the new do...
    # ['events', 'master']   The annual Summer Jazz Festival returns to Riverside ...

Using the documents endpoint with multiple indexes achieves the same thing
without a dedicated master index:

.. code-block:: python

    results = scout.get_documents(
        q='park',
        index=['articles', 'events', 'sports'])
    for doc in results['documents']:
        print(doc['content'][:60] + '...')

Date range queries work the same way across all indexes:

.. code-block:: python

    results = scout.get_index(
        'master',
        date__ge='2024-06-10',
        date__le='2024-06-15')
    for doc in results['documents']:
        print(doc['metadata']['date'], doc['content'][:50] + '...')

Equivalent example specifying indexes explicitly:

.. code-block:: python

    results = scout.get_documents(
        index=['articles', 'events', 'sports'],
        date__ge='2024-06-10',
        date__le='2024-06-15')
    for doc in results['documents']:
        print(doc['metadata']['date'], doc['content'][:50] + '...')


Working with attachments
^^^^^^^^^^^^^^^^^^^^^^^^^

Attach a PDF of the full print article:

.. code-block:: python

    scout.attach_files('article-100', {
        'downtown-park-full.pdf': open('downtown-park-full.pdf', 'rb'),
    })

Download the attachment to a local file:

.. code-block:: python

    raw_bytes = scout.download_attachment('article-100', 'downtown-park-full.pdf')
    with open('downloaded-article.pdf', 'wb') as fh:
        fh.write(raw_bytes)

Later, find all PDF attachments across the articles index:

.. code-block:: python

    pdfs = scout.search_attachments(index='articles', mimetype='application/pdf')
    for att in pdfs['attachments']:
        print(att['filename'], att['data_length'], 'bytes')
    # downtown-park-full.pdf 84521 bytes

Using SearchSite for automatic indexing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If your application uses model classes, you can use :py:class:`~scout.client.SearchSite`
to automatically index and remove objects without manually calling
``create_document`` and ``delete_document``:

.. code-block:: python

    from scout.client import Scout, SearchProvider, SearchSite

    class ArticleProvider(SearchProvider):
        def content(self, article):
            return '%s %s' % (article.headline, article.body)

        def identifier(self, article):
            return 'article-%s' % article.id

        def metadata(self, article):
            return {
                'headline': article.headline,
                'section': article.section,
                'date': str(article.pub_date),
            }

    scout = Scout('http://localhost:8000')
    site = SearchSite(scout, 'articles')
    site.register(Article, ArticleProvider)

    # When a new article is created:
    site.store(article)

    # When an article is deleted:
    site.remove(article)

This pattern works well inside ORM hooks (such as Peewee ``post_save`` and
``post_delete`` signals or Django signals) to keep the search index in sync
with your database automatically.