A partial archive of meta.discourse.org as of Tuesday July 18, 2017.

Effectively logging search queries

sam

For 1.9 we plan to add a search log.

Unfortunately, the naive "log every search the server makes" approach does not work: we run searches as people type, so it would produce a massively noisy log.

Proposal

  • Create a new table

Columns: term, user_id (nullable), ip_address, created_at, clicked_topic_id (nullable), source_id (either header or full page)

  • On every search, log on the server using the following algorithm:

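UPDATE search_logs  -- table name assumed; the proposal leaves it unnamed
SET term = :term,
    created_at = :now
WHERE created_at > :cutoff AND                 -- previous search logged less than 5 seconds ago
      position(term IN :term) = 1 AND          -- previous term is a prefix of the new term
      (user_id = :user_id OR                   -- logged-in users match on user_id
       (:user_id IS NULL AND user_id IS NULL AND ip_address = :ip_address))  -- anonymous users match on IP

term: new_term,
now: Time.zone.now,
cutoff: 5.seconds.ago,
user_id: current_user&.id,
ip_address: request.ip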

If the update touches zero rows, insert a **new** search log row.

Or, in English

  • Update existing search log row IF:
    • Same user (for anonymous users match on IP address, for logged-in users match on user_id)
    • Current search starts with the text of the previous search, e.g. previous was "dog" and new is "dog in white"
    • Previous search was logged less than 5 seconds ago
  • When a search result is clicked (in either full-page search or the header), update clicked_topic_id: have search results return the log id, then update the row matching that log id and the same user, provided it was logged in the last 10 minutes
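The rules above can be sketched as a small in-memory Ruby model. This is a hedged illustration, not the code that was eventually merged: `SearchLogRow`, `SearchLog`, `MERGE_WINDOW` and `CLICK_WINDOW` are hypothetical names, timestamps are passed in explicitly instead of calling `Time.zone.now`, and rows live in an array rather than a database table. It follows the English rules: logged-in users match on `user_id`, anonymous users on IP address, and a search is merged into an earlier row only if that row's term is a prefix of the new term and was logged less than 5 seconds ago.

```ruby
# Hypothetical in-memory model of the proposed search log. SearchLogRow,
# SearchLog, MERGE_WINDOW and CLICK_WINDOW are illustrative names, not Discourse code.
SearchLogRow = Struct.new(:id, :term, :user_id, :ip_address, :created_at,
                          :clicked_topic_id, :source, keyword_init: true)

class SearchLog
  MERGE_WINDOW = 5       # seconds: searches closer together than this are merged
  CLICK_WINDOW = 10 * 60 # seconds: clicks are only recorded within 10 minutes

  attr_reader :rows

  def initialize
    @rows = []
    @next_id = 1
  end

  # Logs one search and returns the id of the row that was updated or inserted;
  # search results would hand this id back to the client for click tracking.
  def log(term:, user_id:, ip_address:, source:, now:)
    # Unlike the SQL UPDATE, which touches every matching row, this updates
    # only the most recent match.
    row = @rows.reverse_each.find do |r|
      same_user?(r, user_id, ip_address) &&
        now - r.created_at < MERGE_WINDOW && # previous search logged < 5 s ago
        term.start_with?(r.term)              # previous term is a prefix of the new one
    end

    if row
      # Still typing: replace the shorter term with the longer one.
      row.term = term
      row.created_at = now
    else
      # Zero rows matched: insert a new search log row.
      row = SearchLogRow.new(id: @next_id, term: term, user_id: user_id,
                             ip_address: ip_address, created_at: now,
                             clicked_topic_id: nil, source: source)
      @next_id += 1
      @rows << row
    end
    row.id
  end

  # Records a click on a search result. Returns true if a row was updated.
  def record_click(log_id:, topic_id:, user_id:, ip_address:, now:)
    row = @rows.find { |r| r.id == log_id }
    return false unless row && same_user?(row, user_id, ip_address)
    return false if now - row.created_at > CLICK_WINDOW

    # Each click overwrites the previous one, so the last click wins.
    row.clicked_topic_id = topic_id
    true
  end

  private

  # Logged-in users match on user_id; anonymous users match on IP address.
  def same_user?(row, user_id, ip_address)
    if user_id
      row.user_id == user_id
    else
      row.user_id.nil? && row.ip_address == ip_address
    end
  end
end

search_log = SearchLog.new
t0 = Time.at(0)
ip = "10.0.0.1"
a = search_log.log(term: "dog", user_id: nil, ip_address: ip, source: :header, now: t0)
b = search_log.log(term: "dog in white", user_id: nil, ip_address: ip, source: :header, now: t0 + 2)
c = search_log.log(term: "dog in white hat", user_id: nil, ip_address: ip, source: :header, now: t0 + 10)
p [a, b, c]                   # prints [1, 1, 2]
p search_log.rows.map(&:term) # prints ["dog in white", "dog in white hat"]
p search_log.record_click(log_id: 1, topic_id: 42, user_id: nil, ip_address: ip, now: t0 + 60) # prints true
```

Because `record_click` overwrites `clicked_topic_id`, only the last click on a given search is kept in this sketch; opening several results in separate tabs would record just one of them.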

Limiting log size

So that the log does not grow forever, there should be a site setting for the maximum number of rows to store, defaulting to about a million.

A weekly job can then delete the oldest rows beyond that limit.
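A sketch of the clean-up rule, assuming the job keeps the newest `max_rows` rows and deletes the rest, oldest first. `LogEntry` and `ids_to_prune` are hypothetical names, and `max_rows` is a plain argument standing in for the proposed site setting. A real job would more likely issue a single `DELETE` against the table than load every row into memory, but the selection logic is the same.

```ruby
# Hypothetical sketch of the weekly clean-up's selection logic; LogEntry and
# ids_to_prune are illustrative names. max_rows stands in for the site setting.
LogEntry = Struct.new(:id, :created_at)

# Returns the ids of the oldest entries beyond max_rows (empty if under the limit).
def ids_to_prune(entries, max_rows)
  excess = entries.size - max_rows
  return [] if excess <= 0

  # Oldest first; id breaks ties between rows with the same timestamp.
  entries.sort_by { |e| [e.created_at, e.id] }.first(excess).map(&:id)
end

entries = [
  LogEntry.new(1, Time.at(10)),
  LogEntry.new(2, Time.at(5)),
  LogEntry.new(3, Time.at(20))
]
p ids_to_prune(entries, 2) # prints [2]
p ids_to_prune(entries, 5) # prints []
```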

Thoughts? Feedback?

elijah

So if I search for something and open three results in three tabs, what happens? Last one "wins"?
→ Saving only the last click for a search works well when someone searches, clicks, hits back, clicks again, hits back, and clicks a final time. Opening results in new tabs subverts that, though.

If I search for a topic, right-click to copy the URL, and then paste it in as a "you should look here" answer, is that counted as a "click"?
→ For reference, copying URLs from posts does not increase the click count. But this is a case where the desired search result really should be saved.

codinghorror

Hopefully we can make a little progress on this, maybe this week @eviltrout?

eviltrout

I’ve implemented this mostly as specified in the original post:

I also went in and replaced all the mock tests with integration tests, and added a separate commit for the clean-up job.

eviltrout

Actually I forgot about the click tracking 🙂

Sorry, it’s been a difficult couple of days for me; I’ll get it in on Monday.

eviltrout

This commit adds click tracking support: