For 1.9 we plan to add a search log.
Unfortunately, the naive "log every search the server makes" approach does not work: we perform searches as people are typing, which would produce a massively noisy log.
Proposal
- Create a new table
  `term`, `user_id` (nullable), `ip_address`, `created_at`, `clicked_topic_id` (nullable), `source_id` (either header or fullpage)
- Log on the server with the following algorithm on each search (the table name `search_logs` is assumed here; parameters are bound from Ruby):

  ```sql
  UPDATE search_logs
     SET term = :new_term,
         created_at = :now
   WHERE created_at > :now - INTERVAL '5 seconds'
     AND position(term IN :new_term) = 1
     AND (user_id = :user_id OR ip_address = :ip_address)
  ```

  with bindings `new_term: new_term, now: Time.zone.now, ip_address: request.ip`, plus a `user_id` binding for logged-in users. Note `position` returns 1 when `term` is a prefix of the new term, and the `created_at` comparison keeps only rows logged within the last 5 seconds.
If the update touches zero rows, insert a **new** search log row.
Or, in English:
- Update the existing search log row IF:
  - Same user (for anonymous use ip address, for logged in use user_id)
  - The current search starts with the text of the previous search, e.g. previous was "dog" and new is "dog in white"
  - The previous search was logged less than 5 seconds ago
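The update-or-insert rule above can be sketched in plain Ruby. This is an in-memory model with made-up names (`SearchLogger`, hash rows standing in for table rows), not the real implementation, which would be the single UPDATE shown earlier:

```ruby
# In-memory sketch of the proposed dedupe rule. Each row is a Hash with
# the proposed columns; all names here are illustrative.
class SearchLogger
  WINDOW = 5 # seconds

  attr_reader :rows

  def initialize
    @rows = []
  end

  # Update a recent row from the same user whose term is a prefix of the
  # new term; otherwise insert a fresh row. Returns the row's index,
  # standing in for the log id.
  def log(term, user_id: nil, ip_address: nil, source: :header, now: Time.now)
    row = @rows.find do |r|
      now - r[:created_at] < WINDOW &&
        term.start_with?(r[:term]) &&
        same_user?(r, user_id, ip_address)
    end
    if row
      row[:term] = term
      row[:created_at] = now
    else
      row = { term: term, user_id: user_id, ip_address: ip_address,
              created_at: now, clicked_topic_id: nil, source_id: source }
      @rows << row
    end
    @rows.index(row)
  end

  private

  # For anonymous users match on ip address, for logged in on user_id.
  def same_user?(row, user_id, ip_address)
    user_id ? row[:user_id] == user_id : row[:ip_address] == ip_address
  end
end
```

So typing "d", "do", "dog" within five seconds collapses into one row reading "dog", while a different user, or the same user after the window closes, gets a fresh row.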
On click of a search result (in either full page search or the header), update `clicked_topic_id`: have search results return the log id, then update the row matching that log id, the same user, and a `created_at` within the last 10 minutes.
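The click step can be sketched the same way (in-memory rows, illustrative names; the real version would be an UPDATE scoped by log id, user match, and age):

```ruby
CLICK_WINDOW = 10 * 60 # seconds

# rows: Array of Hashes with the proposed columns; log_id is the row's index.
# Records the click only if the log id exists, belongs to the same user
# (user_id for logged in, ip_address for anonymous), and is recent enough.
def record_click(rows, log_id, topic_id, user_id: nil, ip_address: nil, now: Time.now)
  row = rows[log_id]
  return false unless row
  return false unless now - row[:created_at] < CLICK_WINDOW
  same_user = user_id ? row[:user_id] == user_id : row[:ip_address] == ip_address
  return false unless same_user
  row[:clicked_topic_id] = topic_id
  true
end
```

Returning false for a stale or mismatched log id means a forged or expired click simply does nothing.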
Limiting log size
So the log does not grow forever, there should be a site setting for the maximum number of rows to store. The default should be about a million.
A weekly job can delete the oldest rows down to that limit.
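The trim itself is simple enough; a tiny pure-Ruby stand-in for the weekly job's DELETE (method name made up):

```ruby
# Keep only the newest max_rows entries, dropping the oldest first.
# In the real job this would be a single DELETE ordered by created_at.
def trim_search_log(rows, max_rows)
  rows.sort_by { |r| r[:created_at] }.last(max_rows)
end
```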
Thoughts? Feedback?