Cookies Psst! Do you accept cookies?

We use cookies to enhance and personalise your experience.
Please accept our cookies. Checkout our Cookie Policy for more information.

Understanding Search Functionality

Questions before we create the function

  • Why does this function make the product better?
  • How can we measure the effect of the function quantitatively?
  • How long does it take to create the function
  • Does this function make the product too complicated?
  • How dangerous is the function?
  • How innovative is the function?
  • Is it aligned with what users want?

Hypothesis that we need to check

  • Do users use the search function a lot? Among the numerous functions used by users, how commonly is the search function?
  • How often do users use the search function in one session? (It might be a problem if they use it too often or too rarely)
  • How much do results clicks occur after searching?
  • where is the results that is clicked after a search?
  • The more users search, the more click they do?
  • What is the retention rate of the search function users?

Data Analysis

1. Percentage of search function usage

SELECT DATE_TRUNC('week', z.session_start) AS week,
       COUNT(*) AS sessions, 
       COUNT(CASE WHEN z.autocompletes > 0 THEN z.session ELSE NULL END) AS with_autocompletes,
       COUNT(CASE WHEN z.runs > 0 THEN z.session ELSE NULL END) AS with_runs
FROM (
SELECT x.session_start,
       x.session,
       x.user_id, 
       COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
       COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
       COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
       session.session,
       session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(

SELECT bounds.*,
       CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
       WHEN last_event IS NULL THEN id 
       ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at, 
       occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
       LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
       ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
      next_event >= INTERVAL '10 MINUTE' OR
      last_event IS NULL OR 
      next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
   e.occurred_at >= session.session_start AND
   e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
GROUP BY week

Image description

Image description

2. The number of sessions with autocompletes

SELECT autocompletes, COUNT(*) AS sessions
FROM (
SELECT x.session_start,
       x.session,
       x.user_id, 
       COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
       COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
       COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
       session.session,
       session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(

SELECT bounds.*,
       CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
       WHEN last_event IS NULL THEN id 
       ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at, 
       occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
       LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
       ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
      next_event >= INTERVAL '10 MINUTE' OR
      last_event IS NULL OR 
      next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
   e.occurred_at >= session.session_start AND
   e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
GROUP BY autocompletes

Image description

3. The number of sessions with full search

SELECT runs, COUNT(*) AS sessions
FROM (
SELECT x.session_start,
       x.session,
       x.user_id, 
       COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
       COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
       COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
       session.session,
       session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(

SELECT bounds.*,
       CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
       WHEN last_event IS NULL THEN id 
       ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at, 
       occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
       LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
       ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
      next_event >= INTERVAL '10 MINUTE' OR
      last_event IS NULL OR 
      next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
   e.occurred_at >= session.session_start AND
   e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
GROUP BY runs

Image description

4. Distribution of the clickthrough after full search

SELECT clicks, COUNT(*)
FROM (
SELECT x.session_start,
       x.session,
       x.user_id, 
       COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
       COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
       COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
       session.session,
       session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(

SELECT bounds.*,
       CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
       WHEN last_event IS NULL THEN id 
       ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at, 
       occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
       LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
       ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
      next_event >= INTERVAL '10 MINUTE' OR
      last_event IS NULL OR 
      next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
   e.occurred_at >= session.session_start AND
   e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
WHERE runs > 0 
GROUP BY clicks

Image description

  • After full search, zero clickthrough is more than half.
SELECT runs, AVG(clicks) ::FLOAT AS average_clicks
FROM (
SELECT x.session_start,
       x.session,
       x.user_id, 
       COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
       COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
       COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
       session.session,
       session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(

SELECT bounds.*,
       CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
       WHEN last_event IS NULL THEN id 
       ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at, 
       occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
       LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
       ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
      next_event >= INTERVAL '10 MINUTE' OR
      last_event IS NULL OR 
      next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
   e.occurred_at >= session.session_start AND
   e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
WHERE runs > 0 
GROUP BY runs

Image description

  • The more users search, the more they click
  • Above 13 runs, the amount of sessions decreased -> not accurate

5. The order of clicked contents

SELECT event_name, COUNT(user_id)
FROM tutorial.yammer_events
WHERE event_name LIKE 'search_click_result_%'
GROUP BY event_name
ORDER BY event_name

Image description

  • The order of contents that users click is distributed.
  • In chart 4, users who click only one result is 2.45%. It implies that it's not the matter of ranking, but the matter of results itself.

Last Stories

What's your thoughts?

Please Register or Login to your account to be able to submit your comment.