A partial archive of meta.discourse.org as of Tuesday July 18, 2017.

What cool data explorer queries have you come up with?

mcwumbly

Continuing the discussion from Data Explorer Plugin:

Here's a couple I'm starting with:

Users Last Seen Since (Since N Weeks Ago)

with intervals as (
    select
        n as start_time, 
        CURRENT_TIMESTAMP as end_time
    from generate_series(CURRENT_TIMESTAMP - INTERVAL '140 days',
                         CURRENT_TIMESTAMP - INTERVAL '7 days',
                         INTERVAL '7 days') n
)
select 
  COUNT(1) as users_seen_since,
  CURRENT_TIMESTAMP - i.start_time as time_ago
FROM USERS u
right join intervals i
on u.last_seen_at >= i.start_time and u.last_seen_at < i.end_time
group by i.start_time, i.end_time
order by i.start_time desc

Users Last Seen Since (Since N Days Ago)

with intervals as (
    select
        n as start_time, 
        CURRENT_TIMESTAMP as end_time
    from generate_series(CURRENT_TIMESTAMP - INTERVAL '30 days',
                         CURRENT_TIMESTAMP - INTERVAL '1 day',
                         INTERVAL '1 day') n
)
select 
  COUNT(1) as users_seen_since,
  CURRENT_TIMESTAMP - i.start_time as time_ago
FROM USERS u
right join intervals i
on u.last_seen_at >= i.start_time and u.last_seen_at < i.end_time
group by i.start_time, i.end_time
order by i.start_time desc
Mittineague

Most of mine are "OK, but could be better"

For example, this one works, but I don't like the duplicate sub-query much

Banner Stats

WITH all_users AS ( SELECT
  COUNT(users.id) AS user_count
  FROM users
  WHERE users.active 
  AND users.suspended_at IS NULL
  AND users.locale IS NOT NULL
)
, read_banner_topic AS ( SELECT 
  COUNT(topic_views.user_id) AS read_count
  FROM topic_views  
  WHERE topic_views.topic_id IN (
    SELECT topics.id 
    FROM topics 
    WHERE topics.archetype IS NOT NULL
    AND topics.archetype LIKE 'banner'
  )
  AND topic_views.user_id IS NOT NULL
)
, dismissed_banner AS ( SELECT 
  COUNT(user_profiles.user_id) AS dismissed_count
  FROM user_profiles
  WHERE user_profiles.dismissed_banner_key IS NOT NULL 
  AND user_profiles.dismissed_banner_key IN (
    SELECT topics.id 
    FROM topics 
    WHERE topics.archetype IS NOT NULL
    AND topics.archetype LIKE 'banner'
  )
  AND user_profiles.user_id IS NOT NULL
)
SELECT all_users.user_count
  , read_banner_topic.read_count
  , dismissed_banner.dismissed_count 
FROM all_users, read_banner_topic, dismissed_banner

* note, read banner topic is not the same as saw banner

mcwumbly

Posts Read (Daily)

Total number of new posts read by all users per day

SELECT visited_at as day,
count(1) as users,
sum(posts_read) as posts_read,
sum(posts_read) / count(1) as avg_posts_read_per_user
FROM user_visits 
group by visited_at
order by visited_at desc

Posts Read Percentiles

Number of posts read for users in each percentile

with tentiles as (
    select posts_read_count as read, 
    ntile(10) 
    over (order by posts_read_count) as tentile 
    from user_stats
)
select (tentile - 1) * 10 as percentile,
    min(read) as min_posts_read, 
    max(read) as max_posts_read
from tentiles
group by tentile
order by tentile desc

Active Readers (Past Month)

Users with the most visits that include reading activity

select user_id, 
    count(1) as visits,
    sum(posts_read) as posts_read
from user_visits
where posts_read > 0
and visited_at > CURRENT_TIMESTAMP - INTERVAL '30 days'
group by user_id
order by visits desc, posts_read desc

Active Readers (Since N Days Ago)

Number of users who have read at least 1 post since N days ago

with intervals as (
    select
        n as start_time, 
        CURRENT_TIMESTAMP as end_time
    from generate_series(CURRENT_TIMESTAMP - INTERVAL '30 days',
                         CURRENT_TIMESTAMP - INTERVAL '1 day',
                         INTERVAL '1 days') n
),
latest_visits as (
    select 
        user_id, 
        max(visited_at) as visited_at
    from user_visits
    where posts_read > 0
    group by user_id
)
select 
  COUNT(1) as active_readers_since,
  CURRENT_TIMESTAMP - i.start_time as time_ago
FROM users u
left join latest_visits v
on u.id = v.user_id
right join intervals i
on v.visited_at >= i.start_time and v.visited_at < i.end_time
group by i.start_time, i.end_time
order by i.start_time desc
DeanMarkTaylor

Count new TL1, TL2, TL3 users past 12 months

Number of members added to Trust Level (TL) 1, 2 and 3 month-by-month over the past year.

Answering the question how many users actually sticking around and interacting with the community and progressing through the trust levels.

Topic Participation

Number (by month) of topics started, posts to new topics, posts to old topics, topics with a response, topics without a response, percentage response rate.

Optionally exclude groups of users and / or impersonate a single users view of topics.

fefrei

Likes from the team

This query assumes there is a group called team, and gives you the likes other users have received, split into likes from the team and likes from others:

SELECT
    pl.user_id,
    SUM(pl.team_likes) as team_likes,
    SUM(pl.student_likes) as student_likes,
    SUM(pl.team_likes + pl.student_likes) as likes
FROM (
    SELECT -- count likes per post
        p.id as post_id_workaround,
        p.user_id as user_id,
        (
            SELECT count(*)
            FROM post_actions pa
            WHERE
                pa.post_id = p.id
                AND post_action_type_id = (
                            SELECT id FROM post_action_types WHERE name_key = 'like'
                )
                AND pa.user_id IN (
                    SELECT gu.user_id
                    FROM group_users gu
                    WHERE gu.group_id = ( SELECT id FROM groups WHERE name ilike 'team' ) 
                )
        ) as team_likes,
        (
            SELECT count(*)
            FROM post_actions pa
            WHERE
                pa.post_id = p.id
                AND post_action_type_id = (
                            SELECT id FROM post_action_types WHERE name_key = 'like'
                )
                AND pa.user_id NOT IN (
                    SELECT gu.user_id
                    FROM group_users gu
                    WHERE gu.group_id = ( SELECT id FROM groups WHERE name ilike 'team' ) 
                )
        ) as student_likes
    FROM badge_posts p
) AS pl
WHERE pl.user_id NOT IN (
    SELECT gu.user_id
    FROM group_users gu
    WHERE gu.group_id = ( SELECT id FROM groups WHERE name ilike 'team' ) 
)
GROUP BY pl.user_id
ORDER BY likes DESC
mcwumbly

Recently Read Topics by User

Show the topics with that have been opened by a given user in the past N days, sorted by the amount of time the user has spent in that topic. (requested on feverbee)

-- [params]
-- integer :user = 1
-- integer :since_days_ago = 7

with topic_timing as (
  select user_id, topic_id, sum(msecs) / 1000 as seconds
  from post_timings
  where user_id = :user
  group by user_id, topic_id
)
SELECT tv.topic_id,
    tv.user_id,
    tv.viewed_at,
    tt.seconds
from topic_views tv
left join topic_timing tt
on tv.topic_id = tt.topic_id
and tv.user_id = tt.user_id
where tv.user_id = :user
and viewed_at + :since_days_ago > CURRENT_TIMESTAMP
order by seconds desc
erlend_sh

6 posts were split to a new topic: "Recently Read Topics by User" query returns Parameter declaration error

Mittineague

While working on this

I was looking for a way to find topics that were created as a result of a moderator splitting posts into a new topic. (i.e. off-topic posts that deserved their own topic)

It wasn't as easy as I had hoped. I looked at various "topic" and "post" tables with no success.

Being sure I could find a lead to what I needed in Admin -> Logs -> Staff Actions I was disappointed to find that "split" data was not there (not that it should, hopefully it doesn't happen all that often)

The only place I could find the id of a topic created this way is in "small-action" posts as part of a link.

Getting the id portion of the string to an integer that can be used is a bit involved.

regexp_matches returns a string array
array_to_string converts the array to a string
CAST converts the string to an integer

WITH new_topics AS (
 SELECT 
 CAST( array_to_string(regexp_matches(posts.raw, '([\d]+)(?:\))$', 'g'), '') AS integer) AS new_topic_id
 FROM posts 
 WHERE action_code LIKE 'split_topic'
 AND raw LIKE '%posts were split to a new topic%'
)
SELECT topics.title 
FROM topics, new_topics
WHERE topics.id = new_topics.new_topic_id
Mittineague

Member Uploads

I put together this query to help find members that post a lot of uploads that potentially might lead to a problem.

Ordered by total upload weight per member

WITH heavy_uploads AS ( SELECT 
    ( SUM(uploads.filesize) / 1024) AS sum_kb
  , COUNT(uploads.filesize) AS upload_count
  , ( (SUM(uploads.filesize) / COUNT(uploads.filesize)) / 1024) AS avg_weight_kb
  , uploads.user_id 
  FROM uploads
  GROUP BY uploads.user_id
)
SELECT heavy_uploads.sum_kb
 , heavy_uploads.upload_count
 , heavy_uploads.avg_weight_kb
 , heavy_uploads.user_id
FROM heavy_uploads 
WHERE heavy_uploads.sum_kb > 100
ORDER BY heavy_uploads.sum_kb DESC

(the three Grimm members uploaded animated woge gifs)

alefattorini

Users active in last 30 days? Active = at least a post in the last 30 days.

riking

select username
from users
where last_posted_at > current_timestamp - interval '30' day
Tom_Newsom

Or just /users?period=monthly surely?

EDIT: No that also counts people who just read. sorry!

HAWK

How would you adjust this with parameters so that previous months can also be checked? Eg: read between 62 and 31 days ago (or similar)?

Mittineague

I haven't figured out a "between" way of doing this, but using "params" can provide a selectable "back to"

-- [params]
-- int :interval = 30

select username
from users
where last_posted_at > current_timestamp - interval ':interval' day
mcwumbly

should be able to do something like this (starting where @Mittineague left off):

-- [params]
-- int :until_days_ago = 30
-- int :since_days_ago = 60

select username
from users
where last_posted_at > current_timestamp - interval ':since_days_ago' day 
and last_posted_at < current_timestamp - interval ':until_days_ago' day
Mittineague

Thanks, the call of Labor Day burgers was distracting me and all I could think of was finding a way to put current_timestamp into the params and getting to the burgers. At times my stomach overpowers my brain.

HAWK

Ok, so what about 'posted during that period' – not last posted during that period.
i.e. a count of active (made at least one post) members during that period?

HAWK

Got what I need (thanks @meglio) so updating this for future posterity.

-- [params]
-- date :date_from
-- date :date_to
-- int  :min_posts = 1

WITH user_activity AS (
    SELECT p.user_id, count (p.id) as posts_count
    FROM posts p
    LEFT JOIN topics t ON t.id = p.topic_id
    WHERE p.created_at::date BETWEEN :date_from::date AND :date_to::date
        AND t.deleted_at IS NULL
        AND t.visible = TRUE
        AND t.closed = FALSE
        AND t.archived = FALSE
        AND t.archetype = 'regular'
        AND p.deleted_at IS NULL
        
    GROUP BY p.user_id
)
SELECT COUNT(user_id)
FROM user_activity
WHERE posts_count >= :min_posts
fefrei

List of all Members of a Group with Custom Field

We use this query to get a list of the name and matriculation number of all team members:

SELECT users.name, user_custom_fields.value as matriculation
FROM users
JOIN group_users ON users.id = group_users.user_id
JOIN groups ON groups.id = group_users.group_id
JOIN user_custom_fields ON users.id = user_custom_fields.user_id
WHERE groups.name = 'Team'
AND user_custom_fields.name = 'user_field_2'

Simply insert the group name and the custom field id in the last two names.

jomaxro

Anyone able to provide the query for all users with a custom title (not a badge granted title)?