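# Assumes df has a 'user_id' column and a datetime64 'event_time' column.
# Sort so each user's events are contiguous and in chronological order.
df_sort = df.sort_values(['user_id', 'event_time'])
# Gap in seconds since the same user's previous event (NaN for a user's first event).
# The index is not reset, so the mask below stays aligned with df_sort.
gap_seconds = df_sort.groupby('user_id')['event_time'].diff().dt.total_seconds()
# A new session starts at a user's first event or after more than 600 s of inactivity.
new_session = gap_seconds.isnull() | (gap_seconds > 600)
# Number the session starts per user (1, 2, 3, ...), then carry each number forward.
df_sort['session_id'] = df_sort.loc[new_session].groupby('user_id')['event_time'].rank(method='first')
df_sort['session_id'] = df_sort['session_id'].ffill().astype(int)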
Rang
From Edgy Parakeet, 4 Months ago, written in Plain Text, viewed 70 times.
URL http://codebin.org/view/ede0e972