Identify and count spells (Distinctive events within each group)
I'm looking for an efficient way to identify spells/runs in a time series. The first three columns (time, group, is.5) are what I have; the fourth column, spell, is what I'm trying to compute. I've tried using dplyr's lead and lag, but that gets too complicated. I've tried rle but got nowhere.
ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse solution.
Assumptions
- Data is sorted by group and then by time
- There are no gaps in time within each group
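Both assumptions can be checked before computing anything. The following is a minimal base R sketch, not part of the original question: it uses a toy subset of the ReprEx timestamps, and the 60-second `step` is an assumption read off those timestamps. Note that `is.unsorted(group)` tests alphabetical order, which is stricter than "rows of a group are contiguous".

```r
# Toy subset of the question's data (first rows of each group).
time  <- as.POSIXct(c(1538876340, 1538876400, 1538876460,
                      1526824800, 1526824860),
                    origin = "1970-01-01", tz = "UTC")
group <- c("A", "A", "A", "B", "B")
step  <- 60  # assumed spacing in seconds, read off the ReprEx

secs <- as.numeric(time)

# Assumption 1: rows are ordered by group, then by time within each group.
sorted_ok <- !is.unsorted(group) &&
  all(tapply(secs, group, function(s) !is.unsorted(s)))

# Assumption 2: consecutive rows in a group are exactly one step apart.
no_gaps_ok <- all(tapply(secs, group, function(s) all(diff(s) == step)))

sorted_ok   # TRUE for this toy subset
no_gaps_ok  # TRUE for this toy subset
```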
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360):
- the rle approach by @markus took 0.53 seconds
- the cumsum approach by @M-M took 2.85 seconds
- the function approach by @MrFlick took 0.66 seconds
- the rle and dense_rank approach by @tmfmnk took 0.89 seconds
r dataframe dplyr time-series tidyverse
Asked 7 hours ago by Thomas Speidel; edited 2 hours ago.
For someone who is not familiar with how the spell is computed, can you share a formula or description? – nsinghs, 7 hours ago
@nsinghs I think they mean "hospital spell" – zx8754, 7 hours ago
6 Answers
One option using rle
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(
    spell = {
      r <- rle(is.5)
      r$values <- cumsum(r$values) * r$values
      inverse.rle(r)
    }
  )
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
Explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values with the cumulative sum where values == 1, while values should remain zero otherwise.
We can achieve this when we multiply cumsum(r$values) by r$values, where the latter is a vector of 0s and 1s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
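Finally we call inverse.rle to get back a vector of the same length as is.5.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group. (The walkthrough above runs on the whole column, so the count continues into group B; inside group_by(group) it restarts at 1, as in the output at the top.)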
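The per-group step can also be sketched without dplyr. This is a minimal base R illustration, not the answer's own code: `spell_rle` is a hypothetical wrapper name, and `ave()` is swapped in for `group_by()` plus `mutate()`. The vectors are the `is.5` and `group` columns from the question.

```r
# Relabel the runs of a 0/1 vector: 0 stays 0, runs of 1 become 1, 2, 3, ...
spell_rle <- function(x) {
  r <- rle(x)
  r$values <- cumsum(r$values) * r$values
  inverse.rle(r)
}

is.5  <- c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)
group <- rep(c("A", "B"), times = c(8, 6))

# ave() applies the function to each group's slice and keeps the row order.
spell <- ave(is.5, group, FUN = spell_rle)
spell
# [1] 0 1 1 0 2 0 0 3 0 0 1 1 0 2
```

Because the function is applied to each group separately, the count restarts at 1 in group B.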
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers. – M-M, 5 hours ago
@M-M Added some explanation. Thanks for the comment. – markus, 5 hours ago
Here's a helper function that can return what you are after
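spell_index <- function(time, flag) {
  # a spell starts on a row that is one time step after the previous row,
  # has flag == 1, and follows a row whose flag was not 1
  change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
  # count the starts so far, then zero out the rows where flag is not 1
  cumsum(change) * (flag==1)+0
}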
And you can use it with your data like
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(
    spell = spell_index(time, is.5)
  )
Basically, the helper function uses lag() to look for changes. We use cumsum() to count the changes, then multiply by a boolean value to zero out the values you want to be zeroed out.
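One edge case may be worth guarding against: lag() returns NA for the first row, so in a group whose first flag is already 1 that NA would, as far as I can tell, carry through cumsum(). The sketch below is a hypothetical base R variant, not the answer's function: `spell_index_base` is an invented name, and it drops the time argument on the question's assumption that there are no gaps within a group.

```r
# Hypothetical variant of the helper: base R only, no time argument.
spell_index_base <- function(flag) {
  prev  <- c(0, head(flag, -1))     # previous row's flag; 0 before the first row
  start <- flag == 1 & prev != 1    # TRUE on the first row of each spell
  cumsum(start) * (flag == 1)       # running spell count, zeroed outside spells
}

spell_index_base(c(0, 1, 1, 0, 1, 0, 0, 1))   # group A from the question
# [1] 0 1 1 0 2 0 0 3
spell_index_base(c(1, 1, 0, 1))               # toy group that starts inside a spell
# [1] 1 1 0 2
```

It would be called per group in the same way as the original helper, for example inside mutate() after group_by(group).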
This works.
The data:
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group:
df2 <- split(df, df$group)
Build a function we can apply to the list:
library(dplyr)
my_func <- function(dat) {
  rst <- dat %>%
    mutate(change = diff(c(0, is.5))) %>%   # +1 where a spell starts, -1 on the row after it ends
    mutate(flag = change * abs(is.5)) %>%   # keep only the starts
    mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
    dplyr::select(time, group, is.5, spell)
  return(rst)
}
Then apply it:
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
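To see what each intermediate column holds, the three mutate() steps can be traced on a single group. This is a base R sketch on group A's `is.5` values from the question; it repeats the answer's expressions outside the pipeline and adds nothing new.

```r
is.5 <- c(0, 1, 1, 0, 1, 0, 0, 1)   # group A from the question

change <- diff(c(0, is.5))          # +1 at a spell start, -1 on the row after a spell ends
flag   <- change * abs(is.5)        # 1 only at spell starts
spell  <- ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))

change
# [1]  0  1  0 -1  1 -1  0  1
flag
# [1] 0 1 0 0 1 0 0 1
spell
# [1] 0 1 1 0 2 0 0 3
```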
A somewhat different possibility could be:
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
  group_by(group, is.5) %>%
  mutate(spell = dense_rank(spell)) %>%
  ungroup() %>%
  mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
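The answer gives no explanation, so here is a hedged trace of the three steps on one group in base R. It is a sketch only: `match()` against the sorted unique ids stands in for dplyr's dense_rank(), and the names `run_id` and `ones` are invented for the illustration.

```r
is.5 <- c(0, 1, 1, 0, 1, 0, 0, 1)   # group A from the question

# Step 1: number every run, zeros and ones alike.
run_id <- with(rle(is.5), rep(seq_along(lengths), lengths))
run_id
# [1] 1 2 2 3 4 5 5 6

# Step 2: among the rows with is.5 == 1, squeeze the run ids to 1, 2, 3, ...
ones  <- is.5 == 1
spell <- numeric(length(is.5))      # Step 3: rows with is.5 == 0 stay at 0
spell[ones] <- match(run_id[ones], sort(unique(run_id[ones])))
spell
# [1] 0 1 1 0 2 0 0 3
```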
One option is using cumsum:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5 takes care of assigning a new id (i.e. spell) whenever is.5 changes. But we want to avoid assigning new ids to rows where is.5 equals 0, which is why I have the second rule in the cumsum function (i.e. is.5!=0).
However, that second rule only prevents assigning a new id (adding 1 to the previous id); it won't set the id to 0. That's why I have multiplied the result by is.5.
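The two rules and the final multiplication can be traced on one group. This is a base R sketch with invented intermediate names (`prev`, `changed`, `starts`, `id`); `c(0, head(is.5, -1))` is used in place of `c(0, lag(is.5)[-1])` and gives the same values.

```r
is.5 <- c(0, 1, 1, 0, 1, 0, 0, 1)   # group A from the question
prev <- c(0, head(is.5, -1))        # previous row's value, 0 before the first row

changed <- prev != is.5             # rule 1: the value differs from the previous row
starts  <- changed & is.5 != 0      # rule 2: and the new value is not 0
id      <- cumsum(starts)           # running id, still non-zero on the 0 rows
spell   <- is.5 * id                # multiply to zero those rows out

id
# [1] 0 1 1 1 2 2 2 3
spell
# [1] 0 1 1 0 2 0 0 3
```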
Here is one option with rleid from data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)); grouped by 'group', get the run-length id (rleid) of 'is.5' and multiply it by the values of 'is.5' so that the ids corresponding to 0s in is.5 become 0, and assign the result to 'spell'. Then specify the i with a logical vector to select rows whose 'spell' value is not zero, match those values of 'spell' against the unique 'spell' values, and assign the result to 'spell'.
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
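The same two steps can be sketched in base R for readers without data.table. This is an illustration under assumptions, not the answer's code: `rleid_base` and `spell_one_group` are invented names, `rleid_base` only imitates rleid() for a single vector, and the renumbering is done within each group, which is my reading of the intended behaviour. The toy data lets group B start with a 1 to show that case.

```r
# Base R stand-in for data.table::rleid(): one id per run of equal values.
rleid_base <- function(x) with(rle(x), rep(seq_along(lengths), lengths))

spell_one_group <- function(x) {
  id <- rleid_base(x) * x                    # run ids, with the 0 rows set to 0
  nz <- id != 0
  id[nz] <- match(id[nz], unique(id[nz]))    # renumber remaining ids as 1, 2, 3, ...
  id
}

is.5  <- c(0, 1, 1, 0, 1, 1, 0, 1)
group <- c("A", "A", "A", "A", "A", "B", "B", "B")   # toy data; B starts with a 1

ave(is.5, group, FUN = spell_one_group)
# [1] 0 1 1 0 2 1 0 2
```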
Answer by markus (rle approach): answered 7 hours ago, edited 5 hours ago.
Answer by MrFlick (helper-function approach): answered 7 hours ago.
Answer by Hector Haffenden (split and lapply approach): answered 7 hours ago, edited 7 hours ago.
Answer by tmfmnk (rle and dense_rank approach): answered 6 hours ago.
add a comment |
add a comment |
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
The condition c(0, lag(is.5)[-1]) != is.5 takes care of assigning a new id (i.e. spell) whenever is.5 changes. We do not want rows where is.5 equals 0 to start a new id, though, which is why cumsum gets a second condition, is.5 != 0.
However, that second condition only prevents a new id from being assigned (adding 1 to the previous id); it does not set the id of those rows to 0. That is why the result is multiplied by is.5.
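The two conditions can be traced on a single group's values. The following is a minimal base-R sketch, assuming x holds the is.5 values of group A from the example; prev, starts_run and spell are illustrative names, and c(0, head(x, -1)) plays the role of c(0, lag(is.5)[-1]).

```r
# is.5 values of group A from the example above
x <- c(0, 1, 1, 0, 1, 0, 0, 1)

# Previous value of each element, with 0 in front of the first one
prev <- c(0, head(x, -1))

# TRUE where the value changed AND the new value is not 0,
# i.e. exactly on the first row of each run of 1s
starts_run <- prev != x & x != 0
starts_run
# FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE

# Running count of run starts; rows with 0 still carry the previous id ...
cumsum(starts_run)
# 0 1 1 1 2 2 2 3

# ... so multiplying by x resets them to 0
spell <- x * cumsum(starts_run)
spell
# 0 1 1 0 2 0 0 3
```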
answered 5 hours ago
M-M
Here is one option with rleid from data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)). Grouped by 'group', get the run-length id (rleid) of 'is.5' and multiply it by the values of 'is.5', so that the ids corresponding to 0s in 'is.5' become 0, and assign the result to 'spell'. Then specify i with a logical vector to select the rows whose 'spell' is not zero, match those values against the unique 'spell' values within each 'group', and assign the result back to 'spell':
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), by = group
  ][!!spell, spell := match(spell, unique(spell)), by = group][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
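Or, after the first step, use .GRP. This numbers the runs over the whole table rather than within each 'group', so it relies on every group starting with the same 'is.5' value, as in the example:
df[!!spell, spell := .GRP, spell]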
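The two steps (run ids multiplied by is.5, then renumbering with match) can also be traced in base R. This is a minimal sketch on made-up data in which the second group starts with a 1; run_id() is a stand-in for data.table::rleid(), and renumber() is an illustrative helper, neither of which comes from the answer. It also shows what happens when the renumbering is applied to the whole table instead of within each group.

```r
# Toy data (hypothetical values): two groups, the second one starts with a 1
g <- c("A", "A", "A", "B", "B", "B")
x <- c(0, 1, 0, 1, 0, 1)

# Base-R stand-in for data.table::rleid(): one id per run of identical values
run_id <- function(v) {
  r <- rle(v)
  rep(seq_along(r$lengths), r$lengths)
}

# Step 1: run ids per group, set to 0 wherever x is 0
raw <- ave(x, g, FUN = function(v) run_id(v) * v)
raw
# 0 2 0 1 0 3

# Step 2: renumber the non-zero ids by order of first appearance
renumber <- function(s) {
  nz <- s != 0
  s[nz] <- match(s[nz], unique(s[nz]))
  s
}

# Within each group: numbering restarts at 1 in group B
spell <- ave(raw, g, FUN = renumber)
spell
# 0 1 0 1 0 2

# Over the whole table: group B continues from group A's ids
renumber(raw)
# 0 1 0 2 0 3
```

On the data in the question both variants agree, because groups A and B both start with is.5 equal to 0.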
edited 1 hour ago
answered 1 hour ago
akrun
For someone who is not familiar with how the spell is computed, can you share a formula or description? – nsinghs, 7 hours ago
@nsinghs I think they mean "hospital spell" – zx8754, 7 hours ago