Hello,
is there a way to calculate the days between two dates when the dates are in different columns and different lines? I am working with the ARIS MashZone 2.0.0.
An example:
DATA: Process#,Activity#,Start-Date,End-Date
P1,A1,Start-Date,End-Date P1,A2,Start-Date,End-Date P1,A3,Start-Date,End-Date P2,A1,...
I want to calculate the duration of a process. Therefore I have to subtract the P1,A1,Start-Date from the P1,A3, End-Date. I´ve found a previous post with a great way to calculate the difference between these two dates, but since the dates in my example are in different lines and columns that way doesn´t work for me.
Does somebody have an idea?
Thanks in advance!
Hi,
to calculate durations between two dates you need convert the two column's data type from date to number. This operation results in rather big numbers, i.e. milliseconds since reference point (1970...).
Based on the numbers you may calculate the duration. Afterwards you need some divisions on top to convert milliseconds into days or hours, respectively.
Regards
Stephan
Hello,
thank you very much for you quick and detailed response, Stephan! The biggest problem isn´t the calculation of the duration of an activity (one line in the log) but to calculate the duration of a process (e.g. all lines for P1). To do that I have to somehow perform the following steps:
- Find the activity 1 of a new process (P*,A1,...)
- Find the last activity of that process (that can be P*,A3 or A4 or so on)
- Subtract the Start-Date of A1 from the end-date of the last activity of the P* process.
The data is structured like this:
P1,A1,Start-Date,End-Date P1,A2,Start-Date,End-Date P1,A3,Start-Date,End-Date P2,A1,Start-Date,End-Date P2,A2,Start-Date,End-Date P2,A3,Start-Date,End-Date P3,A1,Start-Date,End-Date P3,A2,Start-Date,End-Date P4,A1,Start-Date,End-Date
Do you think there is way to perform those tasks or is MashZone the wrong tool to work with in this scenario?
Thanks in advance!
Actually, to do such operations ARIS Process Performance Manager is the tool of choice. It scales nicely and calculates durations and much more even with billions of process instances. And it integrates with MashZone. So as soon as your data volume grows, you may reconsider...
For the mean time: Why do you want to find "the last activity"? Each row has start and end already. So you may calculate its duration on an individual process step basis.
To find out an entire process' duration you may utilize the aggregate operator. It has been described already. Adapted to your problem it sums up all durations, grouped by process identifier.
So setting its aggregation column attribute to duration and its dimension attribute to process identifier results in two columns or something like this:
P1,Duration (sum of all P1 steps) P2,Duration (sum of all P2 steps) P3,Duration (sum of all P3 steps) P4,Duration (sum of all P4 steps)
Thank you very much, Stephan! I want to calculate the days between the start of a process (e.g. customer order) and the end (e.g. sent invoice). I cannot use the aggregate operator, because there are some activities that take place within one day. If I calculate the duration of each activity and sum that up, grouped by the process-id, I would not get the actual process duration in days.
A colleague of mine is working with the ARIS Process Performance Manager at the moment. Unfortunately there is much more prior knowledge needed to work with PPM thant to work with MashZone.
I will now try to get the input data in a different format. The calculation with dates as described before seems to be no problem. Thank you very much for your effort!
Kind regards,
Andre
You can calculate brutto process time from start and end time of activities recorded in different rows using MashZone as well. Keep in mind that in following example there are multiple rows for process start and process end for a single process instance so one identifies real process start by the earliest date of record that can represent process start, and the latest for the process end. Here's the procedure:
- use two data source objects (connected to the same source file), one for getting start time and other for getting end time
- filter rows that can represent process start from one Data source object and those that represent process end from the other Data source object
- use Latest/eraliest date object to aggregate (to identify earliest record for real process start and latest for real process end)
- combine two data feeds by including identical key values of both sources, this will ensure you only get those process instances that contain information on both process start and process end date
- rest is simple calculation of process duration already described in other posts (convert date to number, substract start from end time, divide by 1000 for seconds, then 60 for minutes etc).
And here's the data feed: