TimeSleuth is an open source software tool for generating temporal rules from sequential data. It implements the TIMERS algorithm, and can
judge the causality or acausality of the rules it generates. The package includes source code in Java.
The paper "Generation and Interpretation of Temporal Decision Rules" provides a fairly complete overview of how the system works.
More information on TIMERS and TimeSleuth can be found in papers here.
The complete source code for C4.5 is available from Ross Quinlan's homepage.
To download TimeSleuth please click
here and select the download tab. Alternatively, navigate to this download page.
Kamran Karimi can be contacted at
kamkar@users.sourceforge.net
TimeSleuth is a tool for learning rules from observed data. It can
also be used to test the causality/acausality of the rules. TimeSleuth is based on
C4.5, and expands its capabilities in important ways, including analysis
abilities that allow the user to investigate the relations among the
attributes. This makes TimeSleuth a data mining tool (it finds relationships that are meant
to be interpreted by experts) as well as a machine learning tool (it produces exact
rules to be used automatically). TimeSleuth can output Prolog statements, thus
making its output automatically executable.
TimeSleuth is part of a data mining and machine learning project in the Department of Computer Science at the University of Regina. It is written in Java and can be compiled and executed on any platform with Java 1.5 or higher.
You need C4.5 in order to run TimeSleuth. For all of the new functionality to be available, certain changes must be made to C4.5, and a patch file is provided for this purpose. The patch, when applied to C4.5 Release 8, allows C4.5 to respect temporal order. TimeSleuth can also be used with standard (non-patched) C4.5, in which case it can still discover temporal rules; options such as "Add Time to Names" are meant to help the user in such situations. It can also be used simply as a graphical user interface for C4.5.
In TimeSleuth, time can flow forward, backward, or both, allowing the user to investigate the nature of relationships among attributes. For more information on discovering causality with TimeSleuth, see the related help topic.
C4.5 is a supervised machine learning tool for discovering rules from examples. The idea is to observe the values of a set of variables over a series of cases. The user decides that one of the variables (called the decision variable, or decision attribute) takes on different values based on the other variables (called the condition attributes). The values of all the variables are recorded, and C4.5 then tries to find rules for predicting the value of the decision attribute from the values of the condition attributes.
C4.5 is available for free. You can download the C source files for C4.5 Release 8 from http://www.cse.unsw.edu.au/~quinlan.
Example 1
Here is an example: Suppose we have two condition attributes x and y, and
the decision attribute z. The following cases have been recorded:
x | y | z
0 | 0 | 0
1 | 0 | 0
1 | 1 | 1
0 | 0 | 0
For C4.5, the last variable is the decision attribute. That is why z appears last. One rule in this example is: if {(x = 1) AND (y = 1)} then z = 1.
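To make the rule concrete, the following minimal Java sketch checks it against the four recorded cases. The class and method names (Example1Rule, predictZ, countCorrect) are hypothetical and are not part of TimeSleuth or C4.5; the sketch only illustrates what it means for the rule to fire on a record.

```java
import java.util.Arrays;
import java.util.List;

final class Example1Rule {
    // Hypothetical encoding of the rule "if x = 1 and y = 1 then z = 1".
    // Returns -1 when the rule does not fire for the given condition values.
    static int predictZ(int x, int y) {
        return (x == 1 && y == 1) ? 1 : -1;
    }

    // Counts the records {x, y, z} on which the rule fires and predicts z correctly.
    static int countCorrect(List<int[]> records) {
        int correct = 0;
        for (int[] r : records) {
            int predicted = predictZ(r[0], r[1]);
            if (predicted != -1 && predicted == r[2]) {
                correct++;
            }
        }
        return correct;
    }

    public static void main(String[] args) {
        // The four cases recorded in Example 1.
        List<int[]> records = Arrays.asList(
            new int[]{0, 0, 0}, new int[]{1, 0, 0},
            new int[]{1, 1, 1}, new int[]{0, 0, 0});
        System.out.println(countCorrect(records)); // prints 1
    }
}
```

The rule fires on only one of the four cases, which is the single case where z = 1.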
C4.5 works by first creating a decision tree, and then extracting rules from that decision tree. C4.5 thus has two main components: a program called c4.5, that generates decision trees, and a program called c4.5rules, which extracts the rules. For any given set of input files, c4.5 should be run first, and then the user can run c4.5rules.
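The two-step order described above can be sketched in Java as follows. This is only an illustration, not how TimeSleuth itself invokes C4.5: it assumes that the c4.5 and c4.5rules executables are on the PATH and that the input files share a common file stem passed with the -f option. The class name C45Pipeline and the stem "example" are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class C45Pipeline {
    // Builds the two commands in the required order: tree generation first, then rule extraction.
    // Assumption: both programs take the input file stem through "-f".
    static List<List<String>> commandsFor(String fileStem) {
        return Arrays.asList(
            Arrays.asList("c4.5", "-f", fileStem),
            Arrays.asList("c4.5rules", "-f", fileStem));
    }

    // Runs the commands one after the other and stops at the first failure.
    static void run(String fileStem) throws IOException, InterruptedException {
        for (List<String> command : commandsFor(fileStem)) {
            Process process = new ProcessBuilder(command).inheritIO().start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException(command.get(0) + " exited with code " + exitCode);
            }
        }
    }

    public static void main(String[] args) {
        // Only prints the commands; call run("example") to actually execute them.
        for (List<String> command : commandsFor("example")) {
            System.out.println(String.join(" ", command));
        }
    }
}
```

Stopping at the first non-zero exit code reflects the dependency between the two programs: c4.5rules needs the tree that c4.5 produced.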
C4.5 has to be modified for TimeSleuth to fully function. TimeSleuth does the following:
Example 2
Continuing with the previous example, suppose we suspect that z has a
temporal dependency, i.e., its value may depend on previous values of
x, y and z. Guessing that this temporal relationship exists over two time
steps (two observations of the variables), we can flatten the records
by merging every two consecutive records of the previous example
to form a new record.
x (t1) | y (t1) | z (t1) | x (t2) | y (t2) | z (t2)
0 | 0 | 0 | 1 | 0 | 0
1 | 0 | 0 | 1 | 1 | 1
1 | 1 | 1 | 0 | 1 | 0
Notice that we are now looking for different kinds of rules. C4.5 now looks for rules explaining the value of z at time step 2, which may be different from the rules for z in the previous example. For example, if the case <1 1 1> had been observed in the first row of Example 1, then the flattened records of Example 2 would not contain any rules for z = 1 at time step 2.
In Example 2, we used a Time Window of 2. This means that we assumed the temporal relations would be contained within two consecutive observations of the variables. TimeSleuth allows the user to test for the existence of temporal relations among a set of variables by flattening the input and invoking C4.5. The user can try different Time Windows for this purpose. C4.5's output rules then will respect any temporal order among the variables, as the variables will appear according to the order in which they appeared in the input.
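The flattening step can be sketched in Java as below. This is a minimal illustration and not TimeSleuth's own code: it assumes a sliding window that advances one record at a time (which matches four records yielding three flattened records for a Time Window of 2) and that all records have the same number of attributes. The class name Flattener is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Flattener {
    // Merges every 'window' consecutive records into one flattened record.
    // Assumption: the window slides forward by one record at a time.
    static List<int[]> flatten(List<int[]> records, int window) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be at least 1");
        }
        List<int[]> flattened = new ArrayList<>();
        for (int start = 0; start + window <= records.size(); start++) {
            int width = records.get(start).length;
            int[] merged = new int[width * window];
            for (int step = 0; step < window; step++) {
                // Copy the record observed at time step (step + 1) into its slot.
                System.arraycopy(records.get(start + step), 0, merged, step * width, width);
            }
            flattened.add(merged);
        }
        return flattened;
    }

    public static void main(String[] args) {
        List<int[]> records = Arrays.asList(
            new int[]{0, 0, 0}, new int[]{1, 0, 0},
            new int[]{1, 1, 1}, new int[]{0, 0, 0});
        List<int[]> flat = flatten(records, 2);
        System.out.println(Arrays.toString(flat.get(0))); // prints [0, 0, 0, 1, 0, 0]
    }
}
```

With a Time Window of 1 the records are returned unchanged, so trying several window sizes amounts to calling flatten with different values and running C4.5 on each result.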
It is not just the rules that respect the temporal order. C4.5 generates decision trees, from which rules are extracted. TimeSleuth's modifications force C4.5 to make sure that the variables in the tree also appear according to their temporal order. Reordering the variables in a rule does not change the rule, so C4.5's rules are not affected by this reordering. The tree generation algorithm, however, is heavily affected by the value of the Time Window. Any value above 1 may cause the tree to become smaller, and the error rate of the tree may increase. This is because after a variable from a certain time step has been chosen to expand the tree at a certain branch, no variable from any previous time step can be selected to continue the expansion in that branch. This affects the tree because there are now fewer attributes available.
For this reason, TimeSleuth offers the user two different Time Windows: one for c4.5 (tree generation) and the other for c4.5rules (rule generation). The user can choose a Time Window of 1 for tree generation, and a higher value for rule generation.
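The restriction on tree expansion can be illustrated with a small Java sketch. It is not the code of the C4.5 patch (which is written in C); it only models the stated constraint. It assumes that columns of a flattened record are numbered from 0, that each time step contributes the same number of columns, that the last column is the decision attribute, and that an attribute is not reused on the same branch. The class name TemporalOrder is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class TemporalOrder {
    // Zero-based time step that a flattened column belongs to.
    static int timeStepOf(int column, int columnsPerStep) {
        return column / columnsPerStep;
    }

    // Columns that may still be used to expand a branch, given the columns already chosen on it.
    static List<Integer> eligibleColumns(int columnsPerStep, int window, List<Integer> chosenOnBranch) {
        int decisionColumn = columnsPerStep * window - 1; // assumption: decision attribute is last
        int latestStep = 0;
        for (int column : chosenOnBranch) {
            latestStep = Math.max(latestStep, timeStepOf(column, columnsPerStep));
        }
        List<Integer> eligible = new ArrayList<>();
        for (int column = 0; column < decisionColumn; column++) {
            boolean notEarlier = timeStepOf(column, columnsPerStep) >= latestStep;
            if (notEarlier && !chosenOnBranch.contains(column)) {
                eligible.add(column);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        // Example 2 layout: columns 0..5 are x(t1), y(t1), z(t1), x(t2), y(t2), z(t2).
        System.out.println(eligibleColumns(3, 2, Collections.<Integer>emptyList())); // prints [0, 1, 2, 3, 4]
        System.out.println(eligibleColumns(3, 2, Arrays.asList(3)));                 // prints [4]
    }
}
```

Once x (t2) has been chosen on a branch, only y (t2) remains available there, which shows why larger Time Windows can leave the tree with fewer attributes to split on.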
The TimeSleuth application uses c4.5rules' Time Window when flattening the data. If you don't want to change c4.5's tree generation algorithm, leave its Time Window at 1 and only change c4.5rules' Time Window. Otherwise, set both of them to the same value.
TimeSleuth goes one step further by allowing the user to try every variable as the decision attribute. In this case TimeSleuth invokes C4.5 multiple times, each time with a different variable as the decision attribute, and reports the results to the user.
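One way to picture this is the Java sketch below, which prepares one data set per candidate decision attribute by moving that attribute to the last column, since C4.5 treats the last variable as the decision attribute. Whether TimeSleuth reorders columns in exactly this way is an assumption; the class name DecisionRotation and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DecisionRotation {
    // Returns a copy of the record with the chosen column moved to the last position.
    static int[] moveToLast(int[] record, int decisionIndex) {
        int[] reordered = new int[record.length];
        int next = 0;
        for (int i = 0; i < record.length; i++) {
            if (i != decisionIndex) {
                reordered[next++] = record[i];
            }
        }
        reordered[record.length - 1] = record[decisionIndex];
        return reordered;
    }

    // Builds one reordered data set per candidate decision attribute.
    static List<List<int[]>> allViews(List<int[]> records, int attributeCount) {
        List<List<int[]>> views = new ArrayList<>();
        for (int decision = 0; decision < attributeCount; decision++) {
            List<int[]> view = new ArrayList<>();
            for (int[] record : records) {
                view.add(moveToLast(record, decision));
            }
            views.add(view);
        }
        return views;
    }

    public static void main(String[] args) {
        List<int[]> records = Arrays.asList(
            new int[]{0, 0, 0}, new int[]{1, 0, 0},
            new int[]{1, 1, 1}, new int[]{0, 0, 0});
        // Each view would be written out and handed to C4.5 in turn.
        System.out.println(allViews(records, 3).size()); // prints 3
    }
}
```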
TimeSleuth can extract the possible values of an attribute from the data file, freeing the user from the need to specify them manually. Combined with the ability to try every variable as the decision attribute, this allows TimeSleuth to be used as an unsupervised learning tool.
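Extracting the possible values can be sketched in Java as follows. This is an illustration only: it assumes a data file with one case per line and comma-separated attribute values, and the class name ValueExtractor is hypothetical rather than part of TimeSleuth.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ValueExtractor {
    // Collects the distinct values seen in each column of comma-separated data lines.
    static List<Set<String>> distinctValues(List<String> dataLines) {
        List<Set<String>> values = new ArrayList<>();
        for (String line : dataLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split(",");
            for (int i = 0; i < fields.length; i++) {
                if (values.size() <= i) {
                    values.add(new TreeSet<String>()); // sorted, so output order is stable
                }
                values.get(i).add(fields[i].trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // The four cases of Example 1, written as comma-separated lines.
        List<String> lines = Arrays.asList("0,0,0", "1,0,0", "1,1,1", "0,0,0");
        System.out.println(distinctValues(lines).get(2)); // prints [0, 1]
    }
}
```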