A Brief Introduction to TimeSleuth:

TimeSleuth is a tool that uses C4.5 to generate temporal rules from sequential data. It can also judge the causality/acausality of a set of rules. The package includes the source code, which is written in Java.

You can find papers about TimeSleuth here.

The package also contains a patch file for modifying C4.5 Release 8's source code. Both c4.5.exe (tree generator) and c4.5rules.exe (rule generator) are modified. TimeSleuth needs both programs. The patch file also enables C4.5 to output rules in Prolog. Try c4.5rules.exe with the new "-p 0" or "-p 1" options.

The complete source code for C4.5 can be found on Ross Quinlan's homepage.

To download TimeSleuth please click here and select the download tab. The TimeSleuth package contains information about the system, as well as examples and the complete source code.

Kamran Karimi can be contacted at kamkar@users.sourceforge.net


TimeSleuth in More Detail:


TimeSleuth is a tool for learning rules from observed data. It can also be used to test the causality/acausality of the rules. TimeSleuth is based on C4.5 and expands its capabilities in important ways, including analysis abilities that allow the user to investigate the relations among the attributes. This makes TimeSleuth a data mining tool (it finds relationships that are meant to be interpreted by experts) as well as a machine learning tool (it produces exact rules to be used automatically). TimeSleuth can output Prolog statements, thus making its output automatically executable.

TimeSleuth is part of a data mining and machine learning project in the Department of Computer Science at the University of Regina. It is in Java. TimeSleuth can be compiled and executed on any platform with Java 1.5 or higher. You need C4.5 in order to run TimeSleuth. TimeSleuth requires certain changes to be made to C4.5 for the new functionalities to be available. A patch file is provided for this purpose. The patch, when applied to C4.5 Release 8, allows C4.5 to respect temporal order. However, TimeSleuth can be used with standard (non-patched) C4.5. In this case TimeSleuth can still be used to discover temporal rules. Options such as "Add Time to Names" are meant to help the user in such situations. It can also simply be used as a graphical user interface for C4.5.

In TimeSleuth, time can flow forward, backward, or both, allowing the user to investigate the nature of the relationships among the attributes. For more information on discovering causality with TimeSleuth, see the related help topic.

C4.5 is a supervised machine learning tool for discovering rules from examples. The idea is to observe the values of a series of variables. The user then decides that one of the variables (called the decision variable, or decision attribute) takes on different values based on the other variables (called the condition attributes). The values of all the variables are recorded, and C4.5 tries to find rules for predicting the value of the decision attribute from the values of the condition attributes.

C4.5 is available for free. You can download the C source files for C4.5 Release 8 from http://www.cse.unsw.edu.au/~quinlan.

Example 1
Here is an example: Suppose we have two condition attributes x and y, and the decision attribute z. The following cases have been recorded:

x    y    z
0    0    0
1    0    0
1    1    1
0    0    0

For C4.5, the last variable is the decision attribute. That is why z appears last. One rule in this example is: if {(x = 1) AND (y = 1)} then z = 1.
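To make the rule concrete, here is a minimal Java sketch that checks it against the four recorded cases. The class and method names are hypothetical and exist only for illustration; this is not how C4.5 itself stores or evaluates rules.

```java
// Hypothetical illustration class; it is not part of C4.5 or TimeSleuth.
final class Example1RuleCheck {

    // The four recorded cases from Example 1, each stored as {x, y, z}.
    static final int[][] CASES = {
        {0, 0, 0},
        {1, 0, 0},
        {1, 1, 1},
        {0, 0, 0},
    };

    // Condition part of the rule: (x = 1) AND (y = 1).
    static boolean conditionHolds(int[] c) {
        return c[0] == 1 && c[1] == 1;
    }

    // Number of cases whose condition attributes match the rule.
    static int covered() {
        int n = 0;
        for (int[] c : CASES) {
            if (conditionHolds(c)) {
                n++;
            }
        }
        return n;
    }

    // Number of covered cases where z is not 1, i.e. where the rule is wrong.
    static int violated() {
        int n = 0;
        for (int[] c : CASES) {
            if (conditionHolds(c) && c[2] != 1) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("covered=" + covered() + ", violated=" + violated());
    }
}
```

On the four cases above, the rule covers one case and is contradicted by none.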

C4.5 works by first creating a decision tree, and then extracting rules from that decision tree. C4.5 thus has two main components: a program called c4.5, which generates decision trees, and a program called c4.5rules, which extracts the rules. For any given set of input files, c4.5 should be run first, and then the user can run c4.5rules.
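The run order can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name is hypothetical, the executables are assumed to be on the PATH under the names c4.5 and c4.5rules (on Windows they would be c4.5.exe and c4.5rules.exe), and only the standard "-f" file stem option is used. It is not TimeSleuth's own invocation code, and inheritIO requires Java 7 or later.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; TimeSleuth's actual invocation code may differ.
final class C45Pipeline {

    // Command lines in the required order: tree generator first, then rule generator.
    // Assumes both programs are on the PATH and read their input via "-f <file stem>".
    static List<List<String>> commands(String fileStem) {
        return Arrays.asList(
            Arrays.asList("c4.5", "-f", fileStem),
            Arrays.asList("c4.5rules", "-f", fileStem));
    }

    // Runs the two programs one after the other, stopping at the first failure.
    static void run(String fileStem) throws IOException, InterruptedException {
        for (List<String> command : commands(fileStem)) {
            Process process = new ProcessBuilder(command).inheritIO().start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException(command.get(0) + " exited with code " + exitCode);
            }
        }
    }

    public static void main(String[] args) {
        // Only prints the commands; call run("example") to actually execute them.
        for (List<String> command : commands("example")) {
            StringBuilder line = new StringBuilder();
            for (String part : command) {
                if (line.length() > 0) {
                    line.append(' ');
                }
                line.append(part);
            }
            System.out.println(line);
        }
    }
}
```

Here "example" stands for the file stem shared by the input files of one data set.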

C4.5 has to be modified for TimeSleuth to fully function. TimeSleuth provides the following features:

  • Allowing C4.5 to understand the progression of time among the attributes.
  • This is done by dividing the attributes into time steps. The attributes in different time steps are assumed to have occurred at different times. The assumption is that the variables are the same in different time steps. An example clarifies this:

    Example 2
    Continuing with the previous example, suppose we suspect that z has a temporal dependency, i.e., its value may depend on previous values of x, y and z. Guessing that this temporal relationship exists over two time steps (two observations of the variables), we can flatten the records of the previous example by merging every two consecutive records into a new record.

    x (t1)   y (t1)   z (t1)   x (t2)   y (t2)   z (t2)
    0        0        0        1        0        0
    1        0        0        1        1        1
    1        1        1        0        0        0

    Notice that we are now looking for different kinds of rules. C4.5 now looks for rules explaining the value of z at time step 2, which may be different from the rules for z in the previous example. For example, if the case <1 1 1> had been observed in the first row of Example 1, then the flattened records of Example 2 would not contain any rules for z = 1 at time step 2.


    In Example 2, we used a Time Window of 2. This means that we assumed the temporal relations would be contained within two consecutive observations of the variables. TimeSleuth allows the user to test for the existence of temporal relations among a set of variables by flattening the input and invoking C4.5. The user can try different Time Windows for this purpose. C4.5's output rules then will respect any temporal order among the variables, as the variables will appear according to the order in which they appeared in the input.


    It is not just the rules that respect the temporal order. C4.5 generates decision trees, from which rules are extracted. TimeSleuth's modifications force C4.5 to make sure that the variables in the tree also appear according to their temporal order. Reordering the variables in a rule does not change the rule, so C4.5's rules are not affected by the choice of the Time Window. The tree generation algorithm, however, is heavily affected by the value of the Time Window. Any value above 1 may cause the tree to become smaller, and the error rate of the tree may increase. This is because after a variable from a certain time step has been chosen to expand the tree at a certain branch, no variable from any previous time step can be selected to continue the expansion in that branch, so fewer attributes are available. For this reason, TimeSleuth offers the user two different Time Windows: one for c4.5 (tree generation) and the other for c4.5rules (rule generation). The user can choose a Time Window of 1 for tree generation, and a higher value for rule generation.


    The TimeSleuth application uses c4.5rules' Time Window when flattening the data. If you don't want to change c4.5's tree generation algorithm, leave its Time Window at 1 and only change c4.5rules' Time Window. Otherwise, set both of them to the same value.

  • Discrimination Among Causal and Acausal Temporal Relations.
  • TimeSleuth allows the user to assume both forward and backward directions for time. The flattened data in each direction is used to generate rules, and based on the quality of the rules in each direction, the relationship represented by the rules can be branded as either causal or acausal.

  • Unsupervised Investigation of the Relationships Among the Attributes.
  • In C4.5 the user has to differentiate between the decision attribute and the condition attributes. The program then tries to find rules to predict the value of the decision attribute. C4.5 expects the value of the decision attribute to appear last in every line of observed values, which means that a user who wants to set another variable as the decision attribute has to reformat the input file. TimeSleuth does this automatically, so the user is free to try different hypotheses.

    TimeSleuth goes one step further by allowing the user to try every variable as the decision attribute. In this case TimeSleuth invokes C4.5 multiple times, each time with a different variable as the decision attribute, and reports the results to the user.


    TimeSleuth can extract the possible values of an attribute from the data file, freeing the user from the need to specify them manually. TimeSleuth is thus an unsupervised learning tool.

  • Converting decision rules into Prolog statements.
  • C4.5, when modified by the included patch, can generate Prolog statements. Such statements can then be run automatically by a Prolog interpreter.

  • Screening output rules based on confidence level.
  • C4.5 assigns confidence levels to its output rules. The confidence level is a measure of how reliable a rule is. TimeSleuth allows the user to specify a minimum confidence level, so that C4.5 will not output rules whose confidence level is lower.

  • Tabular presentation of the output for analysis.
  • TimeSleuth allows the user to see the output rules and the way the attributes are used in tabular form. This form of presentation helps make C4.5 more useful as a data mining tool.

  • Acting as a graphical user interface for C4.5.
  • TimeSleuth can be used to interact with C4.5 in a graphical fashion, even if the user is not interested in the special features that are provided by TimeSleuth.
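The flattening step described under Example 2 can be sketched in Java as follows. This is a minimal illustration, not TimeSleuth's actual source: the class and method names are hypothetical, records are assumed to hold integer values and to have equal length, and the window is assumed to slide forward one record at a time, which is consistent with the three flattened records shown in Example 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of flattening with a Time Window; not TimeSleuth's own code.
final class FlattenSketch {

    // Merges every `window` consecutive records into one flattened record.
    // Assumes all records have the same number of attributes.
    static List<int[]> flatten(int[][] records, int window) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be at least 1");
        }
        List<int[]> flattened = new ArrayList<int[]>();
        for (int start = 0; start + window <= records.length; start++) {
            int width = records[start].length;
            int[] merged = new int[window * width];
            // Copy time step 1, then time step 2, and so on, in temporal order.
            for (int step = 0; step < window; step++) {
                System.arraycopy(records[start + step], 0, merged, step * width, width);
            }
            flattened.add(merged);
        }
        return flattened;
    }

    public static void main(String[] args) {
        // The four cases from Example 1, each stored as {x, y, z}.
        int[][] example1 = {
            {0, 0, 0},
            {1, 0, 0},
            {1, 1, 1},
            {0, 0, 0},
        };
        for (int[] record : flatten(example1, 2)) {
            System.out.println(Arrays.toString(record));
        }
    }
}
```

With a Time Window of 2, the four cases of Example 1 yield the three flattened records of Example 2. With a Time Window of 1, the records are returned unchanged.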