We submitted the following files to Phase I of the PTC for 2000-2001:

    class_blind_0.10_fm_fragment_table
    class_blind_0.01_fm_fragment_table
    class_sens_fm_fragment_table
    class_blind_0.10_mm_fragment_table
    class_blind_0.01_mm_fragment_table
    class_sens_mm_fragment_table
    class_blind_0.10_fr_fragment_table
    class_blind_0.01_fr_fragment_table
    class_sens_fr_fragment_table
    class_blind_0.10_mr_fragment_table
    class_blind_0.01_mr_fragment_table
    class_sens_mr_fragment_table

These files contain tables of frequent linear molecular fragments. Their content should be largely self-explanatory. "class blind" and "class sensitive" ("class_sens" in the file names) refer to the way the features were constructed. "fm" means female mice, "mm" means male mice, "fr" means female rats and "mr" means male rats.

For the class-blind runs, we set the minimum frequency of the fragments to 10% and the maximum frequency to 90%. In a second attempt, we constructed less general features/fragments by setting the minimum frequency threshold to 1% and the maximum frequency threshold to 99%. The files containing "0.10" in their names are those for 10%/90%, and the files containing "0.01" are those for 1%/99%.

For the class-sensitive runs, we queried for features/fragments that are significantly over-represented in the active compounds and under-represented in the inactives. The minimum frequency on the actives was set to 6, 10, 15 and 20 in turn. The maximum frequency on the inactives was set depending on the class distribution, such that the occurrence of a fragment in the active class is significant at the 0.999 level (that is, p < 0.001) according to the chi-square statistic. Only very few linear fragments turned out to be strongly statistically associated with class membership. Lowering the significance level might have yielded more of these fragments.

Two notes on the data used: First, we took only examples with known and valid SMILES strings for feature construction. Second, we generally ignored the equivocals in feature construction.
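The way the maximum frequency on the inactives follows from the minimum frequency on the actives and the class distribution can be illustrated with a small sketch. This is not the code used for the submission. It assumes a plain Pearson chi-square test on a 2x2 table (fragment present/absent versus active/inactive) without continuity correction, and the function names and the class sizes of 100 actives and 100 inactives are made up for illustration. The constant 10.828 is the standard chi-square critical value for one degree of freedom at p = 0.001.

```python
# Illustrative sketch only: hypothetical helper names, invented class sizes.
# Assumes a Pearson chi-square test without continuity correction.

CHI2_CRITICAL_0_999 = 10.828  # critical value, 1 degree of freedom, p = 0.001


def chi_square_2x2(act_with, inact_with, n_act, n_inact):
    """Pearson chi-square for the table (fragment present/absent) x (active/inactive)."""
    act_without = n_act - act_with
    inact_without = n_inact - inact_with
    with_frag = act_with + inact_with
    without_frag = act_without + inact_without
    denom = n_act * n_inact * with_frag * without_frag
    if denom == 0:
        # Degenerate table (an empty row or column): no association measurable.
        return 0.0
    diff = act_with * inact_without - act_without * inact_with
    return (n_act + n_inact) * diff * diff / denom


def max_inactive_count(min_active, n_act, n_inact, critical=CHI2_CRITICAL_0_999):
    """Largest number of inactives that may contain a fragment occurring in
    min_active actives while the association stays significant.
    Returns None if even zero occurrences in the inactives is not enough."""
    best = None
    for inact_with in range(n_inact + 1):
        # Require over-representation in the actives (compare relative frequencies).
        over_represented = min_active * n_inact > inact_with * n_act
        chi2 = chi_square_2x2(min_active, inact_with, n_act, n_inact)
        if over_represented and chi2 >= critical:
            best = inact_with
    return best


if __name__ == "__main__":
    # Minimum frequencies on the actives as listed above; class sizes are invented.
    for min_active in (6, 10, 15, 20):
        print(min_active, max_inactive_count(min_active, n_act=100, n_inact=100))
```

Under these assumptions, a small minimum frequency on the actives can leave no admissible maximum on the inactives at all, which is why the helper returns None in that case rather than zero.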
For class-sensitive feature construction, compounds with "some evidence" (se) were counted as positive (carcinogenic).

The details are described in the following papers.

Linear molecular fragments were introduced in Inductive Logic Programming and Data Mining in:

    Kramer S., Frank E.: Bottom-Up Propositionalization, in: Notes on the Work-In-Progress Track of the Tenth International Conference on Inductive Logic Programming (ILP-2000), 2000.

The integration into the inductive database framework (querying for features of interest) is described in:

    De Raedt L., Kramer S.: The Level-Wise Version Space Algorithm and its Application to Molecular Fragment Finding, to appear in: Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), 2001.

The use of fragments in conjunction with Support Vector Machines is proposed in:

    Kramer S., Frank E., Helma C.: Fragment Generation and Support Vector Machines for Inducing SARs, to appear in: SAR and QSAR in Environmental Research, 2001.

The use of MolFea, the molecular feature miner (resulting from the integration of linear molecular fragments and the inductive database framework of RDM), for feature construction and feature mining is documented in the following two papers:

    Kramer S., De Raedt L.: Feature Construction with Version Spaces for Biochemical Applications, to appear in: Proceedings of the Eighteenth International Conference on Machine Learning (ICML-2001), 2001.

    Kramer S., De Raedt L., Helma C.: Molecular Feature Mining in HIV Data, submitted, 2001.

If you are interested in more details about this submission or any of these papers, do not hesitate to send me an email.

Stefan Kramer
Institute for Computer Science
Machine Learning Lab
Albert-Ludwigs-University Freiburg, Germany
http://www.informatik.uni-freiburg.de/~skramer
skramer@informatik.uni-freiburg.de