Supplementary data
This page contains supplementary data for the paper:
Jason Weston, Christina Leslie, Dengyong Zhou, Andre Elisseeff and William Stafford Noble. "Semi-Supervised Protein Classification using Cluster Kernels." (Extended version). Short version submitted to NIPS 2003.
The full text of the paper is available from the link above. The following data are provided here:
- ROC-50 scores for all families and all detection methods from the paper in plain text format.
- ROC scores for all families and all detection methods from the paper in plain text format.
- Plain text table specifying the positive and negative training and test sets for each family. Each row is one sequence, and each column is one family. Entries are coded as 0 = not present; 1 = positive train; 2 = negative train; 3 = positive test; 4 = negative test. The same file is also available with no headers.
- Summary of the data splits, giving the numbers of positive and negative training and test examples and the amount of unlabeled data for each family.
- Names of the SCOP families.
- Sequence file in FASTA format containing all sequences in SCOP version 1.59 with less than 95% identity.
- 7329x7329 kernel matrices for the methods used in the experiments (the sequence IDs for the rows and columns are also provided):
  - BLAST matrix, ASCII text file, gzipped (49 MB).
  - PSI-BLAST matrix using the complete 7329 examples as a database, ASCII text file, gzipped (52 MB).
  - Spectrum Mismatch Kernel, k=5, m=1, ASCII text file, gzipped (79 MB).
- The Spider software used in the experiments, a Matlab-based library of machine learning tools.
- Matlab scripts to run the semi-supervised experiments (using the Spider software).
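
As an illustration of the split coding used in the training and test set table (0 = not present; 1 = positive train; 2 = negative train; 3 = positive test; 4 = negative test), the following is a minimal Python sketch that groups sequence rows by role for one family. It is not part of the distributed scripts. It assumes the headerless version of the table with whitespace-separated integer codes, which may not match the actual file layout; the function name split_indices, the group names, and the tiny example table are hypothetical.

```python
# Code-to-role mapping taken from the description of the split table.
# The role names themselves are made up for this sketch.
SPLIT_CODES = {
    0: "not_present",
    1: "positive_train",
    2: "negative_train",
    3: "positive_test",
    4: "negative_test",
}


def split_indices(lines, family_col):
    """Group 0-based row (sequence) indices by split role for one family.

    lines: iterable of text rows, one sequence per row.
    family_col: 0-based column index of the family of interest.
    Assumes whitespace-separated integer codes and no header row.
    """
    groups = {name: [] for name in SPLIT_CODES.values()}
    row = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines; they do not count as sequences
        code = int(fields[family_col])
        groups[SPLIT_CODES[code]].append(row)
        row += 1
    return groups


# Made-up table: 5 sequences (rows) by 2 families (columns).
example = [
    "1 0",
    "2 3",
    "3 4",
    "4 1",
    "0 2",
]
family0 = split_indices(example, 0)
family1 = split_indices(example, 1)
```

Under the same assumptions, the row indices returned for a family could be used to select the matching rows and columns of one of the 7329x7329 kernel matrices, provided the matrix rows follow the same sequence order as the table.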