This thesis is concerned with the prediction of protein mutations using artificial neural networks. From the biological perspective, it is of interest to investigate whether it is possible to find rules of mutation between evolutionarily adjacent (or closely related) proteins. Techniques from computer science, namely artificial neural networks, are used to see whether such mutations can be predicted. From the computer science perspective, the aim would be to optimize the results of the neural networks. However, the focus of this thesis is primarily on the biological perspective, and the performance of the computer science methods is a secondary objective; that is, the primary interest is to show the existence of rules for protein mutations.
The method used in this thesis consists of two neural networks. One network is used to predict the actual protein mutations, and the other is used to produce a compressed representation of each amino acid. By using a compression network it is possible to make the prediction network much smaller (each amino acid is represented by 3 nodes instead of 22 nodes). The compression network is an auto-associative network and the prediction network is a standard feed-forward network. The prediction network predicts a block of amino acids at a time; for comparison, a sliding window technique has also been tested.
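The compression idea can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the weights are random and untrained, the sigmoid activation, the helper names and the four-residue block are hypothetical, and the 22-symbol one-hot input is an assumption about how the 22 nodes are used. It only shows how a 22-3-22 auto-associative network yields a 3-node code per amino acid, and how that reduces the input size of the prediction network.

```python
import math
import random

N_SYMBOLS = 22  # nodes per amino acid without compression (from the text)
N_CODE = 3      # nodes per amino acid with compression (from the text)


def one_hot(index, size=N_SYMBOLS):
    """Assumed input encoding: 1.0 at position `index`, 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected sigmoid layer; weights[j] feeds output node j."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Untrained layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
# Auto-associative network 22 -> 3 -> 22; training is omitted here.
enc_w, enc_b = random_layer(N_SYMBOLS, N_CODE, rng)
dec_w, dec_b = random_layer(N_CODE, N_SYMBOLS, rng)


def encode(index):
    """Hidden-layer activations, used as the compressed amino acid code."""
    return dense(one_hot(index), enc_w, enc_b)


def reconstruct(index):
    """Full pass through the network; training would push this to one_hot."""
    return dense(encode(index), dec_w, dec_b)


def encode_block(indices):
    """Concatenate the 3-node codes of a block as prediction-network input."""
    return [value for i in indices for value in encode(i)]


block = [0, 5, 17, 21]            # hypothetical block of four residues
compressed = encode_block(block)
print(len(compressed))            # 12 input nodes with compression
print(len(block) * N_SYMBOLS)     # 88 input nodes without compression
```

In this sketch a block of four residues needs 12 input nodes instead of 88, which is the reduction in prediction network size that the compression network is meant to provide.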
It is my belief that the results in this thesis indicate that there exist rules for protein mutations. However, the tests in this thesis have been performed on only a small portion of all proteins. Some of the protein families tested show very good results, while the results for other families are weaker. I believe that extended work using optimized neural networks would improve the predictions further.