If you have been following the content on this blog, you will have noticed that the general theme of my projects has been using Machine Learning (ML) or Deep Learning (DL) as a tool to solve interesting problems. Through this process I became interested in learning algorithms themselves, and started asking questions about them: how do they work, and could they be made more efficient?
I started working with philosopher John Morrison to analyze learning algorithms: how they finetune to various tasks, what determines their performance, and so on.
Through this project we reached a conclusion that may seem intuitive, but that only makes it cooler!
So, what is the intuition?
The task we are modelling is simple. I want to see whether networks that were trained in a particular way finetune better to some tasks than to others, even when the underlying mathematics of the different finetuning tasks is similar. Biological systems do this, but it is not traditionally expected of machines.
For example, suppose there are two approaches, (a) and (b), to learning a math concept, and you learn only one of them, say (a).
The hypothesis is that you will find it easier to solve a new, related task if it calls for an approach similar to the one you learnt, (a). If instead the new task resembled approach (b), you would likely find it harder to tackle.
We model this scenario using Neural Networks and analyze the results.
Let's talk Neural Networks:
Let's say we want a deep learning model that learns the mapping of:
We build two neural networks, each of which learns this mapping differently:
(1) takes two inputs and learns a product mapping
(2) takes two inputs and learns a summation mapping
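The mapping itself is not reproduced in the text above. One reading is consistent with the coefficient counts discussed later: (x+1)*(x+1) = (x^2) + (2x+1), where model (1) is fed the two factors and model (2) is fed the two summands. The following sketch shows how the training data for the two models might be built under that assumption. The function name `make_training_data`, the input range and the sample count are hypothetical choices, not details from the original project.

```python
import numpy as np

def make_training_data(n_samples=256, low=-2.0, high=2.0, seed=0):
    """Hypothetical data for the ASSUMED mapping y = (x+1)*(x+1) = x^2 + (2x+1)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(low, high, size=n_samples)
    # Model (1): the two inputs are the factors of the product form.
    inputs_product = np.stack([x + 1, x + 1], axis=1)
    # Model (2): the two inputs are the terms of the summation form.
    inputs_sum = np.stack([x ** 2, 2 * x + 1], axis=1)
    # Both models are trained to predict the same target.
    y = (x + 1) ** 2
    return inputs_product, inputs_sum, y

inputs_product, inputs_sum, y = make_training_data()
```

The design point is that both models see the same target. Only the way the two inputs decompose that target differs.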
Finetuning Analytics:
Now that these models have been trained, let's say we want to finetune them to some other tasks.
Specifically, let's say this new task is (x+1)*(x+2) = (x^2) + (3x+2).
Note that the LHS resembles the learning approach of model (1), and the RHS resembles that of model (2).
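Both sides of the new task are the same polynomial; only the form differs. The check below represents polynomials as coefficient lists with the lowest degree first. The helpers `poly_mul` and `poly_add` are illustrative and not taken from the original project.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(a, b):
    """Add two coefficient lists (lowest degree first), padding the shorter one."""
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return [ai + bi for ai, bi in zip(a, b)]

# LHS (product form): (x + 1) * (x + 2)
lhs = poly_mul([1, 1], [2, 1])
# RHS (summation form): (x^2) + (3x + 2)
rhs = poly_add([0, 0, 1], [2, 3])
assert lhs == rhs  # both are 2 + 3x + x^2
```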
Hypothesis:
Is model (1) likely to finetune better to the task (x+1)*(x+2) than model (2) finetunes to (x^2) + (3x+2)?
The underlined terms denote the coefficients that differ from the mappings models (1) and (2) were trained on. The product relation is "closer" to model (1) because only one of its coefficients is different. The summation relation is "further" from model (2) because two of its coefficients are different.
The loss plots below support this: the yellow line shows lower loss than the blue line for almost the entire training run.
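The post does not describe the architecture or training setup behind these plots. As a rough sketch only, here is one way to run the pretrain-then-finetune comparison with a tiny tanh network in NumPy. It again assumes the original mapping (x+1)*(x+1) = (x^2) + (2x+1). The layer size, learning rate, step count and helper names (`init_mlp`, `forward`, `train`) are all hypothetical. With choices this small, the ordering of the two curves is not guaranteed to match the plots.

```python
import numpy as np

def init_mlp(n_hidden=32, seed=0):
    """Hypothetical two-input network: 2 -> n_hidden (tanh) -> 1."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.3, size=(2, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.3, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, X):
    """Return hidden activations and predictions for inputs X of shape (n, 2)."""
    h = np.tanh(X @ params["W1"] + params["b1"])
    return h, (h @ params["W2"] + params["b2"]).ravel()

def train(params, X, y, steps=500, lr=0.005):
    """Full-batch gradient descent on mean squared error; returns per-step loss."""
    losses = []
    n = len(y)
    for _ in range(steps):
        h, pred = forward(params, X)
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the MSE loss through the output layer and the tanh layer.
        d_out = (2.0 / n) * err[:, None]                 # shape (n, 1)
        d_h = (d_out @ params["W2"].T) * (1.0 - h ** 2)  # shape (n, n_hidden)
        grads = {
            "W2": h.T @ d_out,
            "b2": d_out.sum(axis=0),
            "W1": X.T @ d_h,
            "b1": d_h.sum(axis=0),
        }
        # All gradients are computed before any parameter is updated.
        for key, grad in grads.items():
            params[key] -= lr * grad
    return losses

rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=256)

# Model (1): pretrain on the factors of (x+1)*(x+1), then finetune on (x+1)*(x+2).
model_1 = init_mlp(seed=0)
train(model_1, np.stack([x + 1, x + 1], axis=1), (x + 1) ** 2)
loss_1 = train(model_1, np.stack([x + 1, x + 2], axis=1), (x + 1) * (x + 2))

# Model (2): pretrain on the terms of x^2 + (2x+1), then finetune on x^2 + (3x+2).
model_2 = init_mlp(seed=0)
train(model_2, np.stack([x ** 2, 2 * x + 1], axis=1), (x + 1) ** 2)
loss_2 = train(model_2, np.stack([x ** 2, 3 * x + 2], axis=1), (x + 1) * (x + 2))
```

Plotting `loss_1` and `loss_2` against the step index gives two finetuning loss curves of the kind discussed above. Both models start from the same seed, so the only difference between them is how the inputs decompose the target.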
Similarly, let's say we want to finetune the models to another task: (x+2)*(x+0.5) = (x^2) + (2.5x+1).
Again, the LHS resembles the learning approach of model (1), and the RHS resembles that of model (2).
But now the roles are reversed:
Hypothesis:
Is model (2) likely to finetune better to the task (x^2) + (2.5x+1) than model (1) finetunes to (x+2)*(x+0.5)?
The underlined terms denote the coefficients that differ from the mappings models (1) and (2) were trained on. The summation relation is "closer" to model (2) because only one of its coefficients is different. The product relation is "further" from model (1) because two of its coefficients are different.
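In both experiments, the "closer"/"further" argument comes down to counting changed coefficients. Here is a small sketch of that count. It assumes the trained mapping is (x+1)*(x+1) = (x^2) + (2x+1) and treats the order of the factors as fixed. The name `count_changed` and the tuple encoding are hypothetical.

```python
def count_changed(trained, new):
    """Number of positions where two equal-length coefficient tuples differ."""
    return sum(1 for a, b in zip(trained, new) if a != b)

# ASSUMED training mapping: (x + 1) * (x + 1) = (x^2) + (2x + 1).
TRAINED_PRODUCT = (1, 1)  # constants c1, c2 in (x + c1) * (x + c2)
TRAINED_SUM = (2, 1)      # coefficients b, c in the summand (b*x + c) added to x^2

tasks = {
    "task 1": {"product": (1, 2), "sum": (3, 2)},      # (x+1)*(x+2) = x^2 + (3x+2)
    "task 2": {"product": (2, 0.5), "sum": (2.5, 1)},  # (x+2)*(x+0.5) = x^2 + (2.5x+1)
}

# For each task: (distance of the product form, distance of the summation form).
distances = {
    name: (count_changed(TRAINED_PRODUCT, forms["product"]),
           count_changed(TRAINED_SUM, forms["sum"]))
    for name, forms in tasks.items()
}
print(distances)  # {'task 1': (1, 2), 'task 2': (2, 1)}
```

The counts flip between the two tasks, which is the crossover the two hypotheses are built around.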
The loss plots below support this: the yellow line shows lower loss than the blue line for almost the entire training run.
Conclusion: While these results may seem intuitive, I think they offer interesting insight into the properties of learning. The way these learning algorithms follow simple ordinal rules when performing math tasks could provide insight into inference patterns in the brain! If you know of other similar ordinal-rule-based exercises or psychology experiments that could give cool insights into the brain, I would be happy to hear about them.