"Reinforcement learning is the problem of getting an agent to act in the world so as to maximize its rewards. For example, consider teaching a dog a new trick: you cannot tell it what to do, but you can reward/punish it if it does the right/wrong thing. It has to figure out what it did that made it get the reward/punishment, which is known as the credit assignment problem."
1) A human builds a classifier based on input and output data
2) That classifier is trained with a training set of data
3) That classifier is tested with a test set of data
4) The classifier is deployed if the output is satisfactory
To be used when: "I know how to classify this data, I just need you (the classifier) to sort it."
Point of method: to assign class labels or to produce real numbers
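The supervised workflow above (steps 1-4) can be sketched in a few lines. This is a toy illustration, not a real library: the nearest-centroid classifier, the data, and the "low"/"high" labels are all invented for the example.

```python
def train(samples):
    """Step 2: fit the classifier -- one centroid (mean) per label."""
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

# Step 1: labelled input/output data supplied by a human.
train_set = [(1.0, "low"), (1.2, "low"), (8.9, "high"), (9.3, "high")]
test_set = [(1.1, "low"), (9.0, "high")]

centroids = train(train_set)

# Step 3: evaluate on held-out test data; step 4 (deployment) would
# follow only if this accuracy is satisfactory.
accuracy = sum(predict(centroids, x) == y for x, y in test_set) / len(test_set)
print(accuracy)  # 1.0 on this toy data
```

The same train/test split discipline applies whatever classifier replaces the toy one.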
1) A human builds an algorithm based on input data
2) That algorithm is tested with a test set of data (in which the algorithm creates the classifier)
3) The classifier is deployed if it is satisfactory
To be used when: "I have no idea how to classify this data, can you (the algorithm) create a classifier for me?"
Point of method: to discover class labels or to predict values
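The unsupervised workflow above can be sketched with a minimal 1-D k-means: no labels are supplied, and the algorithm itself discovers a grouping in the input data. The data points, starting centroids, and choice of k=2 are invented for the example.

```python
def kmeans(points, centroids, iters=10):
    """Repeatedly assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    for _ in range(iters):
        clusters = {c: [] for c in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        centroids = [sum(clusters[c]) / len(clusters[c]) if clusters[c]
                     else centroids[c]
                     for c in range(len(centroids))]
    return centroids

data = [1.0, 1.2, 0.9, 8.8, 9.1, 9.4]   # unlabelled input data
print(kmeans(data, [0.0, 10.0]))        # two centroids, near 1.03 and 9.1
```

The algorithm was never told which points belong together; the grouping emerges from the data alone, which is the point of the method.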
1) A human builds an algorithm based on input data
2) The algorithm presents a state derived from the input data, and a user rewards or punishes it for the action it took; this continues over time
3) The algorithm learns from the reward/punishment and updates itself; this continues
4) It is always in production: it needs to learn from real data to be able to choose actions from states
To be used when: "I have no idea how to classify this data, can you classify this data and I'll give you a reward if it's correct or punish you if it's not."
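The reward/punish loop above can be sketched with a tabular update rule on a two-action problem: the environment (standing in for the human) gives +1 for one action and -1 for the other, and the agent updates its value estimates after each outcome. All numbers here (learning rate, exploration rate, rewards) are invented for the illustration.

```python
import random

random.seed(0)
q = [0.0, 0.0]   # value estimate for each action
alpha = 0.1      # learning rate
epsilon = 0.2    # exploration rate

for _ in range(500):
    # Choose an action: mostly greedy, sometimes exploratory.
    if random.random() < epsilon:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: q[a])
    # The environment (or a human) rewards/punishes the chosen action.
    reward = 1.0 if action == 1 else -1.0
    # Update: move the estimate for that action toward the observed reward.
    q[action] += alpha * (reward - q[action])

print(max(range(2), key=lambda a: q[a]))  # the agent learns to prefer action 1
```

Note the loop never terminates with a finished "classifier": the agent is always in production, continually refining its action values from the rewards it receives, which mirrors step 4 above.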
An RL agent may include one or more of these components: a policy (the agent's behaviour function), a value function (how good each state and/or action is), and a model (the agent's representation of the environment).
Two fundamental problems in sequential decision making: learning (the environment is initially unknown, and the agent improves its policy by interacting with it) and planning (a model of the environment is known, and the agent improves its policy by computing with that model).