Essay: Reinforcement learning vs supervised learning

Reinforcement learning requires a learning agent to learn from its environment rather than being told what to do. The agent learns how to map situations to actions in order to increase its chances of obtaining the targeted rewards (Sutton & Barto 15). As a result, the agent has to rely on past experience as well as explore new choices. With no stipulated guidelines and no certainty of matching the right actions, reinforcement learning is essentially trial-and-error learning. A numerical reward serves as a reinforcement signal that encourages the agent to keep matching the expected outcomes. The agent must therefore learn to select the actions that maximize the accumulated reward. This paper addresses the question of whether reinforcement learning is more effective than supervised learning. It supports the opinion that all learning is a result of one’s reinforcement history.
Brief literature review
Reinforcement learning places more emphasis on the learning problem than on learning methods (Whiteson 20). Learning agents are guided by the problems they have encountered in past experience and try to avoid them. This makes them more sensitive to the environment from which they draw important lessons. As a result, the learning agent works hard to relate the state of the environment to the goals it intends to achieve. How well the learner’s actions are favored by the state of the environment therefore largely determines the agent’s chances of obtaining the numerical rewards.
The most beneficial learning requires a learning agent to learn from its own experience rather than from an external knowledgeable supervisor (Sutton & Barto 17). Reinforcement learning supports this kind of learning far better than supervised learning does. For instance, a gazelle calf finds it hard to stand on its own feet immediately after birth. However, to adapt to a harsh environment full of predators such as lions, the calf must learn to run fast to increase its chances of survival. Such learning requires the calf’s own experience rather than guidance under its mother’s supervision.
Reinforcement learning systems often build models of the environment in order to understand its behavior. Given a state and an action, the model predicts the resulting next state and next reward (Porr & Woergoetter 1). Models make it easier for the learning agent to anticipate possible future situations and the rewards associated with them. As a result, reinforcement learning becomes more effective, because the learner becomes aware of which actions have a high probability of attracting a reward without being guided.
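The idea of a model that predicts the next state and reward can be illustrated with a minimal Python sketch. It is not taken from the cited sources: the class `TabularModel`, the helper `best_known_action`, and the state and action names are hypothetical, and the sketch assumes a deterministic environment in which the last observed outcome of an action is a fair prediction of the next one.

```python
class TabularModel:
    """Hypothetical one-step model of the environment (an illustrative assumption)."""

    def __init__(self):
        # Maps (state, action) to the most recently observed (next_state, reward).
        self.transitions = {}

    def update(self, state, action, next_state, reward):
        """Record what actually happened after taking `action` in `state`."""
        self.transitions[(state, action)] = (next_state, reward)

    def predict(self, state, action):
        """Return the predicted (next_state, reward), or None if never tried."""
        return self.transitions.get((state, action))


def best_known_action(model, state, actions):
    """Pick the action with the highest predicted reward among those already tried."""
    predictions = {a: model.predict(state, a) for a in actions}
    tried = {a: p[1] for a, p in predictions.items() if p is not None}
    if not tried:
        return None  # the model has no experience in this state yet
    return max(tried, key=tried.get)


# Made-up experience for the gazelle calf example.
model = TabularModel()
model.update("predator_near", "run", "safe", 1.0)
model.update("predator_near", "stay", "caught", -1.0)
print(best_known_action(model, "predator_near", ["run", "stay", "hide"]))
```

The untried action "hide" is ignored here because the model has no prediction for it, which is one reason an agent that relies only on its model still needs to explore.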
Scope of the study
The study of reinforcement learning is important to learners because it enhances self-reliance. The learning agent is able to make the right decisions by itself, given the state of the environment and the goals targeted. Reinforcement learning also enables the learner to be innovative, because the learning agent has to keep exploring new choices to maximize its chances of obtaining the numerical rewards.
Counter arguments
Supervised learning does not face the challenge of balancing exploration and exploitation (Porr & Woergoetter 1). In reinforcement learning, the agent has to keep returning to actions it has tried before and found rewarding. However, to maximize its chances, the agent also has to explore the learning environment in search of better selections. The dilemma arises because neither exploitation nor exploration alone guarantees success at the task. Supervised learning, on the other hand, relies heavily on an experienced supervisor to pass knowledge to the learning agent. Additionally, supervised learning relies on established guides that have been scrutinized to ensure that the learner succeeds if they are followed well.
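One common way to balance exploration and exploitation is an epsilon-greedy rule, sketched below in Python. The essay's sources do not prescribe this particular rule. The function names, the Gaussian reward noise, and the parameter values are assumptions made for illustration only.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Explore a random action with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Simulate an agent that learns reward estimates for several actions by trial and error."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # the agent's current reward estimates
    counts = [0] * len(true_means)       # how often each action has been tried
    for _ in range(steps):
        action = epsilon_greedy(estimates, epsilon, rng)
        # Assumed noisy reward: the true mean plus unit-variance Gaussian noise.
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental sample average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([0.0, 1.0, 0.5])
print(estimates, counts)
```

Setting `epsilon` to 0 gives pure exploitation and setting it to 1 gives pure exploration. Neither extreme is reliable on its own, which is the dilemma described above.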
Reward seeking is narrow compared to goal seeking (Meisner, Laurer, Igel & Redmiller 2). A learning agent in reinforcement learning will always choose the action that maximizes its chances of numerical reward over time. This means that the agent is interested only in the specific actions it has exploited in the past and found to be more rewarding. The agent will also explore new options that it considers likely to increase the reward. Because this learning is characterized by a trial-and-error approach, the risk is that the anticipated action fails to attract the expected rewards. A goal-seeking approach, on the other hand, is more general than the reward-seeking approach of reinforcement learning (Porr & Woergoetter 1). It does not have to be about one specific state or action. It leaves room for multitasking between two courses of action, which maximizes future benefits and limits the risk of trial and error. For instance, a student may be pursuing a course in engineering at a certain university. Although his main goal is to pass exams and become an engineer, the same student may also be participating in athletics as a sport. The student has a good chance of achieving both goals by the end of his university career. This shows that supervised learning, which is more goal oriented, will be more effective than reinforcement learning, which aims at specific rewards.
Supportive arguments
Reinforcement learning explicitly considers the whole problem of a goal-driven agent interacting with an unknown and uncertain environment (Sutton & Barto 18). This approach is much better than supervised learning, which considers sub-problems without bearing in mind how they might fit into the larger picture. For instance, in supervised learning a student strictly follows the guidelines of a knowledgeable supervisor without explicitly specifying how the acquired ability will finally be of use after the course. As a result, supervised learning merely follows stipulated guidelines and fails to consider the role of planning in real-time decisions.
Reinforcement learning agents are driven by explicit goals and can choose actions to influence their learning environment. This makes reinforcement learning more effective than supervised learning, as it takes the approach of a complete, interactive, goal-seeking agent. As a result, in reinforcement learning the agent has to operate despite significant uncertainty about the learning environment (Sutton & Barto 20). For instance, a gazelle calf must learn to run at least twenty miles per hour despite struggling to stand on its feet minutes after being born. This makes planning in reinforcement learning more effective than in supervised learning, as it addresses the interplay between planning and real-time selection of actions. In addition, when reinforcement learning borrows knowledge from supervised learning, it does so only after critically analyzing which capabilities are critical and which are not. This improves the probability of obtaining the numerical rewards while working in an uncertain environment (Whiteson 30). For instance, an adaptive controller adjusts the parameters of a petroleum refinery operation in real time. Despite the availability of set points put in place by a qualified engineer, the controller critically analyzes which set points are critical and which are not necessary. As a result, the adaptive controller optimizes the trade-offs among yield, cost and quality on the basis of the specified marginal costs, without strictly following the set points originally suggested by the engineer. This trait is notably absent in supervised learning, where the agent strictly adheres to the instructions of the knowledgeable supervisor.
Reinforcement learning enables the agent to improve its performance over time by using past experience. This is because in reinforcement learning the explicit goals must be attained for the numerical reward to be given (Porr & Woergoetter 1). Because the goals the agent expects to attain are not generalizable, the agent must pay close attention to past experience to avoid repeating errors. For instance, a master chess player relies heavily on intuition and experience when deciding which move is best. Past experience not only helps him anticipate the opponent’s possible replies and counter-replies, but also helps him improve his playing tactics in the future. However, since the effects of an action (e.g. a chess player’s reply to a move) cannot be fully predicted, reinforcement learning requires the learning agent to monitor its environment frequently and react appropriately. The agent’s use of past experience and its exploration of new choices are therefore strengthened by frequent interaction with the learning environment (Sutton & Barto 20). This makes reinforcement learning more effective than supervised learning at handling real-life problems, which require an individual to operate in an uncertain environment.
Reinforcement learning allows the agent to evaluate states and actions rather than purely follow set instructions. The learning agent is expected to search explicitly among the alternative actions. Despite the lack of instructions, the reward the agent receives after every action provides relevant information about how desirable the action was (Sutton & Barto 19). The learner must therefore use a generate-and-test method: it tries an action, observes the outcome, and selectively retains the actions that have desirable outcomes. For instance, a gazelle calf must make the right jumps minutes after birth to increase its chances of survival in a jungle full of predators. However, some jumps might lead to injuries rather than increasing its running speed. In the case of a chess player, a move may appear to win against the opponent’s counter-replies. However, despite the anticipated reward (winning), the apparently winning move may be followed by a loss if the opponent gives check and eventually wins the game. This kind of learning by selection rather than by instruction makes reinforcement learning more effective than supervised learning. In supervised learning, for instance, the learning agent has set guides that clearly dictate which actions are to be followed in order to realize the anticipated goals. Therefore, in supervised learning, the need to search for the correct action becomes less important (Whiteson 40). This makes supervised learning less effective than reinforcement learning in dealing with real-life situations. The supervised learning agent cannot learn how to control its environment, since it follows rather than influences the instructive information it receives from the knowledgeable supervisor. As a result, the supervised learning system encounters difficulty when it maps situations to actions that must match the correct actions specified by the environment.
Instead of influencing the environment to favor the chosen actions and anticipated rewards, supervised learning operates as instructed by the environment. This makes it less effective than reinforcement learning.
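The difference between learning by instruction and learning by selection can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a method from the cited sources: the function names are hypothetical, and the reward is assumed to be 1 for the correct action and 0 otherwise.

```python
import random


def instructive_feedback(correct_action):
    """Supervised-style signal: names the correct action, whatever the learner did."""
    return correct_action


def evaluative_feedback(chosen_action, correct_action):
    """Reinforcement-style signal: only a number that scores the action actually taken."""
    return 1.0 if chosen_action == correct_action else 0.0


def generate_and_test(actions, reward_fn, trials, rng):
    """Try actions at random, observe each reward, and retain the best action seen."""
    best_action, best_reward = None, float("-inf")
    for _ in range(trials):
        action = rng.choice(actions)
        reward = reward_fn(action)
        if reward > best_reward:
            best_action, best_reward = action, reward
    return best_action, best_reward


rng = random.Random(0)
moves = ["jump_left", "jump_right", "stand_still"]
# The learner never sees "jump_right" directly, only the reward for what it tried.
result = generate_and_test(moves, lambda a: evaluative_feedback(a, "jump_right"), 20, rng)
print(result)
```

With instructive feedback a single example reveals the right answer, whereas with evaluative feedback the learner has to search among the alternatives, which is the generate-and-test process described above.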
This paper has addressed the question of whether reinforcement learning is more effective than supervised learning. The counter arguments support supervised learning over reinforcement learning. The first counter argument states that the exploitation-exploration dilemma arises in reinforcement learning but not in supervised learning. The second states that the reward-seeking approach of reinforcement learning is narrow compared to the more general goal-seeking approach of supervised learning. The supportive arguments indicate that reinforcement learning is more effective than supervised learning: reinforcement learning considers the whole problem of a goal-driven agent interacting with an uncertain environment, interaction with the environment improves the agent’s performance over time, and reinforcement learning is selective rather than instructive.
From the above discussion, it is clear that all learning is a result of one’s reinforcement history. Reinforcement learning requires the learning agent to exploit the past experiences that have proven to yield more desirable outcomes. However, as the agent operates in an uncertain environment, the effects of any selected action cannot be fully predicted. The learning agent must also explore new options by choosing selectively among the alternative actions. As a result, the agent works hard to achieve the numerical reward by matching its course of action to the environment. Failure is not an option where the goals are explicitly defined rather than general. Despite unpredictable outcomes and an uncertain environment, reinforcement learning agents improve their performance on the basis of their past history rather than set instructions.

Source: Essay UK
