This video explains the concept of reinforcement learning in machines and gives some very good examples, showing how the underlying algorithm continuously feeds particular actions (responses) into the environment (in this case a game) and compares their outcomes. When a positive result is achieved and a reward is given, the set of steps leading to that reward is saved. This process repeats in order to accrue as many positive behaviours as possible. When the reward is less straightforward, because the steps needed to reach it are much more complex, reward shaping (adding further rewards for intermediate scenarios) is possible, although time-consuming. Training without rewards is very hard in reinforcement learning, a technique which closely echoes the behavioural learning patterns of early educational systems.
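The reward loop described above can be sketched in a few lines of Python. This is only a minimal illustration using tabular Q-learning, which is one common reinforcement learning method and not necessarily the one shown in the video. The five-cell corridor "game", the function names (`train`, `greedy_policy`), the `shaping_bonus` parameter and all numeric values are assumptions made for the example.

```python
import random

N_STATES = 5        # hypothetical corridor: cells 0..4, the reward sits in cell 4
ACTIONS = (-1, +1)  # step left or step right


def train(episodes=200, shaping_bonus=0.0, alpha=0.5, gamma=0.9,
          epsilon=0.2, seed=0):
    """Tabular Q-learning on the toy corridor (all parameters are assumed)."""
    rng = random.Random(seed)
    # q[state][action_index] estimates how good an action is in a state.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the length of one episode
            if rng.random() < epsilon:
                a = rng.randrange(2)  # occasionally try a random action
            else:
                best = max(q[state])  # otherwise act greedily, ties at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            nxt = min(max(state + ACTIONS[a], 0), N_STATES - 1)
            done = nxt == N_STATES - 1
            reward = 1.0 if done else 0.0
            if nxt > state:
                # Crude hand-made reward shaping: a small extra reward
                # for every step that moves towards the goal.
                reward += shaping_bonus
            # Move the estimate towards reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Best-valued action (-1 or +1) for each non-goal cell."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` returns the preferred move for each cell once the reward has been found often enough. Passing a small value such as `shaping_bonus=0.1` adds intermediate rewards, a rough stand-in for the reward shaping mentioned above; in more complex tasks such bonuses need careful design, which is where the time cost comes from.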
Algorithmic systems that pepper student learning with occasions for enjoying reward (as in the case of easy quizzes in MOOCs) may act as the carrot before the donkey, promoting the self-directing learner while providing an occasion for ‘datafication’ and the collection of data (Williamson, 2017). In this case, student behaviour becomes a very ‘valuable commodity’ (Knox et al., 2020): it provides the ‘action to the state’ explained in the video, because it can help predict outcomes. Ironically, students are then providing their behaviour patterns for free to the users of CMSs, VLEs and MOOCs.
‘not only is data positioned before the desires of the learner as the authoritative source for educational action, but the role of the learner itself is also recast as the product of consumerist analytic technologies.’ (Knox et al., 2020)
Educational systems that study and collect data in order to provide ‘the best possible learning experience’ and ‘limit’ the online learner to a simple reward system are an example of Biesta’s concept of ‘learnification’, whereby the system is merely interested in producing successful students, and ever more of them. This kind of ‘solutionism’ is a far cry from the learning process envisaged by Biesta (2012). The social dimension of education is absent from the outset, and learning is reduced to playing a basic video game (like Pong) in which the reward, rather than the playing experience, is what ultimately counts, reducing the learner to a ‘product’ (Rushkoff, cited in Knox et al., 2020). This view is deeply enshrined in radical behaviourism and built upon the binary determinism of computer systems that are able to break down responses to knowledge into a system of ‘ons’ and ‘offs’ and that will eventually (helped along by developments in quantum computing) challenge or even outperform the best human minds, as seen below.
References:
Biesta, G. (2012). Giving teaching back to education: Responding to the disappearance of the teacher. Phenomenology & Practice, 6(2), 35-49.
Knox, J., Williamson, B., & Bayne, S. (2020). Machine behaviourism: future visions of ‘learnification’ and ‘datafication’ across humans and digital technologies. Learning, Media and Technology, 45(1), 31-45. DOI: 10.1080/17439884.2019.1623251
Williamson, B. (2017). Introduction: Learning machines, digital data and the future of education (chapter 1). In Big Data and Education: the digital future of learning, policy, and practice. Sage.

