Lifelong Machine Learning (LML)

Lifelong Machine Learning, or LML, considers systems that can learn many tasks over a lifetime from one or more domains. They efficiently and effectively retain the knowledge they have learned and use that knowledge to more efficiently and effectively learn new tasks. [Silver et al.,2013]

Why my interest?

I have brief experience working with artificial intelligence, machine learning and image processing. My first contact with this field was during my master's degree studies in Cuernavaca, Morelos, Mexico. There, I had the opportunity to work with robotic vision (in a basic way); my thesis was about evaluating algorithms for feature extraction and feature association for robotic vision. In order to evaluate these algorithms, extraction (SIFT, MSER, Harris, Hessian) and association (online visual bag of words and vocabulary tree), we developed a system able to recognize whether a scene had been visited previously, or otherwise add that scene to a "database". This is known as loop-closure detection. The developed system was able to start from nothing (a dictionary and a database of scenes, which became increasingly complex to explore) and learn while visiting an indoor environment. During my research something caught my attention, namely the feature-association part, and even when finishing my thesis I had the feeling that something was missing.

My second contact was during an internship in Dublin, Ireland, where we were working on clothing classification. During this experience I was in charge of running experiments with the clothing classes we had at that moment, and because we did not have many instances of these classes we were not able to use deep learning techniques. So instead we used SVMs and random forests. Every time I ran a training I thought, "OK, now we have these classes with a certain number of instances, but what is going to happen in the future when we get more classes or more instances?", "I can tune the model for this data, but if new data arrives we have to retrain the model". Again, I had the feeling that something was missing.

For lack of time I did not pay more attention to this feeling until now, 01/08/2016. Before starting my research the only concept I had in mind was "Continuous Learning (CL)". For me, this had the following definition: "CL is the ability of a system to learn over time in a changing environment". After searching for CL I found that there is already a concept for this in the literature, known as Lifelong Machine Learning (LML) or simply Lifelong Learning (LL).

What is Lifelong Machine Learning?

Definition: Lifelong Machine Learning, or LML, considers systems that can learn many tasks over a lifetime from one or more domains. They efficiently and effectively retain the knowledge they have learned and use that knowledge to more efficiently and effectively learn new tasks. [Silver et al.,2013]

For [Chen et al.,2015], LML is a learning paradigm that aims to learn as humans do: retaining the knowledge learned in the past and using it to help future learning (Thrun, 1998; Chen and Liu, 2014b; Silver et al., 2013). Although many machine learning topics and techniques are related to LL, e.g., lifelong learning (Thrun, 1998; Chen and Liu, 2014b; Silver et al., 2013), transfer learning (Jiang, 2008; Pan and Yang, 2010), multi-task learning (Caruana, 1997), never-ending learning (Carlson et al., 2010), self-taught learning (Raina et al., 2007), and online learning (Bottou, 1998), there is still no unified definition for LL.

[Chen et al.,2015] also give a more formal definition and the components that an LML system could have: Definition (Lifelong Learning): A learner has performed learning on a sequence of tasks, from 1 to N-1. When faced with the Nth task, it uses the knowledge gained in the past N-1 tasks to help learning for the Nth task. An LL system thus needs the following four general components (a minimal code sketch of these follows the list):

  1. Past Information Store (PIS): It stores the information resulting from past learning. This may involve sub-stores for information such as (1) the original data used in each past task, (2) intermediate results from the learning of each past task, and (3) the final model or patterns learned from each past task.
  2. Knowledge Base (KB): It stores the knowledge mined or consolidated from the PIS. This requires a knowledge representation scheme suitable for the application.
  3. Knowledge Miner (KM): It mines knowledge from the PIS. This mining can be regarded as a meta-learning process because it learns knowledge from information resulting from the learning of past tasks. The knowledge is stored in the KB.
  4. Knowledge-Based Learner (KBL): Given the knowledge in the KB, this learner is able to leverage that knowledge and/or some information in the PIS for the new task.
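
To make these components more concrete, here is a minimal sketch of how the four components might fit together. This is only my own illustration with made-up class and method names; [Chen et al.,2015] do not prescribe any particular implementation.

```python
# Hypothetical skeleton of the four LL components; all names are
# illustrative, not an API from the paper.

class PastInformationStore:
    """PIS: raw data, intermediate results, and final model per past task."""
    def __init__(self):
        self.tasks = {}

    def store(self, task_id, data, intermediate, model):
        self.tasks[task_id] = {"data": data,
                               "intermediate": intermediate,
                               "model": model}

class KnowledgeBase:
    """KB: knowledge consolidated from the PIS."""
    def __init__(self):
        self.knowledge = {}

class KnowledgeMiner:
    """KM: meta-learns knowledge from the PIS and stores it in the KB."""
    def mine(self, pis, kb):
        # Toy example of "mining": record the labels each past model has seen.
        kb.knowledge["labels_seen"] = {
            tid: info["model"].get("labels", [])
            for tid, info in pis.tasks.items()
        }

class KnowledgeBasedLearner:
    """KBL: leverages the KB (and optionally the PIS) on the new task."""
    def learn(self, task_data, kb):
        # A real learner would bias or initialize itself with kb.knowledge;
        # here we just attach the prior knowledge to a trivial "model".
        labels = sorted({label for _, label in task_data})
        return {"labels": labels, "prior": dict(kb.knowledge)}
```

In a full pipeline, the KM would run after each task to refresh the KB, and the KBL would consult the updated KB before tackling the next task.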

Another description is given by [Lee et al.,2016]: lifelong learning refers to the learning of multiple consecutive tasks with never-ending exploration and continuous discovery of knowledge from data streams.

Motivation

Humans learn to solve increasingly complex tasks by continually building upon and refining knowledge over a lifetime of experience. This process of continual learning and transfer allows us to rapidly learn new tasks, often with very little training. Over time, it enables us to develop a wide variety of complex abilities across many domains. Despite recent advances in transfer learning and representation discovery, lifelong machine learning remains a largely unsolved problem. Lifelong machine learning has huge potential to enable versatile systems that are capable of learning a large variety of tasks and rapidly acquiring new abilities. These systems would benefit numerous applications, such as medical diagnosis, virtual personal assistants, autonomous robots, visual scene understanding, language translation, and many others. [Eric E.,2013a]

[Bing Liu,2014] says that statistical learning algorithms like deep neural networks, SVMs, HMMs, CRFs, and topic models have been very successful in machine learning and data mining applications. Given a dataset, such an algorithm simply runs on the dataset to produce a model, without considering any related information or past learning results. Although these algorithms can still be improved, such single-task, isolated algorithmic approaches to machine learning have their limits in terms of both accuracy and robustness. Looking ahead, the BIG question is what we can do beyond these algorithms to improve machine learning much further. He believes the answer is lifelong machine learning, or simply lifelong learning, which learns as humans do. The key characteristic of human learning is that humans learn continuously: we retain the knowledge gained from past learning and use it to help future learning and problem solving. Existing isolated machine learning algorithms are not capable of doing that, and without the lifelong machine learning capability, AI systems and machine learning can never be truly intelligent. He believes that now is the right time to explore lifelong learning: big data offers a golden opportunity because its large volume and diversity (very important) give us abundant information for discovering rich and commonsense knowledge automatically, which can enable an intelligent learning agent to perform continuous machine learning (or simply continuous learning), to accumulate knowledge, and to become more and more knowledgeable over time.

Silver in [Silver et al.,2013] also argues that LML is a logical next step in machine learning research. The development and use of inductive bias is essential to learning. There are a number of theoretical advances in AI to be found at the point where machine learning meets knowledge representation, there are numerous practical applications of LML in areas such as web agents and robotics, and our computing and communication systems now have the capacity to implement and test LML systems. In this paper they present their position on the move beyond learning algorithms to LML systems, detail the reasons for that position, and discuss potential arguments and counter-arguments.

[Lee et al.,2016] also say that LML is crucial for the creation of intelligent and flexible general-purpose machines such as personalized digital assistants and autonomous humanoid robots [Thrun and O'Sullivan, 1996; Ruvolo and Eaton, 2013; Ha et al., 2015]. They are interested in the learning of abstract concepts from continuously sensed non-stationary data from the real world, such as first-person-view video streams from wearable cameras [Huynh et al., 2008; Zhang, 2013].

What does LML involve?

[Eric E.,2013b] says that learning over a lifetime of experience involves a number of procedures that must be performed continually, including the following (a toy sketch of how they might fit into one loop appears after the list):

  1. Discovering representations from raw sensory data that capture higher-level abstractions.
  2. Transferring knowledge learned on previous tasks to improve learning on the current task.
  3. Maintaining the repository of accumulated knowledge.
  4. Incorporating external guidance and feedback from humans or other agents.
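
To show how these procedures might chain together, below is a deliberately toy, self-contained sketch of such a loop. Every function here is a placeholder I made up for illustration; a real system would plug in deep representation learners, proper transfer mechanisms, and actual human feedback channels.

```python
# Toy, runnable sketch of a lifelong learning loop. All functions are
# made-up placeholders standing in for the four procedures listed above.

def discover_representation(raw):
    # (1) "Representation discovery": here, just min-max normalization.
    lo, hi = min(raw), max(raw)
    return [(x - lo) / (hi - lo + 1e-9) for x in raw]

def learn_task(features, prior):
    # (2) "Transfer": blend the new task's estimate with prior knowledge.
    estimate = sum(features) / len(features)
    return estimate if prior is None else 0.5 * (estimate + prior)

def consolidate(knowledge, model):
    # (3) "Knowledge maintenance": append the new model to the repository.
    return knowledge + [model]

def incorporate_feedback(model, correction):
    # (4) "External guidance": nudge the model toward a teacher's correction.
    return model + 0.1 * (correction - model)

def lifelong_loop(task_stream, teacher_corrections):
    knowledge = []
    for raw, correction in zip(task_stream, teacher_corrections):
        features = discover_representation(raw)
        prior = knowledge[-1] if knowledge else None
        model = learn_task(features, prior)
        model = incorporate_feedback(model, correction)
        knowledge = consolidate(knowledge, model)
    return knowledge

print(lifelong_loop([[1, 2, 3], [10, 20, 30]], [0.5, 0.5]))
```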

Each of these procedures encompasses one or more subfields of machine learning and artificial intelligence, which is why the primary goal of their symposium was to bring together practitioners in each of these areas and to focus discussion on combining these lines of research toward lifelong machine learning.

Some topics are:

  • knowledge transfer
    • active transfer learning
    • multi-task learning
    • cross-domain transfer
    • knowledge/schema mapping
    • source knowledge selection
    • one-shot learning
    • transfer over long sequences of tasks
  • continual learning
    • online multi-task learning
    • online representation learning
    • knowledge maintenance/revision
    • developmental learning
    • scalable transfer learning
    • task/concept drift
    • self-selection of tasks
  • representation discovery
    • learning from raw sensory data
    • deep learning
    • latent representations
    • multi-modal/multi-view learning
    • multi-scale representations
  • incorporating guidance from external teachers
    • learning from demonstration
    • skill shaping
    • curriculum-based training
    • interactive learning
    • corrective feedback
    • agent-teacher communication
  • frameworks for lifelong learning
    • architectures
    • software frameworks
    • testbeds
    • evaluation methodology
  • applications of lifelong learning
    • data sets
    • application domains/environments
    • simulators
    • deployed applications

What do we have now?

Existing LML research is still in its infancy. The understanding of LML is very limited, and current research mainly focuses on only one type of task per system [Zhiyuan et al.,2016]. Even so, several works have been carried out trying to follow this paradigm.

[Silver et al.,2013] gives an overview of some prior research in supervised, unsupervised and reinforcement learning that considers systems which learn domains of tasks over extended periods of time. In particular, progress has been made in machine learning systems that exhibit aspects of knowledge retention and inductive transfer. In their tutorial on LML, [Zhiyuan et al.,2016] highlight other works such as ELLA: Efficient Lifelong Learning Algorithm [Ruvolo et al., 2013], Lifelong Sentiment Classification [Chen et al.,2015], and NELL: Never-Ending Language Learner [Carlson, A. et al.,2010]. In his workshop talk, [Lampert, 2016] also presents some works being done following LML.

Challenges to face when pursuing LML

As you can see, and as I have personally experienced, LML is a whole new world of possibilities, and pursuing it means overcoming several challenges.
[Silver et al.,2013] mention some challenges/benefits of LML, such as: input/output type, complexity and cardinality; training examples versus prior knowledge; effective and efficient knowledge retention; effective and efficient knowledge transfer; scalability; practicing a task; heterogeneous domains of tasks (images and tags, music and lyrics, etc.); and acquisition and use of meta-knowledge.

In my personal opinion, and more specifically, I can mention the following challenges (a quick demonstration of the second one appears below):

  1. Model complexity.
  2. Catastrophic forgetting.
  3. Novelty detection.
    1. Concept drift.
    2. Concept evolution.

And I am pretty sure that there are more of these to overcome.
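
As a quick illustration of catastrophic forgetting, here is a small, self-contained experiment with scikit-learn's SGDClassifier, an online linear model. The two "tasks" are synthetic Gaussian blobs I made up so that the second task's decision boundary contradicts the first one; after training on task B, accuracy on task A collapses.

```python
# Demonstration of catastrophic forgetting with an online linear model.
# The two tasks are synthetic and chosen so their boundaries conflict.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)

def make_task(center0, center1, n=200):
    # Two Gaussian blobs, one per class.
    X = np.vstack([rng.randn(n, 2) + center0, rng.randn(n, 2) + center1])
    y = np.array([0] * n + [1] * n)
    return X, y

# Task A and Task B place the classes in opposite arrangements,
# so the sign of the learned boundary must flip between them.
Xa, ya = make_task(center0=(-4, 0), center1=(-1, 0))
Xb, yb = make_task(center0=(4, 0), center1=(1, 0))

clf = SGDClassifier(random_state=0)

# Train on Task A only.
for _ in range(20):
    clf.partial_fit(Xa, ya, classes=[0, 1])
print("Task A accuracy after training on A:", clf.score(Xa, ya))

# Continue training on Task B only, with no rehearsal of Task A.
for _ in range(20):
    clf.partial_fit(Xb, yb)
print("Task A accuracy after training on B:", clf.score(Xa, ya))  # drops sharply
print("Task B accuracy after training on B:", clf.score(Xb, yb))
```

Any fixed-capacity model updated this way will show the same effect unless something (rehearsal of old data, regularization, expanding capacity, etc.) protects the previously acquired knowledge.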

In the next post I will cover the importance of the above challenges and some works that have been done regarding them.

References

[Bing Liu,2014]
Bing Liu, Sep 24, 2014. Lifelong Learning – learning as humans do. Retrieved 10 August 2016, <https://www.cs.uic.edu/~liub/lifelong-learning.html>

[Carlson, A. et al.,2010]
Carlson, A., Betteridge, J., Hruschka, E.R., Kisiel, B., Mitchell, T.M. and Settles, B., 2010. Toward an Architecture for Never-Ending Language Learning. AAAI.

[Chen et al.,2015]
Chen, Z., Ma, N. and Liu, B., 2015. Lifelong Learning for Sentiment Classification. In ACL (Volume 2: Short Papers), p. 750.

[Eric E.,2013a]
Eric E., 2013. LIFELONG MACHINE LEARNING: Papers from the 2013 AAAI Spring Symposium. Retrieved 08 August 2016, <http://www.aaai.org/Press/Reports/Symposia/Spring/ss-13-05.php>

[Eric E.,2013b]
Eric E., 2013. AAAI 2013 Spring Symposium: Lifelong Machine Learning. Retrieved 09 August 2016, <http://www.seas.upenn.edu/~eeaton/AAAI-SSS13-LML/#OrganizingCommittee>

[Lampert, 2016]
Lampert, C., 2016. Towards Lifelong Machine Learning. CHIST-ERA Workshop, June 8, 2016.

[Lee et al.,2016]
Lee, S.W., Lee, C.Y., Kwak, D.H., Kim, J., Kim, J. and Zhang, B.T., 2016. Dual-Memory Deep Learning Architectures for Lifelong Learning of Everyday Human Behaviors. In Twenty-Fifth International Joint Conference on Artificial Intelligence.

[Ruvolo et al., 2013]
Ruvolo, P. and Eaton, E., 2013. ELLA: An Efficient Lifelong Learning Algorithm. ICML (1), 28, pp.507-515.

[Silver et al.,2013]
Silver, D.L., Yang, Q. and Li, L., 2013, March. Lifelong Machine Learning Systems: Beyond Learning Algorithms. In AAAI Spring Symposium: Lifelong Machine Learning (pp. 49-55).

[Zhiyuan et al.,2016]
Zhiyuan (Brett) Chen, Estevam Hruschka and Bing Liu, 2016. Lifelong Machine Learning and Computer Reading the Web. KDD-2016 Tutorial, Aug 13, 2016, San Francisco, CA, USA.
