Increased human + machine collaboration: how are machine learning solutions trained, taught and tested?

By Melina Nogueira, Technology Director at Arbusta

Machine Learning Training

There’s still plenty of talk about artificial intelligence and machine learning as two of the main IT investment strategies accelerated by the coronavirus pandemic. The market already shows mature applications in e-commerce, health, finance and education.

Machine learning is an application of artificial intelligence (AI) that enables systems to learn and improve automatically from experience without being explicitly programmed. It focuses on the development of computer programs that can access data and use it to learn for themselves. Its main aim is to enable computers to learn without human intervention or assistance and to adjust their actions accordingly. To a greater or lesser extent, no innovative company or business today can afford not to apply this technique.

Recent research shows that 29% of developers worldwide have worked on AI/ML software in the last year. It also suggests that 25% of Fortune 500 companies will add AI building blocks (for example, automatic text analysis and learning) to their robotic process automation (RPA) efforts in order to create hundreds of new intelligent process automation (IPA) use cases. Another consultancy firm anticipates that, in 2021, 15% of customer experience applications will be hyper-personalized, continuously combining a variety of data with new reinforcement learning algorithms.

Behind machine learning lies confidence in the fact that systems can learn from data, recognize patterns and make decisions with minimal human intervention. To apply this technique, however, we must start by training algorithms, i.e. with the machine learning training stage. A familiar example is the development of so-called virtual assistants or chatbots, now an everyday feature of most cutting-edge e-commerce sites. These chatbots also employ natural-language processing (NLP) techniques, which let us analyze, interpret and make sense of human language and enable chatbot annotation processes. NLP in turn goes hand in hand with text recognition techniques, which let us extract and analyze information from documents in image format.
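As an illustration of the kind of NLP preprocessing such chatbots rely on, here is a deliberately tiny sketch (the function and rules are our own illustration, not any specific product’s pipeline) that normalizes a user utterance and splits it into tokens ready for annotation:

```python
import re

def tokenize(utterance: str) -> list[str]:
    """Lowercase, strip punctuation and split an utterance into tokens.

    A minimal first step of an NLP annotation pipeline; real chatbots
    use far richer tokenizers and language models.
    """
    normalized = re.sub(r"[^\w\s]", " ", utterance.lower())
    return normalized.split()

tokens = tokenize("Where's my order? I bought it 3 days ago!")
# ['where', 's', 'my', 'order', 'i', 'bought', 'it', '3', 'days', 'ago']
```

Human annotators would then attach an intent label (e.g. “order status”) to utterances like this one, producing the labelled data the training stage needs.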


Machine Learning Tunnel Model - Arbusta

Within the context of machine learning, developers are often asked to create a system, or model, that answers one or several questions. This model is built through a “training” process, whose aim is to produce a model that answers those questions correctly most of the time. The machine is “trained” by explicitly feeding it examples together with their correct answers. In other words, to train a model we must gather data. These training data help the model match the patterns found in the data to the correct answers, which enables it, in the future, to predict answers for inputs it has never seen.
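A minimal sketch of this idea, using a toy 1-nearest-neighbour classifier in plain Python (the data and names are illustrative, not part of any Arbusta system): the “training” step simply memorizes labelled examples, and prediction returns the answer attached to the closest known pattern.

```python
def train(examples):
    """For 1-nearest-neighbour, 'training' is just storing labelled points."""
    return list(examples)

def predict(model, point):
    """Answer with the label of the closest training example."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    closest = min(model, key=lambda ex: distance(ex[0], point))
    return closest[1]

# Each example pairs a pattern (features) with its correct answer (label).
model = train([((1.0, 1.0), "cat"), ((8.0, 9.0), "dog"), ((9.0, 8.0), "dog")])
print(predict(model, (1.5, 0.5)))  # closest to (1.0, 1.0) -> "cat"
```

The design choice mirrors the paragraph above: the model never receives an explicit rule for telling “cat” from “dog”; it only matches new inputs against the patterns and correct answers it was fed.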

In this scenario, the first step is to gather the data to be fed to the model. These should be quality data, although quantity matters as well. Today, web scraping techniques are frequently used to gather data automatically from different sources.
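A hedged sketch of what such scraping can look like, using only Python’s standard-library HTML parser (the page markup, class name and scraper are invented for illustration; a real scraper would fetch the HTML over HTTP and respect the site’s terms):

```python
from html.parser import HTMLParser

class ProductTitleScraper(HTMLParser):
    """Collects the text of <h2 class="product"> elements from an HTML page."""

    def __init__(self):
        super().__init__()
        self.in_product = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "product") in attrs:
            self.in_product = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_product = False

    def handle_data(self, data):
        if self.in_product:
            self.titles.append(data.strip())

# An inline page keeps the sketch self-contained and runnable offline.
page = '<h1>Shop</h1><h2 class="product">Blue shirt</h2><h2 class="product">Red hat</h2>'
scraper = ProductTitleScraper()
scraper.feed(page)
print(scraper.titles)  # ['Blue shirt', 'Red hat']
```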

Later, the data must be prepared: shuffled, and checked to ensure there are no spurious correlations that would lead to biased answers. The human role is once again key to this process.


The Human in the Loop (HITL) paradigm involves adding human feedback during the machine learning algorithm’s training (modeling and simulation) stage. But what is HITL for? This approach is gaining ground as the best available way to train more accurate models, as people continually test, adjust and feed data in a live, constructive and virtual manner.


Since machine-driven systems haven’t yet achieved the desired accuracy levels, human intervention is required in training circuits in order to create more accurate machine learning models. HITL thus describes the process carried out when the machine or computer system can’t offer an answer to a problem and requires human intervention.
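The routing logic behind that description can be sketched in a few lines (the threshold, the toy model and the always-correct “human” are our own stand-ins, not a real HITL platform): when the model’s confidence is too low, the item goes to a person, and the person’s answer is kept as a fresh labelled example for the next training round.

```python
def hitl_label(items, model, ask_human, threshold=0.8):
    """Keep the model's answer when it is confident enough; otherwise
    ask a person and record the pair as new training data."""
    labels, new_examples = [], []
    for item in items:
        answer, confidence = model(item)
        if confidence < threshold:
            answer = ask_human(item)
            new_examples.append((item, answer))  # feeds the next training round
        labels.append(answer)
    return labels, new_examples

# Toy stand-ins: the "model" is sure only about short texts,
# and the "human" always answers correctly.
def toy_model(text):
    return ("short", 0.95) if len(text) < 10 else ("unknown", 0.3)

labels, new_examples = hitl_label(
    ["hi", "a much longer user message"], toy_model, lambda item: "long"
)
print(labels)        # ['short', 'long']
print(new_examples)  # [('a much longer user message', 'long')]
```

The point of the loop is the `new_examples` list: human effort is spent only where the machine is unsure, and every intervention becomes training data that makes the next model more accurate.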


Most data scientists divide their data into three parts: training data, cross-validation data and testing data.

✔ Training data are used to teach the machine to recognize patterns; they are employed to fit and adjust the model’s parameters with the aim of reducing bias.

✔ Cross-validation data are used to guarantee the accuracy and efficiency of the algorithm employed to train the machine. They are applied to hyperparameter tuning and to improving the model’s generalization ability.

✔ Testing data are used to establish how well the machine can predict new answers based on its training. They provide an impartial assessment of a final model. This stage allows us to test the model with data never before used for training, and ascertain how it may function when faced with data it hasn’t yet seen. 

✔ Validation and testing data must come from the same distribution. The assessment should not employ the same data as the training, since the machine might simply memorize the “questions”. At this stage, the accuracy of the already-trained model should be checked: if it’s around 50% (no better than a coin flip on a binary task), it won’t be useful; if, however, it reaches or exceeds 90%, its answers will be highly reliable.
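The three-way split described above can be sketched in a few lines of plain Python (the 60/20/20 proportions are an illustrative choice, not a rule from this article):

```python
import random

def split(records, seed=0, train_frac=0.6, val_frac=0.2):
    """Shuffle once, then cut into train / cross-validation / test parts.

    Validation and test are cut from the same shuffled pool, so they share
    the same distribution, and no record appears in more than one part.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

Shuffling before cutting is what keeps the “same distribution” guarantee; slicing an unshuffled dataset would reintroduce exactly the collection-order bias the preparation stage tries to remove.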


If the predictions don’t yield good “correct answer” percentages, it will be necessary to go back and adjust or reconfigure the training settings (usually known as “hyperparameters”). However, it’s also important to establish in advance what makes a model “good enough”; otherwise, we run the risk of getting stuck adjusting parameters for a long time.
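That loop, try hyperparameter candidates but stop once a model is “good enough”, can be sketched as follows (the candidate values and the fixed score table are invented stand-ins for a real train-then-validate step):

```python
def tune(train_and_score, candidates, good_enough=0.9):
    """Try hyperparameter candidates in order, keeping the best score,
    and stop as soon as a model is 'good enough' so we don't tune forever."""
    best_params, best_score = None, float("-inf")
    for params in candidates:
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
        if score >= good_enough:
            break
    return best_params, best_score

# Toy scoring function standing in for "train on training data,
# score on cross-validation data".
scores = {1: 0.55, 3: 0.82, 5: 0.93, 7: 0.91}
best, score = tune(lambda k: scores[k], [1, 3, 5, 7])
print(best, score)  # 5 0.93  (stops before trying 7)
```

The explicit `good_enough` threshold is the design point the paragraph argues for: without it, the loop would keep evaluating candidates long after further gains stopped mattering.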

Once the training and assessment stages have been completed, the model will be ready to answer questions – that is to say, for prediction or inference in actual contexts. 

These processes naturally require skills, and today there are companies like Arbusta that offer machine learning training services with different specializations, ranging from general proposals for e-commerce, the web and applications to industry-specific ones (for example, for banks or healthcare providers).

Has your company already brought these developments on board? If you’re interested in discussing how Arbusta’s teams can contribute to machine learning training projects for your enterprise, drop us a line at [email protected] so we can start a conversation. 

