Data annotation: how working conditions impact the quality of Artificial Intelligence models

Data annotation – especially efficient data labeling at scale for machine learning projects – presents several complexities. Since data is the raw material on which machine learning projects are built, ensuring its quality is essential. If labels lack precision and quality, an entire, highly complex artificial intelligence project can be compromised by invalid predictive models. Have you ever wondered under what conditions those data are produced?

That is the question that guided the research of Milagros Miceli, a sociologist and Ph.D. candidate in Computer Science at TU Berlin. According to Miceli, it is crucial that the people carrying out data annotation tasks know the context of their work: what the labeling is for and what the objectives are of the project in which they are a crucial link. It is also important that they be aware of the impact their work has on the final quality of the dataset, and therefore that their tasks be recognized and valued.


It is known that preparing, loading, and cleaning data typically takes up to 45% of the time spent working with data. In addition, applying complex ontologies, attributes, and various types of annotations to train and deploy machine learning models adds further difficulty. Training data annotation workers and ensuring their working conditions and well-being is therefore key to improving the chances that labeling efforts will yield the expected quality.

The big challenge: data processing and labeling

Companies currently have an abundance of data, arguably even an excess of it. The big challenge is how to process and label that data so that it is usable. Precisely labeled data helps machine learning systems establish reliable models for pattern recognition, which in turn forms the basis of every AI project.

Since data labeling requires managing a large amount of work, companies often need to look for an external team to take care of this. In these cases, it is vital to ensure smooth communication and collaboration between taggers and data scientists in order to maintain quality control, validate the data, and resolve any problems and doubts that may arise.  

In addition to linguistic and geographical issues, other aspects can affect the interpretation of the data and, consequently, its correct annotation and labeling. An annotator’s experience in the specific domain and their cultural associations will imprint a bias that can only be controlled if there is awareness of it during the process. When there is no single “right” answer for subjective data, the data operations team can establish clear instructions to guide how the people doing the annotation should interpret each data point.
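When several annotators label the same item, those instructions can be paired with a simple adjudication step so that disagreement is surfaced rather than silently averaged away. The sketch below is purely illustrative – the label set and the majority-vote rule are assumptions made for this example, not a description of any particular team’s workflow:

```python
from collections import Counter

# Hypothetical guideline: annotators choose from a fixed label set, and items
# without a clear majority are escalated for review instead of being guessed.
ALLOWED_LABELS = {"toxic", "not_toxic", "unsure"}

def adjudicate(annotations: list[str]) -> str:
    """Collapse several annotators' labels for one data point into a single label."""
    if not set(annotations) <= ALLOWED_LABELS:
        raise ValueError(f"Label outside the guideline set: {annotations}")
    top_label, top_count = Counter(annotations).most_common(1)[0]
    # A clear majority wins; anything else goes back to the team for discussion.
    if top_count > len(annotations) / 2:
        return top_label
    return "needs_review"

print(adjudicate(["toxic", "toxic", "not_toxic"]))   # -> toxic
print(adjudicate(["toxic", "not_toxic", "unsure"]))  # -> needs_review
```

The point is not the specific rule but that ambiguous items are routed back to the data operations team, where the written instructions can be refined.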

What are the production conditions of the data sets?

There are studies that focus on the problem of individual annotator bias in data annotation tasks. However, there are also newer lines of research, such as those presented by Milagros Miceli, which highlight the power asymmetries implicit in the production conditions of datasets. From this perspective, since annotators follow the exact instructions provided by clients, their interpretation of the data “is deeply limited by the interests, values and priorities of the stakeholders with the most (financial) power”. That is, interpretations and labels “are imposed vertically on the annotators and, through them, on the data.” It would therefore be a mistake to take for granted the hierarchical power that shapes annotation processes behind the scenes.

[Image: Data annotation. © Milagros Miceli, Martin Schuessler / Weizenbaum Institute]

Even when the data is presumably more “objective,” challenges will still appear, especially if the labeling analysts do not know the context of their work and lack good instructions or established feedback processes.

Without neglecting the various factors that impact data annotation, what is observed in practice is that training is an important aspect of the process, since it helps “annotators to properly understand the project and produce annotations that are valid (precise) and reliable (consistent)” within the relevant framework, as indicated by Tina Tseng, Amanda Stent, and Domenic Maida in this Bloomberg study.

In this sense, written project guidelines – which should clearly detail the parameters, describe the functionality of the annotation platform, and include representative examples – can be used as a tool to train teams and to provide relevant feedback.
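The “reliable (consistent)” part of that definition can also be monitored numerically with an inter-annotator agreement metric. The following is a minimal sketch of Cohen’s kappa for two annotators – the logo/no-logo labels are invented for the example and are not taken from the Bloomberg study or from any Arbusta project:

```python
def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Agreement between two annotators on the same items, corrected for chance."""
    assert len(labels_a) == len(labels_b), "both annotators must label the same items"
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    if expected == 1.0:  # both annotators used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

annotator_1 = ["logo", "no_logo", "logo", "logo", "no_logo"]
annotator_2 = ["logo", "no_logo", "no_logo", "logo", "no_logo"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # ~0.62
```

A kappa close to 1 suggests the guidelines are being read consistently; a low or falling value is a signal to revisit the written instructions or the training sessions.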

Understanding the problem

At Arbusta, our goal is for data-labeling analysts to be committed to their project, understand its context, receive the necessary support, and maintain whatever dialogue is needed with the client’s teams.

For example, in the case of Mercado Libre – the leading e-commerce and payments platform in Latin America – our teams supported the scaling of the supervised machine learning models used to keep the marketplace a secure space. Specifically, Arbusta was responsible for the manual tasks: labeling images and text to help the client’s model complete its development cycles, particularly in the training stage, and become more accurate and up-to-date.

As part of Arbusta’s machine learning training services, our teams were integrated with the client’s teams in order to understand the model, which in this case was designed to detect fraud in Mercado Libre’s listings. The tagging service was used, on the one hand, to detect original logos (image annotation) and, on the other, to verify that the text fields of each listing were correctly completed.
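Purely for illustration, the two kinds of labels described above could be captured in records along these lines – the field names and values are assumptions made for this example, not Mercado Libre’s or Arbusta’s actual schema:

```python
# One hypothetical annotation record combining an image-level logo label
# (with a bounding box) and per-field checks on the listing's text.
annotation = {
    "listing_id": "EXAMPLE-001",
    "image_labels": [
        {"label": "original_logo", "bbox": [120, 45, 310, 160]},  # x1, y1, x2, y2
    ],
    "text_fields": {
        "title": {"filled_correctly": True},
        "brand": {"filled_correctly": False, "note": "brand missing from text"},
    },
}
```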

“Incorporating Arbusta for data annotation had an impact on the efficiency metrics of our models because it allowed us to scale the labeling of paragraphs, which in turn generated a dataset of greater size and quality, both key to the training of our models. Their teams managed to fit into our dynamics; from day one they tried to understand how they could fit into the workflow and give it continuity,” said Raúl Juarez, Machine Learning Senior Manager at Mercado Libre.

To accomplish this, at Arbusta we first sought to understand the problem, which was key to moving forward toward a shared objective. Arbusta’s data labelers were then introduced to the model built by the client and worked out the best strategy for detecting the inputs required by the system. In addition, meetings were held between the Arbusta and Mercado Libre teams, and as a consequence of this integration, a quality result was all but guaranteed.

You can learn more about our Machine Learning Training service here.