Anoppi project

Automatic anonymisation and content description of documents containing personal data (Anoppi)

OM042:00/2018 Development

The Anoppi project led by the Ministry of Justice will implement two language technology-based artificial intelligence tools for automatic anonymisation and content description of court decisions and other official decisions issued by authorities. With the assistance of the new applications, the electronic availability of documents can be improved, for example for the purposes of decision-making and research.

Basic information

Status: In progress

Project number: OM042:00/2018

Case numbers: VN/5161/2018

Set by: Ministry of Justice

Term/schedule: 1.10.2018 – 31.12.2020

Date of appointment: 26.10.2018

Goals and results

The self-learning anonymisation tool (ANOPPI) that will be created in the project will be capable of automatically recognising and marking key phrases to be anonymised and links between them, such as different references to the same person. On the basis of the analysis, the tool then provides the anonymiser with a suggestion for an anonymised document and flexible instruments to make any further modifications that may be needed in the document. The language and semantic computing technology required in this work recognises conceptual references to persons, organisations, locations and other details in text documents.
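The recognise-and-link step described above can be illustrated with a small Python sketch. It is not the project's implementation: the function names `link_mentions` and `suggest_spans` are hypothetical, the person names are invented, and the linking rule (mentions that share a final word refer to the same person) is a deliberately naive assumption standing in for the language technology the tool actually relies on.

```python
import re


def link_mentions(mentions):
    """Group name mentions that plausibly refer to the same person.

    Naive assumption for illustration only: two mentions belong together
    when they end in the same word (e.g. a shared surname).
    """
    groups = {}
    for mention in mentions:
        surname = mention.split()[-1]
        variants = groups.setdefault(surname, [])
        if mention not in variants:
            variants.append(mention)
    return list(groups.values())


def suggest_spans(text, groups):
    """Return (start, end, person_id) tuples a human reviewer could accept or edit."""
    person_of = {m: i for i, group in enumerate(groups, 1) for m in group}
    if not person_of:
        return []
    # Try longer mentions first so a full name wins over a bare surname.
    ordered = sorted(person_of, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in ordered))
    return [(m.start(), m.end(), person_of[m.group(0)]) for m in pattern.finditer(text)]


text = "Matti Virtanen sued Liisa Korhonen. Virtanen claimed damages."
groups = link_mentions(["Matti Virtanen", "Virtanen", "Liisa Korhonen"])
spans = suggest_spans(text, groups)
```

Here both "Matti Virtanen" and the later "Virtanen" are assigned to the same person, which is the kind of link the tool is meant to propose before the anonymiser reviews the result.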

The same technical solution and software will also be used for automatic content description, that is, identifying the key concepts essential to a document's contents. This kind of self-learning automatic annotation (APPI) will enable intelligent document search and the linking of documents to other material, for example linking legal cases to similar cases and to related legislation. At present, describing the content of legal cases in the courts' case management systems (Ritu, Sakari, Tuomas, etc.), in the document and records management systems of other authorities, and on Finlex requires manual work that, like anonymisation, is expensive. That is why it has been carried out only to a very limited extent so far.

Starting points

Public-sector actors produce an enormous amount of information and data, and it would be useful for other authorities, businesses and citizens to have open access to it. However, this is not currently possible because of data protection issues. Better access to decisions made within the public administration and by courts will make it possible to draw on earlier decisions when considering new cases. It will also be useful and relevant for research on the established practices of the public authorities and the administration of justice.

The project concerns the anonymisation of named entities appearing in text documents so that the data can be used and published openly. The problems related to personal data protection and privacy can be solved by pseudonymising or anonymising data that will be published openly. For example, the names of persons are systematically replaced by neutral names such as "Person A". For simplicity, the term 'anonymisation' is used from here on to cover pseudonymisation as well, which is a simpler procedure in which the original names can be more easily restored with the assistance of contextual data. The common challenge in both anonymisation and content description is that they require special expertise and manual work, which is expensive. In addition, the documents to be processed are often very extensive. Of the different document types produced by authorities, the project will focus specifically on court decisions, which affect citizens and businesses in many different ways.
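The systematic replacement of names by neutral labels can be sketched as follows. This is a minimal illustration under stated assumptions, not the Anoppi tool itself: the function `pseudonymise` is hypothetical, the names are invented, and the name variants for each person are assumed to have been identified already. The returned lookup table is also an assumption of the sketch; it shows why a replacement of this kind counts as pseudonymisation, since anyone holding the table can work back from the labels to the names.

```python
import re
from string import ascii_uppercase


def pseudonymise(text, persons):
    """Replace every known name variant with a neutral label such as "Person A".

    `persons` is a list of lists; each inner list holds the variants used
    for one person. The sketch supports at most 26 persons (A to Z).
    """
    key = {}
    for letter, variants in zip(ascii_uppercase, persons):
        for name in variants:
            key[name] = f"Person {letter}"
    if not key:
        return text, key
    # Longer variants first, so a full name is replaced as one unit.
    names = sorted(key, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in names))
    return pattern.sub(lambda m: key[m.group(0)], text), key


text = "Matti Virtanen sued Liisa Korhonen. Virtanen claimed damages."
persons = [["Matti Virtanen", "Virtanen"], ["Liisa Korhonen"]]
published, key = pseudonymise(text, persons)
print(published)
```

Because every variant of a person's name maps to the same label, the published text stays readable: the reader can still follow who did what without learning who the parties are.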

Additional information