OCR of ID’s for a Japanese dating app
As a part of work for the Japanese dating app, HireUkraine had to implement the Optical Character Recognition (OCR) for ID’s to ensure the app users are of legitimate age. The problem here was that Japanese documents are issued in hieroglyphs and are scarcely recognizable by standard OCR algorithms. In addition, personal IDs are not the type of photos to be found on the Internet by millions, so comprising the data set for the model training was troublesome at best. In addition, the time was very short, so ve decided to go for the most cost-effective solution — and it paid off well in the result.
The team consisted of a DevOps engineer, Middle Python Developer and Middle Data Analyst, who had to select the appropriate OCR model and train it to recognize the Japanese hieroglyphs.
At first, we tried to train the bespoke OCR model based on the entities within the documents and insert the identified entities (name, age, date of ID issuance, a title of the company, etc.) as an input to Google Translate to get the English translation and insert it into database. However, the low resolution of scanned documents constantly resulted in English text having lots of errors.
This partial success was not enough, and we decided to opt for a simpler way. We used Google Vision to identify the data and applied the regular mathematical expressions to get the sense out of it. This way the “Flaming Rising Dragon 6 of Nippon 29” is easily transformed into “29th day of 6 month of Heisei epoch”, which corresponds to June 29th, 1995.
We also used the image similarity algorithm to teach the system to distinguish the insurance cards from the driver’s licenses and the passports. This feature greatly simplifies the profile creation for the users of the dating app, as they can simply provide the photo-quality scans of their ID’s and the system verifies the data with ease.
Industry: IT services/telecom
Partnership period: 2017-ongoing
Team size: 3 people
Team location: Kharkiv, Ukraine
Expertise delivered: DevOps services, cloud infrastructure management, Python back-end development, ML model implementation
Technology stack (c иконками): Kubernetes, Jenkins, Ansible, Terraform, Python, OCR, ML training and implementation
After the previous project engineers left, the transition to HireUkraine team was seamless. I honestly don’t have any complaints about them, and that’s what’s most impressive. Working with them is smooth, and as a CEO, that’s the best outcome.