Supervisor Prof. Anders Hast, Centre for Image Analysis:
anders.hast@it.uu.se

The Impact of Bias in Face Recognition
Face recognition consists of four distinct stages. First, a face is detected. Second, anchor points (landmarks) are found so that the face can be aligned and cropped, placing all faces in exactly the same position. Third, a feature vector is computed from the aligned face. In the final stage, verification is performed, usually by computing the cosine similarity between feature vectors. However, it turns out that the reliability of face recognition (FR) varies with parameters such as age, head pose, sex (man/woman), facial expression and ethnicity. The main idea of this project is to investigate how much such parameters act as sources of bias and affect the classification results, e.g. how much less or more accurate is FR for women compared to men? In a second step, it is proposed that transfer learning is applied with the purpose of decreasing the bias in question. Bias comes from imbalanced data: if there are more men than women in the training dataset, the model will generally perform better at identifying men than women. In order to measure bias, all other parameters must therefore be kept constant while one is varied. All parameters should be analysed.
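A minimal sketch of the verification stage and of how a per-group accuracy gap could be quantified, assuming feature vectors have already been extracted by some FR model; the threshold and group labels are illustrative, not prescribed by the project:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two face feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(a: np.ndarray, b: np.ndarray, threshold: float = 0.5) -> bool:
    """Accept a pair as the same identity when similarity exceeds the
    threshold. The value 0.5 is illustrative and must be calibrated
    per model and dataset."""
    return cosine_similarity(a, b) >= threshold

def accuracy_per_group(pairs, groups, threshold: float = 0.5) -> dict:
    """Verification accuracy split by one demographic attribute.

    pairs  : list of (vec_a, vec_b, same_identity) tuples
    groups : one group label per pair, e.g. 'man' / 'woman'
    """
    correct, total = {}, {}
    for (a, b, same), g in zip(pairs, groups):
        total[g] = total.get(g, 0) + 1
        if verify(a, b, threshold) == same:
            correct[g] = correct.get(g, 0) + 1
    return {g: correct.get(g, 0) / total[g] for g in total}
```

Comparing the returned per-group accuracies, while all other parameters are held constant, is one simple way to express the bias the project aims to measure.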

Age Estimation in Photographs
A person's age can be very hard to tell from face photos. However, deep learning approaches have the capacity to learn from huge amounts of data, and several suitable databases exist online. The purpose of this project is to learn from such datasets and estimate the age of persons, regardless of gender and ethnicity. The results will be very useful for the EB-CRIME project, where age estimation is one important part. It is especially interesting to tell whether a person has reached the age of 15 or 18. It is also interesting to determine how well the algorithm works for different groups (age, gender and ethnicity). One problem is that the quality of the images in the datasets varies quite a lot: some images are blurry, some have strange facial expressions and some have varying head poses. The first step would therefore be to create a normalised dataset for training a deep learning neural network. Head pose (tilt, yaw and pitch) can be measured, keeping only the faces within a certain threshold, perhaps no more than 10% (or even 5%, depending on the results), so that only front-looking faces remain. Some pipelines (e.g. InsightFace) allow measuring head pose, and some even estimate facial expression; keep only the faces with a moderately smiling or neutral expression and remove those with extreme expressions. Finally, by flipping the face horizontally, a new feature vector can be computed for identification. If the cosine similarity between the original face and the flipped face is much smaller than 1.0, it is probably a difficult face and can be removed. The normalised dataset will be a valuable resource for future research on age estimation, and the results might show that such a dataset improves the estimated age, i.e. reduces the mean absolute error (MAE).
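A minimal sketch of the proposed filtering step, assuming two hypothetical hooks, embed and estimate_pose, that wrap whatever pipeline is chosen (e.g. InsightFace). The pose threshold is expressed in degrees here as an assumption; the 10%/5% figure above would map onto whatever units the chosen pose estimator reports, and both thresholds are illustrative:

```python
import numpy as np

def keep_face(img, embed, estimate_pose,
              pose_thresh_deg=10.0, sim_thresh=0.9):
    """Return True if the face image should stay in the normalised dataset.

    embed         : hypothetical hook returning a feature vector for an image
    estimate_pose : hypothetical hook returning (tilt, yaw, pitch) in degrees
    """
    tilt, yaw, pitch = estimate_pose(img)
    if max(abs(tilt), abs(yaw), abs(pitch)) > pose_thresh_deg:
        return False  # not sufficiently front-looking

    # Compare the face with its horizontally flipped version: a cosine
    # similarity far below 1.0 suggests a "difficult" face to be removed.
    e1, e2 = embed(img), embed(np.fliplr(img))
    sim = np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2))
    return sim >= sim_thresh
```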

Transfer Learning for Handwritten Text Recognition of Old Swedish Text
Old Swedish handwritten texts from the 17th and 18th centuries are harder to read than newer ones from the 20th century. In this project the idea is to use a pipeline for document analysis and Handwritten Text Recognition (HTR) and apply so-called transfer learning to optimise it for such older Swedish texts. The Loghi pipeline:
https://github.com/knaw-huc/loghi 
is one of the more attractive alternatives for this. The idea is to start from a model trained on, for example, older Dutch texts, and then do transfer learning using a collection of already transcribed older Swedish texts, which will be provided in the project through the collaboration with the History Department. The first problem is to determine which layout analysis approach works best, such as a YOLO-based one or PyLaia. Then Loghi itself can be evaluated, comparing it to HTRFlow based on the Swedish Lion model:
https://huggingface.co/Riksarkivet/trocr-base-handwritten-hist-swe-2
In the end we would like to know whether Loghi can compete with HTRFlow, both in terms of accuracy and speed.
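HTR accuracy is usually reported as the character error rate (CER): the Levenshtein edit distance between the recognised text and the ground-truth transcription, normalised by the length of the latter. A minimal sketch of how the output of either pipeline could be scored (this is the standard metric, not a function taken from Loghi or HTRFlow):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate via Levenshtein edit distance,
    computed with a single rolling row of the DP table."""
    m, n = len(reference), len(hypothesis)
    dp = list(range(n + 1))          # distances for the empty reference
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i       # prev holds dist(i-1, j-1)
        for j in range(1, n + 1):
            cur = dp[j]              # dist(i-1, j)
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[j] = min(dp[j] + 1,     # deletion
                        dp[j - 1] + 1, # insertion
                        prev + cost)   # substitution / match
            prev = cur
    return dp[n] / max(m, 1)

# Example: one substitution and one deletion over 12 reference characters.
print(cer("gamla texter", "gamle texer"))  # 2/12 ≈ 0.167
```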

Emotion Recognition in Video Recordings
Emotion recognition can be done using individual photographs and is often deep learning based:
https://www.diva-portal.org/smash/get/diva2:1689228/FULLTEXT01.pdf
However, the new MediaPipe Face Landmarker can also be used to describe emotions, and it also has output that describes more general details, such as whether the eyes are closed, the eyebrows are raised, the mouth is smiling, etc. (see blendshapes):
https://ai.google.dev/edge/mediapipe/solutions/vision/face_landmarker 
There are several interesting research questions, such as can the landmarks be used as emotion detector, just as well as the faces themselves? And can the BlendShapes be used? The landmarks are fever than the pixels, and the Blendshapes are fever than then landmarks. So if any of them can be used, the model will be easier and faster to train. Moreover, it is interesting to see if we can utilise the changes in expression in video recordings to understand underlying emotions expressed by a person.
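As a starting point, a minimal sketch of extracting landmarks and blendshape scores with the MediaPipe Tasks API, following the documentation linked above; the model file face_landmarker.task must be downloaded separately from the MediaPipe model page, and the input path is illustrative:

```python
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Load the Face Landmarker with blendshape output enabled.
options = vision.FaceLandmarkerOptions(
    base_options=python.BaseOptions(model_asset_path="face_landmarker.task"),
    output_face_blendshapes=True,
    num_faces=1,
)
detector = vision.FaceLandmarker.create_from_options(options)

image = mp.Image.create_from_file("face.jpg")  # illustrative input
result = detector.detect(image)

# 478 3D landmarks per detected face ...
landmarks = result.face_landmarks[0]
print(len(landmarks), "landmarks")

# ... and 52 blendshape scores (e.g. 'eyeBlinkLeft', 'mouthSmileLeft'),
# which could serve as a compact feature vector for an emotion classifier.
for category in result.face_blendshapes[0]:
    print(category.category_name, round(category.score, 3))
```

For video, the same detector can be run frame by frame, so that the time series of blendshape scores, rather than single frames, becomes the input for studying changes in expression.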
