How I got my first paper — Part 3
So, I was talking about how I got my first publication, starting off by reading the MS thesis of one of my seniors in the lab:
Following it with a bit of literature review:
Today, I will talk about the next steps we took, which eventually led us to a publication.
Considering PSNR and SSIM on the 256 × 256 Dayton split of the GT Cross View dataset
The first task was to test X-Seq from Cross View Image Synthesis using Conditional GANs and Cross View Image Generation using Geometry Guided Conditional GANs on the a2g (aerial-to-ground) task, for the following reasons:
SSIM is better, PSNR is comparable, and SD (Sharpness Difference) is better compared to the g2a (ground-to-aerial) task.
Sharpness Difference is based on image gradients: it measures how much sharpness is lost during image generation.
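To make the metrics concrete, here is a minimal NumPy sketch of PSNR and a gradient-based sharpness-difference score. The function names and the exact gradient formulation are my simplification, not the paper's reference implementation:

```python
import numpy as np

def psnr(gt, gen, max_val=255.0):
    # peak signal-to-noise ratio in dB; higher means closer to ground truth
    mse = np.mean((gt.astype(float) - gen.astype(float)) ** 2)
    return 10 * np.log10(max_val ** 2 / (mse + 1e-8))

def sharpness_difference(gt, gen, max_val=255.0):
    # PSNR-style score on gradient magnitudes: penalizes loss of sharpness
    def grads(img):
        img = img.astype(float)
        gx = np.abs(np.diff(img, axis=0))[:, :-1]   # vertical differences
        gy = np.abs(np.diff(img, axis=1))[:-1, :]   # horizontal differences
        return gx + gy
    err = np.mean(np.abs(grads(gt) - grads(gen)))
    return 10 * np.log10(max_val ** 2 / (err + 1e-8))
```

Both scores are in dB, so a blurrier generated image shows up as a lower sharpness difference even when its PSNR stays comparable.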
Get middle layers to train our encoder
Now, the task was to get the middle layers, from the start up to the midpoint, to train our encoder using the pretrained weights of X-Seq.
Basically, we required the model from the start up to the point just before the first de-convolution layer.
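As a toy sketch of what "keep the model up to just before the first de-convolution layer" means, assuming the generator can be treated as an ordered list of layer functions (the real X-Seq generator is a full conv/deconv network, so the layers and shapes here are made up):

```python
import numpy as np

# Hypothetical stand-ins for the generator's layers; in the real model these
# would be conv blocks (downsampling) followed by de-conv blocks (upsampling).
conv_part = [
    lambda x: np.maximum(0.0, x - 0.1),   # "conv + ReLU" stand-in
    lambda x: x[::2],                      # "strided conv" stand-in (downsample)
    lambda x: np.tanh(x),                  # bottleneck nonlinearity
]
deconv_part = [
    lambda x: np.repeat(x, 2),             # "de-conv" stand-in (upsample)
]
generator = conv_part + deconv_part

# Our encoder is everything before the first de-convolution layer.
encoder = generator[: len(conv_part)]

def extract_features(x, layers=encoder):
    # run the input through the truncated network and return the bottleneck
    for layer in layers:
        x = layer(x)
    return x
```

The point is simply that a pretrained generator can be split at its bottleneck and the first half reused as a feature extractor.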
After testing on a2g, the next task was to test on the g2a task. After testing both approaches, we generated both a2g and g2a images for a city in GT Cross View.
Given an a2g image, we compute the L2 distance between its features and those of all g2a images, and check whether the top-1 match is correct; similarly for top-5 and top-10, over k queries.
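That retrieval check can be sketched in a few lines of NumPy. This assumes the i-th query's true match sits at gallery index i; the function name is mine:

```python
import numpy as np

def topk_accuracy(query_feats, gallery_feats, k=1):
    # pairwise L2 distances: shape (n_queries, n_gallery)
    d = np.linalg.norm(query_feats[:, None, :] - gallery_feats[None, :, :], axis=2)
    # indices of the k nearest gallery items per query
    topk = np.argsort(d, axis=1)[:, :k]
    # query i counts as correct if its true match (index i) is among the top-k
    hits = [i in topk[i] for i in range(len(query_feats))]
    return float(np.mean(hits))
```

Running it with k = 1, 5, and 10 gives exactly the top-1/top-5/top-10 numbers described above.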
After testing both approaches with X-Seq, we decided to take features from layer 23.
The features learned might be good enough to map between g2a and a2g.
Keep in mind that the model was trained only for generation, not for retrieval.
We required a model to classify feature pairs as similar or not, so we decided to label similar and dissimilar pairs as 0/1.
We decided to build our own network for classification, for which we started reviewing residual blocks.
After going through several approaches, we decided to use a siamese-like architecture, as follows:
Given two images, the classifier tells whether they are similar; if they are, other similar images are retrieved from the database.
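A minimal, untrained sketch of the siamese idea: one shared embedding applied to both inputs, and a logistic head on the absolute difference of the embeddings. The dimensions and weights here are made up (the real network used residual blocks):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 32)) / 16.0    # shared embedding weights (hypothetical dims)
w, b = rng.normal(size=32), 0.0          # logistic classification head

def embed(x):
    # the *same* weights embed both branches: that is the "siamese" part
    return np.maximum(0.0, x @ W)

def similarity(a, other):
    # score in (0, 1): closer to 1 means the head calls the pair "similar"
    diff = np.abs(embed(a) - embed(other))
    return 1.0 / (1.0 + np.exp(-(diff @ w + b)))
```

Training would push this score toward the 0/1 pair labels mentioned above; with zero bias, an identical pair sits at exactly 0.5 until the head learns anything.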
For this task, we required street-view and satellite-view image pairs, so we decided to collect our own dataset: the satellite-view images come from the benchmark NWPU-RESISC45 dataset, while the corresponding street-view images for each class were downloaded from Flickr using the Flickr API.
I think this was the base from which we derived the experiments for the paper: Cross-View Image Retrieval — Ground to Aerial Image Retrieval through Deep Learning
That’s it for now. I think I will soon write about my current work.