Week 1 Summary

It has been a crazy busy week. I have completed all the work that Hannah, my supervisor, wanted me to do this week. To recap, this was the initial list:

Contact Dr Harry Strange and ask what he wants from the project.
Open a blog.
Write a blog post with the title and what I plan to do.
Come up with a taxonomy of notes and write a blog post on this.
Find 3 papers relevant to my dissertation.
Write a blog post about these papers and give a short description of the “best” paper.

I completed everything on that list. I have also started drafting the outline specification using information I gathered from Harry. I will take this draft to my meeting with Hannah to discuss it and, hopefully, flesh out some requirements.

Another thing I need to consider this week is the methodology I wish to follow. I will be looking into Extreme Programming for a single person, but once some requirements have been identified I will have a better understanding as to whether it’s the correct methodology for this project.

In the meantime I will be collating a small set of rules for how the notes should be constructed, so that the notes have some similarity. An example rule could be that images must have boxes around them. These rules will form the core part of the note analysis. I will look to produce a small set of rules and design the system in such a way that new rules can be added with little effort.
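To make the "new rules with little effort" idea concrete, here is a minimal sketch of one way it could be structured. Nothing here is decided yet: the `Note` and `NoteElement` classes, the `rule` decorator and the rule name are all hypothetical placeholders, and I am assuming that a note has already been broken down into elements that know whether they have a box around them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class NoteElement:
    """Hypothetical piece of a note, e.g. an image or a block of text."""
    kind: str
    has_box: bool = False


@dataclass
class Note:
    """Hypothetical note: just a list of elements for this sketch."""
    elements: List[NoteElement] = field(default_factory=list)


# A rule takes a note and returns a list of problems (empty if the note passes).
Rule = Callable[[Note], List[str]]

# Registry of all known rules, keyed by rule name.
RULES: Dict[str, Rule] = {}


def rule(name: str) -> Callable[[Rule], Rule]:
    """Decorator that adds a rule function to the registry under `name`."""
    def register(func: Rule) -> Rule:
        RULES[name] = func
        return func
    return register


@rule("images-must-be-boxed")
def images_must_be_boxed(note: Note) -> List[str]:
    """The example rule from above: every image needs a box around it."""
    return [
        f"element {index}: image has no box around it"
        for index, element in enumerate(note.elements)
        if element.kind == "image" and not element.has_box
    ]


def check_note(note: Note) -> Dict[str, List[str]]:
    """Run every registered rule and collect the problems of the failing ones."""
    failures: Dict[str, List[str]] = {}
    for name, check in RULES.items():
        problems = check(note)
        if problems:
            failures[name] = problems
    return failures
```

With this layout, adding a rule means writing one more decorated function; `check_note` does not need to change. Whether a decorator registry is the right fit will depend on the requirements that come out of the meeting.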

Additionally, I have started some spike work on Tesseract OCR, and as I write this blog post it has begun training on some of my handwriting. So far it seems to pick up quite a lot of letters from my handwriting, but it also detects a lot of false positives.

An honourable mention should go to the Tesseract wiki page, which has information about training. However, after I published the post about the 3 papers, I found a master's thesis by Brian M. Gonzalez (Link to PDF) which uses the Tesseract OCR tool, trained on handwriting data, to interpret code and produce an output. This thesis is extremely relevant and must be commended for its great section on how to actually train on a handwriting data set. It will be an incredibly useful resource to consider.