Quick look at Text Detection in the Google Vision API

Last week I posted some observations based on using the Computer Vision API (part of the Azure Cognitive Services) in an attempt to read text content from complex images.  For these experiments I’ve chosen to use Pokémon cards as they contain a large variety of fonts and font sizes, and do not use a basic grid for displaying all text.

Today I’ve taken some time to do a comparative test with the same images using the text detection feature within Google Cloud Vision API (which at the time of writing is also still in preview).

API Comparison

The simplicity of creating a test client was on par with the Azure service, again requiring only three lines of code once the client library and SDK were installed.
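
For reference, a minimal client along those lines looks something like the sketch below – this assumes the Google.Cloud.Vision.V1 NuGet package, with the image file name and credential setup as placeholders rather than anything from my actual test harness.

using System;
using Google.Cloud.Vision.V1;

class TextDetectionSample
{
    static void Main()
    {
        // The three essential calls: create the client, load the image, detect text.
        var client = ImageAnnotatorClient.Create();
        var image = Image.FromFile("card.jpg");          // placeholder card photo
        var annotations = client.DetectText(image);      // one EntityAnnotation per word
                                                          // (plus an aggregate first entry)
        foreach (var annotation in annotations)
            Console.WriteLine(annotation.Description);
    }
}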

There are a few observable differences in the way Google and Azure services have been constructed;

  • Azure Computer Vision API attempts to break all text down to regions, lines, and words – while the Google Cloud Vision API only identifies individual words.
  • Azure Computer Vision API attempts to identify the orientation of the image based on the words detected
  • Azure Computer Vision API returns graphical boundaries as rectangles, while Google Cloud Vision API uses polygons.  This allows the Google service to more accurately deal with angled text, and mitigates the lack of orientation detection.

Using the Google Cloud Vision API, the orientation could be derived from the co-ordinates provided.  Taking the longest word returned, one can generally assume that the longest side of its bounding polygon represents the horizontal axis of a properly orientated image.
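
The sketch below shows one way that heuristic could be coded – my own illustration rather than code from the original tests: take the longest detected word and treat the longest edge of its bounding polygon as the horizontal axis.

using System;
using System.Collections.Generic;
using System.Linq;
using Google.Cloud.Vision.V1;

static class OrientationEstimator
{
    // Estimate the rotation (in degrees) of an image from its word annotations.
    // Assumes the first annotation returned by DetectText is the aggregate block, so it is skipped.
    public static double EstimateRotation(IReadOnlyList<EntityAnnotation> annotations)
    {
        var longestWord = annotations.Skip(1)
                                     .OrderByDescending(a => a.Description.Length)
                                     .First();

        var vertices = longestWord.BoundingPoly.Vertices;
        double bestLength = 0, angle = 0;

        // The longest edge of the bounding polygon is assumed to run along the
        // horizontal axis of a properly orientated image.
        for (int i = 0; i < vertices.Count; i++)
        {
            var a = vertices[i];
            var b = vertices[(i + 1) % vertices.Count];
            double dx = b.X - a.X, dy = b.Y - a.Y;
            double length = Math.Sqrt(dx * dx + dy * dy);

            if (length > bestLength)
            {
                bestLength = length;
                angle = Math.Atan2(dy, dx) * 180.0 / Math.PI;
            }
        }

        return angle;
    }
}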

The Google Cloud Vision API also makes an attempt to group all text into a single ‘description’ value containing an approximation of all the identified words in the correct order.  This works well for standard text where the topmost points of the bounding polygons for each word align – however for images where the polygons are bottom-aligned (due to different font sizes) the description comes out in reverse order.

Text Detection Comparison

The results varied from card to card, but overall the Google Cloud Vision API appears to be the slightly more accurate of the two services.

Note: Comparison was done using photographs of the cards, saved in .jpg format.  This method was selected as it realistically simulates how the text detection service would be used from a smartphone application.  High-resolution .png images (as opposed to photographs) resulted in slightly more accurate results from both services.

Clean Image Comparison

The following examples show the results for a ‘clean image’ – in this case Servine provided the best results in both APIs.

  • Both APIs did a good job of reading clear blocks of text, including content in a very small (but non-italic) font.
  • Neither service identified the word ‘Ability’, presumably due to the font/style used.
  • Neither service correctly identified the small/italic text (Evolves from Snivy)
  • Neither service identified the font change for the numeric ‘1’ in STAGE1
For each sample region below, the original card text is shown first, followed by the text returned by Azure and Google.

Name and evolution line:
Original: STAGE1 Servine / Evolves from Snivy
Azure: STAGEI Ser vine / Evolves from Snivy
Google: STAGE1 Servine / Evolves from Snivy

Pokédex data line:
Original: NO. 496 Grass Snake Pokémon HT: 2’07” WT 35.5 lbs
Azure: NO, 496 Grass Snake Pokémon HI: 2’07” WT: 35B lbs
Google: NO, 496 Grass Snake Pokémon HT 2 O7 WT 35.3 lbs

Ability text:
Original: Ability Serpentine Strangle / When you play this Pokémon from your hand to evolve 1 of your Pokémon, you may flip a coin. If heads, your opponent’s Active Pokémon is now Paralyzed.
Azure: Ability Serpentine Strangle / When you play this Pokémon from your hand to evolve l of your Pokémon, you may flip a coin. If heads, your opponent’s Active Pokémon is now Paralyzed.
Google: Ability Serpentine Strangle / When you play this Pokémon from your hand to evolve l of your Pokémon, you may flip a coin. If heads, your opponent’s Active Pokémon is now Paralyzed.

Lower Quality Images

Lower quality images resulted in much bigger differences between the Vision APIs provided by Azure and Google, with the Google Cloud Vision API providing much more accurate text recognition.

Again, the original card text is shown first for each region, followed by the Azure and Google results.

Pokédex flavour text (first card):
Original: They photosynthesize by bathing their tails in sunlight.  When they are not feeling well, their tails droop.
Azure: their tails in trot
Google: rhey photosynthesize bybathinv their tails in sunlight, when they are not feelinu well, their tails droop.

Card header line:
Original: Basic Machamp-EX HP180
Azure: Basic Machamp-EX HP180
Google: Basic Machamp-EX MP180

Pokémon-EX rule text:
Original: Pokémon-EX rule / When a Pokémon-EX has been Knocked Out, your opponent takes 2 Prize cards.
Azure: Pokémon-EX rule / Wheoa Pokémon•EX has your opponent takes 2 Prue
Google: Pokémon-EX rule / When a Pokémon Exhas been Knocked out, your opponent takes 2 Prize cards.

Pokédex flavour text (second card):
Original: If its coat becomes fully charged with electricity, its tail lights up. It fires hair that zaps on impact.
Azure: l/ its coat becomes fully ch.itxed 9ith electricity, its tail lis’hts up. It fires hair that zaps ort impact.
Google: If its coat becomes fully charged with electricity, its tail lights up. lt fires hair that zaps on impact.

Conclusion

Although the Azure Computer Vision API does include more helpful features (such as identifying image orientation and pre-grouping regions and lines of text), the text detection exhibited by the Google Cloud Vision API was far more accurate when using lower-quality images.

Quick look at Computer Vision API (Azure Cognitive Services)

As both my children have been playing the Pokémon TCG, an idea I’ve been toying with is an application that uses image recognition and OCR to identify and catalogue cards as they are added to the collection.

With recent improvements in cloud-based machine learning services I was interested in whether this historically difficult task has become any simpler – so today I’ve taken some time to look into the OCR capabilities within the Computer Vision API, released as a preview as part of the Azure Cognitive Services family of APIs.

Using the API couldn’t be simpler (processing the image required only three lines of code), but my interest was more focussed on the quality of the results – which I must admit initially surprised me, though ultimately I found that the output depends largely on the clarity of each individual card.
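
For reference, the call looks roughly like the sketch below – this assumes the Microsoft.ProjectOxford.Vision client library the preview shipped with, and the subscription key and image path are placeholders.

using System.IO;
using System.Threading.Tasks;
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;

class OcrSample
{
    // Run OCR against a local card photo and return the structured results.
    static async Task<OcrResults> ReadCardAsync(string imagePath)
    {
        var client = new VisionServiceClient("YOUR_SUBSCRIPTION_KEY");   // placeholder key

        using (Stream imageStream = File.OpenRead(imagePath))
        {
            // "unk" lets the service auto-detect the language; the final argument
            // asks it to detect the orientation of the text as well.
            return await client.RecognizeTextAsync(imageStream, "unk", true);
        }
    }
}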

The primary test image I was using was a photo of Servine. As you can see from the image on the right there is a lot of content within the card, and a huge amount of variety in terms of fonts and font sizes to detect.

As part of the service the Computer Vision API attempts to first split the card into logical text Regions, which are then broken down further to Lines and Words.  
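
Walking that hierarchy is then a matter of three nested loops – a small sketch, again assuming the Microsoft.ProjectOxford.Vision contract types used above:

using System;
using System.Linq;
using Microsoft.ProjectOxford.Vision.Contract;

static class OcrDump
{
    // Print each line of recognised text, region by region.
    public static void Print(OcrResults results)
    {
        foreach (Region region in results.Regions)
        {
            foreach (Line line in region.Lines)
            {
                Console.WriteLine(string.Join(" ", line.Words.Select(w => w.Text)));
            }
            Console.WriteLine();   // blank line between regions
        }
    }
}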

Regions

The red borders overlaid upon the images to the right outline the regions that the Computer Vision API returned. Note that the way the algorithm has grouped the regions doesn’t necessarily align with the logical blocks of text, or obvious ‘human readable’ differences such as font size.

It’s also worth noting that the regions calculated are not necessarily consistent – observe that the regions identified for the Clefairy card are completely different to Servine. Other cards resulted in even larger differences, depending on picture quality and glossiness of the card.

Lines and Words

Each region is further broken down into lines and words – I’ve highlighted the identified lines using yellow in the image to the right. I was very impressed by the fact that even the text in the very small font was identified – though this was not the case on all cards I tested with.

The actual text recognition wasn’t quite as accurate – for example;

  • STAGE1 was identified as STAGEI
  • Servine was actually identified as two words (Ser vine).
  • Oddly the very small ‘Evolves from Snivy’ in the second line was read perfectly.

Likewise the main body text read very well – though it was interesting to note that while most fonts on the card were recognised the word ‘Ability’ was not included as part of the line, which I’m putting down to the slightly different font and background.

Other test cards did not process so well though. This example from the Clefairy card shows that while most of the description text was correctly identified, the final line wasn’t recognised at all.

Conclusion

Overall the character recognition is certainly impressive, despite being still in preview.  In a number of my test cases sufficient data was derived from the card to identify the basic characteristics of the Pokémon (name, HP, attack details etc).  This would be enough to populate a basic catalogue of data, though it’s still far from perfect.

Getting a good result relies not just on the quality of the photo, but also on the characteristics of the card itself. Low contrast between the text and background was problematic, as was any reflection from the card, such as in the case of a holographic card or where protective card sleeves were used.

There is still not enough consistency in the results to be able to build out a robust card catalogue application, but it’s still a fun subject to investigate a little further.  My next step (for another day) is to look at whether a series of images of the same card can be used to apply some error correction and derive more accurate results.

Update

In a more recent post I’ve taken a look at the comparison of the results for these images using the Google Cloud Vision API.  Take a look at my follow up post Quick look at Text Detection in the Google Vision API.

NuGet Package Restore error with TFS Build (Online)

The Issue

After several months of successful builds using the online TFS build server (part of Visual Studio Online) I encountered a new issue with NuGet Package Manager that had me frustrated for some time this week. The build worked great on my machine, but consistently failed on the build server.

This project references NuGet package(s) that are missing on this computer. Enable NuGet Package Restore to download them. For more information, see http://go.microsoft.com/fwlink/?LinkID=317567.

The error message contains a link (also available here) to more information about the root cause for this error – but the proposed workaround for use with a hosted TFS / CI environment didn’t leave me with any obvious actions to fix this problem.

This solution doesn’t address build server / continuous integration (CI) scenarios. In order to successfully use package restore on the build server, you have two options:

  1. Check-in the .targets file.
  2. Explicitly run NuGet package restore prior to building your project/solution.

Unfortunately this article didn’t lead me to an obvious resolution;

  • Adding the .targets file to source control looked easy enough, but after a quick search of the application directories for an appropriate .targets file no obvious candidates were found. There were a number of .targets files within specific packages, however none seemed to immediately stand out as the one being referred to.
  • Some basic investigation into the option of forcing TFS to run the NuGet Restore also failed to reveal any way to execute the package restore from within the hosted environment.

Resolution

It turned out that a number of folders within the NuGet Packages folder were not correctly in source control, including a number of Application Insights folders and the Microsoft.Bcl.Build package.

Adding all NuGet package folders to source control may have been overkill … but it has solved the problem. Admittedly this is simply a brute force approach to performing the action recommended by the MSDN article (several .targets files were checked in as part of this operation), but the upshot is that although the article does technically describe the action required, it still took over a month’s worth of TFS Build credits to move past the error.

I’m still unclear on the exact change that caused the issue, but my suspicion is that this was caused by a corrupted project file as a result of a couple of TFS Rollback operations.

Modelling with Domain-Specific Languages

Introduction

With the entire industry moving as quickly as it has been for the past few years, it is not surprising that every once in a while you discover a technology or capability that has been around for a number of years yet has escaped your attention.

Domain-Specific Languages (specifically the implementation within Visual Studio) have been one of those cases for me, but over the past few weeks I’ve been spending my spare time playing with the models and considering their role, primarily in the production of design documentation. I’m always on the look-out for ways to improve communication with members of the customer’s team and one thing is obviously true – it doesn’t matter whether they are managers, business users, or analysts – pretty pictures do a great job of communicating complex ideas simply.

I can almost hear you saying now ‘but Domain-Specific languages have been around for years (since VS2005 in fact) … so how is it that this is new to you!’ Well I was feeling pretty embarrassed about that, until I presented the concept internally to some of my colleagues at work and found that no-one else had investigated DSL either. Now I don’t feel so bad.

So for those readers who haven’t used DSL in the past, it’s probably safe to say that you’ve at least used the models produced using the toolkit. The DSL modelling SDK is responsible for many of the graphical designers within Visual Studio, such as;

  • The Visual Studio Architecture Tools
  • The Entity Framework designer
  • The BizTalk ESB Toolkit Itinerary designer
  • The Web Services Software Factory designer

Why I’m Excited

Generating a nice diagram is well and good – but the fact is Visio can do that. The power of the DSL SDK comes with the extensibility – some of the features that have me interested are;

  • Customized Rules and Validation. This is key to help new team members understand the model
  • Model-generated meta-data. A lot of the data we deal with is meta-data that ultimately needs to be loaded into a database or system of some form.
  • Model-generated design documentation. I’m quite excited about the idea of transforming the model XML into Word XML or similar using Visual Studio’s T4 templates.
  • Distributable. As most models can be executed within Visual Studio Shell, they could easily be shared with business analysts and customer representatives who might not already have Visual Studio.

On top of this, model development is fast! Without writing any code at all it doesn’t take long to produce a model. Models could easily be built for short-lived project requirements.

Getting Started

As with any technology that has been around for seven years already, there is some good material on the internet for getting started, and I’m not trying to replicate that in this post.

A Basic Example

As an example, consider for a moment the process of defining document libraries in a SharePoint environment. This often involves defining a Content Type for the library, configuring the document library (possibly with additional library-specific columns), and possibly adding an approval workflow.

The Domain Class model below outlines three core types (note I’ve hidden some of the relationships for simplicity here);

  • Content Types, including;
    • Embedded relationship to Columns
  • Document Libraries, including;
    • Embedded relationship to Columns
    • Reference relationship to Content Type
    • Reference relationship to Workflow
  • Workflows, including;
    • Several Domain Properties defining attributes for workflow configuration

Each of these Domain Classes has been mapped to Component Shape and Connector elements to support the visualization of the model.

The resulting model allows users to graphically model the Content Types, Document Libraries and Workflows required for the solution, resulting in the following diagram. In this case I’ve shown Content Types in purple, Document Libraries in orange and Workflows in lime green. As you can see, this provides a high-impact top level view of the domain model, and is well suited to including in design documentation communicating the solution to customers, developers and maintenance teams alike.

While the design process is being conducted, it is often simpler to capture any additional details required for the solution being developed at the same time – this is where diagrams in Visio start to fall short a little. Since the DSL model is hosted within Visual Studio, we have full access to the property grid – so all custom Domain Properties can be captured simply.

To extend this example a little further, consider the possibilities of artifacts that could be generated from this model such as;

  • Automation of web service calls to implement the content types / libraries and workflows directly from the model
  • Generation of Word tables containing the properties stored against each Domain Class. This could include the detailed attributes from the property grid that are not shown in the visualization.

Programmatically finding a Stored Query Definition within TFS2010

Today I’m building a small utility and have a requirement to execute a stored work item query from within Team Foundation Server 2010. Previously I’d used the StoredQueries collection on the Project instance to find a query by name … however it would appear that this was deprecated as part of the TFS2010 release.

I was surprised to find that there doesn’t seem to be a simple replacement for finding a specified stored query by name – part of me thinks I must be missing a simpler solution, however if anyone else is encountering the same issue I hope that the following code snippet will be of use.

Required namespaces:

  • Microsoft.TeamFoundation.Client
  • Microsoft.TeamFoundation.WorkItemTracking.Client

/// <summary>
/// <para>Find the TFS QueryDefinition for a specified Team Project</para>
/// <para>Note that if multiple queries match the requested queryName
/// only the first will be used</para>
/// </summary>
/// <param name="tfsUrl">URL to the TFS project, including the
/// collection name (Eg, http://tfsserver:8080/tfs/DefaultCollection)</param>
/// <param name="projectName">Name of TFS Team Project</param>
/// <param name="queryName">Name of Stored Query. Note if multiple
/// exist the first found will be used</param>
/// <returns></returns>
public static QueryDefinition FindQueryItem(string tfsUrl, string projectName, string queryName)
{
    // Setup the connection to TFS
    TfsTeamProjectCollection projectCollection =
        TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri(tfsUrl));

    WorkItemStore workItemStore = projectCollection.GetService<WorkItemStore>();

    Project project = workItemStore.Projects[projectName];

    return FindQueryItem(queryName, project.QueryHierarchy);
}

/// <summary>
/// Recursively find the QueryDefinition based on the requested queryName.
/// <para>Note that if multiple queries match the requested queryName
/// only the first will be used</para>
/// </summary>
/// <param name="queryName">Name of Stored Query. Note if multiple exist
/// the first found will be used</param>
/// <param name="currentNode">Pointer to the current node in the recursive search</param>
/// <returns>QueryDefinition</returns>
private static QueryDefinition FindQueryItem(string queryName, QueryItem currentNode)
{
    // Attempt to cast to a QueryDefinition
    QueryDefinition queryDefinition = currentNode as QueryDefinition;

    // Check if we’ve found a match
    if (queryDefinition != null && queryDefinition.Name == queryName)
        return queryDefinition;

    // Attempt to cast the current node to a QueryFolder
    QueryFolder queryFolder = currentNode as QueryFolder;

    // All further checks are for child nodes so if this is not an
    // instance of QueryFolder then no further processing is required.
    if (queryFolder == null)
        return null;

    // Loop through all the child query items
    foreach (QueryItem qi in queryFolder)
    {
        // Recursively call FindQueryItem
        QueryDefinition ret = FindQueryItem(queryName, qi);

        // If a match is found no further checks are required
        if (ret != null)
            return ret;
    }

    return null;
}
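
As a usage sketch (the server URL, project and query names below are placeholders, and the same namespaces listed above plus System.Collections.Generic are assumed), the returned definition can then be executed via the WorkItemStore, supplying the @project macro through a context dictionary:

QueryDefinition definition = FindQueryItem(
    "http://tfsserver:8080/tfs/DefaultCollection", "MyProject", "Open Bugs");

if (definition != null)
{
    TfsTeamProjectCollection projectCollection = TfsTeamProjectCollectionFactory
        .GetTeamProjectCollection(new Uri("http://tfsserver:8080/tfs/DefaultCollection"));
    WorkItemStore workItemStore = projectCollection.GetService<WorkItemStore>();

    // Stored queries often use macros such as @project, so provide a context dictionary.
    var context = new Dictionary<string, string> { { "project", "MyProject" } };
    WorkItemCollection results = workItemStore.Query(definition.QueryText, context);

    Console.WriteLine(results.Count + " work items returned.");
}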

BizTalk 2006 and Versioning GAC’d Dependencies

Overview

Recently I found myself working on a BizTalk 2006 project that had been pre-configured to use a number of referenced assemblies that ultimately would be deployed to the GAC on the target BizTalk machine. Unfortunately I found managing the versions and deployments of this configuration to be ‘problematic’ at best, so I’ve been giving some thought to how this might be done differently.

My basic requirements for this process are as follows:

  • The process must allow for assemblies built in a separate solution
  • The process must be easy for developers
  • All environments (including developers’ machines) must be installed the same way
  • Deployed assemblies must be versioned

This post describes how I went about achieving these goals – though I’m sure there must be other ways to do this as well.

Setting up the new Development / Deployment Process

Step 1 – Create a new Setup project

Create a new Visual Studio Installer project within your solution, and configure the following properties (along with any others you may feel you want to change).

Property                 Value
RemovePreviousVersions   True
Version                  1.0.0 (then increment for each build)

The real key for me here is the RemovePreviousVersions option. If this is set to “False” then the developers will need to manually uninstall the previous package – regardless of whether they are installing an upgrade.

By setting this to “True”, we are encouraging the development team to remember to increment the version of the setup package every upgrade. Changing the version of the setup package at this stage may also prompt the developer to increment the assembly version as well.
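
Where the assembly version does change, it is the usual AssemblyInfo.cs attributes that are bumped – a generic example rather than anything specific to this solution:

// AssemblyInfo.cs – increment alongside the setup project's Version property
// so that distinct versions of the shared assemblies can be identified in the GAC.
using System.Reflection;

[assembly: AssemblyVersion("1.1.0.0")]
[assembly: AssemblyFileVersion("1.1.0.0")]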

Step 2 – Add all project output to the GAC

In the “File System” setup, add a new target folder for the Global Assembly Cache. This is done by simply right-clicking on “File System on Target Machine”, then selecting “Add Special Folder”, then “Global Assembly Cache Folder”.

Now that the GAC folder is included in the installation folders list, add the “Primary Output” of your assemblies.

Step 3 – Add all project output to the Application Folder.

Technically this step is not really required – BizTalk only reads the assemblies from the GAC, so why bother putting them in the application folder as well …

There is a reason though – Visual Studio 2005 doesn’t allow you to reference a custom assembly directly from the GAC, so in order to allow the BizTalk Solution references to be upgraded each time a new copy of the assemblies is installed it is much easier to copy the output to the file system as well as the GAC.

Step 4 – Compile the Installation File, and install on your BizTalk environment

Not much to be said about this step really: compile the installation file and install onto the BizTalk development environment. I found I was doing this step fairly frequently through the day as incremental changes to the assemblies were made – so it pays to keep a network share to the installation folder open if possible.

Note that if the developer forgets to increment the version number of the installation package the installation will be blocked at this point. This is the desired result, as we always want the version incremented!

Step 5 – Add assembly references to your BizTalk solution

Open your BizTalk solution and add references to the new assemblies directly from the application installation folder (typically c:\program files\your company\your product).

Step 6 – Deploy to TEST and PROD environments the same way

Now that we have a consistent package for the assemblies don’t forget to maintain that consistency across all deployment environments … not just your dev box!

Resulting Development Process

After the changes above have been implemented, the development process for on-going changes becomes: update the referenced assemblies, increment the assembly and setup package versions, rebuild the installation file, install it on each environment, and refresh the references in the BizTalk solution from the application folder.

Other Approaches?

I’d be quite keen to hear about how others have approached this problem. I’m unsure what is considered a pragmatic best practice in this space – and I can already hear the developers shouting about the extra steps introduced here! Is there a better way to do this while still meeting the requirements listed at the top of this post?

Adding Integration Tests to TFS Build Workflow

Overview

In my last post I described how to deploy web applications to a build integration server using Team Foundation Server 2010. The next logical step once the build is successfully deploying to the integration server is to trigger a set of integration tests to verify the deployment. In this post I will describe the changes to the Default Template build workflow to execute Integration Tests separately from the existing Unit Tests.

Unit Tests

It is important to consider at this stage why we would run integration tests, as opposed to the unit tests executed as part of the assembly build process.

Unit tests executed as part of the build are intended to verify the individual components are functioning correctly, and often would use mocked interfaces to ensure that only the specific functions being tested are executed. Unit tests are typically not reliant on deployed components and therefore can be run as soon as the assemblies have been built.

Integration tests on the other hand are intended to run against the fully deployed environment to ensure that the individual components successfully execute together. Integration tests therefore need to be executed after the application components have been deployed to an integration server. Failures in integration testing might indicate breaking changes such as database changes, missing data, or changed interfaces into other components of the system.

Note that running the deployment and integration tests adds to the duration required to execute a build. Rather than performing this action every time something in the solution changes it might be more pragmatic to have one Build Definition to build and run unit tests on a per-check-in basis, while another is configured for the full integration tests on a nightly basis.

Modify the Build Workflow

Workflow Sequence Overview

The integration tests have to run within the context of a build agent, so the activity needs to take place at the end of the Run On Agent activity, directly after the packages have been deployed to the build integration server within the Deploy Packages activity.

Changing variable scopes

Because we are going to borrow heavily from the existing “Run Tests” activity, but the execution will be outside the “Try Compile, Test, and Associate Changesets and Work Items” activity, we need to modify the scoping of the following variables. This is most easily done by editing the XAML directly in your favourite XML editor.

  • outputDirectory – copy from the “Compile and Test for Configuration” activity up a level to the “Run On Agent” activity.
  • treatTestFailureAsBuildFailure – copy from the try block of “Try Compile, Test, and Associate Changesets and Work Items” to the “Run On Agent” activity.

Add new Integration Tests workflow arguments

The parameters being added are as follows:

  • Integration Tests Disabled (Boolean). I’m not a fan of negative argument types (eg, Disabled, rather than Enabled), however have decided to keep this consistent with the existing Tests Disabled argument.
  • Integration Test Specs (TestSpecList).

The default value for the Integration Test Specs argument provides the defaults for filtering the unit tests to only the integration tests. Ideally I would have liked to be able to filter this to *test*.dll with a test category of Integration, however based on some rudimentary experimentation it appears that the Test Assembly Spec constructor can only set the assembly name filter. In the end I’ve used the following TestSpecList definition as the default value:

New Microsoft.TeamFoundation.Build.Workflow.Activities.TestSpecList(
    New Microsoft.TeamFoundation.Build.Workflow.Activities.TestAssemblySpec("**\*test*.dll"))

Note: Don’t forget to change the Metadata property to ensure the new arguments are displayed in a suitable category in the Build Definition editor.

Add the Run Integration Tests Activity

Follow these steps to add the new Run Integration Tests activity to the workflow:

  1. Add a new foreach activity after the Deploy Packages activity, but still within the Run on Agent activity. This activity will be used to iterate through the project configurations defined in the build definition.
    <ForEach x:TypeArguments="mtbwa:PlatformConfiguration" DisplayName="Run Integration Tests" Values="[BuildSettings.PlatformConfigurations]">
      <ActivityAction x:TypeArguments="mtbwa:PlatformConfiguration">
        <ActivityAction.Argument>
          <DelegateInArgument x:TypeArguments="mtbwa:PlatformConfiguration" Name="platformConfiguration" />
        </ActivityAction.Argument>
        <!-- the copied "If Not Disable Tests" activity from step 2 goes here -->
      </ActivityAction>
    </ForEach>
  2. Create a copy of the existing activity titled “If Not Disable Tests” into the foreach statement created above
  3. Modify the copied workflow to use the added workflow arguments
    • Use Integration Tests Disabled instead of Disable Tests
    • Use Integration Test Specs instead of Test Specs

Configure the Build Definition

Configuring the filters for your integration tests is a matter for personal preference, though I’ve found the following approaches fairly simple;

  • Define all integration tests in a separate project and utilise the Test Assembly Filespec filter
  • Add a Test Category of Integration to each of the tests and use the Category Filter (see the sketch after this list).
  • Configure a custom testsettings file to allow for accurately specifying the order in which tests should be executed
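
For the Test Category approach, marking a test is simply an attribute on the test method – a minimal MSTest sketch with illustrative names:

using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class OrderServiceIntegrationTests
{
    // The Category Filter in the build definition can then select "Integration" tests only.
    [TestMethod]
    [TestCategory("Integration")]
    public void SubmitOrder_AgainstDeployedEnvironment_ReturnsConfirmation()
    {
        // Exercise the endpoint deployed to the build integration server here.
        Assert.Inconclusive("Placeholder for the real integration test.");
    }
}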

What’s Next?

Having the integration tests executing successfully is all well and good; however, you will find that it is necessary to configure the endpoints in the app.config file of your test project to always point to the integration server, which causes some inconvenience if you wish to run the same tests locally on a development environment.

In a future post, I will have a look at how to perform transformations on the app.config file as part of the deployment, similar to the way web.config is transformed as part of the deployment package creation.