NovelEssay.com Programming Blog

Exploration of Big Data, Machine Learning, Natural Language Processing, and other fun problems.

Adding Text to Images with C# .Net Bitmap objects

This article shows how to add text to images with C# .Net Bitmap objects. The approach works for jpg, png, bmp, and other image formats.

Procedure Overview:

  1. Create a Bitmap object with your source image.
  2. Create a RectangleF object around your source image.
  3. Create a Graphics object using your source Bitmap object.
  4. Set several configuration values on your Graphics object that make the text look better in most cases.
  5. Draw your text string to the rectangle with all of the specified settings.
  6. Flush the changes and save your final output.

Here's some example code that implements the above procedure:

// Namespaces used below: System.Drawing, System.Drawing.Drawing2D, System.Drawing.Text
// Load the original image. Can be jpg, png, bmp, etc...
Bitmap bmp = new Bitmap("myImage.jpg");
// Create a rectangle for the entire bitmap
RectangleF rectf = new RectangleF(0, 0, bmp.Width, bmp.Height);
// Create graphic object that will draw onto the bitmap
Graphics g = Graphics.FromImage(bmp);
// ------------------------------------------
// Ensure the best possible quality rendering
// ------------------------------------------
// The smoothing mode specifies whether lines, curves, and the edges of filled areas use smoothing (also called antialiasing). One exception is that path gradient brushes do not obey the smoothing mode. Areas filled using a PathGradientBrush are rendered the same way (aliased) regardless of the SmoothingMode property.
g.SmoothingMode = SmoothingMode.AntiAlias;
// The interpolation mode determines how intermediate values between two endpoints are calculated.
g.InterpolationMode = InterpolationMode.HighQualityBicubic;
// Use this property to specify either higher quality, slower rendering, or lower quality, faster rendering of the contents of this Graphics object.
g.PixelOffsetMode = PixelOffsetMode.HighQuality;
// This one is important
g.TextRenderingHint = TextRenderingHint.AntiAliasGridFit;
// Create string formatting options (used for alignment)
StringFormat format = new StringFormat()
{
    Alignment = StringAlignment.Center,
    LineAlignment = StringAlignment.Center
};
// Draw the text onto the image
g.DrawString("Visit StyleMyImage.com", new Font("Tahoma",8), Brushes.Black, rectf, format);
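// Flush all graphics changes to the bitmap
g.Flush();
// Release the Graphics object; the bitmap itself stays valid
g.Dispose();
// Now save or use the bitmap ("image" is assumed here to be a PictureBox-style control)
image.Image = bmp;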

The following are common items you may want to customize: Fonts, Size, Color, Text Position, etc...


If you want to change your font type or font size, edit the arguments passed to the Font constructor. For example, this switches from 8-point to 14-point Tahoma:

new Font("Tahoma",14)

If you want the text to be Yellow, change the 

Brushes.Black 

to 

Brushes.Yellow


If you want the text to be in the bottom right corner, change the Alignment values in the StringFormat object.

StringFormat format = new StringFormat()
{
    Alignment = StringAlignment.Far,
    LineAlignment = StringAlignment.Far
};


Finally, if you want to change the text drawn onto the image, change the first argument passed to g.DrawString from "Visit StyleMyImage.com" to whatever you'd like it to say.
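The customizations above (font, size, color, position, and text) can be gathered into one helper. This is only a sketch: the ImageCaption class, its method names, and the output file handling are hypothetical and not part of the original example. It draws onto a 32-bit copy of the source, because Graphics.FromImage throws for bitmaps with an indexed pixel format, and it disposes the Graphics, Font, and StringFormat objects when done.

```csharp
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;
using System.Drawing.Text;

public static class ImageCaption
{
    // Hypothetical helper: returns a copy of the source bitmap with the text drawn on it.
    public static Bitmap AddText(Bitmap source, string text, string fontName, float fontSize,
                                 Brush brush, StringAlignment horizontal, StringAlignment vertical)
    {
        // Copy into a 32-bit bitmap so drawing also works for indexed-format sources.
        var result = new Bitmap(source.Width, source.Height, PixelFormat.Format32bppArgb);
        using (Graphics g = Graphics.FromImage(result))
        using (var font = new Font(fontName, fontSize))
        using (var format = new StringFormat { Alignment = horizontal, LineAlignment = vertical })
        {
            g.DrawImage(source, 0, 0, source.Width, source.Height);
            g.SmoothingMode = SmoothingMode.AntiAlias;
            g.TextRenderingHint = TextRenderingHint.AntiAliasGridFit;
            // The rectangle covers the whole image, so the alignment values place the text.
            g.DrawString(text, font, brush, new RectangleF(0, 0, result.Width, result.Height), format);
        }
        return result;
    }

    // Example use: yellow 14-point text in the bottom right corner, saved as a new PNG file.
    public static void SaveWithText(string inputPath, string outputPath)
    {
        using (var original = new Bitmap(inputPath))
        using (Bitmap captioned = AddText(original, "Visit StyleMyImage.com", "Tahoma", 14,
                                          Brushes.Yellow, StringAlignment.Far, StringAlignment.Far))
        {
            captioned.Save(outputPath, ImageFormat.Png);
        }
    }
}
```

Saving to a different path than the input avoids the file lock that a Bitmap loaded from disk keeps on its source file.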


Tesseract 4.0 C# .Net Wrapper Released!

This article is about the Tesseract 4.0 C# .Net Wrapper that is only a few days old as of April 2017.


You are probably familiar with the Tesseract 3.04 C# .Net Wrapper found here:

https://github.com/charlesw/tesseract

That is already available as a NuGet package and has many downloads.


Just about a week ago, an Alpha release of the Tesseract 4.0 C# .Net wrapper was published here:

https://github.com/tdhintz/tesseract4win64

This is an x64 only .Net assembly. 


Find the Tesseract 4.0 language packs here:

https://github.com/tesseract-ocr/tessdata

When I load only the English language pack, it uses a reasonable 180MB of RAM. When I tried to load "all languages", it used over 8GB of RAM.


This build is very slow in debug mode. It runs 5-8X slower in debug mode than in release mode, so watch out for that.


Amazingly, the .Net wrapper API works exactly the same as the Tesseract C# .Net 3.0 wrapper! (When you read about how much the engine changed and that it now uses LSTM networks, this will be even more amazing to you.)


A very simple usage example works like this:

// tessdataPath is the folder holding the language packs; myImage is the image to OCR
using (var tessEngine = new TesseractEngine(tessdataPath, "eng"))
using (Page page = tessEngine.Process(myImage))
{
    string resultText = page.GetText();
}


Be sure to drop these two files in an x64 sub-folder of your \bin\debug or \bin\release folder, like this:

.\bin\release\x64\libtesseract400.dll
.\bin\release\x64\liblept1741.dll

When the Tesseract.dll 4.0 assembly loads, it needs to find those DLLs; otherwise it will throw an exception in your application.
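One way to get a clearer error than the load-time exception is to check for the native DLLs yourself before creating the engine. The sketch below is an assumption about how you might do that; NativeDependencyCheck is a hypothetical helper, and the two file names are simply the ones listed above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class NativeDependencyCheck
{
    // File names taken from the list above; adjust them if your build ships other versions.
    private static readonly string[] RequiredFiles = { "libtesseract400.dll", "liblept1741.dll" };

    // Returns the full paths of required native DLLs that are missing under <baseDirectory>\x64.
    public static List<string> FindMissing(string baseDirectory)
    {
        var missing = new List<string>();
        string nativeDir = Path.Combine(baseDirectory, "x64");
        foreach (string file in RequiredFiles)
        {
            string fullPath = Path.Combine(nativeDir, file);
            if (!File.Exists(fullPath))
            {
                missing.Add(fullPath);
            }
        }
        return missing;
    }
}
```

Calling NativeDependencyCheck.FindMissing(AppDomain.CurrentDomain.BaseDirectory) at startup and logging any returned paths tells you exactly which file is absent.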


There is a very nice Accuracy and Performance overview report of 3.04 versus 4.0 here:

https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance

I agree with its findings generally, but my own tests show nowhere near as much improvement over 3.04. I have a regression test that contains about 2200 pages, and I'm observing plenty of slower and less precise OCR results with Tesseract 4.0. It is certainly not all "better and faster" as of April 2017. Since this is an extremely new Alpha release, I have high hopes that it will improve over time.
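If you want to run a similar comparison on your own pages, you need some way to score OCR output against known-good text. The article does not say how its regression test measures precision, so the following is only one possible approach: a character error rate based on edit distance. OcrScore and its methods are hypothetical names.

```csharp
using System;

public static class OcrScore
{
    // Classic Levenshtein edit distance between two strings (insert, delete, substitute).
    public static int EditDistance(string a, string b)
    {
        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // Edit distance divided by the length of the expected text; 0.0 means a perfect match.
    public static double CharacterErrorRate(string expected, string actual)
    {
        if (expected.Length == 0) return actual.Length == 0 ? 0.0 : 1.0;
        return (double)EditDistance(expected, actual) / expected.Length;
    }
}
```

Running each page through both engine versions, timing the calls with a Stopwatch, and comparing the error rates gives a per-page view of where 4.0 is better or worse.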


Make a Slack WebHook plugin with C# .Net, Nancy, and Ngrok.

This blog article will walk you through making a C# webhook plugin for Slack. We'll be using Nancy to set up a small web service, and ngrok to expose our service publicly so Slack can call it.


First, make a new C# console application with Visual Studio, and install a few NuGet packages. The interesting packages are Nancy, Nancy.Hosting.Self, and Slack.Webhooks.


Next, we need to create a NancyHost and start it up. You'll want your console application's main to look roughly like this:

        static void Main()
        {
            JsConfig.EmitLowercaseUnderscoreNames = true;
            JsConfig.IncludeNullValues = false;
            JsConfig.PropertyConvention = JsonPropertyConvention.Lenient;
            using (var host = new NancyHost(new Uri("http://localhost:1234")))
            {
                host.Start();
                Console.ReadLine();
            }
            return;
        }

When our application runs, it launches a service that listens on localhost:1234 for requests.


Of course, you'll need some using statements like this:

using System;
using System.Collections.Generic;
using Nancy;
using Nancy.Hosting.Self;
using Nancy.ModelBinding;
using Newtonsoft.Json;
using ServiceStack.Text;
using Slack.Webhooks;


Since our application will be listening on localhost:1234, we need to add request handlers. For our Slack webhook plugin example, we just need to handle a Post. We'll create a WebhookModule class that inherits from NancyModule and has a Post handler like this:

public class WebhookModule : NancyModule
{
    public WebhookModule()
    {
        Post["/"] = _ =>
        {
            var model = this.Bind<HookMessage>();
            var message = string.Empty;
            SlackAttachment attachment = null;
            message = string.Format("@{0} Hello", model.UserName);
            if (!string.IsNullOrWhiteSpace(message))
            {
                SlackMessage sm = new SlackMessage { Text = message, Username = "MyChat.Bot.Greeting", IconEmoji = Emoji.Ghost };
                if (attachment != null)
                {
                    sm.Attachments = new List<SlackAttachment>();
                    sm.Attachments.Add(attachment);
                }
                return sm;
            }
            return null;
        };
    }
}

When a Post is received, this binds the request to a HookMessage and responds with an "@username Hello" message. Slack receives that response and should output the greeting to your Slack chat channel.


Here are a few other classes you'll need that define the HookMessage and some other Nancy boilerplate configuration:

    public class HookMessage
    {
        public string Token { get; set; }
        public string TeamId { get; set; }
        public string ChannelId { get; set; }
        public string ChannelName { get; set; }
        public string UserId { get; set; }
        public string UserName { get; set; }
        public string Text { get; set; }
        public string TriggerWord { get; set; }
    }
    public class TitleCaseFieldNameConverter : IFieldNameConverter
    {
        public string Convert(string fieldName)
        {
            return fieldName.ToTitleCase();
        }
    }
    public class Bootstrapper : DefaultNancyBootstrapper
    {
        protected override void ApplicationStartup(Nancy.TinyIoc.TinyIoCContainer container, Nancy.Bootstrapper.IPipelines pipelines)
        {
            container.Register<IFieldNameConverter, TitleCaseFieldNameConverter>();
            base.ApplicationStartup(container, pipelines);
        }
    }
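The field name converter matters because Slack posts its outgoing webhook data as form fields with lowercase underscore names (token, team_id, user_name, trigger_word, and so on), while HookMessage uses title-case property names. The sketch below is a standalone approximation of that mapping, written only to illustrate what the converter is expected to do; SlackFieldNames is a hypothetical class and is not the ServiceStack implementation of ToTitleCase.

```csharp
using System;
using System.Text;

public static class SlackFieldNames
{
    // Approximates the snake_case to TitleCase conversion: "user_name" becomes "UserName".
    public static string ToPropertyName(string fieldName)
    {
        var sb = new StringBuilder();
        foreach (string word in fieldName.Split('_'))
        {
            if (word.Length == 0) continue;               // skip empty parts from doubled underscores
            sb.Append(char.ToUpperInvariant(word[0]));    // capitalize the first letter of each word
            sb.Append(word.Substring(1).ToLowerInvariant());
        }
        return sb.ToString();
    }
}
```

With this mapping, the form field trigger_word lines up with HookMessage.TriggerWord, and text lines up with HookMessage.Text.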


At this point, you should be able to run your C# console application and have it listen for Post requests on localhost:1234. Next, follow my ngrok blog article to set up ngrok to expose your localhost:1234 service to a public address:

http://blog.novelessay.com/post/make-local-mysql-instance-publicly-available-for-a-mvc-net-website-with-ngrok

When your ngrok is ready to run, you can make your service publicly available by starting the ngrok service like this:

ngrok.exe http 1234
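Before pointing Slack at the service, it can help to confirm that the handler answers a form post shaped like Slack's. The following is a rough smoke-test sketch, not part of the original plugin: WebhookSmokeTest is a hypothetical class, and the field values are made up for testing.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class WebhookSmokeTest
{
    // Field names follow Slack's outgoing webhook form post; the values are placeholders.
    public static Dictionary<string, string> BuildFields(string userName, string text)
    {
        return new Dictionary<string, string>
        {
            { "token", "test-token" },
            { "channel_name", "general" },
            { "user_name", userName },
            { "text", text },
            { "trigger_word", "hello" }
        };
    }

    // Posts the fields as application/x-www-form-urlencoded and returns the response body.
    public static async Task<string> PostAsync(string url, Dictionary<string, string> fields)
    {
        using (var client = new HttpClient())
        using (var content = new FormUrlEncodedContent(fields))
        {
            HttpResponseMessage response = await client.PostAsync(url, content);
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

With the console application running, posting WebhookSmokeTest.BuildFields("alice", "hello bot") to http://localhost:1234/ (or to the public ngrok address) should return the greeting the handler builds.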


Lastly, you'll need to go into your Slack configuration and set up an Outgoing Webhook. Look in the "Browse Apps" -> "Custom Integrations" -> "Outgoing WebHooks" section. You will probably find it trickier to locate than to actually configure.



You need to configure which channels you want your webhook to interact with, set the ngrok address that your service is serving on, and provide a token that Slack will send. You should update your C# console application to check the token value, but that's not entirely necessary for this system to work.
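The token check mentioned above can be as small as a string comparison against the value Slack shows in the webhook configuration. This is a minimal sketch under assumptions: SlackToken is a hypothetical helper, and SLACK_WEBHOOK_TOKEN is a made-up environment variable name for wherever you choose to store the expected token.

```csharp
using System;

public static class SlackToken
{
    // Hypothetical storage location for the expected token; a config file would work too.
    public static string LoadExpected()
    {
        return Environment.GetEnvironmentVariable("SLACK_WEBHOOK_TOKEN");
    }

    // True only when both values are present and match exactly (case-sensitive).
    public static bool IsValid(string received, string expected)
    {
        if (string.IsNullOrEmpty(received) || string.IsNullOrEmpty(expected))
        {
            return false;
        }
        return string.Equals(received, expected, StringComparison.Ordinal);
    }
}
```

Inside the Post handler, something like checking SlackToken.IsValid(model.Token, SlackToken.LoadExpected()) right after the Bind call, and returning HttpStatusCode.Unauthorized when it fails, should keep other callers from triggering the bot.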


Here's an example of how I have an Outgoing WebHook configured in Slack:



That's everything! Give your Slack Bot a try. 

Writing a Windows Service with TopShelf

This article will show the C# code to use the TopShelf framework to write a Windows Service.


First, use Visual Studio to create a new C# console application, and install the TopShelf NuGet package.


Next, change your Main function to look roughly like this:

    public class Program
    {
        public static void Main()
        {
            // Topshelf constructs and starts TownCrier through the callbacks below,
            // so no manual instance is created here.
            HostFactory.Run(x =>                                 //1
            {
                x.Service<TownCrier>(s =>                        //2
                {
                    s.ConstructUsing(name => new TownCrier());     //3
                    s.WhenStarted(tc => tc.Start());              //4
                    s.WhenStopped(tc => tc.Stop());               //5
                });
                x.RunAsLocalSystem();                            //6
                x.SetDescription("NovelEssayAgent v" + Assembly.GetExecutingAssembly().GetName().Version);        //7
                x.SetDisplayName("NovelEssayAgent");                       //8
                x.SetServiceName("NovelEssayAgent");                       //9
                x.AfterInstall(() => NotificationHelper.DoAfterInstall());
                x.BeforeUninstall(() => NotificationHelper.DoBeforeUninstall());
            });                                                  //10
             
            return;
        }
    }

Now, you should be asking:

  1. What is the NotificationHelper?
  2. What is the TownCrier?

The NotificationHelper is a class that lets you handle before and after install events. I'm simply writing out some log messages on those events, but you might want to automate additional deployment tasks in those events.

    static public class NotificationHelper
    {
        // File.AppendAllText takes a path and the text to append; each entry ends with a new line.
        static public void DoAfterInstall()
        {
            File.AppendAllText("c:\\mylog.txt", "NovelEssayAgent v" + VersionHelper.GetAssemblyVersion() + " was installed." + Environment.NewLine);
        }
        static public void DoBeforeUninstall()
        {
            File.AppendAllText("c:\\mylog.txt", "NovelEssayAgent v" + VersionHelper.GetAssemblyVersion() + " was uninstalled." + Environment.NewLine);
        }
    }
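The VersionHelper class used above is not shown in this article. A minimal stand-in might look like the following; this is an assumption about what it does (return the version of the assembly it lives in as a string), not the original implementation.

```csharp
using System;
using System.Reflection;

public static class VersionHelper
{
    // Returns the version of the assembly that contains this class, e.g. "1.0.0.0".
    public static string GetAssemblyVersion()
    {
        Version version = typeof(VersionHelper).Assembly.GetName().Version;
        return version != null ? version.ToString() : "unknown";
    }
}
```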


The TownCrier is a class in which I set up a never-ending timer that polls for work. It also handles the service Start and Stop events, so I can log or do other tasks when those events happen.


The EssayMgr class encapsulates work being done. EssayMgr's RunLifeCycle function is called every hbSetting seconds. You can make that amount of time configurable in many different ways (database, config file, etc...).

    public class TownCrier
    {
        private static EssayMgr mMgr = new EssayMgr();
        readonly System.Timers.Timer _timer;
        /// <summary>
        /// The constructor sets up a timer that will call RunLifeCycle when it fires.
        /// </summary>
        public TownCrier()
        {
            int hbSetting = 1;
            _timer = new System.Timers.Timer(hbSetting * 1000) { AutoReset = true };
            _timer.Elapsed += (sender, eventArgs) => mMgr.RunLifeCycle();
        }
        public void Start()
        {
            File.AppendAllText("c:\\mylog.txt", "NovelEssayAgent v" + VersionHelper.GetAssemblyVersion() + " was started." + Environment.NewLine);
            _timer.Start();
        }
        public void Stop()
        {
            _timer.Stop();
            mMgr.StopAll();
            File.AppendAllText("c:\\mylog.txt", "NovelEssayAgent v" + VersionHelper.GetAssemblyVersion() + " was stopped." + Environment.NewLine);
        }
    }
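As one illustration of making the polling interval configurable, the hard-coded hbSetting value could be replaced by a parsed setting with a safe fallback. The sketch below reads an environment variable; HeartbeatSettings and the variable name NOVELESSAY_HEARTBEAT_SECONDS are hypothetical, and the same ParseSeconds method would work on a string read from a config file or database.

```csharp
using System;

public static class HeartbeatSettings
{
    // Parses a positive whole number of seconds; anything else falls back to the default.
    public static int ParseSeconds(string raw, int defaultSeconds)
    {
        int seconds;
        if (int.TryParse(raw, out seconds) && seconds > 0)
        {
            return seconds;
        }
        return defaultSeconds;
    }

    // Reads the setting from an environment variable (the name is the caller's choice).
    public static int FromEnvironment(string variableName, int defaultSeconds)
    {
        return ParseSeconds(Environment.GetEnvironmentVariable(variableName), defaultSeconds);
    }
}
```

In the TownCrier constructor, the first line would then become something like int hbSetting = HeartbeatSettings.FromEnvironment("NOVELESSAY_HEARTBEAT_SECONDS", 1);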


Add some code to do some work in the EssayMgr.RunLifeCycle function, and that's it: you've created your very own C# Windows Service!
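The EssayMgr class itself is not shown in this article, so here is a hypothetical skeleton to start from. It assumes only the two methods TownCrier calls (RunLifeCycle and StopAll). Because a System.Timers.Timer with AutoReset raises Elapsed on thread pool threads, a slow cycle can overlap with the next tick, so this sketch skips a tick when the previous cycle is still running.

```csharp
using System;
using System.Threading;

public class EssayMgr
{
    private int _running;            // 0 = idle, 1 = a cycle is in progress
    private volatile bool _stopped;  // set by StopAll so later ticks do nothing
    private int _completedCycles;

    public int CompletedCycles
    {
        get { return _completedCycles; }
    }

    public void RunLifeCycle()
    {
        if (_stopped) return;
        // Skip this tick if the previous cycle has not finished yet.
        if (Interlocked.CompareExchange(ref _running, 1, 0) != 0) return;
        try
        {
            DoWork();
            Interlocked.Increment(ref _completedCycles);
        }
        finally
        {
            Interlocked.Exchange(ref _running, 0);
        }
    }

    public void StopAll()
    {
        _stopped = true;
    }

    // Placeholder for the real work: poll a queue, process files, and so on.
    protected virtual void DoWork()
    {
    }
}
```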


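Your code should build into an exe, and you can install and start your service by running Topshelf's commands against your executable on the command line, for example MyService.exe install and then MyService.exe start, where MyService.exe stands in for your own executable. The full list of commands is documented here: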

http://docs.topshelf-project.com/en/latest/overview/commandline.html




Text Extraction using C# .Net and Apache Tika


You want to use C# to extract text from documents and web pages. You want it to be high quality and free. Try the .Net wrapper for the Apache Tika library!


Let's build a sample app to show the use case. As a first step, start a C# console application with Visual Studio. Use the NuGet package manager to install the TikaOnDotNet.TextExtractor package.



Then, try this sample code. It shows text extraction from a file, a Url, and a byte array.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using TikaOnDotNet.TextExtraction;

namespace TikaTest
{
    class Program
    {
        static void Main(string[] args)
        {

            TextExtractor textExtractor = new TextExtractor();

            // Fun Utf8 strings found here: http://www.columbia.edu/~fdc/utf8/
            string utf8InputString = @"It's a small village in eastern Lower Saxony. The ""oe"" in this case turns out to be the Lower Saxon ""lengthening e""(Dehnungs-e), which makes the previous vowel long (used in a number of Lower Saxon place names such as Soest and Itzehoe), not the ""e"" that indicates umlaut of the preceding vowel. Many thanks to the Óechtringen-Namenschreibungsuntersuchungskomitee (Alex Bochannek, Manfred Erren, Asmus Freytag, Christoph Päper, plus Werner Lemberg who serves as Óechtringen-Namenschreibungsuntersuchungskomiteerechtschreibungsprüfer) for their relentless pursuit of the facts in this case. Conclusion: the accent almost certainly does not belong on this (or any other native German) word, but neither can it be dismissed as dirt on the page. To add to the mystery, it has been reported that other copies of the same edition of the PLZB do not show the accent! UPDATE (March 2006): David Krings was intrigued enough by this report to contact the mayor of Ebstorf, of which Oechtringen is a borough, who responded:";
            // Convert string to byte array
            byte[] byteArrayInput = Encoding.UTF8.GetBytes(utf8InputString);
            // Text Extraction Example for Byte Array
            TextExtractionResult result = textExtractor.Extract(byteArrayInput);
            Console.WriteLine(result.Text);

            // Text Extraction Example for Uri:
            result = textExtractor.Extract(new Uri("http://blog.novelessay.com"));
            Console.WriteLine(result.Text);

            // Text Extraction Example for File
            result = textExtractor.Extract(@"c:\myPdf.pdf");
            Console.WriteLine(result.Text);

            // Note that result also has metadata collection and content type attributes
            //result.Metadata
            //result.ContentType
        }
    }
}

Notice that the TextExtractionResult has a Metadata collection and also a content type attribute. The metadata provided along with the extracted text contains many things including author, dates, keywords, title, and description.
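To see which metadata a given document carries, you can print the collection as sorted key and value pairs. The sketch below assumes the Metadata property can be treated as an IDictionary of string keys and string values; MetadataPrinter is a hypothetical helper, not part of TikaOnDotNet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MetadataPrinter
{
    // Turns metadata pairs into "key: value" lines, sorted by key for easier reading.
    public static List<string> Format(IDictionary<string, string> metadata)
    {
        return metadata
            .OrderBy(pair => pair.Key, StringComparer.OrdinalIgnoreCase)
            .Select(pair => pair.Key + ": " + pair.Value)
            .ToList();
    }
}
```

If that assumption holds, looping over MetadataPrinter.Format(result.Metadata) and writing each line to the console shows everything Tika found for the document.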


      

I've been very pleased with Tika's quality and ability to handle many different file types. I hope you try it out and enjoy it too.