NovelEssay.com Programming Blog

Exploration of Big Data, Machine Learning, Natural Language Processing, and other fun problems.

C# Entity Framework implementing 'Contains' Query for MySQL

This article will show some tricks on how to use C# Entity Framework queries with 'Contains' predicates for a MySQL database.

The goal is to use C# queries like this against a MySQL database:

var result = dbContext.People.Where(p => p.Name.Contains("john"));

First, we need to create a Full-Text Search Interceptor. We'll start with the one found at the following link:

http://www.entityframework.info/Home/FullTextSearch

That one is specific to SQL Server, so we'll need to make some changes. After my changes, a version for MySQL is this:

static class LanguageExtensions
{
    public static bool In<T>(this T source, params T[] list)
    {
        return (list as IList<T>).Contains(source);
    }
}
public class FtsInterceptor : IDbCommandInterceptor
{
    private const string FullTextPrefix = "-FTSPREFIX-";
    public static string Fts(string search)
    {
        return string.Format("({0}{1})", FullTextPrefix, search);
    }
    public void NonQueryExecuting(DbCommand command, DbCommandInterceptionContext<int> interceptionContext)
    {
    }
    public void NonQueryExecuted(DbCommand command, DbCommandInterceptionContext<int> interceptionContext)
    {
    }
    public void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        RewriteFullTextQuery(command);
    }
    public void ReaderExecuted(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
    }
    public void ScalarExecuting(DbCommand command, DbCommandInterceptionContext<object> interceptionContext)
    {
        RewriteFullTextQuery(command);
    }
    public void ScalarExecuted(DbCommand command, DbCommandInterceptionContext<object> interceptionContext)
    {
    }
    public static void RewriteFullTextQuery(DbCommand cmd)
    {
        string text = cmd.CommandText;
        for (int i = 0; i < cmd.Parameters.Count; i++)
        {
            DbParameter parameter = cmd.Parameters[i];
            if (parameter.DbType.In(DbType.String, DbType.AnsiString, DbType.StringFixedLength, DbType.AnsiStringFixedLength))
            {
                if (parameter.Value == DBNull.Value)
                    continue;
                var value = (string)parameter.Value;
                if (value.IndexOf(FullTextPrefix) >= 0)
                {
                    parameter.Size = 4096;
                    parameter.DbType = DbType.AnsiStringFixedLength;
                    value = value.Replace(FullTextPrefix, "");
                    value = value.Substring(1, value.Length - 2);
                    parameter.Value = value;
                    cmd.CommandText = Regex.Replace(
                        text,
                        string.Format(@"`(\w*)`.`(\w*)`\s*LIKE\s*@{0}\s?", parameter.ParameterName),
                        string.Format(@"match(`$1`.`$2`) against(@{0})", parameter.ParameterName)
                    );

                    if (text == cmd.CommandText)
                        throw new Exception("FTS was not replaced on: " + text);
                    text = cmd.CommandText;
                }
            }
        }
    }
}

The most significant difference is in the Regex.Replace. For MySQL, we need to make our SQL use Full Text indices using the Match-Against combo like this:

select * from people WHERE MATCH (Name) against ('john')

Note: Be sure that your table has a Full Text index on the columns you are querying. You can always add a Full Text index to your table like this:

ALTER TABLE people ADD FULLTEXT INDEX people_name_FT_IDX (Name);


Next, we'll need to add our FtsInterceptor to our DbContext like this:

public partial class MyAppMySqlDbEntities : DbContext
{
    public MyAppMySqlDbEntities()
        : base("name=MyAppMySqlDbEntities")
    {
        DbInterception.Add(new FtsInterceptor());
        Configuration.UseDatabaseNullSemantics = true;

You'll also need a using statement to go with those DbInterception calls.

using System.Data.Entity.Infrastructure;
using System.Data.Entity.Infrastructure.Interception;

Finally, we're ready to make some queries:

var personNameFtsI = FtsInterceptor.Fts('john');
var query = myDbContext.People.Where(p => p.Name.Contains(personNameFtsI));
int count = query.Count();

I suggest you set a break point in the FtsInterceptor.RewriteFullTextQuery function, and when the query.Count line is executed that breakpoint should get hit. Then, you can step through that function and see how your SQL is changing to use the Match-Against combo that MySQL needs for a Full Text index search.


A special big thank you goes out to our friends at AlphaProspect.com for help figuring all of this out.


One final "gotcha" to watch out for is that MySQL seems to only like using one Full Text index per query, so if you do something like this:

var personNameFtsI = FtsInterceptor.Fts('john');
var personJobFtsI = FtsInterceptor.Fts('engineer');
var query = myDbContext.People.Where(p => p.Name.Contains(personNameFtsI) && p.JobTitle.Contains(personJobFtsI));
int count = query.Count();

The SQL result will look correct, and it should have two Match-Against clauses, but MySQL still won't use both Full Text indices. I suggest you capture your SQL from in the FtsInterceptor.RewriteFullTextQuery function, and then execute it in the MySQL workbench. Get the execution plan or "explain" result for your query. That'll show that MySQL is really only using one of the Full Text indices.


There's a way to rewrite the queries to make use of both Full Text indices, but I have no idea how to make that happen using C# Entity Framework and Linq. Here's a link about how MySQL can do it with temp tables:

http://stackoverflow.com/questions/42097593/mysql-slow-query-when-combining-two-very-fast-queries

If you solve this, I'd love to hear from you about how you did it. Thank you.


Must change item for configuring MySQL for Windows

I recently installed MySQL and the MySQL workbench on my Windows 10 PC. It's working great. I loaded up about 2 GB of data in two 3 tables with roughly 2M rows total. Most of my simple (single table by ID or Full Text Index) queries ran really fast, and I was happy.


Then, I started working on queries that join tables. They were slow - like 26 seconds slow for a two table join by integer ID using a FK (simple stuff).


First, I checked my indices. I added several indices just trying things and hoping it'd work. That did not solve the slowness.


I had to tinker in Workbench quite a bit imagining what could be wrong and how do I identify the source problem, which turned out to be good learning exercise.


Eventually, I saw (on the Workbench Server's dashboard) that InnoDB buffer usage went to 100% during my long running queries. I didn't really know what that meant, but I was out of "good" ideas to try so I was happy to have any leads.


I asked the internet sages (search engines) about this "InnoDB buffer usage at 100%", and I found some leads. It turns out that buffer size is set in "my.ini" (on Windows) or "my.cnf" (Not-windows). The default was 8MB, which is amazingly low for any modern computer. (MySQL on my phone would want a bigger value than that as a default.)


I found my.ini at this path on my Windows 10 PC:

C:\programdata\MySQL\MySQL Server 5.7\my.ini

Specifically, edit that file with notepad or notepad++. Look for this value:

innodb_buffer_pool_size=8M

I changed mine to 1G first, and now I'm using 2G. I don't even think that's a lot, but this is all on my dev-box so I don't want to give it all of my RAM.

innodb_buffer_pool_size=2G

After you change that my.ini value, be sure to restart your MySQL service. 

  1. Type services.msc in to the Run box on Windows to bring up your services manager. 
  2. Find the MySQL service, and right click it. 
  3. There should be a "restart" option, so click that.
After this change, my 26 second simple table join query went to sub-second time.

Make local MySQL or ElasticSearch instance publicly available for a MVC.Net website with NgRok

Background:

I have a GoDaddy hosting account for NovelEssay.com, and I want to use and provide a large data set (like a personal copy of Wikipedia). GoDaddy's MySQL database limits are 1GB, which is good - but not awesome. I could host MySQL at home on my personal computer, but how could my GoDaddy MVC.Net project access the data without a public address on my local instance?

(Note: I use ngrok, but they are certainly not paying me to write about it. Please read the security disclaimer at the bottom of this article. Exposing local resources to the internet is awesome and will have extra security concerns.)


Introducting: NgRok

If you are thinking something like "I want to expose my local service to a public address" then ngrok is a great (free) tool that I like using. My current usage is exposing my local MySQL database service to the internet, so my public services can use its data.


I kick off ngrok on my local machine like this:

ngrok.exe tcp 3306

That starts up a local process that gives me a public address and port that I can connect to using my MVC.Net projects:

After running ngrok.exe, you should see something like the above. This shows TCP traffic at the public address 0.tcp.ngrok.io on port 123456 (actual value obfuscated) is routed to my localhost:3306, which is exactly where my local MySQL is listening.

Setup:

Head over to ngrok.com and create a free account, and download their ngrok.exe. Your free account has a secret key associated with it. The first time you run ngrok.exe you'll need to set that key like this:

ngrok.exe authtoken <YOUR_AUTHTOKEN>

After that, you're all set to spin up ngrok for exposing local services to the internet.


MVC.Net Connection String:

In order for your mvc.net connection string to hit the ngrok pubilc address, you'll need to change the data source part of the connection string.

An example server and port for connecting to an exposed MySQL instance:

server=0.tcp.ngrok.io;Port=123456
An example data source for connecting to an exposed SQL Server:
data source=0.tcp.ngrok.io,55555

Notice the comma before the port number in the SQL Server connection string format.


In my experience, I only had to change data source or server/port. All of the other fields (user, password, timeouts, etc...) did not need any changes when switching to a ngrok publicly exposed database.


Other Uses:

I also use ngrok.com to expose my local ElasticSearch and some local IIS hosted websites for various (short term) demos and other purposes.


When you publish a website to local IIS, that will be serving over some port that you can configure. We'll use 5555 for this example. You can expose that local IIS hosted website to the internet like this:

ngrok.exe http 5555

Or, a local ElasticSearch like this (with usually runs on localhost:9200):

ngrok.exe http 9200

Warning: ElasticSearch doesn't have built in authentication, so wrapping that with an nginx locally is recommended. I do that with NovelEssay.com in that I'm hosting my website via GoDaddy and use an ElasticSearch in Azure with ngrok and nginx authentication.


Security Disclaimer:

Be careful with ngrok. Exposing your local ElasticSearch directly is not recommended, because ElasticSearch doesn't have authentication built in. You can use things like nginx to create a basic authentication wrapper around ElasticSearch, and then expose the nginx service to the internet via ngrok. The same is with any other local service. Ngrok is awesome, but be careful.

Creating an Excel report with C# using a MySql data source

This blog article will share how to create a multi-worksheet Excel workbook file using C# that fetches report data from a MySql data source.


First, create a Visual Studio C# project and install the EPPlus and MySql.Data Nuget packages.




Next, get yourself a MySqlConnection object connected to your MySql server. If you have a specific database, you'll want to add that to your connection string. My case didn't need to do that. Obviously, change your user, password, and server values to whatever makes sense for your environment. 

                string connstring = string.Format("Server={0}; UID={1}; password={2};default command timeout={3};", serverName, userName, Password, commandTimeout);
                MySqlConnection connection = new MySqlConnection(connstring);
                connection.Open();

Next, you set up an ExcelPackage object and create worksheets as the code below shows. Notice how the column headers are set and styled.

Then, I fetch my Sql query from a helper class and execute that using a MySqlCommand. The results of my query are fetched using the data reader. As it iterates through the data reader, the code sets the query result values in to specific Excel cell locations using the row and column index values.


                string eppOutput = @"C:\myEppOutput.xlsx";
                FileInfo fi = new FileInfo(eppOutput);
                using (ExcelPackage package = new ExcelPackage(fi))
                {
                    // I two queries that run. Results are put on separate worksheets.
                    for (int queryIdx = 1; queryIdx <= 2; queryIdx++)
                    {
                        string worksheetName = "Worksheet" + queryIdx;
                        ExcelWorksheet ws = package.Workbook.Worksheets.Add(worksheetName);
                        // Each page has this four column headers:
                        // Set worksheet's column headers
                        ws.Cells["A1"].Value = "ColumnA";
                        ws.Cells["B1"].Value = "ColumnB";
                        ws.Cells["C1"].Value = "ColumnC";
                        ws.Cells["D1"].Value = "ColumnD";
                        // Make column headers bold
                        ws.Cells["A1:D1"].Style.Font.Bold = true;
                        query = Utils.GetDatabaseQueryString(queryIdx);
                        cmd = new MySqlCommand(query, dbCon.Connection);
                        int row = 2;
                        int col = 1;
                        using (var dataReader = cmd.ExecuteReader())
                        {
                            while (dataReader.Read())
                            {
                                col = 1;
                                ws.SetValue(row, col++, dataReader.GetString(0));
// Column B and C are always integer values:
                                ws.SetValue(row, col++, int.Parse(dataReader.GetString(1)));
                                ws.SetValue(row, col++, int.Parse(dataReader.GetString(2)));
                                ws.SetValue(row, col++, dataReader.GetString(3));
                                row++;
                            }
                        }
                    }
                    package.Save();
                }


Note that the col index resets each pass through the dataReader.Read, and the row index resets each time I create a new worksheet.

Finally, we can call package.Save to write out my ExcelPackage object to a file specified in the argument string that we passed to its constructor.