Thread Management Helpers

April 17, 2015

While reading about thread constructs, I’ve come across this advice more than once: “don’t try to write your own”.

I accept that recommendation. I don’t have any thread needs that can’t be satisfied by existing code. But, they need to be wrapped up to better fit what I’m trying to do.

Here are two examples.

Start / Stop

There are a lot of objects in the app server with start/stop methods. These methods need to be thread safe, and need to defend against misuse. For example, calling START twice is useless; you can’t actually start it again.

Back when I was a boy, I was in the same situation, and I solved it by creating a base class that had start/stop methods. Those methods would handle the locking and prevent multiple calls. The subclasses implemented protected methods OnStart and OnStop.

That was functional, but these years later I’m not a fan. Of course, single inheritance could be a problem. But, really, the object hierarchy isn’t in support of an “is-a” relationship. It’s more for convenience.

The solution for this is a class called StartStop. It is used here:

image

You pass the start and stop methods an Action. If it is appropriate to call the action, it will. Otherwise, it does not.

The innards of StartStop make sure that you don’t start something that is already started, or stop something that is already stopped. Additionally, it only lets one thread do it. Other threads are kicked out as soon as possible.

Other variations could be implemented as needed. For example: perhaps you would prefer it to throw an InvalidOperationException? We have the technology.

Internally, it uses a LOCK statement (with a quick exit), and a boolean to keep track of whether it is currently started.
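The screenshot isn't reproduced here, so as a rough sketch only (not the original source), a class matching that description might look like the following. The method signatures, the `bool` return values, and the choice to leave the state unchanged when the action throws are all assumptions.

```csharp
using System;

// Hypothetical reconstruction of the StartStop idea: a lock plus a boolean.
public sealed class StartStop
{
    private readonly object _gate = new object();
    private bool _started;

    // Runs onStart only if currently stopped. Returns true if the action ran.
    public bool Start(Action onStart)
    {
        if (onStart == null) throw new ArgumentNullException(nameof(onStart));
        lock (_gate)
        {
            if (_started) return false;  // quick exit: already started
            onStart();
            _started = true;             // only reached if onStart did not throw
            return true;
        }
    }

    // Runs onStop only if currently started. Returns true if the action ran.
    public bool Stop(Action onStop)
    {
        if (onStop == null) throw new ArgumentNullException(nameof(onStop));
        lock (_gate)
        {
            if (!_started) return false; // quick exit: already stopped
            onStop();
            _started = false;
            return true;
        }
    }
}
```

With a plain lock, a second thread waits for the first to finish and then takes the quick exit. If waiting threads should leave immediately instead, `Monitor.TryEnter` would be one way to do it.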

Single Thread

I’ve seen some really offensive uses of LOCK. More than once, if there was a threading issue, I’ve seen people put in a lock statement not realizing the potential impact.

The Applications class manages a bunch of applications (hence the clever class name). Each start/stop has to be thread safe. You can’t start and stop the TEST application at the same time, nor can you start it twice at the same time. If you just toss in a LOCK (THIS.STARTSTOPLOCK), then you’re blocking all operations for all apps rather than just the app you need.

So, instead, you have a lock object per application. Then you need to keep track of the lock objects per app. That stuff is all built in. But, do you want to code that 12 times?

I wrapped that up into a class called SingleThread. This is how it’s used in the Applications class.

image

You give it a key and an Action. If the key isn’t already known, then the action is executed. Otherwise, it gets out. The key is constructed from the application name.

The name of the method is ExecuteNoBlock. If it can’t execute, it just exits. (Not shown: it returns false if it doesn’t execute, true if it does execute.)

I’m using this class in several places now. It’s helpful.

Internally, nothing fancy: it’s using a concurrent dictionary. If it can add the key to the dictionary, then it executes. If it can’t, then it exits. Sweet. After the execute, the item is removed from the dictionary. The dictionary doesn’t have a value; just using a 0 byte. We really only care about existence or non-existence. (Perhaps there is a more appropriate concurrent collection?)

Here’s the entire SingleThread class:

image
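The screenshot of the class isn't reproduced here. As a minimal sketch of what the description above implies (not the original source), it could look something like this. The `try/finally` and the string key type are assumptions.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical reconstruction: one action at a time per key, without blocking.
public sealed class SingleThread
{
    // The value is a throwaway byte; only the presence of the key matters.
    private readonly ConcurrentDictionary<string, byte> _running =
        new ConcurrentDictionary<string, byte>();

    // Returns true if the action executed, false if the key was already in use.
    public bool ExecuteNoBlock(string key, Action action)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (!_running.TryAdd(key, 0)) return false; // someone else holds this key
        try
        {
            action();
            return true;
        }
        finally
        {
            byte ignored;
            _running.TryRemove(key, out ignored);   // release the key even on failure
        }
    }
}
```

Because `TryAdd` is atomic, two threads racing on the same key can never both get in, and different keys never wait on each other.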


SignalR Problems Resolved

April 17, 2015

When last I posted, I was having SignalR reconnection issues. It’s important that the nodes are able to recover from SignalR connectivity loss. SignalR will recover on its own for some time, but will eventually give up. The nodes can’t give up. I wrote a connector class that wraps the SignalR connection object, and keeps on trying to connect until it’s good.

The RECONNECT is initiated two ways:

  • If the client detects the disconnection and fires the Disconnected event.
  • If the bus publishes a message, and the publish fails because the connection is down.

The connector starts a task that stays in a loop until the connection is established. Because there are always oodles of threads running, it has to make sure that the loop is never started more than once. I introduced some thread wrappers to handle that, which I’ll cover in another post.

My issue last night was that the client was not reconnecting when the server came back up. I uncovered 2 issues while troubleshooting this:
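To make the "loop that never starts twice" idea concrete, here is a small sketch. It is not the actual connector: the `Reconnector` name, the `Func<Task>` connect delegate (standing in for whatever starts the real SignalR connection), and the fixed retry delay are all assumptions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: retry a connect delegate until it succeeds,
// while guaranteeing that only one retry loop runs at a time.
public sealed class Reconnector
{
    private readonly Func<Task> _connect;   // assumed wrapper around the real connection start
    private readonly TimeSpan _retryDelay;
    private int _looping;                   // 0 = idle, 1 = loop running

    public Reconnector(Func<Task> connect, TimeSpan retryDelay)
    {
        _connect = connect;
        _retryDelay = retryDelay;
    }

    // The most recently started loop (already completed if none has run).
    public Task LoopTask { get; private set; } = Task.FromResult(0);

    // Call from both triggers: the disconnect handler and a failed publish.
    // Returns false if a loop is already running.
    public bool TryStart(CancellationToken token)
    {
        if (Interlocked.CompareExchange(ref _looping, 1, 0) != 0) return false;
        LoopTask = Task.Run(() => Loop(token));
        return true;
    }

    private async Task Loop(CancellationToken token)
    {
        try
        {
            while (!token.IsCancellationRequested)
            {
                try { await _connect(); return; }     // connected: leave the loop
                catch (Exception) { /* still down; wait and try again */ }
                await Task.Delay(_retryDelay, token);
            }
        }
        finally
        {
            Interlocked.Exchange(ref _looping, 0);    // allow a future reconnect
        }
    }
}
```

Both triggers can call `TryStart` freely; whichever arrives second simply gets `false` back.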

SignalRBusServer Bug – I recently posted an apology to my brain for the mental and emotional trauma I cast upon it while working through bus design issues. If you recall, the key challenge there was elegance, not making it work… it’s been working forever. The LocalBus is stand alone, but you can inject a publisher. So, you inject the SignalR publisher. When a message is received from SignalR (IReceiver), it needs to be passed to the local bus. That two-way relationship is the cause of much of the shenanigans. Attempt #715 at resolving this resulted in having the IBusServer create the local bus with the correct publisher, and also send messages to the bus it created. The problem I discovered while testing is that the bus was being created and destroyed as the server was starting and stopping. So, the app server fires up with one localbus, but if you restart the SignalRBusServer object, then there’s a new localbus that no one knows about. That is why the bus wasn’t reconnecting; the old bus was running the loop, and the new one wasn’t referenced by anything. I changed it to only destroy the localbus on dispose, not on stop.

Establishing the Connection – Microsoft APIs have changed a lot in recent years. WCF is ridiculously complicated. WebAPI and SignalR are much simpler, but they keep changing, especially as OWIN was introduced. They achieve the simplicity, in some cases, by having static global objects to be used by everything in the appdomain. I’m ok with that, as long as you don’t have to use them. And, to this point, that has been a challenge, because I’m using them. In the app server, things are associated to features. If a WebAPI controller is associated to a stopped feature, then that controller should not be reachable when the feature is stopped. WebAPI isn’t aware of any of that, though. It just finds all of the controllers in the app domain and wires them up. To counter that, I created a filter to return 404 for controllers that are hit but aren’t supposed to be up. Based on what I did tonight with SignalR, I’m hopeful that I’ll be able to modify the WebAPI code to load only the controllers that it needs.

Anyway, back to the point: When I stopped/started the signalr server, clients couldn’t connect even though it started successfully. This is the relevant code:

image

The ugly parts are the usages of GlobalHost. That’s the part that I don’t like. It turns out that was the cause of the problem, although I don’t know, in particular, why. All I know is that once I figured out how to stop using GlobalHost, the restart worked and clients reconnected.

Here’s the new code.

image

Much better. GlobalHost is gone. Instead of using it, it creates its own dependency resolver (which is the same type GlobalHost uses.)

I look forward to revisiting the webapi code.

Working Well

All of my current tests and expectations regarding startup, stopping, and synchronization are working.

  • start the hub
  • create a node
  • set the node’s model
  • start the node
  • change the model – the node updates
  • shut down the node
  • change the model
  • start the node – the node updates
  • stop the signalr server
  • wait for the node connection to timeout and enter the reconnect loop
  • restart the signalr server – the node reconnects
  • stop the signalr server
  • wait for the node connection to timeout and enter the reconnect loop
  • change the model
  • start the signalr server – the node reconnects, then updates

Unlike previous iterations, it’s not barely working… it’s really working. There’s nothing flaky or questionable about it. It’s solid. Of course, there’s more work that I would like to do. There are 2 pieces of code that I’d like to revisit, but those are just a few lines in a couple of methods.


App Server: Console App, AppHostFeature, Synchronization, SignalR, Rest Client

April 16, 2015

I’ve been spending time testing and tightening things up.

Console Host: The console app is what I have been using, since day one, to develop and test with. It has 2 hard coded nodes and a hard coded hub. I use comments to control what I need or don’t need for whatever I’m doing.

I wrote a new console host app that’s much nicer. It also serves as an admin client.

You can create and manage as many nodes as you want through the command line interface. All of these nodes are hosted in the same process, which is useful for development and testing.

It can also be used to make http api calls to the hub (which is probably in the same process.)

As far as the program is concerned, everything you type is just a command. But, there are three types of commands:

* bus messages – the message will be sent on the bus

* local command – this is something that is coded for in the console app. It does things like clear the screen, start/stop nodes, start/stop the hub, and switch to a node of your choice. (For example: if you have 3 nodes in process and you want to send a message, from which node do you want to send it? The SETNODE local command will give you a reference to the bus of the specified node.)

* http command – makes an http api request

Even though they are all commands, I don’t want them all jumbled together. I want it clear which commands do what sorts of things. So, when you type a command, you prefix it.

. – it is a local command

/ – it is an http command

nothing – it is a bus message
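As an illustration of the prefix scheme, here is a minimal sketch of how a typed line could be classified. The `CommandKind` and `CommandParser` names are made up for this example and aren't from the console app.

```csharp
using System;

// Hypothetical names; the prefix rules are the ones listed above.
public enum CommandKind { Local, Http, BusMessage }

public static class CommandParser
{
    // Returns the kind of command; 'body' is the input with the prefix removed.
    public static CommandKind Classify(string input, out string body)
    {
        string trimmed = (input ?? string.Empty).Trim();

        if (trimmed.StartsWith(".", StringComparison.Ordinal))
        {
            body = trimmed.Substring(1);   // ".starthub" -> "starthub"
            return CommandKind.Local;
        }

        if (trimmed.StartsWith("/", StringComparison.Ordinal))
        {
            body = trimmed.Substring(1);   // "/setmodel ..." -> "setmodel ..."
            return CommandKind.Http;
        }

        body = trimmed;                    // no prefix: send it on the bus
        return CommandKind.BusMessage;
    }
}
```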

Here are some sample commands. This is the code I use to initialize my test server.

image

.starthub – creates and starts the hub in process

.createnode node1 – creates a node in process

/setmodel node1 Application_Server – make an http call that sets the node type.

.startnode node1 – starts the node – in process

/setmodel node1 Reporting_Server – make an http call that changes the node type. (This results in a sync message being sent. node1 will stop, sync, and start its applications.)

I also have commands to start/stop signalr, which is useful to make sure the clients handle disconnects and reconnects properly.

App Host Feature and Synchronization

I did a lot of work on synchronization a few months ago. I thought it was done. But, more work was needed. I reviewed it top to bottom, and made some changes.

The AppHostFeature was recently overhauled to make parts of it swappable. I never really finished that until now. Everything is starting/stopping gracefully.

Rest Client

It seems to be a popular opinion that, when doing REST, you don’t really need a client. Yet, I still see a lot of them. I created one for the app server. So far, it only does a couple things as needed by the console app. But, it will grow.

To keep the references minimal on this DLL, I’m not including the contracts. It builds up the JSON manually (via json.net) rather than through serialization.

I was surprised to see that HttpClient doesn’t support patch. You have to build it up manually. I’m using patch in one place: to change the model type or the group of a node.

In that case, Patch is appropriate, but it’s not always called correctly. If you PATCH a node that doesn’t exist, it defines the node. I’m pretty sure that’s bad REST, and I have a todo to address it.
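For reference, building a PATCH request by hand looks roughly like this. It's a sketch, not the rest client's actual code: the extension class and method names are assumptions, and the JSON is passed in as a plain string here to keep the example free of the json.net dependency.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper; HttpRequestMessage, HttpMethod and SendAsync are the real API.
public static class HttpClientPatchExtensions
{
    // Builds a PATCH request with a JSON body.
    public static HttpRequestMessage CreatePatchRequest(string requestUri, string json)
    {
        return new HttpRequestMessage(new HttpMethod("PATCH"), requestUri)
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
    }

    // Sends the PATCH request using the general-purpose SendAsync.
    public static Task<HttpResponseMessage> PatchJsonAsync(
        this HttpClient client, string requestUri, string json)
    {
        return client.SendAsync(CreatePatchRequest(requestUri, json));
    }
}
```

A call would then look something like `client.PatchJsonAsync(url, "{\"model\":\"Application_Server\"}")`, where the URL and payload shape are placeholders rather than the hub's real contract.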

SignalR

The SignalR server is working well, but I’m having a bit of an issue with the client. I can’t get it to reconnect reliably. I’ll work on that.


Dear Brain: Sorry about the Bus

April 6, 2015

 

Despite a lack of posts, I have been working on the app server quite a bit recently.

The “bus” is an async, loose-coupled messaging system that allows every node to talk to every other node, and every application talk to every other application. The idea is for it to be simple. I want it to be simple to use, and simple to code for extensibility. Therein has been the angst.

The bus has been working for at least 2 years. It is an essential part of this thing, and was the first thing built. But, it was built minimally. The only requirement at the time was that it worked so that I could build everything else on top of it.

But, it wasn’t pretty. Applications communicate to the node via remoting. Nodes communicate to the hub via SignalR (by default), or RabbitMQ (provided), or anything else you want to provide.

This was accomplished by having a really ugly IBus interface that allowed for registering child buses, bus names, etc. The SignalR bus inherited from LocalBus and managed everything within. Separation of concerns: 0.

I have spent the last few weeks thinking about that and reworking it. I spent many hours just prototyping in notepad, trying to figure out how all of these things can work together and be pretty. I tried (and failed) to enlist a fresh perspective on it.

Once I thought I had it working the way I wanted it to in notepad, I started prototyping the real thing. And, of course, it didn’t work as planned. Pretty is good. Functional is better.

Another goal is for the bus to remain stand alone. It doesn’t have to be used by the app server. Once I got it working and pretty for stand alone, I then had to integrate it into the appserver, and that didn’t go very well either. If I applied a hammer to it, it would have worked, but that’s not what I wanted. So I kept refactoring until I got it to where it is now.

At last, it’s good. I’m not going to say it’s done. Functionally, there are still some things it needs. Architecturally, it has one more test to go through. Currently, applications are isolated per app domain. I’m working towards isolating by process. When I get to that, it will require swapping out a remoting (marshalbyref) component with a pipe component. I think that will go in properly from the bus perspective, but the app server may need more work. I have one “temporary” interface in there that is adapting the bus for how the app server registers app domains. When it comes to a pipe, it won’t register an appdomain, so that’s where it will break.

The bus is improved in the following ways:

  • It’s all JSON serialized now. Before, I had serialization/deserialization methods that would be called only when a message had to cross app domains. That got sloppy.
  • A long known issue is that a message type may not exist in a remote location, so the message couldn’t be deserialized. That was never a problem because I have all of the contracts everywhere, but it wouldn’t fly for a custom user type that isn’t deployed everywhere. That’s all handled now, but with room for improvement. If something is unable to deserialize a particular type, it will keep trying every time it receives that type. I’m going to improve it to maintain a small list of types that it knows it can’t handle so that it doesn’t keep throwing exceptions.
  • The bus now handles object, as it always has, and the underlying message type which is BusMessage. You can subscribe to either.

The bus was built on objects. You subscribe to objects of a particular type, and you will receive all of those objects and all of its subclasses. Getting that to work side-by-side with generic bus messages was fun.

One key difference between generic and object messages: you can specify conditions on objects, which is a widely used feature. You can’t yet do that for generic messages. For objects, it uses FUNC<MESSAGE, BOOL>. For generic, all it could do right now is FUNC<STRING, BOOL>. Of course, that can improve, but that’s where it is for the moment.
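Here is a toy sketch of type-based subscriptions with conditions, just to show the shape of the idea. It is not the real bus: `MiniBus`, `NodeMessage`, and `NodeOnline` are invented for the example, and it is single-threaded and in-process only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message types for the example.
public class NodeMessage { public string Node; }
public class NodeOnline : NodeMessage { }

// Toy bus: subscribe by type (subclasses included), optionally with a condition.
public sealed class MiniBus
{
    private readonly List<Action<object>> _handlers = new List<Action<object>>();

    public void Subscribe<T>(Action<T> handler, Func<T, bool> condition = null)
        where T : class
    {
        _handlers.Add(message =>
        {
            var typed = message as T;          // matches T and anything derived from it
            if (typed == null) return;         // some other type: not for this subscriber
            if (condition != null && !condition(typed)) return;
            handler(typed);
        });
    }

    public void Publish(object message)
    {
        // Iterate over a copy so handlers may subscribe while being called.
        foreach (var handler in _handlers.ToArray()) handler(message);
    }
}
```

Subscribing with `bus.Subscribe<NodeMessage>(Handle, m => m.Node == "node1")` would deliver a `NodeOnline` for node1, but nothing for other nodes or unrelated types.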

This is the setup for one of the stand-alone bus tests.

image

It creates a signalr bus server, and a node. The node server has an AddAppDomain method. You pass it an appdomain, and it will instantiate a bus in that appdomain and hold on to a remote reference to it.

The test goes on to:

  • Create 2 appdomains
  • Add them to NServer
  • Send a message to each bus: hub, server, app1, app2
  • Confirm that every message is received by every bus: hub, server, app1, app2

This is a similar test. The difference is this uses the application server’s version.

image

ApplicationBusServer is application server specific. It keeps track of signalr connections, and publishes online/offline messages as nodes come and go.

ApplicationBusServer does not use the SignalRBusServer shown in the previous test. The intent was to reuse it, but it didn’t fit. It uses some of the same dependencies, but composes them differently.

At long last, the bus is reintegrated to the appserver, and the messages are flying. Furthermore, the online/offline messages are now working properly, which is new. That never worked before.


AppServer – Updates

February 18, 2015

It’s been a while… too busy at work.

Application Sync

When the application server starts, it connects to the hub and downloads all of the applications that it needs. This process was convoluted in the interest of efficiency. It downloaded each file only one time, and then distributed it to each application that needs it. This is all managed by a feature called the Application Host Feature (which, really, is one of a few components that comprise the heart of the software.)

Unfortunately, there was a problem with the sync. It only runs when the feature is started. If you change the application, then the application won’t sync until the feature is restarted. Of course, I knew that, and planned to rectify it. But, the rectification was too ugly, so I ended up rewriting it.

The sync is now two steps: Configuration and applications.

Configuration – it downloads the configuration for all of its apps. It removes any started apps that no longer belong, and updates the configuration with any new applications.

Applications – when the application starts, it is synced. It downloads a list of files, determines if there are any changes, then updates/adds/deletes as needed. If the application is started, it only stops it when necessary.

This is less efficient, but more functional. It may end up downloading the same file multiple times. A quick change can eliminate that: just check the rest of the directory structure and see if the file is already there. If so, grab the local copy. (All of the file operations are based on NAME and HASH, so you won’t get the wrong version of the file.)
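The name-and-hash comparison can be sketched as below. This is only an illustration of the idea, not the feature's code: the `SyncPlan` and `FileSync` names are invented, and both inputs are assumed to be simple maps from file name to hash.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type: what to fetch and what to remove.
public sealed class SyncPlan
{
    public List<string> Download = new List<string>();
    public List<string> Delete = new List<string>();
}

public static class FileSync
{
    // Both dictionaries map file name -> content hash.
    public static SyncPlan Compare(
        IDictionary<string, string> remote, IDictionary<string, string> local)
    {
        var plan = new SyncPlan();

        foreach (var file in remote)
        {
            string localHash;
            bool unchanged = local.TryGetValue(file.Key, out localHash)
                             && localHash == file.Value;
            if (!unchanged) plan.Download.Add(file.Key);   // new or changed file
        }

        // Anything local that the server no longer lists gets removed.
        plan.Delete.AddRange(local.Keys.Where(name => !remote.ContainsKey(name)));
        return plan;
    }
}
```

An empty plan means nothing changed, which is the case where a started application wouldn't need to be stopped at all.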

Visual Studio 2015 / NuGet / Cors

I started using Visual Studio 2015… It’s been a bit troublesome with both locking files and GIT, but I’m pushing through. I love a lot of the C# changes.

NUGET either doesn’t work the way I like, or I just need to get used to it. After attempting to update the nuget packages, CORS seemed to stop working. The javascript calls kept receiving INTERNAL SERVER ERROR 500, which wasn’t happening in the server. It kept reporting that the ORIGIN header wasn’t coming back, which I confirmed in FIDDLER.

I spent some time chasing my tail, and never found the cause of the problem. The fix was to go through every nuget package, one at a time, and make sure it was up to date individually. Working through the list is really slow… it shows a few, then hangs for a while as it loads the next few. Then, each time you update a package, you have to start at the top and work through the list again. I wasn’t able to track down the specific error, but I believe it was due to out of sync web api DLLs.

Next

Context Logging

The next thing I want to work on is context logging. As things do work, they publish messages. “starting application”, “syncing”, “downloading files”, etc. It’s a hierarchy of activities. It works very well, but it clutters the code. I’m trying to find a more elegant way to do it, but have been unsuccessful so far. I will at least review it again, and maybe just have to settle.

Process Isolation

I’ve been talking about this for a while, but still haven’t started on it yet. Rather than putting each application in its own appdomain, I want the option of having them run as their own process.

Server Syncing

Currently, applications sync as they start. They will always have the latest version. (You can also publish a message telling it to resync any time you want.)

But, the server itself does not. It gets deployed, and that’s it. DEVDASH, which was one of the projects that led to this one, would update itself and its applications. I need to get that functionality in here. Then, if a core assembly gets updated, all servers will download it and restart.


My Other Blog

January 28, 2015

 

Please also see http://jayallard.wordpress.com, which has non-code related posts. It’s all foolishness.


A little bit of a setback, but back on track

December 19, 2014

I haven’t had much time to work on it lately, and I’ve been a bit under the weather the last few days. I’ve hardly done anything.

There have been a few minor setbacks, mostly of my own doing. One was not, though.

To make a long story short, something changed in Visual Studio 2013 Update 4 that results in files being locked even though they are no longer in use and their app domain has been unloaded. When the app server synchronizes, it downloads new applications and deletes the ones it no longer needs. The delete is failing because visual studio is locking those files. It works fine without the debugger, but it’s a pain when working on install/synchronization issues. It took a while for me to determine the problem was the debugger. I installed Visual Studio 2015 Ultimate Preview, and it has the same issue.

In other news, I’ve been working on the installer quite a bit, just trying to make it better and cleaner. Most of the code in this thing is what I call a “shallow pass”, which is just enough to get it working without being complete junk. The installer is now pretty clean. I want to refactor it one more time because there’s too much in the single class… there are 4 or 5 steps in the install, and they should be separate classes. But, for what it is, it’s pretty solid. But every time I installed to a remote machine, it was taking forever to start, and was using as much CPU as it could. It turns out that was because the RabbitMQ code that I quickly wrote was written too quickly.

The RabbitMQ api allows you to DEQUEUE (with or without a timeout), or DEQUEUEWITHNOWAIT, which returns immediately if there aren’t any messages. Despite it being a shallow pass, there is a throttler to limit the number of channels per application. The throttler iterates the channels on x threads looking for messages. So, maybe there are 10 subscribers, but they are all handled by 2 threads. It uses DEQUEUEWITHNOWAIT to get the message if there is one, and if not it moves on to the next subscriber. The problem is that loop was too tight. I changed it to use DEQUEUE with a 500-millisecond wait. That solved it.
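The difference between the two loops can be shown without RabbitMQ at all. In this sketch, `BlockingCollection<string>` stands in for a subscriber's queue (an assumption made purely for illustration; it is not the RabbitMQ client API), and `TryTake` with a timeout plays the role of DEQUEUE with a wait.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the throttler's polling loop.
public static class Poller
{
    // Visits each queue in turn. Waiting up to 'wait' on an empty queue
    // keeps the loop from spinning at full CPU when there is nothing to do.
    public static void Pump(
        IList<BlockingCollection<string>> queues,
        Action<string> handle,
        TimeSpan wait,
        CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            foreach (var queue in queues)
            {
                string message;
                if (queue.TryTake(out message, wait))
                {
                    handle(message);
                }
                // Otherwise: timed out on this queue, move on to the next one.
            }
        }
    }
}
```

One trade-off of this shape: with many idle queues per thread, a message on the last queue can wait behind the timeouts of the ones before it, so the wait value and the number of queues per thread pull against each other.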

Lastly, there is a bus problem which I have fixed, but it needs to be addressed. The app server defaults to the built in SignalR bus. If you call UseRabbitMQBus, it swaps it out with the RabbitMQBus version. The problem is that the signalr bus was already instantiated and is running. It can’t connect to the server, because there isn’t one, but it’s trying. The quick fix was to add a Start method to the bus so that it doesn’t start working until started. Then, there’s a SignalR bus, instantiated but dormant, which is replaced with the RabbitMQ one, instantiated but dormant. When the server starts, it starts the bus. This isn’t very clean. I believe the cleaner approach will be for the setup code to register a bus factory rather than a bus, and then the server can use the factory to get the instance (then register it directly). I haven’t looked into it yet, but that’s my first impression.

Installer Enhancement

I already mentioned that I did a lot of work on the installer. One new piece of functionality is that it publishes its status as it goes. Now the installer page in the admin portal receives a stream of events related to the install. Then, when it’s complete, it gives you the option to jump right to that server’s admin page. (Previously, it redirected automatically. Now that there’s feedback, you can peruse it before moving on.)

What’s Next

I’m not sure. I think I’m going to work on the Application Host Feature. I made a mistake in how the synchronization works… when the feature starts, it synchronizes all applications. This is efficient because shared files are only downloaded once. But, really, it should synchronize each application as it starts, in case it has changed since the last time it started. I see this issue when I make configuration changes to the server. Restarting the app doesn’t recognize the changes. The server has to be restarted, and that’s sub par.

Also, I would like to add the option to have process isolation rather than app domain isolation. I have been laying the ground work for this from the beginning. I’m just not sure what technology to use to have the processes talk to each other. My first instinct is named pipes via WCF, but WCF feels like a dirty word these days. SignalR is an option, but I’d be using it for P2P, which isn’t what it’s for. There’s always HTTP, but then that’s public. Named pipes seem like the answer, but I’ll have to look into it before deciding.

Conclusion

The setbacks have been dealt with in some fashion. Everything is back to working as it should.

And, as always, PLEASE HELP ME!!!