Thread Management Helpers

April 17, 2015

While reading about thread constructs, I’ve come across this advice more than once: “don’t try to write your own”.

I accept that recommendation. I don’t have any thread needs that can’t be satisfied by existing code. But, they need to be wrapped up to better fit what I’m trying to do.

Here are two examples.

Start / Stop

There are a lot of objects in the app server with start/stop methods. These methods need to be thread safe, and need to defend against misuse. For example, calling START twice is useless: you can’t actually start it again.

Back when I was a boy, I was in the same situation, and I solved it by creating a base class that had start/stop methods. Those methods would handle the locking and prevent multiple calls. The subclasses implemented protected methods OnStart and OnStop.

That was functional, but these years later I’m not a fan. Of course, single inheritance could be a problem. But, really, the object hierarchy doesn’t support an IS-A relationship. It’s more of a convenience.

The solution for this is a class called StartStop. It is used here:

image

You pass the start and stop methods an Action. If it is appropriate to call the action, it will. Otherwise, it does not.

The innards of StartStop make sure that you don’t start something that is already started, or stop something that is already stopped. Additionally, it only lets one thread do it. Other threads are kicked out as soon as possible.

Other variations could be implemented as needed. For example: perhaps you would prefer it to throw an InvalidOperationException? We have the technology.

Internally, it uses a LOCK statement (with quick exit), and a boolean to keep track of whether it is currently started.
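The screenshot of the class is not reproduced here, so the following is only a minimal sketch of how such a helper might look, not the original code. It assumes that "quick exit" means a thread that cannot take the lock immediately gives up (done here with Monitor.TryEnter), and the bool return values are an assumption added so callers can tell whether the action ran.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of a StartStop helper; member names are assumptions.
public sealed class StartStop
{
    private readonly object _gate = new object();
    private bool _started;

    // Runs onStart only if not already started and no other thread is mid start/stop.
    public bool Start(Action onStart)
    {
        return Transition(false, onStart);
    }

    // Runs onStop only if currently started and no other thread is mid start/stop.
    public bool Stop(Action onStop)
    {
        return Transition(true, onStop);
    }

    private bool Transition(bool requiredState, Action action)
    {
        // Quick exit: do not wait if another thread holds the lock.
        if (!Monitor.TryEnter(_gate)) return false;
        try
        {
            if (_started != requiredState) return false; // already in the target state
            action();
            _started = !requiredState; // flip only after the action succeeds
            return true;
        }
        finally
        {
            Monitor.Exit(_gate);
        }
    }
}
```

If the action throws, the state is left unchanged, so a failed start can be retried. Throwing InvalidOperationException instead of returning false would be a small change in Transition.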

Single Thread

I’ve seen some really offensive uses of LOCK. More than once, when there was a threading issue, I’ve seen people put in a lock statement without realizing the potential impact.

The Applications class manages a bunch of applications (hence the clever class name). Each start/stop has to be thread safe. You can’t start and stop the TEST application at the same time, nor can you start it twice at the same time. If you just toss in a LOCK (THIS.STARTSTOPLOCK), then you’re blocking all operations for all apps rather than just the app you need.

So, instead, you have a lock object per application. Then you need to keep track of the lock objects per app. That stuff is all built in. But, do you want to code that 12 times?

I wrapped that up into a class called SingleThread. This is how it’s used in the Applications class.

image

You give it a key and an Action. If the key isn’t already known, then the action is executed. Otherwise, it gets out. The key is constructed from the application name.

The name of the method is ExecuteNoBlock. If it can’t execute, it just exits. (Not shown: it returns false if it doesn’t execute, true if it does execute.)

I’m using this class in several places now. It’s helpful.

Internally, nothing fancy: it’s using a concurrent dictionary. If it can add the key to the dictionary, then it executes. If it can’t, then it exits. Sweet. After the execute, the item is removed from the dictionary. The dictionary doesn’t have a value; just using a 0 byte. We really only care about existence or non-existence. (Perhaps there is a more appropriate concurrent collection?)

Here’s the entire SingleThread class:

image
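As a rough text stand-in for the screenshot, here is a minimal sketch of the approach described above: add the key to a concurrent dictionary, run the action only if the add succeeds, then remove the key. The class and method names come from the post, but the body is a reconstruction and the string key type is an assumption.

```csharp
using System;
using System.Collections.Concurrent;

// Sketch only: reconstructed from the description, not the original source.
public sealed class SingleThread
{
    // The value is a throwaway byte; only the presence of the key matters.
    private readonly ConcurrentDictionary<string, byte> _running =
        new ConcurrentDictionary<string, byte>();

    // Returns true if the action ran, false if the key was already in use.
    public bool ExecuteNoBlock(string key, Action action)
    {
        if (!_running.TryAdd(key, 0)) return false; // someone else holds this key
        try
        {
            action();
            return true;
        }
        finally
        {
            byte ignored;
            _running.TryRemove(key, out ignored); // always release the key
        }
    }
}
```

The try/finally matters: without it, an exception in the action would leave the key in the dictionary and block that key permanently.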


SignalR Problems Resolved

April 17, 2015

When last I posted, I was having SignalR reconnection issues. It’s important that the nodes are able to recover from SignalR connectivity loss. SignalR will recover on its own for some time, but will eventually give up. The nodes can’t give up. I wrote a connector class that wraps the SignalR connection object, and keeps on trying to connect until it’s good.

The RECONNECT is initiated two ways:

  • If the client detects the disconnection and fires the Disconnected event.
  • If the bus publishes a message, and the publish fails because the connection is down.

The connector starts a task that stays in a loop until the connection is established. Because there are always oodles of threads running, it has to make sure that the loop is never started more than once. I introduced some thread wrappers to handle that, which I’ll cover in another post.

My issue last night was that the client was not reconnecting when the server came back up. I uncovered 2 issues while troubleshooting this:
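The "loop that never starts twice" idea can be sketched without any SignalR types. This is an illustration under assumptions, not the connector class itself: the Reconnector name, the tryConnect delegate (standing in for whatever call attempts the SignalR connection), and the fixed retry delay are all hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: a retry loop that can only be running once at a time.
public sealed class Reconnector
{
    private readonly Func<bool> _tryConnect; // stand-in for the real connect attempt
    private readonly TimeSpan _delay;
    private int _looping; // 0 = idle, 1 = loop running

    public Reconnector(Func<bool> tryConnect, TimeSpan delay)
    {
        _tryConnect = tryConnect;
        _delay = delay;
    }

    // Starts the loop and returns its task, or returns null if a loop is already running.
    public Task EnsureReconnecting()
    {
        // Atomically claim the "looping" flag; only one caller can win.
        if (Interlocked.CompareExchange(ref _looping, 1, 0) != 0) return null;

        return Task.Run(async () =>
        {
            try
            {
                while (!_tryConnect())
                {
                    await Task.Delay(_delay);
                }
            }
            finally
            {
                Interlocked.Exchange(ref _looping, 0); // allow a future reconnect
            }
        });
    }
}
```

Both triggers (the Disconnected event and a failed publish) could call EnsureReconnecting; whichever arrives second simply gets null back.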

SignalRBusServer Bug: I recently posted an apology to my brain for the mental and emotional trauma I cast upon it while working through bus design issues. If you recall, the key challenge there was elegance, not making it work; it has been working forever. The LocalBus is stand alone, but you can inject a publisher. So, you inject the SignalR publisher. When a message is received from SignalR (IReceiver), it needs to be passed to the local bus. That 2 way relationship is the cause of much of the shenanigans. Attempt #715 at resolving this resulted in having the IBusServer create the local bus with the correct publisher, and also send messages to the bus it created.

The problem I discovered while testing is that the bus was being created and destroyed as the server was starting and stopping. So, the app server fires up with one localbus, but if you restart the SignalRBusServer object, then there’s a new localbus that no one knows about. That is why the bus wasn’t reconnecting: the old bus was running the loop, and the new one wasn’t referenced by anything. I changed it to only destroy the localbus on dispose, not on stop.

Establishing the Connection: Microsoft APIs have changed a lot in recent years. WCF is ridiculously complicated. WebAPI and SignalR are much simpler, but they keep changing, especially as OWIN was introduced. They achieve the simplicity, in some cases, by having static global objects to be used by everything in the appdomain. I’m ok with that, as long as you don’t have to use them. And, to this point, that has been a challenge, because I’m using them.

In the app server, things are associated to features. If a WebAPI controller is associated to a stopped feature, then that controller should not be reachable when the feature is stopped. WebAPI isn’t aware of any of that, though. It just finds all of the controllers in the app domain and wires them up. To counter that, I created a filter to return 404 for controllers that are hit but aren’t supposed to be up. Based on what I did tonight with SignalR, I’m hopeful that I’ll be able to modify the WebAPI code to load only the controllers that it needs.

Anyway, back to the point: When I stopped/started the SignalR server, clients couldn’t connect even though it started successfully. This is the relevant code:

image

The ugly parts are the usages of GlobalHost. That is the part that I don’t like. It turns out that was the cause of the problem, although I don’t know, in particular, why. All I know is that once I figured out how to stop using GlobalHost, the restart worked and clients reconnected.

Here’s the new code.

image

Much better. GlobalHost is gone. Instead of using it, it creates its own dependency resolver (which is the same type GlobalHost uses.)

I look forward to revisiting the webapi code.

Working Well

All of my current tests and expectations regarding startup, stopping, and synchronization are working.

  • start the hub
  • create a node
  • set the node’s model
  • start the node
  • change the model – the node updates
  • shut down the node
  • change the model
  • start the node – the node updates
  • stop the signalr server
  • wait for the node connection to timeout and enter the reconnect loop
  • restart the signalr server – the node reconnects
  • stop the signalr server
  • wait for the node connection to timeout and enter the reconnect loop
  • change the model
  • start the signalr server – the node reconnects, then updates

Unlike previous iterations, it’s not barely working… it’s really working. There’s nothing flaky or questionable about it. It’s solid. Of course, there’s more work that I would like to do. There are 2 pieces of code that I’d like to revisit, but those are just a few lines in a couple of methods.


App Server: Console App, AppHostFeature, Synchronization, SignalR, Rest Client

April 16, 2015

I’ve been spending time testing and tightening things up.

Console Host: The console app is what I have been using, since day one, to develop and test with. It has 2 hard coded nodes and a hard coded hub. I use comments to control what I need or don’t need for whatever I’m doing.

I wrote a new console host app that’s much nicer. It also serves as an admin client.

You can create and manage as many nodes as you want through the command line interface. All of these nodes are hosted in the same process, which is useful for development and testing.

It can also be used to make http api calls to the hub (which is probably in the same process.)

As far as the program is concerned, everything you type is just a command. But, there are three types of commands:

* bus messages – the message will be sent on the bus

* local command – this is something that is coded for in the console app. It does things like clear the screen, start/stop nodes, start/stop the hub, and switch to a node of your choice. (For example, if you have 3 nodes in process and you want to send a message, which node do you want to send it from? The SETNODE local command gives you a reference to the bus of the specified node.)

* http command – makes an http api request

Even though they are all commands, I don’t want them all jumbled together. I want it clear which commands do what sorts of things. So, when you type a command, you prefix it.

. – it is a local command

/ – it is an http command

nothing – it is a bus message
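The prefix rule is simple enough to sketch. This is a hypothetical illustration rather than the console app's code; the CommandKind and CommandParser names are made up for the example.

```csharp
using System;

// Hypothetical names; only the prefix convention comes from the post.
public enum CommandKind { Local, Http, Bus }

public static class CommandParser
{
    // Classifies a typed line by its prefix and returns the text after the prefix.
    public static CommandKind Classify(string input, out string body)
    {
        string text = (input ?? string.Empty).Trim();

        if (text.StartsWith(".", StringComparison.Ordinal))
        {
            body = text.Substring(1);
            return CommandKind.Local; // handled inside the console app
        }
        if (text.StartsWith("/", StringComparison.Ordinal))
        {
            body = text.Substring(1);
            return CommandKind.Http; // becomes an http api request
        }

        body = text;
        return CommandKind.Bus; // no prefix: sent on the bus
    }
}
```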

Here are some sample commands. This is the code I use to initialize my test server.

image

.starthub – creates and starts the hub in process

.createnode node1 – creates a node in process

/setmodel node1 Application_Server – make an http call that sets the node type.

.startnode node1 – starts the node – in process

/setmodel node1 Reporting_Server – make an http call that changes the node type. (This results in a sync message being sent. node1 will stop, sync, and start its applications.)

I also have commands to start/stop SignalR, which is useful to make sure the clients handle disconnects and reconnects properly.

App Host Feature and Synchronization

I did a lot of work on synchronization a few months ago. I thought it was done. But, more work was needed. I reviewed it top to bottom, and made some changes.

The AppHostFeature was recently overhauled to make parts of it swappable. I never really finished that until now. Everything is starting/stopping gracefully.

Rest Client

It seems to be a popular opinion that, when doing REST, you don’t really need a client. Yet, I still see a lot of them. I created one for the app server. So far, it only does a couple of things as needed by the console app. But, it will grow.

To keep the references minimal on this DLL, I’m not including the contracts. It builds up the JSON manually (via json.net) rather than through serialization.

I was surprised to see that HttpClient doesn’t have a built-in PATCH method. You have to build the request up manually. I’m using PATCH in one place: to change the model type or the group of a node.

In that case, Patch is appropriate, but it’s not always called correctly. If you PATCH a node that doesn’t exist, it defines the node. I’m pretty sure that’s bad REST, and I have a todo to address it.
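One common way to build a PATCH request by hand is to create an HttpRequestMessage with a custom HttpMethod and send it with HttpClient.SendAsync. The sketch below shows that approach; the helper name, URL, and JSON body are made-up placeholders, not the app server's actual API.

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical helper; the route and payload shape are illustrative only.
public static class PatchRequests
{
    public static HttpRequestMessage Create(Uri uri, string json)
    {
        // HttpMethod has no predefined Patch member here, so construct one by name.
        return new HttpRequestMessage(new HttpMethod("PATCH"), uri)
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
    }
}

// Usage (inside an async method, with an existing HttpClient named client):
//   var request = PatchRequests.Create(
//       new Uri("http://localhost:8080/nodes/node1"), "{\"model\":\"Application_Server\"}");
//   HttpResponseMessage response = await client.SendAsync(request);
```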

SignalR

The SignalR server is working well, but I’m having a bit of an issue with the client. I can’t get it to reconnect reliably. I’ll work on that.


Dear Brain: Sorry about the Bus

April 6, 2015

 

Despite a lack of posts, I have been working on the app server quite a bit recently.

The “bus” is an async, loosely coupled messaging system that allows every node to talk to every other node, and every application to talk to every other application. The idea is for it to be simple. I want it to be simple to use, and simple to code for extensibility. Therein has been the angst.

The bus has been working for at least 2 years. It is an essential part of this thing, and was the first thing built. But, it was built minimally. The only requirement at the time was that it worked so that I could build everything else on top of it.

But, it wasn’t pretty. Applications communicate to the node via remoting. Nodes communicate to the hub via SignalR (by default), or RabbitMQ (provided), or anything else you want to provide.

This was accomplished by having a really ugly IBus interface that allowed for registering child buses, bus names, etc. The SignalR bus inherited from LocalBus and managed everything within. Separation of concerns: 0.

I have spent the last few weeks thinking about that and reworking it. I spent many hours just prototyping in notepad, trying to figure out how all of these things can work together and be pretty. I tried (and failed) to enlist a fresh perspective on it.

Once I thought I had it working the way I wanted it to in notepad, I started prototyping the real thing. And, of course, it didn’t work as planned. Pretty is good. Functional is better.

Another goal is for the bus to remain stand alone. It doesn’t have to be used by the app server. Once I got it working and pretty for stand alone, I then had to integrate it into the appserver, and that didn’t go very well either. If I applied a hammer to it, it would have worked, but that’s not what I wanted. So I kept refactoring until I got it to where it is now.

At last, it’s good. I’m not going to say it’s done. Functionally, there are still some things it needs. Architecturally, it has one more test to go through. Currently, applications are isolated per app domain. I’m working towards isolating by process. When I get to that, it will require swapping out a remoting (MarshalByRef) component for a pipe component. I think that will go in properly from the bus perspective, but the app server may need more work. I have one “temporary” interface in there that is adapting the bus for how the app server registers app domains. When it comes to a pipe, it won’t register an appdomain, so that’s where it will break.

The bus is improved in the following ways:

  • It’s all JSON serialized now. Before, I had serialization/deserialization methods that would be called only when a message had to cross app domains. That got sloppy.
  • A long known issue is that a message type may not exist in a remote location, so the message couldn’t be deserialized. That was never a problem because I have all of the contracts everywhere, but it wouldn’t fly for a custom user type that isn’t deployed everywhere. That’s all handled now, but with room for improvement. If something is unable to deserialize a particular type, it will keep trying every time it receives that type. I’m going to improve it to maintain a small list of types that it knows it can’t handle so that it doesn’t keep throwing exceptions.
  • The bus now handles objects, as it always has, and also the underlying message type, which is BusMessage. You can subscribe to either.
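The planned improvement in the second bullet (remembering types that can’t be handled) might look something like the sketch below. It is not the bus’s code: the TypeResolver name and the injectable lookup delegate are assumptions, and unlike the "small list" mentioned above, this version does not cap the number of remembered names.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical sketch: remember type names that failed to resolve so they are not retried.
public sealed class TypeResolver
{
    private readonly ConcurrentDictionary<string, byte> _unknown =
        new ConcurrentDictionary<string, byte>();
    private readonly Func<string, Type> _lookup;

    public TypeResolver(Func<string, Type> lookup = null)
    {
        // Default lookup returns null instead of throwing when the type is missing.
        _lookup = lookup ?? (name => Type.GetType(name, throwOnError: false));
    }

    public bool TryResolve(string typeName, out Type type)
    {
        type = null;
        if (_unknown.ContainsKey(typeName)) return false; // known-bad: skip the lookup

        type = _lookup(typeName);
        if (type != null) return true;

        _unknown.TryAdd(typeName, 0); // remember the failure
        return false;
    }
}
```

A message whose type can’t be resolved could still be delivered to subscribers of the generic BusMessage form.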

The bus was built on objects. You subscribe to objects of a particular type, and you will receive all objects of that type and of all of its subclasses. Getting that to work side-by-side with generic bus messages was fun.

One key difference between generic and object messages: you can specify conditions on objects, which is a widely used feature. You can’t yet do that for generic messages. For objects, it uses FUNC<MESSAGE, BOOL>. For generic, all it could do right now is FUNC<STRING, BOOL>. Of course, that can improve, but that’s where it is for the moment.
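To illustrate typed subscriptions with conditions, here is a toy in-process bus. It is a sketch under assumptions, far simpler than the real thing: TinyBus and the two sample message types are invented for the example, and it is synchronous and not thread safe.

```csharp
using System;
using System.Collections.Generic;

// Sample message types (hypothetical).
public class NodeEvent { public string Node { get; set; } }
public class NodeOnline : NodeEvent { }

// Toy bus: subscribers receive messages of type T and of its subclasses.
public sealed class TinyBus
{
    private readonly List<Action<object>> _handlers = new List<Action<object>>();

    // condition is optional; when given, the handler only runs if it returns true.
    public void Subscribe<T>(Action<T> handler, Func<T, bool> condition = null) where T : class
    {
        _handlers.Add(message =>
        {
            var typed = message as T;  // null unless message is T or a subclass of T
            if (typed == null) return;
            if (condition != null && !condition(typed)) return;
            handler(typed);
        });
    }

    public void Publish(object message)
    {
        foreach (var handler in _handlers) handler(message);
    }
}
```

With this shape, subscribing to NodeEvent also delivers NodeOnline messages, and a subscription such as Subscribe<NodeOnline>(handler, e => e.Node == "node1") only fires for node1.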

This is the setup for one of the stand-alone bus tests.

image

It creates a signalr bus server, and a node. The node server has an AddAppDomain method. You pass it an appdomain, and it will instantiate a bus in that appdomain and hold on to a remote reference to it.

The test goes on to:

  • Create 2 appdomains
  • Add them to NServer
  • Send a message to each bus: hub, server, app1, app2
  • Confirm that every message is received by every bus: hub, server, app1, app2

This is a similar test. The difference is this uses the application server’s version.

image

ApplicationBusServer is application server specific. It keeps track of signalr connections, and publishes online/offline messages as nodes come and go.

ApplicationBusServer does not use the SignalRBusServer shown in the previous test. The intent was to reuse it, but it didn’t fit. It uses some of the same dependencies, but composes them differently.

At long last, the bus is reintegrated into the appserver, and the messages are flying. Furthermore, the online/offline messages are now working properly, which is new. That never worked before.