Monday, January 7, 2013

Now we're cooking

Man, I hate doing that. Hacking up a last-minute feature and pushing it out to the live site with nowhere near enough testing. There needs to be a damn good reason to do that.

Sometime in the last week or so the NSW Rural Fire Service added a new field to their data feed. It's a field called "Instruction" that contains rather useful text like "There is a heightened level of threat. Conditions are changing and you need to start taking action now to protect you and your family."

I swear that field wasn't there before. Looks like I'm not the only one still coding hard and fast for fire season.

It's useful and important information. My framework is built to be adaptive in the face of changing data. It's taken longer to blog about than implement the code. But I still hate pushing features out that fast. It's way too easy to find yourself in a bugfix spiral or take down the site entirely.

I think I made the right decision.

Now if only the Vic fire service would add a similar field... Or the NT Fire service would provide a feed at all.

Thursday, December 20, 2012

Just a regular expression

It's taken me a little time to really get my head around the Javascript regular expressions. You have to be pretty on the ball when you're writing a parser.

In fact I managed to double the decoder speed with a couple of deft new functions, by thinking a little more carefully about the Javascript RegEx objects and their peculiarities.

At first, Javascript's RegEx object seems particularly bad for the task of generalized parsing, because parsing generally requires checking the next token, and the Javascript function calls seem to do either first or global searches, but not from a given index.

Then they seem like a great idea when you discover the 'sticky' option in the MDN documentation, which distinctly allows you to do just this. Then they seem like a terrible idea again when you discover that only Firefox implements it.

Most people get disheartened here and go back to parsing the string themselves character by character. But regular expressions execute as compiled, highly optimized code. You're never going to beat that with 'case' statements.

So, back to the code mines we go, and some more digging turns up the "lastIndex" property on the RegEx objects, which is one of the more badly named Javascript properties. (and there are a lot, trust me) It has some quite unexpected behavior, and it's got at least two purposes, but this property in combination with the standard .exec() call turns out to be exactly what we need.

exec() seems inappropriate (or at least very inefficient), because a LALR parser wants to check if the next chunk of string is the token it wants, and standard exec() with a global option can't be stopped from zooming away into the string, looking for it anywhere. Right to the end if need be.

But that's OK, because if we let it, and then remember where (or if) it found a match, then we don't have to ask again until we get past that point. It's still very inefficient at the point we first call it, but it gives us enough information to not have to call it again for a while.
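(For anyone who hasn't played with this corner of the language: exec() on a regex built with the 'g' flag starts searching from lastIndex and writes the new position back into it, which is the hook everything below hangs off. A quick illustration, nothing more:)

var rex = /,/g;              // the 'g' flag is what makes exec() honor lastIndex
var s = "a,b,c,d,e";
rex.lastIndex = 5;           // start the search at index 5
var m = rex.exec(s);         // m.index === 5, rex.lastIndex === 6 (just past the match)
m = rex.exec(s);             // resumes from lastIndex: m.index === 7, rex.lastIndex === 8
m = rex.exec(s);             // nothing left: returns null and resets lastIndex to 0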

It's obvious when you think about the situation when you get near the end of the file. By then, many of the match expressions will have worked out they never appear again. They can just hang up their coats and say 'nope' if they're asked if there's anything left to do. It's called having a functional memory.

Yes yes, standard RegEx optimization tricks. Nothing new. That's not the point.

Once I put the optimization in, the code doubled in speed. (whoo!) But only on Safari (iOS) and Firefox (Windows). Chrome (Windows) and IE (Windows) continued running at exactly the same speed as before.

Now that's interesting.

What it suggests is that the Javascript engines in Chrome and IE had already implemented the identical optimization, but at a language interpreter level. I assume they detect the exact same RegEx object being applied to the exact same string object and they just re-start the matching from internally stored state.

But Safari and Firefox clearly haven't implemented this "same regex, same string" optimization, so when I explicitly do it myself it saves an enormous amount of time.

Here's the relevant bit of code. Don't worry about the lambdas.


function decode_rex(type, empty, rex) {
    rex_registry.push(rex);
    return function(s,i) {
        var r = {ok:empty?true:false, pos:i, len:0, type:type};
        // are we past the point where we need to look for a new match?
        if(rex.lastIndex<=i) {
            // match the expression from here
            rex.lastIndex = i;
            var m = rex.exec(s);
            if(m instanceof Array) {
                // found it
                rex.matchIndex = m.index;
                rex.matchLength = m[0].length;
                rex.lastIndex = rex.matchIndex + 1; // safe consumer
            } else {
                // no more
                rex.lastIndex = s.length;
                rex.matchIndex = -1;
                rex.matchLength = 0;
            }
        }
        // is the next match supposed to be now?
        if(rex.matchIndex===i) {
            // consume the match
            r.ok = true; r.len = rex.matchLength;
        }
        return r;
    }
}
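And a rough sketch of how these matchers get used, in case the lambda soup isn't clear. (This bit isn't from the real parser; the token types and expressions are invented, and I'm assuming rex_registry is just a bookkeeping array declared further up. The 'g' flag on each expression is not optional - it's what makes the whole lastIndex trick work.)

var rex_registry = [];   // the real code keeps this somewhere above

var match_number = decode_rex('number', false, /[0-9]+(\.[0-9]+)?/g);
var match_space  = decode_rex('space',  true,  /[ \t]+/g);

var src = "12.5 42";
var r = match_number(src, 0);          // {ok:true, pos:0, len:4, type:'number'}
r = match_space(src, r.pos + r.len);   // consumes the single space at index 4
r = match_number(src, r.pos + r.len);  // {ok:true, pos:5, len:2, type:'number'}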



Finite Loop

I was going to post more details about JSON-U yesterday, but right about then the entire concept was in pieces around my feet after discovering a few things about how the colon and semicolon are handled in reality. Don't ask.

But that's OK, because I reworked the grammar and managed to remove both special characters, plus two more, and added new capabilities besides. To the point of making URLs nearly Turing-complete. Ignore that for the moment.

There are a few last things I'm trying to figure out. I've got my encoding to reliably survive most kinds of common mangling, and it really takes some determined effort to make the parser fail, but there's always the lure of handling one more case now so you never have to worry about it again.

Oddly, by being so free to rip things out of the parser, I'm discovering ways of putting them back using only the things remaining. For example, I had an array constructor syntax, and a function call syntax. Then I removed arrays as an explicit data type, which is unheard of, but I kept the named function space (now called 'localizers') and defined the "unnamed function" to "return an array consisting of the parameters passed to the function".

Boom, explicit arrays are gone. Replaced with a "pseudo-call" to a function that creates them. So functions are more primal than arrays, in a data coding. And since the functions use the round brackets, we save two characters from ever being used.

I've gone round and round, pulling things out and replacing them again (quoted strings are back, but only to read in user-hacked values, never as output), and it's like an infinite Rubik's cube where a couple more twists leaves a scrambled mess, but then a few more moves and everything is harmonious and better matched than ever.

I'm down to the point where I have a test page that generates hundreds of random data structures, encodes them all, checks they go backwards again properly, and offers up a list for clicking. I can type random crap into the parse-as-you-type text boxes, and JSON-U tends to cope with this better (spends more time in a 'parsed' state that correctly represents the data) than JSON does.

Along the road I've made some funny choices. But they hopefully combine their powers in a good way. For example, here's one consequence:

If you repetitively 'unescape' canonical JSON-U, you get the same string.
If you encodeURI() canonical JSON-U, you get the same string.

That's actually a major ability, and can't be said for any encoding that has a backslash, percent sign, or space in it. Ever.

("Canonical" just means "the one true blessed way, not the way we do it when we're in a rush.")

The single remaining annoying issue turns up when someone uses encodeURIComponent() on something that ends up in the hash fragment. From what I can tell of the standard, the fragment requires only the equivalent of encodeURI() and all possible component characters, including the hash, are allowed from that point on.

Therefore, doing a blanket encodeURIComponent() or escape() on anything destined for the hash fragment is de facto wrong. But that won't stop people from doing it, because who really knows the standards that well? How far do I go to accept the incorrect behavior? I think the answer is actually "not at all". But then, it might be easy to make it work with just a few more twists of the cube.
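If you want to see the mangling in one line, try a fragment-ish string in the console (the coordinates are just an example):

var frag = "(-27.5,153.0)";
encodeURI("#" + frag);           // "#(-27.5,153.0)"     - untouched, all legal in a fragment
encodeURIComponent(frag);        // "(-27.5%2C153.0)"    - the comma gets mangled
encodeURIComponent("#" + frag);  // "%23(-27.5%2C153.0)" - and the hash goes too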

At the moment my encoding survives most mangling. Can I make it survive all? Perhaps.

Why do I care so much? Because shortly I'll be handing out URLs in the new format that I'll be expecting my server to honor for months. Years even. While the code goes through development cycles, and I'm sure I'll want to change the parameters every few damn months. I need something like JSON-U up and running before I even hand out my first (versioned, metadata enhanced, expiry timed) links.

I have to be able to accurately predict all possible futures for the data I'll want to transmit and consume. As they say, prediction is very hard, especially about the future.

Fortunately, I am a computer scientist. And predictions of those kinds are one of my superpowers.

Sunday, December 16, 2012

A digression to the heart of the matter

I "invented" a new micro-language over the weekend. I put the word in air quotes, because I was actually trying very hard not to do any such thing, but was forced by dreadful necessity to do some inventing anyway.

Why the odd programmer self-loathing? Because there are already too many microformats in use, and I'm not sure that adding "yet another unifying standard" to the mix will help. But I need it.

Here's the problem: URLs. That's it in a nutshell. Bloody URLs.

There is a spec for them, but hardly anyone reads it. Even when they have, they usually just chop URL strings up with Regular Expressions intended to get the bit they want, and fail utterly if given anything that doesn't start with http:// or https://. So it's a minefield.

The latest crazy thing to do is to make extensive use of what they call the 'hash fragment'; everything that comes after the '#' sign in the URL, which is a special fragment for two reasons:

  • It is never sent to the server during page requests, it's only available to the client browser and scripts.
  • Clicking on a page link that differs only in its hash fragment from the current page does not reload the page, and in fact does nothing at all (except notify your scripts) if the fragment id doesn't match up with a real element id.

Remember, making the browser focus on an element half-way down the page was the original intention of the hash fragment, so all browsers support it and are 'tolerant' of this abuse. (ie: they don't throw errors or 'correct' the URL because a weird fragment ID doesn't actually exist in the page) We are just extending that metaphor, which just happens to work consistently on almost every browser ever made.

A classic modern use of the hash fragment is to put database article identifiers in it that correspond to AJAX calls to obtain that 'page fragment'. When your script detects the user has clicked on a new 'fragment link', the corresponding article is loaded. Sites like Twitter and Facebook have reached the point where there is only one 'page' for the entire site, and everything you see is dynamically loaded into that container.
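The machinery behind that is small enough to sketch here (the fragment format and load_article() are made up for the example):

window.addEventListener("hashchange", function() {
    var id = window.location.hash.replace(/^#!?/, "");  // "#!article42" -> "article42"
    if(id) load_article(id);   // AJAX the new content into the container, no page reload
}, false);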

A consequence has been that such AJAX-driven sites were difficult for search engines (ie: Google) to index properly, as none of the 'content' was on any real pages anymore. So they came up with a scheme: when the search engine (which acts like a big automatic browser) sees URLs with hash fragments, why not call a 'special' URL with that fragment as a normal server GET parameter (basically, juggle the URL around a bit to put the client part in the server part) and then the site can tell the search engine all about the text and paragraphs that the 'hash fragment' corresponds to.

The search engine can even publish those links, and since they still contain the article id, your site will load up that content and the user will see what they expect!

So long as everyone does their jobs right.

Just to be sure that they don't accidentally index sites which can't cope with this behavior, Google (and therefore all search engines) use the presence of the "!" symbol at the start of the fragment to indicate "this is an indexable hash link". Since the 'bang' character must be the first one following the 'hash', it is informally referred to as a "hashbang". (For the sound made by primitive parsers mangling your parameter strings just before they crash.)

Why does this matter? Well, let's say I have a map application (hey, I do!) and I want to record the co-ordinates of the user's window location in the URL in such a way that if they copy the link and send it to someone, or just bookmark it, then going back to that link reloads the map to that same geographic location.

These cannot be ordinary server GET parameters, because we can't change the browser's URL to that without causing a page reload. It has to stay in the hash fragment, which means the client has to do all the work decoding its own parameters.
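Which, for the map case, boils down to something like this (a sketch only - the "lat,lng,zoom" format and map_goto() are stand-ins, not the real encoding):

// stash the current view in the fragment: bookmarkable, copyable, and no reload
function save_view(lat, lng, zoom) {
    window.location.hash = "#" + lat + "," + lng + "," + zoom;
}

// pull it back out when the page loads (or the hash changes)
function restore_view() {
    var bits = window.location.hash.slice(1).split(",");
    if(bits.length === 3) map_goto(+bits[0], +bits[1], +bits[2]);
}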

In fact, wouldn't it be nice if we could just append an entire serialized Javascript object up there in the URL? And then deserialize it again with the same ease as JSON? Hey, why don't you just turn Javascript objects into JSON strings, then URLEncode them? Well:
  • The standard javascript encode/escape routines don't quite do all the edge-case characters right.
  • JSON transforms really badly when URLEncoded. As in, every bracket and apostrophe turning into three or six characters bad.
  • Let's not even get into the unicode/utf8 conversion issues.
  • The result is usually an unreadable mess.
  • A lot of people do exactly that anyway.
Well, if the standard JSON format encodes badly because of its syntax character choices, why not just make some different choices that are URL-friendly? And then deal with the other character encoding issues?
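To make the bloat concrete, here's a trivial object going through the obvious JSON-then-encode route:

var data = {a:1, b:"x y"};
JSON.stringify(data);                      // {"a":1,"b":"x y"}   (17 characters)
encodeURIComponent(JSON.stringify(data));  // %7B%22a%22%3A1%2C%22b%22%3A%22x%20y%22%7D   (41 characters)

Every brace, quote, colon and comma triples in size, and that's before anything interesting goes in the values.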

...oh, and it would be nice if the data didn't change format when in the hash fragment compared to when it gets used as a server parameter, since different characters are allowed...
... and it would be nice if we could save some bytes by leaving out some of the JSON 'syntactic sugar' (like quoted identifiers) that aren't totally necessary...
... and if there were a more efficient way of encoding binary strings than 'percent' escaping, that would be good....
... and it would be nice if the user (ie, the developer) could still hand-edit some parameters, but then hide them away without affecting anything else...

That's pretty much what I have done. I'm calling the result JSON-U.


So, that's the why. Next time, the how.

Tuesday, December 11, 2012

Australian Fire Maps released!

It's fire season in Australia. My state in particular, Queensland, tends to catch fire quite a lot. I know this because I recently worked for the state government and built them a bushfire tracking system.

One almost as good as the system I built for you.


This app integrates data from the following sources:


You will note that the Northern Territory is absent from this list. There are reasons.



Saturday, July 30, 2011

PayPal IPN and App Engine

Well, the experiment seems to be a success, Igor. The IPN endpoint has been in production and functioning for a couple of weeks now. Messages are coming from PayPal, hitting the distributor, and reliably being passed on to the IPN Endpoints. Not a single missed message. We have two of them running now, one for each subscription system.

We've had more trouble with the endpoints being unable to handle the extra streams, frankly. (Don't cross the streams, even though you can) The IPN protocol is whatever the opposite of "stateful" is, so they already had enough trouble trying to figure out what each incoming message was for, and what to do with it. Although the more specific the trigger (like 'Affiliate Program' software that only cares about the sign-up, not the subscription process) the better it works. Fortunately if things go weird, we just turn off the spigot to that IPN Endpoint, or apply some filters.

The only thing close to a severe 'bug' I encountered was not paying enough attention to the character encoding. Here are the two crucial points:

  1. PayPal sends IPN messages in charset "windows-1252" by default.
  2. Google App Engine Java programs expect "UTF-8"

This wasn't really apparent until we had a few payments come in from people in Europe and Asia, with their hoarded supplies of umlauts. If you're wondering 'Why the difference?' it's because PayPal is apparently still stuck in 2002. I don't know.

The incompatibility doesn't seem like much, but in the GAE Servlet environment you need to call setCharacterEncoding() before you call getParameter() for the first time. Since you don't know what the encoding is until after you read the 'charset' parameter, we have a catch-22.

The 'correct' solution to this is to pull apart the input stream ourselves (thus duplicating all the Servlet code which already does this for us) in order to determine the original encoding. Then set the encoding format on the request object, and then re-read the data parameters as normal. Ugh.

However, if you try this, you will run into the next issue: the verification callback to PayPal must exactly match the message you were sent. If you translate the incoming message into UTF-8 before distribution, the IPN endpoints won't even know what the original encoding was, and therefore how to send the message back.

The verification will fail, and PayPal will keep resending the message until timeout. Bad.

Now, even if PayPal is clever enough to ignore the raw difference in the character encoding field and do all the relevant translation perfectly (ha!) that still isn't good enough. So long as there isn't a complete 1:1 mapping between the two character sets (windows-1252 and UTF-8, and there isn't) there will always be 'edge cases' of particular characters that can't make the round trip. Certain names will break the system. Not normal names, but they'll be in there due to strange keyboards, or malice.

The best solution is to back away slowly from the entire mess of character encoding translation and do the following:

  1. Set the IPN character encoding to UTF-8 in your PayPal account.

That's it. I really don't know why windows-1252 is still the default for new accounts. Pure inertia, I expect. That brings IPN up to code with the modern internet.

You'll have to check that your existing IPN Endpoints are UTF-8 compatible, of course, but if they are I suggest you do this sooner rather than later, as future-proofing. (At least test it.) And for new accounts, do it on day one, because it's much scarier changing it once transactions are flowing. One mistake and they all go tumbling away into /dev/null.

Of course IPN messages that are already in-flight (or re-sent from inside the PayPal account) continue using the old encoding, (Arrrggghhhh!) so the switch-over takes a couple of days and can't rescue failing transactions. The solution for already-sent messages is to manually validate them in each Endpoint (if you have that facility) to push the transactions through.

They're real transactions, it's just PayPal will never acknowledge them once the translation gremlins have been at work. Fortunately it's only the random names that will be a little screwy for a month, (before the next transaction corrects it) not financial data or item numbers.

When I get time, I should at least add an 'expected encoding' option to the software for those situations where you have absolutely no choice and need to force the Distributor into another character set. That might have all kinds of consequences, though.

Another logical extension to the IPN distributor would be to also intercept the validation callback from the IPN Endpoint to PayPal in order to translate the encoding back to original. (Also, to mark off that endpoint as having received the message and prevent re-sends)  On close inspection this is quite a bad idea, because it removes the 'cross checking' inherent in having PayPal separately validate messages passed by the Distributor, and creates a single point of attack within the distributor/proxy. It might become necessary, but it's still a bad idea. Plus, how many IPN Endpoints allow you to change the callback URL from the "paypal.com/webscr/" default? (Some, but not all.)

It works. A base has been established. There's clearly a few extra problems to solve, but I have other things to mess with before that.

Thursday, July 21, 2011

No rest for the Wicked

Oh my Gods. And I thought I didn't have enough time before. There literally aren't enough hours in the day. And my attempts to create extra hours (mostly squeezed in just after midnight) backfired rather badly.

You see, we went live last week. The space shuttle launched, and then so did we. ("the last shuttle is away... aw.") And I didn't think we'd told anyone about it, but somehow we've already got two customers. Two paying members! How did they even find out?

There's two people running around inside my application and I don't know where they are. Well, I do. Sort of. There's logs. But I miss being able to just lean over their shoulder and ask "why are you doing that?". Analytics is nice, but not enough. I want screenshots.

What's terrifying is how many things are still broken. Well, I say 'broken'... I mean they work, but not all the time. From the end user perspective this means occasionally having to press 'delete' twice, or unjam a queue by restarting a server, which I'm told isn't that big a deal.

For me, however, every system jam is a failure in my logic. I'm just Vulcan enough that it personally annoys me every damn time, deep down.

My system must be a perfect system, with all bugs and errors banished to the land of wind and ghosts.

In practice, there's a point of diminishing returns. No matter how good I make my code, I'm still subject to the workings of the internet, various phone companies, and often Microsoft. So long as my error rate is lower than that lot, I'm doing well.

And there's also a point beyond which you make things worse. You have one line of code that does something, and it occasionally fails. So you write another line of code to catch the exception and perform the fallback... but that could fail too. So you add another line. Soon 90% of your code is exception handlers which almost never occur (and then all at once!), and so are full of bugs due to lack of testing. And the module doesn't fail cleanly anymore, it always half succeeds, leading to some of the most subtle and bizarre bugs I've ever seen.

Fail fast, fail hard. These are actually the cornerstones of a reliable system, because you can always retry. Just keep pounding 'reload' until it works, and trust in Idempotency.

Saturday, July 9, 2011

The Googlepuss

Well, I've been on Google+ for just about 24 hours now, thanks to a friend who works there. I think that's the right time to post initial impressions.
Googleplus... Googleplus... Googleplus!

  1. xkcd was totally correct with their analysis.
  2. It's good. Clean, simple, and obvious, just what you expect from Google. In fact, it's simpler to use than most of their offerings like Wave or Docs or even Blogger. 
  3. It looks a lot like Facebook, but only at first glance. Then again, Facebook looks a lot like Orkut. I expect there's only so many ways to display a list of messages from friends with comments attached. 
  4. It feels private and secure. It takes just enough effort (one click-drag) to add people into your circles that it's not something you can do accidentally. You might choose the wrong circles to post to, but that's down to you.
  5. You can explicitly prevent 'resharing' or disable comments on a per-message basis. That's pretty much the opposite of Facebook, which uses almost everything you say as an opportunity to start a discussion with everyone you know. Disabling reshare is good for those semi-private things you don't want accidentally spread around by well-meaning friends. And disabling comments becomes important if you've got thousands of readers and controversial topics, like a news service or pop star.
  6. The Beta state of the software is still quite apparent. Functions and comments sometimes disappear for half an hour, and then come back again with an expression of "What? I was resting my eyes! What?". The drag-and-drop of people to my circles broke at one point, but a reload fixed it. There's a couple of missing things, like sorting circles. All very minor surface stuff. But you can feel the stability of the underlying code. At no point was I confused about what was going on.
  7. Even in Beta, it's remarkably forgiving. I was re-arranging my circles and made a big mistake: I removed people from the old circle first. Since that was the only circle they were in, and the interface only lists people currently in your circles, I lost them. (Add to the new circle first!) However, a little later, the unattached names all turned up in the 'suggested friends' list, so I unexpectedly got them back.
Mostly, the impression is that Social Networking has finally grown up a little. I expect Facebook will continue to reign amongst drunken college students who assume that every comment that falls from their self-centered lips needs to be broadcast to the entire universe as quickly as possible. 

But us slightly older farts want to be able to bitch about our work to our friends. (And talk about that other job we're going for.) We need to be able to carry on secret affairs and double-lives. We don't need to make snide comments about people to their face under the guise of a general post about our day. A mature social network needs to support mature relationships, and Google+ tries very hard to do that with its 'Circles'.

In the end, I expect there's room for both. Google+ will make social networking a useful business function, while Facebook continues to hold onto their original 20-something high disposable income Farmville demographic. In fact, this separation might be good for both of them, so long as they learn to co-exist. Multiple players legitimize a market, but a war isn't good for anyone.


Hey.

Friday, July 8, 2011

Neural Distraction

The IPN distributor has been in production for a couple of days now, passing actual PayPal transactions to one of the small subscription systems. This is the "run-in" period, (another machinist's term IT has appropriated) as distinct from the "smoke test" for hardware. (Literally, "plug it in and see if smoke comes out". A tense moment for anyone who has built an electronics kit.)

Basically, there's nothing to do at this point but sit back and wait for the Big Nasty Real World to break things. And I'm out of town for a couple of days, so I'm literally forced to just let the run-in version sit there and not mess with it, which is exactly what I should do.

So, in the meantime, I'm happily distracted by Biologically-Inspired Massively-Parallel Architectures – computing beyond a million processors by Steve Furber and Andrew Brown. 

This project neatly involves quite a lot of things I care about, as one of my Majors was Artificial Intelligence, I've written code for massively parallel machines (the 4096 processor MASPAR) and built Artificial Neural Networks on my old ARM-based Archimedes A310 which was co-designed by... Steve Furber.

Steve Furber and Sophie Wilson are together responsible for ARM, perhaps the most successful processor of all time, if you count cores shipped / in use rather than dollars or profit. The Nintendo DS has two. Your smartphone may have up to five. There are more ARM processors on the planet than people. They made 2.4 billion CPUs in 2006 alone.

I doubt I'm uniquely qualified to critique this paper, but there's probably not many of us. 

First, the good. The paper contains the general outlines of a computer that contains 1 million processors, capable of simulating a neural network about 1% the complexity of the human brain (a billion neurons) in real-time. And this isn't just hand-waving. The design is based on some custom ASICs (standard cores, but some specialised network interfaces), but then Steve does that kind of thing in his lunch break. ARM and Silistix are on board, and ready to make silicon for him.

I'm quite sure he could build this machine, turn it on, and it would work pretty much as advertised. And a million processors might sound like a small building's worth of rack space, but your average ARM core probably costs two dollars, and will run off a AAA battery. This is how you can pack 50,000 card-sized nodes into a couple of cabinets without melting through the floor - or blowing the budget. This is a cheap machine, relatively speaking.

Steve makes the point that your average computer costs as much in electricity as the original hardware. Since CPU power increases exponentially with every generation, but energy is getting more expensive, your power bills become the limiting factor.

However, the most impressive thing about this machine is not the processing power, but the networking. If you're simulating a billion neurons, you have the problem that each one connects to a thousand others. Every (easily simulated) neuron event generates a thousand messages that need to get to other nodes. It's not enough to have massively parallel processing - you need massively parallel connectivity too. Steve's self-healing toroidal mesh is just the ticket.

So, that's the hardware sorted. How about the software?

Well... ahem. I think the most surprising thing is how little the field of Artificial Neural Networks has progressed since I left University, nearly two decades ago. Either that, or our little group was more advanced than I realized. Reading the new paper, all the descriptions of neuronal behaviour are pretty much the same as my day. This shouldn't be too surprising, I suppose. How neurons physically work was sorted out by medicine a while ago.

And this is probably why Steve Furber's "Spiking Neural Network architecture" is basically the same as the scheme I came up with in 1992. (Ah, if only I'd published...) We seem to have gone down the same path, for the same computational reasons. Only I think he's missed some things about how the biology works.

Before I get into that, I should point out that even though I had a cool idea long ago, there was absolutely no possibility at the time of putting it into practice. Steve has done the hard part, designing (and funding) a machine on which to test these theories. Finally.

And when I say "he's missed something", he's done it for the best of reasons; by approaching the problem a little too logically. Being an outstanding engineer, he has sought to remove the inefficiencies of squishy biological neurons and replace them with clean and perfect silicon.

However, it's my belief (any untested theory is just a belief) that two of the squishy inefficiencies that he's trying to remove are actually vital to the entire process: the time it takes signals to propagate along the dendrites, and the recovery time of the neuron. 

I could go into a little chaos theory here, but let's just stick with the concept that, after a neuron fires, it's unavailable for a short time. And since timing is everything, this is equivalent to inhibiting the activity of the neuron for the window of the next spike. Perhaps an important one. We don't have to turn a neuron off to inhibit it (this takes a lot of time and effort) we just have to fire it prematurely. We are deliberately creating high-speed 'disappointment'.

Let's combine that with the dendrite propagation time, for some simple mind experiments.

Imagine a neuron that has its output connected to another neuron by two connections of different lengths. So when the first neuron fires, the second receives two signals, with a delay in between. If the potential is large enough to fire (spike) the second neuron, then it will probably fire when the first signal reaches it, but can't fire when the later signal reaches it, because it will still be in recovery.

Let's say it's that second firing window which is actually the important one, needed to trigger a final neuron at the perfect time. So if the first neuron is prematurely firing the second neuron (which means it's still recovering when it gets the 'proper' signal later on) then it's effectively inhibiting it from doing its job.

To let the second (wanted) signal through, all we have to do is fire the poor second neuron even earlier. An earlier spike (from yet another neuron) would inhibit the inhibition spike from the first neuron. Doing it at just the right moment means the neuron has recovered in time for the final spike.

So we can alter which of two time windows the second neuron spikes in by sending a 'control' signal a little earlier. Not exactly boolean logic.

Any percussionist knows all this. Hit the drum too soon (when it's still vibrating from the previous strike) and it sounds different.

This is of course completely inefficient, although elegant in its own way. Any decent engineer wants to simplify all this inhibiting of inhibitions via multiple redundant paths. But I think it's an important part of how our brains process information, so removing that 'inefficiency' will prevent the network from behaving like its biological counterpart. All that chaotic delay might be the seat of consciousness. Who the hell knows.

But I don't see this level of temporal control in Steve Furber's paper. In fact, he seems to be actively working against it by desynchronizing the mesh, although that might just be the lowest hardware level, and perhaps 'global time' is accurately established by each node's clock. The spike packets apparently contain no time information, and seem to be broadcast to all receiver neurons instantly. That sounds like dendrite propagation is being ignored, to me.

Without very tightly modelling propagation delays, how do we recreate all the subtle phase timing of a biological brain? This is my concern. From what I've seen, nature takes advantage of every trick it can use.

Fortunately the fix is pretty simple: just tag each spike broadcast packet with a global time, and then have a delay queue on each neuron's inputs to process them in time order. If the mesh communication is fast and reliable enough, perhaps just the delay queue is enough.
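In toy form (and in Javascript, which you would obviously never run on a million cores) the idea is nothing more exotic than this. Every number and structure here is invented for illustration:

// time-stamped spikes, a per-neuron delay queue, and a refractory period
function Neuron(threshold, refractory) {
    this.queue = [];             // pending {time, weight} spikes, processed in time order
    this.potential = 0;
    this.readyAt = 0;            // earliest time this neuron may fire again
    this.threshold = threshold;
    this.refractory = refractory;
}
Neuron.prototype.receive = function(spike) {
    this.queue.push(spike);      // spike = {time:..., weight:...}, time includes dendrite delay
    this.queue.sort(function(a, b) { return a.time - b.time; });
};
Neuron.prototype.step = function(now) {
    while(this.queue.length && this.queue[0].time <= now) {
        var s = this.queue.shift();
        this.potential += s.weight;
        if(this.potential >= this.threshold && s.time >= this.readyAt) {
            this.fire(s.time);                        // spike!
            this.readyAt = s.time + this.refractory;  // ...then go deaf for a while
            this.potential = 0;
        }
    }
};
Neuron.prototype.fire = function(time) {
    // broadcast {time: time + propagation delay, weight: ...} to each connected neuron
};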

There you go, conscious computer, no problem. Next?

Monday, July 4, 2011

... or close up the wall with our English dead

So I'm writing a PayPal module. Again. Third time. Once more unto the breach, indeed. Maybe this time I'll get it right.

See, first go I tried doing it the 'right' way. I read all the documentation, ran all the sample code, and got a sandbox account. Everything was lovely, until the day came to deal with real transactions.

The second system was born from the still-burning ashes of the first in a very phoenix-like way. (I assume most phoenixes start life confused and terrified by the fiery collapsing buildings they would tend to wake up inside...) I never want to be patching a live database's table structure before an IPN message retry deadline again.

Here's the total sum of everything I wish I'd known before starting:

PayPal Lies.

With a little more experience I realize that they are at least consistent liars, a slight mercy.

Now for some of you, the word 'Lie' may seem a little strong. Perhaps it is. Consider the following definition of the "receiver_id" IPN field from the PayPal Documentation:


receiver_id
Unique account ID of the payment recipient (i.e., the merchant). This is the same as the recipient's referral ID.
Length: 13 characters
On nearly every transaction, at least every one you see from the sandbox, this field contains our own Merchant ID, and we are encouraged to check this is indeed our own shop identifier before allowing the record through.

It's only when you finally get an "adjustment" record (extremely rare) that you see the mistake. On an adjustment, "receiver_id" is completely missing. But wait... there's our merchant code, but what's it doing over in "payer_id"?

That's when it twigs. PayPal means the receiver of the payment, not the IPN message. (Which is also called a 'receiver' endpoint) So when PayPal reverses a payment, they also reverse the ID fields.

This might sound sensible... for about five seconds. Random ordering of the 'From' and 'To' fields based on another 'Type' field is really not a great way to build foolproof systems. Especially when one of those fields is supposed to be used as a primary fraud protection measure.

A clever person might ask: "what happens if you buy something from your own shop? How does it handle the same ID in both fields?" but at least that question is easily answered: PayPal will not allow you to transact with yourself. Apparently it sends you blind.


So, when they literally say "(i.e., the merchant)" in the specification text they are badly mis-characterizing the API, leading you up a garden path. It's not like there's other documentation to cross-check this against either. At best, they're being confusing in an API intended to process millions of dollars worth of other people's money.

Here we go, I'll spend thirty seconds re-writing that definition to what it actually is:


receiver_id
Unique ID of the PayPal account that received this payment.
Length: 13 characters

And it's shorter, too. But even better would be using a design pattern that doesn't make the mistake of mixing concepts. Instead of a payer_id and a receiver_id which swap order, y'know, depending - they really should have a merchant_id and a customer_id which always have the corresponding data in them.

Even though payer_id and receiver_id are exactly the same data type, they are semantically different and their content should never appear in the other field. The consequences are just too terrifying.
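If you're stuck consuming the fields as they are, the least-bad defence I can think of is to fold them into unambiguous names the instant the message arrives. A sketch only - the 'adjustment' test below is exactly the kind of guess the documentation forces on you, so check it against your own captured messages:

// fold PayPal's swappable ID fields into names that can't swap
function normalize_ids(ipn) {
    var reversed = (ipn.txn_type === "adjustment");   // the case where the fields flip (verify!)
    return {
        merchant_id: reversed ? ipn.payer_id    : ipn.receiver_id,
        customer_id: reversed ? ipn.receiver_id : ipn.payer_id
    };
}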

I remember reading about a similar design error in the US Army's Cruise Missile Targeting Request software form which had the 'Target Destination" and the "Request Source" GPS co-ordinate fields right next to each other, with predictably unfortunate results. Sometimes data schema validation goes beyond just making sure the value is the right length.

Also, which 13 characters? Numbers? ASCII? UTF-16? A spec shouldn't be this ambiguous. Any "specification" which doesn't say things specifically is, to my mind, a big fat lie in a binder.

Sunday, July 3, 2011

Going Faster by Slowing Down

I made another slight improvement to my AJAX multiplexer, of an amusing sort. I made it go faster (overall) by delaying the first message.

See, it turns out that when some AJAX data is needed to fill in a control field (I'm going with a scattered widget-based scheme rather than neat standard single-form blocks) it tends to be requested in 'flurries'. A couple of controls will pop up together, (possibly as a result of another AJAX call) get their acts together, and decide they are hungry for more data.

Previously, the first data operation would immediately kick off an AJAX operation. This blocks other outgoing messages until the call comes back (otherwise we're using multiple connections, and that can cause issues elsewhere) during which time more operations queue up.

Once the first small request came back, the rest would get sent in a big batch.

By merely putting in a timer so that you schedule the AJAX request to happen soon but not immediately (10 milliseconds, in this case) you give all the other widgets enough time to 'jump on the bandwagon', and everybody else gets their data faster.
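Stripped of everything else (including the single-connection blocking the real multiplexer does), the trick is just a queue and a timer. The endpoint and payload format here are invented for the sketch:

var pending = [];
var flush_timer = null;

function request_data(query, callback) {
    pending.push({q: query, cb: callback});
    if(!flush_timer) {
        // don't go yet - give the other widgets 10ms to jump on the bandwagon
        flush_timer = setTimeout(flush_pending, 10);
    }
}

function flush_pending() {
    var batch = pending;  pending = [];  flush_timer = null;
    var xhr = new XMLHttpRequest();
    xhr.open("POST", "/ajax/multiplex");                 // made-up endpoint
    xhr.setRequestHeader("Content-Type", "application/json");
    xhr.onload = function() {
        var answers = JSON.parse(xhr.responseText);      // assuming one answer per query, in order
        for(var i = 0; i < batch.length; i++) batch[i].cb(answers[i]);
    };
    xhr.send(JSON.stringify(batch.map(function(b) { return b.q; })));
}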

This has a surprising impact on the feel of many page operations. The update behavior just seems more logical. Even though the exact same amount of data is flowing to and fro, it turns out timing is very important for this, as in comedy and love.

Also, I just invented another mini-language. (Well, I'm copying Javascript, sort of) I was getting sick of iterating through the Java Maps and Lists that I get from parsing JSON, and I was really missing how easy it is to reach into a Javascript structure and pull out random items. Java's strict types are actually a bit of a pain, now that I've spent some time in prototypal script land where everything is infinitely malleable.

Here's what I'd do in Javascript

var json = JSON.parse(p_json);
for(var i in json) {
    var o = json[i];
    if(is_object(o)) { /* ... */ }
}

Nice and straightforward. (Yes, I know the "for...in" form is frowned upon, but I control my namespace/prototype pollution very carefully.) Here's how long and bulky it gets in Java:

// parse the json block
Object json = json_decode(p_json);
Iterator i = ((LinkedHashMap)json).entrySet().iterator();
while(i.hasNext()) {
    Map.Entry<String,Object> e = (Map.Entry) i.next();
    String id = e.getKey();
    Object cmd = e.getValue();
    if(is_object(cmd)) {
        // accept ....
    }
}

And that's just iterating over one level with some pretty easygoing constraints, using my convenient is_blah utility functions copied from PHP. Once you get to three or four levels of structure, the amount of code involved gets quite staggering and hard to read.

Wouldn't it be nice to use Javascript-like syntax to grab nodes out of the JSON structure just like jQuery's selectors grab DOM nodes?

That sure would be nice, I thought. And the more I thought about it the nicer it seemed. So I wrote one.

It's only a few hundred lines long, and it lacks nearly every major feature of note, and in fact all it does is, given an arbitrarily complex JSON structure like [{"a":1,"b":2,"c":[1,2,3]}] and a path selector like "[0].c[2]", pick out the expected entry from the structure. In this case, "3", the final element.

It's like running a tiny, tiny bit of interpreted Javascript code in compiled Java VM, just to pull out a single variable. I can't tell if it's silly, or pure genius.

This collapses enormous repetitive wodges of code down to tiny constant strings expanded by a library. Whether the code runs faster or much, much slower depends on a lot of factors, and isn't straightforward to guess. Several tricks to pre-cache the tokenization would help out immensely, but that's for when I get time. (No premature optimization here...)

The other thing the selection language does is provide a kind of error checking that's sadly lacking in Javascript. Each step of the path is either an object property name (a dot, followed by text, like ".name" or ".a") or a numeric array index (a number, enclosed in square brackets, like "[0]" or "[9]"), which forces an expected type, at least between objects and arrays. Normally Javascript can use either syntax to access either object properties or array indexes, but throws random exceptions when your expectations don't match the data.

Totally missing features: it only selects one thing at a time, rather than lists of things which I already want to do. And it doesn't know about the standard escape characters yet, but that's just a limitation on the selector string, not the JSON data. The Jackson library is still taking care of all that, and doing a marvelous job.

Now that I've added this handy little feature to my toolbox (and spent the last hour writing test cases to try to make it break) I'll be better prepared for the future. Maybe it will even justify the time. I live in hope.

Friday, July 1, 2011

Signature of The PayPal Problem

Later on I'll be talking about The Problem, the formal definition of what we try to solve. But for now, let's look at just a small part of it; PayPal's amazing relationship with independent software developers.

Most people on the internet know about PayPal. It's ubiquitous. I'm writing a new PayPal 'backend', intended to manage user accounts. This is not my first time on this particular merry-go-round, so I'm working with a large base of tested and reliable code, but PayPal is special.

Two months ago, PayPal added a new field to its IPN messages, called "ipn_track_id". It's filled with a big 22-character-long unique-looking alphanumeric hash.

Hopefully this is the long-sought-after "idempotency marker", much desired by PayPal module developers in the same way the Arthurian Knights were quite keen on the Holy Grail.

'Idempotency' means 'doing it twice doesn't screw things up'. It has become a very important concept in making mixed internet systems reliable. See, there are 'edge cases' on days when both the network and users are being silly that cause issues that lead directly into the Twilight Zone...

Consider, if you will, the following user story. It is a day like any other, when Arthur Q Public makes a PayPal transaction for a simple product item on an internet web site. Then, while intending to cancel another order, he accidentally reverses the one he wanted to keep. PayPal begins the reversal process, locking Arthur out of further changes. Arthur, upon realizing what he has done, re-purchases the item he really wanted, expecting everything will get sorted somehow. Little does he realize that the IPN message system is passing on his purchases in random order, plus retries, on a day when the PayPal validation callback service is dropping connections from a DDoS attack, or traffic from Random Pop Star's latest album, which from a systems point of view looks like the same thing.

Depending on how your system answers the question 'Is this record the same as the other one?', the purchase will be correct (processed, reversed, processed again), punitive (purchased, purchased again but ignored, then reversed, leaving none), doubled (purchased, purchased again, the reversal doesn't match the latest, leaving two purchases) or worse (multiple interleaved retries and reversals, leaving god knows what).

Payment records are fairly safe, because their Transaction ID works as the idempotency id. If you get the transaction twice, you can safely ignore the second. The Subscription ID works the same on the signup record... and that's the first problem right there, because suddenly you need to use different IDs depending on the type of the record.
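Which leaves you writing something like this just to answer 'have I seen this before?'. The field names are the standard IPN ones; the grouping is mine and deliberately incomplete:

// derive a dedup key, if the record type actually offers one
function idempotency_key(ipn) {
    if(ipn.txn_id)    return "txn:" + ipn.txn_id;                            // payments, refunds
    if(ipn.subscr_id) return "subscr:" + ipn.subscr_id + ":" + ipn.txn_type; // signup, cancel, ...
    return null;   // some records give you nothing safe to key on
}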

Is there a 'Record Type' field? Sure there is! At least three of them. (Doh.)

Oh, and not all records have an ID that makes them idempotent, especially things like reversals and cancellations.

So the sudden appearance of what looks like a consistent idempotency ID is exciting news. It significantly changes the processing logic of an IPN receiver. And remember this new field is being sent by PayPal as part of every single IPN message; it's not like it's an optional request. Everyone who writes or just runs an IPN endpoint (ie: an Internet Business) is expected by PayPal to deal with the existence of this new field in the IPN messages, and deal with the consequences.

You'd expect it to be documented, right?

Right? Suddenly pushing a new field at every single piece of IPN software out there, you'd expect a little Change Management documentation? Here's the highest ranked exchange in PayPal's own developer forums:  https://www.x.com/thread/51579

It's short enough that I can include the entire amusing conversation:



socialadr
Newbie

1 posts since 
Apr 20, 2011

Apr 20, 2011 11:15 AM

Really PayPal?  ipn_track_id?

PayPal recently changed the IPN data they send us and added a new field called "ipn_track_id".  Of course there's no documentation of this anywhere, even Googling won't return anything at all.

I can make an assumption of what this field is used for, but is it normal for PayPal to add new fields without notifying developers?  
    Tags: ipn/pdt, ipn_variables, ipn_track_id

    PP_MTS_Magarvin
    Staff
    5,750 posts since 
    Nov 3, 2009
    0
    votes
    1. Apr 21, 2011 8:20 AM  in response to: socialadr
    Re: Really PayPal?  ipn_track_id?
    Hi Socialadr

    ipn_track_id is a new IPN variable that was recently added for internal tracking purposes.  I have submitted a Doc update for this variable to be listed in the guide. Sorry about the inconvenience.

    Thanks.

      Brian Zurita
      Newbie
      1 posts since 
      May 27, 2011
      0
      votes
      2. May 27, 2011 3:48 PM  in response to: PP_MTS_Magarvin
      Re: Really PayPal?  ipn_track_id?
      While we wait for the documentation to catch up, could you please confirm that this field will be a max of 22 characters?


... and that's it. It's been three months.

There's a similar story attached to the "notify_version" field suddenly changing to "3.1" from "3.0", and "internal tracking purposes" was mentioned as part of the reason for that one too.

Remember that the IPN protocol is similar to email; we all run receiver software expecting messages to conform to a particular format. Database tables are created and maintained with columns for each field. (well, mine don't. I know PayPal too well.) Transaction field values are thoroughly checked to prevent fraud and abuse, and deviations from the standard (such as the version changing from "3.0" to "3.1") can literally set off alarm bell sound files. You would expect PayPal to give the developer network a bit of a heads-up before arbitrarily changing the specification, especially because getting these new fields is not optional. There is no way to turn them on or off.

My personal theory is there is a tier of 'paid-up PayPal developers' who are given access to this advance information, with legal non-disclosure requirements that are apparently well enforced, because it's hard to find any information in the usual developer places. This would give them a reason to be hostile towards the Independent Developers, in order to give the paid-up members an advantage for their commercial software.

The alternative is that PayPal simply doesn't care about documenting IPN, and treats all developers like this. Basically, a choice between malevolent or uncaring. Tricky.

I really want to use the 'ipn_track_id' field... If it is what I think it is, it solves a lot of problems. But like everyone else, I apparently have to reverse-engineer its meaning from the messages we are given.

I have to guess? While writing software expected to process actual money transactions? Really?

Fortunately, I have a large collection of data to look at. Not sandbox transactions, but a year of payments and edge cases like reversals and dispute adjustments. Data that, by definition, I can't share or use as examples. And I can tell you that PayPal's own 'sample transactions' don't even come close to the complexity of what happens during a dispute.

I hope this gives you a tiny insight into why PayPal site integration is so variable, and why so many smaller shops seem to have such difficulty with automatic processing, and go for larger 'checkout' services who take yet another cut on top of PayPal fees, and the original Bank charges, but don't allow for 'deep' user management. Small systems and independent coding projects (like the Drupal module) are fragile in the face of PayPal's arbitrary protocol changes, apparently by design.

But despite their current ubiquitous lock-in, PayPal needs to seriously lift its game. Google Checkout is coming.

Update:
Nope. A little data trawling has suggested that "ipn_track_id" has an entirely different meaning, and solves a different problem. This value is repeated across two IPN messages that were otherwise hard to match together: the signup and payment transactions that are generated when the client starts a 'subscription' recurring payment.

Yes, it does also match across retries for the same record, but this seems secondary in purpose. It might technically be idempotent when combined with the "txn_type" field, but at that point we're back to magic combinations of fields again.

Also I just want to mention a minor thing - if you happen to be designing an API like PayPal's: having a field like "reattempt" might sound like a good idea, but it means that each retry data request is subtly different from the original. So otherwise simple and reliable methods like storing message hashes won't work unless you strip those (otherwise quite useless) fields. A shame, more than a problem. It means that we need one set of logic for spurious TCP/HTTP retries, (which can happen, especially if a proxy is involved) and a second set for when PayPal reattempts explicitly, despite it making no difference to us why the glitch happens.
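So the hash-the-message approach ends up looking like this, with a strip list you grow as you catch new offenders (sketch; 'reattempt' is the only field I'd nominate so far):

// canonical dedup key: sorted fields, minus the ones PayPal itself fiddles with between resends
function message_key(params) {
    var volatile = {reattempt: true};
    var keys = Object.keys(params).filter(function(k) { return !volatile[k]; }).sort();
    return keys.map(function(k) { return k + "=" + params[k]; }).join("&");
    // hash this string if you'd rather store a digest than the whole thing
}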

But it fits with PayPal's egocentric worldview. We are apparently supposed to care how many times PayPal tried to give it to us, enough to justify them altering the transaction message. But we don't get a simple field which lets us know which message the retry is for, in case we already got it.