Wednesday, April 23, 2014

FireFox still missing MessageChannel

I've managed to 'find' three browser bugs in the last few weeks. One is wending it's way through the chromium bug process (it's nice when you get a good one) one was already logged, and the third was FireFox.

Well, I say "bug" but really it's "unimplemented HTML5 feature"... specifically the MessageChannel object necessary for cross-document messaging. Without those, you can still use .postMessage for everything, but you take a performance hit because all the "marshalling" required, and security goes out the window.

Oh, so FireFox has not implemented a bleeding-edge new feature. Aw, cry me a river, you might say. Except this is a rare case where even Internet Explorer has managed to get over the line, making Mozilla the last holdout (except opera mini.) and it's such a simple feature!

In fact, if you look at the great repository of knowledge on such things:
http://caniuse.com/#feat=channel-messaging
You'll see that Chrome has had it for five version, IE has had it for two, even the Blackberry browser has always had it. It's one of the easier parts of HTML5 spec, and is a fundamental part of security in future apps.

So why is it missing from FireFox?

Reading through the bugtrack is quite illuminating, and paints a winding story. I think the tone was set in the initial days by one developer saying "I don't see the point of this." and another replying "Because it's in the HTML5 spec." and the first replying, "Oh well." and then nothing happening for three months.

Recently a flurry of people reporting 'bugs' prompted some work, and the feature was mostly implemented except for web workers, and then pulled back at the last moment. From what I can tell, the patches are sitting in some kind of purgatory until someone cares enough again.

In the meantime, the fallback for FireFox is to continue using the single window.sendMessage event to route everything. Why is this bad? Perhaps the best reason I can give is that, once I attached my "onmessage" handler to the outer window to receive messages back from the inner frame, that handler was passed all inter-frame traffic... including the Google+ widget messages that I didn't even know were there... and it was clear that the Google+ widget was also receiving all the messages intended for my scripts, (filled with things like authentication tokens) so I was essentially trusting them to ignore messages intended for my origin and not be evil. (just like they were trusting me)

Google, I do trust. But what happens when I add a Facebook or other third-party web-library-component-widget things to my page? Will they start conducting industrial espionage on each other because of the globally accessible message queue? How would you know?

So it's not just me working around the missing features in FireFox, or the dozen other edge-dancing coders complaining in the forums, It's Google. How much effort is being put into coping with this out in the field?

Monday, April 21, 2014

node.js finally arrives, thanks to RedHat OpenShift

I've been messing with node.js for about two years now. It's a beautiful system. Most people don't get it, but that's fine. The bigger problem had been there was no-where to run your code, if you didn't maintain your own servers.

And who seriously wants to do that, if you're not getting paid for it? It's a time sink.

I've been waiting for a major provider like Dreamhost to enable node.js. A couple of small providers have started up, but it's kind of a premium service. And that's a shame, because the first job of node.js is not to take over completely from your old site, but initially to "plug a few gaps" in the big internet puzzle - mostly all this crazy new 'real time' websockets and messaging stuff.

...like a chat client. Everyone loves a little chat client. But you're not going to redevelop your entire site and chuck Apache, with all the pain that would cause, to get one. You want a cheap little 'experimental' side-server that fills that gap, so cheap you essentially forget about it.

Well, along comes RedHat with their cloud computing platform called "OpenShift". I think they changed it recently, because I remember it having a much more sensible name and a lot less features the last time I checked.

I'll skip over explaining their funky new words for everything (like "cartridge", clearly intended to harken back to your halcyon Nintendo days) and just say that, FOR FREE, you can have THREE servers spun up for you, pre-loaded with all kinds of linux-based server setups such as Apache, PHP, JBOSS, or node.js.

If you're used to FTPing into your hosting server, then OpenShift is going to be a shock, due to the heavy use of GIT to transfer files. It seems crazy at first, especially setting up the SSH access keys if you don't have a unix box handy, but there is a method to the madness which becomes apparent when you start to 'scale'. Using GIT to deploy files to one server seems completely excessive - although it does do some cool 'merging' of your code with the latest server upgrades - but the moment you need to deploy to half a dozen instances, you'll see there's no better way.

All other providers I've tried work in a similar way. If anything, RedHat has streamlined the process to it's simplest. But why does the typical node.js deployment system 'force' you into being scaleable? Because if you don't start out that way, you never will.

Yes, socket.io and websockets work, I've tested it. (You have to be aware it appears on port 8000 externally, regardless of what you thought it would be. And only the websocket connection method works - no comet fallback that I know of.)

Widespread use of node.js and websockets is likely to reduce global internet traffic, by removing all the inefficient ajax 'polling' or comet 'long poll' IP connections being made and remade, a billion times a minute. node.js can handle more users and page hits than PHP. But none of that mattered while we didn't have any damn servers to run the code on.

So, Thanks RedHat! You've made my day, and solved one of my longest-standing network plumbing problems, for free. And you've made the internet better. Thank you.

Monday, April 7, 2014

Steam Family Sharing renegotiates copyright in our favour

Steam (the video-game distribution service) just renegotiated your copyright with them. And unusually, they've taken a step on the long road towards ending the pointless copyright wars, instead of escalating further.

You can now "lend out" digital games to your family and friends. It's a small step, but a good one:
http://store.steampowered.com/sharing/

Jaron Lanier gave a talk recently where he pointed out one consequence of the "paper-thin" Digital Rights Management around DVDs, compared with the "open-access" data on CD.  "Imagine that 20 years ago, you put a CD and DVD into safe." he thought-experimented. They cost about the same when you bought them, and twenty years later you open the safe and ask "what capabilities does owning this medium offer?"

Well, with DVDs, you can do pretty much what you could do back in the day - play them on your region-locked DVD player, unskippable FBI warnings and all.

But with CDs, you can shift the music to your iPod, digital music collections, Spotify... there's more you can do with a CD today (again, legally speaking) than ever. Amazingly, the unprotected CD has increased in utility value, thanks to technology. That investment you made back in the day keeps paying off. Weird.

Valve and Steam have just done something equivalent. They "renegotiated" your copyright with them - as is allowed in that EULA you clicked through, as could be done by any media company at any time - but instead of screwing us, they have taken the opportunity to act on our behalf.

(In this way, they are outperforming many actual governments.)

Basically, you can now lend your game library to other people as easily as letting them sit down at your own computer to play a game, or loaning them the game media. (with the added advantage they can't lose or step on it, and you can yank it back any time you need to.)

What this does, once again, is increase the utility value of your Steam digital library. All those games you bought can convert into favours amongst friends - one of the hardest currencies of all.

Steam gave you more digital rights, rather than trying to restrict them even further. Isn't that refreshing? Being treated with respect? Feels good, doesn't it?

We're closer to the day that "baby's first steps" can't get yanked from YouTube because a digitally-bought-and-paid-for copy of "Finding Nemo" was on in the background. Where buying an album means you can sing your own terrible karaoke version in public, or do a heavy-metal guitar cover that many think is better than the original, without fear of prosecution.

More generally, the digital rights afforded us need to align with the way we (and our children) actually use the technology in our modern daily lives, or the copyright industry is just a way of criminalizing the entire population for being alive in the 21st century, for profit.

Saturday, March 22, 2014

"LeoDroid" Design Decisions


I should explain a few of the design decisions I made with LeoDroid. First, DC Power!

Power Systems

Notice the USB "Power Bank" on LeoDroid's back? That's an 8500mAh NiMH battery capable of delivering 1.5 Amps. They're designed to recharge mobile phones, and come cheap from eBay. You can't build one cheaper, trust me.



If you have built power systems before and didn't know about these brilliant new devices, all you need are these magical words; "5.2V @ 1.5A DC step-up regulator, built-in USB charger, all matched to the battery, less than twenty bucks."

They can be plugged into any standard USB charger. They can charge each other. 5.2V drops to exactly 5V after a Schottky protection diode or LDO regulator. 1.5 amps is enough to spin robot motors.

Unreliable power is one of those things that can ruin your whole day. Buggy self-made power supplies can screw up your whole month. These power banks can be bought for less, in many cases, than the naked battery.

They come in a variety of shapes, colours, sizes, and optional features like solar cells. If one breaks or dies, you just plug in another. You don't have to care about battery chemistry, charge curves, or special connectors. If you use USB-powered Arduinos, there's no soldering required; just unplug from the PC, attach the USB Power Bank, and away you go.


Power Switch

Note well the power switch! The use of a DPDT switch is not accidental - the idea is that when turned on, the battery pack power is connected to the Arduino and Motor Driver - but when turned off, those systems are isolated from each other.

See it? Over to the right, above the wheel well.

To understand why, notice that the Arduino's USB port isn't connected to the battery, (which would be easy with standard cables) instead we route the power to the RAW pin. This arrangement is absolutely necessary because we can't drive enough power through the Arduino USB port to turn high-current motors. It's not a matter of wire gauge - it's the protection polyfuse. If you try to pull more than 500mA through that fuse, it will shut down.

So we have to run power to the motor driver directly, which means soldering wires. And that same input power has to get to the Arduino, because we're using its on-board regulator to supply the digital circuits with 5V. Power just can't go through the Arduino's USB connector on the way to the motors.

Unfortunately, a problem arises because we occasionally want to plug the Arduino into a desktop PC to upload new program code. We really don't want the battery turned on when doing this, because it's possible that the battery pack (if the PC is underpowered or defective) could feed power INTO the computer and burn out ports, even with the polyfuse. Or the PC could try to charge the battery bank backwards, via the wrong port. Don't cross the streams.

If you plug your PC into one side and the battery into the other, power can flow ALL the ways.


If you unplug the battery to prevent this but leave the microcontroller and motors driver connected, the host PC will need to supply all your Droids' needs - including the motors if they happen to suddenly power on because you're debugging, and that will trip the polyfuse.

If you've got good equipment, this will merely reboot the board. If.

With the double-pole switch, "on" means the battery is connected to both controller and motors, and "off" means the controller is safely isolated for programming.

Throw ze switch, Igor.


If you use a single-pole switch (or none at all) then you need to be damn sure your equipment (and everything you plug it into) can survive the conditions. Be aware that I have seen USB ports that burst into flame because someone plugged a double-headed cable in twice to the same PC.

Another simple option is to not use the RAW pin but hack up a USB cable so it splits to supply the motor driver as well. Therefore, you'd be forced to unplug the battery USB connector before plugging in the programming USB cable to the PC.

The most foolproof solution, for the usual class of fool.
It's still a switch, it's just harder to flip.


If you think about it, you would be 'manually' performing the action of the DPDT switch by having a mutually exclusive 'power cable' and 'programming cable'. If you intend to do software development, plugging and unplugging over and over can be annoying, and physically break things. Hence the actual switch. Of course, you could leave the switch in the wrong position...

If you don't intend to connect the Arduino to a PC ever again after emplacing it, then a single pole switch is probably fine. But why not do it properly?

(Did I hear someone say "diodes"? Sure, you could use diodes. They'd better be expensive schottky ones, or the voltage drop will take the circuit out of spec. Even good ones are wasting 5% of your power.)

Power Loom

This is such a simple and obvious concept - have a PCB or connector block that is the central power loom. With dupont connectors, I use old 2-row IDC connectors (such as the IDE connectors off dead hard drives) and solder all the pins in each row together to create two 'rails'. Then all the connectors plug into that. It's basically a 10-socket mini power board.

Like this, only tiny.

The alternative, which I see all the time, is the 'daisy chain' that develops when you run power wires to the nearest other powered device.

The 'loom' is not only better electrically (to prevent ground loops) but it makes visually checking hook-up polarity straightforward - if all the black wires are on one side, and all the red on the other, then you're obviously good at the loom end. (This is how we battle Murphy's law.)

It also means you can quickly isolate individual devices without worrying about chained devices. And you can untangle wires easier.


"LeoDroid" photos

Hooray, pics uploaded at last. Here is my latest robot, based around an Arduino Leonardo: I'm just about to pull it all apart again to do the final trim and shaping of plastic bits, and add The LASER.


Hi there.


Looking pensively into the distance.


The NiMh battery pack has a solar panel built-in, and runs for hours. 
Also, can recharge your phone in a pinch.


The thumbstick controls a pointer on the screen


...which controls the Integrated Programming Environment


Under the hood, kinda looks like a prop from the technicolour 60's.


The actual computer is nearly invisible. It's all really just different lengths of coloured wire.

Release: Fritzing design files for "LeoDroid"

Here are the "Fritzing" design files for the Arduino Leonardo-based robot that is sitting on my desk. I'll be posting some photos shortly, and explaining some of the design decisions.

Common Features:
  • Atmega32u4 based design
  • 128x160 pixel TFT display
  • Thumbstick
  • Ultrasonic Ranger
  • IR Remote Control Decoder
  • Magnetometer Compass
  • Sound Effects
  • PWM Motor H-Bridge control
  • PWM LED/Laser output

I've added all related files (ritzing designs, exported images, and library parts) to the GitHub repository: https://github.com/unorthodox-engineers/unorthodox-arduino/

Pro Micro Variant

This is the target hardware I developed for, the 16Mhz 5V variant of SparkFun's Pro Micro.



It might look as though there is one pin free: D14. Nope. Once SPI is switched on, it gets locked as MISO and can't be used for general I/O, even though our SPI Apprentice device doesn't have a data output line. Every single pin is used.

Arduino Leonardo Variant

The genuine Arduino "Leonardo" is completely compatible with the Pro Micro, and even has a few extra pins to play with. (though not as many as you think)




Remember that SDA and SCL are 'aliases' of pins D2 and D3, so we really only have D11, D12, A4 and A5 unused. (Though that is enough for many things!) And on the Leo, the SPI bus is on the ICSP header (only) so make sure you have the right cables.

I'll try to post errata here, but the files on GitHub will always be the 'canonical' ones. Make sure to check those before you start building. :-)

Parts List




Thursday, March 20, 2014

Review: Hack (The Facebook Language)

I design new computer languages, when I can't avoid it. It's something I've been doing since before my Honors Thesis, in which I created a distributed language that allowed programs to dart and jump from machine to machine, architecture independent, at will.

So when someone announces a "Major New Language", I tend to have a look-in. When Facebook announces a new language, well...

Here's the thing. I don't want to be mean, but it's going to come out, because some of the criticisms I have border on cruel and unusual. In fact the more I read the spec, the more I wonder what they were thinking...

Actually, I do know what they were thinking. They wanted to improve PHP. That is a laudable goal, but frankly, and this is going to sound mean, the fastest way to do that is to stop using it. Hack is designed to solve problems which are central to Facebook's business, but irrelevant to the majority of platforms and users that run PHP. For those platforms, PHP's shortcoming are not a bug, they are a feature.

Case in point: Async. Hack has some lovely Awaitable metas (sort of similar to generics) which you stick on a function and when you call it, you instantly get back a kind of 'handle' to the result which should be available eventually. 

That's nice. It makes parallelizing parts of the code really simple. 

But it's not that you couldn't already do something similar in PHP (granted, clunky as hell) it's just that most Hosting providers had the PCE (Process Control Extensions) turned off, or would suspend your account if you called too many long-running shell scripts, besides killing your script after 30 seconds of runtime, no matter what you set the timeout to.

Hack Async solves the problem internally for Facebook where they trust all their code, but doesn't do a thing for Dreamhost or Gator clients. In fact it guarantees they can't run it, because even the simplest script can now balloon out into infinite parallel threads, and there's no resource management to limit it.

Also, easy creation of parallel algorithms means easy creation of deadlock problems.. where are the tools to solve those?

Other issues that are important to a company with a huge monolithic codebase are static typing and annotations (check) immutable database tuples (check) generics (check) collections (check) and UML modelling. (um, no!)

Then there are missed opportunities. Traits are broken for the one thing they should be useful for - macros which prevent copypasta and tortuous subclassing in cases where even generics can't cope. It could have been a "literate programming" primitive. Now I don't know what it's for. "Implementing Static Interfaces" apparently.

What really amazes me is that "Lambda Expressions" are more than halfway down the list. "Nullables" (the ability to make static types include 'null') is given more prominence. That's like casually mentioning "oh, and we've got warp drive." after explaining the many benefits of the cup-holder. 

Probably because, like warp drive, it is very hard to retrofit after-the-fact and make it completely functional. Javascript had it built into its core, and still had difficulties.

There are some genuinely nice things; Iterators and Continuations are great (I'm still waiting for them in Javascript, ahem!) although frankly very few people know how to use Continuations, and Iterators are just a standard class interface, and I feel another missed opportunity to combine two things which are really the same thing.

But none of that addresses my original point; which is that Hosting providers chose PHP because it was limited and constrained, and could be used to sandbox user accounts on a shared machine. Everything in Hack was already available in C++ and UNIX, but we were not allowed that full power, because abuses (or bad code) brought down the system for everyone, partly because hosting providers are quite terrible sysadmins.

But also because our languages are really bad at allocating finite resources like memory, bandwidth, and CPU power among sub-processes. Or informing us how much is available, or being consumed by our own processes. But that's another discussion...

Because PHP was largely cut-off from the OS, (deliberately!) Hack has incorporated large chunks of the OS inside itself to make that interface frictionless and performant. Switching on all that power will go wonderfully inside the walled garden of the Facebook server farm and other such corporate installations, but it doesn't address anyone else's needs.

This is an heroic effort to get Facebook out of the language dead-end they had painted themselves into, a way to transition from PHP to C++ incrementally, without re-writing the whole damn codebase. Because they're right: PHP starts breaking down when you get too large and complex. I'm amazed they've pushed it this far. It's designed to spam out a web page in 100ms, not multithread a distributed computation. Fortunately, Turing Complete means what it says on the box.

But if you're looking for a cool server-side language for your next project, I'd still go with node.js, unless you too have a million lines of legacy PHP hung around your neck like a dead albatross.

That's not the most resounding accolade, granted, but at least Hack has a definite use-case. (More than can be said for many languages) For PHP programmers with a formal background, it offers a partial escape from our sad and sorry lives. Well, if the sysadmin would just install it...

Monday, March 17, 2014

Overview: Arduino Graphical User Interface classes

Part of last week's release was a very hacked-together GUI for my programmable droid interface. Here's a couple of screenshots that I've posted before:



Not shown is the thumb-stick that controls the square cursor that you can see near the middle of the screen. The basic features are:
  • point-and-click style interface using thumbstick.
  • line-oriented layout that favors runs of text symbols
  • clicks initiate immediate action, or enter menu mode where moving the joystick will modify the item, after which clicking again will commit the choice.
  • redraw manager maintains a list of updated regions, and can be called incrementally by the main loop.
  • screens can be 'paged', so that the cursor is either limited to the edge of the screen, or navigates over to a new page, or a different screen.
If you take a look in <unorthodox_droid_raster.h>, you'll find a set of 'Pager' classes that are 'bound' to the Raster classes. (it asks the raster classes to draw things) These are refactors of the older <unorthodox_droid_gfx.h> classes that were bound to the Adafruit GFX library.

I have no intention of 'abstracting' them both - the raster classes are the only ones in use, but the older classes are left as examples and prototypes for people who have already committed to the GFX library, or who perhaps want to port the classes to a third hardware/driver library.

Frankly, the classes aren't really important except as examples. That's going to be the hardest part of this explanation: my 'library' isn't what you expect. There's really not a lot of code there - which is the point. What I have is a way of breaking down the problem so that custom GUIs can be created within the very hard limitations of the Arduino.

Here's a couple of things to consider, issues which make this particular version of the problem different from 'classic' cases:
  • Low screen bandwidth - because the screen is usually on the end of a serial link, instead of dual-port SRAM on a video card, simply clearing the screen can take over a second.
  • Write-only - the screens I work with have no way to read pixels back from the display.
  • Memory mismatch - Ardino total RAM is 2.5K. The bytes needed to display a 24-bit 320x256 image is 245K. The Arduino would need 100 times more RAM to keep a local 'copy' of what is on the screen.
If we assume our UI is made up from some 'fixed' elements and some 'variable' bits, what we're really asking is to create a structure that fits within the tiny 2.5K Arduino RAM, that we can modify, which will somehow be able to repaint that entire 245K screen surface.

There's a name for trying to keep something big in a small space: Compression. 

Therefore the real secret to building an Arduino GUI is not about widget toolkits and pen styles, it's going to be about efficiently accessing compressed data. 

The moment we have a cursor pointer moving around, we're going to have to redraw little chunks of the screen where it is, and when it leaves. We have to be able to efficiently reach into the structure to redraw line spans when the cursor moves, or text content changes.

Compression is all about context. The briefest recorded telegram conversation consisted of one symbol sent by each participant. The first sent "?", and the second replied "!". In their context it made perfect sense, and couldn't have been clearer. If you haven't heard the story, then it is less so.

So our first layer of compression is to cut down the problem space in a way that 'everybody knows about', but that doesn't limit us too much. Since most of the 'interfaces' that I needed to build were essentially various kinds of configuration menus, and menus are lists of text items, it was pretty clear I could build such an interface out of "lines of text" and not much else. (Curses, you say.)

Font glyphs (especially european ones) can be compactly represented in 6x8 pixel blocks, or even 5x7 if you always assume a 'spacer' pixel between them. 8x8 might sound neater, but the aspect ratio of 6x8 glyphs is a lot nicer. 8x8 fonts look very 'fat', and fit less per line on small screens.

If we assume our screen is made of a grid of 6x8 pixel blocks, each one of which might contain a character, then we vastly simplify our data load from say 320x256x3 bytes (RGB pixels) to 53x32x2 bytes (assuming one character glyph and 'style' byte per cell)

Thus, keeping a coloured fixed-font grid would require 3.3K of memory. Still too much.

Wow. We can't even keep a full-screen character grid. Even the cheapest microcomputer of the 80's could do that.

We make another assumption: on most of our 'menu based' screen, there are going to be lists of things. "lists" by definition are a bunch of lines that have a similar format. If they have the same general format, but different 'drop-in' values, then that's essentially good old printf().



In the screenshots shown at the top, you can squint and notice that most screens only have two 'line formats' - the header line, and the bulk list lines. (empty lines not counted) and in fact the classes that back these screens have a very simple 'redraw' handler which tests if the row is <1 (in which case it draw the header) or >1 (in which case it subtracts two, adds the page offset, and draws that item number)

These classes are not filled with data structures that are parsed by an outside framework - they are full of if/then and case statements that hard-code the screen layout into compiled instructions - the fastest possible way to render.

The RasterPager class is the base class that each screen is built from. (What might be called a 'screen' is internally referred to as a Pager, because it 'knows how to draw a page'.)

Pagers are asked by the render manager to draw RasterSpan objects which are essentially defined as a sequential span of characters on a single text line, plus some raster context. "line 10, chars 8-17" is a typical span. 

The draw manager has a list of two bytes for each line to record the 'invalid' span. (how much we need to redraw next time - which may be nothing) If the manager is asked to 'invalidate' more spans on a line which already has one, the existing span is extended to include those characters as well. 

When the draw manager is given a 'time slice' to spend on rendering pixels, it finds the next line which has a span that needs to be drawn, and asks the Pager to redraw it.

The draw manager has no idea how the span is going to be painted - it might have character glyphs drawn into it, or just pretty patterns - it just knows which parts of which lines are waiting to be done. When it comes time to do the job, the draw manager passes the span request on to the Pager class.

So when the Pager gets the request to draw parts of 'line 0' (for example) it can resolve this in any way it likes. Case statement, array lookup, random number generator, whatever. In many ways, the Pager classes are like HTML 'stylesheet' classes; they mostly contain formatting information.

Because we are interested in rendering lines of coloured text, the RasterPager class has a method called format_span() which works very similarly to printf() in that it has a fixed "format string" parameter (compactly stored in program flash memory) and a set of variable parameters in an array.

format_span() is written to efficiently render sub-spans of the entire line - it can efficiently skip over parts that aren't involved, and doesn't buffer. It can extract single digits from the middle of decimal-formatted and aligned numeric values, and use single parameters multiple times in multiple ways, as decimal values, style indexes, or string selectors.

Here's the top two line definitions used by the SignalPager screen shown. (there's a debug footer I've omitted) It looks bulky in source, but the first string is only 14 bytes long (compiled) and the second is 26 bytes. 

// title line
PROGMEM prog_uint8_t SignalPager_title_line[] = {
// fragment metadata 
RasterPager::Span | 6,
RasterPager::Span | 9 | RasterPager::Last,
// fragment instructions
RasterPager::Lit | 1, ' ',
RasterPager::Lit | 9, 'X','-','D','R','O','I','D',' ','1'
};

// list line
PROGMEM prog_uint8_t SignalPager_signal_line[] = {
// fragment metadata 
RasterPager::Style | RasterPager::Vector | 5, RasterPager::Span | 2,
RasterPager::Style | 6, RasterPager::Span | 4,
RasterPager::Style | 3, RasterPager::Span | 2,
RasterPager::Style | 0, RasterPager::Span | 7,
RasterPager::Style | 3, RasterPager::Span | 5,
RasterPager::Style | RasterPager::Vector | 3, RasterPager::Span | 1 | RasterPager::Last,
// fragment instructions
RasterPager::Lit | 2, '<',' ', 
RasterPager::Dec | 0,
RasterPager::Lit | 2, ' ','(', 
RasterPager::Dec | 1,
RasterPager::Lit | 3, ' ',')',' ',
RasterPager::Lit | 1, '>'
};

Here's the overloaded draw_span() method on that class, which selects which of those formats to use (based on the line) and then stuffs the vector with the relevant parameters, and then calls format_span().

void draw_span(RasterSpan * s) {
int row = s->row;
prog_uint8_t * line_format = 0;
int line_vector[8];
if(row==0) {
line_format = SignalPager_title_line;
} else if(row==1) {
} else if(row<18) {
int line = row + scroll_y*16 - header;
line_format = SignalPager_signal_line;
line_vector[0] = line;
line_vector[1] = droid->signal_value[line];
line_vector[2] = droid->fs.token_size(line&0x7F);
line_vector[3] = line_vector[2] ? 2 : 3;
line_vector[4] = droid->fs.token_size((line&0x3F)|0x80);
line_vector[5] = (line<64) ? (line_vector[4] ? 2 : 3) : 7;
}
format_span(s, line_format, line_vector);
}

Note that apart from the function calls to obtain the parameter values, (which we have to do anyway) the code essentially consists of stuffing a local fixed array with values (very cheap) and then calling one method. Remember that function calls have overhead, and can incur anywhere from a dozen to hundreds of bytes of compiled program code. Calling the raster equivalent of drawText() and setColor() for each parameter directly would clearly take more code bytes - the less calls we need to make, the better. One is pretty minimal. One shared is even better.

The chain now goes: the RasterDroid draw manager knows which bits of which lines to redraw, so it incrementally calls the pager draw_span() for each one, and the pager passes that span request (along with the appropriate line definition and looked-up parameters) over to format_span(), which then draws individual font symbols to the raster device using draw_text() as needed. Whew!

This is how we decompress the instruction 'redraw span 8-10 on line 3' into pixels which hit the screen. Not with data structures - because they don't fit in RAM - but with pure code. Our program flash budget is much bigger, so we depend on that instead. Also, by compiling the code we create the fastest possible execution path, optimized for each 'screen' while sharing the common 'drawing primitive' they all use.

This is all totally backwards, when compared with how a UI toolkit like Windows or Aqua or GTK does things - with abstract widgets. But abstraction is a luxury that, on our memory budget, we can't afford. Knowing our context means abandoning polymorphism, to some degree, by "hardcoding magic numbers", or in this case; format strings and click positions.

Although... since the code is so structured, there's no reason why a 'widget layout' system couldn't be created which code-generates these classes. That system could be full of interesting widget code that aligns columns or draws ASCII art, but that can all be done pre-compile.

Sunday, March 9, 2014

Two At Twice The Price

There's a line in "Contact", spoken by Hadden, that I have taken as a personal mantra:

"First rule in government spending; why build one when you can have two, at twice the price."


That's just before his other immortal line. (And just after some really bad zero-G special effects)

A good friend of mine who studies governments once told me that while it sounds stupidly obvious, governments often alter the price of a service merely by trying to use it, and it doesn't always get cheaper in bulk. If you're in government, "two at twice the price" is a bargain.

The same thing happens in computer programming, for much the same reason - overheads. It's not just the transactions, the framework containing them has to be accounted for.

Here's a little hint that has made vast wodges of my programming more efficient: don't pass one, when you can pass an array.

A case in point would be my new Raster classes: The 'primitive' method for modifying the screen contents is a function called fragment() which takes a linear array of pixels, and paints the strip to the surface. My equivalent of drawPixel() stuffs the colour value into a one-pixel array, and then calls fragment().

Therefore, drawing a single pixel is actually a special case of drawing n pixels. Not the other way around.

So what? The difference is between calling a single function that internally loops around, (keeping all its state intact) or having a loop that calls the function n times. Those function calls aren't free, you know. And you'll rarely update only one pixel total. That's why my font routines are twice as fast.

Write the array case first. Optimize it. Then build the easy-use single-case version on top of that, because if the user is yanking single entries they don't care about performance. If they care, they'll be calling the array version to fast-fill buffers - and if that is merely a loop that calls the single case over and over, you've fooled them good and lost an opportunity for real performance.

Don't do one when you can do n at n times the price. Because the price gets worse when you lose scope or context, so build a 'vector machine' which amortizes the scope costs over as much data as possible. That's how Seymour Cray rolled.

Docs: Raster classes

Another part of the code I just released is a set of "Raster" classes and low-level device drivers for TFT/LCD screens connected to the Arduino. I use them to display my miniature IDE/GUI. They are, in many ways, a re-write of the excellent Adafruit GFX library. (Though with half as many devices, and missing some of the primitives)
  • Raster
    Abstract base driver class for drawing pixel data
    • Raster8 (Raster)
      8-bit colour interface
    • Raster16 (Raster)
      16-bit colour interface
      • ILI9325C (Raster16)
        16-bit colour driver for ILI9325. (subclassed by 'local' drivers)
      • ILI9325C_Pins (ILI9325C)
        example driver variant using digitalWrite to set pins (slow, but general)
      • ILI9325C_Leo (ILI9325C)
        example driver variant using direct port access (fast, but Leonardo-specific)
  • ST7735
    abstract base class for ST7735 24-bit LCD driver
      • ST7735_SPI (Raster16, SPIDevice, ST7735)
        driver variant using 16-bit colour data and SPI (subclassed by 'local' drivers)
  • RasterDraw
    Vector drawing primitives
    • RasterDraw16 (RasterDraw)
      16-bit optimized versions of drawing primitives
  • RasterFont
    Font drawing primitives
    • RasterFont16 (RasterFont)
      16-bit optimized versions of font primitives

Why on earth would I rewrite the GFX Classes? That's a major question. And one that needs an answer, because I've essentially forked the effort. There needs to be a good reason.

  • One reason is "244 bytes", which is the difference in compiled size between Raster and GFX code paths in my "raster_test" demo. Raster is smaller, (of course) and 240 bytes is not a trivial amount.
  • Another reason is "1,280 bytes", which is the size of the GFX font data block. You can't change, or cut down this font without messing with the library files and affecting all you projects. Very few projects use the entire font. Raster lets you use custom fonts.
  • Another is "200%", which is about how much faster the font-rendering code in Raster is compared with GFX. (Although GFX beats mine for rectangle/screen clearing)
  • But the primary reason is this: to make driver writing easier, while optimizing performance and code size. 

One big thing to remember about Arduino development: less methods == less space. Functions, especially virtual methods on objects, are not cheap. Let's look at the core drawing methods required to implement a new GFX 'hardware driver':

  • fillScreen(uint16_t color)
  • drawPixel(int16_t x, int16_t y, uint16_t color)
  • drawFastVLine(int16_t x, int16_t y, int16_t h, uint16_t color)
  • drawFastHLine(int16_t x, int16_t y, int16_t w, uint16_t color)
  • fillRect(int16_t x, int16_t y, int16_t w, int16_t h, uint16_t color)

The way this is done is by extending the GFX base class into a Device Driver class and overriding these methods. Several screen drivers are subclassed in this way:

                    Adafruit_GFX
                          |
        --------------------------------------
        |       |         |         |        |
     ST7735   TFTLCD   PCD8544   SD1331   HX8340B


Once those methods are overloaded, they are used by all the other GFX 'drawing primitives' like drawTriangle() and fillCircle(). The font rendering code in particular makes a lot of drawPixel() calls.

What's wrong with that? Let's say you want to add a new method like 'draw_bezier'. Where do you add it? If you subclass GFX and add it there, none of the 'driver' classes will inherit the behaviour. You would have to modify the original Adafruit source, or extend every driver class.

If we want to be able to extend GFX, then it can't be an ancestor class of the drivers. Also, it would be really nice to cut down the number of virtual methods to the absolute minimum. One would be ideal. Less code means less bugs.

This is where we separate the concepts of a "Raster Device", and a "Rasterizer". The device must be persistent, since it relates to physical hardware. But the chunks of code that draw pretty pictures to the device (whether lines or curves or fonts) should be able to come and go independently - even be applied to multiple devices.

It's also probably a good idea to acknowledge that some raster devices are 8 bit, and others are 16 bit. (And leave room for other depths) Sure it would be nice to abstract that, but the hit to performance is pretty intense. Abstraction is the enemy of performance. It's always faster when you know what you're doing.

While we're at it, we might as well split the 'drawing' and 'font' code so they're more granular, similar to the HTML5 Canvas concept of 'contexts'. Maybe there will be a 3D rasterizer some day...

          Raster                  RasterDraw          RasterFont
            |                          |                   |
       -----------         =====  RasterDraw16  ====  RasterFont16
       |         |        //           |                   |
    Raster8   Raster16 <===        ---------           ---------
                 |                 |   |   |           |   |   |      
             ---------           (new primitives)     (user fonts)
             |       |       
          ST7735  ILI9325C
             |       |
           -------------
           |     |     |
          (local drivers)

The trade-off is that we now have to instantiate extra objects and connect them together - first the Raster device, then that device gets wrapped in the 'Draw' and 'Font' objects. One advantage that gives; if memory gets really, really tight, we can tear-down the 'draw' objects (and re-instantiate them later) and leave the device driver loaded.

Raster Drivers only need to implement one virtual method to draw pixels:

  • fragment(word x, word y, byte dir, Page * pixels, word index, word count)

This method draws a 'strip' of pixels to the surface, starting at a position, and extending in one of the four cardinal directions (up, right, down, left) for a specific count of pixels. The raw pixel data is read from a virtual memory page, and assumed to be in the correct format for the device.

Most LCD driver chips implement the equivalent of this function in hardware, because they're good about rotations, and avoid having a preferred orientation. So we're mapping directly to hardware capabilities, not human abstractions.

It should be pretty clear that this one method can draw single pixels, short line segments, and therefore everything else. This interacts with my virtual memory classes - the page of data might be a WordPage which repeats the same value for the entire block. (creating flat colour) Or it may be a MemoryPage array of pixel values corresponding to part of a font character or bitmap image.

The 'direction' parameter means it is equivalent to the drawFastVLine() and drawFastHLine() methods as well as drawPixel(), so we've got three methods for the price of one.

The only ones left are fillScreen() and fillRect(), which are trivially implemented by calling fragment() once per row. And that code is logically shifted over to the RasterDraw class, to be extended with all kinds of new ways of drawing rectangles with frilly edges.

So, people writing hardware drivers only need to implement one abstract method. (ok, the colour lookup makes two...) And once that method is tested and optimal, any separately-written custom rendering code can wrap around that new driver. Driver and draw classes don't have to be descended from each other, or even know about each others' existence. (Important if you are creating local subclasses)

I'm not sure I can make it any more optimal than that.

It's not totally done, of course. RasterDraw doesn't do circles or arbitrary lines yet, simply because I haven't had a need, or the time. (It would be trivial to port Bresenhams algorithms, but I wonder if a bezier curve algorithm might not be more generalized...) But the interface 'contract' is in place, so that doesn't have to be my job anymore.

Share and Enjoy.


... Afterward On Colour Depth

Arduino code doesn't have a concept of 'Variant', and we're not going to create a generalized 'Color' class because performance will drop through the floor if we have to create and destroy an object per pixel.

The bit depth is largely an issue of how the rest of the code stores and presents colour data. Even if a 16-bit colour screen was suddenly attached, the program can't invent more depth for 8-bit images in storage, it can only convert the pixel format. That could be done by creating a 'virtual' Raster device that has a fragment() method that converts the data before passing it on to a 'real' driver. Such 'adaptor drivers' (actually done mostly using an 'adaptor Page' for the raw data conversion) are pretty easy to make, again because only one method needs to be proxied.

In fact, there can be advantages to internally 'pretending' everything is only 8-bit (or mono) right until it gets to the driver (which might be 24 or 32 bits) - reduced storage and buffer costs mostly, even if we don't get a speed boost because of conversion costs.

Docs: JournalFS and TokenFS classes.

I released a chunk of Arduino code this week, and one of the classes is a "journalling" (or "log-structured") filesystem for extremely small Flash/EEPROM storage, using absolutely minimal resources, in about three hundred lines of code.

Compiled size is about 660 bytes for an instance of TokenFS. The virtual memory classes add about 930 bytes more, if you're starting from naught, but I think that includes EEPROM.h and some other system dependencies too.

When I say 'extremely small', I mean the 1024 bytes of EEPROM in the Atmega32U4 inside the Arduino. One kilobyte of storage, total. That's smaller than the minimum block size of many other filesystems.

The primary job of such a filesystem is to reduce degradation of the storage, by smoothing out the write access pattern so damage is spread evenly over the whole chip, and not concentrated in a few 'hot blocks'. Flash storage is generally guaranteed 10,000 'write cycles' before media errors start to occur. That's per byte location, so the perfect ideal is to have every byte on the storage written 9,999 times before you have to tick over the 10K odometer on any of them.

It's like having a blackboard, but where the eraser is made from sandpaper. It works - but not forever.

One nice thing: the Atmel AVR's EEPROM (and program flash on some variants) is 'hardened' to endure 100K write cycles, which partly makes up for the tiny size.

JournalFS

  • Levelled write distribution over entire storage.
  • Resistant to corruption, power fails, and zero space issues.
  • Guaranteed success if volume free space equals or exceeds block size.
  • Minimal RAM use - head and tail pointer state only. (a dozen bytes)
  • Performs automatic incremental and 'emergency' defragmenting.
  • Basic journal - only understands blocks
  • 256 bytes per block maximum (currently) 
  • 5 bytes per block overhead, 100% storage utilization. 
  • 64K storage size limit.
  • Storage is scanned/checked on mount, twice.
  • Tested on AVR Arduinos, but should work on SAM machines as well.
  • Virtual Memory classes mean filesystems can be stored in RAM, EEPROM, Program Flash (read only currently), or user-defined storage like SD cards.


Built on top of the base class is what most people think of as a 'filesystem', which is an indexed catalog of things in the storage. In this case, a single byte 'token' key is used instead of a filename string.

TokenFS

  • Adds 'byte token' index on top of JournalFS. 
  • Index size passed as instance parameter - maximum 256 'entries'.
  • Moderate RAM use: the index requires one word (two bytes) per entry. (max 512 bytes)
  • Writing data block for a key replaces existing data.
  • Writing empty blocks for a key is equivalent to 'file deletion'
  • Guaranteed success if replacement block is same size or smaller than previous version.
  • No practical difference between 'missing entries' and 'zero length files' - all keys return a result.
  • Test cases wrote 200,000,000 random 'file updates' to a 1K RAMDrive without unexpected data loss - exceeding Flash hardware reliability by 1,000 times.
What's the difference? JournalFS takes care of the utterly 'low' level details, but tries not to worry too much about what it's storing. By itself it would be perfect for diary-like 'logfiles' where you just append another record every once in a while, and throw away the oldest to make room.

TokenFS assumes the first byte stored in each block is a 'token key', and maintains an in-RAM index of all the keys it finds. The index is a catalog of pointers to the storage location where the 'latest' version of that key is kept. This is kind of like the logfile, but instead of throwing away the global oldest, it's smarter about throwing away obsolete blocks for individual keys.

TokenFS is the exemplar for how to extend JournalFS by overriding block_state(), which is how JournalFS notifies descendants about blocks found, and asks if they are still valid. 

I need to make it very clear that this "Filesystem" will not scale. It is not intended for large storage devices. The primary issue is the need to scan the filesystem on mount to 'catch up' with the journal - which means reading the headers of all the blocks in storage. When you have a ridiculously small amount of storage, that's fine. If you have megabytes, that time becomes significant.

But if you need to store a bunch of 'configuration options' in an Arduino that are independent of the program flash, this is the way.

Usage

Loading the library and defining some storage is pretty easy. This is how the RAMDrive is initialized in the filesystem test.

  #include <unorthodox.h>      // include the library, of course
  
  byte storage[1024];          // create a 1024 byte buffer
  MemoryPage store(storage);   // wrap it in a virtual page
  TokenFS fs(&store,1024,192); // 192-entry tokenfs 

In most cases you'll be using EEPROM, which is even simpler.

  #include <EEPROM.h>          // do this first
  #include <unorthodox.h>      // to switch on more stuff in here
  
  EEPROMPage eeprom(0);                   // wrap EEPROM in a virtual page
  TokenFS fs(&eeprom, 1024, entries);     // start the token index

If you wanted to use only a portion (say, leaving the first 128 bytes free) you can do so:

  EEPROMPage eeprom(128);                 // start at byte 128
  TokenFS fs(&eeprom, 1024-128, entries); // use until the end

Writing TokenFS entries is currently a little clumsy, for various reasons. To avoid re-packing the block later, we prepend the token byte ourselves. (It has been a challenge making it 'seem right' but also have good performance.)


  byte block[size+1];
  // manually prepend the token byte
  block[0] = token;
  // fill the rest of the block with data:
  // ....
  // wrap the array in virtual memory
  MemoryPage page(block);
  // write it to the filesystem
  if( fs.token_write(token, &page, size+1) ) {
     // it fit. yay!
  } else {
     // it didn't. boo!
     Serial.print("\n Write Failed ");
  }
  // page is automatically freed when we leave scope

Reading entries back from TokenFS is the most straightforward, token_page() will return a virtual memory page containing the payload content, (or null) and token_size() will tell you how much of it you should read. The token byte is not included.

  int size = fs.token_size(token);
  if(size) {
    Page * page = fs.token_page(token);
    for(int i=0; i<size; i++) {
      byte b = page->read_byte(i);
      // process each byte...
    }
    // free page object
    delete page;
  }

Generally, it's a good idea to call token_size() first, because if  zero, there's no point calling token_page().  Note the Page object is allocated by the call, so delete it when you are done to avoid memory leaks. (Treat it similarly to a 'file handle'.)

If you really need speed, you can avoid instancing new virtual memory Page objects by interrogating the token index directly to get the storage offsets.

There is generally no need to 'format' a volume before use - 0x00 or 0xFF filled blocks (the most common occurrence for EEPROM) are read as 'corrupt' blocks, which indicates an empty journal. If no existing journal can be found, a new one is started from scratch. The likelihood of random data being recognized as valid blocks is very small.

The exception would be if there had previously been a filesystem installed that was similar in structure, but actually different. In that case, clearing out the EEPROM may be necessary. (Or at least the first dozen bytes)

Applications

To understand why you want all this, imagine you have ten 'options' that you keep in the EEPROM for motor speeds and current limits and other important things. (If you keep it in program flash, it gets blown away with every software update. It's not good for a bugfix to reset all the controller defaults, sometimes.)

If you put them in ten well-known locations in the EEPROM (which is the simplest method - you just document the locations and manage them with newsgroups, not code) then your microcontroller will become corrupt when the first of those options is overwritten 10,000 times. We assume it's a hard wall, to be conservative.

But if we're using a diary-like journal, we need to fill the whole storage with updates before we loop and start to overwrite the old locations again. If the same few records are being changed over and over, this is 'journalled' across the whole storage, and you will get ten or a hundred times as many updates before the storage begins to corrupt.

10,000 updates sounds like a lot, but that's once an hour for 400 days. Lots of systems recalibrate once an hour. Or restart twenty times a day. Journalling can extend 'expected lifetime' of an embedded system from a year, to decades. That is not a trivial outcome.

The trick with these kinds of filesystems is to never store the index inside the volume. Because any central directory catalog becomes the 'hot-spot' that logically must be updated more often than the files it contains. That's why TokenFS must rebuild its RAM index each time the volume is mounted. This is the trade-off... without a "pre-compiled" index available, it takes much longer to start up. It has to read the whole diary to the end.

Limitations

  • JournalFS block size is currently encoded in one byte, so up to 256 bytes per block.
  • TokenFS uses the first byte as a key token, so maximum payload is 255 bytes.
  • 16-bit words are used for storage pointers, so 64K is the maximum volume size.
  • As mentioned, the need to do a full 'journal scan' may exceed acceptable time limits on large volumes.
  • Defragmenting 'thrash' increases as the free space approaches zero. In the worst case of a completely full volume, it must defragment the entire storage (burning up one entire pass) in order to update a single entry. Do not run the filesystem at 100% capacity. Try to keep 10% free, enough for a couple of blocks at least, so the defragmenter can find space without too much trouble.
  • Writing will invalidate any Page objects already obtained from token_page(), since defrag will move storage blocks around. Don't cache pages, look them up when you need them.  

Non-Limitations

  • Blocks can be larger than the amount of available RAM.

Wednesday, March 5, 2014

Release: unorthodox-arduino library on GitHub


https://github.com/unorthodox-engineers/unorthodox-arduino

General purpose library with many useful things for the Arduino Leonardo: Flash file system, Raster/GUI, Real-time Robotics kernel, Hardware drivers, Virtual memory, Binary Trees, Tries, Associative maps, and Streaming Micro-parser.


PREVIEW RELEASE V0.1

I'm releasing this "as is", even though many parts are still a mess and need work, because there's enough useful stuff in here to be worth it. But there are no examples, very little documentation, and not a lot of explanations for the ass-backwards way I seem to do some things in the name of optimization. 

~ Jeremy


USING

Just copy the "Unorthodox" folder to your Arduino "libraries" directory, and #include <Unorthodox.h> in the usual way.
BUT: Please include all the standard libraries you intend to use (like SPI, EEPROM and Wire) _first_ because the Unorthodox lib will only declare dependant classes (such as EEPROMPage or SPIDevice) if the requirements are _already_ loaded.

The library is currently "AVR only" simply because of the directory structure, and I don't have SAM hardware to test on, but most of the classes should port without issue. And yes, I know it takes a long time to compile - everything is in header files rather than .cpp, for various stupid reasons.

LICENCE

In plain English: If you intend to use this code for projects that are personal, non-profit or educational in nature, then you are considered an "academic peer" and have fair use rights so long as attribution is given. If you intend to resell or commercially profit from this code as part of some product or service, then I will require you to enter into a commercial licence for its use.


CONTRIBUTIONS

I am not actively seeking contributors or patches at this time, though of course bug reports and feedback are always welcome. The code is not technically "open source", not yet anyway. I don't even know if other people will find it useful.

VIRTUAL MEMORY CLASSES


  • Page
  • ReadPage
  • MemoryPage
  • NearProgramPage
  • FarProgramPage
  • EEPROMPage
  • ZeroPage
  • BytePage
  • WordPage
  • Cardinal

DATA STRUCTURE CLASSES


  • PrefixTree
  • PrefixNodeResult
  • RedBlackTree
  • Map
  • MapNode

STREAMING PARSER CLASSES


  • Cursor
  • BufferCursor
  • PageCursor
  • NumberCursor
  • PrefixCursor

FILE SYSTEM CLASSES


  • JournalFS
  • TokenFS

HARDWARE DEVICE CLASSES


  • Device
  • DHT11
  • I2CDevice
  • MPU6050
  • SPIDevice
  • MAX6957
  • ENC28J60

RASTER DEVICE CLASSES


  • Raster
  • Raster8
  • Raster16
  • ILI9325C
  • ILI9325C_Pins
  • ILI9325C_Leo
  • ST7735
  • ST7735_SPI
  • RasterDraw16
  • RasterFont16

ROBOTICS CLASSES


  • Debounce
  • DroidBeeps
  • Motivator
  • BaseDroid
  • RasterDroid

GUI CLASSES


  • RasterSpan
  • RasterPager
  • SignalsPager
  • SourcePager
  • CodesPager