I added support for JSON Feed to Majestic, my homemade static site generator, today. I thought I’d note it because, funnily enough, the two implementations mentioned by John Gruber (by Niclas Darville and Jason McIntosh) both use the approach I took for generating my RSS feed and wanted to avoid this time.
Basically all three of those define a document template and pass in the posts and other required bits, and you’re done. I’m really not knocking this — again, I do this with the RSS feed and it validates fine. It’s all good.
But I ended up templating my RSS feed like this because I looked at the feedgenerator module and ran away. Majestic was my first Python project of any real size and I wanted to keep things as straightforward as I could. While feedgenerator looks (with hindsight) reasonably OK in use, it doesn’t have any documentation, has been pulled out of Django, and has funky class names (Rss201rev2Feed) that didn’t fill me with confidence that I could implement an RSS feed quickly.
I was using Jinja templating for the site, and since HTML and XML are cousins I just did the same for the feed. But you can probably tell that I didn’t really know what I was doing (still don’t!) with escaping, as any field that might contain non-ASCII characters is wrapped in CDATA.
But hey, it works. Feed’s valid.
With JSON, everything just feels much more obvious. In Python you hand off basic types to the built-in json module and you get back a string, all the encoding taken care of. And if I make a mistake Python will complain at me, instead of just dumping out a file of questionable worth.
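To illustrate, here’s a minimal sketch (not Majestic’s actual code; the post data is made up, but the field names come from the JSON Feed version 1 spec):

import json
from datetime import datetime, timezone

# Made-up post data standing in for the site's article objects.
posts = [
    {"title": "JSON Feed support",
     "url": "https://www.robjwells.com/json-feed/",
     "html": "<p>Support added!</p>",
     "date": datetime(2017, 5, 20, tzinfo=timezone.utc)},
]

feed = {
    "version": "https://jsonfeed.org/version/1",
    "title": "Primary Unit",
    "home_page_url": "https://www.robjwells.com",
    "items": [
        {"id": post["url"],
         "url": post["url"],
         "title": post["title"],
         "date_published": post["date"].isoformat(),
         "content_html": post["html"]}
        for post in posts
    ],
}

# json.dumps handles all the escaping and encoding questions that
# make templating an XML feed by hand such a worry.
print(json.dumps(feed, indent=2))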
I think this is what all the people complaining on the Hacker News thread missed. Working in JSON is comfortable and familiar — the tools are good and you get told when something goes wrong. Working with XML can be unclear and a bit of a pain, and creating an invalid document is a risk.
So my super-duper advanced JSON Feed implementation is… constructing a dict, adding things to it and passing it off to the json module that I use all the time. Taken care of. The code’s so boring I’m not even going to include it here (but it’s online to view).
Diamond Geezer asks: “Why do we never end up in the middle?”
It’s unfair to pick on him, but I will because he posted on a day when my annoyance at centrist liberals has well and truly peaked.
First off, the “centre ground” is an entirely relative concept. When Jeremy Corbyn campaigned for and won the Labour Party leadership in 2015, he managed to shift the centre ground — the Tories very quickly ditched a plan to bomb the Syrian government.
The centre ground is inherently unstable because it only exists relative to the two dominant forces either side. At our present moment that’s a fairly right-wing Conservative Party and a reasonably social democratic Labour Party. Any “centrist” must define themselves in opposition to their closest opponents on the left and right.
Ultimately, defining yourself that way means you have no principles, nothing that anchors you on the left-right axis. In reality — as much as we joke about spineless politicians — few define their positions this way; instead the “centre” in various countries is home to a party with “right-wing” economic policies and “left-wing” social policies. In Britain that would be the Liberal Democrats, despite Tim Farron’s recent attempts to win over the homophobes.
Left and right are in scare quotes above because this shows the point at which the left-right axis breaks down.
Ultimately the idea of centrism is bankrupt. Politics is a clash of interests. The ideas of the “centre ground,” of the “national interest,” are rubbish. Howard Zinn put it best in A People’s History of the United States:
Nations are not communities and never have been. The history of any country, presented as the history of a family, conceals the fierce conflicts of interest (sometimes exploding, often repressed) between conquerors and conquered, masters and slaves, capitalists and workers, dominators and dominated in race and sex.
To a socialist, to use our compromised axis, the boss class sits on the right and the workers on the left. Given that the boss class is but a tiny sliver of the population, what credibility does a “centrist” party have, one that pretends to balance the desires of the exploited and the exploiters?
It is this “refreshing centrism” that irks me the most, as it is always right-wing economic policies paired with some ameliorating factor — support for gay marriage, say — to assuage the liberals.
But if you’re gay, does being able to officially consecrate your relationship make up for the fact that you spend half your wages on rent?
This has run on, so let’s talk about Emmanuel Macron. The Guardian loves him, noting (without the expected contradicting clause) that it “is tempting … to conclude that European liberal values have successfully rallied to stop another lurch to the racist right.”
And so Macron, an explicit neoliberal, is raised up having defeated (we’ll see) the fascist Marine Le Pen.
The celebration is of liberal values, embodied by Macron. But Macron’s liberal values go a long way to explain the surge in support for France’s fascist National Front, as Cole Stangler shows. His liberal values are likely to increase “unemployment, inequality and poverty” through his right-wing economic policies — along the lines of the French law that bears his name (loi Macron), which hacked away at workers’ rights.
The assault on workers’ rights and public services has been ongoing for nearly 40 years yet liberals and centrists deride the term that describes our current phase: neoliberalism.
The refusal to recognise this trend puts us in a position where the Guardian celebrates the likely victory of Macron, cheering his defeat of the fascists in blissful ignorance. But his political current is the reason why we have ended up with the fascists contesting the second round of the French presidential election (again).
Faced with falling employment and living standards for four decades and (generally) abandoned by the organised left, people have turned to those who promise to take action to improve their material conditions.
Yet Macron’s policies will just exacerbate these problems. This isn’t the end of the fascist challenge in France; should Macron win and pursue his neoliberal programme we could well be in the same situation in five years’ time.
(Unless, potentially, the French left organises a strong anti-fascist campaign like that waged in Britain from the 1970s to the present time, in which the fascists have more or less been suffocated.)
This isn’t a “bold break with the past,” it is the continuation of the rule of the boss class with a fresh coat of paint.
At work we deal a lot with PDFs, both press quality and low-quality for viewing on screen. Over time I’ve automated a fair amount of the creation for both types, but one thing that I haven’t yet done is automate file-size reductions for the low-quality PDFs.
(We still use InDesign CS4 at work, so bear in mind that some or all of the below may not apply to more recent versions.)
It’s interesting to look at exactly what is making the files large enough to require slimming down in the first place. All our low-quality PDFs are exported from InDesign with the built-in “Smallest file size” preset, but the sizes are usually around 700kB for single tabloid-sized, image-sparse pages.
Let’s take Tuesday’s arts page as our example. It’s pretty basic: two small images and a medium-sized one, two drop shadows, one transparency and a fair amount of text. (That line of undermatter in the lead article was corrected before we went to print.)
But exporting using InDesign’s lowest-quality PDF preset creates a 715kB file. The images are small and rendered at a low DPI, so they’re not inflating the file.
Thankfully you can have a poke around PDF files with your favourite text editor (BBEdit, obviously). You’ll find a lot of “garbage” text, which I imagine is chunks of binary data, but there’s plenty of plain text you can read. The big chunks tend to be metadata. Here’s part of the first metadata block in the PDF file for the arts page:
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP […]">
… Blah blah blah exif data etc …
This is the none-too-exciting block for one of the images, a Photoshop file. There are two more like this, roughly 50–100 lines each. Then we hit a chunk which describes the InDesign file itself, with this giveaway line:
<xmp:CreatorTool>Adobe InDesign CS4 (6.0.6)</xmp:CreatorTool>
So what, right? InDesign includes some document and image metadata when it exports a PDF. Sure, yeah. I mean, the metadata blocks for the images weren’t too long, and this is just about their container.
Except this InDesign metadata block is 53,895 lines long in a file that’s 86,585 lines long. 574,543 characters of the document’s 714,626 — 80% of the file.
I think it’s safe to say we’ve found our culprit. But what’s going on in those 54,000 lines? Well, mostly this:
<stEvt:instanceID>xmp.iid:[… hex ID …]</stEvt:instanceID>
<stEvt:softwareAgent>Adobe InDesign 6.0</stEvt:softwareAgent>
<stEvt:instanceID>xmp.iid:[… hex ID …]</stEvt:instanceID>
<stEvt:softwareAgent>Adobe InDesign 6.0</stEvt:softwareAgent>
<!-- 1,287 more list items -->
It’s effectively a record of every time the document was saved. But if you look at the stEvt:when tags you’ll notice the first items are from 2012 — when our “master” InDesign file from which we derive our edition files was first created. So the whole record of that master file is included in every InDesign file we use, and in the PDFs we create from them.
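If you want to size up the metadata in your own PDFs without counting lines in a text editor, here’s a rough sketch (the filename is hypothetical, and it relies on the XMP packets being stored as plain text, as they were in our files):

import re

# Read the PDF as bytes and decode with latin-1, which maps every
# byte to a character and so leaves the binary sections intact.
raw = open("arts-page.pdf", "rb").read().decode("latin-1")

# XMP metadata packets sit inside the file as plain-text XML blocks.
packets = re.findall(r"<x:xmpmeta.*?</x:xmpmeta>", raw, flags=re.DOTALL)

meta_chars = sum(len(packet) for packet in packets)
print(f"{meta_chars:,} of {len(raw):,} characters "
      f"({meta_chars / len(raw):.0%}) are XMP metadata")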
Can we remove this metadata from InDesign? You can see it in the File Info window (File > File Info…): select it and press the rubbish bin icon. Save, quit, reopen and… it’s still there.
Thankfully Acrobat can remove this stuff from your final PDF, by going through the “PDF Optimizer” or “Save Optimized PDF” or whatever menu item it’s hiding under these days. (In the “Audit Space Usage” window it corresponds to the “Document Overhead”.)
Our solution at work has been to cut the cruft from the PDF using Acrobat when we use it to combine our separate page PDFs by hand. But ultimately I want to automate the whole process of exporting the PDFs, stitching them together in order, and reducing the file size.
After using ghostscript for our automatic barcode creation, I twigged that it would be useful for processing the PDFs after creation, and sure enough you can use it to slim down PDFs. Here’s an example command:
gs -sDEVICE=pdfwrite \
   -dPDFSETTINGS=/screen \
   -dNOPAUSE -dQUIET -dBATCH \
   -sOutputFile=output.pdf input.pdf
Most of that is ghostscript boilerplate (it’s not exactly the friendliest tool to use), but the important option is -dPDFSETTINGS=/screen, which, according to one page of the sprawling docs, is a predefined Adobe Distiller setting.
Using it on our 715kB example spits out a 123kB PDF that is visually identical apart from mangling the drop shadows (which I think can be solved by changing the transparency flattening settings when the PDF is exported from InDesign).
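That points the way to automating the whole thing. Here’s a sketch of the slimming step driven from Python (the function and the filenames are mine, purely illustrative):

import subprocess

def shrink_pdf(output_path, *input_paths):
    """Run ghostscript with the /screen preset over one or more PDFs.

    Passing several input files makes ghostscript concatenate them,
    so the same command can stitch the separate pages together too.
    """
    subprocess.run(
        ["gs", "-sDEVICE=pdfwrite",
         "-dPDFSETTINGS=/screen",
         "-dNOPAUSE", "-dQUIET", "-dBATCH",
         f"-sOutputFile={output_path}",
         *input_paths],
        check=True,  # Raise if ghostscript reports an error
    )

shrink_pdf("edition.pdf", "news.pdf", "arts.pdf", "sport.pdf")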
In my previous post about page speed, I mentioned that I’d written my own site generator. I’m not quite ready to talk specifically about it — I want to write some documentation first — and really I doubt that anyone but me should be using it.
But, having set up publishing to Amazon S3 today, I wanted to write up how I publish this blog to multiple places so that it’ll be around whatever (within reason) might happen.
Majestic’s configuration files are set up in such a way that you have a default settings file in a directory — settings.json — and you can specify others that make adjustments to it.
In my case the main settings file contains the configuration for publishing to my own server (hosted at Linode) — not the nitty gritty of how to get it on to the server, but what the URLs, site title, etc should be. (It’s online if you want to have a nose around.)
Then I have two extra JSON files, robjwells.github.io.json and s3.robjwells.com.json, which contain the customisations for publishing to those domains. Here’s the config for GitHub in full:
"title": "Primary Unit mirror on GitHub",
"description": "A mirror of https://www.robjwells.com hosted on GitHub"
"output root": "gh-pages"
site.url is important because of the way my templates render article links (though my Markdown source contains only relative links that work anywhere). And paths.output root just specifies the build directory where the HTML files get written.
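The layering isn’t anything clever; it amounts to something like this (the idea rather than Majestic’s actual code):

import json

def load_settings(default_path, override_path=None):
    """Read the default settings, then overlay a host-specific file."""
    with open(default_path) as file:
        settings = json.load(file)
    if override_path is not None:
        with open(override_path) as file:
            overrides = json.load(file)
        for section, values in overrides.items():
            # Merge section by section so an override file only has
            # to mention the keys it changes.
            settings.setdefault(section, {}).update(values)
    return settings

settings = load_settings("settings.json", "s3.robjwells.com.json")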
All the moving parts are contained in a makefile which can build all three of my destinations. Here it is in full:
NOW = $(shell date +'%Y-%m-%d %H:%M')
DISTID = $(shell cat cloudfront-distribution-id)

all: robjwells github aws
force-all: force-robjwells force-github force-aws

robjwells:
	majestic
	rsync -zv -e ssh www.robjwells.com.conf …  # Nginx config; server destination elided
	rsync -azv --delete -e ssh site/ …  # server destination elided

github:
	majestic --settings=robjwells.github.io.json
	cd gh-pages ; git add . ; git commit -m "$(NOW)" ; git push

aws:
	majestic --settings=s3.robjwells.com.json
	aws s3 sync s3 s3://s3.robjwells.com --delete
	aws cloudfront create-invalidation --distribution-id $(DISTID) …  # paths elided

force-robjwells:
	majestic --force-write
force-github:
	majestic --settings=robjwells.github.io.json --force-write
force-aws:
	majestic --settings=s3.robjwells.com.json --force-write
(The force-* targets rebuild the whole site, not just files which have changed.)
And, really, that’s all it takes to publish to multiple hosts (once you’re set up at each one, of course).
My own server is just a vanilla rsync command, with an extra one because I keep my Nginx server config locally too.
For GitHub Pages the gh-pages folder is a git repository, so make github regenerates the site into that folder, commits the changes with a timestamp as the message, and pushes the changes to GitHub. (It’s all on the same line with semicolons because the cd into the directory doesn’t hold across lines in the makefile.) Because the GitHub repository is set up to publish, the rest is sorted out on their end.
And for S3 I just use the official AWS tool (brew install awscli if you’re on macOS) — the CloudFront line is because I use it to speed up the S3 version and I want to make sure an updated front page is available reasonably quickly, if nothing else.
There’s a bit of overhead setting all of these up, but once you do it doesn’t have to be any more work to keep each host updated. For me it’s just a make all away.
Since I moved the blog off Tumblr, I’ve tried to make it reasonably quick. Reasonably because it’s come in waves — waves of me saying: “Oh, that’s fine” and then deciding that whatever it is isn’t fine and needs to go.
Tumblr used to whack in about 1MB of extra guff for its own purposes. If you’re posting endless gifs then that’s not something you’ll notice, but when you’re mostly dealing in text it’s pretty obvious.
When I redesigned the site almost four years ago I remember chafing at this, but it wasn’t until I settled on my own site generator that I had the chance to really whittle things down.
Much of that was lopping off unneeded stuff such as jQuery. It did succeed in getting the size down — to about 150–200kB for an average post. But I’ve recently made a few changes to speed things up that I wanted to talk about.
Much of this has come about after reading Jacques Mattheij’s The Fastest Blog in the World and Dan Luu’s two excellent posts Speeding up this blog by 25x-50x and Most of the web really sucks if you have a slow connection. (And, well, of course, Maciej. You should really read that one.)
For ages this site used Rooney served by Typekit as its main font. I love Rooney, it’s great. But using web fonts always means sending lots of data.
Despite thinning down the included characters (Typekit allows you to choose language support) and forgoing the use of double emphasis (<em><strong>), serving up three weights of Rooney still clocked in at over 100kB.
I’m looking at Rooney now and it is gorgeous, but there’s no way I could or can justify it — the fonts collectively would usually outweigh anything else on a page. So it went, in favour of Trebuchet MS, which I’ve long had a soft spot for.
This isn’t related to the bytes served up, but swapping my registrar’s (Hover’s) name servers for Cloudflare’s helped cut about 100ms from DNS response times (from about 150ms).
You can host your DNS with Cloudflare for free without using any of their other caching services (I don’t), and Cloudflare is consistently one of the fastest DNS hosts in the world.
Up until now I’d been using highlight.js to colour code snippets, and I’d been very happy with it. It’s a nice library that’s easy to work with, and it’s easy to download a customised version for your own use.
But really that wasn’t enough for me. A few things annoyed me:
- Syntax highlighting had to be performed on every view on the client device at readers’ expense.
- I could only highlight those languages I’d included in my library.
- The library included all of my selected languages no matter what was on the page.
This wasn’t ideal.
The Markdown module I use has support for syntax highlighting, but there were some deficiencies with it that had led me to pick the client-side highlighter in the first place, several years ago.
It wasn’t difficult to fix that, however. Taking inspiration from Alex Chan, I modified the included Codehilite extension to match my requirements, which were to handle a “natural-looking” language line and line numbers in the Markdown source. You can see the source online, but it’s pretty rough and I need to tidy it up. (It also uses Pygments’s inline line numbers, instead of the table approach which I’ve seen out of alignment on occasion.)
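Stock Python-Markdown and Pygments get you most of the way without any modification. A minimal sketch (assuming Pygments is installed; “inline” is the line-number style mentioned above):

import markdown

SOURCE = """
    :::python
    print("Hello, world!")
"""

# The codehilite extension hands indented code blocks to Pygments
# at build time, so readers get plain styled HTML and no JavaScript.
html = markdown.markdown(
    SOURCE,
    extensions=["codehilite"],
    extension_configs={"codehilite": {"linenums": "inline"}},
)
print(html)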
In all, I’ve gone from a baseline of roughly 160kB to 10–15kB per image-free post. It’s not the fastest blog in the world, especially if you’re far away from Linode’s London datacentre, but it should be pretty nippy.
There are some things which I’ve rejected so far.
Inline the CSS.
This could make the page render faster, at the expense of transferring data that would otherwise be cached when viewing other pages. (But, assuming this blog is like most others, most readers will only visit a single page.)
But I feel a bit icky about munging separate resources together, and in mitigation the site is served over HTTP/2 (which most of the world supports) and inlining is an anti-pattern.
Sack off the traditional front page.
Yeah, I felt this one acutely recently after posting all those dodgy but massive Tube heat maps, sitting towards the bottom of the front page and inflating its size.
Dan Luu has his archives as the front page, which is svelte but extreme for my tastes. We’ll see about this one.
Ditch your CSS.
Yeah, I know. (Well.) But I like pretty things. And it’s only about 2.5kB.