It’s a real shame that O’Reilly have closed their store and now only make their books available to people who subscribe to their Safari service or buy worse versions from Amazon.

Over the past four-ish years I’ve spent roughly $250 on O’Reilly books, and they’ve been a great way to learn new technical topics. Learning Web Design by Jennifer Robbins made something I’d previously found intimidating into a straightforward and enjoyable endeavour. Earlier this year I spent some time learning SQL with the Head First SQL book by Lynn Beighley and really enjoyed it. At the moment I’m reading Mark Lutz’s Learning Python — starting from the basics and working through what is quite a thick book (1,600 pages!) to improve my Python skills and make sure I’ve got the fundamentals nailed down.

They’re all great. I love the PDF versions as they’re incredibly well-typeset and fit perfectly on an iPad screen. It’s been nice to learn in the garden, at the kitchen table, in bed — all much more comfortable than sitting at my desk. And they’ve been updated repeatedly since I bought them to fix typos and other errors — the Learning Python errata page lists nearly 170 entries and the book has been updated 15 times since the publication of the third edition in June 2013, most recently this April.

The thing is, I’ve bought these books at various times over the past few years and generally only dipped into them to start and then come back later. Working through one will generally take me several weeks or even months. This is perhaps my third crack at Learning Python — not because it’s bad (it’s great!) but because I already knew enough Python to be dangerous and getting through the earlier sections took time and patience that I’ve not had until now.

This is not something that a Safari subscription allows for — particularly when, as an individual and a hobbyist programmer, I cannot begin to justify the $400-a-year price. It’s a world away from buying a $40 ebook so you can learn a particular topic or skill.

And what of reference books and cookbooks? I own both the Python Pocket Reference and the Python Cookbook — PDFs that I can dip into at any time and together would’ve been just $52 at full price.

You can still buy paper and Kindle versions of O’Reilly books through Amazon, but the PDFs are gone. Buying weighty print books on programming topics is not something I want to do — although JavaScript: The Definitive Guide, Sixth Edition and AppleScript 1-2-3 serve well lifting my iMac up 4″. And the Kindle versions share the typesetting problems of O’Reilly’s ePub versions (now also unavailable to purchase) and introduce Amazon’s DRM restrictions.

The tagline on the O’Reilly homepage is “Safari is how you learn.” It makes me feel a bit crap, frankly, as I can’t afford a Safari subscription (nor could I justify it if I could) and I don’t want to buy an inferior product from a third party.

Sadly that means I won’t be able to learn from O’Reilly books in the future.

I quite enjoy turning out little plots for posts on here. Admittedly I’m not great at it, but I like to have a go.

However, matplotlib really is not my favourite. It feels like there’s a lot of boilerplate to write and a lot of work to do before you can make something reasonably close to what you had envisioned in your head.

So I thought I’d give R a try, and learn some things about visualisation along the way with Kieran Healy’s data visualisation course notes, which was fun.

But mostly in this post I wanted to show how ludicrously straightforward using ggplot2 can be compared with what you have to do in matplotlib. Let’s pick on my plot of train ticket prices from just before Christmas.

The Python code for that is quite long so I’m not going to include it, but it is available to view online. I’m not being completely fair, because part of that code involves getting the data into shape, and I’m sure there are things I could’ve done to cut out a few lines.

That said, it took me a while to figure out exactly how to produce the plot in matplotlib: how, say, to parse the dates and label the axis.

There was some of that with R and ggplot2 too, plus looking things up in the documentation as I’ve not used them much. But mostly it was pretty straightforward to figure out how to build up the plot.

Anyway, here’s the plot:

A chart showing single train fares for selected journeys in England, France, Germany and the Netherlands on Friday December 23. This plot was made with R and ggplot2 instead of matplotlib.

And here’s the code that produced it:

library(ggplot2)

# Read in and convert string times to datetimes
trains <- read.csv('collected.csv')
trains$Time <- as.POSIXct(trains$Time, format = '%Y-%m-%dT%H:%M:%S')

# Get the data onto the plot
p <- ggplot(trains, aes(x = Time, y = Cost))

# 'Reveal' the data with points and show the
# East Mids price trend with a smoother
completed <- p + geom_point(aes(color = Operator)) +
  geom_smooth(data = subset(trains, Operator == 'East Midlands Trains'),
              aes(group = Operator, color = Operator),
              method = 'loess', se = FALSE,
              size = 0.75, show.legend = FALSE) +

  # Let's adjust the scales
  scale_x_datetime(date_breaks = '1 hour',
                   date_labels = '%H:%M') +
  scale_y_continuous(limits = c(0, 100),
                     breaks = seq(10, 100, 10),
                     expand = c(0, 0)) +

  # Set some labels and adjust the look
  labs(title = paste('Cost of single train tickets',
                     'leaving European\ncapital cities',
                     'on Friday December 23 2016'),
       y = 'Ticket cost (€)',
       color = 'Train operator') +
  theme_bw(base_family = 'Trebuchet MS') +
  theme(plot.title = element_text(hjust = 0.5))

ggsave('plot.svg', plot = completed, device = 'svg',
       width = 8, height = 4, units = 'in')

I’m still figuring things out with R and ggplot so I’m not exactly blazing through. (I still haven’t figured out how to export transparent SVGs without editing them by hand.)

But I love the way that plots are built up out of individual pieces, which makes far more sense to me than trying to wrangle matplotlib’s figures and axes.

I added support for JSON Feed to my homemade static site generator Majestic today, and thought I’d note it because, funnily enough, the two implementations mentioned by John Gruber (by Niclas Darville and Jason McIntosh) use the approach I’d taken for generating my RSS feed — the very approach I wanted to avoid this time.

Basically all three of those define a document template and pass in the posts and other required bits, and you’re done. I’m really not knocking this — again, I do this with the RSS feed and it validates fine. It’s all good.

But I ended up templating my RSS feed like this because I looked at the feedgenerator module and ran away. Majestic was my first Python project of any real size and I wanted to keep things as straightforward as I could. While feedgenerator looks (with hindsight) reasonably OK in use, it has no documentation, has been pulled out of Django, and has funky class names (Rss201rev2Feed) that didn’t fill me with confidence that I could implement an RSS feed quickly.

I was using Jinja templating for the site, and since HTML and XML are cousins I just used that for the feed too. But you can probably tell that I didn’t really know what I was doing (still don’t!) with escaping, as any field that might contain non-ASCII characters is wrapped in <![CDATA[…]]> tags.
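For comparison, the standard library can do proper XML escaping — this is a sketch of what I could have reached for instead of CDATA wrapping, not what Majestic actually does:

```python
from xml.sax.saxutils import escape

# Sketch (not the actual Majestic code): proper XML escaping turns
# & < > into entities, so text is safe inside element content
# without any CDATA wrapping.
def rss_safe(text):
    return escape(text)

# e.g. rss_safe('Fish & chips <cheap>') -> 'Fish &amp; chips &lt;cheap&gt;'
```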

But hey, it works. Feed’s valid.

With JSON, everything just feels much more obvious. In Python you hand off basic types to the built-in json module and you get back a string, all the encoding taken care of. And if I make a mistake Python will complain at me, instead of just dumping out a file of questionable worth.
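As a sketch of what that looks like (illustrative only — the field names of the posts here are my invention, not Majestic’s actual data model), a JSON Feed is just a dict handed to the stdlib json module:

```python
import json

# Illustrative sketch, not the actual Majestic implementation:
# build a JSON Feed document from a list of post dicts and let
# json.dumps handle all the escaping and encoding.
def make_json_feed(title, home_page_url, posts):
    feed = {
        "version": "https://jsonfeed.org/version/1",
        "title": title,
        "home_page_url": home_page_url,
        "items": [
            {
                "id": post["url"],
                "url": post["url"],
                "title": post["title"],
                "content_html": post["html"],
                "date_published": post["date"],
            }
            for post in posts
        ],
    }
    return json.dumps(feed, ensure_ascii=False, indent=2)
```

If any of those values can’t be serialised, json.dumps raises an exception rather than silently writing out a broken feed.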

I think this is what all the people complaining on the Hacker News thread missed. Working in JSON is comfortable and familiar — the tools are good and you get told when something goes wrong. Working with XML can be unclear and a bit of a pain, and creating an invalid document is a risk.

So my super-duper advanced JSON Feed implementation is… constructing a dict, adding things to it and passing it off to the json module that I use all the time. Taken care of. The code’s so boring I’m not even going to include it here (but it’s online to view).

Diamond Geezer asks: “Why do we never end up in the middle?”

It’s unfair to pick on him, but I will because he posted on a day when my annoyance at centrist liberals had well and truly peaked.

First off, the “centre ground” is a concept that is entirely relative. When Jeremy Corbyn campaigned to become and was elected leader of the Labour Party in 2015, he managed to shift the centre ground — the Tories very quickly ditched a plan to bomb the Syrian government.

The centre ground is inherently unstable because it only exists relative to the two dominant forces either side. At our present moment that’s a fairly right-wing Conservative Party and a reasonably social democratic Labour Party. Any “centrist” must define themselves in opposition to their closest opponents on the left and right.

Ultimately if you do that it means you have no principles, nothing that anchors you on the left-right axis. In reality — as much as we joke about spineless politicians — few define their positions in this way and instead the “centre” in various countries is the home to a party that has “right-wing” economic policies and “left-wing” social policies. In Britain that would be the Liberal Democrats, despite Tim Farron’s recent attempts to win over the homophobes.

Left and right are in scare quotes above because this shows the point at which the left-right axis breaks down.

Ultimately the idea of centrism is bankrupt. Politics is a clash of interests. The ideas of the “centre ground,” of the “national interest,” are rubbish. Howard Zinn put it best in his People’s History of the United States:

Nations are not communities and never have been. The history of any country, presented as the history of a family, conceals the fierce conflicts of interest (sometimes exploding, often repressed) between conquerors and conquered, masters and slaves, capitalists and workers, dominators and dominated in race and sex.

As a socialist, to use our compromised axis, the boss class sits on the right and the workers on the left. Given that the boss class is but a tiny sliver of the population, what credibility does a “centrist” party have, one that pretends to balance the desires of the exploited and the exploiters?

It is this “refreshing centrism” that irks me the most, as it is always right-wing economic policies paired with some ameliorating factor — support for gay marriage, say — to assuage the liberals.

But if you’re gay, does being able to officially consecrate your relationship make up for the fact that you spend half your wages on rent?

This has run on, so let’s talk about Emmanuel Macron. The Guardian loves him, noting (without the expected contradicting clause) that it “is tempting … to conclude that European liberal values have successfully rallied to stop another lurch to the racist right.”

And so Macron, an explicit neoliberal, is raised up having defeated (we’ll see) the fascist Marine Le Pen.

The celebration is of liberal values, embodied by Macron. But Macron’s liberal values go a long way to explain the surge in support for France’s fascist National Front, as Cole Stangler shows. His liberal values are likely to increase “unemployment, inequality and poverty” through his right-wing economic policies — along the lines of the French law that bears his name (loi Macron) and hacked away at workers’ rights.

The assault on workers’ rights and public services has been ongoing for nearly 40 years yet liberals and centrists deride the term that describes our current phase: neoliberalism.

The refusal to recognise this trend puts us in a position where the Guardian celebrates the likely victory of Macron, cheering his defeat of the fascists in blissful ignorance. But his political current is the reason why we have ended up with the fascists contesting the second round of the French presidential election (again).

Faced with falling employment and living standards for four decades and (generally) abandoned by the organised left, people have turned to those who promise to take action to improve their material conditions.

Yet Macron’s policies will just exacerbate these problems. This isn’t the end of the fascist challenge in France; should Macron win and pursue his neoliberal programme we could well be in the same situation in five years’ time.

(Unless, potentially, the French left organises a strong anti-fascist campaign like that waged in Britain from the 1970s to the present time, in which the fascists have more or less been suffocated.)

This isn’t a “bold break with the past,” it is the continuation of the rule of the boss class with a fresh coat of paint.

At work we deal a lot with PDFs, both press quality and low-quality for viewing on screen. Over time I’ve automated a fair amount of the creation for both types, but one thing that I haven’t yet done is automate file-size reductions for the low-quality PDFs.

(We still use InDesign CS4 at work, so bear in mind that some or all of the below may not apply to more recent versions.)

It’s interesting to look at exactly what is making the files large enough to require slimming down in the first place. All our low-quality PDFs are exported from InDesign with the built-in “Smallest file size” preset, but the sizes are usually around 700kB for single tabloid-sized, image-sparse pages.

A low-quality image of a Morning Star arts page.

Let’s take Tuesday’s arts page as our example. It’s pretty basic: two small images and a medium-sized one, two drop shadows, one transparency and a fair amount of text. (That line of undermatter in the lead article was corrected before we went to print.)

But exporting using InDesign’s lowest-quality PDF preset creates a 715kB file. The images are small and rendered at a low DPI, so they’re not inflating the file.

Thankfully you can have a poke around PDF files with your favourite text editor (BBEdit, obviously). You’ll find a lot of “garbage” text, which I imagine is chunks of binary data, but there’s plenty of plain text you can read. The big chunks tend to be metadata. Here’s part of the first metadata block in the PDF file for the arts page:

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP […]">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:xmp="http://ns.adobe.com/xap/1.0/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"
     Blah blah blah exif data etc 
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>

This is the none-too-exciting block for one of the images, a Photoshop file. There are two more like this, roughly 50–100 lines each. Then we hit a chunk which describes the InDesign file itself, with this giveaway line:

<xmp:CreatorTool>Adobe InDesign CS4 (6.0.6)</xmp:CreatorTool>

So what, right? InDesign includes some document and image metadata when it exports a PDF. Sure, yeah. I mean, the metadata blocks for the images weren’t too long, and this is just about their container.

Except this InDesign metadata block is 53,895 lines long in a file that’s 86,585 lines long. 574,543 characters of the document’s 714,626 — 80% of the file.

I think it’s safe to say we’ve found our culprit. But what’s going on in those 54,000 lines? Well, mostly this:

<xmpMM:History>
   <rdf:Seq>
      <rdf:li rdf:parseType="Resource">
         <stEvt:action>created</stEvt:action>
         <stEvt:instanceID>xmp.iid:[… hex ID …]</stEvt:instanceID>
         <stEvt:when>2012-05-22T12:55:27+01:00</stEvt:when>
         <stEvt:softwareAgent>Adobe InDesign 6.0</stEvt:softwareAgent>
      </rdf:li>
      <rdf:li rdf:parseType="Resource">
         <stEvt:action>saved</stEvt:action>
         <stEvt:instanceID>xmp.iid:[… hex ID …]</stEvt:instanceID>
         <stEvt:when>2012-05-22T12:55:54+01:00</stEvt:when>
         <stEvt:softwareAgent>Adobe InDesign 6.0</stEvt:softwareAgent>
         <stEvt:changed>/</stEvt:changed>
      </rdf:li>
    <!--  1,287 more list items  -->
   </rdf:Seq>
</xmpMM:History>

It’s effectively a record of every time the document was saved. But if you look at the stEvt:when tag you’ll notice the first items are from 2012 — when our “master” InDesign file from which we derive our edition files was first created. So, the whole record of that master file is included in every InDesign file we use, and the PDFs we create from them.
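You can measure the damage yourself. Here’s a rough sketch — it assumes the xmpMM:History block is stored uncompressed in the PDF, as it was in my files; in PDFs with compressed metadata streams it would find nothing:

```python
import re

# Rough sketch: find the xmpMM:History block in a PDF's raw bytes
# and report what fraction of the file it occupies. Assumes the
# XMP packet is stored as plain text, as InDesign CS4 writes it.
def history_share(pdf_bytes):
    match = re.search(rb"<xmpMM:History>.*?</xmpMM:History>",
                      pdf_bytes, flags=re.DOTALL)
    if match is None:
        return 0.0
    return len(match.group(0)) / len(pdf_bytes)
```

Something like `history_share(open('11_Books_180417.pdf', 'rb').read())` then gives you the metadata’s share of the file.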

Can we remove this metadata from InDesign? You can see it in File ▸ File Info… ▸ Advanced, select it and press the rubbish bin icon. Save, quit, reopen and… it’s still there.

Thankfully Acrobat can remove this stuff from your final PDF, by going through the “PDF Optimizer” or “Save Optimized PDF” or whatever menu item it’s hiding under these days. (In the “Audit Space Usage” window it corresponds to the “Document Overhead”.)

Unfortunately Acrobat’s AppleScript support has always been poor — I’ve no idea what it’s like now, remember we’re talking CS4 — and I’ve no time nor desire to dive into Adobe’s JavaScript interface. So while you can (and we do) automate the PDF exports, you can’t slim these files down automatically with Acrobat.

Our solution at work has been to cut the cruft from the PDF using Acrobat when we use it to combine our separate page PDFs by hand. But ultimately I want to automate the whole process of exporting the PDFs, stitching them together in order, and reducing the file size.

After using ghostscript for our automatic barcode creation, I twigged that it would be useful for processing the PDFs after creation, and sure enough you can use it to slim down PDFs. Here’s an example command:

gs -sDEVICE=pdfwrite \
   -dPDFSETTINGS=/screen \
   -dCompatibilityLevel=1.5 \
   -dNOPAUSE -dQUIET -dBATCH \
   -sOutputFile="11_Books_180417-smaller.pdf" \
   "11_Books_180417.pdf"

Most of that is ghostscript boilerplate (it’s not exactly the friendliest tool to use), but the important option is -dPDFSETTINGS=/screen which, according to one page of the sprawling docs, is a predefined Adobe Distiller setting.

Using it on our 715kB example spits out a 123kB PDF that is visually identical apart from mangled drop shadows (which I think can be solved by changing the transparency flattening settings when the PDF is exported from InDesign).
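That command is easy enough to wrap in a script, which is the direction I want to take the whole process. A sketch of how I might automate it (filenames are illustrative; assumes gs is installed and on the PATH):

```python
import subprocess

# Sketch: build the ghostscript invocation that shrinks a PDF
# using the /screen Distiller preset, mirroring the command above.
def gs_shrink_args(infile, outfile):
    return ["gs",
            "-sDEVICE=pdfwrite",
            "-dPDFSETTINGS=/screen",
            "-dCompatibilityLevel=1.5",
            "-dNOPAUSE", "-dQUIET", "-dBATCH",
            f"-sOutputFile={outfile}",
            infile]

def shrink_pdf(infile, outfile):
    # gs must be on the PATH; raises if ghostscript reports an error
    subprocess.run(gs_shrink_args(infile, outfile), check=True)
```

Keeping the argument-building separate from the subprocess call makes it easy to reuse in a loop over each exported page before stitching them together.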