At work, one of the most tedious jobs is putting all of the stories that appear in the paper onto our website. We're still working from text files (which works very well) because, for a number of reasons, we haven't yet switched to using the site's CMS as the central place where copy is stored.
So towards the end of the day we have to go through nearly all the pages, strip out the headlines, standfirsts and copy, and input them into the site's backend. It's mindless, slow and frankly demoralising.
A perfect candidate for automation then!
Let's start with taking the text from the InDesign files, with an AppleScript I run through FastScripts:
    tell application "Adobe InDesign CS5.5"
        tell the front document
            set the_text to get the contents of the selection
        end tell
    end tell

    repeat with an_item in the_text
        do shell script ("echo '" & an_item & "' >> ~/Desktop/web_strip.txt")
    end repeat

    do shell script ("echo '\\n\\n\\n\\n' >> ~/Desktop/web_strip.txt")
Here we take the content of the selected InDesign frames and use a shell script snippet to append it to a file on the desktop, followed by a bunch of newlines to separate the appended text.
It's straightforward, but the contents of the selection in line 3 could return anything: select more than one frame and you get a list; select just one and you get whatever data is in that frame. So, instead of iterating over several pieces of text in line 7, you iterate over each character in one piece of text. Bonus points: InDesign CS4 (which we use at work) doesn't preserve the order in which you select frames, meaning you have to go one by one (and use a different script; I'm showing the one I use at home here).
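The same list-or-single-value trap exists in other languages. As an illustration only (in Python rather than AppleScript, and with a hypothetical helper name, as_list), normalising the value before looping avoids iterating over characters:

```python
def as_list(selection_contents):
    """Return a list of text items whether given one string or several."""
    if isinstance(selection_contents, str):
        # A lone string would otherwise be iterated character by character.
        return [selection_contents]
    return list(selection_contents)


for item in as_list("Single frame text"):
    print(item)

for item in as_list(["First frame", "Second frame"]):
    print(item)
```

The first loop prints the whole string once instead of one character per line.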
Ok, now we've got a text file with all the copy in. Once that's been cleaned up — any unnoticed mistakes corrected, print-focussed formatting removed — it's time to enter it into the CMS.
For ages, this is where the automation stopped. The editor used in the CMS interprets pasted text oddly, requiring people to paste the plain text into (shock horror) TextEdit in rich-text mode, and from there into the field in the CMS. Lots of copying and pasting. Lots of boredom. Gross.
But these are just fields on a web page. And, of course, the page uses jQuery. And Safari lets you
A bit of poking around in the inspector revealed that most of the interesting fields could be set through a simple .val(foo) on the element, while the body of the article could be set through .html('bar'), and wrapping paragraphs in <p> tags ensured that the text displayed correctly.
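A minimal sketch of how those jQuery calls could be generated from Python. The selectors (#headline, #body) and the function name are hypothetical, since the real CMS's element ids aren't given here, and getting the resulting JavaScript into the browser is outside the sketch:

```python
import json


def build_fill_script(fields, body_html, body_selector="#body"):
    """Return a jQuery snippet that fills form fields and the article body.

    `fields` maps CSS selectors to values. Every selector is an assumption.
    """
    statements = []
    for selector, value in fields.items():
        # json.dumps gives a quoted, escaped literal that JavaScript also accepts.
        statements.append(
            "$({}).val({});".format(json.dumps(selector), json.dumps(value))
        )
    statements.append(
        "$({}).html({});".format(json.dumps(body_selector), json.dumps(body_html))
    )
    return "\n".join(statements)


print(build_fill_script({"#headline": "Council's budget row"}, "<p>First paragraph.</p>"))
```

Using json.dumps for the string literals means quotes and newlines in the copy are escaped rather than breaking the generated script.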
I decided on a Python text filter for BBEdit. Not entirely appropriate, since we don't transform and return the text, but it's a dead simple way of working with the selection and removing it once we're done.
Using Python also lets us do all sorts of wonderful things beyond just preparing the InDesign text to be entered into the fields. Here's the code, before I dive in any further:
I like to put data at the top of my scripts and the code beneath, so let's start on line 35, where we read in the text from BBEdit. Line 36 breaks that text apart: I use a four-part structure, where the first line is the headline, then the standfirst and byline (both of which can be blank) and then any remaining lines form the body.
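The four-part structure can be sketched roughly as follows. This is an illustration under stated assumptions, not the script described above: the function name split_story is made up, and blank body lines are assumed to be discarded.

```python
def split_story(text):
    """Split raw story text into headline, standfirst, byline and body lines."""
    lines = text.splitlines()
    # Pad so that very short input still yields the three header parts.
    lines += [""] * (3 - len(lines))
    headline, standfirst, byline = (part.strip() for part in lines[:3])
    # Everything after the third line is body; blank lines are skipped.
    body = [line.strip() for line in lines[3:] if line.strip()]
    return headline, standfirst, byline, body


story = "Headline here\n\nBy A Reporter\nFirst par.\nSecond par.\n"
print(split_story(story))
```

Here the standfirst is blank, so the second element of the result is an empty string while the byline still lands in the third slot.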
In line 43 we index into the names dictionary (abbreviated to protect the guilty, lines 6–9) using the cleaned-up byline, with 4 as the value for the default author. Both the author and the story type below are presented as drop-down menus in the CMS, which is where the integers come from.
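A minimal sketch of that lookup. The names, their integer values and the exact clean-up rule are all invented for illustration; only the default value of 4 comes from the text above.

```python
import re

# Hypothetical mapping from cleaned-up byline to the CMS drop-down value.
AUTHOR_IDS = {
    "a reporter": 7,
    "b writer": 12,
}
DEFAULT_AUTHOR_ID = 4  # the default author's value mentioned above


def author_id(byline):
    """Look up the drop-down value for a byline, falling back to the default."""
    # Assumed clean-up: drop a leading "By", trim whitespace, lower-case.
    cleaned = re.sub(r"^\s*by\s+", "", byline, flags=re.IGNORECASE).strip().lower()
    return AUTHOR_IDS.get(cleaned, DEFAULT_AUTHOR_ID)


print(author_id("By A Reporter"))
print(author_id(""))
```

Using dict.get with a default means a blank or unrecognised byline quietly falls back to the default author rather than raising a KeyError.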
Next we crudely count the words in the body to work out the story type; the types are roughly differentiated by length. I iterate over a dictionary of lambdas instead of a lengthy if-elif block, mostly because it lets me use the dictionary keys to store the strings associated with the values. (You could do the same with comments in the if-elif block. I don't think there's a big advantage to either method.)
Note that the lambda for the 'inbrief' type just returns False. We collect all of the brief stories into a single article on the web, with the total length anything from 100 words to over 600, and I couldn't think of an easy way to pick them out. I set the type for briefs by hand.
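One way the dictionary-of-lambdas idea might look. The type names other than 'inbrief', the integer values and the 400-word boundary are placeholders, and pairing each lambda with its drop-down value in a tuple is an assumption about the layout:

```python
# Hypothetical type names, drop-down values and word-count boundaries.
STORY_TYPES = {
    "inbrief": (1, lambda words: False),  # never matched automatically
    "news": (2, lambda words: words < 400),
    "feature": (3, lambda words: words >= 400),
}


def story_type(body_lines):
    """Return (name, value) for the first type whose test accepts the word count."""
    # Crude count: split each line on whitespace and total the pieces.
    words = sum(len(line.split()) for line in body_lines)
    for name, (value, test) in STORY_TYPES.items():
        if test(words):
            return name, value
    return None


print(story_type(["A short story of just a few words."]))
```

Because the 'inbrief' test always returns False, that type can only ever be chosen by hand, which matches the behaviour described above.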
The last processing step is to wrap each line of the body with paragraph tags. (It's basic but it works!)
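A minimal sketch of the wrapping step, assuming the body is already a list of lines and that blank lines should be dropped (the function name is made up):

```python
def wrap_paragraphs(body_lines):
    """Wrap each non-empty line in <p> tags and join into one HTML string."""
    return "".join(
        "<p>{}</p>".format(line.strip()) for line in body_lines if line.strip()
    )


print(wrap_paragraphs(["First par.", "", "Second par."]))
```

Like the approach it illustrates, this is basic: it doesn't escape characters such as & or < in the copy.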
And finally we print an empty string to delete the current selection in BBEdit — time to move on to the next story.
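The overall shape of such a filter can be sketched like this, on the understanding that BBEdit passes the selection to a text filter on standard input and replaces it with whatever the filter writes to standard output. The run_filter function and its handle_story callback are hypothetical stand-ins for the real work of filling in the form:

```python
import io


def run_filter(stdin, stdout, handle_story):
    """Read the selection, hand it to `handle_story`, and emit nothing back."""
    handle_story(stdin.read())
    # Writing an empty string leaves nothing to replace the selection with.
    stdout.write("")


# Stand-in streams so the sketch runs anywhere.
seen = []
out = io.StringIO()
run_filter(io.StringIO("Headline\n\n\nBody text.\n"), out, seen.append)
print(repr(out.getvalue()))
```

In a real filter the streams would be sys.stdin and sys.stdout rather than the StringIO objects used here.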
By using this script and opening up a dozen or two 'new article' tabs in Safari, moving between each one and using the text filter to populate the fields, I can effectively do work in parallel instead of one task at a time — filling in new articles while I wait for the website to catch up and show the publishing time options. And no more copy & paste from TextEdit.