Library Hat Rotating Header Image

programming

Fear No Longer Regular Expressions

*** This post has been originally published on ACRL TechConnect Blog on July 31, 2013. ***

Regex, it’s your friend

You may have heard the term, “regular expressions” before. If you have, you will know that it usually comes in a notation that is quite hard to make out like this:

(?=^[0-5\- ]+$)(?!.*0123)\d{3}-\d{3,4}-\d{4}

Despite its appearance, regular expressions (regex) is an extremely useful tool to clean up and/or manipulate textual data. I will show you an example that is easy to understand. Don’t worry if you can’t make sense of the regex above. We can talk about the crazy metacharacters and other confusing regex notations later. But hopefully, this example will help you appreciate the power of regex and give you some ideas of how to make use of regex to make your everyday library work easier.

What regex can do for you – an example

I looked for the zipcodes for all cities in Miami-Dade County and found a very nice PDF online (http://www.miamifocusgroup.com/usertpl/1vg116-three-column/miami-dade-zip-codes-and-map.pdf). But when I copy and paste the text from the PDF file to my text editor (Sublime), the format immediately goes haywire. The county name, ‘Miami-Dade,’ which should be in one line becomes three lines, each of which lists Miami, -, Dade.

Ugh, right? I do not like this one bit. So let’s fix this using regular expressions.

(**Click the images to bring up the larger version.)

Screen Shot 2013-07-24 at 1.43.19 PM

Screen Shot 2013-07-24 at 1.43.39 PM

Like many text editors, Sublime offers the find/replace with regex feature. This is a much powerful tool than the usual find/replace in MS Word, for example, because it allows you to match many complex cases that fit a pattern. Regular expressions lets you specify that pattern.

In this example, I capture the three lines each saying Miami,-,Dade with this regex:

Miami\n-\nDade.

When I enter this into the ‘Find What’ field, Sublime starts highlighting all the matches. I am sure you already guessed that \n means a new line. Now let me enter Miami-Dade in the ‘Replace With’ field and hit ‘Replace All.’

Screen Shot 2013-07-24 at 2.11.43 PM

As you can see below, things are much better now. But I want each set of three lines – Miami-Dade, zipcode, and city – to be one line and each element to be separated by comma and a space such as ‘Miami-Dade, 33010, Hialeah’. So let’s do some more magic with regex.

Screen Shot 2013-07-24 at 2.18.17 PM

How do I describe the pattern of three lines – Miami-Dade, zipcode, and city? When I look at the PDF, I notice that the zipcode is a 5 digit number and the city name consists of alphabet characters and space. I don’t see any hypen or comma in the city name in this particular case. And the first line is always ‘Miami-Dade.” So the following regular expression captures this pattern.

Miami-Dade\n\d{5}\n[A-Za-z ]+

Can you guess what this means? You already know that \n means a new line. \d{5} means a 5 digit number. So it will match 33013, 33149, 98765, or any number that consists of five digits. [A-Za-z ] means any alphabet character either in upper or lower case or space (N.B. the space at the end right before ‘]’).

Anything that goes inside [ ] is one character. Just like \d is one digit. So I need to specify how many of the characters are to be matched. if I put {5}, as I did in \d{5}, it will only match a city name that has five characters like ‘Miami,’ The pattern should match any length of city name as long as it is not zero. The + sign does that. [A-Za-z ]+ means that any alphabet character either in upper or lower case or space should appear at least or more than once. (N.B. * and ? are also quantifiers like +. See the metacharacter table below to find out what they do.)

Now I hit the “Find” button, and we can see the pattern worked as I intended. Hurrah!

Screen Shot 2013-07-24 at 2.24.47 PM

Now, let’s make these three lines one line each. One great thing about regex is that you can refer back to matched items. This is really useful for text manipulation. But in order to use the backreference feature in regex, we have to group the items with parentheses. So let’s change our regex to something like this:

(Miami-Dade)\n\(d{5})\n([A-Za-z ]+)

This regex shows three groups separated by a new line (\n). You will see that Sublime still matches the same three line sets in the text file. But now that we have grouped the units we want – county name, zipcode, and city name – we can refer back to them in the ‘Replace With’ field. There were three units, and each unit can be referred by backslash and the order of appearance. So the county name is \1, zipcode is \2, and the city name is \3. Since we want them to be all in one line and separated by a comma and a space, the following expression will work for our purpose. (N.B. Usually you can have up to nine backreferences in total from \1 to\9. So if you want to backreference the later group, you can opt not to create a backreference from a group by using (?: ) instead of (). )

\1, \2, \3

Do a few Replaces and then if satisfied, hit ‘Replace All’.

Ta-da! It’s magic.

Screen Shot 2013-07-24 at 2.54.13 PM

Regex Metacharacters

Regex notations look a bit funky. But it’s worth learning them since they enable you to specify a general pattern that can match many different cases that you cannot catch without the regular expression.

We have already learned the four regex metacharacters: \n, \d, { }, (). Not surprisingly, there are many more beyond these. Below is a pretty extensive list of regex metacharacters, which I borrowed from the regex tutorial here: http://www.hscripts.com/tutorials/regular-expression/metacharacter-list.php . I also highly recommend this one-page Regex cheat sheet from MIT (http://web.mit.edu/hackl/www/lab/turkshop/slides/regex-cheatsheet.pdf).

Note that \w will match not only a alphabetical character but also an underscore and a number. For example, \w+ matches Little999Prince_1892. Also remember that a small number of regular expression notations can vary depending on what programming language you use such as Perl, JavaScript, PHP, Ruby, or Python.

Metacharacter Description
\ Specifies the next character as either a special character, a literal, a back reference, or an octal escape.
^ Matches the position at the beginning of the input string.
$ Matches the position at the end of the input string.
* Matches the preceding subexpression zero or more times.
+ Matches the preceding subexpression one or more times.
? Matches the preceding subexpression zero or one time.
{n} Matches exactly n times, where n is a non-negative integer.
{n,} Matches at least n times, n is a non-negative integer.
{n,m} Matches at least n and at most m times, where m and n are non-negative integers and n <= m.
. Matches any single character except “\n”.
[xyz] A character set. Matches any one of the enclosed characters.
x|y Matches either x or y.
[^xyz] A negative character set. Matches any character not enclosed.
[a-z] A range of characters. Matches any character in the specified range.
[^a-z] A negative range characters. Matches any character not in the specified range.
\b Matches a word boundary, that is, the position between a word and a space.
\B Matches a nonword boundary. ‘er\B’ matches the ‘er’ in “verb” but not the ‘er’ in “never”.
\d Matches a digit character.
\D Matches a non-digit character.
\f Matches a form-feed character.
\n Matches a newline character.
\r Matches a carriage return character.
\s Matches any whitespace character including space, tab, form-feed, etc.
\S Matches any non-whitespace character.
\t Matches a tab character.
\v Matches a vertical tab character.
\w Matches any word character including underscore.
\W Matches any non-word character.
\un Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol

Matching modes

You also need to know about the Regex matching modes. In order to use these modes, you write your regex as shown above, and then at the end you add one or more of these modes. Note that in text editors, these options often appear as checkboxes and may apply without you doing anything by default.

For example, [d]\w+[g] will match only the three lower case words in ding DONG dang DING dong DANG. On the other hand, [d]\w+[g]/i will match all six words whether they are in the upper or the lower case.

Look-ahead and Look-behind

There are also the ‘look-ahead’ and the ‘look-behind’ pattern in regular expressions. These often cause confusion and are considered to be a tricky part of regex. So, let me show you a simple example of how it can be used.

Below are several lines of a person’s last name, first name, middle name, separated by his or her department name. You can see that this is a snippet from a csv file. The problem is that a value in one field – the department name- also includes a comma, which is supposed to appear only between different fields not inside a field. So the comma becomes an unreliable separator. One way to solve this issue is to convert this csv file into a tab limited file, that is, using a tab instead of a comma as a field separater. That means that I need to replace all commas with tabs ‘except those commas that appear inside a department field.’

How do I achieve that? Luckily, the commas inside the department field value are all followed by a space character whereas the separator commas in between different fields are not so. Using the negative look-ahead regex, I can successfully specify the pattern of a comma that is not followed by (?!) a space \s.

,(?!\s)

Below, you can see that this regex matches all commas except those that are followed by a space.

lookbehind

For another example, the positive look-ahead regex, Ham(?=burg) , on the other hand, will match ‘Ham‘ in Hamburg when it is applied to the text: Hamilton, Hamburg, Hamlet, Hammock.

Below are the complete look-ahead and look-behind notations both positive and negative.

  • (?=pattern)is a positive look-ahead assertion
  • (?!pattern)is a negative look-ahead assertion
  • (?<=pattern)is a positive look-behind assertion
  • (?<!pattern)is a negative look-behind assertion

Can you think of any example where you can successfully apply a look-behind regular expression? (No? Then check out this page for more examples: http://www.rexegg.com/regex-lookarounds.html)

Now that we have covered even the look-ahead and the look-behind, you should be ready to tackle the very first crazy-looking regex that I introduced in the beginning of this post.

(?=^[0-5\- ]+$)(?!.*0123)\d{3}-\d{3,4}-\d{4}

Tell me what this will match! Post in the comment below and be proud of yourself.

More tools and resources for practicing regular expressions

There are many tools and resources out there that can help you practice regular expressions. Text editors such as EditPad Pro (Windows), Sublime, TextWrangler (Mac OS), Vi, EMacs all provide regex support. Wikipedia (https://en.wikipedia.org/wiki/Comparison_of_text_editors#Basic_features) offers a useful comparison chart of many text editors you can refer to. RegexPal.com is a convenient online Javascript Regex tester. FireFox also has Regular Expressions add-on (https://addons.mozilla.org/en-US/firefox/addon/rext/).

For more tools and resources, check out “Regular Expressions: 30 Useful Tools and Resources” http://www.hongkiat.com/blog/regular-expression-tools-resources/.

Library problems you can solve with regex

The best way to learn regex is to start using it right away every time you run into a problem that can be solved faster with regex. What library problem can you solve with regular expressions? What problem did you solve with regular expressions? I use regex often to clean up or manipulate large data. Suppose you have 500 links and you need to add either EZproxy suffix or prefix to each. With regex, you can get this done in a matter of a minute.

To give you an idea, I will wrap up this post with some regex use cases several librarians generously shared with me. (Big thanks to the librarians who shared their regex use cases through Twitter! )

  • Some ebook vendors don’t alert you to new (or removed!) books in their collections but do have on their website a big A-Z list of all of their titles. For each such vendor, each month, I run a script that downloads that page’s HTML, and uses a regex to identify the lines that have ebook links in them. It uses another regex to extract the useful data from those lines, like URL and Title. I compare the resulting spreadsheet against last month’s (using a tool like diff or vimdiff) to discover what has changed, and modify the catalog accordingly. (@zemkat)
  • Sometimes when I am cross-walking data into a MARC record, I find fields that includes irregular spacing that may have been used for alignment in the old setting but just looks weird in the new one. I use a regex to convert instances of more than two spaces into just two spaces. (@zemkat)
  • Recently while loading e-resource records for government documents, we noted that many of them were items that we had duplicated in fiche: the ones with a call number of the pattern “H, S, or J, followed directly by four digits”. We are identifying those duplicates using a regex for that pattern, and making holdings for those at the same time. (@zemkat)
  • I used regex as part of crosswalking metadata schemas in XML. I changed scientific OME-XML into MODS + METS to include research images into the library catalog. (@KristinBriney)
  • Parsing MARC just has to be one. Simple things like getting the GMD out of a 245 field require an expression like this: |h\[(.*)\] MARCedit supports regex search and replace, which catalogers can use. (@phette23)
  • I used regex to adjust the printed label in Millennium depending on several factors, including the material type in the MARC record. (@brianslone )
  • I strip out non-Unicode characters from the transformed finding aids daily with regex and replace them with Unicode equivalents. (@bryjbrown)

 

Effectively Learning How To Code: Tips and Resources

*** This post has been originally published in ACRL TechConnect on Dec. 10, 2012. ***

Librarians’ strong interest in programming is not surprising considering that programming skills are crucial and often essential to making today’s library systems and services more user-friendly and efficient for use. Not only for system-customization, computer-programming skills can also make it possible to create and provide a completely new type of service that didn’t exist before. However, programming skills are not part of most LIS curricula, and librarians often experience difficulty in picking up programming skills.

In this post, I would like to share some effective strategies to obtain coding skills and cover common mistakes and obstacles that librarians make and encounter while trying to learn how to code in the library environment based upon the presentation that I gave at Charleston Conference last month, “Geek out: Adding Coding Skills to Your Professional Repertoire.” (slides: http://www.slideshare.net/bohyunkim/geek-out-adding-coding-skills-to-your-professional-repertoire). At the end of this post, you will also find a selection of learning and community resources.

How To Obtain Coding Skills, Effectively

1. Pick a language and concentrate on it.

There are a huge number of resources available on the Web for those who want to learn how to program. Often librarians start with some knowledge in markup languages such as HTML and CSS. These markup languages determine how a block of text are marked up and presented on the computer screen. On the other hand, programming languages involve programming logic and functions. An understanding of the basic programming concepts and logic can be obtained by learning any programming language. There are many options, and some popular choices are JavaScript, PHP, Python, Ruby, Perl, etc. But there are many more.  For example, if you are interested in automating tasks in Microsoft applications such as Excel, you may want to work with Visual Basic. If you are unsure about which language to pick, search for a few online tutorials for a few languages to see what their different syntaxes and examples are like. Even if you do not understand the content completely, this will help you to pick the language to learn first.

2. Write and run the code.

Once you choose a language to learn, there are many paths that you can follow. Taking classes at a local community college or through an online school may speed up the initial process of learning, but it could be time-consuming and costly. Following online tutorials and trying each example is a good alternative that many people take. You may also pick up a few books along the way to supplement the tutorials and use them for reference purposes.

If you decide on self-study, make sure that you actually write and run the code in the examples as you follow along the books and the tutorials. Most of the examples will appear simple and straightforward. But there is a big difference between reading through a code example and being actually able to write the code on your own and to run it successfully. If you read through programming tutorials and books without actually doing the hands-on examples on your own, you won’t get much benefit out of your investment. Programming is a hands-on skill as much as an intellectual understanding.

3. Continue to think about how coding can be applied to your library.

Also important is to continue to think about how your knowledge can be applied to your library systems and environment, which is often the source of the initial motivation for many librarians who decide to learn how to program. The best way to learn how to program is to program, and the more you program the better you will become at programming. So at every chance of building something with the new programming language that you are learning, no matter how small it is, build it and test out the code to see if it works the way you intended.

4. Get used to debugging.

While many who struggle with learning how to code cite lack of time as a reason, the real cause is likely to be failing to keep up the initial interest and persist in what you decided to learn. Learning how to code can be exciting, but it can also be a huge time-sink and the biggest source of frustration from time to time. Since the computer code is written for a machine to read, not for a human being, one typo or a missing semicolon can make the program non-functional. Finding out and correcting this type of error can be time-consuming and demoralizing. But learning how to debug is half of programming. So don’t be discouraged.

5. Find a community for social learning and support.

Having someone to talk to about coding problems while you are learning can be a great help. Sign up for listservs where coding librarians or library coders frequent, such as code4lib and web4lib to get feedback when you need. Research the cause of the problem that you encounter as much as possible on your own. When you still are unsure about how to go about tackling it, post your question to the sites such as Stack Overflow for suggestions and answers from more experienced programmers. It is also a good idea to organize a study group with like-minded people and get support for both coding-related and learning-related problems. You may also find local meet-ups available in your area using sites like MeetUp.com.

Don’t be intimidated by those who seem to know much more than you in those groups (as you know much more about libraries than they do and you have things to contribute as well), but be aware of the cultural differences between the developer community and the librarian community. Unlike the librarian community that is highly accommodating for new librarians and sometimes not-well-thought-out questions, the developer community that you get to interact with may appear much less accommodating, less friendly, and less patient. However, remember that reading many lines of code, understanding what they are supposed to do, and helping someone to solve a problem occurring in those lines can be time-consuming and difficult even to a professional programmer. So it is polite to do a thorough research on the Web and with some reference resources first before asking for others’ help. Also, always post back a working solution when your problem is solved and make sure to say thank you to people who helped you. This way, you are contributing back to the community.

6. Start working on a real-life problem ‘now.’ Don’t wait!

Librarians are often motivated to learn how to code in order to solve real-life problems they encounter at their workplace. Solving a real-life problem with programming is therefore the most effective way to learn and to keep up the interest in programming. One of the greatest mistake in learning programming is putting off writing one’s own code and waiting to work on a real-life problem for the reason that one doesn’t know yet enough to do so. While it is easy to think that once you learn a bit more, it would be easier to approach a problem, this is actually a counter-productive learning strategy as far as programming is concerned because often the only way to find out what to learn is by trying to solve a problem.

7. Build on what you learned.

Another mistake to avoid in learning how to program is failing to build on what one has learned. Having solved one set of problem doesn’t mean that you will remember that programming solution you created next time when you have to solve a similar problem. Repeating what one has succeeded at and expanding on that knowledge will lead to a stronger foundation for more advanced programming knowledge. Also instead of trying to learn more than one programming language (e.g. Python, PHP, Ruby, etc.) and/or a web framework (e.g. Django, cakePHP, Ruby On Rails, etc.) at the same time, first try to become reasonably good at one. This will make it much easier to pick up another language later in the future.

8. Code regularly and be persistent.

It is important to understand that learning how to program and becoming good at it will take time. Regular coding practice is the only way to get there. Solving a problem is a good way to learn, but doing so on a regular basis as often as possible is the best way to make what you learned stick and stay in your head.

While is it easy to say practice coding regularly and try to apply it as much as possible to the library environment, actually doing so is quite difficult. There are not many well-established communities for fledgling coders in libraries that provide needed guidance and support. And while you may want to work with library systems at your workplace right away, your lack of experience may prove problematic in gaining a necessary permission to tinker with them. Also as a full-time librarian, programming is likely to be thrown to the bottom of your to-do list.

Be aware of these obstacles and try to find a way to overcome them as you go. Set small goals and use them as milestones. Be persistent and don’t be discouraged by poor documentation, syntax errors, and failures. With consistent practice and continuous learning, programming can surely be learned.

Resources

A. Resources for learning

B. Communities

 

More APIs: writing your own code (2)

*** This blog post has been originally published in ACRL TechConnect 0n October 9, 2012. ***

My previous post “The simpest AJAX: writing your own code (1)” discussed a few Javascript and JQuery examples that make the use of the Flickr API. In this post, I try out APIs from providers other than Flickr. The examples will look plain to you since I didn’t add any CSS to dress them up. But remember that the focus here is not in presentation but in getting the data out and re-displaying it on your own. Once you get comfortable with this process, you can start thinking about a creative and useful way in which you can present and mash up the same data. We will go through 5 examples I created with three different APIs.  Before taking a look at the codes, check out the results below first.

I. Pinboard API

The first example is Pinboard. Many libraries moved their bookmarks in Del.icio.us to a different site when there was a rumor that Del.cio.us may be shut down by Yahoo. One of those sites were Pinboard.  By getting your bookmark feeds from Pinboard and manipulating them, you can easily present a subset of your bookmark as part of your website.

(a) Display bookmarks in Pinboard using its API

The following page uses JQuery to access the JSONP feed of my public bookmarks in Pinboard. $.ajax() method is invoked on line 13. Line 15, jsonp:”cb”, gives the name to a callback function that will wrap the JSON feed data in it. Note line 18 where I print out data received into the console. This way, you can check if you are receiving JSONP feed in the console of Firebug. Line 19-22 uses $.each() function to access each element in the JSONP feed and the .append() method to add each bookmark’s title and url to the “pinboard” div. JQuery documentation has detailed explanation and examples for its functions and methods. So make sure to check it out if you have any questions about a JQuery function or method.

Pinboard API – example 1

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Pinboard: JSONP-AJAX Example</title>
<script type="text/javascript" src="http://jqueryjs.googlecode.com/files/jquery-1.3.2.min.js"></script>
</head>
<body>
<p> This page takes the JSON feed from <a href="http://pinboard.in/u:bohyunkim/">my public links in Pinboard</a> and re-presents them here. See <a href="http://feeds.pinboard.in/json/u:bohyunkim/">the Pinboard's JSON feed of my links</a>.</p>
<div id="pinboard"><h1>All my links in Pinboard</h1></div>
<script>
$(document).ready(function(){	
$.ajax({
  url:'http://feeds.pinboard.in/json/u:bohyunkim',
  jsonp:"cb",
  dataType:'jsonp',
  success: function(data) {
  	console.log(data); //dumps the data to the console to check if the callback is made successfully
    $.each(data, function(index, item){
      $('#pinboard').append('<div><a href="' + item.u + '">' + item.d
+ '</a></div>');
      }); //each
    } //success
  }); //ajax

});//ready
</script>
</body>
</html>

Here is the screenshot of the page. I opened up the Console window of the Firebug (since I dumped the received in line 18) and you can see the incoming data here. (Note. Click the images to see the large version.)

But it is much more convenient to see the organization and hierarchy of the JSONP feed in the Net panel of Firebug.

And each element of the JSONP feed can be drilled down for further details by clicking the object in each row.

(b) Display only a certain number of bookmarks

Now, let’s display only five bookmarks. In order to do this, only one more line is needed. Line 9 checks the position of each element and breaks the loop when the 5th element is processed.

Pinboard API – example 2

$.ajax({
  url:'http://feeds.pinboard.in/json/u:bohyunkim',
  jsonp:"cb",
  dataType:'jsonp',
  success: function(data) {
    $.each(data, function(index, item){
      	$('#pinboard').append('<div><a href="' + item.u + '">' + item.d
+ '</a></div>');
    	if (index == 4) return false; //only display 5 items
      }); //each
    } //success
  }); //ajax

(c) Display bookmarks with a certain tag

Often libraries want to display bookmarks with a particular tag. Here I add a line using JQuery method $.inArray() to display only bookmarks tagged with ‘fiu.’ $.inArray()  method takes value and array as parameters and returns 0 if the value is found in the array otherwise -1. Line 7 checks if the tag array of a bookmark (item.t) does include ‘fiu,’ and only in such case displays the bookmark. As a result, only the bookmarks with the tag ‘fiu’ are shown in the page.

Pinboard API – example 3

$.ajax({
  url:'http://feeds.pinboard.in/json/u:bohyunkim',
  jsonp:"cb",
  dataType:'jsonp',
  success: function(data) {
    $.each(data, function(index, item){
    	if ($.inArray("fiu", item.t)!==-1) // if the tag is 'fiu'
      		$('#pinboard').append('<div><a href="' + item.u + '">' + item.d
+ '</a></div>');
	}); //each
    } //success
  }); //ajax

II. Reddit API

My second example uses Reddit API. Reddit is a site where people comment on news items of interest. Here I used $.getJSON() instead of $.ajax() in order to process the JSONP feed from the Science section of Reddit. In the case of Pinboard API, I could not find out a way to construct a link that includes a call back function in the url. Some of the parameters had to be specified such as jsonp:”cb”, dataType:’jsonp’. For this reason, I needed to use $.ajax() function. On the other hand, in Reddit, getting the JSONP feed url was straightforward: http://www.reddit.com/r/science/.json?jsonp=cb.

Line 19 adds a title of the page. Line 20-22 extracts the title and link to the news article that is being commented and displays it. Under the news item, the link to the comments for that article in Reddit is added as a bullet item. You can see that, in Line 17 and 18, I have used the console to check if I get the right data and targeting the element I want and then commented out later.

This is just an example, and for that reason, the result is a rather simplified version of the original Reddit page with less information. But as long as you are comfortable accessing and manipulating data at different levels of the JSONP feed sent from an API, you can slice and dice the data in a way that suits your purpose best. So in order to make a clever mash-up, not only the technical coding skills but also your creative ideas of what different sets of data and information to connect and present to offer something new that has not been available or apparent before.

My second example uses Reddit API. Reddit is a site where people comment on news items of interest. Here I used $.getJSON() method instead of $.ajax() in order to process the JSONP feed from the Science section of Reddit. In the case of Pinboard API, I could not find out a way to construct a link for a call back function. Some of the parameters had to be specified such as jsonp:”cb”, dataType:’jsonp’. So I needed to use $.ajax() method.

On the other hand, in Reddit, getting the JSONP feed url was straightforward: http://www.reddit.com/r/science/.json?jsonp=cb.   Line 19 adds a title of the page. Line 20-22 extracts the title and link to the news article that is being commented and displays it. Under the news item, the link to the comments for that article in Reddit is added as a bullet item.You can see that in Line 17 and 18, I have used the console to check if I get the right data and targeting the element I want.

This is just an example, and for that reason, the result is rather a simplified version of the original Reddit page with less information. But as long as you are comfortable accessing and manipulating data at different levels of the JSONP feed sent from an API, you can slice and dice the data in a way that suits your purpose best.

Reddit API – example

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Reddit-Science: JSONP-AJAX Example</title>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.0/jquery.min.js"></script>
</head>

<body>
<p> This page takes the JSONP feed from <a href="http://www.reddit.com/r/science/">Reddit's Science section</a> and presents the link to the original article and the comments in Reddit to the article as a pair. See <a href="http://www.reddit.com/r/science/.json?jsonp=?">JSONP feed</a> from Reddit.</p>

<div id="feed"> </div>
<script type="text/javascript">
	//run function to parse json response, grab title, link, and media values, and then place in html tags
$(document).ready(function(){		
	$.getJSON('http://www.reddit.com/r/science/.json?jsonp=?', function(rd){
		//console.log(rd);
		//console.log(rd.data.children[0].data.title);
		$('#feed').html('<h1>*Subredditum Scientiae*</h1>');
		$.each(rd.data.children, function(k, v){
      		$('#feed').append('<div><p>'+(k+1)+': <a href="' + v.data.url + '">' + v.data.title+'</a></p><ul><li style="font-variant:small-caps;font-size:small"><a href="http://www.reddit.com'+v.data.permalink+'">Comments from Reddit</a></li></ul></div>');
      }); //each
	}); //getJSON
});//ready	
</script>

</body>
</html>

The structure of a JSON feed can be confusing to make out particularly. So make sure to use the Firebug Net window to figure out the organization of the feed content and the property name for the value you want.

But what if the site from which you would like to get data doesn’t offer JSONP feed? Fortunately you can convert any RSS or XML feed into JSONP feed. Let’s take a look!

III. PubMed Feed with Yahoo Pipes API

Consider this PubMed search. This is simple search that looks for items in PubMed that has to do with Florida International University College of Medicine where I work. You may want to access the data feed of this search result, manipulate, and display in your library website. So far, we have performed a similar task with the Pinboard and the Reddit API using JQuery. But unfortunately PubMed does not offer any JSON feed. We only get RSS feed instead from PubMed.

This is OK, however. You can either manipulate the RSS feed directly or convert the RSS feed into JSON, which you are more familiar with now. Yahoo Pipes is a handy tool for just that purpose. You can do the following tasks with Yahoo Pipes:

  • combine many feeds into one, then sort, filter and translate it.
  • geocode your favorite feeds and browse the items on an interactive map.
  • power widgets/badges on your web site.
  • grab the output of any Pipes as RSS, JSON, KML, and other formats.

Furthermore, there may be a pipe that has been already created for exactly what you want to do by someone else. As PubMed is a popular resource, I found a pipe for PubMed search. I tested, copied the pipe, and changed the search term. Here is the screenshot of my Yahoo Pipe.

If you want to change the pipe, you can click “View Source” and make further changes. Here I just changed the search terms and saved the pipe.

After that, you want to get the results of the Pipe as JSON. If you hover over the “Get as JSON” link in the first screenshot above, you will get a link: http://pipes.yahoo.com/pipes/pipe.run?_id=e176c4da7ae8574bfa5c452f9bb0da92&_render=json&limit=100&term=”Florida International University” and “College of Medicine” But this returns JSON, not JSONP.

In order to get that JSON feed wrapped into a callback function, you need to add this bit, &_callback=myCallback, at the end of the url: http://pipes.yahoo.com/pipes/pipe.run?_id=e176c4da7ae8574bfa5c452f9bb0da92&_render=json&limit=10&term=%22Florida+International+University%22+and+%22College+of+Medicine%22&_callback=myCallback. Now the JSON feed appears wrapped in a callback function like this: myCallback( ). See the difference?

Line 25 enables you to bring in this JSONP feed and invokes the callback function named “myCallback.” Line 14-23 defines this callback function to process the received feed. Line 18-20 takes the JSON data received at the level of data.value. item, and prints out each item’s title (item.title) with a link (item.link). Here I am giving a number for each item by (index+1). If you don’t put +1, the index will begin from 0 instead of 1. Line 21 stops the process when the processed item reaches 5 in number.

Yahoo Pipes API/PubMed – example

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>PubMed and Yahoo Pipes: JSONP-AJAX Example</title>
<script type="text/javascript" src="http://jqueryjs.googlecode.com/files/jquery-1.3.2.min.js"></script>
</head>
<body>
<p> This page takes the JSONP feed from <a href="http://pipes.yahoo.com/pipes/pipe.info?_id=e176c4da7ae8574bfa5c452f9bb0da92"> a Yahoo Pipe</a>, which creates JSONP feed out of a PubMed search results and re-presents them here. 
<br/>See <a href="http://pipes.yahoo.com/pipes/pipe.run?_id=e176c4da7ae8574bfa5c452f9bb0da92&_render=json&limit=10&term=%22Florida+International+University%22+and+%22College+of+Medicine%22&_callback=myCallback">the Yahoo Pipe's JSON feed</a> and <a href="http://www.ncbi.nlm.nih.gov/pubmed?term=%22Florida%20International%20University%22%20and%20%22College%20of%20Medicine%22">the original search results in PubMed</a>.</p>
<div id="feed"></div>
<script type="text/javascript">
	//run function to parse json response, grab title, link, and media values - place in html tags
	function myCallback(data) {
		//console.log(data);
		//console.log(data.count);
		$("#feed").html('<h1>The most recent 5 publications from <br/>Florida International University College of Medicine</h1><h2>Results from PubMed</h2>');
		$.each(data.value.items, function(index, item){
      		$('#feed').append('<p>'+(index+1)+': <a href="' + item.link + '">' + item.title
+ '</a></p>');
        if (index == 4) return false; //display the most recent five items
      }); //each
	} //function
	</script>
<script type="text/javascript" src="http://pipes.yahoo.com/pipes/pipe.run?_id=e176c4da7ae8574bfa5c452f9bb0da92&_render=json&limit=10&term=%22Florida+International+University%22+and+%22College+of+Medicine%22&_callback=myCallback"></script>
</body>
</html>

Do you feel more comfortable now with APIs? With a little bit of JQuery and JSON, I was able to make a good use of third-party APIs. Next time, I will try the Worldcat Search API, which is closer to the library world and see how that works.

Tips for Everyone Doing the #codeyear

***   This post has been originally posted to the ACRL TechConnect blog.  ***

Learn to Code in 2012!

If you are a librarian interested in learning how to code, 2012 is a perfect year for you to start the project. Thanks to CodeAcademy (http://codeacademy.com), free JavaScript lessons are provided every week at http://codeyear.com/. The lessons are interactive and geared towards beginners. So even if you do not have any previous experience in programming, you will be able to pick up the new skill soon enough as long as you are patient and willing to spend time on mastering each lesson every week.

A great thing about this learn-how-to-program project, called #codeyear in Twitter (#libcodeyear and #catcode in the library-land) is that there are +375,443 people (and counting up) out there who are doing exactly the same lessons as you are. The greatest thing about this #libcodeyear / #catcode project is that librarians have organized themselves around this project for the collective learning experience.  How librarian-like, don’t you think?

Now, if you are ready to dive in, here are some useful resources.  And after these Resources, I will tell you a little bit more about how to best ask help about your codes when they are not working for you.

Resources for Collective Learning

Syntax Error: Catch the most frustrating bugs!

Now what I really like about #codeyear lessons so far is that some of the lessons trip you by trivial things like a typo! So you need to find a typo and fix it to pass a certain lesson. Now you may ask “How the hell does fixing a typo count as a programming lesson?”

Let me tell you. Finding a typo is no triviality in coding. Catching a similar syntax error will save you from the most frustrating experience in coding.

The examples of seemingly innocuous syntax errors are:

  • var myFunction = funtction (){blah, blah, blah … };
  • var myNewFunction = function (]{blah, blah, blah … };
  • for(i=0,  i<10, i++;)
  • var substr=’Hello World’; alert(subst);
  • –//This is my first JavaScript

Can you figure out why these lines would not work?  Give it a try! You won’t be sorry. Post your answers in the comments section.

How to Ask Help about Your Codes      

by Matteo De Felice in Flickr (http://farm4.staticflickr.com/3577/3502347936_43b5e2a886.jpg)

I am assuming that as #codeyear, #catcode, #libcodeyear project progresses, more people are going to ask questions about problems that stump them. Some lessons already have Q&A in the CodeAcademy site. So check those out. Reading through others’ questions will give valuable insight to how codes work and where they can easily trip you.

That having been said, you may want to ask questions to the places mentioned in the Resources section above. If you do, it’s a good idea to follow some rules. This will make your question more likely to be looked at by others and way more likely to be answered correctly.

  • Before asking a question, try to research yourself. Google the question, check out the Q&A section in the CodeAcademy website, check out other online tutorials about JS (see below for some of the recommended ones).
  • If this fails, do the following:
    • Specify your problem clearly.
      (Don’t say things like “I don’t get lesson 3.5.” or “JavaScript function is too hard” unless the purpose is just to rant.)
    • Provide your codes with any parts/details that are related to the lines with a problem.
      (Bear in mind that you might think there is a problem in line 10 but the problem may lie in line 1, which you are not looking.) Highlight/color code the line you are having a problem. Make it easy for others to immediately see the problematic part.
    • Describe what you have done to troubleshoot this (even if it didn’t work.)
      : This helps the possible commenter to know what your reasoning is behind your codes and what solutions you have already tried, thereby saving their time. So this will make it more likely that someone will actually help you. To believe it or not, what seems completely obvious and clear to you can be completely alien and unfathomable to others.

Some JavaScript Resources

There are many resources that will facilitate your learning JavaScript. In addition to the lessons provided by CodeAcademy, you may also find these other tutorials helpful to get a quick overview of JavaScript syntax, usage, functions, etc. From my experience, I know that I get a better understanding when I review the same subject from more than one resource.

If you have other favorite Javascript please share in the comment section.

ACRL TechConnect blog will continue to cover #libcodeyear / #catcode related topics throughout the year!  The post up next will tell you all about some of the excuses people deploy to postpone learning how to code and what might break the mental blockage!

Library and IT – Synergy or Distrust?

In my previous blog post, I asked why libraries are not actively encouraging those who are novice coders among library staff to further develop their coding skills.

I was surprised to see so many comments. I was even more surprised to see that the question was sometimes completely misunderstood. For example, I never argued that ‘all’ librarians should learn how to code (!).  Those who I had in mind were the novice coders/librarians who already know one or two programming languages and struggle to teach themselves to build something simple but useful for practical purposes.

On the other hand, all comments were very illuminating particularly in showing the contrasts between librarians’ and programmers/IT professionals’ thoughts on my question. Below are some of the most interesting contrasts I saw. (All have been paraphrased.)

Librarian (L)
– I am interested in learning how to code but I lack time. Most of all, it is hard to find guidance.

Programmer/IT professional (P)
– There are lots of resources online. Don’t make excuses and plunge in.

L is lost in learning how to code while P thinks everything needed can be found online! Interesting, isn’t it? Ls and Ps are likely to be coming from two completely opposite backgrounds (humanities vs. sciences) and cultures (committee and consensus-driven vs. meritocratic and competitive).

Librarian (L)
– IT distrusts the library staff and doesn’t even allow admin privileges to the staff PCs.
– IT people are overprotective over their knowledge. Not all but many IT tasks are relatively straightforward and can be learned by librarians.

Programmer/IT professional (P)
– Librarians require an MLS for even technology positions. That is crazy!
– You are arguing that librarians can learn how to properly program in their spare time without gaining the proper theoretical understanding of computer science and training in software engineering. That is crazy!

L thinks P should recognize that library staff do work in technology just as IT does and wants P to be more open and sharing instead of being mysterious.  On the other hand, P wants to see L value programmers and IT for their expertise and thinks that an MLS is an unreasonable requirement for a technology position at libraries. I think both parties make excellent points. About the over-protectiveness, I think perhaps it is half true but half likely to be a communication issue.

And here are some of the most valuable comments:

  • Librarians tend to miss that there can be an overlap in the role of IT and that of librarians and regard them as completely separate ones.
  • The management buy-in is important in promoting technology in a library. A nurturing environment for staff development can be quite helpful for the library staff.

I think these two comments are very close to answering my question of why libraries don’t actively encourage and support those among the library staff who know how to code albeit in a rudimentary manner to further develop their skills and apply them to the library context. Although almost all libraries today emphasize the importance of technology, the role of librarians and that of IT, librarianship and technology are often viewed as completely separate from each other. Even when there is an interest in incorporating technology into librarianship, both libraries and LIS schools seem to be puzzled over how to do so.

It is no doubt a tough problem to crack. But it explains up to a certain degree why there is not much collaboration found between librarians and programmers (or IT in a wider sense) at most libraries. Why don’t the library and the IT at a college/university, for example, form a closely-knit educational/instructional technology center?  While reading the comments, I kept thinking about the story I heard from my friend.

My friend works at a large academic library, and the university s/he works at decided to merge the university IT and the university library into one organization to foster collaboration and make the two departments’ operation more efficient. Two departments came to reside in the same building as a result. However, there was so much difference in culture that the expected collaboration did not occur. Instead, the library and the IT worked as they had done before as completely separate entities.

The university administrators may have had the insight that there is an overlapping role between the library and the IT and seen the potential synergy from merging the two units together. But without the library and the IT buying into that vision, the experiment cannot succeed. Even where a library has its own IT department, the cultural difference may hinder the collaboration between the library IT and the rest of the library staff.

How can the gap between librarianship and IT be bridged? As I have already said, I don’t think that the problem is to be solved by ‘all’ librarians becoming coders or IT professionals. That would be implausible, unnecessary, and downright strange.

However, I believe that all libraries would significantly benefit by having ‘some’ library staff who understand how programming works and so all libraries should support and encourage their staff who are already pursuing their interests in coding to further develop their skills and deepen their knowledge. (This is no different than what libraries are already doing regarding their paraprofessionals who want to pursue a MLS degree!)  Even when those staff are not themselves capable of developing a complicated, production-ready software system, they can easily automate simple processes at libraries, solve certain problems, and collaborate with professional programmers in troubleshooting and developing better library systems.

So, my question was not so much about librarians as individuals as about the strategic direction of libraries whose primary concern is providing, packaging, disseminating, and maintaining information, resources, and data. And I am glad I asked my half-baked question. You never know what you will learn until you ask.

Why Not Grow Coders from the inside of Libraries?

How fantastic would it be if every small library has an in-house developer? We will be all using open-source software with custom feature modules that would perfectly fit our vision and the needs of the community we serve. Libraries will then truly be the smart consumers of technology not at the mercy of clunky systems. Furthermore, it would re-position libraries as “contributors” to the technology that enables the public to access information and knowledge resources. I am sure no librarian will object to this vision. But at this time of ever-shrinking library budget, affording enough librarians itself is a challenge let alone hiring a developer.

But why should this be the case? Librarians are probably one of the most tech-savvy professionals after IT and science/ engineering/ marketing folks. So why aren’t there more librarians who code? Why don’t we see a surge of librarian coders? After all, we are living in times in which the web is the platform for almost all human activities and libraries are changing its name to something like learning and ‘technology’ center.

I don’t think that coding is too complicated or too much to learn for any librarian regardless of their background. Today’s libraries offer such a wide range of resources and services online and deploy and rely on so many systems from an ILS to a digital asset management system that libraries can benefit a great deal from those staff who have even a little bit of understanding in coding.

The problem is, I think, libraries do not proactively encourage nor strongly support their in-house library staff to become coders. I am not saying that all librarians / library staff should learn how to code like a wizard. But it is an undeniable fact that there are enough people in the library land who are seriously interested in coding and capable of becoming a coder. But chances are, these people will have no support from their own libraries. If they are working in non-technology-related areas, it will be completely up to them to pursue and pay for any type of learning opportunities. Until they prove themselves to be capable of a certain level of coding, they may not even be able to get hands-on experience of working in library technologies/systems/programming. And when they become capable, they may have to seek a new job if they are serious about putting to use their newly acquired programming skills.

It is puzzling to me why libraries neglect to make conscious efforts in supporting their staff who are interested in coding to further develop their skills while freely admitting that they would benefit from having a programmer on staff. Perhaps it is the libraries that are making the wrong distinction between library work and technology work. They are so much more closely intertwined than, say, a decade ago. Even library schools that are slow to change are responding and adding technology courses to their curriculum and teaching all LIS students basic HTML. But certainly libraries can use staff who want to move beyond HTML.

At the 2011 ALA Midwinter, I attended LITA Head of Library Technology Interest Group meeting. One of the issues discussed there was how to recruit and maintain the IT workforce within libraries. Some commented the challenge of recruting people from the IT industry, which often pays more than libraries do. Some mentioned how to quickly acclimate those new to libraries to the library culture and technology. Others discussed the difficulty of retaining IT professionals in libraries since libraries tend to promote only librarians with MLS degrees and tend to exclude non-librarians from the important decision-making process. Other culture differences between IT and libraries were also discussed.

These are all valid concerns and relevant discussion topics. But I was amazed by the fact that almost all assumed that the library IT people would come from the IT sector and outside from libraries. Some even remarked that they prefered to hire from the IT industry outside libraries when they fill a position. This discussion was not limited to programmers but inclusive of all IT professionals. Still, I think perhaps there is something wrong if libraries only plan to steal IT people from the outside without making any attempt to invest in growing some of those technology people inside themselves. IT professionals who come from the general IT industry may be great coders but they do not know about libraries. This is exactly the same kind of cause for inflexible library systems created by programmers who do not know enough about the library’s businesses and workflows.

So why don’t libraries work to change that?

One of the topics frequently discussed in librareis these days is open source software. At the recent 2011 Code4Lib conference, there was a breakout session about what kind of help would allow libraries to more actively adopt open source software adn systems. Those who have experience in working with open source software at the session unanimously agreed that adopting open-source is not cheap. There is a misconception that by adopting open source software, libraries will save money. But if so, at least that would not be the case in any short tem. Open-source requires growing knowledgeable technology staff in-house who would understand the software fully and able to take advantage of its flexibility to benefit the organization’s goals. It is not something you can buy cheap off the shelf and make it work by turning a key. While adopting open-source will provide freedom to libraries to experiment and improve their services and thereby empower lirbaries, those benefits will not come for free without investment.

Some may ask why not simply hire services from a third-party company that will support the open-source software or system that a library will adopt. But without the capability of understanding the source and of making changes as needed, how would libraries harness the real power of open-source unless the goal is just a friendier vendor-library relationship?

In his closing talk at the 2011 Code4Lib conference, Eric Hellman pointed out the fact that many library programmers are self-taught and often ‘fractional’ coders in the sense that they can afford to spend only a fraction of their time on coding. The fact that most library coders are fractional coders is all the more reason for having more coders in libraries, so that more time can be spent collectively on coding for libraries. Although enthusiastic, many novice coders are often lost about how certain programming languages or software tools are or can be applied to current library services and systems and need guidance about which coding skills are most relevant and can be used to produce immediately useful results in the library context. Many novice coders at librareis who often teach themselves programming skills by attending (community) college courses at night at their own expenses and scouring the web for resources and tutorials after work can certainly benefit from some support from their libraries.

Are you a novice or experienced coder working at libraries? Were/are you encouraged to further develop your skills? If a novice, what kind of support would you like to see from your libraries? If experienced, how did you get there? I am all ears. Please share your thoughts.

——————————

N.B. If you are a formally trained CS/E person, you may want to know that I am using the term ‘coding’ loosely in the library context, not in the context of software industry.  Please see this really helpful post “after @bohyunkim: talking across boundaries and the meaning of ‘coder'” by Andromeda Yelton which clarifies this. Will K’s two comments below also address the usage of this term in its intended sense much better than I did.  I tried to clarify a bit more what I meant below in my comments but feel free to comment/suggest a better term if you find this still problematic.  Thanks for sharing your thoughts! (2/22/2011)