Technology Blog 3: Regular Expressions (regex)

I had a chat while riding the subway with a friend in the business. He’s a director-level software engineer at a major technology company in New York, and was on the train with me for only a few more stops.

I asked him a basic question:

When you’re interviewing a programmer, what is an example of a question you ask?

His response:

I tell them I have a number of webpages and documents, and we’ve had a service outage or change, and we’ve been given a temporary phone number. I need you to change every instance of our phone number, in every webpage and document, to the new temporary number. You have 2 hours to complete the task. How do you do it?

I really didn’t have an answer. I’ve been here at Flatiron School learning Ruby and some JS, solving problems every day, but I’d never even tried to think through a problem like this. I didn’t know where to begin, so I told him: “I don’t know. But can you tell me what a successful approach has looked like?”

His answer was really quite simple: ‘grep.’
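grep covers the searching half of the job: it finds and prints every line that contains the number. Actually rewriting the files takes a second step, for example sed on the command line, or a regex substitution in a short script. Here is a rough Python sketch of the whole task, assuming the pages and documents are plain-text files under one folder and the number is always written the same way. The folder, the file extensions, the function names, and both phone numbers are made up for illustration.

```python
import re
from pathlib import Path

# Hypothetical numbers, made up for this sketch.
OLD_NUMBER = "(212) 555-0100"
NEW_NUMBER = "(212) 555-0199"


def replace_number(text: str, old: str, new: str) -> tuple[str, int]:
    """Swap every exact occurrence of `old` for `new`; return (new_text, count)."""
    # re.escape makes "(", ")" and the space literal, so the number matches exactly
    # as written. A function replacement inserts `new` verbatim (no backslash handling).
    return re.subn(re.escape(old), lambda _match: new, text)


def replace_in_folder(root: Path, old: str, new: str,
                      suffixes: tuple[str, ...] = (".html", ".txt")) -> dict[Path, int]:
    """Rewrite matching plain-text files under `root` in place; report counts per file."""
    changed: dict[Path, int] = {}
    for path in root.rglob("*"):  # walk every file in every subfolder
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8")
            new_text, count = replace_number(text, old, new)
            if count:
                path.write_text(new_text, encoding="utf-8")
                changed[path] = count
    return changed
```

Calling replace_in_folder(Path("site"), OLD_NUMBER, NEW_NUMBER) would rewrite a (hypothetical) site folder and report how many replacements each file got. A real version would also have to catch other ways of writing the number, like 212.555.0100, and formats such as Word files that are not plain text.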

He got off the train, leaving me to wonder what the hell he was talking about…

What is grep?

grep is a command-line utility for searching plain-text data sets for lines that match a regular expression. Its name comes from the ed command g/re/p (globally search for a regular expression and print the matching lines).

GLOBALLY search

REGULAR

EXPRESSION

PRINT

grep is a program you run in your terminal. It looks through whatever files or text you point it at and prints every line that matches the pattern you’re looking for. In other words, it searches for patterns, and that search is based on CHARACTERS.
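To make that concrete, here is a bare-bones imitation of grep’s core loop in Python (Python’s re module is just a convenient place to experiment; the function name mini_grep and the sample lines are made up). It checks each line against a pattern and keeps the lines that match, roughly what grep '555-0100' somefile.txt does for a file. The real grep has many more options.

```python
import re


def mini_grep(pattern: str, lines: list[str]) -> list[str]:
    """Return the lines that contain a match for `pattern`, like a bare-bones grep."""
    regex = re.compile(pattern)
    # re.search looks for the pattern anywhere in the line, not just at the start.
    return [line for line in lines if regex.search(line)]


doc = [
    "Call us at (212) 555-0100.",
    "Our office hours are 9 to 5.",
    "Fax: (212) 555-0100",
]

for line in mini_grep("555-0100", doc):
    print(line)  # prints the first and third lines
```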

It uses some form of REGULAR EXPRESSION as its search criteria. Now, don’t let that name fool you: a regular expression actually means something quite specific.

What is a Regular Expression?

A regular expression is a pattern, written in a standardized (or mostly standardized) syntax, that a computer uses to recognize a series of characters when it searches. In other words, it’s a way for a computer to match patterns based on characters.

A character is something like the letter a, or maybe the letter A. They may seem mostly the same, but to a computer, which attaches no meaning to characters beyond their codes, those two letters are as unalike as 6 and X.

When it comes to regular expressions, we ONLY care about characters, because characters are what the computer can look for.
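A quick way to see this for yourself is Python’s re module (used here only as a handy place to experiment). Each character is stored as a number, and a pattern matches by character, so a and A are different unless you explicitly ask for case-insensitive matching.

```python
import re

# The computer only sees character codes: "a" and "A" are different numbers.
print(ord("a"), ord("A"))  # 97 65

# So the pattern "a" matches only the lowercase letter...
print(re.findall("a", "a A a A"))  # ['a', 'a']

# ...unless you explicitly ask for case-insensitive matching.
print(re.findall("a", "a A a A", flags=re.IGNORECASE))  # ['a', 'A', 'a', 'A']
```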

Examples of Regular Expressions

One of the things my friend said to me was:

I want to get a feel for how comfortable someone is with using the command line as a tool. I don’t want engineers that only live in text-editors. You can do a lot from the command line.

And this is true. Using grep and regular expressions is only the tip of the iceberg. But I wanted to dig a little deeper and see if I could understand at least a little of what regular expressions really mean, and how they could be useful. Let’s look at some examples of regular expressions and talk about what they mean.

example: th

Shall I compare thee to a summer’s day?
Thou art more lovely and more temperate

This expression looks for every t character followed by an h character. It would recognize the th in thee, but not the Th in Thou, because T and t are different characters.
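Here is the same search run with Python’s re module, as a small sketch. re.findall lists every match, and re.finditer also tells you where each one starts in the text.

```python
import re

sonnet = ("Shall I compare thee to a summer's day?\n"
          "Thou art more lovely and more temperate")

# Case-sensitive search: only the lowercase "th" in "thee" matches.
print(re.findall("th", sonnet))  # ['th']

# finditer also reports where each match starts.
for match in re.finditer("th", sonnet):
    print(match.start(), match.group())  # 16 th

# Case-insensitive matching picks up the "Th" in "Thou" as well.
print(re.findall("th", sonnet, flags=re.IGNORECASE))  # ['th', 'Th']
```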

example: o.e

I love you more than words can wield the matter, Dearer than eyesight, space and liberty

In this example, we’re using the metacharacter “.”, which acts as a sort of wildcard: the computer will find any pattern that matches an o, followed by any single character, followed by an e (here, the ove in love and the ore in more).

These can be strung together, as in o...e, which would match an o followed by any three characters and then an e.
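A small Python sketch of both patterns. Nothing in the Shakespeare line happens to fit o...e, so that one is tried on a made-up phrase instead; note that a space counts as “any character” too.

```python
import re

line = ("I love you more than words can wield the matter, "
        "Dearer than eyesight, space and liberty")

# "." stands for any single character (except a newline).
print(re.findall("o.e", line))  # ['ove', 'ore']

# Three dots: an "o", any three characters, then an "e". Spaces count.
print(re.findall("o...e", "go to the store"))  # ['o the']
```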

 

example: t[eo]d

When today is over Ted will have a tedious time tidying up.

This will find the tod in today and the ted in tedious, but not Ted (its T is uppercase) or the tid in tidying.
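Checking that in Python’s re module: the brackets stand for exactly one character, which has to be one of the characters listed inside them.

```python
import re

sentence = "When today is over Ted will have a tedious time tidying up."

# [eo] matches exactly one character, which must be "e" or "o".
print(re.findall("t[eo]d", sentence))  # ['tod', 'ted']
```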

example: [3-7]

a1, b2,34, c5, 67, 89
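The range version in Python’s re module, as a quick check. Each match is a single character, so the 34 in the sample gives two separate matches.

```python
import re

# [3-7] matches one character in the range 3 through 7.
print(re.findall("[3-7]", "a1, b2,34, c5, 67, 89"))  # ['3', '4', '5', '6', '7']
```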

[3-79]: you can combine a range with single characters. This one matches any digit from 3 to 7, or a 9.

a list of hotel rooms: G4 G9 F2 H1 L0 K7 M9

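[1-8b-gx]: combining multiple sets is also possible. Here we’re looking for 1-8, b-g, and x. It’s worth noting that ranges follow the order of the system’s character table. In ASCII, all the uppercase letters come before all the lowercase letters, so [b-g] only matches lowercase letters. Some systems, however, order characters as aAbBcC, and there a range like [b-g] can sweep in uppercase letters too. It depends on the system’s character table.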
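Both combined sets, tried in Python’s re module (the second sample string is made up). Python compares plain character codes for ranges, so the last line prints the codes that decide what a letter range includes.

```python
import re

rooms = "G4 G9 F2 H1 L0 K7 M9"
# [3-79]: a digit from 3 to 7, or the single character 9.
print(re.findall("[3-79]", rooms))  # ['4', '9', '7', '9']

# [1-8b-gx]: a digit 1-8, a lowercase letter b-g, or an x.
print(re.findall("[1-8b-gx]", "bag 9 fix 2 zoo"))  # ['b', 'g', 'f', 'x', '2']

# Ranges follow character codes: in ASCII, A-Z is 65-90 and a-z is 97-122.
print(ord("A"), ord("Z"), ord("a"), ord("z"))  # 65 90 97 122
```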

 

example: negating a character set, t[^eo]d

In this example we’re looking for a t, followed by any single character that is not e or o, followed by a d.
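Running the negated set in Python’s re module against the earlier Ted sentence. The second line uses a made-up phrase to show that “not e or o” still means exactly one character, and that a space qualifies.

```python
import re

sentence = "When today is over Ted will have a tedious time tidying up."

# [^eo] matches one character that is anything EXCEPT "e" or "o".
print(re.findall("t[^eo]d", sentence))  # ['tid']

# The negated set still needs exactly one character, and a space counts.
print(re.findall("t[^eo]d", "at dawn"))  # ['t d']
```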

 

examples: various multipliers

  • * – item occurs zero or more times.
  • + – item occurs one or more times.
  • ? – item occurs zero or one times.
  • {4} – item occurs four times.
  • {3,6} – item occurs between 3 and 6 times.
  • {3,} – item occurs at least 3 times.
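The multipliers not covered by the book examples, sketched in Python’s re module with made-up sample strings. (If you try these with grep, +, ? and {} generally need grep -E or backslashes.)

```python
import re

# ? : the "u" occurs zero or one times.
print(re.findall("colou?r", "color or colour"))  # ['color', 'colour']

# {3} and {4}: exact counts, here a made-up 7-digit phone-number shape.
print(re.findall("[0-9]{3}-[0-9]{4}", "call 555-0199 or 555-0123"))  # ['555-0199', '555-0123']

# {3,6} versus {3,}: between 3 and 6 digits, or 3 digits and up.
digits = "1 22 333 4444 5555555"
print(re.findall("[0-9]{3,6}", digits))  # ['333', '4444', '555555']
print(re.findall("[0-9]{3,}", digits))   # ['333', '4444', '5555555']
```

The {3,6} result cuts the run of seven 5s at six, and the leftover single 5 is too short to match on its own.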

bo*

book binding bookworms

bo+

book binding bookworms

b.*k

book binding bookworms

Multipliers like * are “greedy”: they match as many characters as they can while still letting the whole pattern succeed. So b.*k won’t just stop after hitting the first k. A k also satisfies the . (“any character”) inside .*, so the match runs from the first b all the way to the last k, giving book binding book.
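The three book patterns, run through Python’s re module as a quick check of what each one matches.

```python
import re

words = "book binding bookworms"

print(re.findall("bo*", words))   # ['boo', 'b', 'boo']  (a lone "b" counts: zero o's)
print(re.findall("bo+", words))   # ['boo', 'boo']       (needs at least one o)
print(re.findall("b.*k", words))  # ['book binding book'] (greedy: runs to the last k)
```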

b.*?k

book binding bookworms

Adding a question mark after the multiplier makes the matching lazy (non-greedy): now the match stops at the first k that lets it succeed, so the first match is just book.
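The lazy version in Python’s re module. Lazy multipliers like *? are a Perl-style feature; standard POSIX grep patterns don’t have them, although GNU grep’s -P option does.

```python
import re

words = "book binding bookworms"

# The ? after * makes it lazy: each match ends at the first "k" that works.
print(re.findall("b.*?k", words))  # ['book', 'binding book']

# re.search returns only the first match.
print(re.search("b.*?k", words).group())  # book
```

The second match is binding book because, after book, the search starts again at the next b (the one in binding), and the nearest k after it is the one in bookworms.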

 

That’s just the beginning.

There’s a lot more to learn about regex. Plenty of resources out there explain these things better than I can, and until I can do a better job, I suggest you seek them out.

I’m going to continue to look into regex, and I hope you do too!