Most people imagine that system administrators and programmers fiddle with knobs and diodes. Their goal? To reach into the virtual reality of the Internet, gathering the binary forces of code into the applications and infrastructure we all use today. Most people would be disappointed to learn that sysadmins and code monkeys more often poke at streams of text in hopes of getting the right response.

If you are a sysadmin or programmer and find yourself obsessively dipping into streams of text on a POSIX system, then you have probably either encountered grep, or come across a time you wished you could use this command.

Installing grep

The grep command, which is an initialism for global regular expression print, started its life as a personal utility script on the computer of the co-creator of UNIX, Ken Thompson. (This tool received wide release only because Thompson's department head, Doug McIlroy, asked for a tool “to look for stuff” in files.)

Since then, the grep command’s code has been written and rewritten by several different programmers, but its name has persisted. This situation is convenient because it means that no matter what UNIX or UNIX-like system you use, you have a grep command available. It can also be confusing, though: while most grep commands attempt to be interchangeable with one another, not all grep commands are exactly the same.

This article pertains specifically to GNU grep, the default grep on Linux systems, but here’s some information in case you’re using another member of the UNIX family.

  • BSD (and systems primarily using BSD tools): Ships with the BSD version of grep, but GNU grep is available in the ports tree.
  • Illumos: Some distributions ship with the Sun version of grep, which can differ from GNU’s and BSD’s versions. GNU grep is available from your Illumos distribution’s repository.
  • Solaris: Ships with the Sun version of grep, which can differ from GNU’s and BSD’s versions. GNU grep is available from OpenCSW.

And finally, you can get grep and many more commands on Windows by installing the open source Cygwin package, which provides a vast collection of GNU and open source tools.
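
Because implementations differ, it can help to check which grep you actually have before relying on GNU-specific options. Here is a minimal sketch, assuming your grep accepts --version (GNU grep does; other implementations may not). The variable name is just for illustration:

```shell
# Capture the first line of the version banner, discarding any error
# message in case this grep does not understand --version.
version_line=$(grep --version 2>/dev/null | head -n 1)

if [ -n "$version_line" ]; then
    printf 'Installed grep: %s\n' "$version_line"
else
    printf 'This grep does not report a version.\n'
fi
```

On a typical Linux system, the line names GNU grep along with its version number.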

Search for a string in text

The canonical use of grep is searching for a precise string of characters in some greater body of text, and returning the line or lines containing successful matches. Here’s an example:

$ grep BSD example.txt
NetBSD 
OpenBSD
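
To try this yourself, here is a minimal sketch that builds a small example.txt in a throwaway directory and then searches it. The file contents are an assumption pieced together from the output shown in this article, so the real file may differ:

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed contents of example.txt (not necessarily the original file).
printf '%s\n' 'Debian Linux' 'Fedora Linux' 'Mageia Linux' \
    'Slackware Linux' 'NetBSD' 'OpenBSD' > example.txt

# Print every line containing the string BSD.
matches=$(grep BSD example.txt)
printf '%s\n' "$matches"
```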

Search for a string in a stream of text

Another common way to use grep is with a pipe, making it a sort of filter. This technique has some advantages. One is helping to narrow grep's scope by searching through only the results of another process. For example, this command searches for iana only in the last 10 lines of example.com's source code, instead of searching the whole page:

$ curl example.com | tail | grep iana
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                             Dload  Upload   Total   Spent    Left  Speed
100  1270  100  1270    0     0   3177      0 --:--:-- --:--:-- --:--:--  3175
<p><a href="http://www.iana.org/domains/example">More information...</a></p>

This tactic also enables grep to be used in situations when it otherwise might not be effective. For instance, you can’t normally grep through a binary file, since binary files don’t contain much raw text. Yet, you can use a command such as strings to extract just the binary file’s plain text, and then use grep on the results like this:

$ strings example.xcf | grep gimp
gimp xcf v011
gimp-image-grid
gimp-image-metadata
gimprsttuvvutrqpljjiijlnv{
klmnmlkjgimpsvxwqjeehlnopg
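
The narrowing effect of a pipe is easy to see without a network connection. In this sketch, twelve made-up entries stand in for the output of a real command, and the variable names are only for illustration:

```shell
# Twelve made-up log entries stand in for another command's output.
entries=$(printf 'entry %s\n' 1 2 3 4 5 6 7 8 9 10 11 12)

# Searching the whole stream matches entry 1 as well as entries 10 to 12.
all_matches=$(printf '%s\n' "$entries" | grep 'entry 1')

# tail passes along only the last 10 lines (entries 3 through 12),
# so grep never sees entry 1.
tail_matches=$(printf '%s\n' "$entries" | tail | grep 'entry 1')

printf 'whole stream:\n%s\n\nlast ten lines:\n%s\n' "$all_matches" "$tail_matches"
```

The same pattern returns four lines from the full stream but only three once tail has trimmed the input.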

Add context to your results

A line of text is considered a string of characters terminated by a newline character. On a POSIX system, that is the line feed (LF) ASCII character, 0x0A or \n. Files created on Windows end each line with a carriage return (CR) followed by a line feed, 0x0D 0x0A or \r\n, and on a POSIX system grep treats that carriage return as part of the line. If your text file has extra long lines, then your results can contain a lot more data than you anticipate, since grep doesn’t lift the string out of context. It returns the entire line.

The --only-matching (or -o for short) grep option prints only the matching part of a line. For added context, use the --line-number option (-n for short) to see the line number where the matched pattern appears in the file. For example:

$ grep --only-matching --line-number Fedora example.txt
2:Fedora
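
Here is a short sketch of the difference between the default output and --only-matching. The file long.txt and its contents are made up for illustration:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample file with one deliberately long line (contents are made up).
printf '%s\n' \
    'Debian Linux' \
    'Fedora Linux appears on a line that keeps going and going to show how much text a single match can drag along' \
    > long.txt

# Default behavior: the entire matching line is printed.
whole_line=$(grep Fedora long.txt)

# --only-matching trims the output to the match itself, and
# --line-number prefixes it with the line's position in the file.
match_only=$(grep --only-matching --line-number Fedora long.txt)

printf 'default: %s\n' "$whole_line"
printf 'trimmed: %s\n' "$match_only"
```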

A common way to get context about how—or why—a pattern appears in a file is to view the line above the match, or the line just after it, or both. There’s a trio of options for doing this, and they’re as easy to remember as A-B-C (literally):

  • --after-context (or -A) displays a specified number of lines after your match
  • --before-context (or -B) displays a specified number of lines before your match
  • --context (or -C) displays a specified number of lines before and after your match

For example, to see two lines before a matched pattern:

$ grep baz -B2 metasyntactic.list
foo
bar
baz

To see three lines after a match:

$ grep baz -A3 metasyntactic.list
baz
qux
quux
quuz

And to see two lines both before and after a match:

$ grep -C2 baz metasyntactic.list
foo
bar
baz
qux
quux
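
The three context options can be compared side by side. This sketch recreates the first seven entries of the metasyntactic.list sample file shown later in this article. The variable names are only for illustration:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# First seven entries of the metasyntactic.list sample file.
printf '%s\n' foobar foo bar baz qux quux quuz > metasyntactic.list

before=$(grep -B2 baz metasyntactic.list)   # two lines before the match
after=$(grep -A3 baz metasyntactic.list)    # three lines after the match
around=$(grep -C2 baz metasyntactic.list)   # two lines on either side

# Combined with --line-number, the matching line is marked with a colon
# and each context line with a hyphen.
numbered=$(grep --line-number -C1 baz metasyntactic.list)
printf '%s\n' "$numbered"
```

The colon and hyphen markers make it easy to tell the match from its surroundings when the output is long.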

Search many files at once

The grep command is flexible enough that you don’t have to grep just one file at a time, or create a fancy for loop to cycle through each file you want to search. You can list more than one file as the target, or use a wildcard character to target multiple files. When it searches more than one file, grep prefixes each matching line with the name of the file it came from, like so:

$ grep Fedora distro.list example.txt fake.txt
distro.list:Fedora creates an innovative, free, and open source platform for hardware, clouds, and containers that enables software developers and community members to build tailored solutions for their users.
example.txt:Fedora Linux

You can get just the file’s name that contains matches with the --files-with-matches option (-l for short):

$ grep Fedora --files-with-matches distro.list example.txt fake.txt
distro.list
example.txt

And you can get the file’s name that contains no matches with --files-without-match (or -L):

$ grep Fedora --files-without-match distro.list example.txt fake.txt
fake.txt
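
The following sketch puts the multi-file options together. The three files reuse the names from the examples above, but their contents are shortened stand-ins:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened stand-ins for the three files used in this section.
printf '%s\n' 'Fedora creates an innovative platform' > distro.list
printf '%s\n' 'Debian Linux' 'Fedora Linux' > example.txt
printf '%s\n' 'Nothing to see here' > fake.txt

# Each matching line is prefixed with the name of its file.
lines=$(grep Fedora distro.list example.txt fake.txt)

# Names of files with at least one match.
with_match=$(grep --files-with-matches Fedora distro.list example.txt fake.txt)

# Names of files with no match at all.
without_match=$(grep --files-without-match Fedora distro.list example.txt fake.txt)

# A shell wildcard expands to every matching file name before grep runs.
txt_only=$(grep --files-with-matches Fedora *.txt)

printf '%s\n' "$lines"
```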

To search all files in all subdirectories of a specific folder, use --recursive or -r:

$ grep Norm --recursive --only-matching docbook-xsl-1.79.1
docbook-xsl-1.79.1/tests/refentry.007.xml:Norm
docbook-xsl-1.79.1/tests/refentry.007.xml:Norm
docbook-xsl-1.79.1/tests/refentry.007.xml:Norm
docbook-xsl-1.79.1/tests/refentry.007.ns.xml:Norm
...
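
If you don’t have a DocBook tree handy, a small directory is enough to see recursion at work. The docs directory and its files in this sketch are hypothetical:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A hypothetical two-level directory tree.
mkdir -p docs/tests
printf '%s\n' 'Normalize whitespace' > docs/readme.txt
printf '%s\n' 'Norm test case' 'nothing here' > docs/tests/case.xml

# Search docs and everything below it. The order in which grep visits
# files is not guaranteed, so sort the result to make it predictable.
found=$(grep --recursive --only-matching Norm docs | LC_ALL=C sort)
printf '%s\n' "$found"
```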

Ignore case

Sometimes you don’t know or care whether a string is lowercase or uppercase. Use --ignore-case (or -i if you’re lazy, or if it’s all your version of grep allows) for case-insensitivity. For example:

$ grep --ignore-case fedora example.txt 
Fedora Linux
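
A quick sketch shows what changes with and without the option. The contents of example.txt are assumed:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'Debian Linux' 'Fedora Linux' 'OpenBSD' > example.txt

# A lowercase pattern does not match the capitalized word, so grep
# prints nothing and exits with status 1.
strict=$(grep fedora example.txt)
strict_status=$?

# With --ignore-case, the same pattern matches.
relaxed=$(grep --ignore-case fedora example.txt)

printf 'strict: "%s" (exit status %s)\n' "$strict" "$strict_status"
printf 'relaxed: "%s"\n' "$relaxed"
```

The exit status is handy in shell scripts, where it lets you test for a match without parsing any output.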

Invert a match

Instead of returning all of the successful matches when searching a file, grep can be inverted such that it returns only non-matching lines. The POSIX option for this feature is -v, so it should work across most grep versions. The mnemonic-friendly option used by GNU is --invert-match. Here’s an example:

$ grep --invert-match Fedora example.txt
Debian Linux
Mageia Linux
Slackware Linux
NetBSD
OpenBSD
...
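
Inverting a match is still case-sensitive, which is easy to overlook. This sketch uses an assumed six-line example.txt:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed contents of example.txt.
printf '%s\n' 'Debian Linux' 'Fedora Linux' 'Mageia Linux' \
    'Slackware Linux' 'NetBSD' 'OpenBSD' > example.txt

# Every line except the one containing Fedora.
not_fedora=$(grep --invert-match Fedora example.txt)

# A lowercase pattern matches nothing here, so inverting it returns
# every line, including the Fedora one.
everything=$(grep --invert-match fedora example.txt)

# The short POSIX form, -v, behaves the same way.
not_linux=$(grep -v Linux example.txt)

printf '%s\n' "$not_fedora"
```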

Regular expressions

Regular expressions (or “regex”) are too big a topic to cover here, and are luckily covered quite well in another article. As that article states, regular expressions have been a part of the UNIX power user’s repertoire since the early days. It’s no surprise that grep makes great use of them. After all, “regular expression” is in its name.

The only regular expression many need in order to use grep is the absolute (literal) value of the string you’re searching for, which is what all the examples up to this point use. However, getting comfortable with regex makes grep even more powerful.

Not all regular expressions are the same. The term describes the process of inventing patterns to match something. Anyone can invent a schema for regular expressions, and so grep's regex is only one ruleset among many.

The best place to start learning regex for grep is its GNU info page, in the section called Regular Expressions. You can find this page with this command:

$ info grep "Regular Expressions"

The basic characters are these:

  • A dot (.) matches any single character. For instance, s.urce matches source, but not sauce: the dot stands in for exactly one character, and every other character must match literally.
  • A ? denotes that the preceding item in a regular expression is optional, and matched at most once. So qu?x matches qx or qux but never quux, because the u may appear at most once.
  • The * quantifier also applies to whatever precedes it, dictating that the item must match zero or more times. This behavior is different from the Bash shell’s use of *, which on its own matches zero or more of any character, so don’t get confused if you’re used to Bash.
  • The + character also modifies its preceding character, dictating that an item must match one or more times.
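
The four characters above can be compared against the same input. This sketch uses the -E option (explained in the next paragraph) and a made-up word list. The ^ and $ anchors force each pattern to match a whole line, so partial matches don’t muddy the results:

```shell
# Candidate strings, one per line (made up for illustration).
words=$(printf '%s\n' qx qux quux quix)

# u is optional: zero or one time.
optional=$(printf '%s\n' "$words" | grep -E '^qu?x$')

# u may appear zero or more times.
any_count=$(printf '%s\n' "$words" | grep -E '^qu*x$')

# u must appear one or more times.
at_least_one=$(printf '%s\n' "$words" | grep -E '^qu+x$')

# Exactly one character, whatever it is, between qu and x.
one_char=$(printf '%s\n' "$words" | grep -E '^qu.x$')

printf '?: %s\n' "$(printf '%s' "$optional" | tr '\n' ' ')"
printf '*: %s\n' "$(printf '%s' "$any_count" | tr '\n' ' ')"
printf '+: %s\n' "$(printf '%s' "$at_least_one" | tr '\n' ' ')"
printf '.: %s\n' "$(printf '%s' "$one_char" | tr '\n' ' ')"
```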

By default, grep interprets a pattern as a basic regular expression, in which ? and + are ordinary characters. To give them the meanings described above, activate extended regular expressions with the --extended-regexp option, or just -E for simplicity. This is such a common use case that most Linux distributions provide the shortcut egrep command to save you from having to type -E (although the GNU grep info page states that egrep is officially deprecated). Some regular expression syntax, such as the dot and *, works with grep without using the -E option, but don’t let that fact fool you into complacency. Characters such as *, ?, and square brackets are also special to your shell, which may expand an unquoted pattern into file names before grep can process it, so your regex won’t give you accurate results. Wrapping the pattern in single quotes avoids the problem.
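
Here is a sketch of both pitfalls: the difference -E makes to the + character, and what the shell does to an unquoted pattern. The sample strings and the file name bar.txt are made up for illustration:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

lines=$(printf '%s\n' 'a+b' 'aab')

# Without -E, + is an ordinary character, so only the line containing
# a literal plus sign matches.
basic=$(printf '%s\n' "$lines" | grep 'a+b')

# With -E, + means "one or more of the preceding item", so the pattern
# matches one or more a characters followed directly by b.
extended=$(printf '%s\n' "$lines" | grep -E 'a+b')

# The shell expands an unquoted b* to matching file names before any
# command sees it. Quoting passes the pattern through untouched.
touch bar.txt
unquoted=$(printf '%s\n' b*)
quoted=$(printf '%s\n' 'b*')

printf 'basic: %s\nextended: %s\n' "$basic" "$extended"
printf 'unquoted: %s\nquoted: %s\n' "$unquoted" "$quoted"
```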

Here is a sample file, called metasyntactic.list:

foobar
foo
bar
baz
qux
quux
quuz
corge
grault
garply
waldo
fred
plugh
xyzzy
thud
wibble
wobble
wubble
flob

And here are some grep searches based on regular expressions.

$ grep -E b.+ metasyntactic.list
foobar
bar
baz
wibble
wobble
wubble
$ grep -E ^b.. metasyntactic.list
bar
baz

A possibly less cryptic option than using pure regex is POSIX regex notation, which is compatible with grep. POSIX regex allows for human-recognizable keywords, such as [:alpha:] or [:digit:] to identify what you want to match. Writing effective patterns is as much a puzzle as ever using this format, but at least the individual pieces you’re working with are easier to decipher.

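These keywords are only valid inside a bracket expression, so a keyword, which carries its own pair of brackets, ends up inside double brackets. Because square brackets are also special to Bash, quote the pattern like you see here:

$ grep -E '^b[[:alpha:]]+' metasyntactic.list
bar
baz
$ grep -E 'b[[:alpha:]]+$' metasyntactic.list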
foobar
bar
baz
wibble
wobble
wubble
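
To experiment with these keywords, the sketch below recreates the full sample file and adds one extra search that is not from the examples above: pulling digits out of a made-up string with [[:digit:]].

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the metasyntactic.list sample file.
printf '%s\n' foobar foo bar baz qux quux quuz corge grault garply \
    waldo fred plugh xyzzy thud wibble wobble wubble flob \
    > metasyntactic.list

# Lines that start with b followed by one or more letters.
starts_with_b=$(grep -E '^b[[:alpha:]]+' metasyntactic.list)

# Lines where b is followed by one or more letters up to the end.
b_then_letters=$(grep -E 'b[[:alpha:]]+$' metasyntactic.list)

# Extract only the run of digits from a made-up line of text.
digits=$(printf '%s\n' 'room 101' 'lobby' | grep -E --only-matching '[[:digit:]]+')

printf '%s\n' "$starts_with_b"
printf 'digits: %s\n' "$digits"
```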

There’s a lot more to grep's regular expressions. By sitting down with samples of what you intend to match, and then cross-referencing the grep info pages, you can learn to use this command effectively.

Grepping for success

The grep command is complex and capable. It’s excellent for quickly finding snippets of text in all manner of files and streams of data. The more you use grep, the more comfortable you become with it; and the more comfortable you are with it, the more of its many options you’ll learn. Start using grep for your commands and shell scripts s.+ner[[:space:]]than[[:space:]]l.t[[:alpha:]]r!


About the author

Seth Kenlon is a Linux geek, open source enthusiast, free culture advocate, and tabletop gamer. Between gigs in the film industry and the tech industry (not necessarily exclusive of one another), he likes to design games and hack on code (also not necessarily exclusive of one another).
