
BASIC FEATURES OF UNIX OPERATING SYSTEM

• It is written in a high-level language, C, which makes it easy to port to different configurations.
• It is a good operating system, especially for programmers. The UNIX programming environment is unusually rich and
productive. It provides features that allow complex programs to be built from simpler programs.
• It uses a hierarchical file system that allows easy maintenance and efficient implementation.
• It uses a consistent format for files, the byte stream, making application programs easier to write.
• It is a multi-user, multiprocess system. Each user can execute several processes simultaneously.
• It hides the machine architecture from the user, making it easier to write programs that run on different
hardware implementations.
UNIX System Architecture
As do most computer systems, UNIX consists of two separable parts: the kernel and the system programs. We can view
the UNIX operating system as being layered, as shown in figure 1.
Figure 1 : UNIX System Architecture (layers listed from top to bottom)
Application programs created by users
Shells, editors and commands (who, wc, grep, comp); compilers and interpreters; system libraries
System call interface to the kernel
Kernel: signals, terminal handling, character I/O system, terminal drivers
Kernel: file system, swapping, block I/O system, disk and tape drivers
Kernel: CPU scheduling, page replacement, demand paging, virtual memory
Kernel interface to the hardware
Hardware: terminal controllers and terminals; device controllers, disks and tapes; memory controllers and physical memory
Everything below the system call interface and above the physical hardware is the Kernel. The Kernel provides the file
system, CPU Scheduling, memory management and other operating system functions through system calls.
Programs such as the shell (sh) and editors (vi) shown in the top layer interact with the Kernel by invoking a well-defined set
of system calls. The system calls instruct the Kernel to do various operations for the calling programs and exchange data
between the Kernel and the program.
System calls for UNIX can be roughly grouped into three categories: file manipulation, process control and information
manipulation. Another category can be considered for device manipulation, but, since devices in UNIX are treated as
(special) files, the same system calls support both files and devices.
FILE STRUCTURE
A file in UNIX is a sequence of bytes. Different programs expect various levels of structure, but the Kernel does not
impose any structure on files, and no meaning is attached to their contents - the meaning of the bytes depends solely
on the programs that interpret the file. This is true not just of disk files but of peripheral devices as well.
Magnetic tapes, mail messages, characters typed on the keyboard, line printer output, data flowing in pipes - each of
these is just a sequence of bytes as far as the system and the programs in it are concerned.
Files are organized in tree-structured directories. Directories are themselves files that contain information on how to
find other files. A path name to a file is a text string that identifies a file by specifying a path through the directory
structure to the file. Syntactically it consists of individual file name elements separated by the slash character. For
example, in /usr/Akshay/data, the first slash indicates the root of the directory tree, called the root directory. The next
element, usr, is a subdirectory of the root, Akshay is a subdirectory of usr, and data is a file or a directory in the
directory Akshay.
Figure 2 shows a typical UNIX file system.
The file system is organised as a tree with a single root node called root (written "/"). Every non-leaf node of the file
system structure is a directory of files, and files at the leaf nodes of the tree are either directories, regular files or
special device files. /dev contains device files, such as /dev/console, /dev/lp0, /dev/mt0 and so on; /bin contains the
binaries of essential UNIX system programs.
The system calls creat, open, read, write, close, unlink and trunc are used for basic file manipulation. The creat
system call, given a path name, creates an (empty) file (or truncates an existing one). An existing file is opened by the
open system call, which takes a path name and a mode (such as read, write or read-write) and returns a small integer,
the file descriptor, which may then be passed to a read or write system call (along with a buffer address and a number
of bytes to transfer) to perform data transfer to or from the file.
A file descriptor is an index into a small table of open files for this process. Descriptors start at 0 and seldom get higher
than 6 or 7 for typical programs, depending on the maximum number of simultaneously open files.
Each read or write updates the current offset into the file, which is associated with the file table entry and is used to
determine the position in the file for the next read or write.
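The descriptor and offset behaviour described above can be tried out with a short sketch. It assumes a POSIX system and uses Python's os module, whose open, write, read, lseek and close functions are thin wrappers around the system calls of the same names. The function name descriptor_demo and the scratch file name are made up for illustration.

```python
import os
import tempfile


def descriptor_demo():
    """Write ten bytes, then read them back in two pieces through one descriptor."""
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "data")  # hypothetical scratch file
        # O_CREAT creates the file if needed; O_TRUNC empties an existing one.
        fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.write(fd, b"0123456789")                 # offset moves from 0 to 10
            after_write = os.lseek(fd, 0, os.SEEK_CUR)  # report the current offset
            os.lseek(fd, 0, os.SEEK_SET)                # rewind to the start
            first = os.read(fd, 4)                      # reads bytes 0..3
            second = os.read(fd, 4)                     # continues at offset 4
        finally:
            os.close(fd)
    return fd, after_write, first, second


fd, after_write, first, second = descriptor_demo()
print(fd, after_write, first, second)
```

The descriptor printed is a small non-negative integer; its exact value depends on how many files the process already has open. The second read continues where the first one stopped because the kernel keeps the offset, not the program.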
Processing Environment
A process is a program in execution. Many processes can execute simultaneously on a UNIX system (this feature is
sometimes called multiprogramming or multitasking) with no logical limit to their number, and many instances of a
program (such as copy) can exist simultaneously in the system. Various system calls allow processes to create new
processes, terminate processes, synchronize stages of process execution, and communicate with the rest of the world.
For example, in UNIX new processes are created by the fork system call. Every process except process 0 is created when
another process executes the fork system call. The process that invoked the fork system call is the parent process and
the newly created process is the child process. Every process has one parent process, but a process can have many child
processes. The Kernel identifies each process by its process number, called the process ID (PID). Process 0 is a special
process that is created when the system boots; after forking a child process (process 1), process 0 becomes the swapper
process. Process 1, known as init, is the ancestor of every other process in the system.
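As a minimal sketch of the parent and child relationship, the following assumes a POSIX system and uses os.fork and os.waitpid from Python's os module, which wrap the underlying system calls. The function name spawn_child and the exit code 7 are arbitrary choices for illustration.

```python
import os


def spawn_child(exit_code=7):
    """Fork one child, let it exit at once, and collect its exit status."""
    pid = os.fork()
    if pid == 0:
        # This branch runs only in the child: fork returned 0 here.
        os._exit(exit_code)
    # This branch runs only in the parent: fork returned the child's PID.
    waited_pid, status = os.waitpid(pid, 0)
    return pid, waited_pid, os.WEXITSTATUS(status)


child_pid, waited_pid, code = spawn_child()
print("parent", os.getpid(), "created child", child_pid, "which exited with", code)
```

The single call to fork returns twice, once in each process, and the return value is how each copy learns which role it has.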
CPU SCHEDULING
CPU scheduling in UNIX is designed to benefit interactive processes. Processes are given small CPU time slices by a
priority algorithm that reduces to round-robin scheduling for CPU-bound jobs.
The scheduler on a UNIX system belongs to the general class of operating system schedulers known as round robin with
multilevel feedback, which means that the kernel allocates CPU time to a process for a small time slice, preempts a
process that exceeds its time slice and feeds it back into one of several priority queues. A process may need many
iterations through the "feedback loop" before it finishes. When the kernel does a context switch and restores the context
of a process, the process resumes execution from the point where it had been suspended.
Each process has an entry in the process table, and that entry contains a priority field used for process scheduling.
The priority of a process is lower if it has recently used the CPU, and higher if it has not.
The more CPU time a process accumulates, the lower (more positive) its priority becomes, and vice versa, so there is
negative feedback in CPU scheduling and it is difficult for a single process to take all the CPU time. Process aging is
employed to prevent starvation.
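The negative feedback can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: every process is CPU-bound, the priority number is simply the recent CPU usage (smaller is better), and aging is modelled by halving that usage every decay_every ticks. None of these formulas are taken from a real UNIX kernel, and the name simulate_feedback is made up.

```python
def simulate_feedback(names, ticks, decay_every=10):
    """Give each time slice to the process with the least recent CPU usage."""
    recent_cpu = {name: 0 for name in names}  # drives the priority
    total_cpu = {name: 0 for name in names}   # bookkeeping only
    schedule = []
    for tick in range(1, ticks + 1):
        # Smaller recent usage means better priority; ties go to list order.
        chosen = min(names, key=lambda name: recent_cpu[name])
        recent_cpu[chosen] += 1
        total_cpu[chosen] += 1
        schedule.append(chosen)
        if tick % decay_every == 0:
            # Aging: old CPU usage is gradually forgotten.
            for name in names:
                recent_cpu[name] //= 2
    return schedule, total_cpu


schedule, total_cpu = simulate_feedback(["A", "B", "C"], ticks=30)
print("".join(schedule))
print(total_cpu)
```

Because using the CPU worsens a process's priority, the three CPU-bound processes end up taking turns, which is the round-robin behaviour the text describes.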
Older UNIX systems used a 1-second quantum for the round-robin scheduling. 4.3BSD reschedules processes every 0.1
second and recomputes priorities every second. The round-robin scheduling is accomplished by the timeout
mechanism, which tells the clock interrupt driver to call a kernel subroutine after a specified interval; the subroutine to
be called in this case causes the rescheduling and then resubmits a timeout to call itself again. The priority
recomputation is also timed by a subroutine that resubmits a timeout for itself.
A process that has to wait gives up the CPU by going to sleep on an event. The kernel primitive used for this
purpose is called sleep (not to be confused with the user-level library routine of the same name). It takes an argument,
which is by convention the address of a kernel data structure related to an event that the process wants to occur before
that process is awakened. When the event occurs, the system process that knows about it calls wakeup with the address
corresponding to the event, and all processes that had done a sleep on the same address are put in the ready queue to
be run.
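The sleep and wakeup convention can be modelled in a few lines. This is a sketch of the idea only, not kernel code: the class name WaitChannels, the process names and the addresses 0x100 and 0x200 are all invented for the example.

```python
from collections import defaultdict, deque


class WaitChannels:
    """Toy model of sleeping on an address and waking every sleeper on it."""

    def __init__(self):
        self.sleeping = defaultdict(list)  # address -> processes asleep on it
        self.ready = deque()               # processes ready to run

    def sleep(self, process, address):
        # The process gives up the CPU until the event tied to address occurs.
        self.sleeping[address].append(process)

    def wakeup(self, address):
        # Every process asleep on this address moves to the ready queue.
        woken = self.sleeping.pop(address, [])
        self.ready.extend(woken)
        return woken


channels = WaitChannels()
channels.sleep("p1", 0x100)
channels.sleep("p2", 0x100)
channels.sleep("p3", 0x200)
print(channels.wakeup(0x100), list(channels.ready))
```

Note that wakeup releases all sleepers on the address at once; the process p3, asleep on a different address, is not affected.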
MEMORY MANAGEMENT
The CPU scheduling is strongly influenced by memory management schemes. At least part of a process must be
contained in primary memory to run; a process cannot be executed by a CPU if it exists entirely on secondary storage.
It is also not always possible to keep all active processes in main memory. For example, 4MB of main memory cannot
provide space for a 5MB process. It is the job of the memory management module to decide which processes should
reside (at least partially) in main memory, and to manage the parts of the virtual address space of a process that
reside on secondary storage devices. It monitors the amount of available physical memory and provides swapping of
processes between physical memory and secondary storage devices.
Swapping
Early UNIX systems transferred entire processes between primary memory and a secondary storage device, but did not
transfer parts of a process independently, except for shared text. Such a memory management policy is called swapping.
UNIX was first implemented on the PDP-11, where the total physical memory was limited to 256 Kbytes. The total
memory resources were insufficient to justify or support complex memory management algorithms. Thus, UNIX swapped
entire process memory images.
Allocation of both main memory and swap space is done first-fit. When the size of a process memory image increases
(due to either stack expansion or data expansion), a new piece of memory big enough for the whole image is allocated.
The memory image is copied, the old memory is freed, and the appropriate tables are updated. (An attempt is made in
some systems to find memory contiguous to the end of the current piece, to avoid some copying.) If no single piece of
main memory is large enough, the process is swapped out such that it will be swapped back in with the new size.
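First-fit allocation can be sketched as follows. The representation of free memory as a list of (start, length) holes and the name first_fit are assumptions made for this example; the text does not say how the kernel records free space.

```python
def first_fit(holes, size):
    """Allocate size units from the first hole that is big enough.

    holes is a list of (start, length) pairs in address order and is
    updated in place. Returns the start address, or None if nothing fits.
    """
    for index, (start, length) in enumerate(holes):
        if length >= size:
            if length == size:
                del holes[index]                               # hole used up exactly
            else:
                holes[index] = (start + size, length - size)   # shrink the hole
            return start
    return None  # no single piece is large enough


holes = [(0, 10), (20, 50), (100, 30)]
print(first_fit(holes, 25), holes)
```

A request that returns None corresponds to the case in the text where no single piece of main memory is large enough and the process has to be swapped out.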
There is no need to swap out a sharable text segment, because it is read-only, and there is no need to read in a sharable
text segment for a process when another instance is already in memory. That is one of the main reasons for keeping
track of sharable text segments: less swap traffic. The other reason is the reduced amount of main memory required for
multiple processes using the same text segment.
Decisions regarding which processes to swap in or swap out are made by the scheduler process (also known as the
swapper). The scheduler wakes up at least once every 4 seconds to check for processes to be swapped in or out. A
process is more likely to be swapped out if it is idle or has been in main memory for a long time, or is large; if no obvious
candidates are found, other processes are picked by age. A process is more likely to be swapped in if it has been
swapped out a long time, or is small. There are checks to prevent thrashing, basically by not letting a process be
swapped out if it has not been in memory for a certain amount of time.
If jobs do not need to be swapped out, the process table is searched for a process deserving to be brought in
(determined by how small the process is and how long it has been swapped out). Processes are swapped in until there
is not enough memory available.
Many UNIX systems still use the swapping scheme just described. All Berkeley UNIX systems, on the other hand, depend
primarily on paging for memory-contention management, and depend only secondarily on swapping. A scheme similar
in outline to the traditional one is used to determine which processes get swapped in or out, but the details differ and
the influence of swapping is less.
Demand Paging
Berkeley introduced demand paging to UNIX with BSD (Berkeley Software Distribution), which transferred memory pages
instead of whole processes to and from a secondary device; recent releases of the UNIX system also support demand
paging. Demand paging is done in a straightforward manner. When a process needs a page and the page is not there, a
page fault to the kernel occurs, a frame of main memory is allocated, and the page is then loaded into the frame by the
kernel.
The advantage of a demand paging policy is that it permits greater flexibility in mapping the virtual address space of a
process into the physical memory of a machine, usually allowing the size of a process to be greater than the amount of
available physical memory and allowing more processes to fit into main memory. The advantage of a swapping
policy is that it is easier to implement and results in less system overhead.
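The page-fault sequence can be simulated with a small sketch. The text does not name a replacement policy, so the example assumes the simplest one, first-in first-out, purely for illustration; the function name run_references and the reference string are also made up.

```python
from collections import OrderedDict


def run_references(references, frame_count):
    """Count page faults for a sequence of page references.

    Pages are loaded only when first touched (demand paging). When no
    frame is free, the page loaded earliest is evicted (FIFO, an assumption).
    """
    resident = OrderedDict()               # page -> frame, in load order
    free_frames = list(range(frame_count))
    faults = 0
    for page in references:
        if page in resident:
            continue                       # page already in memory: no fault
        faults += 1                        # page fault: the kernel takes over
        if free_frames:
            frame = free_frames.pop(0)     # allocate a free frame
        else:
            _, frame = resident.popitem(last=False)  # evict the oldest page
        resident[page] = frame             # load the page into the frame
    return faults, list(resident)


print(run_references([1, 2, 3, 1, 4, 1], frame_count=3))
```

With three frames the six references cause five faults; with four frames the same references cause only four, one per distinct page.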
FILE SYSTEM
The UNIX file system supports two main objects: files and directories. Directories are just files with a special format, so
the representation of a file is the basic UNIX concept.
Blocks and Fragments
Most of the file system is taken up by data blocks, which contain whatever the users have put in their files. Let us
consider how these data blocks are stored on the disk.
The hardware disk sector is usually 512 bytes. A block size larger than 512 bytes is desirable for speed. However,
because UNIX file systems usually contain a very large number of small files, much larger blocks would cause excessive
internal fragmentation. That is why the earlier 4.1BSD file system was limited to a 1024-byte (1K) block.
The 4.2BSD solution is to use two block sizes for files which have no indirect blocks: all the blocks of a file are of a large
block size (such as 8K), except the last. The last block is an appropriate multiple of a smaller fragment size (for example,
1024 bytes) to fill out the file. Thus, a file of size 18,000 bytes would have two 8K blocks and one 2K fragment (which
would not be filled completely).
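The arithmetic behind the 18,000-byte example can be checked with a short calculation. This sketch assumes the file is small enough to have no indirect blocks, as the text requires; the function name file_layout and the promotion of a completely full tail to a whole block are choices made for this example.

```python
def file_layout(file_size, block_size=8192, fragment_size=1024):
    """Return (full blocks, tail fragments, wasted bytes) for a small file."""
    full_blocks, tail = divmod(file_size, block_size)
    fragments = -(-tail // fragment_size)         # ceiling division
    if fragments == block_size // fragment_size:  # tail fills a whole block
        full_blocks, fragments = full_blocks + 1, 0
    allocated = full_blocks * block_size + fragments * fragment_size
    return full_blocks, fragments, allocated - file_size


print(file_layout(18000))
```

For 18,000 bytes this gives two full 8K blocks and a tail of two 1K fragments (one 2K piece) with 432 bytes unused, which matches the figures in the text.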
The block and fragment sizes are set during file-system creation according to the intended use of the file system: if many
small files are expected, the fragment size should be small; if repeated transfers of large files are expected, the basic
block size should be large. Implementation details force a maximum block-to-fragment ratio of 8:1, and a minimum
block size of 4K, so typical choices are 4096:512 for the former case and 8192:1024 for the latter.
Suppose data are written to a file in transfer sizes of 1K bytes, and the block and fragment sizes of the file system are 4K
and 512 bytes. The file system will allocate a 1K fragment to contain the data from the first transfer. The next transfer
will cause a new 2K fragment to be allocated. The data from the original fragment must be copied into this new
fragment, followed by the second 1K transfer. The allocation routines do attempt to find the required space on the disk
immediately following the existing fragment so that no copying is necessary, but, if they cannot do so, up to seven copies
may be required before the fragment becomes a block. Provisions have been made for programs to discover the block
size for a file so that transfers of that size can be made, to avoid fragment recopying.

Inodes
Associated with each file in UNIX is a little table (on disk) called an i-node. An inode is a record that describes the
attributes of a file, including the layout of its data on disk. Inodes exist in a static form on disk, and the kernel reads
them into main memory and manipulates them. Disk inodes consist of the following fields:
• File owner identifier - File ownership is divided between an individual owner and a group owner, and defines the
set of users who have access rights to a file. The superuser has access rights to all files in the system.
• File type - Files may be of type regular, directory, character or block special, or pipes.
• File access permissions - The system protects files according to three classes: the owner, the group owner of
the file, and other users; each class has access rights to read, write and execute the file, which can be set
individually. Although a directory is a file, it cannot be executed; execute permission for a directory gives the
right to search the directory for a file name.
• File access times - These give the time the file was last modified and the time it was last accessed.
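Most of these fields can be inspected from a program. The sketch below assumes a POSIX system and uses os.stat together with the standard stat module; the function name describe and the dictionary keys are chosen for this example only.

```python
import os
import stat


def describe(path):
    """Collect the inode fields listed above for one path."""
    info = os.stat(path)
    if stat.S_ISDIR(info.st_mode):
        kind = "directory"
    elif stat.S_ISREG(info.st_mode):
        kind = "regular"
    else:
        kind = "other"  # special files, pipes and so on
    return {
        "inode": info.st_ino,                        # i-node number
        "owner": info.st_uid,                        # individual owner
        "group": info.st_gid,                        # group owner
        "type": kind,
        "permissions": stat.filemode(info.st_mode),  # for example -rw-r--r--
        "modified": info.st_mtime,
        "accessed": info.st_atime,
    }


print(describe("."))
```

The permissions string shows the three classes in order (owner, group, others), each with its read, write and execute bits.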
In addition, the inode contains 15 pointers to the disk blocks containing the data contents of the file. The first 12 of
these pointers (as shown in figure 3) point to direct blocks; that is, they contain addresses of blocks that contain data of
the file. Thus, the data for small files (no more than 12 blocks) can be referenced immediately, because a copy of the
inode is kept in main memory while a file is open. If the block size is 4K, then up to 48K of data may be accessed directly
from the inode.
Figure 3 : Direct and indirect blocks of an inode
The next three pointers in the inode point to indirect blocks. If the file is large enough to use indirect blocks, the indirect
blocks are each of the major block size; the fragment size applies only to data blocks. The first indirect block pointer is
the address of a single indirect block. The single indirect block is an index block, containing not data, but rather the
addresses of blocks that do contain data. Then, there is a double-indirect-block pointer, the address of a block that
contains the addresses of blocks that contain pointers to the actual data blocks.
The last pointer would contain the address of a triple indirect block; however, there is no need for it. The minimum
block size for a file system in 4.2BSD is 4K, so files with as many as 2^32 bytes will use only double, not triple, indirection.
That is, as each block pointer takes 4 bytes, we have 49,152 (4K x 12) bytes accessible in direct blocks, 4,194,304 bytes
accessible by a single indirection, and 4,294,967,296 bytes reachable through double indirection, for a total of
4,299,210,752 bytes, which is larger than 2^32 bytes.
The number 2^32 is significant because the file offset in the file structure in main memory is kept in a 32-bit word. Files
therefore cannot be larger than 2^32 bytes. Since file pointers are signed integers (for seeking backward and forward in a
file), the actual maximum file size is 2^(32-1) = 2^31 bytes. Two gigabytes is large enough for most purposes.
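The byte counts quoted above follow from the block size and the pointer size, and can be recomputed with a few lines. The function name inode_reach and its parameter names are chosen for this example; the default values are the ones given in the text.

```python
def inode_reach(block_size=4096, pointer_size=4, direct_pointers=12):
    """Return bytes reachable via direct, single and double indirection, and their total."""
    pointers_per_block = block_size // pointer_size   # 1024 with the defaults
    direct = direct_pointers * block_size
    single = pointers_per_block * block_size
    double = pointers_per_block ** 2 * block_size
    return direct, single, double, direct + single + double


direct, single, double, total = inode_reach()
print(direct, single, double, total)
print(total > 2 ** 32)
```

Double indirection alone already reaches exactly 2^32 bytes with 4K blocks, which is why the triple indirect pointer is never needed with a 32-bit file offset.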
Directory Structure
Before a file can be read, it must be opened. When a file is opened, the operating system uses the path name supplied
by the user to locate the disk blocks, so that it can read and write the file later. Mapping path names onto i-nodes (or
the equivalent) brings us to the subject of how directory systems are organized. These vary from quite simple to
reasonably sophisticated.
Now let us consider some examples of systems with hierarchical directory trees. Figure 4 shows an MS-DOS directory
entry. It is 32 bytes long and contains the file name and the first block number, among other items. The first block
number can be used as an index into the FAT to find the second block number, and so on. In this way all the blocks of a
given file can be found. Except for the root directory, which is of fixed size (112 entries for a 360K disk), MS-DOS
directories are files and may contain an arbitrary number of entries.
The directory structure used in UNIX is extremely simple, as shown in figure 5. Each entry contains just a file name and
its i-node number. All the information about the type, size, times, ownership, and disk blocks is contained in the i-node
(see figure 3). All directories in UNIX are files, and may contain arbitrarily many of these entries.
Figure 5: A UNIX directory entry
When a file is opened, the file system must take the file name supplied and locate its disk blocks. Let us consider how
the path name /usr/ast/mbox is looked up. We will use UNIX as an example, but the algorithm is basically the same for
all hierarchical directory systems. First the file system locates the root directory. In UNIX its i-node is located at a fixed
place on the disk.
Then it looks up the first component of the path, usr, in the root directory to find the i-node of /usr. From this i-node,
the system locates the directory for /usr and looks up the next component, ast, in it. When it has found the entry for
ast, it has the i-node for the directory /usr/ast. From this i-node it can find the directory itself and look up mbox. The
i-node for this file is then read into memory and kept there until the file is closed. The lookup process is illustrated in
figure 6.

Figure 6: The steps in looking up /usr/ast/mbox


Relative path names are looked up the same way as absolute ones, only starting from the working directory instead of
starting from the root directory. Every directory has entries for "." and "..", which are put there when the directory is
created. The entry "." has the i-node number of the directory itself, and the entry ".." has the i-node number of the
parent directory. Thus a lookup of a name such as ../disk simply looks up ".." in the working directory, finds the i-node
number for the parent directory, and searches that directory for disk. No special mechanism is needed to handle these
names. As far as the directory system is concerned, they are just ordinary ASCII strings.
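The lookup procedure can be sketched with a tiny in-memory model. Everything here is hypothetical: the table INODES stands in for the disk, the i-node numbers 1, 6, 26 and 60 are invented, and lookup is a name chosen for this example. Directories are modelled as dictionaries from names to i-node numbers, which is all a UNIX directory entry holds.

```python
ROOT_INODE = 1  # assumed fixed i-node number of the root directory

# Toy disk: i-node number -> directory (dict of name -> i-node) or file data.
INODES = {
    1: {".": 1, "..": 1, "usr": 6},
    6: {".": 6, "..": 1, "ast": 26},
    26: {".": 26, "..": 6, "mbox": 60},
    60: b"mail data",
}


def lookup(path, working_inode=ROOT_INODE):
    """Return the i-node number for path, walking one component at a time."""
    # Absolute names start at the root, relative names at the working directory.
    inode = ROOT_INODE if path.startswith("/") else working_inode
    for component in path.split("/"):
        if not component:
            continue  # skip the empty string produced by a leading slash
        directory = INODES[inode]
        if not isinstance(directory, dict):
            raise NotADirectoryError(component)
        if component not in directory:
            raise FileNotFoundError(path)
        inode = directory[component]  # "." and ".." are ordinary entries
    return inode


print(lookup("/usr/ast/mbox"))
print(lookup("../ast/mbox", working_inode=26))
```

Both calls reach the same i-node. The relative name needs no special case for "..", since it is found in the directory like any other name.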

User Names and Groups
Every UNIX user is given a name when he is allowed access to a UNIX system. This is also called an account, as in
commercial arrangements an account is kept of the usage of the machine by each user. The user name need not have
any relation to the actual name of the user, though it quite often is some abbreviation of the name. For example a
person called Ram Kumar would usually be given a user name kumarr on a UNIX system. This is formed by his surname
(abbreviated if it is too long) and the first letter of his first name. It is quite possible for a person to have more than one
account on a single machine (in a different name) especially if the person uses the machine in more than one capacity.
For example Ram Kumar might be working on two programming projects, in both of which he is part of a team. On his
cryptography project he might have a name crypt02 and on his natural language processing project he might have a
name like nlp04. The reason for this kind of arrangement has to do with his access rights and privileges in his different
projects. Also if Ram Kumar leaves the company his successor Zafar Khan might continue his work on nlp04 while
somebody else is assigned to work on crypt02. One of the things that motivated the designers of UNIX was their desire
for easy sharing of information, consistent with the needs of security and privacy. So UNIX allows user names to be
grouped together under a common group name. All users belonging to the same group can share group privileges.
Some user names are reserved by UNIX for its use, for example bin and uucp. So you cannot use these names for
yourself. There is also a special kind of user on every UNIX system who has all possible access rights on the system. This
user is called the super user, the system administrator or simply root because that is the user name conventionally
allotted to him. For administrative convenience large systems can have more than one super user account. The super
user is the one who can create new user accounts, shut down the system and perform other maintenance tasks.
You might be wondering why everybody cannot access the computer as root. The reason is that when you are granted
access to a computer system you are assigned a user name as well as a password. You can set your password to
whatever you want subject to certain constraints. So you cannot enter the computer as root unless you know the root
password. The root password is zealously protected on any well maintained installation as public knowledge of this
password would compromise the security of the installation.
While root can access all user files and override any system protection meant for mere mortals, nobody can figure out
what your password is. However root can change or remove your password.
Logging in
You will now learn how to gain access to a UNIX system so that you can use its facilities. This process is called logging in
to the computer. To be able to login to a machine you must have a valid user account on it and you must know your
password. Your account would have been created for you by the system administrator when you were allowed to use
the computer. At that time you would also have been told your first password. When you see your terminal it would be
displaying a message like
IGNOU UNIX computer
login:
The actual message on the first line depends on the installation. This could even be absent. The message does not affect
anything else you do in any way.
You should now type in your user name and press the RETURN key. In most cases you have to press the return key for
the computer to register what you have typed. This key is sometimes labelled as ENTER. You will find that as you type on
the terminal screen you will be able to see whatever you have typed. This is because UNIX usually echoes whatever you
type on the terminal. So your screen should now look like this
IGNOU UNIX computer
login: kumarr
Password:
You must type in your user name, also called the login name, exactly as allocated by the super user. This is because UNIX
is case sensitive, that is, it distinguishes between lower case and upper case letters. In this respect it differs from
operating systems like VMS. So be careful of small and capital letters while working on UNIX.
When UNIX asks you for your password, key it in carefully. Notice that your password is not echoed as you type. In fact
the cursor does not move at all. This is to prevent somebody from reading your password over your shoulder, as that
would enable that person to masquerade as you by logging into the computer in your name and using it.
UNIX now checks whether you are a valid user and whether you entered the correct Password. If there is any mistake
you get a message saying
Login incorrect
login:
This means you can try to login again. There can be other reasons why you might not be able to login even though you
are a valid user and did not make any typing mistakes. The messages you get in those situations will however be
different.
Why be so pessimistic? Let us assume you have managed to login successfully. The system will then display some
messages and finally give you a sign that it is now ready to obey your commands. The messages you see depend on how
the system has been configured or set up by the system administrator and by you. So you might not even see any
messages. However usually there is a message indicating when you logged in last. This is useful because if the date and
time mentioned there are different from what you remember about your last login, it could mean that somebody else is
using your account.
Let us now look at some of the other common types of messages you see on most systems as you login. These usually
give some information about the system like the space available on the machine, news about the system and whether
you have any mail. The news is called the message of the day and appears whenever you log in. The message
You have mail.
tells you that mail is waiting for you.
After the login messages you see a prompt, which is the sign that UNIX is ready for your commands. The prompt can be
changed to whatever you like but the default prompt also depends on what shell you have been assigned. One of the
most common shells is the C shell, which has the following prompt by default
%
This is the prompt we will use throughout the block unless some other prompt is explicitly called for. When you see the
prompt on your terminal it normally means that UNIX has finished executing the last command you gave it and is ready
for your next command.
On some UNIX installations there is a limit on the number of attempts, say five, you can make at logging in. The action
taken depends on the installation but can be alerting the system administrator or deactivating the terminal, perhaps for
a short time only. So you should be careful not to make too many typing mistakes. In particular be careful not to forget
or mistype your password and avoid passwords with certain characters like #.
Correcting Typing Mistakes
Many of us are not professional typists and we make a lot of mistakes while typing. In any case all of us are human
beings and are prone to error. Whether you are a one or two finger expert or know touch typewriting, you are going to
mistype your commands some time or the other. What do you do when you want to find out in a session whether you
have any Sundays left in the month? Normally you would use the cal command thus
% cal
Suppose now that by mistake you type
% csl
After you press the return key UNIX will say
csl: Command not found
if you are lucky and a command csl does not exist. If it does it will be executed and you could well be in deep trouble
depending on what csl does.
You would therefore do better to cancel your command or correct your mistake. These actions can be accomplished by
using the kill and erase characters respectively. The kill character cancels the entire line you typed.
On most UNIX installations you can use the backspace key to erase the previous character. On some terminals the erase
character is ^H. This means that you have to type H while holding down the CONTROL key. This key is usually marked
CTRL or CTL. It is usually located on both sides of the keyboard near the Shift keys. In this block we will write ^H to mean
CONTROL-H and you must be careful not to confuse this with the two separate characters ^ (circumflex) and H.
Every time you press the erase key the cursor moves back one character after deleting the last character. So to correct
your mistake
% csl
you should press the erase character twice so that you see
% c
and then retype 'a' and 'l' correctly. You can then press the ENTER key to run the cal command.
The line kill character tells UNIX to kill the line, that is, to ignore everything on the line. You do not get the prompt after
typing this character unless you press the RETURN key. The line kill character is usually @ but can be changed to
something else.
There is a command called stty which enables you to see what the erase and line kill character are. It also allows you to
change them if you wish. The command allows you to examine and alter many other terminal settings as well, but for
the moment we will not consider anything else other than erase and line kill. Just type in
% stty
and observe the output. It will, among other things, say something like
erase ^H kill @
This means that your erase character is ^H and your line kill character is @.
Suppose you want to change your kill character to ^X. You can do this by running the command
% stty kill ^X
Now typing @ has no effect on the command you type other than putting an @ as part of your command. It no longer
kills your command line. You can similarly set your erase character to # by saying
% stty erase #
Both the settings can be changed at one stroke by saying
% stty kill @ erase ^H
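Since an ill-chosen erase or kill character can make a session awkward to use, you may find it helpful to save your terminal settings before experimenting. The following is only a sketch, assuming a Bourne-style shell and an stty that supports the POSIX -g and -a options; the variable names saved and status are illustrative, and the block does nothing if it is not run from a terminal.

```shell
# Save and restore terminal settings; only meaningful on a real terminal.
if [ -t 0 ]; then
    saved=$(stty -g)        # snapshot of all settings in a form stty can read back
    stty kill '^X'          # change the line kill character to CONTROL-X
    stty -a                 # display all settings, including erase and kill
    stty "$saved"           # put everything back exactly as it was
    status="restored"
else
    status="no terminal"
fi
echo "$status"
```

Quoting characters such as ^X or # in this way is a safe habit, because some shells attach a meaning of their own to them.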
Any changes you make using stty will remain in effect only for the duration of your current login session or until you use
the command to make more changes. Next time you log in the characters revert to whatever the system administrator
has configured them to be. You might wonder what will happen if you set the erase character to something like 'l'. You can
see this in the Bourne shell which has the default prompt of a $ sign. Try
% sh
$ stty erase l
and now try running the cal command. You will not be able to convey cal to the computer because 'l' is taken as an
instruction to erase the previous character. It is for this reason that the erase and line kill characters are usually not set
to any characters you use commonly as part of commands. Thus it is better not to set them to letters, digits or hyphens
or even other commonly used special characters.
However if you insist on using 'l' as your erase character you can still run the cal command by typing
$ ca\l
The '\' is called the escape character because it turns off any special meaning attached by the system to the character
immediately after it. This act is called escaping the character. If the character immediately after '\' has no special
meaning then the '\' has no effect. So you can type
$ \c\a\l
which will have the same effect as
$ ca\l
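The shell applies the same backslash rule to its own special characters, which gives you a safe way to try out escaping without touching your terminal settings. This is a minimal sketch, assuming a Bourne-style shell; the variable names word and text are illustrative.

```shell
# Outside quotes the shell drops each backslash and keeps the character after it.
word=\c\a\l
echo "$word"            # prints: cal

# Escaping a character that IS special changes its meaning: an unescaped ';'
# would end the command, while an escaped one is ordinary text.
text=date\;who
echo "$text"            # prints: date;who
```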
Format of UNIX Commands
We will now look at the general format of UNIX commands and take the opportunity to study some simple commands.
Let us go back to the C shell and the cal command which we mentioned earlier.
$ ^D
% cal
This gives the calendar for the current month and year (of course this will depend on what the system date has been set
to and is what the computer will believe to be the current month and year), and you can use it to, say, find out how many
Sundays are left in the month. Another simple command is
% date
which displays the current system date and time. You will realise that the computer has no way of knowing what the
current date and time really are, so what it can tell you is only what it thinks is the current date and time. This can be set
by the system administrator to almost anything but in most installations, especially those that are networked with other
computers, care is taken to see that the date is set correctly. The date in UNIX means date and time, so the output of the
command is something like
Wed Jun 15 13:44:39 IST 1994
Notice that the time zone is part of the output. This is significant when you are on a network spanning time zones.
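If you want to see the effect of the time zone for yourself, you can ask date for the same instant in two forms. This is a small sketch, assuming a date command that accepts the POSIX -u option and '+' format strings; the variable names are illustrative.

```shell
# Capture the system's idea of the current date and time.
now_local=$(date)          # local time zone, as configured on this machine
now_utc=$(date -u)         # the same instant expressed in UTC
weekday=$(date '+%A')      # just the name of the day of the week

echo "local: $now_local"
echo "UTC:   $now_utc"
echo "today is $weekday"
```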
Another simple one word command is
% who
kumarr tty03 Jun 15 11:49
khanz tty05 Jun 15 10:22
nlp04 tty07 Jun 14 23:36

This tells you the names of the users currently logged in to the system, their terminal numbers and the date and time they logged
in. You will find that you will always be listed as one of the users, since you usually run the commands only when you are
logged in to the machine. There is another form of the who command which you can now try out.
% who am i
kumarr tty03 Jun 15 11:49
This time we have given the arguments am i to the actual command who. The result is similar to that obtained earlier, but
now you are the only user listed. This command has the effect of telling you the login name of the user currently logged
in at that terminal, the terminal number of the terminal and the date the user logged in. Other users of the system are
not listed.
Arguments to commands are separated from the command by one or more spaces. It might seem silly to ask the
computer who you are, but if the previous user has not terminated his session, you can find who it was by this
command. But you would do well never to leave your terminal unattended while you are logged in, as it would be a
security lapse. Some versions of UNIX provide the command
% who are you
which is synonymous with who am i, but sounds much more intelligent.
You have now seen the general format of UNIX commands, which consists of the basic command followed by zero or
more arguments. The command and the various arguments are separated by one or more spaces and the whole
sequence is terminated by the newline character, which is produced when the ENTER key is pressed.
You can enter more than one command on the same line by separating the commands from one another with
semicolons like this
% date; who
Wed Jun 15 14:02:11 IST 1994
kumarr tty03 Jun 15 11:49
nlp04 tty07 Jun 14 23:36
crypt02 tty08 Jun 15 13:57
The commands are executed one after the other in the order they were specified on the command line. After the last
command is over you get the prompt again.
Arguments to commands should not contain spaces, otherwise the different words of the argument would be
interpreted as different arguments by the computer. If for some reason the argument needs to contain a space, you
must enclose the argument in double quotes (") or in single quotes (').
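You can watch how spaces and quotes change the number of arguments a command receives. The sketch below assumes a Bourne-style shell; count_args is a hypothetical helper written only for this illustration.

```shell
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() {
    echo "$#"
}

unquoted=$(count_args Ram Kumar)      # two words, so two arguments
double=$(count_args "Ram Kumar")      # one argument containing a space
single=$(count_args 'Ram Kumar')      # single quotes have the same effect here

echo "$unquoted $double $single"      # prints: 2 1 1
```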
Most arguments to commands are filenames (discussed later in this unit), options or expressions. All of these could
occur in the same command. The exact order in which these arguments are listed can depend on the command and
should be ascertained by examining the documentation for that command. Usually options immediately follow the
command with the expressions and filenames coming next. You will see details of such cases later when we study more
complex commands than the ones we have looked at so far.
If an argument itself contains quotes of one kind you can enclose it in quotes of the other kind. Thus
% grep -n "Ram Kumar's Salary" employee payroll
looks for the expression Ram Kumar's Salary in the files employee and payroll and prints the line numbers of the lines in
which the expression is found.
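To try the grep example without real employee records, you can build two small files first. This is a sketch under stated assumptions: the file contents are made up for illustration, mktemp is assumed to be available for creating a scratch directory, and a Bourne-style shell is used.

```shell
# Create two sample files in a scratch directory (contents are invented).
tmpdir=$(mktemp -d)
printf '%s\n' 'Name: Ram Kumar' "Ram Kumar's Salary: 5000" > "$tmpdir/employee"
printf '%s\n' 'Header' 'Other line' "Ram Kumar's Salary paid" > "$tmpdir/payroll"

# Double quotes keep the spaces and the apostrophe inside one argument.
matches=$(cd "$tmpdir" && grep -n "Ram Kumar's Salary" employee payroll)
echo "$matches"

rm -r "$tmpdir"
```

Each line of output names the file, then the line number, then the matching line itself.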
Sometimes the shell places restrictions on the use of certain characters because it interprets them in some special way.
To use these characters in arguments, you have to use quotes. The details of the C-shell are discussed in unit 4.
Changing Your Password
You saw earlier that your password was the only way of preventing somebody else from using your account on the
system. Without it anybody who knew your login name could walk up to the machine and start using your account. This
would be really serious in the case of the super user or root.
When you are first given your account you are told what your password is. On some installations your account is set up
without a password and you are asked to choose one for yourself the first time you login. This can be done with the
command
% passwd
Changing password for kumarr
Old password: pi=3.14
New password: exp1=2.71
Re-enter new password: exp1=2.71
Note that unlike the commands you saw so far, the passwd command is interactive. It asks you to enter some
information rather than doing all the work by itself. The first thing it asks for is your current (or old) password. This is to

make sure that somebody else cannot change your password while you have left your terminal unattended. If the wrong
password is entered here, the computer says Sorry and gives you back the prompt.
If you enter the old password correctly you are asked to type in your new password. After that you are asked to enter it
again. If you type two different things here, the system tells you that they do not match and asks you to try again. If you
keep getting mismatches the command terminates after telling you to try again later. This is because if you cannot
change your password you are unlikely to be able to enter it correctly to login.
Although in the example above we have shown the passwords, in an actual session none of the passwords will be
echoed. Your system will probably have restrictions on what passwords you can choose. The password should not be too
short or too long. You should change it periodically so that if someone has been using your account by laying hands on
your password, they cannot continue to do so indefinitely.
If you are wondering how the passwords are stored on the machine such that even the super user cannot find out what
your password is, the answer is that UNIX encrypts your password before storing it. This means that what is stored on
the computer bears no resemblance at all to what you typed in as your password. When you try to login the next time,
UNIX again encrypts the password you type in and compares it with what has been stored.
If the two are the same, you are allowed to login, otherwise your attempt is blocked. So while a super user, or anyone
else for that matter, can read your encrypted password, nobody can find out what the actual password is, at least not
easily.
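The idea of storing only a scrambled form and comparing scrambled forms at login can be imitated in a few lines. This is purely an illustration and not how any real system does it: cksum is an ordinary checksum rather than a secure password function (real systems use a dedicated routine, traditionally crypt with a salt), and the names scramble, stored and check_login as well as the sample password are all hypothetical.

```shell
# scramble stands in for the system's one-way function. cksum is only a
# checksum, NOT a secure password function; it is used here for illustration.
scramble() {
    printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

stored=$(scramble 'new-secret')     # the only thing kept on disk

# check_login succeeds only if the typed password scrambles to the stored value.
check_login() {
    [ "$(scramble "$1")" = "$stored" ]
}

if check_login 'new-secret'; then echo "login allowed"; fi
if ! check_login 'guess'; then echo "login blocked"; fi
```

Note that the comparison never needs the original password, only its scrambled form.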
There is another form of the passwd command which is used to change the password of some other user. By default the
passwd command allows you to change the password of the user who is logged in to the terminal. So to change the
password of khanz, you could say
% passwd khanz
where the name of the user whose password is to be changed forms the argument to the passwd command. The rest of
the behaviour of the command is just as before. You will now realise that you can change the password of any user,
including your own, only if you know his current password. On your system you might simply get the following message
if you try to change somebody else's password
Permission denied.
How then does root have the power to change your password? Ah! When the user executing the passwd command is
the super user, UNIX does not ask it to supply the old password. This is how the super user can change your password to
anything without knowing what it is currently.
UNIX Documentation
UNIX comes with copious documentation, some of which is often available on-line. You should learn how to use the
UNIX manuals. While we will not discuss this topic in detail here, you will have to acquire this skill if you want to obtain a
good understanding of UNIX. This is because in this block we do not have the space to consider any but the most basic
commands, and even those only briefly. We will not even be able to consider all the options available with many of the
commands that we do discuss. The only way for you to master them will be by consulting the documentation.
If any documentation is available on-line at your installation, you can look up the manual entry for a command by using
the man command. For example, to learn more about the who command than what we have talked of, say
% man who
You can similarly learn more about the date, cal or any other command. So to learn more about the man command
itself, say
% man man
If the documentation is not on-line you will have to use the printed UNIX manuals.
FILES AND DIRECTORIES
In this section we will describe the file and directory structures of UNIX. Just as a paper file is something into which you
can put papers and bunch a group of papers together, a UNIX file is something into which you can put data. A file has a
name, and this name is a property of the file rather than the data present in it at any given time. It is possible to change
the data in a file. Thus UNIX commands can be made to operate on the data in a file as a group.
A file usually exists on the hard disk(s) of the computer. This will be the case when you are logged in to the machine and
are engaged in a session. The actual areas of the hard disk used by a file can change as the file is increased and
decreased in size. As you will see later, the size of a file in UNIX has a precise technical meaning, and the size of a file
does not necessarily tell you the actual amount of data in it.
UNIX has three kinds of files-- ordinary, directory and special. You have already got an idea of what ordinary files are.
Special files will be discussed in unit 5 of this block.
Directory files contain information about other files, including other directories or special files. A directory groups
its contents together hierarchically under itself, and a directory within a directory is called a subdirectory of the
directory at the higher level, also called the parent directory. Thus in UNIX the file system is like an inverted tree of
directories, starting as a root and going down to an arbitrary depth of hierarchically arranged levels.
We will now look at some of the files in UNIX and learn how to use the file structure.
Current Directory
Every user who is given an account on a UNIX system is also given a directory where he reaches on logging in. This
directory is also called the home directory. The current, working or current working directory is the directory in which
you are currently located. On logging in, your current directory is normally your home directory. You can find out what
your current directory is at any time by using the command
% pwd
/usr/kumarr
This means that your current directory is called kumarr and is located under the directory usr, which is in turn located
under the root directory. Of course the actual home directory you are allotted will depend on your installation. By the
way pwd is one of the few UNIX commands which do not take any arguments or options.
The output that pwd displays is called the full pathname of your current working directory. This is also known as the
complete or absolute pathname, that is, the pathname starting from root. You can refer to your directory by just saying
kumarr. But this is not unambiguous because there can be another directory called kumarr under some other directory
as well. But no two directories or files on the same UNIX machine can have the same complete or full pathname. The
various components of the path are separated from one another by slashes ('/').
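The standard basename and dirname commands split a pathname at its last slash, which makes the structure of a full pathname easy to see. This is a small sketch using the sample path from the text; the variable names are illustrative.

```shell
full=/usr/kumarr                # the sample pathname used in the text
name=$(basename "$full")        # last component: kumarr
parent=$(dirname "$full")       # everything before it: /usr
echo "$name lives under $parent"

# A leading slash is what marks a pathname as absolute.
case $full in
    /*) kind=absolute ;;
    *)  kind=relative ;;
esac
echo "$full is $kind"
```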
We have not yet talked of what a valid filename can be. Actually in UNIX there are almost no restrictions (only the slash,
which separates the components of a pathname, cannot appear in a name) and a filename can
have any characters up to a maximum of 14. The same rules apply to directories as well. In some UNIX implementations
filenames can be of any arbitrary length. In practice it is best to avoid certain characters in filenames because they have
special meaning to the shell.
Looking at the Directory Contents
We will now see how to look at the contents of a directory. The command is
% ls
This gives you a listing of all files in the current directory. If you have just been allotted your account and are logging in
for the first time, you will be in your home directory and that directory will be empty, that is, there will be no files in it.
ls has several options and it will take you some experimentation to understand them all. The first option we look at is
% ls -a
This is your first taste of UNIX options, so look at the command line carefully. The command ls is followed by at least one
space after which the hyphen or minus sign introduces the option letter. The -a option tells UNIX to list all files including
those that are 'hidden'. Hidden files are those which start with a '.' character. Unless the -a option is used, ls never lists
such files in its output. The output of ls is always sorted in some order, the default order being alphabetical. This sort
order can be altered by other options to ls which we will take up later. This is why the file (actually a directory) '.' is listed
before '..' in the output.
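You can see the effect of -a without waiting for hidden files to appear in your home directory. The sketch below works in a scratch directory created with mktemp (assumed to be available); the file names notes and .profile are just examples.

```shell
scratch=$(mktemp -d)
touch "$scratch/notes" "$scratch/.profile"

plain=$(ls "$scratch")         # lists only: notes
all=$(ls -a "$scratch")        # also lists . and .. and .profile

echo "without -a:"; echo "$plain"
echo "with -a:";    echo "$all"

rm -r "$scratch"
```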
The '.' refers to the current directory and '..' to its parent. These are pronounced dot and dot dot respectively. In this
case '.' refers to /usr/kumarr and '..' to /usr. The directory '/' or root is its own parent. This output is of course not
very interesting because your home directory is devoid of files and you do not yet know how to create any. So let us look
at some other directory. You can get the listing of any directory by supplying its name as an argument to ls. Thus to look
at the directory listing of the root directory use the command
% ls /
aardvark
bin
dev
etc
lib
lost+found
tmp
usr
We must caution you that it is very unlikely that you will see the same listing as shown here. It is self evident that the
listing will depend completely on the machine you are working on. However there are some files that will surely exist on
the root directory of a working installation. The directories from bin to usr are such files.
As you have seen, the ls command lists one file per line of output. To see several names per line you can use
% ls -x /
aardvark bin dev etc lib lost+found
tmp usr
Now the output is sorted from left to right on each line. Another variation is the -C option which sorts down each
column
% ls -C /
aardvark dev lib tmp usr
bin etc lost+found
You might have found your output to be in one of these forms the first time itself. This would have been because your
system was configured to make the -x or -C option the default option for ls.
From the outputs so far you can get no indication of whether the files shown are ordinary files or directories. For this
you can use the -p option, which appends a '/' to every file name which is a directory. The '/' is not part of the name, so
do not get confused. For example
% ls -Cp /
aardvark dev/ lib/ tmp/ usr/
bin/ etc/ lost+found/
Another such option is -F which also appends a '*' to every filename which is an executable file, that is, a command. Try
it out and see whether the result differs from the -p option.
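If your own directories do not yet contain a suitable mix of files, a scratch directory lets you compare the two options directly. This is a sketch assuming mktemp is available; the names docs, notes and runme are made up for the illustration.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/docs"                       # a directory
touch "$scratch/notes" "$scratch/runme"     # two ordinary files
chmod +x "$scratch/runme"                   # make one of them executable

with_p=$(ls -p "$scratch")      # marks only the directory: docs/
with_F=$(ls -F "$scratch")      # also marks the executable: runme*

echo "ls -p:"; echo "$with_p"
echo "ls -F:"; echo "$with_F"

rm -r "$scratch"
```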
If you have a really large directory you might want to use an option of ls which gives a very compact output
% ls -m /
aardvark, bin, dev, etc, lib, lost+found, tmp, usr
This gives you the filenames separated by commas.
You can see from the above that the contents of the root directory consist of both directories and ordinary files. The
directories here, or anywhere else, can themselves contain sub directories. To see the contents of /usr, you can say
% ls -xp /usr
bin/ khanz/ kumarr/ lib/ tmp/
On most systems you will see the names of user accounts in this directory. The -p or -F options will show you that they
are directories. You must have deduced that you are seeing the home directories of the users. You can also see your
own home directory here. But wait! When you logged in and checked the name with
% pwd
/usr/kumarr
you found your home directory specified differently. Why is this so? We have seen in the last section that the pwd
command tells us the full, complete or absolute pathname of the current working directory. When we look at the
contents of /usr, kumarr is merely one of the directories under it, and is shown as such. To get the complete pathname
we must specify the preceding portion which is /usr. Thus the full or complete pathname is /usr/kumarr.
It will now be easy for you to realise that the bin you saw listed as one of the contents of the root directory, that is, '/', is
different from the bin listed under usr. The former has the full pathname /bin, whereas the complete pathname of the
latter is /usr/bin. You can now look at the contents of the other directories and try specifying their complete pathnames.
You can also try looking at their contents by providing relative pathnames. We will look at complete and relative
pathnames again in the next section. You would do well to understand pathnames, relative and absolute, thoroughly as
that will be necessary in navigating around the directory tree.
But let us now get back to our friend the ls command. One of the most useful and often used options is -l, which gives
the so called long listing of the directories asked for
% ls -l /
-rwxr-xr-x 1 root root 1298 May 14 09:26 aardvark
drwxr-xr-x 2 bin bin 1248 Jan 01 1970 bin
Now this is a complicated looking output, so let us try and understand the meaning of this listing. The first column of
the output tells you whether the file is a directory or not. A '-' means that it is an ordinary file while a directory has a 'd'
in that position. So you now know another way of telling whether a file is a directory, apart from the -p and -F options
you have already looked at. The other 9 columns in the first field tell you about the permissions on that file. We will look
at these in detail in section 2.4.6.
The next field in the output is a number indicating the number of links to the file. For a file this shows the number of
names it has. In UNIX the same physical data may have several names, although it must have at least one. Each name is
a link to the file. Usually ordinary files have only one link, but if there are more it does not mean that there are that
many copies of the data in the file. There is only one physical copy of the data which can be referenced using any of its
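names. In the case of directories the number of links reflects the number of subdirectories it has (on traditional file systems it is two more than that number, counting the directory's own '.' entry and its name in the parent directory).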
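The ln command gives an existing file an additional name, so you can watch the link count change. The following is a sketch in a scratch directory, assuming mktemp is available and that the file system allows such links; the names first and second are illustrative.

```shell
scratch=$(mktemp -d)
echo "some data" > "$scratch/first"
before=$(ls -l "$scratch/first" | awk '{print $2}')     # link count: 1

ln "$scratch/first" "$scratch/second"                   # give the file a second name
after=$(ls -l "$scratch/first" | awk '{print $2}')      # link count: 2

# Both names refer to the same data, so a change made through one name
# is visible through the other.
echo "changed via second" > "$scratch/second"
content=$(cat "$scratch/first")

echo "$before $after $content"
rm -r "$scratch"
```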
The third field of the output shows the owner of the file. The names root and bin are reserved by UNIX for its use as we have
seen earlier. In some cases you might see a number like 207 instead of the user name.
The next field is the group name and in certain situations can be a number in the display. The user is a part of the group
shown here.
The fifth field is the size of the file in bytes. You already know that the size of a file in UNIX has a precise meaning which
is unrelated to the amount of data in it. However, do not be alarmed because in most cases the intuitive meaning of size
does hold good and the figures you see usually do represent the number of bytes of data in the file in question.
The next item of information is the date the file was last modified, and in the end the name of the file is shown.
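Because the long listing is made up of fields separated by spaces, individual items can be picked out with standard tools. This is a sketch assuming mktemp is available and that owner and group names contain no spaces; the file name greeting is made up.

```shell
scratch=$(mktemp -d)
printf 'hello' > "$scratch/greeting"        # exactly five bytes of data

line=$(ls -l "$scratch/greeting")
kind=$(printf '%s\n' "$line" | cut -c 1)              # '-' for an ordinary file
links=$(printf '%s\n' "$line" | awk '{print $2}')     # second field: number of links
size=$(printf '%s\n' "$line" | awk '{print $5}')      # fifth field: size in bytes

echo "type=$kind links=$links size=$size"
rm -r "$scratch"
```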
You now know how to find out many useful things about the file. You should now look at the directory long listing of the
various system and other directories on your machine. In the course of this when you look at /bin you will see many
familiar names. For instance, who, pwd and ls itself will be found in the /bin directory. Actually /bin is where many of the
binaries or executables of the commands are to be found. There are other commands located under /usr/bin and /etc as
well.
We will now briefly look at three other options to the ls command. When a directory is given as an argument to ls you
get to see the contents of the directory. But suppose you want to check the permissions on a directory, say /usr/kumarr.
If you try
% ls -l /usr/kumarr
you will see nothing because ls tries to list the contents of the directory and at present there is nothing in your home
directory. To see the desired output you could say
% ls -l /usr
whereupon kumarr would be one of the entries. But this is awkward. The answer to this is the -d option
% ls -ld /usr/kumarr
which lists /usr/kumarr as a directory and shows all the information about it.
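The difference between listing a directory's contents and listing the directory itself is easy to check in a scratch area. This is a sketch assuming mktemp is available; the directory home and the file notes are invented for the illustration.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/home"
touch "$scratch/home/notes"

contents=$(ls -l "$scratch/home")       # describes notes, the file inside
itself=$(ls -ld "$scratch/home")        # describes the directory entry itself
first_char=$(printf '%s\n' "$itself" | cut -c 1)

echo "$contents"
echo "$itself"
rm -r "$scratch"
```

The line printed for the directory itself begins with 'd', as explained above.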
You have seen that ordinarily subdirectories are shown only as single entries and any files inside them are not shown. To
look at the contents of a directory and recursively of all subdirectories within it, use -R. The command
% ls -R /usr
will show the contents of /usr and also recursively of every subdirectory inside it, down to ordinary files. Thus using
% ls -R /
you can see every file and directory on your system.
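Running ls -R on the whole system produces a great deal of output, so it is easier to study on a small tree. The sketch below builds one in a scratch directory (mktemp is assumed to be available); the directory and file names mirror the examples in the text but are otherwise arbitrary.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/usr/kumarr/nlp"
touch "$scratch/usr/kumarr/nlp/augcfg.C"

shallow=$(ls "$scratch/usr")        # shows only: kumarr
deep=$(ls -R "$scratch/usr")        # descends all the way down to augcfg.C

echo "$deep"
rm -r "$scratch"
```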
Another option is the reverse option. The -r option reverses the sort order of files displayed by ls. You can try this with
any option
% ls -r /
usr
tmp
lost+found
lib
etc
dev
bin
aardvark
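The same reversal can be tried on a directory of your own. This is a sketch assuming mktemp is available; the three file names are arbitrary, and xargs is used only to join the names onto one line.

```shell
scratch=$(mktemp -d)
touch "$scratch/bin" "$scratch/dev" "$scratch/etc"

forward=$(ls "$scratch" | xargs)        # bin dev etc
backward=$(ls -r "$scratch" | xargs)    # etc dev bin

echo "$forward"
echo "$backward"
rm -r "$scratch"
```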
So far you have given only directories as arguments to ls, but you can give it an ordinary file as well. It then lists only that
file if it exists. Moreover you can give any number of files or directories as arguments to ls and it will list whichever ones
exist.
If you feel out of breath after looking at these options, there are a few more we have not looked at. You are encouraged
to look up the documentation for ls and experiment with them. Many UNIX commands have zillions of options, and getting
used to them all requires time and effort. But you will find that you soon get to know the options you use often. It is
probably best, when learning a new command, to concentrate on a few useful looking options only. As you use them
frequently you will get to know them well.
Then you can spend some time deepening your knowledge of the command by trying out the other options. Most
beginners get overwhelmed by the large number of options and do not know where to start or when to stop. You will
have to work out a method which suits you. Maybe you are the type who likes to learn everything about a command at
one go. But many people, including the author, find that building on a solid foundation of already known options is
easiest.
Absolute and Relative Pathnames
You saw in the last section how pathnames could be relative or absolute. Since the UNIX file system is logically
structured like an inverted tree, it is important to understand how to specify pathnames. Both methods can be used and
in UNIX it does not matter which approach you use in identifying the file you mean, as long as you are careful about
specifying it correctly. However there are situations where one or the other approach is more convenient. So you should
take the trouble to assimilate the concept and learn how to navigate around the file system with felicity. Let us look at a
typical directory hierarchy on a UNIX machine.

Of course the exact layout of the directory hierarchy on your machine is likely to be different. We will soon be looking at
some of the main directories and files on a UNIX system. For the moment though, just concentrate on learning how to
move around. You already understand what is meant by the current directory. This is the directory in which you are
located at any given time. If you say ls, it is the filenames in the current directory that are brought up for you to see. If
you have logged in as kumarr, you will probably land up in /usr/kumarr when you get your prompt unless it has been
arranged otherwise.
Now consider a file in /usr/kumarr/nlp like augcfg.C. Suppose you want to see the size of this file alone. For this you
need to use the ls command and provide the filename as an argument to it. In UNIX you can provide a pathname
(relative or absolute) as an argument to a command wherever you could otherwise provide a bare filename. So that
actually gives you three ways of accomplishing what you want to do (we will assume that you have the required
permissions, which will, in fact, be the usual situation).
Let us first use an absolute pathname. So you have to specify the filename starting from root or '/'. Thus your command
needs to be
% ls -l /usr/kumarr/nlp/augcfg.C
You have already used this method in the last section. The second way is to use a relative pathname, where you specify
the pathname relative to where you are currently. Here you only need to remember that '..' stands for the parent directory
of the current directory '.'. So if you are at /usr/khanz, you can say
% ls -l ../kumarr/nlp/augcfg.C
The '..' takes you one level up, that is, to /usr. From there you continue naming the file as before. Of course you could
have used the following rather convoluted way
% ls -l ../../usr/kumarr/nlp/augcfg.C
This is inefficient because you implicitly move to root before naming the file. The first '..' takes you to /usr and the
second '..' takes you one level higher, to '/' or root itself. Then you begin your descent until you reach the file you desire.
Here it would have been better to use an absolute pathname instead of this, for then you would not have had to use
two steps to reach root.
Usually a file name is specified by the method that results in the shortest possible specification of the name. This
depends on whether the file is closer to you or to the root directory. Thus if you are located in /usr/khanz and you
want to specify a file in the directory /usr/kumarr, it is easier to say ../kumarr rather than /usr/kumarr.
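To practise relative and absolute names without depending on the layout of your own machine, you can rebuild the example tree under a scratch directory. This is only a sketch: mktemp is assumed to be available, and the variable root is a hypothetical stand-in for the real root directory.

```shell
# Rebuild the example tree under a scratch directory standing in for '/'.
root=$(mktemp -d)
mkdir -p "$root/usr/khanz" "$root/usr/kumarr/nlp"
touch "$root/usr/kumarr/nlp/augcfg.C"

# From khanz's directory, name the file relative to where we are.
short=$(cd "$root/usr/khanz" && ls ../kumarr/nlp/augcfg.C)

# The relative and the full route end up in the same directory.
via_rel=$(cd "$root/usr/khanz" && cd ../kumarr/nlp && pwd)
via_abs=$(cd "$root/usr/kumarr/nlp" && pwd)

echo "$short"
echo "$via_rel"
rm -r "$root"
```

Each cd here runs inside a command substitution, so your own current directory is left unchanged.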
There is a third way of looking at the size of augcfg.C. For this you will have to learn a new command cd, which lets you
change your current directory. This command can be given an argument which is your intended destination and it then
changes your directory to what you asked, provided you have the appropriate permissions. And how do you specify your
desired destination? By specifying the pathname, of course. The pathname can be specified, as you would have
undoubtedly guessed, either as a relative pathname or a complete one. So you can say from /usr/khanz
% cd /usr/kumarr/nlp
or
% cd ../kumarr/nlp
and then look at the size by
% ls -l augcfg.C
This really amounts to specifying the filename relative to /usr/kumarr/nlp, the current directory. In general when you
specify a bare filename you are specifying the filename relative to the current working directory. So the command above
is really a shorter way of saying
% ls -l ./augcfg.C
One form of the cd command can be very convenient if you have wandered far off your home directory and you want to
return there, especially if your home directory happens to be far away from the root directory. This is
% cd

without any arguments. It always brings you back to your home directory irrespective of where you are, even if you
were there to start with.
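In Bourne-style shells, cd with no argument goes to the directory named by the HOME variable. The sketch below shows this using a stand-in home directory so that your real settings are not disturbed; mktemp is assumed to be available, and demo_home and landed are illustrative names.

```shell
# demo_home is a hypothetical stand-in for a real home directory.
demo_home=$(mktemp -d)

# In a subshell: point HOME at the stand-in, wander off to the root
# directory, run cd with no argument and report where we ended up.
landed=$(HOME=$demo_home; cd /; cd; pwd)

echo "$landed"
rmdir "$demo_home"
```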
Some UNIX Directories and Files
It will be useful and interesting to get acquainted with the UNIX system directory structure. We will now look at the
layout and contents of the UNIX system directories and understand how the various system files are grouped under
directories. We will also learn about the functions of some of the system files. The UNIX directory structure is typically as
shown in the earlier figure.
We again emphasise that only some of the system directories are shown here. Your machine could have a somewhat
different organisation. How will you find out the directory tree for your UNIX system? You can now explore the files on
your machine.
The directory /bin contains, as you have already seen, the executables of UNIX system commands. These include the
commands you have learnt so far, like ls, cd, pwd and who. You can look at the long listing of this directory and note the
information provided. Look at the sizes to get an idea of the sizes of executable files on your machine. These will
depend, among other things, on the architecture of your computer.
The /dev directory contains device special files concerned with hardware devices like printers, terminals and hard disks.
You will learn more about these files and the /dev directory later in this unit.
The /etc directory, as the name suggests, has several miscellaneous files and directories. It contains many commands
which are reserved for the use of the system administrator. Ordinary users cannot execute many of these commands.
Apart from this, the /etc directory also contains some text files.
Let us take a quick look at some of these text files. /etc/issue contains the message displayed before you log in. /etc/motd has the
text of the message you see just after you login. /etc/group has the names and group numbers of all the groups in the
installation. /etc/passwd contains the login name of each user, his user identification number, his encrypted password,
his home directory, the default shell when he logs in and other information about him. In some cases the password is
stored in another file called /etc/shadow.
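Each line of /etc/passwd traditionally holds seven fields separated by colons: login name, password field, user number, group number, a comment, home directory and shell. The sketch below picks a made-up entry apart with cut; the entry itself, including the numbers 207 and 100 and the comment text, is invented for illustration.

```shell
# A made-up entry in the traditional layout:
# login:password:UID:GID:comment:home:shell
entry='kumarr:x:207:100:Ram Kumar:/usr/kumarr:/bin/csh'

login=$(printf '%s\n' "$entry" | cut -d : -f 1)
uid=$(printf '%s\n' "$entry" | cut -d : -f 3)
home=$(printf '%s\n' "$entry" | cut -d : -f 6)
shell=$(printf '%s\n' "$entry" | cut -d : -f 7)

echo "$login (user number $uid) starts in $home and runs $shell"

# On a real system the same fields can be listed for every account with:
#   cut -d : -f 1,3,6,7 /etc/passwd
```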
/lib contains system libraries used with your 'C' compiler. /tmp is used to store temporary files. Some UNIX commands
need work space in order to execute, this is where they create their temporary files. This directory is cleared out
periodically on many installations. In any case any files you put here can be erased without warning. So do not try to
store anything here on a permanent basis. Keep files important to you under your home directory only.
/usr/bin, as you have already found, holds UNIX system commands which are more of utilities, although there is no clear
distinction between commands in /bin and those located here.
/usr/include contains header files used in writing C programs. /usr/games holds games distributed with UNIX. This might
not be present on some installations.
/usr/local/bin is often present as a repository of local commands, often developed by local talent. These are commands
of interest to and found convenient in that installation.
While looking at the /usr/include directory, you must have noticed that all files have names ending in '.h' and similarly
you will find many files with names ending in '.a' in /lib and /usr/lib. Although we said earlier that UNIX places no
restrictions on the characters you can use to construct a filename, there are some conventions followed in a few cases.
Usually filenames ending in '.xyz' are referred to as '.xyz' files or even as xyz files. Such conventions are not enforced by
UNIX, although in many cases standard UNIX utilities might do so.
Thus .h files are C or C++ program header files, C program files end in '.c' (enforced by cc), C++ program files end in '.C',
lex source files end in '.l', yacc source files end in '.y', assembler source files in '.s', object code files end in '.o', library
archive files in '.a', SCCS (Source Code Control System - to be discussed in unit 4) files start with 's.', 'p.' and so on.
The file command is useful in determining what type a given file is. This command takes any number of files as its
arguments and tries to determine the type of each. Although it is not hundred per cent reliable and can be fooled,
the command usually does a good job.
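As a small illustration of the file command, the sketch below creates two scratch files and asks file to classify them. It is written for a Bourne-compatible shell (sh) rather than the C-shell used at the prompt in this unit, it assumes the mktemp utility is available, and the names hello.c and notes.txt are invented for the example. The wording of the report differs from system to system, so no output is shown.

```shell
# Work in a scratch directory so that none of your own files are touched.
work=$(mktemp -d)

printf '#include <stdio.h>\nint main(void) { return 0; }\n' > "$work/hello.c"
printf 'just some ordinary text\n' > "$work/notes.txt"

# file accepts any number of arguments and prints its guess for each one.
# The guard skips the call on installations where file is not installed.
if command -v file >/dev/null 2>&1; then
    file "$work/hello.c" "$work/notes.txt" /bin/ls
fi

made=$(ls "$work" | wc -l | tr -d ' ')
rm -r "$work"
```

Try it on the files you find in /usr/include and /lib and compare the reports with the suffix conventions described above.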
FILE PERMISSION
It is now time for us to explore what file permissions are and what effect they have on its accessibility. You can see the
file permissions for a file by saying
% ls -l
drwxr-xr-x 2 kumarr users 32 June 19 23:04 crypt
drwxr-xr-x 2 kumarr users 32 June 19 23:04 learn
drwxr-xr-x 2 kumarr users 32 June 19 23:04 nlp
As you already know, the first column of the first field has a 'd' in it if the file is a directory. So we can see that crypt,
learn and nlp are all directories. Then you see 9 columns which specify the file permissions. The user community in UNIX
is divided into three categories- owner, group and others. The owner is the person who first creates the file. Several
users at an installation can be made part of a user group. Such a facility is useful in keeping people working on the same
project categorised together, and UNIX was first conceived of as an operating system which would allow groups of
programmers to work together and share information. All group members form the second category of users. Finally the
rest of the community is lumped under others, which are users who are neither the owner nor part of the same group.
In the example shown, the owner of the directories is kumarr and he belongs to the group users. Other users like khanz
might also belong to this group.
Having understood this categorisation of UNIX users, you can now begin to study the permission modes. Every file has
three possible modes of access-- read, write and execute, represented in the directory listing by r, w and x respectively.
A file to which you have read access can be read by you, which means you can look at its contents by using any method
like the cat command. If you have write access to a file, you can alter its contents. Execute permission is relevant in the
case of executable files like those that we have seen in /bin, or for shell scripts, which we will study later. You can run or
execute a file only if you have execute permission on it. Execute permission does not mean anything in the case of text
files, nor actually for any file that is not an executable, except for directories where this permission has another
meaning.
Now you know enough to understand the permission information. The nine columns are divided into three parts -
owner, group and others - of three columns each. The permissions are specified in the order read, write and execute. If a
permission is available, the corresponding letter is shown, while the absence of a permission is indicated by a hyphen. So
rwx means the category concerned can read the file, can write to it or alter its contents, and can execute the program
which the file contains. r-x means the file can be read or executed but not altered or written to, because write
permission is absent. r-- means the file can only be read, and --x means it can only be executed. --- means there are no
permissions available and such a file cannot be accessed at all. However you can still see an ordinary file like this listed in
a directory listing taken in the usual way. So you see that the permission columns in the listing shown above mean that
the owner has read, write and execute permission, whereas other members of his group have read and execute
permission but no write permission. Similarly others, that is, users who are not part of the owner's group, also have read
and execute but no write permission. Thus if you want to have all the permissions on a file, while denying group
members write permission and allowing others only execute permission, the permission modes should be rwxr-x--x.
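The three-way split of the nine permission columns can be sketched as a small shell function. This is only an illustration for a Bourne-compatible shell (sh); the function name explain_mode is made up for the example and is not a UNIX command.

```shell
# explain_mode: split a 9-character permission string into the parts
# that apply to the owner, the group and others (hypothetical helper).
explain_mode() {
    mode=$1
    owner=$(printf '%s' "$mode" | cut -c1-3)
    group=$(printf '%s' "$mode" | cut -c4-6)
    others=$(printf '%s' "$mode" | cut -c7-9)
    echo "owner=$owner group=$group others=$others"
}

explain_mode rwxr-x--x   # prints: owner=rwx group=r-x others=--x
explain_mode rwxr-xr-x   # prints: owner=rwx group=r-x others=r-x
```

The second call uses the permissions of the directories in the listing shown earlier.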
With this knowledge you should again look at the UNIX system files and study the owners and permission bits of each.
You will find that all users have execute permission on the files containing system commands. This is obviously necessary
because otherwise you could not use those commands. If possible, ask your system administrator to remove your
execute permission on the ls command for a short while and then try to see your directory listing. If you are at a large
installation where this kind of thing might not be possible, you can either wait until you know more about UNIX to be
able to see the effects of the absence of permissions, or ask somebody to make a copy of a system command in your
directory, remove execute permissions on it and try to run it from your directory, taking care that you do not execute the
system version of that command.
So far we have talked only about ordinary files and what file permissions mean in their case. But directories are also
files, as you have seen earlier. Do the permissions all have the same meaning where directories are concerned? If so,
what is it like to execute a directory? Let us delve a bit into this and find out some answers.
First of all you must understand that since directories are special kinds of files which contain information about other
files, the permission bits have somewhat different meanings than what a hasty guess would suggest. Before we even
look at what read permission for a directory means, make sure you have done the exercise (1) of section 2.3.5, which
asked you to cat a directory and report what you saw.
If you tried the cat command on a directory which has some files you will see a long line of words, at least some of which
you should be able to identify as names of files in the directory. So you see that the contents of a directory are the
names of files under it.
Now it is easy to deduce what read permission on a directory could mean. If you can read a directory you can cat it and
it follows that you can do an ls on the directory. In the absence of read permission you will not be able to look at the
directory listing.
What about write permission? If you have write permission on a file you can change its contents. In the same way having
write permission on a directory allows you to change its contents. What does that mean? Creating, renaming or
removing files would mean altering the contents of that directory. So it follows that having write permission on a
directory enables you to create, rename or delete files in that directory.
Beginners often find it a bit difficult to grasp this point though it is not really hard to understand. You should remember
that having write permission on an ordinary file allows you to change the contents of the file but does not allow you to
delete or rename the file. That is possible only if you have write permission on the directory containing the file. Later
we will see how this fact could create a security lapse in some situations.
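The point that renaming and deleting depend on the directory rather than on the file can be tried out safely in a scratch area. The following is a minimal sketch for a Bourne-compatible shell (sh), assuming mktemp is available; the names box and note are invented for the example.

```shell
work=$(mktemp -d)
mkdir "$work/box"
echo "keep me" > "$work/box/note"

chmod 444 "$work/box/note"   # the file itself is read-only for everyone
chmod 755 "$work/box"        # but the owner may write to the directory

# Renaming and deleting alter the directory, not the file, so both succeed.
mv "$work/box/note" "$work/box/renamed"
rm -f "$work/box/renamed"    # -f suppresses the confirmation question

remaining=$(ls "$work/box" | wc -l | tr -d ' ')
echo "files left: $remaining"   # prints: files left: 0
rm -r "$work"
```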
Lastly, coming to execute permission, you would agree that there is no way you could execute a directory in the
sense that the operation has for an ordinary file. For a directory this permission bit determines
whether you can cd to the directory or can copy files from that directory (this only if you have read permission on the
directory as well).
This permission is often called search permission. To be able to cd to any directory you must have search permission on
every component of the absolute pathname of the directory. If search permission is absent on any component, all files
and directories on that component and below it become inaccessible.
Apart from these permissions there are some other permission modes which you will come across. We will take these
up in the unit on system administration. For now it will be sufficient to know that some other permission modes exist so
that you do not get taken aback if you find characters other than r, w or x in the permission modes of a directory listing.
BASIC OPERATIONS ON FILES
We have covered a fair amount of ground in UNIX but we still do not know how to create or remove files. We will look at
some basic file operations in this section. You will then at last be able to have some files of your own in your directory
instead of having to rely on system directories to be able to see a non-blank directory listing.
Copying Files
Until you learn about text editors in a later unit in this block, you will have to make files of your own by making a copy of
an existing file or by running a program which creates a file. Let us now study how to make a copy of a file. Make a
directory learn in your home directory and cd to it. Now say
% cp /etc/passwd passwords
and look at your directory listing. You will find that you are the proud owner of a file called passwords, which gets
created in the current directory. This file will be an exact copy of the original file /etc/passwd. You can now look at the
contents of both files and generally verify that this is probably the case, although you will not be able to easily do an
exact match visually and you do not yet know of utilities which can do it for you. For the time being you can try copying
short files to verify that it seems to be so.
The cp command in this form takes two arguments of which the first is the source, or the file of which a copy is to be
made. The second argument is the destination, or the name of the file which is to hold the copy. The names of the files
can be specified as absolute or relative pathnames and the current directory is the default, which is why in the example
above the new file got created in the current directory.
Here we copied the file into our own directory with a different name, but we could have also given a command like
% cp /etc/group group
This will create a copy of /etc/group in the current directory and the copy will also be called group. Another way of doing
this is to omit the name of the destination file and specify only the destination directory. In such a case the copy is given
the same name as the original. Thus
% cp /etc/motd .
or
% cp /etc/termcap ~
will create files called motd and termcap in the current and home directories respectively.
You can copy several files from several other directories into a single directory by enumerating all the source files and
giving the destination directory name at the end. When you use this method the names of the files are preserved, that
is, the copies have the same names as the originals. Thus
% cp /etc/issue /etc/rc /usr/include/stdio.h .
will create files called issue, rc and stdio.h in the current directory.
So far we have looked at cases where there was no file with the same name as the target already present. What would
happen if a file with the same name as the target did exist? In keeping with the general philosophy of UNIX, which is to
silently obey whatever commands the user gives (thereby respecting the intelligence of the user and giving him credit
for being responsible enough to know what is good for him), the target will be unceremoniously overwritten by a copy
of the source. So before issuing the cp command you must ensure that this is what you want, otherwise you will have
lost the original contents of the target file.
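The behaviour described above, several sources copied into one directory and an existing target silently overwritten, can be tried without risk in a scratch area. This is a sketch for a Bourne-compatible shell (sh), assuming mktemp is available; the directory and file names are invented for the example.

```shell
work=$(mktemp -d)
mkdir "$work/src" "$work/dest"
echo "first"  > "$work/src/a.txt"
echo "second" > "$work/src/b.txt"
echo "old contents" > "$work/dest/a.txt"   # a target that already exists

# Several sources followed by a destination directory: the names are kept,
# and the existing dest/a.txt is overwritten without any question.
cp "$work/src/a.txt" "$work/src/b.txt" "$work/dest"

copied=$(cat "$work/dest/a.txt")
echo "dest/a.txt now holds: $copied"   # prints: dest/a.txt now holds: first
rm -r "$work"
```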
Let us now see some reasons which can cause cp to run into a problem. One is if you try to create a file in a directory
where you do not have write permission, or when you try to overwrite an existing file on which you lack write
permission. cp will then complain that it cannot create the target. Another problem could be the absence of read
permission on the source, or running out of disk space while copying.
The cp command preserves the permission modes of the original but changes the owner and group to that of the user
making the copy. Also there is regrettably no straightforward way of recursively copying files from subdirectories under
a directory.
Deleting Files
By now you have acquired a sprinkling of files in your work area, and it is time you learnt to delete files you no longer
require. This can be done using the rm command
% rm termcap
If you look at your directory listing you can see that the file termcap is now gone.
The rm command takes any number of arguments and deletes all files specified in them. The files can, as usual, be
specified using absolute or relative pathnames. However the command will refuse to touch directories. So if you say
% cd; rm learn
rm: learn directory
UNIX informs you that learn is a directory and leaves it intact. If you do not have write permission in a directory you
cannot remove any files from it and the command will fail. But if only the file is write protected rm tells you the
permission modes of the file and asks you to confirm whether you really want to delete it.
% rm protected
rm: protected 444 mode? (y/n)
Now if you type any response starting with a y (for yes) the file will be deleted after you press the RETURN key. Any
other response will leave the file as it is.
A related option to rm is the -i option which tells rm to print the name of the file and wait for your confirmation before
deleting it. But now the mode of the file is not shown. The converse of this option is the -f option (for force) which tells
rm to delete the file silently without asking any questions, irrespective of the permissions. This of course does not mean
that you can delete files without having the appropriate access rights, only that files you can delete will be deleted
without any further reference to you.
At least in the beginning while you are new to UNIX it is wise to use the -i (interactive) option to rm, lest you
inadvertently lose important data because of a command wrongly given. Remember that in UNIX a file once deleted
cannot be recovered- it is gone for ever.
Now let us look at a very powerful form of the rm command. The -r (recursive) option deletes all files in the directory
specified and recursively in all subdirectories of the directory, including the subdirectories themselves. In other words it
cleans everything from under that directory including the directory itself. Thus it is far more powerful than rmdir, which
refuses to destroy non-empty directories. In one command you can destroy an entire directory tree using the -r option. Thus
% rm -r ~
will wipe out everything under your home directory. At this point you could afford it because you had not laboured
much to create whatever files you had. But imagine a situation where you wipe out months of effort at one stroke. So
be very careful while using this option. The most sweeping form of the rm command is
% rm -r /
which will wipe out everything on the disk except any unmounted file systems. Since usually all partitions are
mounted at boot time, you will end up wiping out the entire installation including the rm command and the UNIX
operating system itself (see the unit on system administration for an explanation of the mount command). Even if you
have a complete backup, it will take a lot of effort to restore the installation to its original state. Therefore you must be
very careful indeed when using rm -r. We see no situation where you might need to use rm -r /. Removing everything on
the disk can be done more thoroughly by other means.
You are now encouraged to try out rm -r / on your installation as an ordinary user! Why do we contradict ourselves in the
space of two sentences, you might wonder. The point is that rm -r can and will delete only those files which the
permission modes allow. Bear in mind that this includes your own files, so everything under your home directory will be
lost. While the superuser can delete any file irrespective of permissions, an ordinary user will not be
able to cause any damage to an installation which has been carried out properly. If you are able to cause any problems
to others or to any other part of the installation it is a signal to the system administrator that all is not right with security
at the installation. But do not do this as superuser unless you know you can reinstall UNIX on your computer!
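A much safer way of seeing the difference between rmdir and rm -r is to build a small throwaway tree and remove that. The sketch below is for a Bourne-compatible shell (sh) and assumes mktemp is available; all the names in it are invented for the example.

```shell
work=$(mktemp -d)
mkdir -p "$work/tree/sub/deeper"
echo x > "$work/tree/top.txt"
echo y > "$work/tree/sub/deeper/leaf.txt"

# rmdir refuses because the directory is not empty ...
if rmdir "$work/tree" 2>/dev/null; then rmdir_ok=yes; else rmdir_ok=no; fi

# ... whereas rm -r removes the files, the subdirectories and the directory.
rm -r "$work/tree"
if [ -e "$work/tree" ]; then tree_left=yes; else tree_left=no; fi

echo "rmdir succeeded: $rmdir_ok, tree still present: $tree_left"
# prints: rmdir succeeded: no, tree still present: no
rmdir "$work"
```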
Links between Files
As we have said before when talking about the ls command, it is possible for a file to have more than one name. If there is a
file in your home directory called motd, you can give it another name like this
% ln motd headline
Now if you look at the contents of the files you will see that they are the same. You can create a link between the two
file names by using this command. If you wish you can create more links subject to a system imposed limit. Of course
one rarely has occasion to have more than a few links to a file. You can link a file to an existing filename too.
% ln -f password headline
Now if you look at headline it will have the same contents as password rather than motd as was the case till now.
Actually headline and password are now two different names for the same file. The name headline has got detached
from motd and has got attached or linked to password.
You cannot link files across two different file systems. This is because the super block, which stores file system
information, is different in the two. However later versions of UNIX permit what are called symbolic links between files
in different file systems, but we will not look at them in this block. The links we have seen here are also called hard links
as opposed to the symbolic links mentioned. Let us now create a few more links to headline
% ln headline sensation
% ln headline newsworthy
Study the directory listing now.
% ls -l

-rwxr--r-- 4 kumarr users 1690 June 27 03:09 headline
-rwxr--r-- 4 kumarr users 1690 June 27 03:09 newsworthy
-rwxr--r-- 4 kumarr users 1690 June 27 03:09 password
-rwxr--r-- 4 kumarr users 1690 June 27 03:09 sensation
Although these look like four different files, the second field in the listing tells you that each of them has 4 links. This
does not however mean that they are linked to one another. For the moment you can assume this because we just
created the links ourselves.
What is the difference between creating links and making copies of files? When you create a copy of a file, there is
physical copying of the data in the source file to the target. So some disk space will be used up in storing the Copy. Also,
if you now alter the source, the target is not affected, and vice versa, because they are two different, independent files.
But if two files are linked there is only one physical copy of the data on the disk. Altering one file automatically changes
the other one because there is actually only the one file although it has two different names. Extra disk space is not
required to store the linked file.
So if you now delete the files above one by one and keep observing the directory listing you will find that the number of
links keeps reducing by one in all the remaining names. When the last link to a file is removed the file gets deleted
because there is no name by which it can be accessed, and the disk blocks it occupied eventually get reallocated to other
files.
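The behaviour of hard links described above can be checked with a few commands in a scratch area. This is a sketch for a Bourne-compatible shell (sh), assuming mktemp and awk are available; the file contents are invented for the example.

```shell
work=$(mktemp -d)
echo "today's news" > "$work/motd"

ln "$work/motd" "$work/headline"    # a second name for the same file
ln "$work/motd" "$work/sensation"   # and a third

# The second field of ls -l is the number of links.
links=$(ls -l "$work/motd" | awk '{print $2}')

# Data written through one name is visible through the others.
echo "late edition" >> "$work/headline"
lines=$(cat "$work/sensation" | wc -l | tr -d ' ')

# Removing one name leaves the data reachable through the remaining names.
rm "$work/motd"
links_after=$(ls -l "$work/headline" | awk '{print $2}')

echo "links: $links, lines via sensation: $lines, links after rm: $links_after"
# prints: links: 3, lines via sensation: 2, links after rm: 2
rm -r "$work"
```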
Can you now look at the listing of cp, mv and rm in the /bin directory? You will find that each of these executable files
has three links. Actually the code which performs all these three functions is the same. Depending on the name the
executable was invoked with, the code performs somewhat different tasks. You have only to recall that mv has to do
both a cp and an rm if the move is across different file systems, and the close relation between these three commands
becomes at once apparent.
CHANGING PERMISSION MODES
We will now see how to change the permission modes of files and directories. So far we have had occasion to look at
various commands, many of which have to do with files of various types. We have seen that the permissions on the files
make a great difference to the actions of these commands. For example the cat command will usually type a file on the
terminal but will refuse to do so if the user does not have permission to read the file. So we can keep our files protected
from ordinary users if we wish or need to do so by changing the permissions on them appropriately. The command to do this is
chmod. The permissions on a file can be changed only by the owner of the file or by the super user. The owner, as you
have already seen, is the user who created the file. This creation can be by any means, for example, by using a text
editor, through a user written program or by simply copying an existing file. Note that having permissions on a file does
not amount to owning it. If you can read somebody else's file, it does not mean that you can prevent others from doing
the same, but you can establish such protection for a file of your own. Also do not get confused between the original file
and a copy you might have created, having only read access to the source file. You can change your copy in any fashion
you wish, but you will not be able to alter the original.
There are two forms in which you can use the chmod command. Let us look at the absolute method first, as it is slightly
easier to understand and use. In this the permission mode desired for the files is given to the command in an octal
notation which we will explain shortly. The mode of the file then gets changed to what was asked irrespective of the
permissions before the command was run. The form of the command is thus
% chmod mode filename
where filename is a list of one or more files whose permission modes are to be set to mode. If the permission bits are
already equal to mode to start with, running the chmod command has no effect on the file or files.
You know that file permissions are specified by 9 columns, for example rwxr-xr-x or rw-r--r--. In the absolute method the
presence of a permission is indicated by a 1 and its absence by a 0. The resulting 9 bit binary number is then converted
to octal. This octal number is what has to be specified as mode for the chmod command.
You know that a binary number can be easily converted to octal by making groups of 3 bits starting from the right. Now
convert each group into octal as if it were a single number. The resulting string of octal digits is the number in octal. Thus
rwxr-x--x
can be written in binary as 111101001 after replacing each permission by a 1 and each hyphen by a 0. This binary number
can be written as 111 101 001 after grouping the bits in threes. The octal form of the number is thus 751. So to convert a
file to this mode say
% chmod 751 progfile
This will give progfile the permissions rwxr-x--x. Likewise rw-r--r-- in octal is 644. So you can provide these permissions to
your file motd by saying
% chmod 644 motd
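The conversion from the nine permission columns to the octal mode can also be sketched as a shell function, taking r as 4, w as 2 and x as 1 within each group of three. This is only an illustration for a Bourne-compatible shell (sh); the function name to_octal is made up for the example and is not a UNIX command.

```shell
# to_octal: convert a 9-character permission string such as rwxr-x--x
# into the three octal digits that chmod expects (hypothetical helper).
to_octal() {
    result=""
    for start in 1 4 7; do
        # Pick out the three columns for owner, group or others.
        triple=$(printf '%s' "$1" | cut -c"$start"-"$((start + 2))")
        digit=0
        case $triple in r??) digit=$((digit + 4));; esac
        case $triple in ?w?) digit=$((digit + 2));; esac
        case $triple in ??x) digit=$((digit + 1));; esac
        result=$result$digit
    done
    echo "$result"
}

to_octal rwxr-x--x   # prints 751
to_octal rw-r--r--   # prints 644
```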
Instead of one file you can set the permissions on several files at the same time (all to the same value) by listing the files
after the mode. So
% chmod 600 motd passwd
will make their permissions rw-------. Thus you can read these files or change them whereas nobody else (except root) can
even read them. So
% chmod 0 passwd
will mean nobody has any permissions on the file passwd and even you will not be able to read a file of your own with
such a set of permissions. However you can change the permissions anytime since you own your file. Also the super user
can change the permissions of any file.
Let us now look at the symbolic method of telling chmod the mode. Here the permission types are, as always, r, w and x.
In addition there is a set of characters which specify the target of the actions. The targets can be u (user, that is, the owner), g (group), o
(others) or a (all of these). The actions are + to add a permission, = to set it absolutely and - to remove a permission. So
you can say (there must be no spaces in the mode argument)
% chmod a+x progfile
to allow everybody to run progfile, irrespective of the earlier execute permissions on it. However this is different from
saying
% chmod 111 progfile
because this would remove read and write permission for everybody, whereas in the earlier case, those permissions
would have been left untouched. If the owner had read, write and execute permission, he would retain it. If the group
earlier had read and execute permission it would continue to have that privilege. If others had no permission they would
acquire execute permission. One can remove read and write permissions for others by saying
% chmod o-rw progfile
One can also specify absolute permissions by saying
% chmod u=rwx,g=rx,o=x progfile
Here the different target categories are given different permissions on progfile by separating them with commas. No
spaces should be present in the argument, otherwise only the first part will be taken as the desired mode. The portion
after the space will be treated as a file name which has to be assigned those permissions.
Most people find the numerical way easier to use. However if one does not want to alter some permission bits, then
there is no straightforward alternative to using the symbolic mode. For example, if you want to deny others any
permissions on a file but do not want to alter your own or your group's permissions, you can say
% chmod o-rwx progfile
But to achieve the same result using the absolute method you would have to first determine the existing permissions.
Suppose the value is 644. You will now have to say
% chmod 640 progfile
If the initial permissions were 666, you need to say
% chmod 660 progfile
instead. If using the symbolic method you would not need to worry. The command you need to give remains unchanged
since the o action leaves the u and g permissions intact.
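The comparison above can be verified on a scratch file: the same symbolic command gives the right result whichever mode the file starts with. This is a sketch for a Bourne-compatible shell (sh), assuming mktemp is available; the file is created only for the example.

```shell
work=$(mktemp -d)
touch "$work/progfile"

# Start from 644 and deny others everything.
chmod 644 "$work/progfile"
chmod o-rwx "$work/progfile"
from_644=$(ls -l "$work/progfile" | cut -c1-10)

# Start from 666 and give exactly the same command.
chmod 666 "$work/progfile"
chmod o-rwx "$work/progfile"
from_666=$(ls -l "$work/progfile" | cut -c1-10)

echo "$from_644 $from_666"   # prints: -rw-r----- -rw-rw----
rm -r "$work"
```

The two results correspond to the absolute modes 640 and 660 respectively.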
STANDARD FILES
We have seen quite a few UNIX commands by now, and you must have observed that many commands produce or can
produce output on the terminal screen. Likewise many commands can take input from the keyboard. Actually these
commands have been written to accept input from a standard input file and to produce output in a standard output file.
Usually these standard files are set to the keyboard and the terminal screen respectively. Let us look at this in somewhat
more detail by studying some examples.
Standard Output
If you make a list of commands you have learnt so far you will find that many of them produce some output. For instance
let us say
% cal
which prints the calendar for the current month and year on the screen. In practice there are very few commands
designed to produce output on the screen specifically. The programs are written to produce output on what is called the
standard output, and UNIX sets the standard output to be the screen by default. That is how the output happens to
appear on the terminal.
The shell, which interprets all your commands and passes them onto the UNIX kernel for execution, has a facility to alter
the standard output. In other words, you can define a file, rather than the screen, to be your standard output. (The
terminal screen is also a file as far as UNIX is concerned, but we will look at this only in the unit on system
administration.) To do this you need to say
% cal > calfile
There can be zero or more spaces before and after the > sign. This sign indicates that the standard output of the
command preceding it should go to the file specified to its right rather than to the terminal screen. This is called
redirecting the standard output. In the present case the calendar for the current month will be placed in the file calfile.
You can verify this by
% cat calfile
although you could have redirected this output as well
% cat calfile > catfile
Is that not a way of copying calfile to catfile? Note that the file to which output gets redirected gets overwritten if it
already exists, although it is possible to arrange matters such that a command which would have overwritten an existing
file does not get executed. When the shell sees the > sign it first creates an empty (zero byte) file with the name given on
the right of the sign. If such a file exists it gets truncated to zero bytes. You can verify this easily by
% ls -l > lsfile
If you now examine lsfile by using the cat command, you will see a 0 byte file called lsfile shown as one of the files in the
current directory. This is because a zero byte lsfile was created before the ls -l was run.
To add to an existing file instead of overwriting it, use the >> sign, which appends the output to the end of the file:
% cal 06 1994 >> calfile
Now calfile will contain the calendars for the current month as well as June 1994. Compare this with
% cal 06 1994 > calfile
which leaves only the calendar for June 1994 in calfile.
Thus the >> sign is safer to use because it never destroys any data, but this operation will keep adding to the file, and it can
sometimes be difficult to make out what part of the output was produced by your last command and which portion is
the outcome of previous redirections or was simply the original contents of the file.
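The difference between overwriting and appending can be seen by counting the lines in a scratch file after each kind of redirection. This is a sketch for a Bourne-compatible shell (sh), assuming mktemp is available; echo is used in place of cal so that the result does not depend on the date.

```shell
work=$(mktemp -d)
out="$work/calfile"

echo "first run"  > "$out"    # creates the file, or truncates it if it exists
echo "second run" >> "$out"   # appends to what is already there
after_append=$(cat "$out" | wc -l | tr -d ' ')

echo "third run" > "$out"     # truncates again: the earlier lines are lost
after_overwrite=$(cat "$out" | wc -l | tr -d ' ')

echo "lines after >>: $after_append, lines after >: $after_overwrite"
# prints: lines after >>: 2, lines after >: 1
rm -r "$work"
```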
Standard Input
Just as many commands produce output on the screen, some commands take input from the keyboard although most
take input from files. Look at an aspect of the cat command you have not studied so far
% cat
The result of this is deafening silence. The uninitiated might wait several minutes before aborting the command,
thinking there is something wrong because the system does not appear to be doing anything at all. The truth is that cat
can take its input both from the standard input as well as from a file. However the output is always produced on the
standard output. If any filenames are specified they are used as the input but if none is mentioned the input is taken
from the standard input. There are also some commands which take input only from the standard input.
In the present case no filename has been specified and cat is waiting for input from the standard input, the keyboard
here. So if you type something cat writes it out to the standard output and the effect is that of echoing your input.
A foolish consistency is the hobgoblin of little minds - Emerson
A foolish consistency is the hobgoblin of little minds - Emerson
(Actually if you had given the command just as shown then the above is not strictly correct. The cat command will buffer
your input and when that buffer is full it will straightaway write it out onto the standard output. So you will probably
find that you have to type several lines of text before you see it again on the screen. However if you say
% cat -u
the result will be just as described, for the -u flag calls up cat in unbuffered mode.) If you want to put an end to your
misery you can terminate your input file by saying ^d, thereby causing cat to finish and present you with your prompt.
To redirect standard input, use the < sign and say
% cat < catfilesrc
whereupon cat will print the contents on the screen. This is just the same as
% cat catfilesrc
because cat can take its input from a file as well. So to copy this file to catfiletarget, you can say
% cat < catfilesrc > catfiletarget
or
% cat catfilesrc > catfiletarget
Thus you can redirect both standard input and standard output in the same command. Some commands do not take
input from the standard input. In such cases redirection of the input is not possible, as with the ls, cp, mv, rm or who
commands.
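The two ways of feeding a file to cat can be tried side by side. The following is a minimal sketch written for a Bourne-compatible shell (sh) rather than the C-shell shown at the % prompt; the scratch directory and the names copy_by_argument and copy_by_redirection are made up for the illustration.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'first line\nsecond line\n' > catfilesrc

# cat opens the file itself when given a filename argument ...
cat catfilesrc > copy_by_argument
# ... or the shell opens it and hands it to cat as standard input.
cat < catfilesrc > copy_by_redirection

# cmp prints nothing and succeeds when two files are identical.
cmp catfilesrc copy_by_argument && cmp catfilesrc copy_by_redirection && echo "copies match"
```

Either form leaves catfilesrc untouched; only the way cat receives its input differs.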
Standard Error
So far we have seen the effect of redirecting the output of some commands that completed successfully. Let us look at
this a bit more closely. For example, if there is no command like gah, say
% gah > gahfile
If you do so you will find that you get a protest message from UNIX on the terminal but that gahfile is empty. Similarly
% ls -l gah > lsfile
produces a message on the terminal but nothing in lsfile. Why does the redirection fail? After all the command did
produce output.
The reason is that there is a third standard file in UNIX, called the standard error. UNIX programs and utilities are usually
designed to provide error messages in case there is something wrong and the program is not able to proceed as
expected. Such messages are often referred to as diagnostic output because they can help the user diagnose the reason
for failure. This kind of output is usually written to the standard error file. Usually the standard error is also connected to
the terminal by default, but like the standard input or output, the standard error can also be redirected. To do this in
the C-shell say
% gah >& gahfile
This will place both the standard output and the standard error in gahfile. Here it will have only the error message telling
you that there is no command called gah. How do you place the standard output and standard error in different files? Well, this
is easy to do in the Bourne or Korn shells, but in the C-shell the way to achieve this is somewhat convoluted. So we will
not look at it right now. You are referred to unit 4 on shell programming for this.
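As a brief preview of the Bourne-shell form mentioned above, here is a minimal sketch. It assumes a Bourne-compatible shell (sh); the file names outfile, errfile and bothfile, and the deliberately missing name no_such_file_gah, are invented for the illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# ls reports the existing directory "." on standard output and complains
# about the missing name on standard error. "2>" redirects file
# descriptor 2, which is the standard error.
ls -d . no_such_file_gah > outfile 2> errfile

out_lines=$(( $(wc -l < outfile) ))
err_lines=$(( $(wc -l < errfile) ))
echo "stdout lines: $out_lines, stderr lines: $err_lines"

# "2>&1" sends standard error to wherever standard output currently goes,
# the Bourne-shell counterpart of the C-shell ">&".
ls -d . no_such_file_gah > bothfile 2>&1
both_lines=$(( $(wc -l < bothfile) ))
echo "combined lines: $both_lines"
```

The exact wording of the error message varies between systems, so the sketch only counts lines.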
Filters and Pipelines
A filter is a command which can take its input from the standard input and can produce output on the standard output.
Having the capability to read from or write to files is not a disqualification. So ls is not a filter because it does not read
from the standard input but cat is one because it can do so (although it can read from a file as well) and also writes to
the standard output.
You can think of a filter as a "device" placed between the standard input and the standard output which filters the
standard input before placing it on the standard output. In the case of cat there is no filtering action at all, but a
command like grep does perform some weeding action on its input.
The standard output of a command can serve as the standard input of another. Several commands can be chained
together like this. Such an arrangement is called a pipeline. Pipelines are one of the big strengths of UNIX, because they
often enable us to group several existing commands quickly to perform a task for which there is no command directly
available.
A major design goal of UNIX was to have an operating system which allowed easy sharing of data and programs, and
allowed people to build on the work of others instead of having to do things from scratch. The facility of pipelining helps
meet this goal because you can piece together commands written by different people to achieve your objective rather
than wasting your time on doing things which have already been done. Let us take a simple example.
Suppose you want to find out how many of the files in a directory are directories rather than ordinary files. It would have
been wonderful if there were an option to ls which did this job, but since that is not the case we will have to try
something else. One way is to look at the listing with ls -p and count lines which end in /. Such a visual method is tedious
and prone to error especially if there are many files in the directory. So let us try to make UNIX do this for us. How about
the following?
% ls -p > tmp
% grep -c '/$' tmp
We first get the listing in a temporary file tmp and then count the number of occurrences of / at the end of a line in tmp
using the grep command. The result will be available on the standard output. While this method will work it has a few
disadvantages. One is that it is slow because an intermediate file has to be created. Secondly we cannot start the grep
command before the ls finishes. Also if we run many commands like this we will be left with temporary files which we
will have to meticulously delete lest they clutter up our directory listing and otherwise waste disk space. So we can use a
pipeline like this
% ls -p | grep -c '/$'
The | symbol is the pipe character. It means that the standard output of ls -p is passed to grep. The act of connecting the
standard output of a command to the standard input of another is also referred to as piping the output of the first
command to the second. Here no temporary files need to be created or cleared up by the user as UNIX itself takes care
of the details. Also the speed improves because the subsequent commands can start as soon as some data is available to
them.
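The directory-counting pipeline can be tried on a small scratch directory. This is a sketch for a Bourne-compatible shell; the directory and file names (docs, src, notes.txt and so on) are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir docs src
touch notes.txt report.txt todo

# ls -p appends "/" to directory names. When its output is a pipe rather
# than a terminal it writes one name per line, so grep -c can count the
# lines that end in "/".
dir_count=$(ls -p | grep -c '/$')
echo "directories: $dir_count"
```

No temporary file is created, and grep starts reading as soon as ls produces output.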
A command like ls which does not take its input from the standard input can only be the first command in a pipeline.
Similarly a command which does not write to the standard output can only be the last command in a pipeline. Also it is
the user's responsibility to see that each command receives input in a form which it can meaningfully transform,
otherwise the results will be gibberish. Thus do not pass data files other than text files to grep because grep works only
with text files, with lines delimited by the newline character.
PROCESSES
As you know, UNIX is a multi-user, multitasking operating system. This means that several users can work on a UNIX
system at the same time. Also each user can be performing more than one task, or process, at a time. A process can be
loosely thought of as a running program. Every process is thus a program but only programs which are running at a
particular instant are processes. Every instance of a running program is a process, so that a single program can be
running as several processes in different states. Processes have a life, they are created, they run and finally terminate or
die.
So far we have looked at processes running in the foreground. An example is the C-shell which you invoke on logging in.
It enables you to run other processes and it presents you with the prompt every time a process terminates. However
such foreground or synchronous processing is not the only option you have.
You can run any program asynchronously, that is, in the background by terminating the command with an &. If you do so
you are presented with the prompt immediately and you can then start another process, in the foreground or again in
the background. Before that UNIX gives you a number called the process identifier, the process-id or simply the pid. This
is a number you can use to find out about the state of that process later on. Try
% ls -l &
This gives you back the prompt but you might miss it as well as the process-id that UNIX displays because of the output
which fills your screen. So running a command in the background does not affect its output. Also if you have only a few
files in your directory this does not help you very much because the process ends very soon anyway.
So it makes sense to run a process in the background only when you expect it to take a long time to complete and want
to do something else in the meantime. If the background process is going to produce output, make sure it goes to a file;
otherwise it will be mixed up with the output of whatever you run in the foreground. So you can say
% ls -lR / > dirfile &
which will give you a complete long directory listing of your installation in dirfile. If you logout before background
processes terminate they will all die but this can be prevented by saying nohup before the command
% nohup ls -lR /
This will send the output of the command to the default file nohup.out unless you have specified some other file to take
the output
% nohup ls -lR / > dirfile
But this still leaves the nohup command vulnerable to disconnection by an interrupt, to prevent which you should give it
in the background.
% nohup ls -lR / > dirfile &
Now even if you logout the process will continue until it terminates normally. Next time you login you can examine
dirfile, for your results. This is particularly useful if you are connected to your machine over a telephone line and are
vulnerable to disconnections. Usually you would be well advised to redirect your standard error as well, so that even if
your command fails for some reason you can examine the error messages it issued.
Beginners tend to go overboard with background processing because it is such a wonderful facility. While there is
nothing wrong in experimenting and learning by trying out various things, you should pay heed to some general rules
when you are in a production environment. If you have started a process in the background and the output is going to a
file, do not repeatedly examine the file to the exclusion of other activity.
% ls -lR / > dirfile &
% tail -f dirfile
If you find yourself doing this you might as well have run the ls command in the foreground. Running it in the
background certainly has the advantage of being able to do something else anytime you wish, and your process is not
vulnerable to interruption by any accidental depression of the DELETE key. Secondly do not run too many background
processes because every process increases the load on the system, and you must show consideration to other users. Look up
the nice command and use it to reduce the priority of lengthy processes you run in the background. Of course how many is
too many depends on the machine and the other users. You should also make sure you do not set up race conditions or
otherwise issue a series of commands whose outcome depends on the time taken for execution. For example look at
% prog > progfile &
% grep city progfile > grepfile &
If prog completes before you can issue the grep command, all is well. If prog does not produce any output before the
grep starts, grepfile will be empty. If the execution speed of prog is somewhere in between, the contents of grepfile will
depend on this speed. So this is not a sequence of commands one should put in the background one after the other.
Start the grep only after the prog terminates or put the two in a pipeline.
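Both remedies can be sketched in a Bourne-compatible shell. The prog below is a hypothetical stand-in (a shell function that pauses and then writes three lines), not a real utility; wait and the $! variable are Bourne-shell features and are written differently in the C-shell.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for "prog": slow to start, then writes three lines.
prog() {
    sleep 1
    echo "Delhi city"
    echo "Agra fort"
    echo "Puri city"
}

# Remedy 1: run prog in the background, but wait for it before reading its file.
prog > progfile &
prog_pid=$!          # $! holds the process-id of the last background command
wait "$prog_pid"     # blocks until that process has terminated
grep city progfile > grepfile
file_count=$(grep -c city grepfile)

# Remedy 2: a pipeline needs no intermediate file and no explicit waiting.
piped_count=$(prog | grep -c city)

echo "matches via file: $file_count, via pipeline: $piped_count"
```

In both cases the result no longer depends on how fast prog happens to run.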
Finding Out About Processes
If you have run a command in the background and want to know whether it has completed or not you can use the ps
command.
% ps
PID TTY TIME CMD
1149 07 0:47 csh
1159 07 0:02 ls
1149 07 0:09 cc
1149 07 0:01 ps
This means that you have four commands running at the moment. The csh is your login shell with process-id 1149 and
started from terminal number 7. The command has used up 47 seconds of computer time. There are three other
commands running and ls and cc are probably in the background. It is easy to see that ps is always one of the commands
running.
Thus one knows how many background processes one has set off, although some of the processes ps shows might have
been created by commands you ran and not by you explicitly. This simple form of the ps command gives only the first
word of your command but will show all your processes including those you might have started from some other
terminal.
ps has quite a few options but we will look at only a few more. The -l option gives more information on each process
while the -f option gives the full command line including all arguments with which it was invoked. The -e option shows
all running processes instead of just your own.
If you run ps and find that a process you started in the background is no longer listed, it means that the process has
completed or that it aborted for some reason. Also remember that ps gives a snapshot of the state of processes at some
time, and that by the time the output is displayed on the terminal matters might have changed.
Stopping Background Processes
You know that a foreground process can be terminated by pressing the DELETE key. But this will naturally not work for a
process running in the background. To stop such a process you need to know its pid, which you might have noted while
invoking the command or can deduce by examining the listing produced by ps. Then just say
% kill 1168
and that process will be stopped. If a process has created other child processes, you need to find out their process-ids
using ps and then give them all as arguments to kill. Processes can be killed by passing them various signals with
numbers 1 to 15. If nothing is specified then 15 is the default signal number used by kill. So the command above is the
same as saying
% kill -15 1168
If you have many processes running and are desperate because you cannot find out the process-ids, you can kill them all
except your login shell by saying
% kill 0
Some commands are written such that they can catch signals and act in a predetermined manner on receiving them
rather than executing the default action of terminating the process. The login shell is such a command. So
% kill 1149
has no effect. Such processes can be killed with signal number 9, which cannot be caught or ignored. But
% kill -9 1149
will kill your login shell itself and you will be back at the login prompt.
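The difference between signal 15 and signal 9 can be observed safely on throwaway processes. The sketch below assumes a Bourne-compatible shell; the variable names victim and stubborn are invented, and the sleep commands merely stand in for long-running jobs. In such shells the status returned by wait for a process killed by a signal is typically 128 plus the signal number, which is treated here as an assumption rather than a guarantee.

```shell
# Start a long-running background process and note its process-id.
sleep 60 &
victim=$!

# kill -0 sends no signal; it only checks that the process exists.
kill -0 "$victim" && echo "process $victim is running"

# Plain kill sends signal 15, the default described above.
kill "$victim"
wait "$victim"
term_status=$?
echo "exit status after signal 15: $term_status"

# A process that ignores signal 15 survives it, but not signal 9.
sh -c 'trap "" 15; exec sleep 60' &
stubborn=$!
sleep 1                      # give the child time to set up its trap
kill "$stubborn"             # signal 15 is ignored by the child
sleep 1
if kill -0 "$stubborn"; then survived=yes; else survived=no; fi
kill -9 "$stubborn"          # signal 9 cannot be caught or ignored
wait "$stubborn"
kill_status=$?
echo "survived signal 15: $survived; exit status after signal 9: $kill_status"
```

Reserve signal 9 for processes that do not respond to signal 15, since it gives the process no chance to clean up.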
INSPECTING FILES
Let us first look at commands that allow us to inspect files without altering them. For example, we might want to find
out how many words there are in a file, or we might want to locate places in the file which contain a particular text
expression. Before going further, we must be clear as to what a text file is. This is a file which contains only printable
characters and which is organised around lines. Although in some cases we can alter the files, these commands are
really meant to let us look at the files or to find out about them.
File Statistics
The wc command tells you the number of characters, words and lines in a text file.
% wc quotation
8 43 227 quotation
This means that quotation has 8 lines, 43 words and 227 characters. A word is a string of characters delimited by any
combination of one or more spaces, tabs or newlines. If you wish you can make wc operate on the standard input,
whereupon you will not find any filename displayed in the output.
% wc
No generalisation is ever wholly true, including this one.
The problem with equality is that we desire it only with our superiors.
^D
2 22 130
This also means you can use wc in a pipe, either to read from or write to. Thus
% cat quotation | wc
8 43 227
or
% who | wc
9 45 333
In both cases the method used is perhaps not the most natural one. For example, to find out the number of users in a
system you could say
% who -q
You can do a wc on several files at a time and then you get an additional line of output giving the total figures.
If you wish, you can find only the number of lines in the input by using the -l option, only the number of words by using
-w and only the number of characters by saying -c. These options can be combined in any order. So
% wc -cl quotation
227 8 quotation
You can see that
% wc -lwc
is the same as wc.
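The individual counts are easy to capture in a script. This sketch assumes a Bourne-compatible shell and uses a two-line sample file of its own rather than the eight-line quotation file of the text.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > quotation <<'EOF'
No generalisation is ever wholly true,
including this one.
EOF

# Reading from standard input keeps the filename out of wc's output;
# $(( ... )) strips the leading blanks that some implementations print.
lines=$(( $(wc -l < quotation) ))
words=$(( $(wc -w < quotation) ))
chars=$(( $(wc -c < quotation) ))
echo "$lines lines, $words words, $chars characters"
```

Note that the character count includes the newline at the end of each line.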
Searching for Patterns
We can now come to a few commands which help in locating patterns in files. One such program is grep (for global
regular expression printer). It takes one regular expression which you want it to search for, and looks for it one by one in
all of the specified files. Whenever grep finds a line in a file that contains the pattern, it prints the line on the standard
output. If more than one file was given to grep to search, the line is preceded by the name of the file in which the match was
found, followed by a colon. If only one file was to be searched then only the line is printed.
A word on regular expressions is in order. A regular expression is a way of specifying a template or pattern which can
match several text strings according to certain rules. For specifying the template, some characters are used with a
certain meaning. Such characters are called metacharacters. Thus a dot (.) matches any single character. We will not go
into the details of the rules governing regular expressions here, because you must have learnt about them in your
compiler design course. Regular expressions are used there to specify languages consisting of legal sentences from an
alphabet. From such a specification you must have learnt how to construct a lexical analyser which accepts only valid
sentences, that is, sentences of the language specified by the regular expression. In the present context, our alphabet is the
set of printable characters and the language is the set of all the text strings that match the regular expression. You
should refer to your UNIX manual to find out the exact rules for constructing regular expressions for grep.
Since the C-shell itself attaches a special meaning to many of the metacharacters, you will need to tell the shell not to
interpret the regular expression which you are trying to pass to grep. Single quotes are the safest way of telling the shell
this. So the regular expression argument to grep should be enclosed in single quotes, although double quotes also do
work in many cases. We will examine this matter in the next unit on Shell Programming.
Unfortunately the meaning attached to metacharacters in different utilities of UNIX is not always consistent. For
example, in grep, as we just saw an arbitrary single character is matched by a period (.) while in the C-shell this is done
by the question mark (?). This is a potential source of confusion, and all the more so because a beginner can find it hard
to construct or even interpret a regular expression anyway. However, with practice this difficulty reduces somewhat.
Moreover, not all utilities support regular expressions in their fullest manifestation, and actually the degree of support
varies amongst them.
By now you will be complaining because you want to see some real examples, not endless commentary on the
command. So here we go
% grep Gupta payfile
tells you where the string "Gupta" occurs in the file payfile. As shown here grep is matching a text string exactly. Every
line in payfile that contains the given string anywhere will be printed. You can give more than one file as an argument.
% grep Thomas custfile orderfile
If you want to know the line number in the file of the line on which the matches were found, say
% grep -n Australia country
To count the number of lines which matched, just say
% grep -c India prodfile
This will not print the line and only the count will be shown. You can invert the sense of a match like this
% grep -v India prodfile
This command will print lines that do not include the string India. Remember that grep looks for only one regular
expression but can look at more than one file. So do not try
% grep Ram Kumar users
grep: can't open Kumar
to look for Ram Kumar in a file users. The command as shown will look for the string Ram in the two files called Kumar and
users. Instead you should say
% grep "Ram Kumar" users
whereupon Ram Kumar will be searched for in the file users. There is also an option to turn off case sensitivity. So
% grep -i "Ram Kumar" users
will find any occurrence of Ram Kumar irrespective of case. Thus this would report RAM KuMAr as a match. What if
there could be occurrences of the string in the file with an unknown number of spaces between the two words? You will
now need to use regular expressions.
% grep "Ram *Kumar" users
matches Ram Kumar in this case. The * metacharacter specifies a closure meaning that the preceding pattern is to be
matched 0 or more times, which is what we want here.
grep is line oriented and patterns are not matched across line boundaries. The metacharacters ^ and $ stand for the
beginning and the end of a line respectively. So to look for an empty line, say
% grep '^$' users
But if you are looking for blank lines, say
% grep '^[ ^I]*$' users
Here ^I stands for a tab character typed inside the brackets. The [ ^I] is the character class consisting of the space and
the tab, and the * metacharacter is a closure which looks for 0 or more occurrences of these.
To see whether khanz is a valid login name, say
% grep "^khanz" /etc/passwd
because the login name is the first field in the passwd file. You can get every line in a file with line numbering by saying
% grep -n . letter
This is like a cat on the file but with the line numbers displayed too. To find lines containing a number, say
% grep '[0-9]' table
which will find every line that contains at least one digit.
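The options and anchors discussed above can be checked against a small sample file. This is a sketch for a Bourne-compatible shell; the contents of users are invented, and the pattern with two spaces before the * is a variant that insists on at least one space between the words.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data: line 3 is empty, line 4 holds only a space and a tab.
printf 'Ram Kumar\nRAM   KUMAR\n\n \t\nSita Devi 42\n' > users

exact=$(grep -c 'Ram Kumar' users)        # exact string, case sensitive
anycase=$(grep -ci 'Ram  *Kumar' users)   # one or more spaces, any case
empty=$(grep -c '^$' users)               # truly empty lines
tab=$(printf '\t')                        # a literal tab for the class below
blank=$(grep -c "^[ $tab]*\$" users)      # empty, or only spaces and tabs
digits=$(grep -c '[0-9]' users)           # lines containing a digit
nonmatch=$(grep -vc 'Kumar' users)        # lines NOT containing Kumar

echo "exact=$exact anycase=$anycase empty=$empty blank=$blank digits=$digits nonmatch=$nonmatch"
```

Using -c throughout keeps the output to a single number per pattern, which makes the effect of each change easy to compare.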
We have seen that grep cannot search for more than one regular expression at a time. There is another utility called
egrep which can handle regular expressions with alternations. We will not look at it here but you should study the
manual entry for it.
There is another utility in this family called fgrep which does not handle regular expressions. Since it handles only fixed
text strings, however, it is faster. Thus you can say
% fgrep "Ram Kumar" empfile custfile
Another advantage to this command is that you can store a list of words in a file, say search, one word per line. You can
then look for the occurrence of any of those words in a file like this
% fgrep -f search story
Usually grep is sufficient for everyday use but whenever needed you can make use of fgrep or egrep.
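Fixed-string searching with a word list can be sketched as follows. The example assumes a Bourne-compatible shell and uses grep -F, which POSIX specifies as the equivalent of fgrep; the file contents are invented.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Goa\nPuri\n' > search
printf 'We flew to Goa.\nThen a train to Delhi.\nPuri came last.\n' > story

# -F treats each pattern as a fixed string; -f reads the patterns from a file.
matches=$(grep -F -c -f search story)

# With -F the dot is an ordinary character, not "any single character".
regex_dot=$(printf '1x5\n1.5\n' | grep -c '1.5')
fixed_dot=$(printf '1x5\n1.5\n' | grep -F -c '1.5')

echo "story lines matched: $matches; regex dot: $regex_dot; fixed dot: $fixed_dot"
```

The last two counts differ because the regular-expression form also accepts 1x5.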
Comparing Files
We will now look at a group of utilities which help us compare two files. While talking of cp in section 2.4.8, we did not know
of commands which could help us ascertain whether the original and the copy indeed had the same contents. First let us
make a copy of the passwd file in our directory and then examine the two
% cp /etc/passwd ~
% cmp /etc/passwd ~/passwd
The cmp command takes two filenames as arguments and prints on the standard output the character offset and the line
number of the first position where the two files differ. If the files are identical, cmp prints nothing. It is useful in
comparing two binary files to see whether they are the same. It is not of much help in comparing two text files to see
how they differ, because if you add or delete even one character in one of the files, every later position is likely to hold
different characters in the two files. So you can try something like
% cmp /bin/ls /bin/cp
/bin/ls /bin/cp differ: char 27, line 1
to see that they differ.
To look at all the differences in two files say
% cmp -l /bin/ls /bin/cp
and you should be flooded with several thousand lines of output, each line containing the byte offset, the character in
the first file (ls) represented in octal and the character in the second file (cp) also in octal, for every byte position where
the two files differ (almost all in this case, one would imagine) until one or both the files end.
If one file is shorter than the other but no differences are detected in the two up to the point the shorter file ends, cmp
reports end of file on the relevant file.
Now let us turn our attention back to text files. Suppose we have two text files which are sorted in ascending order. Now
try
% comm file1 file2
This produces on the standard output three columns of text. The first column contains lines that are to be found only in
the first file and not in the second. The second column likewise contains lines present only in the second file. The third
column contains lines common to both files. This output could be all jumbled up if the files are not sorted. You can
suppress the printing of one or more columns like this
% comm -1 file1 file2
This suppresses the printing of lines only in the first file. So
% comm -3 file1 file2
will print lines only in file1 and those only in file2 but not those that are common to both files (column 3). You can
suppress two columns as well. Thus to print only the lines found only in file1, say
% comm -23 file1 file2
As you would expect
% comm -123 file1 file2
will print nothing.
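The column-suppression options can be tried on two short sorted files. This is a sketch for a Bourne-compatible shell; the contents of file1 and file2 are invented.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Both inputs must already be sorted.
printf 'Agra\nDelhi\nGoa\n' > file1
printf 'Agra\nGoa\nMadras\n' > file2

only_first=$(comm -23 file1 file2)    # suppress columns 2 and 3
only_second=$(comm -13 file1 file2)   # suppress columns 1 and 3
common=$(comm -12 file1 file2)        # suppress columns 1 and 2

echo "only in file1: $only_first"
echo "only in file2: $only_second"
echo "in both:"
echo "$common"
```

Suppressing two columns at a time gives plain lists that can be fed to other commands in a pipeline.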
cmp and comm are simple commands and it would be easy to write a program to accomplish what they do. We will now
take a quick look at a utility which is far more complex. The diff command takes two text files as arguments and brings
out the smallest set of differences between them. It can also produce output which can be used by the text editor ed to
produce the second file from the first. At the heart of diff is a complex algorithm to find the longest common
subsequence in two blocks of text. Let us look a little more at how these utilities can help you. Suppose you have a file
containing the names of a few places you would like to visit, as follows
% cat places
Agra
Cochin
Delhi
Goa
Guwahati
Jhansi
Puri
Secunderabad
^D
Also let there be another file containing the names of places a friend of yours would like to visit.
% cat moreplaces
Agra
Guwahati
Goa
Gwalior
Kochi
Madras
Udaipur
^D
Now you want to plan out an itinerary after discussion. In this discussion if you both agree about wanting to visit a place,
there is no difficulty. Otherwise you will have to decide what to do. So first you need to know whether you disagree at
all. We will assume here that both the files are sorted, and later in this unit we shall see how this can be easily done. So
to find out about the disagreements, we can say
% cmp places moreplaces
places moreplaces differ: char 6, line 2
Well, it was too much to expect complete agreement. To find out the differences you can use comm.
% comm places moreplaces
Here column 3 will tell you about the places you both agree upon. Now you only have to discuss columns 1 and 2 to
arrive at an agreement. But we have still not talked of diff. Say
% diff places moreplaces
This indicates the differences between the files in three ways: a, d and c. The a stands for lines which have been added, d
for lines deleted and c for lines changed between the two files. The symbol < refers to the first and > to the second file. We
will not discuss the command at length but will see a few options to diff.
% diff -e places moreplaces
produces output in a form suitable for the editor ed. You can save this output to a file and apply the change file to the
first file to produce the second. If you are wondering why one would want to do such a thing, you should wait until the
unit on programming tools, where we discuss version control. The essence of it is that instead of storing every version of
a file completely, one stores only the initial version and all the changes to it. One can always recreate any version by
applying the appropriate set of changes to the initial version. There is another option to diff which ignores trailing
white space on a line and treats other runs of white space as equivalent
% diff -b places moreplaces
diff can handle files of a limited size only. There is a command called bdiff which can be used for large files, but it just
uses diff after breaking up the files into manageable chunks. So differences across chunk boundaries may not come out
optimally.
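The a and d markers and the effect of -b can be seen on very small files. The sketch below assumes a Bourne-compatible shell and uses shortened, invented versions of places and moreplaces, plus two made-up files spaced and plain that differ only in white space.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Agra\nCochin\nGoa\n' > places
printf 'Agra\nGoa\nMadras\n' > moreplaces

# diff exits with status 0 when the files match and 1 when they differ.
diff places moreplaces > changes
diff_status=$?
cat changes          # one "d" entry for Cochin and one "a" entry for Madras

# Two files that differ only in the amount of white space.
printf 'New  Delhi \n' > spaced
printf 'New Delhi\n' > plain
diff spaced plain > /dev/null
plain_status=$?
diff -b spaced plain > /dev/null
b_status=$?
echo "without -b: $plain_status, with -b: $b_status"
```

The exit status makes diff usable in scripts that only need to know whether two files differ.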
Another command is sdiff, which works like diff but places the output from the two files side by side. Lines that are
present in one file but not in the other are shown by < and >. Lines that are present in both files but differ somewhat are
shown separated by a pipe (|). This command can be used to merge two files into one, keeping the common portion
intact and incorporating the differing parts of both files.
OPERATING ON FILES
We can now look at several utilities which will allow us to alter files in some way. The utilities in the previous section, in
contrast, allowed us to look at the files without manipulating their contents. However, most of these utilities are filters
and can write out the changed file only to the standard output, from where it can be redirected to another disk file. Very few
commands allow you to change a file in place.
Printing Files
If there is a long file and you cat it to the screen, the output is difficult to understand because there are no page breaks,
headers and the like. If you redirect the output to a printer, the result is a long stream of lines without regard to
the page length of your stationery. To get a formatted output, you can use the pr command.
% pr places
pr breaks up the file into pages with a header, text and footer area. The header contains a line giving the date, the name
of the file and the page number. The length of the page can be altered by the -l option and the header can be set by the
-h option. Thus if you want to print Itinerary as the heading instead of the filename places, say
% pr -h Itinerary places
The header and footer can be suppressed by the -t option. You can expand tabs to any desired number of spaces by
using the -e option followed by the number. Thus to expand tabs to 4 spaces instead of the default of 8, say
% pr -e4 places
You can give a left margin to your output by using the -o option followed by the number of characters you want to use
for the margin. Thus to have a 5 character margin, say
% pr -o5 places
You can also set up double space printing by using the -d option. If you want to print in more than one column, just use
the -n option where n is the number of columns you want. So to print in two column format, use
% pr -2 places
The column separator is a tab by default but can be changed by the -s option to whatever single character you want. You
only have to put your desired separator after the -s. The width of the output can be changed by using the -w option. For
example, if you are using 132 column stationery, you can say
% pr -4w132 places
which will print the file in 4 column format with the width being 132 characters. If you want to merge several files, you
can use the -m option. Thus
% pr -2m places moreplaces
will print the two files, one per column. You can use the -p option to pause after every page if the output is to a
terminal. Thus it could be some sort of a substitute for more or pg, although pr will not provide the several other
features that more has (pattern matching, for instance). The output from pr is usually redirected to a printer to produce
a hard copy. It is rarely useful to just look at a formatted file on the terminal.
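Two of the simpler pr options can be checked without a printer. This is a sketch for a Bourne-compatible shell, assuming a pr that follows the POSIX description of -t and -o; the three-line places file is invented.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Agra\nGoa\nPuri\n' > places

# -t drops the header and trailer, so the text passes through unchanged.
bare=$(pr -t places)

# -o5 adds a left margin of five characters to every line.
first_indented=$(pr -t -o5 places | sed -n '1p')

echo "$bare"
echo "[$first_indented]"
```

Without -t, pr pads each page with header and trailer lines, which is what you want on paper but rarely on the screen.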
Rearranging Files
There are two commands which will enable you to obtain a vertical section of a text file. This is like implementing the
projection of a database relation. Let us say that we have a file studfile containing the names of students and the marks
they have obtained in some examination.
From this we want to create a file containing only the names. The cut command is well suited to perform such a task. Let
us look at a small portion of the file
Ajay Sapra          87
Pappu Ahmed         85
Vinod Bhalla        91
You can see that the names extend from column 1 to 20 and the marks are in columns 21 and 22. To obtain the names alone we
can cut out those columns like this
% cut -c 1-20 studfile
Ajay Sapra
Pappu Ahmed
Vinod Bhalla
This gives us the columns 1 to 20 of the file studfile on the standard output. Similarly to get the marks alone (for some
analysis, for example) you can say
% cut -c21,22 studfile
or
% cut -c21-22 studfile
or
in this case, even
% cut -c21- studfile
This last command cuts out all the columns starting from column number 21.
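Column cutting depends on the file really being fixed width. The sketch below, for a Bourne-compatible shell, builds a two-record studfile with printf so that each name is padded to exactly 20 columns; the padding is an assumption about how the original file is laid out.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# %-20s left-justifies the name in a 20-column field; the marks follow.
printf '%-20s%s\n' 'Ajay Sapra' 87 'Pappu Ahmed' 85 > studfile

marks=$(cut -c21- studfile)                                   # column 21 onwards
name_width=$(( $(cut -c1-20 studfile | sed -n '1p' | wc -c) - 1 ))  # minus the newline

echo "$marks"
echo "width of the name field: $name_width"
```

The name field keeps its trailing blanks, which is why its width is 20 even for a short name.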
Remember that cut does not affect the original file in any way. It does the transformation only onto the standard output
which can be redirected as always, if you want it in a disk file. Now suppose you want the surnames of all the students in
surfile. Can you do this with what you already know?
You will find that you cannot achieve the desired result because the first 20 columns, which contain the name, are
actually only one fixed length field (name) of studfile as currently organised. The first name and the surname take up an
arbitrary number of columns out of these 20. In other words, the first name and the surname are not of fixed length. So
there are no parameters you can give to the -c option of cut which will be correct for all records in this file.
In such a case you must tell cut to work with variable length fields rather than column positions. So say
% cut -f1,2 studfile
to try and get the names alone. You might be a trifle surprised at the result because there will be no effect. If so, it was
because you expected that the field separator would be a space. But actually cut expects the fields to be separated by
tabs by default. To tell it to consider a space (or any other character) as the separator, use the -d flag before specifying
the field numbers
% cut -d" " -f2 studfile surfile; cat surfile
Sapra
Ahmed
Bhalla
You can create another file containing only the first names
% cut -d" " -f1 studfile > firfile
and we might put the marks into a file as well
% cut -d" " -f3- studfile > marksfile
Since every space is now considered to delimit a field, we have to cut out every field from the 3rd field onwards. That is
why you will find it necessary to give the hyphen after f3. We have now separated studfile into three files, each
containing one of the fields of the file. Let us now see how we can put the fields back in a different order.
Suppose we want the marks list but with the names given as surname followed by a comma and the first name, and
followed by the marks secured. We have all the components available with us in the three files we just created. To put
them back we can use the paste command like this
% paste -d", " surfile firfile marksfile
Sapra,Ajay 87
Ahmed,Pappu 85
Bhalla,Vinod 91
What does this command do? It writes lines to the standard output and constructs each line by concatenating lines from
the files specified with the field separator for that field. Thus the first line of the output consists of the first line of the
files in the order they are specified on the command line, with the first delimiter being used after the first field, the
second after the second, and so on. If only one delimiter is given, it is used to delimit all fields. The default delimiter is a
tab character.
We could have achieved this result using only two intermediate files because cut and paste are both filters.
% cut -d" " -f2 studfile | paste -d", " - firfile marksfile
Whenever a command accepts multiple filenames, one can use - to specify that the standard input be used at that point.
So we could also have achieved our result by using only two intermediate files like this
% cut -d" " -f1 studfile | paste -d", " surfile - marksfile
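The whole round trip can be sketched in a few lines. The example assumes a studfile whose fields are separated by single spaces (so that every space delimits exactly one field) and works in a temporary directory; the file names follow the text, everything else is illustrative.

```shell
dir=$(mktemp -d)
# Single spaces between fields, so field 1 is the first name, 2 the surname, 3 the marks.
printf '%s\n' 'Ajay Sapra 87' 'Pappu Ahmed 85' 'Vinod Bhalla 91' > "$dir/studfile"

cut -d' ' -f1  "$dir/studfile" > "$dir/firfile"
cut -d' ' -f3- "$dir/studfile" > "$dir/marksfile"

# "-" makes paste read the surnames from the pipe instead of from a file.
result=$(cut -d' ' -f2 "$dir/studfile" | paste -d', ' - "$dir/firfile" "$dir/marksfile")
echo "$result"
```

paste takes its delimiters in turn from the list given to -d, so here a comma follows the surname and a space follows the first name.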
Sorting Files
While cut and paste allow you to rearrange a file vertically, it is very common to want to rearrange a file horizontally,
that is, to sort it in some order. UNIX has an elaborate sort command which allows you to sort files in various ways with
a variety of options. Here we will look at some of the features of the sort command. Consider a file empfile containing
the first name, the surname, the date of joining the company, the employee number and the basic salary
Ram Gupta 24/03/84 2038 15200.00
Harish Gupta 18/10/89 5496 4300.00
Thomas Robinson 04/07/87 3562 4800.00
Gopal Das 28/02/91 8764 4400.00
Anil Jain 13/09/85 2867 6500.00
The UNIX sort is based on fields of variable length and the field delimiter can be specified. The default is the space
character. Let us see the result of sorting empfile
% sort empfile
Anil Jain 13/09/85 2867 6500.00
Gopal Das 28/02/91 8764 4400.00
Harish Gupta 18/10/89 5496 4300.00
Ram Gupta 24/03/84 2038 15200.00
Thomas Robinson 04/07/87 3562 4800.00
As you can see the result is written to the standard output. The sort command can read from the standard input and is thus a filter.
The default mode of sorting is in the collating sequence of the machine, ASCII for example, and in ascending order
starting from the first character of the line.
To sort on the surname, which is the second field, you can say
% sort +1 empfile
Gopal Das 28/02/91 8764 4400.00
Ram Gupta 24/03/84 2038 15200.00
Harish Gupta 18/10/89 5496 4300.00
Anil Jain 13/09/85 2867 6500.00
Thomas Robinson 04/07/87 3562 4800.00
So the +1 means that sorting starts at the second field. To sort on multiple field ranges, you can give the field number to
stop sorting at
% sort +1 -2 empfile
Gopal Das 28/02/91 8764 4400.00
Harish Gupta 18/10/89 5496 4300.00
Ram Gupta 24/03/84 2038 15200.00
Anil Jain 13/09/85 2867 6500.00
Thomas Robinson 04/07/87 3562 4800.00
Let us now try
% sort +4 empfile
Ram Gupta 24/03/84 2038 15200.00
Harish Gupta 18/10/89 5496 4300.00
Gopal Das 28/02/91 8764 4400.00
Thomas Robinson 04/07/87 3562 4800.00
Anil Jain 13/09/85 2867 6500.00
How has Ram Gupta with the highest basic salary of Rs 15,200.00 come at the beginning of the list? This is because sort
sorts from left to right in the ASCII collating sequence and 1 is smaller than any other digit in this case. So the field
starting with 1 appears at the beginning. In other words sort looks at the dictionary order rather than the numeric value
of the field. To make sort use the numeric order for numeric fields, say
% sort -n +4 empfile
whereupon the record for Ram Gupta will appear at the end.
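The +pos and -pos notation used in these examples is the historical one; it is not part of the current POSIX description of sort and may not be accepted by every version you meet. The sketch below repeats the salary sort with the -k option instead, assuming an empfile whose fields are separated by single spaces. Field numbers for -k start at 1, so +4 corresponds to -k 5. The temporary directory and variable names are illustrative.

```shell
dir=$(mktemp -d)
cat > "$dir/empfile" <<'EOF'
Ram Gupta 24/03/84 2038 15200.00
Harish Gupta 18/10/89 5496 4300.00
Thomas Robinson 04/07/87 3562 4800.00
Gopal Das 28/02/91 8764 4400.00
Anil Jain 13/09/85 2867 6500.00
EOF

# Dictionary order on field 5: 15200.00 sorts first because it starts with a 1.
by_text=$(LC_ALL=C sort -k 5 "$dir/empfile")
# Numeric order on field 5: 15200.00 is now the largest value and comes last.
by_number=$(LC_ALL=C sort -n -k 5 "$dir/empfile")

echo "$by_number"
```

LC_ALL=C is set only to pin the collating sequence to plain ASCII, which is the ordering the text describes.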
Let us now see how to sort on portions of fields. A practical example is when you have the dates given in dd/mm/yy
form as above and you want to sort in the ascending order of the date. If the dates were in yy/mm/dd order there would
have been no problem. So we need to sort on the 7th and 8th characters of the third field followed by the 4th and 5th
characters and the 1st and 2nd characters. Note that including a constant "/" character in between will not make a
difference, but to illustrate the syntax we will exclude this character.
% sort +2.6 -2.9 +2.3 -2.6 +2.0 -2.3 empfile
Ram Gupta 24/03/84 2038 15200.00
Anil Jain 13/09/85 2867 6500.00
Thomas Robinson 04/07/87 3562 4800.00
Harish Gupta 18/10/89 5496 4300.00
Gopal Das 28/02/91 8764 4400.00
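The same date sort can be written with -k keys, which name the first and last character of each key directly. This is a sketch under the assumption that the fields of empfile are separated by single spaces; with -t ' ' the third field is then exactly the eight characters dd/mm/yy, and character positions are counted from 1.

```shell
dir=$(mktemp -d)
cat > "$dir/empfile" <<'EOF'
Ram Gupta 24/03/84 2038 15200.00
Harish Gupta 18/10/89 5496 4300.00
Thomas Robinson 04/07/87 3562 4800.00
Gopal Das 28/02/91 8764 4400.00
Anil Jain 13/09/85 2867 6500.00
EOF

# Keys in order of importance: year (chars 7-8), month (chars 4-5), day (chars 1-2).
by_date=$(LC_ALL=C sort -t ' ' -k 3.7,3.8 -k 3.4,3.5 -k 3.1,3.2 "$dir/empfile")
echo "$by_date"
```

Each -k gives a start and an end position, so the "/" characters are left out of the keys altogether.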
The field delimiter can be any character other than the default space, in which case it has to be specified with the -t
option
% sort -t"|" +2 -3 +0 -1 testfile
will sort on the 3rd and 1st fields of testfile considering the "|" character to be the field delimiter.
If there is more than one record with the same value, you can get unique records by using the -u option, and duplicate
records will not be repeated in the output.
You can specify an output file where the output is to be written or you can redirect the output if you want.
% sort +2.6 -2.9 +2.3 -2.6 +2.0 -2.3 empfile -o emp.out
writes the result to the file "emp.out". The sort command is one of the few utilities which can work in place. So the
output file can be the same as the input, but do not try this using redirection: the shell empties the file before sort
gets to read it, or refuses the redirection altogether if the noclobber variable is set.
% sort +2.6 -2.9 +2.3 -2.6 +2.0 -2.3 empfile -o empfile
Now empfile will have changed after the command completes. This method is particularly useful when you are sorting a
large file and do not have space to keep both the unsorted and sorted files on the disk. But remember that sort uses
temporary space in the directory /usr/tmp, so there must be enough space there or your sort will abort, though your
source file will not be overwritten unless the sort has completed successfully.
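The difference between -o and redirection is easy to see on a scratch file. The sketch below uses a hypothetical three-line file called fruits in a temporary directory and assumes noclobber is not set in the shell running it.

```shell
dir=$(mktemp -d)
printf '%s\n' pear apple mango > "$dir/fruits"

# Safe: sort reads its input before it writes the file named with -o.
LC_ALL=C sort -o "$dir/fruits" "$dir/fruits"
sorted=$(cat "$dir/fruits")

# Unsafe, shown on a copy: the shell empties the file before sort starts,
# so sort finds nothing to read and the data is lost.
cp "$dir/fruits" "$dir/scratch"
sort "$dir/scratch" > "$dir/scratch"
scratch_size=$(wc -c < "$dir/scratch")

echo "$sorted"
echo "bytes left after redirection: $scratch_size"
```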
The sort command is not limited to sorting one file. You can sort several files in the same manner simultaneously by
giving the names of all the files on the command line, but remember that the output will then be all in one file. You can
check if a file is sorted in a particular manner by giving the sort command with the -c (check) option. You can sort in
reverse order by using the -r option.
To merge two or more files that are already sorted, use sort with the -m (merge) option. This is of course much faster
than sorting the files from scratch. Incidentally the UNIX sort is not a very efficient one.
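The check, merge, unique and reverse options can be tried together on two small files. The file names odd and even and their contents are made up for this sketch; -n is used throughout because the lines are numbers.

```shell
dir=$(mktemp -d)
printf '%s\n' 10 30 50 > "$dir/odd"
printf '%s\n' 20 30 40 > "$dir/even"

# -c only checks the order: the exit status is 0 when the file is already sorted.
if sort -n -c "$dir/odd"; then checked=yes; else checked=no; fi

merged=$(sort -n -m "$dir/odd" "$dir/even")       # merge two sorted files
unique=$(sort -n -m -u "$dir/odd" "$dir/even")    # same, duplicates dropped
reverse=$(sort -n -r "$dir/odd")                  # descending order

echo "$merged"
```

Note that -m relies on its inputs being sorted already; it does not verify this, which is where -c is useful.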
Splitting Files
Sometimes one wants to split files into pieces. We will take a practical example later, and first see how we can do the
job. Suppose there is a large file called "stores" consisting of the stores transaction data of a large organisation. If the file
is a large one, say with 324532 records, you might at times want to split it. For this you can say
% split stores
and the file will be split into 1000 line pieces. Each piece will be stored in a file. The last piece will have whatever is left
after the penultimate piece has been created, which in this case will mean that the last piece has 532 lines.
What are the pieces called, you might ask. The files are by default named xaa, xab, xac, ..., xaz, xba, xbb and so on up to
xzz. So you cannot split a file into more than 676 pieces using the split command.
If you want to, you can specify a prefix different from x to name each of the portions produced by indicating it on the
command line. Thus if you want to call the pieces partaa, partab and so on, you can say
% split stores part
You can also change the number of lines that are put into each piece by giving this number on the command line. So
% split -10000 stores part
will split the file into 10000 line pieces instead of the default of 1000. Note that the split is done based on lines in each
piece rather than on the size in bytes of each piece. Also there is no way of automatically telling split to produce a
specified number of pieces. Thus if you need to split a file into exactly 20 pieces, you will have to first determine the
number of lines in the file. Then work out a piece size which will give you the number of pieces you want (20 in this
case). It should be easy to see that there can be more than one piece size which will produce a specified number of
pieces from a file, because the size of the last piece can vary depending on what is left over. However, keep in mind that
split will also work if the number of characters in a line is not fixed, that is, when you have variable length lines.
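The arithmetic for a fixed number of pieces can be left to the shell. The sketch below assumes a hypothetical 990-line stores file and asks for 20 pieces; it divides the line count by 20, rounding up, and passes the result to split with the -l option (the POSIX spelling of the -10000 form used above).

```shell
dir=$(mktemp -d)
# Stand-in for the stores file: 990 numbered lines.
awk 'BEGIN { for (i = 1; i <= 990; i++) print i }' > "$dir/stores"

pieces=20
lines=$(wc -l < "$dir/stores")
# Round up, so that the last, shorter piece takes whatever is left over.
per_piece=$(( (lines + pieces - 1) / pieces ))

split -l "$per_piece" "$dir/stores" "$dir/part"
count=$(ls "$dir"/part* | wc -l)
echo "$count pieces of up to $per_piece lines"
```

Rounding up does not guarantee the requested count for every file: with 95 lines, for instance, a piece size of 5 gives 19 pieces and a piece size of 4 gives 24, so 20 cannot be reached at all. It is worth checking the count after the split.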
There can be various situations where it might be necessary or desirable to split a file into several parts. Let us look at
one such situation. Imagine a data file of 100 MB in a partition on the hard disk which has 150 MB of free space left. The
file has 100000 records of 1000 bytes each, including the newline. Also assume that the partition which contains the
/usr/tmp directory has only 80 MB of space free. We want to sort our 100 MB data file. How can we do this?
You know that the sort command uses temporary work space in the /usr/tmp directory and that it needs about 1.2 times
the size of the file as work space. So to sort a 100 MB file the /usr/tmp partition must have at least 120 MB of free
space. Since this is not so in this case, we cannot sort the file directly although there is enough space to hold the sorted
file.
What you can do is to split the source file into two parts of 50 MB containing 50000 records each. Now sort each piece
separately in place. There is enough space in /usr/tmp to sort a 50 MB file. Then merge the two pieces together using the
-m option to the sort command. This option does not need much work space. By reducing the size of the files to be
sorted you can accomplish your goal of sorting the file.
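The split, sort and merge procedure looks like this in miniature. Ten scrambled numbers stand in for the 100 MB file, and the names data, piece and data.sorted are only for illustration.

```shell
dir=$(mktemp -d)
# Stand-in for the large unsorted file.
printf '%s\n' 7 3 10 1 9 4 6 2 8 5 > "$dir/data"

split -l 5 "$dir/data" "$dir/piece"        # two pieces: pieceaa and pieceab
for f in "$dir"/piece*; do
    sort -n -o "$f" "$f"                   # sort each piece in place
done
# Merging already sorted pieces needs very little work space.
sort -n -m -o "$dir/data.sorted" "$dir"/piece*

cat "$dir/data.sorted"
```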
Translating Characters
There is a very useful command to translate characters in a text file. Suppose we have a file quotation
% cat quotation
Chess, like music, like love, has the power to make men happy
and we want to change all letters to capitals or upper case. We can do this easily by using the tr command
% tr '[a-z]' '[A-Z]' < quotation > QUOTATION
CHESS, LIKE MUSIC, LIKE LOVE, HAS THE POWER TO MAKE MEN HAPPY
Notice that letters that are already upper case are not affected because there is no translation specified for them. The tr
command takes two arguments which specify character sets. Every character from the first set is replaced by the
corresponding character from the second set.
The command is a filter and takes input from the standard input and writes to the standard output. If you want to use
the command on disk files you will need to redirect the input and the output accordingly, as has been done in the
example above.
The arguments, that is, the character sets can be specified either by enumerating them or as ranges. In the example
given both the arguments have been specified as ranges. For this to be possible the characters must be in the ascending
order of the ASCII collating sequence without any gaps.
To implement Caesar's cipher, for instance, you can use tr on the source file. In this primitive cipher, every letter of the
Roman alphabet is shifted forward by three characters. Thus a becomes d, b becomes e and z becomes c. So try
% tr '[a-z]' defghijklmnopqrstuvwxyzabc < plaintext > ciphertext
Here we have specified the first character set as a range but the same cannot be done for the second. So the second
character set has been enumerated in full. The command given will not encipher upper case letters. If you want to
change these too, or also want to change digits, you can modify the command appropriately.
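As a sketch of one such modification, the command below shifts both lower and upper case letters. It is written in the POSIX style of tr, where a range is given without square brackets (in that style the brackets would be treated as ordinary characters), and it describes the shifted alphabet as two ranges, d-z followed by a-c. The sample sentence is made up.

```shell
plain='attack at dawn, Zulu'

# Lower and upper case letters are each rotated forward by three places.
cipher=$(printf '%s\n' "$plain" | LC_ALL=C tr 'a-zA-Z' 'd-za-cD-ZA-C')
# Swapping the two sets undoes the shift.
back=$(printf '%s\n' "$cipher" | LC_ALL=C tr 'd-za-cD-ZA-C' 'a-zA-Z')

echo "$cipher"
echo "$back"
```

Characters that appear in neither set, such as the space and the comma, pass through unchanged.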
As usual, if a character in the command has special meaning to the shell, it needs to be escaped. Here we have used
single quotes to escape the square brackets although double quotes could have worked as well.
What happens if the number of characters in the two character sets does not tally? Well, if there are more characters in
the second set than in the first, there is no problem because there will never be occasion to translate to them. If there
are more characters in the first set, the extra characters are ignored. Thus
% tr '[0-9]' '[a-f]' < srcfl > targetfl
will change 0 to a, 1 to b, and so on, making a 5 into an f. The digits 6, 7, 8 and 9 will not be changed because there is no
translation specified for them. (This is the System V behaviour; some other versions of tr instead repeat the last
character of the second set.)
The command has some other facilities. We can delete any set of characters from the input by using the -d option of tr
and specifying only one character set. So if you want to get rid of punctuation marks like a semicolon, a colon, a dash and
a comma, you can say
% tr -d ';:,-' < srcfl > targetfl
The characters can also be specified by giving their octal representation after a backslash. Thus to delete all tab
characters, you can say
% tr -d '\011' < srcfl > targetfl
There is also the -s or squeeze option using which you can collapse or squeeze multiple (consecutive) occurrences of a
character to a single occurrence of that character. Thus to replace multiple spaces by a single space, you can say
% tr -s ' ' < srcfl > targetfl
Finally we will look at the complement option specified by -c. This option complements or inverts the character set you
specify. While
% tr -d '[0-9]' < srcfl > targetfl
will delete all digits from srcfl and write the result out into targetfl, using the -c option with this will delete everything
except digits from the file
% tr -cd '[0-9]' < srcfl > targetfl
leaves only digits from srcfl in targetfl.
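The three options can be compared on a single line of input. The sample string is invented for this sketch, and the character sets are written in the POSIX style without square brackets.

```shell
line='a;b:c,,d   e-f 42'

no_punct=$(printf '%s\n' "$line" | tr -d ';:,-')   # delete ; : , and -
squeezed=$(printf '%s\n' "$line" | tr -s ' ')      # runs of spaces become one space
digits=$(printf '%s\n' "$line" | tr -cd '0-9')     # delete everything except digits

echo "$no_punct"
echo "$squeezed"
echo "$digits"
```

The dash is placed last in the set given to -d so that it cannot be read as part of a range. Note also that -cd with a set of digits removes the newlines as well, since a newline is not a digit.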
OBJECTIVES
After going through this unit you shall be able to:
 Use the full screen editor vi with ease
 Discuss various features of vi editor
 Edit text files with the line editors ex and ed
 Transform text files with sed
 Motivate yourself to study awk
GENERAL CHARACTERISTICS OF vi
Many editors operate on a copy of the file that is being edited rather than on the file directly. Sometimes these editors
save the older version of the file with a standard extension like ".bak", which can be very useful if you make a mess of
the original file. You can then rename the ".bak" file and delete the recently edited file. Then you can proceed with your
editing again on the same version of the file as you had started out earlier. In vi this is not so. Although editing occurs on
a copy of the file, this copy is in the /tmp directory and when you save the file the original file gets overwritten and the
temporary file gets deleted. Therefore once you issue the save command you cannot get back your original file unless of
course you had explicitly saved a copy of it earlier.
Like many other editors vi does the editing on a buffer in memory. The changes are written out to the file only when you
issue a command to do so. This allows one to abandon an edit session, which can be a godsend if you have made a mess.
However it also means possible trouble if there is a system crash. Usually a copy of your edit buffer is saved by UNIX if
possible. Such a buffer can be edited when the system restarts by a special option to vi. But it is safest to save your file
often if you are going on the right track.
Many editors place you in some default mode when you start them up. Usually you are in a mode where you can insert
text at the cursor position right away. With vi there is no such thing and when you start up you have to explicitly tell vi
what you want it to do.
Like many other editors, vi also has a maximum limit on the size of the file it can edit. If this limit is exceeded vi
complains that the buffer is full. Should this happen, you will need to split your file into manageable pieces and edit each
piece separately, but we are sure this problem will not arise in the beginning at least, unless you start learning vi on
large data files.
With this we will start on the description of vi's capabilities. It is difficult to describe an interactive command on paper
and such a utility is best learnt by doing on the terminal. It is also difficult to represent the screen on paper, but we will
try to show as much as is necessary.
Starting up and quitting from vi
Let us see how one can invoke vi. To edit a file called UNIXdoc you only need to say
% vi UNIXdoc
When you give this command several things happen. The screen gets cleared, the file gets read into the edit buffer, the
first portion of the window to the buffer appears on the screen and the cursor is at the first character of the first line of
the window. Now we have said many things here, so let us get the meaning of all this straight.
First in order that vi work properly, your terminal type must be known to vi. We will not dwell on this aspect now but
assume that the appropriate action has been taken to ensure this. If you have any difficulty in using vi you can ask your
System Administrator to set up your terminal type for you.
Now if the file UNIXdoc already exists it is read into the buffer and you are placed at the beginning of this buffer. If the
file does not exist you get a blank screen. The size of your window is usually the size of the screen less one line unless
you are using a very slow form of communication between your terminal and the host computer. So normally you would get a
window of 24 or 23 lines. You can change the size of your window at any time.
The bottom line of the screen shows the name of the file and its size in characters and lines. At times when one is giving
commands to vi, this information disappears and instead one can see the command one is issuing. There are several
kinds of commands one can give vi. If the screen becomes cluttered while this happens, and you are in command mode
(where one can give commands, as opposed to being in insert mode where whatever one types gets inserted as text), you can
refresh or redraw the screen by saying ^L.
The text one enters in vi is organised in lines. As already mentioned vi is not a word processor and it works only in
non-document mode. So every line feed character has to be explicitly given to vi, otherwise it will continue to add text on
the same line until it reaches its limit on the length of a line. How do such long lines look on the screen? On the screen vi
wraps the text onto the next line if the width of the screen has been reached. You cannot directly make out whether two
lines on the screen are two physical lines or are the same line wrapped around on the screen. One can easily find out,
however. So to avoid confusion it is best to press a carriage return when one is near the end of a line. Of course if one is
writing a program and one is writing, say, a long condition in an if statement, one might prefer to let it remain a single
physical line.
While starting up vi, one can give more than one filename
% vi UNIXdoc examples
or even use wildcards that the shell understands
% vi *.c *.h
In such a case the files are presented to you for editing one by one. While starting up, vi also tells you the number of
files that you have selected for editing. After you are done with the first file, you are presented with the next unless you
choose to exit the whole operation. In that case the remaining files are not brought up for editing. In this respect vi is
different from most word processors where you can select only one file at a time. Also you can start up vi without a
filename at all
% vi
will also present you with a blank screen and accept commands. When you want to save the file you can provide it with
a name. In fact you can give the buffer you are editing a name at any time, even if you are not going to save it. Why
would one want to do such a thing? One answer is security. If you do not want anybody to know what file you are
editing, you should start up vi without a name and then call up the file you want to work on. Now anybody who tries to
see what you are doing with the ps command will only know that you are working in vi but will not know which file you
have called up. On the other hand if you call up vi with a filename, that filename is visible to anybody who looks up your
processes with ps.
There are other modes in which you can call up vi. We will mention a couple of these only briefly. The first is with the -x
flag which brings up vi in crypt mode. So the file will be stored on the disk in encrypted form but you can edit it normally
if you know the key. Although crypt has its limitations and dangers it will certainly put off casual busybodies. Another
mode is the -R flag for read only mode. In this mode you can only look at the file you are editing and navigate as you
would normally. The only thing is that commands which would alter the file will have no effect. You can also invoke this
mode by saying
% view UNIXdoc

This is equivalent to calling up vi with the read only flag. It is much more convenient than looking at the file using more
or any such command, because you can move around the file as you wish and there are safeguards against changing
anything. In this mode you can still manage to make changes to the file and save them if you have the requisite
permissions, but you will have to use the force options to do so. This makes it unlikely that you will make accidental
changes.
We had mentioned the possibility of recovery from system crashes earlier. To invoke this recover mode, say
% vi -r UNIXdoc
whereupon vi will do its best to recover the file.
You can begin editing at any position other than the beginning of the file like this
% vi +24 UNIXdoc
which will place you at the beginning of the 24th line of UNIXdoc initially. If you want to go to the end, you can say
% vi + UNIXdoc
and you will be placed at the beginning of the last line of the file you mentioned, that is, UNIXdoc.
You now know how to start up vi. Although that is only the beginning and we have a host of commands to look at, the
many ways you can start the editor would have given you an idea of the kind of things to expect when we launch on a
discussion of its features. Before we get on with them, let us learn to get out of vi. It is very disconcerting to enter an
interactive utility like vi and then not know how to get out gracefully. That is why we talk of this in the beginning itself.
Most vi commands are obeyed immediately, that is, you do not have to type a RET. So commands that do require a RET
will need to be explicitly indicated as needing that key. To save a file that does not exist, say
:w
and if the file exists you will need to force the save with
:w!
In many places in vi the exclamation mark indicates forcing something. To quit immediately after a save when no more
changes have been made you can say
:q
But if you want to abandon any changes you have to force a quit with
:q!
You can also save and quit by saying
:x
or
ZZ
So let us now understand how to edit a file using vi.
Adding Text and Navigation
Let us start by taking a non-existent file and putting text in it. Call up vi.
% vi UNIXdoc
As we have mentioned vi has no default mode and if you now try to press a key you will probably get no response that
you can see. Almost every key on the keyboard is a vi command, however, and if you press a key that is not valid at that
time vi responds with a beep. Also there is usually more than one way to accomplish any task. To start entering text, say
i for insert. When you do this, vi enters text insert mode, and now whatever you type will be taken not as a command but
as text to be inserted at the cursor position. You can stop this insertion and get back to command mode by pressing the
ESC key. The same action can be done by saying ^[, but of course ESC is more convenient. Now let us look at the text
after inserting.
In such a case the files are presented to you for editing one by one. After you are done with the first file, you are
presented with the next unless you choose to exit the whole operation. In that case the remaining files are not brought
up for editing. In this respect vi is different from most word processors where you can select only one file at a time. Also
you can start up vi without a filename at all.
This is a paragraph of this document itself but a sentence is missing in the middle. We will soon see how to add it at the
desired position, but first we must know how to move around the edit buffer. One way is to use the arrow keys on the
numeric keypad. If your terminal is not set up properly, this might not work, in which case you can use j for down, k for
up, l for right and h for left. You can prefix any of these keys with a number and then that will be the number of times
the movement occurs. For example, 5j will move you down 5 lines, 8l will move you 8 characters to the right and 3h will
move you 3 characters to the left. You can move left or right only in the line. This is how you can tell whether you have a
long line wrapping around to the next screen line. On such a line the cursor will continue to move to the next screen line
if you keep pressing l. If the line has ended, pressing l will have no effect on the cursor position. The same holds for a
leftward movement. If the current line is actually the previous line wrapped around, you can move the cursor to the
previous screen line by saying h, but not if they are two different physical lines.

You can also use the space character to move forward. To move to the first non-blank character on the line, say ^ and a
0 takes you to the first position on the line. To move to the end of a line say $. Also RET takes you to the beginning of the
next line and a - to the beginning of the previous line.
If you are on a line and move to the next line with a j, then if this line has fewer characters than the previous cursor
position the cursor will jump leftwards to the last character of the line. If you now move one line downwards again and
this line is longer than the cursor position of the first line, the cursor comes back to the same relative position on the line
as before. The same holds when you try to move upwards in the file. At the last line of the file you cannot go down any
further and likewise you cannot go before the first line.
You can go to any line number by saying :n, where n is the line number. This can also be done by saying nG. Actually
commands that start with : are commands of the ex editor, which adds greatly to the power of vi. You can move n lines
forward by saying n+ and n lines backward by saying n-. To go to the last line of the file, say G and of course 1G to get to
the beginning.
You can now add text to the middle of existing text. Move the cursor to where you want to insert the text and say i, or
you can move one character before the place where you want the text added and say a for append. The append
command is useful when you want to add text to the end of a line, where you cannot use i. Now you can type in the text
to complete the paragraph.
In such a case the files are presented to you for editing one by one. While starting up, vi also tells you the number of files
that you have selected for editing. After you are done with the first file, you are presented with the next unless you
choose to exit the whole operation. In that case the remaining files are not brought up for editing. In this respect vi is
different from most word processors where you can select only one file at a time. Also you can start up vi without a
filename at all
There are some other insertion commands you might find useful. An A appends text to the end of the current line
irrespective of the cursor position. So you need not give a $ and then an a to add text to the end of the line. Similarly an
I inserts text at the beginning of the current line.
None of these insertion commands will directly let you start a new line although you could move to the end of a line and
press a followed by RET.
You can achieve the same effect from anywhere in the line by saying A followed by RET, or simply o. This opens up a blank
new line after the current line. You can open up a line before the current line by saying O. Note that the insert mode can
be terminated by ESC or ^[, irrespective of what command was used to enter it.
When you are at the bottom of a window and you press j or any key which causes the cursor to go down, the display
scrolls up by that many lines so that the cursor remains on the last line. Similarly when you are at the top of a window and
you press k or any key that causes the cursor to go up, the display scrolls down. You can scroll down, that is, forward half a
window by saying ^D and can go backward half a window by saying ^U. To go forward one window, say ^F and to go
back one window you can say ^B. One can also say ^E to scroll forward one line without changing the position of the
cursor on the text, which means that the cursor is still on the same line of the document but the display has moved forward.
To scroll backward one line say ^Y.
Any change in the window blanks out the last line which is the status line. As already mentioned, some vi commands use
this space. To get back the status line you can say ^G, whereupon vi will display the name of the file and the other
information again.
You should now practice inserting text and moving around the edit buffer to get familiar with the various navigation
commands. Instead of starting out with a blank file as we have done here, you can choose an existing file of convenient
size, so that there is enough of it to traverse around.
After you are at a stage where you no longer feel lost, try to reach a point in the most economical fashion. You will
surely find yourself using some commands more than others, but you should take care that you do not forget the
commands you use less frequently.
With this let us look at a few more commands which will help you to insert text and to move around. An insert command
we have not looked at so far is the s command. This deletes the character at the cursor position and then adds text at
that point until you terminate insert mode with an ESC. You must also be wondering if there is any editing possible while in
insert mode.
Actually when you are inserting text you can erase typing mistakes by using the backspace character or ^H. The
characters after the cursor remain visible but will be overwritten if you type something else. For example take a
sentence being added
He go home
where you are at the last character e. You now realise that you have made a mistake. You could either carry on and
correct your mistake later, or do so right now. To correct it now, press the backspace without leaving insert mode to
reach the space after the o of go
He go_home

You can now correct your mistake by retyping all the characters from that point onwards. Note that although the
characters after the cursor are still visible you will have to retype them
He goes home
You can check that out by going back a few characters and pressing ESC to leave insert mode. You will find that the
characters after the cursor position disappear and do not get added to the text
He goes_home
leaves you with only
He goes
If you want to add a control character to the file, especially a character which has special meaning to vi and could be
mistaken for a command, then you can do so by typing ^V followed by the character. Thus to enter the ESC character, enter
insert mode and type ^V followed by ESC.
There are navigational commands which operate on words, which are defined as sequences of letters, digits and
underscores separated by any character which is not one of these. These commands work across line boundaries and
are thus useful over large blocks of text, unlike the character based navigation commands. w takes you forward one
word and b takes you back one word, both to the beginning of the word, except that b brings you to the beginning of the
current word itself if you are not already there. There is also a command e which takes you to the end of the word if you
are not already there and to the end of the next word otherwise.
These commands use a definition of word which is convenient when programming, because it corresponds to the
definition of an identifier in most languages, especially C or C++. However if you are entering text to form a document,
then in ordinary English it might be more useful to consider a word differently. For example, one might want to consider
a comma as part of the word and when one wants to go to the next word one really means the next word after the
comma. To use this definition of a word, you can use the commands W, B and E which work like their lower case
counterparts but consider a word to be a sequence of characters separated by spaces.
All these commands also take preceding counts. So 23w will move you forward 23 words using the w command's
definition of a word.
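The two definitions of a word can be sketched in a few lines of Python. This is only an illustration and not how vi is written: the function name next_word_start and the two patterns are our own, and we assume that a run of punctuation counts as a word of its own for w, which is how vi behaves in practice.

```python
import re

# Small word (w, b, e): a run of letters, digits and underscores,
# or a run of other non-blank characters such as punctuation.
SMALL_WORD = re.compile(r"\w+|[^\w\s]+")
# Big word (W, B, E): any run of non-blank characters.
BIG_WORD = re.compile(r"\S+")


def next_word_start(text, cursor, pattern, count=1):
    """Return the index reached by `count` forward word motions from `cursor`."""
    starts = [m.start() for m in pattern.finditer(text) if m.start() > cursor]
    if len(starts) < count:
        return len(text) - 1  # not enough words left: stop at the last character
    return starts[count - 1]


line = "I detest them all! How, and when"
cursor = line.index("them")

print(line[next_word_start(line, cursor, SMALL_WORD, 2)])  # 2w lands on "!"
print(line[next_word_start(line, cursor, BIG_WORD, 2)])    # 2W lands on "H" of How
```

The only difference between the two motions is the pattern used to decide where a word begins.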
Let us now look at a small example to illustrate these commands. Take the lines
Carrots, radishes and cabbages, I detest them all! How, and when, and why, were they deemed good for me?
with the cursor in the middle of the word detest, so that you are in the middle of a word by any token. Now pressing w
will take you to the beginning of the next word, them.
Now 2w will take you to the exclamation mark but 2W will take you to the beginning of How. At this point a b takes you
to the exclamation mark but a B will take you to the beginning of all. In the initial situation, an e would take you to the
end of the current word, detest.
When you are typing C program text, you can move to the brace { or } matching the one on which you are by saying %. This
can also be used for parentheses ( or ). Thus in
if ((rc = fopen("UNIXfile", "r")) != (FILE *) NULL)
the % command takes you from a left parenthesis to the right parenthesis corresponding to it, and back again.
This feature is useful, for example when checking whether your loops and blocks are properly closed.
There are also commands ( and ) which move you a sentence back or forward, { and } which move you a paragraph back
or forward and [[ and ]] which move you back or forward whole sections. Again these commands can be preceded by a
count, but we will not discuss these in any more detail. You are urged to refer to the documentation for more information
on these.
Changing Text
This is a very common activity while editing because once your text is in, you tend to spend a lot of time polishing it up and
this often involves changing what was written before, not always requiring entry of fresh material. So here we will look
at how one can change existing text and also how to delete text that has already been entered. Now that you have got
your feet wet in vi, it will be much easier to learn these commands because the flavour is the same.
First let us see how to delete a character. To delete the character at the current cursor position press x and to delete the
character before the current cursor position press X. As usual these commands can be preceded by a number to indicate
the count, so that 7x will delete 7 characters from the current cursor position. You cannot delete characters not on the
current line using these commands.
When you delete the character the rest of the text moves up to take its place. So if you press x the cursor remains at the
same position but now it is on the next character. Pressing an x again will delete this character and thus pressing x
repeatedly will delete characters on the line in the forward direction.
When the last character on the line has been deleted, the situation changes a bit. Since the x command does not affect
the new line character, the cursor then stays on whatever is now the last character of the line. So pressing x now brings up the previous
character under the cursor. Thus pressing x repeatedly will now result in characters on the line getting deleted in the
backward direction. You can easily end up deleting the whole line if you are not careful to detect this change in behaviour.
In contrast, the X command does not behave differently. It always deletes characters to the left of the cursor. So when it
reaches the beginning of the line it is not able to delete any more characters and it therefore results in a beep, indicating
that the command is not valid in that situation.
To delete a word, one must use the d command. This takes a second character which indicates what is to be deleted. If
this is w, it deletes the word (obeying the definition used by w) from the current cursor position including the delimiting
character. One can say dW to delete a word from the current cursor position using W's definition of a word. To delete a
word in the backward direction, you can use db or dB as you need. As usual these commands can be preceded by a
number to indicate the count of how many times they should be executed.
Unlike character deletions, deleting words can be done across line boundaries. If a word on the next line has to be
deleted, the two lines are joined together. Of course this might mean that the line is a very long one, in which case you
can split it at the appropriate point by positioning the cursor there and inserting a new line character.
Now you can learn how to delete a sentence. To do so from the current cursor position just say d), or d( to delete
backwards. The end of a sentence is considered to occur at the next ., ! or ? followed by two spaces. You can delete
paragraphs by d} or d{, and sections by d]] or d[[, for forward and backward respectively. Again, these commands can be
preceded by numbers to indicate the count of how many times they should be executed.
To delete an entire line, position the cursor on the line and press dd, or give a count if you want to delete more than one
line. This command deletes the line including the new line character, so that the next line is immediately after the
previous one. There is also a command D which deletes all the characters on the line from the current cursor position
onwards except the new line. To blank out a line, go to the beginning of the line and press D. The line will remain but its
contents will have been deleted. As you must have guessed, the same can also be achieved by saying d$.
To delete a block of several lines, you can use the : command. Saying :.,$d will delete all lines from the current line to the
end. In this context, the . refers to the current line and the $ to the last line of the edit buffer. You can replace either by
an absolute line number. So :10,25d will delete lines 10 through 25 irrespective of the current cursor position. You can
also give the second address relative to the first by separating the two with a semicolon, so :10;+4d deletes 5 lines
starting from line 10.
vi internally maintains the line number of each line in the edit buffer. We will see how to display these later. These are
the line numbers that it uses when given commands that take line numbers.
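To see what a line range means, here is a small Python sketch that treats the edit buffer as a list of lines. The function delete_range, the 30-line buffer and the current line 12 are made up for the illustration; the real editor of course keeps its buffer differently.

```python
def delete_range(buffer, first, last):
    """Delete lines first..last (1-based, inclusive), like :first,lastd in ex."""
    if not 1 <= first <= last <= len(buffer):
        raise ValueError("address out of range")
    return buffer[:first - 1] + buffer[last:]


buffer = [f"line {n}" for n in range(1, 31)]  # a 30-line edit buffer
current = 12                                  # hypothetical current line

after_abs = delete_range(buffer, 10, 25)                # like :10,25d
after_rel = delete_range(buffer, current, len(buffer))  # like :.,$d

print(len(after_abs), after_abs[9])    # 14 lines remain, line 26 is now tenth
print(len(after_rel), after_rel[-1])   # 11 lines remain, ending at line 11
```

Note that the addresses are line numbers counted from 1 and that both ends of the range are included.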
You are now familiar with commands to delete text in vi. There is an analogous set of commands to help you change
text. Let us first see how to replace characters. You can replace a single character by bringing the cursor there and
pressing r followed by the character you want to be present. For example, given the line
Lead, kingly light!
with the cursor on the g, you can correct the mistake by saying rd, whereupon the character gets replaced by the character d
Lead, kindly light!
You can precede this command by a count, but that is not usually very useful because 6rd will replace all the 6
characters from the cursor position with d's. You cannot use this command to replace them with different characters; all
will get changed to the character you specify.
Another change command is R. Unlike r which is effective only for the next character, this command places you in
replace mode, meaning that whatever you now type replaces the character originally in that position. This continues till
you press Esc to stop. This command can be useful when you have text in a fixed format and you want to change it to some
other value. For example, suppose you have a data file containing name and phone number with the name being of 30
characters and the phone number of 7 characters. Now if you want to change the phone number, let us see how you
could do it for a line of that file
Rajat Jain 3489872
With the cursor on the first digit of the phone number you can say R4587926 followed by Esc, and the line now becomes
Rajat Jain 4587926
We now come to a command which is specialised but can be very convenient. This is the ~ (tilde) command which
changes the case of the letter under the cursor. It has no effect on characters that are not letters. For example in the line
above if you want to have the name in capitals, say 0l to position the cursor under the a just after R
Rajat Jain 4587926
Now say 4~ and you will have
RAJAT_Jain 4587926
But if you say 29~ at that point, the J of Jain will become lower case, which you can change back again. Alternatively you
can treat the words separately as shown here, move the cursor to the a after J, and say 3~ to complete the operation.
The command works like a toggle, so if you give the command twice, you will get back the previous case. So you need to
be a bit careful with this.
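The toggling behaviour of ~ is easy to imitate in Python. The sketch below is only a model under our own assumptions: the function toggle_case is a made-up name, the cursor is counted from 0, and the name field is shown with a single space instead of its full width.

```python
def toggle_case(line, cursor, count=1):
    """Flip the case of `count` characters starting at `cursor`.

    Returns the new line and the new cursor position.
    """
    end = min(cursor + count, len(line))
    flipped = line[cursor:end].swapcase()  # non-letters are left unchanged
    new_cursor = min(end, len(line) - 1)
    return line[:cursor] + flipped + line[end:], new_cursor


line = "Rajat Jain 4587926"
line, cursor = toggle_case(line, 1, 4)   # 0l followed by 4~
print(line)                              # RAJAT Jain 4587926
line, cursor = toggle_case(line, 7, 3)   # cursor on the a after J, then 3~
print(line)                              # RAJAT JAIN 4587926
```

Applying the same call twice to the same stretch of text gives back the original line, which is the toggle effect mentioned above.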
We have now seen commands which can be used to change text at a character level. Let us now take a look at a few
commands which will work on larger blocks of text. To change a block of text one uses the c command, followed by the
abbreviation for the block to be changed. Thus to change a complete word, position the cursor at the first letter of the
word and say cw for change word. Now a $ appears at the last character of the word to mark out the scope of the
change, and you are in change mode. You can also use cW if you want to use that definition of a word.
You now type in the text you want to replace the current text. This text can be of any length. It does not matter whether
it is shorter than or longer than the original text, or is of exactly the same length. The change can be terminated with an Esc,
and the position of the rest of the text is adjusted accordingly. Thus if there is extra text now, the following matter is
pushed ahead, and if there is less text, the following matter is pulled back. Let us see how this happens with a simple
sentence
The man was running
With the cursor on the m of man, give the command cw and you will see
The ma$ was running
as the last letter of the word is shown with a $. Now type athlete followed by Esc and the sentence becomes
The athlete was running
But you are not limited to typing in one word. You can actually type in as much text as you want in place of the word. So
you could have said athletic looking robber and the sentence would have become
The athletic looking robber was running
You can also change more than one word by preceding the cw command with a count, and then the scope of the change
becomes that many words. In the original sentence
The man was running
You can try 2cw and you will see
The man wa$ running
Now type athletic looking robber is, and the sentence becomes
The athletic looking robber is running
Although it does not make a difference in this case, you can say cW to use W's definition of a word, and of course this
can also be preceded by a count. The count can also be given as c5W rather than 5cW. Another thing is that the
command does not have to change complete words, and can be used from anywhere in a word like this
The chairman spoke.
Oops! This is sexist, and we had better change this sentence before people take offence. So with the cursor on the m of
chairman, say cw and type person to get the sentence
The chairperson spoke.
And you can change several words while starting the change from the middle of the first word. For example, in this
sentence we could have said 2cw and typed person called the meeting to order. The result would have been
The chairperson called the meeting to order.
After words, let us learn to change lines. As you might have guessed, to change the current line from any point, you can
go to that point and say c$ and follow it up with the text you want to now put in. You end change mode as usual with
an Esc. You can also change complete lines from anywhere on the line by typing cc, and several complete lines by typing a
count before this. You can also change sentences, paragraphs or sections, but we will not look at this in any more detail.
While we have seen how to create a new line anywhere in the text, we do not yet know how to delete the new line
character, that is, to join two lines. You must have already noticed that you cannot use any of the normal delete
commands to delete the new line. Thus x, X, dw, db etc do not affect the new line character at all, and so you cannot join
lines using these commands. For that you need to use the J command. This joins the next line to the current one by
deleting the new line character between the two. The command can be issued from anywhere on the line and there is
no difference in the effect it will have. So let us take the lines
Ram, Shyam
and Mohan
walked home.
The J or 2J commands will result in
Ram, Shyam and Mohan
walked home.
Saying 3J would have joined all three together
Ram, Shyam and Mohan walked home.
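The effect of J on the edit buffer can be pictured with the following Python sketch. It is a simplification and the name join_lines is our own: here the lines are always joined with a single space, whereas vi has a few special cases (for instance after a full stop) which we ignore.

```python
def join_lines(buffer, current, count=2):
    """Join `count` lines starting at index `current` (0-based), like count J."""
    count = max(count, 2)  # J and 2J both join two lines
    chunk = buffer[current:current + count]
    joined = chunk[0]
    for nxt in chunk[1:]:
        # Simplified rule: exactly one space replaces the new line character.
        joined = joined.rstrip() + " " + nxt.lstrip()
    return buffer[:current] + [joined] + buffer[current + count:]


lines = ["Ram, Shyam", "and Mohan", "walked home."]
print(join_lines(lines, 0))      # J or 2J on the first line
print(join_lines(lines, 0, 3))   # 3J on the first line
```

As in vi, a count of 2 gives the same result as no count at all, because joining always involves at least two lines.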
One more thing that is useful is the dot command. This command has the effect of repeating the last command that was
executed. Thus if you have just deleted a character with x, then saying . will mean issuing an x again. But suppose the action you
had last performed was changing the word grass with cw in the sentence
The grass is blue. The rose is red.
to sky leaving you with
The sky is blue. The rose is red.
If you now reach the beginning of the word rose and say ., then the meaning of the command will again be
cwsky followed by Esc, and you will be left with
The sky is blue. The sky is red.
This command is useful for doing things repeatedly.
Searching for Text
We will now look at the powerful features available in vi to search for string patterns. As usual, we will start with single
character searches on the same line. To find a given character on the same line in the forward direction use the f
command followed by the character you want to look for. Thus in the line
He froze in terror at the sight.

with the cursor located on the f of froze, you can reach the character e by saying fe. Now you will have the cursor at the e in
froze
He froze in terror at the sight.
You can now repeat the command with a different character if you wish. To find the second occurrence from the current
cursor position you can precede the command with a count, as you might have guessed. If the count you give is too
large, that is, if there are not enough occurrences of that character on the line, vi will refuse to do anything and will beep
back at you. Thus from where you are, saying 5ft will result in a beep, since only four t's follow the cursor, while 3ft will
bring you to the t of the
He froze in terror at the sight.
To continue looking for the next occurrence of the same character in the same direction you can use the ; command.
Thus if you now say ; the cursor moves to the t of sight
He froze in terror at the sight.
Another ; will result in a beep. To look for the same character in the reverse direction, use the , command. Saying , now
will bring the cursor back to the t of the
He froze in terror at the sight.
There is a command to perform the search in the backward direction on the line to start with. This is the F command, and
if the cursor is at the position just reached, saying Fr will take you to the last r of terror
He froze in terror at the sight.
You can use the ; and the , commands in conjunction with the F command as well, but you have to remember that ;
continues the search in the same direction while , does it in the reverse direction. This is always relative to the original
direction. So with F, a ; means backward along the line while a , means forward along the line. With an f, a ; means
forward along the line and a , means backward along the line.
These commands will not go beyond the line, but the ; and , commands can be used even after you change the line by
using any of the cursor positioning commands.
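The single character searches can be modelled by a small Python function. This is a sketch under our own assumptions: find_char is a made-up name, positions are counted from 0, and a return value of None stands for the beep that vi gives.

```python
def find_char(line, cursor, ch, count=1, forward=True):
    """Return the index of the count-th `ch` from `cursor`, or None (vi beeps)."""
    if forward:
        hits = [i for i in range(cursor + 1, len(line)) if line[i] == ch]
    else:
        hits = [i for i in range(cursor - 1, -1, -1) if line[i] == ch]
    return hits[count - 1] if len(hits) >= count else None


line = "He froze in terror at the sight."
at_e = line.index("e", 2)                      # cursor on the e of froze

print(find_char(line, at_e, "t", 5))           # 5ft: None, only four t's follow
at_t = find_char(line, at_e, "t", 3)           # 3ft: the t of "the"
print(line[at_t:at_t + 3])
print(find_char(line, at_t, "r", forward=False))  # Fr: the last r of terror
```

The ; command corresponds to calling the function again with the same character and direction from the new position, and the , command to calling it with the direction reversed.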
We are now ready to look at other search commands. The command to look for a specified character string is /. So to
look for the string the in the file, say /the. This places the cursor at the beginning of the next occurrence of the after the
cursor position, which means at the t of the word. You can also use the capabilities of the ex editor to look for strings, by saying :/ followed by
the string you want. This places the cursor at the beginning of the line that contains the string. If the string does not
occur in the file the message
Pattern not found
is displayed on the bottom line of the window. In any case, the command you give is echoed on the bottom line, and in
doing this any status then being displayed is erased. Also remember that this command has to be completed with a Return or an Esc,
otherwise vi would not know when your string has been completely specified. This is one place where Esc signifies the
completion and not the aborting of a command. If you have made a mistake while writing out your string you can
correct yourself by using the backspace key.
The / and :/ commands wrap around the file, meaning that if you are at a point in the file where the specified string does
not occur after the cursor position but does occur before it, then you will be brought to the first occurrence of the string
from the beginning of the file.
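The wrap around behaviour can be pictured with the following Python sketch. It is only an illustration: the name search_forward is our own, the whole buffer is taken as one string, and a plain string is searched for rather than a pattern.

```python
def search_forward(text, cursor, pattern):
    """Return the index of the next `pattern` after `cursor`, wrapping to the top."""
    hit = text.find(pattern, cursor + 1)
    if hit == -1:
        hit = text.find(pattern, 0)     # wrap around to the start of the buffer
    return None if hit == -1 else hit   # None stands for "Pattern not found"


text = "the cat saw the dog"
first = search_forward(text, 0, "the")       # /the from the top of the buffer
again = search_forward(text, first, "the")   # n from there wraps around
print(first, again)                          # 12 0
print(search_forward(text, 0, "bird"))       # None
```

A backward search with ? would work the same way with the direction reversed, wrapping from the beginning of the buffer to its end.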
There is another command ? which looks for strings but in the backward direction. You can also use the ex command ?
from vi by saying :?. These also wrap around the file while doing a backward search, and need to be terminated with a
Return or an Esc.
You can repeat a search for the same string by saying n or N. The difference is that n searches in the same direction and
N searches in the reverse direction. A search can also be repeated by saying // or ??, which will always search forward
and backward respectively.
One can also delete text or change text between the current cursor position and the result of a search by using
the d or c command. Thus d/and will delete text from the current position up to the first occurrence of and.
Let us now look at another useful feature of vi. You can place bookmarks at any place in the text that you want. Move
the cursor to that point and say mp, which means you have marked that position with p. To now get back there from
anywhere in the file, say `p, and to get to the beginning of that line you can say 'p. This is useful when you want to refer
to a place repeatedly, especially when you are entering program files. And you can also delete and change text from the
cursor position to a bookmark. Thus to delete text from the cursor up to the bookmark p, you can say d`p, or c`p to
change the text. Bookmarks are temporary and hold only for that session. Next time you edit the file you will have to
place them afresh. Using `` brings you to the position from where the last search command was given, while '' takes you to
the beginning of the line from where the command was given.
Copying and Moving Text
It is now time we learnt how to copy and move text around, or what is frequently called cut and paste. When you delete
text in vi, it does not disappear immediately. This makes it possible to correct any mistakes you might have made in the
operation. So let us first see how one can undo the effect of most commands.
The u command undoes the last change you made. So if we have
The little dog laughed
and we add some text by saying a, typing to see such fun. and pressing Esc, we will get the following
The little dog laughed to see such fun.
If you now say u, you will get back the first sentence
The little dog laughed
The undo operation applies to text you delete or change as well. So if you now do a db you can get back the word you
deleted by saying u. This command is not limited to the same line. If you delete 20 lines by saying 20dd, and then realise
that you have made a mistake, you can from any position in the file get back the previous situation by saying u, as long
as you have not deleted, inserted or changed any other text in the meantime and have only moved around or searched
for text in the file.
However note that this command works like a toggle, and if you give the command again, the situation before you gave
the command the first time will be restored, that is the undo will be undone. Also the command moves you to the point
where the effect of the undo occurs from wherever you were in the file.
It will be clear that if you have made more than one mistake you cannot rectify all of them; only the last can be taken
care of. But if you have not moved away from the line and the changes refer to only that line, you can use the U
command. This undoes all changes made to that line since you arrived there. For example
It is better light a candle than curse the darkness.
While entering this ancient Chinese proverb we have made a couple of mistakes. Let us say we correct the mistakes to
have
It is better to light a candle curse the darkness.
but introduce one more in the bargain. Assuming the word than was the last to be deleted, it could be got back by
saying u. If here we issue a U command, we will get back the original sentence
It is better light a candle than curse the darkness.
and all the three changes will be undone.
Where does the text you delete go? It goes into a buffer referred to as the unnamed buffer. Text from a change
command also goes into this buffer, because a change is considered to be a delete followed by an add. To retrieve text
from here you can use the p command which puts the contents of the unnamed buffer after the cursor position. You can
also use P which places the contents before the cursor position.
Now the thing is that using the p or P commands does not affect the unnamed buffer. So if you need to place the same
text in many places, all you have to do is go to the proper position and issue either of these commands. The text will be
added at those points.
Let us take an example to illustrate the p command. From your own experience you must have noticed that
transposition is one of the commonest errors while typing. We will now look at a simple way to correct such an error.
Take the line
That punctual sevrant of all work, the sun
To correct the mistake here, we first reach the wrong part by, say, fv, which puts the cursor on the v of sevrant
That punctual sevrant of all work, the sun
Now say x to get
That punctual serant of all work, the sun
and say p to place the character you just deleted after the cursor position
That punctual servant of all work, the sun
You will soon find yourself using this sequence so often that it will come automatically without any thought. P is needed
to put the text at the beginning of the line, for example.
So you now know one way of moving text from one point to another. Simply delete it from the first place and put it back
into the new position. How does one copy text rather than move it? You can delete the text and put it back in the same
place, and then also put it in the place where you want the copy. But there is another command which helps you to do
so.
The y command yanks a block of text into the unnamed buffer without deleting the text from the file. It therefore will
overwrite the previous contents of the unnamed buffer. You can then write those contents anywhere you wish by using
p or P in the usual manner.

The argument to y is the block of text. For example yw yanks the rest of the current word and yy or Y yanks the line.
These can be preceded by counts as usual. After yanking the text you can write it wherever you wish unless you have changed
the contents of the buffer by some other operation.
Apart from the unnamed buffer, you can make use of 26 named buffers to store text for pasting. To store the block of
text in the named buffer b, say "b followed by the command to get the text into the unnamed buffer. So to place the
next 8 lines into the named buffer b, just say "b8yy. To put this text, say "b followed by your put command, like "bp.
Named buffers are very useful when you want to have more than one block of text stored for pasting later. There is only
one unnamed buffer, and it is easy to change its contents accidentally. The text in named buffers is unlikely to get
changed accidentally by a carelessly given command. Also if you change the file being edited, even within the same
session, the unnamed buffer gets cleared. The text in named buffers remains there for the whole session. So you can use
them to copy text from one file to another.
You can append text to the current contents of a named buffer by referring to it by the corresponding uppercase name
while putting text into it. Thus to append the next 3 lines to the current contents of the named buffer b, you can say "B3yy.
When putting text, the case is not significant.
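The rules for the unnamed and named buffers can be summed up in a toy Python model. The class Buffers and its methods are our own invention for the purpose of illustration, and we assume, as the description above suggests, that a yank into a named buffer also places the text in the unnamed buffer.

```python
class Buffers:
    """Toy model of vi's unnamed buffer plus the 26 named buffers a to z."""

    def __init__(self):
        self.unnamed = []
        self.named = {}

    def yank(self, lines, name=None):
        self.unnamed = list(lines)  # assumption: every yank overwrites the unnamed buffer
        if name is None:
            return
        if not (len(name) == 1 and name.isascii() and name.isalpha()):
            raise ValueError("buffer names are single letters")
        key = name.lower()
        if name.isupper():
            # An uppercase name appends to the existing contents.
            self.named[key] = self.named.get(key, []) + list(lines)
        else:
            # A lowercase name replaces the existing contents.
            self.named[key] = list(lines)

    def put(self, name=None):
        if name is None:
            return list(self.unnamed)
        return list(self.named.get(name.lower(), []))  # case ignored when putting


bufs = Buffers()
bufs.yank(["first", "second"], "b")   # like "b2yy
bufs.yank(["third"], "B")             # like "Byy, appends to buffer b
bufs.yank(["scratch"])                # a plain yy touches only the unnamed buffer
print(bufs.put("b"))                  # ['first', 'second', 'third']
print(bufs.put())                     # ['scratch']
```

The model shows why named buffers are safer for text you want to keep: an ordinary yank or delete changes only the unnamed buffer.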
The Features of ex
We have been referring to some capabilities of vi in this discussion which are actually part of the line editor ex. All
commands (except multi line commands) of ex can be referenced from within vi by using the : command, as already
mentioned. This is possible because vi and ex are actually the same editor called up in different modes. vi is the full
screen or visual mode whereas the ex editor works in line mode. We will therefore look at some commands of ex
which are useful in vi as well.
There are many options to vi which can be set. To see what these are, you can use the command :set all. We have
referred to line numbering in vi. This can be turned on by saying :set number. To disable the option, precede the name
of the option with a no. So to turn off line numbering, say :set nonumber. These commands can also be abbreviated;
number can be referred to as nu. So you can say :se nu and :se nonu with the same effect. The line numbers appear at
the beginning of each line but you cannot reach them with the cursor because they are not part of the text. Options have
defaults with which vi is called up if you do not specify otherwise. Line numbers are off by default.
We will look at only a few set commands and you should refer to the documentation for complete details. A useful
option is tabstop. This determines how many characters the cursor will advance in response to a tab character. By
default the value of this parameter is 8. But sometimes one wants this to be less. For example if you write C++ programs
and if like most programmers you indent loops and conditions, you can quickly find yourself running out of space on the
line. It is not unheard of to be starting a statement on the next line because of the level of indentation. Obviously this
would defeat the purpose of indentation which is clarity. The solution to this is to reduce the amount of indentation. You
could use spaces, of course, but a more convenient way is to change the value of a tab stop to say 4 characters. You can
do so by saying :set tabstop=4.
One option useful while entering text is wrapmargin. By default the value of this is 0. If you set it to 8, then whenever
there are fewer than 8 character positions left on the line and a space is encountered, the cursor moves to the next line.
To set it you can say :set wrapmargin=8.
The error messages of UNIX are anyway short and can seem mysterious to a novice. The status line at the bottom of the
window in vi is no exception to this, nor are the error messages vi sometimes gives. But this is the verbose mode of vi. If
you wish you can make these messages even shorter by saying :set terse. The default is (mercifully) noterse.
We spoke earlier of how you can reach the corresponding parenthesis, brace or bracket with the % command. This
command is needed because by default showmatch is off. If this is on, then every time you type a closing parenthesis or
brace, vi will take the cursor to the matching opening character for one second before returning to its
place. This can be useful while entering programs using vi. To set the mode you can say :set showmatch.
So far we have talked of cut and paste, but how does one read another file into the edit buffer? To do so you can use :r
which is actually an ex command. If you are editing the file UNIX.doc and you want to read in material which you have
available in a file ch1, you should move the cursor to the line after which you want to read in the file and say :r ch1. The
contents of ch1 will now get inserted after the current line. This kind of thing is particularly useful when entering
programs, because you might have a master template which you need to keep modifying, but differently each time.
You can start editing a different file by saying :e filename. The current file should have been saved before doing this,
otherwise the command will not work. You can force the issue, however, by saying :e! filename. In that case the changes
you might have made to the current file will be lost. Similarly when you have called up vi with a lot of files you can say :n
to go to the next file. Here also if you have not saved the current file and do not want to do so, you need to say :n! to go
to the next file. Thus the exclamation mark forces an operation in this context.
We have talked of searching throughout a file for some text but not how to replace it. This will be discussed in the next
section, but for now it is sufficient that it is an ex feature.

There is another feature that is frequently useful. If you want to execute a UNIX command while inside vi, you can do so
by saying :! followed by the command. It follows that if you want to execute several commands, you would be better off
invoking a shell by saying :!csh, although you could execute each command one by one if you so desire.
Finally, you are not stuck with vi's default settings for its options. If you have your own favourite set of options you can
have them take effect every time you call up vi. This can be done by setting a shell environment variable, which we will
discuss only in the next unit. Another way is to create a file called .exrc in your home directory and put your settings
there. For example, if you want line numbering on, tabstops after every 4 characters and the terse option, you can
create the file (you can use vi for this now, can't you!) with the following contents
% cat .exrc
set number
set tabstop=4
set terse
With this, we conclude our somewhat sketchy discussion of the editor vi and go over to the line editors ex and ed.
THE LINE EDITORS EX AND ED
The UNIX system has two line editors available with it, called ex and ed. In this section we will take a quick look at them.
Although in the last section on vi we have looked at some ex features, here we will try to be a bit more systematic in
discussing ex and ed. Certainly line editors are primitive compared to full screen editors, but there are times when a line editor
is useful. For example a full screen editor needs to be told about the terminal you are using while a line editor can easily
work on any terminal without knowing about it.
So if you are connected over a network to a host which does not understand your terminal, you will find a line editor of
great help. In fact if you call up vi and the computer does not understand your terminal, it will tell you so and start up in
open mode, which is actually line mode.
ed and ex both are oriented towards lines. So any operations you perform have to specify the lines on which you want
them done. The specification can be done by line numbers or other means, like by giving some information about the
text contained in the line. ed is more primitive than ex. It is less communicative and has fewer features. ex is easier to
use and has more facilities. In fact ex was written by performing some surgery on ed. So if you know ed, you can use ex
easily but you might not be taking advantage of all the power that ex has.
In this section we will take both together and will also look at the extra facilities that ex provides. Both of them use an
edit buffer into which the file being edited is read. This buffer is written back to the file when you save the contents of
the buffer.
Starting up and Quitting
To edit an existing file you say
% ed UNIXdoc
?UNIXdoc
You were warned that ed is cryptic in its communication, and here is a good sample of what to expect. The ? is ed's
way of saying that it does not understand your command. Here it means that the file UNIXdoc does not exist and if you
save the contents of your edit session, a new file by that name will be created. If the file exists, the number of characters
in it is echoed and you can take that to mean that you specified the file correctly.
% ed .exrc
35
That is all! You never know whether anything is happening, whether your command has been recognised or whether ed
is just waiting for commands from you. Part of the reason for this is that ed has no prompt of its own, so you would do well
to start up ed with a prompt
% ed -p ok .exrc
35
ok
This will at least tell you when input is expected from you. The characters after the -p are taken to be the prompt. You
can certainly specify a single character prompt if you wish. If you are inside ed and want to get a prompt, you can use
the P command which toggles between a * prompt and none. By the way, ed commands are limited to single characters,
although they might be followed by parameters.
You can invoke a UNIX command from within ed by prefixing a ! to the command. So to see the directory listing from
within ed, say
ok !ls
If you wish, you can invoke a new shell by saying
ok !csh
and you can now run UNIX commands until you want to get back to ed, which can be done with ^D or exit.
If you feel disoriented by ed's taciturn behaviour, you can toggle help on by H. Another H will turn it off and an h will
give you an explanation of only the last error it encountered, to which it would have responded with a ?.
Compared to this, ex is much more helpful, partly because the messages are familiar from our knowledge of vi. To call up
ex, say
% ex UNIXdoc
"UNIXdoc" [New file]
:
This is self explanatory; UNIXdoc does not exist. You can give several filenames when you start up ex and these will be
dealt with in the order given. You can call up the next file by n and can call up any file you want by e filename. This last
facility is also available in ed. There is a special character % which can be used to refer to the current file. So if you have
made a mess of the current edit session and want to restart on your file, you can abandon the changes and start afresh
by
e! %
If ex finds the file you invoked it with, it tells you about the file, as you saw in vi
% ex .exrc
".exrc" 3 lines, 35 characters
The contents of .exrc apply to ex as well, so you will find line numbering on and terse mode in effect, if the contents of
.exrc are as in the last section. However we have not shown the effects of this in the example above to avoid confusion.
The set options apply only to ex or vi, and there are no options that can be set in ed.
Now let us see how to get out of the editors. As we mentioned in the earlier section on vi, it can be very irritating to get
into an editor and not know how to get out. That is why we deal with this in the beginning although that is hopefully the
last command you will want to give, unless you have made many mistakes and want to start again afresh.
You already know how to get out of ex, for we saw this in vi. But to recapitulate, q gets you out of the current file if you
have saved it. So if you had invoked ex with several files then q tells you the number of files still to be edited, and if
there are no more left you get back the shell prompt. The command q! forces an exit to the shell irrespective of whether
there are any files left to be edited or whether you have saved the changes to the current file.
Here one must mention that ex is not very intelligent when it comes to detecting a change to its buffer. It only
remembers if any command was executed which could have changed the buffer, for example a delete, insert or change
operation. Any of these it will consider to be a change in the buffer. But if you save a file and then perform a series of
operations which leave you in the same state (like deleting some text and putting it back again in the same place), ex will
not know and will continue to think that the buffer needs to be saved. Of course you can use q! since you know better.
To save a file you use the w command followed by the filename you want, or w alone to save under the same name. If
you give a different filename and that file already exists, you have to use w! to force the write.
As far as ed is concerned, a q will take you back to the shell if the file has been saved, and if you have not saved the file
you can do so if you wish with a w. If you have not saved the file, a q will elicit a question mark; to force an exit type q
again. You can also use Q to force an exit straight away.
We have now learned how to invoke the editors and how to get out of them. Let us now move on to the interesting
intermediate part.
Addressing Lines
As is to be expected, these editors revolve around lines. Each line in the edit buffer has a number although you cannot
see it (except ex with the set number option). Most commands take one line number, or two line numbers separated by
a comma, before them to indicate the range of lines that they will act upon. If you have to type a long command you can
split it up into several lines by escaping the extra new lines with a \ character.
There is a concept of the current line in the buffer, and whenever you omit the line numbers before a command, the
current line is assumed to be the one affected. This can also be referred to by a . character. When you call up the editors
with a filename the current line is the last line in the file. In general the last line affected by an operation becomes the
current line. The last line in the buffer can also be referenced by typing a $, irrespective of the number of lines.
The first line is referred to as line 1, but there is a hypothetical line number 0 which has no text in it and can be used to
insert text before the first line. Lines get renumbered automatically with any insertion or deletion of lines. The simplest
way of addressing a line is by line numbers, as we have already mentioned. Thus 29 means line number 29 and .,$-3
means from the current line to the fourth last line of the buffer.
You can also refer to lines by the text they contain. Again what you learnt in vi will come in handy. You can refer to text
as a fixed string or a regular expression surrounded by / or ?. Thus /person/ or ?person? both refer to the line nearest
the current line (usually called dot) in the forward and backward directions respectively. But you must remember that
these searches wrap around the extremities of the file, so that a forward search might end up giving you a line before
the current line. In ex, you can turn this wraparound off by saying :set nowrapscan, but in ed there are no options that
can be set. A pattern can naturally occur several times in a file, so you can repeat a search for the last mentioned text
pattern by simply saying // or ?? instead of having to write out the pattern again and again. Finally, you can refer to lines
by marking them out with the k command for ed or the ma command of ex. There are 26 marks you can put at one time

as marks can be named by any letter. Thus to mark the current line with c, you can say kc in ed or ma c in ex. To refer to
the marked line later, you say 'c and that line will be addressed irrespective of its position in the buffer.
Apart from these direct methods of addressing lines, you can also use numbers with any of the methods to obtain an
address. Thus you can use + or - with any of the methods given above. That is why $-4 refers to the 5th last line of the
buffer. Moreover, if you are addressing a line range, you do not have to specify both lines in the same manner. One can
be expressed as a marked line and the other as a text expression, for example.
Thus you have a great deal of flexibility in addressing lines in these editors. So you can have a line range like
/Person/+7,'c-8, which refers to the range of lines starting from the 7th line after the first line in the forward direction
containing the pattern Person and ending with the 8th line before the line marked with a c. If a number is omitted after
the + or the - it is assumed to be 1. So it should now be easy for you to refer to lines in the file even if you do not know
the line numbers.
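The addressing forms described above can be tried out without an interactive session. What follows is only a minimal sketch, assuming an ed that accepts the -s option (to suppress byte counts) and reads its commands from the standard input; the file name nums and its contents are made up for illustration.

```shell
# Scratch directory so that no existing file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A hypothetical eight-line file; each line holds a number written as a word.
printf '%s\n' one two three four five six seven eight > nums

if command -v ed >/dev/null 2>&1; then
    # $-1p        print the second last line
    # /three/+2p  print the line two after the next line containing "three"
    #             (the search wraps around the end of the buffer)
    # 2ka         mark line 2 with the letter a
    # 'a,'a+1p    print the marked line and the line after it
    addressed=$(ed -s nums <<'EOF'
$-1p
/three/+2p
2ka
'a,'a+1p
q
EOF
)
    printf '%s\n' "$addressed"
else
    echo "ed is not installed; skipping the demonstration"
fi
```

With the file shown, the four print commands give seven, five, two and three, one per line.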
Looking at Text
In vi you had a window and you could easily see text in that window at all times. You needed to learn some vi commands
only when you wanted to change the window or move around in it. Unlike vi, in ed or ex you get to see nothing at all of
the file when you open it for editing. You have seen in a previous section that when ed recognises a filename and opens
the file successfully, it only gives you the character count of the file. In fact if you press Return at that time you will get its
infuriating question mark because that would be construed as an attempt to move past the last line of the file. So let us
learn the commands to look at the text in the edit buffer and to move around in it.
The command to print the range is p. It is also the default action if you do not specify anything to be done in the line
range. So (assuming an ok prompt) in the .exrc file
ok p
set terse
ok
because you are at the last line of the buffer when the file is opened. To print the entire buffer, say
ok 1,$p
set number
set tabstop=4
set terse
ok
Had we given the line range 1,$ without any command, ed would have performed only its default action of printing the
last line of the range. Let us look at a file empfl containing the names and salaries of employees of some company arranged
alphabetically
% ed -p "ok " empfl
75
ok /Chetan/,/Hari/p
Chetan Gaur 10,000
Deepak Kumar 8,000
Hari Ram 12,000
ok
There is also an l command to print control characters as well, but we will not look at it here. There is an n command (#
in ex) which shows the text with the line numbers in front
% ed -p "ok " .exrc
35
ok n
3 set terse
ok
The line numbers are not part of the text. You can also use = to find the number of the addressed line, or the last line
number by default
ok /Chetan/=
7
ok
which means that the line is line number 7 in the file. You can move forward in the buffer by typing Return or + and backward
by a -. In either case the contents of the line are printed. Saying --- will move you back 3 lines and then print its contents, as
will -3. You cannot move past the last line or before the first line. As always, the current line becomes the line you reach.
Also simply typing a number has the effect of making that line the current line.
Adding, Deleting, Changing Text

In vi, adding text meant getting the cursor to the exact point where you wanted the new text to be and then typing in the
new text after entering insert mode. You can see where you are in relation to the rest of the matter and it is not a
difficult task, especially because you can immediately make out if you are making a mistake.
The situation in a line editor is different. Since the text consists only of lines, the only thing you can meaningfully add is a
line. Adding some text to a line amounts to correcting the line in these editors. To add a line of text you can say a or i.
The former adds lines after the current line, whereas i does so before the current line. Thus i is useful in inserting text
before the first line whereas a is useful in adding text after the last line. In between, both commands can be
conveniently used.
When you open a new file you can use either a or i to start inserting text and there is no difference in the effect they
have. If you are editing an existing file and you want to append text at the end, you can do so immediately because
when you open a file you are automatically positioned at the last line. In other situations you need to reach the line you
want before issuing a command to insert text.
To terminate text entry mode, you have to type a period (.) followed immediately by Return. There should be no other
character on that line, otherwise the line is taken to be a line of text to be added. If you do not have a prompt, it will
not be immediately apparent whether you have successfully finished entry or not. You can check that out by giving a
command to print a line. If the line is printed, you are all right. Otherwise your "command" would have got entered
as text in the edit buffer.
To delete text, that is, lines of text, you can use the d command. Just give the range of lines you want to delete followed
by d. Thus, to delete from the file empfl the lines we had earlier printed out to illustrate the p command, just say
ok /Chetan/,/Hari/d
ok
Since here we know the number of lines, we could also have said
ok /Chetan/,/Chetan/+2d
ok
In ex we can also use the command
: /Chetan/d3
which will delete 3 lines from the line that matches /Chetan/.
There is a change command available which allows you to change lines. But it is only a convenience because it first
deletes the lines and then adds the text you specify. You have to terminate the entry as usual with a . on a line by itself.
The advantage is that you do not have to explicitly delete the lines and then add your text. This gets done as one
operation by the editor itself. In ex, we can specify the range of addresses to be changed either by giving a starting and
ending address or by giving the starting address followed by a count, just as in the case of the delete command. It is
frequently more convenient to use the second method.
You do not have to change only the exact number of lines that you have addressed. That is, if you have given a
command to change the line number 4 by saying
ok 4c
you are not now bound to type in only one line. You may type in any number of lines just as you do in insert mode. Thus
you could type in one line, or none at all, or three lines.
We should also understand the u or undo command. As in vi, this allows you to reverse or undo the last change made to
the edit buffer. Also if you had moved away from that line, the current line or dot becomes that line. The command is a
toggle in the sense that a second u will undo the previous u command and restore the position that prevailed before
either of them was given.
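The a, c and d commands can also be fed to ed from a script, which makes it easy to see their combined effect. This is a sketch only, assuming an ed that accepts -s and reads commands from the standard input; the file greek and the text entered are invented for the purpose.

```shell
# Scratch directory so that no existing file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A hypothetical three-line file.
printf '%s\n' alpha beta gamma > greek

if command -v ed >/dev/null 2>&1; then
    # 0a  append after the hypothetical line 0, that is, before line 1
    # 3c  change line 3 (now "beta"); two lines are typed in its place
    # $d  delete the last line
    # Each text entry is ended by a period alone on a line.
    ed -s greek <<'EOF'
0a
zero
.
3c
BETA-1
BETA-2
.
$d
w
q
EOF
    result=$(cat greek)
    printf '%s\n' "$result"
else
    echo "ed is not installed; skipping the demonstration"
fi
```

After the run the file holds zero, alpha, BETA-1 and BETA-2: one line was replaced by two, which shows that c does not bind you to the number of lines addressed.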
Searching for and Replacing Text
You have already seen how to search for some text as one of the methods of addressing lines. That only gives you the
first occurrence of that text. Needless to say, there might be several other lines in the file that also contain that text
string. There are commands which enable you to find all such lines in the file. The first of these is the g command for
global. The command has the following form (suppose you are editing .exrc)
ok g/tab/p
set tabstop=4
ok
The command is followed by the text pattern to look for, which can be a regular expression. This should be enclosed in
slashes. This is followed by the command to be performed on the lines containing the pattern. Here we have asked for
ed to find all lines containing the text pattern tab and to print that line. One can omit the trailing slash and the p because
that is the default action to be taken in ed. So you could also have said
ok g/tab
set tabstop=4
ok

There is another command v which is the complement of the g command. It inverts the sense of the match with the
regular expression. So
ok v/tab
set number
set terse
ok
prints out all lines in the file that do not contain the text pattern. Of course any other action could have been specified.
There are also the G and V commands but we will not discuss them here. In ex you can also use g! as an alternative form
of v.
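The complementary behaviour of g and v is easy to check from a script. The sketch below assumes an ed that accepts -s and takes commands from the standard input; exrc.sample is a made-up copy of the three set lines used in the text, so that your real .exrc is left alone.

```shell
# Scratch directory so that no existing file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for the .exrc file of the text.
printf '%s\n' "set number" "set tabstop=4" "set terse" > exrc.sample

if command -v ed >/dev/null 2>&1; then
    # g/tab/p prints every line that contains "tab".
    with_tab=$(ed -s exrc.sample <<'EOF'
g/tab/p
q
EOF
)
    # v/tab/p prints every line that does not contain "tab".
    without_tab=$(ed -s exrc.sample <<'EOF'
v/tab/p
q
EOF
)
    printf 'g: %s\n' "$with_tab"
    printf 'v:\n%s\n' "$without_tab"
else
    echo "ed is not installed; skipping the demonstration"
fi
```

Between them the two commands account for every line of the buffer exactly once.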
With this let us look at how to achieve substitution of text with the s command. This command is a godsend because it
obviates the need for retyping entire lines to change only some of the text on it. It works on the line ranges specified by
you. Thus
ok /tab/s/4/3
ok
This means that ed should look for the next line in the forward direction containing the string tab and in that line replace
4 with 3. The whole thing gets done silently and if you now give a p command you will find that tabstops are now set at
three spaces in the .exrc file. This will take effect the next time you start up ex or vi.
The substitution as shown here is done for only the first occurrence in the line of the text pattern searched for. If the
pattern occurs more often you could repeat the command for the current line or use the g flag at the end. Suppose you
have the line
The chairman asked the deputy chairman to conduct the meeting in her absence.
Now saying
/chair/s/man/person
will convert this line to the following
The chairperson asked the deputy chairman to conduct the meeting in her absence.
To replace both occurrences say
/chair/s/man/person/g
The substitution can be done on a range of lines by preceding the s with the range desired. You can also use g for all
lines containing a pattern here. The print flag (p) can be specified as well to print the lines afterwards.
There are many powerful features available as regards substitution in ed and ex, but in this short discussion we will not
go beyond the basics. You are encouraged to refer to the UNIX documentation for a full treatment of grouping regular
expressions, ignoring case, the magic option and other arcane facilities. All these are available in vi too, as you may have
guessed.
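The difference the g flag makes can be seen by running the chairman example from a script. This is a minimal sketch, assuming an ed that accepts -s and reads commands from the standard input; the file name memo is made up, and the p flag is added only so that the changed line is shown.

```shell
# Scratch directory so that no existing file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical one-line file holding the sentence from the text.
printf '%s\n' "The chairman asked the deputy chairman to conduct the meeting in her absence." > memo

if command -v ed >/dev/null 2>&1; then
    # Without g, only the first "man" on the line is replaced.
    # Q leaves ed without saving, so memo itself stays unchanged.
    first_only=$(ed -s memo <<'EOF'
/chair/s/man/person/p
Q
EOF
)
    # With g, every "man" on the line is replaced.
    every=$(ed -s memo <<'EOF'
/chair/s/man/person/gp
Q
EOF
)
    printf '%s\n%s\n' "$first_only" "$every"
else
    echo "ed is not installed; skipping the demonstration"
fi
```

Note that the pattern man is matched wherever it occurs in the line, so with the g flag a word such as manager would be changed as well.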
Cut and Paste Operations
We will now only quickly look at cut and paste operations. Actually you have already learnt the basic technique in the
section on vi. Looking at ed first, the command to move lines is shown below
ok 1,$p
Australia Canberra
Denmark Copenhagen
France Paris
China Beijing
ok 2,3m4
ok 1,$p
Australia Canberra
China Beijing
Denmark Copenhagen
France Paris
Thus the m command moves the lines specified to the line after the line mentioned. Note that this results in the lines
getting renumbered but the line numbers of the command are always the line numbers before the operation is carried
out. This example is not meant to suggest that files be sorted in this fashion.
There is a similar command to copy lines, the t command. It is similar to m except that the original lines remain where
they were. One can also mark lines with the k command as mentioned in the beginning of the discussion on ex and ed.
The command is followed by a single letter which is the mark. The marked line can be addressed by preceding the letter
used with a '.
The mark vanishes if the line is deleted but remains with the text if the line is moved or copied. In the latter case, the
mark remains only with the original and not with the copy.
Lines can be joined together by using the j command. Thus in the above example
ok 1,3j
Australia Canberra China Beijing Denmark Copenhagen
France Paris
ok
To copy text from one file to another in ed, you need to use temporary files as explained in the next section. Ex is
more capable here, and in other ways as well. In ex the command to copy is co or t, to move is m and to mark is ma. The join
command of ex also provides spaces between the joined lines, unlike that of ed, which just puts the lines one after the other.
In ex you can yank lines into an unnamed or named buffer and write them out later. The concept has been dealt with
while discussing vi. Named buffers do not get cleared when you open another file in the same session. So you can easily
transfer text from one file to another, by yanking out what you want and using the pu command to put it back where
you want it written.
Files and Miscellaneous Features
We will close this discussion on the line editors with a word on the facilities for editing other files and some other
features. In ed the f command will tell you the name of the current file and can also be used to change it. Thus
ok f
.exrc
ok f ed_file
ok f
ed_file
ok
Any write operation now works on ed_file. You can also read in some other file into the edit buffer by the r command
ok 2r /etc/motd
ok 1,$p
set number
set tabstop=4
Please delete unneeded files to make space!!!
set terse
ok
A write can be done of parts of the buffer by giving the range of lines needed. To get the result of a UNIX command into
the edit buffer, precede the command by an ! like this
ok 2r !cat /etc/motd
This will place the output of the command, here the contents of the file, after line 2 of the buffer, which is the same result as before. However, you could have given any other
UNIX command had you wished.
There is also an e command to start editing another file. It results in the contents of the current buffer getting discarded.
So you can use it to edit a file after you have saved the changes to the current file, or if you want to abandon a file, or if
you have got the name wrong while starting up ed. Thus you have a way of copying text from one file to another by
using an intermediate temporary file. Let us quickly see how this can be done.
Suppose you have two files containing the following names
% cat places
London
New York
Paris
Rafflesia
New Delhi
% cat flowers
buttercup
tulip
lotus
You want to clean up the places file by moving the flower (Rafflesia) to the flowers file. You can do the following (we
assume here that tmp is the name of a temporary file you can safely create in the current directory)
% ed -p "ok " places
ok 4w tmp
ok 4d
ok w places
ok e flowers
ok r tmp
ok w
ok !rm tmp
We have used tmp as a temporary buffer on disk to hold the text we want to transfer. As you know that is not needed in
ex because named buffers are preserved within an edit session.
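The same transfer can be written as a script, which is handy if you have to repeat it. The following is only a sketch, assuming an ed that accepts -s, reads commands from the standard input and permits shell escapes with !; it works in a scratch directory on copies of the two files.

```shell
# Scratch directory so that no existing file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The two sample files from the text.
printf '%s\n' London "New York" Paris Rafflesia "New Delhi" > places
printf '%s\n' buttercup tulip lotus > flowers

if command -v ed >/dev/null 2>&1; then
    # 4w tmp     write line 4 (Rafflesia) to the temporary file
    # 4d         delete that line; the address is needed because the
    #            current line is still the last line of the buffer
    # w          save places
    # e flowers  switch to the other file
    # r tmp      read the temporary file in after the last line
    # w          save flowers
    # !rm tmp    remove the temporary file
    ed -s places <<'EOF'
4w tmp
4d
w
e flowers
r tmp
w
!rm tmp
q
EOF
    cat places flowers
else
    echo "ed is not installed; skipping the demonstration"
fi
```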
In ex the default for reading in a file is after the current line. If the target of a write exists then you have to force a write
with w!. You can append to an existing file by w >> filename and can make the contents of the buffer available as
input to a command with w ! followed by the command. Thus to cat the buffer you can say 1,$w !cat.
Also in ex, the e command will force you to write the changes to the current file before bringing up the other file, but
you can override this by saying e! filename. You can start up ex with several files and can edit one file after another by
saying n to bring up the next file. There is the concept of an alternate filename as well which is accessed by saying e#.
Thus if you are editing file1 and then say e file2, then file1 is the alternate file and can be called up with e#. As soon as
this is done, file2 becomes the new alternate file.
Finally you can call up vi from ex by saying vi. Also you can try and recover an unsaved buffer after an abnormal
shutdown of your system by saying ex -r filename. All these features are similar to what you have already seen in vi.
In the next section we will look at the stream editor sed, as we had promised before we took up this discussion on the
various editors commonly available in UNIX.
THE STREAM EDITOR sed
As might be surmised from the name, sed is a stream editor and gives a non-interactive way of making changes to files.
We are taking it up after looking at UNIX editors because its commands are similar to those of ed, though not the same.
We will look at the differences presently. The sed command does not read in the contents into any buffer. Since sed
writes to the standard output, it is useful for making a transient version of a file available after applying some
transformation on it. The command can be thought of as performing some change to the input file. This change can be
applied to any part of the file as described later. One can also use it like grep or tr but it has many more capabilities.
As always, we will be looking only at some of the features of the command. It has many possibilities because it
understands regular expressions and most ed commands. It can take input from the command line or from a file, usually
said to be the script. This script file can be created by using any editor like vi. When so invoked, the commands given in
the script are applied to the input file. This method is useful if there are several commands you want to run, for if there
are only one or two commands we can conveniently place them on the command line.
Let us look at a file containing the names of some people working together on some project.
% cat team_crypt
Ram Kumar
Ajay Bansal
Shiv Prasad
Zafar Khan
Pramod Kumar
Anthony d'Costa
Surendra Kumar
Suppose you want to change the name Pramod Kumar to Pramod Jain. This can be done like this
% sed -e "s/Pramod Kumar/Pramod Jain/" team_crypt
Ram Kumar
Ajay Bansal
Shiv Prasad
Zafar Khan
Pramod Jain
Anthony d'Costa
Surendra Kumar
The -e option must precede every occurrence of an edit command unless there is only one such command. So we could also
have said
% sed "s/Pramod Kumar/Pramod Jain/" team_crypt
with the same output as before.
If we had mentioned only Kumar then three names in the file would have got changed, all those that contain a Kumar in
their name. The reason for the quotes in the edit command will become clear in the next unit, but here we can say that
they are needed to allow spaces to be part of the command line. Without them sed would have thought that s/Pramod
was the edit command and Kumar/Pramod was the file name. This would have caused sed to complain that the
command was garbled. Also one could have redirected the output to another file if desired.
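The effect of naming too little of the text in the pattern can be checked directly. The sketch below recreates team_crypt in a scratch directory and counts the lines changed by each form of the command; the variable names are made up for illustration.

```shell
# Scratch directory so that no existing file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The sample file from the text.
cat > team_crypt <<'EOF'
Ram Kumar
Ajay Bansal
Shiv Prasad
Zafar Khan
Pramod Kumar
Anthony d'Costa
Surendra Kumar
EOF

# Naming only Kumar changes every line that contains it.
broad=$(sed "s/Kumar/Jain/" team_crypt)

# Naming the full string changes just the one intended line.
narrow=$(sed "s/Pramod Kumar/Pramod Jain/" team_crypt)

# grep -c counts the lines that now contain Jain.
broad_count=$(printf '%s\n' "$broad" | grep -c "Jain")
narrow_count=$(printf '%s\n' "$narrow" | grep -c "Jain")
echo "Kumar alone: $broad_count lines changed"
echo "Pramod Kumar: $narrow_count line changed"
```

In both cases team_crypt itself is untouched, because sed writes its result to the standard output.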
The format of sed commands is similar to those of ed, but there are differences here too. One is that if no line numbers
are specified, the operation is performed on all lines in the input file. Also, one cannot mark lines or talk of a current
line. All line numbers are absolute and fixed throughout the input file because one cannot move or copy text around.
In the example above, one could also have said
% sed -e /Pramod/s/Kumar/Jain/ team_crypt
to achieve the same result. To delete the name Anthony d'Costa, you could say
% sed -e "/Anthony/d" team_crypt
We will now see how to put the transformation commands into an edit script and apply that script to an input file. This is
useful when one wants to apply several changes to the file, or when there is a complex set of changes where you could
goof up while typing the command itself, or when you might want to be applying the same changes to several input files.
The changes are applied in sequence: sed reads the input one line at a time and runs the commands on each line in the
order in which you specify them. A line removed by a delete is not seen by the commands that follow it in the script.
Thus the previous two commands could have been put in a file
% cat first.sed
/Pramod/s/Kumar/Jain/
/Anthony/d
and then applied to the input file
% sed -f first.sed team_crypt > team_crypt.new
The -f option tells sed that the next argument is the name of the file containing the edit script. Here we have also
redirected the output to a disk file.
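Here is the script file approach put together from start to finish, as a sketch you can run in a scratch directory. The file names are those used in the text; the two counting variables are made up for illustration.

```shell
# Scratch directory so that no existing file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The sample file from the text.
cat > team_crypt <<'EOF'
Ram Kumar
Ajay Bansal
Shiv Prasad
Zafar Khan
Pramod Kumar
Anthony d'Costa
Surendra Kumar
EOF

# The edit script: one sed command per line.
cat > first.sed <<'EOF'
/Pramod/s/Kumar/Jain/
/Anthony/d
EOF

# Apply the script and redirect the result to a new file.
sed -f first.sed team_crypt > team_crypt.new

# Count the lines left and the lines carrying the new name.
line_count=$(wc -l < team_crypt.new | tr -d ' ')
renamed=$(grep -c "Pramod Jain" team_crypt.new)
echo "$line_count lines written, $renamed renamed"
```

The same first.sed can then be applied to any number of other input files without retyping the commands.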
The append command is specified a bit differently from ed. Every line except the last, including the line containing the
command, is terminated with a \. So we can add a blank line at the end of the file by
% cat blank.sed
$a\

where there is a blank in the second line of the file. To add a blank line after every name containing Kumar, say
% cat blank.sed
/Kumar/a\

and to add a blank line after every line of the input, say
% cat blank.sed
a\

This works because the default is to apply the command to every line of the input file. By the way, you can specify both
the -e and -f options in the same invocation of sed, but then the -e has to be specified even if there is only one instance
of a command line option.
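To see -e and -f working together, and the append command adding visible text, you can try the following sketch. The script name mark.sed and the marker text --- are made up for illustration; a visible marker is used in place of a blank line so that the result is easy to count.

```shell
# Scratch directory so that no existing file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The sample file from the text.
cat > team_crypt <<'EOF'
Ram Kumar
Ajay Bansal
Shiv Prasad
Zafar Khan
Pramod Kumar
Anthony d'Costa
Surendra Kumar
EOF

# Hypothetical script: append a marker line after every line containing Kumar.
cat > mark.sed <<'EOF'
/Kumar/a\
---
EOF

# The -e command is applied first, then the commands from the file.
out=$(sed -e "s/Pramod Kumar/Pramod Jain/" -f mark.sed team_crypt)
printf '%s\n' "$out"

# Count the marker lines that were added.
markers=$(printf '%s\n' "$out" | grep -c '^---$')
echo "$markers marker lines added"
```

Only two markers appear although three names contained Kumar, because by the time the append command looks at the fifth line the substitution has already turned it into Pramod Jain.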
CHANGING SEVERAL LINES IN sed
So far we have looked at sed operating on only one file. If you do not specify any file, sed works on the standard input.
Thus sed can be used as a filter. It can also accept multiple file names. If that is done, then the changes specified apply to
all files. The line numbering starts at the first file and continues uninterrupted until the last file is finished. Thus $ is the
last line of the last file in such a case.
Suppose you have two files containing the names of some cities.
% cat city-1
Delhi
Bombay
Calcutta
Madras
% cat city-2
London
New York
Tokyo
Now you could give a command like
% sed "3,5d" city-1 city-2
Delhi
Bombay
New York
Tokyo
and the lines have been deleted. You will find that lines from both files have been affected, because line 3 was the 3rd
line of the first file and line 5 was the first line of the second file. The same concept applies to any sed command and any
number of files.
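The continuous line numbering across files can be confirmed with a short sketch. It recreates the two city files in a scratch directory; the variable names are made up for illustration.

```shell
# Scratch directory so that no existing file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The two sample files from the text.
printf '%s\n' Delhi Bombay Calcutta Madras > city-1
printf '%s\n' London "New York" Tokyo > city-2

# Lines 3 to 5 of the combined input: two from city-1, one from city-2.
remaining=$(sed "3,5d" city-1 city-2)
printf '%s\n' "$remaining"

# $ addresses the last line of the last file only.
last=$(sed -n '$p' city-1 city-2)
echo "last line: $last"
```

The last line reported is Tokyo and not Madras, since sed treats the two files as one continuous stream.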
Now let us see how to print only some of the lines in the output file. This is different from the lines of the input files on
which the operation is performed, so do not confuse the two. By default sed prints all the lines of the output file. You
can prevent this by using the -n option to sed. If this is used then only lines specifically asked for are printed out. To print
a line from the output file you need to give the p command. Note that
% sed -n city-1
is an error because no action has been specified for sed to perform. So you can print the file by
% sed -n p city-1
Delhi
Bombay
Calcutta
Madras
but notice what happens if you say
% sed "2,3p" city-1
Delhi
Bombay
Bombay
Calcutta
Calcutta
Madras
This happens because sed prints all lines and on top of that it prints the lines requested for by the p command. So those
lines get printed twice.
You have seen how to apply the sed commands to the range of addresses given. To invert the sense of this set you can
precede the command with an !, whereupon the command will be applied to those lines whose addresses do not match
those given. So
% sed -n "2,3p" city-1
Bombay
Calcutta
and you can also check out
% sed -n "2,3\!p" city-1
Delhi
Madras
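The three behaviours just described can be compared side by side. This sketch is written for a Bourne-style shell, where single quotes are enough to protect the ! character; the backslash shown in the example above is needed only because the C shell treats ! specially. The variable names are made up for illustration.

```shell
# Scratch directory so that no existing file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The sample file from the text.
printf '%s\n' Delhi Bombay Calcutta Madras > city-1

# Without -n every line is printed once, and lines 2 and 3 a second time.
doubled=$(sed "2,3p" city-1 | wc -l | tr -d ' ')

# With -n only the lines asked for by p appear.
selected=$(sed -n "2,3p" city-1)

# The ! applies p to the lines outside the range instead.
others=$(sed -n '2,3!p' city-1)

echo "lines without -n: $doubled"
printf 'selected:\n%s\n' "$selected"
printf 'others:\n%s\n' "$others"
```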
There are some other advanced features available in sed, but we will only talk of the y command, which translates
character sets like you saw in the tr command. Here you cannot give ranges and both sets must contain exactly the same
number of characters, unlike what is needed in the tr command.
% sed "y/abcde/edcba/" city-1
Dalhi
Bomdey
Celcutte
Mebres
The characters not part of the transposition set are not affected. Of course one could have used the tr command for the
example given here. You should study and work out the other commands available with sed.
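That tr gives the same result for this example can be verified with a small sketch, run in a scratch directory on a copy of city-1. The variable names are made up for illustration.

```shell
# Scratch directory so that no existing file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The sample file from the text.
printf '%s\n' Delhi Bombay Calcutta Madras > city-1

# y maps each character of the first set to the character in the
# same position of the second set: a->e, b->d, c->c, d->b, e->a.
by_sed=$(sed "y/abcde/edcba/" city-1)

# The same mapping expressed with tr, which reads the standard input.
by_tr=$(tr abcde edcba < city-1)

printf '%s\n' "$by_sed"
if [ "$by_sed" = "$by_tr" ]; then
    echo "sed y and tr agree"
fi
```

The practical difference is that sed can restrict the y command to addressed lines, for example 2,3y/abcde/edcba/, whereas tr always works on the whole input.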
Sed is useful when you want to perform some transformation on a file with a given format again and again. You can then
make a script which can be used every time. But there are times when you need a more powerful text manipulation tool.
To that end we discuss awk, which is really a programming language, in the next section of this unit.
AWK
Awk is a powerful pattern matching and text processing language which has been put to various uses. It is a versatile
language and bringing out all its capabilities with realistic examples would require a block in itself. It was developed by
Aho, Kernighan and Weinberger and you should look up the documentation to get a better knowledge of awk than one
can impart here in this brief section. It is very useful for quick data extraction and report formatting, especially if you are
doing one off data processing on small sized text files of 10Mb or so (this of course depends on your hardware).
The syntax of the language resembles that of C to quite an extent and so it should be easy for you to learn and use. In
fact it has many constructs from C but in a more convenient form. Thus you do not have to declare any variables or their
types. It supports associative arrays and you can define functions and invoke them with parameters.
Unfortunately there is not enough space in this unit or this block to even touch upon awk's capabilities. So as always we
regretfully refer you to the UNIX documentation and books on UNIX where this language might be described.
