HugePages

HugePages on Linux: What It Is... and What It Is Not

This document describes the HugePages feature of the Linux kernel, available
on 32- and 64-bit architectures. There has been some confusion about the
terms and uses related to HugePages. This document aims to clear up the
misconceptions about the feature.
SCOPE
Information in this document is useful for Linux system administrators and
Oracle database administrators working with system administrators.
This document covers the HugePages concept as it applies to very large
memory systems on 32- and 64-bit architectures, including some
configuration information and references.
DETAILS
Introduction

HugePages is a feature integrated into the Linux kernel with release 2.6. It
provides larger pages as an alternative to the default 4K page size (16K for
IA64).
Several similar terms are used in connection with HugePages, such as hugetlb
and hugetlbfs. Before proceeding into the details of HugePages, see the
definitions below:

Page Table: A page table is the data structure of a virtual memory
system in an operating system that stores the mapping between virtual
addresses and physical addresses. This means that on a virtual
memory system, memory is accessed by first accessing a page
table and then, implicitly, the actual memory location.

TLB: A Translation Lookaside Buffer (TLB) is a buffer (or cache) in a
CPU that contains parts of the page table. This is a fixed size buffer
being used to do virtual address translation faster.

hugetlb: This is an entry in the TLB that points to a HugePage (a
large page, bigger than the regular 4K and predefined in size).
HugePages are implemented via hugetlb entries, i.e. we can say that a
HugePage is handled by a "hugetlb page entry". The "hugetlb" term is
also (and mostly) used synonymously with HugePage (see Note
261889.1). In this document the term "HugePage" is used, but keep in
mind that "hugetlb" mostly refers to the same concept.

hugetlbfs: This is an in-memory filesystem, similar to tmpfs, introduced
with the 2.6 kernel. Pages allocated on a hugetlbfs type filesystem
are allocated in HugePages.

Common Misconceptions

WRONG: HugePages is a method to be able to use large SGA on 32-bit VLM
systems
RIGHT: HugePages is a method to have larger pages where it is useful for
working with very large memory. It is useful in both 32- and 64-bit
configurations

WRONG: HugePages cannot be used without USE_INDIRECT_DATA_BUFFERS
RIGHT: HugePages can be used without indirect buffers. 64-bit systems do
not need to use indirect buffers to have a large buffer cache for the RDBMS
instance and HugePages can be used there too.

WRONG: hugetlbfs means hugetlb
RIGHT: hugetlbfs is a filesystem type BUT hugetlb is the mechanism employed
in the back where hugetlb can be employed WITHOUT hugetlbfs

WRONG: hugetlbfs means hugepages
RIGHT: hugetlbfs is a filesystem type BUT HugePages is the mechanism
employed in the back (synonymously with hugetlb) where HugePages can be
employed WITHOUT hugetlbfs.

Regular Pages and HugePages

This section aims to give a general picture about memory access in virtual
memory systems and how pages are referenced.
When a single process works with a piece of memory, the pages that the
process uses are referenced in a local page table for that specific process.
The entries in this table also contain references to the system-wide page
table, which holds the references to the actual physical memory addresses.
So, theoretically, a user mode process (e.g. an Oracle process) follows its
local page table to reach the system page table and from there references
the actual physical memory. As you can see below, it is also possible (and
very common for Oracle RDBMS due to SGA use) that two different O/S
processes point to the same entry in the system-wide page table.

When HugePages come into play, the usual page tables are still employed. The
basic difference is that the entries in both the process page table and the
system page table have attributes about huge pages. So any page in a page
table can be a huge page or a regular page. The following diagram illustrates
4096K hugepages but the diagram would be the same for any huge page
size.

HugePages in 2.4 Kernels

The HugePages feature is backported to some 2.4 kernels. Kernel versions
2.4.21-* have this feature (see Note 311504.1 for the distributions with
2.4.21 kernels) but it is implemented in a different way. The feature is
completely available. The differences from the 2.6 implementation are the
organization within the source code and the kernel parameters that are used
for configuring HugePages. See the Parameters/Setup section below.
Some HugePages Facts/Features

HugePages can be allocated on-the-fly, but they should be reserved
during system startup. Otherwise the allocation might fail, as most of
the memory is already paged in 4K pages by then.

HugePage sizes vary from 2MB to 256MB based on kernel version and
HW architecture (see the related section below).

HugePages are not subject to reservation / release after system
startup unless the system administrator intervenes by changing the
HugePages configuration (i.e. the number of pages available or the
pool size).

Advantages of HugePages Over Normal Sharing Or AMM (see below)

Not swappable: HugePages are not swappable. Therefore there is no
page-in/page-out mechanism overhead. HugePages are universally
regarded as pinned.

Relief of TLB pressure:

o HugePages use fewer pages to cover the physical address space,
so the amount of bookkeeping (mapping from virtual to physical
addresses) decreases and fewer entries are required in the TLB
o TLB entries cover a larger part of the address space when
HugePages are used, so there are fewer TLB misses before all or
most of the SGA is mapped
o Fewer TLB entries for the SGA also means more for other parts of
the address space

Decreased page table overhead: Each page table entry can be as
large as 64 bytes, and if we are trying to handle 50GB of RAM, the
page table will be approximately 800MB in size, which practically will
not fit in the 880MB lowmem (in 2.4 kernels; the page table is not
necessarily in lowmem in 2.6 kernels) considering the other uses of
lowmem. When 95% of memory is accessed via 256MB hugepages,
this can work with a page table of approximately 40MB in total. See
also Document 361468.1.

Eliminated page table lookup overhead: Since the pages are not
subject to replacement, page table lookups are not required.

Faster overall memory performance: On virtual memory systems
each memory operation is actually two abstract memory operations.
Since there are fewer pages to work on, the possible bottleneck on
page table access is avoided.
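The page table figures quoted in the list above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the 64-byte page table entry and the 50GB / 95% figures from the text, and the function name pte_kb is hypothetical.

```shell
#!/bin/sh
# Page table size estimate using the figures quoted in the text.
# Assumption (from the text): each page table entry takes 64 bytes.

# pte_kb MEM_KB PAGE_KB: page table size in KB when MEM_KB of memory
# is mapped with pages of PAGE_KB each.
pte_kb() {
    _entries=$(( ($1 + $2 - 1) / $2 ))   # round up to whole pages
    echo $(( _entries * 64 / 1024 ))
}

total_kb=$(( 50 * 1024 * 1024 ))         # 50GB expressed in KB
huge_kb=$(( total_kb * 95 / 100 ))       # 95% mapped via 256MB hugepages
small_kb=$(( total_kb - huge_kb ))       # remaining 5% in regular 4K pages

echo "everything in 4K pages: $(( $(pte_kb "$total_kb" 4) / 1024 )) MB"
echo "5% left in 4K pages:    $(( $(pte_kb "$small_kb" 4) / 1024 )) MB"
echo "95% in 256MB hugepages: $(pte_kb "$huge_kb" 262144) KB"
```

The entries for the hugepage-backed part are negligible, so the total is dominated by the memory that remains in 4K pages.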

The Size of a HugePage

The size of a single HugePage varies according to:

Kernel version/linux distribution

HW Platform

The actual size of the HugePage on a specific system can be checked by:
$ grep Hugepagesize /proc/meminfo
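When the value is needed in a script rather than on screen, it can be extracted as a plain number. This is a minimal sketch; the function name hugepage_kb is hypothetical and the sample line only mimics the /proc/meminfo format.

```shell
#!/bin/sh
# hugepage_kb [FILE...]: print the Hugepagesize value (in KB) found in
# meminfo-formatted input. Reads standard input when no file is given.
hugepage_kb() {
    awk '/^Hugepagesize:/ { print $2 }' "$@"
}

# Demonstration on a sample line in /proc/meminfo format
printf 'Hugepagesize:       2048 kB\n' | hugepage_kb
```

On a live system the same function can be called as hugepage_kb /proc/meminfo.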
The table below shows the sizes of HugePages on different configurations.
Note that these are general numbers taken from the most recent versions of
the kernels. For a specific kernel source package, you can check for the
HPAGE_SIZE macro value (based on HPAGE_SHIFT) for a different (more
recent) kernel source tree.
HW Platform                       Source Code Tree   Kernel 2.4   Kernel 2.6

Linux x86 (IA32)                  i386               4 MB         4 MB *
Linux x86-64 (AMD64, EM64T)       x86_64             2 MB         2 MB
Linux Itanium (IA64)              ia64               256 MB       256 MB
IBM Power Based Linux (PPC64)     ppc64/powerpc      N/A **       16 MB
IBM zSeries Based Linux           s390               N/A          1 MB
IBM S/390 Based Linux             s390               N/A          N/A

* Some older packaging for the 2.6.5 kernel on SLES8 (like 2.6.5-7.97) can
have 2 MB Hugepagesize.
** Oracle RDBMS is also not certified in this configuration. See Document
341507.1
HugePages Reservation

The HugePages reservation feature is fully implemented in the 2.6.17 kernel,
and thus EL5 (based on 2.6.18) has this feature. The alloc_huge_page()
function was improved for this (see kernel source mm/hugetlb.c).
From /usr/share/doc/kernel-doc-2.6.18/Documentation/vm/hugetlbpage.txt:
HugePages_Rsvd is short for "reserved," and is the number of hugepages for
which a commitment to allocate from the pool has been made, but no
allocation has yet been made. It's vaguely analogous to overcommit.
This feature in the Linux kernel enables the Oracle Database to allocate
hugepages for the sublevels of the SGA on demand. The same behaviour is
expected for the various Oracle Database versions that are certified on EL5.
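Because reserved pages are still counted as free until they are touched, the counters in /proc/meminfo need a small calculation to show how much of the pool is really uncommitted. A minimal sketch follows; the function name hugepage_usage is hypothetical and the sample figures are illustrative.

```shell
#!/bin/sh
# hugepage_usage [FILE...]: summarise the hugepage pool from
# meminfo-formatted input (standard input when no file is given).
#   touched     = pages already backed by physical memory (Total - Free)
#   reserved    = pages promised to a mapping but not yet touched (Rsvd)
#   uncommitted = free pages that nobody has reserved (Free - Rsvd)
hugepage_usage() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free  = $2 }
        /^HugePages_Rsvd:/  { rsvd  = $2 }
        END {
            printf "touched=%d reserved=%d uncommitted=%d\n",
                   total - free, rsvd, free - rsvd
        }' "$@"
}

# Demonstration with illustrative counters
printf 'HugePages_Total: 1496\nHugePages_Free: 485\nHugePages_Rsvd: 446\n' |
    hugepage_usage
```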
HugePages and Oracle 11g Automatic Memory Management (AMM)

AMM and HugePages are not compatible. AMM must be disabled on 11g to be
able to use HugePages. See Document 749851.1 for further information.
What if Not Enough HugePages Configured?

Configuring your Linux OS for HugePages is a delicate process: if it is not
configured properly, the system may experience serious problems. If
you do not have enough HugePages configured you may encounter:

HugePages not used at all (HugePages_Total = HugePages_Free),
wasting the memory configured for them

Poor database performance

System running out of memory or excessive swapping

Some or all database instances cannot be started

Crucial system services failing (e.g.: CRS)

To avoid / help with such situations Bug 10153816 was filed to introduce a
database initialization parameter in 11.2.0.2 (use_large_pages) to help
manage which SGAs will use huge pages and potentially give warnings or not
start up at all if they cannot get those pages.
Parameters/Setup

The following configurations are general guidelines to configure HugePages
for more than one Oracle RDBMS instance. If you are looking to use
HugePages on 32-bit Linux, please refer to the documents below as
appropriate:

Document 317055.1 How to Configure RHEL 3.0 32-bit for Very Large
Memory with ramfs and hugepages

Document 317067.1 How to Configure Asianux 1.0 32-bit for Very
Large Memory with ramfs and hugepages

Document 317141.1 How to Configure RHEL 4 32-bit for Very Large
Memory with ramfs and hugepages

Document 317139.1 How to Configure SuSE SLES 9 32-bit for Very
Large Memory with ramfs and hugepages

Document 361468.1 HugePages on 64-bit Linux

For all configurations, make sure that the environment variable
DISABLE_HUGETLBFS is unset. If it is set (specifically to 1) it will disable the
use of HugePages by Oracle database.

The configuration performed is based on the RAM installed and the
combined size of the SGAs of the database instances you are running.
Therefore, when:

Amount of RAM installed for the Linux OS changed

New database instance(s) introduced

SGA size / configuration changed for one or more database instances

you should revise your HugePages configuration to make it suitable for the
new memory framework. If not, you may experience one or more of the
problems below on the system:

Poor database performance

System running out of memory or excessive swapping

Database instances cannot be started

Crucial system services failing

Kernel Version 2.4


The kernel parameter used for HugePages is vm.hugetlb_pool, which is
specified in MB of memory. RHEL3, Asianux 1.0 and SLES8 (Service Pack 3 and
over) are examples of distributions with 2.4 kernels with HugePages support.
For the configuration, follow the steps below:
1. Start instance(s)
2. Calculate hugetlb_pool using script from Note 401749.1
3. Shutdown instances
4. Set kernel parameter:
# sysctl -w vm.hugetlb_pool=<value from above>
and make sure that the parameter is persistent across reboots, e.g. on
Asianux 1.0 by editing /etc/sysctl.conf and adding/modifying as below:
vm.hugetlb_pool=<value from above>
5. Check available hugepages:
$ grep Huge /proc/meminfo
6. Restart instances
7. Check available hugepages:
$ grep Huge /proc/meminfo


Notes:

If the setting of hugetlb_pool is not effective, you will need to reboot
the server so that the HugePages are allocated during system startup.

The HugePages are allocated in a lazy fashion, so the
"Hugepages_Free" count drops as the pages get touched and are
backed by physical memory. The idea is that it's more efficient in the
sense that you don't use memory you don't touch.

If you set the instance initialization parameter
PRE_PAGE_SGA=TRUE (for suitable settings see Document 30793.1),
all of the pages will be allocated from HugePages up front.

Kernel Version 2.6


The kernel parameter used for HugePages is vm.nr_hugepages, which is
specified as a number of pages. SLES9, RHEL4 and Asianux 2.0 are
examples of distributions with the 2.6 kernel. For the configuration, follow
the steps below:
1. Start instance(s)
2. Calculate nr_hugepages using script from Document 401749.1
3. Shutdown instances
4. Set kernel parameter:
# sysctl -w vm.nr_hugepages=<value from above>
and make sure that the parameter is persistent across reboots, e.g. on SLES9:
# chkconfig boot.sysctl on
5. Check available hugepages:
$ grep Huge /proc/meminfo
6. Restart instances
7. Check available hugepages:
$ grep Huge /proc/meminfo
Notes:

If the setting of nr_hugepages is not effective, you will need to reboot
the server so that the HugePages are allocated during system startup.

The HugePages are allocated in a lazy fashion, so the
"Hugepages_Free" count drops as the pages get touched and are
backed by physical memory. The idea is that it's more efficient in the
sense that you don't use memory you don't touch.

If you set the instance initialization parameter
PRE_PAGE_SGA=TRUE (for suitable settings see Document 30793.1),
all of the pages will be allocated from HugePages up front.
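
The two kernel parameters describe the same pool in different units: vm.hugetlb_pool (2.4) is a size in MB, while vm.nr_hugepages (2.6) is a page count. The conversion can be sketched as below. The function names are hypothetical and the numbers are illustrative; this is not the calculation script from Document 401749.1.

```shell
#!/bin/sh
# pages_to_pool_mb PAGES HUGEPAGE_KB: pool size in MB for a page count.
pages_to_pool_mb() {
    echo $(( $1 * $2 / 1024 ))
}

# pool_mb_to_pages POOL_MB HUGEPAGE_KB: whole pages that fit in the pool
# (rounds down, since a partial hugepage cannot be allocated).
pool_mb_to_pages() {
    echo $(( $1 * 1024 / $2 ))
}

pages_to_pool_mb 1496 2048    # 1496 pages of 2MB expressed in MB
pool_mb_to_pages 2992 4096    # the same pool size counted in 4MB pages
```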

Notes on HugePages in General

The userspace application that employs HugePages should be aware of
permission implications. Permissions on HugePages segments in memory
can impose strict requirements. For example, per Bug 6620371, the
Linux x86-64 port of Oracle RDBMS prior to 11g set the shared
memory flags to hugetlb, read and write by default. These should instead
depend on the configuration environment; with Patch 6620371 on
10.2, and with 11g, the read and write permissions are set based on the
internal context.

HugePages on Oracle Linux 64-bit

APPLIES TO:
Oracle Database - Enterprise Edition - Version 9.2.0.1 and later
Linux OS - Version Enterprise Linux 4.0 to Oracle Linux 6.0 with Unbreakable
Enterprise Kernel [2.6.32] [Release RHEL4 to OL6]
Linux x86-64
Oracle Linux
Red Hat Enterprise Linux (RHEL)
SUSE Linux Enterprise Server (SLES)

PURPOSE
This document aims to provide:

Basic configuration of HugePages on 64-bit Linux

Fundamental reasons to use HugePages on Linux

References to known problems

References to technical background on HugePages

SCOPE
Information in this document is useful for Linux system administrators and
Oracle database administrators working with system administrators.
This document covers information about Linux HugePages for 64-bit
architectures. For more generic information, uses on 32-bit platforms and
references, please see Document 361323.1.
The configuration steps provided here are primarily for Oracle Linux. Still,
the same concepts and configurations should apply to other Linux
distributions.
DETAILS
Introduction

HugePages is a feature of the Linux kernel which allows larger pages to be
used to manage memory, as an alternative to the small 4KB page size. For a
detailed introduction, see Document 361323.1.
Why Do You Need HugePages?

HugePages is crucial for faster Oracle database performance on Linux if you
have a large RAM and SGA. If your combined database SGAs are large (more
than 8GB, though it can be important even for smaller sizes), you will need
HugePages configured. Note that the size of the SGA matters. Advantages of
HugePages are:

Larger Page Size and Fewer Pages: The default page size is 4K
whereas the HugeTLB size is 2048K. That means the system needs to
handle 512 times fewer pages.

Reduced Page Table Walking: Since a HugePage covers a greater
contiguous virtual address range than a regular sized page, the
probability of getting a TLB hit per TLB entry is higher with HugePages
than with regular pages. This reduces the number of times page
tables are walked to obtain a physical address from a virtual address.

Less Overhead for Memory Operations: On virtual memory
systems (any modern OS) each memory operation is actually two
abstract memory operations. With HugePages, since there are fewer
pages to work on, the possible bottleneck on page table
access is avoided.

Less Memory Usage: From the Oracle Database perspective, with
HugePages, the Linux kernel will use less memory to create page tables
to maintain virtual to physical mappings for the SGA address range, in
comparison to regular size pages. This makes more memory
available for process-private computations or PGA usage.

No Swapping: Swapping must be avoided altogether on Linux (see
Document 1295478.1). HugePages are not swappable (whereas
regular pages are). Therefore there is no page replacement mechanism
overhead. HugePages are universally regarded as pinned.

No 'kswapd' Operations: kswapd will get very busy if there is a very
large area to be paged (i.e. 13 million page table entries for 50GB
memory) and will use a large amount of CPU resource. When
HugePages are used, kswapd is not involved in managing them. See
also Document 361670.1.
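
The page counts mentioned in the list above (512 times fewer pages, about 13 million entries for 50GB) follow directly from the page sizes. A small sketch, with a hypothetical function name:

```shell
#!/bin/sh
# pages_for SIZE_GB PAGE_KB: number of pages needed to map SIZE_GB of
# memory with pages of PAGE_KB each.
pages_for() {
    echo $(( $1 * 1024 * 1024 / $2 ))
}

small=$(pages_for 50 4)       # regular 4K pages
huge=$(pages_for 50 2048)     # 2048K HugePages
echo "4K pages:    $small"
echo "2048K pages: $huge"
echo "ratio:       $(( small / huge ))"
```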

There is a general misconception that HugePages is a feature specific
to 32-bit Linux. HugePages is a generic feature available to all word sizes
and architectures; there are just some specifics with 32-bit platforms.
Please see Document 361323.1 for further references.

How to Configure

The configuration steps below guide you through a persistent system
configuration, which requires a complete reboot of the system.
Please plan your operations accordingly:
Step 1: Have the memlock user limit set in /etc/security/limits.conf file. Set
the value (in KB) slightly smaller than installed RAM. e.g. If you have 64GB
RAM installed, you may set:
*   soft   memlock   60397977
*   hard   memlock   60397977

There is no harm in setting this value larger than your SGA requirements.
The parameters will be set by default on:

Oracle Linux with oracle-validated package (See Document 437743.1)
installed.

Oracle Exadata DB compute nodes
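
The memlock value in the Step 1 example corresponds to 90% of 64GB expressed in KB. The sketch below reproduces that figure. The 90% share is an assumption inferred from the example, not a documented rule, and the function name memlock_kb is hypothetical.

```shell
#!/bin/sh
# memlock_kb RAM_GB [PERCENT]: memlock value in KB as a share of the
# installed RAM. PERCENT defaults to 90 (assumption, see lead-in).
memlock_kb() {
    echo $(( $1 * 1024 * 1024 * ${2:-90} / 100 ))
}

# Print the two limits.conf lines for a 64GB system
kb=$(memlock_kb 64)
printf '*   soft   memlock   %s\n' "$kb"
printf '*   hard   memlock   %s\n' "$kb"
```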

Step 2: Re-logon to the Oracle product owner account (e.g. 'oracle') and
check the memlock limit
$ ulimit -l
60397977
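
Once the HugePages pool size is known (Step 4), the reported limit can be compared against it, since the limit has to be at least as large as the memory to be locked. A minimal sketch; memlock_covers is a hypothetical helper and the page count and page size are illustrative.

```shell
#!/bin/sh
# memlock_covers LIMIT PAGES HUGEPAGE_KB
# LIMIT is what `ulimit -l` prints: a value in KB or the word "unlimited".
# Succeeds when LIMIT is at least PAGES * HUGEPAGE_KB.
memlock_covers() {
    if [ "$1" = "unlimited" ]; then
        return 0
    fi
    [ "$1" -ge $(( $2 * $3 )) ]
}

# Demonstration with the limit from Step 2 and 1496 pages of 2048 KB
if memlock_covers 60397977 1496 2048; then
    echo "memlock limit covers the HugePages pool"
else
    echo "memlock limit is too small"
fi
```

In an interactive session the first argument would be "$(ulimit -l)" for the Oracle product owner account.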

Step 3: If you have Oracle Database 11g or later, the default database
created uses the Automatic Memory Management (AMM) feature which is
incompatible with HugePages. Disable AMM before proceeding. To disable,
set the initialization parameters MEMORY_TARGET and
MEMORY_MAX_TARGET to 0 (zero). Please see Document 749851.1 for
further information in case you encounter the error below:
ORA-00845: MEMORY_TARGET not supported on this system

Step 4: Make sure that all your database instances are up (including ASM
instances) as they would run on production. Use the
script hugepages_settings.sh in Document 401749.1 to calculate the
recommended value for the vm.nr_hugepages kernel parameter. e.g.:
$ ./hugepages_settings.sh
...
Recommended setting: vm.nr_hugepages = 1496
$

Note: You can also calculate a proper value for the parameter yourself, but
that is not advised if you do not have extensive experience with HugePages
and the related concepts.
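To illustrate what such a manual calculation involves, the sketch below rounds each shared memory segment up to whole hugepages and adds the results. It is not the hugepages_settings.sh script from Document 401749.1, whose recommendation should be preferred. The function name is hypothetical and the segment sizes (as they might be read from the bytes column of ipcs -m) are illustrative.

```shell
#!/bin/sh
# estimate_nr_hugepages HUGEPAGE_KB SEG_BYTES...
# Rounds every segment size (in bytes) up to a whole number of hugepages
# and prints the total number of pages.
estimate_nr_hugepages() {
    _hp_bytes=$(( $1 * 1024 ))
    shift
    _total=0
    for _seg in "$@"; do
        _total=$(( _total + (_seg + _hp_bytes - 1) / _hp_bytes ))
    done
    echo "$_total"
}

# Two illustrative segments: exactly 4GB, and 100MB plus one byte
estimate_nr_hugepages 2048 4294967296 104857601
```

Rounding per segment matters because each segment occupies whole hugepages, so a segment one byte over a page boundary still costs an additional page.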
Step 5: Edit the file /etc/sysctl.conf and set the vm.nr_hugepages parameter
there:
...
vm.nr_hugepages = 1496
...

This ensures that the parameter is set properly on each reboot.
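If the edit in Step 5 needs to be scripted, it should either replace an existing entry or append a new one so that the parameter never appears twice. A minimal sketch, assuming a hypothetical helper set_nr_hugepages and demonstrated on a temporary file rather than on /etc/sysctl.conf itself:

```shell
#!/bin/sh
# set_nr_hugepages FILE VALUE
# Replaces an existing vm.nr_hugepages line in FILE, or appends one if
# none is present. A backup of the previous content is kept as FILE.bak
# when a line is replaced.
set_nr_hugepages() {
    _file=$1
    _value=$2
    if grep -q '^[[:space:]]*vm\.nr_hugepages[[:space:]]*=' "$_file" 2>/dev/null; then
        sed -i.bak \
            "s/^[[:space:]]*vm\.nr_hugepages[[:space:]]*=.*/vm.nr_hugepages = $_value/" \
            "$_file"
    else
        printf 'vm.nr_hugepages = %s\n' "$_value" >> "$_file"
    fi
}

# Demonstration on a scratch copy
demo=$(mktemp)
printf 'kernel.shmmax = 1\n' > "$demo"
set_nr_hugepages "$demo" 1496
cat "$demo"
rm -f "$demo" "$demo.bak"
```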
Step 6: Stop all the database instances and reboot the server.
(Although the settings could have been applied dynamically, they would not
be effective to the extent required without a reboot. The best practice is
to do a persistent system configuration and perform a reboot to complete the
configuration, as presented in the steps above.)
What If the Database / SGA Configurations Change?
The configuration performed is based on the RAM installed and the
combined size of the SGAs of the database instances you are running.
Therefore, when:

Amount of RAM installed for the Linux OS changed

New database instance(s) introduced

SGA size / configuration changed for one or more database instances

you should revise your HugePages configuration to make it suitable for the
new memory framework. If not, you may experience one or more of the
problems below on the system:

Poor database performance

System running out of memory or excessive swapping

Database instances cannot be started

Crucial system services failing

Check and Validate the Configuration

After the system is rebooted, make sure that your database instances
(including the ASM instances) are started. Automatic startup via OS
configuration or CRS, or manual startup (whichever method you use) should
have been performed. Check the HugePages state from /proc/meminfo. e.g.:

# grep HugePages /proc/meminfo
HugePages_Total:  1496
HugePages_Free:    485
HugePages_Rsvd:    446
HugePages_Surp:      0

The values in the output will vary. To make sure that the configuration is
valid, the HugePages_Free value should be smaller than HugePages_Total and
there should be some HugePages_Rsvd. HugePages_Rsvd counts free pages
that are reserved for use (requested for an SGA, but not touched/mapped
yet).
The sum of Hugepages_Free and HugePages_Rsvd may be smaller than your
total combined SGA as instances allocate pages dynamically and proactively
as needed.
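The two conditions described above can be checked mechanically. The following is a minimal sketch; check_hugepages is a hypothetical helper that reads meminfo-formatted input, and the messages are illustrative.

```shell
#!/bin/sh
# check_hugepages [FILE...]
# Succeeds when HugePages_Free < HugePages_Total and HugePages_Rsvd > 0,
# as described in the text. Reads standard input when no file is given.
check_hugepages() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free  = $2 }
        /^HugePages_Rsvd:/  { rsvd  = $2 }
        END {
            if (total + 0 == 0) { print "no HugePages configured"; exit 1 }
            if (free + 0 >= total + 0) { print "HugePages configured but not used"; exit 1 }
            if (rsvd + 0 == 0) { print "HugePages touched but none reserved"; exit 1 }
            print "HugePages in use"
        }' "$@"
}

# Demonstration with counters like those in the sample output
printf 'HugePages_Total: 1496\nHugePages_Free: 485\nHugePages_Rsvd: 446\n' |
    check_hugepages
```

On a live system the function can be run as check_hugepages /proc/meminfo after the instances have been started.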
Troubleshooting

Some of the common problems and how to troubleshoot them are listed in
the following table:

Symptom: System is running out of memory or swapping
Possible Cause: Not enough HugePages to cover the SGA(s), and therefore
the area reserved for HugePages is wasted while the SGAs are allocated
through regular pages.
Troubleshooting Action: Review your HugePages configuration to make sure
that all SGA(s) are covered.

Symptom: Databases fail to start
Possible Cause: memlock limits are not set properly
Troubleshooting Action: Make sure the settings in limits.conf apply to the
database owner account.

Symptom: One of the databases fails to start while another is up
Possible Cause: The SGA of the specific database could not find available
HugePages and the remaining RAM is not enough.
Troubleshooting Action: Make sure that the RAM and HugePages are enough to
cover all your database SGAs.

Symptom: Cluster Ready Services (CRS) fail to start
Possible Cause: HugePages configured too large (maybe larger than installed
RAM)
Troubleshooting Action: Make sure the total SGA is less than the installed
RAM and re-calculate HugePages.

Symptom: HugePages_Total = HugePages_Free
Possible Cause: HugePages are not used at all. No database instances are up
or using AMM.
Troubleshooting Action: Disable AMM and make sure that the database
instances are up. See Doc ID 1373255.1

Symptom: Database started successfully and the performance is slow
Possible Cause: The SGA of the specific database could not find available
HugePages and therefore the SGA is handled by regular pages, which leads to
slow performance
Troubleshooting Action: Make sure that there are enough HugePages to cover
all your database SGAs.