Computer Technology

UNIVERSITY OF SURREY DEPARTMENT OF COMPUTING
COMPUTER TECHNOLOGY COURSEWORK
THEODOROS PITIKARIS
1.1 SCENARIO
We are going to work based on the following scenario. Our company is planning to upgrade the following PCs:
- The company's web & mail server
- The company's File/Active Directory server
- 10 graphics creation workstations
- 4 stations for everyday office work
Our company has a private network, and all the PCs except the Web/Mail server belong to the demilitarised zone, in order to achieve maximum security for our intranet. These stations can access only e-mail and the Web, through a proxy server with antivirus software for content checking.

The web & mail server has a routable IP and can therefore be accessed directly from the Internet. Its operating system is Red Hat Linux AS 3, the web server is Apache version 2.20, the MTA is sendmail version 8.12.1, and the UW IMAP server is used as the IMAP/POP3 server. Because of the services provided, fast I/O is needed, and because of our network configuration security is also an issue of great importance. Currently it is a Xeon 1.8 GHz (400 MHz FSB) system with 512 MB of RAMBUS RAM and a SATA-I RAID controller (level 1). We assume 55.000 hits min.

Our file server is equipped with one SCSI RAID 5 array, 2 GB of RAM, a P4 processor and Windows 2000 Server. This server provides storage and authentication services for both the DTP department and the other departments (accounting, reception, etc.). Two Gigabit cards connect the server to the rest of the network (a fully switched network).

The graphics workstations are used for high-resolution graphics and therefore deal with large amounts of data, so very fast I/O and a big cache are required. The operating system is Windows XP SP1. The company has provided special training to the DTP personnel on how to work in the XP environment. Currently each of these systems has a P4 at 2.4 GHz (400 MHz FSB) and 1 GB (2x512 MB) of DDR-266 RAM.

The 4 office PCs that we are considering upgrading are low-cost PCs used for everyday office work (word processing, Excel spreadsheets, etc.). Currently these stations are AMD Duron 1 GHz systems with 512 MB of 133 MHz SDRAM and an ULTRA 4 HDD.
After the upgrade we plan to keep the Linux OS for the internet services, upgrade the file server to Windows 2003 because Windows 2000 is an end-of-life product, and keep Windows XP as our desktop OS.
1.2 Overview of 32-bit Pentium instruction set architecture.
The Pentium 4 uses a mix of the traditional CISC x86 architecture and some RISC-like features (Computer Architecture pg. 12) (Vafiades, 2004). In 32-bit x86 processors, four banks of memory are connected to the 32-bit data bus. Because access is 32 bits wide, the processor can access any byte in memory in one operation. Depending on the memory arrangement, a word may also be accessible in a single operation. The 80x86 processors use the little-endian model to arrange data in memory (roger).

The x86 processors allow us to access memory in multiple ways; the most common one is the flat model. The flat model makes the executing program believe that all the memory is one unified space (the linear address space). As we read in Intel's Pentium 4 Basic Architecture: "Code (a program's instructions), data, and the procedure stack are all contained in this address space. The linear address space is byte addressable, with addresses running contiguously from 0 to 2^32 - 1. An address for any byte in the linear address space is called a linear address" (Corporation, 2004).

Another way to access memory is indirectly. The term indirect means that the operand is not the actual address; rather, the operand's value specifies the memory address to use.

Finally, the Pentium processor uses segment register addressing. Intel's processor family has four segment registers (code, data, stack and extra segment registers) of 16 bits each. In every program the code register provides the start address of the segment that contains the instructions, the data register contains the start address of the segment that contains the variables, and the stack register contains the start address of the segment that is used as the stack. Finally, the extra register is used as an auxiliary segment register.
The selection of the segment register is done by the CPU. Furthermore, when we have to address up to 1 MB of RAM, the segment register is only 16 bits wide, while a 20-bit address is needed, so the segment value is shifted left by four bits before the displacement is added. The formula that calculates the effective memory address is the following: Effective address = segment-register x 2^4 + displacement (multiplying by 2^4 appends the final 0 that we need in hexadecimal format).
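As a worked illustration of the formula above, the following Python sketch computes a real-mode address from a segment value and a displacement. The function name real_mode_address and the sample values are assumptions made for illustration, and the wrap-around at 20 bits reflects the 1 MB limit of the original 8086 rather than anything stated in the sources cited here.

```python
def real_mode_address(segment: int, displacement: int) -> int:
    """Return the 20-bit address for a 16-bit segment and a 16-bit displacement."""
    if not 0 <= segment <= 0xFFFF:
        raise ValueError("segment must fit in 16 bits")
    if not 0 <= displacement <= 0xFFFF:
        raise ValueError("displacement must fit in 16 bits")
    # Multiplying by 2^4 shifts the segment one hexadecimal digit to the left.
    # Assumption: addresses wrap at 1 MB (20 bits), as on the original 8086.
    return (segment * 2**4 + displacement) & 0xFFFFF


if __name__ == "__main__":
    # Segment 0x1234 becomes 0x12340; adding displacement 0x0010 gives 0x12350.
    print(hex(real_mode_address(0x1234, 0x0010)))
```

Different segment and displacement pairs can name the same byte: 0x1234:0x0010 and 0x1235:0x0000 both resolve to 0x12350.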
The table below shows all the methods that a 32-bit Pentium processor uses to calculate the final memory address (FMA) (Vafiades, 2004).

Method                     Formula
Immediate                  FMA = operand
Register                   FMA = Register
PC-relative                FMA = PC + displacement
Displacement               FMA = Segment + displacement
Base                       FMA = Segment + Base
Base-displacement          FMA = Segment + Base + displacement
Index-displacement         FMA = Segment + Index + displacement
Base-index-displacement    FMA = Segment + Index + Base + displacement

Table 1.2-1 Formulas of Final Memory Address (available from Computer Organisation & Architecture pg. 14)

These kinds of memory access are used by the kernels of the various operating systems to ensure OS stability, to allow secure multitasking and multi-user environments, and to enable VM technology. In general we meet three operating modes in the various 80x86 CPUs:
- Real-address mode. This mode lets the processor address "real" memory addresses. It can address up to 1 MB of memory, like the original 8088 and 8086. It can also be called "unprotected" mode, since operating system code (such as DOS) runs in the same mode as the user applications. We use this mode to keep backwards compatibility.
- Protected mode. This is the preferred mode for a modern operating system. It allows applications to use virtual memory addressing and supports multiple programming environments and protections.
- System management mode. This mode is designed for fast state snapshot and resumption. It is useful for power management (Intel, 2004).
The fundamental data types in the IA-32 architecture are (Intel, 2004):
- Byte (8 bits)
- Word (2 x bytes), 16 bits
- Doubleword (2 x words), 32 bits
- Quadword (4 x words), 64 bits [introduced with the i486]
- Double quadword (2 x quadwords), 128 bits [introduced via SSE]

Words, doublewords and quadwords do not have to be aligned, but for performance reasons they are aligned whenever alignment is possible, so that the CPU needs only one memory access to reach the data.

In Intel's processors we find the following arithmetic types:
- Integer
  o Signed
  o Unsigned
- Floating point
- Binary Coded Decimal

Integers are represented in two's complement. The length varies: 8, 16 and 32 bits are available, and a 128-bit instruction is also available via the new SSE2 set. For the set of real numbers (R), which is expressed in floating point, the dominant approach is the implementation of IEEE 754 with 32, 64 and 128 bits (Vafiades, 2004). Logical data are represented by memory locations of 8, 16 or 32 bits that contain 0s and 1s, and the logical operations such as AND, OR, XOR and NOT are implemented on them.

The new Pentium 4 has introduced 6 new data type classes: a 128-bit packed double-precision floating point type, a 64-bit quadword integer, and four 128-bit integer data types. Other data types that exist in the IA-32 ISA are:
- Pointer data types, which are addresses of memory locations. There are two kinds: the near pointer (32 bits), which is also called the effective address, and the far pointer (48 bits), which is used for references in a segmented memory model.
- Bit field data type
- String data type
- 64-bit SIMD packed data types, which were introduced with MMX technology and are aliased to the FPU registers.
- 128-bit packed SIMD data types, introduced by SSE2 and also usable from the SSE3 set.
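The sizes, the two's complement representation and the little-endian byte order described in this section can be checked with Python's standard struct module. This is only an illustrative sketch: the names SIZES, to_twos_complement and little_endian_bytes are hypothetical, and the code models the representations in software rather than touching real IA-32 memory.

```python
import struct

# Sizes in bytes of the integer data types, using standard-size struct codes.
SIZES = {
    "byte": struct.calcsize("<B"),
    "word": struct.calcsize("<H"),
    "doubleword": struct.calcsize("<I"),
    "quadword": struct.calcsize("<Q"),
}


def to_twos_complement(value: int, bits: int) -> int:
    """Encode a signed integer as its unsigned two's complement bit pattern."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the requested width")
    return value & ((1 << bits) - 1)


def little_endian_bytes(value: int) -> bytes:
    """Lay out an unsigned doubleword in memory order, lowest byte first."""
    return struct.pack("<I", value)


if __name__ == "__main__":
    print(SIZES)
    print(hex(to_twos_complement(-1, 8)))         # all eight bits set
    print(little_endian_bytes(0x12345678).hex())  # least significant byte first
```

In the little-endian layout the byte 0x78 of the doubleword 0x12345678 is stored at the lowest address and 0x12 at the highest.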
The IA-32 instruction set is divided into the following categories:
- General purpose
- x87 FPU
- x87 FPU and SIMD state
- Intel MMX technology
- SSE extensions
- SSE2 extensions
- SSE3 extensions
- System instructions
Using cat /proc/cpuinfo on a Linux box that runs on a Pentium 4 processor, we can get the full set of flags supported by this CPU. The result is the following:
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm

It is worth discussing the MMX and SSE sets in brief. MMX is a 64-bit set used for multimedia operations, and it reduces the need for a hardware DSP on the motherboard. The drawback is that it overlaps the FPU registers ST(0) to ST(7), which sometimes causes conflicts (Mahoney, 2004). SSE, SSE2 and SSE3 extend the initial MMX idea of embedding special instructions for multimedia purposes, and offer 64-bit (SSE) and 128-bit (SSE2, SSE3) SIMD. For an extensive presentation of the instructions please refer to Appendix I.

The stack in the P4 can extend up to 4 GB. The flat memory model allows the stack to be placed anywhere in main memory. The common POP and PUSH instructions are used to place data on the stack and retrieve data from it. When multiple stacks exist, only the current stack is available. Stack manipulation is done using the CALL and RET instructions, or ENTER and LEAVE in conjunction with CALL and RET. The stack pointer should be aligned on a word or doubleword boundary (Intel, 2004). Parameter passing between procedures can take place via:
- General purpose registers
- An argument list
- The stack

A very important aspect of the IA-32 architecture is the calling of processes at other privilege levels. The IA-32 architecture has 4 levels that characterize the protection level to which each process is assigned (Intel-XD, 2004). For example, kernel processes belong to level 0; no other process can use the kernel space or give instructions to the kernel space. An application belongs to level 3, which means that the kernel can manipulate every aspect of the application in whatever way it considers best, and can even terminate the application's process.
[Figure 1.2-1 Protection Rings (based on Intel programmer's reference volume I pg. 144): concentric rings numbered 0 to 3, with the Kernel at ring 0, Operating System Services in the intermediate rings and Applications at ring 3. The further out in the circle a process is, the less privileged it is.]

The protection rings aim to provide more stability to the operating system, and more effective multiprocessing and multi-user environments (Vafiades, 2004).

The final subject that we are going to see in this overview is the way IA-32 CPUs handle interrupts and exceptions. When an interrupt occurs, the CPU pauses the current process and handles the interrupt by carrying out the I/O that the device which caused the interrupt is ready to perform (roger). 17 interrupts exist in the IA-32 architecture; some of them are assigned to the CPU itself and to internal peripheral devices, and others are free. In addition, 227 exceptions are predefined in the IA-32 architecture. The Pentium 4 has three classes of exceptions: faults, traps and aborts (Intel, 2004).
2. AMD64, IA64, EM64T (iAMD64)
Nowadays there are multiple solutions for 64-bit computing. The major players in this area are:
- SPARC III and IIIe
- INTEL IA64 (currently ITANIUM II)
- INTEL XEON with EM64T (Nocona)
- INTEL PENTIUM with EM64T
- AMD Opteron
- AMD ATHLON 64
- MIPS
- PowerPC (IBM, MOTOROLA, APPLE alliance)

In this essay we are going to examine only the INTEL and AMD solutions. Since EM64T and AMD64 are just extensions of IA-32, these processors still use the CISC architecture in hybrid mode, where some parts are RISC and others are CISC.

2.1 INTEL
Intel offers two platforms for 64-bit computing: IA64, and the PENTIUM 4 / XEON series (Xeon is based on the Pentium 4 core) with the EM64T extension.

2.1.1 ITANIUM
The newest member of IA64 is the Itanium II processor. It is a pure 64-bit processor that can support up to (2^64)-1 bytes of RAM (Vafiades, 2004). One of the main innovations in the Itanium II processor is the way it exploits instruction-level parallelism. This technique forces the compiler to collect all the instructions that can be run in parallel into one large instruction (very long instruction word, VLIW). The performance impact is remarkable, because the compiler has much more time than the CPU to find and collect the appropriate instructions, so more effective parallelism is possible (Hwu, 2002). The processor provides 128 registers of 82-bit floating point and 64-bit integer registers. Furthermore, a register rotation mechanism has been implemented, so Itanium processors can accommodate a new function using a set of rotating registers.

IA64 was introduced by HP and Intel as the processor of the post-RISC era, but it was not easy for Intel to forget the old CISC architecture. As a result, IA64 uses a CISC-like complement of instructions combined with explicit instructions for multimedia and floating point operations (the same logic used for MMX, SSE, SSE2 and SSE3). The IA64 architecture allows balanced execution of parallel (enabled by VLIW) and serial processes.
Another very important feature is the EFI unit, which allows platform-specific instructions to be loaded into the processor (for instance, if we are going to run HP-UX, the instructions specific to that OS will be loaded in the EFI boot ROM). The execution of 32-bit applications is possible by switching to 32-bit mode using a special register, but this carries penalties that result in low performance of 32-bit applications on the IA64 platform. With the appropriate motherboard chipset, the IA64 architecture allows us to build systems based on multiple processors.

2.1.2 EM64T
EM64T is an extension to IA-32 that enables processors to handle and execute 64-bit applications. EM64T is Intel's answer to the AMD64 platform and targets the low and mid-range segments of the market. EM64T is not compatible with IA64 but is almost 100% compatible with AMD64, and it is something of an open secret that EM64T is a reverse-engineered AMD64, which is why many sources refer to EM64T as iAMD64. EM64T permits Intel CPUs to handle more than 4 gigabytes of memory (the 32-bit limit) and allows larger applications to be run. Additional registers have been added, and the eXecute Disable (XD) function is enabled (E0 revision). XD is equivalent to NX on AMD64 and helps prevent buffer overflows, which are the main security issue behind unauthorized code execution. This extension is embedded in the new Xeon (Nocona) and in the Pentium 4 processors that will be based on the new Prescott core.

There are two modes in EM64T CPUs. Compatibility mode allows old 16-bit and 32-bit applications to run on the CPU without the need to recompile them for 64 bits. The maximum memory that can be accessed in this mode is 64 GB and only the original IA-32 instruction set is available, but the virtual 8086 mode is no longer available. The normal operation mode is the 64-bit mode, where the CPU functions as a normal 64-bit CPU capable of running 64-bit applications (Dell, 2004).
It is worth noting that only the Xeon can be used in systems with more than one CPU.

2.2 AMD
AMD has two main platforms that support 64-bit computing: the Opteron and the ATHLON 64.
2.2.1 OPTERON
This processor can execute both 64-bit and 32-bit applications. Because of its 64-bit capabilities it can handle (2^64)-1 bytes of memory. The DDR memory controller is built into the processor chip, which avoids the latency that the Northbridge chipset would otherwise add whenever a memory access request is sent. The processor has 3 arithmetic logic units (ALU), 3 address generation units (AGU) and 3 floating point units (FPU) for arithmetic with floating point numbers (the ALUs process only integers). A multi-processor environment is not only possible but also very effective, because each Opteron has its own memory controller and memory bank to use, while Intel products have to share the same available bandwidth (Wikipedia, 2004). Each processor in a multiprocessor platform communicates with the others using HyperTransport, which permits one processor to access another processor's memory transparently. For information about 64-bit mode please refer to chapter 2.2.2.

2.2.2 ATHLON 64
The Athlon 64 FX is a product for personal computer systems (one CPU). The processor implements the x86 and AMD64 instruction sets. It features 1024 KB (1 MB) of Level 2 cache and a 128-bit memory controller, unlike the 64-bit controller of previous versions, and operates at frequencies from 2.2 GHz. Like the other processors of the K8 series, it features a HyperTransport bus (800 MHz and later 1000 MHz). All processors based on the AMD64 architecture feature either a single or dual channel memory controller "on-die", a feature not seen before in mainstream CPUs. This feature is useful as it vastly reduces the latency between the CPU and main memory: there is no longer a "northbridge" having to negotiate access, and the on-die controller runs at the same clock rate as the CPU itself (AMD, 2002). Due to the larger width of the address space, the AMD64 architecture can address up to 256 terabytes of memory.
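The memory limits quoted in this essay follow directly from the number of address bits. The short Python sketch below shows the arithmetic; the helper names addressable_bytes and human_readable are hypothetical, and pairing 36 bits with the 64 GB figure and 48 bits with the 256-terabyte figure is an assumption inferred from the sizes themselves rather than something the sources cited above state.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 1 << address_bits


def human_readable(size: int) -> str:
    """Format a power-of-two byte count using binary units."""
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"


if __name__ == "__main__":
    # Assumption: 36 and 48 bits are chosen to match the 64 GB and
    # 256-terabyte figures mentioned in the text.
    for bits in (32, 36, 48, 64):
        print(bits, "bits ->", human_readable(addressable_bytes(bits)))
```

With 64 address bits the highest address is (2^64)-1, which is the figure quoted for the Itanium II and the Opteron.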
The MMX, SSE, SSE2 and 3DNow! sets are available on the AMD64 platform. Furthermore, the new chipset for the AMD64 platform supports up to 6 HyperTransport links, which is enough even for up to 6 of the new PCI-X devices. The flags that are active (again using cat /proc/cpuinfo on a Linux box, this time running on an ATHLON 64 2800+) are: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow.
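The two flag listings quoted in this essay (Pentium 4 and Athlon 64 2800+) can be compared mechanically. The following Python sketch copies both lists verbatim from the text and computes which flags are unique to each CPU; the names P4_FLAGS, ATHLON64_FLAGS and parse_flags are hypothetical, and the code works on the quoted strings rather than reading /proc/cpuinfo on a live machine.

```python
# Flag lists copied from the two /proc/cpuinfo listings quoted in the text.
P4_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm"
)
ATHLON64_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow"
)


def parse_flags(line: str) -> set:
    """Split a whitespace-separated flags line into a set of flag names."""
    return set(line.split())


p4 = parse_flags(P4_FLAGS)
athlon64 = parse_flags(ATHLON64_FLAGS)

only_p4 = sorted(p4 - athlon64)        # flags reported only by the Pentium 4
only_athlon64 = sorted(athlon64 - p4)  # flags reported only by the Athlon 64
common = sorted(p4 & athlon64)         # flags both CPUs report

if __name__ == "__main__":
    print("Pentium 4 only:", only_p4)
    print("Athlon 64 only:", only_athlon64)
    # "lm" (long mode) marks the 64-bit extension, "nx" the no-execute bit.
    print("64-bit capable:", "lm" in athlon64, "| NX support:", "nx" in athlon64)
```

The Athlon 64 list contains lm and nx, while the 32-bit Pentium 4 list contains neither; the Pentium 4 in turn reports ht, which the Athlon 64 does not.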
Apart from the standard modes (real, virtual, protected), a new one has been added: Long mode. Long mode introduces two sub-modes: compatibility and true 64-bit. When we use a 32-bit OS it makes no sense to use the 64-bit mode. But when we move to a 64-bit OS, we can run either the old 32-bit software (in compatibility mode) or new 64-bit software (AMD, 2002). In LMA mode we take advantage of:
- Virtual 64-bit addressing
- 64-bit instruction pointer
- Flat-addressing mode
To make use of the advantages of the x86-64 architecture (8 additional general-purpose registers, 8 additional SSE2 registers and other software benefits), it is necessary to have a suitable compiler (http://www.digit-life.com/articles2/amd-hammerfamily/). The most basic units of organization for the instructions are specified in the following way (AMD manual again pg. 38-39):
1. General Purpose Instructions: The basic integer instructions, which are used nearly everywhere. Also often referred to as the x86 instruction set and easily illustrated by examples like addition of integers, moving, load, store, shifts and so on.
2. 128-Bit Media Instructions: Named after their primary application, these instructions operate on vectors of large data packages (e.g. video, scientific applications, games, etc.). Moreover, they operate in parallel, which means they are able to access multiple data sets at once. These instructions are designed for speed in one special field of applications and are therefore not suited to arbitrary tasks.
3. 64-bit Media Instructions: Also SIMD instructions, and not much different in use from the 128-bit instructions.
4. x87 Floating Point Instructions: As the general purpose instructions only work on integers, these instructions provide a suitable tool for floating point operations.

The memory is a single flat address space starting at address 0 and distributed linearly over 64 bits. The operating system can specify several levels of data access/protection for the address space (AMD, 2002). Power management flags are also available for AMD64 CPUs.

3. Opportunities for performance gains, and application areas that are likely to benefit.
It is obvious that users who need more memory will be the first to take advantage of the new 64-bit processors, since more memory is accessible (2^64). CAD and graphics designers and movie rendering machines belong to this group of users. Of course, the fact that the new VGA cards have a noticeable amount of built-in memory reduces the importance, as far as graphics are concerned, of how fast the CPU can access main memory, but some data still has to be exchanged between the CPU and the GPU. The same group includes more advanced applications such as scientific runs (for instance applications like Materials Studio, or Fortran programs for simulating biological or material structures). In addition, some algorithms have a much simpler form in the 64-bit representation, and we also achieve greater accuracy. Because memory operations are more effective on the AMD64 platform, since the memory controller is built into the processor and the bus runs at 800-1000 MHz, we can expect these applications to run more comfortably on the AMD64 platform.

Databases are the second field where we can expect an enormous performance improvement. Today it is possible to access up to 64 GB of RAM using a Xeon processor, but a transition to the flat memory model in the 64-bit space is much more advantageous in terms of speed and ease of programming (?????). That is the reason the most popular ERP programs that make extensive use of databases, such as SAP's ERP, suggest a 64-bit machine (SAP suggests SUN SPARC, but for non-demanding environments it can run on Windows 2000 and 80x86 processors).

Cryptography applications benefit from 64-bit integer calculations. In this sphere the use of x86-64 can bring a real breakthrough.

Games are also likely to take advantage of the new architecture because of faster code execution, but other features that come with the new CPUs (PCI-X VGA cards, HyperTransport, 2 MB of cache in the Pentium 4 Extreme Edition and so on) are more likely to have a greater impact than the CPUs themselves.

As for office applications, we have to be realistic: if we only have to deal with a secretary's need to run Word and write some texts, a simple Pentium II is good enough.
On the other hand, if we are going to use Excel for complicated calculations and the like, the new processors will definitely provide a better and faster environment. Because the Intel Pentium 4 has a bigger level 1 cache, and it is well known that office applications benefit a lot from cache, these applications are very likely to perform better on the Intel platform.

Finally, antivirus programs and network servers can take advantage of the new overflow protection feature of AMD64 and EM64T. This is still a feature that may raise concerns about compatibility, however, and thus must be used very carefully.

4. Issues of software and operating system compatibility.
AMD64 and EM64T can execute all existing 32-bit code plus the code that is written for these platforms. Also, because these two architectures are very similar, a large portion of the code that is written for AMD64 can possibly run on iAMD64 and vice versa (Stefanakis, 2004).
All major Linux vendors (SUSE, RED HAT, DEBIAN) already have Linux editions for AMD64, iAMD64 and IA64. Windows 2003 Advanced Server is fully compatible with single or multi-processor IA64, and the 32-bit edition is available for AMD64 and iAMD64. We have to mention here that Microsoft will not support the Itanium II in its Windows 2003 Cluster Edition (probably in response to Intel's support for the Open Source community). Windows XP 64-bit is still in the beta phase; it has been developed primarily for AMD64, but EM64T is likely to be supported.

Compilers are also available for 64-bit applications. The current version of GNU C, 3.4.3, supports code compilation for both IA64 and EM64T/AMD64, while Intel provides its own compilers. A lot of other commercial compilers exist for these 64-bit machines, but there is still a lack of a developer's suite like Microsoft Visual Studio.

As we have mentioned before, IA64 has very poor performance when it executes IA32 software. IA64 applications also cannot be run on EM64T CPUs. Finally, the NX/XD bit can cause abnormal behaviour in some software.
5. Conclusions
Our suggestions for our company are the following.

For our web server and mail server we are going to use the Itanium II 1.6 GHz solution with 4 MB of level 3 cache. Because of its large cache, we believe that operations like serving web content will gain extra performance. Security is also enhanced by the overflow protection feature that is built into the CPU. Furthermore, the MP capabilities of the Itanium II give us flexibility in case we need to add more CPUs.

For our file server, while the existence of the ULTRA-160 64-bit SCSI controller reduces the need for CPU power, the network I/O demands a fast bus and fast memory operations, and for that reason AMD64, and more specifically the Opteron processor, is our preference. We prefer the Opteron to the ATHLON 64 because the Opteron has MP capability, which permits CPU scalability.

For our graphics stations, the need to keep Windows XP and the fact that no major graphics packages are currently available in 64-bit lead us to choose the new AMD64 (ATHLON 64) processor. This way we will be ready when the appropriate software is available, using a cost-effective solution, and we gain from the 2 FPU units inside the AMD chip.

Finally, for our office PCs, the need for statistical operations and Excel use leads us to choose the XEON with EM64T technology, so that we can gain extra performance from the larger portion of memory that is available and from a cache that is bigger than the Athlon's.
Bibliography
AMD (2002) AMD x86-64 Programmer's Manual. AMD. Available from [http://www.amd.com/usen/assets/content_type/white_papers_and_tech_docs/24592.pdf] Last accessed on: 5/12/2004
DELL (2004) BLAST on Intel EM64T Architecture. Dell co. Available from [http://www.dell.com/downloads/global/power/ps4q04-20040144-Kochhar.pdf] Last accessed on: 12/12/2004
HWU, W. W. A. P., S. J. (2002) ECE 511: Computer Architecture. University of Illinois. Available from [http://www.cs.fit.edu/~mmahoney/cse3101/mmx.html] Last accessed on: 20/11/2004
INTEL-XD (2004) Execute Disable Bit Functionality Blocks Malware Code Execution. Intel's Web Site, Intel Corporation. Available from [ftp://download.intel.com/design/Pentium4/manuals/25366514.pdf] Last accessed on: 23/11/2004
INTEL (2004) IA-32 Intel Architecture Software Developer's Manual Volume 1: Basic Architecture. Available from [ftp://download.intel.com/design/Pentium4/manuals/25366514.pdf] Last accessed on: 12/12/2004
MAHONEY, M. (2004) CSE 3101 Lecture Notes. Florida University. Available from [http://www.cs.fit.edu/~mmahoney/cse3101/mmx.html] Last accessed on: 8/12/2004
STEFANAKIS, D. and PITIKARIS, T. (2004) Cluster Server for Materials Science Department, University of Crete. Heraklion, University of Crete.
VAFIADES, A. (2004) Computers' Organisation and Architecture. Technological Educational Institute of Thessaloniki. Available from [http://aetos.it.teithe.gr/~vaf/] Last accessed on: 14/12/2004
WIKIPEDIA (2004) Opteron. Wikipedia. Available from [http://en.wikipedia.org/wiki/Opteron] Last accessed on: 13/12/2004
