The patent's inventors are Kapil, Deepika (
This patent was filed on
From the background information supplied by the inventors, news correspondents obtained the following quote: "FIG. 1 shows an embodiment of a processor 100. The processor 100 may be any one of a variety of processors, such as a central processing unit (CPU) or a graphics processing unit (GPU). For instance, it may be an x86 microprocessor that implements the x86 64-bit instruction set architecture and is used in desktops, laptops, servers, and superscalar computers, or it may be an Advanced RISC (Reduced Instruction Set Computer) Machine (ARM) processor of the kind used in mobile phones or digital media players. Other embodiments of the processor are contemplated, such as digital signal processors (DSPs), which are particularly useful for processing and implementing algorithms related to digital signals such as voice data and communication signals, and microcontrollers, which are useful in consumer applications such as printers and copy machines.
"The processor 100 operates by executing instructions on data values stored in memory. Examples of instructions that operate on data values are additions, subtractions, logical conjunctions (ANDs), logical disjunctions (ORs), and shifts and rotations of binary numbers. Processor 100 may also be capable of performing other instructions, such as moving and copying data values from one memory location to another. Modern processors are capable of performing many millions of these instructions per second, the collection of which, for instance, causes a GPU to produce images for display on a computer screen or enables the use of a word processing program on a desktop computer.
"The processor 100 includes execution units 110, which are the computational cores of the processor and are responsible for executing the instructions or commands issued to the processor 100. Execution units 110 operate on data values stored in system memory and produce results that may be written back to memory thereafter.
"Processor 100 is equipped with a load and store unit 120 that is coupled to the execution units 110 and is responsible for loading and storing the data operated on by the execution units 110. The load and store unit 120 brings memory data to the execution units 110 for processing and later stores the results of these operations back to memory. Processor 100 is also equipped with a Level 1 (L1) data cache 130, which stores data for access by the processor 100. The L1 data cache 130 is advantageous because of the small delay the load and store unit 120 experiences in accessing its data.
"In most processors it is costly (in terms of silicon design) to store all the data the processor operates on in easily-accessible L1 caches. Processors usually have a hierarchy of memory storage locations. Small but fast storage locations are expensive to implement but offer fast memory access, while large but slower storage locations are cheaper to implement, but offer slower memory access. A processor has to wait to obtain data from these large storage locations and therefore its performance is slowed.
"FIG. 2 shows a memory hierarchy of a processor, such as processor 100. Registers represent the fastest memory to access; however, in some instances they may only provide 100 Bytes of register space. Hard drives are the slowest in terms of memory access speed, but are both cheap to implement and offer very large storage space, e.g., 1 TeraByte (TB) or more. Level 1 (L1) through Level 3 (L3) caches range from several kilobytes (kBs) in size to 16 megabytes (MBs) or more, depending on the computer system.
"Data stored in memory is organized and indexed by memory addresses. For instance, addressing 4 kB of data requires 4*1024=4096 distinct memory addresses, where each memory address holds a Byte (eight bits, or an octet) of data. Therefore, to completely reference the memory addresses of a 4 kB memory, a minimum of 12 bits is required. Processors also use a system of paging in addressing memory locations, where memory is sectioned into pages of memory addresses. For instance, a processor may use a 4 kB page system in sectioning memory and therefore may be able to point to a memory location within a page using 12 bits. On the other hand, a page may comprise 1 MegaByte (MB) of data, in which case 20 bits are required to point to each of the 1048576 (1024*1024) distinct addresses within the page.
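The byte-index arithmetic above can be sketched briefly. This is a minimal illustration (the helper name is an assumption, not from the patent): for a power-of-two page size of N bytes, the byte index needs log2(N) bits.

```python
def index_bits(page_size_bytes: int) -> int:
    """Bits needed to address every byte within a page (illustrative helper)."""
    # A page of N bytes has N distinct byte addresses; for a power-of-two
    # size, (N - 1).bit_length() equals log2(N) exactly.
    return (page_size_bytes - 1).bit_length()

# The two page sizes discussed in the text:
assert index_bits(4 * 1024) == 12      # 4 kB page -> 12-bit byte index
assert index_bits(1024 * 1024) == 20   # 1 MB page -> 20-bit byte index
```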
"Further, many pages may be indexed in order to completely cover the memory locations that are accessible to the processor. For instance, if the processor memory hierarchy includes 256 GigaBytes (GB) of data and a 4 kB paging system is used, then the memory system comprises 256*1024*256, which is 67108864, pages. Therefore, a further 8+10+8=26 bits are required to identify each of the 67108864 pages in the memory system. FIG. 3 graphically illustrates this example, where a 38-bit memory address comprises a 26-bit page address and a 12-bit Byte index within the page. This memory address of FIG. 3 is hereinafter referred to as a physical address (PA), to be distinguished from a linear address (LA) or a virtual address (VA). As will be described herein, a PA format is an external format, whereas a LA format is an internal processor address format.
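The page-count arithmetic and the 26-bit/12-bit address split of FIG. 3 can be verified with a short calculation. This is only an illustration of the numbers in the text, not code from the patent; the sample address value is arbitrary.

```python
KB = 1024
GB = 1024 ** 3

total_memory = 256 * GB   # example memory size from the text
page_size = 4 * KB        # 4 kB paging system

num_pages = total_memory // page_size
assert num_pages == 67108864          # 2**26 pages, as stated

page_bits = (num_pages - 1).bit_length()    # 26 bits for the page address
offset_bits = (page_size - 1).bit_length()  # 12 bits for the byte index
assert page_bits + offset_bits == 38        # the 38-bit PA of FIG. 3

# Splitting a 38-bit physical address into (page number, byte index):
pa = 0x123456789A                     # arbitrary address below 2**38
page_number = pa >> offset_bits
byte_index = pa & ((1 << offset_bits) - 1)
assert pa == (page_number << offset_bits) | byte_index
```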
"It is desirable to have a method and an apparatus that efficiently translates LAs to PAs. It is also desirable to have a memory address translation device, such as a Translation Look-aside Buffer (TLB), that translates LAs to PAs in a power-efficient way."
Supplementing the background information on this patent, VerticalNews reporters also obtained the inventors' summary information for this patent: "Embodiments of a method and apparatus for reducing power consumption in a memory address translation device, such as a Translation Look-aside Buffer (TLB), are provided. In a method and apparatus, reading a physical address (PA) corresponding to a received linear address (LA) is suppressed if a previously translated LA is the same as the received LA. Additionally, the PA corresponding to the previously translated LA is maintained as an output if the previously translated LA is the same as the received LA.
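The read-suppression idea can be modeled behaviorally. The sketch below is an assumption-laden software analogue, not the patented hardware: a dictionary stands in for the SRAM translation array, an instance attribute stands in for the flip-flop holding the previous LA, and all names (`PowerAwareTLB`, `translate`, etc.) are illustrative.

```python
class PowerAwareTLB:
    """Behavioral sketch of suppressing redundant TLB reads (illustrative only)."""

    def __init__(self, sram):
        self.sram = sram        # dict of LA -> PA, standing in for the SRAM array
        self.prev_la = None     # models the flip-flop holding the previous LA
        self.prev_pa = None     # held output: PA of the previous translation
        self.sram_reads = 0     # counts actual (power-consuming) SRAM reads

    def translate(self, la):
        if la == self.prev_la:  # comparator: received LA equals previous LA?
            return self.prev_pa # suppress the SRAM read; keep the held output
        self.sram_reads += 1
        pa = self.sram[la]      # normal SRAM lookup
        self.prev_la, self.prev_pa = la, pa
        return pa

tlb = PowerAwareTLB({0x1000: 0x8000, 0x2000: 0x9000})
assert tlb.translate(0x1000) == 0x8000
assert tlb.translate(0x1000) == 0x8000  # repeated LA: read suppressed
assert tlb.sram_reads == 1              # only one SRAM read was performed
```

Repeated lookups of the same address are common (e.g., sequential accesses within one page), which is why holding the last translation can save a meaningful fraction of SRAM reads.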
"In some embodiments, the received LA is compared with a previously translated LA by passing the previously translated LA through a flip-flop and equating the previously translated LA with the received LA to determine if they are the same. In other embodiments, a Static Random Access Memory (SRAM) holds PA translations and PA address translation is an output of the
For the URL and additional information on this patent, see: Kapil, Deepika; McIntyre,
Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC