Nvidia GPU – the new hacker tool

Nvidia is justifiably proud of how its GPUs can accelerate scientific applications. People have been able to speed up everything from ray tracing to computational chemistry by 10x to 50x. Unfortunately, tools for good can also be used for evil. In this case, that means accelerating brute-force attacks on Wi-Fi network encryption: Turbo-charged wireless hacks threaten networks.

Russian firm ElcomSoft has applied GPU acceleration technology to its password recovery tool to allow PCs or servers running supported NVIDIA video cards to break Wi-Fi encryption up to 100 times faster than is possible by using conventional microprocessors.
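As a rough illustration of why this workload maps so well onto a GPU: WPA pre-shared keys are derived with PBKDF2-HMAC-SHA1 (4,096 iterations, salted with the SSID), and every candidate passphrase can be tested independently of all the others. A minimal sketch in Python (the SSID, passphrase and word list here are made up):

```python
import hashlib

def wpa_psk(passphrase: str, ssid: str) -> bytes:
    # WPA-PSK derives the 256-bit pairwise master key with
    # PBKDF2-HMAC-SHA1, 4096 iterations, salted with the SSID.
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode(),
                               ssid.encode(), 4096, dklen=32)

def crack(target_pmk: bytes, ssid: str, candidates):
    # Each candidate is hashed independently of the others, so this
    # loop is embarrassingly parallel -- exactly the shape of work a
    # GPU's thousands of cores can spread out for big speedups.
    for pw in candidates:
        if wpa_psk(pw, ssid) == target_pmk:
            return pw
    return None

target = wpa_psk("hunter2", "HomeNet")
print(crack(target, "HomeNet", ["password", "letmein", "hunter2"]))  # hunter2
```

The 4,096 PBKDF2 iterations were chosen to slow attackers down; a GPU simply runs thousands of these independent derivations at once.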

Customizing ARM for iPhone

New iPhone Chip Will Cost an ARM and a Missile

Wei-han Lien, the senior manager of Apple’s chip team, [says on LinkedIn] he’s busy at work crafting an ARM processor for the next-generation iPhone.

PA Semi had assembled an all-star cast of chip engineers, including Lien, and Apple confirmed that it bought the company for that talent. In a June interview with The Times’ John Markoff, Apple chief executive Steve Jobs went one step further, saying the PA Semi team would work on designing brand-new processors for future iPhones and iPods. The only question was which kind of processors. […]

By developing its own ARM variant, Apple could create a processor that meets the specific needs of the iPhone and iPod, building support for functions such as the touch screen or scroll wheel into silicon and possibly savings on costs by reducing the number of processors needed in each device. […]

Petaflops from IBM and Sun

New IBM supercomputer achieves petaflop

IBM has devised a new Blue Gene supercomputer–the Blue Gene/P–that will be capable of processing more than 3 quadrillion operations a second, or 3 petaflops, a possible record. Blue Gene/P is designed to continuously operate at more than 1 petaflop in real-world situations.

Blue Gene/P marks a significant milestone in computing. Last November, the Blue Gene/L was ranked as the most powerful computer on the planet: it topped out at 280 teraflops, or 280 trillion operations a second during continuous operation.

The chip inside Blue Gene/P consists of four PowerPC 450 cores running at 850MHz each. A 2×2-foot circuit board containing 32 of the Blue Gene/P chips can churn out 435 billion operations a second. Thirty-two of these boards can be stuffed into a 6-foot-high rack.

The chip inside the Blue Gene/L contained two PowerPC cores running at 700MHz.

The 1-petaflop Blue Gene/P comes with 294,912 processors and takes up 72 racks in all. Hitting 3 petaflops takes an 884,736-processor, 216-rack cluster, according to IBM. The chips and other components are linked together in a high-speed optical network.
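Those figures hang together if you assume each PowerPC 450 core retires four floating-point operations per cycle (two fused multiply-adds on dual FPUs); the article doesn't state that, so treat it as an assumption. A quick sanity check:

```python
CORES_PER_CHIP = 4
CLOCK_HZ = 850e6
FLOPS_PER_CYCLE = 4  # assumed: dual FPUs, one fused multiply-add each per cycle

chip_flops = CORES_PER_CHIP * CLOCK_HZ * FLOPS_PER_CYCLE  # 13.6 GF per chip
board_flops = 32 * chip_flops                             # ~435 GF, as quoted
rack_flops = 32 * board_flops                             # ~13.9 TF per rack
print(f"board: {board_flops/1e9:.0f} GF, 72 racks: {72*rack_flops/1e15:.2f} PF")

# Processor counts match IBM's figures too:
print(72 * 32 * 32 * CORES_PER_CHIP)    # 294912 cores for 1 petaflop
print(216 * 32 * 32 * CORES_PER_CHIP)   # 884736 cores for 3 petaflops
```

Under that assumption the quoted 435 gigaops per board, 1 petaflop per 72 racks, and both core counts all line up.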

Sun seeking supercomputing glory

The TACC system will provide a peak performance of around 500 teraflops, or 500 trillion operations a second. A fully built-out Constellation system, with contemporary components, could hit a peak of 2 petaflops, or 2 quadrillion operations per second.

The linchpin in the system is the switch, the piece of hardware that conducts traffic between the servers, memory and data storage. Code-named Magnum, the switch comes with 3,456 ports, a larger-than-normal number that frees up data pathways inside these powerful computers.

“We are looking at a factor-of-three improvement over the current best system at an equal number of nodes,” said Andy Bechtolsheim, chief architect and senior vice president of the systems group at Sun. “We have been somewhat absent in the supercomputer market in the last few years.”

Graphene dreams

Researchers have developed a transistor from a graphene membrane, a “new class of carbon allotrope”. And no, I don’t know what that means either. But it sure sounds promising:

Researchers at the University of Manchester, working with a group at the Max Planck Institute in Germany, claim to have created transistors that are just one atom thick and less than 50 atoms wide from a new class of material.

The substance, dubbed graphene, is described as a two-dimensional material that exhibits exceptionally high crystal and electronic quality, and which the researchers claim has numerous potential applications in condensed matter physics and electronics.

The resulting transistor is way smaller than silicon transistors, requires less charge to control, and could be much faster. There’s only one small hitch.

They caution there is still some way to go to create a working chip from graphene single-electron transistors, with etching being a particular area for future work.

Professor Geim indicated graphene-based circuits would not come of age before 2025, and until then silicon-based devices would predominate.

Playing games with wave functions

On Tuesday, a Canadian company chose an unusual setting to announce a new computer system – the Computer History Museum in Mountain View (home to thousands of unsuccessful machines made by now-defunct companies). There D-Wave Systems demonstrated their 16-qubit quantum computing device, remotely located in Burnaby, BC. You can’t do much with 16 qubits, other than solve Sudoku puzzles. But D-Wave claims they can scale up their device to 1,000 qubits or more.

In most prototype quantum computing systems, researchers hit atoms with lasers or use other means to excite particles into fuzzy quantum states. But in a technique called adiabatic quantum computing, researchers cool metal circuits into a superconducting state in which electrons flow freely, resulting in qubits. They then slowly vary a magnetic field, which lets the qubits gradually adjust to each other, sort of like people huddling in the cold. In 2005 German researchers built a three-qubit adiabatic quantum computer.

D-Wave announced that it has constructed a 16-qubit version crafted from the superconducting element niobium. “What we’ve built is really a systems-level proof of concept,” says Geordie Rose, D-Wave’s co-founder and chief technology officer. “We want to get people’s imagination stimulated.”

Well, that and raise a bunch of venture funding too, of course.

D-Wave quantum computer - Scientific American

Apparently, D-Wave is specifically targeting NP-complete problems. My buddy Steve Leibson has some more technical details:

Briefly, D-Wave’s Orion solves such problems by holding all possible solutions in a superposed state in a 16-qubit register, arranged in a ring on the 5×5 mm chip. A qubit is a quantum storage element that can hold a 0 or 1 (like a digital bit) and an infinite number of intermediate states, all in simultaneous superposition. The qubit’s operation depends on the physics of quantum mechanics and, consequently, Orion operates at 4 mK (that’s 4 thousandths of a Kelvin above absolute zero).

Orion accepts queries phrased in the common and familiar SQL (structured query language). […] D-Wave’s Orion determines the answer to such problems by creating “graphs” of problem solutions, superimposing all such graphs onto Orion’s 16-qubit storage register, and then searching all answers in parallel to find the solution with the lowest energy, which is the right answer based on the graph constructions.
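To get a feel for what “searching all answers in parallel to find the solution with the lowest energy” means, here is a purely classical sketch: exhaustively scoring all 2^16 = 65,536 assignments of a 16-element ring, which is exactly the search a conventional machine has to do one candidate at a time. The coupling values are hypothetical, invented for illustration:

```python
import itertools

N = 16  # one classical bit standing in for each qubit in the ring
# Hypothetical ring couplings between neighbouring qubits (not D-Wave's):
J = [1, -1, 1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1]

def energy(bits):
    # Ising-style ring energy: qubit i interacts with qubit (i + 1) % N.
    spins = [1 if b else -1 for b in bits]
    return sum(J[i] * spins[i] * spins[(i + 1) % N] for i in range(N))

# A classical machine must grind through all 2**16 assignments;
# the adiabatic machine is claimed to settle into the minimum directly.
best = min(itertools.product([0, 1], repeat=N), key=energy)
print(energy(best))
```

The catch, of course, is that 2^16 is trivial classically; the interesting question is whether the quantum approach still wins when the search space is 2^1000.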

It’s an impressive technical achievement, and I wish them good luck scaling up their machine. But to me this just seems like an expensive version of the old Spaghetti Computer.

Streaming video

Our friends at Stream Processors just announced their new DSP architecture at ISSCC this week. Good luck to them. (SPI was started by my old boss, Stanford Professor Bill Dally).

They claim their processor can perform an order of magnitude better than conventional DSPs. According to SPI, one of their chips can encode H.264 video at 1080p in real-time.

SPI has been sampling the initial Storm-1 product family members since late 2006, and executives say the startup has been engaged with a customer for nearly a year. The devices target such high-performance signal-processing applications as video and image processing. They are the eight-lane SP8-G80, which can execute 80 giga-operations per second, and the 16-lane SP16-G160, which can execute 160 Gops, SPI said. The devices are said to deliver a more than tenfold cost/performance advantage over conventional DSPs.

The executives said the SPI architecture does not include hardware caches, which can dominate the silicon area of traditional DSPs. Instead, an SPI device relies on lane register files to store I/O streams for each of its multiple lanes. Maintaining data locality enables the architecture to maximize both efficiency and bandwidth.

SPI SP16-G160
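To make the “lane register file” idea concrete, here is a toy model (entirely my own illustration, not SPI’s programming model): each lane stages its slice of the input stream into local storage, runs the kernel on it without touching any shared cache, and writes its results back into stream order.

```python
LANES = 8  # modelling the eight-lane SP8-G80

def stream_kernel(samples):
    # Illustrative only: each lane gets a strided slice of the input
    # stream staged into its own "lane register file" (a local list
    # here), computes on it locally, then results are merged back.
    lane_outputs = []
    for lane in range(LANES):
        local = samples[lane::LANES]                # DMA-style staging
        lane_outputs.append([x * x for x in local])  # per-lane compute
    merged = [0] * len(samples)
    for lane, chunk in enumerate(lane_outputs):
        merged[lane::LANES] = chunk                 # interleave results
    return merged

print(stream_kernel(list(range(8))))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Keeping each lane’s working set in local registers instead of a shared cache is the data-locality point the excerpt makes: no cache tags, no coherence traffic, just predictable streaming bandwidth.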

Intel Teraflop chip

Intel Prototype May Herald a New Age of Processing

The Teraflop chip, which consumes just 62 watts at teraflop speeds and which is air-cooled, contains an internal data packet router in each processor tile. It is able to move data among tiles in as little as 1.25 nanoseconds, making it possible to transfer 80 billion bytes a second among the internal cores.

Intel’s teraflops chip uses mesh architecture to emulate mainframe

Intel fabricated its 80-core Teraflop Research Chip in its Ireland manufacturing facility, using a state-of-the-art 65-nanometer process. Each core houses two single-cycle floating-point units, which were first described in another ISSCC paper presented two years ago. The 80 cores are arranged in a 10 x 8 two-dimensional mesh network, with each core housing a router with five I/Os–four of its paths going to adjacent processors and one going out vertically to an SRAM chip stacked 3-D style above them.

“Each of our cores measured 3 mm², including its two independent 32-bit floating-point processors with single-cycle instruction execution,” said Jerry Bautista, director in Intel’s Tera-Scale research program. “A separate 2 Mbytes of SRAM for each core will be mounted on a second chip vertically above the Teraflop Research Chip, with one of the ports in the five-port router communicating with it vertically.”
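The five-port router in each tile is what makes that 10 x 8 mesh work: four ports go to the neighbouring tiles and the fifth goes vertically to the stacked SRAM. Here is a toy sketch of dimension-ordered (X-then-Y) routing on such a mesh; the routing policy is my assumption for illustration, not something Intel states:

```python
WIDTH, HEIGHT = 10, 8  # the 80-tile mesh described in the article

def route(src, dst):
    # Dimension-ordered routing: walk the X coordinate to its target
    # first, then the Y coordinate, returning every tile crossed.
    (sx, sy), (dx, dy) = src, dst
    path = [(sx, sy)]
    x, y = sx, sy
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

hops = len(route((0, 0), (WIDTH - 1, HEIGHT - 1))) - 1
print(hops)  # 16: worst case is (10 - 1) + (8 - 1) hops, corner to corner
```

At 1.25 ns per tile-to-tile transfer, even that worst-case corner-to-corner trip stays in the tens of nanoseconds.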

A cent per MIPS?

In The Rise of “Freeconomics”, Chris Anderson at The Long Tail claims that we have recently passed a milestone: 20,000 MIPS of processing power for $200, or a penny-per-MIPS. He goes on to argue that when technology becomes cheap enough to be effectively free, it radically changes how we use those resources.

Of course, the cheapest computer still costs real money. And even though that 20 GIPS claim is a bit suspect, it’s clear that processing power has increased dramatically in the past few decades. Alec Saunders tries to give some context in When MIPS are free:

  • In 1977, Digital Equipment’s VAX-11/780 was a 1 MIPS minicomputer, and the Cray-1 supercomputer delivered blindingly fast execution at 150 MIPS.
  • By 1982, 5 years later, a 6 MHz 286 had about the same equivalent processing power as the VAX.
  • Sometime in the mid-1990s, Cray’s benchmark was finally surpassed by PowerPC processors, as PowerMacs emerged benchmarking at 150 to 300 MIPS.
  • A 1999 era Pentium III/500 delivered 800 MIPS of processing power.
  • A year later, in 2000, the PlayStation 2 pumped out an astounding 6000 MIPS.
  • My 2002 vintage Athlon XP clocks in at 4200 MIPS.
  • And today, for about $200, you can buy a 20,000 MIPS processor.
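Taking the list at face value, you can put a number on that growth. Assuming the $200, 20,000-MIPS processor is a 2007 part (roughly thirty years after the VAX; the exact year is my assumption), the implied doubling time comes out near two years:

```python
import math

# Growth implied by the list above: VAX-11/780 at 1 MIPS in 1977
# versus a ~$200, 20,000-MIPS processor thirty years later.
years = 2007 - 1977            # assumed endpoint for "today"
doublings = math.log2(20000 / 1)
print(f"{doublings:.1f} doublings, one every {years / doublings:.1f} years")

# And the penny-per-MIPS milestone itself:
print(f"${200 / 20000:.2f} per MIPS")
```

A doubling roughly every two years is Moore’s-law territory, which is about what you would expect for thirty years of commodity silicon.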

Sony is the loss leader king

“With the PlayStation 3, you are getting the performance of a supercomputer at the price of an entry-level PC.”

[iSuppli estimates] the combined materials and manufacturing cost of the PS3 at $805.85 for the model equipped with a 20GB drive and $840.35 for the 60GB version (not including additional costs for stuff like the controller, cables and packaging). That means Sony is losing more than $300 per unit on the lower end PS3 and about $240 on the top-end console. In contrast, iSuppli’s latest breakdown for the Xbox 360 shows Microsoft’s component costs coming in about $75 under the selling price.
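The per-unit losses follow directly from iSuppli’s bill-of-materials estimates and the list prices:

```python
# iSuppli's estimated materials + manufacturing cost vs. list price
# (excluding controller, cables and packaging).
models = {"20GB": (805.85, 499.00), "60GB": (840.35, 599.00)}

for name, (cost, price) in models.items():
    print(f"{name}: ${cost - price:.2f} loss per console")
```

That reproduces the quoted figures: a bit over $300 on the 20GB model and about $240 on the 60GB model, before Sony spends a cent on the extras in the box.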

Some of the key parts: dual graphics processing units from Nvidia and Toshiba; IBM’s Cell Broadband Engine, which serves as the PS3’s CPU and provides the equivalent computing power of eight individual microprocessors; and four Samsung 512Mbit DRAMs that employ high-speed memory interface technology from Rambus.

“To give an example of how cutting-edge the design is, in the entire history of the iSuppli Teardown Analysis team, we have seen only three semiconductors with 1,200 or more pins. The PlayStation 3 has three such semiconductors all by itself,” Rassweiler noted. “There is nothing cheap about the PlayStation 3 design. This is not an adapted PC design. Even beyond the major chips in the PlayStation 3, the other components seem to also be expensive and somewhat exotic.”

Sell at a loss and make it up in volume

PlayStation 3 on Rescue Mission

Sony will not disclose the total cost of creating the PlayStation 3, which has been in development for six years. But analysts say the sum reaches into the billions of dollars. Sony has revealed that it spent $2 billion on one major component alone, the high-speed Cell microprocessor, co-developed with I.B.M. and Toshiba. With such vast investments, analysts estimate Sony will have to sell 30 million to 50 million units just to break even. To be the sort of mega-hit that Sony needs, analysts say the new game console will at the minimum have to outdo its predecessor, PlayStation 2, which has sold 106 million units since 2000.

Sony is also counting on PlayStation 3 to promote other technologies that it has developed, the Blu-ray next-generation DVD drive as well as the Cell chip. These technologies give the new PlayStation more processing power and sharper graphics than rivals, but they also make it expensive: a model with a 60-gigabyte hard drive will list at $599 in the United States, and one with a 20-gigabyte drive will be $499. […] And even at those prices, most analysts say, Sony will be selling below production costs, and possibly losing hundreds of dollars a machine.