Analysis of PS4's security and the state of hacking (cturt.github.io)
210 points by jsnell on Aug 20, 2015 | 47 comments



FreeBSD was vulnerable to BadIRET. Oddly, they never seem to have published an advisory, but the fix was here:

https://reviews.freebsd.org/rS275833

I thought the handling of that issue was very strange. I notified CERT, who apparently coordinated with FreeBSD, but no one ever really responded. The closest thing to an advisory that I can find at all is my post:

http://www.openwall.com/lists/oss-security/2015/07/09/1

which contains a PoC that crashes the system. It's almost certainly possible to turn it into privilege escalation, though.

Go figure. I suspect that the security community just doesn't pay as much attention to FreeBSD as they do to Linux.


Probably has a lot to do with timing. At the time of your report, the Security Officer was DES. Life happened, and he wasn't able to keep up or respond to events as quickly.

As of June, Xin Li (previously Deputy Security Officer) has taken over as security officer and things have been handled very promptly and succinctly.

https://lists.freebsd.org/pipermail/freebsd-announce/2015-Ju...

edit: I'm passing word to FreeBSD security officers to see if they can review this


amluto, thanks for commenting about this. The fix is now getting pushed out:

https://www.freebsd.org/security/advisories/FreeBSD-SA-15:21...

In the future do not be afraid to bang a drum loudly or do whatever it takes to get people's attention. It's unfortunate this wasn't immediately acted upon, but the community is needed just as much as the core team to keep things on track.

Thanks again for airing it publicly; I know I'm glad my servers are patched now.


Or they want people to think FreeBSD is more secure?


Capsicum is part of FreeBSD itself so it doesn't have to be listed separately in the list of Open Source Software.

I'm also surprised anyone thought it was using ASLR -- it's a huge effort to get that completed and working sanely, which is being handled by the HardenedBSD folks. Their work didn't even exist when the PS4 was released.

I think it's possible for Sony to backport it and use it, but it seems unlikely they would do that at this stage.

I'd also like to point out that FreeBSD jails on the PS4 mean there are ~23 million units in the wild deploying that technology. It will take quite a while for Docker containers to catch up, haha :-)


The article says userspace ASLR is used, and indeed it presented a speed bump along the way.


I think it's unlikely they would fork from the upstream FreeBSD kernel that much without contributing ASLR back. They would end up having to repeat it all again for Playstation 5.

Stranger things have happened, though.


Lots of good detail here. Raise your hand if you knew the switch to AMD would greatly benefit hacking vs custom PPC chips. (raises hand) Now you know why I deployed security-focused stuff on PPC at one point. The economics of hacking alone work in your favor. :)

Personally, I think they should've tried to license Cavium's Octeon III processors: RISC (MIPS) ISA; 48 cores at 2.5GHz; many pre-made accelerator engines; huge I/O bandwidth. A royalty deal that made Cavium money for profit & continued R&D, while letting Sony have them cheap in the PS4, would've made for one badass gaming rig. Cavium might have had upgrades ready for the PS5, too, given all the improvements they made going from Octeon II to III.

Everyone went with AMD instead. So, we get the real benefits of x86 code reuse, low costs, and reduced production issues. However, we also get all the black-box risks in x86, a monoculture where an attack on AMD CPUs/firmware might break all game systems, the inherent inefficiencies of x86, and a lack of hardware differentiation (mainly accelerators) that developers could benefit from. Only time will tell if it was a good trade.


Sony got pretty burned by the SPU architecture on the PS3 by being a late follower to the Xbox. This meant that developers were porting from Xbox or PC(!), which was the Wrong Thing(TM).

You could really make the SPUs scream if you knew what you were doing, and you got the added benefit of better cache locality on other platforms. I think they went with a similar architecture this time to avoid the pain it brought to developers on their platform who weren't ready for it.

It's kind of a shame the PS3 didn't lead. Almost every team that went PS3->360->PC saw huge gains (due to needing to pack into 256kb blocks for the SPU), whereas everyone that went (PC)->360->PS3 was in a world of pain.


I saw this one coming. It's why I put accelerators in there. Yeah, it's best to keep the hardware mostly like whatever developers are used to using and compatible with existing code. Cell was so radically different from typical systems, even SIMD/MIMD CPUs, that it was a pain to work with even for cross-platform companies.

The Cavium model is more what I was thinking. They make the processors simple, fast, and on a good NOC. Then, add accelerators for whatever. I'd straight up ask game companies what code, algorithms, patterns, etc keep popping up in their games and could be accelerated. I'd accelerate some of that. However, I'd mostly focus on low-level stuff like disks and networking that most developers would prefer to ignore. Solid, high-performance, real-time implementations of all that with hardware acceleration of critical paths. Octeon already does that. That plus a dedicated I/O processor & asynchronous interrupts. That would let the CPU focus on gaming stuff while getting massive utilization.

Beats the hell out of "throw more cores and cache at PC architecture." Intel and AMD certainly dominate in general performance. Yet the number of people, tools, and dollars that go into that is mind-boggling. Of course, that my recommendation is the better model is obvious from the fact that Intel and AMD are taking it up themselves. They're also both doing well with their "semi-custom" businesses that do exactly that for clients.


Sure. You can do all of this. But you'd also be requiring Sony to put in the effort to write an optimized compiler, debugger, and toolchain to bring FreeBSD up on a MIPS platform. MIPS is dead.

You never talk about what an "accelerator" is in your case -- you say "stuff like disks and networking". Both of those already have dedicated hardware, and they're IO bound, not CPU bound, so an "accelerator" doesn't help much.

What you want for an accelerator is something to accelerate graphics and physics, perhaps a "Graphics Processing Unit", and something realtime to do audio processing, like a "Digital Signal Processor". Modern computers already have both of those accelerators built-in. Your funky MIPS architecture doesn't add anything but dev annoyance.


"...requiring Sony to put in the effort to write an optimized compiler, debugger, and toolchain effort to bring FreeBSD up on a MIPS platform. MIPS is dead."

You have to bring Linux and BSD, which already run on MIPS, up on MIPS? Gotta be hard. If it is, they have a new ARM chip (ThunderX) with similar specs. They had to go with what was already proven and more negotiable, though, so that was MIPS.

"You never talk about what an "accelerator" is in your case..."

I referenced Octeon III as an example. Had you Googled it, you would know exactly what I was talking about. Let me help you out:

http://www.cavium.com/OCTEON-III_CN7XXX.html

The prior models, with 16 cores + accelerators, were used mostly where line-rate processing was required, in applications like stream processing and networking: CPU and I/O intensive work. The low end did 24 GIPS peak and supported Interlaken (10+ Gbps) with dedicated hardware for compression, crypto, etc.

The new models go from 24-48 cores (120-240 GIPS peak), run at 2.4GHz, do 500Gbps max I/O, offer easy integration with application-specific accelerators (500 so far) directly on network-on-chip, and have mature toolchains for Linux & RTOS's. So, you could offload compression, search (eg pathfinding), graphics, crypto, physics... whatever... onto engines that handle it at hardware speed/efficiency while letting CPU focus on everything else.
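To make the offload model concrete, the usual shape is a command queue you feed from the CPU while it keeps running game logic. Everything below (the accel_* names, the ticket struct, the stubbed "engine") is hypothetical, just to show the pattern, not Octeon's actual SDK:

    /* Hypothetical offload pattern (not a real Octeon API): hand a job to a
       fixed-function engine, keep the CPU on game work, poll for completion.
       The "engine" is stubbed out so the sketch actually compiles and runs. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { int id; bool done; } accel_ticket;

    /* Stand-ins for what a platform SDK would provide. */
    static accel_ticket accel_submit_compress(const char *src) {
        printf("engine: compressing \"%s\"\n", src);
        return (accel_ticket){ .id = 1, .done = false };
    }

    static bool accel_poll(accel_ticket *t) {
        static int frames_left = 3;      /* pretend the job takes a few frames */
        if (frames_left-- > 0) return false;
        t->done = true;
        return true;
    }

    static void game_update_one_frame(void) { puts("cpu: running game logic"); }

    int main(void) {
        accel_ticket t = accel_submit_compress("savegame data");
        while (!accel_poll(&t))          /* CPU never blocks on the accelerator */
            game_update_one_frame();
        puts("engine: done, result ready");
        return 0;
    }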

"like a "Digital Signal Processor". Modern computers already have both of those accelerators built-in. Your funky MIPS architecture doesn't add anything but dev annoyance."

You must have never programmed a Digital Signal Processor. "Funky" and "dev annoyance" are very good descriptions of it. It's like a different world compared to regular programming, with no standardization. There were whole companies founded to build tools to solve that problem. OpenCL made nice strides, but it's still not regular programming. Much easier for devs to program in C/C++, use a good concurrency approach, call a library function (SW or HW accelerated) for specific hotspots, and hit compile.

Of course, if you find that very challenging, you can always handcode several models of DSP to save yourself time. ;)


> Now you know why I deployed security-focused stuff on PPC at one point. The economics of hacking alone work in your favor.

Perhaps true for a small-time, individual hacker. Over a decade ago, I recall being quite impressed by a professional pentest team whose tools included a lovely exploit authoring DSL embedded in a popular scripting language. That DSL allowed them to write "abstract" descriptions of specific software exploits, ala vs openssh vX.YpZ, then "render" (~~ compile) the exploit code automatically against any of their supported target architectures and/or OSes. (i.e. all of them.) Even though perfectly capable of it, they got tired of manually porting everything around.

This has also been a strong reminder for me that exploits are usually against the software, irrespective of the hardware architecture. It's easy for folks to get mixed up about that.

For something like hacking a PS4, the "obscure platform" logic might apply (*), but never assume it applies to any attacker who can afford (n.b.: not just build) a sophisticated attack platform.

(*) With the caveat that now it's a numbers game, and you're up against bored teens/tweens with way too much time to throw at the problem.


You're also up against hardware hackers who stand to make a buck selling prepackaged hardware exploits (AKA modchips). You'd be surprised how many resources these guys have available to them.


I wouldn't be. They're quite clever. And mod-chips are a physical attack. All systems are assumed compromised if the enemy has physical possession, especially with modifications. How well are they currently doing with remote, software attacks to fully compromise the gaming consoles?


Well, they don't really care about software attacks as much, since those don't make them as much money. It's also fairly difficult to pull off a remote hack, because everything leaving the console has to go through an IPSEC tunnel to pass cert (at least on the 360).

And you'd be surprised how secure you can get a game console these days from a hardware perspective. For instance, on the 360, all stacks and everything in the hypervisor are encrypted with a per-boot random key and hashed as they leave L2 for main memory. If the hash doesn't match on the way back, the system resets. You're very well protected from DMA attacks, particularly of the kind that's easily reproducible and able to be sold in mass.


The console security was good work. They wisely applied the tech being developed by academics and industry. There are a number of similar technologies designed to stop leaks, protect control flow, and so on with crypto in the SoC. I can dig up and post some if you'd like.


Yeah, I'd love to read that if it's not too much effort. : )


The first one that was easier than AEGIS was SP/Bastion:

http://palms.ee.princeton.edu/sp_bastion

SecureME's cloaking was interesting:

https://docs.google.com/file/d/0B1i_Zf52vJctMTA4YTI1MmUtNzdj...

HIDE - an infrastructure for efficiently protecting information leakage on the address bus http://www.cc.gatech.edu/people/home/santosh/papers/asplos20...

Using address independent seed encryption and bonsai Merkle Trees to make secure processors OS- and performance-friendly http://www.ece.ncsu.edu/arpers/Papers/micro07-brian.pdf

Embedded Software Security through Key-Based Control Flow Obfuscation http://engr.case.edu/bhunia_swarup/papers/C/C80.pdf

Memory encryption: Survey of Existing Techniques http://www.thayer.dartmouth.edu/tr/reports/tr13-001.pdf

ASIST - architecture support for instruction set randomization http://www.ics.forth.gr/_publications/papadog-asist-ccs.pdf

Hardware architectures for software security http://scholar.lib.vt.edu/theses/available/etd-10112006-2048...


I don't assume anything and actually made security DSLs myself at one point for defence. The threat model was widely deployed malware and some targeted attacks. The ISA knocked out the malware plus targeted attacks according to monitoring. The more clever attacks we saw (firmware I guess) were blocked by guards and strong app-level security. One thing I always got guards to do was modify the traffic patterns to make the system pretend to be a different type of system (eg Linux on x86). Even sophisticated attackers, unable to bypass or even see the guard, apparently couldn't get by because they were never smart enough to try an obscure OS + ISA combination.

That was a nice hold-up until I learned about separation kernels and split-trust architectures. That was a nice hold-up until recent work on modifying CPUs to increase security. I've got plenty of those at the concept stage that should do plenty, with a few having been implemented by others with varying Linux or FreeBSD support. Now, I'm working on verified specification and synthesis of hardware, plus mask and mixed-signal verification, to solve the last part of the problem.

On a side note, I told some people with simple IT needs to switch to PPC Macs after Apple switched to x86. They just hated all the hacking and malware issues. Set them up with portable software on PPC so they could leave at any time and take their data with them. Predicted software would get supported for years, some of the Internet would be usable, and replacement hardware would be cheap due to PPC Macs going on eBay. They're still using them today, and I got another laptop as an air-gapped machine-in-progress last year for $80. Gonna port the Tinfoil Chat transmit node to it if I get time. :)


"That DSL allowed them to write "abstract" descriptions of specific software exploits, ala vs openssh vX.YpZ, then "render" (~~ compile) the exploit code automatically against any of their supported target architectures and/or OSes. (i.e. all of them.)"

Fair enough, but was one of those targets PPC? MIPS? SPARC? Genuinely curious...


Absolutely all of those three (and x86, obviously) were supported. I forget if ARM was supported back then, but I can't imagine that it isn't now.


So, I'm kind of curious: it seems that the whole ROP thing depends on x86's CISC architecture in order to allow Turing-complete programming. Am I wrong in this understanding?


Oh, the attackers in academia have been clever for a long time. Check out this old gem:

Automatic Patch-based Exploit Generation is Possible http://bitblaze.cs.berkeley.edu/papers/apeg.pdf

Gave me a sly grin when I saw it years ago. And to think I thought I was clever because I always tried to compromise networks via whatever they trusted for security. These jokers straight-up turned patches into weapons. I realized at that point, along with all the hacks in media, that computer security was fundamentally (censored).

Started focusing on clean-slate approaches where possible with obfuscation, diversity, and strong interface protection everywhere else.


Yes. The techniques are possible (though with slightly different tactics) on other architectures. I'm aware of ROP being used for exploits on x86, ARM, MIPS and PPC.


Games struggled for a long time with scaling beyond one core. Even today, games scale to only about 2 or 4 cores. Just look at the PS3's Cell: it took a long time for games to really utilize the PS3's full potential. That's why I don't think that dropping 48 cores on game devs would have been a good business decision.


You can just drop 48 cores on them, or you can drop 48 cores plus tools and patterns to use them well. There have been techniques for linear, reliable scaling in the academic literature and even in industry (eg NonStop, MPPs) for years. There were also toolkits that helped automate that process using C-like languages.

That said, just because they can't use it right off the bat doesn't mean they won't figure it out over time. We really didn't get to see them try. The Xbox used a non-power-of-2, lame core count, while Cell didn't use traditional cores at all. Apples to oranges a bit, although the 360 is a realistic comparison if the multicore tooling sucks.

So, what's Microsoft and Sony's recent approach? Drop a bunch of cores on them. :)


The GPU in the PS4 has 18 cores; in the Xbox One, 12. They work quite well. Not exactly powers of two, but neither is 48, to be honest.

The problem with Cell was not that it had multiple cores. It was that it had to do a lot of graphics tasks to take the load off the weak GPU. And doing graphics on a CPU is quite hard. Doing so in concert with another GPU is doubly so. On the other hand, doing CPU work on GPU is much easier.


GPU and CPU cores are very different in their architecture and in how they process data.

You can think of GPU cores as scheduling a series of "vector" instructions over a large chunk of data. They are great at internally parallelizing this process because the flow is simple: you assume there will be little synchronization and little branching. That's why you need such fast memory on GPUs, to keep streaming data at instruction speed.

CPU cores are a lot more general purpose; that's where you're generally building the business logic, and the workload is a lot more branch oriented.

So if you have a problem that's trivially parallelizable without (much) branching logic, then throwing more GPU cores at it is easy. Problems like this are numerical in nature: large matrix calculations (math or 3D meshes), streaming data (hashing).

Hence why it's easier to keep adding GPU cores: it means games can process more geometry data in a frame. Adding more general purpose CPU cores is harder, because the complexity of the business logic goes up as you split up the work and have to worry about ordering/synchronization.
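A minimal illustration of that split, in plain C (names are purely illustrative): the first loop is the kind of work that maps onto wide GPU-style hardware, the second is the branchy, pointer-chasing work a latency-optimized CPU core is built for.

    #include <stddef.h>

    /* GPU-friendly: the same operation applied independently to every element,
       no branches, no synchronization between iterations. */
    void scale_add(float *out, const float *a, const float *b, float k, size_t n) {
        for (size_t i = 0; i < n; i++)
            out[i] = k * a[i] + b[i];   /* each i could be its own "thread" */
    }

    /* CPU-friendly: pointer chasing with data-dependent control flow.
       Each step depends on the previous one, so it doesn't split up easily. */
    struct node { int value; struct node *next; };

    int sum_list(const struct node *head) {
        int total = 0;
        for (const struct node *p = head; p != NULL; p = p->next)
            total += p->value;
        return total;
    }

    int main(void) {
        float a[4] = {1, 2, 3, 4}, b[4] = {4, 3, 2, 1}, out[4];
        scale_add(out, a, b, 2.0f, 4);

        struct node n2 = { 2, NULL }, n1 = { 1, &n2 };
        return (int)out[0] + sum_list(&n1);   /* just to use the results */
    }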


There is not much difference in how GCN and a generic CPU process data; I can't speak for every GPU out there, though. The only significant difference is that each GCN core can run up to 40 threads, while a CPU at this power level runs at most two (Jaguar in the 8th gen consoles is one thread per core). Having so many threads simplifies parallel programming a great deal, because instead of splitting each task into several chunks, where each chunk still has multiple items that need to be looped over, you can just spin up a thread for each item and get rid of the loops. Thread switching is zero overhead and creation/destruction takes a few cycles. These are not Windows/pthread software threads.

Of course, you don't have to run the same code in every thread. If you can figure out hundreds of different tasks to do simultaneously in a game, you can put each one of them in a thread too.

This is also why GPUs generally use very slow memory, contrary to your belief. With so many threads, latency does not matter that much, since a thread that stalls on a memory access is preempted for one that already has its data available.


Current gen GPUs use GDDR5, compared to the DDR4 in only a handful of new Intel chips that started shipping in the last few weeks. The GDDR5 chips run at 750MHz, and DDR4-2133, as supported by the fastest shipping Intel CPU, runs at 266MHz. That is an effective transfer rate of 48 GB/s vs 17 GB/s for the DDR4.

Current GPUs effectively have the fastest off-core memory of any current devices. They need those transfer rates to keep all the stream processors running.
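Back-of-the-envelope on where those numbers come from (the 128-bit GDDR5 interface and the single 64-bit DDR4 channel are my assumptions, not figures from above):

    #include <stdio.h>

    int main(void) {
        /* GDDR5: 750 MHz command clock, quad data rate -> 3000 MT/s,
           assumed 128-bit (16-byte) interface. */
        double gddr5 = 750e6 * 4 * 16 / 1e9;   /* ~48 GB/s */

        /* DDR4-2133: 266 MHz base clock, 2133 MT/s effective,
           assumed single 64-bit (8-byte) channel. */
        double ddr4 = 2133e6 * 8 / 1e9;        /* ~17 GB/s */

        printf("GDDR5 ~%.0f GB/s, DDR4-2133 ~%.0f GB/s\n", gddr5, ddr4);
        return 0;
    }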


"Transfer rate" is not synonymous to speed. Latency is. GDDR5 latency is greater than any DDR3 memory lest DDR4. And HBM that is the new video memory is much slower than GDDR5 (even though it's physically GDDR5 the implementation s we have run it half the clock of normal GDDR5).

Bandwidth is great, though, but if bandwidth were speed you could also say that a container ship is faster than a supersonic jet.


Obviously, bandwidth is not the same as latency.

GPU card makers make the trade-off between bandwidth and latency in favor of bandwidth. When you're doing mostly branch-free processing in large chunks, that's the trade-off to make. All you need is a straightforward prefetcher, and you don't need to worry about latency.

That's not true for general purpose CPUs, which perform lots of branches that need to be predicted (so we can predict what to fetch). The data processed on CPUs tends to be different (structures vs. vectors), with lots of pointer chasing (vtables, linked lists, hash tables, trees). That requires lower latencies, since the access patterns are a lot more random.

The stated goal of HBM is taming the power consumption (and thus also heat) of GPU systems while keeping the same (or higher) bandwidth. The name HBM stands for high bandwidth memory.

And while HBM has a lower clock frequency compared to GDDR5 (about 1/4), it has a much wider bus: 1024 bits vs 32 bits for GDDR5. Per transfer it can send 32x the data over the bus. 32x / 4 = 8, so the transfer rate is 8 times bigger. The recent Radeon cards with HBM now have memory transfer speeds of 256GB/s vs the 48GB/s for GDDR5.

Again, HBM trades latency for bandwidth. It negates some of the latency hit from the 1/4 clock by putting the HBM memory in the same package as the GPU instead of out on the board.

I think you're conflating a few different arguments. GPU workloads are not latency sensitive, so in GPU land transfer speed (bandwidth) is speed.


I am not sure you are familiar with modern GPU architecture. Both AMD's and NVidia's GPUs have no problems with branches. They do not do prediction and prefetch because it's pretty pointless on a single-issue architecture. I believe the ISA docs are available to the general public; you could easily familiarize yourself with them. I am also quite familiar with latency and bandwidth, so the concept of negating one with another sounds very amateurish to me. If you could do that, then everyone would have switched to high bandwidth memory and negated all the latency :) Speed is still speed and bandwidth is still bandwidth.


Good point. Powers of 2 was a bit of a brain fart of mine. That adding more cores worked well was my point in recommending an Octeon-like design. Didn't know the PS3's graphics system sucked that bad. Trying to do a real GPU + a fake one would definitely be hard, esp if the CPU was hard to begin with.


The reality is they don't need perfect security. Just "enough" to keep things like piracy at modest levels. For example, if your new system allows you to trounce your competitor and obtain most of the marketshare, it is easily worth a modest increase in piracy.

Sony has done MIPS (PS2) before, and they have done PPC (PS3) before. They might change their mind again for the PS5, but it's not like they didn't even think of MIPS or PPC.


Oh I agree. I was just pointing out it would happen. Happens every time a company jumps on a mainstream platform unless what they had before was horrific. What you say is the exact model for how device-manufacturers look at security. Well, that plus I.P. protection where appropriate. They mostly use obfuscations for that.


>Raise your hand if you knew the switch to AMD would greatly benefit hacking vs custom, PPC chips

Eh, I'd argue the opposite. The consoles being boring PC hardware has basically made hacking them pretty unrewarding.


What consoles were you using? The closest thing to PC hardware was the Xboxes, which was intentional. The Playstations used MIPS-based setups with tricks like scratchpad memory, while the PS3 had the Cell processor. Had it been boring and PC-like, then porting to PS3 would've been a breeze.


I'm comparing to things like:

TI-83 - embarrassingly bad hardware for the year 2000+, but everyone had one and you got to learn z80 assembly. Very similar to programming for 80's home computers.

Gameboy Advance - really nice open source toolchain, fun hardware (classic framebufferless sprite / tile graphics, but very powerful), and it was somewhat inexpensive and portable (this was long before the iPhone).

Nintendo DS - by far my favorite system, it keeps the great toolchain of the GBA and still has the fun graphics hardware. Flash cartridges for this were cheap and plentiful, making development super easy. It also had primitive but very easy to use 3D graphics to combine with the traditional GBA hardware. It also had a touchscreen (when resistive was still cool), a wavetable synthesizer, wifi, and all sorts of other goodies to play with.

Wii - the last system I thought was interesting before I stopped messing with consoles. It was still very PC like but had the neat controllers, was cheap, and had weird fixed function graphics inherited from the Gamecube.


Well, sure, odd and limited systems can certainly be fun to learn and bring to a usable state. You can play the same game with the PC hardware: it's called the demo scene (e.g. kkkrieger). Usually a different game with that kind of power, though. The game is to do the coolest stuff you can with the hardware. You might squeeze out extra performance, modify the software architecture, leverage components in creative ways, and so on.

You should've had a lot of fun with the Cell processor if you like hackery. It blended several different models into one interesting piece of hardware that could be tweaked into emulating them all and doing some side by side. More interesting (and useful!) than any of the above systems. That's why supercomputer people put a cluster of them to good use. :)


FreeBSD's default x86-64 calling convention doesn't pass arguments in registers?


Presumably the function calling convention does, but not syscalls.


Not quite; the author is linking to the i386 documents, where there is indeed a difference between syscall and C calling conventions on some operating systems. On x86_64, everyone follows the SysV ABI and uses registers for both purposes.
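For the curious, a tiny sketch of what that looks like in practice. I'm assuming FreeBSD's syscall numbering here (write is 4); the register usage is the same idea on Linux, only the numbers differ.

    /* x86-64 SysV: C calls pass the first integer args in
       rdi, rsi, rdx, rcx, r8, r9. Syscalls use the same registers except
       the 4th arg goes in r10, the number goes in rax, and the kernel
       clobbers rcx/r11. */
    static long raw_write(int fd, const void *buf, unsigned long len) {
        long ret;
        __asm__ volatile (
            "syscall"
            : "=a"(ret)
            : "a"(4 /* SYS_write, FreeBSD numbering */),
              "D"(fd), "S"(buf), "d"(len)
            : "rcx", "r11", "memory");
        return ret;
    }

    int main(void) {
        return raw_write(1, "hi\n", 3) == 3 ? 0 : 1;
    }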


The section on ROP is fascinating. Has anyone done a ROP compiler?

Some program that takes a binary + C code -> address list?

If you think of the gadgets as being like assembler instructions, it seems like it'd be possible, though tricky to do.
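Yes, there's tooling along those lines (ROPgadget, for example, handles the gadget-discovery half; compiling arbitrary C into a chain automatically is the hard part). As a toy sketch, the first stage, scanning an image for ret-terminated byte windows, is roughly this much code:

    /* Toy gadget scanner: dump the byte sequences that precede a near-ret
       (0xC3) in a raw binary image. A real ROP toolchain would disassemble
       these windows and then solve for a chain; this only does the first step. */
    #include <stdio.h>
    #include <stdlib.h>

    #define WINDOW 5  /* how many bytes before the ret to show */

    int main(int argc, char **argv) {
        if (argc != 2) { fprintf(stderr, "usage: %s <binary>\n", argv[0]); return 1; }

        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }

        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        rewind(f);

        unsigned char *buf = malloc(size);
        if (!buf || fread(buf, 1, size, f) != (size_t)size) { fclose(f); return 1; }
        fclose(f);

        for (long i = WINDOW; i < size; i++) {
            if (buf[i] != 0xC3)   /* x86 near return */
                continue;
            printf("%08lx: ", i - WINDOW);
            for (int j = WINDOW; j >= 0; j--)
                printf("%02x ", buf[i - j]);
            printf("\n");
        }

        free(buf);
        return 0;
    }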


When was this published?


Originally on 2015-06-29, updated frequently and as recently as three days ago.

Commits: https://github.com/CTurt/cturt.github.io/commits/master/ps4....

Blame: https://github.com/CTurt/cturt.github.io/blame/master/ps4.ht...



