Gitlab servers are being exploited in DDoS attacks (therecord.media)
312 points by intunderflow on Nov 4, 2021 | 169 comments



It's somewhat refreshing that the underlying bug isn't in some C or C++ utility, but instead in a Perl program using eval: https://github.com/exiftool/exiftool/blob/11.70/lib/Image/Ex... Another instance of "avoid eval as much as possible" for languages that have it.


With dlopen() and libclang/libgccjit, you could argue C (on an OS that supports dynamic loading) has eval too ;)


You have system("gcc uploaded.c && ./a.out") too!


But that code doesn’t run in the memory space of the process.


Then compile it as a library, and dlopen() it after all. (I'm just guessing the gp's point was that JIT isn't really an essential modality.)
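To make that concrete: a toy Python sketch of the compile-then-dlopen() trick (ctypes.CDLL is a dlopen() wrapper), assuming a POSIX-style cc on PATH. Illustration only, obviously not something to point at uploads:

    import ctypes, os, subprocess, tempfile

    # "eval" for C: compile a source string into a shared library at runtime,
    # then dlopen() it into the current process via ctypes.
    c_src = "int add(int a, int b) { return a + b; }"

    with tempfile.TemporaryDirectory() as d:
        src, lib = os.path.join(d, "f.c"), os.path.join(d, "f.so")
        with open(src, "w") as fh:
            fh.write(c_src)
        subprocess.run(["cc", "-shared", "-fPIC", "-o", lib, src], check=True)
        f = ctypes.CDLL(lib)   # the dlopen() step
        print(f.add(2, 3))     # 5 -- code "evaled" into this process's memory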


Only in programs that invoke it on hacker controlled data.


But if you only invoke eval() on non-hacker-controlled data in a scripting language, it's probably fine too.


My personal career favorite use of eval was for an import system that "unrolled" the loop that went through the columns for each row, using eval. It was much faster, but obviously a huge security risk.

Today, with modern JIT compilers, it's probably not much faster...
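For illustration, a minimal Python sketch of that codegen idea (the column names and converters are hypothetical). The security problem is plain: anything user-controlled that reaches the generated source is code execution:

    # Generate an "unrolled" row decoder as source text, compile it once,
    # then call it per row -- no per-column loop on the hot path.
    columns = [("id", int), ("name", str), ("price", float)]

    src = "def decode(row):\n    return {\n"
    for i, (name, _) in enumerate(columns):
        src += f"        {name!r}: _conv{i}(row[{i}]),\n"
    src += "    }\n"

    namespace = {f"_conv{i}": conv for i, (_, conv) in enumerate(columns)}
    exec(compile(src, "<generated>", "exec"), namespace)
    decode = namespace["decode"]

    print(decode(["7", "widget", "9.99"]))  # {'id': 7, 'name': 'widget', 'price': 9.99}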


I saw a very similar thing with eval. There was an evaluation of a nested JSON object:

    x["a"]["b"]["c"] 
And the developer decided that this was best evaluated by eval. During the code review I asked why they were using eval, and it turned out they didn't know the value could be accessed directly: they were unsure whether JavaScript supported that syntax.
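For the record, dynamic key paths don't need eval either; a Python analog of the same fix:

    from functools import reduce

    x = {"a": {"b": {"c": 42}}}
    path = ["a", "b", "c"]   # imagine the path only arrives at runtime

    # The eval way (don't): eval('x' + ''.join(f'[{k!r}]' for k in path))
    print(reduce(lambda d, k: d[k], path, x))   # 42, no eval required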


How much longer did said dev continue working there?


Fire the dev, or educate the dev? Sure, eval like this is a sure sign of a lack of understanding, but I'd hope for something less vile than firing over it: a little more understanding that everyone learns something sometime.


It's gross negligence if the dev is only a JavaScript dev, but maybe understandable if it's just one of the three languages they use regularly. I probably wouldn't fire them, but I also can't imagine any of my devs doing that.


err, no comment


that dev is now your manager? /s


He is the dev


LOL, no to both!


I was once tasked with creating a new frontend on an old project that had an API endpoint return something like this:

  var array = ["foo", "bar"]
I was expecting xml or json (like the rest of the endpoints), but I realized that they just served this as text, and then eval'd it on the frontend...


This was how you used to do it before JSON.parse was built into the browser.


it was literally a pseudo "standard":

https://en.wikipedia.org/wiki/JSONP


Right, but for it to be JSONP, the response should be injected into a script tag, I believe.


And it should be wrapped in a function call (most times, you can choose which function is called by a query parameter).


> avoid eval as much as possible

"eval is evil", if you will.


Okay, I'll bite. I have known for a long time that eval is evil. Then, last year, I actually needed to evaluate a string (from a file). As the case was safe enough (input 100% controlled by me), I did not worry too much and just used eval. But what would be a safe way to evaluate things if you needed to do that in an unsafe environment? Say you would like to make a safe website that allows a user to type a Python code snippet that is then evaluated/executed server side. Is that even possible?


Sure, that's basically what services like AWS Lambda do. As a starting point, you'd want to run the code in a short-lived VM with little to no network access which is dedicated to just running untrusted code.


Yes, this is what I'm currently doing with a cloud-based website automated-testing system.

In my case, code supplied by the end user is compiled into a different language such that I think I can prevent intentionally-malicious activity.

Nonetheless, spinning up a VM to create an environment in which potentially untrustworthy code is executed before then destroying the VM seems the safest option.


This is a bad solution.

Lambda allows arbitrary network access and may allow access to your AWS resources.

If you have to do this, the best approach is to containerise it, use capabilities to enforce restrictions and run in a virtual machine as isolated as possible.

It's still not great though. Some languages (e.g. Java) have additional features that help with this.


Yes, I was simplifying — the important part is keeping the untrusted code off of a trusted network. If you want to do the legwork of carefully segregating things then of course network access can work.

I didn’t mention containers since they don’t provide strong isolation and some people misuse them as though they do. There’s no harm in using them as another layer of defense, but hardware virtualization provides much better security.


Why not just not trust the network or host at all? Put private connectivity inside the trusted code using an SDK. Then the trusted apps can only communicate with the devices/apps you define and nothing else. Untrusted code cannot access the trusted network because the network is actually inside the apps/system.


But Lambda as presented here isn't so much a way for you to sandbox code as a way for AWS to sandbox your code. If you need to execute untrusted code, you have to play the role of AWS in this scenario.


When people actually want just a subset of `eval` to permit some custom computation, the proper thing to do is to define that subset as a language and make an interpreter that will read only that language.

As for mostly-full-featured `eval`: iirc Perl itself has a facility to create restricted sub-interpreters and run scripts that can't do certain things. (Though I might be confusing Perl with PHP here.)


Almost all modern programming languages have parsers for that language, either as a standard library feature or as a package in the ecosystem. That means it's very easy to run a production-quality parser over an input string and then validate and interpret the resultant AST as you see fit (see the sketch at the end of this comment).

Besides that approach, simply rolling your own parser using a parser combinator library is straightforward. The word combinator makes it seem complicated, but it's actually the opposite: using parser combinators is a lot simpler than writing a parser the traditional way you might have learned in formal education.

Implementing a simple DSL, for example an event-filtering language, should cost a competent but inexperienced programmer maybe 1 or 2 weeks for a proof of concept, and then 3-6 more weeks to get it production ready, depending on the feature set of course.

Of course, that's more time than simply running the V8 interpreter over your input string, and maybe running the V8 interpreter over your input string is an awesome way to empower your (trusted) customers.
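To make the parse-validate-interpret idea from the first paragraph concrete, a toy Python sketch that accepts arithmetic and nothing else:

    import ast, operator

    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def safe_eval(expr):
        # Parse with the real Python parser, then walk the AST and refuse
        # anything outside the whitelist -- nothing is ever executed.
        def walk(node):
            if isinstance(node, ast.Expression):
                return walk(node.body)
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                return OPS[type(node.op)](walk(node.left), walk(node.right))
            if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
                return -walk(node.operand)
            raise ValueError("disallowed syntax")
        return walk(ast.parse(expr, mode="eval"))

    print(safe_eval("2 * (3 + 4) - -1"))   # 15
    # safe_eval("__import__('os')")        -> ValueError, never executed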


If the user is trusted but the environment isn't, the user can sign the string and you can validate it before execution. If the user is not trusted, you need to contain the execution as much as possible, e.g. a container without file or network access to your resources. If there is a need to access a specific resource, whitelist it.


Basically, just use this: https://github.com/judge0/judge0


Tcl's ability to use a restricted child interpreter, and the active file pattern, come to mind.


Perl does ship a sandbox module called Safe as a standard module, though I don't know how strong it is.


"The Safe module does not implement an effective sandbox for evaluating untrusted code with the perl interpreter."

https://perldoc.perl.org/Safe


Yeah, though in this case, whitelisting opcodes probably would have at least blocked qx and `` (both exec()). Avoiding the eval() altogether would be the right path, of course.


Of course, we are Google. /s

Have you tried writing your own interpreter?


Public facing?

Handles user input data??

Uses ‘eval’???


The line existed in the 2014 commit which migrated the repo to git. It wasn't designed in the current era of mass automated abuse and internet connected everything.


2014 was very much that era, I'd accept this excuse maybe for 2007


The line wasn't written in 2014, that's just the earliest that the history goes back. Presumably it's no newer than the early 2000s.


Considering that the initial commit contains 400k lines, I would say there is a very long history from before git was used.


The comment in the code slightly above says:

    this doesn't work in perl 5.6.2! grrrr
So this code is at least 20 years old, and probably pre-2000.


https://exiftool.org/ancient_history.html

    Sept. 26, 2008 - Version 7.44
           - Added read support for DjVu images
There were probably enough systems running perl 5.6.2 around 2008 to cause bug reports, or the code was migrated from an older piece of code and added to ExifTool.

It was not uncommon to manually ./configure, make, make install tarballs locally in those days, especially on systems like Slackware, so I can see it being possible to have old packages installed that were not automagically updated.


I remember before 2007 seeing people using eval and thinking "what on earth is wrong with you?"

I'd accept this for maybe pre-2000, but people should really know better.


That line is in a Perl metadata-cleaning library, not in GitLab itself. GitLab, who pride themselves on security and sell the gospel of improving it, willfully chose to use that library, which was obviously never designed for their use case.

I looked at the front page of that library. It says it cleans metadata from a huge number of file formats. Frankly, it looks more like something you would use on your own, known-safe files before sharing them online.

I'm not sure the tool is presented as a sanitizer for untrusted input. At least, it does not claim to be.

Why does GitLab need to clean metadata from DjVu files? Wtf are DjVu files?!


> Why does Gitlab need to clean metadata from DjVu files?

It doesn't. It needs to clean metadata from JPEG and TIFF files. They didn't properly check if the files were actually of those types, and Exiftool performed its own content type detection to end up in its DjVu code.

> Wtf are DjVu files?!

DjVu is basically an alternative to PDF.[1]

[1]: https://en.wikipedia.org/wiki/DjVu


I think they are like a very old version of JPEG 2000 that supports pan and zoom / tiling of large image files. It needed a license to create and one to display, iirc.


>"avoid eval"

We are already avoiding, or trying to avoid, way too many interesting and useful things, all for the sake of security, only to encounter new ways to be attacked. Instead of "avoid", how about actually organizing worldwide intolerance of, and a hunt for, those attackers?


I feel like this article could have been far more useful with the following points being explicitly mentioned, or at least summarized:

  - the problem appeared in GitLab 11.9.0
  - the problem seems to have been fixed in GitLab 13.8.8
  - the vulnerability uses ExifTool, so to exploit it, a user needs to be able to upload images
  - if an update is not (yet) possible, DjVu format file uploads can be blocked to avert this vulnerability
  - this vulnerability isn't relevant for GitLab instances that just have 1 user, or are not publicly accessible on the Internet
Now, I'm not saying that the above is entirely true, but after reading something like the above, one should be able to figure out how best to act:

  - if you have a public GitLab instance with open registrations, consider updating it immediately (with backups in place, of course)
  - if you have a private GitLab instance with many users in your own corporate network (that somehow isn't updated yet) - this is a good reason to put updating it into your agenda today, even if your users aren't necessarily hostile
  - if you have a private GitLab instance or one with registrations closed (e.g. you're the only user or people that you trust use it), mark this down and update whenever possible, however it's probably not necessary right this moment
Of course, I can't say the above with 100% confidence, because the article itself lacks this actionable information to aid in decision-making, so I'm left to piece things together on my own, because of which I could be wrong.

On an unrelated note, DjVu is a pretty interesting file format, though sadly I've only seen it used very sparsely, on some Russian forums for tractor manuals or something: https://en.wikipedia.org/wiki/DjVu


Caution: CVE-2021-22205 has since been found to be exploitable without authentication. No need to be "able to upload an image", unfortunately. Also no need to take the detour through a mirrored repo, as a sibling comment suggests; it works so long as the GitLab instance is accessible from the internet.


> exploitable without authentication

Can you provide more info? I skimmed the upstream ticket but didn't see how. Getting access to anything other than the login page on an accessible-but-private instance seems like a security bug regardless of this CVE.


Full disclosure, I wrote both of these.

The following describes the entire unauthenticated attack:

https://attackerkb.com/topics/D41jRUXCiJ/cve-2021-22205/rapi...

And, if you like that sort of thing, there is a metasploit module you can use to reproduce the unauthenticated attack:

https://github.com/rapid7/metasploit-framework/commit/6f4aa5...


> Specifically HandleFileUploads in uploads.go is called from a couple of PreAuthorizeHandler contexts allowing the HandleFileUploads logic, which calls down to rewrite.go and exif.go, to execute before authentication.

I'm no security guy, but this seems... incredibly dumb? Like even for perfectly secure code, the asymmetry in resource usage alone to submit an image vs. get them to dump a file, shell out to a scanner, and rewrite that file would probably be enough to seriously hurt smaller GitLab VMs.


Not only that, but it still works in exactly this way. I would have thought they would have fixed this "feature." But an unauthenticated user can still provide GitLab with tiff/jpeg images and have them reach ExifTool.


> the vulnerability uses ExifTool, so to exploit it, a user needs to be able to upload images... this vulnerability isn't relevant for GitLab instances that just have 1 user, or are not publicly accessible on the Internet

A lot of private GitLabs contain mirrors of public repositories or vendored copies of public libraries. Our GitLab is private but practically speaking there's probably several hundred people, most of whom we couldn't identify, that could "upload an image" to it.


> if an update is not (yet) possible, DjVu format file uploads can be blocked to avert this vulnerability

Note that the issue was enabled by GitLab not verifying the file format, i.e. that a .jpg is actually a JPEG and not, say, a DjVu file, before handing it over to ExifTool.

So a simple extension/MIME check won't cut it.
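If you do want a content-based precheck, here's a minimal Python sketch using magic bytes. A mitigation only: polyglot files (discussed further down the thread) can still fool content sniffing:

    # Check magic bytes, not extensions, before invoking any metadata tool.
    JPEG_MAGIC = b"\xff\xd8\xff"
    TIFF_MAGICS = (b"II*\x00", b"MM\x00*")

    def looks_like_jpeg_or_tiff(path):
        with open(path, "rb") as f:
            head = f.read(4)
        return head.startswith(JPEG_MAGIC) or head in TIFF_MAGICS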


Nor should they rely on their verification of the file format for anything other than a temporary mitigation of this specific bug! If the ExifTool / DjVu tooling itself isn't secured or sandboxed, this would be a future exploit waiting to happen.


Absolutely. To be honest I was a bit shocked they used an ExifTool binary that had anything but JPEG/TIFF support compiled in.


In Eastern Europe professors share scanned books/papers mostly in .djvu format.


Which afaik amounts in that case to a bunch of grayscale JPEGs. Dunno about other cases, but I've never seen a djvu with extra capabilities besides raster images of the pages.


Just checked my backups. Wildly popular Skanavi books [1] have copy-pasteable text, so they OCRed it.

https://www.amazon.com/Problems-Mathematics-education-instit...


Given the creators, I'd guess the format is also used in certain machine learning circles as well (or was meant to be at least :)).


How do you check if you have been compromised? I did apply the patch a few days after it was released, but I'm unsure if the system had already been compromised...


GitLab team member here.

Please see this post on the GitLab forum for details how you can determine if your instance has been compromised through the exploitation of CVE-2021-22205: https://forum.gitlab.com/t/cve-2021-22205-how-to-determine-i...


"..Bowling said he discovered a way to abuse how ExifTool handles uploads for DjVu file format used for scanned documents to gain control over the entire underlying GitLab web server"

Ah, the good old "File upload vulnerability". File uploads remain one of the hardest problems to solve when it comes to security.


> uploads for DjVu file format used for scanned documents to gain control over the entire underlying GitLab web server

A use case for WASM's nanoprocesses (capability-based security), perhaps? Of course, only until such time as someone exploits the WASM runtime itself.


I really like the idea of using WASM for application "plugins". You can pick your implementation language, and with the right runtime, it can be speedy and secure. Seems like a win.

The blockers, to me, right now are:

1) I mostly write Go, and the Go runtimes didn't seem to be particularly maintained when I last looked. So it just hasn't been worth it to me to do plugins. (I have done "provide your own code to a Go application" before -- "gojq" and "expr" got the job done. Less features than a full WASM runtime, but still pretty powerful.)

2) It's unclear to me which programming language APIs should target. You add a plugin system and you want developers to use it -- what are the popular languages that target WASM? Go and Tinygo look great here, but I have a feeling that the average programmer wants something a little more dynamic for their small plugins. AssemblyScript obviously wants to be the standard, but it's probably too different from Typescript to make it a no-brainer for Javascript developers. Some sort of Perl/Python/Ruby that compiles to WASM would be great, but I haven't seen much progress on that front.

As for running untrusted code in general, I don't think WASM needs to block you. gVisor simulates the linux kernel for containers, providing stronger isolation between them, and is designed to protect you from things like this. (I think the original usecase was running ffmpeg to transcode user-provided video files?) And you can always go full VM on these things. Or take the nuclear option -- carefully audit the untrusted code and build up that trust ;)


Retrofitting seccomp or a custom AppArmor policy is much lower-hanging fruit. The problem is folks tend to link these things directly into their web servers.


RLBox is a toolkit for doing just this: https://plsyssec.github.io/rlbox_sandboxing_api/sphinx/

Used in Firefox to sandbox some libraries, including image handling IIRC.


> File uploads remain one of the hardest problems to solve when it comes to security.

Why? It seems like they should have read/write but no execute. What goes wrong?


The file is not executable, but the parser executes it in its own context:

> When uploading image files, GitLab Workhorse passes any files with the extensions jpg|jpeg|tiff through to ExifTool to remove any non-whitelisted tags.

> An issue with this is that ExifTool will ignore the file extension and try to determine what the file is based on the content, allowing for any of the supported parsers to be hit instead of just JPEG and TIFF by just renaming the uploaded file.

> One of the supported formats is DjVu. When parsing the DjVu annotation, the tokens are evaled to "convert C escape sequences".

https://gitlab.com/gitlab-org/gitlab/-/issues/327121
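The sanitization ahead of that eval was reportedly defeated by a trailing newline: by default, "$" in a Perl regex also matches just before a final newline (a later comment in this thread makes the same point). Python's re behaves the same way, so a minimal illustration of the bypass class:

    import re

    # A whitelist anchored with "$" quietly accepts a trailing newline,
    # because "$" matches at the end OR just before a final "\n".
    whitelist = re.compile(r"^[a-z]+$")
    print(bool(whitelist.match("hello")))     # True, as intended
    print(bool(whitelist.match("hello\n")))   # True -- oops

    strict = re.compile(r"^[a-z]+\Z")         # absolute end; "\z" in Perl
    print(bool(strict.match("hello\n")))      # False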


eval $schmuh in a file format parser is... uhm, yeah. Not subtle.


> What goes wrong?

Usually what goes wrong is parsing or processing the files. It's hard to get programmers to safely validate a 20 byte email address string. It takes a lot more care to safely parse a 4,000,000 byte image file in a complex format.


That might be true if you treat all files as opaque blobs, but services like these do things like resizing images, extracting metadata, and converting to other formats.


That doesn't help when the uploaded file is only read/write but crafted in a way to exploit the code processing the file.


You essentially have a gateway into a very large chunk of code that was most likely not built with security in mind on the parsing side; on top of that, you are guaranteed write access to some filesystem.


Ideally you would sandbox this with:

1. No filesystem access

2. No network access

3. Input passed on stdin (or a pre-opened fd)

4. Output passed to stdout (or a pre-opened fd)

5. A hard timeout specified before the process is killed

Suddenly bam, dramatically safer.

If you're looking for a tool that can do all of this for you, check out firejail:

https://firejail.wordpress.com/

It has a ton of options, but you can do all of what I suggested and more, really easily.
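If you'd rather hand-roll points 3-5 yourself, here's a minimal Python sketch; points 1-2 still need firejail/bwrap, namespaces, or a VM, and the exiftool invocation at the end is only an assumed example:

    import resource, subprocess

    def run_untrusted(cmd, payload, timeout_s=10, mem_bytes=512 * 2**20):
        # Points 3-5: payload in on stdin, result out on stdout, hard kill on
        # timeout. The rlimits cap CPU time, memory, and open files in the child.
        def limits():
            resource.setrlimit(resource.RLIMIT_CPU, (timeout_s, timeout_s))
            resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
            resource.setrlimit(resource.RLIMIT_NOFILE, (16, 16))

        proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL, preexec_fn=limits)
        try:
            out, _ = proc.communicate(payload, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            raise
        return out

    # e.g. run_untrusted(["exiftool", "-all=", "-"], uploaded_bytes),
    # assuming exiftool accepts "-" for stdin/stdout.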


Note that firejail had a serious RCE in the way it parsed URLs, at least for emails. Personally, having read the codebase, I wouldn't put too much faith in its security, given how long it took for the metacharacter parsing problems to be discovered.

User-input-facing software should always be fuzzed to uncover such bugs.


File formats like PDF can contain scripts that must be interpreted in order to render the document.


That sounds like a pdf problem ^.^


PDF is an example, but far from the only one. Many file types have executable/scripted portions, or simply very complex formats with huge attack surfaces. JPEG XL predictors are Turing complete, given an unlimited image size...

Off the top of my head, a lot of old console exploits on the Wii and GameCube revolved around feeding malicious save files to games. Same idea: a missing bounds check or whatever when deserializing some field lets you shell the process. Parsing random files from users is just dangerous.


Some say the P in PDF stands for Problem.


Well, it's a media-file parser vulnerability for one, so the filesystem execute bit doesn't protect against it, and the memory NX bit just means you have to get to a page where it isn't set.


Is that because people use hackjob dependencies to handle it more often than not?


ExifTool is a hackjob? I think not.

But also, file uploads should be handled in a jail or sandbox of some type, and their analysis should never be allowed to make network calls.


It's a perl program that evaluates untrusted strings it finds in user files. What exactly is your standard for "is hackjob"? It appears to be a complete piece of shit.


It's clearly not complete shit, else it wouldn't be used by literally millions of people/systems. ExifTool is so far from shit that it was in fact chosen by a highly respected company with a very good team.

A hackjob usually has fewer deployments than my own stuff (which, outside of Windows 2000 components, is less than a few million).


Just because something is useful doesn't mean it's not a hackjob.

Just look at how PHP got so popular. Clearly it was useful and thus became popular. I think it's hard to argue that it wasn't a hackjob when it first started.


JavaScript is a perfect example.


Quite so. It was put together in a day, and we have to endure some of its quirks decades later.


By this reasoning "Baywatch" was a great TV show because lots of people saw it.


Baywatch was a great TV show. Maybe you don't like it. Millions all over the world do.


I have never seen it, but am aware of the low brow reputation. If you genuinely like it, could you tell me what you like about it? I recently saw some old episodes of Knight Rider and admit that the show was fun.


ExifTool is the tool for extracting, writing, and fixing metadata, bar none. I haven't seen anything that comes even remotely close to how good it is at handling just JPEG metadata (EXIF, XMP, IPTC, MakerNotes, and all the fine bugs every vendor has when creating these), let alone other formats too.

ExifTool is essentially for (image) metadata what ffmpeg is for video.

It isn't a hackjob, but it started out as one, like so very many other things. Version 1.00 was released at the end of 2003, while the problematic code was added in 2008. Mind you, the problematic code does not just eval whatever it sees; it tries to make sure the input isn't dangerous first. That check failed spectacularly, defeated by a newline combined with how '$' in Perl regex works without any special flags[0]. Using eval was a bad choice to begin with (but, I am told, something you would commonly see in Perl software of the time, yes, even in 2008 still); it was the lazy choice of re-using Perl to unescape C strings instead of rolling your own unescaping code.

So what do you suggest? Use libexif[1]? Exiv2[2]? Where would I run it? Can you suggest any operating system that never had a "stupid" RCE?

>It appears to be a complete piece of shit.

Let's see your code then. All the code you ever wrote that is possibly still in use somewhere. So if you never ever fucked up or got lazy, feel free to cast the first stone; otherwise, I would suggest you dial down your rhetoric when it comes to taking massive steaming piles on other people's work.

Yes, Phil Harvey had a "WTF?!"-class security bug here[4], shit happens, "goto fail", let me deRail your yaml and the Debian random number of the day is: 6.

He patched it promptly compared to other vendors and projects (public release on April 13th, while April 7th was the initial bug report, to Gitlab not ExifTool, which Gitlab then passed along[3]).

You can blame him for the bug, you can blame Gitlab for not running exiftool in some sandbox. But that half the gitlab instances remain unpatched some 7 months after patches became available, that you'll have to put on the people running these instances.

[0] https://github.com/exiftool/exiftool/commit/cf0f4e7dcd024ca9...

[1] http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=libexif

[2] http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=exiv2

[3] https://hackerone.com/reports/1154542

[4] I cannot be sure if he wrote it, or if somebody else contributed it, but at the very least he didn't catch it during review. Looks like he wrote it, tho.


I’ve had a really hard time finding good guides for hardening VMs for malware analysis or processing untrusted inputs as you suggest - any guidance on learning resources?


I learned the hard way. But for these restricted jails I'd start by making a VM that is not allowed out at all, e.g. DROP on the iptables OUTPUT chain.


At that point, why not just boot the virtual machine with no network interface attached?


Malware anti-detection: some samples will not launch without a network connection. The idea is to simulate a functional network but not provide access.


Yea, the VM has a network so it looks "real", and I'm using the network interface to push the files to it (via SSH); iptables lets related/established traffic through. And on the host I'm also blocking/logging the VM network.


From the issue report[1] comments:

> Thanks in no small part to your recent findings, [GitLab] are rolling back our policy about paying half-bounties for third party findings. These have great impact on GitLab and we want to continue to incentivize research for high+ severity issues in that area.

At least they're taking these things a bit more seriously now.

[1]: https://hackerone.com/reports/1154542


Anyone have background on what a "third party finding" is? Are they talking about dependencies? Because, to me, this specific bug is both a first AND third party issue (e.g. GitLab not pre-verifying before calling ExifTool).


> Anyone have background on what a "third party finding" is? Are they talking about dependencies?

Yes.

I've had some experience with bug bounty programs. The situation with vulnerabilities in dependencies is complicated.

On one hand you want to incentivize researchers to find vulnerabilities in your dependencies, which likely don't have their own bug bounty programs.

On the other hand, you run into situations where researchers discover a bug in a 3rd-party library and, instead of reporting it ASAP, they keep it secret as long as possible while they build up an arsenal of bug bounty exploits to report to multiple companies that use the library. This is a weird disincentive to fix the underlying bug quickly because the researchers know they only have so much time to exploit it in bug bounty programs before one of the companies fixes it upstream.

We also had a problem where amateurs would spam our bug bounty program any time a CVE came out for one of our dependencies, even if they couldn't exploit it. It was relatively easy to close these bug reports because they couldn't provide a proof of concept exploit, but it still wasted a lot of time arguing with them, especially when they'd try to blackmail us on social media for not paying them out (for a bug that didn't exist in our product).

I do not miss my days of dealing with bug bounty programs. Met a few great researchers, but most of the (attempted) participants were trying harder to exploit the bug bounty program than find exploits in our code.


Fascinating. I would love to read more about your experience if you have a blog.


> Because, to me, this specific bug is both a first AND third party issue

Seems GitLab agreed on that one, they registered two CVEs:

ExifTool https://nvd.nist.gov/vuln/detail/CVE-2021-22204

GitLab https://nvd.nist.gov/vuln/detail/CVE-2021-22205


This happened to all Gitlab instances that I manage around 2 days ago. Good to see publicity, I’m still dealing with not-so-understanding abuse departments at my hosting providers.

Sure, my fault for not keeping it up to date. But there is much noise to filter through in the many tools we juggle these days, especially if an organization prefers to self-host.


> But there is much noise to filter through in the many tools we juggle these days, especially if an organization prefers to self-host.

If an organization is too overloaded to patch for six months, maybe they should re-evaluate if self-hosting is the best course of action. Seems like this is a foot-gun of your own creation.


There is very little reason to even self-host GitLab unless you are insanely paranoid or have philosophical reasons, like Debian/GNOME do.


Sorry but this comment strikes me as incredibly ignorant. There are lots of good reasons to self host, chief among them reasons which can't be worked around - compliance and data sovereignty.


Already covered those under paranoid. There is nothing wrong with self-hosting and being paranoid, but it comes with the responsibility of having someone dedicated to keeping up with the latest news on the software you run. Leaving GitLab unpatched for 6 months shows that the company is not capable of running it themselves and should not be self-hosting.


It is false to tell people that self-hosting has to be that hard. If you know what you are doing and put the services in a jail/VM that isn't exposed to the internet, you can leave things unpatched for a very long time, and only update them when bug fixes or new features of recent versions are needed.

If you are a 50,000-person company where your own employees could be the adversaries, maybe not. But in that case you also have the budget for a proper security team. Apart from that case, with proper virtualization and no direct internet exposure, you'll find that 99.99% of CVEs are not a risk to you.

If you are under-budgeted, it's fine to neglect such internal services and check vulnerabilities once a year or less; however, always stay on top of the CVEs of internet-facing services. My point is that your message is basically the propaganda of cloud services: "doing it yourself is HARD", "email is HARD", "this and that is HARD", lol.

Self-hosting in jails not directly exposed to the internet is a productivity booster, as it allows you to not fix what works for you. Just keep using that 4-year-old version if it works well for you. But make informed choices as much as possible; try to stay on top of CVEs even if you decide not to patch these internal services 99% of the time. And even if you can't stay on top of your non-internet-facing jails, it won't be a real risk 99% of the time.


Reasons for self-hosting: GitLab.com's not-so-great availability [1], slow code searches (compared to self-hosted with GitLab Advanced Search) and the fact that you can keep your code and resources off the public Internet.

The fact that you do need to upgrade it yourself regularly is indeed a drawback. On the other hand, an Omnibus upgrade has only failed me twice in the last five years or so, so there's little reason to not do automatic upgrades at night and fire off an alert in case something doesn't work as expected afterwards. Their releases are typically solid, so kudos to the team.

[1] https://status.gitlab.com/pages/history/5b36dc6502d06804c083...


My company self-hosts Gitlab, but our instance is inside of our internal network, with no access to the internet, in or out, and almost no access to the corporate network, in or out. Every net/host and port is whitelisted, every change is documented and challenged (do you really need this port?), and you need to use 2FA to access it via HTTP or SSH.

Suffice to say, we fall under "insanely paranoid".


In the EU, some sectors doing business with public institutions are effectively barred from using US clouds, especially in education, as contracts tend to demand a degree of control and assurance that no US hyperscaler or SaaS is going to give you.

Also, let's not forget that self-hosted still offers features SaaS does not, such as server hooks, which are absolutely not uncommon in grown environments based on gitolite etc. that are looking to migrate.


Compliance often forces the hand.


Article said GitLab patched back in April. Safe to say you didn’t deploy these patches?

No judgment. I’m paid to make things, not apply patches. This is however why I don’t use self-hosted, pros and cons, etc.


Not him, but how are you supposed to know about the update? Do you need to check some page every day to see if there's an update? Why can't security updates just autoupdate, like apps on phones, or at least email the admin saying there is an important update?


GitLab team member here.

The way to keep up-to-date on critical security updates for GitLab is to sign up for our Security Alerts mailing list: https://about.gitlab.com/company/preference-center/


Yes, you are meant to be paying attention to the security news for all tools you self host. That's part of self hosting. If you can't manage that (which is completely fair), then you shouldn't be self hosting.


What's the best way to keep up on security news for a set of tools?

I once had a CVE RSS feed, but it was mostly noise even after I filtered it to only tools/libraries we used.


Depending on how you have it installed, you could have your package management system automatically install the updates. They aren't always perfect though; I've had at least one GitLab update that required manually running migration commands since the ones in the update script failed for some reason. I wouldn't trust doing it automatically.

And I'm not sure about GitLab, but there are often mailing lists for security updates for major software packages.


I highly recommend debian's auto update feature (works on derivatives):

https://www.linode.com/docs/guides/how-to-configure-automate...

I've never had any problems with it, although I just run a couple of servers :).

If everyone enabled this one thing I'm sure the Internet would be significantly safer.

Note: you can select security updates only, which I believe are unlikely to break anything!


Thanks for the productive response. I was using Watchtower in Docker Compose on all instances, but I had it misconfigured. My fault for assuming auto updates were working without verifying, lesson learned.


Honestly, GitLab is so stable and foolproof to update that you can have it autoupdate on a schedule, like 1-2 weeks after their monthly release.


Would putting your Gitlab instance behind a VPN mitigate this issue and similar? At least, it would limit attackers to malicious people with VPN access.


Yes, it would. Most people should put their internal cloud infrastructure behind a VPN, if at all possible. It's easy to do and dramatically reduces the surface area for an attacker.

It's not a silver bullet but in many cases it will be the difference between getting hacked and not getting hacked.

It is totally possible to design applications that can be safely exposed to the public internet but it requires some real effort.


It would dramatically limit your attack surface to those who could gain access to your VPN.

I prefer requiring TLS mutual authentication with a corporate PKI and issuing employees client certificates.

Doing both wouldn't be a bad idea either.


The number of software products, SaaS and on-prem, that don't support mutual tls is a disgrace.


I frequently do mTLS with a reverse proxy (httpd, nginx, caddy, ...). Not perfect but you can tighten the connection security a lot without touching the other service. But by outsourcing it you lose some control.


We put all our DevOps tools behind Open Ziti (ziti.dev) which ensures we do not need any public IPs (unlike a VPN or bastion) while giving granular access control for only trusted users.


How can you protect yourself from file upload threats? It's basically the worst possible threat model: executing complex user input that conforms to a spec written 20 years ago by some proprietary company with no thought for security.

Executing everything in an isolated container with no permissions? Audit trails and good logging? If someone comes up with an RCE you're basically done for; you can only mitigate it, not completely stop it.


If you have to process it at all, do it in a WebAssembly sandbox on the server. Or, alternatively, in a seccomp-secured sandbox that isn't allowed to make any system calls whatsoever, just read data from one file descriptor and write processed data to another.


I've seen companies use Headless Chrome and then WebAssembly to process files. You then lock down the Headless Chrome process. You're then "triple covered"; WebAssembly's limited context, JavaScript engine's limited context, and the Chrome process boundary itself.

This is obviously "expensive" though. Doesn't scale very well.


> This is obviously "expensive" though. Doesn't scale very well.

Unlike this issue then, going by the 1Tbps attack it's reportedly causing...


.... why webassembly?


Yeah, I don't see the value here either. You don't need wasm or chrome or any of that stuff.

Linux itself has several features that can be used to isolate processes, and there are user-friendly tools like bwrap [0] that make configuration easy.

It should be entirely possible to sandbox something like ExifTool itself such that it has no network access and is limited to reading and writing files in a particular directory.

[0] https://wiki.archlinux.org/title/Bubblewrap


Several reasons:

- It's a separate interface with a different attack surface than your system, so compared to a locked-down version of the normal syscall API, it provides better defense-in-depth.

- It's designed to be a fully self-contained sandbox, by default. If you're locking down everything but reading and writing previously opened file descriptors, you can build a secure sandbox atop syscalls fairly easily. If you need more nuance than that, WebAssembly seems more likely to remain secure, while syscall sandboxes seem more likely to fail-insecure if you get a detail wrong.

- It seems easier to sandbox otherwise-unmodified code that way. If you have code that needs some access to system resources, I think WebAssembly makes it easier to give it just what it needs and nothing else.

(Also, note that I'm not talking about running in a browser; I'm talking about standalone WebAssembly runtimes like wasmtime.)


The first step is always "don't do it at all". Here is the original commit:

https://gitlab.com/gitlab-org/gitlab-workhorse/-/commit/8656...

It's hard to find a linked, detailed requirement for this. I would certainly prefer if GitLab didn't silently mangle uploaded images (not least if I'm working on an EXIF library...).

Bonus points for a commit that includes the words "perl" and "exec" not also having a detailed security review attached.


This seems like a great use case for formal methods, e.g. EXIF removers that are formally verified not to crash and to successfully remove the identifying data.

These types of programs are relatively simple, and this is a case where a formal proof is worth much more than empirical reliability.

Is anyone aware of research on this?


The most straightforward answer is to not process the upload at all and treat it as a binary blob. As for serving it as an image etc. on your site: have a strict CSP and turn off MIME sniffing (and don't allow SVG uploads as images).


You know, if you do it in a pure Haskell function, you can be assured that the worst it can do is use too many resources, at which point it kills its own process. If you do it in a Rust function, well, you have no formal guarantees, but you have to go really far out of your way to put a vulnerability like that in the code.

What you don't do is pull an ages-old Perl codebase and run it over complex formats.


If you must Wrangle Untrusted File Formats you should do so Safely:

https://github.com/google/wuffs


I do it inside a systemd-nspawn container with a volatile file system, no network, and minimal caps.


As the exploit requires uploading a file, is it required for the attacker to first have a user account with file upload permissions?


No. The original submission described the vulnerability as requiring an authenticated user, but it was later discovered (recently) that it works unauthenticated too, and that's what kicked off this mass exploitation. No user account is required.


Anyone who can open issues in a repo I think


I am confused by this right now. I built a self-hosted gitlab install years ago for my own use, turned off sign-up, no public project listings - and still it was compromised. The HackerOne PoC URL throws a sign-in redirect for me, so I'm still trying to work it out.


Same thing happened with my self-hosted GitLab. Besides that, I got a new user (gitlab) with admin privileges, and somehow a miner script was installed, running under the (system) git user.


It was originally incorrectly thought to require authentication, but actually doesn't.


How much effort would it be for a company like GitLab to add kill switches to certain features and to trigger them in case they are exploited in the wild? It would have to be pretty fine-grained, though, to disable analysis or upload of certain files, right?

What are alternatives to automatic immediate updates and kill switches besides not exposing the service to the internet?


Kill switches sound great in theory until you disable a service used to host a billion dollar company, a hospital admission network, or your biggest paying customer.

The solution here is to offer a security notification service, which GitLab does. It's up to the admins to maintain these systems. It's GitLab's job to give them the information they need to do so, which they have.


Yikes, it would have been nice to have a heads-up email from my Gitlab install that a new user had been created on September 3rd :-O

I do occasionally get an email saying my account has been locked for security, but I had assumed that was just noise from random login attempts, and with 2FA enabled I didn't look any further.


Seems like one solution would be to require users to remove metadata themselves before uploading images. The website could reject images that contain metadata.

Was GitLab extracting the metadata and using it for some purpose? If not, what is the reason to accept images with metadata at all? Perhaps they assume their customers prefer less "security" and more "convenience", instead of vice versa (less "convenience", more "security").


Theoretically the pure-perl implementation will be immune to most categories of exploits that sloppy binary parsing runs into-- buffer overflows, use after frees, and so on.

That better security goes out the window once you start using eval(), of course.


Both exiv2 and jhead, as examples, have had numerous such bugs. However, there is a difference IMO between running these programs on one's own images (created by oneself), e.g., before uploading them to the internet, versus running them on untrusted images from others.


How do you reject an image that has metadata without doing basically exactly the same process as removing the data?


exiftool does not remove djvu metadata.


> GitLab Workhorse could check if the file is a valid TIFF of JPEG before passing it to ExifTool

This approach doesn't work in general. An attacker could craft a polyglot file, and in that case it's a matter of which format is tried first. Valid TIFFs could potentially be processed as something entirely different.


Yup. PoC||GTFO article (one of many, IIRC) on crafting a polyglot PDF / JPEG file:

https://github.com/angea/pocorgtfo/blob/master/contents/arti...


TL;DR: A vulnerability in GitLab that was patched on April 14, 2021 is now being exploited to hack into self-managed servers that are accessible through the internet.

For more context see https://about.gitlab.com/blog/2021/11/04/action-needed-in-re... If you are using GitLab.com you are not affected.


Does anyone know how they actually find all these instances to exploit? For instance, say your instance is attached to a subdomain; how would they find it? Or say your instance is running on a port other than 80; how would they find that?


My VOIP vendor, voip.ms, has been under attack for weeks. Wonder if this is the source?


Totally different.

The attacks on VOIP vendors mostly used UDP amplification, which relies on having a server that can fake its source IP due to an incompetent (or complicit!) network provider, while this is a botnet (that is only about a week old).


You say incompetent, but most haven't implemented BCP38, iirc.


At this point, the threat model for your CI system should be to assume it is compromised if you haven't patched.


Are there good reasons for people to not turn on automatic updates for security issues, I wonder?


You need to hunt down breaking updates all the time as a sysadmin with autoupdates enabled.

It is much easier to pinpoint any problem if you are aware that the update happened and you chose the time to do it.


Maybe it's time to think about the hosted GitLab version, GitHub Enterprise, or just GitHub like everyone else.





