Gitlab servers are being exploited in DDoS attacks (therecord.media)
312 points by intunderflow on Nov 4, 2021 | 169 comments



It's somewhat refreshing that the underlying bug isn't in some C or C++ utility, but instead in a Perl program using eval: https://github.com/exiftool/exiftool/blob/11.70/lib/Image/Ex... Another instance of "avoid eval as much as possible" for languages that have it.


With dlopen() and libclang/libgccjit, you could argue C (on an OS that supports dynamic loading) has eval too ;)


You have system("gcc uploaded.c && ./a.out") too!


But that code doesn’t run in the memory space of the process.


Then compile it as a library, and dlopen() it after all. (I'm just guessing the gp's point was that JIT isn't really an essential modality.)
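To make that concrete: a toy Python sketch of the compile-then-dlopen() trick (ctypes.CDLL is a dlopen() wrapper), assuming a POSIX-style cc on PATH. Illustration only, obviously not something to point at uploads:

    import ctypes, os, subprocess, tempfile

    # "eval" for C: compile a source string into a shared library at runtime,
    # then dlopen() it into the current process via ctypes.
    c_src = "int add(int a, int b) { return a + b; }"

    with tempfile.TemporaryDirectory() as d:
        src, lib = os.path.join(d, "f.c"), os.path.join(d, "f.so")
        with open(src, "w") as fh:
            fh.write(c_src)
        subprocess.run(["cc", "-shared", "-fPIC", "-o", lib, src], check=True)
        f = ctypes.CDLL(lib)   # the dlopen() step
        print(f.add(2, 3))     # 5 -- code "evaled" into this process's memory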


Only in programs that invoke it on hacker controlled data.


But if you only invoke eval() on non-hacker-controlled data in a scripting language, it's probably fine too.


My personal career favorite use of eval was for an import system that "unrolled" the loop that went through the columns for each row, using eval. It was much faster, but obviously a huge security risk.

Today, with modern JIT compilers, it's probably not much faster...
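For illustration, a minimal Python sketch of that codegen idea (the column names and converters are hypothetical). The security problem is plain: anything user-controlled that reaches the generated source is code execution:

    # Generate an "unrolled" row decoder as source text, compile it once,
    # then call it per row -- no per-column loop on the hot path.
    columns = [("id", int), ("name", str), ("price", float)]

    src = "def decode(row):\n    return {\n"
    for i, (name, _) in enumerate(columns):
        src += f"        {name!r}: _conv{i}(row[{i}]),\n"
    src += "    }\n"

    namespace = {f"_conv{i}": conv for i, (_, conv) in enumerate(columns)}
    exec(compile(src, "<generated>", "exec"), namespace)
    decode = namespace["decode"]

    print(decode(["7", "widget", "9.99"]))  # {'id': 7, 'name': 'widget', 'price': 9.99}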


I saw a very similar thing with eval. There was an evaluation of a nested JSON object:

    x["a"]["b"]["c"] 
And the developer decided that this was best evaluated by eval. During the code review I asked why they were using eval, and it turned out they didn't know the value could be accessed directly: they were unsure whether JavaScript supported that syntax.
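For the record, dynamic key paths don't need eval either; a Python analog of the same fix:

    from functools import reduce

    x = {"a": {"b": {"c": 42}}}
    path = ["a", "b", "c"]   # imagine the path only arrives at runtime

    # The eval way (don't): eval('x' + ''.join(f'[{k!r}]' for k in path))
    print(reduce(lambda d, k: d[k], path, x))   # 42, no eval required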


How much longer did said dev continue working there?


Fire the dev, or educate the dev? Sure, eval like this is a sure sign of a lack of understanding, but I'd hope for something less vile than firing over it: a little more understanding that everyone learns something sometime.


It's gross negligence if the dev is only a JavaScript dev, but maybe understandable if it's just one of the three languages they use regularly. I probably wouldn't fire them, but I also can't imagine any of my devs doing that.


err, no comment


that dev is now your manager? /s


He is the dev


LOL, no to both!


I was once tasked with creating a new frontend on an old project that had an API endpoint return something like this:

  var array = ["foo", "bar"]
I was expecting xml or json (like the rest of the endpoints), but I realized that they just served this as text, and then eval'd it on the frontend...


This was how you used to do it before JSON.parse was built into the browser.


it was literally a pseudo "standard":

https://en.wikipedia.org/wiki/JSONP


Right, but for it to be JSONP, the response should be injected into a script tag, I believe.


And it should be wrapped in a function call (most times, you can choose which function is called by a query parameter).


> avoid eval as much as possible

"eval is evil", if you will.


Okay, I'll bite. I have known for a long time that eval is evil. Then, last year, I actually needed to evaluate a string (from a file). As the case was safe enough (input 100% controlled by me), I did not worry too much and just used eval. But what would be a safe way to evaluate things if you needed to do that in an unsafe environment? Say you would like to make a safe website that allows a user to type a Python code snippet that is then evaluated/executed server side. Is that even possible?


Sure, that's basically what services like AWS Lambda do. As a starting point, you'd want to run the code in a short-lived VM with little to no network access which is dedicated to just running untrusted code.


Yes, this is what I'm currently doing with a cloud-based website automated-testing system.

In my case, code supplied by the end user is compiled into a different language such that I think I can prevent intentionally-malicious activity.

Nonetheless, spinning up a VM to create an environment in which potentially untrustworthy code is executed before then destroying the VM seems the safest option.


This is a bad solution.

Lambda allows arbitrary network access and may allow access to your AWS resources.

If you have to do this, the best approach is to containerise it, use capabilities to enforce restrictions and run in a virtual machine as isolated as possible.

It's still not great though. Some languages (e.g. Java) have additional features that help with this.


Yes, I was simplifying — the important part is keeping the untrusted code off of a trusted network. If you want to do the legwork of carefully segregating things then of course network access can work.

I didn’t mention containers since they don’t provide strong isolation and some people misuse them as though they do. There’s no harm in using them as another layer of defense, but hardware virtualization provides much better security.


Why not just not trust the network or host at all? Put private connectivity inside the trusted code using an SDK. Then the trusted apps can only communicate with the devices/apps you define and nothing else. Untrusted code cannot access the trusted network because the network is actually inside the apps/system.


But Lambda as presented here isn't so much a way for you to sandbox code as a way for AWS to sandbox your code. If you need to execute untrusted code, you have to play the role of AWS in this scenario.


When people actually want just a subset of `eval` to permit some custom computation, the proper thing to do is to define that subset as a language and make an interpreter that will read only that language.

As for mostly-full-featured `eval`: iirc Perl itself has a facility to create restricted sub-interpreters and run scripts that can't do certain things. (Though I might be confusing Perl with PHP here.)


Almost all modern programming languages have parsers for that language, either as a standard library feature or as a package in the ecosystem. That means it's very easy to run a production-quality parser over an input string and then validate and interpret the resultant AST as you see fit (see the sketch at the end of this comment).

Besides that approach, simply rolling your own parser using a parser combinator library is straightforward. The word combinator makes it seem complicated, but it's actually the opposite: using parser combinators is a lot simpler than writing a parser the traditional way you might have learned in formal education.

Implementing a simple DSL, for example an event-filtering language, should cost a competent but inexperienced programmer maybe 1 or 2 weeks for a proof of concept, and then 3-6 more weeks to get it production ready, depending on the feature set of course.

Of course, that's more time than simply running the V8 interpreter over your input string, and maybe running the V8 interpreter over your input string is an awesome way to empower your (trusted) customers.
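To make the parse-validate-interpret idea from the first paragraph concrete, a toy Python sketch that accepts arithmetic and nothing else:

    import ast, operator

    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def safe_eval(expr):
        # Parse with the real Python parser, then walk the AST and refuse
        # anything outside the whitelist -- nothing is ever executed.
        def walk(node):
            if isinstance(node, ast.Expression):
                return walk(node.body)
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                return OPS[type(node.op)](walk(node.left), walk(node.right))
            if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
                return -walk(node.operand)
            raise ValueError("disallowed syntax")
        return walk(ast.parse(expr, mode="eval"))

    print(safe_eval("2 * (3 + 4) - -1"))   # 15
    # safe_eval("__import__('os')")        -> ValueError, never executed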


If the user is trusted but the environment isn't, the user can sign the string and you can validate it before execution. If the user is not trusted, you need to contain the execution as much as possible, e.g. a container without file or network access to your resources. If there is a need to access a specific resource, whitelist it.


Basically, just use this: https://github.com/judge0/judge0


Tcl's ability to use a restricted child interpreter, and the active file pattern, come to mind.


Perl does ship a sandbox module called Safe as a standard module, though I don't know how strong it is.


"The Safe module does not implement an effective sandbox for evaluating untrusted code with the perl interpreter."

https://perldoc.perl.org/Safe


Yeah, though in this case, whitelisting opcodes probably would have at least blocked qx and `` (both exec()). Avoiding the eval() altogether would be the right path, of course.


Of course, we are Google. /s

Have you tried writing your own interpreter?


Public facing?

Handles user input data??

Uses ‘eval’???


The line existed in the 2014 commit which migrated the repo to git. It wasn't designed in the current era of mass automated abuse and internet connected everything.


2014 was very much that era, I'd accept this excuse maybe for 2007


The line wasn't written in 2014, that's just the earliest that the history goes back. Presumably it's no newer than the early 2000s.


Considering that the initial commit contains 400k lines, I would say there is a very long history from before git was used.


The comment in the code slightly above says:

    this doesn't work in perl 5.6.2! grrrr
So this code is at least 20 years old, and probably pre-2000.


https://exiftool.org/ancient_history.html

    Sept. 26, 2008 - Version 7.44
           - Added read support for DjVu images
There were probably enough systems running perl 5.6.2 around 2008 to cause bug reports, or the code was migrated from an older piece of code and added to ExifTool.

It was not uncommon to manually ./configure, make, make install tarballs locally in those days, especially on systems like Slackware, so I can see it being possible to have old packages installed that were not automagically updated.


I remember before 2007 seeing people using eval and thinking "what on earth is wrong with you?"

I'd accept this for maybe pre-2000, but people should really know better.


That line is in a Perl metadata-cleaning library, not in GitLab itself. GitLab, who pride themselves on security and sell the gospel of improving it, willfully chose to use that library, which was obviously never designed for their use case.

I looked at the front page of that library. It says it cleans metadata from a huge number of file formats. Frankly, it looks more like something you would use on your own, known-safe files before sharing them online.

I'm not sure the tool is presented as a sanitizer for untrusted input. At least, it does not claim to be.

Why does GitLab need to clean metadata from DjVu files? Wtf are DjVu files?!


> Why does Gitlab need to clean metadata from DjVu files?

It doesn't. It needs to clean metadata from JPEG and TIFF files. They didn't properly check if the files were actually of those types, and Exiftool performed its own content type detection to end up in its DjVu code.

> Wtf are DjVu files?!

DjVu is basically an alternative to PDF.[1]

[1]: https://en.wikipedia.org/wiki/DjVu


I think they are like a very old version of JPEG 2000 that supports pan and zoom / tiling of large image files. It needed a license to create and one to display, iirc.


>"avoid eval"

We are already avoiding, or trying to avoid, way too many interesting and useful things, all for the sake of security, only to encounter new ways to be attacked. Instead of "avoid", how about actually organizing worldwide intolerance of, and a hunt for, those attackers?


I feel like this article could have been far more useful with the following points being explicitly mentioned, or at least summarized:

  - the problem appeared in GitLab 11.9.0
  - the problem seems to have been fixed in GitLab 13.8.8
  - the vulnerability uses ExifTool, so to exploit it, a user needs to be able to upload images
  - if an update is not (yet) possible, DjVu format file uploads can be blocked to avert this vulnerability
  - this vulnerability isn't relevant for GitLab instances that just have 1 user, or are not publicly accessible on the Internet
Now, I'm not saying that the above is entirely true, but after reading something like the above, one should be able to figure out how best to act:

  - if you have a public GitLab instance with open registrations, consider updating it immediately (with backups in place, of course)
  - if you have a private GitLab instance with many users in your own corporate network (that somehow isn't updated yet) - this is a good reason to put updating it into your agenda today, even if your users aren't necessarily hostile
  - if you have a private GitLab instance or one with registrations closed (e.g. you're the only user or people that you trust use it), mark this down and update whenever possible, however it's probably not necessary right this moment
Of course, I can't say the above with 100% confidence, because the article itself lacks this actionable information to aid in decision-making, so I'm left to piece things together on my own, because of which I could be wrong.

On an unrelated note, DjVu is a pretty interesting file format, though sadly I've only seen it used very sparsely, on some Russian forums for tractor manuals or something: https://en.wikipedia.org/wiki/DjVu


Caution: CVE-2021-22205 has since been found to be exploitable without authentication. No need to be "able to upload an image", unfortunately. Also no need to take the detour through a mirrored repo, as a sibling comment suggests; it works so long as the GitLab instance is accessible from the internet.


> exploitable without authentication

Can you provide more info? I skimmed the upstream ticket but didn't see how. Getting access to anything other than the login page on an accessible-but-private instance seems like a security bug regardless of this CVE.


Full disclosure, I wrote both of these.

The following describes the entire unauthenticated attack:

https://attackerkb.com/topics/D41jRUXCiJ/cve-2021-22205/rapi...

And, if you like that sort of thing, there is a metasploit module you can use to reproduce the unauthenticated attack:

https://github.com/rapid7/metasploit-framework/commit/6f4aa5...


> Specifically HandleFileUploads in uploads.go is called from a couple of PreAuthorizeHandler contexts allowing the HandleFileUploads logic, which calls down to rewrite.go and exif.go, to execute before authentication.

I'm no security guy, but this seems... incredibly dumb? Like even for perfectly secure code, the asymmetry in resource usage alone to submit an image vs. get them to dump a file, shell out to a scanner, and rewrite that file would probably be enough to seriously hurt smaller GitLab VMs.


Not only that, but it still works in exactly this way. I would have thought they would have fixed this "feature." But an unauthenticated user can still provide GitLab with tiff/jpeg images and have them reach ExifTool.


> the vulnerability uses ExifTool, so to exploit it, a user needs to be able to upload images... this vulnerability isn't relevant for GitLab instances that just have 1 user, or are not publicly accessible on the Internet

A lot of private GitLabs contain mirrors of public repositories or vendored copies of public libraries. Our GitLab is private but practically speaking there's probably several hundred people, most of whom we couldn't identify, that could "upload an image" to it.


> if an update is not (yet) possible, DjVu format file uploads can be blocked to avert this vulnerability

Note that the issue was enabled by GitLab not verifying the file format, i.e. that a .jpg is actually a JPEG and not, say, a DjVu file, before handing it over to ExifTool.

So a simple extension/MIME check won't cut it.
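If you do want a content-based precheck, here's a minimal Python sketch using magic bytes. A mitigation only: polyglot files (discussed further down the thread) can still fool content sniffing:

    # Check magic bytes, not extensions, before invoking any metadata tool.
    JPEG_MAGIC = b"\xff\xd8\xff"
    TIFF_MAGICS = (b"II*\x00", b"MM\x00*")

    def looks_like_jpeg_or_tiff(path):
        with open(path, "rb") as f:
            head = f.read(4)
        return head.startswith(JPEG_MAGIC) or head in TIFF_MAGICS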


Nor should they rely on their verification of the file format for anything other than a temporary mitigation of this specific bug! If the ExifTool / DjVu tooling itself isn't secured or sandboxed, this would be a future exploit waiting to happen.


Absolutely. To be honest I was a bit shocked they used an ExifTool binary that had anything but JPEG/TIFF support compiled in.


In Eastern Europe professors share scanned books/papers mostly in .djvu format.


Which afaik amounts in that case to a bunch of grayscale JPEGs. Dunno about other cases, but I've never seen a djvu with extra capabilities besides raster images of the pages.


Just checked my backups. Wildly popular Skanavi books [1] have copy-pasteable text, so they OCRed it.

https://www.amazon.com/Problems-Mathematics-education-instit...


Given the creators, I'd guess the format is also used in certain machine learning circles as well (or was meant to be at least :)).


How do you check if you have been compromised? I did apply the patch a few days after it was released, but I'm unsure if the system had already been compromised...


GitLab team member here.

Please see this post on the GitLab forum for details how you can determine if your instance has been compromised through the exploitation of CVE-2021-22205: https://forum.gitlab.com/t/cve-2021-22205-how-to-determine-i...


"..Bowling said he discovered a way to abuse how ExifTool handles uploads for DjVu file format used for scanned documents to gain control over the entire underlying GitLab web server"

Ah, the good old "File upload vulnerability". File uploads remain one of the hardest problems to solve when it comes to security.


> uploads for DjVu file format used for scanned documents to gain control over the entire underlying GitLab web server

A use case for WASM's nanoprocesses (capability-based security), perhaps? Of course, only until such time as someone exploits the WASM runtime itself.


I really like the idea of using WASM for application "plugins". You can pick your implementation language, and with the right runtime, it can be speedy and secure. Seems like a win.

The blockers, to me, right now are:

1) I mostly write Go, and the Go runtimes didn't seem to be particularly maintained when I last looked. So it just hasn't been worth it to me to do plugins. (I have done "provide your own code to a Go application" before -- "gojq" and "expr" got the job done. Less features than a full WASM runtime, but still pretty powerful.)

2) It's unclear to me which programming language APIs should target. You add a plugin system and you want developers to use it -- what are the popular languages that target WASM? Go and Tinygo look great here, but I have a feeling that the average programmer wants something a little more dynamic for their small plugins. AssemblyScript obviously wants to be the standard, but it's probably too different from Typescript to make it a no-brainer for Javascript developers. Some sort of Perl/Python/Ruby that compiles to WASM would be great, but I haven't seen much progress on that front.

As for running untrusted code in general, I don't think WASM needs to block you. gVisor simulates the linux kernel for containers, providing stronger isolation between them, and is designed to protect you from things like this. (I think the original usecase was running ffmpeg to transcode user-provided video files?) And you can always go full VM on these things. Or take the nuclear option -- carefully audit the untrusted code and build up that trust ;)


Retrofitting seccomp or a custom AppArmor policy is much lower-hanging fruit. The problem is folks tend to link these things directly into their web servers.


RLBox is a toolkit for doing just this: https://plsyssec.github.io/rlbox_sandboxing_api/sphinx/

Used in Firefox to sandbox some libraries, including image handling IIRC.


> File uploads remain one of the hardest problems to solve when it comes to security.

Why? It seems like they should have read/write but no execute. What goes wrong?


The file is not executable, but the parser executes it in its own context:

> When uploading image files, GitLab Workhorse passes any files with the extensions jpg|jpeg|tiff through to ExifTool to remove any non-whitelisted tags.

> An issue with this is that ExifTool will ignore the file extension and try to determine what the file is based on the content, allowing for any of the supported parsers to be hit instead of just JPEG and TIFF by just renaming the uploaded file.

> One of the supported formats is DjVu. When parsing the DjVu annotation, the tokens are evaled to "convert C escape sequences".

https://gitlab.com/gitlab-org/gitlab/-/issues/327121
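The sanitization ahead of that eval was reportedly defeated by a trailing newline: by default, "$" in a Perl regex also matches just before a final newline (a later comment in this thread makes the same point). Python's re behaves the same way, so a minimal illustration of the bypass class:

    import re

    # A whitelist anchored with "$" quietly accepts a trailing newline,
    # because "$" matches at the end OR just before a final "\n".
    whitelist = re.compile(r"^[a-z]+$")
    print(bool(whitelist.match("hello")))     # True, as intended
    print(bool(whitelist.match("hello\n")))   # True -- oops

    strict = re.compile(r"^[a-z]+\Z")         # absolute end; "\z" in Perl
    print(bool(strict.match("hello\n")))      # False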


eval $schmuh in a file format parser is... uhm, yeah. Not subtle.


> What goes wrong?

Usually what goes wrong is parsing or processing the files. It's hard to get programmers to safely validate a 20 byte email address string. It takes a lot more care to safely parse a 4,000,000 byte image file in a complex format.


That might be true if you treat all files as opaque blobs, but services like these do things like resizing images, extracting metadata, and converting to other formats.


That doesn't help when the uploaded file is only read/write but crafted in a way to exploit the code processing the file.


You essentially have a gateway into a very large chunk of code that was most likely not built with security in mind on the parsing side; on top of that, you are guaranteed write access to some filesystem.


Ideally you would sandbox this with:

1. No filesystem access

2. No network access

3. Input passed on stdin (or a pre-opened fd)

4. Output passed to stdout (or a pre-opened fd)

5. A hard timeout specified before the process is killed

Suddenly bam, dramatically safer.

If you're looking for a tool that can do all of this for you, check out firejail:

https://firejail.wordpress.com/

It has a ton of options, but you can do all of what I suggested and more, really easily.
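If you'd rather hand-roll points 3-5 yourself, here's a minimal Python sketch; points 1-2 still need firejail/bwrap, namespaces, or a VM, and the exiftool invocation at the end is only an assumed example:

    import resource, subprocess

    def run_untrusted(cmd, payload, timeout_s=10, mem_bytes=512 * 2**20):
        # Points 3-5: payload in on stdin, result out on stdout, hard kill on
        # timeout. The rlimits cap CPU time, memory, and open files in the child.
        def limits():
            resource.setrlimit(resource.RLIMIT_CPU, (timeout_s, timeout_s))
            resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
            resource.setrlimit(resource.RLIMIT_NOFILE, (16, 16))

        proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL, preexec_fn=limits)
        try:
            out, _ = proc.communicate(payload, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            raise
        return out

    # e.g. run_untrusted(["exiftool", "-all=", "-"], uploaded_bytes),
    # assuming exiftool accepts "-" for stdin/stdout.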


Note that firejail had a serious RCE in the way it parsed URLs, at least for emails. Personally, having read the codebase, I wouldn't put too much faith in its security, given how long it took for the metacharacter parsing problems to be discovered.

User-input-facing software should always be fuzzed to uncover such bugs.


File formats like PDF can contain scripts that must be interpreted in order to render the document.


That sounds like a pdf problem ^.^


PDF is an example, but far from the only one. Many file types have executable/scripted portions, or simply very complex formats with huge attack surfaces. JPEG XL predictors are Turing complete, given an unlimited image size...

Off the top of my head, a lot of old console exploits on the Wii and GameCube revolved around feeding malicious save files to games. Same idea: a missing bounds check or whatever when deserializing some field lets you shell the process. Parsing random files from users is just dangerous.


Some say the P in PDF stands for Problem.


Well, it's a media-file parser vulnerability for one, so the filesystem execute bit doesn't protect against it, and the memory NX bit just means you have to get to a page where it isn't set.


Is that because people use hackjob dependencies to handle it more often than not?


ExifTool is a hackjob? I think not.

But also, file uploads should be handled in a jail or sandbox of some type, and their analysis should never be allowed to make network calls.


It's a perl program that evaluates untrusted strings it finds in user files. What exactly is your standard for "is hackjob"? It appears to be a complete piece of shit.


It's clearly not complete shit, else it wouldn't be used by literally millions of people/systems. ExifTool is so far from shit that it was in fact chosen by a highly respected company with a very good team.

A hackjob usually has fewer deployments than my own stuff (which, outside of Windows 2000 components, is less than a few million).


Just because something is useful doesn't mean it's not a hackjob.

Just look at how PHP got so popular. Clearly it was useful and thus became popular. I think it's hard to argue that it wasn't a hackjob when it first started.


JavaScript is a perfect example.


Quite so. It was put together in a day, and we have to endure some of its quirks decades later.


By this reasoning "Baywatch" was a great TV show because lots of people saw it.


Baywatch was a great TV show. Maybe you don't like it. Millions all over the world do.


I have never seen it, but am aware of the low brow reputation. If you genuinely like it, could you tell me what you like about it? I recently saw some old episodes of Knight Rider and admit that the show was fun.


ExifTool is the tool for extracting, writing, and fixing metadata, bar none. I haven't seen anything that comes even remotely close to how good it is at handling just JPEG metadata (EXIF, XMP, IPTC, MakerNotes, and all the fine bugs every vendor has when creating these), let alone other formats too.

ExifTool is essentially for (image) metadata what ffmpeg is for video.

It isn't a hackjob, but it started out as one, like so very many other things. Version 1.00 was released at the end of 2003, while the problematic code was added in 2008. Mind you, the problematic code does not just eval whatever it sees; it tries to make sure the input isn't dangerous first. That check failed spectacularly, defeated by a newline combined with how '$' in Perl regex works without any special flags[0]. Using eval was a bad choice to begin with (but, I am told, something you would commonly see in Perl software of the time, yes, even in 2008 still); it was the lazy choice of re-using Perl to unescape C strings instead of rolling your own unescaping code.

So what do you suggest? Use libexif[1]? Exiv2[2]? Where would I run it? Can you suggest any operating system that never had a "stupid" RCE?

>It appears to be a complete piece of shit.

Let's see your code then. All the code you ever wrote that is possibly still in use somewhere. So if you never ever fucked up or got lazy, feel free to cast the first stone; otherwise, I would suggest you dial down your rhetoric when it comes to taking massive steaming piles on other people's work.

Yes, Phil Harvey had a "WTF?!"-class security bug here[4], shit happens, "goto fail", let me deRail your yaml and the Debian random number of the day is: 6.

He patched it promptly compared to other vendors and projects (public release on April 13th, while April 7th was the initial bug report, to Gitlab not ExifTool, which Gitlab then passed along[3]).

You can blame him for the bug, you can blame Gitlab for not running exiftool in some sandbox. But that half the gitlab instances remain unpatched some 7 months after patches became available, that you'll have to put on the people running these instances.

[0] https://github.com/exiftool/exiftool/commit/cf0f4e7dcd024ca9...

[1] http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=libexif

[2] http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=exiv2

[3] https://hackerone.com/reports/1154542

[4] I cannot be sure if he wrote it, or if somebody else contributed it, but at the very least he didn't catch it during review. Looks like he wrote it, tho.


I’ve had a really hard time finding good guides for hardening VMs for malware analysis or processing untrusted inputs as you suggest - any guidance on learning resources?


I learned the hard way. But for these restricted jails I'd start by making a VM that is not allowed out at all, e.g. DROP on the iptables OUTPUT chain.


At that point, why not just boot the virtual machine with no network interface attached?


Malware anti-detection: some samples will not launch without a network connection. The idea is to simulate a functional network but not provide access.


Yea, the VM has a network so it looks "real", and I'm using the network interface to push the files to it (via SSH); iptables lets related/established traffic through. And on the host I'm also blocking/logging the VM network.


From the issue report[1] comments:

> Thanks in no small part to your recent findings, [GitLab] are rolling back our policy about paying half-bounties for third party findings. These have great impact on GitLab and we want to continue to incentivize research for high+ severity issues in that area.

At least they're taking these things a bit more seriously now.

[1]: https://hackerone.com/reports/1154542


Anyone have background on what a "third party finding" is? Are they talking about dependencies? Because, to me, this specific bug is both a first AND third party issue (e.g. GitLab not pre-verifying before calling ExifTool).


> Anyone have background on what a "third party finding" is? Are they talking about dependencies?

Yes.

I've had some experience with bug bounty programs. The situation with vulnerabilities in dependencies is complicated.

On one hand you want to incentivize researchers to find vulnerabilities in your dependencies, which likely don't have their own bug bounty programs.

On the other hand, you run into situations where researchers discover a bug in a 3rd-party library and, instead of reporting it ASAP, they keep it secret as long as possible while they build up an arsenal of bug bounty exploits to report to multiple companies that use the library. This is a weird disincentive to fix the underlying bug quickly because the researchers know they only have so much time to exploit it in bug bounty programs before one of the companies fixes it upstream.

We also had a problem where amateurs would spam our bug bounty program any time a CVE came out for one of our dependencies, even if they couldn't exploit it. It was relatively easy to close these bug reports because they couldn't provide a proof of concept exploit, but it still wasted a lot of time arguing with them, especially when they'd try to blackmail us on social media for not paying them out (for a bug that didn't exist in our product).

I do not miss my days of dealing with bug bounty programs. Met a few great researchers, but most of the (attempted) participants were trying harder to exploit the bug bounty program than find exploits in our code.


Fascinating. I would love to read more about your experience if you have a blog.


> Because, to me, this specific bug is both a first AND third party issue

Seems GitLab agreed on that one, they registered two CVEs:

ExifTool https://nvd.nist.gov/vuln/detail/CVE-2021-22204

GitLab https://nvd.nist.gov/vuln/detail/CVE-2021-22205


This happened to all Gitlab instances that I manage around 2 days ago. Good to see publicity, I’m still dealing with not-so-understanding abuse departments at my hosting providers.

Sure, my fault for not keeping it up to date. But there is much noise to filter through in the many tools we juggle these days, especially if an organization prefers to self-host.


> But there is much noise to filter through in the many tools we juggle these days, especially if an organization prefers to self-host.

If an organization is too overloaded to patch for six months, maybe they should re-evaluate if self-hosting is the best course of action. Seems like this is a foot-gun of your own creation.


There is very little reason to even self-host GitLab unless you are insanely paranoid or have philosophical reasons, like Debian/GNOME do.


Sorry but this comment strikes me as incredibly ignorant. There are lots of good reasons to self host, chief among them reasons which can't be worked around - compliance and data sovereignty.


Already covered those under paranoid. There is nothing wrong with self-hosting and being paranoid, but it comes with the responsibility of having someone dedicated to keeping up with the latest news on the software you run. Leaving GitLab unpatched for 6 months shows that the company is not capable of running it themselves and should not be self-hosting.


It is false to tell people that self-hosting has to be that hard. If you know what you are doing and put the services in a jail/VM that isn't exposed to the internet, you can leave things unpatched for a very long time, and only update them when bug fixes or new features of recent versions are needed.

If you are a 50,000-person company where your own employees could be the adversaries, maybe not. But in that case you also have the budget for a proper security team. Apart from that case, with proper virtualization and no direct internet exposure, you'll find that 99.99% of CVEs are not a risk to you.

If you are under-budgeted, it's fine to neglect such internal services and check vulnerabilities once a year or less; however, always stay on top of the CVEs of internet-facing services. My point is that your message is basically the propaganda of cloud services: "doing it yourself is HARD", "email is HARD", "this and that is HARD", lol.

Self-hosting in jails not directly exposed to the internet is a productivity booster, as it allows you to not fix what works for you. Just keep using that 4-year-old version if it works well for you. But make informed choices as much as possible; try to stay on top of CVEs even if you decide not to patch these internal services 99% of the time. And even if you can't stay on top of your non-internet-facing jails, it won't be a real risk 99% of the time.


Reasons for self-hosting: GitLab.com's not-so-great availability [1], slow code searches (compared to self-hosted with GitLab Advanced Search) and the fact that you can keep your code and resources off the public Internet.

The fact that you do need to upgrade it yourself regularly is indeed a drawback. On the other hand, an Omnibus upgrade has only failed me twice in the last five years or so, so there's little reason to not do automatic upgrades at night and fire off an alert in case something doesn't work as expected afterwards. Their releases are typically solid, so kudos to the team.

[1] https://status.gitlab.com/pages/history/5b36dc6502d06804c083...


My company self-hosts Gitlab, but our instance is inside of our internal network, with no access to the internet, in or out, and almost no access to the corporate network, in or out. Every net/host and port is whitelisted, every change is documented and challenged (do you really need this port?), and you need to use 2FA to access it via HTTP or SSH.

Suffice to say, we fall under "insanely paranoid".


In the EU, some sectors doing business with public institutions are effectively barred from using US clouds, especially in education, as contracts tend to demand a degree of control and assurance that no US hyperscaler or SaaS is going to give you.

Also, let's not forget that self-hosted still offers features SaaS does not, such as server hooks, which are absolutely not uncommon in grown environments based on gitolite etc. that are looking to migrate.


Compliance often forces the hand.


Article said GitLab patched back in April. Safe to say you didn’t deploy these patches?

No judgment. I’m paid to make things, not apply patches. This is however why I don’t use self-hosted, pros and cons, etc.


Not him, but how are you supposed to know about the update? Do you need to check some page every day to see if there's an update? Why can't security updates just autoupdate, like apps on phones, or at least email the admin saying there is an important update?


GitLab team member here.

The way to keep up-to-date on critical security updates for GitLab is to sign up for our Security Alerts mailing list: https://about.gitlab.com/company/preference-center/


Yes, you are meant to be paying attention to the security news for all tools you self host. That's part of self hosting. If you can't manage that (which is completely fair), then you shouldn't be self hosting.


What's the best way to keep up on security news for a set of tools?

I once had a CVE RSS feed, but it was mostly noise even after I filtered it to only tools/libraries we used.


Depending on how you have it installed, you could have your package management system automatically install the updates. They aren't always perfect though; I've had at least one GitLab update that required manually running migration commands since the ones in the update script failed for some reason. I wouldn't trust doing it automatically.

And I'm not sure about GitLab, but there are often mailing lists for security updates for major software packages.


I highly recommend debian's auto update feature (works on derivatives):

https://www.linode.com/docs/guides/how-to-configure-automate...

I've never had any problems with it, although I just run a couple of servers :).

If everyone enabled this one thing I'm sure the Internet would be significantly safer.

Note: you can select security updates only, which I believe are unlikely to break anything!


Thanks for the productive response. I was using Watchtower in Docker Compose on all instances, but I had it misconfigured. My fault for assuming auto updates were working without verifying, lesson learned.


Honestly, GitLab is so stable and foolproof to update that you can have it autoupdate on a schedule, like 1-2 weeks after their monthly release.


Would putting your Gitlab instance behind a VPN mitigate this issue and similar? At least, it would limit attackers to malicious people with VPN access.


Yes, it would. Most people should put their internal cloud infrastructure behind a VPN, if at all possible. It's easy to do and dramatically reduces the surface area for an attacker.

It's not a silver bullet but in many cases it will be the difference between getting hacked and not getting hacked.

It is totally possible to design applications that can be safely exposed to the public internet but it requires some real effort.


It would dramatically limit your attack surface to those who could gain access to your VPN.

I prefer requiring TLS mutual authentication with a corporate PKI and issuing employees client certificates.

Doing both wouldn't be a bad idea either.


The number of software products, SaaS and on-prem, that don't support mutual tls is a disgrace.


I frequently do mTLS with a reverse proxy (httpd, nginx, caddy, ...). Not perfect but you can tighten the connection security a lot without touching the other service. But by outsourcing it you lose some control.


We put all our DevOps tools behind Open Ziti (ziti.dev) which ensures we do not need any public IPs (unlike a VPN or bastion) while giving granular access control for only trusted users.


How can you protect yourself from file upload threats? It's basically the worst possible threat model: executing complex user input that conforms to a spec written 20 years ago by some proprietary company with no thought for security.

Executing everything in an isolated container with no permissions? Audit trails and good logging? If someone comes up with an RCE you're basically done for; you can only mitigate it, not completely stop it.


If you have to process it at all, do it in a WebAssembly sandbox on the server. Or, alternatively, in a seccomp-secured sandbox that isn't allowed to make any system calls whatsoever, just read data from one file descriptor and write processed data to another.


I've seen companies use Headless Chrome and then WebAssembly to process files. You then lock down the Headless Chrome process. You're then "triple covered"; WebAssembly's limited context, JavaScript engine's limited context, and the Chrome process boundary itself.

This is obviously "expensive" though. Doesn't scale very well.


> This is obviously "expensive" though. Doesn't scale very well.

Unlike this issue then, going by the 1Tbps attack it's reportedly causing...


.... why webassembly?


Yeah, I don't see the value here either. You don't need wasm or chrome or any of that stuff.

Linux itself has several features that can be used to isolate processes, and there are user-friendly tools like bwrap [0] that make configuration easy.

It should be entirely possible to sandbox something like ExifTool itself such that it has no network access and is limited to reading and writing files in a particular directory.

[0] https://wiki.archlinux.org/title/Bubblewrap


Several reasons:

- It's a separate interface with a different attack surface than your system, so compared to a locked-down version of the normal syscall API, it provides better defense-in-depth.

- It's designed to be a fully self-contained sandbox, by default. If you're locking down everything but reading and writing previously opened file descriptors, you can build a secure sandbox atop syscalls fairly easily. If you need more nuance than that, WebAssembly seems more likely to remain secure, while syscall sandboxes seem more likely to fail-insecure if you get a detail wrong.

- It seems easier to sandbox otherwise-unmodified code that way. If you have code that needs some access to system resources, I think WebAssembly makes it easier to give it just what it needs and nothing else.

(Also, note that I'm not talking about running in a browser; I'm talking about standalone WebAssembly runtimes like wasmtime.)


The first step is always "don't do it at all". Here is the original commit:

https://gitlab.com/gitlab-org/gitlab-workhorse/-/commit/8656...

It's hard to find a linked, detailed requirement for this. I would certainly prefer if GitLab didn't silently mangle uploaded images (not least if I'm working on an EXIF library...).

Bonus points for a commit that includes the words "perl" and "exec" not also having a detailed security review attached.


This seems like a great use case for formal methods, e.g. EXIF removers that are formally verified not to crash and to successfully remove the identifying data.

These types of programs are relatively simple, and this is a case where a formal proof is worth much more than empirical reliability.

Is anyone aware of research on this?


The most straightforward answer is to not process the upload at all and treat it as a binary blob. As for serving it as an image etc. on your site: have a strict CSP and turn off MIME sniffing (and don't allow SVG uploads as images).


You know, if you do it in a pure Haskell function, you can be assured that the worst it can do is use too many resources, at which point it kills its own process. If you do it in a Rust function, well, you have no formal guarantees, but you have to go really far out of your way to put a vulnerability like that in the code.

What you don't do is pull an ages-old Perl codebase and run it over complex formats.


If you must Wrangle Untrusted File Formats you should do so Safely:

https://github.com/google/wuffs


I do it inside a systemd-nspawn container with a volatile file system, no network, and minimal caps.


As the exploit requires uploading a file, is it required for the attacker to first have a user account with file upload permissions?


No. The original submission described the vulnerability as requiring an authenticated user, but it was later discovered (recently) that it works unauthenticated too, and that's what kicked off this mass exploitation. No user account is required.


Anyone who can open issues in a repo I think


I am confused by this right now. I built a self-hosted gitlab install years ago for my own use, turned off sign-up, no public project listings - and still it was compromised. The HackerOne PoC URL throws a sign-in redirect for me, so I'm still trying to work it out.


Same thing happened with my self-hosted GitLab. Besides that, I got a new user (gitlab) with admin privileges, and somehow a miner script was installed, running under the (system) git user.


It was originally incorrectly thought to require authentication, but actually doesn't.


How much effort would it be for a company like GitLab to add kill switches to certain features and to trigger them in case they are exploited in the wild? It would have to be pretty fine-grained, though, to disable analysis or upload of certain files, right?

What are alternatives to automatic immediate updates and kill switches besides not exposing the service to the internet?


Kill switches sound great in theory until you disable a service used to host a billion dollar company, a hospital admission network, or your biggest paying customer.

The solution here is to offer a security notification service, which GitLab does. It's up to the admins to maintain these systems. It's GitLab's job to give them the information they need to do so, which they have.


Yikes, it would have been nice to have a heads-up email from my Gitlab install that a new user had been created on September 3rd :-O

I do occasionally get an email saying my account has been locked for security, but I had assumed that was just noise from random login attempts, and with 2FA enabled I didn't look any further.


Seems like one solution would be to require users to remove metadata themselves before uploading images. The website could reject images that contain metadata.

Was GitLab extracting the metadata and using it for some purpose? If not, what is the reason to accept images with metadata at all? Perhaps they assume their customers prefer less "security" and more "convenience", instead of vice versa (less "convenience", more "security").


Theoretically the pure-perl implementation will be immune to most categories of exploits that sloppy binary parsing runs into-- buffer overflows, use after frees, and so on.

That better security goes out the window once you start using eval(), of course.


Both exiv2 and jhead, as examples, have had numerous such bugs. However, there is a difference IMO between running these programs on one's own images (created by oneself), e.g., before uploading them to the internet, versus running them on untrusted images from others.


How do you reject an image that has metadata without doing basically exactly the same process as removing the data?


exiftool does not remove djvu metadata.


> GitLab Workhorse could check if the file is a valid TIFF of JPEG before passing it to ExifTool

This approach doesn't work in general. An attacker could craft a polyglot file, and in that case it's a matter of which format is tried first. Valid TIFFs could potentially be processed as something entirely different.


Yup. PoC||GTFO article (one of many, IIRC) on crafting a polyglot PDF / JPEG file:

https://github.com/angea/pocorgtfo/blob/master/contents/arti...


TL;DR: A vulnerability in GitLab that was patched on April 14, 2021 is now being exploited to hack into self-managed servers that are accessible through the internet.

For more context see https://about.gitlab.com/blog/2021/11/04/action-needed-in-re... If you are using GitLab.com you are not affected.


Does anyone know how they actually find all these instances to exploit? For instance, say your instance is attached to a subdomain; how would they find it? Or say your instance is running on a port other than 80; how would they find that?


My VOIP vendor, voip.ms, has been under attack for weeks. Wonder if this is the source?


Totally different.

The attacks on VOIP vendors mostly used UDP amplification, which relies on having a server that can fake its source IP due to an incompetent (or complicit!) network provider, while this is a botnet (that is only about a week old).


You say incompetent, but most haven't implemented BCP38, iirc.


At this point, the threat model for your CI system should be to assume it is compromised if you haven't patched.


Are there good reasons for people to not turn on automatic updates for security issues, I wonder?


You need to hunt down breaking updates all the time as a sysadmin with autoupdates enabled.

It is much easier to pinpoint any problem if you are aware that the update happened and you chose the time to do it.


Maybe it's time to think about the hosted GitLab version, GitHub Enterprise, or just GitHub like everyone else.





