
FYI pypi tokens look like pypi-9NX39cdNn0AH1cCl1bMT48eKzf4Rhvw1mipk1FZTPrpR9

The integration means that GitHub recognizes this format and, when it finds a match, calls an API on pypi.org so that PyPI can revoke the token.

As always, please allow me to lament that we don't have a standard for this, such as secret-token:pypi.org/9NX39cdNn0AH1cCl1bMT48eKzf4Rhvw1mipk1FZTPrpR9, which would let any system know that this string is a secret and that pypi.org should be notified (for example via POST pypi.org/.well-known/compromised-secret). See also https://news.ycombinator.com/item?id=25978185
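To make the proposal concrete, here is a minimal sketch of how a scanner could turn such a string into a reporting URL. The `secret-token:` layout and the `/.well-known/compromised-secret` path are only the hypothetical convention suggested above, not an existing standard, and the function names are made up for illustration.

```python
# Hypothetical convention from the comment above, not an existing standard.
SCHEME = "secret-token:"
WELL_KNOWN_PATH = "/.well-known/compromised-secret"


def parse_secret_token(text):
    """Split 'secret-token:<issuer>/<secret>' into (issuer, secret), or return None."""
    if not text.startswith(SCHEME):
        return None
    issuer, sep, secret = text[len(SCHEME):].partition("/")
    if not sep or not issuer or not secret:
        return None
    return issuer, secret


def report_url(text):
    """Build the URL a scanner would POST to, or None if the string is not in this format."""
    parsed = parse_secret_token(text)
    if parsed is None:
        return None
    issuer, _secret = parsed
    return f"https://{issuer}{WELL_KNOWN_PATH}"
```

With this shape, a scanner needs no per-issuer registry: the issuer's host is read straight from the token.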




Hey there! I designed and implemented PyPI's tokens (although not the secret scanning integration).

They're actually just macaroons[1] internally, which means that they could easily be upgraded at some point to include a reporting URL like you mention.

Just as a tidbit: they were originally prefixed with "pypi:" rather than "pypi-", but that colon caused problems for a few packaging utilities. Any sort of in-band signaling like that is unlikely to gain widespread adoption for exactly that reason :-)

[1]: https://en.wikipedia.org/wiki/Macaroons_(computer_science)


Interesting. I can get the "pypi.org" domain from the base64-encoded part; however, I don't see anything about revocation in the paper.
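For anyone who wants to try the same lookup, a rough sketch follows. It assumes the part after "pypi-" is unpadded URL-safe base64, and it uses a synthetic payload built in the snippet itself rather than a real token; the helper name is hypothetical.

```python
import base64

PREFIX = "pypi-"


def decode_token_body(token):
    """Decode the part after the prefix, assuming unpadded URL-safe base64."""
    if not token.startswith(PREFIX):
        raise ValueError("not a pypi- prefixed token")
    body = token[len(PREFIX):]
    body += "=" * (-len(body) % 4)  # restore the padding that was stripped
    return base64.urlsafe_b64decode(body)


# Synthetic payload for illustration only; this is not a real PyPI token.
fake_payload = b"\x02\x01\x08pypi.org" + b"\x00" * 8
fake_token = PREFIX + base64.urlsafe_b64encode(fake_payload).decode().rstrip("=")

print(b"pypi.org" in decode_token_body(fake_token))
```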

Your reporting endpoint seems protected by a secret key that GitHub holds. Any reason PyPI can't accept anonymous submission of compromised tokens? If I find a PyPI token on my own server, can I not post it to https://pypi.org/_/github/disclose-token without getting a key from you first?


> I don't see anything about revocation in the paper.

I don't believe it's something standardized or considered by the original whitepaper. Macaroons have the ability to contain arbitrary data, however, so it wouldn't be difficult to add revocation information to them.

> If I find a PyPI token on my own server, can I not post it to https://pypi.org/_/github/disclose-token without getting a key from you first?

I wasn't part of the design, but my first thought goes to preventing the endpoint's use as an oracle: after a compromise, a malicious agent might find it useful to have an unlimited endpoint to test their stolen credentials against. Restricting use to a limited set of trusted entities avoids that.


I don't think allowing revocation of a token by any bearer of the token is much of a security issue. Consider a real-world example: if one finds a credit card someone dropped on the street, it can be reported as lost and revoked by the issuer even though the reporter is not the owner.

As for the endpoint being an oracle, the endpoint doesn't need to tell the reporting client anything beyond acknowledging that the revocation request has been received.


> I don't think allowing revocation of a token by any bearer of the token is much of a security issue. Consider a real-world example: if one finds a credit card someone dropped on the street, it can be reported as lost and revoked by the issuer even though the reporter is not the owner.

Whether or not it's a security issue depends on how the token is being used. Allowing potentially arbitrary parties to revoke tokens right before, say, a critical security release feels like a potential issue to me. Then again, I suppose they could do that by proxy by just publishing it on GitHub and letting the secret scanner do the work.

Long story short: I'm idly speculating. For all I know, they did it because allowing arbitrary parties to report leaked secrets would result in unacceptably high FP rates. I wasn't privy to the decision.


> Allowing potentially arbitrary parties to revoke tokens right before, say, a critical security release feels like a potential issue to me

If the third party has the token, they can make releases *adding* critical security issues.


Doesn't every other endpoint work as the oracle you describe? Are you worried about rate-limiting specifically?

Also, the endpoint sends a 204 with no information about the validity of tokens, making it not much of an oracle. I think the payload is processed in the background too, preventing timing attacks.
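The shape described here (always answer 204, validate later) can be sketched roughly as below. This is not PyPI's actual code; the queue, the handler names, and the `is_valid`/`revoke` callbacks are all assumptions made for illustration.

```python
import queue

# Reports wait here until a background worker picks them up.
pending_reports = queue.Queue()


def handle_disclosure(body):
    """Accept any report and always answer 204, whether or not the token is valid."""
    pending_reports.put(body)
    return 204  # No Content: the caller learns nothing about validity


def process_pending(is_valid, revoke):
    """Worker step: validate and revoke queued tokens outside the request path."""
    revoked = 0
    while True:
        try:
            body = pending_reports.get_nowait()
        except queue.Empty:
            return revoked
        token = body.decode("utf-8", errors="replace").strip()
        if is_valid(token):
            revoke(token)
            revoked += 1
```

Because validation happens after the response has been sent, neither the status code nor the response time depends on whether the submitted token was real.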


> Doesn't every other endpoint work as the oracle you describe? Are you worried about rate-limiting specifically?

Rate-limiting was just the easy example. Other endpoints are subject to additional constraints: tokens don't directly carry their user information (IIRC), so someone with a collection of stolen tokens may not know which projects they can control. Similarly, tokens are scoped, so "create a new project" isn't something that an arbitrary token can necessarily do to gain more information about its rightful owner.

Like I said, I don't know too much about the actual design decisions for that endpoint! That was an educated guess, based on what I might have done.


According to the documentation (https://docs.github.com/en/developers/overview/secret-scanni...), secret issuers specify a regex that can detect secrets they've issued. "Be as precise as possible, because this will reduce the number of false positives" - that's the guideline from GitHub. GitHub runs the regex on every commit that is uploaded and informs the secret provider when a match occurs.
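As a rough illustration of that kind of regex matching, here is a small sketch. The pattern is my own guess at a reasonably precise match for the "pypi-" format shown at the top of the thread, not the regex PyPI actually registered with GitHub, and the 40-character minimum is an arbitrary assumption.

```python
import re

# Assumed pattern: the "pypi-" prefix followed by a long run of URL-safe
# base64 characters. The minimum length is there to cut down false positives.
PYPI_TOKEN_RE = re.compile(r"\bpypi-[A-Za-z0-9_-]{40,}")


def find_candidate_tokens(text):
    """Return every substring of text that looks like a PyPI token."""
    return PYPI_TOKEN_RE.findall(text)


sample = (
    'TOKEN = "pypi-9NX39cdNn0AH1cCl1bMT48eKzf4Rhvw1mipk1FZTPrpR9"\n'
    'name = "pypi-client"\n'
)
print(find_candidate_tokens(sample))
```

The length requirement is what keeps ordinary strings such as "pypi-client" from matching.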


I see that they document the alerting endpoint there. The only piece missing is building the URL from the token format. I hope we get there someday, and everyone can deploy this without having to replicate GitHub's registry of token formats.

This page also mentions that they "strongly recommend you implement signature validation in your secret alert service", but I'm not sure why. Isn't the fact that they send valid tokens proof that they have really found a leak?


So, you can submit an overly generous (or specifically crafted) regex to get notified of tokens that someone else issued if you know their format?


I wonder if false positives often result in GitHub sending secrets to the wrong service.


I wonder if any of those services have a combination of bad regexes and bad validation and could be SQL injected by committing a malicious faux-token to GitHub.


One cool data format standard I only recently learned about is multihash[1] - a self-describing hash format: the first byte represents the hashing algorithm, the second byte represents the length of the hash, and the subsequent [length] bytes are the actual hash.

Something similar for tokens would be really useful.

[1] https://multiformats.io/multihash/
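A minimal sketch of the layout described above, for the common case where the algorithm code and the length each fit in one byte. The real format encodes both as unsigned varints, which this simplified version does not handle; the function names are my own.

```python
import hashlib

SHA2_256 = 0x12  # multihash code for sha2-256


def encode_multihash(code, digest):
    """Prefix a digest with its algorithm code and length (single-byte case only)."""
    if code > 127 or len(digest) > 127:
        raise ValueError("this sketch only handles single-byte code and length")
    return bytes([code, len(digest)]) + digest


def decode_multihash(data):
    """Return (code, digest), checking that the declared length matches the data."""
    if len(data) < 2:
        raise ValueError("too short to be a multihash")
    code, length = data[0], data[1]
    digest = data[2:]
    if len(digest) != length:
        raise ValueError("declared length does not match digest length")
    return code, digest


mh = encode_multihash(SHA2_256, hashlib.sha256(b"hello").digest())
print(mh[:2].hex())
```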


Until someone stores a secret without the prefix - because it's always the same, right?


As long as the API wrappers don't mess this up, this has no reason to happen.



