n5321 | November 17, 2024, 18:22


MALE SPEAKER: Thank you for coming, everybody. Some of you have probably already heard of Linus Torvalds. Those of you who haven't, you're the people with Macintoshes on your laps. He's a guy who delights in being cruel to people. His latest cruel act is to create a revision control system which is expressly designed to make you feel less intelligent than you thought you were. (Note: a useful tool, but a hard one to learn.)

Thank you for coming down today, Linus. I've been getting emails for the past few days from people saying, "Where's Linus? Why hasn't he merged my tree? Doesn't he love me anymore?"

And he walked into my office this afternoon. "What are you doing here?" But thank you for taking the time off. So Linus is here today to explain to us why on Earth he would write a software tool which only he is smart enough to know how to use.

Thanks, Linus.

LINUS TORVALDS: So I have a few words of warning, which is I don't actually do speaking very much, partly because I don't like speaking, partly because over the last few years everybody actually wants me to talk about nebulous visions for the next century about Linux. And I'm a tech geek, so I actually prefer talking about technology.

So that's why I am not talking about the kernel, because it's just too big to cram into a one-hour talk. Although apparently, Andrew did that two days ago. And I'm instead talking about Git, which is the source control management system that we use for the kernel.

I'm really, really, really bad at doing slides, which means that if we actually end up following these slides, you will be bored out of your mind, and the talk will probably not be very good anyway. So I am the kind of speaker who really enjoys getting questions. And if that means that we kind of veer off in a tangent, you'll be happier, I'll be happier, the talk will probably be more interesting anyway.

I don't know how you do things here at the Google talks, but I'm just saying, don't feel shy as far as I'm concerned. If your manager will shoot you, that's your problem.

I want to give a few credits before I start. Credit CVS in a very, very negative way because, in many ways, when I designed Git, it's the "What would Jesus do?" except it's "What would CVS never, ever do?" kind of approach to source control management.

(Note: CVS, the Concurrent Versions System, is an early version control system that was widely used in software development in the 1990s.)

I've never actually used CVS for the kernel. For the first 10 years of kernel maintenance, we literally used tarballs and patches, which is a much superior source control management system than CVS is.

But I did end up using CVS for seven years at a commercial company, and I hated it with a passion.

When I say I hate CVS with a passion, I have to also say that if there are any SVN (Subversion) users in the audience, you might want to leave because my hatred of CVS has meant that I see Subversion as being the most pointless project ever started because the slogan for Subversion for a while was "CVS done right" or something like that.

And if you start with that kind of slogan, there's nowhere you can go. There is no way to do CVS right.

So that's the negative kind of credit. The positive credit is BitKeeper. And I realize that a lot of people thought there was a lot of strife over BitKeeper and that the parting was very painful in many ways.

As far as I'm concerned, the parting was amicable, even though it looked very non-amicable to outsiders.

And BitKeeper was not only the first source control system that I ever felt was worth using at all, it was also the source control system that taught me why there's a point to them and how you actually can do things.

So Git, in many ways, even though from a technical angle it is very, very different from BitKeeper (which was another design goal because I wanted to make it clear that it wasn't a BitKeeper clone), a lot of the flows we use with Git come directly from the flows we learned from BitKeeper.

And I don't think you use BitKeeper here inside Google. As far as I know, BitKeeper is the only commercial source control management system that actually does distribution. And if you need a commercial one, that's the one you should use, for that reason.

I'd also like to point out that I've been doing Git now for slightly over two years, but while I started it and I made all the initial coding design, it's actually being maintained by a much more pleasant person, Junio Hamano, for the last year and a half.

And he's really the person who actually made it more approachable for mere mortals. Early versions of Git did require a certain amount of brainpower to really wrap your mind around. It's gotten much, much easier since.

Obviously, the way I always do everything is I try to get everybody else to do as much as possible so that I can sit back and sip my pina colada. So there's been a lot of other people involved, too.

That's the credits. With those out of the way:

So this slide is now one day old, and I didn't actually do the slides last night because last night I was out carousing and eating sushi.

But the slides will talk about the implementation of a high-performance distributed content management thing. And the keyword here is actually the distributed part.

I will start off trying to explain why distribution is so important. If we never get past that point, I will actually be happy.

If we never get to actually what Git implementation internally is, it's fine. I am also not trying to teach you how to use Git. There is this thing called google.com. You may have seen it. It has this thing you can type things into. You type "Git" and then you press the "I'm Feeling Lucky" button, and you will actually get the home page.

The home page has tutorials; it has the user manual; they're all in HTML. If you actually want to learn to use Git, that's where you should start—not at this talk.

But as mentioned, if we actually start veering off topic into other tangents because of questions, it's all good. I already gave you kind of a heads-up warning on this.

I use the term SCM, which I consider to mean Source Code Management, that is, revision control. Some other people think SCM means Software Configuration Management and see it as a much bigger feature, including release management and stuff like that.

That's not what I'm talking about, although Git is clearly relevant in that setting, too. CVS, we already went there. You can disagree with me as much as you want, but during this talk, by definition, anybody who disagrees is stupid and ugly. So keep that in mind. When I'm done speaking, you can go on with your lives. Right now, yes. I have strong opinions, and CVS users, if you actually like using CVS, you shouldn't be here. You should be in some mental institution somewhere else.

So before I actually go and talk about the whole distribution thing—which I think is the most important part—I'll talk a bit about the background because it invariably comes up.

If people have heard about Git, a lot of the things they've heard about are the background for doing it in the first place.


Background Information

One piece of background information is: I really am not an SCM person.

I have never been very interested in revision control. I thought it was evil until I met BitKeeper. I actually credit that to some degree for why Git is so much better than everything else.

It's because my brain did not rot from years and years of thinking CVS did something sane.


Why Git Was Needed

I needed a replacement for BitKeeper.

The reason for that was: BitKeeper is a commercial product, but BitMover and Larry McVoy allowed it to be used freely for open source projects, as some of you may know. The only restriction was you were not supposed to reverse-engineer it, and you weren't supposed to try to create a competing product. And I was happy with that because, quite frankly, as far as I'm concerned, I do open source because I think it's the only right way to do software.

But at the same time, I'll use the best tool for the job, and quite frankly, BitKeeper was it.

However, not everybody agreed with me. They are ugly and stupid. But they caused problems, and it resulted in the fact that Larry and I had several telephone conversations which ended up saying: We'll all be much happier if we just part ways and don't make this any worse.


Linux 2.6.12-rc2 Release

So we did.

And I made the Linux 2.6.12-rc2 release about two years ago and said: I'm not going to touch Linux until I have a replacement for BitKeeper for doing source code maintenance. One of the replacement options was going back to tarballs and patches, but nobody really liked that anymore.


Evaluating Alternatives

So I actually looked at a lot of alternatives. Most of them I could discard without even trying them out.

  • If you're not distributed, you're not worth using. It's that simple.
  • If you perform badly, you're not worth using. It's that simple.
  • If you cannot guarantee that the stuff I put into an SCM comes out exactly the same, you're not worth using.

Quite frankly, that pretty much took care of everything out there.

Issues with Other SCMs

There's a lot of SCM systems that do not guarantee that what you get out of it again is the same thing you put in. (Note: guarantee the consistency of the code against content and disk corruption. Distribution can solve this problem.)

If you have memory corruption, if you have disk corruption, you may never know. The only way you'll know is you notice that there's corruption in the files when you check them out. The source control management system does not protect you at all, and this is not even uncommon. It is very, very common.


Performance Issues

The performance issue: one of the things I kind of liked was a system called Monotone, which actually, I think there was a talk at Google about them some time ago, I'm not sure.

It had a lot of interesting ideas, but the performance was so horrendously bad that I tried it for a day and realized that I cannot use it.


The Decision to Write Git

The end result was: I decided I can write something better than anything out there in two weeks, and I was right. (Note: the most arrogant engineer in history?)


Next Topic: Distribution

So now we get to distribution. And this is the worst slide of them all, and I'm not very proud of it. And the problem is distribution is really, really important, but when I tried to make slides about it, I could not do it.

Part of it is my obvious artistic talents, which are on display for all of you, but part of it is that it's really hard to explain. So before we can start, I'd like to know how many people are used to the notion of a truly distributed source control management system? Are most of you kernel developers?  No? OK. So there were maybe 10 hands coming up.


On Distribution and Centralized Models

Being distributed very much means that you do not have one central location that keeps track of your data. No single place is more important than any other single place.

For example, this is why I would never touch Subversion with a 10-foot pole. There is a massive Subversion repository, and it's where everybody has to write. The centralized model just doesn't work when you want to be. Let's look at a few of the cases.


Offline Work and Distribution

I say it's so much more than just offline work, but the offline work part is actually maybe the most obvious thing. You can take a truly distributed source control management system, you can take it on a plane, and even if they don't offer Wi-Fi and satellite hookups, you just continue working. You can look at all your logs, you can commit, you can do everything you would do even if you were connected to a nice gigabit Ethernet directly to the backbone.

That is really important. (Note: distributed systems.)


Importance of Branching (Note: this really is impressive)

It is doubly important when you have hundreds or thousands of people working on the same project, and they may not be literally disconnected, but in practice, they aren't really well-connected either.

So part of distribution is this offline work theme.

Even if it's not completely offline, it is important to be able to do everything you want to do from any location without having to access the server.

What that basic fact actually results in is that you effectively have a lot more branching because everybody who has a complete repository and can do commits on their own will effectively have their own branch—even if you don't realize it.

Even if you think of your project as just having a single branch, every single time you disconnect your laptop and start working with it, you are on your own branch.


Branching in CVS vs. Git

This is really, really important and is very different from anybody who's used CVS, where branching is considered something that only true gurus do. How many of you have ever used CVS? OK, everybody. How many of you have really done a branch and ever merged it in CVS? Good job. I mean, it wasn't everybody, but it was actually more than I expected. How many of you enjoyed the experience? OK, so there were a couple. But it is considered hard.

In CVS, when you merge a branch (I've done it as little as possible, but I've had to do it), what you do is you plan ahead for a week and then you basically set aside one day for doing it. (Note: that complicated.)

Am I wrong? I'm not seeing a lot of people say, "No, it was easy. I liked it." It's horrible. (Note: R&D in manufacturing also needs first-rate masters like Linus.)


Git’s Approach to Branching

If you're distributed, you have to realize that every single person has their own branch. It's not something you even have to set up. It just is.

In fact, in Git, we like branches so much that a lot of people just have five or ten or fifteen of them. Just because once you realize that you have to have a special branch anyway, you might as well have many. One of the branches you do some experimental work on, and one of the branches you do maintenance on.

So branching is much more inherent when you do distribution.


Trust and Backup in Distributed Models

One of the other things that, to me, is very important is that by being distributed, you also automatically get to be slightly more trustworthy.

I have a theory of backups, which is that I don't do them. I put stuff up on one site, and everybody else mirrors it. And if I crash my own machine, I don't really care because I can just download my own work right back. (Note: this protects against system crashes.)

And it works beautifully well, and I don't have to have an MIS department. I heartily suggest everybody else do the same. But this only really works in a distributed environment.

(Note: awesome!)


On Commit Access

If you use CVS, you can't do this. What do you use here? Perforce?

AUDIENCE: Perforce.
LINUS TORVALDS: I'm sorry. I'm sure it's better than CVS. (whispers)

One of the really nice things—which maybe you don't have this issue inside a company, but we certainly have it in every single open source community I've ever seen that uses CVS or Subversion—is you have this notion of commit access.


Politics of Commit Access

Because you have a central repository, it means that everybody who is working on that project needs to write to the central repository. Since you don't want everybody to write to the central repository because most people are morons, you create this class of people who are ostensibly not morons.

And most of the time, what happens is you make that class too small because it's really hard to know if a person is smart or not.

Even when you make it too small, you will have problems. This whole commit access issue—which some companies are able to ignore by just giving everybody commit access—is a huge psychological barrier and causes endless hours of politics in most open source projects.

If you have a distributed model, it goes away.

Freedom in Distributed Models

Everybody has commit access. You can do whatever you want to your project. (Note: the market economy of the programming world.)

You just get your own branch, you do great work, or you do stupid work. Nobody cares.

It's your copy, it's your branch. And later on, if it turns out you did a good job, you can tell people, "Hey, here's my branch. And by the way, it performs 10 times faster than anybody else's branch, so nyah nyah nyah, how about pulling from me?"

And people do. And that's actually how it works, and we never have any politics. That's not quite true, but we have other politics. We don't have to worry about the commit access thing.

And I think this is a huge issue, and that alone should mean that every single open source system should never use anything but a distributed model. You get rid of a lot of issues.


On Release Processes in Distributed Models

One of the things that commercial companies find helpful with distributed models is the release process.

You can have a verification team that has its own tree, and they pull from people and verify it.

When they've verified it, they can push it to the release team and say, “Hey, we have now verified our version.” (Note: shouldn't electric motor development work the same way?)

The development people can go on playing with their head. Instead of having to create tagged branches or whatever you do to try to keep off each other's toes, you keep off each other's toes by just having every single group maintain its own tree, tracking its work and goals.

So distributed is really, really central to any SCM you should ever use. So get rid of Perforce now.


On Distribution Slides and Audience Interaction

LINUS TORVALDS:
It's sad, but it is so, so true. That was my only real slide about distribution. I'd love to get questions because we're now moving into other areas that—

AUDIENCE:
So how would you do it? If you had this monstrously, awesomely big code base, and you wanted to use this without stopping business for six months, how would you do it?

LINUS TORVALDS:
Stay by the mic because I couldn't quite make out your question. OK, he went away. How would you do this?


Example of Actual Distribution

LINUS TORVALDS:
So an example of actual distribution is you have a group of five people working on one small, particular feature.

That means that for a while, that feature will be very, very broken, right? Because nobody actually creates perfect code the first time around—except me—but there's only one of me.

What happens is they need to have their own tree that they can work in without affecting other people. You can do this in many different ways.

In CVS, one of the most common ways, because branches are so painful, is that you don't actually commit. You never commit until it passes every single test. For example, at your company, you may have a very strict committing rule saying you will never, ever commit until it's passed the whole test suite. And by the way, the fact that the test suite takes two hours to run? Tough. You cannot afford to commit. This is something that happens at every single company. I bet it happens even here at Google. You probably have a strict test suite, and you are not supposed to commit unless it passes.

And then, in practice, people make one-liner changes and ignore the test suite because they know the one-liner changes can't possibly break. This happens. (Note: this stifles imagination; even changing an ID becomes impossible.)

Centralized Commit Pain Points

This is a horrible, horrible model. It just means that you make huge commits because you commit something after you've worked on it for two weeks, and you have three people working in the same sandbox because, before they commit, they can't see the changes that the other people made.

This is common. It happens everywhere. It's scary.

The other alternative is to use branches even in a centralized environment. But branches always end up being pretty expensive to do, so you can't do them for experimental features.

You don't know beforehand if it's something that's going to take one day or two weeks. But most of the time, most programmers say, “Hey, I can do this in 48 hours.”

And it turns out—yeah, no, you couldn't.

But because you feel you can do it in 48 hours, creating a branch, even in systems that are better at creating branches than CVS, is a big pain. So you don't do it because you think you can get it resolved, and you're back to case number one.


Distributed Environment Benefits

In contrast, in a distributed environment, what you do is you have five people. They pull the current head, which is hopefully good and tested, and they start working on it and start committing to it.

You don't need to wait for two weeks until your commits are stable because your commits are always local. And what happens is within that group of five people, you can pull from each other. That's what distributed means. (Note: a better platform for sharing, collaboration, and competition.)

There's no central location. It means everybody's the same. So you can merge between yourselves.

Not only can you commit every single line if you want to without having to run the two-hour test suite, but you can then communicate by pulling and merging each other's work.

If one person finds a bug, they commit the fix and tell the other four people, “Hey, my repository has a fix for this.”
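(Note: the pull-and-merge flow described above can be sketched roughly as follows. This is a minimal toy model, not how Git is implemented: the `Repo` class, its methods, and the commit-naming scheme are all hypothetical, introduced only to show that commits stay local and that peers can pull from each other without a central server.)

```python
import hashlib


class Repo:
    """Toy repository: a commit store plus one head pointer (hypothetical model)."""

    def __init__(self):
        self.commits = {}   # commit id -> (parent ids, message)
        self.head = None    # id of the latest local commit

    def clone(self):
        """Copy the full history; no server is needed afterwards."""
        other = Repo()
        other.commits = dict(self.commits)
        other.head = self.head
        return other

    def commit(self, message, extra_parents=()):
        """Record a commit locally; nothing leaves this repository."""
        parents = tuple(p for p in (self.head, *extra_parents) if p)
        payload = repr((parents, message)).encode()
        cid = hashlib.sha1(payload).hexdigest()
        self.commits[cid] = (parents, message)
        self.head = cid
        return cid

    def contains(self, cid):
        """True if cid is reachable from the current head."""
        stack, seen = [self.head], set()
        while stack:
            cur = stack.pop()
            if cur is None or cur in seen:
                continue
            if cur == cid:
                return True
            seen.add(cur)
            stack.extend(self.commits[cur][0])
        return False

    def pull(self, other):
        """Fetch another peer's commits, then fast-forward or merge."""
        self.commits.update(other.commits)
        if other.head is None or self.contains(other.head):
            return self.head                     # already up to date
        if self.head is None or other.contains(self.head):
            self.head = other.head               # fast-forward
            return self.head
        return self.commit("merge", extra_parents=(other.head,))


main = Repo()
main.commit("tested base")

alice, bob = main.clone(), main.clone()
fix = alice.commit("fix bug")   # local commit, no central gatekeeper involved
bob.commit("new feature")       # Bob is on his own implicit branch
bob.pull(alice)                 # peers merge directly with each other

main.pull(bob)                  # later, the main tree pulls the finished work
```

In this sketch, Bob's pull produces a merge commit with two parents, while the main tree simply fast-forwards because it had no work of its own in the meantime.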


Merge Process and Trust Networks

When that group is done two weeks later, they can tell their manager, “Hey, we've done this. Can you ask the main group to pull? They'll get this new feature. By the way, we've tested it over two weeks, and it works and performs better because we've timed it before asking anyone else to look at it.” That's a hugely better model for doing development.  

And this is the model that the kernel uses.

It turns out that in many places, we don't need all that power, even in the kernel. So people usually don't pull within one group, but it does happen. For example, the networking people sometimes affect the NFS people, and the fact that they can synchronize actually helps.

This is a real, practical advantage.

AUDIENCE:

So it feels like the politics has just been moved to an indirect political question. If everyone's got access and they're all playing with their branches and having fun, at the end of the day, there has to be merging and resolving—unless you have 80 billion flavors of every Linux kernel.

LINUS TORVALDS:
Absolutely. There will be 1,000 or maybe 20,000 different branches, but in practice, you won't ever see them because you won't care. You will see a few main branches; maybe you'll only see one.

In the case of the kernel, a lot of people only really look at my branch.


Network of Trust

Even though there are lots of branches, you can ignore them. What happens is the way merging is done is the way real security is done—by a network of trust.

(Note: Linus is essentially acting as a manager!)

If you've ever done any security work and it did not involve the concept of a network of trust, it wasn't security work. It was something else.

But you will inevitably have cases where two maintainers send me the question:
"Please pull my stuff."

And I pick one of them at random—usually because their mail happened to be first in my mailbox—and I pull their stuff.

Another person may have made changes that clash so much that I think, "I could fix this up, but I really don't want to."

I didn’t write the code; it’s not my area of expertise—it’s networking or something like that. I can’t really judge it, and I can’t test it. Asking me to resolve the merge is just crazy. That’s not how you should do things.


Merge Resolution in a Distributed System

OK, the Windows machine flaked out again. Remember: distribution means nobody is special.

So instead of me merging, I just push out my first tree that didn’t have any merge issues. Then I tell the second person:
"Hey, I tried to pull from you, but I had merge conflicts, and they weren’t completely trivial, so I decided you get to do the honors instead."

And they do.

They know what they’re doing because it’s their changes. So they can do the merge. And they probably think I’m a moron because the merge was so easy and obvious I should have taken their code.

But they do the merge, and then they update their tree and say:
"Hey, can you pull from me now?"

And I pull from them. They did all the work for me. That’s what it’s all about. They did all the work for me. (Note: I did not fully follow this part.)

And I take the credit. Now I just need to figure out step three: profit.

But that’s another thing that comes very naturally from being distributed. It’s not something that is special to Git. 


Audience Question: Why Distributed Systems?

AUDIENCE:
So I guess I don’t entirely understand why you think that it’s necessary to have a distributed system.

It seems like you get a lot of the good effects, at least for corporate development. For open source development, it seems very useful that everybody can work on their own.

But when you really have a centralized, corporate tree, wouldn’t a centralized system with really cheap branches give you pretty much the same effect? Or is that just impossible to do?


LINUS TORVALDS:
No. I will not argue that centralized systems can’t work.

But it is clearly true that if you’re in a tightly controlled corporate environment, centralized systems work better. And it’s unquestionably true that people have been able to use centralized systems for the last 35 years.

Nobody’s really arguing that centralized systems cannot work. They cannot work as well as distributed systems.


Issues with Centralized Systems

One of the issues you tend to have is centralized systems inevitably have problems when you have groups in different locations.

It tends to work really well if you have a really beefy backbone fiber. And I guess for Google, you probably do have some kind of network going; I don’t know.

Maybe it’s not as big of an issue for you as it is for other projects, but trust me: not having to go over the network for everything is a huge performance saver.


Performance Comparison

I can’t show you demonstrations—and it’s not a very interesting demonstration anyway—but this is a laptop that is, what, four or five years old?

It’s like a Pentium M 1.6 GHz thing.

I could show you doing a full diff of the kernel on that laptop in, whatever, just over a second. On my main machine, it takes less than 1/10 of a second.

That’s the kind of performance you simply cannot get if you have to go over a network. We’re talking about a couple of packets going over the network, and you just blew the performance.


Branching in Centralized vs. Distributed Systems

If you have a centralized system, even if you make branches technically very cheap to create, the fact that you create them means everybody sees them.

And because everybody sees them, you don’t want to make branches willy-nilly. You will have namespace issues.

What do you call your branch? Would you call it test?

Oh, by the way, there are 5,000 other branches called test-1 through test-5000. So now you have to make up all these naming rules for your branches because you have a centralized system with a centralized branch namespace.


Branching in Distributed Environments

How does that work in distributed environments? You call your branch test, and it’s that easy.

Actually, you shouldn’t call it test. You should basically name your branches the way you name your functions. You should call them something short, sweet, and to the point.

What is that branch doing?

Git, by default, gives you one branch that is called master. It’s short, sweet, and to the point—it’s the master branch.

But you can make a branch called experimental-feature-x, and it will be obvious.

But this is something you simply cannot do in a centralized environment. You cannot call branches experimental-feature-x. You have to make up stupid, idiotic names.
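(Note: a minimal sketch of the namespace point, under stated assumptions. `CentralServer` and `LocalRepo` are hypothetical toy classes, not the interface of any real system; they only illustrate a single shared branch namespace versus one namespace per copy.)

```python
class CentralServer:
    """Toy central server: every branch name lives in one shared namespace."""

    def __init__(self):
        self.branches = {}  # branch name -> owner

    def create_branch(self, name, owner):
        if name in self.branches:
            raise ValueError(f"branch {name!r} already taken by {self.branches[name]}")
        self.branches[name] = owner


class LocalRepo:
    """Toy distributed repository: branch names are private to each copy."""

    def __init__(self, owner):
        self.owner = owner
        self.branches = {"master"}

    def create_branch(self, name):
        self.branches.add(name)


server = CentralServer()
server.create_branch("experimental-feature-x", "alice")
try:
    server.create_branch("experimental-feature-x", "bob")
    collided = False
except ValueError:
    collided = True   # Bob has to invent another name, such as a numbered suffix

alice, bob = LocalRepo("alice"), LocalRepo("bob")
alice.create_branch("experimental-feature-x")
bob.create_branch("experimental-feature-x")   # no clash: separate namespaces
```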


A Real-Life Example

I worked for a company that had—as nice as you can probably make them—scripts around CVS that helped you make branches.

You could actually make branches with a simple command. It didn’t take that long.

It picked a name for you exactly because it would pick the number. So you’d give it a base name, and you’d say:
"This is my branch for doing so-and-so," and it would call your branch so-and-so-56.

It would tag where you started that branch because, in CVS, you need to do that, too.

It took a while, but it worked. You can do these things in centralized systems, but you don’t need to.


Why Distributed Just Works

If your system is decentralized, it just works. That is how it should work.

So I’m not going to force you to switch over to decentralized—I’m just going to call you ugly and stupid.

That’s the deal.

I trust them. But on the other hand, hey, they might have stopped using their medication. I mean, I trust them, but let's just be honest here—they might have been OK yesterday; today, not a good day.

So I do a diff stat, and Git does that by default. You can turn it off if you really want to, but you probably shouldn’t. It’s fast enough anyway.

If it’s a big merge, the diff stat usually takes a second or two because creating a diff and actually doing all the stats on how many lines changed is much more expensive than doing the merge itself.

That is the kind of performance that actually changes how you work. It’s no longer about doing the same thing faster—it’s about allowing you to work in a completely different manner.

And that is why performance matters and why you really shouldn’t look at anything but Git. Hg (Mercurial) is pretty good, but Git is better.


On Git’s Implementation and Data Structures

I think I’m running out of time. OK, this one is still interesting. We never got to the implementation part, but you really don’t care. I will say this much about the implementation:

It’s really simple.

The core data structures are really, really, really simple. If you look at the source code, it’s 80,000 lines and mostly in C. And the kind of C I write—most people don’t understand. But I commented!

The source code may sometimes look complicated because we are very performance-centric. I really care. Sometimes, to make things go really fast, you have to use more complicated algorithms than just checking one file at a time.

For example, when you’re doing 22,000-file merges, you don’t want to check one file at a time. You want to check the whole tree in one go and say, “They’re the same; I didn’t need to do anything.”

So Git does things like that, and that kind of blows the source code up a bit because doing it well is complicated.
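(Note: one way to picture "check the whole tree in one go" is to give every directory a hash that covers everything beneath it, so that two equal hashes mean two equal subtrees. The sketch below assumes a simplified hashing scheme that is not Git's actual tree format, and the function names are hypothetical. It also recomputes hashes on every call for brevity, whereas a real system would keep them stored.)

```python
import hashlib


def tree_hash(node):
    """Hash a file (bytes) or a directory (dict of name -> node)."""
    if isinstance(node, bytes):
        return hashlib.sha1(b"file\0" + node).hexdigest()
    entries = "".join(
        f"{name}:{tree_hash(child)}\n" for name, child in sorted(node.items())
    )
    return hashlib.sha1(b"tree\0" + entries.encode()).hexdigest()


def changed_paths(a, b, prefix=""):
    """List paths that differ, skipping any subtree whose hash matches."""
    if tree_hash(a) == tree_hash(b):
        return []                        # identical subtree: nothing to inspect
    if isinstance(a, bytes) or isinstance(b, bytes):
        return [prefix.rstrip("/")]
    paths = []
    for name in sorted(set(a) | set(b)):
        if name not in a or name not in b:
            paths.append(prefix + name)  # entry added or removed
        else:
            paths.extend(changed_paths(a[name], b[name], prefix + name + "/"))
    return paths


old = {"kernel": {"sched.c": b"v1", "fork.c": b"v1"}, "net": {"tcp.c": b"v1"}}
new = {"kernel": {"sched.c": b"v2", "fork.c": b"v1"}, "net": {"tcp.c": b"v1"}}

unchanged = changed_paths(old, old)   # equal top-level hashes, no descent at all
changed = changed_paths(old, new)     # only the differing subtree is walked
```

Here the `net` directory is never opened when comparing `old` and `new`, because its hash is the same on both sides.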

But the basics are really, really simple.


Trust and Reliability in Git

One of the basics is this trust and reliability thing.

Every single piece of data, when Git tracks your content, we compress it, we delta it against everything else. But we also do a SHA-1 hash of the content, and we actually check it when we use it.

  • If you have disk corruption,
  • If you have DRAM corruption,
  • If you have any kind of problems at all,

Git will notice them. It’s not a question of if; it’s a guarantee.

You can have people who try to be malicious—they won’t succeed. You need to know exactly 20 bytes. You need to know the 160-bit SHA-1 name of your top of the tree. And if you know that, you can trust your tree all the way down through the whole history.
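(Note: a minimal sketch of why knowing 20 bytes is enough. `git_blob_id` follows the way Git names a blob, which is the SHA-1 of a `blob <size>` header, a NUL byte, and the content. `toy_commit_id` and `build_history` are hypothetical simplifications and not Git's real commit format; they only show that each name covers its parent, so the newest name covers the whole history.)

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 name of a blob: hash of 'blob <size>', a NUL byte, then the content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def toy_commit_id(parent_id: str, blob_id: str) -> str:
    """Simplified commit name covering its parent and its content (not Git's format)."""
    return hashlib.sha1(f"parent {parent_id}\nblob {blob_id}\n".encode()).hexdigest()


def build_history(versions):
    """Return the name of the newest commit for a list of file contents."""
    tip = ""
    for content in versions:
        tip = toy_commit_id(tip, git_blob_id(content))
    return tip


history = [b"int main() { return 0; }\n", b"int main() { return 1; }\n"]
trusted_tip = build_history(history)

# Change one byte in the OLDEST version: the name of the newest commit changes.
tampered = [b"int main() { return 9; }\n", history[1]]
tampered_tip = build_history(tampered)
```

Anyone holding `trusted_tip` can rebuild the chain from a copy of the data and compare; a mismatch anywhere in the past shows up at the top. As a sanity check on the blob naming, `git_blob_id(b"")` gives `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the ID Git assigns to an empty file.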


Long-Term Trustworthiness

You can have:

  • 10 years of history,
  • 100,000 files,
  • Millions of revisions,

...and you can trust every single piece of it because Git is so reliable and all the basic data structures are really, really simple.


Cryptographic Security and SHA-1

We check checksums.

And we don’t just check some piddly UDP packet checksum that’s a 16-bit sum of all the bytes. We check a checksum that is considered cryptographically secure.

Nobody has been able to break SHA-1.

But the point is: as far as Git is concerned, the SHA-1 isn’t even a security feature—it’s purely a consistency check.

The security parts are elsewhere.

A lot of people assume that since Git uses SHA-1—and SHA-1 is used for cryptographically secure stuff—they think, “OK, it’s a huge security feature.”

It has nothing at all to do with security; it’s just the best hash you can get.

Having a good hash is good for being able to trust your data.
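(Note: a small sketch of the difference between a weak checksum and a strong hash as a consistency check. `sum16` is a deliberately simplified stand-in for "a 16-bit sum of all the bytes" and is not the exact UDP algorithm; the sample strings are made up for illustration.)

```python
import hashlib


def sum16(data: bytes) -> int:
    """Simplified 16-bit additive checksum: the sum of all bytes, modulo 2**16."""
    return sum(data) % 65536


original = b"free(ptr); ptr = NULL;"
corrupted = b"free(ptr); ptr = NLUL;"   # two adjacent bytes swapped

# The additive checksum cannot see reordering, because addition ignores order.
same_sum = sum16(original) == sum16(corrupted)

# SHA-1 depends on every byte and its position, so the digests differ.
same_sha1 = hashlib.sha1(original).digest() == hashlib.sha1(corrupted).digest()
```

With these inputs `same_sum` is true and `same_sha1` is false, which is the sense in which a good hash lets you trust that data came back unchanged.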


Practical Advantages of SHA-1

It happens to have some other good features, too.

  • When we hash objects, we know that the hashes are well-distributed.
  • We don’t have to worry about certain distribution issues.

From an implementation standpoint, we can trust that the hashes are so good that we can use hashing algorithms and know that there are no bad cases.

But it’s really about the ability to trust your data.
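(Note: a rough sketch of what "well-distributed" buys an implementation. It buckets made-up objects by the first byte of their SHA-1 and looks at how evenly the buckets fill. The object contents and the choice of 256 buckets are assumptions for illustration; the same idea is behind Git storing loose objects in directories named after the first two hex digits of the hash.)

```python
import hashlib
from collections import Counter

# Hypothetical object contents; any varied input would do.
objects = [f"object number {i}\n".encode() for i in range(10_000)]

# Bucket each object by the first byte of its SHA-1 (256 possible buckets).
buckets = Counter(hashlib.sha1(obj).digest()[0] for obj in objects)

average = len(objects) / 256
largest = max(buckets.values())
smallest = min(buckets.values())
```

Because the hash output behaves like uniformly random bits, no bucket grows far beyond the average, so a lookup structure keyed on the hash has no pathological slow case to guard against.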


Data Integrity Over Time

I guarantee you: if you put your data in Git, you can trust that five years later—after it was converted from your hard disk to DVD to whatever new technology and copied along—five years later, you can verify that the data you get back out is the exact same data you put in.

And that’s something you really should look for in a source control management system.


A Lesson Learned from BitKeeper

One of the reasons I care is this:

For the kernel, we had a break-in on one of the BitKeeper sites where people tried to corrupt the kernel source code repositories.

BitKeeper actually caught it.

BitKeeper did not have a really fancy hash at all—I think it was a 16-bit CRC, something like that.

But it was good enough that you could actually see clumsy attempts. It was not cryptographically secure, but it was hard enough in practice to overcome that it was caught immediately.


On Security and Backup Practices

When that happens once to you—when you get burnt once—you don’t ever want to get burnt again.

Maybe your projects aren’t that important. My projects? They’re important. There’s a reason I care.

This is also one of the reasons I go back to the distribution angle.

When you do Google Code, for example, you have your source repositories that you help people maintain. And I think you do so under Subversion.


Why Linus Wouldn’t Trust a Third Party

I would never, ever trust Google to maintain my source code for me.

I’m sorry, but you’re just not that trustworthy.

The reason I prefer a distributed system is that I can keep my source code behind three firewalls on a system that does not allow SSH in at all.


Personal Security Setup

When I’m here, I cannot read my email because my email goes onto my machine.

The only way I can get into that machine is when I’m physically on that network.

Maybe I’m cuckoo, maybe I’m a bit crazy, and I care about security more than most people do.

But this whole notion that I would give the master copy of source code that I trust and care about so much to a third party?

Ludicrous.

Not even Google. Not a way in hell would I do that.