Jason Scott Talks about Preserving Games with the Internet Archive
Jonathan Anson / Aug 29th, 2016
Since it started in 1996, the Internet Archive has remained steadfast in its primary goal: digitally preserving knowledge in the many forms in which it exists. This task has made the digital library host to the most extensive digital collection of media in the world, which includes books, movies, music and more.
Recently the site has added software, which includes video games, to the list of media it seeks to preserve. The site has quickly amassed one of the most extensive digital libraries of software. Titles stretch across multiple platforms, from PC to consoles, most of which are playable within internet browsers.
The latest success in the site’s quest for software preservation came in November 2014. That was when the site first released approximately 900 classic arcade titles, all of which can be played within its site.
Such achievements owe much to its lead coordinator: Jason Scott. Scott has been in charge of helping to preserve games like the ones featured in the Internet Arcade, and all software in general.
We recently talked to Scott about the Internet Archive’s tremendous task in preserving software and the challenges that come with it.
You can listen to the audio below or continue reading the transcription. Listeners are warned that minor technical difficulties have resulted in some occasional glitches in the audio. Gaming Illustrated has done its best to eliminate these errors.
Gaming Illustrated: Tell me a little bit about what you do, specifically for the Internet Archive.
Scott: For the Internet Archive, my title is “Free Range Archivist” and lately it’s also become “Software Curator.” So I have kind of an open-ended job where my work involves reaching out for collections or materials that might otherwise not be found and then making them available on the archive where possible and helping people kind of connect with the archive if they feel there are materials they want to host on there. [I’m] just helping them, you know, kind of move through the technical aspects and other processes.
Gaming Illustrated: How did you come to be in charge of spearheading this project to preserve video gaming on the Internet Archive?
Scott: The head of the Internet Archive is a fellow by the name of Brewster Kahle. Brewster is a visionary who started the Internet Archive in the ’90s, and at a time when everybody was just kind of happy that the web was existing, he thought about saving it and in some way taking copies of it, then making those copies available for people, and that process became the Wayback Machine.
After things kind of settled down with the Wayback Machine, where it all kind of just worked, he started to kind of turn his ideas to “well what else doesn’t have preservation in that way” or “what else can represent a library of information and knowledge” and that’s when he started working on books and movies and music.
Around the time that I came on board, which was about 2011, he said “you know, we did music, we did movies and books and the web. But we never really did software. We have some but it’s all locked up in these large archives and how are we going to make people be able to interact with it?” He said “you know, you should come on and definitely help us gather the software. But if you can come up with some ideas for how we will present the software with the ease that we present books and everything else, that would be really nice.”
It took a few years and a lot of volunteers but we came up with a viewer that works for a lot of software that enables you to run it in a browser. On top of that we’re consistently and constantly taking in collections of just general software, as much as we can, in much the same way as one would gather books or movies or music. So there’s a record and a collection of software going back over the last 30 or 40 years so that scholars or researchers or really anybody can kind of reference it in the same way ideally that we reference all these other types.
Gaming Illustrated: What’s the typical submission process that goes into collecting software and how exactly do you help to preserve that software?
Jason Scott: Some of what software preservation and software archiving is right now is kind of a function of the fact that a lot of institutions didn’t really look on it seriously until probably the ’90s and the 2000s. So what that really means is that we’re kind of far behind in terms of collecting these materials.
Whereas, you know, you might go to a place with a bunch of paintings, even if they don’t want the paintings they go like “okay, there’s a bunch of places we can bring you to that will take [them].” We don’t really have that in terms of software. There’s a couple institutions out there that take it in wholesale amounts and then there’s other ones that, you know, take specific pieces. So it’s kind of a learning process.
Right now I’d say that the number one way that it happens is that a collector or a person who has a large amount of software contacts us and says “I have 50 or a hundred CD-ROMs” or “I have a few crates of software” and then I will talk to them about what they have and if it’s something that’s new, and by new I mean new to us, I’ll usually arrange for them to mail it to us where I catalog it and put it in storage.
The idea being in the long term to take the digital information off of them and then make them available online, as well as scanning in any printed materials related to the software. So a lot of what we’re doing right now is in the gathering phase where we’re gathering both physical artifacts that people are mailing us in the mail or we are gathering collections that were put together online and taking them in just in case.
In an ideal world you have every piece of software for a given platform or given type available and functioning. But it’s just simply not the case. We end up with a lot of other material that ends up being kind of stuck between the cracks. There might be evidence that it existed once but it won’t be a fully cataloged description with historical context like we might get with a painting where you might be able to find out “here’s all the things that this painting had done to it over the last hundred years” and “here’s who owned it” and “here’s why they painted it” and “who it was for.” We don’t quite have that with software but we have some of it like that so we’re kind of in a real early stage with all of this.
Gaming Illustrated: That actually brings me up to my next question because… only fairly recently the site’s actually started to really devote itself to video games. I was quite surprised recently to find out the latest venture in regards to that was the Internet Arcade. I was actually curious what actually got this shift in gears to get the Internet Archive in focusing more on preserving video games in regards to software in general.
Scott: A lot of it is that, like I said, Brewster kind of gave me a mandate of “we should really be, you know, gathering software” and one of the fundamental bits of thinking when it comes to the emulation in the browser is that there’s a strong belief at the archive that access drives preservation. That people are more likely to support the preservation of items, to recognize the worth, to understand what’s going on, if you say to them “go to this location, click this button and now you’re trying the software that’s being talked about” as opposed to “trust us, this spreadsheet is interesting” or, you know, “there’s a pile of CD-ROMs here and there’s probably something interesting but you can’t see it because it’s hard to get to.”
So there was always a move on my part from the beginning of making this emulation and when we put together these playable collections there’s definitely all kinds of software: education, business, entertainment and so on.
The thing is that video games are designed from the ground up to attract attention, be beautiful and imply they are fun. If I link somebody to VisiCalc, a 1979 Apple II spreadsheet, the amount of people who will [play] it without being really pushed into it like it’s some sort of museum, it’s just very unlikely.
I put them up, I treat them well, but for some reason people just really get attracted first to games. They understand games and games are designed to be understood. A spreadsheet, or a business program, it’s designed to be read and studied and then integrated into your business and business processes. So it’s a study; it’s an involved task.
Whereas a game usually tends to put up something beautiful and says “press here on keys and you can move me around” and people get that and that’s all they have to do. So I always understood why it gets the most attention, the most press, but I don’t limit what we grab.
So, I mean, we’ve had all sorts of software. But when we did video games, arcade games, I knew that would gain attention ’cause, just like I said, video games are designed to be attractive. It brought in millions of people, which was interesting, and I like to think they discovered all the other bits of material.
But nobody’s required when they enter a museum or a library to look at everything. So for some people it was just a matter of “wow, I remember this old thing. I used to play it as a kid,” and they’d play it a few times and then move on.
Gaming Illustrated: And that actually brings me up to my next question. No doubt the whole Internet Arcade thing did have some problems attached to it, specifically in the legal department. Have there been any specific or major problems in regards to that or just the video games and software in general on the site?
Scott: Unfortunately I’m not really in any position to discuss the legal aspects or the ramifications with the archive just because it’s not my position. But I can say that the archive’s policy is that it never wants a library to represent a financial drain on a functioning company. So if somebody uploads something that’s commercial, something that is sold, let’s say a John Grisham novel, into our general area and we’re informed of it, we take it immediately down. We don’t run some sort of, you know, ranting declaration or anything.
We have people uploading millions of things and we’re putting up lots of things and so there’s a lot of cases where we’ll put something up and then some firm will go “hey, you know!” Like, for instance, we put up thousands of DOS programs. Well, many of these DOS programs were long obsolete and I would say about five or 10 had still been maintained. They were buried in there.
A really good example of that is the Ultima series. The Ultima series has been made and remade and is still in original form and in modern form from places like GOG.com or Steam and so there was a whole bunch of material that was removed from the software collection as it was already turning out to be accessible.
But we’re talking about a sliver of the stuff. With the arcades there were more companies and they would contact us and we would take them down. So that’s kind of how we go about it, you know.
Gaming Illustrated: What would you say have been some of the most interesting additions to the software collection on the Internet Archive so far?
Jason Scott: There’s things that I personally like and there’s things that I think get attention and so on.
Certainly one of the most interesting ones that I thought when it happened was I was contacted by a virus researcher named Mikko Hypponen in Finland. He runs a company called F-Secure and he said “I have a collection of defanged viruses and malware.” That’s malware where they’ve gone in and removed the ability of it to write to a disk. So you’re left with just the messages it would print and how it would portray its infections.
We made a little section on the archive called the “Malware Museum” and we basically put up a few dozen of these malware programs. So when you boot it up in your window, it infects it with this little virus and you can see what the virus would print. Some of them put in graphics, some of them put in messages and some of them would modify how your system worked.
We kind of clumped them all together, wrote a little bit about their context and then put it up. Within like four hours it had worldwide press attention and I think we spent weeks actually doing interviews. Because it was something very interesting and magical to people to be able to go and experience old malware when it was mostly just messages and comments and weird graphics. Because, of course, they [the malware] couldn’t phone home, there was no internet in that way for them. So they all just kind of have to be amusing on their own and that kind of really entranced people and so I’m happy that we had that up.
There’s a whole bunch of stuff related to Windows 3.1. Windows 3.1, being a very early mass-market graphical interface, has a whole bunch of interesting programs like calculators and astronomy and information retrieval. I think there’s one about maintaining a bicycle.
And all of these programs are right there, clickable and people had no guidelines, in the DOS world anyway, the DOS/Windows world, like they did in the Macintosh. When you programmed on the Macintosh they gave you like a book that you could buy that was “here’s all of the standards of how a machine should interface with a human and what our humans were expected to interface with.”
Windows and DOS had nothing like that. So people just made up their own way of “this closes the window,” “this moves things,” “this is how you shift around” and so you can see all these weird little experiments.
Without the ability to instantly look at them, I think people wouldn’t go through the trouble, you know. They wouldn’t download the program, get an emulator running, boot it in, start it up, experiment with it. You’re talking, you know, a pretty significant amount of effort and then only to go “yeah, this is dumb,” you know. “This is interesting. This isn’t. This is.” What we’ve done is we’ve turned it into this embeddable, instantaneous experience.
Personally I like the system benchmarking tools and the disk checkers. I like the things where it’s making the emulated hardware do work to try and prove it’s working where it’ll say “okay, I’m testing your memory. Your memory is okay. I’m running through your disks. Your disks are responding this way.” I like that. Because, for one thing, it proves whether or not the emulator writer did a good job. But it’s also kind of just fun to think about how the program has no idea what it’s located in and what’s going on and that it’s actually running inside of a window that’s inside of a computer that’s on a network and it’s been 25 years and they have no idea.
Gaming Illustrated: Again, that brings me to my next question. Because the use of technology, obviously in trying to preserve all this software, such as the malware and the video games… could you tell me a little bit about the technology the Internet Archive uses to preserve software and games and all that?
Jason Scott: So the Internet Archive is a relatively small amount of people. It’s actually really less than a hundred main employees and it’s got another bunch around the country and the world doing [the] scanning of books. But it’s a relatively small group and so we tend to focus on technological solutions.
So if something is added to the archive by anybody there are scripts that do dozens of tests to try and make all these decisions. Like how many pages does the book have, how many pixels per inch is the printing and is it functional and can I, you know, portray it different ways and convert it. So there’s a real technical lean. If given a point, the archive will tend to let a machine do the work instead of making a person do the work and that choice kind of pervades all the way through both the way the software is added and the way that other things are added.
One of the things about the software, of course, [is that] we’re talking about adding thousands and thousands. There’s well over 50,000 software titles up on the archive that are in some emulated state and there’s probably a few million more that are in a non-emulated state at the moment, and so we have to write scripts that go into them and play them and take screenshots and then put in the screen.
You know, I’ll have people say to me “oh, wow, when you put this in, didn’t you notice it errors out on level 4,” and I’d say “no, I’ve actually never touched it.” A program ingested it, a program played it, a program screenshot it, a program filed it away. [It was] absolutely untouched by human hands and as time goes on I’ve got other tools that do other kinds of testing and let me respond to it.
But that technological bent is one of those cases where you allow certain shortcomings hoping that the general response and result will be good. So instead of every program being personally vetted and checked and doubled up and everything else, I’d say put up thousands and start figuring out which ones are the best and which ones don’t work and then respond that way.
So as we’re adding more and more items, I’m pretty strongly in favor of, you know, just throwing it all up there and then seeing how things kind of push out and if people tell me that there’s a problem I push them away and move them out if it doesn’t work. So, you know, I mean that’s absolutely, you know, a choice, right?
Again, different places have different approaches and I never pretend one is the absolute right way. But a lot of places will, you know, they’ll take in a collection and there’ll be a person whose job is just to go through the collection piece by piece, catalog them, make sure that they’re all what they say they are, then hand it off to a second team who’ll log it into another system, perhaps put [an] etching or a marking on it and then put all of that into an agreed upon spot in the physical space so that you can say “I want to find volume 5 of this encyclopedia” and they’ll say “that’s on the second floor, shelf three, area 5.”
I am a different animal and the whole archive is a different animal. Literally we have millions and millions of pieces and we have thousands of items being added every day. The amount of software being added of course is much, much smaller than the books and the movies and other pieces. But it’s a sizable amount.
People are uploading software, some of it emulated by other people out of the box, which has been a pleasure. It’s nice just to wander in and discover somebody has uploaded ten public domain games when they were fourteen and it just runs and I didn’t have anything to do with it, not even with a script. So there’s some collaboration, it’s going to be a learning process, I mean we’ve never really had this before in this way so it’s all new.
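The hands-off workflow Scott describes (scripts test each upload, play it, screenshot it and file it away, with problem titles pulled later when someone reports them) can be sketched roughly as follows. This is an illustrative Python sketch only, not the Internet Archive’s actual tooling: every name in it (`SoftwareItem`, `KNOWN_PLATFORMS`, `ingest`, `report_problem`, the status labels) is hypothetical, and the “screenshot” step is a stand-in for booting the title in an emulator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical list of platforms that have a working in-browser emulator.
KNOWN_PLATFORMS = {"dos", "apple2", "arcade"}


@dataclass
class SoftwareItem:
    """One uploaded title. All field names are assumptions for illustration."""
    identifier: str
    platform: str
    size_bytes: int
    checks: Dict[str, bool] = field(default_factory=dict)
    screenshot: str = ""
    status: str = "new"  # new -> emulated | stored -> pulled


def has_content(item: SoftwareItem) -> bool:
    """Automated test: the upload is not empty."""
    return item.size_bytes > 0


def platform_is_emulated(item: SoftwareItem) -> bool:
    """Automated test: an emulator exists for the item's platform."""
    return item.platform in KNOWN_PLATFORMS


CHECKS: List[Callable[[SoftwareItem], bool]] = [has_content, platform_is_emulated]


def take_screenshot(item: SoftwareItem) -> str:
    """Stand-in for running the title in an emulator and capturing a frame."""
    return f"{item.identifier}/screenshot.png"


def ingest(item: SoftwareItem, checks=CHECKS) -> SoftwareItem:
    """Run every check, then file the item without human review."""
    for check in checks:
        item.checks[check.__name__] = check(item)
    if all(item.checks.values()):
        item.screenshot = take_screenshot(item)
        item.status = "emulated"
    else:
        # Kept in the collection, but not playable in the browser yet.
        item.status = "stored"
    return item


def report_problem(item: SoftwareItem) -> SoftwareItem:
    """A user reports a broken title; it is moved out after the fact."""
    item.status = "pulled"
    return item


if __name__ == "__main__":
    uploads = [
        SoftwareItem("visicalc", "apple2", 1024),
        SoftwareItem("mystery-disk", "unknown", 2048),
    ]
    for upload in uploads:
        ingest(upload)
        print(upload.identifier, upload.status, upload.checks)
```

The design choice mirrors the trade-off in the interview: nothing is vetted by hand before it goes up, so correction happens only after ingest, when a report triggers `report_problem`.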
Gaming Illustrated: Speaking of which, that sort of brings me back a little to our question about some of the problems for the Internet Archive. I know you talked specifically about the legal problems which I understand you can’t talk much about. But have there been any other problems in regards to preserving software, video games, things like that?
Jason Scott: Well, a certain range of software had copy protection routines built into it. What that meant was there were no copies available other than the original commercial copy, and what that meant is that there are titles out there for which there are very few original boxes and original work and often only the pirated version is around. That means that a large grouping, you know, not the majority, but a large grouping of software that we have is the pirated version: the part which has a crack screen and which may or may not have all the functionality.
We chose to put it up because we said “well that’s the only evidence it even existed,” you know. It’d be like if the only version of a photo album was a photocopy that relatives did and you can’t find the original photo album, well at least you can put the poorly photocopied version to at least show what kind of pictures were in there. So, whenever possible, I much prefer that we have really good original software that came from an actual piece of magnetic or optical media.
But all of this kind of shortcoming over the last 30 years in terms of how we treated software is now coming back to roost. It’s not a dissimilar problem to early movies where they didn’t have that value attached to them and so a lot of them are missing and so [is] a lot of early software.
So we’re still at the point that I am being given estates of disk drives from people who used to be in user groups or some programmer will finally say “hey, look, just take my collection” and he’ll mail us his boxes of disks. From that we’ll say “wow, this is something that has never seen the light of day before. Here’s a prototype version. Here’s a version that they sold but they only sold a couple hundred copies so there aren’t really any out there.”
Gaming Illustrated: And why do you actually think that is? Why is it only recently that software and video games are being given the same respect as movies, books and music?
Jason Scott: I think part of it is that there’s always a period with a new medium or a new process that sometimes the easiest way to get it out there is to make something entertaining with it like a game or to make a comic book or to have it be a simple newspaper or something.
At some point people tend to look at them as not just entertainment, but disposable entertainment. So they end up saying “well, it’s just a bunch of comic books,” “well, it’s just a bunch of video games,” “eh, it’s just a bunch of children’s records” or whatever else and so there’s absolutely this kind of disposable feel that games have. You plug them in, you play them and then you sell them or throw them away. I mean you wouldn’t go running to the local museum and offer them and the local museum wouldn’t take them.
Now there’s always collectors, right? I mean everything’s always got collectors and certainly a lot of collectors are important components in the saving of software. Where others didn’t see that value as a professional institution, amateur collectors, or even professional collectors, would acquire items, often, you know, to be honest, for an economic angle of like “I’m going to get this rare thing and I’m going to resell it.”
And then, over time, we’ve started to see institutions say “okay, I guess this is valid. Let’s go try to find who has it.” They run into the collectors, they run into warehouses from old software stores, they run into video games that are all being kind of kept in a shed where an arcade closed. So they’re kind of drawing them back out.
But, like I said, it’s a catch up game, on every angle and then of course nobody’s ever had to deal with “well, how do you make a 1978 video game come back when it was all customized circuits,” or “how do you allow people to play a piece of software that requires a cloth map? Do you digitize the cloth map? Do you take pictures of the cloth map? Do you realize that things aren’t working and you don’t know why? Oh, it’s because there was a manual and without the manual you can’t type in this word and, you know, things are just a disk.” So they’re dealing with all of that, like, just that catch up.
But I think really almost every medium has had some variation of this. I mean the classic one people always talk about was that Dr. Who was on what was considered to be disposable media and they would just write over it. They did the same thing with early Johnny Carson. They would record the tapes but then write over them when they were done, when they didn’t think they could use them. So there’s years of shows that there’s just no example of.
Gaming Illustrated: That sort of brings up again my next point which is that, to ask sort of a rhetorical question, what exactly would the value be in trying to save specifically software and video games? What makes them so very valuable that makes them worth saving?
Jason Scott: There’s several ways of looking at it. I mean obviously one of the first ones is that people are free to ask, you know, “why is energy being put into this direction? Couldn’t the energy be put in other directions?” That’s a pretty common thing if, you know, you say “oh, saving all this software. Nobody’s doing the same amount of work for blank.”
And I don’t tend to play that zero sum game. I tend to play more of a game of “what if everything that had some amount of meaning had advocates and collectors and curators?” I often try to make it so that when I’m doing things with video games, I’m showing why this unusual medium needs to be saved in the hope that it inspires others.
And as for why in what I say, it’s that software is a cultural and historical force and one can argue that, throughout the ’80s and the ’90s, it becomes a fundamental part of our fabric. Software currently is controlling a very large percentage of our lives. It’s dictating our financial arrangements, it is moderating our social contracts and interactions. It’s absolutely where creativity and communication is primarily taking place, I could argue.
And so people are, you know, living with this whole underground or invisible system of software. You know, even of course, as simple as games they used to play, programs they used to run: things being stored in something that the software that stored it is no longer available. Without an emulator or without the ability to get to it again, you’ll lose not just that old software but the data that it represents.
There’s a whole bunch of, just to me, cultural weight that software has that puts it up there with the others and again, I don’t expect everyone to agree with me, and I don’t think that I’m taking away from other things by focusing on this. I think there’s room for everyone.
Gaming Illustrated: Very good. One final question: what exactly are the future plans in regards to the Internet Archive’s project to preserve software and gaming and what not? Just in regards to the software department in general?
Jason Scott: The main thing with the archive itself and saving items is I’m just aggressively having us ingest as much digital material as possible and that’s always ongoing. We have thousands and thousands of pieces of media that we’re pulling from and so making that so that whatever happens next is easier because you’re able to just get to it. That is absolutely critical.
Then the additional step of making it run in the browser is kind of a whole other level of technical situations. How do you make a CD-ROM mount in a browser? How do you make a DVD-ROM mount in a browser? How do you run all these different platforms and has that platform been emulated? And finding where there’s unusual cases, you know, like there’s no support for this chip set or there’s no known emulator of this machine and then kind of pushing in that direction.
So it’s kind of like a multi-front project because you’re constantly trying to work through not just the technical aspects but to convince people that this is something that they should think about and contribute to themselves if they have old material or if they have knowledge of something that’s out there. Many times I get original authors showing up and then kind of writing about why they wrote that program in the first place or people tell their stories related to it.
So, you know, that’s a Sisyphean [task] that never ends. So that continues to happen.
Gaming Illustrated: Thank you for your time.
Jason Scott: No problem. Thank you!
tags: digital preservation, interview, Jason Scott, technology, The Internet Archive