

February 26, 2020

Old MS KB Articles + WayBack Machine = MS KB Archive


Those of us old-timers who’ve been around awhile already know that the Wayback Machine is sometimes the only way to access old Web content. I’ve used it constantly over the past decade to exhume now-defunct websites, chasing after news, specifications and mailing list chatter while researching patent suits. Just yesterday, Portland, Oregon-based PKI Solutions announced it would be standing up a website called the Microsoft KB Archive (See full text of announcement). What does this all mean?

Let’s start with the backstory for the official Microsoft Knowledge Base, or KB. You’re already familiar with many parts of this huge database, because every published Windows Update item has a corresponding KB article (for example, looking at Update History in Windows Update, I see that KB4532695 corresponds to the latest 1909 Cumulative Update, which took the Windows 10 build number to 18363.628).

Why Create an MS KB Archive?

As the afore-linked PKI Solutions announcement observes, “It all started when we started to notice Microsoft was archiving/deleting Support KB articles from it’s [sic: its] site — often even when the information was still pertinent.” It goes on to discuss articles about Active Directory Certificate Services (ADCS) hotfixes, a topic of particular interest and commercial value to PKI Solutions’ core business and product offerings. The folks at PKI Solutions contacted Microsoft about this and learned that the ADCS product team was likewise unaware that these older articles were disappearing. From that discovery, the PKI Solutions project called the Microsoft KB Archive was born. It currently contains nearly 50K articles (49,434 to be exact, as of January 31, 2020). It’s searchable and well indexed, just like the official MS KB database.

As a matter of policy, the PKI Solutions KB database captures articles that are (a) no longer present in the official MS KB database and (b) still of potential relevance to sysadmins and IT pros. Such articles remain of interest either because the old content still applies to current MS products (as is sometimes the case), or because they capture valuable, meaningful information about older products that are still in commercial use (think Windows 7, Windows Server 2003 and 2008, and related Microsoft add-on products and platforms). It’s worth reading past the headlines of the PKI Solutions announcement to get a full sense of the process the team kicked off in November 2019 to recover, and make available, as much of this material as possible. Here’s how the announcement’s author characterizes the database’s current composition:

In November 2019, I started research to find possible APIs to get the list of current articles (that exist or existed on Microsoft Support website), list of products, whatever else what I can use to retrieve data. I found an API that can retrieve articles by exact number, but no APIs to get a plain list of available articles. I’ve engaged several people (Vadim Sterkin, Vasily Gusev, Alexander Sukhovey and others) to do an exhaustive search to get the list. In total we made over 5 million requests using PowerShell and retrieved about 10GB of data (including HTML and attachments). [table present in original content omitted for brevity]

In total, we got 105 thousands of files. I did a deep analysis of downloaded data, grouped, filtered, shaped data and packed into SQL database. Current database size (as of January 2020) is about 5.6GB and 4GB of attachments to articles.
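The announcement doesn’t name the lookup API or publish its PowerShell scripts, so what follows is only a minimal Python sketch of the general technique it describes. When an API can fetch an article by exact number but can’t list what exists, you probe every candidate number, keep the hits, and load them into a SQL table. The `fetch` callable, the `probe_kb_numbers` and `store` helpers, and the one-table schema are all hypothetical illustrations, not PKI Solutions’ actual code.

```python
import sqlite3
from typing import Callable, Iterable, Optional

# Hypothetical stand-in for the "get article by exact number" API the
# announcement mentions; the real endpoint is not named there.
# Returns the article body, or None when no article has that number.
Fetch = Callable[[int], Optional[str]]


def probe_kb_numbers(candidates: Iterable[int], fetch: Fetch) -> dict[int, str]:
    """Brute-force enumeration: with no list API available, request every
    candidate number and keep only the ones that return an article."""
    found: dict[int, str] = {}
    for kb in candidates:
        body = fetch(kb)
        if body is not None:  # None means "no article with this number"
            found[kb] = body
    return found


def store(conn: sqlite3.Connection, articles: dict[int, str]) -> None:
    """Pack the harvested articles into a simple (hypothetical) SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS article (kb INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO article (kb, body) VALUES (?, ?)",
        sorted(articles.items()),
    )
    conn.commit()


if __name__ == "__main__":
    # Fake "site" for demonstration: only one number in the range exists.
    fake_site = {4532695: "placeholder article text"}
    hits = probe_kb_numbers(range(4532690, 4532700), fake_site.get)
    conn = sqlite3.connect(":memory:")
    store(conn, hits)
    print(conn.execute("SELECT COUNT(*) FROM article").fetchone()[0])  # 1
```

Probing this way costs one request per candidate number whether or not an article exists, which helps explain why building the list took over 5 million requests.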

I can’t help but see this as a valuable and useful effort, particularly for those of us tasked with (or voluntarily taking on) maintenance of older Microsoft OSes and related products.

Using the MS KB Archive

You can search using text strings or KB article numbers. Note that search results show only items that have been deleted from the official Microsoft Knowledge Base, with links to “rescued” versions of those items. Given Windows 7’s recent retirement, here’s a partial listing of what came up for the search string “Windows 7”, by way of example:

[Screenshot: partial MS KB Archive search results for “Windows 7”]

MS has deleted 17 items about Windows 7 since PKI Solutions began scanning in December, of which I show 3 above. More focused searches will no doubt produce more useful results.

Microsoft has to manage a HUGE corpus of information and documents, so I’m not surprised to learn that some items, especially older ones, might disappear. Presumably, this number will keep growing as more and more items “age out” of the official MS KB database and its Docs collection. Interestingly, our own Kari the Finn wrote about a related problem for the Microsoft Docs collection: too much old material showing up that hasn’t been vetted for applicability to Windows 10. See his musings at Microsoft Docs — Excellent, but Imperfect.

Looks like Microsoft has its hands more than full trying to keep up with this stuff. As usual, it’s also stuck between the devil (keeping too much stuff around, or offering up irrelevant stuff) and the deep blue sea (getting rid of stuff that’s still of interest and use to those responsible for keeping older products up and running, even past EOL). Hopefully, PKI Solutions’ MS KB Archive will help with that. If you can’t find what you seek through an MS KB or Docs search, turn to the MS KB Archive next, before giving up. This is a great resource, and I wish the PKI Solutions team much luck, and every possible success, with this venture. Should funding become an issue, I (and many of our readers here, I imagine) would be happy to support a crowdfunding effort.

[Note added 1 hour after the original post went up] Thanks to a comment on this article from Mark Cooper (the PKI Guy), I corrected some misapprehensions about the tool’s inner workings and outputs. It’s always useful to hear from the original source when the details don’t add up properly. Now I believe they do, and readers will be well served by my updated description. Instead of counting doughnuts, it seems I should have counted holes. That is, I should have understood that the MS KB Archive surfaces items that were originally present in the official MS KB but are no longer included there. That’s what makes it work as a rescue tool. Good stuff!
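Put another way, the archive’s search results are the “holes”: a set difference between the baseline scan (taken in December, per ThePKIGuy’s comment below) and what Microsoft still publishes, filtered for relevance. Here’s a minimal Python sketch of that idea. The function name and the `still_relevant` predicate are hypothetical, since the announcement doesn’t say how relevance is judged.

```python
from typing import Callable, Iterable, Set


def rescued_articles(
    baseline: Iterable[int],
    current: Iterable[int],
    still_relevant: Callable[[int], bool] = lambda kb: True,
) -> Set[int]:
    """Return KB numbers seen in the baseline scan but absent from the
    current official KB (the 'holes'), keeping only relevant ones.
    The relevance test is a hypothetical placeholder."""
    current_set = set(current)  # set membership makes each lookup O(1)
    return {
        kb for kb in baseline
        if kb not in current_set and still_relevant(kb)
    }


# Toy data: four articles existed at baseline; two are still published.
baseline_scan = [1001, 1002, 1003, 1004]
official_today = [1001, 1003]
print(sorted(rescued_articles(baseline_scan, official_today)))  # [1002, 1004]

# A hypothetical relevance filter that drops article 1004.
print(sorted(rescued_articles(baseline_scan, official_today,
                              still_relevant=lambda kb: kb != 1004)))  # [1002]
```

Counting the holes rather than the doughnuts is why a search such as “Windows 7” returns only deleted items rather than everything Microsoft has ever published on the topic.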

Author: Ed Tittel

Ed Tittel is a 30-plus-year computer industry veteran. He’s a Princeton and multiple University of Texas graduate who’s worked in IT since 1981 when he started his first programming job. Over the past three decades he’s also worked as a manager, technical evangelist, consultant, trainer, and an expert witness. See his professional bio for all the details.

2 Responses “Old MS KB Articles + WayBack Machine = MS KB Archive”

  1. ThePKIGuy
    January 31, 2020 at 17:07

    Thanks Ed, we appreciate you sharing the details of this project. On the issue of your Windows 7 search, at this time our dataset is based on information in the Microsoft system as of December when we started scanning the Microsoft KB site. What is shown in the tool is what has been deleted since we started our scanning in December, thus the count of 17 articles for Windows 7. We hope to use alternate sources to begin to backfill information for deleted articles prior to our initial scan.

  2. January 31, 2020 at 17:59

    Cool! Thanks for explaining, and good to know. Keep up the good work, and let us know if we can help out somehow. Certainly, you can count on us to help spread the word. Thanks again, –Ed–

