SIGIR Forum 22, No. 1-2 (Fall 1987/Winter 1988) ISSN 0163-5840

Toward Hypertext Publishing

Issues And Choices In Database Design

by Robin Hanson

August 3, 1987

Abstract

Hypertext publishing, the integration of a large body (perhaps billions) of public writings into a unified hypertext environment, will require the simultaneous solution of problems involving very wide database distribution, royalties, freedom of speech, and privacy. This paper describes these problems and presents, for criticism and discussion, an abstract design which seems to solve many of them. This design, called LinkText, is presented both as a specification and as design approaches grouped around various levels of electronic publishing.

Introduction

Decades ago, visionaries [2,8,20] spread dreams of "hypertext", a universe of writings connected by "links" which allow a reader to jump from one writing to any number of associated writings. Reading and following links was to be cheap and easy with the aid of advanced technology. To the delight of many observers, these dreams have finally borne fruit in working prototypes and products, such as [4,5,9,11,12,26]. But most of this work has focused on managing the writings of an individual or small group, what I will call "local hypertext" (exceptions include [9,11,24,28]). This was the right thing to do, as it allowed rapid feedback for making improvements, but attention will soon need to return to the more ambitious dream of hypertext publishing.

Hypertext publishing is the integration of an indefinitely large body of writings (and perhaps non-text media also) into a unified, connected hypertext environment. Research on local hypertext has dealt with issues such as finding visual metaphors for links, choosing node granularity, the need for navigation support, how to do versions, and what should happen to links when text is modified. Hypertext publishing will, in addition, require approaches for dealing with copyright and royalties, freedom of speech, the need to filter from view most of what is freely spoken, rights to privacy, the inevitability of standards, the need for generality to support both a wide range of uses and smooth integration with non-publishing databases, and very large geographic, temporal, and organizational distribution of data structures.

Research into hypertext publishing is difficult both because experiments with large systems are harder and because many of the new research issues seem to take the form of new constraints, such as "must withstand malicious agents" and "must discourage government spying on who reads what". When a design problem contains many interdependent constraints in a domain where it is expensive to experiment, it can be useful to seek, at a manageable level of abstraction (or detail), a specific design which can meet most of the proposed constraints. Once found, such a design can serve as a point of reference and "strawman" both for creating alternatives and for debating the value of the trade-offs they embody.

This paper will present an abstract design, called "LinkText", for the database 1 component of a hypertext publishing medium. It is my hope that not only will readers find this design useful for thinking about the trade-offs in hypertext publishing, but that some will be convinced, as I am, that hypertext publishing is feasible soon.

The LinkText design is intended mainly to highlight possibilities and issues. To simplify the design task, I have therefore focused on a core set of interdependent issues at the expense of other, rather independent and hence "modular", issues. In particular, this design focuses on the database component of publishing, as opposed to, for example, the user interface component.

The appendix to this paper will present this design in a specification-like format. The body of the paper will explain the basic ideas of the design, organized around a set of levels of functionality. After introducing the levels, I will discuss database features desired at each level, the design choices which make these features possible, and then discuss the design as a whole.

Levels Of Electronic Publishing

While hypertext publishing can be viewed as an extension of local hypertext, it is also useful to think of it as an extension to other electronic 2 publishing. And since even hypertext can be improved upon, hypertext is only one among many possible levels of electronic publishing. The following story will introduce these levels:

Jane Smith wrote and signed "What I Did On My Summer Vacation" for her third grade teacher. She got an A+, and her mother showed it to all of Jane's relatives who visited that Christmas, making copies for those who liked it.

In college, Jane expanded her essay into a book entitled "Summer of Passion", which sold well at grocery stores. Copyright laws ensured that Jane got a percentage of the sales, and privacy rules ensured that people who "don't read that sort of trash" could buy it without their friends finding out.

As a graduate student in sociology, Jane worked material from her book into "Seasonal Variability in Couple Dynamics: An Empirical Survey", accepted for publication in Sociological Review. She dutifully referenced other relevant papers, and many of her readers followed these references in order to expand their understanding of the subject.

In a fit of sentimentality on a Saturday in 2010, Jane thought about that old sociology paper. She turned to her computer, always connected to a local hypertext distributor. After browsing a menu of kinds of items referencing her paper, she chose "evaluations". She then saw that there were 27 textual evaluations and 210 votes cast (on a scale of 1 to 10). Jane's computer quickly calculated that these votes put her in the 78th percentile of similar articles. She typed out a short response to one of the textual evaluations, voted on 6 of them, and then went back to playing with her cat.

Two weeks later, Jane was reading a back issue of the Sociological Review and noticed the phrase "paradise fantasy", a term which she had introduced in her paper. Curious about the popularity of this phrase, she inquired and found 32 uses of it in contexts which referenced her paper. Browsing them, she found that the meaning of the phrase had drifted, so she published a short explanation about the meaning she had intended.

Three years later, a trigger she had set notified her of a marked increase in interest in her paper, which she traced to an article on the subject in Psychology Today.

Ten years after that, her new automated literature assistant, Charles, noticed one response to her paper that she might be especially interested in, and presented it to her. "Hmmm!" Jane muttered to Charles, and called the author for a chat.

Each of the times in Jane's life mentioned above corresponds to a certain level of electronic publishing. Higher levels have more features but represent more ambitious proposals, typically requiring that the publishing community agree on more standards.

Database Features At Each Level

Higher levels of electronic publishing require more sophisticated databases. This section will describe database features desired at each level, attempting to make them more vivid through the use of a set of physical analogies sketched in Figure 1.

Clippings Just as Jane composed her third grade assignment, most people can write or make a tape recording. These compositions, as well as those broadcast by mass media, can be easily rearranged, copied, and distributed to one's friends and acquaintances. One can clip an article out of a newspaper or record segments of TV or radio. These (usually linear) clippings can be viewed as "content" stored in "bags". Bags, like cassette tapes, have the handy property that if one pours the content of one bag into another, both bags will have that content. Bags are more convenient if they can integrate multiple media, as TV integrates sound and video, or if they can include the name of an author and an authenticating signature. In general, one wants the cost and time delay to publish an item (including bureaucratic costs and delays) to be as small as possible, in order to enable both more publishing and the publishing of smaller ideas. The basic idea of publishing as "trying to bring to public attention" must be preserved, however. Freedom of speech, the right to circulate ideas by distributing bags, must also be preserved.

Mail When a bag can be at a "place", and one can use a mail system to send a bag to a desired place, then new modes of use become possible. Journals can send issues to members of a distribution list. When money can be mailed and distribution centers set up, like VCR rental stores today, then a royalty can be collected and sent to the author whenever a new person is allowed to see the inside of a bag; Jane can sell her book at grocery stores. This selling of access should preserve concepts of privacy, avoid government spying on who reads what, and support anonymous authorship. Electronic mail already exists, as do limited forms of electronic distribution with royalties.

References Present academic journals, like Jane's Sociological Review, make heavy use of a feature not used much in the mass media: references. Instead of making bags of content which copy pieces of the contents of other bags, each bag has a unique numbered marker on its outside. A bag can include, or refer to, another bag by including a copy of the marker of that bag in with its content. Bags which mostly contain references, such as bibliographies, can help organize the published literature. To make these markers useful, there must be a fast reliable way to find the indicated bag when one has its marker, i.e., to "follow a reference". There should also be a reliable way to verify claims about who published what when.

Hypertext 3 It can be of interest to find out what bags are referencing a particular bag, i.e., to "backfollow" a reference. At present there are aids, such as the Science Citation Index, but the process is expensive. What if the effort to backfollow a reference could be made very low? It would be as if the marker on each bag had infinitely stretchable strings connecting that marker to all of its copies within other bags [6 p. 218-220]. As a database, this low effort would come from adding an extra indexing pointer, called a "backreference", for every reference pointer. To backfollow a reference, one follows the backreference.
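
As a rough illustration of the indexing idea just described, the following Python sketch stores one backreference for every reference, so that backfollowing costs no more than following. The class name, method names, and bag markers are hypothetical and chosen only for this example; the sketch also ignores distribution across places.

```python
from collections import defaultdict


class ReferenceIndex:
    """Stores each reference together with its matching backreference."""

    def __init__(self):
        self.references = defaultdict(set)      # bag marker -> markers it contains
        self.backreferences = defaultdict(set)  # bag marker -> markers of bags citing it

    def add_reference(self, source, target):
        # One forward pointer and one extra indexing pointer per reference.
        self.references[source].add(target)
        self.backreferences[target].add(source)

    def follow(self, source):
        return sorted(self.references.get(source, ()))

    def backfollow(self, target):
        return sorted(self.backreferences.get(target, ()))


index = ReferenceIndex()
index.add_reference("review-1", "jane-paper")
index.add_reference("survey-7", "jane-paper")
print(index.backfollow("jane-paper"))  # ['review-1', 'survey-7']
```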

Backreferences can allow a "right to respond", advocated by John McCarthy [18], where anyone can make a response to an item such that the response can be easily found from that item. This facility can be used to connect a bag to criticism, evaluations, categorizations, and alternative versions. Easily found criticism could aid the evolution of ideas [7] by allowing solid criticism to halt the spread of bogus but popular ideas. Easily found evaluations could allow a reader to instruct his/her personal computer to filter out from view all but the bags most highly evaluated by people he/she respects [16,17]. This feature is crucial to managing the tremendous amounts of "garbage" that free speech and response would produce.

A third party could associate two already existing works by the use of "links". Imagine that a person notices that a well known bag w contains an argument that rebuts a claim made in a new bag n. Then that person might create a small bag with just one link l, consisting of three references: one to n, one to w, and one to the general link type "rebuttal". If backfollowing was easy, then readers of n could backfollow the reference from l to n, arrive at l, see that l's type was "rebuttal", follow the other reference to w, and read the well known rebuttal.
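
The rebuttal example can be sketched in a few lines of Python. This is only one possible reading: the Link type, its field names, and the treatment of a link's two ends as symmetric are assumptions made for the example, not part of the design.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class Link(NamedTuple):
    name: str
    ends: Tuple[str, str]  # the two works being associated
    link_type: str         # a reference to a general link type


backreferences = defaultdict(list)  # referenced name -> links that reference it


def publish_link(link):
    # The link references both ends and its type, so index it under all three.
    for referenced in (*link.ends, link.link_type):
        backreferences[referenced].append(link)


def follow_links(work, link_type):
    """Backfollow from `work` to links of one type, then follow to the other end."""
    found = []
    for link in backreferences.get(work, []):
        if link.link_type == link_type and work in link.ends:
            found.extend(end for end in link.ends if end != work)
    return found


publish_link(Link("l", ("n", "w"), "rebuttal"))
print(follow_links("n", "rebuttal"))  # ['w']
```

Because the link type is itself indexed, backfollowing from "rebuttal" would list every rebuttal link known at this place.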

This whole process could be described as "following" a link, similar to the way one follows a reference. In fact, the concepts of a reference and a link can be unified by letting references have types and considering them to be a special case of links. 4 This allows user interfaces to uniformly present users with the option of following various types of links from a particular point. Using links to connect sections of text is a core idea in the original hypertext visions [2,8,20].

Uniform Objects 5 The concepts of a bag's content and its reference markers can also be unified, making the resulting database more expressive and amenable to automation. Imagine breaking Jane's paper up into many small objects representing characters, words, sentences, paragraphs, sections, etc. (See Figure 3). If these objects are individually referenceable, then other bags could, instead of referencing a whole bag, directly reference a variety of specific text points and regions within a bag. If we extend the concept of a reference so that, for example, a section specifies which paragraphs it contains by "referencing" the objects which represent those paragraphs, then alternative versions of a text can reference, and hence reuse, objects in the original text. Backreferences on heavily reused individual words and phrases allow keyword searches like those in present "information retrieval" systems [23]. Since references can be viewed as links, text is built from links, suggesting the name "LinkText" chosen for the design presented in this paper. In summary, text is now many small objects and links between them; one can make a link to a word or a chapter, and links to objects that a text revision leaves unchanged carry over to the revised version.

For a physical analogy, imagine that instead of a thick bag keeping its powdery contents from spilling onto the floor, we have a flimsy plastic bag containing many small rings, each of which has a few protruding hooks connecting it to other rings. The rings are the small objects, the hooks are the references which connect them, and a hook which pierces one plastic bag to connect to a ring in another bag is what we used to call a "reference". References are themselves objects; if you try to hook a hook itself, it sprouts a ring to oblige you!

(To implement all this, each of the small objects which a large piece of text is broken up into can have a memory location which contains two sets of pointers, one for the object's references and the other for its backreferences.)
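
A minimal Python sketch of such objects follows, assuming a hypothetical Node class that keeps both pointer sets. It shows a revised paragraph reusing a sentence from the original, and a keyword search done by backfollowing from a shared word object.

```python
class Node:
    """A small object holding both of its pointer sets."""

    def __init__(self, label, parts=()):
        self.label = label
        self.references = list(parts)  # objects this one is built from
        self.backreferences = []       # objects built from this one
        for part in self.references:
            part.backreferences.append(self)


def containing(node):
    """Backfollow one step: which objects are built from this one?"""
    return [parent.label for parent in node.backreferences]


paradise, fantasy, drift = Node("paradise"), Node("fantasy"), Node("drift")
s1 = Node("sentence-1", [paradise, fantasy])
s2 = Node("sentence-2", [fantasy, drift])
original = Node("paragraph-v1", [s1])
revised = Node("paragraph-v2", [s1, s2])  # reuses s1 rather than copying it

print(containing(fantasy))  # ['sentence-1', 'sentence-2']
print(containing(s1))       # ['paragraph-v1', 'paragraph-v2']
```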

By using a single basic building block and method of combination, this level uniformly represents all data, and therefore gains a broad expressiveness, generality, and cleanliness which aids the automation of many tasks. By naturally handling small structured objects, this level allows one to publish standard database-type information, and could be the foundation for a new "knowledge medium" [25].

Sensors Jane found it convenient to keep abreast of things by placing little "sensors" amid these many little objects, sensors which can check when certain conditions are met and notice when new things happen. The database concept which can embody the physical concept of a sensor is a "pattern".

Many databases allow a user to ask a question by presenting a pattern; the database answers by returning instances which match the pattern. This query or pattern language can provide a wide variety of uses, especially if patterns can themselves be database objects. A "trigger" pattern can be set up so that a person, or computer program, is notified when a change to the publishing database matches that trigger pattern. Abstract places can be defined as containing all bags in a certain place which match a certain pattern. Patterns can specify semantic constraints the database should meet. "Virtual" objects, which appear to be present but which are not explicitly stored unless needed to answer a user's request, can be specified with patterns.
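
To make the query and trigger uses concrete, here is a small Python sketch. It assumes, purely for illustration, that a pattern is an ordinary predicate function and that published objects are dictionaries; a real pattern language, with patterns stored as database objects, would be richer. All names are hypothetical.

```python
class PublishingDatabase:
    def __init__(self):
        self.objects = []
        self.triggers = []  # (pattern, notify) pairs

    def query(self, pattern):
        """Return the stored instances that match the pattern."""
        return [obj for obj in self.objects if pattern(obj)]

    def set_trigger(self, pattern, notify):
        self.triggers.append((pattern, notify))

    def publish(self, obj):
        self.objects.append(obj)
        # A trigger fires when a change to the database matches its pattern.
        for pattern, notify in self.triggers:
            if pattern(obj):
                notify(obj)


def cites_jane(obj):
    return "jane-paper" in obj.get("references", ())


db = PublishingDatabase()
inbox = []
db.set_trigger(cites_jane, inbox.append)
db.publish({"name": "response-1", "references": ["jane-paper"]})
db.publish({"name": "unrelated", "references": []})
print([obj["name"] for obj in inbox])  # ['response-1']
```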

Assistants The assistant level is a catch-all for the very wide range of functionality possible when one can have little computational assistants, like Jane's Charles, running around doing one's bidding. These agents might reside "at" distributor places for high bandwidth access to the local database hardware (making filtering cheaper), negotiate with other agents for useful information, move from place to place, or even be published themselves [15,19].

Design Choices At Each Level

In order to satisfy the many constraints implicit in the above desired database features, any design must make numerous specific choices. This section will try to convey, for each level, both the basic design issues and conflicts, and the choices this design has made to resolve them. To emphasize the distinction between desired features and design choices, published items, called "bags" in the last section, are called "works" in this section.

A pervasive issue in this design is that each solution to a problem must scale up well to the very large and widely distributed publishing universe that the intrinsic cheapness of new technologies could spawn. The issue is important because a great many of the solutions which come easily to mind will work well only in a localized system. Even approaches which work in present publishing may not scale well to wider distribution.

On the other hand, many design issues have the fortunate property of being "modular", meaning that there are a number of alternative solutions and it does not matter much for the rest of the design which of these solutions is chosen. As this design is focused on finding a way to simultaneously satisfy many conflicting design goals, modular issues, once identified, will not be focused on in the following discussion.

As mentioned before, this design focuses mainly on the database component of publishing, both for modularity reasons, and because this seems the part most likely to standardize first. Of course, other issues, such as the user interface, should be considered in the design of the database. For example, database support for "where you've been" trails and tree-based forms of organization can help the user navigate the database without feeling lost [4 p.48-50]. Many such issues have been taken into consideration, but I welcome feedback about issues that may have been slighted.

The reader may notice that the "levels" are not strict boundaries. Under a close analysis, many higher level features are implicitly present in the lower levels. The difference is the degree of support provided for such features. This design gives the most attention to the hypertext and uniform object levels, and the least attention to the clippings and assistant levels.

Clippings

To enable the clippings level, there need to be standards for formatting content, enforcement of free speech, and a cryptographic convention for digital signatures.

Format Standards To allow people to exchange works, which are bags of content, the publishing community must agree on formats for content (such as ASCII for text or PostScript for graphics) or at least on a small set of intertranslatable formats. These formats can be viewed as a small set of atoms, like characters, and a (usually sequential) way of composing them. No specific standards are proposed -- the issue is modular as long as the formats are few and intertranslatable.

Freedom of Speech One should in general have the right to create and distribute any expressible content. I advocate an absolutist interpretation of this, resisting exceptions such as for pornography, libel, and national security, and agree with [21] that it is crucial that freedom of speech be interpreted as applying to electronic publishing.

Cryptographic Signatures There are various cryptographic means to gain cheap reliable authentication of the author and contents of a digitally represented work, including public key cryptography on hashes of works [3]. The issue is modular, and no choice is made here.

Mail

At the mail level, named works can be mailed to named places. Distributor places sell access to variably priced public works, with various tricks to reduce communication costs.

Piggyback on Other Mail System This design will not offer a design for a mail facility. I presume that a mail system with certain capabilities exists and describe how publishing can take advantage of it. The mail system must allow an agent to send a private message to another named agent by specifying only the recipient's mail name. The mail system is responsible for verifying the identities of both parties, should allow money to be mailed, and should allow a response to be associated with a request via some message identifier. It would be best if the mail system were composed of numerous independent organizations so that one could preserve privacy by choosing a carrier one trusted. Sending multiple copies of a message through different carriers should ensure message accuracy at the expense of privacy. The mail system should support anonymity by allowing one to send and receive messages under an alias name over a short or long duration.

Places "Places" are what one uses to express the idea of copy distribution. There are many places, some of which overlap, and each work is or is not at each given place at any given time. In practice, most places will be ordinary databases on specific machines.

Unique Names Each place has a unique name which is its mail address. Works also have globally unique names, easily generated locally by appending the name of the place with a number from a locally incremented counter.
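
The naming rule is simple enough to sketch directly. In this Python fragment the "/" separator and the example place name are assumptions; the text above fixes only that a work's name is the place name followed by a local counter value.

```python
import itertools


class Place:
    def __init__(self, mail_name):
        self.mail_name = mail_name          # unique name, doubling as mail address
        self._counter = itertools.count(1)  # locally incremented counter

    def new_work_name(self):
        # Place names are unique, so no coordination with other places is needed.
        return f"{self.mail_name}/{next(self._counter)}"


library = Place("main-street-library")
print(library.new_work_name())  # main-street-library/1
print(library.new_work_name())  # main-street-library/2
```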

Distributors To make sure that authors receive compensation for their creative work, one could make a law stating that whenever one privately copies a work, one should send the author his/her requested royalty amount. But most people may not follow this honor system, even if the personal effort required to comply could be dramatically reduced. Instead, I encourage the use of public distributors [22], whose actions can be policed. A distributor must pay the author each time a new customer is given access to the content of a work. Payment is made by mailing money when more than a certain fixed amount has accrued. Violations of this rule can be proved by showing that no royalties were sent even though access was given to anonymous auditors. Of course people might not use distributors unless they are cheaper or more convenient than getting copies from one's pool of friends. Therefore a distributor should either have a larger selection of items, or it must organize them better. 6

Sessions All communication between people and distributors happens through a uniform mail interface, but need not happen through an official mail system. Distributors can allow direct connect "sessions" which are abstractly described in terms of mail messages. In this case the distributor takes on the task of identifying users when necessary.

Pay Per Person Access In this design, the unit of purchase is the giving of one person access to one work, instead of the connect-time-per-database schemes used by information retrieval services such as DIALOG. An author chooses an asking price, the distributor adds a markup for storage costs, and a reader who chooses to must pay the total price for access. Other costs, such as CPU time or connection line time, should be charged for separately. This market solution, where people buy and sell at different prices, robustly ensures that authors receive compensation, and avoids creating spurious market incentives.

To make this work, user interfaces must communicate to users the costs of various possible actions. This might be done by flagging expensive yet directly executable actions, and by displaying the cost of a session so far.

It would be nice if one could buy access to a work only once instead of repeatedly at each new distributor, but I have not found a robust way to allow this. Also, while an author can be a composite person, representing a group of coauthors, a purchaser of access should be an individual person.
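
The pricing and royalty rules above can be sketched as follows. This is a hypothetical Python illustration, not a specification: amounts are integer cents, the class and method names are invented, and mailing money is reduced to appending to a list. The royalty is mailed only once more than a fixed amount has accrued, as the Distributors paragraph describes.

```python
class Distributor:
    def __init__(self, markup, payout_threshold):
        self.markup = markup                    # distributor's storage markup
        self.payout_threshold = payout_threshold
        self.catalog = {}                       # work -> (author, asking price)
        self.accrued = {}                       # author -> royalties not yet mailed
        self.mailed = []                        # (author, amount) payments sent

    def stock(self, work, author, asking_price):
        self.catalog[work] = (author, asking_price)

    def price(self, work):
        # Author's asking price plus the distributor's markup.
        return self.catalog[work][1] + self.markup

    def sell_access(self, work, cash):
        """Sell one person access to one work; return the change."""
        total = self.price(work)
        if cash < total:
            raise ValueError("not enough cash")
        author, asking_price = self.catalog[work]
        self.accrued[author] = self.accrued.get(author, 0) + asking_price
        if self.accrued[author] > self.payout_threshold:
            self.mailed.append((author, self.accrued[author]))
            self.accrued[author] = 0
        return cash - total


shop = Distributor(markup=5, payout_threshold=100)
shop.stock("summer-of-passion", "jane", asking_price=60)
change = shop.sell_access("summer-of-passion", cash=100)
```

Note that sell_access never asks who the buyer is; the sketch needs only the cash offered.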

Anonymous Reading In order to discourage spying, by governments or others, on who reads what, readers should not need to identify themselves in typical reading sessions. While there are cryptographic identification methods, this design opts for the simplicity of no identification at all. The following design choices support anonymous reading, as did insisting that the mail system allow anonymous messages.

One Price for All If everyone pays the same price, no identification is needed for pricing.

Cash If a person has an account or line of credit at a place, then identification is needed. Therefore, it should be convenient for readers to pay with "cash" on the spot for each reading session. A typical scenario is to deposit an amount of cash, do some reading, and then withdraw the balance. There are many options for a standard representation for electronic money, such as [3]; the issue is modular.

All Works Public Access to works could be restricted by including in a work a specification of who is allowed to read it. For those not allowed, a distributor could act as if the work did not exist. However, restricted access requires reader identification, conflicts with choices at higher levels of this design, and seems in conflict with the basic idea of "publishing". In this design, all works are public.

Tokens Long-lived records of who has read what are even worse than temporary reader identification. But what if a person wants to access an item many times? He/she might have to either repeatedly pay the price or store a private copy. A modular solution is to allow distributors to generate a random "purchase token" string and both give it to the buyer and store it locally. Then the distributor would give access for free to anyone who could later produce this token, exchanging it for a new one to discourage token copying. (Authors who wished could disallow the use of tokens.) Because a token says nothing about who holds it, spies who break into the distributor's private records could not identify readers. Tokens are not published. A small charge for token storage is acceptable.
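
A minimal sketch of the token exchange, assuming Python's standard secrets module for the random strings; the class and method names are hypothetical. The distributor's record maps tokens to works and nothing else.

```python
import secrets


class TokenDesk:
    def __init__(self):
        self._valid = {}  # token -> work; no reader identity is stored

    def issue(self, work):
        token = secrets.token_hex(16)  # random purchase token
        self._valid[token] = work
        return token

    def redeem(self, token):
        """Give access and swap the token for a fresh one.

        Raises KeyError if the token is unknown or already used.
        """
        work = self._valid.pop(token)
        return work, self.issue(work)


desk = TokenDesk()
first = desk.issue("summer-of-passion")
work, second = desk.redeem(first)
```

Since each token is retired on use, a copied token is of value only to whoever presents it first.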

Decryption Keys A variation on the above scheme, analogous to "prefetching", can help reduce the communication bottleneck of dealing with a distributor. One may at any time obtain for free an encrypted version of a work, together with an associated key identifier. At a later time, one can present the identifier, buy the work, and be sent only a decryption key which can decrypt the work. Decryption keys are valid only once and may incur storage costs. This is also a modular design choice.

It is not clear to me how one can both typically read anonymously and typically get a receipt describing a purchase.

Anonymous Authors The mail system's support for anonymous messages allows anonymous authors who still receive royalties. Most authors, however, will probably want to use their true name to establish or capitalize on a reputation.

References

References are easily implemented by having works include an ID string, the unique name of the referenced work. Freedom of speech is straightforwardly extended: allow any combination of content and references. The big problem is making reference following easy and reliable.

Note that in a widely distributed system it is not possible to ensure, as one would like [4 p.33], that all references can be followed in less than two seconds. User interfaces are responsible for communicating to users that some directly executable operations, such as following a reference, may take a long time to complete.

Index on ID Imagine that one has been reading a work at a place, found a contained ID of another work, and would like to find that other work. First one wants to see if the work is at this place. To make this fast, each place is expected to keep an index mapping IDs to contained works. A local pointer implementation can make access faster.

Stubs If a work at this place has a reference to this ID, but does not contain the corresponding work, the ID index should map to a "stub", as in [14], which can centralize a place's hints about other places the work might be. Stubs are necessary; requiring each place to contain all works referenced by any local work is quite impractical in a well-connected literature.

Homes What if the local stub has no hints about where to find the work? One might try to ensure, as some do now, that most works are at large "libraries" near to almost everyone. But this approach does not scale well in a distributed system. Instead, each work can have embedded in its ID the name of its "home", the distributor where it was first published. If one could not find the work at any nearby places, one could reliably find it at its home, since it will probably be there if it is anywhere. This centralizing of responsibility allows non-home places to delete the work without notifying anyone. Embedding the home in the ID makes sure one cannot have an ID without knowing the home, but requires that a work cannot change homes.
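
The lookup order described in the last three paragraphs (local index, then stub hints, then home) might be sketched as below. The ID layout "home name/counter", the split of the ID index into two dictionaries, and all names are assumptions made for this Python example; network delays and failures are ignored.

```python
def home_of(work_id):
    # Assumed ID layout: "<home name>/<local counter>".
    return work_id.rpartition("/")[0]


class Place:
    def __init__(self, name):
        self.name = name
        self.works = {}  # work ID -> content held at this place
        self.stubs = {}  # work ID -> names of places that might hold the work


def find(work_id, start, places):
    """Try the local index, then the stub's hints, then the home."""
    if work_id in start.works:
        return start.name, start.works[work_id]
    candidates = list(start.stubs.get(work_id, [])) + [home_of(work_id)]
    for name in candidates:
        place = places.get(name)
        if place is not None and work_id in place.works:
            return name, place.works[work_id]
    return None


home = Place("review-press")
home.works["review-press/42"] = "Seasonal Variability in Couple Dynamics"
cafe = Place("corner-cafe")
cafe.stubs["review-press/42"] = ["old-mirror"]  # a stale hint
places = {p.name: p for p in (home, cafe, Place("old-mirror"))}
print(find("review-press/42", cafe, places)[0])  # review-press
```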

Permanence If it is further required that works be permanent at their home, then there will be little risk of unfollowable references. To lower costs, the following exception is acceptable: if the ID of a work has never passed the boundary of a place, a local garbage collection algorithm can prove that there are no local references to it, and if its author so requests, then that work may be deleted, and some money refunded. Total permanence cannot be guaranteed, but can be approached with storage redundancy and write-once media.

Bonded Established Homes But what if a home goes out of business, or cuts corners to save money? Homes need to be long-lived "established" entities which the publishing community can trust. But how does one ever become a home in the first place? A solution is to require a home to post a bond guaranteeing permanence, paying an amount at least sufficient to finance an attempt to reconstruct the home from copies sent elsewhere. This bond can be obtained from an insurance company if necessary.

A home also needs to be established so that people can trust its claims about "priority", i.e., who published what when, and therefore had what idea first. Present academic media exist in large part to validate such claims. A home should also post a bond promising to pay a very large amount should it ever be caught falsifying the author, publication date, or content of a work. Of course, distributing copies of a work to diverse organizations who record the arrival time is the most reliable way to ensure that claims of priority will be believed.

It is all right if the management or geographic location of a home changes, as long as its mail name and contents remain constant. People who now run bookstores and video rental stores should be able to afford to create homes, but it is important that homes be reliable. Note that a distributor need not be a home, and people who now run bulletin board systems should be able to create them.

Home Has Copy of Referenced Work The chances of finding a work are increased by requiring that the author of a work pay to bring all the works it references to that work's home (unless someone else has already paid). Following a reference from the home of a work is then fast. This design choice is modular, except that it conflicts with having restricted-access works; the author of a private work must trust the homes of all works which reference his/her work. The required distribution of copies aids permanence.

The costs of the above rules unfortunately raise the cost of publishing, and therefore the threshold at which one is willing to publish. But they are justified if the meaning of "to publish" implies that the author has a responsibility to make a work and the works it references easy to find.

Constancy Just as the semantics of a programming language are clearer without assignment [1 p. 175-180], the semantics of a publishing reference are clearer when works do not change. Work constancy fits present publishing well, and avoids the problem of a criticism being made to look stupid by changes in the item criticized. Also, consistently informing copies and references about changes becomes increasingly difficult with wider distribution. Constancy allows one to put a group of works on a CD-ROM disk, leave it in a drawer for three years, and then use it without any consistency worries. The choice of a static metaphor as primary does not strictly limit expressiveness; it is possible to make updates of works, they are just new different works that can reference the original. Note that while the references from a work will not change, the references to it will.

Wrappers Also, one may indicate an intention to reference the most recent version of a work by referencing a small "wrapper" which references both a specific version and a specification of how to find the most recent version. Wrappers are what one wraps around a reference to indicate how that reference should be treated. Wrappers can be used to allow references to non-published items; one references a wrapper which gives the name of the non-published item, and perhaps also some description of how to find it, such as its "home". This same technique can be used to reference a yet-to-be-published work. Note that while the wrapper will be permanent, a non-published item might not be, and so the wrapped reference might "dangle".

Hypertext

As this is the first level that is not in wide use now, the publishing standards required to make this level work may seem strange. Nonetheless, I think they are reasonable and worth the effort. The major task at this level is to ensure that backfollowing a reference is cheap and reliable.

It is clear that each copy of a work should locally mention what works it references. It is not clear that all copies of a work should have associated local information mentioning all works that reference it, i.e., all backreferences. There would need to be a centrally accessible way to find all public copies of a work (exempting private copies to preserve privacy) in order to install new backreferences on them. This would aid a government seeking to eliminate a "subversive" work. Also, the overhead for a small unpopular comment on a very popular, and hence widely distributed, work, such as a part of the Bible, becomes enormous. This is part of the general quality problem with backreferences. One might reasonably assume that works referenced by a better-than-average work are themselves better than average. But the works that reference it will be of only average quality, and will be much more numerous.

If not all copies have all backreferences, then there must be some combination of rules saying where backreferences must be, and rights saying who may choose what backreferences go where. There are in fact many potential conflicts of interest about where backreferences should go. Imagine that y references x. The owner of y may want y to be easily found from x, but the owner of x may want to eliminate clutter and negative criticism. Conversely, the owner of x may want to find all criticism so that he/she can respond to it, and the owner of y may want to avoid such critical responses. The owner of a place may want to be able to pick and choose the place's contents in order to attract a certain clientele. Readers trying to deal with the quality problem may want to read evaluations of a work (or have their programs read) before deciding whether to buy access to it. This design has chosen to forgo the total backreference solution in favor of a combination of rights and responsibilities.

Untraced Distribution To make it hard for a government to find all public copies of a work and to make it easy to move copies, distribution is untraced. Thus one cannot ask a work what places it is at, even though one can ask a place what works are there. However, this design will make the following copies easy to find: those at the homes of works which reference or are referenced by the work in question.

Right to Backfollow References This design will settle conflicts of interest by favoring those who want to find criticism or make criticism easy to find. By saying where backreferences must be, the following rules ensure that with a predictable and reasonable effort anyone can backfollow all references to any work. Imagine that, while at a place, one reads a work and would like to find works which reference it.

Local Backreference Pointers To make it easy to find other works at the same place, each place should manage a set of pointers associated with each work or stub the place contains; the pointers of a work at a place point to all works at that place which reference that work. When references have different types, these pointers should be organized by reference type. These pointers can then be used to answer backfollow queries within this place. Note that users need not think in terms of backreferences; they are just an implementation trick to allow faster response to queries about references. A pointer implementation is not actually required if, ignoring royalties, it is almost as easy to backfollow a reference as to follow it. Small places can and should preserve a simple consistency rule in answering queries: a reference appears at both ends or neither end within a place.
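The local pointer scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Place` class, its method names, and the use of string IDs with (reference type, target) pairs are hypothetical and not part of the design.

```python
from collections import defaultdict


class Place:
    """Hypothetical in-memory place holding works and their references."""

    def __init__(self):
        # work_id -> list of (ref_type, target_id) pairs
        self.references = {}
        # target_id -> ref_type -> set of local work ids that reference it
        self.backrefs = defaultdict(lambda: defaultdict(set))

    def add_work(self, work_id, refs):
        """Store a work and record every reference at both of its ends."""
        self.references[work_id] = list(refs)
        for ref_type, target in refs:
            self.backrefs[target][ref_type].add(work_id)

    def follow(self, work_id, ref_type):
        """Targets of the given reference type from a local work."""
        return {t for rt, t in self.references.get(work_id, []) if rt == ref_type}

    def backfollow(self, target, ref_type):
        """Local works that make a reference of the given type to target."""
        return set(self.backrefs.get(target, {}).get(ref_type, set()))
```

Because `add_work` records each reference at both ends in one step, the simple consistency rule (a reference appears at both ends or neither end within a place) holds by construction in this sketch.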

Home has Copy of Referencing Work To make it feasible to find all works at any place which reference a particular work, the author of a work must pay to put the work in the homes of all referenced works. Thus the home of a work will tend to accumulate all works that reference it. These copies should be permanent; thus homes must allow one to permanently store non-original copies. To allow wide distribution, a work should be allowed to be considered published before these copies actually arrive at their destination places. Since the copies are sent when a work is published, they aid the verification of who published what when. This rule conflicts with having restricted access items; an author must trust the homes of a work he/she references to not give away access. An alternative, not chosen by this design, is to require only a backreference pointer and not the whole work at the referenced homes. But this would mean losing local reference/backreference pairing consistency.

Note that this rule discourages large works and creates an economy of scale to encourage large homes. Large works really ought to be discouraged anyway; a person who is only interested in a small part of a work should not have to pay for too much extra "baggage" to obtain the part that he/she wants. If the large home incentive induces an oligopoly of "Library of Congress"-like homes, then governments may want to add extra constraints on their behavior.

Composite Places The above rule can be thought of as the managing of a "global" place, which includes everything but has the same semantics as any local place. The same technique can be used to manage a composite place called "Seattle" composed of all the public places in Seattle. A hash function could be used to assign a "gate" place within Seattle to every work. If every work within Seattle is placed at its gate and the gate of every work it references, then backfollowing will work within the boundary of Seattle. The overhead for this can be substantial, but is small if most works in Seattle have their home in Seattle. Places should be run by a single responsible organization with a specific mail address, except that the global place cannot be. Homes must be atomic (not composite) places.
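One way to picture the gate assignment is the following Python sketch. The choice of SHA-256 and of a simple modulus over the sorted member places is an assumption made only for illustration; the design does not specify a hash function, and the function names are hypothetical.

```python
import hashlib


def gate_for(work_id, places):
    """Pick a gate place for a work by hashing its ID (hypothetical scheme)."""
    digest = hashlib.sha256(work_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(places)
    # Sorting makes the answer independent of the order the places are listed in.
    return sorted(places)[index]


def required_gates(work_id, referenced_ids, places):
    """Gates that need a copy: the work's own gate and the gate of each referenced work."""
    gates = {gate_for(work_id, places)}
    gates |= {gate_for(ref, places) for ref in referenced_ids}
    return gates
```

With this placement, the gate of x holds every work in the composite place that references x, so a backfollow query on x can be answered at a single member place. A plain modulus would reassign many gates whenever a place joins or leaves, so a real composite place would probably want a more stable assignment.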

Free Backfollowing There should be no royalty charge for backfollowing a reference from a y which references x. If one had to pay the cost of x, then one could not look at the evaluations of x before choosing to buy x. If one had to pay the cost of y, then one could not find out about y through the backreferences on x, and then look at the evaluations of y before choosing to buy y.

Right to Publicize The second part of this design's favoring of easy to find criticism is a right to publicize. Not only should reference backfollowing be reliable, but anyone should be able to pay to make it cheap at any given place. This right is supported by the following rules, but untraced distribution makes it harder to find the places where one might want to publicize something.

Common Carriers The freedom of speech and "copy at home of referenced work" rules together imply that all homes must allow anyone to pay to add copies of certain works to any home. Such homes should not be able to discourage certain references by charging them extra high storage prices. This, and the right to publicize, follows if we adopt the more general rule that all distributors must act as common carriers with respect to their contents, at prices that do not vary too dramatically over time. Thus anyone is able to pay to have a work at a place, with a fee (and performance) that is blind to the work's content. If a place can be a home, it should be willing to be any work's home. This implies that a public home must be prepared to be indefinitely expandable. The only alternative is to allow a home to say "Sorry, no more criticism. We're out of space." Places are allowed to have access-performance depend on frequency of access, i.e., to cache. If an organization wants to provide different quality storage media for different prices, such as fast vs. slow disk, each storage medium should be declared a different place.

Local Funders When places are common carriers, someone else must take the initiative in choosing what works should be at any given place. The author of a work chooses the work's home, but for other atomic places, the first person to request it becomes "funder" of the work at that place for a specified period of time. The right to publicize means that anyone may request to fund any work at any distributor. The funder is charged for local storage and one access charge (at the time of promising to fund), but receives a "markup" amount every time access is purchased locally. The funder chooses the markup, but the author can declare a maximum markup. When a funding time period ends and there is no waiting funder, then a work is deleted locally. When all of the local references to a stub are gone, it and all its local purchase tokens are deleted. As a distributor now has a right to obtain a work and sell access, it is even more important that the paying of royalties be enforced; one cannot "take one's business elsewhere" if one suspects foul play.

The exact design of the financial incentives for funders is a delicate but modular issue. Funders should be able to gamble that a work will be popular at a place, and then reap benefits if they are right. But they should not be able to purposely discourage access to a work by choosing a very high markup.

Uniform Objects

Basics At this level all media content, including text, is built up from many small objects, some of which are links, all resembling a physical network of rings and hooks. Links can be followed in either direction. Works are now sets of objects which share bookkeeping and appear as a unit. The resulting publishing database is more uniform, expressive, and general. The cost of all this is that a straightforward implementation is somewhat more expensive, but the cheapness of information technology makes this trade well worth while, in my opinion.

Objects and Links The "object" is the database item which corresponds to the ring in the physical analogy. Each object has its own ID, and objects are the unit of reference. Corresponding to the hook is the link, which has three ends, all of which are objects: the "from" end, the "to" end, and the "type" end (see Figure 2i). Some links, not all, are references; the link type determines which. For references, the from end is referencing the to end. In general, an object can have any number of links of any types from or to it, but will only have a few references from it. Note that while all published items are made of objects, there are non-published items associated with the publishing database that need not be made of objects, such as purchase tokens and personal money accounts.

Links are Objects, Ad Infinitum A link is itself an object, with three references of type "from", "to" and "type" from the link to its ends (see Figure 2ii). (These three link types are called the "core" link types.) The resulting infinite number of objects is dealt with by having virtual objects, explicitly created only when they are needed. The resulting infinite number of links of type "to" to every object (see Figure 2iii), and of "type" links to the core link types, is dealt with by infinite answer streams. Each object has exactly one "type" link from it, and anyone may create a new object type, including a new link type. Having links be objects allows them to be more uniformly annotated than in systems, such as [12], which provide attribute/value pairs or keywords on links.
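The idea of virtual objects created only on demand can be sketched as follows. The class names, the `core_link` method, and the slash-separated ID scheme are all hypothetical, chosen only to show how the infinite regress of links on links stays finite in practice.

```python
class Obj:
    """Any database object; only its ID matters for this sketch."""

    def __init__(self, oid):
        self.id = oid


# The three core link types are themselves objects.
CORE = {name: Obj(name) for name in ("from", "to", "type")}


class Link(Obj):
    """A link is an object with three ends, each of which is an object."""

    def __init__(self, oid, frm, to, type_):
        super().__init__(oid)
        self.frm, self.to, self.type = frm, to, type_
        self._core = {}  # virtual core links, filled in only when requested

    def core_link(self, end):
        """Return the link of core type `end` from this link to that end.

        The returned link is created on first request and cached, so the
        infinite tower of links describing links is never materialized.
        """
        if end not in self._core:
            target = {"from": self.frm, "to": self.to, "type": self.type}[end]
            self._core[end] = Link(f"{self.id}/{end}", self, target, CORE[end])
        return self._core[end]
```

Each virtual link is itself a `Link`, so one can keep asking for its core links as deeply as desired, paying only for the levels actually visited.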

The Follow Query The basic read queries are to follow or backfollow links of a given type. Input an object and a link type, and the output is either a set of objects which are the other ends of links of that type, or the links themselves (see Figure 2). The user is charged royalties if the query requires following an as yet unpurchased non-free reference. Backfollowing a reference and following a "price" or "type" link are free of royalty charges. One can set money and time limits on a query, but the database seen through such filters may not satisfy various consistency properties. Variations on the basic query should allow one to ask what link types there are from or to an object, and to ask for the number of answers to a query. A stream facility, as in [1 p.242-292], allows for piecemeal retrieval of large or even infinite answer sets. The effort to find out the size of a large answer set should be independent of the size of the set.
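A minimal Python sketch of the follow and backfollow queries as answer streams is given below. It assumes links are available as (from, type, to) triples of IDs, which is an illustrative simplification; the function names are hypothetical, and royalties and time limits are left out.

```python
from itertools import islice


def follow(links, obj, link_type):
    """Lazily yield the 'to' ends of links of link_type that leave obj."""
    for frm, typ, to in links:
        if frm == obj and typ == link_type:
            yield to


def backfollow(links, obj, link_type):
    """Lazily yield the 'from' ends of links of link_type that arrive at obj."""
    for frm, typ, to in links:
        if to == obj and typ == link_type:
            yield frm


def first(stream, n):
    """Retrieve at most n answers from a possibly infinite answer stream."""
    return list(islice(stream, n))
```

Since both queries are generators, a caller can take a few answers at a time with `first` even when `links` is itself an unbounded stream. This linear scan does not meet the goal that counting answers be independent of the answer set size; an implementation would need an index for that.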

This link-based data model was chosen over others [27], such as the relational model or the object plus instance variables model, because of its close association with links at lower publishing levels, its expressiveness in allowing as much as possible to be directly referenceable, the expectations of indexing it raises, and its lack of raising expectations of constraints which would limit free speech. Link models are very similar to semantic net data models, which are tuned for many of the same features.

Text Before discussing how these small objects are managed, let us discuss how text is represented at the uniform object level.

Text Trees As mentioned in the database features section, text is represented as a tree whose leaves are characters and whose intermediate nodes are words, phrases, sentences, paragraphs, sections, etc. (see Figure 3). The children of a node, and hence its content, are specified by references. To find the print representation of a text object, one collects the ordered set of leaf characters. To create a text object from a print representation, one chooses a specific parser to group characters into larger objects. Since one wants to find the print representation of an object often, the system maintains a "print-repr" link linking an object to a cache of its print representation. Print-repr is the only link type which has a non-object data structure at its to end, and is not backreferenced. A print-repr plus parser implementation can also be used to compactly send objects via communication media. Figure 4 shows how a user interface might convey text objects to users, and Figure 5 shows how it might convey links about text objects.

Binary Trees This design requires that text trees be binary; each node references two children, called its "head" and "tail". This choice allows text traversing algorithms to be expressed very simply, as with cons cells in Lisp, and forces a text to be broken up into the most possible sub-objects. Unfortunately many of these objects will not mean much to readers, so user interfaces may want to hide these from users. An alternative would be to use the following references: "1st", "2nd", "3rd", ... etc.
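The cons-cell analogy can be made concrete with a short Python sketch. The `Node` class and the halving parser are assumptions for illustration only; the design leaves the choice of parser open, and a real parser would group characters into words and sentences rather than split blindly.

```python
class Node:
    """A text object with exactly two children, like a Lisp cons cell."""

    def __init__(self, head, tail):
        self.head, self.tail = head, tail


def print_repr(obj):
    """Collect the ordered leaf characters of a text object."""
    if isinstance(obj, str):  # leaves are single characters
        return obj
    return print_repr(obj.head) + print_repr(obj.tail)


def parse(text):
    """A deliberately naive parser: split the characters in half recursively."""
    if not text:
        raise ValueError("a text object needs at least one character")
    if len(text) == 1:
        return text
    mid = len(text) // 2
    return Node(parse(text[:mid]), parse(text[mid:]))
```

The traversal in `print_repr` needs only two cases, which is the simplicity the binary choice is meant to buy. The intermediate nodes produced by such a parser are exactly the kind of objects that mean little to readers.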

To reference a subsequence of the text for which there is no explicit text object, one can simply compose the appropriate subsequence object. Note that keyword searching is free of royalty charges since it only requires backfollowing.

Distinguish Meanings A "same-print-repr-as" link can be used to indicate that an object has the same print representation as another object, but a different semantic meaning. Thus a piece of text can contain a semantic object whose print representation is "love" but which specifies which particular meaning of love was intended. A "text-type" link can be used to explicitly label a text object as a sentence, paragraph, etc.

No Free Quotes The ability to directly reference a sentence allows one to quote that sentence in a way which makes it easy for readers to see the original context of that quote. However, since that sentence has a different author, the reader must pay an additional price to read the quote. This is contrary to the existing practice of allowing free small quotes from a work, but seems hard to avoid.

Versions Alternative versions can be directly expressed using various links, such as an "authors-next-version" link. Text made by modifying other text can reuse sub-objects, and so links on those sub-objects are directly "inherited" to alternative versions. A user interface can also choose to inherit links from other explicitly denoted alternative versions, especially if the author of a version indicated such an intention with an "inherit-links-from" link. If desired, a user interface can compare slightly different versions of text by searching from the top down for text objects contained in both text object trees. This design pays much less attention to versioning than [5,11].
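The top-down comparison might look like the following Python sketch. It assumes text object trees are given as a dictionary from an object ID to the IDs of its children, and that a reused sub-object keeps the same ID in both versions; the function names are hypothetical.

```python
def all_ids(node, children):
    """Set of IDs of node and all of its descendants."""
    ids = {node}
    for child in children.get(node, ()):
        ids |= all_ids(child, children)
    return ids


def shared_top_down(root_a, root_b, children):
    """Largest objects of version A that also occur somewhere in version B."""
    in_b = all_ids(root_b, children)
    shared, stack = [], [root_a]
    while stack:
        node = stack.pop()
        if node in in_b:
            shared.append(node)  # the whole subtree is shared; do not descend
        else:
            # Push children in reverse so they are visited in reading order.
            stack.extend(reversed(children.get(node, ())))
    return shared
```

Everything in version A that is not under one of the returned objects is new relative to version B, which is what a user interface would want to highlight.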

More Wrappers Having fine grain text objects that one can reference and reuse creates a problem of ambiguous context. If one references a sentence, that sentence is reused in another context, and then later someone follows the first reference, it may not be clear which context was intended for the original sentence. This can be solved by referencing an "in-context" wrapper which references both the specific sentence and the intended context. Wrappers can also specify how a user interface ought to display referenced text, distinguishing between standard source references, footnotes, quotes which ought to be visually displayed with the text, etc. A similar feature is available in [9].

Indexing The following design choices deal with maintaining indexing at the uniform object level.

Structures The references of an object describe its "content" and "define" what that object is, so an object without its references is an incomplete "stub". An object and its references are together called a "structure". At the uniform object level, structures follow the various "send copies there" rules which works had to follow at lower levels. The following consistency rule is preserved: either all or none of the references of an object are at a place. Thus one can tell if an object is a stub by asking for its type. For convenience, the ID of an object is imbedded in the ID of its references.

The Meta-home There will be a base set of objects which are necessary for the whole publishing system, including characters and link types. If they all have the same home, then this is where one would want to publish comments on the publishing system itself. Such commonly used objects should have short IDs for efficient communication.

Publishing Organizations Since all homes must be run by a single responsible organization, the meta-home must unfortunately be centrally controlled. An organization is also needed to settle the subtle question of when one item is similar enough to another that it constitutes "plagiarism". To whom these organizations are responsible and in what way is a crucial issue not addressed by this design.

Backreference Exceptions Many of the costs of total backreferencing are nicely bounded. Locally, the overhead is a factor of two -- one backreference pointer for each reference pointer. Globally, one need send no more copies of a work than it has references. There is the potential, however, of overwhelming the homes of very commonly referenced objects, especially the meta-home.

To avoid this, the following exceptions to the global backreferencing rule are allowed. Imagine that y is the nth object to make a t type reference to object x. If n is greater than 10^4, then only the structure of y need be sent to the home of x, instead of the whole work. If n is greater than 10^8, nothing need be sent. (There is nothing magic about these numbers.) An interesting variation is to require only a random sampling of copies to be sent.
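The exception rule amounts to a two-threshold decision, sketched here in Python. The function name and return labels are hypothetical, and the thresholds are passed in as parameters since the particular numbers are not essential.

```python
def what_to_send(n, structure_after, nothing_after):
    """Decide what the n-th referencing object of a given type sends to the
    home of the referenced object.

    structure_after and nothing_after are the two thresholds; beyond the
    first only the structure is sent, beyond the second nothing is sent.
    """
    if n > nothing_after:
        return "nothing"
    if n > structure_after:
        return "structure"
    return "work"
```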

When one only need send a structure, then large structures are discouraged, as large works were before. To make this work, the overhead for sending a mail message should not discourage very small messages. When nothing is sent, a backfollow query could still be answered, but it would require the cost of polling all homes.

Reuse Objects To both discourage plagiarism and make the publishing database simpler to reason about, one would like structures to be unique. That is, two objects have the same references if and only if they have the same ID. Unfortunately this constraint cannot be satisfied. The same reference combination can be independently created at distant places, and after a time "merging" the IDs may not be feasible. So instead this design will attempt to minimize such user visible redundancy, and provide support for ignoring it.

To minimize redundancy, an author publishing at a place is required to reuse any old object at that place with a given set of references rather than create a new one. If one publishes an object in the home of one of its referenced objects, there is a good chance of finding an old structure. To allow redundancy to be ignored, each place must maintain "same-references" links between all contained pairs of objects (excluding stubs) which have exactly the same references and have at least one reference other than "type". If appropriate, one could use these links to treat redundant objects as the same object. To manage these links at the global place, the home of a work locally referenced by a pair of redundant objects must tell the homes of those objects, who must then send each other copies.
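Within a single place, both the reuse requirement and the "same-references" links can be sketched in Python as below. The data layout (a dictionary from object ID to a set of (reference type, target) pairs, with `None` marking a stub) and the function names are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def publish_reusing(objects, new_id, refs):
    """Reuse a local object with exactly these references if one exists;
    otherwise store a new object under new_id. Returns the ID to use."""
    key = frozenset(refs)
    for oid, existing in objects.items():
        if existing is not None and frozenset(existing) == key:
            return oid
    objects[new_id] = set(refs)
    return new_id


def same_references_links(objects):
    """Pairs of non-stub objects with identical references, skipping objects
    whose only reference is their "type" reference."""
    groups = defaultdict(list)
    for oid, refs in objects.items():
        if refs is None:  # stub: its references are not at this place
            continue
        if not any(ref_type != "type" for ref_type, _ in refs):
            continue
        groups[frozenset(refs)].append(oid)
    links = set()
    for ids in groups.values():
        links.update(combinations(sorted(ids), 2))
    return links
```

Redundant pairs can still appear when copies arrive from other places, which is why the second function is needed even if every local author goes through the first.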

Places Not Too Small The combination of royalty free backreferences and references defining objects creates some potential problems. If one guesses that an object with certain references exists locally, one can confirm it, and so know all of the content of an object without paying any royalties. Worse, if one had a list of all the stubs and "basic" objects (such as characters and types) at a place, one could do a single pass on the backreferences and reconstruct all of the objects at that place. To avoid this problem no list of stubs at a place should be provided to readers, and places should be large enough to discourage systematic walks through their contents. Note that "garbage" also discourages this, as would a small backfollowing fee, which could be instituted if necessary. There are design alternatives to the free backfollowing rule, such as one free link type called "backreference".

Works The remaining design choices at this level deal with the concept of a work.

Work as Atom of Context In order for this design to function, each object needs to have various bookkeeping information associated with it, such as its price, author, publication date, local access tokens, etc. If these had to be stored directly on each object, the cost would be substantial. This information can be inherited from associated objects, but this could be awkward if there were no constraints about what other objects might be at the same place. The lack of a predictable context for an object also creates the potential of quoting someone "out of context", i.e., where only one sentence appears at a certain place and the surrounding discussion is available only at another distant place.

To solve these context dependence problems, a "work" is declared to be an atom of context; one cannot follow the references of a structure until the entire work is present at that place. Thus an author can ensure claims are made in context by declaring a set of objects to be a work. And bookkeeping information can be reliably inherited within the scope of a work.

Forbidding one to follow references in lone structures unfortunately seems to violate reference/backreference pairing consistency. Fortunately, lone structures only arise when there are backreferencing exceptions.

Each structure is either entirely in or not in a work, and is in exactly one work, indicated by an "in-work" reference from the object representing the work to the central object of the structure. For convenience, the ID of a work is imbedded in the IDs of its objects. The all or nothing nature of a work implies that the basic write operation is to "publish" a set of objects as a work, and that reference cycles can only be within the scope of a work. A coarse grain read query to obtain a whole work would be convenient.

Inherit Bookkeeping from Work This design avoids complexities of inheriting bookkeeping by simply inheriting most of it from the work object. All of the objects in a work have the same author, publication-event, and are sold as a unit for one total price, except that individual structures can be flagged as "free". The "at" link, connecting a work to a place where it is present, is "common" so that local bookkeeping, such as the local funder and markup, can be associated with the "at" link itself and no copies of the "at" link or local bookkeeping need be sent elsewhere. A standard mapping between a work and a bit sequence is needed to support signatures and decryption keys.

Note that since people are the value of the "author" link, each author will need to have an object representing him/her, and therefore will have a home where comments about him/her are published. Since homes are permanent, a person cannot change homes. The mail system will still route royalty payments directly to that author, however.

Sensors

Sensors are built from fairly general patterns which can be matched in either a data or goal driven fashion, and can also be published.

Expressive Patterns The more constructs that the pattern language can support, the more convenient it will be. Patterns should have atomic terms which represent links between objects, with variables in any combination of positions and a modifier expressing what place the link is at. "And" and "or" combinations with matching variables are useful, as is the ability to bound the scope of a variable with "exists". Patterns should be matchable in a data-driven fashion, where one is told of variable bindings corresponding to any new match to a pattern, and a goal-driven fashion, where one inputs the bindings to some variables and is returned the bindings for the other variables in matches to the pattern. With a data driven pattern, also called a "trigger", one can be sent a mail message about any new matches. Data driven matching should allow the option of incrementally caching answers, as in the Rete algorithm [10], and is more useful with special constructs for expressing time and the movement of copies. Because of untraced distribution, trigger messages come from the home of a pattern only.
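A goal-driven match over an "and" combination of atomic terms can be sketched in Python as follows. The representation is an assumption made for illustration: links and terms are (from, type, to) triples of strings, and a term position beginning with "?" is a variable. Place modifiers, "or", "exists", and triggers are omitted.

```python
def match_term(term, link, bindings):
    """Unify one atomic term with one link; return extended bindings or None."""
    new = dict(bindings)
    for pat, val in zip(term, link):
        if pat.startswith("?"):
            # A variable must take the same value everywhere it appears.
            if new.setdefault(pat, val) != val:
                return None
        elif pat != val:
            return None
    return new


def match_all(terms, links, bindings=None):
    """Yield every set of variable bindings satisfying all terms together."""
    bindings = bindings or {}
    if not terms:
        yield bindings
        return
    for link in links:
        extended = match_term(terms[0], link, bindings)
        if extended is not None:
            yield from match_all(terms[1:], links, extended)
```

Passing initial bindings for some variables gives the goal-driven use described above: the matcher returns bindings for the remaining variables. A data-driven trigger could call the same matcher whenever a new link arrives, though caching as in the Rete algorithm would avoid rescanning all links.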

Note that with triggers, people can create facilities for tracing the distribution of works, and therefore any organization could create a composite place, like Seattle, without needing permission from component places.

Restricted to Avoid General Computation A big issue is whether to allow a pattern language to be powerful enough to support general computation, as in PROLOG or OPS5. An argument against this is that the publishing community would have to agree on an implementation approach so that people can predict the computational effort needed to answer their requests, especially if there is a chance of infinite loops. (Also, getting people to agree on a standard programming language seems hard!) To avoid controversy, this design will restrict pattern language power by restricting the use of "not", and by not allowing pattern defined virtual objects. Note that limited agreement on how patterns are matched is needed in any case, since users must be charged for works which an implementation examines in seeking matches.

Regularity Tags Using patterns to ensure that semantic constraints are met appears to conflict with freedom of speech, which insists that all objects be allowed. A solution is that everything is allowed, but not everything is "regular". When an object is published, it is checked against semantic constraint patterns. If it fails, then a "fails-to-meet" link is created between the object and the constraint, constituting an "irregularity tag". If this link is included in the work of the object, then one can say that an object is regular if there is no regular irregularity tag on it. Implementations should allow users to efficiently filter out irregular objects in their queries. This approach will not work well if anyone can create a semantic constraint -- what simple filter does one use? To solve this, we can centralize the responsibility for the semantics of an object and assign it to the other end of each object's one "type" reference. Thus the only constraints which decide if an object is regular are those referenced by the object's type. To limit backreferencing exceptions, irregular links must always be backreferenced, and are not considered "references" for the rules in this design. Semantic constraints should be time independent; if an object is ever regular, it is always regular. (The same would apply to virtual objects if this design included them.) This can be true if regularity patterns only look at objects found by following references and staying within a work. A convention is needed to resolve "pattern locking", where constraints for different objects are each waiting to see if the other objects are going to be regular. Authors should be able to create "temporary" objects so that they can check whether an object would be regular without actually publishing it.
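The tagging step at publication time can be sketched in Python as below. Here constraints are stood in for by plain Python predicates registered under a type name, which is an assumption for illustration; in the design they would be published patterns referenced by the object's type. The function names and the dictionary layout are hypothetical.

```python
def publish_with_tags(obj, constraints_by_type):
    """Publishing never rejects an object; it returns the irregularity tags.

    obj is a dict with at least "id" and "type" keys. Only constraints
    registered under the object's own type are consulted.
    """
    tags = []
    for name, holds in constraints_by_type.get(obj["type"], {}).items():
        if not holds(obj):
            tags.append((obj["id"], "fails-to-meet", name))
    return tags


def is_regular(tags):
    """An object is regular when publishing it produced no irregularity tag."""
    return len(tags) == 0
```

Consulting only the constraints of the object's own type is what keeps the filter simple: a reader who wants regular objects does not need to know about constraints that other parties have published.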

Virtual Places "Virtual places", defined as all objects in a place which match a certain filter pattern, can be uniformly treated by users with a basic "ask" query, which instructs a place to forward a query to another place. In general, places which do not have a specific mail address, like virtual places and the global place, require unambiguous specifications of how they should respond to various messages so that other places can pretend to be them in response to ask queries. Attempts to publish an object at a virtual place whose filter rejects that object must fail, and so virtual places are not distributors.

Assistants

No standard programming language for coding computer agents is proposed here; it is a modular issue and widely distributed computation is a complicated issue [13]. Of particular interest are "proxies", which are programs that run very "near" the database hardware of a place. From this position they can cheaply examine many items and choose only a few to send to users over the more expensive communication lines. This prefiltering is very important for avoiding "garbage".

Small Grain Queries This design provides limited support for proxies by providing very fine grain database queries, and implementations are encouraged to make the execution of these much faster than needed to support a browsing human. It would be nice if proxy programs could use direct pointers in their queries instead of using global IDs, but only if they are prevented from reading works they have not bought. Communication between proxies and places is still abstractly described in terms of mail messages.

No Proxy Discounts To allow more automatic screening for readers, it would be nice if proxies could examine items and not pay full royalties if they did not pass the items on to their owners or find them of use in their analysis. However, I have not found a secure way to allow such an exception.

Scores which summarize the present evaluations of an object, as in [16] and the story of Jane, could be very useful. But this information would change often, and so might not be worth permanently publishing. The awkwardness of fitting evaluation summaries into this design is presently an open issue.

Discussion Of Design

The LinkText design presented here is clearly ambitious. It is, after all, a design for a database standard to allow the integration of most future publishing. As such, it tries to preserve rights to speak, respond, and publicize, to receive royalties and credit for being first, and to privately read. It tries to support both fast reliable following and backfollowing of all links, and to allow automatic prefiltering to eliminate "garbage". It is intended to preserve the basic technological fact that the human effort to compose or read even the smallest idea is greater than the cost of storing or retrieving that idea. As a design, it tries to deal seriously with very wide geographic, temporal, and organizational distribution of data structures. And it tries to be expressive and general enough to allow future innovation and a wide variety of present uses.

On the other hand, the LinkText design is not actually being proposed as a standard -- it is only a "strawman" for the purpose of highlighting possibilities and issues that such a standard would have to address. Because of this, I have focused on issues of modularity, seeking to discover which design choices depend on each other and which do not. Choices which do not depend on others have been noted and then mostly ignored, while a central core of interdependent issues has been given close attention. A specific design was sought which could simultaneously address all of these core issues. This design was then itself modularized into a number of publishing levels, each of which could be adopted without adopting higher levels.

The reader will have to judge for him/herself the success of the LinkText design at meeting its stated goals. To aid this judgment, the reader is now reminded of some of its failings. LinkText lacks complete generality since works are static, cannot be private, and cannot change homes. Pattern based computation is limited, and no explicit consideration has been given to graphics, video, speech, etc. Substantial extra costs are required of any author. Numerous copies of a work must be permanently stored, and the fine grain representation can be more expensive. Distributors and homes are more expensive than they might be as they must be bonded, not too small, and common carriers. Expensive homes which operate under an economy of scale may become large and slow, hindering innovation. Worse, centralized publishing organizations are subject to abuse and might slow innovation even more. The right to respond has been compromised by untraced distribution, and often-computed evaluations are left out of the design. Finally, many design solutions are awkward. There are follow/backfollow consistency exceptions due to sending structures instead of works, and binary text trees force the creation of fairly meaningless text nodes.

These failings just show that LinkText, like most designs, consists of "elegant" solutions and not so elegant tradeoffs. Since most discussion of a design revolves around the tradeoffs, we conclude with such a discussion here. And since this design is distributed across organizational boundaries, creating unusually interesting tradeoffs where the interests of different parties fall on different sides, this final discussion will focus on tradeoffs involving such conflicts of interest.

The LinkText design usually tries to resolve conflicts of interests through concepts of rights and responsibilities, which all end up turning on the basic question of what it means to "publish". Publishing is conceived of here as the market for buying and selling "information" which makes the most concessions to the ideal of open debate. A published item is not independent of others, but is part of a connected whole, and has a responsibility to help maintain certain basics so that the whole can function smoothly. Publishing is in part an attempt by an author to gain many readers, and so the convenience of the author is subordinated to the convenience of the reader. Yet an author has certain rights to speak out and to respond to others. These basic ideas are reflected in the various design choices.

Because publishing is a market, the right of an author to receive the price of his/her choosing for access to his/her creation is preserved and supported through an emphasis on distributors, per-person access charges, and the disallowing of proxy discounts. Because publishing is a debate medium, the right of an author to speak out is preserved and preferred over the interest of a reader in avoiding "garbage" and the interest of a distributor in choosing the content he/she sells. And the interests of an author responding to another author in making his/her response direct, apropos, and easy to find from the first author's work are preferred to any desires of the first author to lower costs, to change, delete, or restrict access to his/her work, and to keep its context free of unfavorable criticism or "garbage".

Once these authors' rights are preserved, the interests of a reader are preferred to those of an author. A reader wants to reliably interpret a work, so an author must use a standard format. A reader wants to look at evaluations of something before buying it, so an author must accept a certain risk of giving access away for free. A reader wants to avoid government repression, and so an author cannot restrict access or charge different prices to different readers, and a responder cannot easily find all copies of what he/she is responding to. Most important, a reader wants to follow and backfollow references quickly, cheaply, and reliably, and an author has a responsibility to help maintain this. The base cost of publishing is increased to pay for permanence and a specific pattern of copy distribution, and the author cannot change, delete, hide, or restrict access to his/her work.

Finally, as distributors exist only to serve others, the interests of distributors are subordinated to the interests of all the other parties involved; startup costs are high as distributors must not be too small and homes must guarantee accuracy, permanence, and infinite expandability.

Conclusion

The LinkText design has been presented here to inspire thought about the possibilities and issues in creating a true hypertext publishing medium, allowing millions of people worldwide to read and write in full possession of their publishing rights. Though many more modular issues have been only touched on, a core set of interdependent issues has been identified, and a single unified design has been formed which addresses almost all of these issues. I hope the existence of this design will convince some that the dream of hypertext publishing is possible, and that that dream might soon be made a reality.

Acknowledgments

I thank Peggy Jackson and Eric Drexler for their substantial support, and Mark Miller for encouraging careful attention to "open systems" issues. I also thank all those who offered useful criticism, including Hugh Daniel, Roger Gregory, Russell Brand, Eric Raymond, Kirk Kelley, John Gilmore, Chip Morningstar, Ron Fisher, Chris Peterson, Tom Howland, Randy Trigg, Mark Stefik, and Frank Halasz. Russell Brand deserves credit for the purchase token concept, as does Eric Drexler for the decryption concept.

Footnotes

1 In this paper, "database" just means a module responsible for storing and managing data structures.

2 "Electronic" here means based on advanced digital storage, communication, and processing technology.

3 Some people may consider the references level sufficient to be called "hypertext".

4 In this paper, the term "link" is reserved for connections between items which can be followed in both directions, while a "reference" might only be followable in one direction. At the hypertext level and above, most references are links.

5 Software objects, not physical objects.

6 The combination of new technologies for easily copying and filtering out ads may result in the death of our "blockbuster" oriented mass media. There may be no way to get proportional compensation for the few most popular works, as these can be easily distributed through private channels.

captions to missing figures:

Figure 2. LINKS

DATABASE FEATURES AT EACH LEVEL

Figure 3. A Sample Work

Appendix: Design Specification

References

[1] Abelson, H., Sussman, G., Sussman, J., Structure and Interpretation of Computer Programs, The MIT Press, Cambridge, MA, 1985.

[2] Bush, V., "As We May Think", Atlantic Monthly 176, 1 (July 1945), pp. 101-108.

[3] Chaum, D., "Security Without Identification: Transaction Systems to Make Big Brother Obsolete", Communications of the ACM 28, 10 (Oct. 1985), pp. 1030-1044.

[4] Conklin, J., "A Survey of Hypertext," MCC Technical Report STP-356-86, Rev. 1, MCC Software Technology Program, Austin, TX, Feb., 1987.

[5] Delisle, N., Schwartz, M., "Context - A Partitioning Concept for Hypertext", Computer-Supported Cooperative Work Conf. Proc., Austin, TX, Dec., 1986, pp. 147-152.

[6] Drexler, K.E., Engines of Creation, NY, Anchor Press/Doubleday, 1986.

[7] Drexler, K. E., "Technologies of Danger and Wisdom," Directions and Implications of Advanced Computing Conf. Proc., Seattle, WA, July 1987, sponsored by Computer Professionals for Social Responsibility, pp. 182-186.

[8] Engelbart, D.C., Watson, R., Norton, J., "The Augmented Knowledge Workshop", in Proc. National Computer Conf., Vol. 42, AFIPS Press, Arlington, VA, 1973, pp. 9-21.

[9] Engelbart, D.C., "Authorship Provisions in Augment", IEEE 1984 COMPCON Proceedings, Spring 1984, pp.465-472.

[10] Forgy, C., "Rete: A Fast Algorithm for the Many Pattern/ Many Object Pattern Match Problem", Artificial Intelligence 19 (Sept. 1982).

[11] Gregory, R., "Xanadu - Hypertext from the Future", Dr. Dobb's Journal 75 (Jan. 1983), pp. 28-35.

[12] Garrett, L.N., Smith, K., Meyrowitz, N., "Intermedia: Issues, Strategies, and Tactics in the Design of a Hypermedia Document System", Computer-Supported Cooperative Work Conf. Proc., Austin, TX, Dec., 1986, pp. 163-174.

[13] Hewitt, Carl, "The Challenge of Open Systems", BYTE, April 1985, pp. 223-242.

[14] Kaehler, T., "Virtual Memory on a Narrow Machine for an Object-Oriented Language", OOPSLA '86 Proceedings, Sept. 1986, pp. 87-106.

[15] Kay, Alan, "Computer Software", Scientific American 251, 3 (Sept. 1984).

[16] Lowe, D., "Co-operative structuring of information: the representation of reasoning and debate", International Journal of Man-Machine Studies 23 (1985), pp.97-111.

[17] Malone, T., Grant, K., Turbak, F., Brobst, S., Cohen, M., "Intelligent Information-Sharing Systems", Communications of the ACM 30, 5 (May 1987), pp. 390-402.

[18] McCarthy, John, in Panel Discussion, sponsored by Computer Professionals for Social Responsibility, San Jose State, CA, Oct. 1, 1984.

[19] Miller, M., Drexler, K.E., articles in The Ecology of Computation, ed. Huberman B., to be published by Elsevier Science Publishers, North Holland.

[20] Nelson, T.H., Literary Machines, T.H. Nelson, Box 128, Swarthmore, PA. 19081, 1981.

[21] Pool, Ithiel de Sola, Technologies of Freedom, Cambridge, MA, Belknap Press of Harvard University Press, 1983.

[22] Pool, Ithiel de Sola, "Whither Electronic Copyright", Electronic Publishing Plus, ed. Martin Greenberger, White Plains, NY, Knowledge Industry Publications, Inc., 1985, p. 217.

[23] Salton, G., McGill, M.J., Introduction to Modern Information Retrieval, NY, McGraw-Hill, 1983.

[24] Schatz, B., "Telesophy: A System for Browsing and Sharing Inside a Large Information Space", Bell Communications Research, July 1986.

[25] Stefik, M., "The Next Knowledge Medium", The AI Magazine 7, 1 (Spring '86), pp.34-46.

[26] Trigg, R., Suchman, L., and Halasz, F., "Supporting Collaboration in NoteCards", Computer-Supported Cooperative Work Conf. Proc., Austin, TX, Dec., 1986, pp. 153-162.

[27] Tsichritzis, D., Lochovsky, F., Data Models, Englewood Cliffs, NJ, Prentice-Hall, Inc., 1982.

[28] Turoff, M., "TEIES - Tailorable Electronic Information Exchange System", Technical Report, Computerized Conferencing and Communications Center, New Jersey Inst. of Tech., NJ, 07102.
