VIDEO ON DEMAND
THE INITIATIVE FOR WORLDWIDE MULTIMEDIA TELECONFERENCING AND VIDEO SERVER STANDARDS
NEW BUSINESS IMPERATIVES
STARTING WITH STANDARDS
TWO STANDARDS, ONE GOAL
Object Oriented Database Management Systems
Hard Copy References:
Hard Copy References:
Hard Copy References:
Hard Copy References:
TOPICS FOR FUTURE MEETINGS:
THE ATM ADAPTATION LAYER
BROADBAND WAN IMPLEMENTATION
VIDEO ENCODING/DECODING STANDARDS
Video on demand has become a major implementation problem for network integrators. Clients want to retrieve and view stored video files asynchronously, at near-broadcast quality, on a local host. Problems integrators face in achieving this goal include video content preparation, server storage, network throughput, latency, client interfaces, quality of service, and cost. This paper addresses the design considerations for a private video on demand implementation.
The Initiative for Worldwide Multimedia Teleconferencing and Video Server Standards
The market for multipoint multimedia teleconferencing and video server equipment is poised for explosive growth. The technology for this necessary and much-anticipated business tool has been in development for years. By the turn of the century, teleconferences that include any combination of video, audio, data, and graphics will be standard business practice.
Compliance with teleconferencing standards will create compatible solutions from competing manufacturers, feeding the market with a variety of products that work together as smoothly as standard telephone products do today. Specifically, with the adoption of International Telecommunication Union (ITU) recommendations T.120, H.320, and H.261, multimedia teleconferencing equipment manufacturers, developers, and service providers will have a basic established connectivity protocol upon which they can build products, applications, and services that will change the face of business communications.
New Business Imperatives
Video on demand systems are starting to be required by commercial, industrial, governmental, and military organizations to retrieve past information in order to prepare for and anticipate future events. This preparation and anticipation can be crucial to the survival of these organizations because of the key role of the individuals or groups being monitored. It is this monitoring and collection of data that allows these organizations to make informed decisions and to take appropriate action in response to current events.
Multipoint multimedia teleconferencing and video servers offer the required solution. As defined here, multimedia teleconferencing involves a user-specified mix of traditional voice, motion video, and still-image information in the same session. The images can be documents, spreadsheets, simple hand-written drawings, highly detailed color schematics, photographs, or video clips. Participants can access the same image at the same time, including any changes or comments on that image entered by other participants. Video servers allow users to view stored video files of specific events, conferences, news clips, and other important information in near real time.
The benefits are obvious. Instead of relying on a text interpretation of a video clip, all interested parties can access the information directly. Little is left to verbal interpretation, since all users have access to the original video: a person's actions, verbal tones, mannerisms, and reactions to events around them can be viewed and interpreted. Increased productivity, reduced cost, and reduced travel time are cited as the primary benefits, while proprietary technology and solutions are cited as the primary inhibitors to using video on demand products and services.
Starting with Standards
While multimedia teleconferencing and video servers promise to revolutionize vital everyday corporate tasks such as project management, training, and communication between geographically-dispersed teams, it is clear that standards-based solutions are a prerequisite for volume deployment. Standards ensure that end-users are not tied to any one supplier's proprietary technology. They also optimize capital investment in new technologies and prevent the creation of de facto communication islands, where products manufactured by different suppliers do not interoperate with each other or do not communicate over the same type of networks.
When adopted and adhered to by equipment suppliers and service providers alike, standards represent the most effective and rational market-making mechanism available. ISDN, fax, X.25, and GSM are a few obvious examples of standards-based technologies. Without internationally-accepted standards and the corresponding ability to interoperate, the services based on these technologies would almost certainly languish as simple curiosities.
Interoperability is particularly important in multipoint operation, where more than two sites communicate. A proprietary solution might suffice if two end users want to communicate only with each other; however, this limited type of communication is rare in today's business world. In typical business communications, multiple sites, multiple networks, and multiple users have communications equipment from multiple manufacturers, requiring the support of industry standards to be able to work together. This interoperability is also critically important when a video server transmits data across a WAN to multiple users at multiple sites.
Perhaps the most important effect of standards is that they protect the end users' investments. A customer purchasing a standards-based system can rely not only on the current interoperability of the equipment but also on the prospect of future upgrades. In the end, standards foster the growth of the market by encouraging consumer purchases. They also encourage multiple manufacturers and service providers to develop competing and complementary solutions and services.
Two Standards, One Goal
Fortunately, standards for multimedia teleconferencing are at hand. Within the United Nations-sanctioned ITU Telecommunication Standardization Sector (ITU-T), two goals have been achieved: the T.120 audiographics standards and the H.320 videotelephony standards. T.120 and H.320 are "umbrella" standards that encompass the major aspects of the multimedia communications standards set; H.261, the video coding standard, falls under the H.320 umbrella. The T.120 series governs the audiographic portion of the H.320 series and operates either within H.320 or by itself.
Ratification of the core T.120 series of standards is complete. These recommendations specify how to use a set of infrastructure protocols to efficiently and reliably distribute files and graphical information in a multipoint multimedia meeting. The T.120 series consists of two major components. The first addresses interoperability at the application level and includes T.126 and T.127. The second comprises three infrastructure components: T.122/T.125, T.124, and T.123.
The H.320 standards were ratified in 1990, but work continues to encompass connectivity across LAN-WAN gateways. The existing H.320 umbrella covers several general types of standards that govern video, audio, control, and system components. With many businesses using LANs to connect their PCs, the pressure is on to add videoconferencing to those networks. Since the H.320 standards currently address interoperability of video conferencing equipment across digital WANs, it is a logical and necessary step to expand the standards to address LAN connectivity issues. As the work to expand H.320 continues, it remains the accepted standard.
Both the T.120 and the H.320 series of standards will be improved upon and extended to cover additional networks and provide new functionality. This work will maintain interoperability with the existing standards.
Standards as complex and universal as the H.320 and T.120 series need a coordination point for the interim steps a proposal takes on its way to becoming a standard. The International Multimedia Teleconferencing Consortium (IMTC) is an international group of more than 60 industry-leading companies working to complement the efforts of the ITU-T, with an emphasis on helping the industry bring standards-based products successfully to market. Its goals include promoting open standards, educating end users and the industry on the value of standards compliance and on applications of new technologies, and providing a forum for the discussion and development of new standards. The IMTC is an approved ITU-T liaison and interfaces with the ITU-T by participating in standards discussion and development and by feeding information and findings into the appropriate ITU-T Study Groups.
The Standards First initiative encourages multimedia equipment manufacturers to comply, at a minimum, with the H.320, T.120, and H.261 standards described above. Further standards compliance is recommended but optional, and manufacturers will still be able to differentiate their products with proprietary features, creating Standards Plus products. Compliance with the minimum H.320/T.120 standards will ensure a basic level of connectivity across equipment from all participating manufacturers.
Standards have played an important part in the establishment and growth of several consumer and telecommunications markets. By creating a basic commonality, they ensure compatibility among products from different manufacturers, thereby encouraging companies to produce varying solutions and end users to purchase products without fear of obsolescence or incompatibility.
The work of both the IMTC and the ITU-T represents an orchestrated effort to promote a basic connectivity protocol that will encourage the growth of the multimedia telecommunications market. The Standards First initiative, which has been accepted by several industry-leading companies, requires a minimum of H.320, H.261, and T.120 compliance to establish that basic connectivity. Manufacturers are then able to build on this basic compliance by adding features to their products, creating Standards Plus equipment. By ensuring interoperability among equipment from competing manufacturers, developers, and service providers, Standards First protects a customer's initial investment and makes future system upgrades possible.
The first step in a VOD system is the entry of video information. The possible sources of video information in a large-scale (government) VOD system include recorded and live video, scanned images, and electro-optical (EO), infrared (IR), and synthetic aperture radar (SAR) imagery. Recorded video is the primary concern of this paper. Since latency and jitter do not affect imagery data types, they will be noted but not expanded upon. Live video is the primary concern of video conferencing, but its requirements overlap with those of recorded (VOD) video.
Recorded video must be digitized and compressed as early as possible in the VOD architecture to minimize system storage requirements. The Moving Picture Experts Group (MPEG) of the ISO developed the MPEG-1 and MPEG-2 standards for video compression. With MPEG-1, a compression ratio of 50:1 is typical. MPEG-1 can encode images of up to 4K x 4K pixels at 60 frames/sec. MPEG-2 was optimized for digital compression of television and supports up to 16K x 16K pixels at 30 frames/sec, although 1920 x 1080 at 30 frames/sec is considered broadcast quality (MPEG-2, Hewlett-Packard pub. 5963-7511E). MPEG-2 also offers a more efficient means of coding interlaced video signals, such as those that originate from electronic cameras. (Chadd Frogg 8/95)
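The 50:1 figure can be sanity-checked with a little arithmetic. The sketch below assumes SIF resolution (352 x 240 pixels at 30 frames/sec) and 24-bit RGB pixels; neither assumption comes from the text, and because MPEG-1 actually subsamples color to 12 bits per pixel, the ratio measured against its real input would be closer to 25:1.

```python
def raw_bitrate_mbps(width, height, fps, bits_per_pixel):
    """Uncompressed video bit rate in megabits per second (1 Mb = 10**6 bits)."""
    return width * height * fps * bits_per_pixel / 1e6


def compressed_bitrate_mbps(raw_mbps, ratio):
    """Bit rate after compression at `ratio`:1 (e.g. ratio=50 for 50:1)."""
    return raw_mbps / ratio


# Assumed source format: SIF (352 x 240, 30 frames/sec), 24-bit RGB.
raw = raw_bitrate_mbps(352, 240, 30, 24)
mpeg1 = compressed_bitrate_mbps(raw, 50)
print(f"raw: {raw:.1f} Mbps, after 50:1 compression: {mpeg1:.2f} Mbps")
# prints "raw: 60.8 Mbps, after 50:1 compression: 1.22 Mbps"
```

The result lands near the 1.2 Mbps MPEG-1 rate used for server sizing later in this paper. The same formula gives roughly 1.5 Gbps of raw data for 1920 x 1080 at 30 frames/sec, so even at 50:1 broadcast-quality HDTV would still need about 30 Mbps.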
CODECs encode analog video into a compressed digital format and decode it for display. The CODEC must be configured to encode the information at the desired end resolution: if the end user requires broadcast-quality video, the CODEC must support that level of quality. The CODEC should also be compatible with the desired data throughput rate of the content preparation element (although a mismatch can be overcome with sufficient buffering). Several CODECs output information in a form that is directly compatible with distribution hardware; some are designed to output DS3, ATM OC3, or Fibre Channel. The Pacific Bell "Cinema of the Future" project uses an HDTV CODEC: the analog HDTV signal is digitized and compressed to the DS3 rate (44.736 Mbps) by Alcatel's 1741 CODEC. The CODEC applies a hybrid Discrete Cosine Transform (DCT) compression algorithm with motion compensation. Though the precise algorithm performed by the 1741 is proprietary, the following is an overview of the process. Pixel groups called blocks are translated into frequency information using the DCT (similar to a Fourier transform). Next, a quantization step reduces the precision of these coefficients, discarding the least significant information. The coefficients are then entropy-encoded into variable-length codes. The digital information, now about 1/50 of its original size, can be passed to an output mechanism (a hardware or software driver). This is only a quick overview; the encoding process is well documented in the ISO standards.
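The DCT, quantization, and entropy-coding steps above can be sketched for a single 8 x 8 block. This is a generic JPEG/MPEG-style intra-coding pipeline, not the Alcatel 1741's proprietary algorithm: it omits motion compensation, uses one uniform quantizer step (an assumption; MPEG uses weighting matrices), and stops at run-length pairs, which real encoders then map to variable-length (Huffman) codes.

```python
import math

N = 8  # MPEG and JPEG transform 8 x 8 pixel blocks


def dct_2d(block):
    """Orthonormal 2-D DCT-II: converts an N x N pixel block into frequency coefficients."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def quantize(coeffs, step):
    """Uniform quantizer: dividing by `step` and rounding discards fine detail."""
    return [[round(c / step) for c in row] for row in coeffs]


def zigzag(matrix):
    """Reorder coefficients from low to high frequency (zigzag scan)."""
    positions = sorted(((u, v) for u in range(N) for v in range(N)),
                       key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [matrix[u][v] for u, v in positions]


def run_length(seq):
    """(zero_run, value) pairs; trailing zeros collapse into one end-of-block marker."""
    pairs, run = [], 0
    for value in seq:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")
    return pairs


flat = [[100] * N for _ in range(N)]   # a uniform gray block
print(run_length(zigzag(quantize(dct_2d(flat), 16))))   # [(0, 50), 'EOB']
```

A flat block collapses to a single DC coefficient followed by an end-of-block marker, which is where most of the compression comes from; blocks with more detail keep more nonzero coefficients and therefore cost more bits.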
Object Oriented Database Management Systems
In order to set up a searchable database of these MPEG objects, several companies are introducing Object Oriented Database Management Systems (ODBMS). These systems can be coupled with either the media server element or the content preparation element of the VOD system. Ideally, all ODBMS would speak the same language so that information could be exchanged between databases. A common query language would be advantageous, but established standards such as SQL do not adequately address video objects. Illustra has added object-oriented extensions to ANSI SQL; these extensions are used to create "DataBlades", which provide image handling and manipulation capabilities. Since this architecture uses SQL, third-party front-end authoring software is more likely to be compatible with Illustra (Interoperability, 10/95).
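A minimal sketch of the searchable catalog idea, using Python's built-in sqlite3 module as a stand-in. It is plain relational SQL, not an ODBMS and not Illustra's DataBlade extensions; the table layout, the `find_clips` helper, and the sample rows are all hypothetical. It also illustrates the limitation noted above: standard SQL can search metadata about a clip (codec, bit rate, keywords) but nothing inside the video itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE clips (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        codec TEXT NOT NULL,          -- e.g. 'MPEG-1' or 'MPEG-2'
        bitrate_mbps REAL NOT NULL,
        duration_s INTEGER NOT NULL,
        keywords TEXT                 -- hypothetical comma-separated tags
    )""")
conn.executemany(
    "INSERT INTO clips (title, codec, bitrate_mbps, duration_s, keywords) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Press briefing", "MPEG-2", 4.0, 1800, "news,briefing"),
        ("Training module 3", "MPEG-1", 1.2, 900, "training"),
        ("Harbor survey", "MPEG-2", 4.0, 600, "survey,harbor"),
    ],
)


def find_clips(conn, codec, keyword):
    """Titles of clips in `codec` whose tags contain `keyword` (metadata search only)."""
    rows = conn.execute(
        "SELECT title FROM clips WHERE codec = ? AND keywords LIKE ? ORDER BY title",
        (codec, f"%{keyword}%"),
    )
    return [title for (title,) in rows]


print(find_clips(conn, "MPEG-2", "survey"))   # ['Harbor survey']
```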
If the VOD server is seen as a central library of video files, with some users archiving files and others retrieving them, the requirement for format standards is evident. There is also a requirement to verify that these format standards are being met. This verification usually falls to the content preparation element of a VOD system; the natural metaphor is a publisher ensuring that a book is legible and free of grammatical errors before releasing it to the public. (This paper would probably be caught by such a publisher.) Auditing compressed video information is not as straightforward. One video stream can flow through an MPEG-2 encoder without incident while a second stream bogs down the system (possibly inducing errors); rapidly changing backgrounds, as in sports coverage, can cause problems. The MPEG-2 standard is complex, and it takes more than an astute systems engineer to ensure that the designers of encoders have not interpreted the MPEG standard differently from the designers of decoders. Hewlett-Packard suggests that the industry needs to consider testability a primary requirement of VOD systems. One way to resolve encoding concerns could be to create standardized tests that carefully verify implementations of the MPEG standard. Bit error rate testers can test transport layers, and traditional data analysis tools can be used to build new test tools for MPEG. It should be no surprise that testability is the last area of standardization for the VOD marketplace.
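As an example of the transport-layer testing mentioned above, the sketch below scans an MPEG-2 transport stream for basic faults: 188-byte packets that do not start with the 0x47 sync byte, packets flagged by the transport error indicator, and gaps in the 4-bit per-PID continuity counter. It is a simplified, hypothetical tool (including the `make_packet` test helper): it ignores the standard's allowance for one duplicate packet and does not check elementary-stream syntax, which is where the encoder/decoder interpretation problems described above actually arise.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
NULL_PID = 0x1FFF


def check_transport_stream(data):
    """Return (packet_index, problem) tuples for an MPEG-2 transport stream byte string."""
    problems = []
    last_cc = {}  # PID -> last continuity counter seen
    count, leftover = divmod(len(data), TS_PACKET_SIZE)
    for i in range(count):
        pkt = data[i * TS_PACKET_SIZE:(i + 1) * TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            problems.append((i, "lost sync"))
            continue
        if pkt[1] & 0x80:                       # transport_error_indicator bit
            problems.append((i, "transport error indicator set"))
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]   # 13-bit packet identifier
        has_payload = bool(pkt[3] & 0x10)       # adaptation_field_control payload bit
        cc = pkt[3] & 0x0F                      # 4-bit continuity counter
        if pid == NULL_PID or not has_payload:
            continue                            # counter only advances on payload packets
        if pid in last_cc and cc != (last_cc[pid] + 1) % 16:
            problems.append((i, f"continuity error on PID {pid:#x}"))
        last_cc[pid] = cc
    if leftover:
        problems.append((count, "truncated packet"))
    return problems


def make_packet(pid, cc, sync=SYNC_BYTE):
    """Hypothetical test packet: 4-byte header with payload flag, zero-filled payload."""
    header = bytes([sync, (pid >> 8) & 0x1F, pid & 0xFF, 0x10 | (cc & 0x0F)])
    return header + bytes(TS_PACKET_SIZE - len(header))


stream = make_packet(0x100, 0) + make_packet(0x100, 1) + make_packet(0x100, 3)
print(check_transport_stream(stream))   # [(2, 'continuity error on PID 0x100')]
```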
Preparing video information for VOD archiving has reached the point where developers can concentrate on accelerating the compression phase. The compression techniques are relatively well documented; the industry is now addressing how to implement them faster: hardware vs. software, digitizing cameras vs. DSP cards. Most experts agree that even though today's workstations have the processing power to perform MPEG compression, it is usually more efficient to perform as much processing as possible in hardware (such as dedicated video cards). This is not always the case in multimedia applications, where the end product (due to bandwidth limitations) is not really broadcast quality. The image quality the user expects is also a major consideration in selecting a content preparation element. If the user cannot take advantage of a high-resolution 2K x 2K image, or if the bandwidth of the distribution network is limited, then a high-resolution MPEG-2 CODEC might not be justified. If the CODEC implements the "spatial scalability" capability of MPEG-2, the encoder provides the video in a two-part format: low-resolution decoders can extract a base video signal, and more capable decoders can apply additional processing to produce a high-resolution picture.
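The two-part idea behind spatial scalability can be sketched with a tiny frame: a half-resolution base layer that any decoder can display, plus an enhancement layer holding the difference between the full frame and an upsampled base. This is a lossless toy under stated assumptions (2 x 2 averaging, pixel replication, even frame dimensions); MPEG-2's actual spatial-scalable coding quantizes both layers and uses its own upsampling filters.

```python
def downsample(img):
    """Base layer: average each 2 x 2 block (assumes even width and height)."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, w, 2)] for y in range(0, h, 2)]


def upsample(img):
    """Pixel replication back to full resolution (a real decoder would interpolate)."""
    return [[img[y // 2][x // 2] for x in range(2 * len(img[0]))]
            for y in range(2 * len(img))]


def encode_scalable(img):
    """Split a frame into a base layer and an enhancement (residual) layer."""
    base = downsample(img)
    prediction = upsample(base)
    enhancement = [[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(img, prediction)]
    return base, enhancement


def decode_full(base, enhancement):
    """A capable decoder adds the enhancement layer to the upsampled base."""
    prediction = upsample(base)
    return [[p + e for p, e in zip(r1, r2)] for r1, r2 in zip(prediction, enhancement)]


frame = [[10, 12, 50, 52],
         [14, 16, 54, 56],
         [90, 92, 20, 22],
         [94, 96, 24, 26]]
base, enh = encode_scalable(frame)
print(base)                              # low-resolution decoder: [[13.0, 53.0], [93.0, 23.0]]
print(decode_full(base, enh) == frame)   # full decoder recovers the original: True
```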
Once the content is uploaded to the video server in the content preparation phase and registered appropriately in the database, it becomes available to the end user. For this data to be available and viewable by the end user, the server should have at least a RAID 5 SCSI controller, 4 GB hard drives spinning at 7,200 RPM, and a high-speed network interface. The server should support MPEG-2 video at 4.0 Mbps; storing approximately 28 hours of it, or 96 hours of MPEG-1 video (30 fps, 640 x 480 pixels), for delivery on demand requires a minimum of 50 GB of hard disk space. The server should use RAM to buffer the data being read from the disk drives to ensure a smoother transfer of video to the end user; a minimum of 256 MB is recommended. The server should be able to handle MPEG-2 and MPEG-1 in NTSC, PAL, or SECAM video formats and meet broadcast and cable requirements for on-air program applications and video caching.
Compression Method* | Storage Required per 30-Second Clip (Mb) | Storage Required per 60-Second Clip (Mb) | What a 52 GB HDD Holds
MPEG-1 @ 1.2 Mbps | | |
MPEG-2 @ 4 Mbps | | |
* Assuming the standard compression ratio for each method.
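The storage figures above follow from bit rate x duration. The sketch assumes constant bit rate, decimal units (1 GB = 10^9 bytes), that "Mb" in the table means megabits, and no file-system or RAID parity overhead; the helper names are illustrative.

```python
def storage_gb(bitrate_mbps, seconds):
    """Decimal gigabytes (10**9 bytes) needed for `seconds` of constant-bit-rate video."""
    return bitrate_mbps * 1e6 * seconds / 8 / 1e9


def hours_that_fit(capacity_gb, bitrate_mbps):
    """Hours of constant-bit-rate video that fit in `capacity_gb` decimal gigabytes."""
    return capacity_gb * 1e9 * 8 / (bitrate_mbps * 1e6) / 3600


for name, rate in [("MPEG-1", 1.2), ("MPEG-2", 4.0)]:
    clip_30s_mb = rate * 30          # megabits for a 30-second clip
    clip_60s_mb = rate * 60          # megabits for a 60-second clip
    print(f"{name}: {clip_30s_mb:.0f} Mb per 30 s, {clip_60s_mb:.0f} Mb per 60 s, "
          f"{hours_that_fit(52, rate):.1f} h on 52 GB")
# MPEG-1: 36 Mb per 30 s, 72 Mb per 60 s, 96.3 h on 52 GB
# MPEG-2: 120 Mb per 30 s, 240 Mb per 60 s, 28.9 h on 52 GB
```

For the server sizing in the text, `storage_gb(4.0, 28 * 3600)` is about 50.4 GB and `storage_gb(1.2, 96 * 3600)` about 51.8 GB, consistent with the 50 GB minimum and the 52 GB drive.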
Several major limitations must be addressed to understand why the above requirements are imposed:
1) Storage--Video servers currently face a storage limitation because of the retrieval and transmission time associated with video. Multiple servers will be needed to store and retrieve large archives of video information. These servers should be distributed across remote sites to maximize local retrieval and viewing while minimizing WAN traffic.
2) Data stream--To view video information with minimal latency and without jitter, the data stream needs to be constant and uninterrupted (apart from any necessary buffering). There are several forms of buffering:
a) Media stream storage on hard disk.
b) Caching at the transmit buffer.
c) Network transit: network latency and buffers act as another buffer.
d) Receive-end buffering: the receiver may buffer enough of the media stream to maintain continuous display and suitable synchronization with the transmit end.
3) Concurrent users--The video server should be limited to 100 concurrent users to ensure that each user can access the requested data as quickly as possible.
4) Network bandwidth--The network bandwidth needs to be directly proportional to the number of simultaneous video streams. The effective bandwidth of the system is limited by the transmission capacity at the server.
5) Latency--Although acceptable latency is hard to determine, it should be no more than 2 seconds for a video file retrieved locally and no more than 10 seconds for a video file retrieved over the WAN from a remote site.
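Items 2, 4, and 5 can be tied together in a small model. Aggregate bandwidth is simply streams x per-stream rate, and the start-up delay a client needs (the receive-end buffering in item 2d) is the worst lateness of any stream segment relative to a perfectly paced schedule. The function names and the arrival times in the example are hypothetical, and the model ignores buffer size limits.

```python
def aggregate_bandwidth_mbps(streams, stream_mbps):
    """Server and network bandwidth needed for `streams` concurrent constant-rate streams."""
    return streams * stream_mbps


def min_playout_delay(arrival_times, segment_interval):
    """Smallest start-up delay (s) that lets every segment arrive before it is displayed.

    Segment i is displayed at arrival_times[0] + delay + i * segment_interval, so the
    delay must cover the worst lateness of any segment against a perfectly paced stream.
    """
    t0 = arrival_times[0]
    return max(max(t - t0 - i * segment_interval, 0.0)
               for i, t in enumerate(arrival_times))


def count_underruns(arrival_times, segment_interval, delay):
    """Segments that miss their display deadline (visible stalls) for a given delay."""
    start = arrival_times[0] + delay
    return sum(1 for i, t in enumerate(arrival_times)
               if t > start + i * segment_interval)


print(aggregate_bandwidth_mbps(100, 4.0))      # 100 MPEG-2 users -> 400.0 Mbps

arrivals = [0.0, 1.5, 2.2, 4.0, 4.1]           # hypothetical jittered arrivals, 1 s segments
delay = min_playout_delay(arrivals, 1.0)
print(delay, count_underruns(arrivals, 1.0, 0.0), count_underruns(arrivals, 1.0, delay))
# 1.0 4 0
LOCAL_LATENCY_BUDGET_S = 2.0                   # item 5: local retrieval limit
print(delay <= LOCAL_LATENCY_BUDGET_S)         # True
```

With no buffering, four of the five segments stall; one second of receive-end buffering absorbs the jitter and still fits within the 2-second local latency target.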
Several products currently marketed as video servers are:
1) The Network Connection, M2V Video Server:
a) 120 simultaneous 1.2 Mbps MPEG-1 video streams
b) 112GB, RAID 5 storage.
c) In excess of 200 Hours MPEG-1, and 60 Hours MPEG-2.
d) Supports JPEG, M-JPEG, DVI, AVS, AVI, Wavelet, Indeo and other video formats.
e) Supports Ethernet, Token Ring, FDDI and ATM.
2) Micropolis Corp., AV Server:
a) 16 MPEG-2 video decoder boards with 4 channels per board, for 64 channels at 6 Mbps per channel.
b) 252GB, RAID storage.
c) In excess of 120 hours MPEG-2
d) Supports only MPEG-2
3) Sun Microsystems, Media Center 1000E Video Server:
a) 63GB, RAID 4 storage.
b) In excess of 32 Hours MPEG-2, and 81 Hours MPEG-1
c) Supports MPEG-1 and MPEG-2
d) Supports ATM and Fast-Ethernet
Video on Demand (VOD) requires predictable and continuous traffic flow to ensure real-time delivery of information. MPEG-1 and MPEG-2 (as described above) require an effective bandwidth of 1.5-4 Mbps per stream. Multiplying this "media stream" bandwidth requirement by the number of clients gives a rough estimate of the required distribution network bandwidth. The Common Imagery Ground/Surface System (CIGSS) 1 Handbook suggests the following steps to size and specify the LAN technology used for image dissemination systems:
1. Approximate the system usage profile by estimating the amounts of image, video and text handling that will be required.
2. Convert the amounts of images, video, and text to be processed into average effective data rates. Raw data transferred directly to an archive (our video server) and near-real-time processed imagery should be estimated separately; the bandwidth requirements can be combined later if needed.
3. Adjust the calculated rate for growth. The growth factor should be at least 50%.
4. Add a fraction (about 0.3 to 0.4) of the peak capacity to the growth-adjusted rate for interprocessor communications.
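The four sizing steps reduce to a short calculation. In this sketch the per-category rates are hypothetical inputs, `peak_capacity_mbps` is interpreted as the peak capacity of the candidate network (the handbook's exact definition is not given here, so treat that as an assumption), and the defaults follow the text: at least 50% growth and an interprocessor fraction in the 0.3-0.4 range.

```python
def size_lan_mbps(archive_rates, realtime_rates, peak_capacity_mbps,
                  growth=0.5, ipc_fraction=0.35):
    """Estimate required LAN bandwidth (Mbps) following the four sizing steps."""
    if growth < 0.5:
        raise ValueError("the growth factor should be at least 50%")
    if not 0.3 <= ipc_fraction <= 0.4:
        raise ValueError("interprocessor fraction should be about 0.3 to 0.4")
    # Steps 1-2: average effective rates, archive and near-real-time estimated separately.
    archive = sum(archive_rates.values())
    realtime = sum(realtime_rates.values())
    combined = archive + realtime
    # Step 3: adjust the combined rate for growth.
    grown = combined * (1 + growth)
    # Step 4: add a fraction of peak capacity for interprocessor communications.
    return grown + ipc_fraction * peak_capacity_mbps


archive = {"video": 40.0, "text": 0.5}   # hypothetical: 10 MPEG-2 streams plus text
realtime = {"imagery": 10.0}             # hypothetical near-real-time imagery
print(size_lan_mbps(archive, realtime, peak_capacity_mbps=100.0, ipc_fraction=0.3))
# 105.75
```

Here 50.5 Mbps of average load grows to 75.75 Mbps, and the interprocessor allowance brings the total to about 105.75 Mbps, which already exceeds a single shared 100 Mbps segment.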
Upgrading legacy networks to meet this new bandwidth requirement can incur substantial costs. The cost of implementing a high-speed network varies with the network architecture.
Several LAN architectures are being used in "trial" VOD systems. ATM, FDDI and other token-ring networks, and even variations of the Ethernet standard can provide the required 10-100 Mbps of bandwidth.
A version of Ethernet called switched Ethernet can provide 10 Mbps to every client: because this is a switched architecture, the full 10 Mbps can be dedicated to each client rather than shared. This architecture provides the quickest, most cost-effective method of upgrading legacy systems, since it does not require an upgrade of existing 10BASE-T wiring. A voice-grade Ethernet variant, 100VG-AnyLAN, can also be implemented in a VOD system; this architecture, however, will require some cable upgrades from CAT 3 to CAT 5. 100VG-AnyLAN is expected to "top out" at 100 Mbps; no further upgrades are foreseen.
Token ring networks have been implemented in a few VOD trial systems. FDDI can be set up to provide 100 Mbps, and because of its token-ring architecture, the network can allocate bandwidth to each client. A simulated system described in the September 1995 issue of Multimedia Systems would be capable of handling 60 simultaneous MPEG-1 video streams; the video server (a 486DX), not the 100 Mbps token ring, limited the system size. This is of course a small system, and because of the shared nature of a token-ring FDDI architecture, it should not be used for larger (1000+ user) systems.
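The difference between shared and switched networks can be put in numbers. This rough model ignores protocol overhead and token-rotation delay and assumes constant-rate streams; the function names are hypothetical.

```python
import math


def shared_stream_limit(medium_mbps, stream_mbps):
    """Shared medium (token ring, FDDI, hub Ethernet): all streams split one capacity."""
    return math.floor(medium_mbps / stream_mbps)


def switched_stream_limit(client_ports, port_mbps, stream_mbps):
    """Switched network: each client's dedicated port must carry one stream."""
    return client_ports if stream_mbps <= port_mbps else 0


print(shared_stream_limit(100, 1.5))          # 100 Mbps ring, MPEG-1 streams -> 66
print(shared_stream_limit(100, 4.0))          # same ring, MPEG-2 streams -> 25
print(switched_stream_limit(1000, 10, 4.0))   # 1000 switched 10 Mbps ports -> 1000
```

The 60-stream simulation sits just under the ring's theoretical ceiling of 66 MPEG-1 streams, and no shared 100 Mbps medium approaches 1000 clients. The switched figure assumes the server's own uplink can source 1000 x 4 Mbps, which item 4 above identifies as the real constraint.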
ATM provides the highest bandwidth and is probably the most expensive network solution. ATM provides the proper class of service for video on demand applications. ATM connections running at OC3 rates (155 Mbps) are currently priced at approximately $300-$500. ATM is not a "shared" topology, so bandwidth does not depend on the number of users; in fact, as users are added to an ATM network, each with its own switched link, the aggregate bandwidth of the network increases. ATM can carry hundreds of services simultaneously (voice, video, LAN, and ISDN), each with a guaranteed quality of service and without interfering with the others. The LAN marketplace currently provides 155 Mbps products, and some ATM Forum leaders (such as FORE Systems) also provide 622 Mbps (OC12) network interface cards (NICs). The problem is that ATM is a relatively new protocol. Several companies have come together to form the ATM Forum to help standardize the architecture. For most network application software, the cell-based ATM layer is not an appropriate interface; the ATM Adaptation Layer (AAL) was designed to bridge the gap between the ATM layer and application requirements. The Forum's efforts have been very successful at the lower ATM adaptation layers, but some interoperability issues still exist: the American ATM Forum has standardized on AAL5 to map MPEG-2 for transport, while the European ETSI has chosen AAL2. These inconsistencies affect only the transport of multimedia over ATM LANs.
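The AAL5 mapping for MPEG-2 can be checked with simple cell arithmetic. An AAL5 PDU carries the payload plus an 8-byte trailer, padded to a multiple of the 48-byte cell payload, and each 53-byte cell adds a 5-byte header. The sketch shows why carrying two 188-byte transport stream packets per PDU is a natural choice: 376 + 8 = 384 bytes fills exactly eight cells with no padding. It covers only AAL5, not the AAL2 mapping attributed to ETSI, and the helper names are illustrative.

```python
CELL_PAYLOAD = 48   # bytes of payload in each ATM cell
CELL_SIZE = 53      # 5-byte header + 48-byte payload
AAL5_TRAILER = 8    # UU, CPI, Length and CRC-32 fields
TS_PACKET = 188     # MPEG-2 transport stream packet


def aal5_cells(payload_bytes):
    """Cells for one AAL5 PDU: payload + trailer, rounded up to whole cell payloads."""
    return (payload_bytes + AAL5_TRAILER + CELL_PAYLOAD - 1) // CELL_PAYLOAD


def ts_over_aal5(ts_packets_per_pdu):
    """Return (cells per PDU, padding bytes, efficiency) for N TS packets per PDU."""
    payload = ts_packets_per_pdu * TS_PACKET
    cells = aal5_cells(payload)
    padding = cells * CELL_PAYLOAD - payload - AAL5_TRAILER
    efficiency = payload / (cells * CELL_SIZE)
    return cells, padding, efficiency


for n in (1, 2, 3):
    cells, pad, eff = ts_over_aal5(n)
    print(f"{n} TS packet(s): {cells} cells, {pad} pad bytes, {eff:.1%} efficient")
# 1 TS packet(s): 5 cells, 44 pad bytes, 70.9% efficient
# 2 TS packet(s): 8 cells, 0 pad bytes, 88.7% efficient
# 3 TS packet(s): 12 cells, 4 pad bytes, 88.7% efficient
```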
Several transport protocols can be used for audio-video applications: TCP, UDP, SONET, the TCP/IP Resource Reservation Protocol (RSVP), and IPX/SPX. Due to the effective data rate necessary to support VOD, protocols that minimize client/server interaction are preferable, except where network bandwidth is abundant. In ATM networks supporting mostly non-VOD applications, retransmission of lost or corrupt packets will not be possible. For example, if cells are lost, the FORE Systems AVA Real-time Display software substitutes pixel tiles from a previous frame. In a typical VOD system without error correction, QoS depends directly on the network/LAN bit error rate (BER): the higher the BER, the lower the QoS. VOD systems that provide error correction as part of the network protocol have to be designed to allow for the latency created by their error-correcting protocols. (DSS currently implements interleaving, Reed-Solomon coding, and Viterbi decoding.) QoS trade-offs can be quantified and analyzed (see "QOS control in GRAMS for ATM LAN," IEEE Journal on Selected Areas in Communications, by Joseph Hui).
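Interleaving, one of the DSS techniques listed above, spreads a burst of channel errors across several codewords so that each codeword sees only a few errors that its error-correcting code (Reed-Solomon, in DSS's case) can repair. The block interleaver below is a generic illustration; DSS's actual interleaver structure and parameters are not reproduced, and treating each row as a codeword able to correct one symbol error is a hypothetical assumption.

```python
def interleave(symbols, rows, cols):
    """Block interleaver: write symbols row by row, transmit them column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("block must contain exactly rows * cols symbols")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Inverse of interleave: rebuild the original row-by-row order at the receiver."""
    if len(symbols) != rows * cols:
        raise ValueError("block must contain exactly rows * cols symbols")
    out = [None] * (rows * cols)
    for i, value in enumerate(symbols):
        c, r = divmod(i, rows)           # i-th transmitted symbol came from row r, column c
        out[r * cols + c] = value
    return out


ROWS, COLS = 4, 6                        # hypothetical: 4 codewords of 6 symbols each
data = list(range(ROWS * COLS))
sent = interleave(data, ROWS, COLS)
received = [None if i < 4 else s for i, s in enumerate(sent)]   # a 4-symbol error burst
restored = deinterleave(received, ROWS, COLS)
codewords = [restored[r * COLS:(r + 1) * COLS] for r in range(ROWS)]
print([cw.count(None) for cw in codewords])   # [1, 1, 1, 1]: one error per codeword
```

Without interleaving, the same 4-symbol burst would land entirely in the first codeword and exceed its assumed correction capability.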
Networking, DBMS, and server companies have been adapting upper-layer protocols to VOD processes. Oracle Media Net uses a "sliding window" protocol, a well-established method for ensuring reliable transmission over lossy data links. Media Net monitors the response between client and server, lengthens the response-checking interval up to the point of error, and then backs off (this process theoretically reduces disruptive latencies). Novell developed the Novell Embedded System Technology (NEST) and NetWare to run over IPX/SPX protocols. The Novell implementation provides prioritization for video users, but flow control from the client to the server does not yet exist (Interoperability, 10/95).
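A generic go-back-N sliding-window sender shows the basic mechanism: up to `window` frames may be outstanding, the receiver accepts only the next in-order frame, and after a loss the sender resends from the first unacknowledged frame. This is a textbook sketch, not Oracle Media Net's protocol (its adaptive response-checking interval is not modeled); acknowledgments are simplified to one cumulative ACK per window round, and losses come from a scripted, hypothetical set of (frame, attempt) pairs.

```python
def go_back_n(frames, window, lost):
    """Deliver `frames` in order over a lossy link; return (delivered, transmissions).

    `lost` holds (frame_index, attempt_number) pairs that the channel drops.
    """
    delivered, attempts = [], {}
    base, transmissions = 0, 0
    while base < len(frames):
        expected = base                      # next frame the receiver will accept
        for i in range(base, min(base + window, len(frames))):
            attempts[i] = attempts.get(i, 0) + 1
            transmissions += 1
            if (i, attempts[i]) in lost:
                continue                     # dropped by the channel
            if i == expected:                # in order: accept and advance
                delivered.append(frames[i])
                expected += 1
            # out-of-order frames are discarded by a go-back-N receiver
        base = expected                      # cumulative ACK slides the window
    return delivered, transmissions


frames = list("ABCDEFGH")
print(go_back_n(frames, window=4, lost=set()))      # all 8 frames, 8 transmissions
print(go_back_n(frames, window=4, lost={(2, 1)}))   # frame C lost once -> 10 transmissions
```

The cost of a single loss grows with the window: with `window=4` the loss of C also forces D to be resent, which is the latency and throughput trade-off that adaptive schemes like the one described above try to tune.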
Distributing VOD information outside the LAN requires either a very high bandwidth WAN with guaranteed availability, or substantial buffering and latency allowances at the client to maintain a constant display of data. When people think of information distributed over a WAN, sourced by many different servers, to many isolated users, the Internet naturally comes to mind. The National Information Infrastructure (NII) workshop used the Internet as a model for the delivery of video services. This conference of commercial organizations, in addition to supporting HDTV and DSS, is interested in providing VOD services to "all Americans". The Internet was seen as a good first attempt at distributing information: it is inexpensive, requires no gatekeepers, provides search utilities, and has several proven human-machine interfaces (HMIs). Unfortunately, the Internet is also bandwidth-limited, provides insufficient traffic control, security, and directory services, and offers no guaranteed delivery. The Internet may not be the solution to the VOD distribution problem, but it will expedite the development of an open-architecture commercial VOD WAN.
Commercial enterprises have been considering hybrid fiber/coaxial cable as one possible solution. This implementation, also referred to as "fiber to the curb," requires a partial upgrade to existing telephone distribution infrastructures. Signals are transmitted over fiber to a neighborhood distribution (gateway) point. The signals are then either converted to RF and sent to the user's home via coax, or converted to a lower-data-rate network interface and sent on to the home. The RF implementation requires a "set-top box" to decode the RF; the latter could be a PC implementation. Broadband ISDN (B-ISDN) will probably evolve into the leading WAN technology. Narrowband ISDN is already an accepted method of providing the higher serial data rates necessary for minimal-quality multimedia applications such as teleconferencing. True motion-picture-quality VOD implementations will require the Mbps data rates that B-ISDN should provide.
The DOD has also been interested in the distribution of video and imagery across WANs. The Defense Airborne Reconnaissance Office (DARO) has developed the Common Imagery
Source: Essay UK - http://www.essay.uk.com/coursework/video-on-demand.php