ⓘ File sequence. In computing, as well as in non-computing contexts, a file sequence is a well-ordered, collection of files, usually related to each other in some ..


ⓘ File sequence

In computing, as well as in non-computing contexts, a file sequence is a well-ordered, collection of files, usually related to each other in some way.

In computing, file sequences should ideally obey some kind of locality of reference principle, so that not only all the files belonging to the same sequence ought to be locally referenced to each other, but they also obey that as much as is their proximity with respect to the ordering relation. Explicit file sequences are, in fact, sequences whose filenames all end with a numeric or alphanumeric tag in the end excluding file extension.

The aforementioned locality of reference usually pertains either to the data, the metadata e.g. their filenames or last-access dates, or the physical proximity within the storage media they reside in. In the latter acception it is better to speak about file contiguity see below.


1. Identification

Every GUI program shows contents of folders by usually ordering its files according to some criteria, mostly related to the files metadata, like the filename. The criterion is, by default, the alphanumeric ordering of filenames, although some operating systems do that in "smarter" ways than others: for example file1.ext should ideally be placed before file10.ext, like GNOME Files and Thunar do, whereas, alphanumerically, it comes after more on that later. Other criteria exist, like ordering files by their file type or by their extension and, if the same type, by either filename or last-access date, and so on.

For this reason, when a file sequence has a more strong locality of reference, particularly when it is related to their actual contents, it is better to highlight this fact by letting their well-ordering induce an alphanumeric ordering of the filenames too. That is the case of explicit file sequences.


1.1. Identification Explicit file sequences

Explicit file sequences have the same filename including file extensions in order to confirm their contents locality of reference except for the final part excluding the extension, which is a sequence of either numeric, alphanumeric or purely alphabetical characters to force a specific ordering; such sequences should also be ideally located all within the same directory.

In this sense any files sharing the same filename and possibly extension, only differing by the sequence number at the end of the filename, automatically belong to the same file sequence, at least when they are located in the same folder. It is also part of many naming conventions that number-indexed file sequences in any number base containing as many files as to span at most a fixed number of digits, make use of "trailing zeroes" in their filenames so that:

  • non-smart alphanumeric orderings, like those of operating systems GUIs, do not erroneously permute them within the sequence.
  • all the files in the sequence share exactly the same number of characters in their complete filenames;

To better explain the latter point, consider that, strictly speaking, file1.ext 1st file in the sequence comes alphanumerically after file100.ext, which is actually the hundredth. By renaming the first file to file001.ext with two trailing zeroes, the problem is universally solved.

Examples of explicit file sequences include: file00000.ext, file00001.ext, file00002.ext. {\displaystyle.}, file02979.ext five trailing zeroes, and another with a hexadecimal ordering of 256 files tag_00.ext, tag_01.ext. {\displaystyle.}, tag_09.ext, tag_0A.ext., tag_0F.ext, tag_10.ext., tag_0F.ext., tag_FF.ext with just one trailing zero.

Software and programming conventions usually represent a file sequence as a single virtual file object, whose name is comprehensively written in C-like formatted-string notation to represent where the sequence number is located in the filename and what is its formatting. For the two examples above, that would be filename%05d.ext and tag_%02H.ext, respectively, whereas for the former one, the same convention without trailing zeroes would be filename%5d.ext. Note, however, that such notation is usually not valid at operating system and command-line interface levels, because the % character is neither a valid regular expression nor a universally legal filename character: that notation just stands as a placeholder for the virtual file-like representing the whole explicit file sequence.

Notable software packages acknowledging explicit file sequences as single filesystem objects, rather typical in the Audio/Video post-production industry see below, are found among products by Autodesk, Quantel, daVinci, DVS, as well as Adobe After Effects.


2. File scattering

A file sequence located within a mass storage device is said to be contiguous if:

  • consecutive files in the sequence occupy contiguous portions of storage space extents, yet consistently with their file ordering.
  • every file in the sequence is unfragmented, i.e. each file is stored in one contiguous and ordered piece of storage space ;

File contiguity is a more practical requirement for file sequences than just their locality of reference, because it is related to the storage medium hosting the whole sequence than to the sequence itself or its metadata. At the same time, it is a "high-level" feature, because it is not related to the physical and technical details of mass storage itself: particularly, file contiguity is realized in different ways according to the storage devices architecture and actual filesystem structure. At "low level", each file in a contiguous sequence must be placed in contiguous blocks, in spite of reserved areas or special metadata required by the filesystem like inodes or inter-sector headers actually interleaving them.

File contiguity is, in most practical applications, "invisible" at operating-system or user levels, since all the files in a sequence are always available to applications in the same way, regardless of their physical location on the storage device due to operating systems hiding the filesystem internals to higher-level services. Indeed, file contiguity may be related to I/O performance when the sequence is to be read or written in the shortest time possible. In some contexts like optical disk burning - also cfr. below, data in a file sequence must be accessed in the same order as the file sequence itself; in other contexts, a "random" access to the sequence may be required. In both cases, most professional filesystems provide faster access strategies to contiguous files than non-contiguous ones. Data pre-allocation is crucial for write access, whereas burst read speeds are achievable only for contiguous data.

When a file sequence is not contiguous, it is said to be scattered, since its files are stored in sparse locations on the storage device. File scattering is the process of allocating or re-allocating a file sequence as being or becoming uncontiguous. That is often associated with file fragmentation too, where each file is also stored in several, non-contiguous blocks; mechanisms contributing to the former are usually a common cause to the latter too. The act of reducing file scattering by means of allocating in the first place or moving for already-stored data files in the same sequence near together on the storage medium is called file de scattering. A few defragmentation strategies and dedicated software are able to both defragment single files and descatter file sequences.


3. Multimedia file sequences

There are many contexts which explicit file sequences are particularly important in: incremental backups, periodic logs and multimedia files captured or created with a chronological locality of reference. In the latter case, explicit file numbering is extremely important in order to provide both software and end users a way to discern the consequentiality of the contents stored therein. For example, digital cameras and similar devices save all the picture files in the same folder until it either reaches its maximum file-number capacity, or a new event like midnight-coming or device-switching takes place with a final number sequence: it would be very unpractical to choose a filename for each taken shot on the very shooting time, so the camera firmware/software picks one which is perfectly identifiable by its sequence number. With the aid of other metadata and usually of specialized PC software, users can later on discern the multimedia contents and re-organize them, if needed.


3.1. Multimedia file sequences The Digital Intermediate example

A typical example where explicit file sequences, as well as their contiguity, becomes crucial is in the digital intermediate DI workflow for motion picture and video industries. In such contexts, video data need to maintain the highest quality and be ready for visualization usually real-time if not even better. Usually video data are acquired from either a digital video camera or a motion picture film scanner and stored into file sequences as much as a common photographic camera does and need to be post-produced in several steps, including at least editing, conforming and colour-correction. That requires:

  • Unambiguous frames ordering, for obvious reasons, which is best accomplished grouping all the files together with explicit file numbering.
  • File contiguity, because many filesystem architectures employ higher I/O speeds if transferring data on contiguous areas of the storage, whereas random allocation might prevent real-time or better loading performances.
  • Frame-per-file data management, because common post-production operations imply the shortest seek-times ever; "fast-forwarding" or "rewinding" to a specific key frame is much faster if done at filesystem level rather than within a huge, possibly fragmented video file; every frame is then stored in a single file as a still digital picture.
  • Uncompressed data, because any lossy compression, which is common in most finalized products, introduces unacceptable quality losses.
  • Uncompressed data again, because decompression times may degrade playing/visualization performance by hardware and software.

Consider that a single frame in a DI project is currently from 9MB to 48MB large depending upon resolution and colour-depth, whereas video refresh rate is commonly 24 or 25 frames per second if not faster; any storage required for real-time playing such contents thus needs a minimum overall throughput of 220MB/s to 1.2GB/s, respectively. With those numbers, all the above requirements particularly file contiguity, given nowadays storage performances become strictly mandatory.

  • binary file is a computer file that is not a text file The term binary file is often used as a term meaning non - text file Many binary file formats
  • A text file sometimes spelled textfile an old alternative name is flatfile is a kind of computer file that is structured as a sequence of lines of electronic
  • In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence
  • jiffy equal to 1 60 of a second, or 16.666 ms. Sequence information present in the file determines the sequence of frames, and allows frames to be played more
  • identify the file format is to examine the file contents for distinguishable patterns among file types. The contents of a file are a sequence of bytes and
  • computing, file system fragmentation, sometimes called file system aging, is the tendency of a file system to lay out the contents of files non - continuously
  • VocaListener project file VSQ Vocaloid 2 Editor sequence excluding wave - file VSQX Vocaloid 3 Editor sequence excluding wave - file DVR - MS Windows XP
  • Efficiency Image File Format HEIF also known as High Efficiency Image Coding HEIC is a file format for individual images and image sequences It was developed
  • IENY A multiple sequence FASTA format would be obtained by concatenating several single sequence FASTA files in a common file also known as multi - FASTA
  • file on a Files - 11 disk or volume set has a unique file identification FID composed of three numbers: the file number NUM the file sequence number