Magic Bytes and Important File Formats

To identify a file's format, it is usually enough to look at the first few bytes of the file. These are often called magic bytes: a block of byte values that designates the file type, so that an application can check whether the file it is about to parse or consume is in the expected format.

Looking for file format signatures also helps in inferring how an application uses them, and how a format might be abused to provoke undefined behaviour within that application.

For example, a JPEG file starts with

ffd8 ffe0 0010 4a46 4946 0001 0101 0047  ......JFIF.....G

or ffd8

Commands that can help analyze the file formats:
  • file image.jpeg
  • file -i image.jpeg
  • xxd image.jpeg | head
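
The same inspection can be sketched in Python. This is a minimal illustration, not a replacement for `file`: the function names (`read_header`, `hexdump_head`, `guess_type`) and the tiny signature table are assumptions made for this example, and real tools rely on a far larger signature database.

```python
# Illustrative signature table (an assumption for this sketch, far from complete).
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
}


def read_header(path: str, count: int = 16) -> bytes:
    """Read the first `count` bytes of a file in binary mode."""
    with open(path, "rb") as handle:
        return handle.read(count)


def hexdump_head(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex column and ASCII column, similar to xxd."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in row).ljust(width * 3)
        # Non-printable bytes are shown as dots, as hex viewers usually do.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{offset:08x}: {hex_part} {ascii_part}")
    return "\n".join(lines)


def guess_type(header: bytes) -> str:
    """Return the first format whose signature matches the start of `header`."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"
```

For a file on disk, something like `print(hexdump_head(read_header("image.jpeg")))` followed by `guess_type(read_header("image.jpeg"))` would roughly mirror the `xxd` and `file` commands above.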

A file signature (aka ‘magic bytes’) is typically 1-4 bytes long and located at offset 0 of the raw file data. There are many exceptions, however: some formats, such as Canon RAW images or ‘GIF’ files, have signatures longer than 4 bytes, while others, such as ISO9660 CD/DVD ISO image files, have their signature at an offset other than 0.
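
A signature check therefore needs both the bytes and the offset. The sketch below assumes a hypothetical helper named `has_signature_at`. The two concrete values are not from the text above: the 6-byte GIF signatures `GIF87a`/`GIF89a`, and the ISO9660 identifier `CD001`, which sits at offset 0x8001 (one byte into the volume descriptor at sector 16).

```python
def has_signature_at(data: bytes, magic: bytes, offset: int = 0) -> bool:
    """Check whether `magic` appears in `data` exactly at `offset`."""
    return data[offset:offset + len(magic)] == magic


# GIF: a 6-byte signature at offset 0.
def looks_like_gif(data: bytes) -> bool:
    return has_signature_at(data, b"GIF87a") or has_signature_at(data, b"GIF89a")


# ISO9660: the identifier "CD001" at offset 0x8001, not at the start of the file.
def looks_like_iso9660(data: bytes) -> bool:
    return has_signature_at(data, b"CD001", 0x8001)
```

Because the ISO9660 identifier lies 32 KiB into the image, reading only the first few bytes of such a file is not enough to recognise it.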

Another notable detail is that these initial byte sequences are generally not chosen at random: for many formats, the ASCII representation of the signature is recognizable at a glance and unique to the format.

Important File Formats:

Portable Network Graphics (PNG)

  • A PNG file has the magic bytes at the beginning followed by a series of chunks.
  • The first eight bytes of a PNG file always contain the following (decimal) values: 137 80 78 71 13 10 26 10
  • This signature indicates that the remainder of the file contains a single PNG image, consisting of a series of chunks beginning with an IHDR chunk and ending with an IEND chunk.
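
The signature check and the chunk walk can be sketched as follows. The chunk layout used here (4-byte big-endian length, 4-byte type, data, 4-byte CRC) comes from the PNG specification rather than from the text above, and the names `iter_png_chunks` and `make_chunk` are made up for this example. CRCs are not verified while reading.

```python
import struct
import zlib

# The eight signature bytes, written as the decimal values listed above.
PNG_SIGNATURE = bytes([137, 80, 78, 71, 13, 10, 26, 10])


def iter_png_chunks(data: bytes):
    """Yield (chunk_type, chunk_data) pairs until IEND or end of data."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG: signature mismatch")
    pos = 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        yield ctype.decode("ascii", "replace"), body
        if ctype == b"IEND":
            break
        pos += 12 + length  # 8-byte header + data + 4-byte CRC


def make_chunk(ctype: bytes, body: bytes = b"") -> bytes:
    """Build one chunk for demonstration; the CRC covers type and data."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)
```

Running `[name for name, _ in iter_png_chunks(data)]` on a well-formed file should list IHDR first and IEND last.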

Joint Photographic Experts Group (JPEG)

  • JPEG is a commonly used method of lossy compression for digital images, particularly those produced by digital photography. The degree of compression can be adjusted, allowing a tradeoff between storage size and image quality.
  • JPEG/Exif is the most common image format used by digital cameras and other image capture devices, while JPEG/JFIF is the most common format for storing and transmitting photographic images on the Internet.
  • These files start with an image marker that always contains the hex values FF D8 FF. The length of the file is not embedded, so to find where the image ends we need to look for the JPEG trailer, which is FF D9.
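
Since the length is not stored, locating a JPEG inside a larger blob comes down to finding the start marker and then the trailer. The following is a deliberately naive sketch with a made-up function name, `carve_jpeg`. It stops at the first FF D9 after the start marker, which is an assumption that can cut the image short when an embedded thumbnail carries its own trailer.

```python
JPEG_START = b"\xff\xd8\xff"  # start-of-image marker plus the next marker prefix
JPEG_TRAILER = b"\xff\xd9"    # end-of-image marker


def carve_jpeg(blob: bytes):
    """Return the bytes from the first FF D8 FF up to and including the
    next FF D9, or None if either marker is missing."""
    start = blob.find(JPEG_START)
    if start == -1:
        return None
    end = blob.find(JPEG_TRAILER, start + len(JPEG_START))
    if end == -1:
        return None
    return blob[start:end + len(JPEG_TRAILER)]
```

A more careful carver would parse the segment lengths between the markers instead of searching for raw byte patterns.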

MPEG-4 Part 14 (MP4)

  • MPEG-4 Part 14 or MP4 is a digital multimedia format most commonly used to store video and audio, but can also be used to store other data such as subtitles and still images. It allows streaming over the Internet. The only official filename extension for MPEG-4 Part 14 files is .mp4, but many have other extensions, most commonly .m4a and .m4p.
  • MP4 files consist of consecutive chunks (also called boxes or atoms). Each chunk has an 8-byte header: a 4-byte chunk size (big-endian, high byte first) and a 4-byte chunk type, which is one of the pre-defined signatures: "ftyp", "mdat", "moov", "pnot", "udta", "uuid", "moof", "free", "skip", "jP2 ", "wide", "load", "ctab", "imap", "matt", "kmat", "clip", "crgn", "sync", "chap", "tmcd", "scpt", "ssrc", "PICT".
  • The first chunk must be of type "ftyp" and has a sub-type at offset 8. A file is identified as MP4 by this sub-type, which must be one of the following values: "avc1", "iso2", "isom", "mmp4", "mp41", "mp42", "mp71", "msnv", "ndas", "ndsc", "ndsh", "ndsm", "ndsp", "ndss", "ndxc", "ndxh", "ndxm", "ndxp", "ndxs".
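
The chunk layout described above can be walked with a few lines of Python. This is a sketch under stated assumptions: the names `iter_chunks` and `mp4_subtype` are hypothetical, only top-level chunks are listed, and the special size values 0 (chunk runs to end of file) and 1 (64-bit size follows), which are not covered in the text above, simply end the walk.

```python
import struct


def iter_chunks(data: bytes):
    """Yield (chunk_type, offset, size) for each top-level chunk."""
    pos = 0
    while pos + 8 <= len(data):
        size, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        if size < 8:
            # Sizes 0 and 1 have special meanings that this sketch does not handle.
            break
        yield ctype.decode("latin-1"), pos, size
        pos += size


def mp4_subtype(data: bytes):
    """Return the 4-character sub-type at offset 8 if the first chunk is
    "ftyp", otherwise None."""
    if len(data) < 12 or data[4:8] != b"ftyp":
        return None
    return data[8:12].decode("latin-1")
```

Checking the returned sub-type against the list of accepted values above would then decide whether the file is treated as MP4.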


We hope this helps. If you have any suggestions or doubts, add a comment and we will reply as soon as possible.
