As a system administrator, part of your responsibility is to help users manage their data. One of the vital aspects of doing that is to ensure your organization has a good backup plan, and that your users either make their backups regularly, or else don’t have to because you’ve automated the process.
However, sometimes the worst happens. A file gets deleted by mistake, a filesystem becomes corrupt, or a partition gets lost, and for whatever reason, the backups don’t contain what you need.
As we discussed in How to prevent and recover from accidental file deletion in Linux, before trying to recover lost data, you must find out why the data is missing in the first place. It’s possible that a user has simply misplaced the file, or that there is a backup that the user isn’t aware of. But if a user has indeed removed a file with no backups, then you know you need to recover a deleted file. If a partition table has become scrambled, though, then the files aren’t really lost at all, and you might want to consider using TestDisk to recover the partition table, or the partition itself.
What happens if your file or partition recovery isn’t successful, or is only partially successful? Then it’s time for Scalpel. Scalpel performs file carving operations based on patterns that describe specific file types. It looks for these patterns using binary strings and regular expressions, and then extracts the file accordingly.
This tool isn’t currently being maintained, but it’s ever-reliable, compiling and running exactly as expected. If you’re running Red Hat Enterprise Linux (RHEL) 7, RHEL 8, or Fedora, you can download Scalpel’s RPM installers, along with its dependency, libtre, from klaatu.fedorapeople.org.
Starting with Scalpel
Scalpel comes bundled with a comprehensive list of file types and their most unique identifying features. Sometimes, a file can be identified by predictable text at its head and tail:
htm n 50000 <html </html>
While at other times, cryptic-looking hex codes are necessary:
jpg y 200000000 \xff\xd8\xff\xe0\x00\x10 \xff\xd9
Scalpel expects you to duplicate /etc/scalpel.conf and edit your copy to include the file types you hope to recover, and to exclude the file types you know you don’t need. For instance, if you know you don’t have or care about .fws files, then comment that line out of the file. Doing this can speed up the recovery process and reduce false positives.
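As a minimal sketch of that editing step, the helper below comments out one file type in a copy of the config. The function name disable_type and the example paths are hypothetical, and the sketch assumes GNU sed and that each definition starts with the extension as its first field.

```shell
# Hypothetical helper: comment out one file type in your copy of the config.
# $1 = path to the copy, $2 = extension to disable (for example, fws)
disable_type() {
    # Prefix any active line whose first field is the extension with '#'.
    # Lines that are already commented out do not match and are left alone.
    sed -i -E "s/^([[:space:]]*)($2)([[:space:]])/#\1\2\3/" "$1"
}

# Example usage (paths are assumptions):
#   cp /etc/scalpel.conf ~/my-scalpel.conf
#   disable_type ~/my-scalpel.conf fws
```

Working on a copy keeps the packaged /etc/scalpel.conf intact, so you can always start over from the defaults.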
In the configuration file, the format of a file definition is, from left to right:
- The file’s extension.
- Whether the header and footer are case sensitive (y or n).
- The minimum and maximum file size you want Scalpel to find.
- A standard header that identifies the beginning of the file.
- A standard footer that identifies the end of the file.
The footer field is optional. If no footer is provided, then Scalpel extracts the number of bytes you set as the file type’s maximum value.
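To make the field order concrete, here is a small sketch that splits a definition line into its whitespace-separated fields and labels them. The function name describe_type is hypothetical, and the sketch only reads the line the way this article describes it; it is not Scalpel's own parser.

```shell
# Hypothetical helper: label the fields of one Scalpel definition line.
# The body runs in a subshell so its variables do not leak into your session.
describe_type() (
    # read -r keeps backslashes such as \xff exactly as written
    read -r ext cs size header footer <<EOF
$1
EOF
    printf 'extension=%s case_sensitive=%s size=%s header=%s footer=%s\n' \
        "$ext" "$cs" "$size" "$header" "${footer:-(none)}"
)

describe_type 'htm n 50000 <html </html>'
```

When a definition has no footer, the last field is reported as (none), which corresponds to the case where Scalpel extracts up to the maximum size.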
You might find that a recovery effort only rescues part of a file, such as a JPG that is mostly recovered but cut off before its end.
This result means that you probably need to increase the maximum size in that file type’s definition, and then re-scan, so that the end of the file can be recovered, too.
Defining new file types
First, make a copy of the Scalpel configuration file. If all your users generate similar data, then you may only need one config file for your entire organization. Or, you might find it better to have one config file per department.
To add your own file types to a Scalpel config, start with some investigative forensics.
For text files, you ideally have some predictable structure you can anticipate. For instance, an XML file often starts with an <?xml declaration and ends with the closing tag of its root element. Binary files are similarly predictable. Using the hexdump command, you can view a typical header from the file type you want to define. Here are the results for an XCF, the default layered graphic file from GIMP:
$ head --bytes 8 example.xcf | hexdump --canonical
00000000 67 69 6d 70 20 78 63 66 |gimp xcf|
00000008
This output is from a Red Hat Enterprise Linux 8 system. On older systems, an older syntax may be necessary:
$ head --bytes 8 example.xcf | hexdump -C
00000000 67 69 6d 70 20 78 63 66 |gimp xcf|
00000008
The canonical output of hexdump displays the address in the far left column, and the decoded values on the far right. The center column shows the first 8 bytes of the XCF file in hexadecimal.
Most binary files in /etc/scalpel.conf look pretty similar to that output, except that these values are prefaced with the \x escape sequence to denote that the numbers are actually hexadecimal digits. For instance, a JPG file looks like this in the configuration file:
jpg y 200000000 \xff\xd8\xff\xe0\x00\x10 \xff\xd9
Compare that value with a test hexdump of the first 6 bytes (because that’s how many bytes scalpel.conf contains in its JPG definition) of any JPG file on your system:
$ head --bytes 6 example.jpg | hexdump --canonical
00000000 ff d8 ff e0 00 10 |......|
00000006
Compare the footer with the last 2 bytes to match what the config file shows:
$ tail --bytes -2 example.jpg | hexdump --canonical
00000000 ff d9 |..|
00000002
These values match up, so you can be confident that valid JPG files probably all start and end in a predictable sequence.
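The same comparison can be scripted. The sketch below checks whether a file begins or ends with a given byte sequence; the function names starts_with and ends_with are hypothetical, and od is used here as a widely available stand-in for hexdump. The expected value is written as lowercase hexadecimal digits without spaces.

```shell
# Hypothetical helpers: succeed if file $1 begins (or ends) with the bytes
# given as a lowercase hex string in $2, for example ffd8ffe00010.
starts_with() (
    n=$(( ${#2} / 2 ))                                   # two hex digits per byte
    actual=$(head -c "$n" "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$actual" = "$2" ]
)

ends_with() (
    n=$(( ${#2} / 2 ))
    actual=$(tail -c "$n" "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$actual" = "$2" ]
)

# Example usage (example.jpg is the sample file from the commands above):
#   starts_with example.jpg ffd8ffe00010 && ends_with example.jpg ffd9 \
#       && echo "matches the JPG definition"
```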
Note: The Ogg entry in the scalpel.conf file is misleading, as it lacks the \x escape sequence. If you need to recover an Ogg file, fix this, or replace its definition.
Getting to work
Now obtain the same level of confidence for every file type you need to recover (such as XCF, in the previous example). To reiterate, this is your workflow for defining the binary file types common to the victim drive:
- Get the hexadecimal values of the first few bytes of a file type using the head --bytes n command.
- Get the last few bytes using the tail --bytes -n command.
- Repeat this process on several different files of the same type to confirm consistency of this pattern, adjusting the length of your header and footer patterns as required.
- Enter the header and footer values into your custom Scalpel config, using the \x notation to identify each byte as a hexadecimal character.
Follow this sequence for each important binary file type you need to recover.
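The workflow above can be sketched as a single helper that prints a file's leading and trailing bytes already written in \x notation, so you can run it over several samples and compare the results by eye. The function name pattern_of and the sample path are hypothetical, and od stands in for hexdump.

```shell
# Hypothetical helper: print the first $2 bytes and the last $3 bytes of
# file $1 in \x notation, followed by the file name.
pattern_of() (
    # Turn raw bytes on stdin into a string such as \xff\xd8
    to_x() { od -An -v -tx1 | tr -d ' \n' | sed 's/../\\x&/g'; }
    header=$(head -c "$2" "$1" | to_x)
    footer=$(tail -c "$3" "$1" | to_x)
    printf '%s %s %s\n' "$header" "$footer" "$1"
)

# Example usage (the directory is an assumption):
#   for f in ~/Pictures/*.jpg; do pattern_of "$f" 6 2; done
```

If the first two columns are identical for every sample, they are reasonable candidates for the header and footer fields of a new definition. If they differ, try a shorter header or footer length.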
If a file is plaintext, provide a common header and footer, such as #!/bin/sh for shell scripts, a # followed by a space (the space after the # is important) for Markdown files with an h1 level title, <?xml for XML files, and so on.
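To see whether a set of plaintext files really shares a predictable header, one option is to tally their first lines. This is only a sketch; the function name first_lines and the sample path are hypothetical.

```shell
# Hypothetical helper: count how often each distinct first line occurs
# across the files given as arguments, most common first.
first_lines() {
    for f in "$@"; do
        head -n 1 "$f"
    done | sort | uniq -c | sort -rn
}

# Example usage (the directory is an assumption):
#   first_lines ~/bin/*.sh
```

A first line that dominates the tally is a good candidate for the header field. A long tail of different first lines suggests that the file type has no reliable text header.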
When you’re ready to run Scalpel, create a directory where it can place your rescued files:
$ mkdir /run/media/seth/rescuer/scalped
Note: Do not create this directory on the same volume that contains the lost data.
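A quick way to double-check that rule is to compare the device numbers of the two locations. This is a minimal sketch, assuming GNU stat and that both paths exist; the function name same_volume is hypothetical.

```shell
# Hypothetical check: succeed if both paths live on the same filesystem.
same_volume() {
    # stat -c %d prints the device number of the filesystem holding the path
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# Example usage with the paths from this article:
#   if same_volume /run/media/seth/rescuer/scalped /run/media/seth/victim; then
#       echo "Choose an output directory on a different volume" >&2
#   fi
```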
If the victim drive is not yet mounted, mount it, and then run Scalpel:
$ scalpel -c my-scalpel.conf \
-o /run/media/seth/rescuer/scalped \
/run/media/seth/victim
You can also run Scalpel on a disk image:
$ scalpel -c my-scalpel.conf \
-o ~/scalped ~/victim.img
When Scalpel is done, review the files in your designated rescue directory.
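For a first overview of what was recovered, you could count the files in the rescue directory by extension. This sketch makes no assumption about how Scalpel arranges its output beyond the files having extensions; the function name count_by_ext is hypothetical.

```shell
# Hypothetical helper: list each file extension found under directory $1
# together with the number of files that have it.
count_by_ext() {
    find "$1" -type f -name '*.*' \
        | sed 's/.*\.//' \
        | sort | uniq -c \
        | awk '{print $2, $1}'
}

# Example usage with the rescue directory from this article:
#   count_by_ext /run/media/seth/rescuer/scalped
```

A count that is far higher than expected for one type can be a hint that its header pattern is too short and is producing false positives.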
All in all, it’s best to make backups so you can avoid doing file recovery at all. But, should the worst happen, try Scalpel and carve carefully.
About the author
Seth Kenlon is a Linux geek, open source enthusiast, free culture advocate, and tabletop gamer. Between gigs in the film industry and the tech industry (not necessarily exclusive of one another), he likes to design games and hack on code (also not necessarily exclusive of one another).