sorter - Sort files in an image into categories based on file type

NAME

       sorter - Sort files in an image into categories based on file type

SYNOPSIS

       sorter [-b size] [-e] [-E] [-h] [-l] [-md5] [-s] [-sha1] [-U] [-v] [-V]
       [-a hash_alert] [-c config] [-C config] [-d dir] [-m mnt] [-n nsrl_db]
       [-x hash_exclude] [-i imgtype] [-o imgoffset] [-f fstype] image [image]
       [meta_addr]

DESCRIPTION

       sorter is a Perl script that analyzes a file system to organize the
       allocated and unallocated files by file type.  It runs the ’file’
       command on each file and organizes the files according to the rules in
       configuration files.  Extension mismatch checks are also done to
       identify ’hidden’ files, whose extension does not match their content.
       Hash databases can also be provided for files that are known to be
       good, which are ignored, and for files that are known to be bad, which
       raise an alert.

       By default, the program uses the configuration files in the directory
       where The Sleuth Kit was installed.  These can be overridden with
       run-time options.  There is a standard configuration file for all file
       system types and a specific one for each operating system.

ARGUMENTS

       The required arguments are as follows.  sorter analyzes one or more
       images and either saves the results in the ’-d’ directory or lists the
       results to STDOUT (if ’-l’ is given).

       -d dir Specify the directory where all output files are written.  This
              includes the index files and, if the ’-s’ flag is given, the
              category subdirectories.  This MUST be given, unless the ’-l’
              list flag is given.

       -l     List information to STDOUT (no files are ever written).  This is
              useful for incident response, for example when sending the
              output to another host with ’netcat’.  This cannot be combined
              with ’-d’.

       images The file names of the image(s) to analyze.

       The options are as follows:

       -f fstype
              Specify the file system type of the image(s).  This uses the
              same type names as the other tools in The Sleuth Kit.

       -i imgtype
              Specify the image type in which the file system is located.
              This uses the same type names as the other tools in The Sleuth
              Kit.

       -o imgoffset
              Specify the sector offset from the beginning of the image to the
              start of the file system.

       -b size
              Specify the minimum size of file to process.  All files smaller
              than this size will be ignored.

       -c config
              Specify the location of an additional configuration file.   This
              file  will  be  loaded  in  addition to the standard ones in the
              install directory.  These settings will have priority  over  the
              standard files.

       -C config
              Specify the location of the ONLY configuration file.  The
              standard config files will not be loaded if this option is
              given.  For example, in the ’share/sort’ directory there is a
              file called ’images.sort’.  This file contains only rules about
              graphic images.  If it is specified with ’-C’, then only graphic
              images in the file system image will be saved.

       -m mnt Specify the mounting point of the image being analyzed.  This is
              only for cosmetic reasons: when the entries in the output files
              are written, the files will have the full path instead of just
              the relative path.  If this is given, then only one image can be
              given.

       -a hash_alert
              Specify the location of a hash database with entries of known
              ’bad’ files.  If any file is found with an MD5 hash value in
              this database, it will be placed in a special alert file.  This
              database must have been indexed for MD5 using ’hfind’ in The
              Sleuth Kit before it is used by sorter.

       -n nsrl_db
              Specify the location of the NIST National Software Reference
              Library (NSRL) database (www.nsrl.nist.gov).  Any file found in
              the NSRL will be ignored and not placed into a category.  The
              database must be indexed for MD5 with ’hfind’ in The Sleuth Kit
              before it is used by sorter.  The database file is currently
              called ’NSRLFile.txt’.

       -x hash_exclude
              Specify the location of a hash database with entries of known
              ’good’ files.  If any file is found with an MD5 hash value in
              this database, it will be ignored and not processed or saved to
              the category files.  This database must have been indexed for
              MD5 using ’hfind’ in The Sleuth Kit before it is used by sorter.

       -e     Perform extension mismatch checks only (no category index files
              are generated).

       -E     Perform category indexing only (no extension mismatch checks).

       -U     Do not save data about unknown file types.  By default, an
              ’unknown’ file is created for files whose ’file’ output does not
              match any category rule.  This allows one to refine the
              configuration.  If this is not desired, use this flag.

       -h     Create category files in HTML.

       -md5   Calculate the MD5 value  for  each  file  and  save  it  in  the
              category  file.  This will be done automatically when any of the
              databases are given.

       -sha1  Calculate the SHA-1 value for each  file  and  save  it  in  the
              category file.

       -s     Save the actual file content to sub-directories in the directory
              specified by ’-d’.  For example, all JPG  and  GIF  files  would
              actually  be  saved  in the ’images’ directory.  If ’-h’ is also
              given, thumbnails of graphic images are also created.

       -v     Display verbose information.

       -V     Display version.

       [meta_addr]
              The meta data address  of  the  directory  to  start  with.   By
              default,  the  root  directory  is used.  If this is given, then
              only one image can be given.
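       The digests requested by ’-md5’ and ’-sha1’ are standard hashes of the
       file content.  As a rough sketch only (sorter itself is written in Perl
       and this is not its code), both values can be computed in a single pass
       with Python’s hashlib.  The function name file_digests is hypothetical:

```python
import hashlib
import io
from typing import BinaryIO, Tuple


def file_digests(stream: BinaryIO, chunk_size: int = 65536) -> Tuple[str, str]:
    """Return (md5_hex, sha1_hex) for all bytes read from stream.

    Hypothetical helper for illustration; not part of sorter.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    # Read in fixed-size chunks so large files need not fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
        sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


if __name__ == "__main__":
    # An in-memory stream stands in for file content extracted from an image.
    print(file_digests(io.BytesIO(b"abc")))
```

       Hashing in one pass avoids reading the file content twice when both
       options are given.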

HIGH-LEVEL OVERVIEW OF PROCESS

       sorter is a Perl script that uses other tools from The Sleuth Kit.  It
       starts by reading the configuration files from the installation
       directory.  There is a general configuration file and a specific one
       for each operating system.  The specific one is determined from the
       ’-f’ flag.  Each configuration file contains rules for processing the
       output of the ’file’ command.  One rule type uses a regular expression
       to identify which category (e.g. ’images’) a given ’file’ output (e.g.
       ’image data’) belongs to.  Another rule type lists the file extensions
       (e.g. .txt) that belong to a ’file’ output (e.g. ASCII(.*?)text).  See
       the CONFIGURATION FILES section below.

       The program then runs the ’fls’ tool in The Sleuth Kit to identify the
       files in the file system image.  The content of each identified file is
       extracted with the ’icat’ tool.  If a hash database is given, the hash
       of the file is calculated and looked up.  If it is found in an ’alert’
       database, then it is added to a special ’alert.txt’ file.  If it is
       found in the NSRL or ’exclude’ database, then it is ignored as a known
       good file.  Excluded files are recorded in an ’exclude’ file for future
       reference, but they are not saved in the category files.

       The ’file’ command is then run to identify the file type (based on
       header information).  The configuration file rules are used to identify
       which category it belongs to.  An entry is added to the corresponding
       category file (in the ’-d dir’ directory).  If the ’-s’ flag is given,
       then a copy of the file is saved in a subdirectory with the same name as
       the category.  If the HTML format (’-h’) is used, then hyperlinks make
       it easy to view saved files and see what is in each category.

       Files that do not have a category are recorded in either the ’unknown’
       or the ’data’ category.  ’data’ is for files with a structure that
       ’file’ does not recognize, and ’unknown’ is for files with a structure
       that ’file’ recognizes but that matches no category rule.  These are
       saved for future reference, but the unknown category can be ignored by
       using the ’-U’ flag.

       A copy of the files can be saved by using the ’-s’ flag.  If so, then
       the files are saved in a subdirectory that is named after the category.
       Each file is named using the file system image name followed by the
       meta data address and the original file extension.  The category index
       file can be used to map each original name to its saved name.  The HTML
       format makes viewing easier, as there are links to each file from the
       category index file.
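       The naming scheme can be illustrated with a short sketch.  The exact
       separator sorter places between the image name and the meta data
       address is not stated here, so the ’-’ used below is an assumption, as
       is the hypothetical function name saved_name:

```python
import os


def saved_name(image_path: str, meta_addr: str, original_name: str) -> str:
    """Build a saved-file name: image name, meta data address, original ext.

    Hypothetical helper; the '-' separator is an assumption.
    """
    image = os.path.basename(image_path)
    # splitext returns the extension with its leading '.', or '' if none.
    _, ext = os.path.splitext(original_name)
    return f"{image}-{meta_addr}{ext}"


if __name__ == "__main__":
    print(saved_name("images/hda1.dd", "1234", "photo.JPG"))
```

       Keeping the meta data address in the name lets the saved copy be traced
       back to the file system structure it came from.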

       The program will also consult the rules about the file extension.  If
       the file name has an extension (anything after a ’.’), it will be
       compared to the rules.  If the extension is not found in the rules as a
       valid extension for the file type, the file will be added to the
       ’mismatch’ file.  If the file does not have an extension, it will not be
       entered even if the file type has valid extensions.  This check is done
       even if the file is found in one of the known good hash databases; if
       it is found in one of those, it will be added to a special file.  Files
       of type ’data’ have no extension checks done by default (as they have an
       unknown structure).
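       The extension decision can be sketched as follows.  This is an
       illustration, not sorter’s code: it assumes the extension is the text
       after the last ’.’ (as os.path.splitext defines it), that the valid
       extensions for the file type have already been collected from the ext
       rules, and that an empty list means no extension is valid.  The
       function name is hypothetical:

```python
import os
from typing import Iterable, Optional


def extension_mismatch(name: str, category: str,
                       valid_exts: Iterable[str]) -> Optional[bool]:
    """Return True for a mismatch, False for a valid extension,
    or None when no check applies.  Hypothetical helper."""
    if category == "data":
        return None  # 'data' files are skipped by default
    _, ext = os.path.splitext(name)
    ext = ext[1:].lower()  # drop the leading '.'; compare case-insensitively
    if not ext:
        return None  # files without an extension are never reported
    return ext not in {e.lower() for e in valid_exts}
```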

       The  program repeats the above procedures using the output of the ’ils’
       command as well.  This allows  ’sorter’  to  examine  the  contents  of
       unallocated  files  that still have pointers to the data units (not all
       file systems will produce data from this step).

CONFIGURATION FILES

       Configuration files are used to define what file types belong in  which
       categories   and   what   extensions   belong   to   what  file  types.
       Configuration files are distributed with  the  ’sorter’  tool  and  are
       located  in the installation directory in the ’share/sorter’ directory.

       The ’default.sort’ file is used for every file system type.  It
       contains entries for common file types.  An operating-system-specific
       file also exists, which is useful for extensions that are specific to a
       given OS.  By default, both the default file and the OS-specific one are
       used.  Using the ’-c’ flag, an additional file can be used.  If the ’-C’
       flag is used, then only the supplied configuration file is used.

       There are two rule types in the configuration files.  Each rule starts
       with a keyword that specifies its type (’category’ or ’ext’).  Both
       rule types have two additional columns, which can be separated by any
       white space.
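       Because the third column is a regular expression that may itself
       contain spaces (for example ’image data’), a rule line is best split
       into at most three fields.  A minimal parsing sketch, assuming one rule
       per line and treating blank lines and lines starting with ’#’ as
       comments (the comment syntax is an assumption); Rule and parse_rule
       are hypothetical names:

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    kind: str   # 'category' or 'ext'
    value: str  # category name, or comma-separated extensions
    regex: str  # regular expression applied to the 'file' output


def parse_rule(line: str) -> Optional[Rule]:
    """Parse one configuration line; return None for blank/comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):  # '#' comments: assumption
        return None
    # maxsplit=2 keeps any spaces inside the regex column intact.
    parts = stripped.split(None, 2)
    if len(parts) != 3 or parts[0] not in ("category", "ext"):
        raise ValueError(f"malformed rule: {line!r}")
    return Rule(*parts)
```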

       The category rule has the category name as the second column and a Perl
       regular expression in the third column.  The category name cannot
       contain spaces and may contain only letters and numbers.  The regular
       expression is matched against the output of ’file’, ignoring case.
       More than one rule can exist for a category, but only one category can
       exist for a given file output.  For example:

       This saves all files whose ’file’ output contains ’image data’ anywhere
       to the ’images’ category:
           category        images          image data

       This saves all files whose ’file’ output has ’ASCII’ followed by
       anything and then ’text’ to the ’text’ category:
           category        text            ASCII(.*?)text

       This saves all files whose ’file’ output is just ’data’ to the ’data’
       category (the ^ and $ anchor the match to the start and end of the
       output).  The ’data’ value is common in the output of ’file’ for
       unknown binary data.
           category        data            ^data$
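       A sketch of how such category rules might be applied, using the example
       rules from this section with the data rule written as ’^data$’.  It
       assumes the first matching rule wins (the text only says that a file
       output belongs to one category) and falls back to ’unknown’ when no
       rule matches.  Python’s re module accepts these particular Perl-style
       patterns unchanged; the names RULES and categorize are hypothetical:

```python
import re
from typing import List, Tuple

# Example rules from this section: (category, regex).
RULES: List[Tuple[str, str]] = [
    ("images", r"image data"),
    ("text", r"ASCII(.*?)text"),
    ("data", r"^data$"),
]


def categorize(file_output: str, rules: List[Tuple[str, str]] = RULES) -> str:
    """Return the first category whose regex matches, else 'unknown'."""
    for category, pattern in rules:
        # Rules are matched case-insensitively, as described above.
        if re.search(pattern, file_output, re.IGNORECASE):
            return category
    return "unknown"
```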

       There is a special category of ’ignore’ that is used to skip over files
       of this type.  This is mainly a time and space saver.

       The extension rule is similar, except that the second column has a
       comma-separated list of valid extensions for the file output.  Multiple
       rules can exist for the same file type.  The comparison is case
       insensitive.  If no extension is valid for the file type, no rule needs
       to be made; that is already assumed.

       For example, ASCII text is used for several file extensions, so the
       following rules could exist:

           ext             txt,log         ASCII(.*?)text
           ext             c,cpp,h,js      ASCII(.*?)text
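       Since several ext rules may apply to the same file type, their
       extension lists are effectively combined.  A minimal sketch, assuming
       the valid set for a ’file’ output is the union of every ext rule whose
       regex matches it, compared in lowercase; the names EXT_RULES,
       build_ext_table, and valid_extensions are hypothetical:

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Example 'ext' rules from above: (comma-separated extensions, regex).
EXT_RULES: List[Tuple[str, str]] = [
    ("txt,log", r"ASCII(.*?)text"),
    ("c,cpp,h,js", r"ASCII(.*?)text"),
]


def build_ext_table(rules: List[Tuple[str, str]] = EXT_RULES) -> Dict[str, Set[str]]:
    """Merge rules that share a regex into one set of lowercase extensions."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for exts, pattern in rules:
        table[pattern].update(e.strip().lower() for e in exts.split(","))
    return dict(table)


def valid_extensions(file_output: str, table: Dict[str, Set[str]]) -> Set[str]:
    """Union of extensions from every rule whose regex matches the output."""
    valid: Set[str] = set()
    for pattern, exts in table.items():
        if re.search(pattern, file_output, re.IGNORECASE):
            valid |= exts
    return valid
```

       An empty result means no extension is valid for that output, matching
       the default described above.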

       Please  email  me  any  rules  that  you  find  useful   for   standard
       investigations  and  I  will  incorporate  them  into  future  releases
       (carrier at sleuthkit dot org).

EXAMPLES

       To run sorter with no hash databases, the following can be used:

           # sorter -f ntfs -d data/sorter images/hda1.dd
           # sorter -d data/sorter images/hda1.dd

       To analyze a file system that starts at sector 63 of a raw disk image:

           # sorter -i raw -f ntfs -o 63 -d data/sorter images/hda.dd
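       The ’-o’ value is given in sectors, so the corresponding byte offset is
       a single multiplication.  A minimal sketch, assuming 512-byte sectors
       (the usual default in The Sleuth Kit; devices with other sector sizes
       need a different multiplier).  The helper name is hypothetical:

```python
def sector_offset_to_bytes(imgoffset: int, sector_size: int = 512) -> int:
    """Convert a sector offset (as passed to '-o') to a byte offset.

    Hypothetical helper; assumes 512-byte sectors unless told otherwise.
    """
    if imgoffset < 0 or sector_size <= 0:
        raise ValueError("offset must be >= 0 and sector size > 0")
    return imgoffset * sector_size


if __name__ == "__main__":
    # '-o 63' from the example above: 63 * 512 = 32256 bytes.
    print(sector_offset_to_bytes(63))
```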

       To include the NSRL, an exclude, and an alert hash database:

           # sorter -f ntfs -d data/sorter -a /usr/hash/rootkit.db \
               -x /usr/hash/win2k.db -n /usr/hash/nsrl/NSRLFile.txt \
               images/hda1.dd

       To just identify images using the supplied ’images.sort’ file:

           # sorter -f ntfs -C /usr/local/sleuthkit/share/sort/images.sort \
               -d data/sorter -h -s images/hda1.dd

REQUIREMENTS

       The NIST National Software Reference Library (NSRL)  can  be  found  at
       www.nsrl.nist.gov.

LICENSE

       Distributed under the Common Public License, found in the cpl1.0.txt
       file in The Sleuth Kit licenses directory.

AUTHOR

       Brian Carrier <carrier at sleuthkit dot org>

       Send documentation updates to <doc-updates at sleuthkit dot org>