Palo Alto to Third party IPSEC Device: Rekey causes VPN tunnel to stop sending network traffic
An end user is seeing an odd issue with VPNs between a Palo Alto Cloud Firewall (PanOS9.1.3h) and Cisco Meraki Z3 devices. All VPN tunnels establish properly, but after a random period of time, during the rekey step, a tunnel stays online yet can no longer send network traffic. We currently have 5 of these connections.
They were able to capture a log, but I'm not able to troubleshoot it.
In the Meraki log, you can see two steps happening repeatedly on a working tunnel:
inbound CHILD_SA
outbound CHILD_SA
At the time the error occurs, the outbound step is missing.
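To spot this pattern in a longer log, a small script can flag every inbound step that is not followed by an outbound step. This is only a sketch: it assumes the Meraki events have already been reduced to the two strings quoted above, and the function name `find_incomplete_rekeys` and the sample event list are hypothetical.

```python
def find_incomplete_rekeys(events):
    """Return the indexes of 'inbound CHILD_SA' events that are not
    immediately followed by an 'outbound CHILD_SA' event."""
    missing = []
    for i, event in enumerate(events):
        if event == "inbound CHILD_SA":
            nxt = events[i + 1] if i + 1 < len(events) else None
            if nxt != "outbound CHILD_SA":
                missing.append(i)
    return missing


# Hypothetical, already-extracted event sequence (not real log data).
events = [
    "inbound CHILD_SA", "outbound CHILD_SA",
    "inbound CHILD_SA", "outbound CHILD_SA",
    "inbound CHILD_SA",                       # outbound step missing here
    "inbound CHILD_SA", "outbound CHILD_SA",
]

print(find_incomplete_rekeys(events))  # [4]
```

Each index returned points at a rekey where the outbound step never appeared, which is the moment to compare against the firewall-side logs.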
This is common when the tunnel DPD timers are turned off or mismatched.
Dead Peer Detection and Tunnel Monitoring
DPD is used to detect whether the peer device still has a valid IKE-SA. Periodically, it sends an “ISAKMP R-U-THERE” packet to the peer, which responds with an “ISAKMP R-U-THERE-ACK” acknowledgement.
The Palo Alto Networks firewall does not currently have a log associated with DPD packets, but they can be detected in a debug packet capture. The following output is from a peer device:
Mar 4 14:32:36 ike_st_i_n: Start, doi = 1, protocol = 1, code = unknown (36137), spi[0..16] = cd11b885 588eeb56 ..., data[0..4] = 003d65fc 00000000 ...
Mar 4 14:32:36 DPD; updating EoL (P2 Notify
Mar 4 14:32:36 Received IKE DPD R_U_THERE_ACK from IKE peer: 169.132.58.9
Mar 4 14:32:36 DPD: Peer 169.132.58.9 is UP status_val: 0.
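If you have a lot of this output to go through, a quick way to confirm which peers are answering DPD is to search for the R_U_THERE_ACK line. The sketch below assumes the log lines look exactly like the sample above; the regular expression and the helper name `peers_with_ack` are my own and may need adjusting for other devices or versions.

```python
import re

# Matches the acknowledgement line format shown in the sample output.
ACK_RE = re.compile(
    r"Received IKE DPD R_U_THERE_ACK from IKE peer: (\d+\.\d+\.\d+\.\d+)"
)


def peers_with_ack(log_lines):
    """Return the set of peer IPs that sent at least one R_U_THERE_ACK."""
    peers = set()
    for line in log_lines:
        match = ACK_RE.search(line)
        if match:
            peers.add(match.group(1))
    return peers


sample = [
    "Mar 4 14:32:36 Received IKE DPD R_U_THERE_ACK from IKE peer: 169.132.58.9",
    "Mar 4 14:32:36 DPD: Peer 169.132.58.9 is UP status_val: 0.",
]

print(peers_with_ack(sample))  # {'169.132.58.9'}
```

A peer that never shows up in the result over a long capture is a hint that DPD is disabled or not being answered on that tunnel.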
The DPD query and delay interval can be configured when DPD is enabled on the Palo Alto Networks device. DPD will tear down the SA once it realizes the peer is no longer responding.
Note: DPD is "not persistent" and is only triggered by a Phase 2 rekey. This means that while Phase 2 is up, the Palo Alto Networks firewall will not check whether the IKE-SA is still active. To get Phase 2 to trigger a rekey, and so trigger DPD to validate the Phase 1 IKE-SA, enable tunnel monitoring.
Tunnel Monitoring
Tunnel Monitoring is used to verify connectivity across an IPSec tunnel. A tunnel monitor profile specifies one of two actions to take if the tunnel is not available: Wait Recover or Fail Over.
In both cases, the firewall will try to negotiate new IPSec keys to accelerate the recovery.
A threshold option can be set to specify the number of heartbeats to wait before taking the specified action. The range is between 2 and 100 and the default is 5. The interval between heartbeats can also be configured. The range is between 2 and 10 seconds and the default is 3 seconds.
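To get a feel for how these two settings interact, here is a rough calculation of how long a tunnel can be down before the action is taken. It is a simplification: it assumes the interval is in seconds and that the action fires after `threshold` consecutive missed heartbeats spaced `interval` apart. The function name `seconds_until_action` is hypothetical.

```python
def seconds_until_action(threshold=5, interval=3):
    """Approximate time before the monitor action is taken, using the
    ranges and defaults given above (threshold 2-100, interval 2-10)."""
    if not 2 <= threshold <= 100:
        raise ValueError("threshold must be between 2 and 100")
    if not 2 <= interval <= 10:
        raise ValueError("interval must be between 2 and 10")
    return threshold * interval


print(seconds_until_action())       # 15 (defaults: 5 heartbeats x 3)
print(seconds_until_action(2, 2))   # 4  (fastest detection)
```

With the defaults this works out to roughly 15 seconds of lost heartbeats before Wait Recover or Fail Over kicks in.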
Once the tunnel monitoring profile is created, as shown below, select it and enter the IP address of the remote end to be monitored.
Additionally, the issue may be due to a Dead Peer Detection (DPD) configuration mismatch.
Resolution
Check and modify the Palo Alto Networks firewall and Cisco router to have the same DPD configuration.
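When checking several tunnels, it can help to write down the DPD settings from both ends and compare them side by side. The sketch below is only an illustration: the field names (`enabled`, `interval`, `retry`) and the values are placeholders I made up, not settings read from either device.

```python
def dpd_mismatches(local, remote):
    """Return {setting: (local_value, remote_value)} for every DPD
    setting that differs between the two ends."""
    keys = sorted(set(local) | set(remote))
    return {
        key: (local.get(key), remote.get(key))
        for key in keys
        if local.get(key) != remote.get(key)
    }


# Placeholder values for illustration only.
palo_alto = {"enabled": True, "interval": 5, "retry": 5}
cisco = {"enabled": True, "interval": 10, "retry": 5}

print(dpd_mismatches(palo_alto, cisco))  # {'interval': (5, 10)}
```

An empty result means the recorded settings agree; anything else lists what to correct on one side or the other.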
On the Palo Alto Networks firewall, go to Network > Network Profiles > IKE Gateways as follows:
The following example shows the timers matching on a Cisco IOS router at the other end.