# Real-time data processing with ML

### 周启东 (ZHOU Qi-Dong), Institute of Frontier and Interdisciplinary Science, Shandong University (Qingdao)



11-12 Jan. 2025, USTC, Hefei: Workshop on the Intersection of Quantum Computing, Artificial Intelligence and High Energy Physics



| Exp.   | Run time  | Data (PB)                  | Total |
|--------|-----------|----------------------------|-------|
| BESIII | 2008-2028 | 0.5                        | 10    |
| STCF   | -         | 300-500                    | -     |
| CEPC   | -         | 1.5-3 (H)<br>500-50000 (Z) | -     |

| Theme (DRDT)                             | R&D topic                                                 |
|------------------------------------------|-----------------------------------------------------------|
| Data density (7.1)                       | High data rate ASICs and systems                          |
|                                          | New link technologies (fibre, wireless, wireline)         |
|                                          | Power and readout efficiency                              |
| Intelligence on the detector (7.2)       | Front-end programmability, modularity and configurability |
|                                          | Intelligent power management                              |
|                                          | Advanced data reduction techniques (ML/AI)                |
| 4D techniques (7.3)                      | High-performance sampling (TDCs, ADCs)                    |
|                                          | High precision timing distribution                        |
|                                          | Novel on-chip architectures                               |
| Extreme environments and longevity (7.4) | Radiation hardness                                        |
|                                          | Cryogenic temperatures                                    |
|                                          | Reliability, fault tolerance, detector control            |
|                                          | Cooling                                                   |
| Emerging technologies (7.5)              | Novel microelectronic technologies, devices, materials    |
|                                          | Silicon photonics                                         |
|                                          | 3D-integration and high-density interconnects             |
|                                          | Keeping pace with, adapting and interfacing to COTS       |

Legend: must happen or main physics goals cannot be met; desirable to enhance physics reach; R&D needs being met.

\* LHCb Velo

Source: ECFA detector R&D

# Readout system (Belle II vs. LHCb)

- Belle II: L1 trigger + HLT
  - Trigger efficiency:
    - Hadronic B physics: ~100%
    - $\tau$ physics: 70-95%

- LHCb: "triggerless" readout and DAQ
  - CPU+GPU based software trigger
  - Rate of physics processes: ~MHz
  - No hardware trigger available

- ALICE: continuous readout
  - TPC with triggerless readout, other detectors with hardware trigger
  - TPC signal: ~100 µs; physics event rate: 50 kHz, so TPC signals overlap
  - Very basic hardware trigger plus a more effective software trigger





### Readout and DAQ system (ALICE)



### Gain power of apparatus with data acceleration

- Continuous readout (less hardware filtering)
- Powered by hardware acceleration
- Heterogeneous computing (CPU plus hardware accelerators)

### Typical TDAQ system

Figure labels: trigger system (L1, hardware filtering) producing decisions; digital data; data readout system; **trigger-less** data readout system.



### Luminosity frontier: SuperKEKB




- Asymmetric e+e- collider
  - $e^+e^- \rightarrow \Upsilon(4S) \rightarrow B\overline{B}$
  - very clean and well-known initial state



### Beam current: KEKB x ~1.5



### **Belle II detector and dataset**

### Vertex detector (VXD)

- Inner 2 layers: pixel detector (PXD)
- Outer 4 layers: strip sensor (SVD)

### **Central Drift Chamber (CDC)**

He (50%),  $C_2H_6$  (50%), small cells, long lever arm

### **Particle Identification**

- Barrel: Time-Of-Propagation counters (TOP)
- Forward: Aerogel RICH (ARICH)

### ElectroMagnetic Calorimeter (ECL)

CsI(Tl) + waveform sampling

### Features:

- Near-hermetic detector
- Good at measuring neutrals ($\pi^0$, $\gamma$, $K_L$, ...): $\sigma(E)/E \sim 2\text{-}4\%$
- Vertexing and tracking: $\sigma_{\text{vertex}} \sim 15$ µm, CDC spatial resolution 100 µm, $\sigma(p_T)/p_T \sim 0.4\%$



# Belle II trigger strategy

- Design requirements: ~100% efficiency for $\Upsilon(4S) \rightarrow B\overline{B}$ (hadronic decays), tau/charm, exotics
  - No dead time -> pipeline
  - Single-photon trigger
  - Single-track trigger
- Max. trigger rate: 30 kHz @ 6 x 10<sup>35</sup> cm<sup>-2</sup> s<sup>-1</sup>
  - Physics trigger: ~15 kHz
- Latency limit: ~5 µs (SVD APV25 buffer structure)
  - A fixed latency of about 4.4 µs
- Event timing resolution: 10 ns

| Process        | σ (nb) | Rate @ L = 6x10<sup>35</sup> (kHz) |
|----------------|--------|------------------------------------|
| Bunch crossing | -      | 2x10<sup>5</sup>                   |
| Beam bkg       | -      | 300-600                            |
| Bhabha         | 44     | 50                                 |
| Total -> L1    | -      | 200350 -> ~15                      |

| Process     | σ (nb) | L1 rate @ L = 6x10<sup>35</sup> (kHz) |
|-------------|--------|---------------------------------------|
| Bhabha      | 44     | 0.35*                                 |
| Two photon  | 13     | 10                                    |
| Upsilon(4S) | 1.2    | 0.96                                  |
| Continuum   | 2.8    | 2.2                                   |
| μμ          | 0.8    | 0.64                                  |
| ττ          | 0.8    | 0.64                                  |
| γ-γ         | 2.4    | 0.019*                                |
| Total       | 67     | ~15                                   |
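The rates in a table like this follow from the standard relation $R = \sigma \cdot L$. The sketch below is a minimal illustration of that conversion, not the calculation used for the slide: the function name and the `prescale` parameter are hypothetical. The unstarred entries above are numerically equal to σ × 0.8 kHz/nb, which is what the relation gives for L = 8x10<sup>35</sup> cm<sup>-2</sup> s<sup>-1</sup>. The luminosity passed in is therefore left as a free parameter rather than fixed to the value in the column header.

```python
NB_TO_CM2 = 1e-33  # 1 nb = 1e-33 cm^2

def rate_khz(sigma_nb, lumi_cm2_s, prescale=1.0):
    """Event rate in kHz for a cross-section (nb) at a luminosity (cm^-2 s^-1).

    `prescale` is a hypothetical extra reduction factor (1.0 = no prescale).
    """
    rate_hz = sigma_nb * NB_TO_CM2 * lumi_cm2_s
    return rate_hz / prescale / 1e3

# Cross-sections taken from the table above.
processes = [("Upsilon(4S)", 1.2), ("Continuum", 2.8), ("mu mu", 0.8)]
for name, sigma in processes:
    print(f"{name:12s} {rate_khz(sigma, 6e35):.2f} kHz @ 6e35, "
          f"{rate_khz(sigma, 8e35):.2f} kHz @ 8e35")
```

A prescale of 100, for example, would bring a 44 nb process at 8x10<sup>35</sup> down from about 35 kHz to about 0.35 kHz. The slide does not state which factor the starred entries use.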





# Belle II trigger system

- CDC, ECL: main triggers for tracks and clusters
- KLM: muon trigger
- TOP: event timing
- GRL: matching of sub-triggers
- GDL: final trigger decision

Challenges:

- Low multiplicity trigger vs. background
- High track trigger vs. crosstalk
- Drawback of track trigger at endcap
- Latency budget vs. transmission and logic




# Belle II TDAQ system

- Unified common readout system (except for PXD)
- Unified timing and trigger distribution (TTD) system
- Pipelined readout
- Handles a 30 kHz level-1 (L1) trigger with O(1%) dead time at a raw event size of 1 MB
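As a rough cross-check of these design figures, the sketch below multiplies trigger rate by event size and applies the dead-time fraction. It assumes 1 MB = 10^6 bytes and treats the O(1%) dead time as exactly 1%. Both are simplifications for illustration, and the function names are hypothetical.

```python
# Design figures quoted above; decimal megabytes and an exact 1% dead time are assumptions.
L1_RATE_HZ = 30e3
EVENT_SIZE_BYTES = 1e6
DEAD_TIME_FRACTION = 0.01

def raw_bandwidth_gb_s(rate_hz, event_bytes):
    """Raw data throughput in GB/s for a given trigger rate and event size."""
    return rate_hz * event_bytes / 1e9

def recorded_rate_hz(rate_hz, dead_fraction):
    """Trigger rate that survives a given dead-time fraction."""
    return rate_hz * (1.0 - dead_fraction)

print(f"{raw_bandwidth_gb_s(L1_RATE_HZ, EVENT_SIZE_BYTES):.1f} GB/s raw throughput")
print(f"{recorded_rate_hz(L1_RATE_HZ, DEAD_TIME_FRACTION):.0f} Hz after dead time")
```

Under these assumptions the readout has to sustain about 30 GB/s before any data reduction.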





- Provides the L1 trigger signal to the DAQ, using FPGA chips for real-time processing of raw detector data.
- The HLT provides Regions of Interest (RoI) to the PXD, significantly reducing the data size.
- Latency: O(seconds).

### Motivation of Neural Network for L1 Track trigger

- DAQ system is designed to handle 30 kHz
  - Physics trigger ~15 kHz, requiring S/N = 1
- L1 trigger rate depends significantly on background conditions
- Advanced CDC algorithm to further suppress background
- A fixed latency of about 4.4 µs







### Track $z_0$ distribution after trigger




# Machine Learning for L1 Track trigger (HARDWARE)




Figure labels: axial wire, stereo wire.



# **Basics of L1 CDC trigger**

# Neural Network z-trigger

- **Crossing angle**  $\alpha$  for priority wires



# **Deep Neural Network**



- Inputs: drift time $t_{drift}$, relative wire location $\phi_{rel}$ and crossing angle $\alpha$ for priority wires, plus drift time for all other wires
- Introduces a self-attention architecture to "focus" on certain inputs
- Outputs: track vertex $z_0$, track $\theta$ and a signal/background classifier output ($Q$)

| Parameter | # attention values | # hidden nodes | # hidden layers | Activation | Precision | Total multipliers |
|-----------|--------------------|----------------|-----------------|------------|-----------|-------------------|
| Values    | 27                 | 27             | 2               | Leaky ReLU | Float 16  | 4,185             |
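The multiplier total can be cross-checked by counting one multiplier per weight in each fully connected layer. The sketch below assumes the layer shapes listed in the Versal ACAP layer table later in this talk (71 → 27 → 27 → 27 → 27 → 3) and ignores biases. Whether the firmware counts multipliers this way is an assumption, and the function name is illustrative.

```python
# (inputs, outputs) per dense layer, taken from the layer table later in the talk.
LAYER_SHAPES = [(71, 27), (27, 27), (27, 27), (27, 27), (27, 3)]

def count_multipliers(shapes):
    """One multiplier per weight of each fully connected layer (biases ignored)."""
    return sum(n_in * n_out for n_in, n_out in shapes)

print(count_multipliers(LAYER_SHAPES))
```

With these shapes the count is 1917 + 3 × 729 + 81 = 4185, which matches the total in the table.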


### **Development flow of DNN on FPGA**



### Belle II UT4



Xilinx UltraScale XCVU080 / XCVU160; 25 Gbps links with 64B/66B encoding





### Simulation performance of DNN



- Latency: 76 clocks = 592.8 ns (requirement: < 600 ns)
- FPGA resource usage (UT4: Virtex UltraScale XCVU160):
  - DSP: ~70%, LUT: ~50%, others < 30%
- No large drop in AUC between RTL and software simulation
- At a signal efficiency of ~95%:
  - Background rejection rate ~85%
- DNN trigger in **HARDWARE** is under commissioning and close to operation
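The clock counts and nanosecond figures quoted in this talk can be related by a simple conversion. The sketch below derives the clock period from the statement that 76 clocks equal 592.8 ns (about 7.8 ns per clock) and applies it to other clock counts. The helper names are illustrative. The 600 ns budget is the requirement quoted above.

```python
# Clock period implied by "76 clocks = 592.8 ns" (about 7.8 ns).
CLOCK_PERIOD_NS = 592.8 / 76
LATENCY_BUDGET_NS = 600.0  # requirement quoted for the DNN trigger

def clocks_to_ns(n_clocks):
    """Convert a number of system clocks to nanoseconds."""
    return n_clocks * CLOCK_PERIOD_NS

def fits_budget(n_clocks, budget_ns=LATENCY_BUDGET_NS):
    """True if the latency stays strictly below the budget."""
    return clocks_to_ns(n_clocks) < budget_ns

for n in (76, 80, 82):
    print(n, f"{clocks_to_ns(n):.1f} ns", fits_budget(n))
```

With this period, 80 clocks correspond to 624 ns and 82 clocks to about 639.6 ns. Both agree with the values quoted elsewhere in the talk for the TTD window and the 4-hidden-layer model.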







### Attempted improvements to the CDC track trigger

- Develop an algorithm that improves performance for the upgrade (10 µs latency)
  - Start from optimization of the DNN model
- Modify the number of hidden layers and the learning rate
  - Hidden layers: $2 \rightarrow 4$, learning rate: $10^{-2} \rightarrow 10^{-3}$
  - Everything else kept the same
- Latency: 76 clocks (592.8 ns) -> 82 clocks (640 ns)
- Next step: change the inputs to CDC hit information instead of 2D track parameters






### **DNN implementation on Versal ACAP**

- R&D of a new general FPGA device using the Versal ACAP
  - Heterogeneous acceleration (VCK190, VCK5000 evaluation kits)
    - AI Engine



Figures from **UG1079**: Figure 2 (AI Engine Array) and Figure 4 (AI Engine).







- DNN implementation:
  - Model on a "graph"
  - Dense layer on a "kernel"
- AI Engine: C++ based coding on Vitis
  - AI Engine libraries
  - AI Engine specific functions
  - Scalar and vector engines, pipelining, etc.

|              | Layer 1   | Layer 2 | Layer 3 | Layer 4   | Layer 5 |
|--------------|-----------|---------|---------|-----------|---------|
| Input nodes  | 71        | 27      | 27      | 27        | 27      |
| Output nodes | 27        | 27      | 27      | 27        | 3       |
| Active func. | LeakyReLU | Softmax | none    | LeakyReLU | Tanh    |

| AI Engine Resource Utilization                |                   |
|-----------------------------------------------|-------------------|
| Tiles used for AI Engine Kernels:             | 5 of 400 (1.25 %) |
| Tiles used for Buffers:                       | 7 of 400 (1.75 %) |
| Tiles used for Stream Interconnect:           | 8 of 450 (1.78 %) |
| DMA FIFO Buffers:                             | 0                 |
| Interface Channels used for ADF Input/Output: | 4 ( PLIO: 4 )     |
| Interface Channels used for Trace data:       | 0                 |







# Latency optimization on Versal ACAP

Figure: AI Engine tile activity traces for the two versions of the implementation. Kernel names visible in the trace (some truncated): `hid1_71to27_leakyrelu`, `hid2_27to2`, `hid3_27to27_no_act`, `hid4_2`, and a softmax kernel.

|               | Layer 1   | Layer 2 | Layer 3 | Layer 4   | Layer 5 | Total  |
|---------------|-----------|---------|---------|-----------|---------|--------|
| Input nodes   | 71        | 27      | 27      | 27        | 27      | -      |
| Output nodes  | 27        | 27      | 27      | 27        | 3       | -      |
| Active func.  | LeakyReLU | Softmax | none    | LeakyReLU | Tanh    | -      |
| Ver.0 latency | ~12 µs    | ~66 µs  | ~1.5 µs | ~5.5 µs   | ~9.9 µs | ~86 µs |
| Ver.1 latency | ~2.1 µs   | ~1.3 µs | ~1.5 µs | 0.9 µs    | ~0.2 µs | ~5 µs  |
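The layer table can be read as a plain chain of dense layers. The following is a minimal pure-Python sketch of such a forward pass, using the layer sizes and activation functions from the table. The weights are random placeholders and the LeakyReLU slope of 0.01 is an assumed value. Applying the softmax as an ordinary activation is also an assumption, since the slides do not show how the attention values are combined with the inputs. This is not the AI Engine C++ implementation.

```python
import math
import random

def leaky_relu(v, slope=0.01):  # slope is an assumed value
    return [x if x > 0 else slope * x for x in v]

def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def identity(v):
    return list(v)

def tanh(v):
    return [math.tanh(x) for x in v]

# (input nodes, output nodes, activation) per layer, as in the table.
LAYERS = [
    (71, 27, leaky_relu),
    (27, 27, softmax),
    (27, 27, identity),
    (27, 27, leaky_relu),
    (27, 3, tanh),
]

def init_params(seed=0):
    """Random placeholder weights (n_out x n_in) and zero biases for each layer."""
    rng = random.Random(seed)
    return [
        ([[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out, _ in LAYERS
    ]

def dense(w, b, x):
    """Matrix-vector product plus bias."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def forward(x, params):
    for (w, b), (_, _, act) in zip(params, LAYERS):
        x = act(dense(w, b, x))
    return x

params = init_params()
outputs = forward([0.5] * 71, params)
print(len(outputs), "outputs, each in [-1, 1] after the final tanh")
```

The three outputs would correspond to $z_0$, $\theta$ and the classifier output $Q$ described earlier, each scaled to the tanh range.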

Comparison of the two traces: total latency 86.160 µs (Ver.0) vs. 5.588 µs (Ver.1).
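To see where the latency gain comes from, the per-layer figures from the layer table can be turned into speed-up factors. The sketch below uses the approximate per-layer values and the two trace totals (86.160 µs and 5.588 µs) as they appear on the slide. The per-layer numbers are rounded and do not sum exactly to the totals, so the factors are indicative only.

```python
# Approximate per-layer latencies (µs) read from the layer table.
VER0_US = {"Layer 1": 12.0, "Layer 2": 66.0, "Layer 3": 1.5, "Layer 4": 5.5, "Layer 5": 9.9}
VER1_US = {"Layer 1": 2.1, "Layer 2": 1.3, "Layer 3": 1.5, "Layer 4": 0.9, "Layer 5": 0.2}
# Totals read from the trace markers.
TOTAL_VER0_US = 86.160
TOTAL_VER1_US = 5.588

def speedups(before, after):
    """Ratio before/after for every layer present in `before`."""
    return {name: before[name] / after[name] for name in before}

per_layer = speedups(VER0_US, VER1_US)
total_speedup = TOTAL_VER0_US / TOTAL_VER1_US

for name, factor in per_layer.items():
    print(f"{name}: x{factor:.1f}")
print(f"Total: x{total_speedup:.1f}")
```

The largest factors belong to the softmax and tanh layers, while the layer without an activation function is unchanged.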





# Machine Learning for software track trigger (SOFTWARE)



# Overview of high level trigger system at Belle II

- Full event reconstruction (same as offline processing)
- Crude calibration constants
- 13 HLT units, ~6200 CPU cores in total (design: 7000 cores)
- Data processing: ~2.1 kHz per HLT unit with hyper-threading
- Event size at HLT in the last run period: ~150 kB/event
- PXD event size: 1 MB/event, 10 times larger than the rest of the detectors
- The region of interest (RoI) method is effective at reducing the data size
- RoI
  - Tracking software running on HLT nodes
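The HLT figures above can be combined into a back-of-the-envelope capacity estimate. The sketch below assumes that all units process events at the quoted per-unit rate and that the load is shared evenly across the quoted core count. With hyper-threading the core count may not match the number of independent workers, so the per-core time is only indicative. The function names are illustrative.

```python
N_UNITS = 13
RATE_PER_UNIT_KHZ = 2.1   # with hyper-threading, as quoted
N_CORES = 6200
EVENT_SIZE_KB = 150       # event size at HLT in the last run period

def farm_rate_khz(n_units, rate_per_unit_khz):
    """Aggregate processing rate of the HLT farm."""
    return n_units * rate_per_unit_khz

def time_per_event_ms(n_cores, rate_khz):
    """Average processing time per event per core; kHz is events per ms."""
    return n_cores / rate_khz

def bandwidth_gb_s(rate_khz, event_kb):
    """Data throughput in GB/s (decimal units assumed)."""
    return rate_khz * 1e3 * event_kb * 1e3 / 1e9

rate = farm_rate_khz(N_UNITS, RATE_PER_UNIT_KHZ)
print(f"farm rate: {rate:.1f} kHz")
print(f"time per event per core: {time_per_event_ms(N_CORES, rate):.0f} ms")
print(f"throughput: {bandwidth_gb_s(rate, EVENT_SIZE_KB):.2f} GB/s")
```

Under these assumptions, 13 units give about 27 kHz, slightly below the 30 kHz L1 design rate, at roughly 0.2 s of processing per event per core.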

### Concept of HLT processing









### Motivations for introducing a GNN track finder (SOFTWARE)

- Low efficiency for displaced vertices
  - Efficiency decreases as displacement increases
  - Important signature for new physics searches
- Higher background
- CDC wire inefficiencies
  - Bad wires or electronics
  - Decreased efficiency



### **GNN based CDC track finder**

Comput. Phys. Commun. 259 (2021) 107610

- Modular structure for track finding, with a flexible reconstruction sequence





### GNN for offline track finding

- Finds track parameters: momentum, starting position and charge
- Finds an unknown number of tracks $\rightarrow$ Object Condensation (arXiv:2002.03605)
- Computing resource and time requirements may be reducible

- Inputs
  - TDC and ADC signal information
  - Layer, superlayer, and layer-within-superlayer information
- Adjustable parameters
  - 797,812 trainable parameters (3 MB weight files)



### Performance of GNN

- Efficiency for displaced-vertex tracks improves to 85.4% with a fake rate of 2.5%, compared to 52.2% and 4.1% for the original algorithm
  - Other performance is similar to the original algorithm
- Predicts momentum $p_x$, $p_y$, $p_z$, starting position $v_x$, $v_y$, $v_z$, and charge
  - Provides initial inputs for GENFIT
- The GNN prediction is drawn according to the track parameters predicted by the GNN
- Planned to be added as an additional track finder for Belle II



### L. Reuter et al. (KIT), arXiv:2411.13596





### **GNN for CDC track background filtering**

- Developed a GNN algorithm (based on BESIII's algorithm) for Belle II CDC hit clean-up



### **Belle II** simulation (own work)



 $\mu + \mu$ - (particle gun)

GNN noise filtering

Transform space

Transform a space

DBSCAN clustering



### NN acceleration on Versal ACAP

- A real-time graph building algorithm enables GNN implementation on FPGA for Belle II (M. Neu et al., Comput. Softw. Big Sci. 8, 8 (2024))
- R&D of a new general FPGA device using the Versal ACAP
  - Heterogeneous acceleration (VCK190, VCK5000 evaluation kits)
    - AI Engine, DPU







### Acceleration on Versal ACAP platform



# Summary and prospects

- The Belle II TDAQ system was designed to handle a 30 kHz level-1 trigger
- NN and DNN in the hardware-based CDC L1 track trigger improve background rejection
- A GNN in the software-based offline CDC track finder improves the efficiency for displaced-vertex tracks
- Not covered in this talk: a GNN-based hardware clustering trigger for Belle II is under commissioning
- Target: upgrades of ongoing and future collider projects
  - ML implementation on heterogeneous computing systems for acceleration









Backup



### FPGA implementation path of ML algorithm





# AI Engine structure

Figures from **UG1079**: Figure 1 (Conceptual Overview of the ADF Graph), Figure 2 (AI Engine Array), Figure 3 (AI Engine Tile Details) and Figure 4 (AI Engine). Blocks labelled in the figures include the fixed-point vector unit, floating-point vector unit, instruction fetch & decode unit, and stream interface.



### Kernel optimization for latency

Before optimization (trace of Tile(24,0), kernel `hid1_71to27_leakyrelu`): trace markers at 0.755 µs and 13.125 µs. The function list includes software floating-point helpers (`f32_to_f64`, `f64_mul`, `f64_to_f32`, `softfloat_*`).

| Optimization  | Before           | After            |
|---------------|------------------|------------------|
| Dense layer   | Vector algorithm | Vector algorithm |
| Act. function | Scalar algorithm | Vector algorithm |
| Latency       | ~12us            | ~1.6us           |

After optimization (trace of Tile(24,0), kernel `hid1_71to27_leakyrelu` with `input`, `weight` and `output` buffers): trace markers at 6,774.400 ns and 8,388.800 ns, an interval of 1,614.400 ns.



# Motivations of trigger-DAQ upgrade

Physics

- Tau trigger efficiency is currently >95% (to be pre-scaled if luminosity is high)
- Low-multiplicity trigger efficiency (to be pre-scaled if luminosity is high)
- Low-momentum track trigger efficiency
- "Anomaly" trigger
  - Design a special trigger line for some specific physics channel
- Trigger efficiency for displaced vertices

Current hardware limitation:

- DAQ system is designed to handle 30 kHz
  - L1 latency: 4.4 us (SVD APV25 buffer)
    - CDC DNN trigger latency ~500 ns; the latency budget already prevents using a larger model
- Full HLT: 15 units (7000 CPU cores)
- TTD system: VME bus limit, no more than 3 triggers within 80 clocks (624 ns)

Vertex detector is planned to be upgraded during long shutdown 2 (after 2028)

Latency limit target: 5 us -> 10 us (5.2 us KLM, 9 us TOP, considering upgrade)

- New TTD hardware: VME bus -> Ethernet
- New trigger board (UT5): Versal ACAP

L1 trigger rate will reach ~20 kHz at $0.9\times10^{35}$ cm<sup>-2</sup> s<sup>-1</sup> (13 HLT units, w/o hyperthreading), planned











- Simulate 1 million events with over 4 million tracks
  - Train: Validation = 4:1
- Training samples contain different topologies that cover all event features of interest. To avoid biasing the model, **no conservation laws are involved here!** $\rightarrow$ crucial step to be agnostic about the physics processes
- Sample features
  - Low-momentum tracks forming circles in the CDC ($P_t < 0.4$ GeV) <-> high-momentum tracks
  - Short tracks <-> tracks penetrating all CDC layers
  - Small opening angle <-> two well-isolated tracks









### **Development of GNN tracking algorithm**

- Belle II MC simulation data-set (own simulation)
  - $\mu^+ \mu^-$ (particle gun)
  - 0.3 GeV/c < P < 5.0 GeV/c
  - Theta: 30°-120°, within the barrel CDC
  - Phi: 0-2π
  - Train : Validation : Test = 3 : 1 : 1
  - Noise: /group/belle2/dataprod/BGOverlay/early\_phase3/release-06-00-05/overlay/BGx1/set0/
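
The 3 : 1 : 1 split quoted above could be produced along the following lines. This is only a minimal sketch: the function name `split_3_1_1`, the fixed seed, and the use of plain integers as stand-ins for events are assumptions made for illustration, not details taken from the slides.

```python
import random

def split_3_1_1(events, seed=0):
    """Shuffle events and split them into train/validation/test in a 3:1:1 ratio."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_train = len(shuffled) * 3 // 5
    n_val = len(shuffled) // 5
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # remainder goes to the test set
    return train, val, test

# Hypothetical usage: integers stand in for simulated events.
train, val, test = split_3_1_1(range(1000))
print(len(train), len(val), len(test))  # 600 200 200
```

Shuffling before slicing keeps each subset representative of the full momentum and angle ranges.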









# **Development of GNN tracking algorithm**

- Graph Neural Network edge classifier
- Input network
  - Node features embedded in latent space
- Graph model
  - Edge network computes weights for edges using the features of the start and end nodes
  - Node network computes new node features using the edge-weight-aggregated features of the connected nodes and the nodes' current features
  - MLPs
  - 8 graph iterations
- Strengthen important connections and weaken useless or spurious ones
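
The structure described above (input network, then repeated edge-network and node-network passes) can be illustrated with a small plain-Python sketch. Several things here are assumptions for illustration only: each "MLP" is reduced to a single dense layer, the weights are random and untrained, the same parameters are reused in every iteration, and the hidden size, node features, and all function names are made up. It shows the data flow, not the model used in the talk.

```python
import math
import random

def linear(weights, bias, x):
    """One dense layer: weights is a list of rows, x is a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def init(n_out, n_in, rng):
    """Random (untrained) weights and zero bias for a dense layer."""
    return ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def edge_network(params, h, edges):
    """Weight in (0, 1) for each edge, from the features of its start and end nodes."""
    W, b = params
    return [sigmoid(linear(W, b, h[s] + h[e])[0]) for s, e in edges]

def node_network(params, h, edges, w):
    """New node features from edge-weighted neighbour sums and current features."""
    W, b = params
    dim = len(h[0])
    agg = [[0.0] * dim for _ in h]
    for (s, e), wk in zip(edges, w):
        for d in range(dim):
            agg[e][d] += wk * h[s][d]  # message from start node to end node
            agg[s][d] += wk * h[e][d]  # message from end node to start node
    return [[math.tanh(v) for v in linear(W, b, agg[i] + h[i])] for i in range(len(h))]

def classify_edges(x, edges, n_iter=8, hidden=4, seed=0):
    """Input network, n_iter graph iterations, then a final edge score per edge."""
    rng = random.Random(seed)
    p_in = init(hidden, len(x[0]), rng)
    p_edge = init(1, 2 * hidden, rng)
    p_node = init(hidden, 2 * hidden, rng)
    h = [[math.tanh(v) for v in linear(*p_in, xi)] for xi in x]  # embed node features
    for _ in range(n_iter):
        w = edge_network(p_edge, h, edges)
        h = node_network(p_node, h, edges, w)
    return edge_network(p_edge, h, edges)

# Toy graph: 4 hits with 3 made-up features each, 4 candidate edges.
x = [[0.1, 0.2, 0.0], [0.2, 0.25, 0.1], [0.3, 0.3, 0.2], [0.9, -0.4, 0.5]]
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
scores = classify_edges(x, edges)
print([round(s, 3) for s in scores])
```

In a trained model, edges with a score below some threshold would be dropped, which is how spurious connections (and the hits attached only to them) get removed.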



Hit selection efficiency: 98.4% Hit selection purity



### Performance step-by-step



 $\mu + \mu$ - (particle gan)

1. Original MC data sample
   - $\mu^+ \mu^-$ (particle gun)
   - 0.3 GeV/c < P < 5.0 GeV/c
2. Remove noise via GNN
3. Transform to conformal plane

   $$X = \frac{2x}{x^2+y^2}, \qquad Y = \frac{2y}{x^2+y^2}$$

   - A circle passing through the origin transforms into a straight line
4. Transform to 'α' parameter plane
   - Hits connected in the X-Y plane in a straight line
   - α is the angle between the straight line and the X axis
   - The parameter space is (cos α, sin α)
5. DBSCAN clustering in 'α' parameter plane
   - Density-Based Spatial Clustering of Applications with Noise
   - Hits in a cluster are considered to belong to the same track

Cluster efficiency: 97.7%

Cluster purity: 96.9%
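
Steps 3 to 5 can be sketched in plain Python as follows. The assumptions are stated up front: α is estimated for each hit from the segment to its nearest neighbour in the conformal plane and folded into [0, π) (the slides do not say how hits are connected); DBSCAN is written out by hand here rather than taken from a library; the values of `eps` and `min_pts` and the two toy tracks are illustrative only. Lines with α close to 0 or π would need extra care because of the fold.

```python
import math

def conformal(x, y):
    """Conformal map: X = 2x/(x^2+y^2), Y = 2y/(x^2+y^2)."""
    r2 = x * x + y * y
    return 2.0 * x / r2, 2.0 * y / r2

def alpha_features(points):
    """(cos a, sin a) per hit; a is the angle of the segment to the nearest neighbour."""
    feats = []
    for i, (X, Y) in enumerate(points):
        others = [k for k in range(len(points)) if k != i]
        j = min(others, key=lambda k: math.dist(points[k], (X, Y)))
        alpha = math.atan2(points[j][1] - Y, points[j][0] - X) % math.pi  # fold to [0, pi)
        feats.append((math.cos(alpha), math.sin(alpha)))
    return feats

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # noise for now, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # border point previously marked as noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:      # core point: keep expanding the cluster
                queue.extend(nb)
    return labels

def circle_hits(a, b, radii):
    """Toy hits on a circle through the origin with centre (a, b), one per radius."""
    r = math.hypot(a, b)
    theta_c = math.atan2(b, a)
    hits = []
    for rho in radii:
        theta = theta_c + math.acos(rho / (2.0 * r))
        hits.append((rho * math.cos(theta), rho * math.sin(theta)))
    return hits

# Two toy tracks through the origin, sampled at nine "layer" radii each.
radii = [0.2 + 0.1 * k for k in range(9)]
hits = circle_hits(1.0, 0.5, radii) + circle_hits(0.5, -1.0, radii)
points = [conformal(x, y) for x, y in hits]
labels = dbscan(alpha_features(points), eps=0.1, min_pts=3)
print(labels)
```

For a circle through the origin with centre (a, b), the conformal points satisfy aX + bY = 1, so all hits of one track share the same α and collapse to a single point in the (cos α, sin α) plane, which is what makes density-based clustering effective here.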





