1. Memory allocations
for (int i = 0; i < n; ++i) {
    cudaMalloc(&ptr, bytes);
    ...
    cudaFree(ptr);
}
is significantly faster than
for (int i = 0; i < n; ++i) {
    cudaMalloc(&ptrs[i], bytes);
    ...
}
for (int i = 0; i < n; ++i) {
    cudaFree(ptrs[i]);
}
Even better, and again by a significant margin, is to allocate everything at
once with a single cudaMalloc call and access each piece as a segment (an
offset) of that one buffer.
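A minimal sketch of planning such a single allocation, assuming every segment should start on a 256-byte boundary (cudaMalloc itself returns memory aligned to at least 256 bytes). The names round_up, SegmentLayout and plan_segments are hypothetical, and the code only computes the layout on the host, so it builds as plain C++ without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round n up to the next multiple of align (align must be a power of two).
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) & ~(align - 1);
}

// Layout of one big buffer split into segments (hypothetical helper type).
struct SegmentLayout {
    std::vector<std::size_t> offsets;  // byte offset of each segment from the base pointer
    std::size_t total_bytes = 0;       // size to request in a single cudaMalloc call
};

// Assign each requested size an offset so every segment starts on an
// `align`-byte boundary; 256 is an assumption matching cudaMalloc's alignment.
SegmentLayout plan_segments(const std::vector<std::size_t>& sizes,
                            std::size_t align = 256) {
    assert(align != 0 && (align & (align - 1)) == 0);  // power of two only
    SegmentLayout layout;
    layout.offsets.reserve(sizes.size());
    std::size_t cursor = 0;
    for (std::size_t bytes : sizes) {
        layout.offsets.push_back(cursor);  // this segment starts here
        cursor += round_up(bytes, align);  // pad so the next one stays aligned
    }
    layout.total_bytes = cursor;
    return layout;
}
```

With that layout, the device side would be one cudaMalloc(&base, layout.total_bytes) into a void* base, segment i at static_cast<char*>(base) + layout.offsets[i], and one cudaFree(base) at the end. Padding each segment keeps every sub-pointer aligned for the usual element types, at the cost of at most align - 1 unused bytes per segment.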