bzr branch
http://darksoft.org/webbzr/alps/pcitool
76
by Suren A. Chilingaryan
Handle correctly reference counting in the driver |
Memory Addressing
=================
There are three types of addresses: virtual, physical, and bus. DMA uses bus
addresses. On x86, physical and bus addresses are the same, but this is not
guaranteed on other architectures. The xdma driver nevertheless relies on
this assumption and uses physical addresses for DMA access, and I have ported
it the same way. We now need to additionally provide bus addresses in the
kmem abstraction and use them in the NWL DMA implementation.

DMA Access Synchronization
==========================
 - At driver level, a few types of buffers are supported:
    * SIMPLE - non-reusable buffers; the usage information can be used for
    cleanup after crashed applications.
    * EXCLUSIVE - reusable buffers which can be mmapped by a single
    application only. These buffers have two modes:
        + Buffers in STANDARD mode are created for a single DMA operation.
        If such a buffer is detected while trying to reuse it, the last
        operation has failed and a reset is needed.
        + Buffers in PERSISTENT mode are preserved between invocations of
        the control application and are cleaned up only after the
        PERSISTENT flag is removed.
    * SHARED - reusable buffers shared by multiple processes. Not really
    needed at the moment.

    KMEM_FLAG_HW - indicates that the buffer can be used by hardware; in
    practice this means that DMA will be enabled afterwards. The driver
    cannot check whether DMA was really enabled and therefore blocks any
    attempt to release the buffer until KMEM_FLAG_HW is passed to the
    kmem_free routine as well. kmem_free should only be called with
    KMEM_FLAG_HW after the DMA engine is stopped. Then the buffer can be
    released by kmem_free once the reference count reaches 0.

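The hardware-reference rule can be modelled in a few lines. This is a Python sketch of the behaviour described above, not driver code: the `Buffer` class, its field names, and the boolean return value are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    hw_ref: bool = True     # set because the buffer was allocated with KMEM_FLAG_HW
    refs: int = 0           # outstanding references (hypothetical field name)
    released: bool = False


def kmem_free(buf: Buffer, hw_flag: bool = False) -> bool:
    """Try to release the buffer; return True if it was released."""
    if hw_flag:
        # The caller asserts that the DMA engine has been stopped.
        buf.hw_ref = False
    if buf.hw_ref or buf.refs > 0:
        return False        # release is blocked
    buf.released = True
    return True
```

In this model a plain `kmem_free(buf)` on a hardware-referenced buffer is refused, and only a call with `hw_flag=True` and no remaining references releases it.
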
    KMEM_FLAG_EXCLUSIVE - prevents multiple processes from mmapping the
    buffer simultaneously. This is used to prevent multiple processes from
    using the same DMA engine at the same time. When passed to kmem_free, it
    allows cleaning buffers with lost clients, even for shared buffers.

    KMEM_FLAG_REUSE - requests reuse of an existing buffer. If a reusable
    buffer is found (non-reusable buffers, i.e. those allocated without
    KMEM_FLAG_REUSE, are ignored), it is returned instead of allocating a new
    one. Three types of usage counters are used. At the moment of allocation,
    the HW reference is set if necessary. The usage counter is increased by
    the kmem_alloc function and decreased by kmem_free. Finally, the
    reference is obtained and returned during mmap/munmap. So, on kmem_free,
    we do not clean:
        a) buffers with a reference count above zero or with the hardware
        reference set. The REUSE flag should be supplied, otherwise an error
        is returned.
        b) PERSISTENT buffers. The REUSE flag should be supplied, otherwise
        an error is returned.
        c) non-exclusive buffers with a usage counter above zero. (For an
        exclusive buffer, a usage counter above zero just means that an
        application has failed without cleaning its buffers first. There is
        no easy way to detect that for shared buffers, so it is left as a
        manual operation in this case.)
        d) any buffer if KMEM_FLAG_REUSE was provided to the function.
    During module unload, only buffers with references can prevent cleanup.
    In this case the only way to free the driver is to call kmem_free with
    the FORCE flags.

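Rules a) to d) above can be read as a small decision function. The following Python sketch is one interpretation of them, not the driver's code: the `Buffer` fields, the `KmemFlag` enum, and the three result strings are hypothetical, and the function is assumed to see the buffer state after the current kmem_free call has already updated the counters.

```python
from dataclasses import dataclass
from enum import Flag, auto


class KmemFlag(Flag):
    NONE = 0
    REUSE = auto()
    EXCLUSIVE = auto()
    PERSISTENT = auto()
    HW = auto()


@dataclass
class Buffer:
    exclusive: bool = False
    persistent: bool = False
    hw_ref: bool = False   # hardware reference, set at allocation if needed
    usage: int = 0         # usage counter: kmem_alloc calls minus kmem_free calls
    refs: int = 0          # reference count: mmap calls minus munmap calls


def free_decision(buf: Buffer, flags: KmemFlag) -> str:
    """Return 'clean', 'keep', or 'error' following rules a) to d)."""
    reuse = bool(flags & KmemFlag.REUSE)
    # a) still mapped, or still referenced by hardware
    if buf.refs > 0 or buf.hw_ref:
        return "keep" if reuse else "error"
    # b) PERSISTENT buffers are kept
    if buf.persistent:
        return "keep" if reuse else "error"
    # c) a shared buffer that somebody else may still be using
    if not buf.exclusive and buf.usage > 0:
        return "keep"
    # d) the caller explicitly asked to keep the buffer
    if reuse:
        return "keep"
    return "clean"
```

For example, `free_decision(Buffer(exclusive=True, usage=1), KmemFlag.NONE)` yields `'clean'`, matching the remark that a non-zero usage counter on an exclusive buffer only indicates a crashed application.
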
    KMEM_FLAG_PERSISTENT - if passed to the allocation routine, changes the
    mode of the buffer to PERSISTENT; if passed to the free routine, changes
    the mode of the buffer back to NORMAL (STANDARD). Basically, if we call
    'pci --dma-start' this flag should be passed to alloc, and if we call
    'pci --dma-stop' it should be passed to free. In all other cases, the
    flag should not be present.

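As a minimal illustration of when the flag is passed, the sketch below models one run of the control application. It is a Python model under stated assumptions: `Buffer`, `run_pci`, and the keyword arguments are hypothetical, and only the two command names come from the text.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    persistent: bool = False


def kmem_alloc(buf: Buffer, persistent: bool = False) -> None:
    if persistent:
        buf.persistent = True    # switch the buffer to PERSISTENT mode


def kmem_free(buf: Buffer, persistent: bool = False) -> None:
    if persistent:
        buf.persistent = False   # switch the buffer back to normal mode


def run_pci(command: str, buf: Buffer) -> None:
    """Model one invocation of the control application."""
    kmem_alloc(buf, persistent=(command == "--dma-start"))
    # ... the actual DMA work would happen here ...
    kmem_free(buf, persistent=(command == "--dma-stop"))
```

Any command other than the two named ones leaves the mode untouched, so a buffer made persistent by `--dma-start` stays persistent until `--dma-stop`.
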
    If the application crashes, munmap will still be called, cleaning up the
    software references. However, the hardware reference will stay, since it
    is not clear whether the hardware channel was closed or not. To lift the
    hardware reference, the application can be re-executed (or dma_stop
    called, for instance).
    * If there is no hardware reference, the buffers will be reused by the
    next call to the application and, for EXCLUSIVE buffers, cleaned at the
    end. SHARED buffers will be cleaned during module cleanup only (no active
    references).
    * The buffer will be reused by the next call, which can result in wrong
    behaviour if the buffer was left in an incoherent state. This should be
    handled at the upper level.

 - At pcilib/kmem level, synchronization of multiple buffers is performed
    * The HW reference and the following modes should be consistent between
    member parts: REUSABLE, PERSISTENT, EXCLUSIVE (only the HW reference and
    PERSISTENT mode need to be checked, the others are handled at driver
    level).
    * It is fine if only some of the buffers are reused and the others are
    newly allocated. However, this can be checked at a higher level and
    result in a failure.

    Treatment of inconsistencies:
    * Buffers are in PERSISTENT mode, but newly allocated: OK
    * Buffers are reused, but are not in PERSISTENT mode (for EXCLUSIVE
    buffers this means that the application crashed during the last
    execution): OK
    * Some of the buffers are reused (not just REUSABLE, but actually
    reused), others are not: OK as long as
        a) either the PERSISTENT flag is set or the reused buffers are
        non-PERSISTENT
        b) either the HW flag is set or the reused buffers do not hold a HW
        reference
    * PERSISTENT mode inconsistency: FAIL (even if we are going to set
    PERSISTENT mode anyway)
    * HW reference inconsistency: FAIL (even if we are going to set the
    HW flag anyway)

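The treatment of inconsistencies listed above can be sketched as a single check over the blocks of one kmem allocation. This is a Python model of one reading of those rules: the `Block` fields and the result strings are hypothetical, and "inconsistency" is assumed to mean disagreement among the reused blocks, since new blocks carry no earlier state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    reused: bool = False
    persistent: bool = False   # mode found on the block before our flags apply
    hw_ref: bool = False       # HW reference found on the block


def check_blocks(blocks: List[Block], persistent_flag: bool, hw_flag: bool) -> str:
    """Return 'OK' or a 'FAIL: ...' message for a set of kmem blocks."""
    reused = [b for b in blocks if b.reused]
    # All reused blocks must agree on mode and HW reference.
    if len({b.persistent for b in reused}) > 1:
        return "FAIL: PERSISTENT mode inconsistency"
    if len({b.hw_ref for b in reused}) > 1:
        return "FAIL: HW reference inconsistency"
    # Partial reuse: some blocks reused, others newly allocated.
    if reused and len(reused) < len(blocks):
        if reused[0].persistent and not persistent_flag:
            return "FAIL: reused blocks are PERSISTENT but flag is not set"
        if reused[0].hw_ref and not hw_flag:
            return "FAIL: reused blocks hold HW reference but flag is not set"
    return "OK"
```
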
    On an allocation error for some of the buffers, call the clean routine
    and:
    * Preserve PERSISTENT mode and the HW reference if the buffers held them
    before the unsuccessful kmem initialization. Up to the last failed block,
    the blocks of kmem should be consistent. The HW/PERSISTENT flags should
    be removed if all reused blocks were in HW/PERSISTENT mode. The last
    block needs special treatment: the flags may be removed for this block if
    it was in HW/PERSISTENT state (and the others were not).
    * Remove the REUSE flag; we want to clean if allowed by the current
    buffer status.
    * The EXCLUSIVE flag is not important for the kmem_free routine.

 - At DMA level
    There are four components of DMA access:
    * DMA engine enabled/disabled
    * DMA engine IRQs enabled/disabled - always enabled at startup
    * Memory buffers
    * Ring start/stop pointers

    To prevent multiple processes from accessing the DMA engine in parallel,
    the first action is buffer initialization, which will fail if the buffers
    are already in use. The buffers are requested:
    * Always with the REUSE, EXCLUSIVE, and HW flags
    * Optionally with the PERSISTENT flag (if the DMA_PERSISTENT flag is set)
    If another DMA application is running, the buffer allocation will fail
    (no dma_stop is executed in this case).

    Depending on the PRESERVE flag, kmem_free will be called either with the
    REUSE flag, keeping the buffer in memory (this is redundant since the HW
    flag is enough), or with the HW flag, indicating that the DMA engine is
    stopped and the buffer can be cleaned. The PERSISTENT flag is defined by
    the DMA_PERSISTENT flag passed to the stop routine.

    The PRESERVE flag is enforced if DMA_PERSISTENT is not passed to the
    dma_stop routine and it is either:
        a) Explicitly set by the DMA_PERMANENT flag passed to the dma_start
        function
        b) Implicitly set if the DMA engine is already enabled during
        dma_start, all buffers are reused, and they are in persistent mode.
    If the PRESERVE flag is on, the engine will not be stopped at the end of
    execution (and the buffers will stay because of the HW flag).

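The PRESERVE logic and the resulting kmem_free flags can be condensed into two small functions. This is a Python sketch of the description above, not the library's code: the function names, parameter names, and the use of plain strings for flags are assumptions.

```python
def preserve_enforced(stop_persistent: bool,
                      start_permanent: bool,
                      engine_was_enabled: bool,
                      all_reused: bool,
                      all_persistent: bool) -> bool:
    """Decide whether the PRESERVE flag is in effect at dma_stop time."""
    if stop_persistent:
        # DMA_PERSISTENT passed to dma_stop overrides PRESERVE.
        return False
    # b) implicit: engine already running, buffers reused and persistent
    implicit = engine_was_enabled and all_reused and all_persistent
    # a) explicit: DMA_PERMANENT was passed to dma_start
    return start_permanent or implicit


def stop_free_flags(preserve: bool, stop_persistent: bool) -> set:
    """Flags handed to kmem_free when the DMA session ends."""
    flags = {"REUSE"} if preserve else {"HW"}
    if stop_persistent:
        flags.add("PERSISTENT")
    return flags
```

With `preserve=True` the buffers are kept through REUSE and the engine keeps running; otherwise the HW flag tells the driver that the engine is stopped and the buffers may be cleaned.
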
    If the buffers are reused and are already in PERSISTENT mode, the DMA
    engine was on before dma_start (the PRESERVE flag is ignored, because it
    can be enforced), and the ring pointers are calculated from LAST_BD and
    the states of the ring elements.
    If the previous application crashed (i.e. the buffers may be corrupted),
    two cases are possible:
    * If the buffers were in non-PERSISTENT mode during that call, this is
    easily detected: the buffers are reused, but are not in PERSISTENT mode
    (or at least were not before we set them to it). In this case we just
    reinitialize all buffers.
    * If the buffers were in PERSISTENT mode during that call, it is up to
    the user to check their consistency and restart the DMA engine.

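The start-up cases above reduce to a three-way decision. The Python sketch below only names the cases; the function name, its arguments, and the result strings are hypothetical, and `was_persistent` must be the mode found on the buffers before our own PERSISTENT flag is applied.

```python
def start_action(reused: bool, was_persistent: bool) -> str:
    """Pick what dma_start should do with the buffers it obtained."""
    if not reused:
        return "initialize"      # fresh buffers, normal start
    if was_persistent:
        # Engine was left running: derive ring pointers from LAST_BD and
        # the states of the ring elements. Consistency after a crash is
        # the user's responsibility.
        return "recover"
    # Reused but not persistent: the previous application crashed.
    return "reinitialize"
```
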
    IRQs are enabled and disabled at each call.

DMA Reads
=========
standard:             default reading mode, reads a single full packet
multipacket:          reads all available packets
waiting multipacket:  reads all available packets and, after finishing the
                      last one, waits to see whether new data arrives
exact read:           reads exactly the specified number of bytes (should
                      only be supported if the size is a whole number of
                      packets, otherwise an error should be returned)
ignore packets:       autoterminates each buffer, depends on engine
                      configuration

To handle the different cases, the value returned by the callback function
instructs the DMA library how long to wait for the next data to appear before
timing out. The following variants are possible:
terminate:  just bail out
check:      no timeout, just check whether there is data, otherwise
            terminate
timeout:    standard DMA timeout, normally used while receiving
            fragments of a packet: in this case it is expected
            that the device has already prepared the data and only
            the performance of the DMA engine limits the transfer speed
wait:       wait until the data is prepared by the device; this
            timeout is specified as an argument to the dma_stream
            function (the standard DMA timeout is used by default)

                     first | new_pkt    | buffer
------------------------------------------------------
standard             wait  | term       | timeout
multiple packets     wait  | check      | timeout    - DMA_READ_FLAG_MULTIPACKET
waiting multipacket  wait  | wait       | timeout    - DMA_READ_FLAG_WAIT
exact                wait  | wait/term  | timeout    - limited by size parameter
ignore packets       wait  | wait/check | wait/check - just autoterminated

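The table above can be encoded as a lookup that a streaming callback might consult. This is a Python sketch under assumptions: the column meanings (before the first data, at the start of a new packet, at the next buffer inside a packet) are an interpretation of the headers, and `WAIT_POLICY` and `next_wait` are hypothetical names. Cells with two alternatives are kept as written.

```python
# (first, new_pkt, buffer) columns copied from the table.
WAIT_POLICY = {
    "standard":            ("wait", "term",       "timeout"),
    "multiple packets":    ("wait", "check",      "timeout"),
    "waiting multipacket": ("wait", "wait",       "timeout"),
    "exact":               ("wait", "wait/term",  "timeout"),
    "ignore packets":      ("wait", "wait/check", "wait/check"),
}


def next_wait(mode: str, first: bool, packet_done: bool) -> str:
    """Return the wait variant the callback would report for the next read."""
    first_col, new_pkt, buffer_col = WAIT_POLICY[mode]
    if first:
        return first_col                 # nothing received yet
    # After a complete packet use the new_pkt column, otherwise we are
    # in the middle of a packet and wait for its next buffer.
    return new_pkt if packet_done else buffer_col
```
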
Should we do special handling in case of overflow?


Buffering
=========
The DMA addresses are limited to 32 bits (~4GB for everything). This means we
can't really use the DMA pages as the sole buffers. Therefore, a second
thread, with a realtime scheduling policy if possible, will be spawned and
will copy the data from the DMA pages into the allocated buffers. On
expiration of the duration or number of events set by the autostop call, this
thread will be stopped, but processing in streaming mode will continue until
all copied data has been passed to the callbacks.

To avoid stalls, the IPECamera requires data to be read out continuously.
For this reason, there are no locks in the readout thread. It will simply
overwrite the old frames if the data is not copied out in time. To handle
this case, after getting the data and processing it, the calling application
should call the return_data function and check its return code. This function
may return an error indicating that the data was overwritten in the meantime.
In that case the data is corrupted and should be dropped by the application.
The copy_data function performs this check itself, so the user application
can be sure it gets coherent data in this case.

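The overwrite check described above can be illustrated with a small ring of frame slots. This is a single-threaded Python model, not the library's implementation: `FrameRing`, `OverwrittenError`, the event-id scheme, and the 0/-1 return codes are assumptions; only the names get_data, return_data, and copy_data come from the text.

```python
class OverwrittenError(Exception):
    """Raised when a frame was overwritten while it was being used."""


class FrameRing:
    def __init__(self, size: int):
        self.size = size
        self.slots = [None] * size   # frame payloads
        self.ids = [None] * size     # event id currently stored in each slot
        self.next_id = 0

    def write(self, data: bytes) -> int:
        """Readout thread: store a frame without taking any lock."""
        event_id = self.next_id
        slot = event_id % self.size
        self.slots[slot] = data
        self.ids[slot] = event_id
        self.next_id += 1
        return event_id

    def get_data(self, event_id: int):
        return self.slots[event_id % self.size]

    def return_data(self, event_id: int) -> int:
        """Return 0 if the frame is still intact, -1 if it was overwritten."""
        return 0 if self.ids[event_id % self.size] == event_id else -1

    def copy_data(self, event_id: int) -> bytes:
        data = bytes(self.get_data(event_id))
        if self.return_data(event_id) != 0:
            raise OverwrittenError(event_id)
        return data
```

The point of the design is that the writer never waits: the reader validates after the fact and discards the frame when the slot has been recycled.
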
There is a way to avoid this problem. For raw data, the rawdata callback
can be requested. This callback blocks execution of the readout thread, so
the data can be treated safely by the calling application. However, this may
cause problems for the electronics. Therefore, normally only a memcpy should
be performed on the data.

The reconstructed data, however, may be safely accessed. As described above,
the raw data will be continuously overwritten by the reader thread. The
reconstructed data, however, will be protected by a mutex after the get_data
call.


Register Access Synchronization
===============================
We need to serialize access to the registers by the different running
applications and handle the case when registers are accessed indirectly by
writing PCI BARs (DMA implementations, for instance).

 - Module-assisted locking:
    * During initialization the locking context is created (which is
    basically a kmem_handle of type LOCK_PAGE).
    * This locking context is passed to the kernel module along with the
    lock type (LOCK_BANK) and lock item (BANK ADDRESS). If the lock context
    already owns a lock on the specified bank, just the reference number is
    increased; otherwise we try to obtain a new lock.
    * The kernel module just iterates over all registered lock pages and
    checks whether any of them holds the specified lock. If not, the lock is
    obtained and registered in our lock page.
    * This allows sharing access between multiple threads of a single
    application (by using the same lock page) or protecting them from each
    other (by using a separate lock page for each thread).
    * Either on application cleanup or if the application crashes, the
    memory mapping of the lock page is removed and, hence, the locks are
    freed.

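The lock-page scheme above can be modelled as follows. This is a Python sketch of the described bookkeeping only: `LockPage`, `Module`, and their methods are hypothetical names, and whether a contended lock request fails immediately or blocks is not stated in the text, so the model simply returns False.

```python
class LockPage:
    """Locking context of one client (application or thread)."""

    def __init__(self):
        self.locks = {}   # bank address -> reference count


class Module:
    """Stand-in for the kernel module's registry of lock pages."""

    def __init__(self):
        self.pages = []

    def register(self, page: LockPage) -> None:
        self.pages.append(page)

    def lock_bank(self, page: LockPage, bank: int) -> bool:
        if bank in page.locks:
            page.locks[bank] += 1        # already ours: count the reference
            return True
        # Check whether any other registered lock page holds this bank.
        for other in self.pages:
            if other is not page and bank in other.locks:
                return False
        page.locks[bank] = 1
        return True

    def unlock_bank(self, page: LockPage, bank: int) -> None:
        page.locks[bank] -= 1
        if page.locks[bank] == 0:
            del page.locks[bank]

    def unmap(self, page: LockPage) -> None:
        """Cleanup or crash: the page mapping goes away with all its locks."""
        self.pages.remove(page)
```

Threads that share one `LockPage` can nest locks on the same bank, while threads with separate pages exclude each other.
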
 - Multiple ways of accessing registers
    Because of reference counting, we can successfully obtain locks multiple
    times if necessary. The following locks protect register access:
        a) The global register_read/write functions lock the bank before
        executing the implementation.
        b) The DMA bank is locked by the global DMA functions, so we can
        access the registers using plain PCI BAR read/write.
        c) A sequence of register operations can be protected with the
        pcilib_lock_bank function.
    Reading the raw register space or a PCI bank is not locked.
    * OK. We can detect the banks which will be affected by a PCI read/write
    and lock them. But should we do it?

Register/DMA Configuration
==========================
 - XML description of registers
 - Formal XML-based (or non-XML-based) language for DMA implementation.
    a) Writing/reading register values
    b) Wait until <register1>=<value>; on <register2>=<value> report an error
    c) ... ?

IRQ Handling
============
IRQ types: DMA IRQ, Event IRQ, other types
IRQ hardware source: To allow a purely user-space implementation, as a
general rule, only a single (standard) source should be used.
IRQ source: The dma/event engines, however, may refine this hardware source
and produce the real IRQ source based on the values of registers. For
example, for DMA IRQs the source may represent the engine number and for
Event IRQs the source may represent the event type.

Only types can be enabled or disabled. The sources are enabled/disabled
by enabling/disabling the corresponding DMA engines or Event types. The
expected workflow is the following:
 * We enable IRQs in user space (normally by setting some registers).
 Normally these are just the Event IRQs; the DMA IRQs, if necessary, will be
 managed by the DMA engine itself.
 * We wait for the standard IRQ from hardware (driver).
 * In user space, we check registers to find out the real source of the IRQ
 (the driver reports only the hardware source), generate the appropriate
 events, and acknowledge the IRQ. This depends on the implementation and
 should be managed inside the event API.

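The workflow above can be sketched as one service step. This is a Python illustration under loud assumptions: the status register with one bit per source, the `sources` mapping, and all callables are hypothetical and stand in for the implementation-dependent register accesses and the driver wait call.

```python
from typing import Callable, Dict, List


def service_irq(wait_irq: Callable[[], None],
                read_status: Callable[[], int],
                acknowledge: Callable[[int], None],
                sources: Dict[int, str]) -> List[str]:
    """Wait for the hardware IRQ, decode the real sources, acknowledge."""
    wait_irq()                      # driver reports only the hardware source
    status = read_status()          # user space reads registers for details
    events = [name for bit, name in sorted(sources.items())
              if status & (1 << bit)]
    acknowledge(status)             # implementation-specific acknowledgement
    return events
```

A caller would generate one event per returned name; how the status is laid out and how it is acknowledged is left to the event engine.
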
I.e. the driver implements just two methods: pcilib_wait_irq(hw_source) and
pcilib_clear_irq(hw_source). Only a few hardware IRQ sources are defined.
In most circumstances, IRQ_SOURCE_DEFAULT is used.

The DMA engine may provide 3 additional methods to enable, disable,
and acknowledge IRQs.

... To be decided in detail as the need arises ...

Updating Firmware
=================
 - JTAG should be connected to the USB connector on the board (next to
 Ethernet)
 - The computer should be turned off and on before programming
 - The environment variables should be loaded
    . /home/uros/.bashrc
 - The application is called 'impact'
    No project is needed, cancel the initial proposals (No/Cancel)
    Double-click on "Boundary Scan"
    Right-click in the right window and select "Init Chain"
    We don't want to select a bit file now (Yes and, then, click Cancel)
    Right-click on the second (right) item and choose "Assign new CF file"
    Select a bit file. Answer No, we don't want to attach SPI to SPI Prom
    Select xv6vlx240t and program it
 - Shut down and start the computer

 Firmware images are in
    v.2: /home/uros/Repo/UFO2_last_good_version_UFO2.bit
    v.3: /home/uros/Repo/UFO3
        Step5 - best working revision
        Step6 - last revision
