LLVM 23.0.0git
HexagonXQFloatGenerator.cpp
//===-------------------- HexagonXQFloatGenerator.cpp --------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This pass enables generation of XQFloat instructions. XQF instructions
// are more efficient, but can be less precise than IEEE ones.
// Based on how much accuracy the generated code preserves, the pass supports
// four modes: strict IEEE-754 compliant, IEEE-754 compliant, lossy subnormals,
// and direct (legacy) mode.
//
// Strict IEEE mode provides accuracy and precision similar to IEEE-754.
//
// IEEE-754 compliant mode does not reproduce IEEE-754 overflows or
// lower-precision subnormals, because the format has a larger dynamic range
// than IEEE-754. All subnormals have extra precision.
//
// Lossy subnormals mode skips normalization, which results in a loss of
// accuracy. It still provides greater precision than clamping subnormals to 0.
// If the dataset excludes subnormals, it behaves like IEEE-754 compliant mode.
//
// The direct mode has a loss of 1 bit of accuracy compared to IEEE-754.
//
// V79 replaces the prior internal HVX floating point format for floating-point
// arithmetic. The new internal HVX floating-point format yields results
// identical to IEEE-754 round-to-even mode. The new format contains more bits
// than IEEE-754, which optionally produces results with greater range and
// accuracy. Only the HVX vector registers use the HVX floating-point format.
// Memory maintains all floating-point data in IEEE-754 format,
// and all loads/stores use the IEEE-754 format. A subset of HVX floating-point
// operations transform IEEE-754 floating-point data to HVX floating-point data.
// Subsequent HVX floating-point instructions may consume operands in the HVX
// floating-point format without conversion to IEEE-754, which allows for
// performant and energy-efficient code. The program does not need to switch
// between formats continuously. The program must convert the HVX
// floating-point results to IEEE-754 prior to storing to memory.

// HVX floating-point achieves IEEE-754 compliance through normalization.
// The program may skip normalization when faster calculation is desired and
// IEEE-754 compliance is not required. HVX floating-point contains two input
// types: qf32, single precision floating point, and qf16, half precision
// floating point. In Hexagon, IEEE-754 contains two input types: sf, single
// precision floating point, and hf, half precision floating point.
//
// Only HVX floating-point source and destination instructions use HVX
// floating-point values. Instructions specify the HVX floating-point format
// with the qf16 and qf32 identifiers. A source vector register will drop the
// extended state of an HVX floating-point value when an instruction reads the
// source vector register without the qf16 or qf32 identifier. A destination
// vector register will reset its extended state when an instruction writes to
// a vector register without the qf16 or qf32 identifier. When dropping the
// extended state, the floating-point value loses accuracy. The program may
// preserve the floating-point value by converting HVX floating-point values
// to IEEE-754 values. The compiler must convert HVX floating-point values to
// IEEE-754 values before using them as inputs to stores, permutes, shifts, and
// any other operations that do not source the HVX floating-point format.
//
// Depending on the desired results, HVX floating-point operations may have
// some requirements on the input sources. The HVX floating-point values
// require normalization to achieve IEEE-754 compliance, while faster operations
// may skip normalization. The program normalizes HVX floating-point values
// before subsequent HVX floating-point operations, so the floating-point value
// does not lose precision. The program also obtains results identical to
// IEEE-754 by converting all HVX floating-point results to IEEE-754 format
// before they are consumed by any subsequent operation. There are, however,
// cases where this conversion is redundant, or the differences between
// IEEE-754 and HVX floating-point may not be a concern.
//
// The conversion logic can be understood from the table below. It is split
// by instruction category; each column is a kind of source operand, and each
// cell is the action applied to that operand in the given mode.
//
// Inputs to add/subtract instructions
// ==========================================================================
// Mode                | IEEE-754   | HVX floating     | HVX floating
//                     |            | point from mult  | point from adder
// ==========================================================================
// Strict IEEE-754     | Direct use | Convert to IEEE  | Convert to IEEE
// compliance          |            |                  |
// --------------------------------------------------------------------------
// IEEE-754 compliance | Direct use | Direct use       | Direct use
// --------------------------------------------------------------------------
// Lossy subnormals    | Direct use | Direct use       | Direct use
// --------------------------------------------------------------------------
// Direct lossy        | Direct use | Direct use       | Direct use
// --------------------------------------------------------------------------
//
// Inputs to multiplication instructions (32-bit)
// ==========================================================================
// Mode                | sf         | qf32 from mult   | qf32 from adder
// ==========================================================================
// Strict IEEE-754     | Normalize  | Convert to IEEE, | Convert to IEEE,
// compliance          |            | then normalize   | then normalize
// --------------------------------------------------------------------------
// IEEE-754 compliance | Normalize  | Direct use       | Normalize
// --------------------------------------------------------------------------
// Lossy subnormals    | Direct use | Direct use       | Normalize
// --------------------------------------------------------------------------
// Direct lossy        | Direct use | Direct use       | Direct use
// --------------------------------------------------------------------------
//
// Inputs to multiplication instructions (16-bit)
// ==========================================================================
// Mode                | hf                | qf16 from mult   | qf16 from adder
// ==========================================================================
// Strict IEEE-754     | Widening multiply,| Convert to IEEE, | Convert to IEEE,
// compliance          | then convert      | widening         | widening
//                     | to IEEE           | multiply,        | multiply,
//                     |                   | convert to IEEE  | convert to IEEE
// --------------------------------------------------------------------------
// IEEE-754 compliance | Widening multiply | Direct use       | Widening
//                     |                   |                  | multiply
// --------------------------------------------------------------------------
// Lossy subnormals    | Direct use        | Direct use       | Widening
//                     |                   |                  | multiply
// --------------------------------------------------------------------------
// Direct lossy        | Direct use        | Direct use       | Direct use
// --------------------------------------------------------------------------
//
// Inputs to non-HVX floating point instructions
// ==========================================================================
// Mode                | IEEE-754   | HVX floating     | HVX floating
//                     |            | point from mult  | point from adder
// ==========================================================================
// Strict IEEE-754     | Direct use | Convert to IEEE  | Convert to IEEE
// compliance          |            |                  |
// --------------------------------------------------------------------------
// IEEE-754 compliance | Direct use | Convert to IEEE  | Convert to IEEE
// --------------------------------------------------------------------------
// Lossy subnormals    | Direct use | Convert to IEEE  | Convert to IEEE
// --------------------------------------------------------------------------
// Direct lossy        | Direct use | Direct use       | Direct use
// --------------------------------------------------------------------------
//
// For v81, the normalization sequence changes. Instead of multiplying 0
// and -0, a simple copy operation normalizes the unnormal value. Both
// qf and IEEE-754 values can be unnormal.
// Additionally, v81 adds two new vsub instructions, which this pass handles.

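// Illustration only (not part of the pass): the multiplication columns of the
// mode table above can be read as a lookup from (mode, operand source) to an
// action. The sketch below encodes that reading. The names Mode, MulInput,
// Action, and actionForMulInput are hypothetical and do not exist in LLVM;
// the pass itself selects its behavior through QFloatMode.

```cpp
#include <cassert>

// Hypothetical enums, introduced only for this sketch.
enum class Mode { StrictIEEE, CompliantIEEE, LossySubnormals, Legacy };
enum class MulInput {
  SF,
  QF32FromMult,
  QF32FromAdder,
  HF,
  QF16FromMult,
  QF16FromAdder
};
enum class Action {
  DirectUse,
  Normalize,
  ConvertThenNormalize,        // convert to IEEE, then normalize
  WideningMultiply,
  WideningMultiplyThenConvert, // widening multiply, then convert to IEEE
  ConvertWidenConvert // convert to IEEE, widening multiply, convert to IEEE
};

// Returns the table cell for one multiplication operand in the given mode.
Action actionForMulInput(Mode M, MulInput In) {
  const bool Is16 = In == MulInput::HF || In == MulInput::QF16FromMult ||
                    In == MulInput::QF16FromAdder;
  switch (M) {
  case Mode::StrictIEEE:
    if (In == MulInput::SF)
      return Action::Normalize;
    if (In == MulInput::HF)
      return Action::WideningMultiplyThenConvert;
    return Is16 ? Action::ConvertWidenConvert : Action::ConvertThenNormalize;
  case Mode::CompliantIEEE:
    if (In == MulInput::SF || In == MulInput::QF32FromAdder)
      return Action::Normalize;
    if (In == MulInput::HF || In == MulInput::QF16FromAdder)
      return Action::WideningMultiply;
    return Action::DirectUse;
  case Mode::LossySubnormals:
    if (In == MulInput::QF32FromAdder)
      return Action::Normalize;
    if (In == MulInput::QF16FromAdder)
      return Action::WideningMultiply;
    return Action::DirectUse;
  case Mode::Legacy:
    return Action::DirectUse;
  }
  return Action::DirectUse;
}
```

// In this reading, the stricter the mode, the more operands need fix-up before
// the multiply; the legacy mode uses every operand directly.
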
#define HEXAGON_XQFLOAT_GENERATOR "XQFloat Generator pass"

#include "Hexagon.h"
#include "HexagonInstrInfo.h"
#include "HexagonSubtarget.h"
#include "HexagonTargetMachine.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/IR/DebugLoc.h"
#include "llvm/Pass.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include <vector>

#define DEBUG_TYPE "hexagon-xqf-gen"

using namespace llvm;

// Master flag to enable XQF generation
cl::opt<bool> EnableHVXXQFloat("enable-xqf-gen", cl::init(false),
                               cl::desc("Enable XQFloat generations"));
// Opcodes which generate qf32 from add/subtract
static constexpr unsigned XQFPAdd32[] = {
    // vector add instructions
    Hexagon::V6_vadd_sf, Hexagon::V6_vadd_qf32, Hexagon::V6_vadd_qf32_mix,

    // vector subtract instructions
    Hexagon::V6_vsub_qf32, Hexagon::V6_vsub_qf32_mix, Hexagon::V6_vsub_sf,
    Hexagon::V6_vsub_sf_mix};

// Opcodes which generate qf16 from add/subtract
static constexpr unsigned XQFPAdd16[] = {
    // vector add instructions
    Hexagon::V6_vadd_hf, Hexagon::V6_vadd_qf16, Hexagon::V6_vadd_qf16_mix,

    // vector subtract instructions
    Hexagon::V6_vsub_hf, Hexagon::V6_vsub_qf16, Hexagon::V6_vsub_qf16_mix,
    Hexagon::V6_vsub_hf_mix};

// Opcodes which generate qf32 from multiplication
static constexpr unsigned XQFPMult32[] = {
    Hexagon::V6_vmpy_qf32, Hexagon::V6_vmpy_qf32_qf16, Hexagon::V6_vmpy_qf32_hf,
    Hexagon::V6_vmpy_qf32_sf, Hexagon::V6_vmpy_qf32_mix_hf};
// Opcodes which generate qf16 from multiplication
static constexpr unsigned XQFPMult16[] = {Hexagon::V6_vmpy_qf16,
                                          Hexagon::V6_vmpy_qf16_hf,
                                          Hexagon::V6_vmpy_qf16_mix_hf};

namespace llvm {
} // namespace llvm

namespace {

struct HexagonXQFloatGenerator : public MachineFunctionPass {
public:
  static char ID;
  HexagonXQFloatGenerator() : MachineFunctionPass(ID) {}

  bool runOnMachineFunction(MachineFunction &MF) override;

  StringRef getPassName() const override { return HEXAGON_XQFLOAT_GENERATOR; }

  void getAnalysisUsage(AnalysisUsage &AU) const override {
  }

private:
  // Handle each XQF optimization level
  bool HandleStrictIEEE(MachineFunction &);
  bool HandleCompliantIEEE(MachineFunction &);
  bool HandleLossySubnormals(MachineFunction &);
  bool HandleLossyLegacy(MachineFunction &);

  // Checker functions for input operands
  bool checkIfInputFromAdder32(Register Reg);
  bool checkIfInputFromAdder16(Register Reg);
  bool checkIfInputFromMult32(Register Reg);
  bool checkIfInputFromMult16(Register Reg);
  bool deleteList();

  // Helper functions for conversion/normalization/widening
  bool widenMultiplicationInputF16(MachineInstr &, Register &, Register &,
                                   Register &, bool);
  bool widenMultiplicationInputF16Rt(MachineInstr &, Register &, Register &,
                                     Register &);
  void widenMultiplyInputHF(MachineInstr &, Register &, Register &, Register &);
  bool normalizeMultiplicationInputF32(MachineInstr &, Register &, Register &,
                                       Register &, Register &, bool &);
  void normalizeMultiplicationInputSF(MachineInstr &, Register &, Register &,
                                      Register &, Register &, bool &);
  bool convertNormalizeMultOp32(MachineInstr &, Register &, Register &,
                                Register &, Register &, bool &);
  bool convertWidenMultOp16(MachineInstr &, Register &, Register &, Register &,
                            bool);
  bool convertWidenMultOp32(MachineInstr &, Register &, Register &, Register &,
                            bool);
  void createPrologInstructions(MachineInstr &, Register &);
  bool convertAddOpToIEEE16(MachineInstr &, Register &, Register &, Register &,
                            bool, bool, bool);
  bool convertAddOpToIEEE32(MachineInstr &, Register &, Register &, Register &,
                            bool, bool, bool);
  void generateQF16FromQF32(MachineInstr &, Register &, Register &);
  bool convertIfInputToNonHVX(MachineInstr &, bool);
  void createConvertInstr(MachineInstr *, Register &, Register &, bool);

  // V81 specific normalization function
  bool V81normalizeMultF32(MachineInstr &, Register &, Register &, Register &,
                           bool, bool, bool);

  const HexagonSubtarget *HST = nullptr;
  const HexagonInstrInfo *HII = nullptr;
  MachineRegisterInfo *MRI = nullptr;

      OriginalMI; // Hold the instructions to be deleted
};

// This class removes redundant vector convert instructions from qf to hf/sf.
// Additionally, it replaces uses of sf/hf registers with qf types.
// The resulting code is complete without dangling instructions.
// FIXME: Liveness is not preserved.
char HexagonXQFloatGenerator::ID = 0;

} // namespace

INITIALIZE_PASS(HexagonXQFloatGenerator, "hexagon-xqfloat-generator",
                HEXAGON_XQFLOAT_GENERATOR, false, false)

  return new HexagonXQFloatGenerator();
}

// Returns true if the qf32 input is from an add/subtract unit
bool HexagonXQFloatGenerator::checkIfInputFromAdder32(Register Reg) {
  MachineInstr *Def = MRI->getVRegDef(Reg);
  if (!Def)
    return false;

  // If the definition is a copy, we need to analyze its def again
  if (Def->getOpcode() == TargetOpcode::COPY) {
    Register SrcReg = Def->getOperand(1).getReg();
    if (SrcReg.isValid())
      return checkIfInputFromAdder32(SrcReg);
    return false;
  } else if (Def->getOpcode() == TargetOpcode::REG_SEQUENCE) {
    // For a REG_SEQUENCE, it is enough that either source comes from an adder
    Register SrcReg1 = Def->getOperand(1).getReg();
    Register SrcReg2 = Def->getOperand(2).getReg();
    bool isTrue = false;
    if (SrcReg1.isValid())
      isTrue = checkIfInputFromAdder32(SrcReg1);
    if (SrcReg2.isValid())
      isTrue |= checkIfInputFromAdder32(SrcReg2);
    return isTrue;
  } else
    return llvm::is_contained(XQFPAdd32, Def->getOpcode());
}

// Returns true if the qf16 input is from an add/subtract unit
bool HexagonXQFloatGenerator::checkIfInputFromAdder16(Register Reg) {
  MachineInstr *Def = MRI->getVRegDef(Reg);
  if (!Def)
    return false;

  // If the definition is a copy, we need to analyze its def again
  if (Def->getOpcode() == TargetOpcode::COPY) {
    Register SrcReg = Def->getOperand(1).getReg();
    if (SrcReg.isValid())
      return checkIfInputFromAdder16(SrcReg);
    return false;
  } else
    return llvm::is_contained(XQFPAdd16, Def->getOpcode());
}

// Returns true if the qf32 input is from a multiplier unit
bool HexagonXQFloatGenerator::checkIfInputFromMult32(Register Reg) {
  MachineInstr *Def = MRI->getVRegDef(Reg);
  if (!Def)
    return false;

  // If the definition is a copy, we need to analyze its def again
  if (Def->getOpcode() == TargetOpcode::COPY) {
    Register SrcReg = Def->getOperand(1).getReg();
    if (SrcReg.isValid())
      return checkIfInputFromMult32(SrcReg);
    return false;
  } else if (Def->getOpcode() == TargetOpcode::REG_SEQUENCE) {
    // For a REG_SEQUENCE, it is enough that either source comes from a
    // multiplier
    Register SrcReg1 = Def->getOperand(1).getReg();
    Register SrcReg2 = Def->getOperand(2).getReg();
    bool isTrue = false;
    if (SrcReg1.isValid())
      isTrue |= checkIfInputFromMult32(SrcReg1);
    if (SrcReg2.isValid())
      isTrue |= checkIfInputFromMult32(SrcReg2);
    return isTrue;
  } else
    return llvm::is_contained(XQFPMult32, Def->getOpcode());
}

// Returns true if the qf16 input is from a multiplier unit
bool HexagonXQFloatGenerator::checkIfInputFromMult16(Register Reg) {
  MachineInstr *Def = MRI->getVRegDef(Reg);
  if (!Def)
    return false;

  // If the definition is a copy, we need to analyze its def again
  if (Def->getOpcode() == TargetOpcode::COPY) {
    Register SrcReg = Def->getOperand(1).getReg();
    if (SrcReg.isValid())
      return checkIfInputFromMult16(SrcReg);
    return false;
  } else
    return llvm::is_contained(XQFPMult16, Def->getOpcode());
}

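// Illustration only (not part of the pass): the checkers above all follow the
// same pattern, which is to look through COPY definitions until a real
// producer is found and then test its opcode against a fixed table. The
// sketch below shows that pattern on a toy register-to-definition map. Op,
// Def, DefMap, AdderOps, and comesFromAdder are hypothetical stand-ins, not
// LLVM types, and the REG_SEQUENCE case is left out for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <unordered_map>

// Toy opcode set; Copy plays the role of TargetOpcode::COPY.
enum class Op { Copy, Add, Mul, Load };

// One definition per virtual register; Src is only meaningful for Copy.
struct Def {
  Op Opcode;
  int Src;
};
using DefMap = std::unordered_map<int, Def>;

// Opcodes treated as "adder" producers in this toy model.
constexpr Op AdderOps[] = {Op::Add};

// Returns true if Reg is ultimately defined by an adder opcode, looking
// through any chain of copies. Unknown registers yield false.
bool comesFromAdder(const DefMap &Defs, int Reg) {
  auto It = Defs.find(Reg);
  if (It == Defs.end())
    return false;
  const Def &D = It->second;
  if (D.Opcode == Op::Copy)
    return comesFromAdder(Defs, D.Src);
  return std::find(std::begin(AdderOps), std::end(AdderOps), D.Opcode) !=
         std::end(AdderOps);
}
```

// With register 1 defined by Add and register 2 defined as a copy of 1,
// comesFromAdder reports true for both; a copy of a Mul result reports false.
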
// Generates an sf = qf32 instruction or an hf = qf16 instruction
void HexagonXQFloatGenerator::createConvertInstr(MachineInstr *UseMI,
                                                 Register &NewR, Register &OldR,
                                                 bool is32bit) {
  const DebugLoc &DL = UseMI->getDebugLoc();
  MachineBasicBlock *MBB = UseMI->getParent();
  NewR = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
  if (is32bit)
    BuildMI(*MBB, *UseMI, DL, HII->get(Hexagon::V6_vconv_sf_qf32), NewR)
        .addReg(OldR);
  else
    BuildMI(*MBB, *UseMI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), NewR)
        .addReg(OldR);
}

// Generate HVX to IEEE conversion instruction for all non-HVX uses
bool HexagonXQFloatGenerator::convertIfInputToNonHVX(MachineInstr &MI,
                                                     bool is32bit) {
  Register NewR;
  bool Changed = false;
  Register Dest = MI.getOperand(0).getReg();

  // Iterate over all uses of the Def we are analyzing. MO.setReg() below
  // moves the operand to another register's use list, so the iterator is
  // advanced before the loop body runs.
  for (MachineOperand &MO :
       llvm::make_early_inc_range(MRI->use_operands(Dest))) {
    MachineInstr *UseMI = MO.getParent();
    // Omit if the use is a REG_SEQUENCE instruction, since the only
    // use of REG_SEQUENCE in qf context is transforming to IEEE.
    // Omit for use in DBG instructions.
    // Omit for use in PHI instructions since PHI result can be used as a qf
    // operand.
    if (UseMI->getOpcode() == TargetOpcode::REG_SEQUENCE ||
        UseMI->getOpcode() == TargetOpcode::DBG_VALUE ||
        UseMI->getOpcode() == TargetOpcode::DBG_LABEL ||
        UseMI->getOpcode() == TargetOpcode::PHI)
      continue;

    // If it is a copy instruction, we need to analyze its uses. Keep going
    // afterwards so the remaining uses of Dest are still visited.
    if (UseMI->getOpcode() == TargetOpcode::COPY) {
      Changed |= convertIfInputToNonHVX(*UseMI, is32bit);
      continue;
    }

    // The use does not consume a qf operand: feed it the converted value.
    if (!HII->usesQFOperand(UseMI)) {
      createConvertInstr(UseMI, NewR, Dest, is32bit);
      MO.setReg(NewR);
      Changed = true;
    }
  }
  return Changed;
}

// Generate qf16 = qf32 via:
//   hf = qf32
//   V0 = #0
//   qf16 = vsub(hf,V0)
void HexagonXQFloatGenerator::generateQF16FromQF32(MachineInstr &MI,
                                                   Register &Dest,
                                                   Register &SrcReg) {

  MachineBasicBlock &MBB = *MI.getParent();
  const DebugLoc &DL = MI.getDebugLoc();

  Register convertReg = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
  BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf32), convertReg)
      .addReg(SrcReg);
  Register VR0 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
  BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vd0), VR0);

  BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vsub_hf), Dest)
      .addReg(convertReg)
      .addReg(VR0);
}

// Widen qf16 = vmpy(hf, hf) result unconditionally
void HexagonXQFloatGenerator::widenMultiplyInputHF(MachineInstr &MI,
                                                   Register &Reg1,
                                                   Register &Reg2,
                                                   Register &Dest) {
  Register output_mpy = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
  MachineBasicBlock &MBB = *MI.getParent();
  const DebugLoc &DL = MI.getDebugLoc();

  BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_hf), output_mpy)
      .addReg(Reg1)
      .addReg(Reg2);
  generateQF16FromQF32(MI, Dest, output_mpy);
}

// Widen vmpy(qf16, qf16/hf) result conditionally
bool HexagonXQFloatGenerator::widenMultiplicationInputF16(MachineInstr &MI,
                                                          Register &Reg1,
                                                          Register &Reg2,
                                                          Register &Dest,
                                                          bool twoOps) {
  bool firstconvert = false, secondconvert = false;
  MachineBasicBlock &MBB = *MI.getParent();
  const DebugLoc &DL = MI.getDebugLoc();

  // We widen only the operand which comes from an add/subtract unit.
  if (checkIfInputFromAdder16(Reg1))
    firstconvert = true;
  // twoOps == true means the 2nd operand is qf16, else it is hf
  if (twoOps && checkIfInputFromAdder16(Reg2))
    secondconvert = true;

  Register widenReg;
  // If either operand is from an add/subtract unit, we widen
  if (twoOps) {
    if (firstconvert || secondconvert) {
      widenReg = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
      BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_qf16), widenReg)
          .addReg(Reg1)
          .addReg(Reg2);
    } else {
      return false;
    }
  } else {
    if (firstconvert) {
      widenReg = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
      BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_mix_hf), widenReg)
          .addReg(Reg1)
          .addReg(Reg2);
    } else {
      return false;
    }
  }

  // Generate qf16 = qf32
  generateQF16FromQF32(MI, Dest, widenReg);

  return true;
}

// Handle qf16 = vmpy(qf16, Rt)
// For strict IEEE mode, convert the qf16 to IEEE before widening
bool HexagonXQFloatGenerator::widenMultiplicationInputF16Rt(MachineInstr &MI,
                                                            Register &Reg1,
                                                            Register &Reg2,
                                                            Register &Dest) {
  // If the first input is not from an adder, strict IEEE mode still proceeds
  // when the input is from a multiplier; every other case returns false.
  if (!checkIfInputFromAdder16(Reg1)) {
    if (QFloatModeValue == QFloatMode::StrictIEEE) {
      if (!checkIfInputFromMult16(Reg1))
        return false;
    } else
      return false;
  }

  MachineBasicBlock &MBB = *MI.getParent();
  const DebugLoc &DL = MI.getDebugLoc();

  Register VSplatReg = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
  BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_lvsplatw), VSplatReg).addReg(Reg2);

  Register widenReg = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
  if (QFloatModeValue == QFloatMode::StrictIEEE) {
    Register VHf = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
    BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VHf).addReg(Reg1);
    BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_hf), widenReg)
        .addReg(VHf)
        .addReg(VSplatReg);
  } else {
    BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_mix_hf), widenReg)
        .addReg(Reg1)
        .addReg(VSplatReg);
  }

  // Generate qf16 = qf32
  generateQF16FromQF32(MI, Dest, widenReg);
  return true;
}

// Handle qf32 = vadd/vsub(qf32/sf, qf32/sf)
// Handle vadd/vsub instructions with qf32 operands conditionally
// isAdd: true if an add instruction is analyzed, false for subtract
// isFirstOpQf: true if 1st operand is qf32 type, false if sf type
// isSecOpQf: true if 2nd operand is qf32 type, false if sf type
bool HexagonXQFloatGenerator::convertAddOpToIEEE32(
    MachineInstr &MI, Register &Reg1, Register &Reg2, Register &Dest,
    bool isAdd, bool isFirstOpQf, bool isSecOpQf) {

  Register VR1;
  Register VR2;
  bool firstconvert = false, secondconvert = false;
  MachineBasicBlock &MBB = *MI.getParent();
  const DebugLoc &DL = MI.getDebugLoc();

  // If the first operand is qf32 type
  if (isFirstOpQf) {
    // If the first operand is from add/sub/mul unit,
    // generate IEEE conversion instruction sf = qf32
    if (checkIfInputFromAdder32(Reg1) || checkIfInputFromMult32(Reg1)) {
      VR1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
      BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_sf_qf32), VR1)
          .addReg(Reg1);
      firstconvert = true;
    }
  }

  // If the 2nd operand is of qf32 type
  if (isSecOpQf) {
    // If the second operand is from add/sub/mul unit,
    // generate IEEE conversion instruction
    if (checkIfInputFromAdder32(Reg2) || checkIfInputFromMult32(Reg2)) {
      VR2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
      BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_sf_qf32), VR2)
          .addReg(Reg2);
      secondconvert = true;
    }
  }

  // Both operands are qf32 type:
  // If both were converted to sf, use the V6_v[add/sub]_sf instruction.
  // If only one was converted, use a mix instruction that takes one qf32
  // and one sf operand.
  // Output is qf32
  if (isFirstOpQf && isSecOpQf) {
    if (firstconvert && secondconvert) {
      BuildMI(MBB, MI, DL,
              HII->get(isAdd ? Hexagon::V6_vadd_sf : Hexagon::V6_vsub_sf), Dest)
          .addReg(VR1)
          .addReg(VR2);
    } else if (firstconvert) {
      if (isAdd)
        BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), Dest)
            .addReg(Reg2)
            .addReg(VR1);
      // For vsub type, for v81 we use a different opcode,
      // for v79, we convert the 2nd op to IEEE too.
      else {
        if (HST->useHVXV81Ops())
          BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vsub_sf_mix), Dest)
              .addReg(VR1)
              .addReg(Reg2);
        else {
          VR2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
          BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_sf_qf32), VR2)
              .addReg(Reg2);
          BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vsub_sf), Dest)
              .addReg(VR1)
              .addReg(VR2);
        }
      }
    } else if (secondconvert) {
      BuildMI(MBB, MI, DL,
              HII->get(isAdd ? Hexagon::V6_vadd_qf32_mix
                             : Hexagon::V6_vsub_qf32_mix),
              Dest)
          .addReg(Reg1)
          .addReg(VR2);
    } else { // none of the inputs is from an add/sub/mul unit
      return false;
    }
    // Handle vadd/vsub when only the 1st op of the original instruction is
    // qf type
  } else if (isFirstOpQf) {
    if (firstconvert)
      BuildMI(MBB, MI, DL,
              HII->get(isAdd ? Hexagon::V6_vadd_sf : Hexagon::V6_vsub_sf), Dest)
          .addReg(VR1)
          .addReg(Reg2);
    else
      return false;
    // Handle vadd/vsub when only the 2nd op of the original instruction is
    // qf type
  } else if (isSecOpQf) {
    if (secondconvert)
      BuildMI(MBB, MI, DL,
              HII->get(isAdd ? Hexagon::V6_vadd_sf : Hexagon::V6_vsub_sf), Dest)
          .addReg(Reg1)
          .addReg(VR2);
    else
      return false;
  } else
    return false;
  return true;
}

// Handle qf16 = vadd/vsub(qf16, qf16/hf)
// Handle vadd/vsub instructions with qf16 operands conditionally
// isAdd: true if an add instruction is analyzed, false for subtract
// isFirstOpQf: true if 1st operand is qf16 type, false if hf type
// isSecOpQf: true if 2nd operand is qf16 type, false if hf type
bool HexagonXQFloatGenerator::convertAddOpToIEEE16(
    MachineInstr &MI, Register &Reg1, Register &Reg2, Register &Dest,
    bool isAdd, bool isFirstOpQf, bool isSecOpQf) {

  MachineBasicBlock &MBB = *MI.getParent();
  const DebugLoc &DL = MI.getDebugLoc();
  Register VR1;
  Register VR2;
  bool firstconvert = false, secondconvert = false;

  // If the first qf16 operand is from add/sub/mul unit,
  // generate IEEE conversion instruction
  if (isFirstOpQf) {
    if (checkIfInputFromAdder16(Reg1) || checkIfInputFromMult16(Reg1)) {
      VR1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
      BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VR1)
          .addReg(Reg1);
      firstconvert = true;
    }
  }
  if (isSecOpQf) {
    // If the second operand is from add/sub/mul unit,
    // generate IEEE conversion instruction
    if (checkIfInputFromAdder16(Reg2) || checkIfInputFromMult16(Reg2)) {
      VR2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
      BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VR2)
          .addReg(Reg2);
      secondconvert = true;
    }
  }

  // Both operands are qf16 type:
  // If both were converted to hf, use the V6_v[add/sub]_hf instruction.
  // If only one was converted, use a mix instruction that takes one qf16
  // and one hf operand.
  // Output is qf16
  if (isFirstOpQf && isSecOpQf) {
    if (firstconvert && secondconvert) {
      BuildMI(MBB, MI, DL,
              HII->get(isAdd ? Hexagon::V6_vadd_hf : Hexagon::V6_vsub_hf), Dest)
          .addReg(VR1)
          .addReg(VR2);
    } else if (firstconvert) {
      if (isAdd)
        BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf16_mix), Dest)
            .addReg(Reg2)
            .addReg(VR1);
      // For vsub type, for v81 we use a different opcode,
      // for v79, we convert the 2nd op to IEEE too.
      else {
        if (HST->useHVXV81Ops())
          BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vsub_hf_mix), Dest)
              .addReg(VR1)
              .addReg(Reg2);
        else {
          VR2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
          BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VR2)
              .addReg(Reg2);
          BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vsub_hf), Dest)
              .addReg(VR1)
              .addReg(VR2);
        }
      }
    } else if (secondconvert) {
      BuildMI(MBB, MI, DL,
              HII->get(isAdd ? Hexagon::V6_vadd_qf16_mix
                             : Hexagon::V6_vsub_qf16_mix),
              Dest)
          .addReg(Reg1)
          .addReg(VR2);
    } else { // none of the inputs is from an add/sub/mul unit
      return false;
    }
    // Handle vadd/vsub when only the 1st op of the original instruction is
    // qf type
  } else if (isFirstOpQf) {
    if (firstconvert)
      BuildMI(MBB, MI, DL,
              HII->get(isAdd ? Hexagon::V6_vadd_hf : Hexagon::V6_vsub_hf), Dest)
          .addReg(VR1)
          .addReg(Reg2);
    else
      return false;
    // Handle vadd/vsub when only the 2nd op of the original instruction is
    // qf type
  } else if (isSecOpQf) {
    if (secondconvert)
      BuildMI(MBB, MI, DL,
              HII->get(isAdd ? Hexagon::V6_vadd_hf : Hexagon::V6_vsub_hf), Dest)
          .addReg(Reg1)
          .addReg(VR2);
    else
      return false;
  } else
    return false;
  return true;
}

// Create the prolog
//   v0 = #0
//   R1 = #0x80000000
//   v1.sf = vsplat(R1)
//   v2.sf = vmpy(v0.sf, v1.sf)
void HexagonXQFloatGenerator::createPrologInstructions(MachineInstr &MI,
                                                       Register &R_mpy) {

  MachineBasicBlock &MBB = *MI.getParent();
  const DebugLoc &DL = MI.getDebugLoc();

  Register VR0 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
  BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vd0), VR0);

  Register R_0 = MRI->createVirtualRegister(&Hexagon::IntRegsRegClass);
  BuildMI(MBB, MI, DL, HII->get(Hexagon::A2_tfrsi), R_0).addImm(0x80000000);

  Register VR_0 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
  BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_lvsplatw), VR_0).addReg(R_0);

  R_mpy = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
  BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_sf), R_mpy)
      .addReg(VR0)
      .addReg(VR_0);
}

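// Illustration only (not part of the pass): the prolog above multiplies 0 by
// the bit pattern 0x80000000, which is -0.0 in IEEE-754 single precision. The
// source does not say why -0 is used. One plausible reading, assumed here, is
// that in IEEE-754 arithmetic 0 * -0 is -0, and -0 is the only value that can
// be added to any x (including both signed zeros) without changing it. The
// sketch below checks that property for plain float on the host; qf32
// behavior on HVX may differ. addThroughVolatile and sameValueAndSign are
// hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>

// Adds two floats through volatile temporaries so the addition is performed
// at run time rather than folded by the compiler.
float addThroughVolatile(float X, float Y) {
  volatile float A = X;
  volatile float B = Y;
  return A + B;
}

// Multiplies two floats through volatile temporaries, for the same reason.
float mulThroughVolatile(float X, float Y) {
  volatile float A = X;
  volatile float B = Y;
  return A * B;
}

// Compares value and sign bit, so +0.0 and -0.0 count as different.
bool sameValueAndSign(float A, float B) {
  return A == B && std::signbit(A) == std::signbit(B);
}
```

// Under the default round-to-nearest mode, x + (-0.0f) keeps both the value
// and the sign of x, whereas (-0.0f) + (+0.0f) yields +0.0f and so would not
// preserve a negative zero.
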
// V81-specific normalization for qf32 multiplies: each flagged operand is
// normalized with a single conversion instruction before the new vmpy.
// Returns false if neither operand needs normalization.
bool HexagonXQFloatGenerator::V81normalizeMultF32(
    MachineInstr &MI, Register &Reg1, Register &Reg2, Register &Dest,
    bool firstconvert, bool secondconvert, bool strictieee) {
  MachineBasicBlock &MBB = *MI.getParent();
  const DebugLoc &DL = MI.getDebugLoc();
  Register input_mpy1, input_mpy2;

  auto Op =
      strictieee ? Hexagon::V6_vconv_qf32_sf : Hexagon::V6_vconv_qf32_qf32;

  // Normalize both input operands
  if (firstconvert && secondconvert) {
    input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
    input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);

    BuildMI(MBB, MI, DL, HII->get(Op), input_mpy1).addReg(Reg1);
    BuildMI(MBB, MI, DL, HII->get(Op), input_mpy2).addReg(Reg2);
    BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
        .addReg(input_mpy1)
        .addReg(input_mpy2);
  }
  // Normalize only first operand
  else if (firstconvert) {
    input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
    BuildMI(MBB, MI, DL, HII->get(Op), input_mpy1).addReg(Reg1);
    BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
        .addReg(input_mpy1)
        .addReg(Reg2);
  }
  // Normalize only second operand
  else if (secondconvert) {
    input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
    BuildMI(MBB, MI, DL, HII->get(Op), input_mpy2).addReg(Reg2);
    BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
        .addReg(Reg1)
        .addReg(input_mpy2);
  } else
    // We do nothing if the inputs are not from an add/sub/mult unit
    return false;

  return true;
}

807// Normalize qf32 = vmpy(sf, sf) instruction unconditionally
808void HexagonXQFloatGenerator::normalizeMultiplicationInputSF(
809 MachineInstr &MI, Register &Src1, Register &Src2, Register &Dest,
810 Register &R_mpy, bool &PrologCreated) {
811
812 MachineBasicBlock &MBB = *MI.getParent();
813 const DebugLoc &DL = MI.getDebugLoc();
814
815 if (HST->useHVXV81Ops()) {
816 Register input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
817 Register input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
818
819 // Normalize both inputs
820 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_qf32_sf), input_mpy1)
821 .addReg(Src1);
822 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_qf32_sf), input_mpy2)
823 .addReg(Src2);
824 // Add the new vmpy
825 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
826 .addReg(input_mpy1)
827 .addReg(input_mpy2);
828 return;
829 }
830
831 if (!PrologCreated) {
832 createPrologInstructions(MI, R_mpy);
833 PrologCreated = true;
834 }
835
836 Register input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
837 Register input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
838 // Normalize both inputs
839 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), input_mpy1)
840 .addReg(R_mpy)
841 .addReg(Src1);
842 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), input_mpy2)
843 .addReg(R_mpy)
844 .addReg(Src2);
845 // Add the new vmpy
846 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
847 .addReg(input_mpy1)
848 .addReg(input_mpy2);
849}
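// Illustrative sketch, not part of this pass: the `PrologCreated` guard above
// emits the shared setup code at most once per basic block and reuses its
// result afterwards. What createPrologInstructions actually emits is not shown
// in this part of the listing, so it is modeled here as an opaque register id.
// `BlockState`, `createProlog` and `getSharedReg` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical per-block state; the pass itself threads R_mpy and
// PrologCreated through its helpers by reference.
struct BlockState {
  bool PrologCreated = false;
  int PrologsEmitted = 0; // how many times the setup code was emitted
  int SharedReg = -1;     // stand-in for the shared register (R_mpy)
};

// Hypothetical stand-in for createPrologInstructions: hands out a register id
// and counts the emission.
inline void createProlog(BlockState &S) {
  S.SharedReg = 100 + S.PrologsEmitted;
  ++S.PrologsEmitted;
}

// Return the shared register, emitting the setup code on first use only.
inline int getSharedReg(BlockState &S) {
  if (!S.PrologCreated) {
    createProlog(S);
    S.PrologCreated = true;
  }
  return S.SharedReg;
}
```

// A fresh BlockState per basic block corresponds to resetting PrologCreated at
// the top of each block in the Handle* driver loops.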
850
851// Convert and normalize qf32 = vmpy(qf32, qf32) instructions conditionally
852bool HexagonXQFloatGenerator::convertNormalizeMultOp32(
853 MachineInstr &MI, Register &Reg1, Register &Reg2, Register &Dest,
854 Register &R_mpy, bool &PrologCreated) {
855
856 Register VR1, VR2;
857 bool firstconvert = false, secondconvert = false;
858 MachineBasicBlock &MBB = *MI.getParent();
859 const DebugLoc &DL = MI.getDebugLoc();
860
861 // If the first operand is from add/subtract/multiply unit, generate IEEE
862 // conversion instruction
863 if (checkIfInputFromAdder32(Reg1) || checkIfInputFromMult32(Reg1)) {
864 VR1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
865 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_sf_qf32), VR1).addReg(Reg1);
866 firstconvert = true;
867 }
868
869 if (checkIfInputFromAdder32(Reg2) || checkIfInputFromMult32(Reg2)) {
870 VR2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
871 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_sf_qf32), VR2).addReg(Reg2);
872 secondconvert = true;
873 }
874
875 if (HST->useHVXV81Ops()) {
876 if (firstconvert && secondconvert)
877 return V81normalizeMultF32(MI, VR1, VR2, Dest, true, true, true);
878 else if (firstconvert)
879 return V81normalizeMultF32(MI, VR1, Reg2, Dest, true, false, true);
880 else if (secondconvert)
881 return V81normalizeMultF32(MI, Reg1, VR2, Dest, false, true, true);
882 else
883 return false;
884 }
885
886 // create prolog if not already created
887 if (!PrologCreated && (firstconvert || secondconvert)) {
888 createPrologInstructions(MI, R_mpy);
889 PrologCreated = true;
890 }
891
892 Register input_mpy1, input_mpy2;
893
894 // Normalize both IEEE converts
895 if (firstconvert && secondconvert) {
896 input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
897 input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
898
899 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), input_mpy1)
900 .addReg(R_mpy)
901 .addReg(VR1);
902 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), input_mpy2)
903 .addReg(R_mpy)
904 .addReg(VR2);
905 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
906 .addReg(input_mpy1)
907 .addReg(input_mpy2);
908 // Normalize only first operand
909 } else if (firstconvert) {
910 input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
911
912 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), input_mpy1)
913 .addReg(R_mpy)
914 .addReg(VR1);
915 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
916 .addReg(input_mpy1)
917 .addReg(Reg2);
918 // Normalize only second operand
919 } else if (secondconvert) {
920 input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
921
922 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), input_mpy2)
923 .addReg(R_mpy)
924 .addReg(VR2);
925 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
926 .addReg(Reg1)
927 .addReg(input_mpy2);
928 } else {
929 // we do nothing if the inputs are not from adder/subtracter/multiplier unit
930 return false;
931 }
932 return true;
933}
934
935// Convert to IEEE and widen qf16 = vmpy(qf16/hf, qf16) conditionally
936// Then convert qf32 to qf16
937// twoOps: true if the first operand is qf type, false if hf type
938bool HexagonXQFloatGenerator::convertWidenMultOp16(MachineInstr &MI,
939 Register &Reg1,
940 Register &Reg2,
941 Register &Dest,
942 bool twoOps) {
943
944 Register VR1, VR2, output_mpy;
945 bool firstconvert = false,
946 secondconvert = false; // normalize with hf or qf16 operands
947 MachineBasicBlock &MBB = *MI.getParent();
948 const DebugLoc &DL = MI.getDebugLoc();
949
950 // If the first operand is from add/sub/mul unit,
951 // generate IEEE conversion instruction
952 if (checkIfInputFromAdder16(Reg1) || checkIfInputFromMult16(Reg1)) {
953 VR1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
954 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VR1).addReg(Reg1);
955 firstconvert = true;
956 }
957
958 if (twoOps) {
959 if (checkIfInputFromAdder16(Reg2) || checkIfInputFromMult16(Reg2)) {
960 VR2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
961 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VR2)
962 .addReg(Reg2);
963 secondconvert = true;
964 }
965 }
966
967 if (twoOps) {
968 // Both operands have been converted to IEEE
969 if (firstconvert && secondconvert) {
970 output_mpy = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
971 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_hf), output_mpy)
972 .addReg(VR1)
973 .addReg(VR2);
974 // Only one operand has been converted to IEEE
975 } else if (firstconvert) {
976 output_mpy = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
977 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_mix_hf), output_mpy)
978 .addReg(Reg2)
979 .addReg(VR1);
980 } else if (secondconvert) {
981 output_mpy = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
982 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_mix_hf), output_mpy)
983 .addReg(Reg1)
984 .addReg(VR2);
985 } else {
986 // Neither operand has to be transformed
987 return false;
988 }
989 } else {
990 if (firstconvert) {
991 output_mpy = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
992 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_hf), output_mpy)
993 .addReg(VR1)
994 .addReg(Reg2);
995 } else
996 return false;
997 }
998
999 // convert qf32 to qf16
1000 generateQF16FromQF32(MI, Dest, output_mpy);
1001
1002 return true;
1003}
1004
1005// Convert to IEEE and perform qf32 = vmpy(qf16/hf, qf16) conditionally
1006// Final output is qf32 type
1007bool HexagonXQFloatGenerator::convertWidenMultOp32(MachineInstr &MI,
1008 Register &Reg1,
1009 Register &Reg2,
1010 Register &Dest,
1011 bool twoOps) {
1012 Register VR1, VR2;
1013 bool firstconvert = false,
1014 secondconvert = false; // normalize with hf or qf16 operands
1015 MachineBasicBlock &MBB = *MI.getParent();
1016 const DebugLoc &DL = MI.getDebugLoc();
1017
1018 // If the first operand is from add/subtract/multiply unit, generate IEEE
1019 // conversion instruction
1020 if (checkIfInputFromAdder16(Reg1) || checkIfInputFromMult16(Reg1)) {
1021 VR1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
1022 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VR1).addReg(Reg1);
1023 firstconvert = true;
1024 }
1025
1026 if (twoOps) {
1027 if (checkIfInputFromAdder16(Reg2) || checkIfInputFromMult16(Reg2)) {
1028 VR2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
1029 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VR2)
1030 .addReg(Reg2);
1031 secondconvert = true;
1032 }
1033 }
1034
1035 if (twoOps) {
1036 // Both operands have been converted to IEEE
1037 if (firstconvert && secondconvert) {
1038 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_hf), Dest)
1039 .addReg(VR1)
1040 .addReg(VR2);
1041 // Only one operand has been converted to IEEE
1042 } else if (firstconvert) {
1043 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_mix_hf), Dest)
1044 .addReg(Reg2)
1045 .addReg(VR1);
1046 } else if (secondconvert) {
1047 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_mix_hf), Dest)
1048 .addReg(Reg1)
1049 .addReg(VR2);
1050 } else
1051 // Neither operand has to be transformed
1052 return false;
1053 } else {
1054 if (firstconvert)
1055 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_hf), Dest)
1056 .addReg(VR1)
1057 .addReg(Reg2);
1058 else
1059 return false;
1060 }
1061
1062 return true;
1063}
1064
1065// Normalize instructions of type qf32 = vmpy(qf32, qf32)
1066bool HexagonXQFloatGenerator::normalizeMultiplicationInputF32(
1067 MachineInstr &MI, Register &Reg1, Register &Reg2, Register &Dest,
1068 Register &R_mpy, bool &PrologCreated) {
1069 bool firstconvert = false, secondconvert = false;
1070 MachineBasicBlock &MBB = *MI.getParent();
1071 const DebugLoc &DL = MI.getDebugLoc();
1072
1073 // Normalize only the operands that come from the add/subtract unit.
1074 if (checkIfInputFromAdder32(Reg1))
1075 firstconvert = true;
1076 if (checkIfInputFromAdder32(Reg2))
1077 secondconvert = true;
1078
1079 // v81 normalization
1080 if (HST->useHVXV81Ops())
1081 return V81normalizeMultF32(MI, Reg1, Reg2, Dest, firstconvert,
1082 secondconvert, false);
1083
1084 // create normalization operand conditionally for v79
1085 if ((!PrologCreated && (firstconvert || secondconvert))) {
1086 createPrologInstructions(MI, R_mpy);
1087 PrologCreated = true;
1088 }
1089
1090 Register input_mpy1, input_mpy2;
1091
1092 // Normalize both input operands
1093 if (firstconvert && secondconvert) {
1094 input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
1095 input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
1096
1097 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32), input_mpy1)
1098 .addReg(R_mpy)
1099 .addReg(Reg1);
1100 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32), input_mpy2)
1101 .addReg(R_mpy)
1102 .addReg(Reg2);
1103 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
1104 .addReg(input_mpy1)
1105 .addReg(input_mpy2);
1106 // Normalize only first operand
1107 } else if (firstconvert) {
1108 input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
1109
1110 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32), input_mpy1)
1111 .addReg(R_mpy)
1112 .addReg(Reg1);
1113 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
1114 .addReg(input_mpy1)
1115 .addReg(Reg2);
1116 // Normalize only second operand
1117 } else if (secondconvert) {
1118 input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
1119
1120 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32), input_mpy2)
1121 .addReg(R_mpy)
1122 .addReg(Reg2);
1123 BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
1124 .addReg(input_mpy2)
1125 .addReg(Reg1);
1126 } else {
1127 // we do nothing if neither input comes from the adder/subtracter unit
1128 return false;
1129 }
1130
1131 return true;
1132}
1133
1134bool HexagonXQFloatGenerator::deleteList() {
1135 if (OriginalMI.empty())
1136 return false;
1137 bool Changed = false;
1138 for (MachineInstr *origMI : OriginalMI) {
1139 LLVM_DEBUG(dbgs() << "deleting redundant instruction");
1140 LLVM_DEBUG(origMI->dump());
1141 origMI->eraseFromParent();
1142 Changed = true;
1143 }
1144 OriginalMI.clear();
1145 return Changed;
1146}
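// Illustrative sketch, not part of this pass: deleteList() is called at the
// top of each loop iteration in the drivers below, so an instruction queued
// while it is being visited is erased only after the loop has moved past it.
// The same deferred-erase pattern on a std::list, with the hypothetical names
// `flushPending` and `removeEvens`:

```cpp
#include <cassert>
#include <list>
#include <vector>

// Erase every queued element, then clear the queue.
// Returns true if anything was erased.
inline bool flushPending(std::list<int> &Items,
                         std::vector<std::list<int>::iterator> &Pending) {
  bool Changed = !Pending.empty();
  for (auto It : Pending)
    Items.erase(It);
  Pending.clear();
  return Changed;
}

// Remove all even values. Each one is queued when visited and erased at the
// start of the next iteration, so the loop iterator is never invalidated.
inline bool removeEvens(std::list<int> &Items) {
  std::vector<std::list<int>::iterator> Pending;
  bool Changed = false;
  for (auto It = Items.begin(); It != Items.end(); ++It) {
    Changed |= flushPending(Items, Pending); // erases the previous element, if queued
    if (*It % 2 == 0)
      Pending.push_back(It);
  }
  // The last iteration may have queued an element; flush it after the loop.
  Changed |= flushPending(Items, Pending);
  return Changed;
}
```

// The final flush after the loop plays the role of the cleanup loop at the end
// of runOnMachineFunction.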
1147
1148 // Parent function to handle Lossy subnormal transformations
1149bool HexagonXQFloatGenerator::HandleLossySubnormals(MachineFunction &MF) {
1150 bool Changed = false;
1151 Register R_mpy;
1152 for (auto &MBB : MF) {
1153 bool PrologCreated = false;
1154 for (auto &MI : MBB) {
1155 Changed |= deleteList();
1156 // Skip the instruction unless it has exactly three operands
1157 // (one destination and two sources);
1158 // also skip debug instructions
1159 if (MI.getNumOperands() != 3 || MI.isDebugInstr())
1160 continue;
1161 auto Op1 = MI.getOperand(1);
1162 if (!Op1.isReg())
1163 continue;
1164 auto Op2 = MI.getOperand(2);
1165 if (!Op2.isReg())
1166 continue;
1167 auto Op0 = MI.getOperand(0);
1168 if (!Op0.isReg())
1169 continue;
1170 Register Reg1 = Op1.getReg();
1171 Register Reg2 = Op2.getReg();
1172 Register Dest = Op0.getReg();
1173
1174 // FIXME Do not process physical registers as operands
1175 if (!Reg1.isVirtual() || !Reg2.isVirtual() || !Dest.isVirtual())
1176 continue;
1177
1178 switch (MI.getOpcode()) {
1179 // qf32 = vmpy(qf32, qf32)
1180 // Normalize one or both input operands
1181 // if from add/sub unit
1182 case Hexagon::V6_vmpy_qf32:
1183 if (normalizeMultiplicationInputF32(MI, Reg1, Reg2, Dest, R_mpy,
1184 PrologCreated))
1185 OriginalMI.push_back(&MI);
1186 Changed |= convertIfInputToNonHVX(MI, true);
1187 break;
1188
1189 // qf16 = vmpy(qf16, qf16)
1190 // Widening multiply to qf32 and convert back to qf16
1191 // if any of the operands are from add/sub unit
1192 case Hexagon::V6_vmpy_qf16:
1193 if (widenMultiplicationInputF16(MI, Reg1, Reg2, Dest, true))
1194 OriginalMI.push_back(&MI);
1195 Changed |= convertIfInputToNonHVX(MI, false);
1196 break;
1197
1198 // qf16 = vmpy(qf16, Rt.hf)
1199 // Splat Rt to vector and then widening multiply
1200 // and then convert back to qf16
1201 // if first operand is from add/sub unit
1202 case Hexagon::V6_vmpy_rt_qf16:
1203 if (widenMultiplicationInputF16Rt(MI, Reg1, Reg2, Dest))
1204 OriginalMI.push_back(&MI);
1205 Changed |= convertIfInputToNonHVX(MI, false);
1206 break;
1207
1208 // qf16 = vmpy(qf16, hf)
1209 // Widening multiply to qf32 and convert back to qf16
1210 // if first operand is from add/sub unit
1211 case Hexagon::V6_vmpy_qf16_mix_hf:
1212 if (widenMultiplicationInputF16(MI, Reg1, Reg2, Dest, false))
1213 OriginalMI.push_back(&MI);
1214 Changed |= convertIfInputToNonHVX(MI, false);
1215 break;
1216 // Check whether the results of qf32-generating add/sub/mul
1217 // instructions are used as non-HVX operands.
1218 // If yes, convert the use to IEEE
1219 case Hexagon::V6_vadd_sf:
1220 case Hexagon::V6_vadd_qf32:
1221 case Hexagon::V6_vadd_qf32_mix:
1222 case Hexagon::V6_vsub_sf:
1223 case Hexagon::V6_vsub_qf32:
1224 case Hexagon::V6_vsub_qf32_mix:
1225 case Hexagon::V6_vsub_sf_mix:
1226 case Hexagon::V6_vmpy_qf32_qf16:
1227 case Hexagon::V6_vmpy_qf32_hf:
1228 case Hexagon::V6_vmpy_qf32_mix_hf:
1229 case Hexagon::V6_vmpy_rt_sf:
1230 case Hexagon::V6_vmpy_qf32_sf:
1231 Changed |= convertIfInputToNonHVX(MI, true);
1232 break;
1233 // Check whether the results of qf16-generating add/sub/mul
1234 // instructions are used as non-HVX operands.
1235 // If yes, convert the use to IEEE
1236 case Hexagon::V6_vadd_hf:
1237 case Hexagon::V6_vsub_hf:
1238 case Hexagon::V6_vadd_qf16:
1239 case Hexagon::V6_vsub_qf16:
1240 case Hexagon::V6_vadd_qf16_mix:
1241 case Hexagon::V6_vsub_qf16_mix:
1242 case Hexagon::V6_vsub_hf_mix:
1243 case Hexagon::V6_vmpy_qf16_hf:
1244 case Hexagon::V6_vmpy_rt_hf:
1245 Changed |= convertIfInputToNonHVX(MI, false);
1246 break;
1247 default:
1248 break;
1249 }
1250 }
1251 }
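1252 // Report a change if anything was rewritten or is still queued for erasure;
1253 // runOnMachineFunction erases whatever remains in OriginalMI.
1254 return Changed || !OriginalMI.empty();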
1255}
1256
1257// Parent function to handle all IEEE-754 compliant transformations
1258bool HexagonXQFloatGenerator::HandleCompliantIEEE(MachineFunction &MF) {
1259 bool Changed = false;
1260 Register R_mpy;
1261 for (auto &MBB : MF) {
1262 bool PrologCreated = false;
1263 for (auto &MI : MBB) {
1264 Changed |= deleteList();
1265 // Skip the instruction unless it has exactly three operands
1266 // (one destination and two sources);
1267 // also skip debug instructions
1268 if (MI.getNumOperands() != 3 || MI.isDebugInstr())
1269 continue;
1270
1271 auto Op1 = MI.getOperand(1);
1272 if (!Op1.isReg())
1273 continue;
1274 auto Op2 = MI.getOperand(2);
1275 if (!Op2.isReg())
1276 continue;
1277 auto Op0 = MI.getOperand(0);
1278 if (!Op0.isReg())
1279 continue;
1280 Register Reg1 = Op1.getReg();
1281 Register Reg2 = Op2.getReg();
1282 Register Dest = Op0.getReg();
1283 Register VRtSplat;
1284
1285 // FIXME Do not process physical registers as operands
1286 if (!Reg1.isVirtual() || !Reg2.isVirtual() || !Dest.isVirtual())
1287 continue;
1288
1289 switch (MI.getOpcode()) {
1290
1291 // ==== Handle multiplication instructions ====
1292
1293 // qf32 = vmpy(sf, Rt.sf)
1294 // Splat Rt to a vector
1295 // Normalize both input operands unconditionally
1296 case Hexagon::V6_vmpy_rt_sf:
1297 VRtSplat = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
1298 BuildMI(MBB, MI, MI.getDebugLoc(), HII->get(Hexagon::V6_lvsplatw),
1299 VRtSplat)
1300 .addReg(Reg2);
1301 normalizeMultiplicationInputSF(MI, Reg1, VRtSplat, Dest, R_mpy,
1302 PrologCreated);
1303 OriginalMI.push_back(&MI);
1304 Changed |= convertIfInputToNonHVX(MI, true);
1305 break;
1306
1307 // qf32 = vmpy(sf, sf)
1308 // Normalize both operands unconditionally
1309 case Hexagon::V6_vmpy_qf32_sf:
1310 normalizeMultiplicationInputSF(MI, Reg1, Reg2, Dest, R_mpy,
1311 PrologCreated);
1312 OriginalMI.push_back(&MI);
1313 Changed |= convertIfInputToNonHVX(MI, true);
1314 break;
1315
1316 // qf32 = vmpy(qf32, qf32)
1317 // Normalize one or both input operands
1318 // if from add/sub unit
1319 case Hexagon::V6_vmpy_qf32:
1320 if (normalizeMultiplicationInputF32(MI, Reg1, Reg2, Dest, R_mpy,
1321 PrologCreated))
1322 OriginalMI.push_back(&MI);
1323 Changed |= convertIfInputToNonHVX(MI, true);
1324 break;
1325
1326 // qf16 = vmpy(hf, rt)
1327 // Splat Rt to vector and then widening multiply
1328 case Hexagon::V6_vmpy_rt_hf:
1329 VRtSplat = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
1330 BuildMI(MBB, MI, MI.getDebugLoc(), HII->get(Hexagon::V6_lvsplatw),
1331 VRtSplat)
1332 .addReg(Reg2);
1333 widenMultiplyInputHF(MI, Reg1, VRtSplat, Dest);
1334 OriginalMI.push_back(&MI);
1335 Changed |= convertIfInputToNonHVX(MI, false);
1336 break;
1337
1338 // Widening multiply
1339 // qf16 = vmpy(hf, hf)
1340 case Hexagon::V6_vmpy_qf16_hf:
1341 widenMultiplyInputHF(MI, Reg1, Reg2, Dest);
1342 OriginalMI.push_back(&MI);
1343 Changed |= convertIfInputToNonHVX(MI, false);
1344 break;
1345
1346 // qf16 = vmpy(qf16, qf16)
1347 // Widening multiply to qf32 and convert back to qf16
1348 // if any of the operands are from add/sub unit
1349 case Hexagon::V6_vmpy_qf16:
1350 if (widenMultiplicationInputF16(MI, Reg1, Reg2, Dest, true))
1351 OriginalMI.push_back(&MI);
1352 Changed |= convertIfInputToNonHVX(MI, false);
1353 break;
1354
1355 // qf16 = vmpy(qf16, Rt.hf)
1356 // Splat Rt to vector and then widening multiply
1357 // and then convert back to qf16
1358 // if first operand is from add/sub unit
1359 case Hexagon::V6_vmpy_rt_qf16:
1360 if (widenMultiplicationInputF16Rt(MI, Reg1, Reg2, Dest))
1361 OriginalMI.push_back(&MI);
1362 Changed |= convertIfInputToNonHVX(MI, false);
1363 break;
1364
1365 // qf16 = vmpy(qf16, hf)
1366 // Widening multiply to qf32 and convert back to qf16
1367 // if first operand is from add/sub unit
1368 case Hexagon::V6_vmpy_qf16_mix_hf:
1369 if (widenMultiplicationInputF16(MI, Reg1, Reg2, Dest, false))
1370 OriginalMI.push_back(&MI);
1371 Changed |= convertIfInputToNonHVX(MI, false);
1372 break;
1373
1374 // Check whether the results of qf32/qf16-generating add/sub/mul
1375 // instructions are used as non-HVX operands.
1376 // If yes, convert the use to IEEE
1377 case Hexagon::V6_vadd_sf:
1378 case Hexagon::V6_vadd_qf32:
1379 case Hexagon::V6_vadd_qf32_mix:
1380 case Hexagon::V6_vsub_sf:
1381 case Hexagon::V6_vsub_qf32:
1382 case Hexagon::V6_vsub_qf32_mix:
1383 case Hexagon::V6_vsub_sf_mix:
1384 case Hexagon::V6_vmpy_qf32_qf16:
1385 case Hexagon::V6_vmpy_qf32_hf:
1386 case Hexagon::V6_vmpy_qf32_mix_hf:
1387 Changed |= convertIfInputToNonHVX(MI, true);
1388 break;
1389 case Hexagon::V6_vadd_hf:
1390 case Hexagon::V6_vsub_hf:
1391 case Hexagon::V6_vadd_qf16:
1392 case Hexagon::V6_vsub_qf16:
1393 case Hexagon::V6_vadd_qf16_mix:
1394 case Hexagon::V6_vsub_qf16_mix:
1395 case Hexagon::V6_vsub_hf_mix:
1396 Changed |= convertIfInputToNonHVX(MI, false);
1397 break;
1398 default:
1399 break;
1400 }
1401 }
1402 }
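1403 // Report a change if anything was rewritten or is still queued for erasure;
1404 // runOnMachineFunction erases whatever remains in OriginalMI.
1405 return Changed || !OriginalMI.empty();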
1406}
1407
1408// Parent function to do strict IEEE transformations
1409bool HexagonXQFloatGenerator::HandleStrictIEEE(MachineFunction &MF) {
1410
1411 bool Changed = false;
1412 Register R_mpy;
1413 for (auto &MBB : MF) {
1414 bool PrologCreated = false;
1415 for (auto &MI : MBB) {
1416 Changed |= deleteList();
1417 // Skip the instruction unless it has exactly three operands
1418 // (one destination and two sources);
1419 // also skip debug instructions
1420 if (MI.getNumOperands() != 3 || MI.isDebugInstr())
1421 continue;
1422
1423 auto Op1 = MI.getOperand(1);
1424 if (!Op1.isReg())
1425 continue;
1426 auto Op2 = MI.getOperand(2);
1427 if (!Op2.isReg())
1428 continue;
1429 auto Op0 = MI.getOperand(0);
1430 if (!Op0.isReg())
1431 continue;
1432 Register Reg1 = Op1.getReg();
1433 Register Reg2 = Op2.getReg();
1434 Register Dest = Op0.getReg();
1435 Register VRtSplat;
1436
1437 // FIXME Do not process physical registers as operands
1438 if (!Reg1.isVirtual() || !Reg2.isVirtual() || !Dest.isVirtual())
1439 continue;
1440
1441 switch (MI.getOpcode()) {
1442 // ==== Handle add/subtract instructions ====
1443 // Convert one or both the input operands to IEEE 32-bit
1444 // if from add/sub/mult unit(s)
1445 // qf32 = vadd(qf32, qf32)
1446 case Hexagon::V6_vadd_qf32:
1447 if (convertAddOpToIEEE32(MI, Reg1, Reg2, Dest, true, true, true))
1448 OriginalMI.push_back(&MI);
1449 Changed |= convertIfInputToNonHVX(MI, true);
1450 break;
1451 // qf32 = vsub(qf32, qf32)
1452 case Hexagon::V6_vsub_qf32:
1453 if (convertAddOpToIEEE32(MI, Reg1, Reg2, Dest, false, true, true))
1454 OriginalMI.push_back(&MI);
1455 Changed |= convertIfInputToNonHVX(MI, true);
1456 break;
1457 // Convert only the first input operand to IEEE 32-bit
1458 // if it is from add/sub/mult unit
1459 // qf32 = vadd(qf32, sf)
1460 case Hexagon::V6_vadd_qf32_mix:
1461 if (convertAddOpToIEEE32(MI, Reg1, Reg2, Dest, true, true, false))
1462 OriginalMI.push_back(&MI);
1463 Changed |= convertIfInputToNonHVX(MI, true);
1464 break;
1465 // qf32 = vsub(qf32, sf)
1466 case Hexagon::V6_vsub_qf32_mix:
1467 if (convertAddOpToIEEE32(MI, Reg1, Reg2, Dest, false, true, false))
1468 OriginalMI.push_back(&MI);
1469 Changed |= convertIfInputToNonHVX(MI, true);
1470 break;
1471 // qf32 = vsub(sf, qf32)
1472 case Hexagon::V6_vsub_sf_mix:
1473 if (convertAddOpToIEEE32(MI, Reg1, Reg2, Dest, false, false, true))
1474 OriginalMI.push_back(&MI);
1475 Changed |= convertIfInputToNonHVX(MI, true);
1476 break;
1478
1479 // Convert one or both the input operands to IEEE 16-bit
1480 // if from add/sub/mult unit(s)
1481 // qf16 = vadd(qf16, qf16)
1482 case Hexagon::V6_vadd_qf16:
1483 if (convertAddOpToIEEE16(MI, Reg1, Reg2, Dest, true, true, true))
1484 OriginalMI.push_back(&MI);
1485 Changed |= convertIfInputToNonHVX(MI, false);
1486 break;
1487 // qf16 = vsub(qf16, qf16)
1488 case Hexagon::V6_vsub_qf16:
1489 if (convertAddOpToIEEE16(MI, Reg1, Reg2, Dest, false, true, true))
1490 OriginalMI.push_back(&MI);
1491 Changed |= convertIfInputToNonHVX(MI, false);
1492 break;
1493 // Convert only the first input operand to IEEE 16-bit
1494 // if it is from add/sub/mul unit
1495 // qf16 = vadd(qf16, hf)
1496 case Hexagon::V6_vadd_qf16_mix:
1497 if (convertAddOpToIEEE16(MI, Reg1, Reg2, Dest, true, true, false))
1498 OriginalMI.push_back(&MI);
1499 Changed |= convertIfInputToNonHVX(MI, false);
1500 break;
1501 // qf16 = vsub(qf16, hf)
1502 case Hexagon::V6_vsub_qf16_mix:
1503 if (convertAddOpToIEEE16(MI, Reg1, Reg2, Dest, false, true, false))
1504 OriginalMI.push_back(&MI);
1505 Changed |= convertIfInputToNonHVX(MI, false);
1506 break;
1507 // qf16 = vsub(hf, qf16)
1508 case Hexagon::V6_vsub_hf_mix:
1509 if (convertAddOpToIEEE16(MI, Reg1, Reg2, Dest, false, false, true))
1510 OriginalMI.push_back(&MI);
1511 Changed |= convertIfInputToNonHVX(MI, false);
1512 break;
1513
1514 // ==== Handle multiplication instructions ====
1515
1516 // qf32 = vmpy(sf, Rt.sf)
1517 // Splat Rt to a vector
1518 // Normalize both input operands unconditionally
1519 case Hexagon::V6_vmpy_rt_sf:
1520 VRtSplat = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
1521 BuildMI(MBB, MI, MI.getDebugLoc(), HII->get(Hexagon::V6_lvsplatw),
1522 VRtSplat)
1523 .addReg(Reg2);
1524 normalizeMultiplicationInputSF(MI, Reg1, VRtSplat, Dest, R_mpy,
1525 PrologCreated);
1526 OriginalMI.push_back(&MI);
1527 Changed |= convertIfInputToNonHVX(MI, true);
1528 break;
1529
1530 // Normalize both operands unconditionally
1531 // qf32 = vmpy(sf, sf)
1532 case Hexagon::V6_vmpy_qf32_sf:
1533 normalizeMultiplicationInputSF(MI, Reg1, Reg2, Dest, R_mpy,
1534 PrologCreated);
1535 Changed |= convertIfInputToNonHVX(MI, true);
1536 OriginalMI.push_back(&MI);
1537 break;
1538
1539 // Convert one or both input operands to IEEE 32-bit
1540 // if from add/sub/mult unit and normalize
1541 // qf32 = vmpy(qf32, qf32)
1542 case Hexagon::V6_vmpy_qf32:
1543 if (convertNormalizeMultOp32(MI, Reg1, Reg2, Dest, R_mpy,
1544 PrologCreated))
1545 OriginalMI.push_back(&MI);
1546 Changed |= convertIfInputToNonHVX(MI, true);
1547 break;
1548
1549 // Convert one or both input operands to IEEE 16-bit
1550 // if from mul/add/sub unit;
1551 // then widening multiply to generate qf32
1552 // then convert to qf16
1553 // qf16 = vmpy(qf16, qf16)
1554 case Hexagon::V6_vmpy_qf16:
1555 if (convertWidenMultOp16(MI, Reg1, Reg2, Dest, true))
1556 OriginalMI.push_back(&MI);
1557 Changed |= convertIfInputToNonHVX(MI, false);
1558 break;
1559
1560 // Convert one or both input operands to IEEE 16-bit
1561 // if from mul/add/sub unit;
1562 // then widening multiply to generate qf32
1563 // qf32 = vmpy(qf16, qf16)
1564 case Hexagon::V6_vmpy_qf32_qf16:
1565 if (convertWidenMultOp32(MI, Reg1, Reg2, Dest, true))
1566 OriginalMI.push_back(&MI);
1567 Changed |= convertIfInputToNonHVX(MI, true);
1568 break;
1569
1570 // qf16 = vmpy(hf, rt)
1571 // Splat Rt to vector and then widening multiply
1572 case Hexagon::V6_vmpy_rt_hf:
1573 VRtSplat = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
1574 BuildMI(MBB, MI, MI.getDebugLoc(), HII->get(Hexagon::V6_lvsplatw),
1575 VRtSplat)
1576 .addReg(Reg2);
1577 widenMultiplyInputHF(MI, Reg1, VRtSplat, Dest);
1578 OriginalMI.push_back(&MI);
1579 Changed |= convertIfInputToNonHVX(MI, false);
1580 break;
1581
1582 // Widening multiply, then convert to IEEE
1583 // qf16 = vmpy(hf, hf)
1584 case Hexagon::V6_vmpy_qf16_hf:
1585 widenMultiplyInputHF(MI, Reg1, Reg2, Dest);
1586 OriginalMI.push_back(&MI);
1587 Changed |= convertIfInputToNonHVX(MI, false);
1588 break;
1589
1590 // qf16 = vmpy(qf16, Rt.hf)
1591 // Splat Rt to vector and then widening multiply
1592 // and then convert back to qf16
1593 // if first operand is from add/sub unit
1594 case Hexagon::V6_vmpy_rt_qf16:
1595 if (widenMultiplicationInputF16Rt(MI, Reg1, Reg2, Dest))
1596 OriginalMI.push_back(&MI);
1597 Changed |= convertIfInputToNonHVX(MI, false);
1598 break;
1599
1600 // qf16 = vmpy(qf16, hf)
1601 // Convert only the first input operand to IEEE 16-bit
1602 // if from mul/add/sub unit;
1603 // then widening multiply to generate qf32
1604 // then convert back to qf16
1605 case Hexagon::V6_vmpy_qf16_mix_hf:
1606 if (convertWidenMultOp16(MI, Reg1, Reg2, Dest, false))
1607 OriginalMI.push_back(&MI);
1608 Changed |= convertIfInputToNonHVX(MI, false);
1609 break;
1610
1611 // qf32 = vmpy(qf16, hf)
1612 // Convert only the first input operand to IEEE 16-bit
1613 // if from mul/add/sub unit;
1614 // then widening multiply to generate qf32
1615 case Hexagon::V6_vmpy_qf32_mix_hf:
1616 if (convertWidenMultOp32(MI, Reg1, Reg2, Dest, false))
1617 OriginalMI.push_back(&MI);
1618 Changed |= convertIfInputToNonHVX(MI, true);
1619 break;
1620 // Check whether the results of qf32/qf16-generating add/sub/mul
1621 // instructions are used as non-HVX operands.
1622 // If yes, convert the use to IEEE
1623 case Hexagon::V6_vadd_sf:
1624 case Hexagon::V6_vsub_sf:
1625 Changed |= convertIfInputToNonHVX(MI, true);
1626 break;
1627 case Hexagon::V6_vadd_hf:
1628 case Hexagon::V6_vsub_hf:
1629 Changed |= convertIfInputToNonHVX(MI, false);
1630 break;
1631 default:
1632 break;
1633 }
1634 }
1635 }
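1636 // Report a change if anything was rewritten or is still queued for erasure;
1637 // runOnMachineFunction erases whatever remains in OriginalMI.
1638 return Changed || !OriginalMI.empty();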
1639}
1640
1641 // There are no conversions in legacy mode
1642bool HexagonXQFloatGenerator::HandleLossyLegacy(MachineFunction &MF) {
1643 return false;
1644}
1645
1646bool HexagonXQFloatGenerator::runOnMachineFunction(MachineFunction &MF) {
1647 if (!EnableHVXXQFloat || (QFloatModeValue == QFloatMode::Legacy))
1648 return false;
1649
1650 bool Changed = false;
1651 HST = &MF.getSubtarget<HexagonSubtarget>();
1652 HII = HST->getInstrInfo();
1653 MRI = &MF.getRegInfo();
1654
1655 switch (QFloatModeValue) {
1656 case QFloatMode::StrictIEEE:
1657 LLVM_DEBUG(dbgs() << "\nGenerating code for STRICT-IEEE mode.\n");
1658 Changed = HandleStrictIEEE(MF);
1659 break;
1660 case QFloatMode::IEEE:
1661 LLVM_DEBUG(dbgs() << "\nGenerating code for IEEE mode.\n");
1662 Changed = HandleCompliantIEEE(MF);
1663 break;
1664 case QFloatMode::Lossy:
1665 LLVM_DEBUG(dbgs() << "\nGenerating code for LOSSY mode.\n");
1666 Changed = HandleLossySubnormals(MF);
1667 break;
1668 case QFloatMode::Legacy:
1669 LLVM_DEBUG(dbgs() << "\nGenerating code for LEGACY mode.\n");
1670 Changed = HandleLossyLegacy(MF);
1671 break;
1672 }
1673 LLVM_DEBUG(dbgs() << "...fine");
1674
1675 // Delete the original instructions
1676 for (MachineInstr *origMI : OriginalMI) {
1677 LLVM_DEBUG(origMI->dump());
1678 origMI->eraseFromParent();
1679 }
1680 OriginalMI.clear();
1681
1682 return Changed;
1683}