Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[linux vs macos] fails to inline sqrt #3591

Closed
nlewycky opened this issue Dec 16, 2008 · 5 comments
Closed

[linux vs macos] fails to inline sqrt #3591

nlewycky opened this issue Dec 16, 2008 · 5 comments
Labels
backend:X86 bugzilla Issues migrated from bugzilla

Comments

@nlewycky
Copy link
Contributor

Bugzilla Link 3219
Version trunk
OS Linux
CC @asl,@lattner,@emaste,@jryans

Extended Description

On this program:

#include <math.h>
double foo(double X) { return sqrt(X); }

GCC will inline the fast path (non-negative X) and produce this:

foo:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
fldl 8(%ebp)
fld %st(0)
fsqrt
fucomi %st(0), %st
jp .L7
je .L8
fstp %st(0)
jmp .L5
.L7:
fstp %st(0)
.p2align 4,,7
.p2align 3
.L5:
fstpl (%esp)
.p2align 4,,5
.p2align 3
call sqrt
.p2align 4,,2
.p2align 3
jmp .L2
.p2align 4,,7
.p2align 3
.L8:
fstp %st(1)
.L2:
leave
.p2align 4,,1
.p2align 3
ret

@lattner
Copy link
Collaborator

lattner commented Dec 16, 2008

On Mac OS/X, llvm-gcc inlines the call to sqrtd because the mac defaults to -fno-math-errno.

@lattner
Copy link
Collaborator

lattner commented Dec 16, 2008

On the mac, using:
$ gcc math.c -S -o - -O3 -fmath-errno -fomit-frame-pointer -fno-optimize-sibling-calls

Produces:
_test:
subl $44, %esp
movsd 48(%esp), %xmm1
sqrtsd %xmm1, %xmm0
ucomisd %xmm0, %xmm0
jp L5
je L2
L5:
movsd %xmm1, (%esp)
call L_sqrt$stub
fstpl 24(%esp)
movsd 24(%esp), %xmm0
L2:
movsd %xmm0, 24(%esp)
fldl 24(%esp)
addl $44, %esp
ret

This is substantially better than using the fpstack version of sqrt. 64-bit is even cleaner because of lack of stack traffic:

_test:
subq $8, %rsp
movapd %xmm0, %xmm1
sqrtsd %xmm0, %xmm0
ucomisd %xmm0, %xmm0
jp L5
je L2
L5:
movapd %xmm1, %xmm0
call _sqrt
L2:
addq $8, %rsp
ret

-Chris

@lattner
Copy link
Collaborator

lattner commented Jan 29, 2009

With this patch:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090126/072879.html

This issue only impacts people who explicitly use -fmath-errno.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
@bwendling
Copy link
Collaborator

This looks to be resolved?

$ clang pr.c -o - -S -O3 -fmath-errno -fomit-frame-pointer -fno-optimize-sibling-calls
	.text
	.file	"pr.c"
	.globl	foo                             # -- Begin function foo
	.p2align	4, 0x90
	.type	foo,@function
foo:                                    # @foo
	.cfi_startproc
# %bb.0:
	xorpd	%xmm1, %xmm1
	ucomisd	%xmm1, %xmm0
	jb	.LBB0_2
# %bb.1:
	sqrtsd	%xmm0, %xmm0
	retq
.LBB0_2:
	pushq	%rax
	.cfi_def_cfa_offset 16
	callq	sqrt
	addq	$8, %rsp
	.cfi_def_cfa_offset 8
	retq
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
	.cfi_endproc
                                        # -- End function
	.ident	"Debian clang version 13.0.1-+rc1-1~exp4"
	.section	".note.GNU-stack","",@progbits
	.addrsig

$ gcc pr.c -o - -S -O3 -fmath-errno -fomit-frame-pointer -fno-optimize-sibling-calls
	.file	"pr.c"
	.text
	.p2align 4
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	pxor	%xmm1, %xmm1
	ucomisd	%xmm0, %xmm1
	ja	.L6
	sqrtsd	%xmm0, %xmm0
	ret
.L6:
	pushq	%rax
	.cfi_def_cfa_offset 16
	call	sqrt@PLT
	popq	%rdx
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (Debian 11.2.0-14) 11.2.0"
	.section	.note.GNU-stack,"",@progbits

@rotateright
Copy link
Contributor

I can't tell exactly what the problem was from the description, but yes, this seems fixed 13+ years later. :)
Note that there are still missing optimizations for sqrt even with today's optimizer/codegen such as #52620.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

4 participants